March diversion, in lieu of watching any basketball. This year Kaggle (sponsored by HP) is
again offering modest prizes for predicting win probabilities for NCAA Tournament games, and this
year they provide somewhat better data: 30 years of past seasonal results as well as the tournament
outcome data.
I gave a talk about my technology for this a week or so ago in the Stat Department; slides for
that talk are linked here: http://www.econ.uiuc.edu/~roger/research/bracketology/MM.html
A question that puzzled me last year was whether it might help to shrink the model's predicted
probabilities toward 1/2. In an effort to explore this, I estimated models and predicted tournament
probabilities for each of the last 30 years. A naive shrinkage rule was employed which simply
took the model's original probabilities as if they were based on the frequency of successes out
of 20 trials, and used a Beta(a,a) prior to do the shrinkage. This yields a linear shrinkage of
modest amount for a taking values 0, 1, ..., 5, where a = 0 corresponds to no shrinkage at all.
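To make the rule concrete, here is a minimal sketch in Python. It assumes the shrunken probability is the posterior mean under the Beta(a,a) prior, which is the reading consistent with a = 0 giving no shrinkage and with the shrinkage being linear in p. The function name shrink and the argument n are mine, purely for illustration.

```python
def shrink(p, a, n=20):
    """Shrink probability p toward 1/2.

    Treats p as n*p successes in n trials and returns the posterior
    mean under a Beta(a, a) prior: (n*p + a) / (n + 2*a).
    Assumption: the posterior mean (not the mode) is the shrunken value.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if a < 0:
        raise ValueError("a must be non-negative")
    return (n * p + a) / (n + 2 * a)


# Effect on a fairly confident prediction for each level of a.
for a in range(6):
    print(a, round(shrink(0.9, a), 4))
```

Equivalently, the shrunken value is w*p + (1 - w)/2 with weight w = 20/(20 + 2a), so even a = 5 keeps two thirds of the weight on the model's original probability.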
The results are rather mixed. In the figure below I plot the logistic loss achieved for each
year for the various levels of a, with the black line representing no shrinkage and the
colored lines the various degrees of shrinkage. In 2/3 of the years no shrinkage is best. In
one year, 2011, in which the model performs worst, maximal shrinkage is optimal. Comparison
of the mean logistic loss (lower is better) over the whole 30 year span yields:
    a = 0      a = 1      a = 2      a = 3      a = 4      a = 5
0.5403810  0.5419280  0.5464682  0.5515166  0.5566610  0.5617073
So, it would appear that shrinkage just tends to make the predictions more bland without really
helping much at all.
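For readers who want to try this on their own predictions, here is a rough sketch of the loss computation, assuming the logistic loss is the usual mean negative log-likelihood of the observed outcomes. The function name log_loss, the clipping constant eps, and the toy numbers are illustrative only and are not taken from the tournament data.

```python
import math


def log_loss(probs, outcomes, eps=1e-15):
    """Mean logistic loss of predicted probabilities against 0/1 outcomes.

    Probabilities are clipped to [eps, 1 - eps] so that a prediction of
    exactly 0 or 1 cannot produce an infinite loss.
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


# Toy example (made-up numbers): four games, one confident miss.
toy_probs = [0.9, 0.8, 0.7, 0.6]
toy_outcomes = [1, 1, 0, 1]
print(log_loss(toy_probs, toy_outcomes))
```

Shrinking toward 1/2 reduces the penalty on confident misses and raises it on confident hits, which may be why it only pays off in a year like 2011 when the model misses badly.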