Monday, March 16, 2015

So the time has come to put your brackets on the table.  Here is a random realization of a bracket from
the QBracketology model, complete with scores, and survival curves for the various teams based on
1000 realizations of the Tournament.  As you can see, not surprisingly, Kentucky is the overwhelming
favorite, with an estimated probability of 0.407 of winning.  Only time will tell whether the model's estimated probabilities are competitive in the Kaggle competition.

Sunday, March 1, 2015

Mea Copula Redux

Never one to leave bad enough alone,  I've yet again returned to Quantile Bracketology as a
March diversion,  in lieu of watching any basketball.  This year Kaggle (sponsored by HP) is
again offering modest prizes for predicting probabilities of the NCAA Tournament, and this
year they provide somewhat better data:  30 years of past seasonal results as well as the tournament
outcome data.

I gave a talk about my technology for this a week or so ago in the Stat Department; slides for
that talk are linked here:  http://www.econ.uiuc.edu/~roger/research/bracketology/MM.html

A question that puzzled me last year was whether it might help to shrink the model's predicted
probabilities toward 1/2.  In an effort to explore this, I estimated models and predicted tournament
probabilities for each of the last 30 years.  A naive shrinkage rule was employed which simply
took the model's original probabilities as if they were based on the frequency of successes out
of 20 trials, and used a Beta(a,a) prior to do the shrinkage.  This yields a linear shrinkage of
modest amount for a taking values 0, 1, ... , 5,  where 0 corresponds to no shrinkage at all.
The results are rather mixed.  In the figure below I plot the logistic loss achieved for each
year for the various levels of a, with the black line representing no shrinkage and the various
colored lines the various degrees of shrinkage.  In 2/3 of the years no shrinkage is best.  In
one year, 2011, in which the model is most disastrous, maximal shrinkage is optimal.  Comparison
of the mean logistic loss (lower is better) over the whole 30-year span yields:

   a =  0    a =  1    a =  2    a =  3    a =  4    a =  5 
0.5403810 0.5419280 0.5464682 0.5515166 0.5566610 0.5617073 

So, it would appear that shrinkage just tends to make the predictions more bland without really
helping much at all.
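
For anyone who wants to try the shrinkage rule on their own predictions, here is a minimal sketch in Python.  It assumes the shrunken probability is the posterior mean under the Beta(a,a) prior, that is (20p + a)/(20 + 2a), which is the same thing as 1/2 + (20/(20 + 2a))(p - 1/2); that is my reading of the rule described above, not a transcript of the code actually used.  The function names and the tiny data set are hypothetical, made up purely for illustration.

```python
import math

def shrink(p, a, n=20):
    """Shrink p toward 1/2: posterior mean under a Beta(a, a) prior,
    treating p as the frequency of successes in n trials (assumed form)."""
    return (n * p + a) / (n + 2 * a)

def log_loss(probs, outcomes):
    """Mean logistic loss; outcomes are 1 if the predicted event occurred, else 0.
    Probabilities must lie strictly between 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)

# Hypothetical predictions and outcomes, for illustration only.
probs = [0.95, 0.70, 0.55, 0.20]
outcomes = [1, 1, 0, 1]

for a in range(6):
    shrunk = [shrink(p, a) for p in probs]
    print(a, round(log_loss(shrunk, outcomes), 4))
```

With a = 5 the deviation from 1/2 is scaled by 20/30, so even the maximal setting only pulls a prediction a third of the way toward a coin flip, which fits the description of the shrinkage as modest.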