Sunday, November 15, 2015

Requirements for Academic Work

Inquiring minds often want to know: what sort of qualifications are necessary for academic
positions?  Finally we have an answer from the University of Arkansas.  In addition to passing
background checks for prior criminal records, one must be capable of:

Sedentary Work - Exerting 10 pounds: Occasionally, Light Work - Exerting up to 20 pounds: Occasionally, Kneeling: Occasionally, Climbing (Stairs, Ladders, etc.): Occasionally, Lifting 5-10 lbs: Occasionally, Lifting 10-25 lbs: Occasionally, Carrying 5-10 lbs: Occasionally, Pushing/pulling 10-25 lbs: Occasionally, Sitting for long periods of time: Occasionally, Standing for long periods of time: Occasionally, Speaking: Essential, Hearing: Essential, Vision - Ability to distinguish similar colors, depth perception, close vision: Essential, Walking - Short Distances: Frequently, Walking - Moderate Distances: Occasionally.

Friday, November 6, 2015

Ab Uno Disce Omnes

Virgil's dictum, "From one example, all is revealed," would seem to be the very antithesis of
statistical thinking, so it is surprising to discover that a single observation is enough to
construct a confidence interval for the mean in the standard Gaussian model with unknown
mean and variance.  The construction is described below and comes from class notes of Charles
Stein, related to me by Steve Portnoy.  Any further information about the origins of this problem
or its solution would be most welcome.

Details in html
Details in pdf
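Stein's construction itself is in the linked notes, but one familiar interval of this type
takes the form X ± c|X| for a suitable constant c, and its coverage can be checked by simulation.
A minimal Monte Carlo sketch (the interval form and the choice c = 5 are illustrative
assumptions here, not necessarily the construction in the notes):

```python
# Monte Carlo check of a one-observation confidence interval of the
# form [X - c|X|, X + c|X|] for the mean of a N(mu, sigma^2) variable.
# The interval form and the constant c = 5 are illustrative assumptions.
import random

def coverage(mu, sigma, c=5.0, n=100_000, seed=42):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        if x - c * abs(x) <= mu <= x + c * abs(x):
            hits += 1
    return hits / n

# Coverage varies with mu/sigma but stays bounded away from zero for
# every (mu, sigma); increasing c raises the guaranteed level.
for mu, sigma in [(1.0, 1.0), (0.1, 1.0), (10.0, 1.0)]:
    print(mu, sigma, round(coverage(mu, sigma), 3))
```

The interval is scale-equivariant, so the coverage depends only on the ratio mu/sigma;
the worst case occurs at intermediate values of that ratio.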

Thursday, October 15, 2015

Pricing the Internet

More or less by accident I ran across http://www.dtc.umn.edu/~odlyzko/doc/smart.pricing.pdf,
an analysis of the prospects for internet pricing by Andrew Odlyzko, an old Bell Labs hand now
at the U. of Minnesota.  He makes the case against congestion pricing, arguing that consumers,
and perhaps businesses as well, prefer simple flat rates, i.e., usage-independent pricing, and
that technology is capable of expanding to meet this preference.  Having argued the other side
of this case in http://comjnl.oxfordjournals.org/content/27/1/8.full.pdf long ago, I was a bit
surprised, and am still not entirely persuaded.  Maybe this only reflects that I haven't fully
accepted recent developments in behavioral economics?

Wednesday, April 8, 2015

Amsterdam Econometrics Games 2015

This year for the first time we sent a UIUC team
to the Econometric Games in Amsterdam.  This is a 3-day competition among
30 universities; there is a preliminary round to select 10 finalists and then a
second round to determine the three prize winners.  The topic this year was
Longevity and Longevity Risk.  Details of this year's competition are available
here   http://www.econ.uiuc.edu/~roger/EGames/Case1.pdf   and
here  http://www.econ.uiuc.edu/~roger/EGames/Case2.pdf



Our team:   Nicolas Bottan, Cesare Buiatti, Julia Gonzalez and Jiaying Gu
placed second in this competition, outperforming perennial favorites like
Copenhagen, Aarhus, Amsterdam, and Carlos III, as well as Oxford, Cambridge
and Harvard.  Our team deserves a ticker tape parade when they return
this weekend; unfortunately, ticker tape is scarce these days.  So we will have
a celebration of their remarkable achievement on April 17th as part of the
Fourth Annual Boneyard Conference of Econometrics.  Details here:
http://www.econ.uiuc.edu/~roger/seminar/Boneyard15.pdf

Monday, March 16, 2015

So the time has come to put your brackets on the table.  Here is a random realization of a bracket
from the QBracketology model, complete with scores, and survival curves for the various teams based
on 1000 realizations of the Tournament.  As you can see, not surprisingly, Kentucky is an
overwhelming favorite, with a probability of 0.407 of winning.  Only time will tell whether the
estimated probabilities of the model are competitive in the Kaggle competition.

Sunday, March 1, 2015

Mea Copula Redux

Never one to leave bad enough alone,  I've yet again returned to Quantile Bracketology as a
March diversion,  in lieu of watching any basketball.  This year Kaggle (sponsored by HP) is
again offering modest prizes for predicting outcome probabilities for the NCAA Tournament, and
this year they provide somewhat better data:  30 years of past seasonal results as well as the
tournament outcome data.

I gave a talk about my technology for this a week or so ago in the Stat Department, slides for
that talk are linked here:  http://www.econ.uiuc.edu/~roger/research/bracketology/MM.html

A question that puzzled me last year was whether it might help to shrink the model's predicted
probabilities toward 1/2.  In an effort to explore this, I estimated models and predicted tournament
probabilities for each of the last 30 years.  A naive shrinkage rule was employed which simply
took the model's original probabilities as if they were based on the frequency of successes out
of 20 trials, and used a Beta(a,a) prior to do the shrinkage.  This yields a linear shrinkage of
modest amount for a taking values 0, 1, ... , 5,  where a = 0 corresponds to no shrinkage at all.
The results are rather mixed.  In the figure below I plot the logistic loss achieved for each
year for the various levels of a, with the black line representing no shrinkage and the various
colored lines the various degrees of shrinkage.  In 2/3 of the years no shrinkage is best.  In
one year, 2011, in which the model is most disastrous, maximal shrinkage is optimal.  Comparison
of the mean performance over the whole 30 year span yields:

   a =  0    a =  1    a =  2    a =  3    a =  4    a =  5 
0.5403810 0.5419280 0.5464682 0.5515166 0.5566610 0.5617073 

So, it would appear that shrinkage just tends to make the predictions more bland without really
helping much at all.
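Concretely, treating the model's probability p as the success frequency in m = 20 trials, the
Beta(a, a) posterior mean is (m p + a)/(m + 2a), a linear shrinkage of p toward 1/2.  A minimal
sketch of the rule and its effect on the logistic loss (the value p = 0.9 is just an illustrative
confident prediction):

```python
# The Beta(a, a) shrinkage rule described above: treat the model's
# probability p as a frequency of successes out of m = 20 trials, so
# the posterior mean under a Beta(a, a) prior is (m*p + a)/(m + 2*a),
# a linear shrinkage of p toward 1/2.
import math

def shrink(p, a, m=20):
    return (m * p + a) / (m + 2 * a)

def log_loss(p, y):
    # y = 1 if the predicted team won, 0 otherwise
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

p = 0.9  # an illustrative confident prediction
for a in range(6):
    q = shrink(p, a)
    print(a, round(q, 3), round(log_loss(q, 1), 3), round(log_loss(q, 0), 3))
```

Shrinkage trims the loss on upsets (y = 0) at the cost of a slightly larger loss when the
confident prediction is right, which is consistent with the mixed results in the table above.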





Thursday, March 20, 2014

Mea Copula

The usual craziness last Sunday drove me to consider updating the bracketology methods that Gib
Bassett and I described here:
http://www.tandfonline.com/doi/abs/10.1198/jbes.2009.07093#.Uys1PVxTq6Q

Since the Intel (Kaggle.com) contest provided reasonable looking data structures, it was
straightforward to modify my old R software to do a 2013-14 version of what we had done
earlier for 2003-4.  The Kaggle competition wanted submissions that made probability estimates
for all 2278 possible matchups for this year's tournament.  Our approach was to estimate a
pairwise comparison model for team scores for the entire pre-tournament season for a relatively
fine grid of tau values -- by (what else?) quantile regression.  The resulting QR models have a
design matrix that is 10724 by 704, but it is very sparse, so on the grid 1:199/200 the estimation
can all be done in about a minute on my desktop machine.  Then comes the fun of simulating
tournament brackets, and estimating probabilities.  For any pairing, ij,  we have an estimated
quantile function for team i's score and another estimated quantile function for team j's score.
Yesterday afternoon, as the deadline for the Intel contest loomed closer and closer, I lost focus
and decided to compute probability estimates based on what one might call the comonotonic
copula -- that is, under the assumption that if in our hypothetical game team i achieves quantile
tau in its scoring performance, then team j will also achieve quantile tau performance in this
game.  Thus, if team i's quantile function lies everywhere above the quantile function for team j,
the predicted probability will be 1; that is, we assert that team i will always beat team j.
But clearly team i might have a bad day, when team j has a good one, so another limiting
option would be to consider an independent copula:  each team gets an independent draw
from a uniform that then gets plugged into their quantile function.  Various compromises then
suggest themselves.  In the JBES paper we used a Frank copula model that implied a Kendall
correlation of about 0.27, so mild positive correlation between performance of the two teams.
In contrast to the initial probability estimates with the comonotonic model that produced roughly
half of the 2278 phats at 0 or 1, the Frank copula model produced a much more uniform distribution
of them, as illustrated by the histogram below.  Unfortunately, this didn't occur to me until after
the deadline passed for the Intel submissions.
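The two limiting copulas can be illustrated with toy quantile functions.  Here I use hypothetical
Gaussian score distributions for the two teams; these stand in for the quantile regression
estimates, which are not reproduced here, and the means and standard deviations are made up for
illustration:

```python
# Toy illustration of the two limiting copulas discussed above, using
# hypothetical Gaussian score quantile functions for two teams (these
# stand in for the actual quantile-regression estimates).
import random
from statistics import NormalDist

team_i = NormalDist(mu=72, sigma=10)   # hypothetical score distributions
team_j = NormalDist(mu=70, sigma=10)

taus = [k / 200 for k in range(1, 200)]  # the grid 1:199/200

# Comonotonic copula: both teams realize the same quantile tau, so the
# win probability is the fraction of the grid where Q_i(tau) > Q_j(tau).
p_como = sum(team_i.inv_cdf(t) > team_j.inv_cdf(t) for t in taus) / len(taus)

# Independent copula: each team gets its own independent uniform draw,
# plugged into its quantile function.
rng = random.Random(1)
n = 100_000
p_indep = sum(
    team_i.inv_cdf(rng.uniform(1e-9, 1 - 1e-9))
    > team_j.inv_cdf(rng.uniform(1e-9, 1 - 1e-9))
    for _ in range(n)
) / n

print(p_como, round(p_indep, 3))
```

With equal score dispersions, team i's quantile function lies everywhere above team j's, so the
comonotonic copula declares team i a sure winner (p = 1), while independence gives a probability
of only about 0.56 -- which is exactly why the comonotonic assumption pushed roughly half of the
phats to 0 or 1.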