Tuesday, November 16, 2021

Hansen's Gauss-Markov Theorem


 

Edgeworth's 1920 paper "The Element of Chance in Competitive Examinations" mocks excessive reliance on "reasoning with the aid of the gens d'arme's hat -- from which as from a conjuror's, so much can be extracted."  In this spirit, Bruce Hansen's recent paper, "A Modern Gauss-Markov Theorem," argues that the econometrics slogan "OLS is BLUE" can be modified to "OLS is BUE"; that is, we need not restrict attention to linear estimators: OLS can be considered minimum variance unbiased in a suitable class of more general regression models.

Since I'm thanked in the acknowledgments, I thought it might be prudent to make explicit a few reservations I have about Bruce's version of the GMT.  Here then is my unexpurgated original comment on an earlier draft of the paper.

Bruce,

I hope that you won’t mind an unsolicited comment on your recent Gauss-Markov paper.  I was wandering around somewhat aimlessly yesterday looking for recent work on model averaging for a refereeing task, and it attracted my attention.  (Spoiler alert:  I’ve always hated the GM Thm since it seemed to restrict attention to such a small class of estimators that its optimality claim was nearly vacuous.)

There is of course the (Rao?) result that ols is MVUE in the Gaussian linear model, but you want to say something much stronger, that it is MVUE in a much bigger class of models, but then the qualifiers become critical.  I think that I understand where you are coming from, but I wonder whether you might be misleading the youth of econometrics by the way that you develop the argument.  Your “for all F in calF_2”  is quite strong.  Of course median regression can be much more efficient than mean regression in iid error linear models and both are unbiased when the errors are symmetric.  When errors are iid and not symmetric then median regression is biased, but only the intercept is biased, the slope parameters are still potentially much more efficient than the mean regression estimates would be. Here, I don’t mean to suggest that there is anything special about median regression — a plethora of other estimators would serve as well.  There is merit, I concede, in the idea that “if you want to estimate a mean you should use the sample mean, etc” — I’ve heard this from Lars several times, but on the other side of the argument there is the infamous Bahadur-Savage result that the mean is never identified, in the sense that slight perturbations of the tails of the population distribution can make it bounce around arbitrarily. Of course, this depends upon what “slight” might mean.  Your “for all F…” condition and unbiasedness for any linear contrast gets us back very close to requiring linearity, it seems.

The paper is fine, I just think that it might need a surgeon general’s warning of some sort.

Best

Roger

PS.  The photo, taken recently at the Museum of the History of Paris (Carnavalet), depicts a metal, Napoleonic-era hat of the type that Edgeworth presumably had in mind.


PPS (added March 12, 2022):  There are now two papers circulating, one by Steve Portnoy and the other by Benedikt Pötscher, showing that the unbiasedness condition of Hansen admits only linear estimators, so my "very close" in the note above could be strengthened a bit.  It also occurred to me after writing the original message that the 1757 proposal of Boscovich defines an estimator that can have superior (to OLS) asymptotic MSE performance and is asymptotically unbiased for iid error linear models.  Details are given here.
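The efficiency claim in the letter -- that a nonlinear estimator like the median can beat the mean under iid non-Gaussian errors while remaining unbiased by symmetry -- is easy to illustrate by simulation.  A minimal sketch in Python (the function name and the particular choices of n and reps are mine), for the simplest intercept-only case with double exponential errors, where the median is asymptotically twice as efficient as the mean:

```python
import numpy as np

rng = np.random.default_rng(42)

def mean_vs_median(n=99, reps=5000, scale=1.0):
    """Monte Carlo comparison of the sample mean and sample median as
    estimators of the center of a Laplace (double exponential) error
    distribution.  Both are unbiased by symmetry; asymptotically the
    variance of the mean is 2*scale^2/n while that of the median is
    scale^2/n, so the median is about twice as efficient."""
    errs = rng.laplace(0.0, scale, size=(reps, n))
    v_mean = errs.mean(axis=1).var()
    v_median = np.median(errs, axis=1).var()
    return v_mean, v_median
```

With n = 99 the simulated variances come out near 2/99 and 1/99 respectively.  Of course, as the letter notes, nothing here is special about the median; a plethora of other estimators would serve as well.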

Friday, November 5, 2021

UNIX is 50!

 



I don't think that there was anything remotely as influential in my research experience as the existence of UNIX.  When I arrived at Bell Labs in 1976, UNIX was still in its infancy, but already there were rumblings of a new statistics language called S that would revolutionize my world.  In my last few years at Murray Hill my office was across the hall from Rick Becker's, so I was able to learn S from one of its original authors.  When I returned to UIUC in 1983 it was a struggle to maintain my access to S and UNIX.  I recall the director of campus computing services telling me at that time that "UNIX wasn't appropriate for educational institutions because it was too flexible."  Eventually accounts on various Vaxen were created and life went on with a commercial manifestation of S called Splus.  In 1989, on a yearlong sabbatical adventure, I was even able to maintain my dependence on UNIX with a dubious version by SCO on a Zenith portable.  Sometime in 1999 I made the transition to R, and have never looked back.  Without UNIX all of this would have been almost unthinkable.

Friday, September 17, 2021

Proust as the ultimate anti-Bayesian

 Terrific talk by Jean Tirole last night on "The Common Good after Covid" sponsored by the IFS.  It covered a lot of ground in an hour.  Perhaps my favorite bit -- apropos the general public's lamentable failure to take on board new scientific evidence -- was this quote from Swann's Way: "The facts do not penetrate the world where our beliefs live."  If this seems too pithy for Proust, you can google to find a more elaborate version:

“The facts of life do not penetrate to the sphere in which our beliefs are cherished; they did not engender those beliefs, and they are powerless to destroy them; they can inflict on them continual blows of contradiction and disproof without weakening them; and an avalanche of miseries and maladies succeeding one another without interruption in the bosom of a family will not make it lose faith in either the clemency of its God or the capacity of its physician.”

Tuesday, September 7, 2021

Before there was Fortran, there was the Jacquard loom

 In the museum of the history of Lyon there is a beautiful example of a Jacquard loom used in weaving silk at the beginning of the 19th century.  Designs were implemented on punch cards as shown in the photo below, producing the flamboyant pattern in the next photo.  One can't help but envy the results, especially by comparison with the paltry spew of Phillips curve coefficients emanating from my early experiences with Fortran and Hollerith cards.



Saturday, August 21, 2021

Trump's Next Hotel

 

Joliet Prison, pictured above, built in 1858 and closed in 2002, has recently reopened for public tours.

I had a dream that Angela Davis convinced the Donald to buy Joliet and turn it into a B&B.  Her argument was compelling: in our carceral state everyone should spend some time in prison.  As Waguih Ghali's absurdly self-aware character, Ram, says in the wonderful novel Beer in the Snooker Club: "I wouldn't like to go to prison, but I would like to have been."  Of course, most of us aren't as self-aware as Ram, but with some prodding maybe we could be convinced, if only for a night or two.  There would have to be an ad campaign: Joliet, sleep -- perchance to dream?  Joliet, commune with the Blues Brothers.  Put Joliet on your bucket list.

It would be an expensive renovation.  Joliet is old, was never in very good shape, and has fallen into disrepair.  But the Donald doesn't have any other worthwhile projects at the moment.  It wouldn't have to be fancy -- that's the whole point, isn't it? -- people need the experience of sleeping on a hard bunk, with hardtack in lieu of croissant for breakfast.  Those with a Gramsci complex, or just aspirations to a more ordinary literary career, could book longer stays.  There are many encouraging precedents of prison inspiration.

Privatization of prisons has already gone too far; it is time to encourage those who have not had prison experience to see what it is like.  Maybe this will help create a more humane carceral system.

Thursday, July 8, 2021

Huber Round Robin Problem

 With n teams of equal ability and one team that can beat the others with probability p > 1/2, how many teams are needed to assure that the best team wins a round robin tournament in which each team plays all other teams once?  Some details here.
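The problem is easy to explore by simulation.  A small Monte Carlo sketch in Python (the function name and seed are mine), under the assumptions stated above -- the n ordinary teams beat one another with probability 1/2, all games are independent -- and taking "wins the tournament" to mean strictly more wins than any other team; how ties for first should be handled is among the details left to the link:

```python
import numpy as np

rng = np.random.default_rng(7)

def best_wins(n, p, reps=4000):
    """Estimate the probability that the single best team (which beats each
    of the n equal-ability teams with probability p) finishes a round robin
    with strictly more wins than every other team."""
    count = 0
    for _ in range(reps):
        u = rng.random((n, n))
        # internal games among the n equal teams: for i < j, i beats j w.p. 1/2
        wins = np.triu(u < 0.5, 1).sum(axis=1) + np.triu(u >= 0.5, 1).sum(axis=0)
        upsets = rng.random(n) >= p    # ordinary team beats the best team
        best = n - upsets.sum()        # wins of the best team
        if best > (wins + upsets).max():
            count += 1
    return count / reps
```

Tracing the estimated probability as a function of n for fixed p then shows how large the field must be before the best team is reasonably "assured" of winning.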

Wednesday, June 23, 2021

Voronograms

Voronograms are a primitive form of nonparametric regression intended to explore edge detection methods.  Given scattered data on the plane, the idea is to fit piecewise constant functions defined on the Voronoi tessellation.  I was motivated to revive a 2004 R package for this by some recent conversations about total variation smoothing with Ryan Tibshirani and one of his students.  The package is linked here.  The idea was an offshoot of work with Ivan Mizera on triograms, which were intended to fit quantile surfaces using a roughness penalty corresponding to the total variation of the gradient of the fitted function on the Delaunay triangulation of a sample of scattered points.  Triograms are piecewise linear with breaks in the gradient along the edges of the triangulation.  In contrast, voronograms are piecewise constant with breaks in the function itself across edges of the Voronoi tessellation.  Examples appear in the figure below, which can be reproduced by the command demo(tseg) in the R package.  As usual there is a lambda parameter that controls the strength of the penalization.
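For readers without the package at hand, the unpenalized limiting case is easy to sketch: with lambda = 0 the voronogram simply interpolates each observation on its Voronoi cell, which is the same thing as nearest-neighbour prediction.  The interesting penalized case merges adjacent cells via the total variation penalty and requires the machinery in the R package; the following minimal Python illustration (function and variable names are mine, not the package's) covers only the lambda = 0 limit:

```python
import numpy as np

def voronogram0(x, y, z):
    """Piecewise constant fit on the Voronoi tessellation of the points
    (x_i, y_i) with no penalization: the fitted surface takes the value z_i
    on the Voronoi cell of point i, i.e. nearest-neighbour prediction."""
    pts = np.column_stack([np.asarray(x, float), np.asarray(y, float)])
    z = np.asarray(z, float)

    def fhat(xq, yq):
        q = np.column_stack([np.atleast_1d(np.asarray(xq, float)),
                             np.atleast_1d(np.asarray(yq, float))])
        # squared distances from each query point to each data site
        d2 = ((q[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
        return z[d2.argmin(axis=1)]    # value attached to the nearest site

    return fhat
```

So, for example, f = voronogram0([0, 1, 0], [0, 0, 1], [1.0, 2.0, 3.0]) followed by f(0.9, 0.1) returns the value 2.0 attached to the nearest site (1, 0).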



Wednesday, May 26, 2021

Brezhnev Fone

My DIY project over the Covid Containment has been putting together this Brezhnev Fone:



 It is (almost) ready to be moved back to its new home, the desk of the Director of the School of Slavonic and East European Studies at UCL, who happens to be my spouse.  It can't (yet) get Leonid Ilyich on the line, but it can talk with the Google Assistant via a Raspberry Pi and Google Voice Kit lodged inside.  It provided an entertaining reintroduction to the wonderful world of simple electronics and a chance to learn a tiny bit of Python.  Gory details are available here.

Wednesday, April 28, 2021

Cauchy Priors Redux

 Victor Chernozhukov posed the following question on twitter a few days ago:  "Suppose that X ~ N(0,1), and Y ~ Cauchy, X and Y are independent. What is E[X | X+Y]?"

Thien An replied "isn't it the mean of the density proportional to exp(-x^2/2)/[1+(z-x)^2]?" and she included this nice plot


illustrating the behavior of E(X | X+Y).
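Thien's formula makes the curve easy to reproduce numerically; here is a brute-force grid evaluation in Python (the function name and the grid limits and spacing are ad hoc choices of mine):

```python
import numpy as np

def post_mean(z, grid=np.linspace(-30.0, 30.0, 20001)):
    """E[X | X + Y = z] for X ~ N(0,1) and independent Y ~ Cauchy, computed
    as the mean of the density proportional to exp(-x^2/2)/(1 + (z - x)^2),
    approximated by sums over a fine uniform grid."""
    w = np.exp(-grid ** 2 / 2) / (1 + (z - grid) ** 2)
    return (grid * w).sum() / w.sum()
```

Evaluating post_mean over a grid of z values reproduces the redescending shape of the plot: the conditional mean rises with z at first, peaks, and then falls back toward zero (roughly like 2/z) as the Cauchy component absorbs the blame for extreme sums.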

This recalled some rather ancient suggestions by Jim Berger about the utility of Cauchy tails in Bayesian analysis; the moral is that whichever of the prior and the likelihood has the heavier tails gets discounted when the two conflict.  Reformulating Victor's question slightly, suppose that we observe X ~ Cauchy(t) and have the Gaussian prior t ~ N(0,1); Thien's nice plot can then be interpreted as showing the posterior mean of t as a function of X.  For small values of |X| there is moderate shrinkage back to 0, and as |X| grows the posterior mean does too.  However, for large values of |X| this tendency is reversed and eventually very large |X| values are ignored entirely.  This might seem odd: why is the data being ignored?

I like to think about this in terms of the comedian Richard Pryor's famous question: "Who are you going to believe, me, or your lying eyes?"*  If the observed X is far from the center of the prior distribution, you decide that it is just an aberration, entirely consistent with the heavy Cauchy tails of the errors, which allow X to be quite extreme even when t is near 0, so you stick with your prior belief that t is much more likely to be near 0.

In contrast, if the errors were also Gaussian, say X ~ N(t,1) with the same prior t ~ N(0,1), the posterior mean would be determined by linear shrinkage, so E(t|X) = X/2, midway between 0 and X, and for large |X| we would end up with a posterior mean in a place where neither the prior nor the likelihood has any appreciable mass.

* In this case "me" should be interpreted as your prior, and X as what you see with your lying eyes.



Tuesday, April 13, 2021

Convexification and Potato Peeling

 You may have wondered, as I have, about the convexifying effect of peeling a potato, so it comes as a great relief that this is a well studied subject.  The opening paragraph of Goodman (1981) is priceless:

A swivel-type potato peeler, when used in the usual (and somewhat wasteful) way, produces a convex peeled potato, which is a subset of the original unpeeled and non-convex potato. This suggests, in an obvious way, the problem of maximizing the volume of a convex body contained in a given non-convex body. That such a largest convex subset exists follows at once from the Blaschke selection theorem (see Proposition 1 below). The problems then become: (1) What proportion of the original volume are we assured of saving? and (2) How do we determine the largest convex 'peeled potato' inside a given non-convex one ?



Wednesday, January 13, 2021

Two new vinaigrettes

 I've added two new vinaigrettes to the growing list at http://www.econ.uiuc.edu/~roger/research/vinaigrettes/vinaigrette.html

One is a survey of currently available methods for estimation and inference for quantile regression; the other is about computational methods for univariate quantiles.  Both incorporate some new methods in my quantreg package.  As usual there is an element of samo-kritika (self-criticism) to these notes, since they try to reflect several drawbacks of the current state of affairs, not just the successes.