Tuesday, January 14, 2014

Kondratiev Meets Pynchon

The ineffable Elif Batuman's Twitter feed led me to this PLOS paper on long cycles in literary
and economic misery, confirming the iron law of Kondratiev that there are long cycles in the
reappearance of Kondratiev's long cycles.  Each reappearance is cause to celebrate Slutsky's
devastating critique of such nonsense in his paper "The Summation of Random Causes as the
Source of Cyclic Processes," Econometrica, 1937, originally written and published in 1927
when Slutsky was employed at Kondratiev's "Conjuncture Institute" in Moscow.
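
Slutsky's effect is easy to conjure at home.  A minimal sketch (mine, not Slutsky's; the seed,
window, and number of passes are arbitrary): repeatedly applying a moving average to pure white
noise produces seductively regular "cycles" with no cyclical mechanism anywhere in sight.

SeedRandom[42];  (* arbitrary seed, for reproducibility only *)
z = RandomVariate[NormalDistribution[], 500];  (* iid Gaussian noise *)
cycles = Nest[MovingAverage[#, 5] &, z, 10];   (* smooth it ten times *)
ListLinePlot[cycles, PlotLabel -> "Slutsky cycles from iid noise"]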

Sunday, January 12, 2014

Momentary Lapses

One of my hobbies (in the sense of xkcd) is poking fun at method-of-moments methods.  For quite a
while my first lecture of the graduate econometric theory course here has had several cautionary tales
designed to make students wary of moment-based statistical procedures.  Yeah, this blog was bound
to get mired down in matters statistical eventually; it is surprising that it took this long.

One of these cautionary tales was based on an American Statistician paper by Bruce Lindsay
and Prasanta Basak called "Moments Determine the Tail of a Distribution (But Not Much Else)."
Rarely has there been a more accurately titled paper in the statistical literature.  The main
result of their paper provides bounds on how different two distribution functions, say F and G,
can be, given that they have the same first n moments.  They provide a very nice extension of an
old result of Akhiezer for the general case, and give an example showing that for the Gaussian
distribution G there is an F with the same first 60 moments for which |F(0) - G(0)| > 0.1116.
This seems rather astonishing: even after matching the first 60 (!) moments, we cannot expect
to be within 1/10 of the Gaussian at the median of the distribution.

Whenever I thought about this, I wondered: what would these F look like?  Could we actually
compute a particular example to illustrate this sort of phenomenon?  Such problems are usually
called Method of Moment Spaces problems, and it occurred to me a little while ago that it
would be easy to compute such animals by standard linear programming methods.  My first
attempt employed Mosek, my favorite general linear programming engine, and when
I tried to find an F to match the first 20 Gaussian moments, I discovered that the problem
was numerically pretty nasty:  the moments grow quite large and the resulting linear programming
problem becomes quite ill-conditioned.  This led me to wonder whether the solution that Mosek
produced, claiming an F that differed by 0.1846 at zero, was really reliable.  At this point I decided
that it might be fun to try to solve the same problem in Mathematica, since it is capable of
doing so in extended precision.  So with some help and encouragement from my colleague,
Steve Portnoy, I wrote the following simple program:

GaussianMomentMatching[n_, m_] := Module[{x, A, c, b, B, p},
   (* v^u, used to build the moment constraint matrix *)
   fm[u_, v_] := v^u;
   (* jth moment of the standard Gaussian: (j-1)!! for even j, 0 for odd j *)
   gm[j_] := If[OddQ[j - 1], (j - 1)!!, 0];
   (* grid of 2m + 1 candidate support points on [-8, 8] *)
   x = 8 Table[j, {j, -m, m}]/m;
   (* A[[u + 1, k]] = x[[k]]^u; the substitution patches 0^0 to 1 *)
   A = Outer[fm, Range[0, n], x] /. Indeterminate -> 1;
   (* right-hand side: the first n Gaussian moments *)
   b = Table[gm[j], {j, n}];
   (* objective: c.p is the mass strictly below zero, i.e. F(0-) *)
   c = (1 - Sign[x])/2;
   c[[m + 1]] = 0;
   (* equality constraints: total mass 1 plus the n moment conditions;
      LinearProgramming imposes p >= 0 by default *)
   B = Table[0, {2}, {n + 1}];
   B[[1]] = Prepend[b, 1];
   B = Transpose[B];
   p = LinearProgramming[c, A, B, Method -> "Simplex"];
   (* the gap between G(0) = 1/2 and the minimized F(0-) *)
   0.5 - N[c.p]
   ]
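
A usage sketch (my own invocation, not part of the program above), matching the parameters
reported below; the n = 20, m = 250 call takes a while:

GaussianMomentMatching[20, 250]
(* returns the gap 0.5 - F(0-), reported below as 0.1846 *)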

I am restricting attention to F's that have point masses on a grid from -8 to 8 with 2m + 1
possible values, and matching the first n moments of the Gaussian.  For matching n = 20
moments with m = 250, this produces essentially the same result as Mosek, with a gap of 0.1846 at
zero.  And now the rational solution in the p's matches the moment conditions precisely.  For n = 30,
I get a gap of 0.155.  The Mathematica code is a bit slow, so I haven't tried to go beyond
30, but I'm still astonished by the fact that the gaps are so large.  Clearly Lindsay is
right when he says that moment matching for distributions with unbounded support
doesn't tell us much about the center of the distribution.  Of the 501 possible points
of support for these solutions, only n + 1 points get positive mass; this is a feature of
the linear programming formulation, since a basic solution of a linear program with n + 1
equality constraints has at most n + 1 nonzero coordinates.  An entertaining aspect of the
Mathematica solution is that these p's with positive mass are (wildly) rational, e.g.

    4016749983193866933980157909278598671875 / 1398521361440385903587657725093918246728368128.

So what do these matching solutions look like?  I've plotted the n = 20 solution in the figure
below.  Note that I'm really plotting the cube
root of the probability masses in an effort to make the small tail masses visible.  Not surprisingly,
the solution looks quite symmetric since the odd moments must all match zero, but a little mass
out in the tails can make the moments mimic the Gaussian moments exactly.
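
For anyone who wants to reproduce the figure, here is a rough sketch of the plotting step.
It assumes a hypothetical variant of the function above, which I'll call GaussianMomentSolution,
modified to return the grid and the optimal masses {x, p} instead of just the gap:

{x, p} = GaussianMomentSolution[20, 250];  (* hypothetical variant returning {x, p} *)
ListPlot[Transpose[{N[x], N[p]^(1/3)}], Filling -> Axis,
  PlotLabel -> "Cube roots of the matching point masses"]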

Another question one might ask about all of this:  how different can the quantiles of
F and G be?  Obviously, the medians agree exactly, but as is apparent from the figure above,
there are other quantiles that can differ by as much as 0.5.
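
This can be checked numerically, again under the assumption that the hypothetical {x, p}
variant above is available; EmpiricalDistribution builds the discrete F from the masses:

Fhat = EmpiricalDistribution[N[p] -> N[x]];
taus = Range[0.05, 0.95, 0.05];
(* largest quantile discrepancy over a grid of probabilities *)
Max[Abs[Quantile[Fhat, taus] - Quantile[NormalDistribution[], taus]]]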