# Da Void of Meaning

## Friday, September 16, 2016

### Dawn of the δ-method

Several years ago my colleague Steve Portnoy wrote a letter to the editor of the American Statistician

in response to an article that they had published called Who invented the δ-method? The article

claimed priority for Robert Dorfman on the basis of an article appearing in 1938 called "A Note

on the δ-method for Finding Variance Formulae" published in the Biometric Bulletin. Portnoy

pointed out that Joe Doob had written about the δ-method in a 1935 Annals paper titled, "On the

limiting distribution of certain statistics" referring to it as the "well-known δ-method" and citing

prior work by T.L. Kelley and Sewell Wright, and noting rather modestly that his Theorem 1

"shows an interpretation which can be given to the results obtained by this method." It seems

plausible that Doob's is the first formal justification for the method, and it is puzzling to put it

euphemistically that Dorfman made no mention of Doob's article. Perhaps this oversight can be

forgiven as a juvenile mistake since the Dorfman paper was written shortly after he finished his

undergraduate studies at Columbia, while working at the Worcester State Hospital, pictured above.

This august institution was reputed to be the first asylum for the insane in New England, and also happened to be the publisher of the Biometric Bulletin. Dorfman later went on to earn a Phd at Berkeley, and taught at Harvard where hecoauthored an influential book about linear programming with Paul Samuelson and Robert Solow.

## Thursday, September 15, 2016

### Bag of Little Bootstrap for QR

In the never ending quest to speed up inference for large quantile regression problems,

I have started to look into the Kleiner, et al Bag of Little Bootstraps proposal. After a

serious confusion on my part was corrected with the help of Xiaofeng Shao, I've come

to the following code fragment added to summary.rq in my quantreg package:

else if (se == "BLB"){ # Bag of Little Bootstraps

n <- length(y)

b <- ceiling(n^gamma)

S <- n %/% b

U <- matrix(sample(1:n, b * S), b, S)

Z <- matrix(0, NCOL(x), S)

for(i in 1:S){

u <- U[,i]

B <- matrix(0, NCOL(x), R)

for(j in 1:R){

w <- c(rmultinom(1, n, rep(1/b, b)))

B[,j] <- rq.wfit(x[u,], y[u], tau, weights = w, method = "fnb")$coef

}

Z[,i] <- sqrt(diag(cov(t(B))))

}

serr <- apply(Z, 1, mean)

}

I have started to look into the Kleiner, et al Bag of Little Bootstraps proposal. After a

serious confusion on my part was corrected with the help of Xiaofeng Shao, I've come

to the following code fragment added to summary.rq in my quantreg package:

else if (se == "BLB"){ # Bag of Little Bootstraps

n <- length(y)

b <- ceiling(n^gamma)

S <- n %/% b

U <- matrix(sample(1:n, b * S), b, S)

Z <- matrix(0, NCOL(x), S)

for(i in 1:S){

u <- U[,i]

B <- matrix(0, NCOL(x), R)

for(j in 1:R){

w <- c(rmultinom(1, n, rep(1/b, b)))

B[,j] <- rq.wfit(x[u,], y[u], tau, weights = w, method = "fnb")$coef

}

Z[,i] <- sqrt(diag(cov(t(B))))

}

serr <- apply(Z, 1, mean)

}

In the eventual implementation I managed to embed the inner loop into fortran

which helps to speed things up a bit, although of course it would be eventually

helpful to allow this to be distributed across cluster nodes.

which helps to speed things up a bit, although of course it would be eventually

helpful to allow this to be distributed across cluster nodes.

I should also mention that the implicit assumption here that BLB works for

moments, appears to be (presently) beyond the scope of the current theory,

## Friday, March 25, 2016

### Round of 16: What's Sweet about It?

The Statistics Department is running a March Madness contest, and I couldn't resist entering.

It is organized a little differently than the usual bracket picking, which made it more fun to

prepare an entry. You are given a budget of 100 units, and you must pick a subset of teams

as many as you want subject to the budget constraint: Teams seeded 1 cost 25, 2 cost 19, ...

16 seeds cost 1. I simulated 10,000 brackets, and recorded the survival probabilities as in

the earlier survival plot on this blog, and then computed the expected number of wins for

each team, normalized by their cost, ordered the teams and produced the following list of

teams. The winner is the entry whose teams accumulate the largest number of wins.

It is organized a little differently than the usual bracket picking, which made it more fun to

prepare an entry. You are given a budget of 100 units, and you must pick a subset of teams

as many as you want subject to the budget constraint: Teams seeded 1 cost 25, 2 cost 19, ...

16 seeds cost 1. I simulated 10,000 brackets, and recorded the survival probabilities as in

the earlier survival plot on this blog, and then computed the expected number of wins for

each team, normalized by their cost, ordered the teams and produced the following list of

teams. The winner is the entry whose teams accumulate the largest number of wins.

EWins Seeds Cost Bang CumCost CumEWins

Gonzaga 1.1619 11 4 0.2904750 4 1.1619

Pittsburgh 0.9041 10 4 0.2260250 8 2.0660

Cincinnati 1.0757 9 5 0.2151400 13 3.1417

Iowa 1.6385 7 8 0.2048125 21 4.7802

Syracuse 0.8112 10 4 0.2028000 25 5.5914

VA Commonwealth 0.7855 10 4 0.1963750 29 6.3769

West Virginia 2.4625 3 13 0.1894231 42 8.8394

Duke 2.2013 4 12 0.1834417 54 11.0407

Purdue 1.9888 5 11 0.1808000 65 13.0295

Connecticut 0.9040 9 5 0.1808000 70 13.9335

Butler 0.8602 9 5 0.1720400 75 14.7937

Indiana 1.8908 5 11 0.1718909 86 16.6845

Texas A&M 1.8507 3 13 0.1423615 99 18.5352

## Tuesday, March 22, 2016

## Monday, March 14, 2016

### Bracketology 2016

Again, it is time to fill the brackets. This year, again, I'm planning to join the Kaggle gaggle, but

I thought I would report here what this years survival probabilities look like. The situation is quite

different than last year, when Kentucky was the overwhelming favorite. This year there is a quite

close race with three teams: Kansas, MSU, and UNC all above 0.10 probability of winning it all,

with 0.16, 0.12, 0.11 respectively. This is based on my 1000 simulations of the tournament with

my standard QR model, just like last year when Kentucky was at 0.40. Here is a Tufte type

spark lines graphic with this years survival functions.

This year we have also posted a new bracket generator at

http://www.econ.uiuc.edu:8080/QBracketology/

that can be used to generate a random bracket according to this year's fitted model. Thanks to

Ignacio Sarmiento Barbieri for help with the R shiny implementation of this. Further details

about the methods underlying all of this are available here:

http://www.econ.uiuc.edu/~roger/research/bracketology/MM.html

I thought I would report here what this years survival probabilities look like. The situation is quite

different than last year, when Kentucky was the overwhelming favorite. This year there is a quite

close race with three teams: Kansas, MSU, and UNC all above 0.10 probability of winning it all,

with 0.16, 0.12, 0.11 respectively. This is based on my 1000 simulations of the tournament with

my standard QR model, just like last year when Kentucky was at 0.40. Here is a Tufte type

spark lines graphic with this years survival functions.

This year we have also posted a new bracket generator at

http://www.econ.uiuc.edu:8080/QBracketology/

that can be used to generate a random bracket according to this year's fitted model. Thanks to

Ignacio Sarmiento Barbieri for help with the R shiny implementation of this. Further details

about the methods underlying all of this are available here:

http://www.econ.uiuc.edu/~roger/research/bracketology/MM.html

## Sunday, January 3, 2016

### Edgeworth and the Newsvendor

Several months ago I received a new paper by Mukherjee, Brown and Rusmevichientong on

empirical Bayes methods for the "newsvendor problem." It is a matter of considerable personal

embarrassment that I hadn't ever heard about the newsvendor problem. In its simplest form we

have a newspaper vendor (remember them?) who must decide how many papers of various sorts

he should stock. He sacrifices profits if he stocks too many or too few, the loss is linear in both

directions, but asymmetric. He knows the distribution, F, of the (random) demand for each

paper, and of course his solution is to stock the a/(a+b) quantile of F, where a is the marginal

cost of stocking too few papers, and b is the marginal cost of stock too many. I learned this

bit of basic decision theory wisdom from Tom Ferguson's wonderful textbook in a course from

Bruce Hill eons ago, and I was aware that it appears in Raiffa and Schlaifer's text as well, but

I was surprised to learn that there was an extensive operations research literature going back

at least to the seminal paper of Arrow, Harris and Marschak in 1951. Why I was surprised,

since this is a rather fundamental problem in inventory policy, is another question, but I won't

delve into that. * Instead, I decided to look into the earlier history of this idea and began to see

quite a lot of references to Edgeworth's (1888) paper "The mathematical theory of banking."

More embarrassment ensued when I realized that I'd never read this paper. Reading it revealed

a very clear formulation of the newsvendor problem applied to banking, unfortunately Edgeworth's

solution seemed a little murky. I've tried to untangle all this in a couple of pages below, but as

you will see, it resists a fully satisfactory untangling.

Edgeworth.pdf

* I would like to stipulate that the simple static model underlying the newsvendor problem

is only a small part of the accomplishment of the AHM paper that introduced dynamic sS

rules as well.

empirical Bayes methods for the "newsvendor problem." It is a matter of considerable personal

embarrassment that I hadn't ever heard about the newsvendor problem. In its simplest form we

have a newspaper vendor (remember them?) who must decide how many papers of various sorts

he should stock. He sacrifices profits if he stocks too many or too few, the loss is linear in both

directions, but asymmetric. He knows the distribution, F, of the (random) demand for each

paper, and of course his solution is to stock the a/(a+b) quantile of F, where a is the marginal

cost of stocking too few papers, and b is the marginal cost of stock too many. I learned this

bit of basic decision theory wisdom from Tom Ferguson's wonderful textbook in a course from

Bruce Hill eons ago, and I was aware that it appears in Raiffa and Schlaifer's text as well, but

I was surprised to learn that there was an extensive operations research literature going back

at least to the seminal paper of Arrow, Harris and Marschak in 1951. Why I was surprised,

since this is a rather fundamental problem in inventory policy, is another question, but I won't

delve into that. * Instead, I decided to look into the earlier history of this idea and began to see

quite a lot of references to Edgeworth's (1888) paper "The mathematical theory of banking."

More embarrassment ensued when I realized that I'd never read this paper. Reading it revealed

a very clear formulation of the newsvendor problem applied to banking, unfortunately Edgeworth's

solution seemed a little murky. I've tried to untangle all this in a couple of pages below, but as

you will see, it resists a fully satisfactory untangling.

Edgeworth.pdf

* I would like to stipulate that the simple static model underlying the newsvendor problem

is only a small part of the accomplishment of the AHM paper that introduced dynamic sS

rules as well.

## Thursday, December 3, 2015

### The Pianists' Birthday Problem

We are used to the idea that common birthdays are not that unusual. I recently attended a

chamber music concert involving the young German pianist Markus Groh. A quick look at

his website revealed a rather extreme version of birthday matching: Groh was born January 5th,

the same day as Alfred Brendel, Maurizo Pollini, and Arturo Benedetti Michelangeli!

chamber music concert involving the young German pianist Markus Groh. A quick look at

his website revealed a rather extreme version of birthday matching: Groh was born January 5th,

the same day as Alfred Brendel, Maurizo Pollini, and Arturo Benedetti Michelangeli!

Subscribe to:
Posts (Atom)