Lan Wang (UMinn), Xuming He (UMich) and I gave an introductory overview lecture at the 2017 JSM meeting in Baltimore a couple of days ago. Slides are available here. A photo of the three of us and Annie Qu, who was chairing the session, taken by Regina Liu, appears below.

# Da Void of Meaning

## Friday, August 4, 2017

## Tuesday, July 11, 2017

## Friday, December 9, 2016

### R Vinaigrettes

Knuth's call for a literate programming style has spawned a new genre of statistical exposition, the R vignette, and thereby raised the dreary task of documenting computer code to the level of a minor art form, like finger painting or tap dancing. These vignettes are intended to reveal something of the author's contribution to the greater glory of data analysis, usually in the form of an R package.

This development has been enormously successful, and yet there is a general unease within the research community, a feeling that many of the almost 10,000 packages currently on CRAN have not received adequate vetting, or vignetting. In this spirit I would like to propose a new genre, the R vinaigrette. These would be brief communications that expose some feature, or bug, in the collective enterprise of statistical software. As the name suggests there should be something piquant about a vinaigrette, some lemon juice to balance the oils, or mustard, or vinegar. I would only insist that, like the vignette, the vinaigrette must be reproducible. Ideally, they should also satisfy the Kolmogorov dictum that every single discovery should fit in a four-page Doklady note, since "the human brain is not capable of creating anything more complicated at one time."

An example is now available at http://www.econ.uiuc.edu/~roger/research/ebayes/Bdecon.pdf

## Monday, November 28, 2016

### Optimal Transport on the London Tube

I've been reading Alfred Galichon's terrific new monograph on optimal transportation, and was inspired over the fall break to look into his example in Section 8.4 on routes for the Paris metro. Data for the London Underground were more easily accessible, so I made a toy tube router function for R that takes an origin and destination and computes an "optimal" path by minimizing the cumulative distance between stops. An example path is illustrated in the figure below with the lines color coded. Unfortunately, my current data sources don't account for links that have multiple lines, so the routes typically overstate the number of line changes. (It would be nice to penalize line changes with a fixed cost, but this would have extended the project beyond the fall break.)

Data and code are available here: http://www.econ.uiuc.edu/~roger/research/OT/tube.tar.gz

It is all very simple, just a linear program, but it makes you think about how one might scale it up to the scheme used by Google Maps.
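For readers who want to see the idea without downloading the tarball, here is a minimal sketch in base R. It is not the code in tube.tar.gz: the stations, distances, and the function name `tube_route` are all hypothetical, and instead of calling an LP solver it uses Bellman-Ford label correcting, which reaches the same minimum-distance path as the min-cost flow linear program when all distances are nonnegative.

```r
# Hypothetical toy network: station names and distances are made up.
links <- data.frame(
  from = c("A", "A", "B", "C", "B", "D"),
  to   = c("B", "C", "C", "D", "D", "E"),
  dist = c(1.0, 4.0, 1.5, 1.0, 5.0, 2.0),
  stringsAsFactors = FALSE
)

# tube_route is an illustrative name, not a function from the original code.
tube_route <- function(links, origin, destination) {
  # Trains run both ways, so add the reverse of every link.
  arcs <- rbind(links,
                data.frame(from = links$to, to = links$from,
                           dist = links$dist, stringsAsFactors = FALSE))
  stops <- unique(c(arcs$from, arcs$to))
  d <- setNames(rep(Inf, length(stops)), stops)              # best distance so far
  pred <- setNames(rep(NA_character_, length(stops)), stops) # previous stop on best path
  d[origin] <- 0
  # Bellman-Ford: relax every arc until no label improves.
  for (k in seq_len(length(stops) - 1)) {
    changed <- FALSE
    for (a in seq_len(nrow(arcs))) {
      cand <- d[arcs$from[a]] + arcs$dist[a]
      if (cand < d[arcs$to[a]]) {
        d[arcs$to[a]] <- cand
        pred[arcs$to[a]] <- arcs$from[a]
        changed <- TRUE
      }
    }
    if (!changed) break
  }
  # Walk the predecessor labels back from the destination.
  path <- destination
  while (!is.na(pred[path[1]])) path <- c(unname(pred[path[1]]), path)
  list(path = path, distance = unname(d[destination]))
}

route <- tube_route(links, "A", "E")
print(route$path)
print(route$distance)
```

A fixed cost for line changes could presumably be handled by expanding each stop into one node per line, with transfer arcs between them, but that is beyond this sketch.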

## Wednesday, November 9, 2016

## Friday, September 16, 2016

### Dawn of the δ-method

Several years ago my colleague Steve Portnoy wrote a letter to the editor of the American Statistician in response to an article that they had published called Who invented the δ-method? The article claimed priority for Robert Dorfman on the basis of an article appearing in 1938 called "A Note on the δ-method for Finding Variance Formulae" published in the Biometric Bulletin. Portnoy pointed out that Joe Doob had written about the δ-method in a 1935 Annals paper titled "On the limiting distribution of certain statistics," referring to it as the "well-known δ-method," citing prior work by T.L. Kelley and Sewall Wright, and noting rather modestly that his Theorem 1 "shows an interpretation which can be given to the results obtained by this method." It seems plausible that Doob's is the first formal justification for the method, and it is puzzling, to put it euphemistically, that Dorfman made no mention of Doob's article. Perhaps this oversight can be forgiven as a juvenile mistake, since the Dorfman paper was written shortly after he finished his undergraduate studies at Columbia, while working at the Worcester State Hospital, pictured above.

This august institution was reputed to be the first asylum for the insane in New England, and also happened to be the publisher of the Biometric Bulletin. Dorfman later went on to earn a PhD at Berkeley, and taught at Harvard, where he coauthored an influential book about linear programming with Paul Samuelson and Robert Solow.

## Thursday, September 15, 2016

### Bag of Little Bootstrap for QR

In the never-ending quest to speed up inference for large quantile regression problems, I have started to look into the Kleiner et al. Bag of Little Bootstraps proposal. After a serious confusion on my part was corrected with the help of Xiaofeng Shao, I've come to the following code fragment added to summary.rq in my quantreg package:

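    else if (se == "BLB"){ # Bag of Little Bootstraps
        n <- length(y)
        b <- ceiling(n^gamma)   # subsample size, b = n^gamma
        S <- n %/% b            # number of disjoint subsamples
        U <- matrix(sample(1:n, b * S), b, S)  # column i indexes subsample i
        Z <- matrix(0, NCOL(x), S)
        for(i in 1:S){
            u <- U[,i]
            B <- matrix(0, NCOL(x), R)
            for(j in 1:R){
                # multinomial weights inflate the b observations to size n
                w <- c(rmultinom(1, n, rep(1/b, b)))
                B[,j] <- rq.wfit(x[u,], y[u], tau, weights = w, method = "fnb")$coef
            }
            Z[,i] <- sqrt(diag(cov(t(B))))  # bootstrap standard errors for subsample i
        }
        serr <- apply(Z, 1, mean)  # average the standard errors across subsamples
    }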
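Since the fragment above only makes sense inside summary.rq, a self-contained illustration may help. The sketch below keeps the same subsample and reweighting structure but swaps the weighted quantile regression fit for a weighted mean, so that it runs in base R and can be checked against the textbook standard error. The function name `blb_se` and the defaults for `gamma` and `R` are my assumptions for illustration, not part of quantreg.

```r
# blb_se is a hypothetical helper: BLB standard error of a sample mean.
blb_se <- function(y, gamma = 0.7, R = 100) {
  n <- length(y)
  b <- ceiling(n^gamma)                  # subsample size
  S <- n %/% b                           # number of disjoint subsamples
  U <- matrix(sample(1:n, b * S), b, S)  # column i indexes subsample i
  Z <- numeric(S)
  for (i in 1:S) {
    yu <- y[U[, i]]
    # R multinomial weight vectors, each summing to n (a b x R matrix)
    W <- rmultinom(R, n, rep(1 / b, b))
    # weighted mean for each replicate; yu recycles down the columns of W
    est <- colSums(W * yu) / n
    Z[i] <- sd(est)                      # bootstrap s.e. within subsample i
  }
  mean(Z)                                # average across subsamples
}

set.seed(1)
y <- rnorm(10000)
se_blb <- blb_se(y)
se_classical <- sd(y) / sqrt(length(y))
print(c(BLB = se_blb, classical = se_classical))
```

The two numbers should be close for a statistic as well behaved as the mean. The point of the weights is that each replicate behaves like a resample of size n while only b distinct observations ever need to be touched.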

In the eventual implementation I managed to embed the inner loop into Fortran, which helps to speed things up a bit, although of course it would eventually be helpful to allow this to be distributed across cluster nodes.

I should also mention that the implicit assumption here that BLB works for moments, appears to be (presently) beyond the scope of the current theory,
