Saturday, April 18, 2020

Computational Breakthroughs for Quantile Regression

Xuming He sent me a new paper with Xiaoou Pan, Kean Ming Tan, and Wen-Xin Zhou on the computation of quantile regression.  They suggest a new way to smooth the objective function that preserves convexity and solve the smoothed problem with first-order methods.  At first reading I thought that their comparison might focus exclusively on the simplex method in my quantreg package, which is based on the 1974 Barrodale and Roberts algorithm, so I decided to do some comparisons myself with the interior point method that I call the Frisch-Newton algorithm.  My initial comparisons confirm their conclusion that the new approach has massive advantages in computational speed and a slight advantage in accuracy.  Based on past experience I'm always skeptical about accuracy comparisons, since apparent gains have often been due to some sort of shrinkage effect that, while perhaps unintended, negates the affine equivariance of the original definition.  That is not the case for Xuming's new proposal.  My interpretation of the efficiency gain is that smoothing the objective function in effect produces an estimator that locally averages several adjacent quantiles.  One might think of it as local Huberization, and it is known that, provided the bandwidth isn't too large, this can yield efficiency gains.

My experiments are limited to dense n by p designs with both X and y Gaussian.  The He et al. implementation is available in R as the conquer package (https://cran.r-project.org/web/packages/conquer/index.html).  Sample sizes range from 1,000 to 100,000, and p ranges from 10 to 200; a rough sketch of the simulation setup appears after the tables below.  Timings (in seconds), averaged over 10 replications, look like this:

             Frisch-Newton                   He et al
        n = 1K n = 5K n = 10K n = 100K  n = 1K n = 5K n = 10K n = 100K
p = 10   0.006  0.032   0.070    0.531  0.005  0.008   0.017    0.063
p = 50   0.051  0.292   0.499    2.056  0.010  0.016   0.030    0.228
p = 100  0.126  0.591   0.843    6.130  0.011  0.034   0.061    0.485
p = 200  0.459  1.292   2.225   21.391  0.047  0.078   0.124    0.967

The mean absolute error of the coefficient estimates, averaged over 10 replications, is:

             Frisch-Newton                   He et al
        n = 1K n = 5K n = 10K n = 100K  n = 1K n = 5K n = 10K n = 100K
p = 10   0.438  0.149   0.103    0.034  0.414  0.146   0.096    0.033
p = 50   1.627  0.730   0.524    0.157  1.447  0.682   0.498    0.152
p = 100  3.244  1.447   1.027    0.321  2.849  1.346   0.950    0.313
p = 200  7.141  2.856   2.019    0.616  6.093  2.554   1.823    0.594
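
For concreteness, here is a rough sketch of the kind of loop that could produce one (n, p) cell of the tables above.  It is my reconstruction, not the code actually used: the post only says that X and y are Gaussian, so the data-generating model and the error metric below are assumptions made for illustration.

library(quantreg)
library(conquer)

one_cell <- function(n, p, tau = 0.5, reps = 10) {
  out <- replicate(reps, {
    ## Assumed design: Gaussian X, linear signal, Gaussian errors
    X <- matrix(rnorm(n * p), n, p)
    beta <- rnorm(p)
    y <- drop(X %*% beta + rnorm(n))
    ## Elapsed time and coefficient error for each method (intercepts dropped)
    t_fn <- system.time(b_fn <- coef(rq(y ~ X, tau = tau, method = "fn"))[-1])[["elapsed"]]
    t_cq <- system.time(b_cq <- conquer(X, y, tau = tau)$coeff[-1])[["elapsed"]]
    c(time_fn = t_fn, time_cq = t_cq,
      mae_fn = mean(abs(b_fn - beta)), mae_cq = mean(abs(b_cq - beta)))
  })
  rowMeans(out)  ## average over replications
}

## One (n, p) cell of the tables above
one_cell(n = 10000, p = 100)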

Independently, Victor Chernozhukov, Ivan Fernandez-Val and Blaise Melly have proposed new computational methods for estimating the QR process, exploiting the "globbing" ideas that Steve Portnoy and I used in our Hare and Tortoise paper some time ago.  This also yields dramatic improvements in computational speed.  For further details, see https://arxiv.org/abs/1909.05782

Taken together, and combined with various new ideas for bootstrapping, these new developments substantially improve the scalability of QR methods.  Congratulations, and many thanks to all contributors to this research.
