## Efficient active learning with generalized linear models

AISTATS, 2007.

Active learning can significantly reduce the amount of training data
required to fit parametric statistical models for supervised learning
tasks. Here we present an efficient algorithm for choosing the optimal
(most informative) query when the output labels are related to the
inputs by a generalized linear model (GLM). The algorithm is based on
a Laplace approximation of the posterior distribution of the GLM's
parameters. The algorithm requires only low-rank matrix manipulations
and a single two-dimensional search to choose the optimal query and
has complexity O(n^2) (with n the dimension of the feature space),
making active learning with GLMs feasible even for high-dimensional
feature spaces. In certain cases the two-dimensional search may be
reduced to a one-dimensional search, further improving the algorithm's
efficiency. Simulation results show that the model parameters can be
estimated much more efficiently using the active learning technique
than by using randomly chosen queries. We compute the asymptotic
posterior covariance semi-analytically and demonstrate that the
algorithm empirically achieves this asymptotic convergence rate, which
is generally better than the convergence rate in the random-query
setting. Finally, we generalize the approach to efficiently handle
both output history effects (for applications to time-series models of
autoregressive type) and slow, non-systematic drifts in the model
parameters.
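
To make the low-rank structure concrete, here is a minimal sketch. It is not the paper's implementation and rests on several assumptions: a Poisson GLM with an exponential nonlinearity, a Gaussian (Laplace-style) posterior N(mu, C) over the parameters, and a finite pool of candidate queries in place of the search described above. Under these assumptions the expected information gain of a query x depends on x only through two scalars, the projected mean x·mu and the projected variance xᵀCx, and the covariance update after an observation is a rank-one (Sherman-Morrison) correction costing O(n^2). The function names `expected_info_gain`, `rank_one_update`, and `choose_query` are hypothetical.

```python
import numpy as np


def expected_info_gain(mu, C, x, n_nodes=20):
    """Approximate expected information gain (in nats) of query x.

    Assumes a Poisson GLM with exponential nonlinearity and a Gaussian
    posterior N(mu, C). The gain depends on x only through two scalars.
    """
    m = x @ mu        # posterior mean of the projection x . theta
    s2 = x @ C @ x    # posterior variance of the projection
    # Gauss-Hermite quadrature for the expectation over rho ~ N(m, s2).
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    rho = m + np.sqrt(2.0 * s2) * nodes
    # For this model the Fisher information along x is exp(rho), so the
    # log-determinant reduction of the covariance is 0.5*log(1 + exp(rho)*s2).
    vals = 0.5 * np.log1p(np.exp(rho) * s2)
    return float(weights @ vals / np.sqrt(np.pi))


def rank_one_update(C, x, d):
    """Sherman-Morrison update: inverse of (C^{-1} + d * x x^T), in O(n^2)."""
    Cx = C @ x
    return C - np.outer(Cx, Cx) * (d / (1.0 + d * (x @ Cx)))


def choose_query(mu, C, candidates):
    """Return the index of the candidate with the largest expected gain."""
    scores = [expected_info_gain(mu, C, x) for x in candidates]
    return int(np.argmax(scores))


# Usage: two candidate queries along the same direction, different norms.
mu = np.zeros(2)
C = np.eye(2)
candidates = np.array([[1.0, 0.0], [2.0, 0.0]])
best = choose_query(mu, C, candidates)
C_new = rank_one_update(C, candidates[best], d=1.0)
```

Scoring a fixed candidate pool is a simplification chosen for brevity; the point of the sketch is only that neither the score nor the update requires a full matrix inversion.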

Reprint | Related work on active learning | Liam Paninski's research page