Batch Active Learning at Scale. (arXiv:2107.14263v1 [cs.LG])

The ability to train complex and highly effective models often requires an
abundance of training data, which can easily become a bottleneck in cost, time,
and computational resources. Batch active learning, which adaptively issues
batched queries to a labeling oracle, is a common approach for addressing this
problem. The practical benefits of batch sampling come with the downside of
less adaptivity and the risk of sampling redundant examples within a batch, a
risk that grows with the batch size. In this work, we analyze an efficient
active learning algorithm that focuses on the large-batch setting. In
particular, we show that our sampling method, which combines notions of
uncertainty and diversity, easily scales to batch sizes (100K-1M) several
orders of magnitude larger than those used in previous studies and provides
significant improvements in model training efficiency compared to recent
baselines. Finally, we provide an initial theoretical analysis, proving label
complexity guarantees for a related sampling method, which we show is
approximately equivalent to our sampling method in specific settings.
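
To make the idea of combining uncertainty and diversity concrete, the following is a minimal sketch of one generic way to do it. It is not taken from the abstract and should not be read as the paper's algorithm. It assumes margin scores (top-1 minus top-2 class probability) as the uncertainty measure and precomputed cluster assignments as the diversity signal. The names `margin_scores`, `select_batch`, `cluster_ids`, and `candidate_factor` are hypothetical.

```python
import numpy as np


def margin_scores(probs):
    """Top-1 minus top-2 class probability; a small margin means high uncertainty."""
    ordered = np.sort(probs, axis=1)
    return ordered[:, -1] - ordered[:, -2]


def select_batch(probs, cluster_ids, batch_size, candidate_factor=3):
    """Pick `batch_size` indices that are uncertain and spread across clusters.

    Assumed scheme (illustrative only):
      1. Keep the `candidate_factor * batch_size` lowest-margin examples.
      2. Group those candidates by cluster.
      3. Round-robin over clusters, smallest first, taking the most
         uncertain remaining example from each until the batch is full.
    """
    margins = margin_scores(probs)
    n_candidates = min(len(margins), candidate_factor * batch_size)
    candidates = np.argsort(margins, kind="stable")[:n_candidates]

    # Group candidates by cluster, preserving most-uncertain-first order.
    groups = {}
    for idx in candidates:
        groups.setdefault(int(cluster_ids[idx]), []).append(int(idx))

    queues = sorted(groups.values(), key=len)  # smallest clusters first
    selected = []
    while len(selected) < batch_size and queues:
        next_round = []
        for queue in queues:
            if len(selected) == batch_size:
                break
            selected.append(queue.pop(0))
            if queue:
                next_round.append(queue)
        queues = next_round
    return selected
```

With a pure uncertainty ranking, the lowest-margin examples can all fall in one cluster, which is the within-batch redundancy described above. The round-robin step trades a little uncertainty for coverage of more clusters.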


