A Light-weight, Effective and Efficient Model for Label Aggregation in Crowdsourcing. (arXiv:2212.00007v1 [cs.HC])
Because crowdsourced labels are noisy, label aggregation (LA) has emerged as a
standard procedure for post-processing them. LA methods estimate true labels
from crowdsourced labels by modeling worker qualities.
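To make "modeling worker qualities" concrete, here is a minimal sketch of quality-weighted voting under an assumed one-coin worker model: each worker answers correctly with probability q and otherwise picks uniformly among the other classes. The function name, the dict-of-qualities input, and the one-coin assumption are illustrative choices, not taken from the paper. Here the qualities are simply given, whereas an LA method has to estimate them from the labels.

```python
import math


def weighted_vote(item_labels, quality, num_classes):
    """Aggregate one item's labels with log-odds worker weights (sketch).

    item_labels: list of (worker_id, label) pairs for a single item.
    quality: dict worker_id -> assumed probability of a correct answer,
        with 0 < q < 1 (hypothetical one-coin worker model).
    Returns the class with the highest total weight (uniform class prior).
    """
    scores = [0.0] * num_classes
    for worker, label in item_labels:
        q = quality[worker]
        # Under the one-coin model a wrong answer is uniform over the other
        # K-1 classes, so each vote adds log((K-1) q / (1 - q)) to its class.
        scores[label] += math.log((num_classes - 1) * q / (1 - q))
    return max(range(num_classes), key=lambda k: scores[k])
```

With qualities {"a": 0.9, "b": 0.6, "c": 0.6} and labels a→1, b→0, c→0 on a binary item, the sketch returns 1: the reliable worker's weight (log 9 ≈ 2.20) outweighs the other two combined (2·log 1.5 ≈ 0.81), whereas plain majority voting would return 0.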
Most existing LA methods are iterative: they traverse all the crowdsourced
labels multiple times, jointly updating true labels and worker qualities until
convergence. Consequently, these methods have high space and time complexities.
In this paper, we treat LA as a dynamic system and model it as a dynamic
Bayesian network. From the dynamic model, we derive two lightweight algorithms,
LA^onepass and LA^twopass, which estimate worker qualities and true labels
effectively and efficiently while traversing all the labels at most twice.
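The idea of reading each label once and keeping only per-worker statistics between items can be illustrated with a hypothetical single-pass aggregator. This is not the paper's one-pass or two-pass algorithm. It is a simple streaming heuristic, assuming items arrive one at a time with all of their labels: it labels each item with one-coin log-odds weights computed from smoothed running accuracy estimates, then updates each worker's counts by agreement with that estimate. The class name, the pseudo-count prior, and the weighting scheme are assumptions for illustration.

```python
import math
from collections import defaultdict


class OnlineAggregator:
    """Hypothetical single-pass label aggregator (illustrative heuristic).

    Items arrive one at a time together with all of their labels. Each item
    is labeled once from the current worker-accuracy estimates and is never
    revisited; only per-worker counters are kept between items.
    """

    def __init__(self, num_classes, prior_correct=2.0, prior_total=3.0):
        self.num_classes = num_classes
        # Pseudo-counts give each new worker a prior accuracy of
        # prior_correct / prior_total (2/3 by default); with
        # 0 < prior_correct < prior_total every estimate stays inside (0, 1).
        self.correct = defaultdict(lambda: prior_correct)
        self.total = defaultdict(lambda: prior_total)

    def accuracy(self, worker):
        return self.correct[worker] / self.total[worker]

    def process(self, item_labels):
        """Estimate one item's label from its (worker_id, label) pairs."""
        scores = [0.0] * self.num_classes
        for worker, label in item_labels:
            q = self.accuracy(worker)
            # One-coin log-odds weight; negative when q < 1 / num_classes.
            scores[label] += math.log((self.num_classes - 1) * q / (1 - q))
        estimate = max(range(self.num_classes), key=lambda k: scores[k])
        # Update each worker's counts by agreement with the new estimate.
        for worker, label in item_labels:
            self.total[worker] += 1.0
            self.correct[worker] += 1.0 if label == estimate else 0.0
        return estimate
```

Each label is read once and the state between items is one pair of counters per worker. Because a worker's own vote influences the estimate it is graded against, these running accuracies are optimistically biased, so the sketch should be read as an illustration of the streaming pattern rather than a sound estimator.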
Owing to this dynamic formulation, the proposed algorithms can also estimate
true labels online without revisiting historical data. We theoretically prove
the convergence of the proposed algorithms and bound the error of the estimated
worker qualities. We also analyze the space and time complexities of the
proposed algorithms and show that they are equivalent to those of majority
voting. Experiments on 20 real-world datasets demonstrate that the proposed
algorithms aggregate labels effectively and efficiently in both offline and
online settings, even though they traverse all the labels at most twice.
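Majority voting, the baseline whose space and time complexities the proposed algorithms are said to match, needs only a single pass over the labels. A minimal sketch over (item_id, worker_id, label) triples follows; the function name and input format are assumptions for illustration.

```python
from collections import Counter, defaultdict


def majority_vote(triples):
    """Majority voting over (item_id, worker_id, label) triples (sketch).

    One pass over the labels: O(#labels) time and O(#items * #classes)
    space for the per-item vote counters. Ties go to the label first
    encountered for that item (Counter.most_common ordering).
    """
    votes = defaultdict(Counter)
    for item, _worker, label in triples:
        votes[item][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}
```

Unlike quality-aware aggregation, this baseline ignores worker identity entirely, which is why it is cheap but cannot discount unreliable workers.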
Source: https://arxiv.org/abs/2212.00007