# Privacy-preserving Data Filtering in Federated Learning Using Influence Approximation. (arXiv:2205.11518v1 [cs.CR])

Federated Learning by nature is susceptible to low-quality, corrupted, or
even malicious data that can severely degrade the quality of the learned model.
Traditional techniques for data valuation cannot be applied as the data is
never revealed. We present a novel technique for filtering and scoring data
based on a practical influence approximation that can be implemented in a
privacy-preserving manner. Each agent uses its own data to evaluate the
influence of another agent’s batch, and reports to the center an obfuscated
score using differential privacy. Our technique allows for almost perfect
($>92\%$ recall) filtering of corrupted data in a variety of applications using
real data. Importantly, the accuracy does not degrade significantly, even under
strong privacy guarantees ($\varepsilon \leq 1$), especially under
realistic percentages of mislabeled data (for $15\%$ mislabeled data we only
lose $10\%$ in accuracy).
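
The evaluate-then-obfuscate flow described above can be illustrated with a minimal sketch. The abstract does not give the exact influence approximation or privacy mechanism, so everything here is an assumption made for illustration: a tiny logistic-regression model, influence approximated as the change in the evaluator's own loss after one gradient step on the candidate batch, a binary vote, and randomized response as the $\varepsilon$-differentially-private obfuscation. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def log_loss(w, data):
    """Mean logistic loss of weights w on (x, y) pairs with y in {0, 1}."""
    total = 0.0
    for x, y in data:
        p = min(max(predict(w, x), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def gradient_step(w, batch, lr=0.5):
    """One full-batch gradient-descent step on the logistic loss."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = predict(w, x) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / len(batch)
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def influence_vote(w, batch, own_data, lr=0.5):
    """Assumed influence proxy: True if one step on `batch` lowers the
    evaluator's loss on its own private data, False otherwise."""
    return log_loss(gradient_step(w, batch, lr), own_data) < log_loss(w, own_data)


def randomized_response(vote, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.
    This is the standard eps-DP mechanism for a single binary value."""
    keep = 1.0 / (1.0 + math.exp(-epsilon))
    return vote if rng.random() < keep else not vote


# Toy data: feature vector [bias, v], label 1 when v > 0.
own_data = [([1.0, 1.0], 1), ([1.0, -1.0], 0), ([1.0, 2.0], 1), ([1.0, -2.0], 0)]
clean_batch = [([1.0, 1.0], 1), ([1.0, -1.0], 0)]
flipped_batch = [([1.0, 1.0], 0), ([1.0, -1.0], 1)]  # mislabeled batch
w0 = [0.0, 0.0]

rng = random.Random(0)
for name, batch in [("clean", clean_batch), ("flipped", flipped_batch)]:
    vote = influence_vote(w0, batch, own_data)
    print(name, "true vote:", vote,
          "reported:", randomized_response(vote, 1.0, rng))
```

In this sketch the center would only ever see the reported bits. Aggregating many noisy votes per batch is what would let it filter corrupted batches despite the per-vote noise, with smaller $\varepsilon$ meaning more flipped votes and therefore more evaluators needed for the same confidence.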