# Improvements to Supervised EM Learning of Shared Kernel Models by Feature Space Partitioning. (arXiv:2205.15304v1 [cs.LG])

Expectation maximisation (EM) is usually thought of as an unsupervised
learning method for estimating the parameters of a mixture distribution;
however, it can also be used for supervised learning when class labels are
available. As such, EM has been applied to train neural nets including the
probabilistic radial basis function (PRBF) network or shared kernel (SK) model.
This paper addresses two major shortcomings of previous work in this area: the
lack of rigour in the derivation of the EM training algorithm, and the
computational complexity of the technique, which has limited it to
low-dimensional data sets. We first present a detailed derivation of EM for the
Gaussian shared kernel model PRBF classifier, making use of data association
theory to obtain the complete data likelihood, Baum’s auxiliary function (the
E-step) and its subsequent maximisation (M-step). To reduce the complexity of the
resulting SKEM algorithm, we partition the feature space into $R$
non-overlapping subsets of variables. The resulting product decomposition of
the joint data likelihood, which is exact when the feature partitions are
independent, allows the SKEM to be implemented in parallel and at $R^2$ times
lower complexity. The operation of the partitioned SKEM algorithm is
demonstrated on the MNIST data set and compared with its non-partitioned
counterpart. It turns out that improved performance at reduced complexity is
achievable. Comparisons with standard classification algorithms are provided on
a number of other benchmark data sets.
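
The product decomposition described above can be illustrated numerically. The following is a minimal sketch, not the paper's SKEM implementation: it assumes a single Gaussian kernel whose covariance is block-diagonal across $R$ feature subsets (the independent-partition case) and checks that the joint log-density equals the sum of the per-subset log-densities. The helper names `gaussian_logpdf` and `partitioned_logpdf`, the dimension `d = 6`, and the choice `R = 3` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian, using only NumPy."""
    d = mean.shape[0]
    diff = x - mean
    # slogdet is numerically safer than taking log(det(cov)) directly.
    _, logdet = np.linalg.slogdet(cov)
    # Solve cov @ z = diff instead of forming the explicit inverse.
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def partitioned_logpdf(x, mean, cov, blocks):
    """Sum of per-subset log-densities; each subset sees only its own sub-covariance."""
    total = 0.0
    for idx in blocks:
        sub_cov = cov[np.ix_(idx, idx)]
        total += gaussian_logpdf(x[idx], mean[idx], sub_cov)
    return total


rng = np.random.default_rng(0)
d, R = 6, 3  # illustrative sizes (assumption)
blocks = np.array_split(np.arange(d), R)  # R non-overlapping feature subsets

# Build a block-diagonal covariance: subsets are mutually independent.
cov = np.zeros((d, d))
for idx in blocks:
    a = rng.normal(size=(len(idx), len(idx)))
    cov[np.ix_(idx, idx)] = a @ a.T + len(idx) * np.eye(len(idx))

mean = rng.normal(size=d)
x = rng.normal(size=d)

full = gaussian_logpdf(x, mean, cov)
split = partitioned_logpdf(x, mean, cov, blocks)
print(np.isclose(full, split))
```

In this sketch each subset works with a covariance of $(d/R) \times (d/R)$ entries, that is, $R^2$ times fewer than the full $d \times d$ matrix, and the $R$ terms of the sum can be evaluated independently of one another. This is one plausible reading of the complexity reduction quoted above; the paper's own accounting may differ. When the subsets are not independent, the sum is only an approximation to the joint log-density.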