SA-GD: Improved Gradient Descent Learning Strategy with Simulated Annealing. (arXiv:2107.07558v1 [cs.LG])

Gradient descent is the most widely used method for optimizing machine
learning models. However, the loss function contains many local minima and
saddle points, especially in high-dimensional non-convex optimization problems
such as deep learning. Gradient descent may become trapped in these local
regions, which impedes further optimization and results in poor
generalization. This paper proposes the SA-GD
algorithm, which introduces the idea of simulated annealing into gradient
descent. SA-GD gives the model the ability to climb hills with some
probability, which tends to let it escape these local regions and eventually
converge to an optimal state. We took CNN models as an example and
tested basic CNN models on various benchmark datasets. Compared with baseline
models trained with traditional gradient descent, models trained with SA-GD
generalize better without sacrificing the efficiency or stability of
convergence. In addition, SA-GD can be used as an effective ensemble learning
approach that significantly improves final performance.
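
The probabilistic hill climbing described above can be illustrated with a small sketch. The abstract does not state the acceptance rule or the temperature schedule, so everything specific here is an assumption made for illustration: the toy one-dimensional loss, the Metropolis-style ascent probability `p_max * exp(-delta / temp)`, the geometric cooling factor `decay`, and the names `sa_gd`, `t0`, and `p_max`. It is not the authors' implementation.

```python
import math
import random


def loss(x):
    # Toy non-convex loss (an assumption, not from the paper): a shallow
    # minimum near x = 1.13 and a deeper one near x = -1.30.
    return x**4 - 3 * x**2 + x


def grad(x):
    # Analytic derivative of the toy loss.
    return 4 * x**3 - 6 * x + 1


def sa_gd(x0, lr=0.01, t0=1.0, decay=0.99, p_max=0.5, steps=2000, seed=0):
    """Gradient descent that occasionally takes an ascent step.

    The ascent probability shrinks as the temperature cools, so the
    procedure behaves like plain gradient descent late in training.
    With t0 = 0 it is exactly plain gradient descent.
    """
    rng = random.Random(seed)
    x, temp = x0, t0
    for _ in range(steps):
        g = grad(x)
        # First-order estimate of how much one ascent step raises the loss.
        delta = lr * g * g
        # Assumed Metropolis-style rule, capped at p_max so that descent
        # is never less likely than ascent.
        p_up = p_max * math.exp(-delta / temp) if temp > 0 else 0.0
        if rng.random() < p_up:
            x += lr * g  # climb the hill
        else:
            x -= lr * g  # ordinary descent step
        temp *= decay  # geometric cooling (assumed schedule)
    return x


if __name__ == "__main__":
    x_gd = sa_gd(2.0, t0=0.0)   # plain gradient descent baseline
    x_sa = sa_gd(2.0, seed=1)   # annealed variant
    print(x_gd, loss(x_gd))
    print(x_sa, loss(x_sa))
```

Started from `x0 = 2.0`, the baseline with `t0 = 0.0` settles in the shallow minimum. Whether an annealed run ends in the deeper basin depends on the seed and on the schedule, which is why the sketch prints both results rather than asserting an improvement.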


