KAPLA: Pragmatic Representation and Fast Solving of Scalable NN Accelerator Dataflow. (arXiv:2306.15676v1 [cs.AR])
Dataflow scheduling decisions are of vital importance to neural network (NN)
accelerators. Recent scalable NN accelerators support a rich set of advanced
dataflow techniques. The problems of comprehensively representing and quickly
finding optimized dataflow schemes thus become significantly more complicated
and challenging. In this work, we first propose comprehensive and pragmatic
dataflow representations for temporal and spatial scheduling on scalable
multi-node NN architectures. An informal hierarchical taxonomy highlights the
tight coupling across different levels of the dataflow space as the major
difficulty for fast design exploration. A set of formal tensor-centric
directives accurately expresses various inter-layer and intra-layer schemes, and
allows for quickly determining their validity and efficiency. We then build a
generic, optimized, and fast dataflow solver, KAPLA, which uses the
pragmatic directives to explore the design space with effective validity checks
and efficiency estimation. KAPLA decouples the upper inter-layer level for fast
pruning, and solves the lower intra-layer schemes with a novel bottom-up cost
descending method. The dataflow schemes found by KAPLA incur only 2.2% and 7.7%
energy overhead for training and inference, respectively, compared to the
exhaustively searched optimal schemes. KAPLA also outperforms random and
machine-learning-based approaches, producing more optimized results with
orders of magnitude faster search.
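
As a reading aid for the overhead figures above, the sketch below shows one common way such a number could be computed: the relative energy increase of a found scheme over the exhaustively searched optimum. The abstract does not state the exact definition used, so the formula, the function name `energy_overhead`, and the sample energy values are all assumptions made purely for illustration.

```python
def energy_overhead(found_energy: float, optimal_energy: float) -> float:
    """Relative energy overhead of a found dataflow scheme vs. the optimum.

    Assumed definition (not confirmed by the abstract):
        (found - optimal) / optimal
    """
    if optimal_energy <= 0:
        raise ValueError("optimal_energy must be positive")
    return (found_energy - optimal_energy) / optimal_energy


# Hypothetical energy values in arbitrary units, chosen only for illustration.
overhead = energy_overhead(found_energy=102.2, optimal_energy=100.0)
print(f"{overhead:.1%}")
```

Under this assumed definition, an overhead of 2.2% means the found scheme consumes 1.022 times the energy of the optimal scheme.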
Source: https://arxiv.org/abs/2306.15676