A Novel Depth-First Scheduling Strategy for Spatially Dynamic Neural Networks

Efficient hardware execution of convolutional neural networks (CNNs) increasingly relies on methods that reduce both data movement and memory usage. Depth-first processing is a proven approach for minimizing on-chip memory and off-chip bandwidth requirements. In parallel, Spatially Dynamic Neural Networks (SDyNNs) exploit runtime spatial sparsity to skip redundant pixel computations, adapting compute effort to input content. However, existing hardware implementations for SDyNNs neglect the benefits of depth-first scheduling. This paper introduces a flexible multi-core hardware architecture that, for the first time, integrates depth-first execution with spatially dynamic pruning. We propose a novel scheduling strategy tailored to the concurrent execution of convolutional and decision layers in SDyNNs. Our approach modifies the depth-first paradigm to support dynamic pruning at the pixel level while maintaining full computational parallelism. Analytical results demonstrate that this architecture significantly reduces latency and memory demands compared to prior two-array implementations, particularly under realistic sparsity conditions.
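To make the two combined ideas concrete, the following toy sketch contrasts them in software terms: depth-first execution pushes each pixel through the full layer stack before touching the next pixel (so no intermediate feature map is stored), and a decision function prunes pixels up front. All names (`depth_first_run`, `keep`, the toy per-pixel "layers") are hypothetical illustrations, not the paper's architecture or scheduling strategy; real depth-first schedules operate on tiles with overlapping receptive fields, which this sketch deliberately ignores.

```python
def depth_first_run(image, layers, keep):
    """Process each pixel through ALL layers before moving to the next one
    (depth-first), skipping pixels the decision function prunes.
    Returns (output, ops) where ops counts per-layer pixel evaluations."""
    out, ops = [], 0
    for px in image:
        if not keep(px):       # decision layer: prune this pixel entirely
            out.append(0)
            continue
        v = px
        for f in layers:       # depth-first: whole layer stack, one pixel,
            v = f(v)           # no intermediate feature map kept around
            ops += 1
        out.append(v)
    return out, ops

# Toy stand-ins: two pointwise "layers" and a spatial-sparsity criterion.
layers = [lambda v: v * 2, lambda v: v + 1]
keep = lambda px: px > 0

out, ops = depth_first_run([0, 3, 0, 5], layers, keep)
# out == [0, 7, 0, 11]; ops == 4 (two pruned pixels save half the work)
```

A layer-first schedule would instead evaluate layer 1 over the whole image, store that feature map, then evaluate layer 2; the depth-first loop above needs only a single scalar of intermediate state per pixel, which is the memory saving the abstract refers to, while the `keep` check models pixel-level pruning.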