A Flexible Multi-Core Hardware Architecture for Stereo-Based Depth Estimation CNNs

Stereo-based depth estimation is increasingly important in applications such as autonomous driving, earth observation, cartography, and robotics. Modern approaches employ artificial intelligence techniques, particularly convolutional neural networks (CNNs). However, stereo-based depth estimation networks involve dual processing paths for the left and right input images, which merge at intermediate layers, posing challenges for efficient deployment on modern hardware accelerators. In particular, depth-first and layer-fused execution strategies, which are commonly used to reduce I/O communication and on-chip memory demands, are not readily compatible with such non-linear network structures. To address this limitation, we propose a flexible multi-core hardware architecture tailored to stereo-based depth estimation CNNs. The architecture supports layer-fused execution while efficiently managing the dual-path computation and its fusion, enabling improved resource utilization. Experimental results demonstrate a latency reduction of up to 24% compared to state-of-the-art depth-first implementations that do not incorporate stereo-specific optimizations.
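To make the dual-path topology concrete, the following is a minimal NumPy sketch of the network shape discussed above: two branches with shared weights process the left and right views independently, then merge at an intermediate layer before further processing. This is an illustrative toy (single-channel valid convolutions, random weights, element-wise sum fusion), not the paper's actual architecture or any specific stereo network.

```python
import numpy as np

def conv2d(x, k):
    """Toy single-channel 'valid' 2D convolution."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
left = rng.standard_normal((16, 16))    # left camera image (toy)
right = rng.standard_normal((16, 16))   # right camera image (toy)

k_branch = rng.standard_normal((3, 3))  # weights shared by both paths
k_fuse = rng.standard_normal((3, 3))    # post-fusion layer weights

# Dual processing paths: identical (weight-shared) feature extraction per view.
feat_left = np.maximum(conv2d(left, k_branch), 0)   # ReLU
feat_right = np.maximum(conv2d(right, k_branch), 0)

# Merge at an intermediate layer (sum here; concatenation or a
# correlation/cost-volume layer is common in real stereo CNNs),
# then continue with a single fused path.
merged = feat_left + feat_right
depth_features = conv2d(merged, k_fuse)

print(depth_features.shape)  # (12, 12)
```

The merge point is exactly what complicates depth-first/layer-fused scheduling: until fusion, two independent activation streams must be kept live and synchronized, rather than the single producer-consumer chain that tiled execution normally assumes.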