Robust vision-in-the-loop system through NN fine-tuning using digital twins

Autonomous systems increasingly adopt Neural Network (NN)-based perception in vision-in-the-loop (VIL) control systems. In many industrial applications, the features of the object of interest (shape, size, and texture) vary, which imposes robustness requirements on the perception algorithm. Moreover, the performance of the VIL system imposes strict latency requirements. Using NNs in a VIL system therefore poses two challenges. First, the NN models must be lightweight to keep the closed-loop latency low. Second, representative training data must be available to ensure the robustness of these lightweight NN models; collecting such data is expensive and often infeasible in many industrial systems. In this work we propose an approach for training the NNs used in VIL applications using digital twins (DTs). The DT automatically generates and labels training data covering variations such as object shape and directional lighting. Starting from a lightweight NN base model, the proposed approach fine-tunes or retrains the model on DT-generated training data to achieve the desired performance and robustness under a different target operating condition. The approach is validated on a VIL semiconductor motion stage system with square and rectangular dies of dimensions 0.5 cm × 0.5 cm and 0.5 cm × 1 cm, respectively. The VIL system limits the positioning error to within 2%, compared to a 12% positioning error without vision feedback.
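
The following is a minimal sketch of the fine-tuning step described above, assuming a PyTorch workflow. The model architecture (LightweightNet), dataset shapes, loss choice, and hyperparameters are illustrative placeholders, not the authors' implementation; in practice the DT-rendered frames and their automatically generated labels would replace the random tensors used here.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical lightweight base model: a small CNN suited to low-latency
# closed-loop inference.
class LightweightNet(nn.Module):
    def __init__(self, num_outputs: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(16, num_outputs)  # e.g., die (x, y) position

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def fine_tune(model, dt_loader, epochs=5, lr=1e-4):
    """Fine-tune or retrain the base model on DT-generated, auto-labelled data."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()  # placeholder: regression of die position
    model.train()
    for _ in range(epochs):
        for images, labels in dt_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
    return model

if __name__ == "__main__":
    # Stand-in for DT-rendered frames (varied die shapes, directional
    # lighting) with automatically generated position labels.
    images = torch.randn(256, 1, 64, 64)
    labels = torch.rand(256, 2)
    loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

    base_model = LightweightNet()  # pretrained base weights would be loaded here
    fine_tune(base_model, loader)
```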