Automated Search of Object Tracking Deep Neural Networks for Optimized Deployment on Novel Processors, 10-R6215

Principal Investigators
Jake Janssen
Dan Rossiter
Inclusive Dates 
10/14/21 to 02/14/22


The objective of this internal research (IR) project was to create a robust, automated approach for optimizing object-detection deep neural networks (DNNs) for each new computing platform on which they must be deployed. The ability to optimize a DNN quickly and easily for new platforms is essential for the successful long-term commercial deployment of any system that uses DNN-based algorithms. This project focused on one such system, the Southwest Research Institute® (SwRI®)-owned Active-Vision™ vehicle tracking software, which combines an advanced DNN-based object detection algorithm with the Deep SORT tracking algorithm to locate and uniquely identify vehicles on the road using traffic camera video feeds. The previous iteration of the Active-Vision DNN required high-performance graphics processing units (GPUs) and was fast enough to handle only a handful of cameras on a single GPU. This speed limitation was a major risk for Active-Vision when looking to expand to new customers.


To automate the search for efficient DNNs, we first had to curate a clean dataset against which the search algorithm could automatically score candidate networks. The data used to train the original Active-Vision object detection network was found to have deficiencies, chiefly a lack of data collected in urban environments such as city intersections, which caused the algorithm to perform poorly on data supplied by our department of transportation customer. Additional data from autonomous vehicle datasets was incorporated, which added more vehicle examples and greatly increased the amount of non-car data that the algorithm would learn to ignore in its predictions. The team then needed to choose the baseline algorithm on which to perform the automated network search. The team chose the YOLOv5 framework, which offers a rich set of training features as well as the ability to search various model parameters by changing a single configuration file. Finally, the team trained multiple variants of the model to find one that met the speed requirements of the different compute platforms. To reduce the search space, a constant scaling factor was first applied to the network to understand how accuracy would change as the DNN was made faster. One of these scaled models was chosen as the baseline, and a more refined search of that neural network was then applied.
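
The coarse stage of the search described above can be illustrated with a minimal sketch. It is not the project's actual code. It assumes the constant scaling factor acts on depth and width multipliers of the kind that YOLOv5 model configuration files expose (depth_multiple and width_multiple), and that a user-supplied evaluate function trains or loads a candidate and returns its validation accuracy and measured frames per second on the target platform. The names Candidate, coarse_search, and toy_evaluate are hypothetical, and toy_evaluate returns made-up numbers purely so the example runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """One scaled network variant and its measured results (hypothetical type)."""
    depth_multiple: float
    width_multiple: float
    accuracy: float  # e.g. mAP on the curated validation set
    fps: float       # measured throughput on the target platform


def coarse_search(
    base_depth: float,
    base_width: float,
    factors: Sequence[float],
    evaluate: Callable[[float, float], Tuple[float, float]],
    target_fps: float = 5.0,
) -> Tuple[Optional[Candidate], List[Candidate]]:
    """Apply each constant scaling factor to the base network and return the
    most accurate candidate that meets the speed target, plus all results."""
    results: List[Candidate] = []
    for factor in factors:
        depth = round(base_depth * factor, 3)
        width = round(base_width * factor, 3)
        accuracy, fps = evaluate(depth, width)
        results.append(Candidate(depth, width, accuracy, fps))

    # Keep only candidates that are fast enough, then pick the most accurate.
    feasible = [c for c in results if c.fps >= target_fps]
    best = max(feasible, key=lambda c: c.accuracy) if feasible else None
    return best, results


def toy_evaluate(depth: float, width: float) -> Tuple[float, float]:
    """Stand-in for real training and benchmarking (assumption: made-up
    numbers in which wider networks are more accurate but slower)."""
    accuracy = 0.5 + 0.1 * width
    fps = 4.0 / width
    return accuracy, fps


if __name__ == "__main__":
    best, results = coarse_search(1.0, 1.0, [0.25, 0.5, 0.75, 1.0], toy_evaluate)
    for candidate in results:
        print(candidate)
    print("baseline for refined search:", best)
```

In this sketch the selected candidate plays the role of the baseline, and a finer search would then vary its parameters around that point. If no candidate reaches the target speed, the function returns None for the baseline so the caller can widen the range of scaling factors.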


This research produced several model topologies based on the YOLOv5 framework that run at or above the desired five frames per second on the hardware platforms we were targeting. The techniques used to achieve these speeds are generic enough to support exploring models on other platforms and will help Active-Vision and other object detection computer vision systems adapt more freely to various compute platforms.