Design of Efficient Deep Neural Networks on FPGAs for Low-Power and Low-Latency Applications, 10-R6211

Principal Investigators
Mike Koets
Inclusive Dates 
10/01/21 to 04/01/23

Background

Machine learning has evolved to provide effective solutions to previously unsolvable problems in fields such as computer vision, time series analysis, and natural language processing. While these techniques are effective, they rely on powerful processors to compute large matrix multiplications; a single image in a computer vision application can require millions of operations. This reliance is a limiting factor for deploying machine learning algorithms onboard satellites, which face tighter size, weight, power, and cost (SWaP-C) constraints than most terrestrial machine learning systems. Recent research has explored low-bit-precision mathematical representations of weights and activations to accelerate machine learning inference on radiation-tolerant processing components, particularly Field Programmable Gate Arrays (FPGAs). Representing each value with four or fewer bits improves efficiency through reduced memory movement and greatly simplified arithmetic, both of which can be effectively accelerated on FPGAs. This efficiency comes at a cost: applying standard quantization techniques at such low precision greatly reduces the accuracy of the neural network, making it a less desirable solution.
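As a concrete illustration of the low-precision arithmetic described above, the sketch below quantizes a weight tensor to a configurable bit width using symmetric uniform quantization. The function name quantize_weights and the per-tensor scale are illustrative assumptions for this sketch, not details of the project's actual method.

    import torch

    def quantize_weights(w: torch.Tensor, bits: int = 4):
        """Symmetric uniform quantization: map floating-point weights to
        signed integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1] that
        share a single per-tensor scale factor (an illustrative choice)."""
        qmax = 2 ** (bits - 1) - 1               # 7 for 4-bit signed values
        scale = w.abs().max() / qmax             # one scale for the tensor
        q = torch.clamp(torch.round(w / scale), -qmax, qmax).to(torch.int8)
        return q, scale

    # At inference the multiply-accumulate hardware operates on small
    # integers; the original weights are approximated by q * scale.
    w = torch.randn(8, 8)
    q, scale = quantize_weights(w, bits=4)
    w_hat = q.to(torch.float32) * scale          # dequantized approximation

The gap between w and w_hat is the quantization error that, without more careful techniques, accumulates across layers and degrades accuracy.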

Approach

This research focused on low-precision quantization techniques that achieve the desired level of accuracy on an object detection machine learning task while being more efficient than current, higher-precision techniques. To accomplish this, the research explored several types of low-precision quantization, weight compression in the form of weight sparsity, and novel deep learning architectures that minimize overall data movement. Each technique was evaluated by how much it reduced network accuracy and by how much it improved performance.
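To make the weight-sparsity idea concrete, the sketch below applies magnitude-based pruning, which zeroes the smallest weights so they require no storage or multiplication on a sparsity-aware datapath. The helper name prune_by_magnitude and the 50% sparsity target are illustrative assumptions, not the project's actual method or settings.

    import torch

    def prune_by_magnitude(w: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
        """Return a 0/1 mask that zeroes the smallest-magnitude weights so
        that roughly `sparsity` fraction of entries are zero; zeroed
        weights need no memory movement or multiply on the accelerator."""
        k = int(sparsity * w.numel())
        if k == 0:
            return torch.ones_like(w)
        threshold = w.abs().flatten().kthvalue(k).values
        return (w.abs() > threshold).to(w.dtype)

    # Example: prune half the weights of a layer; in practice the mask is
    # held fixed while the network is fine-tuned to recover accuracy.
    w = torch.randn(64, 64)
    mask = prune_by_magnitude(w, sparsity=0.5)
    w_sparse = w * mask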

Accomplishments

This research produced techniques for reducing an off-the-shelf efficient deep neural network to low-bit precision with minimal accuracy loss relative to its full-precision counterpart. The techniques were also designed for efficient deployment on FPGAs. This research is still ongoing; the next step is to create a working implementation of the algorithm that can run on a space-rated FPGA.