

Synthetic Data for Neural Network Training, 18-R8866

Principal Investigators
Joel Allardyce
Inclusive Dates 
07/01/18 to 10/01/19

Background

Accurate, reliable, easy-to-use motion capture technology has long been a goal of biomechanics researchers. Measurements of how humans move are the foundation on which research into the internal kinematic and dynamic environment of joints is built. For more than 30 years, marker-based motion capture systems have been the gold standard for measuring human motion. While these systems are incredibly useful, their limitations have spurred the development of “markerless” motion capture systems that aim to overcome them. SwRI has developed its own markerless motion capture system based on a deep convolutional neural network (dCNN) framework. dCNNs are machine-learning algorithms that require well-defined, accurate training data sets to learn the relationships between images and the underlying kinematics. These training sets are costly and time consuming to collect and process, which strains the budgets and timelines of potential projects. We aim to address this cost directly by developing a methodology to significantly expand and augment a limited training data set with synthetic data. Successful completion of this program will significantly reduce the cost and time required to train new markerless biomechanics neural networks.

Approach

To achieve the goals of this program, a semi-automated toolset was created for generating synthetic data. The toolset includes a statistical kinematic model that can generate new, unique motion samples constrained by the measured variability within a set of experimental data, which expands the motion variability represented in the dataset. A conversion tool translates results from this model, or experimentally captured data, into a format compatible with the animation software package Blender, which is used to render the synthetic data takes. The toolset also includes an automated pipeline that takes a set of converted kinematic data and generates a complete synthetic data set, with multiple subjects, backgrounds, lighting conditions, clothing textures and more, with minimal human interaction. Using the synthetic data generated with this toolset, a series of networks was trained on different combinations of real and synthetic training data to investigate the effect of the training data composition on measurement accuracy. The networks were then assessed and compared to quantify those effects and identify the optimal combination of real and synthetic data.
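
The statistical kinematic model described above can be illustrated with a minimal Python sketch. It is not the project's implementation: it assumes a deliberately simple model in which each frame of a single joint-angle curve has a mean and standard deviation across trials, and a new sample is the mean curve shifted by one shared, clipped random score. The function names, the `max_sd` limit, and the trial values are all hypothetical.

```python
import random
import statistics


def fit_model(trials):
    """Return per-frame mean and standard deviation across trials.

    trials: list of equal-length lists, one joint-angle curve (degrees) per trial.
    """
    frames = list(zip(*trials))  # regroup values frame by frame
    mean = [statistics.mean(frame) for frame in frames]
    std = [statistics.stdev(frame) for frame in frames]
    return mean, std


def sample_motion(mean, std, rng, max_sd=2.0):
    """Generate one new curve within +/- max_sd standard deviations of the mean."""
    # One shared score per sample shifts the whole curve coherently;
    # clipping keeps it inside the measured variability (an assumed constraint).
    score = max(-max_sd, min(max_sd, rng.gauss(0.0, 1.0)))
    return [m + score * s for m, s in zip(mean, std)]


# Hypothetical knee-flexion angles (degrees) from three trials, three frames each.
trials = [
    [0.0, 10.0, 20.0],
    [2.0, 14.0, 26.0],
    [4.0, 12.0, 23.0],
]
mean, std = fit_model(trials)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
new_motion = sample_motion(mean, std, rng)
```

A fuller model would likely need to capture correlated variation across joints and over time, for example with principal component analysis; the summary does not state which statistical method was used.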

Accomplishments

Over the course of the project, we developed a methodology and computational toolset for generating synthetic data to train a neural network. The toolset includes a statistical kinematic model that generates new, unique motion samples constrained by the measured variability within a set of experimental data, as well as a tool that converts results from this model, or experimentally captured data, into a format compatible with the animation software package Blender. The converted data is then fed into an automated pipeline that takes input kinematics, character models, and backgrounds and generates a complete synthetic data set with minimal human interaction. Using the synthetic data generated with the toolset, a series of networks was trained to assess the effect of the training data, including synthetic data, on measurement accuracy. Improvements of up to 28% were seen for joint center position accuracy and 18.5% for joint angle accuracy. Accuracy improvements were not universal, however; the largest effect of including synthetic data was a leveling of accuracy across subjects. When deployed, the system must deliver consistent accuracy for any of the many varied subjects it might be tasked with measuring, so this observed performance leveling will be critical to its success. Synthetic data will not replace experimentally captured data but can augment it, significantly reducing the cost of collecting large training sets.
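
As a rough illustration of how the per-subject comparison above might be quantified, the Python sketch below computes the percent improvement in error for each subject and the spread of error across subjects, with and without synthetic training data. The error values, subject labels, and function names are placeholders invented for the example; they are not results from this project, and the range is only one assumed way to express the leveling effect.

```python
def percent_improvement(baseline_error, new_error):
    """Percent reduction in error relative to the baseline (positive is better)."""
    return 100.0 * (baseline_error - new_error) / baseline_error


def error_range(errors):
    """Spread of error across subjects; a smaller range means more even accuracy."""
    return max(errors.values()) - min(errors.values())


# Placeholder joint center position errors in mm (NOT project results).
real_only = {"subject_a": 20.0, "subject_b": 30.0, "subject_c": 40.0}
real_plus_synthetic = {"subject_a": 18.0, "subject_b": 24.0, "subject_c": 30.0}

improvement = {
    subject: percent_improvement(real_only[subject], real_plus_synthetic[subject])
    for subject in real_only
}
spread_before = error_range(real_only)
spread_after = error_range(real_plus_synthetic)

for subject, gain in improvement.items():
    print(f"{subject}: {gain:.1f}% lower error")
print(f"range across subjects: {spread_before:.1f} mm -> {spread_after:.1f} mm")
```

Reporting the spread alongside the per-subject gains makes it possible to see a leveling effect even when some subjects improve only slightly.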