Use of Natural Language Processing and Descriptive Taxonomies to Increase Automation of Learning Object Metadata Generation, 07-R9640


Principal Investigators
Robert Van Dam
Brett Knight
Doretta E. Gordon

Inclusive Dates: 07/01/06 – 06/30/07

Background - The primary goals of the reusable learning objects initiative outlined by the Advanced Distributed Learning Co-Labs are to provide content that is interoperable, accessible, reusable, durable, maintainable and adaptable to individual and organizational needs. Success in the areas of reusability and adaptability relies heavily on the quality of the learning objects themselves as well as the quality and completeness of the metadata attached to each object.

The purpose of this project was to identify a means of auto-generating metadata for the educational category elements of the learning object metadata (LOM) schema using natural language processing and information extraction techniques. The goal was to auto-generate metadata that was identical to that generated manually by a panel of expert coders.
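As an illustration of the kind of information-extraction rule such an approach involves, the following is a minimal Python sketch for a single element of the LOM educational category, Interactivity Type (vocabulary: active, expositive, mixed). The cue-word lists, the function names `tokenize` and `interactivity_type`, and the decision rule are hypothetical assumptions for illustration only; this report does not describe the rules the project actually used.

```python
import re
from collections import Counter

# Hypothetical cue-word lists (assumption, not from the original study).
ACTIVE_CUES = {"click", "select", "answer", "drag", "enter", "choose", "solve", "type"}
EXPOSITIVE_CUES = {"read", "watch", "listen", "review", "overview", "introduction"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def interactivity_type(text: str) -> str:
    """Guess the LOM Interactivity Type ('active', 'expositive', 'mixed')
    by counting hypothetical cue words in an SCO's text."""
    counts = Counter(tokenize(text))
    active = sum(counts[w] for w in ACTIVE_CUES)
    expositive = sum(counts[w] for w in EXPOSITIVE_CUES)
    if active and expositive:
        return "mixed"  # both kinds of cue present
    if active:
        return "active"  # only learner-action cues present
    # No cues at all, or only expositive cues: default to 'expositive'.
    return "expositive"
```

A rule this simple depends heavily on the vocabulary of the training material, which is one reason a model built from a small, homogeneous set of SCOs may not generalize.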

Approach - The study began with an expert panel manually generating baseline metadata for 10 sharable content objects (SCOs). Natural language processing and information extraction techniques were then identified and programmed for each element of the educational category. A series of iterative tests and refinements produced the final auto-generation algorithm. Finally, 10 new SCOs were given to the same expert panel for manual metadata generation. These 10 SCOs were also run through the auto-generation algorithm, and the resulting metadata was compared to the manually generated "gold standard."
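The final comparison step can be sketched as a per-element exact-match agreement score between auto-generated and expert metadata. This is a minimal Python sketch assuming that each SCO's metadata is a dictionary mapping educational-category element names to vocabulary values, and that "identical" means a case-insensitive string match after trimming whitespace; the function name `element_agreement` and the data layout are hypothetical, not taken from the study.

```python
from typing import Dict, List, Optional

# Hypothetical layout: one dict per SCO, element name -> vocabulary value.
Metadata = Dict[str, str]


def _same(auto_value: Optional[str], gold_value: str) -> bool:
    """Case-insensitive exact match; a missing auto value never matches."""
    return auto_value is not None and auto_value.strip().lower() == gold_value.strip().lower()


def element_agreement(auto: List[Metadata], gold: List[Metadata]) -> Dict[str, float]:
    """For each element in the gold standard, return the fraction of SCOs
    whose auto-generated value matches the expert panel's value."""
    if len(auto) != len(gold):
        raise ValueError("auto and gold must describe the same SCOs, in the same order")
    elements = sorted({name for record in gold for name in record})
    scores: Dict[str, float] = {}
    for name in elements:
        # Only SCOs for which the experts supplied this element are scored.
        pairs = [(a.get(name), g[name]) for a, g in zip(auto, gold) if name in g]
        matches = sum(1 for a_val, g_val in pairs if _same(a_val, g_val))
        scores[name] = matches / len(pairs)
    return scores
```

Exact matching mirrors the stated goal of metadata "identical" to the experts' output; a more lenient variant could award partial credit for adjacent values on ordinal scales such as Difficulty.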

Accomplishments - Although the results of the study were disappointing, owing to a lack of sufficiently varied data from which to build the logical model, the study identified several approaches for improving the model for different application environments.
