Principal Investigators
Heath Spindle
Valerie Darling
FJ Olugbodi
Inclusive Dates 
05/06/2024 to 09/06/2024

Background 

This LAMP research project explores whether a SharePoint document library containing SwRI’s corpus of internal research (IR) reports can be configured to serve as a Retrieval-Augmented Generation (RAG) source for a Large Language Model (LLM) database assistant. The project was established to let users engage with SwRI’s proprietary IR archive and generate accurate, domain-specific insights from decades of research.

Approach 

A subset of IR reports was selected from the report archive to represent known topics within a labeled dataset used for response validation. Controlled metadata fields were established to standardize key prompt entry points. The RAG model was built, tested, and deployed in Python using a framework for integrating an LLM with external data sources. The methodology involved optimizing document parsers, adjusting document chunk sizes, employing sentence embedding models, and testing various document retrieval methods. The focus was on achieving high general search accuracy, enabling effective semantic search, and minimizing computational demands. Additionally, a Conversational RAG model was incorporated to support dynamic, interactive exchanges with the system.
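
The chunking and retrieval steps described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names (chunk_text, embed, cosine, retrieve), the chunk size and overlap values, and the sample text are all hypothetical, and a simple word-count vector stands in for the sentence embedding model, which the summary does not name. It shows only the general pattern of splitting a document into overlapping chunks and ranking them against a query by cosine similarity.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping windows; sizes are counted in words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def embed(text):
    """Toy stand-in for a sentence embedding (assumption): word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, top_k=2):
    """Return the top_k (score, chunk) pairs most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical report snippets, used only to exercise the functions.
    sample_chunks = [
        "corrosion testing of pipeline steel",
        "machine learning for satellite imagery",
        "engine emissions measurement",
    ]
    for score, chunk in retrieve("pipeline corrosion", sample_chunks, top_k=1):
        print(round(score, 3), chunk)
```

In a sketch like this, larger chunks give the LLM more context per retrieved passage but raise the cost of each query, which is the kind of trade-off between search accuracy and computational demand that chunk-size tuning is meant to address.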

Accomplishments 

The custom-configured RAG model has enabled the LLM to deliver accurate responses, concise report summaries, and dynamic user engagement. The approach of using a SharePoint document library as a RAG source can be extended to other business areas by applying the same technology to sensitive or extensive document libraries, enabling broader access, streamlined insights, and deeper engagement with critical internal content across the organization.