Principal Investigators
Heath Spidle
Maritza Spott
Valerie Darling
Coryn Millander
Inclusive Dates 
10/31/2024 to 10/31/2025

Background

This LAMP research project explores configuring a SharePoint document library containing SwRI’s corpus of internal research (IR) reports to serve as a Retrieval-Augmented Generation (RAG) source for a Large Language Model (LLM) database assistant. The project was established so that users can query SwRI’s proprietary IR archive and generate accurate, domain-specific insights from decades of research.

Approach

The archive of Internal Research final reports was processed into the IRD Archive SharePoint document library. Controlled metadata fields were established to standardize key data entry points, supporting both a traditional SharePoint PnP Search experience and natural-language LLM engagement.
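
As an illustration of what controlled metadata means in practice, the sketch below checks a report record against a fixed vocabulary before it is accepted. It is a minimal example under stated assumptions: the field names (title, year, division) and the allowed values are hypothetical, since the actual IRD Archive fields are not listed in this summary.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary; the real IRD Archive values are not given here.
ALLOWED_DIVISIONS = {"Division A", "Division B"}


@dataclass(frozen=True)
class ReportMetadata:
    """One report's metadata record (illustrative fields only)."""
    title: str
    year: int
    division: str


def validate(meta: ReportMetadata) -> list[str]:
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    if not meta.title.strip():
        problems.append("title is empty")
    if not (1900 <= meta.year <= 2100):
        problems.append(f"year out of range: {meta.year}")
    if meta.division not in ALLOWED_DIVISIONS:
        problems.append(f"division not in controlled vocabulary: {meta.division!r}")
    return problems
```

Restricting each field to a known set of values lets the same record serve both a faceted search filter and a structured filter applied ahead of LLM retrieval.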

The project shifted production to a model integrated with the internal SwRI-GPT interface, where access controls managed by ITC could be implemented and maintained. A Model Context Protocol (MCP) server was developed that allows SwRI-GPT to interact with the RAG source and run queries against SwRI’s IR&D reports. Several information retrieval methodologies and technologies were explored, with a focus on achieving high general search accuracy, providing effective semantic search, and minimizing computational demands.

Accomplishments

The custom-configured RAG model and MCP server were successfully integrated into the SwRI-GPT interface, with custom authentication developed to reflect SwRI Governance policy for SwRI Proprietary content.

The integration has enabled SwRI-GPT to deliver accurate responses, concise report summaries and dynamic user engagement. This approach, which uses a SharePoint document library as a RAG source, can be extended to other business areas by applying the same technology to sensitive or extensive document libraries, enabling broader access, streamlined insights and deeper engagement with critical internal content across the organization.