Background
This LAMP research project explores configuring a SharePoint document library containing SwRI’s corpus of internal research (IR) reports to serve as a Retrieval-Augmented Generation (RAG) source for a Large Language Model (LLM) database assistant. The project was established so that users can query SwRI’s proprietary IR archive and generate accurate, domain-specific insights from decades of research.
Approach
The archive of Internal Research final reports was processed into the IRD Archive SharePoint document library. Controlled metadata fields were established to standardize key data entry points, supporting both a traditional SharePoint PnP Search experience and natural language LLM engagement.
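As an illustration of what controlled metadata fields provide, the following is a minimal sketch of validating a report record against a controlled vocabulary before it is indexed. The field names, allowed values, and the validate_metadata function are hypothetical and do not reflect the actual IRD Archive schema.

```python
# Hypothetical controlled vocabulary; field names and values are
# illustrative only and are not the actual IRD Archive columns.
CONTROLLED_FIELDS = {
    "division": {"Division A", "Division B"},
    "report_type": {"Final Report"},
}
REQUIRED_FIELDS = ("title", "year", "division", "report_type")


def validate_metadata(record):
    """Return a list of problems found in one report's metadata."""
    errors = []
    # Every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            errors.append(f"missing: {field}")
    # Controlled fields may only take values from their vocabulary.
    for field, allowed in CONTROLLED_FIELDS.items():
        value = record.get(field)
        if value and value not in allowed:
            errors.append(f"invalid {field}: {value}")
    return errors
```

Restricting entries to a fixed vocabulary in this way means a keyword search filter and an LLM-generated filter both match against the same set of values.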
Production was shifted to a model integrated with the internal SwRI-GPT interface, where access controls managed by ITC could be implemented and maintained. A Model Context Protocol (MCP) server was developed to allow SwRI-GPT to interact with the RAG source and answer queries about SwRI’s IR&D reports. Several information retrieval methodologies and technologies were explored, with a focus on high general search accuracy, effective semantic search, and low computational demands.
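The retrieval step that a RAG source performs can be sketched as follows. This is only an illustration under stated assumptions, not the project's method: the sample records are invented, the search function is hypothetical, and a simple bag-of-words cosine similarity stands in for whatever semantic search technique was ultimately selected.

```python
import math
import re
from collections import Counter

# Hypothetical sample records; IDs, years, and text are invented.
REPORTS = [
    {"id": "R-001", "year": 2015, "text": "Corrosion testing of pipeline steel alloys"},
    {"id": "R-002", "year": 2021, "text": "Machine learning for engine emissions prediction"},
    {"id": "R-003", "year": 2022, "text": "Neural network models of pipeline corrosion growth"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, reports, min_year=None, top_k=2):
    """Filter on metadata, then rank the remaining reports by similarity."""
    q = Counter(tokenize(query))
    # Metadata filter: a cheap way to narrow candidates before scoring.
    candidates = [r for r in reports if min_year is None or r["year"] >= min_year]
    scored = [(cosine(q, Counter(tokenize(r["text"]))), r["id"]) for r in candidates]
    # Drop reports with no overlap; sort by score, then ID for stable ties.
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [report_id for _, report_id in scored[:top_k]]
```

In an arrangement like the one described, a function of this kind would sit behind the MCP server, and the returned reports would be passed to the LLM as context for its answer. Filtering on metadata before scoring is one way to keep computational demands low.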
Accomplishments
The custom-configured RAG model and MCP server were successfully integrated into the SwRI-GPT interface, with custom authentication developed to reflect SwRI Governance policy for SwRI Proprietary content.
The integration has enabled SwRI-GPT to deliver accurate responses, concise report summaries and dynamic user engagement. The approach of using a SharePoint document library as a RAG source can be extended to other business areas by applying the same technology to sensitive or extensive document libraries, enabling broader access, streamlined insights and deeper engagement with critical internal content across the organization.