
RAG-powered Chatbot for Efficient Data Retrieval

Internet Services / USA / NDA

We helped a company that handles operations through a SaaS platform simplify knowledge retrieval by combining the intelligence of LLMs, the algorithms of RAG, and the usability of a chatbot.

Web app / Python / Node.js / RAG-powered chatbot

Overview

[01]

About the client

Our client is a mid-sized company with a diverse organizational structure, spanning departments such as IT, HR, and finance, that operates on a SaaS product.

[02]

The Problem

The client has an extensive and continuously growing internal knowledge base, consisting of corporate policies, guidelines, employee data, and other documentation. Finding and retrieving the needed information from data storage has become a challenge for the client’s employees: the necessary information is dispersed across various files, manual search slows down workflows, and queries often end with incomplete results.

Thus, the client needs a quick and accurate way to retrieve internal company information. It should be a search service that can:

  • understand the user’s query and its context,
  • find documents with the relevant data in the knowledge base,
  • select information that answers this query,
  • synthesize this information,
  • return it in the form of a direct response.

Also, since the client’s knowledge base contains sensitive data and has different role-based access levels, the service must be able to differentiate user roles and only provide accessible information.
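One way to honor these role-based access levels is to filter retrieved chunks by the requesting user's role before anything reaches the LLM. The sketch below is purely illustrative; the data model, role names, and file paths are all hypothetical, not the client's actual schema.

```python
# Hypothetical sketch: restricting retrieved chunks to the user's role
# BEFORE they are passed to the LLM. All names and data are illustrative.
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str
    allowed_roles: frozenset  # roles permitted to read this chunk


def filter_by_role(chunks, user_role):
    """Keep only the chunks the given role is allowed to see."""
    return [c for c in chunks if user_role in c.allowed_roles]


chunks = [
    Chunk("Travel expense policy ...", "policies/travel.pdf",
          frozenset({"hr", "finance", "employee"})),
    Chunk("Payroll bands by level ...", "finance/payroll.xlsx",
          frozenset({"finance"})),
]

# A regular employee only ever sees the travel policy chunk;
# the payroll data is filtered out before prompt assembly.
visible = filter_by_role(chunks, "employee")
```

Filtering at retrieval time, rather than asking the model to withhold information, keeps restricted data out of the prompt entirely.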

Solution

A good solution for such tasks today is a chatbot service powered by an LLM. However, there are nuances in using general-knowledge LLMs, too. In our case, the LLM must retrieve information from a domain-specific and isolated database. This means it has to process data it wasn’t previously trained on.

To do this effectively, the LLM will require additional fine-tuning, more computational resources, and a large amount of training data. Also, the need to process sensitive and restricted information poses additional limitations. Such an approach would be too costly and time-consuming. RAG offers a great alternative to resolve this challenge.

[01]

The RAG Approach

Therefore, our solution was to leverage RAG (retrieval-augmented generation) – a technique that enhances data processing by LLMs with information retrieval, enabling LLMs to provide more relevant, accurate, and up-to-date responses. RAG works by using a search algorithm to query external sources (like knowledge bases, databases, or web pages) before the LLM gets to generate a response. This approach helps us to:

  • Reduce hallucinations: chatbot responses are tightly associated with information from the database, with every response including a reference to the information source.
  • Handle complex queries: the chatbot can understand nuanced queries and answer questions that require gathering and synthesizing information from multiple resources.
  • Increase response accuracy: since RAG implies data retrieval from a designated source, the chatbot responses are more relevant and to-the-point.
  • Ensure sensitive data security: RAG integrates multiple measures for data security, like encryption, data anonymization, access control, etc.

[02]

The Chatbot

To implement the chatbot leveraging RAG, we needed to set up vector similarity search – a technique that helps find similar content based on its numerical vector representations (aka “embeddings”). These embeddings represent content (words, paragraphs, documents, images, or videos) by numbers that reflect the semantic relationships between its elements. This way, the system can perform a deep search of the knowledge base by sense, making it possible to find data with similar meaning but different wording.
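The core operation behind vector similarity search is comparing embeddings with a distance metric, most commonly cosine similarity. The toy example below uses made-up 3-dimensional vectors; real embedding models produce vectors with hundreds or thousands of dimensions, but the ranking logic is the same.

```python
# Toy illustration of vector similarity search: rank documents by cosine
# similarity to the query embedding. The 3-D vectors below are invented;
# real embeddings (e.g. from an embedding model) are much higher-dimensional.
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


docs = {
    "vacation policy": [0.9, 0.1, 0.0],
    "office floor plan": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # e.g. the embedding of "how many days off do I get?"

# Pick the document whose embedding points in the most similar direction.
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # "vacation policy" scores higher than "office floor plan"
```

Because similarity is computed on meaning-bearing vectors rather than keywords, a query phrased as "days off" can still surface a document titled "vacation policy".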

Thus, the chatbot implementation took several stages:

  • Knowledge base indexation. First, we broke down the entire knowledge base into chunks of text. Depending on their content, these chunks can vary in size, from several words to entire paragraphs and chapters. Defining the right size for each use case is a painstaking, iterative process, as chunk size strongly affects the eventual accuracy and relevance of the chatbot’s responses. Then we converted these chunks into high-dimensional vector embeddings using an embedding model and stored them in a dedicated vector database.
  • Information retrieval. Every user’s message is also converted into an embedding. Then, the system can look for similar embeddings in the knowledge base and find the most relevant pieces of data that will answer the request.
  • Response generation. The relevant data we retrieved from the knowledge base and the user’s message are sent to the LLM. It synthesizes all this information and generates a comprehensive and context-based response. This way, the chatbot can provide summaries of key points from multiple sources, quote relevant documents, or generate new content based on the insights from the available resources.
  • Prompt engineering. We also had to make sure the chatbot behaves in a defined manner. We created system prompts to set the chatbot’s role, type of output, response complexity, form, volume, and tone of voice, conversation domain, and data access restrictions.
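The stages above can be sketched end to end in a few lines. The "embedder" below is a toy bag-of-words vectorizer standing in for a real embedding model, and the prompt template is illustrative; the actual project used LlamaIndex with pgvector for indexing and retrieval.

```python
# Hedged end-to-end sketch of the pipeline: chunk -> embed -> retrieve ->
# assemble a grounded LLM prompt. The bag-of-words "embedder" is a toy
# stand-in for a real embedding model such as text-embedding-3-large.
import math
from collections import Counter


def embed(text, vocab):
    """Toy embedding: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# 1. Knowledge base indexation: split documents into chunks, embed, store.
chunks = [
    "employees get 25 paid vacation days per year",
    "the office wifi password rotates every month",
]
vocab = sorted({w for c in chunks for w in c.split()})
index = [(c, embed(c, vocab)) for c in chunks]

# 2. Information retrieval: embed the query, rank chunks by similarity.
query = "how many vacation days do employees get"
q_vec = embed(query, vocab)
top_chunk = max(index, key=lambda item: cosine(q_vec, item[1]))[0]

# 3. Response generation: the retrieved context plus the user's question
# form a grounded prompt; a system prompt (prompt engineering) would set
# the chatbot's role, tone, and access restrictions.
prompt = f"Answer using only this context:\n{top_chunk}\n\nQuestion: {query}"
```

In production, step 1 runs offline against the whole knowledge base, while steps 2 and 3 run per user message; swapping the toy embedder for a real model changes only the `embed` function.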

The Result

Using the RAG technique, we created a smart service that can leverage LLM capabilities and, at the same time, retrieve data from the client’s isolated database with several considerable benefits:

  • Cost savings on computational resources.
  • No need to fine-tune a pre-trained LLM.
  • Dynamic data indexing allows continuous data updates in the knowledge base.
  • Data security and isolation.
  • Increased response relevance, with always-present references to data sources.
  • Preserved role-based data access and role-specific responses.

Technologies

Language:

Python

Node.js

RAG Framework:

LlamaIndex

Data storage:

PostgreSQL + pgvector

Models

Embeddings:

OpenAI text-embedding-3-large

Generative Model:

OpenAI GPT-5 mini


Have a project?
Let’s build it.

Leave your contacts and get a free consultation from an Axon manager.
