In our previous posts on embeddings, I used the term “retrieval-augmented generation” (RAG). While certainly not the final approach in the long run, RAG-powered applications have become a popular way to engineer and deliver generative AI solutions in the current wave of legal AI. Powered by our custom suite of Kelvin Embeddings, the Kelvin team can support various forms of RAG-based solutions within specific legal workflows. Given its current popularity, in this series we take a closer look at RAG with the aim of understanding where it fits in the landscape of generative AI.
Retrieval-augmented generation can be broken down into two essential steps: information retrieval (IR) and text generation. To understand retrieval-augmented generation, then, we first need a grasp of information retrieval broadly; we can then turn to the bigger picture.
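The two steps named above can be sketched in a few lines of Python. This is a minimal illustration over a toy corpus, not a real system: `retrieve` uses naive term overlap in place of a production retriever, and `generate` only assembles the prompt that would be sent to a language model.

```python
def retrieve(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    """Step 1, information retrieval: score each document by naive
    term overlap with the query and keep the best matches."""
    terms = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def generate(query: str, context: list[str]) -> str:
    """Step 2, text generation: in a real system this prompt would be
    sent to a large language model; here we only assemble it."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}\nAnswer:"
```

Any real implementation would replace the overlap score with an embedding-based similarity search and the prompt assembly with a call to a model, but the two-step shape stays the same.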
A Brief Overview of Information Retrieval
The ability to retrieve relevant information from large bodies of available information is a long-standing problem. Whether the subject is general knowledge or a specialty field such as medicine, finance, or law, information retrieval techniques have long supported the task of sifting through the sands of potential results to help an individual locate the information they are seeking.
Wikipedia defines information retrieval in computing and information science “as the process of obtaining information system resources that are relevant to an information need from a collection of those resources.” Information retrieval has a long history and even predates computing. The Dewey Decimal System, for example, is perhaps the most widely known organizational taxonomy; for more than one hundred years it has allowed users to conduct information retrieval tasks by identifying a relevant topic and looking on the shelf for relevant materials.
In the legal world, many professionals have used specialty information retrieval systems such as Westlaw or Lexis. The quality and ease of use of these legal information retrieval systems have improved over the years, with early systems requiring Boolean logic or a deep understanding of manual classification systems such as the West Key Number System.
In the more contemporary era, most individuals have used information retrieval systems such as Google, Yahoo, and Bing to search the internet for relevant information. Modern systems use syntax-based techniques to relate natural language searches to relevant results. Every time you input a query into a search engine, you interact with an information retrieval system. We query Google with the hope of retrieving relevant information: What is the weather? What is an easy, healthy recipe for dinner? What is the best hotel for families in Chicago? The more the results align with what is true and relevant to our original query, the better the information retrieval system.
The quality of information retrieval systems has been significantly improved by the ability to collect data on user search behavior. Did the user select the first, second, or tenth item in the list of returned results? If users continually select the tenth item, rank it higher. Rinse, repeat, rerank as more users interact with the underlying system.
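The rinse-repeat-rerank loop above can be sketched simply. The click counts and document names here are illustrative stand-ins, not real search data, and production systems use far richer ranking signals than raw click totals.

```python
from collections import Counter


def rerank_by_clicks(results: list[str], clicks: Counter) -> list[str]:
    """Promote results that users click more often. Python's sort is
    stable, so documents with equal click counts keep their original
    (relevance-based) ordering. Counter returns 0 for unseen docs."""
    return sorted(results, key=lambda doc: -clicks[doc])


# Hypothetical behavior data: users keep choosing the tenth result.
clicks = Counter({"doc_10": 42, "doc_1": 3})
ranked = rerank_by_clicks([f"doc_{i}" for i in range(1, 11)], clicks)
# doc_10 rises to the top; everything else keeps its relative order.
```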
With this understanding of information retrieval, let us turn to large language models and efforts to use prompt engineering and retrieval-augmented generation to improve the quality of results rendered by leading foundation models.
Large Language Models and the Limits of Manual Prompt Engineering
Many users have an intuitive understanding of how to refine or tune a Google search to achieve better results. In this sense, we are all loosely familiar with prompt engineering. Prompt engineering is not really engineering per se; it is simply users changing their queries until they get the response they want. We might call this process of editing a query prior to submitting it half-shot prompting.
Some of the intuitions we have developed from web-based search do not map neatly onto the world of LLM prompting. Because we are working against the backdrop of a complex orchestration of underlying technologies, such as transformers, and the peculiarities of this class of neural networks, it is difficult to understand input-output correspondences.
In particular, the optimization process involving gradient descent in such complex models is intricate, and as a result the internal workings are not entirely transparent or interpretable. It is thus often challenging to predict how specific prompt modifications will affect the model’s generated responses.
Manual prompt engineering is thus a haphazard way to approach the problem of getting the best out of a given large language model. A more systematic approach is to anchor results against some form of ‘ground truth’. This brings us to the theory behind retrieval-augmented generation.
RAG as an Engineering-Centric Approach to Prompt Engineering
As a bridge to Part II, a useful starting point is the following definition from our previous post:
Retrieval augmentation - also known as retrieval-augmented generation (RAG) - is a process in which a model’s “internal knowledge” is combined with external sources of information to support question-answering or text generation tasks.
We will pick up here in Part II of our series on retrieval-augmented generation (RAG).