
Why Just Use One LLM? The Path from GPT (for Everything) to K-LLMs

There is not going to be 'One Ring to Rule them All'

The Path from GPT Dominance to a Competitive Market of Large Language Models

In the past year, we have seen incredible results from large language models (LLMs) - AI systems trained on vast amounts of text data to generate, understand, and interact with human-like language across a wide range of topics and contexts.

OpenAI initially dominated the Generative AI world with the release of ChatGPT and subsequently GPT-4, with millions of users and a tremendous amount of media coverage. After all, it seems that GPT-4 is capable of doing anything from writing a song, to creating a recipe from whatever ingredients you find lying around your kitchen, to passing the bar exam.

It was not clear when or if other leading technology companies could match the performance of the GPT family of models. Early releases such as Google’s Bard and Meta’s LLaMA revealed a significant gap between GPT and all other competitors. However, as 2023 continued, truly competitive foundation models were released, including Meta’s Llama 2, Amazon’s Titan and Anthropic’s Claude 2. Most recently, Google’s Gemini team reported results on various benchmarks which place it in a virtual dead heat with GPT-4. This is all a way of saying that while the state of the art is in constant flux, the GPT family of models is no longer the only game in town. And it is reasonable to expect that this trend will continue in 2024 and beyond.

The Case for LLM Agnosticism and Introducing K-LLMs

Given the range of options now available (with more likely coming soon), what strategy should an organization employ to future-proof itself against changes in the market landscape? First and foremost, one should be somewhat LLM agnostic and not go ‘all in’ on any single model or provider, as today’s market leader could become tomorrow’s second-rate model.

The sophistication of these LLMs might suggest that choosing just one model to perform all tasks across all domains would suffice. However, this assumption is proving to be a misconception, not only at the overall level but also at the level of individual workstreams and tasks. For example, suppose hypothetically that Gemini is better at summarization while GPT-4 is better at complex reasoning. The optimal strategy would then be to break up a workstream and feed each piece to the model that performs best on that given component.
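To make this routing idea concrete, here is a minimal Python sketch of a task-level router. The routing table, the model names, and the call_llm helper are hypothetical placeholders rather than any particular vendor’s API; the point is only that each step of a workstream is dispatched to whichever model is assumed to handle that kind of task best.

```python
# A minimal sketch of task-level routing across multiple LLMs.
# The routing table, model names, and call_llm() helper are hypothetical
# placeholders; in practice each entry would wrap a provider-specific API client.

ROUTING_TABLE = {
    "summarization": "gemini-pro",    # assumed stronger at summarization
    "complex_reasoning": "gpt-4",     # assumed stronger at multi-step reasoning
    "extraction": "claude-2",         # assumed stronger at structured extraction
}

def call_llm(model_name: str, prompt: str) -> str:
    """Placeholder for a provider-specific API call."""
    raise NotImplementedError(f"Wire up the client for {model_name} here.")

def run_workstream(steps: list[tuple[str, str]]) -> list[str]:
    """Run each (task_type, prompt) step on the model assigned to that task type."""
    results = []
    for task_type, prompt in steps:
        model = ROUTING_TABLE.get(task_type, "gpt-4")  # fall back to a default model
        results.append(call_llm(model, prompt))
    return results
```

In a real pipeline the routing table would be driven by benchmark results on your own tasks rather than fixed by hand, but the structure stays the same.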

This puts us on the path to K-LLMs (where K is an arbitrary number greater than one). K-LLMs is just a fancy way of saying that we should mix and match models rather than subscribe to the view that there is ‘one ring to rule them all.’ The K-LLM idea was a recent focus of the speech by Palantir CTO Shyam Sankar at the 2023 AIP Conference, where he likened each model to an individual human expert with their own knowledge and their own biases. The K-LLM paradigm can allow us to wash out these inaccuracies by ‘crowdsourcing’ across models. This crowdsourcing over K-LLMs allows a user to select the best-performing model for various workstreams and even individual tasks. It is also possible to automate this selection process, and we enable exactly this type of functionality through our Kelvin Chat Experience powered by Kelvin Agent.
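The ‘crowdsourcing’ idea can likewise be sketched in a few lines: send the same prompt to K models and aggregate their answers, here with a simple majority vote. As before, call_llm and the model identifiers are hypothetical placeholders, and majority voting is only one of many possible aggregation schemes.

```python
from collections import Counter

def call_llm(model_name: str, prompt: str) -> str:
    """Placeholder for a provider-specific API call (hypothetical)."""
    raise NotImplementedError(f"Wire up the client for {model_name} here.")

def crowdsource_answer(models: list[str], prompt: str) -> str:
    """Query each of the K models with the same prompt and return the majority answer."""
    answers = [call_llm(model, prompt).strip().lower() for model in models]
    majority_answer, _votes = Counter(answers).most_common(1)[0]
    return majority_answer

# Example usage with hypothetical model identifiers:
# crowdsource_answer(["gpt-4", "gemini-pro", "claude-2"],
#                    "Does this clause limit the vendor's liability? Answer yes or no.")
```

For open-ended generation tasks, the vote could be replaced by a stronger model acting as a judge, or by a follow-up prompt that reconciles the K candidate answers.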

K-LLMs and the Role of Domain Models

To reiterate, the K-LLMs strategy is simply the idea that we can leverage the power of large language models by mixing and matching between them. So what role do domain-specific models play if we are to employ the K-LLM strategy?

Let’s take a step back. We can define a domain-specific model as an LLM that is trained on a curated set of high-quality data from a specific domain, such as law, medicine, or finance, with the intent of using it solely to perform tasks in that field. There are already several examples in existence. For instance, Google has built a medical domain-specific model called Med-PaLM 2. In a field arguably adjacent to law, Bloomberg released BloombergGPT in the spring of this year: a 50-billion-parameter LLM trained on a wide range of financial data with the intention of supporting a range of natural language processing tasks in finance. Importantly, Bloomberg’s research team found that “the BloombergGPT model outperforms existing open models of a similar size on financial tasks by large margins, while still performing on par or better on general NLP benchmarks.” We of course recognize that there is an active debate as to whether a domain-specific model will outperform a general model. In the end, we believe that the answer will vary by workstream and task.

So what about a Legal Domain Model? Our CEO Mike Bommarito announced last week that we are currently training our own instruction-aligned, domain-specific LegalGPT Model using the Kelvin Legal DataPack, a dataset with over 300 billion legal and financial tokens drawn from nearly 100 TB of content. As we have reported previously, and as our ongoing research continues to confirm, a legal-specific model outperforms generalized LLMs on numerous tasks. Our instruction-aligned Kelvin Legal Language Model is coming soon and will take its place in the mix of models that make up the K-LLM paradigm.

So 2024 here we come! 🚀


Jessica Mefford Katz, PhD

Jessica is a Co-Founding Partner and a Vice President at 273 Ventures.

Jessica holds a Ph.D. in Analytic Philosophy and applies the formal logic and rigorous frameworks of the field to technology and data science. She is passionate about assisting teams and individuals in leveraging data and technology to more accurately inform decision-making.

Would you like to learn more about the AI-enabled future of legal work? Send your questions to Jessica by email or LinkedIn.
