Shifting from models to applications
I’ve neglected publishing here for the last year or so, but during that time my machine learning focus has shifted from understanding models themselves — large language models in particular — toward how to build useful applications using those models.
This distinction is deliberately blurry in a lot of the marketing about AI. People talk about ChatGPT, Claude, or Gemini as if each is a single model. But really the model is a component in a larger architecture - perhaps the central component, but a component nonetheless.
Some of the other elements that go into an LLM-based application include:
Some sort of structured user interface and organization for interaction. In the chat-focused applications above, this usually takes the form of a thread of messages.
A set of (sometimes interacting) prompts and pipelines that route information between steps. In a YouTube video about Gemini reasoning about user intent, they break down how responding to a single user interaction involves an entire workflow with several decision steps. Each step is either an algorithmic process or an interaction with a model, producing structured or unstructured information that feeds into the next step.
Often some sort of structured/known data source that can be searched and fed in as context.
Often some sort of conceptual “memory” and way to keep track of what has already happened.
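The multi-step workflow described above can be sketched in a few lines. This is purely illustrative: the model call is stubbed out, and the function names (classify_intent, handle_message) are my own hypothetical choices, not part of any real product's API.

```python
def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned answer."""
    if "Classify" in prompt:
        return "search" if "latest" in prompt else "chat"
    return f"Response to: {prompt}"

def classify_intent(user_message: str) -> str:
    # Step 1: a model (or heuristic) decides which branch of the workflow to take.
    return fake_llm(f"Classify this message as 'search' or 'chat': {user_message}")

def handle_message(user_message: str) -> str:
    # Step 2: route to an algorithmic process or another model interaction,
    # passing the structured result of the previous step along.
    intent = classify_intent(user_message)
    if intent == "search":
        return fake_llm(f"Answer using search results: {user_message}")
    return fake_llm(user_message)
```

Even in this toy form, the shape is the same as in the Gemini example: each step produces information that the next step consumes.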
Emerging AI Architectures
Fleshing these out a little bit, there are two big architectural patterns that have emerged for building applications around LLMs: Retrieval Augmented Generation (RAG) and Agents.
A few points about each:
Retrieval Augmented Generation
Core architecture for layering proprietary/domain knowledge into a chat interaction.
Essential mental model:
Use query/prompt to have LLM generate a search
Load relevant documents from search
Use documents + initial prompt as context to LLM to generate response
Stereotypical example: “Chat with this project’s documentation”
There’s a great article on the langchain blog that goes deeper into how RAG works and the different elements of it.
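The three-step mental model above can be sketched as a toy pipeline. This is a minimal illustration, not a real implementation: the "retriever" is naive keyword scoring over an in-memory corpus (a real system would use a vector store or search index), and `llm` is a stub standing in for an actual model call.

```python
# Toy document corpus; a real RAG system would index far more content.
DOCS = {
    "install": "Run the install command to set up the package.",
    "config": "Configuration lives in a settings file at the project root.",
}

def llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"[model response given: {prompt[:60]}]"

def retrieve(query: str, k: int = 2) -> list[str]:
    # Step 2: load relevant documents (here, naive keyword-overlap scoring).
    scored = sorted(
        DOCS.values(),
        key=lambda d: -sum(w in d.lower() for w in query.lower().split()),
    )
    return scored[:k]

def rag_answer(question: str) -> str:
    # Step 1: use the question as the search query (a real system might
    # first ask the LLM to rewrite it into a better query).
    context = "\n".join(retrieve(question))
    # Step 3: documents + original question go back to the model as context.
    return llm(f"Context:\n{context}\n\nQuestion: {question}")
```

Because the retrieved documents are known to the application (not just the model), each one can be cited in the response — which matters for the traceability point below.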
Some of the benefits of using RAG:
It’s a relatively straightforward way to create a unique and valuable chat/textual interaction
Can work “globally” (e.g. All customers search these documents) or “locally” (e.g. we search content from your specific documents)
It has “better” factuality / reduced hallucinations
There’s a reduced dependency (relative to fine tuning or other mechanisms to customize) on training data
Can link to / directly reference source material.
That last one is key for building traceability and verifiability into an AI application. The model has no concept of “truth” and (at least in today’s generation of models) no way to trace any particular piece of generated text back to source material. Connecting to the original source outside the LLM is the only way I have found to create true traceability and the ability to verify factuality.
Generative Agents
Architecture for creating systems that evolve over time.
Core components
Memory stream
Reflection/summarization
Planning
Stereotypical example: AI-based game characters that learn and evolve
One of the classic examples of using a generative agent is in this academic research paper that explores an entire game filled with characters that learn, evolve, and interact.
Variations on agentic approaches are being attempted all over, but most of the well-published examples (e.g. Mini-AGI) demo well but break down quickly in production.
Some of the benefits of Generative Agents are that they
Can learn and improve behavior over time (even without model changes)
Maintain history of previous interactions
The Application Layer is the frontier for LLMs
At this point the power of the big LLM foundation models is very well established, but what feels much more unproven is how to build actually useful and valuable applications with them.
The most successful examples so far have been coding assistants (which follow a variation on RAG — super interesting conversation around this in this podcast) and summarization assistants for meetings.
Outside of those, there have been a few niche consumer successes like ChatPDF or PhotoAI but much of the world still feels like it is grappling with how to use LLMs effectively, with as many epic failures as successes.
There’s a lot to learn and figure out here… if you’re reading this and curious, one last thing you might be interested in is joining the AI in Action discussion group that happens weekly, organized as part of Latent Space. You can find that (and other events like ML-focused paper clubs) at the Latent Space Luma.