Thinking about Latent Space
I think one of the most powerful ways of thinking about what an LLM is actually doing is this concept of a "Latent Space".
What is a latent space? Here's a description from ChatGPT (which, yes, checks out with what I've learned):
Latent space refers to an abstract space that represents a compressed or hidden representation of complex data, often high-dimensional data such as images, videos, or sound. In machine learning, the term is frequently used in the context of generative models, such as autoencoders and variational autoencoders, where a latent space is learned by mapping the high-dimensional input data to a lower-dimensional latent space.
In this lower-dimensional latent space, the data is represented in a more compact and meaningful way, allowing for easier manipulation, exploration, and generation of new data. The latent space is often designed to have desirable properties, such as continuity and smoothness, that make it possible to interpolate between different points in the space, and generate novel data that follows the underlying distribution of the original data.
*Interestingly, this was one example where GPT-3.5 did better than GPT-4. The GPT-4 answer got bogged down in too many details and examples and didn't do a great job of explaining the core concept.
Describing an LLM using the concept of Latent Space
Conceptually, an LLM is doing a few things. Breaking it down using a transformer model (as I explored in https://www.fewshotlearning.co/p/trying-to-understand-transformer), the LLM does the following:
1. Transform a set of language tokens into a token-based "embedding" or latent space of word concepts.
2. Use this word-based representation plus a positional vector as input to the encoder side of our transformer, which transforms the chunk of token-based embeddings into a location in a different (language-concept-based? phrase-based?) latent space. I'll tentatively describe the vectors that describe a position in this latent space as "concept vectors".
3. The decoder then essentially applies a mapping function, trained over a large amount of data, to move from the position in latent space described by the current concept vector to the next most likely position in latent space, as described by our current position plus a single token. This mapping function is part of what has been trained over all of the millions of documents of training data.
4. The token is output, and the decoder is run again with it and all previous output tokens added to the input to generate the next token. Steps 3 and 4 are applied iteratively until you reach a stop point.
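To make that loop concrete, here's a deliberately tiny Python sketch of steps 1-4. Random matrices stand in for everything that would actually be learned (the embedding, the encoder, and the mapping function), so the output is gibberish; the point is only the shape of the loop.

```python
# Toy sketch of the embed -> encode -> decode loop described above.
# Not a real transformer: random matrices replace the trained weights.
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["<stop>", "the", "cat", "sat", "on", "mat"]
D = 8                                        # tiny latent dimension for illustration

embed = rng.normal(size=(len(VOCAB), D))     # step 1: token -> embedding
W_enc = rng.normal(size=(D, D))              # step 2: embeddings -> "concept vector"
W_dec = rng.normal(size=(D, len(VOCAB)))     # steps 3-4: concept vector -> next-token scores

def positional(i: int) -> np.ndarray:
    # Stand-in positional vector for position i.
    return np.sin(np.arange(D) + i)

def generate(prompt_tokens: list[str], max_steps: int = 5) -> list[str]:
    tokens = list(prompt_tokens)
    for _ in range(max_steps):
        # Steps 1-2: embed the whole sequence into one position in latent space.
        vecs = [embed[VOCAB.index(t)] + positional(i) for i, t in enumerate(tokens)]
        concept_vector = np.tanh(np.mean(vecs, axis=0) @ W_enc)
        # Step 3: mapping function from the current position to next-token scores.
        scores = concept_vector @ W_dec
        next_token = VOCAB[int(np.argmax(scores))]
        # Step 4: append the token and repeat until a stop point.
        tokens.append(next_token)
        if next_token == "<stop>":
            break
    return tokens

print(generate(["the", "cat"]))
```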
One way we might conceptualize what has happened as language models have grown is that they have both gotten a much more detailed "latent space" and a much more precise mapping function.
All of the exercises in prompt tuning are about setting up the right starting position in latent space. "Think step by step, cite your sources" moves us to a part of the latent space where the output language includes breaking things down into steps and adding source links.
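One way to build intuition for "starting position" is to watch how a prompt prefix moves a query around in an off-the-shelf embedding space. This is an analogy rather than the LLM's internal latent space, and the sketch below assumes the sentence-transformers package and the all-MiniLM-L6-v2 model; whether the numbers line up with the intuition depends on the model and the texts.

```python
# Sketch: compare where a bare question and a prefixed question land in an
# embedding space, relative to two reference answers in different "regions".
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

question = "Why did the Roman Empire fall?"
prefixed = "Think step by step and cite your sources. " + question

step_by_step_answer = "Step 1: economic decline. Step 2: military overreach. (Source: Gibbon)"
one_liner_answer = "Because it got too big."

for name, query in [("bare", question), ("prefixed", prefixed)]:
    q = model.encode(query, convert_to_tensor=True)
    for ref_name, ref in [("step-by-step", step_by_step_answer), ("one-liner", one_liner_answer)]:
        sim = util.cos_sim(q, model.encode(ref, convert_to_tensor=True)).item()
        print(f"{name:8s} vs {ref_name:12s}: cosine similarity {sim:.3f}")
```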

Implications for what LLMs are and aren’t good at
The latent space encodes both linguistic patterns and knowledge, captured by the training data. This is what allows an LLM like GPT-4 to not only handle language tasks, but to share and explore knowledge about the world.
However, understanding the core model is key. The LLM is mapping over a space that is derived purely from language. When we see LLMs reproducing what we might describe as higher-order reasoning, they're not doing it the same way we might. We use language as an interface to other types of mental models, but for LLMs, language is all there is.
It's hard to reason about what this means in the abstract, so let's look at an example.
An example using logical reasoning
One of the interesting emergent properties of the latest LLMs is what appears to be logical reasoning. Especially when we use something like chain-of-thought prompting, the LLM appears to be able to reason logically.
I don't believe that there is an embodied concept of reasoning, but rather that there is an area of latent space along some dimensions of the concept vector that captures the parts of the training data that look like logical reasoning. If you are able to move the model into that part of the space, it will reproduce what looks like logical reasoning.
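As a concrete illustration of nudging the model into that part of the space, here's roughly what a chain-of-thought prompt might look like using the OpenAI Python SDK. The model name, the puzzle, and the exact wording of the nudge are my assumptions, not a recipe.

```python
# Sketch: the same question asked directly and with a chain-of-thought nudge.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

puzzle = (
    "Alice is taller than Bob. Bob is taller than Carol. "
    "Is Carol taller than Alice? Answer yes or no."
)

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

direct = ask(puzzle)
chain_of_thought = ask(puzzle + "\n\nLet's think step by step before answering.")

print("direct:", direct)
print("chain of thought:", chain_of_thought)
```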
The fidelity of that logical reasoning is still somewhat suspect. Sometimes it works, sometimes it doesn't.
GPT-4 is way better than prior models at it, but still gets things wrong pretty frequently.
And I think that is related to this difference in what is happening. The LLM doesn't have an underlying representation of a logical model or of logical reasoning that it can somehow "check its work" against. Instead, it has a high-dimensional vector space with some number of dimensions that represent "logical-like" arguments, and a mapping function that attempts to reproduce those.
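To see what an external "check its work" step could look like (the thing the LLM doesn't have internally), here's a toy logical model: a brute-force truth-table check of whether a conclusion actually follows from some premises. The propositions are made-up examples.

```python
# A tiny external logical model: verify entailment by enumerating assignments.
from itertools import product

def entails(premises, conclusion, variables):
    """True iff the conclusion holds in every assignment where all premises hold."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False
    return True

# "If it rains, the ground is wet. It rains."  |=  "The ground is wet."
premises = [lambda e: (not e["rain"]) or e["wet"], lambda e: e["rain"]]
print(entails(premises, lambda e: e["wet"], ["rain", "wet"]))    # True

# The converse ("The ground is wet, therefore it rains") does not follow.
premises2 = [lambda e: (not e["rain"]) or e["wet"], lambda e: e["wet"]]
print(entails(premises2, lambda e: e["rain"], ["rain", "wet"]))  # False
```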
Implications for LLMs as general-purpose agents
I think this comes to a core limitation of LLMs as "general purpose" agents.
Humans use language as an interface to attempt to communicate about other underlying models. Using the logic example, there is a relatively straightforward model for how logic works, one that can be described as a simple set of rules. When we talk about logical problems, we are mapping between particular situations (described using language) and that underlying logical model.
LLMs have no such underlying logical model. They have a linguistic model. Trained on very large numbers of logical examples, an LLM can get to pretty high fidelity at reproducing what looks like logic, but that representation is inefficient relative to our simpler rule-based model, and it will tend to fail in places where small linguistic changes imply large logical changes.
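A small sketch of what "small linguistic changes imply large logical changes" means in practice: two sentences that are nearly identical as strings but are logical opposites.

```python
# Surface similarity vs. logical content: one word flips the meaning entirely.
from difflib import SequenceMatcher

a = "All of the students passed the exam."
b = "Not all of the students passed the exam."

similarity = SequenceMatcher(None, a, b).ratio()
print(f"surface similarity: {similarity:.2f}")  # high; the strings barely differ
# Logically, b is the negation of a: as propositions they can never
# both be true, no matter how similar they look on the page.
```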
Does this imply LLMs cannot derive these underlying models?
I'm not sure that I'm willing to go that far. There is definitely research showing the abilities of these models to represent underlying rules and state when properly trained.
However, it seems pretty clear that the current generation of LLMs has not managed to derive an underlying logical model, or these models wouldn't fall victim to the kinds of mistakes they make now. And OpenAI's Sam Altman seems to be indicating that we've reached the end of gains to be made simply by scaling up models.
It's possible that we'll be able to train multi-modal models that address this by training on many different types of data. Apparently GPT-4 can do this to some extent with images and text (but that capability has not been released to the public generally, leading me to believe it has a lot of edge cases and issues and needs to be pretty carefully constrained).
Instead, I think this points to a future that looks much more like the latter half of that article: multiple models wrapped up inside applications, where large numbers of domain-specific models are integrated together with some sort of interface layer.
In other words, a lot like our existing AI world. Except that what LLMs do provide is an extremely powerful interface layer, where we can ask for what we want using natural language, and the LLM can interpret that natural language to figure out which model is likely to be best at answering our question.
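Here's a minimal sketch of that interface layer, assuming the OpenAI Python SDK: the LLM only classifies the request, and hypothetical domain-specific handlers would do the real work.

```python
# Sketch of an LLM as a natural-language router in front of domain models.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical domain handlers; in a real system these would be
# domain-specific models or APIs, not stubs.
DOMAINS = {
    "math":    lambda q: "(hand off to a calculator or symbolic engine)",
    "weather": lambda q: "(call a weather API)",
    "general": lambda q: "(fall back to the LLM itself)",
}

def route(question: str) -> str:
    # The LLM's job here is only interpretation: pick the best-suited domain.
    prompt = (
        "Classify the user's request as exactly one of: "
        + ", ".join(DOMAINS)
        + ". Respond with only the label.\n\nRequest: " + question
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    label = response.choices[0].message.content.strip().lower()
    handler = DOMAINS.get(label, DOMAINS["general"])
    return f"[{label}] {handler(question)}"

print(route("What's the weather in Boston tomorrow?"))
```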
Looking towards the future
I'm using this mental framework for a few different things.
First, to try to better understand what LLMs are, and more importantly are not, going to be good at on their own. I'll flesh this out in future posts, but a broad way of thinking about it is this: the better a domain is modeled by language, the better an LLM will do at it. And the more that small linguistic changes mean big domain changes, the worse an LLM will do.
A quick example is asking for statistics: when asking an LLM about the world (say, what percentage of people have anxiety disorders), the difference between 21% and 42% is extremely small linguistically, but it makes a massive difference in our model of the world.
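The same point in code: textually the two answers are almost indistinguishable, while the claim itself doubles.

```python
# Textual distance vs. the size of the claim being made.
from difflib import SequenceMatcher

a = "About 21% of people have an anxiety disorder."
b = "About 42% of people have an anxiety disorder."

# The strings are almost identical...
print(f"textual similarity: {SequenceMatcher(None, a, b).ratio():.2f}")
# ...but one claim describes twice as many people as the other.
print(f"relative difference in the statistic: {abs(42 - 21) / 21:.0%}")
```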
Second, to think about how to integrate LLMs into applications. I'm looking closely at the things I do using text, and trying to figure out which applications can take advantage of an LLM's strong understanding of text to make my processes better.
Third, to try to understand what our risks are of some sort of "True AGI", with either utopian or dystopian outcomes. There are a lot of very smart people who are concerned here, but based on what we've seen I think the LLM advancements are not a massive acceleration in the danger curve.
They have provided a step-function increase in our ability to parse and do things with natural language, which is an extremely powerful general-purpose technology, and there is now a tremendous rush of people figuring out new ways to apply it. That creates massive excitement, pulls lots of new people and money into AI, and probably does increase the likelihood and speed at which we'll arrive at something that looks like AGI.
But I see no evidence that LLMs themselves have put us near to that, and the projections of "AGI in the next 5 years" are IMO pure hyperbole.