Trying to understand Transformer Models
The fundamental breakthrough that appears to have led to the current “Cambrian Explosion” around language models was the invention of the Transformer architecture. If I’m understanding it properly, this new way of arranging neural networks dramatically simplified how we represent the contextual information about how a word fits into a sentence: that context gets encoded into vectors that can be passed along through the network. This in turn lets these models take what was once a serial, word-by-word process and parallelize it, processing many tokens at once.
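To make that concrete for myself, here’s a rough sketch of the scaled dot-product self-attention at the heart of the Transformer. This is my own toy example in Python with NumPy, not anyone’s production code, and the shapes and weight matrices are made up for illustration. The thing to notice is that every token’s context comes out of a few matrix multiplications over the whole sequence at once, rather than a step-by-step loop like a recurrent network would need.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Project every token embedding into query, key, and value vectors.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Scores: how relevant each token is to every other token,
    # scaled by sqrt(d) to keep the softmax well-behaved.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax over each row so the attention weights sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted mix of all the value vectors.
    # The whole sequence is handled in one shot -- no recurrence.
    return weights @ V

# Hypothetical tiny example: 4 tokens, embedding dimension 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```

As I understand it, that last matrix multiply is the whole trick: the per-token context vectors for the entire sequence fall out of operations a GPU can run in parallel, which is what made training on huge corpora practical.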