
TTT models might be the next frontier in generative AI | TechCrunch

Published July 17, 2024

Admin


After years of dominance by the form of AI known as the transformer, the hunt is on for new architectures.

Transformers underpin OpenAI’s video-generating model Sora, and they’re at the heart of text-generating models like Anthropic’s Claude, Google’s Gemini and GPT-4o. But they’re beginning to run up against technical roadblocks — in particular, computation-related roadblocks.

Transformers aren’t especially efficient at processing and analyzing vast amounts of data, at least when running on off-the-shelf hardware. That is leading to steep and perhaps unsustainable increases in power demand as companies build and expand infrastructure to accommodate transformers’ requirements.

A promising architecture proposed this month is test-time training (TTT), which was developed over the course of a year and a half by researchers at Stanford, UC San Diego, UC Berkeley and Meta. The research team claims that TTT models can not only process far more data than transformers, but that they can do so without consuming nearly as much compute power.

The hidden state in transformers

A fundamental component of transformers is the “hidden state,” which is essentially a long list of data (technically, a cache of key and value vectors for every token processed so far). As a transformer processes something, it adds entries to the hidden state to “remember” what it just processed. For instance, if the model is working its way through a book, the hidden state values will be things like representations of words (or parts of words).

“If you think of a transformer as an intelligent entity, then the lookup table — its hidden state — is the transformer’s brain,” Yu Sun, a post-doc at Stanford and a co-contributor on the TTT research, told TechCrunch. “This specialized brain enables the well-known capabilities of transformers such as in-context learning.”

The hidden state is part of what makes transformers so powerful. But it also hobbles them. To “say” even a single word about a book a transformer just read, the model would have to scan through its entire lookup table — a task as computationally demanding as rereading the whole book.
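To make that cost concrete, here is a toy Python sketch, assuming single-head softmax attention and, for brevity, using each token vector directly as its own key and value (real transformers use learned projections and many heads). The function names `attend` and `run_with_kv_cache` are hypothetical. The growing `keys`/`values` lists play the role of the lookup table, and the counter tallies how many dot products are spent scanning it.

```python
import math


def attend(query, keys, values):
    """Single-head softmax attention of one query over every cached key/value."""
    scale = math.sqrt(len(query))
    # One dot product per cached key, so the work grows with len(keys).
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def run_with_kv_cache(tokens):
    """Feed tokens one at a time; return (cache length, total dot products)."""
    keys, values = [], []  # the "hidden state": one entry per token seen
    dot_products = 0
    for x in tokens:
        # Assumption for brevity: each token vector is its own key and value.
        keys.append(x)
        values.append(x)
        attend(x, keys, values)  # producing output means scanning the whole cache
        dot_products += len(keys)
    return len(keys), dot_products


cache_len, work = run_with_kv_cache([[1.0, 0.5]] * 1000)
print(cache_len, work)  # 1000 500500
```

Doubling the input to 2,000 tokens doubles the cache but roughly quadruples the total work (2,001,000 dot products), which is the scaling problem described above.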

So Sun and team had the idea of replacing the hidden state with a machine learning model: nested dolls of AI, if you will, or a model within a model.

It’s a bit technical, but the gist is that the TTT model’s internal machine learning model, unlike a transformer’s lookup table, doesn’t grow and grow as it processes additional data. Instead, it encodes the data it processes into representative variables called weights, updating them by training on the data as it arrives, which is where the name “test-time training” comes from. No matter how much data a TTT model processes, the size of its internal model won’t change, and that fixed size is what makes TTT models so efficient.
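Loosely modeled on the simplest variant described in the research (a linear inner model whose weights get one gradient-descent step per token), here is a toy Python sketch. The reconstruction loss, zero initialization, learning rate and the names `ToyTTTLayer`, `step` and `state_size` are hypothetical simplifications; the actual layers apply learned projections to the input and add other refinements.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


class ToyTTTLayer:
    """Hidden state = a fixed dim x dim weight matrix, trained on each incoming token."""

    def __init__(self, dim, lr=0.1):
        self.W = [[0.0] * dim for _ in range(dim)]
        self.lr = lr  # inner-loop learning rate (illustrative value)

    def step(self, x):
        # Assumed self-supervised task: reconstruct x, loss = ||W x - x||^2.
        err = [p - t for p, t in zip(matvec(self.W, x), x)]
        # The gradient of that loss w.r.t. W is 2 * err * x^T: take one step.
        for i, e in enumerate(err):
            for j, xj in enumerate(x):
                self.W[i][j] -= self.lr * 2.0 * e * xj
        # The layer's output is read from the just-updated weights.
        return matvec(self.W, x)

    def state_size(self):
        return sum(len(row) for row in self.W)


layer = ToyTTTLayer(dim=3)
for _ in range(500):
    out = layer.step([1.0, 0.0, 0.5])
print(layer.state_size())  # 9, however many tokens have been processed
```

Unlike a transformer’s lookup table, the memory here never grows: the stream could be ten tokens or ten million and `state_size()` would still return 9. The trade-off is that everything the model has seen must be compressed into those weights.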

Sun believes that future TTT models could efficiently process billions of pieces of data, from words to images to audio recordings to videos. That’s far beyond the capabilities of today’s models.

“Our system can say X words about a book without the computational complexity of rereading the book X times,” Sun said. “Large video models based on transformers, such as Sora, can only process 10 seconds of video, because they only have a lookup table ‘brain.’ Our eventual goal is to develop a system that can process a long video resembling the visual experience of a human life.”

Skepticism around the TTT models

So will TTT models eventually supersede transformers? They could. But it’s too early to say for certain.

TTT models aren’t a drop-in replacement for transformers. And the researchers developed only two small models for study, which makes it difficult, for now, to compare TTT as a method with some of the larger transformer implementations out there.

“I think it’s a perfectly interesting innovation, and if the data backs up the claims that it provides efficiency gains then that’s great news, but I couldn’t tell you if it’s better than existing architectures or not,” said Mike Cook, a senior lecturer in King’s College London’s department of informatics who wasn’t involved with the TTT research. “An old professor of mine used to tell a joke when I was an undergrad: How do you solve any problem in computer science? Add another layer of abstraction. Adding a neural network inside a neural network definitely reminds me of that.”

Regardless, the accelerating pace of research into transformer alternatives points to growing recognition of the need for a breakthrough.

This week, AI startup Mistral released a model, Codestral Mamba, that’s based on another alternative to the transformer called state space models (SSMs). SSMs, like TTT models, appear to be more computationally efficient than transformers and can scale up to larger amounts of data.
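For comparison, an SSM also carries a fixed-size state forward from one input to the next. The Python sketch below shows the simplest discrete, linear, single-channel form of that recurrence; it is an illustrative assumption, not Mistral’s or Mamba’s implementation. Mamba and Mamba-2 use vector states and make the parameters depend on the input (so-called selective SSMs), which this sketch omits, and the name `ssm_scan` is hypothetical.

```python
def ssm_scan(xs, a, b, c):
    """Scalar linear state space model: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h = 0.0  # the whole state is one number, regardless of len(xs)
    ys = []
    for x in xs:
        h = a * h + b * x  # fold the new input into the fixed-size state
        ys.append(c * h)
    return ys


print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))  # [2.0, 1.0, 0.5]
```

Each step costs the same amount of work no matter how long the sequence already is, which is the source of the efficiency claims made for both SSMs and TTT models.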

AI21 Labs is also exploring SSMs. So is Cartesia, which pioneered some of the first SSMs and Codestral Mamba’s namesakes, Mamba and Mamba-2.

Should these efforts succeed, it could make generative AI even more accessible and widespread than it is now — for better or worse.
