The debate about how far scaling can take neural networks like Large Language Models continues, with voices on both sides. Some believe that LLMs have hit a wall, while others speculate about step-function leaps toward Artificial General Intelligence. Today’s LLMs – the foundational technology behind Generative AI – are plagued by scalability issues, high energy demands, and limitations in their ability to generalize, extrapolate, and reason. Enter the Titans!
Last month, a Cornell University PhD student working with a team of scientists in the Algorithms and Optimization group at Google Research proposed a novel family of neural network architectures (Titans) designed to address these issues. Inspired by how human memory works, they introduced a new neural long-term memory module that learns and memorizes historical context at test time. This module complements attention mechanisms (the core of Transformers, which function as short-term memory because of their limited context window) with the ability to retain information from the distant past.
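For the technically inclined, here is a minimal sketch, in PyTorch, of the general idea as I understand it: a small network acts as the long-term memory, its weights are nudged by gradient steps on an associative-recall loss while the model is running, and its output is combined with ordinary short-window attention. All names, shapes, and hyperparameters below are my own illustrative choices, not the authors' implementation, and the naive fusion at the end stands in for the paper's more careful memory-as-context/gate/layer variants.

```python
# Illustrative sketch only: a toy "long-term memory" updated at test time,
# combined with short-window attention. Details are simplified and hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyLongTermMemory(nn.Module):
    """A small MLP that maps keys to values; its weights act as the memory."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))

    def forward(self, k):
        return self.net(k)


def test_time_update(memory, k, v, lr=1e-2, decay=0.05):
    """One gradient step on an associative-recall loss ||M(k) - v||^2,
    plus a simple weight-decay 'forgetting' term (hyperparameters are made up)."""
    loss = F.mse_loss(memory(k), v)
    params = list(memory.parameters())
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.mul_(1.0 - decay).sub_(lr * g)  # forget a little, then learn the new association


@torch.no_grad()
def short_window_attention(x, window=64):
    """Plain scaled dot-product attention restricted to the last `window` tokens."""
    ctx = x[:, -window:, :]
    return F.scaled_dot_product_attention(x, ctx, ctx)


def titans_like_step(memory, x):
    """Combine short-term attention with retrieval from the test-time-updated memory.
    This mirrors the paper's idea only at a cartoon level."""
    b, t, d = x.shape
    short = short_window_attention(x)                     # short-term memory (limited window)
    k = v = x[:, -1, :]                                   # toy key/value: the newest token
    test_time_update(memory, k, v)                        # memorize the new context on the fly
    long = memory(x.reshape(b * t, d)).reshape(b, t, d)   # retrieve from long-term memory
    return short + long                                   # naive fusion of the two streams


if __name__ == "__main__":
    torch.manual_seed(0)
    mem = ToyLongTermMemory(dim=32)
    stream = torch.randn(1, 256, 32)                      # a fake token stream
    out = titans_like_step(mem, stream)
    print(out.shape)                                      # torch.Size([1, 256, 32])
```

Again, treat this as a back-of-the-envelope illustration of "learning to memorize at test time", not as the architecture described in the paper.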
One possible immediate application, and a pet peeve of mine, is extending the context window of chatbot interfaces like ChatGPT or Google Gemini. This is the “amount of stuff” you can load into a conversation before being alerted that you have exceeded a limit. But this is only the beginning, and if the researchers are right, Titans could transform (sorry for the pun!) both the way we train models and the way they reason.
Their paper is technical, as it should be, so I turned to AI for a first summary. NotebookLM (Google) generates conversational summaries in layman's terms before you decide to dive into the topic. Here it is:
The proof is in the pudding, and the authors’ experimental results show that Titans outperform existing models, scaling better to longer sequences and delivering improved results on language modeling tasks. It is exciting to watch the race between research labs to see who will come up with the next breakthrough.
PS: I have asked Grok to illustrate this blog entry (as you can see from the watermark on the bottom right of the picture).
PS2: I find it interesting that NotebookLM took some license, expanding on the discussion about privacy and societal impacts (6 min 15 sec mark in the recording) and using examples like learning Kung Fu skills (7 min 5 sec mark). Privacy is only briefly mentioned in the paper, and there are no references to The Matrix, the movie.