Introduction to AI Language Models
Most large language models break text down into thousands of small units called tokens, which turn the text into representations the model can process. These tokens quickly become expensive to store and compute with as a conversation grows longer. When a user chats with an AI over a long session, this cost can cause the model to forget things it has been told and muddle information, a problem some call “context rot.”
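To make this concrete, here is a minimal Python sketch of tokenization, using OpenAI’s open-source tiktoken library as an illustrative stand-in (DeepSeek and other model makers use their own tokenizers):

```python
# A minimal tokenization sketch using tiktoken as a stand-in tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a widely used BPE vocabulary

text = "Most large language models break text into tokens."
tokens = enc.encode(text)

print(tokens)              # the integer IDs the model actually sees
print(enc.decode(tokens))  # round-trips back to the original text
print(f"{len(text)} characters -> {len(tokens)} tokens")
```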
The Problem with Current Methods
Storing information as text tokens can make AI models inefficient. As a conversation grows, the number of tokens needed to hold it grows with it, making the accumulated information increasingly difficult for the model to process and retain.
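The arithmetic below is purely illustrative, with an assumed tokens-per-turn figure that is not from the paper, but it shows why long chats get expensive: the context grows with every exchange, and standard self-attention compares every token against every other token:

```python
# Illustrative arithmetic only: the numbers are assumptions, not
# measurements of any particular model.
TOKENS_PER_TURN = 150  # assumed average size of one user/AI exchange

for turns in (10, 100, 1_000):
    context = turns * TOKENS_PER_TURN
    # Self-attention compares every token in the context with every
    # other token, so compute grows roughly with the square of length.
    attention_ops = context ** 2
    print(f"{turns:>5} turns -> {context:>7} context tokens, "
          f"~{attention_ops:.1e} attention comparisons")
```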
New Methods Developed by DeepSeek
The new methods developed by DeepSeek could help overcome this issue. Instead of storing words as text tokens, its system packs written information into image form, almost as if it were taking a picture of pages from a book. This allows the model to retain nearly the same information while using far fewer tokens, the researchers found.
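A hypothetical sketch of the idea, using the Pillow imaging library: text is rendered onto a page image, and a vision encoder would then slice that page into fixed-size patches, producing one “visual token” per patch. The page size, patch size, and rendering details here are assumptions for illustration, not DeepSeek’s actual pipeline:

```python
# Hypothetical sketch: render text as an image, then count the patch
# "visual tokens" a vision encoder would produce from it.
from PIL import Image, ImageDraw

PAGE_W, PAGE_H, PATCH = 1024, 1024, 16  # assumed page and patch sizes

page = Image.new("RGB", (PAGE_W, PAGE_H), "white")
draw = ImageDraw.Draw(page)
draw.text((20, 20), "Any amount of page text gets drawn here...", fill="black")

# A vision encoder would slice the page into PATCH x PATCH squares and
# emit one embedding (a "visual token") per square.
visual_tokens = (PAGE_W // PATCH) * (PAGE_H // PATCH)
print(f"One page -> {visual_tokens} visual tokens, "
      f"no matter how many words are drawn on it")
```

The key property is that the visual-token count depends on the page dimensions, not on how much text the page holds, which is where the compression comes from.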
How it Works
Essentially, DeepSeek’s OCR model serves as a test bed for these new methods, which allow more information to be packed into AI models more efficiently. Besides using visual tokens instead of only text tokens, the model is built on a form of tiered compression not unlike the way human memories fade: older or less critical content is stored in a slightly blurrier form to save space.
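The sketch below illustrates one way such tiering could work, again using Pillow; the halving schedule and patch size are invented for illustration and are not taken from DeepSeek’s paper:

```python
# A hedged sketch of tiered, memory-like compression: older pages are
# stored at progressively lower resolution, so they cost fewer visual
# tokens. The halving schedule and patch size are invented here.
from PIL import Image

PATCH = 16  # assumed patch size of the vision encoder

def tokens_for(image: Image.Image) -> int:
    """Visual-token count if the image is cut into PATCH x PATCH squares."""
    return (image.width // PATCH) * (image.height // PATCH)

page = Image.new("RGB", (1024, 1024), "white")  # a freshly stored page

# Each tier halves the resolution: recent content stays sharp, while
# older content gets blurrier but cheaper, like a fading memory.
for age_tier in range(4):
    scale = 2 ** age_tier
    blurry = page.resize((page.width // scale, page.height // scale))
    print(f"tier {age_tier}: {blurry.size} -> {tokens_for(blurry)} tokens")
```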
Reaction from the Research Community
Text tokens have long been the default building block in AI systems. Using visual tokens instead is unconventional, and as a result, DeepSeek’s model is quickly capturing researchers’ attention. Andrej Karpathy, the former Tesla AI chief and a founding member of OpenAI, praised the paper, saying that images may ultimately be better than text as inputs for LLMs. Manling Li, an assistant professor of computer science at Northwestern University, says the paper offers a new framework for addressing the existing challenges in AI memory.
Conclusion
The new methods developed by DeepSeek have the potential to change the way AI models process and retain information. By using visual tokens and tiered compression, models can hold more information in fewer tokens, cutting the cost of long contexts. That efficiency could lead to significant improvements in AI technology, enabling AI models to have longer and more coherent conversations with users.
FAQs
Q: What is the current problem with AI language models?
A: The current problem with AI language models is that they break down text into thousands of tiny units called tokens, which can become expensive to store and compute with as conversations grow longer.
Q: How does DeepSeek’s new method work?
A: DeepSeek’s new method packs written information into image form, using visual tokens instead of text tokens, and employs tiered compression to store older or less critical content in a slightly more blurry form.
Q: What do researchers think of DeepSeek’s new method?
A: Researchers, including Andrej Karpathy and Manling Li, have praised DeepSeek’s new method, saying it offers a new framework for addressing existing challenges in AI memory and could potentially lead to significant improvements in AI technology.
Q: What are the potential benefits of DeepSeek’s new method?
A: The potential benefits of DeepSeek’s new method include more efficient and effective AI models that can store more information while using fewer tokens, enabling longer and more meaningful conversations with users.