Claude 4 AI Model Refactored Code in 7-Hour Coding Marathon

Introduction to Anthropic’s New Models

On Thursday, Anthropic released Claude Opus 4 and Claude Sonnet 4, marking the company’s return to larger model releases after primarily focusing on mid-range Sonnet variants since June of last year. The new models represent what the company calls its most capable coding models yet, with Opus 4 designed to work autonomously for hours on complex, long-running tasks.

The Demand for Agentic AI Applications

Alex Albert, Anthropic’s head of Claude Relations, told Ars Technica that the company chose to revive the Opus line because of growing demand for agentic AI applications. "Across all the companies out there that are building things, there’s a really large wave of these agentic applications springing up, and a very high demand and premium being placed on intelligence," Albert said. "I think Opus is going to fit that groove perfectly."

Understanding Claude’s AI Model Sizes

Before we go further, a brief refresher on Claude’s three AI model "size" names (introduced in March 2024) is probably warranted. Haiku, Sonnet, and Opus offer a tradeoff between price (in the API), speed, and capability. Haiku models are the smallest, least expensive to run, and least capable in terms of what you might call "context depth" (considering conceptual relationships in the prompt) and encoded knowledge.

Haiku, Sonnet, and Opus Models

Haiku models retain fewer concrete facts and thus tend to confabulate (give plausible-sounding answers where they lack data) more frequently than larger models, but they are much faster at basic tasks. Sonnet is traditionally a mid-range model that strikes a balance between cost and capability, and Opus models have always been the largest and slowest to run. However, Opus models process context more deeply and are hypothetically better suited for running deep logical tasks.

New Model Capabilities

There is no Claude 4 Haiku just yet, but the new Sonnet and Opus models can reportedly handle tasks that previous versions could not. In our interview with Albert, he described testing scenarios in which Opus 4 worked coherently for up to 24 hours on tasks like playing Pokémon, while coding refactoring tasks in Claude Code ran for seven hours without interruption. Earlier Claude models typically lasted only one to two hours before losing coherence, Albert said, meaning that they could produce useful self-referencing outputs for only that long before they began to generate too many errors.

Conclusion

The release of Claude Opus 4 and Claude Sonnet 4 returns Anthropic to larger model releases at a time when demand for agentic AI applications is growing. With their reported gains in long-running coherence, the new models are aimed at developers building those applications.

FAQs

Q: What are the main differences between Haiku, Sonnet, and Opus models?
A: Haiku models are the smallest and least expensive, but also the least capable. Sonnet models offer a balance between cost and capability, while Opus models are the largest and most capable, but also the slowest to run.
Q: What is the significance of the new Opus 4 model?
A: Opus 4 is designed to work autonomously for hours on complex, long-running tasks, making it well suited for agentic AI applications.
Q: How do the new Sonnet and Opus models compare to earlier versions?
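A: The new models can reportedly handle tasks that previous versions could not. According to Anthropic's Alex Albert, Opus 4 worked coherently for up to 24 hours on tasks like playing Pokémon, and coding refactoring tasks in Claude Code ran for seven hours without interruption, whereas earlier Claude models typically lost coherence after one to two hours.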