Moonshot AI Outperforms GPT-5 and Claude at a Fraction of the Cost

Introduction to Moonshot AI

Moonshot AI, a Chinese startup, has upended expectations in artificial intelligence development: its Kimi K2 Thinking model surpassed OpenAI’s GPT-5 and Anthropic’s Claude Sonnet 4.5 on multiple performance benchmarks. The result has renewed debate about whether cost-efficient Chinese innovation is challenging America’s AI dominance.

Performance Metrics Challenge US Models

According to the company’s GitHub blog post, Kimi K2 Thinking scored 44.9% on Humanity’s Last Exam, a large language model benchmark consisting of 2,500 questions across a broad range of subjects, exceeding GPT-5’s 41.7%. The model also achieved 60.2% on the BrowseComp benchmark, which evaluates web browsing proficiency and information-seeking persistence of large language model agents, and scored 56.3% to lead in the Seal-0 benchmark designed to challenge search-augmented models on real-world research queries.
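To put the Humanity’s Last Exam percentages in concrete terms, the short Python sketch below converts them into approximate question counts. It assumes both models were graded on the full 2,500-question set, which the reported figures do not confirm; the names `TOTAL_QUESTIONS`, `reported_scores` and `approx_correct` are illustrative only.

```python
# Rough conversion of reported benchmark percentages into question counts.
# Assumption (not confirmed by the source): each model was graded on all
# 2,500 questions of Humanity's Last Exam.
TOTAL_QUESTIONS = 2500

# Scores as reported in the article, in percent.
reported_scores = {
    "Kimi K2 Thinking": 44.9,
    "GPT-5": 41.7,
}


def approx_correct(score_percent: float, total: int = TOTAL_QUESTIONS) -> float:
    """Return the approximate number of correct answers implied by a score."""
    return score_percent / 100.0 * total


counts = {name: approx_correct(score) for name, score in reported_scores.items()}
gap_in_questions = counts["Kimi K2 Thinking"] - counts["GPT-5"]

for name, count in counts.items():
    print(f"{name}: about {count:.1f} of {TOTAL_QUESTIONS} questions")
print(f"Gap: about {gap_in_questions:.1f} questions")
```

Under that assumption, the 3.2-percentage-point lead corresponds to about 80 questions.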

Cost Efficiency Raises Questions

The model’s popularity grew after CNBC reported that its training cost was only US$4.6 million, a figure Moonshot AI did not comment on. According to calculations by the South China Morning Post, Kimi K2 Thinking’s application programming interface (API) was six to 10 times cheaper to use than those of OpenAI’s and Anthropic’s models.
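The phrase “six to 10 times cheaper” can be read in more than one way. The Python sketch below assumes the most common reading, namely that the price is the reference price divided by the factor, and shows what that implies as a share of the reference price. The function names are hypothetical, and no actual API prices are used.

```python
def relative_price(times_cheaper: float) -> float:
    """Cheaper price as a fraction of the reference price.

    Assumes "N times cheaper" means reference_price / N.
    """
    if times_cheaper <= 0:
        raise ValueError("times_cheaper must be positive")
    return 1.0 / times_cheaper


def savings_percent(times_cheaper: float) -> float:
    """Percentage saved relative to the reference price."""
    return (1.0 - relative_price(times_cheaper)) * 100.0


# The two ends of the range cited in the article.
for factor in (6, 10):
    share = relative_price(factor) * 100.0
    print(f"{factor}x cheaper: {share:.1f}% of the reference price, "
          f"a saving of {savings_percent(factor):.1f}%")
```

On this reading, the cited range means paying roughly one-sixth to one-tenth of the reference price, or a saving of about 83% to 90%.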

Technical Capabilities and Limitations

Moonshot AI researchers said Kimi K2 Thinking set “new records across benchmarks that assess reasoning, coding and agent capabilities”. The model can execute up to 200-300 sequential tool calls without human intervention, reasoning coherently across hundreds of steps to solve complex problems. Independent testing by consultancy Artificial Analysis placed Kimi K2 Thinking at the top of its Tau-2 Bench Telecom agentic benchmark with 93% accuracy, which the firm described as the highest score it had independently measured.
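For readers unfamiliar with the term, “sequential tool calls” usually describes a loop in which a model picks a tool, reads the result, and decides what to do next until it is finished or a step budget runs out. The Python sketch below illustrates that general pattern only. It is not Moonshot AI’s implementation or API: `toy_policy` is a hard-coded stand-in for a model, the two tools are trivial arithmetic functions, and the default budget of 300 simply mirrors the upper figure quoted above.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool registry: each tool takes and returns an integer.
TOOLS: Dict[str, Callable[[int], int]] = {
    "increment": lambda x: x + 1,
    "double": lambda x: x * 2,
}

# An action is (tool name, argument), or None when the agent decides to stop.
Action = Optional[Tuple[str, int]]


def toy_policy(history: List[int], target: int) -> Action:
    """Stand-in for a model: choose the next tool call from earlier results."""
    current = history[-1]
    if current >= target:
        return None  # goal reached, stop calling tools
    if current > 0 and current * 2 <= target:
        return ("double", current)
    return ("increment", current)


def run_agent(start: int, target: int, max_steps: int = 300) -> Tuple[int, int]:
    """Run tool calls one after another with no human in the loop.

    Returns (final value, number of tool calls made).
    """
    history = [start]
    for step in range(max_steps):
        action = toy_policy(history, target)
        if action is None:
            return history[-1], step
        name, arg = action
        history.append(TOOLS[name](arg))  # feed the result back into the loop
    return history[-1], max_steps  # budget exhausted before reaching the goal


if __name__ == "__main__":
    value, calls = run_agent(start=0, target=10)
    print(f"Reached {value} after {calls} tool calls")
```

The step budget is the key design choice in this sketch: it caps how long the loop may run unattended, so a task that cannot be finished still terminates.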

Market Implications and Competitive Pressure

Zhang Ruiwang, a Beijing-based information technology system architect, said the trend was for Chinese companies to keep costs down, explaining, “The overall performance of Chinese models still lags behind top US models, so they have to compete in the realms of cost-effectiveness to have a way out”. Zhang Yi, chief analyst at consultancy iiMedia, said the training costs of Chinese AI models were seeing a “cliff-like drop” driven by innovation in model architecture and training techniques and by the use of high-quality training data, marking a shift away from the early-days approach of piling on computing resources.

Industry Response and Future Outlook

Deedy Das, a partner at early-stage venture capital firm Menlo Ventures, wrote in a post on X that “Today is a turning point in AI. A Chinese open-source model is #1. Seminal moment in AI”. Nathan Lambert wrote in a Substack article that the success of Chinese open-source AI developers, including Moonshot AI and DeepSeek, showed how they “made the closed labs sweat,” adding “There’s serious pricing pressure and expectations that [the US developers] need to manage”.

Conclusion

The release of Kimi K2 Thinking positions Moonshot AI alongside other Chinese AI developers such as DeepSeek, Alibaba’s Qwen team, and Baichuan, which are increasingly challenging the narrative of American AI supremacy through cost-efficient innovation and open-source development strategies. Whether this represents a sustainable competitive advantage or a temporary convergence in capabilities remains to be seen as both US and Chinese companies continue advancing their models.

FAQs

What is Moonshot AI’s Kimi K2 Thinking model?
Kimi K2 Thinking is an open-source AI model developed by Moonshot AI that has surpassed OpenAI’s GPT-5 and Anthropic’s Claude Sonnet 4.5 in multiple performance benchmarks.
How much did it cost to train Kimi K2 Thinking?
CNBC reported that the training cost of Kimi K2 Thinking was US$4.6 million; Moonshot AI did not comment on the figure.
What are the technical capabilities of Kimi K2 Thinking?
Kimi K2 Thinking can execute up to 200-300 sequential tool calls without human intervention, reasoning coherently across hundreds of steps to solve complex problems.
How does Kimi K2 Thinking compare to other AI models?
Kimi K2 Thinking scored 44.9% on Humanity’s Last Exam, ahead of GPT-5’s 41.7%. It also posted 60.2% on BrowseComp and a leading 56.3% on Seal-0.
What are the implications of Kimi K2 Thinking for the AI industry?
The release of Kimi K2 Thinking challenges the narrative of American AI supremacy and highlights the growing competitiveness of Chinese AI companies in the field of artificial intelligence.