Introduction to Qwen3-235B-A22B-Thinking-2507
The Qwen team from Alibaba has just released a new version of their open-source reasoning AI model, and the benchmark results are impressive. Meet Qwen3-235B-A22B-Thinking-2507. Over the past three months, the Qwen team has been hard at work scaling up what they call the “thinking capability” of their AI, aiming to improve both the quality and depth of its reasoning.
Key Features and Benchmarks
The result of their efforts is a model that excels at the really tough stuff: logical reasoning, complex maths, science problems, and advanced coding. In areas that typically require human expertise, this new Qwen model now sets the standard for open-source models. On reasoning benchmarks, it achieves 92.3 on the AIME25 maths competition benchmark and 74.1 on LiveCodeBench v6 for coding. It also holds its own in more general capability tests, scoring 79.7 on Arena-Hard v2, which measures how well it aligns with human preferences.
How it Works
At its heart, this is a massive reasoning model from the Qwen team with 235 billion parameters in total. However, it uses a Mixture-of-Experts (MoE) architecture, meaning it only activates a fraction of those parameters – about 22 billion – for any given token. Think of it like having a huge team of 128 specialists on call, but only the eight best-suited for a specific task are brought in to actually work on it. Perhaps its most impressive feature is its memory: the model has a native context length of 262,144 tokens, a huge advantage for tasks that involve understanding vast amounts of information.
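To make that Mixture-of-Experts idea concrete, here is a toy, illustrative sketch of top-k expert routing in Python. This is not Qwen’s actual routing code: the tensor shapes, the softmax weighting over the selected experts, and the function names are assumptions, chosen only to mirror the “128 specialists, 8 active” description above.

```python
import numpy as np

def topk_moe_forward(x, gate_w, experts, k=8):
    """Toy top-k MoE layer: route one token to its k best experts.

    x       -- hidden state for a single token, shape (d_model,)
    gate_w  -- router weights, shape (d_model, n_experts)
    experts -- list of callables, one per expert network
    k       -- experts activated per token (8 of 128 in Qwen's case)
    """
    logits = x @ gate_w                # score every expert for this token
    top_idx = np.argsort(logits)[-k:]  # keep the k highest-scoring experts
    top_scores = logits[top_idx]
    weights = np.exp(top_scores - top_scores.max())
    weights /= weights.sum()           # softmax over just the chosen experts
    # Only the selected experts actually run, which is why roughly 22B of
    # the 235B total parameters are active for any given token.
    return sum(w * experts[i](x) for w, i in zip(weights, top_idx))

# Tiny demo: 128 "experts" that are just small random linear maps.
rng = np.random.default_rng(0)
d_model, n_experts = 64, 128
gate_w = rng.normal(size=(d_model, n_experts))
experts = [
    (lambda W: (lambda v: v @ W))(rng.normal(size=(d_model, d_model)) * 0.01)
    for _ in range(n_experts)
]
out = topk_moe_forward(rng.normal(size=d_model), gate_w, experts, k=8)
print(out.shape)  # (64,) -- same shape as the input hidden state
```

The design payoff is in that comment: the router buys you the capacity of 128 experts while paying the compute cost of only 8 per token.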
Getting Started with Qwen3-235B-A22B-Thinking-2507
For the developers and tinkerers out there, the Qwen team has made it easy to get started. The model is available on Hugging Face, and you can deploy it using tools like sglang or vllm to create your own OpenAI-compatible API endpoint; a minimal example is sketched below. The team also points to their Qwen-Agent framework as the best way to make use of the model’s tool-calling skills.

To get the best performance from their open-source reasoning model, the Qwen team has shared a few tips. They suggest an output length of around 32,768 tokens for most tasks, but for really complex challenges you should boost that to 81,920 tokens to give the AI enough room to “think”. They also recommend giving the model specific instructions in your prompt, like asking it to “reason step-by-step” for maths problems, to get the most accurate and well-structured answers.
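Here is a minimal sketch of that workflow: serving the model with vllm and querying it through the standard OpenAI-compatible client, using the team’s suggested 32,768-token output budget and a “reason step-by-step” instruction. The port, placeholder API key, and prompt are assumptions, and exact launch flags can vary between vllm versions, so treat this as a starting point rather than a canonical recipe.

```python
# First, serve the model locally with vllm (run in a shell), for example:
#   vllm serve Qwen/Qwen3-235B-A22B-Thinking-2507 \
#       --tensor-parallel-size 8 --max-model-len 262144
from openai import OpenAI

# vllm exposes an OpenAI-compatible endpoint, so the stock client works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Thinking-2507",
    messages=[
        {
            "role": "user",
            "content": "Reason step-by-step: how many positive divisors does 360 have?",
        },
    ],
    max_tokens=32768,  # the team's suggested output length for most tasks
)
print(response.choices[0].message.content)
```

For really hard problems, raising max_tokens towards 81,920 gives the model more room to think, and the Qwen-Agent framework is designed to sit on top of an endpoint like this to handle tool calling.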
Conclusion
The release of this new Qwen model delivers a powerful open-source reasoning AI that can rival some of the best proprietary models out there, especially when it comes to complex, brain-bending tasks. It will be exciting to see what developers ultimately build with it.
FAQs
Q: What is Qwen3-235B-A22B-Thinking-2507?
A: Qwen3-235B-A22B-Thinking-2507 is a new version of the open-source reasoning AI model developed by the Qwen team from Alibaba.
Q: What are the key features of Qwen3-235B-A22B-Thinking-2507?
A: The model excels at logical reasoning, complex maths, science problems, and advanced coding, and has a massive memory with a native context length of 262,144 tokens.
Q: How can I get started with Qwen3-235B-A22B-Thinking-2507?
A: The model is available on Hugging Face, and you can deploy it using tools like sglang or vllm to create your own API endpoint.
Q: What are the benchmarks for Qwen3-235B-A22B-Thinking-2507?
A: The model achieves 92.3 on AIME25, 74.1 on LiveCodeBench v6 for coding, and 79.7 on Arena-Hard v2.
Q: What are the tips for getting the best performance from Qwen3-235B-A22B-Thinking-2507?
A: The Qwen team recommends an output length of around 32,768 tokens for most tasks, and boosting that to 81,920 tokens for really complex challenges. They also recommend giving the model specific instructions in your prompt, such as asking it to “reason step-by-step” for maths problems.