Huawei’s AI Chip Breakthrough

Huawei is expected to begin large-scale shipments of the Ascend 910C AI chip as early as next month, according to people familiar with the matter.

While limited quantities have already been delivered, mass deployment would mark an important step for Chinese firms seeking domestic alternatives to US-made semiconductors.

The Need for Domestic Alternatives

The move comes at a time when Chinese developers face tighter restrictions on access to Nvidia hardware. The US government recently informed Nvidia that sales of its H20 AI chip to China require an export licence. That’s left developers in China looking for options that can support large-scale training and inference workloads.

Huawei’s Solution: The Ascend 910C Chip

The Huawei Ascend 910C chip isn’t built on the most advanced process nodes, but it offers a workaround to that limitation. The chip essentially combines two of the earlier 910B processors in a single package, doubling performance and memory. Sources familiar with the chip say it performs comparably to Nvidia’s H100.

CloudMatrix 384: A Full Rack-Scale AI Platform

Rather than relying on cutting-edge manufacturing, Huawei has adopted a brute-force approach, combining multiple chips and high-speed optical interconnects to scale up performance. This approach is central to Huawei’s CloudMatrix 384 system, a full rack-scale AI platform for training large models.

The CloudMatrix 384 houses 384 Huawei Ascend 910C chips across 16 racks: 12 compute racks and four networking racks. Unlike copper-based systems, Huawei’s platform uses optical interconnects, enabling high-bandwidth communication between the system’s components.
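As a quick sanity check on the rack layout, the short Python sketch below works out how many chips each compute rack would hold. It assumes the 384 chips are spread evenly across the 12 compute racks and that the networking racks hold no accelerators; the function name is purely illustrative.

```python
# Rack layout of the CloudMatrix 384 as described in the article:
# 16 racks in total, 12 holding compute and 4 holding networking gear.
TOTAL_CHIPS = 384
COMPUTE_RACKS = 12
NETWORK_RACKS = 4


def chips_per_compute_rack(total_chips: int, compute_racks: int) -> int:
    """Return chips per compute rack, assuming an even split (an assumption)."""
    if total_chips % compute_racks != 0:
        raise ValueError("chips do not divide evenly across compute racks")
    return total_chips // compute_racks


total_racks = COMPUTE_RACKS + NETWORK_RACKS
per_rack = chips_per_compute_rack(TOTAL_CHIPS, COMPUTE_RACKS)
print(f"{total_racks} racks, {per_rack} Ascend 910C chips per compute rack")
```

Under that even-split assumption, each compute rack carries 32 Ascend 910C chips.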

Performance and Efficiency

According to analysis from SemiAnalysis, the architecture includes 6,912 800G LPO optical transceivers, which form an all-to-all optical mesh network. This allows Huawei’s system to deliver approximately 300 petaFLOPS of BF16 compute, outpacing Nvidia’s GB200 NVL72 system, which reaches around 180 BF16 petaFLOPS.
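The headline figures can be broken down per chip with simple division. The Python sketch below is a back-of-the-envelope calculation only: it assumes the transceivers and the BF16 throughput are spread evenly across all 384 chips, and it uses the approximate SemiAnalysis totals quoted above rather than any official per-chip specification.

```python
# Approximate system-level figures quoted in the article (SemiAnalysis).
CHIPS = 384
TRANSCEIVERS = 6_912
CLOUDMATRIX_BF16_PFLOPS = 300.0
GB200_NVL72_BF16_PFLOPS = 180.0

# Even-split assumption: every chip gets the same share of optics and compute.
transceivers_per_chip = TRANSCEIVERS / CHIPS
# 1 petaFLOPS = 1,000 teraFLOPS.
tflops_per_chip = CLOUDMATRIX_BF16_PFLOPS * 1_000 / CHIPS
# How much more BF16 compute the full system has than a GB200 NVL72.
compute_ratio = CLOUDMATRIX_BF16_PFLOPS / GB200_NVL72_BF16_PFLOPS

print(f"{transceivers_per_chip:.0f} transceivers per chip")
print(f"~{tflops_per_chip:.0f} BF16 TFLOPS per chip")
print(f"~{compute_ratio:.2f}x the GB200 NVL72's BF16 compute")
```

This works out to 18 optical transceivers and roughly 780 BF16 teraFLOPS per chip, with the full system offering about 1.67 times the GB200 NVL72's BF16 compute.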

The CloudMatrix 384 also leads on memory, offering more than double the memory bandwidth and over 3.6 times the high-bandwidth memory (HBM) capacity of the GB200 NVL72. These gains come with drawbacks, however: the Huawei system is estimated to be 2.3 times less power-efficient per floating-point operation than Nvidia’s GB200 NVL72, and it is also less power-efficient per unit of memory bandwidth and capacity.
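Combining the two comparisons gives a rough sense of the total power gap. Power is throughput multiplied by energy per operation, so the relative system power is the compute ratio times the per-FLOP power ratio. The Python sketch below is an illustrative estimate, not a measured figure: it assumes both systems run at their quoted peak BF16 throughput and reads "2.3 times less power-efficient per FLOP" as 2.3 times the energy per FLOP. The helper name is hypothetical.

```python
def relative_system_power(compute_ratio: float, power_per_flop_ratio: float) -> float:
    """Estimate system power relative to a baseline system.

    Power = (FLOPs per second) * (energy per FLOP), so relative power is the
    product of relative throughput and relative energy per FLOP.
    """
    return compute_ratio * power_per_flop_ratio


# CloudMatrix 384 vs GB200 NVL72, approximate peak BF16 petaFLOPS.
compute_ratio = 300 / 180
# Assumed reading of "2.3 times less power-efficient per FLOP".
power_per_flop_ratio = 2.3

estimate = relative_system_power(compute_ratio, power_per_flop_ratio)
print(f"CloudMatrix 384 draws roughly {estimate:.1f}x the power of a GB200 NVL72")
```

Under these assumptions, the CloudMatrix 384 would draw roughly 3.8 times the power of a GB200 NVL72 to deliver about 1.67 times the compute, which illustrates the brute-force trade-off described above.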

Production and Supply Chain

Sources indicate that China’s largest chip foundry, SMIC, is producing some of the main components for the 910C using its 7nm N+2 process. Yield levels remain a concern, however, and some of the 910C units reportedly include chips produced by TSMC for Chinese firm Sophgo. Huawei has denied using TSMC-made parts.

Conclusion

While the Huawei Ascend 910C may not match Nvidia in power efficiency or process technology, it signals a broader trend. Chinese technology firms are developing homegrown alternatives to foreign components, even if it means using less advanced methods to achieve similar outcomes. As global AI demand surges and export restrictions tighten, Huawei’s ability to deliver a scalable AI hardware solution domestically could help shape China’s artificial intelligence future.

Frequently Asked Questions

Q: What is the Huawei Ascend 910C chip?

A: The Huawei Ascend 910C chip is a domestic alternative to US-made semiconductors, designed for large-scale training and inference workloads.

Q: How does the Ascend 910C chip perform compared to Nvidia’s H100?

A: Sources familiar with the chip say it performs comparably to Nvidia’s H100.

Q: What is the CloudMatrix 384 system?

A: The CloudMatrix 384 system is a full rack-scale AI platform for training large models, featuring 384 Huawei Ascend 910C chips deployed in 16 racks.