New MLPerf Inference v4.1 Benchmark Results Highlight Rapid Hardware and Software Innovations in Generative AI Systems

New Mixture of Experts Benchmark Tracks Emerging Architectures for AI Models

MLCommons Announces New Results for MLPerf Inference v4.1 Benchmark Suite

Today, MLCommons announced new results for its industry-standard MLPerf Inference v4.1 benchmark suite, which delivers machine learning (ML) system performance benchmarking in an architecture-neutral, representative, and reproducible manner. This release includes first-time results for a new benchmark based on a mixture of experts (MoE) model architecture. It also presents new findings on power consumption related to inference execution.

MLPerf Inference v4.1

The MLPerf Inference benchmark suite, which encompasses both data center and edge systems, is designed to measure how quickly hardware systems can run AI and ML models across a variety of deployment scenarios. The open-source and peer-reviewed benchmark suite creates a level playing field for competition that drives innovation, performance, and energy efficiency for the entire industry. It also provides critical technical information for customers who are procuring and tuning AI systems.
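Results from runs like these are usually summarized by two kinds of numbers: throughput, for example samples per second in a scenario where all queries are available up front (MLPerf calls this Offline), and tail latency when queries arrive one at a time (SingleStream). The sketch below shows both calculations on made-up measurements. It assumes a simple nearest-rank percentile and hypothetical helper names; MLPerf's LoadGen applies its own, stricter rules for run length, percentiles, and result validity.

```python
import math

def nearest_rank_percentile(latencies_ms, pct):
    """Return the pct-th percentile latency using the nearest-rank method (an assumption)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]

def offline_throughput(num_samples, wall_time_s):
    """Samples processed per second when every query is issued at once."""
    if wall_time_s <= 0:
        raise ValueError("wall time must be positive")
    return num_samples / wall_time_s

# Hypothetical per-query latencies in milliseconds, not real MLPerf results.
latencies = [12.0, 11.5, 13.2, 30.1, 12.4, 11.9, 12.8, 14.0, 12.1, 12.6]
print(nearest_rank_percentile(latencies, 90))  # → 14.0
print(offline_throughput(24_000, 60.0))        # → 400.0
```

Note how one slow outlier (30.1 ms) does not set the 90th-percentile figure here, which is why tail-latency metrics are reported alongside averages.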

New Benchmark: Mixture of Experts (MoE)

The MoE benchmark is unique and one of the most complex implemented by MLCommons to date. It uses the open-source Mixtral 8x7B model as a reference implementation and performs inferences using datasets covering three independent tasks: general Q&A, solving math problems, and code generation.
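In Mixtral 8x7B, each transformer layer has eight expert feed-forward networks and a router that selects two of them for every token, so only a fraction of the model's parameters run per token. The toy sketch below illustrates that top-2 routing idea in plain Python. The experts are stand-in scaling functions and the router weights are invented for illustration; this is not the MLPerf reference implementation.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, router_weights, experts, top_k=2):
    """Route token vector x to its top_k experts and mix their outputs.

    router_weights: one weight vector per expert; gating logits are dot products with x.
    experts: callables mapping a vector to a vector of the same length (toy stand-ins).
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    # Indices of the top_k highest-scoring experts, best first.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the chosen experts' logits only, so the gates sum to 1.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)  # only the selected experts are evaluated
        out = [o + gate * y_j for o, y_j in zip(out, y)]
    return chosen, gates, out

# Hypothetical setup: 8 experts, expert k scales its input by k + 1.
experts = [lambda v, s=k + 1: [s * vi for vi in v] for k in range(8)]
router_weights = [[float(k), 0.0] for k in range(8)]  # logits 0..7 for x below
chosen, gates, out = moe_layer([1.0, 0.0], router_weights, experts)
print(chosen)  # → [7, 6]
```

Only two of the eight experts are evaluated for this token, which is the property that makes MoE inference cheaper per token than running a dense model of the same total size.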

Benchmarking Power Consumption

The MLPerf Inference v4.1 benchmark includes 31 power consumption test results across three submitted systems covering both data center and edge scenarios. These results demonstrate the continued importance of understanding the power requirements of AI systems running inference tasks, as power costs are a substantial portion of the overall expense of operating AI systems.
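One simple way to relate performance and power results is to divide throughput by average power: since a watt is one joule per second, samples per second divided by watts gives samples per joule, and its reciprocal is energy per sample. The sketch below applies that arithmetic to a hypothetical system; the function names and numbers are illustrative assumptions, not MLPerf's official power metrics or published results.

```python
def samples_per_joule(throughput_samples_per_s, avg_power_w):
    """Energy efficiency: samples per second divided by joules per second (watts)."""
    if avg_power_w <= 0:
        raise ValueError("average power must be positive")
    return throughput_samples_per_s / avg_power_w

def energy_per_sample_j(throughput_samples_per_s, avg_power_w):
    """Joules consumed per sample: the reciprocal of samples per joule."""
    return 1.0 / samples_per_joule(throughput_samples_per_s, avg_power_w)

# Hypothetical system: 400 samples/s at an average draw of 800 W.
print(samples_per_joule(400.0, 800.0))    # → 0.5
print(energy_per_sample_j(400.0, 800.0))  # → 2.0
```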

The Increasing Pace of AI Innovation

Today, we are witnessing an incredible groundswell of technological advances across the AI ecosystem, driven by a wide range of providers including AI pioneers; large, well-established technology companies; and small startups. MLCommons would especially like to welcome first-time MLPerf Inference submitters AMD and Sustainable Metal Cloud, as well as Untether AI, which delivered both performance and power efficiency results.

View the Results

The full results for MLPerf Inference v4.1 are available on the MLCommons website.

Conclusion

The MLPerf Inference v4.1 benchmark suite is a significant step forward in providing a standardized and representative way to measure how quickly AI systems run ML models. The addition of the MoE benchmark and the new power consumption results should help drive innovation, performance, and energy efficiency across the AI industry.

Frequently Asked Questions (FAQs)

What is the purpose of the MLPerf Inference v4.1 benchmark suite?
- The MLPerf Inference v4.1 benchmark suite is designed to measure how quickly data center and edge hardware systems can run AI and ML models across a variety of deployment scenarios.
What is the new Mixture of Experts (MoE) benchmark?
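- The MoE benchmark is a new benchmark based on the open-source Mixtral 8x7B model. In a mixture of experts model, a router sends each token to a small subset of specialized "expert" sub-networks (two of eight per layer in Mixtral) instead of activating the entire model for every token.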
What is new in MLPerf Inference v4.1?
- Alongside its standardized, representative performance measurements, this release adds a Mixture of Experts (MoE) benchmark and new power consumption results.