MLPerf, launched four years ago, reports that overall performance and engagement have increased, with 19 organizations submitting twice as many results and six times as many power measurements in the flagship Inference 2.0 (data center and edge) exercise.
The results show that NVIDIA remains the leading AI accelerator vendor, in both the data center and widely available categories.
David Kanter, CEO of MLCommons, the body behind MLPerf, said: “It has been an extraordinary effort on the part of the machine learning community, with so many new entrants and a tremendous increase in the number and diversity of submissions. I am particularly excited to see increased adoption of power and energy measurements, highlighting the industry’s interest in efficient AI.”
The MLPerf benchmark is held four times a year, with inference results in the first and third quarters and training results in the second and fourth quarters. Of the two, model training is more computationally intensive and tends to fall into the realm of HPC; inference is less so, but it is still demanding. The latest round of inference had three different benchmarks: Inference v2.0 (data center and edge); Mobile v2.0 (mobile phones); and Tiny v0.7 (IoT). MLPerf divides the exercises into divisions and categories to make comparisons between systems fairer and easier, as shown on the slide below.
NVIDIA was, again, the company that obtained the best results in most tests. Qualcomm had strong results, particularly in edge AI applications. Its Qualcomm Cloud AI 100 accelerator is designed not only for good performance but also for energy efficiency, a quality that was evident during the tests.
During an NVIDIA briefing with media and analysts, David Salvator, Product Manager for AI Inference and Cloud, acknowledged Qualcomm's strength. “There are a couple of places in CNN-type networks where, frankly, Qualcomm has offered a pretty good solution when it comes to efficiency. That said, we outperformed them on both workloads and, in the case of SSD-Large, by a factor of about three or four. That is a really substantial performance difference, especially when you put it in the context of how many servers it would take to get equivalent performance, which really erodes their per-watt advantage.”
NVIDIA’s briefing also focused on its latest Jetson AGX Orin device and its edge performance. Software was once again a key driver of performance gains. Also featured was NVIDIA’s Triton software platform, which was used both with NVIDIA-based systems and with submissions based on AWS instances that use Amazon's Inferentia processor instead of NVIDIA accelerators.
Intel, which participated in the Closed Inference Division in the last round, did not this time; instead, it opted for the Open Division, which allows greater flexibility in hardware and system components.
There have been a couple of changes to the latest MLPerf Inference benchmarks and procedures. One of them shortened the time needed to run the tests. As explained on the MLPerf website, “we made a rule change that allows each benchmark test, more or less, to run in less than 10 minutes. And that took a lot of statistical analysis and work to get correct. But this has shortened the execution time of some of the benchmarks running on lower-performance systems. You know, there are people who are submitting on Raspberry Pi. And this allows them to do it in a much more timely manner.”
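The statistical analysis behind that rule change comes down to a sample-size question: how few queries can a run issue while the measured tail-latency percentile remains trustworthy? A rough sketch of that trade-off, assuming a simple normal approximation to the binomial (this is illustrative math only, not MLCommons' actual LoadGen code; the function name and parameters are hypothetical):

```python
import math

def min_queries(percentile=0.99, confidence=0.99, margin=0.005):
    """Estimate the minimum number of queries needed so that the
    measured latency percentile is statistically meaningful.

    With n samples, the observed fraction of queries meeting the
    latency bound has standard error sqrt(p * (1 - p) / n); we require
    the confidence-interval half-width to be at most `margin`.
    """
    # z-scores for common two-sided confidence levels,
    # hard-coded to avoid a scipy dependency
    z = {0.95: 1.960, 0.99: 2.576}[confidence]
    p = percentile
    # Solve z * sqrt(p * (1 - p) / n) <= margin for n
    n = (z / margin) ** 2 * p * (1 - p)
    return math.ceil(n)

# Verifying a p99 target to within 0.5% at 99% confidence takes
# a few thousand queries; halving the margin quadruples the count.
print(min_queries(0.99, 0.99, 0.005))  # → 2628
```

The quadratic dependence on the margin is why trimming run times required careful analysis: loosen the statistical requirements too far and a short run on a slow device could pass or fail on noise.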