SC2024 featured several sessions on the importance of benchmarking systems. The relevance of LINPACK, which is used to rank the Top500, has been a hot topic of discussion lately. Panelists said supercomputing benchmarking will remain important for many reasons, but it will need to keep up with applications, hardware, and codebases.
The relevance depends on the audience and applications. It’s not obvious, but benchmarks also have geopolitical, social, environmental, and economic implications.
For science, benchmarks such as LINPACK offer a gauge of time-to-science and the value extracted from hardware.
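As a rough illustration of the number LINPACK distills, here is a minimal sketch in Python (not the official HPL code) that times a dense linear solve and converts the wall time into GFLOP/s using the conventional LINPACK operation count; the problem size is an arbitrary example value.

```python
# Minimal LINPACK-style sketch: time a dense solve and report GFLOP/s.
# This is an illustration, not the official HPL benchmark.
import time
import numpy as np

def linpack_style_gflops(n: int = 4096) -> float:
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal(n)

    start = time.perf_counter()
    np.linalg.solve(a, b)                      # LU factorization plus triangular solves
    elapsed = time.perf_counter() - start

    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2    # conventional LINPACK operation count
    return flops / elapsed / 1e9

if __name__ == "__main__":
    print(f"{linpack_style_gflops():.1f} GFLOP/s")
```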
For AI, benchmarks such as MLPerf help hyperscalers get a real-world view of performance and optimize new models for their hardware. The results show whether a model can be tuned further for GPUs or ASICs.
Here are some reasons why benchmarks are important.
Top500 is a historical record of computing progress
Top500, which has run for 32 years, is a historical record of how humanity has moved forward in computing.
The word that comes to mind about LINPACK is, of course, legacy, said Piotr Luszczek of MIT Lincoln Laboratory and the University of Tennessee during a session. “It’s an important legacy that we continue so we have historical continuity,” he said.
A one-line wonder for politicians
The Top500 list is a yardstick for measuring U.S. progress in computing. For example, it gives politicians an easy one-liner: the U.S. is ahead of China.
Of course, the Top500 is much more complex than that. But impatient politicians want a quick summary, and the Top500 provides those answers.
“Composite metrics are always problematic. Having a lot of numbers is good for the engineers, but it’s not good for politicians, so to speak. They just want the one number,” Luszczek said.
“I’ve listened to many hours of science policy testimony with members of Congress asking our agency leaders about these things. They don’t know squat about HPL, but they know a list, and they ask, ‘Are we in the lead or are we still in the lead?’” said Jack Wells, a scientific program manager at Nvidia.
Wells previously served as director of science at Oak Ridge National Laboratory (ORNL).
The stock measure of computing
Wells said the Top500 is to computing what the Dow Jones Industrial Average is to the stock market.
“It has the same impact when the community says, ‘Oh, the Dow Industrials did this today.’ It’s the same thing. But we know it changes,” Wells said.
The Top500 showed that Dennard scaling had run its course by 2017-18. It also shows how chiplets, such as those in AMD’s MI-series GPUs, are reshaping the computing performance landscape.
Generating value
Supercomputing modules are put into production as soon as they are ready, and benchmarking provides a way to measure that readiness.
“When the first part of it becomes available, it goes into production. When the next part of it is available, it goes into production… And that way, you start to generate value from that machine sooner and sooner,” said Andrew Jones of Microsoft Azure at a technical session discussing benchmarks.
“If you look at the hyperscaler market training the large language models, there is clear enough business value to having a more capable AI model out before your competitors,” Jones said.
Customers generally look at an AI model’s accuracy and response time. MLPerf offers a range of benchmarks that weigh many factors, including environmental impact, to measure AI performance.
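As a hedged sketch of those two headline numbers, the snippet below measures accuracy and response-time percentiles for a generic single-query model; `model` and `dataset` are hypothetical stand-ins, not MLPerf interfaces.

```python
# Sketch: accuracy plus p50/p99 response time for a single-query model.
# `model` is any callable mapping a sample to a prediction; `dataset` is a
# non-empty list of (sample, label) pairs. Both are hypothetical placeholders.
import statistics
import time

def accuracy_and_latency(model, dataset):
    correct = 0
    latencies_ms = []
    for sample, label in dataset:
        start = time.perf_counter()
        prediction = model(sample)             # one inference query at a time
        latencies_ms.append((time.perf_counter() - start) * 1e3)
        correct += int(prediction == label)

    latencies_ms.sort()
    p99_index = int(0.99 * (len(latencies_ms) - 1))
    return {
        "accuracy": correct / len(dataset),
        "p50_latency_ms": statistics.median(latencies_ms),
        "p99_latency_ms": latencies_ms[p99_index],
    }
```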
Optimizing workloads
Jones said that significantly more science can be done on a well-operated system than on a larger one that is poorly operated.
Deploying the latest generation or a new generation technology may not be more efficient than “one that has chosen to optimize its actual science per dollar or science per megawatt,” Jones said.
David Kanter, head of MLPerf, pointed out that benchmarking helps build out targeted systems. He gave the example of RIKEN’s Fugaku supercomputer.
“One of the supercomputer sites that has consistently impressed me is the folks at RIKEN and with Fugaku… they wrote a paper that basically said… we are not optimizing for the most flops because most workloads are sparse… we’re going to sacrifice half our peak flops. But gosh, on real workloads, we’re going to come out,” Kanter said.
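A small illustration of the trade-off Kanter describes, under assumed sizes and sparsity rather than anything Fugaku-specific: the same machine sustains far fewer FLOP/s on a memory-bound sparse matrix-vector product than on the dense matrix multiply that defines its peak.

```python
# Sketch: dense peak-style kernel vs. a sparse workload on the same machine.
# Matrix size and density are illustrative values, not Fugaku parameters.
import time
import numpy as np
import scipy.sparse as sp

def gflops(flop_count: float, seconds: float) -> float:
    return flop_count / seconds / 1e9

n = 2048
dense_a = np.random.standard_normal((n, n))
dense_b = np.random.standard_normal((n, n))
sparse_a = sp.random(n, n, density=0.01, format="csr")
x = np.random.standard_normal(n)

start = time.perf_counter()
dense_a @ dense_b                               # compute-bound dense matmul
dense_rate = gflops(2.0 * n**3, time.perf_counter() - start)

reps = 100                                      # repeat the tiny kernel for a stable timing
start = time.perf_counter()
for _ in range(reps):
    sparse_a @ x                                # memory-bound sparse matrix-vector product
sparse_rate = gflops(2.0 * sparse_a.nnz * reps, time.perf_counter() - start)

print(f"dense matmul: {dense_rate:.1f} GFLOP/s, sparse matvec: {sparse_rate:.2f} GFLOP/s")
```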
Environmental impact
Much of the talk about optimizing the power efficiency of computing happens under the label of sustainability.
“What we really mean is we don’t want to spend money on megawatts; we’d rather spend money on compute,” Jones said.
Building the physical infrastructure — including the manufacturing processes, mining, and laying concrete for data centers — has a “much larger, much bigger carbon output than the actual electricity consumed during the operation of supercomputers,” Jones said.
“One of the most sustainable things you can do with your supercomputer is not to make it slightly more efficient in energy terms but to move it somewhere else in the world,” Jones said.
Talent evaluation
Beyond hardware, benchmarks also measure the software stack, tuning skills, and team expertise.
Kanter said single numbers don’t capture the complexity of the systems being benchmarked.
The panelists said a given application or code can show statistical variation across runs on a particular combination of hardware and software.
“The user does not run an application 100 times on the supercomputer and then pick the one with the fastest result,” Jones said.
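A minimal sketch of that point, assuming a hypothetical `run_benchmark` callable that returns one run’s wall time in seconds: repeat the measurement and report the median and spread rather than cherry-picking the single fastest run.

```python
# Sketch: summarize repeated benchmark runs instead of keeping only the best one.
# `run_benchmark` is a hypothetical callable returning one run's wall time in seconds.
import statistics

def summarize_runs(run_benchmark, repeats: int = 10) -> dict:
    times = [run_benchmark() for _ in range(repeats)]
    return {
        "best_s": min(times),                          # tempting headline number
        "median_s": statistics.median(times),          # closer to what users actually see
        "stdev_s": statistics.stdev(times) if repeats > 1 else 0.0,
    }
```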
Benchmarking can indirectly be a good way to measure the expertise of your IT staff.
“You’re measuring the performance of your benchmarking code, the portability of your benchmarking code, how well you’ve been able to tune it. You’re measuring the performance of your benchmarking team and the skill of the benchmarking team,” Jones said.
Communicating with stakeholders
Kanter said that not everyone is looking for one metric — different teams look for different numbers.
Benchmarks can present results through different metrics for different teams and thereby facilitate cooperation among them.
Even a single metric on a single codebase can be measured at different sizes and under different conditions, and the run rules can differ.
The panelists said the appropriate measurement also depends on whether you are trying to figure out which system to buy or how to help users utilize that system effectively.
