Sept. 30, 2024 -- In 2016, the Department of Energy’s Exascale Computing Project, or ECP, set out to develop advanced software for the arrival of exascale-class supercomputers capable of a quintillion (10^18) or more calculations per second. That meant rethinking, reinventing and optimizing dozens of scientific applications and software tools to leverage exascale’s thousandfold increase in computing power.
That time has arrived as the first DOE exascale computer, the Oak Ridge Leadership Computing Facility’s Frontier, opens to users around the world. “Exascale’s New Frontier” explores the applications and software technology for driving scientific discoveries in the exascale era.
The Scientific Problem
A single drop of water or handful of dirt can contain its own universe of microbial organisms, many so small they continue to evade detection by all but the closest examination. Piecing together the traces of these microbes, particularly the proteins they rely on to survive, requires sifting through mountains of data at a time. That has long been a task beyond the reach of even the fastest supercomputers, until now.
Why Exascale?
The ExaBiome project, a joint effort of scientists at Lawrence Berkeley and Los Alamos national laboratories and the Joint Genome Institute, seeks to catalog these microscopic ecosystems, or microbiomes, through the power of exascale computing on the Frontier supercomputer at DOE’s Oak Ridge National Laboratory. The ExaBiome team has spent years developing and optimizing codes such as MetaHipMer for assembling genomes from microbial samples; the Protein Alignment via Sparse Matrices code, or PASTIS; and the High Performance Markov Clustering algorithm, or HipMCL. These applications harness exascale’s speeds to reconstruct, classify and compare collected genome sequences and to understand the relationship and function of genes within microbial species.
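For readers unfamiliar with Markov clustering, the general idea behind the technique HipMCL is named for can be sketched in a few lines. This is a minimal single-machine illustration in plain Python, not HipMCL’s distributed implementation; the function names, the toy graph and the parameter defaults are all assumptions made for the example. The method alternates "expansion" (squaring a column-stochastic matrix, which spreads random-walk flow) with "inflation" (raising entries to a power and renormalizing, which strengthens strong connections and weakens weak ones) until groups of tightly linked nodes separate.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] if sums[c] else 0.0 for c in range(n)]
            for r in range(n)]


def expand(m):
    """Expansion step: square the matrix to spread random-walk flow."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation step: raise entries to a power, then renormalize columns."""
    return normalize_columns([[v ** power for v in row] for row in m])


def markov_cluster(adjacency, inflation=2.0, iterations=50, tol=1e-9):
    """Toy Markov clustering on a small dense adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, a common convention that stabilizes the iteration.
    m = normalize_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(iterations):
        nxt = inflate(expand(m), inflation)
        change = max(abs(nxt[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = nxt
        if change < tol:
            break
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical similarity graph: two separate triangles of related proteins.
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = markov_cluster(graph)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
```

The dense nested lists here only work for a handful of nodes. Real protein similarity networks are sparse and far too large for one machine's memory, which is why the step is run on a distributed system.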
“Assembly is like putting together a jigsaw puzzle with no box cover to guide us and with all the pieces dumped together from hundreds of different puzzles,” said Kathy Yelick, a senior computational scientist at Berkeley Lab. “We pick these pieces and put them together into sequences. We may not have a complete puzzle, but we at least have parts of it. Then we can put these long sequences into bins that go together and compare them to what we already know. These organisms live in communities, so something like a single soil sample might have hundreds of thousands of these species in it.”
Even the average supercomputer can’t handle calculations of that size and complexity.
“If we don’t have the capability of running these large, distributed computations, these small species end up looking like errors because there aren’t enough single microbes to be recognized on their own,” said Leonid Oliker, a senior computational scientist at Berkeley Lab and director of the ExaBiome project. “It’s only when these microbes are combined that there’s something to see. That’s why only an exascale machine like Frontier can do this at the speed and scale that we need.”
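Oliker’s point about rare species "looking like errors" can be illustrated with a toy example. The sketch below assumes a common assembly heuristic, not confirmed by this article as ExaBiome’s exact method: short subsequences (k-mers) seen fewer than a threshold number of times are treated as probable sequencing errors and discarded. The reads, the value of k and the threshold are all hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(reads, k, min_count):
    """Keep k-mers seen at least min_count times; rarer ones look like errors."""
    return {kmer for kmer, n in count_kmers(reads, k).items() if n >= min_count}


# Hypothetical reads: one abundant organism and one rare organism.
common = "ACGTACGGTC"
rare = "TTGACCATGA"
sample_a = [common, common, common, rare]
sample_b = [common, common, rare]
sample_c = [common, common, common, rare]

rare_kmer = rare[:5]  # "TTGAC", found only in the rare organism

alone = solid_kmers(sample_a, k=5, min_count=3)
pooled = solid_kmers(sample_a + sample_b + sample_c, k=5, min_count=3)

print(rare_kmer in alone)   # False: one sighting is indistinguishable from noise
print(rare_kmer in pooled)  # True: pooled data lifts it above the threshold
```

In a single sample the rare organism's k-mer appears once and is filtered out. Only when the samples are analyzed together does it clear the threshold, which is why the analysis has to run as one large computation rather than many small ones.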
Frontier Success
The ExaBiome codes have run calculations across all of Frontier’s more than 9,000 compute nodes. Frontier’s massive exascale throughput allowed researchers to shrink the cataloging and comparison work of months or weeks into days or hours. The overall peak performance on Frontier reflects a 536× improvement over the benchmark originally set by the team.
“Exascale has enabled us to discover new microbial species that don’t exist in any established databases,” Yelick said. “Thanks to Frontier, we can analyze much larger datasets than have ever been possible, up to 100 terabytes and more.
“Exascale has changed not just our understanding but how we conduct science in the environmental biology community. Now that we can analyze datasets this large, it’s worth collecting more data. Before, there was less incentive to collect information in this much detail because we were really limited by the computational aspects as far as what we could analyze. We couldn’t do anything with it. Now we have the opportunity to analyze as much as we can collect.”
What’s Next?
The ExaBiome team plans to apply exascale analysis to some of the fundamental questions of biology and genomics, from the human biome to microbe samples gathered from the ocean floor.
“Thanks to Frontier, we’re gaining a much clearer, much more detailed picture of what’s living and happening in the microbial world,” Yelick said. “What’s the functional behavior of these microbes? What genes do they possess? How do they interact? We’re now closer to answering all of these questions. If we really want to understand all the microbes in the world and exactly how they interact with one another, that’s a problem too big even for Frontier. But this is a start.”
Support for this research came from the ECP, a collaborative effort of the DOE Office of Science and the National Nuclear Security Administration, and from the DOE Office of Science’s Advanced Scientific Computing Research program. The OLCF is an Office of Science user facility at ORNL.
Source: OLCF