As scientific data sets become progressively larger, algorithms to process the data become more complex. Artificial intelligence (AI) has emerged as a solution to efficiently analyze these massive data sets, and new computer processor types, such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs), help speed up the work of AI algorithms. This combination of AI and new processor types is leading to a revolution in the realm of data analysis.
In an effort to change how real-time AI is applied at scale, the National Science Foundation (NSF) has awarded $15 million to the Accelerated AI Algorithms for Data-Driven Discovery (A3D3) Institute to advance scientific knowledge and discovery. The institute’s mission is to combine AI algorithms with new processors to support analyses of these unprecedented data sets.
“AI-assisted analysis of multidisciplinary data sets will be critical in helping researchers locate and explore trends that can lead to new discoveries,” said UC San Diego Chancellor Pradeep K. Khosla. “The new multi-disciplinary and geographically distributed A3D3 Institute, supported through NSF’s Harnessing the Data Revolution (HDR) program, will lead the way with a collaborative team of researchers from UC San Diego, Caltech, Duke University, MIT, Purdue University, UIUC, University of Minnesota, University of Washington and the University of Wisconsin-Madison.”
UC San Diego influence
UC San Diego’s Javier Duarte, an assistant professor in the Department of Physics who collaborates with researchers at the San Diego Supercomputer Center (SDSC), will serve as an institute principal investigator (PI) and as the university’s institute board representative, as well as the equity and career representative on the executive board. In that capacity, he will also co-supervise the post-baccalaureate program with Mia Liu at Purdue University. Frank Würthwein, interim director at SDSC, will participate, as will Amit Majumdar, who leads SDSC’s Data Enabled Scientific Computing Division. Additionally, UC San Diego postdoctoral researcher Daniel Diaz and graduate student researchers Raghav Kansal, Farouk Mokhtar and Anthony Aportela will develop accelerated AI algorithms.
Duarte said that his work will be split between developing ultrafast machine learning algorithms deployed in specialized hardware, such as FPGAs, that can be used to process data from sensors in real time, and developing heterogeneous computing pipelines to enable faster processing of big scientific data.
Targeting fields of science for fast AI and outreach
To take full advantage of fast AI, the A3D3 Institute will target fundamental problems in three fields of science: high energy physics, multi-messenger astrophysics and systems neuroscience.
“A3D3 works closely within these domains to develop customized AI solutions to process large datasets in real-time, significantly enhancing their discovery potential,” said Duarte. “The ultimate goal of A3D3 is to construct the institutional knowledge essential for real-time applications of AI in any scientific field.”
Duarte also noted that, through dedicated outreach efforts, A3D3 will empower scientists with new tools to deal with the coming data deluge.
“The post-baccalaureate program, for example, will be specifically aimed at helping underrepresented minority students, identifying as Black, Latinx, Indigenous, women or LGBT+, from institutions without extensive research opportunities, gain valuable research experience in order to ‘bridge the gap’ between undergraduate and graduate programs,” he said.
Shih-Chieh Hsu of the University of Washington, a colleague and former student of Würthwein, is the director and PI of the A3D3 Institute. He gave an example of the potential impact of the institute’s work.
“At the Large Hadron Collider (LHC), the challenge of processing data is daunting. With future aggregate data rates exceeding one petabit per second, the data rates at the LHC exceed all other devices in the world,” Hsu explained. “The aim of A3D3 is to build a series of tools that will enable the processing of all of this information in real-time using AI. Through the use of AI, A3D3 aims to perform advanced analyses, such as anomaly detection, and particle reconstruction on all collisions happening 40 million times per second!”
Real-time analyses in astrophysics and neuroscience
Majumdar said that UC San Diego-based projects out of SDSC, such as Voyager, an experimental AI research resource, and the National Research Platform (NRP), a first-of-its-kind testbed for a cyberinfrastructure ecosystem, have considerable synergy with the institute, given A3D3’s incorporation of AI algorithms and new processors.
“Voyager is based on dedicated AI hardware from Habana, while NRP includes both FPGAs and more conventional GPUs,” noted Würthwein.
Duarte noted that within the field of multi-messenger astrophysics, A3D3 will work to integrate AI to process data from telescopes, neutrino detectors and gravitational-wave detectors promptly and efficiently, in order to quickly identify astronomical events corresponding to the most violent phenomena in the cosmos.
“The ability to identify and further distribute these events as astronomical alerts enables the entire transient astronomy community to cross-correlate observations and understand astrophysical phenomena across multiple different forces,” Duarte said.
Amy Orsborn, an assistant professor in the Department of Electrical and Computer Engineering and the Department of Bioengineering at the University of Washington, explained that in systems neuroscience, A3D3 is working to discover the computations that brain-wide neural networks perform to process sensory and motor information during behavior. To do so, A3D3 will develop and implement high-throughput and low-latency AI algorithms to process, organize and analyze massive neural datasets in real time.
“These real-time analyses will enable new approaches to probing brain function such as causal, closed-loop manipulations. Applying powerful AI methods to systems neuroscience will significantly advance our ability to analyze and interpret neural activity and its relationship to behavior,” said Orsborn.
According to the NSF, the five new HDR institutes, of which A3D3 is one, will enable breakthroughs through collaborative, co-designed programs to formulate innovative data-intensive approaches for addressing critical national challenges. First outcomes are expected by 2023.
This project is supported by the NSF (grant no. 2117997).