With the U.S. and many other countries working ‘round the clock to mitigate the devastating effects of the COVID-19 disease caused by the SARS-CoV-2 virus, the San Diego Supercomputer Center at the University of California San Diego is providing priority access to its high-performance computer systems and other resources to researchers advancing our understanding of the virus and efforts to develop an effective vaccine in as short a time as possible.
“Supercomputers have already demonstrated without a doubt their capabilities to accelerate scientific research, and like no other event before, this pandemic underscores the importance of such systems and expertise in benefiting science and society,” said SDSC Director Michael Norman. “For us, it absolutely crystalizes SDSC’s mission, which is to deliver lasting impact across the greater scientific community by creating innovative end-to-end computational and data solutions to meet the biggest research challenges of our time. That time is here.”
Access to SDSC’s Comet supercomputer is being coordinated through the recently announced national alliance of high-performance computing resources called the COVID-19 HPC Consortium, which marshals the capabilities of some of the most powerful and advanced computers in the world – a combined 402 petaflops of compute power along with extensive cloud resources – to combat COVID-19, the disease commonly referred to as the coronavirus.
These resources are being made available at no cost to scientists. Allocations on Comet, as well as other supercomputers, are being made via the National Science Foundation’s XSEDE (Extreme Science and Engineering Discovery Environment) program. Details about submitting a request can be found here.
One of the most popular science gateways across the entire XSEDE resource portfolio is CIPRES, created as a portal under the NSF-funded CyberInfrastructure for Phylogenetic RESearch (CIPRES) project in 2009. Based at SDSC and running on Comet, the gateway helps scientists explore evolutionary relationships by comparing DNA sequence information between species.
Antoine Chaillon, an assistant professor in the Division of Infectious Diseases at UC San Diego’s Department of Medicine, and He Liu, a graduate student at the National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention, are using Comet and CIPRES to develop Bayesian diffusion models using SARS-CoV-2 sequences to characterize dispersal of the virus in California and China, respectively. Chaillon is also studying the timing and multiplicity of introductions within San Diego and potential dispersal from California toward other domestic and international locations.
“These bioinformatic analyses are computationally demanding, so access to high-performance computer resources is critical,” said Mark Miller, a biology researcher with SDSC and Principal Investigator for CIPRES. “These studies should help us understand how the virus moves through populations, and how it changes with time – extremely helpful both in understanding the current pandemic and managing future outbreaks.”
Another science gateway using Comet for COVID-19 research is the I-TASSER gateway, an online server for automated and reliable protein structure and function prediction. Yang Zhang, a professor of computational medicine and bioinformatics at the University of Michigan, and his research group have been running COVID-19 research work on Comet and have already published a paper in the Journal of Proteome Research. The group’s repository contains the whole-genome protein structure and function models for SARS-CoV-2 and is freely available to the community; some 26,000 unique users, including the New York Times, had accessed and downloaded the models as of early April. After systematically reanalyzing the genome sequences and protein structures created for the SARS-CoV-2 genome, the researchers concluded that pangolins, rather than snakes as previously thought, are likely to be the missing link in the coronavirus jumping from bats to humans. “Comet not only provides us the opportunity for creating the SARS-CoV-2 models,” said Zhang. “It also provides a critical platform and resource for us to develop advanced algorithms and techniques behind the I-TASSER gateway, which is the core of the system.”
Using SDSC’s Campus Computing Cluster
Scientists and students involved in COVID-19 research also have access to the Triton Shared Computing Cluster (TSCC), operated on behalf of UC San Diego by SDSC. Since its launch in 2013, TSCC has grown to more than 30 participating labs/groups across UC San Diego with some 300 researcher-owned compute nodes, plus an additional 75 common nodes available to anyone on campus through a pay-as-you-go recharge model. TSCC is also available to researchers from other UC campuses, other educational institutions, and industry.
“I’m now using TSCC for my undergrad course on viral genomics (BIMM170),” said Rachel Dutton, an assistant professor and microbiologist at UC San Diego who is interested in bacterial genetics. “Students will be using it to analyze a few SARS-CoV-2 genomes.”
On the industry side, Allele Biotech, a company established by researchers from UC San Diego and other local institutions, is using TSCC to help develop yet another platform for a multi-warhead vaccine against SARS-CoV-2 based on bioinformatics, as well as a treatment for cytokine storm, both using banked human stem cells analyzed at SDSC.
Other COVID-19 research initiatives spearheaded by SDSC researchers include:
- Early Alert System Using Data Science and AI: The goal of this SDSC initiative is to develop the equivalent of a weather map for the spread and severity of the virus. The approach relies on algorithms for sickness prediction and early alert, along with a symptom reporting tool and an interactive visualization map, to better analyze and visualize COVID-19 in Southern California. The project is being led by SDSC’s Workflows for Data Science Center of Excellence (WorDS) in collaboration with the Center for Aerosol Impacts on Chemistry of the Environment (CAICE), Center for Microbiome Innovation (CMI), and the Halıcıoğlu Data Science Institute at UC San Diego. The system is based on another WorDS-initiated project called WIFIRE Lab, which is being used by firefighters and first responders to better monitor, predict, and mitigate wildfires. “UC San Diego’s Report Your Symptoms and its associated collaborative analysis hub on top of the Pacific Research Platform (PI: Larry Smarr) are focusing on creating integrated research datasets to accelerate COVID-19 research on our campus and beyond through automated workflows and data science,” said Ilkay Altintas, SDSC’s chief data science officer, director of the WorDS Center, and WIFIRE Lab’s principal investigator.
- Repurposing FDA-approved drugs as potential COVID-19 protease inhibitors: SDSC researchers Valentina Kouznetsova and Igor Tsigelny recently submitted a paper for publication after creating a pharmacophore model, mining a conformational database of FDA-approved drugs, and identifying 64 compounds as potential inhibitors of the COVID-19 protease. The conformations of these compounds underwent 3D fingerprint similarity clustering. The researchers also docked possible conformers of these drugs to the binding pocket of the protease and then performed the same docking with random compounds. Among the selected compounds are two HIV protease inhibitors and two hepatitis C protease inhibitors, along with three drugs that have already shown positive results in testing with COVID-19. SDSC high school student David Huang, a participant in SDSC’s Research Experience for High School Students (REHS) program, conducted the computational docking.