XSEDE Resource Provides Open-Access Phylogenetic Supercomputing
A new Web resource developed at the San Diego Supercomputer Center (SDSC) at the University of California, San Diego, is helping thousands of researchers worldwide unravel the enigmas of phylogenetics, the study of the evolutionary relationships among virtually every species on the planet.
The CIPRES Science Gateway (CIPRES stands for Cyber Infrastructure for Phylogenetic RESearch), created by SDSC researchers, lets scientists complete these studies in significantly less time, without having to learn how to operate complex computers. Researchers anywhere in the world upload their data through a Web browser, free of charge, under a grant from the National Science Foundation (NSF).
CIPRES belongs to the NSF’s Extreme Science and Engineering Discovery Environment (XSEDE) as part of the XSEDE Science Gateway initiative, which is designed to give scientists broad and easy access to supercomputers.
Arman Bilge, a 10th grader at Lexington High School in Massachusetts, was a newbie to phylogenetics when a science teacher there organized an after-school phylogenetic tree club. In the club, Bilge learned to use a variety of software applications, including BEAST, a program well known to systematic biologists.
That led Bilge to create a map and timeline that identified when the Human Immunodeficiency Virus (HIV) arrived in the Americas, and where and when it spread across North and South America.
“The BEAST is a beast,” Bilge said in a telephone interview from Lexington. He managed to tame it enough to build a detailed phylogenetic tree of 700 known HIV-1 strains, based on the similarities and differences among them in the 3,000 nucleotides of a gene that encodes an envelope protein.
“You ask the BEAST program to guess at a most likely family tree for all of them and the software scores each possible tree with a likelihood function,” Bilge said. “The number of possible phylogenetic trees for HIV-1 exceeds the number of protons in the universe, and only one of them is correct, so this is a big calculation.”
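Bilge’s comparison can be checked with a standard combinatorial result: for n labeled taxa, the number of distinct rooted, fully bifurcating trees is (2n − 3)!!, the product 1 × 3 × 5 × … × (2n − 3). The short Python sketch below applies that formula to 700 strains. It assumes strictly bifurcating rooted trees and uses 10^80, a commonly cited order-of-magnitude estimate, as a stand-in for the number of protons in the universe; the function and constant names are illustrative only and are not part of BEAST or CIPRES.

```python
# Sketch: count rooted binary tree topologies for n labeled taxa.
# Assumptions (not from BEAST/CIPRES): trees are rooted and strictly
# bifurcating; 10**80 is used as a rough proton-count estimate.

PROTONS_ESTIMATE = 10**80  # order-of-magnitude figure, assumed for comparison


def num_rooted_trees(n_taxa: int) -> int:
    """Return (2n - 3)!!, the number of rooted bifurcating trees with n labeled tips."""
    if n_taxa < 2:
        raise ValueError("need at least two taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (the product is 1 when n == 2).
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count


if __name__ == "__main__":
    n = 700
    trees = num_rooted_trees(n)
    # Number of decimal digits minus one gives the power of ten.
    print(f"{n} taxa -> about 10^{len(str(trees)) - 1} rooted trees")
    print("exceeds proton estimate:", trees > PROTONS_ESTIMATE)
```

Small cases are easy to verify by hand: three taxa give 3 rooted trees and four give 15. For 700 strains the count exceeds the 10^80 estimate by well over a thousand orders of magnitude, which is why programs such as BEAST sample trees rather than scoring every one.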
Bilge first tried to run the analysis on his home computer. “I ran it for three weeks, but I didn’t reach the accepted way of knowing that you came to the end,” he said. Bilge said his parameters and settings were impossible for any computer to analyze, but he learned from the experience. “I started multiple simultaneous runs on CIPRES and the geographic component of my project is the result of the concatenation of these analyses,” he said.
The phylogenetic tree Bilge published for his science fair project was the one BEAST rated as optimal, and Bilge said his conclusions supported the previously published results of HIV experts: “A single introduction of the virus in Haiti in the mid-1900s resulted in its dispersion across the American continent.”
While Bilge may not have been satisfied with his residual statistical uncertainty, the judges at the 2012 Massachusetts Science and Engineering Fair were – they awarded him first place in the biology category.
Researchers say the gateway, and access to powerful supercomputers, are helping to answer increasingly sophisticated phylogenetic questions.
“The CIPRES Science Gateway makes it possible for researchers to make use of all this new information more quickly and effectively,” said Mark Miller, principal investigator of the CIPRES Gateway. “Our team is excited to have supported more than 300 publications of phylogenetic studies involving species in every branch of the Tree of Life.”
“It’s an important additional step in the conduct of science,” said Peter Nelson, a graduate student in the Department of Botany & Plant Pathology at Oregon State University in Corvallis. “This is a new opportunity for people who don’t yet have grant money, but who want to do meaningful research – and you don’t have to leave your computer.”
Nelson, a theorist in botany, is trying to understand the evolutionary processes that may operate one way in genetically homogeneous communities, but in a different way in more genetically diverse communities. He studies the divergence of tree species in North America. “We use GenBank and other sequence databases to gather the data, and free software is available to edit the sequences,” he said. “But the process is so computationally intensive I could never have accomplished it on a personal computer.”
Shedding new light on origins
All life forms, from simple bacteria to primates and plants, descended from a single common ancestor. A diagram of all the evolutionary relationships looks like a highly branched tree, with the common ancestor at the base of the trunk and extinct and living groups forming the branches. All living species are represented by leaves at the tips of the outermost limbs. This Tree of Life, like evolution itself, is not static; rather, branching continues today as populations within a single species, such as the Eastern Meadowlark, appear to be splitting into two because of long-term geographical or environmental factors.
The phylogenetic history of each living species is contained in its DNA, and SDSC’s CIPRES Gateway helps scientists analyze these evolutionary relationships by making it possible to compare similarities and differences in DNA across large numbers of species.
Phylogenetics is essential to understanding not only the history of life on Earth, but also how populations of flowering plants, insects, crustaceans, fish, fungi and microorganisms slowly change in response to their surroundings.
Such studies can also shed new light on how and where lineages began, sometimes challenging long-accepted theories. Researchers, for example, are using the CIPRES Gateway to clarify the evolution of wild grapes. Their results, University of Florida Botany Professor J. Richard Abbott wrote, “indicate that American lineages could be older than Asian.” Abbott and his co-authors reported the controversial finding in the February 2012 issue of Molecular Phylogenetics and Evolution.
In another project, Andrew F. Hugall and Devi Stuart-Fox, researchers in the Department of Zoology at the University of Melbourne in Australia, used the CIPRES Gateway to provide the first phylogenetic analysis supporting an evolutionary theory that new species of birds arise faster when the ancestral species shows color variation in its feathers.
Hugall and Stuart-Fox reported in the May 9, 2012, issue of Nature that speciation rates were almost three times higher for so-called color-polymorphic species of birds of prey than for similar monomorphic species. As the prevalence of feather-color polymorphism falls, so too does the rate of speciation.
The discipline of phylogenetic systematics combines taxonomy, the description and naming of living species as well as of fossilized life forms held in natural history museums, with modern phylogenetic studies. Systematic biologists combine a variety of sources of information, analyses and hypotheses to organize related groups of species, such as vertebrates, into clades and clades within clades. For example, the vertebrate clade contains nested clades of amphibians, primates, rodents and other groups of related species.
“Studies by systematic and evolutionary biologists have historically been limited by the number of available DNA sequences in public databases like GenBank,” said Miller. Now, he added, modern DNA sequencing technologies generate data so quickly that analyzing all the relevant data on a conventional laptop can take weeks.
“There is a huge need in the community for easy access to computing resources,” said Miller. To meet that demand, Miller’s team at SDSC and their collaborators around the country continue to combine emerging techniques in computational biology with computer science.
Jan Zverina, 858-534-5111, firstname.lastname@example.org
Warren R. Froelich, 858-822-3622, email@example.com