An international consortium of nearly 200 plant scientists, including engineers at the University of California San Diego, has released gene sequences for more than 1100 plant species, the culmination of a nine-year research project.
The One Thousand Plant Transcriptomes Initiative (1KP) is a global collaboration to examine the diversification of plant species, genes and genomes across the more than one-billion-year history of green plants, dating back to the ancestors of flowering plants and green algae.
The advance was made possible in large part thanks to the development of new computational methods capable of assembling gene sequences, inferring phylogenies and modelling evolutionary processes from extremely large datasets with high accuracy. Some of these methods have become standard tools for computational biologists.
The findings, published Oct. 23 in Nature, reveal the evolutionary history and timing of whole genome duplications, as well as the origins, expansions and contractions of gene families that contributed to the fundamental genetic innovations behind the evolution of green algae, mosses, ferns, conifer trees, flowering plants and all other green plant lineages. The history of how and when plants acquired the ability to grow tall and to make seeds, flowers and fruits provides a framework for understanding plant diversity around the planet, including annual crops and long-lived forest tree species.
The massive scope of the project demanded development and refinement of new computational tools for sequence assembly and phylogenetic analysis.
“Beyond its significance to plant biology, 1KP has been a trailblazing project,” said co-senior author Siavash Mirarab, a professor of electrical and computer engineering at UC San Diego. “Methods that we ended up developing to be able to analyze this dataset pushed the boundaries of what kinds of analyses were possible on large data and have now been adopted by the wider research community. Countless other projects have benefited from the method development that initiated in this project.”
Mirarab and computer scientist Tandy Warnow of the University of Illinois developed new algorithms for inferring evolutionary relationships from hundreds of genes for over one thousand species, addressing substantial heterogeneity in evolutionary histories across the genomes.
To construct accurate species trees for the datasets in this study, Mirarab and Warnow developed a species tree estimation method capable of analyzing one thousand transcriptomes at high accuracy. The method, called ASTRAL-II and later ASTRAL-III, is now one of the leading methods for species tree estimation and has been used in hundreds of other studies.
Mirarab and his lab at UC San Diego also developed an efficient and highly accurate statistical technique for measuring the support on each branch. The technique, dubbed local posterior probability (localPP), examines frequencies of topologies of gene trees as they pertain to individual parts of the phylogeny and computes summary statistics.
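To give a sense of how branch support can be derived from gene tree frequencies, the following is a minimal Python sketch and not the ASTRAL implementation. It assumes the formulation commonly described for localPP: around a single branch, each gene tree supports one of three possible topologies, the topology matching the species tree is expected in more than one third of gene trees, and the other two are equally likely. The function names, the prior parameter `lam` (set to 0.5 here) and the numerical integration are illustrative choices, and the counts in the usage loop are made up.

```python
import math


def log_tail_integral(x, n, lam=0.5, steps=20000):
    """Log of the integral of t^x * (1-t)^(n-x+2*lam-1) for t in (1/3, 1).

    Uses a midpoint rule evaluated in log space so that large counts
    do not underflow. The form of the integrand is an assumption of
    this sketch.
    """
    lo, hi = 1.0 / 3.0, 1.0
    width = (hi - lo) / steps
    exponent = n - x + 2.0 * lam - 1.0
    logs = []
    for i in range(steps):
        t = lo + (i + 0.5) * width  # midpoints never reach t = 1
        logs.append(x * math.log(t) + exponent * math.log(1.0 - t))
    peak = max(logs)
    total = sum(math.exp(v - peak) for v in logs)
    return peak + math.log(total) + math.log(width)


def local_posteriors(z1, z2, z3, lam=0.5):
    """Posterior probability of each of the three topologies around a branch.

    z1, z2, z3 are the numbers of gene trees supporting each topology.
    """
    counts = (z1, z2, z3)
    n = sum(counts)
    # Unnormalized log posterior of topology i: z_i * log(2) + log h(z_i)
    scores = [z * math.log(2.0) + log_tail_integral(z, n, lam) for z in counts]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical gene tree counts for three different branches
    for counts in [(80, 10, 10), (40, 35, 25), (30, 30, 30)]:
        pps = local_posteriors(*counts)
        print(counts, [round(p, 3) for p in pps])
```

In this sketch, a branch where one topology dominates the gene trees receives a posterior close to 1, while a branch where the three topologies appear equally often receives one third for each, reflecting no support for any resolution.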
These frequencies can also be very informative when inspected visually. Accordingly, Mirarab and his team developed a visualization tool, dubbed DiscoVista, that can quickly compute these summary statistics. The tool also enables other biologists to explore their data.
All of these algorithms were developed in direct response to needs revealed during the analysis of the 1KP data. The methods have since been adopted more widely by the research community.
The 1KP study inspired a community effort to gather and sequence diverse plant lineages derived from terrestrial and aquatic habitats on a global scale. Over 100 taxonomic specialists contributed material from field and living collections. By sequencing and analyzing genes from a broad sampling of plant species, researchers are better able to reconstruct gene content in the ancestors of all crops and model plant species, and gain a more complete picture of the gene and genome duplications that enabled evolutionary innovations.
Paper title: “One Thousand Plant Transcriptomes and Phylogenomics of Green Plants.”