February 17, 2005
California Researchers Collaborate With Perlegen Sciences On Map Of Human Genetic Variation Across Populations
New Study Makes Whole-Genome Association Studies Possible
By Doug Ramsey
Variation
among humans is evident in the Science cover image and in
a new map of key genetic signposts in three human populations.
This resource will speed efforts to pinpoint disease-related
genes and will advance population and evolutionary genetics.
[Image: Joshua Moglia]
Computer scientists
at two research centers affiliated with the University of California
have teamed with biologists from Perlegen Sciences, Inc., to
map key genetic signposts across three human populations. Their
study – published in the Feb. 18 issue of Science
– could make whole-genome analysis of human variation widely
accessible, and speed efforts to pinpoint DNA variations that
are associated with disease or with how patients respond
differently to drugs.
“This project
sets a new milestone in the search for genetic elements linked
to complex genetic diseases such as Alzheimer's, cancer and
multiple sclerosis,” said co-author David R. Cox, Chief
Scientific Officer at Mountain View, CA-based Perlegen. “Genome-wide
analysis may soon become a standard methodology in the search
for more effective, individualized treatments.”
Researchers at Perlegen
sequenced the single-letter variations (called single-nucleotide
polymorphisms, or SNPs) in the DNA of 71 individuals of European
American, African American, and Han Chinese American ancestry.
Subsequently, scientists at the California Institute for Telecommunications
and Information Technology (Calit2) at the University of California,
San Diego, and the UC Berkeley-affiliated International Computer
Science Institute (ICSI) helped analyze the set of more than 100
million genotypes from the more than 1.5 million SNPs sequenced in
each sample by Perlegen.
“This is the
first time that a SNP data set of that scale is being sequenced,”
said Eran Halperin, a research scientist at Berkeley-based ICSI.
“For each of the 23 pairs of chromosomes in human DNA,
the resulting data set consisted of 71 genotypes, which mix
together the information from both copies of the chromosome.
To see a clearer picture of a variation, we really want to know
the variation on each chromosome, and we can do that by inferring
haplotypes – the sequences of nucleotide bases in each
copy of the chromosome.”
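The ambiguity Halperin describes can be illustrated with a minimal Python sketch. It is not the HAP algorithm; it only enumerates every pair of haplotypes that could have produced a given genotype. The 0/1/2 genotype encoding (number of copies of the variant allele at each SNP) and the function name `compatible_haplotype_pairs` are assumptions made for this illustration.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with a genotype.

    genotype: sequence of 0, 1 or 2, the number of copies of the variant
    allele at each SNP (an illustrative encoding, not HAP's input format).
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed on both copies: 0 -> allele 0, 2 -> allele 1.
    # Heterozygous sites get a placeholder 0 here and are overwritten below.
    base = [g // 2 for g in genotype]
    pairs = set()
    for choice in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, choice):
            h1[site] = allele        # one copy carries this allele
            h2[site] = 1 - allele    # the other copy carries the opposite one
        # Sort within the pair so (h1, h2) and (h2, h1) count once.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


# Two heterozygous SNPs leave two possible phasings.
for h1, h2 in compatible_haplotype_pairs([1, 0, 1, 2]):
    print(h1, h2)
```

A genotype with k heterozygous SNPs is compatible with 2^(k-1) haplotype pairs, so the number of candidates grows quickly with the length of the region. That is why haplotype inference relies on patterns shared across many individuals rather than on one genotype alone.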
Halperin and Calit2
researcher Eleazar Eskin, who co-authored the study with Perlegen
scientists, have pioneered a method for translating genotypes
into haplotypes, using the HAP software tool they co-developed.
For this study, the bioinformatics researchers had to process
more than 190 million data points. “Using other programs,
haplotyping would require at least a few months of CPU time,”
said Eskin, an assistant professor in Computer Science and Engineering
at UC San Diego’s Jacobs School of Engineering. “Using
HAP on a regular laptop, this work would take only 200 CPU hours.
But we were able to use a cluster of computers from Calit2’s
OptIPuter project, and that allowed us to perform our final
entire analysis in less than 12 hours.”
Until now, the high cost of sequencing technology has meant
that disease association studies were typically confined to
short genomic regions.
The Science study indicates that genome-wide association
studies will now be possible on a considerably smaller budget,
as scientists build on the publicly available data and tools
released by Perlegen, ICSI and Calit2.
The researchers in
San Diego and Berkeley also used the HAP tool to partition the
human genome into ‘blocks’, or regions, of limited
diversity. These are regions where only a few common patterns
account for the majority of the variation in the population.
The resulting haplotype ‘maps’ across the three
populations appeared qualitatively similar to the maps compiled
by Perlegen using a different technique based on ‘linkage
disequilibrium’ (LD). LD refers to correlations between DNA
variants in physical proximity along a chromosome, and it results
from a combination of processes including mutation, natural
selection, and genetic drift. Linkage disequilibrium is complex
and varies from one region of the genome to another, as well
as between different populations. According to the study, “LD
maps and haplotype maps represent somewhat different aspects
of the local structure of genetic variation.”
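One common way to quantify the correlation that LD describes is the r² statistic between two SNPs, computed from phased haplotypes. The sketch below is a generic illustration in Python, not the method used in the Science paper. The toy haplotypes and the function name `ld_r_squared` are assumptions.

```python
def ld_r_squared(haplotypes, i, j):
    """r^2 between SNPs i and j across a list of 0/1 haplotypes."""
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n    # variant frequency at SNP i
    p_j = sum(h[j] for h in haplotypes) / n    # variant frequency at SNP j
    p_ij = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ij - p_i * p_j                       # deviation from independence
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return 0.0  # one SNP does not vary here; r^2 is undefined, report 0
    return d * d / denom


# Toy data: SNPs 0 and 1 always travel together, SNP 2 is independent.
toy = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(ld_r_squared(toy, 0, 1))
print(ld_r_squared(toy, 0, 2))
```

An r² near 1 means that knowing one SNP almost determines the other, while a value near 0 means the two vary independently in the sample.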
“The partitioning
of genomes into highly correlated regions may be extremely useful
for geneticists worldwide,” added ICSI’s Halperin.
“They could choose to sequence a small subset of SNPs
in each region, and use the high correlations between the different
SNPs in order to predict the SNPs that were not sequenced.”
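The prediction step Halperin mentions can be sketched, under simplifying assumptions, as a lookup built from reference haplotypes. For each allele seen at a measured SNP, the sketch records the allele most often seen alongside it at an unmeasured SNP. The function name `build_predictor` and the toy reference panel are hypothetical, and real tools use far richer statistical models.

```python
from collections import Counter, defaultdict


def build_predictor(reference_haplotypes, typed, untyped):
    """Map each allele at the typed SNP to the allele most often seen
    with it at the untyped SNP in the reference haplotypes."""
    counts = defaultdict(Counter)
    for h in reference_haplotypes:
        counts[h[typed]][h[untyped]] += 1
    return {allele: c.most_common(1)[0][0] for allele, c in counts.items()}


# Hypothetical reference panel of 0/1 haplotypes over two SNPs.
reference = [(0, 0), (0, 0), (1, 1), (1, 1), (1, 0)]
predictor = build_predictor(reference, typed=0, untyped=1)
print(predictor)
```

The stronger the correlation between the two SNPs in a region, the more often this kind of prediction matches the true allele.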
The HAP study found
substantially more blocks in the African American map than in
the European American and Han Chinese maps, indicating that
the greatest genetic diversity was in samples of African American
descent (a finding consistent with previous studies).
Other findings in the
Science paper, titled “Whole-Genome Patterns of
Common DNA Variation in Three Human Populations,”
include:
- Most functional
human genetic variation is not population-specific;
- The majority of
the 1.58 million SNPs with high-quality genotypes were common
in all three populations; and
- “Private SNPs”
– those SNPs segregating in only one population sample
– were only 18% of the total.
Maps of the haplotype
structure and the variants that are common in each region can
be downloaded from the Calit2 HAP site, which is hosted by the
National Biomedical Computational Resource at UCSD (see Related
Links below). “We hope that researchers interested in
specific regions of the genome will use this site to obtain
information on the human variation in those regions,”
said Calit2 director Larry Smarr. “This is a great example
of the revolution in computational biology and its potential
benefits to society in the study of cardiovascular disease,
mental illness and other conditions thought to result from a
complex interplay of multiple genetic and environmental factors.”
The SNPs analyzed in
the Science study represent only a fraction of the
more than 10 million common SNPs expected to exist in the human
genome. But researchers at Perlegen developed a mathematical
algorithm to identify so-called ‘tag SNPs’ that
provide guideposts for finding common variants in the human
genome. “This study and software tools mean that you no
longer have to wait to do whole-genome association studies,”
said Perlegen scientist David A. Hinds, lead author on the study.
“We've effectively figured out how to reduce the genotyping
burden by identifying a reduced set of tag SNPs, thus decreasing
the difficulty and cost of association studies. That said, even
when reducing to tag SNPs, we still need to be able to genotype
at least several hundred thousand SNPs to have a comprehensive
whole-genome association study.”
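The article does not describe Perlegen's tag SNP algorithm, so the following Python sketch shows only a generic greedy approach to the same idea. It repeatedly picks the SNP that stands in for the most not-yet-covered SNPs, where "stands in for" means an r² at or above a chosen threshold. The threshold of 0.8, the function names and the toy haplotypes are all assumptions.

```python
def r_squared(haplotypes, i, j):
    """r^2 between SNPs i and j across a list of 0/1 haplotypes."""
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n
    p_j = sum(h[j] for h in haplotypes) / n
    p_ij = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return 0.0  # a non-varying SNP is treated as uncorrelated
    return (p_ij - p_i * p_j) ** 2 / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedily choose tag SNPs so every SNP is a tag or has
    r^2 >= threshold with some tag (illustrative, not Perlegen's method)."""
    n_snps = len(haplotypes[0])
    # covers[i]: the SNPs that SNP i can stand in for, including itself.
    covers = {
        i: {j for j in range(n_snps)
            if i == j or r_squared(haplotypes, i, j) >= threshold}
        for i in range(n_snps)
    }
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Pick the SNP covering the most untagged SNPs; ties go to the
        # lowest index so the result is deterministic.
        best = max(range(n_snps),
                   key=lambda i: (len(covers[i] & untagged), -i))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Toy data: SNPs 0 and 1 are identical; SNPs 2 and 3 are exact opposites.
toy = [(0, 0, 0, 1), (0, 0, 1, 0), (1, 1, 0, 1), (1, 1, 1, 0)]
print(greedy_tag_snps(toy))
```

In this toy example two tag SNPs are enough to represent all four, which is the kind of reduction in genotyping burden Hinds describes, here at a vastly smaller scale.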
“This research
provides a tool for exploring many questions remaining regarding
the causal role of common human DNA variation in complex human
traits and for investigating the nature of genetic variation
within and between human populations,” the Science
paper concludes.
Perlegen is also cooperating
with the public-sector International HapMap Project, which is
expected to release more detailed descriptions of genetic variations
later this year. “We see these two efforts as complementary,”
said Perlegen’s Hinds. “The HapMap project will
yield a denser map, with more SNPs across a deeper set of individuals.”
HapMap will describe variation across individuals of Japanese,
Chinese, Nigerian and European ancestry.
Media Contacts:
Doug Ramsey, UCSD/Calit2, (858) 822-5825
Leah Hitchcock, ICSI, (510) 666-2974
Paul Cusenza, Perlegen Sciences, (650) 575-6716