|
|
From NeuroLex
| Abbrev
|
1000 Genomes Project +
|
| CurationStatus
|
uncurated +
|
| DefiningCitation
|
http://www.1000genomes.org/ +
|
| Definition
|
Database of genomic sequence data spanning several human populations, including many families, that can be downloaded. Because the same samples are sequenced redundantly on various platforms and by different groups of scientists, the results can be compared.
Recent improvements in sequencing technology ("next-gen" sequencing platforms) have sharply reduced the cost of sequencing. The 1000 Genomes Project is the first project to sequence the genomes of a large number of people, to provide a comprehensive resource on human genetic variation. As with other major human genome reference projects, data from the 1000 Genomes Project will be made available quickly to the worldwide scientific community through freely accessible public databases.
The goal of the 1000 Genomes Project is to find most genetic variants that have frequencies of at least 1% in the populations studied. This goal can be attained by sequencing many individuals lightly. To sequence a person's genome, many copies of the DNA are broken into short pieces and each piece is sequenced. The many copies of DNA mean that the DNA pieces are more-or-less randomly distributed across the genome. The pieces are then aligned to the reference sequence and joined together. To find the complete genomic sequence of one person with current sequencing platforms requires sequencing that person's DNA the equivalent of about 28 times (called 28X). If the amount of sequence done is only an average of once across the genome (1X), then much of the sequence will be missed, because some genomic locations will be covered by several pieces while others will have none. The deeper the sequencing coverage, the more of the genome will be covered at least once. Also, people are diploid; the deeper the sequencing coverage, the more likely that both chromosomes at a location will be included. In addition, deeper coverage is particularly useful for detecting structural variants, and allows sequencing errors to be corrected.
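The relationship between coverage depth and the fraction of the genome actually observed can be sketched numerically. The example below is illustrative only and is not part of the Project's own pipeline. It assumes reads land uniformly at random, so that the number of reads covering a position follows a Poisson distribution with mean equal to the depth, and it assumes that each of the two chromosome copies receives half of the reads independently. The function names are hypothetical.

```python
import math


def fraction_covered(depth: float, min_reads: int = 1) -> float:
    """Fraction of positions covered by at least `min_reads` reads.

    Assumption: per-position read count ~ Poisson(depth).
    """
    p_fewer = sum(
        math.exp(-depth) * depth ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


def both_chromosomes_seen(depth: float) -> float:
    """Probability that both chromosome copies at a site are read at least once.

    Assumption: each copy independently receives Poisson(depth / 2) reads.
    """
    per_copy = 1.0 - math.exp(-depth / 2.0)
    return per_copy ** 2


if __name__ == "__main__":
    # 1X, 4X and 28X are the depths mentioned in the text.
    for depth in (1, 4, 28):
        print(
            f"{depth:>2}X: covered at least once = {fraction_covered(depth):.4f}, "
            f"both chromosomes seen = {both_chromosomes_seen(depth):.4f}"
        )
```

Under these simplifying assumptions, roughly 63% of positions are covered at least once at 1X, and the chance of seeing both chromosomes at a site rises steeply with depth. Real data deviate from the Poisson model because of sequencing errors and uneven coverage.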
Sequencing is still too expensive to deeply sequence the many samples being studied for this project. However, any particular region of the genome generally contains a limited number of haplotypes. Data can be combined across many samples to allow efficient detection of most of the variants in a region. The Project currently plans to sequence each sample to about 4X coverage; at this depth sequencing cannot provide the complete genotype of each sample, but should allow the detection of most variants with frequencies as low as 1%. Combining the data from 2500 samples should allow highly accurate estimation (imputation) of the variants and genotypes for each sample that were not seen directly by the light sequencing.
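A back-of-the-envelope calculation shows why pooling many lightly sequenced samples can reveal a variant that a single sample often misses. The sketch below uses the figures from the text (2500 samples, 4X coverage, 1% allele frequency) and assumes, for illustration only, that each chromosome copy receives half of a sample's reads and that read counts are Poisson distributed. It ignores sequencing errors and is not the Project's variant-calling or imputation method. The function names are hypothetical.

```python
import math


def expected_variant_reads(n_samples: int, allele_freq: float, depth: float) -> float:
    """Expected number of reads carrying the variant allele across all samples."""
    variant_copies = 2 * n_samples * allele_freq  # diploid: two chromosomes per sample
    reads_per_copy = depth / 2.0                  # assumption: reads split evenly between copies
    return variant_copies * reads_per_copy


def carrier_seen(depth: float, min_reads: int = 1) -> float:
    """Chance that one heterozygous carrier shows the variant in at least `min_reads` reads.

    Assumption: reads from the variant-carrying copy ~ Poisson(depth / 2).
    """
    mean = depth / 2.0
    p_fewer = sum(
        math.exp(-mean) * mean ** k / math.factorial(k) for k in range(min_reads)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    pooled = expected_variant_reads(n_samples=2500, allele_freq=0.01, depth=4)
    print(f"Expected variant-carrying reads across the pool: {pooled:.0f}")
    print(f"Single carrier, variant seen at least once:  {carrier_seen(4, 1):.3f}")
    print(f"Single carrier, variant seen at least twice: {carrier_seen(4, 2):.3f}")
```

Under these assumptions the pooled data contain on the order of 100 reads supporting a 1% variant, while any single carrier at 4X has a noticeable chance of showing it in only one read or none, which is why missing genotypes are filled in by imputation.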
The sequence and alignment data generated by the 1000 Genomes Project are made available as quickly as possible via our mirrored FTP sites.
|
| ExampleImage
|
1000 Genomes Project.PNG +
|
| Has default form
|
Resource +
|
| Has role
|
Downloadable Database +
|
| Id
|
nlx_143819 +
|
| Is part of
|
Wellcome Trust Sanger Institute; Hinxton; United Kingdom +,
Harvard Medical School; Massachusetts; USA +
|
| Keywords
|
Human +,
Genetic variation +,
Gene +,
Human gene +
|
| Label
|
Resource:1000 Genomes: A Deep Catalog of Human Genetic Variation +
|
| Modification date
|
28 January 2013 19:22:29 +
|
| ModifiedDate
|
28 January 2013 +
|
| Page has default form
|
Resource +
|
| PublicationLink
|
http://www.1000genomes.org/sites/1000genomes.org/files/docs/nature09534.pdf +
|
| SuperCategory
|
Resource +
|
| Synonym
|
International 1000 Genomes Project +,
1000 Genomes +
|
| Categories |
Resource
|
|
|