
From NeuroLex




Resource:1000 Genomes Project and AWS

Name: Resource:1000 Genomes Project and AWS
Description: A dataset containing the full genomic sequence of 1,700 individuals, freely available for research use.

The 1000 Genomes Project is an international research effort, coordinated by a consortium of 75 companies and organizations, to establish the most detailed catalogue of human genetic variation. The project has grown to 200 terabytes of genomic data, including DNA sequenced from more than 1,700 individuals, which researchers can now access on AWS free of charge for use in disease research. The dataset is available to all via Amazon S3 at: http://s3.amazonaws.com/1000genomes

The 1000 Genomes Project aims to include the genomes of more than 2,662 individuals from 26 populations around the world, and the NIH will continue to add the remaining genome samples to the data collection this year.

Public Data Sets on AWS provide a centralized repository of public data hosted on Amazon Simple Storage Service (Amazon S3). The data can be accessed directly from AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic MapReduce (Amazon EMR), which provide organizations with the highly scalable compute resources needed to take advantage of these large data collections. AWS stores the public data sets at no charge to the community. Researchers pay only for the additional AWS resources they need for further processing or analysis of the data.

All 200 TB of the latest 1000 Genomes Project data is stored in a publicly accessible Amazon S3 bucket. You can access the data via simple HTTP requests, or take advantage of the AWS SDKs in languages such as Ruby, Java, Python, .NET and PHP.
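As an illustration of the "simple HTTP requests" route, the following is a minimal sketch in Python using only the standard library. It assumes the bucket at the URL given above still permits anonymous listing through the standard S3 REST interface (`prefix` and `max-keys` query parameters, XML response); that may no longer hold. The function names and the `example/` prefix are hypothetical and chosen only for this sketch.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Bucket URL as given in the description above.
BUCKET_URL = "http://s3.amazonaws.com/1000genomes"

# XML namespace used by S3 bucket-listing responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def listing_url(prefix="", max_keys=10):
    """Build the URL for an S3 bucket listing request (hypothetical helper)."""
    query = urllib.parse.urlencode({"prefix": prefix, "max-keys": max_keys})
    return f"{BUCKET_URL}?{query}"


def parse_listing(xml_data):
    """Return (key, size_in_bytes) pairs from a bucket-listing XML document."""
    root = ET.fromstring(xml_data)
    return [
        (item.findtext(f"{S3_NS}Key"), int(item.findtext(f"{S3_NS}Size")))
        for item in root.iter(f"{S3_NS}Contents")
    ]


def list_objects(prefix="", max_keys=10):
    """Fetch and parse one page of the listing.

    Assumption: anonymous listing is allowed on this bucket.
    """
    with urllib.request.urlopen(listing_url(prefix, max_keys)) as response:
        return parse_listing(response.read())


if __name__ == "__main__":
    # "example/" is a placeholder prefix, not a known path in the bucket.
    for key, size in list_objects(prefix="example/", max_keys=5):
        print(f"{size:>12}  {key}")
```

An individual object would then be fetched with a plain GET on the bucket URL followed by `/` and the object key. For bulk access from EC2 or EMR, one of the AWS SDKs mentioned above is likely the better fit.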

Researchers can use the Amazon EC2 utility computing service to dive into this data without the usual capital investment required to work with data at this scale. AWS also provides a number of orchestration and automation services to help teams make their research available to others to remix and reuse. Making the data available via a bucket in Amazon S3 also means that customers can crunch the information using Hadoop via Amazon Elastic MapReduce, and take advantage of the growing collection of tools for running bioinformatics job flows, such as CloudBurst and Crossbow.
Other Name(s): 1000 Genomes Project AWS, 1000 Genomes Project and Amazon Web Services, 1000 Genomes Project Amazon Web Services
Parent Organization: Resource:1000 Genomes: A Deep Catalog of Human Genetic Variation, Resource:Amazon Web Services
Resource Type(s): Data set
Keywords: Genomic data, Genome, Cloud computing, Cloud, Human, Gene, Genetic variation, Research, DNA
Abbreviation: 1000 Genomes Project and AWS
Resource: Resource
URL: http://aws.amazon.com/1000genomes/
Publication link: http://www.nih.gov/news/health/mar2012/nhgri-29.htm
Id: nlx_144340

Curation status: Curated



Contributors

Aarnaud, Bandrow



Facts about Resource:1000 Genomes Project and AWS
Abbrev: 1000 Genomes Project and AWS
CurationStatus: uncurated
DefiningCitation: http://aws.amazon.com/1000genomes/
Definition: A dataset containing the full genomic sequence of 1,700 individuals, freely available for research use.

ExampleImage: 1000 Genomes Project and AWS.PNG
Has default form: Resource
Has role: Data set
Id: nlx_144340
Is part of: Resource:1000 Genomes: A Deep Catalog of Human Genetic Variation, Resource:Amazon Web Services
Keywords: Genomic data, Genome, Cloud computing, Cloud, Human, Gene, Genetic variation, Research, DNA
Label: Resource:1000 Genomes Project and AWS
ModifiedDate: 25 January 2013
Page has default form: Resource
PublicationLink: http://www.nih.gov/news/health/mar2012/nhgri-29.htm
SuperCategory: Resource
Synonym: 1000 Genomes Project AWS, 1000 Genomes Project and Amazon Web Services, 1000 Genomes Project Amazon Web Services