From NeuroLex


Resource:Nh3D: A Reference Dataset of Structures of Non-homologous Proteins

Name: Resource:Nh3D: A Reference Dataset of Structures of Non-homologous Proteins
Description: THIS RESOURCE IS NO LONGER IN SERVICE, documented on July 17, 2013.

Nh3D is a freely available reference dataset of structurally dissimilar proteins, intended for the statistical analysis of sequence and structure features of proteins in the PDB. It was compiled by selecting well-resolved representatives from the Topology level of the CATH database, which hierarchically classifies protein domain structures.
These representatives were then pruned to remove:
i) domains that may contain homologous elements (detected by pairwise sequence comparison and structural superposition of aligned residues)
ii) internal duplications (detected by repeat detection)
iii) regions with high B-factors
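Step iii) can be illustrated with a minimal sketch that flags residues by their mean atomic B-factor, read from the fixed columns of PDB `ATOM` records. The page does not say which cutoff Nh3D used or whether regions were judged per residue or per segment, so the cutoff of 80 Å² and the names `B_FACTOR_CUTOFF`, `mean_b_per_residue`, and `high_b_residues` are hypothetical assumptions, not the Nh3D procedure.

```python
from collections import defaultdict

# Hypothetical cutoff in A^2; the actual Nh3D threshold is not stated on this page.
B_FACTOR_CUTOFF = 80.0


def mean_b_per_residue(pdb_lines):
    """Average the B-factor of all ATOM records per (chain, residue number, insertion code)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, END and header records
        # Fixed PDB columns: chain ID (22), resSeq (23-26), iCode (27), tempFactor (61-66).
        key = (line[21], int(line[22:26]), line[26])
        sums[key] += float(line[60:66])
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def high_b_residues(pdb_lines, cutoff=B_FACTOR_CUTOFF):
    """Return residue keys whose mean B-factor exceeds the cutoff, in file order."""
    means = mean_b_per_residue(pdb_lines)
    return [key for key, b in means.items() if b > cutoff]
```

Averaging over all atoms of a residue smooths out single mobile side-chain atoms; a stricter variant could use only backbone atoms, which is equally an assumption.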
The statistical analysis of protein structures requires datasets in which structural features can be considered independently distributed, i.e. not related through common ancestry, and which meet minimum requirements for the experimental quality of the structures they contain. However, non-redundant datasets based on sequence similarity invariably contain distantly related homologues. Nh3D instead provides a reference dataset of non-homologous protein domains, on the assumption that structural dissimilarity at the Topology level is incompatible with recognizable common ancestry.
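The first compilation step, keeping one well-resolved representative per CATH Topology, can be sketched as follows. A CATH code has four levels (Class.Architecture.Topology.Homologous superfamily), so truncating it to three levels groups domains by Topology. The page does not define "best refined"; this sketch assumes lowest resolution first, with R-factor as a tie-breaker. The `Domain` record, its fields, and the function names are hypothetical, and the later dissimilarity and duplication checks are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    # Hypothetical record; field names are illustrative, not a CATH file format.
    domain_id: str
    cath_code: str      # "Class.Architecture.Topology.Homology", e.g. "1.10.10.10"
    resolution: float   # in Angstrom, lower is better
    r_factor: float     # crystallographic R-factor, lower is better


def topology_of(cath_code):
    """Truncate a CATH code to its first three levels (C.A.T)."""
    return ".".join(cath_code.split(".")[:3])


def best_per_topology(domains):
    """Keep one domain per Topology: lowest resolution, ties broken by lower R-factor."""
    best = {}
    for d in domains:
        t = topology_of(d.cath_code)
        if t not in best or (d.resolution, d.r_factor) < (best[t].resolution, best[t].r_factor):
            best[t] = d
    return best
```

Because selection happens per Topology rather than per homologous superfamily, two superfamilies sharing a Topology contribute only one domain, which matches the assumption that different Topologies imply no recognizable common ancestry.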

It contains the best-refined representative of each CATH Topology, validates their structural dissimilarity, and removes internally duplicated fragments. The compilation of Nh3D is fully scripted. The current Nh3D list contains 570 domains with a total of 90,780 residues. It covers more than 70% of the folds at the Topology level of CATH and represents more than 90% of the PDB structures that CATH has classified. Even though all domain pairs are structurally dissimilar, some pairs share more than 30% sequence identity after global alignment.
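The 30% figure depends on how identity after global alignment is computed. A minimal sketch using Needleman-Wunsch global alignment is shown below. The scoring (match +1, mismatch -1, linear gap -2) is a toy assumption; production tools typically use a substitution matrix such as BLOSUM62 with affine gap penalties, and the method Nh3D actually used is not stated here. The denominator (length of the shorter sequence) is likewise one assumed convention among several, and `global_align` and `percent_identity` are hypothetical names.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty; returns aligned strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned positions as a percentage of the shorter sequence length."""
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    aln_a, aln_b = global_align(a, b)
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / min(len(a), len(b))
```

Dividing by alignment length or by mean sequence length instead of the shorter length gives lower percentages for the same alignment, so a pair near the 30% mark can fall on either side of it depending on that choice.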
Abbreviation: Nh3D
Parent Organization: University of Toronto; Ontario; Canada
Resource Type(s): Database
Resource: Resource
URL: http://www.schematikon.org/Nh3D.html
*Id: nif-0000-21286
Availability: THIS RESOURCE IS NO LONGER IN SERVICE, documented on July 17, 2013.
Keywords: duplication, element, feature, fragment, align, alignment, analysis, b-factor, dissimilar, homologous, protein, protein structure databases, residue, sequence, statistical, structurally, structure, topology

Curation status: Uncurated

Contributors

Akash, Ccdbuser, Nifbot2, Zaidaziz


Facts about Resource:Nh3D: A Reference Dataset of Structures of Non-homologous Proteins
Abbrev: Nh3D
Availability: THIS RESOURCE IS NO LONGER IN SERVICE, documented on July 17, 2013.
CurationStatus: uncurated
DefiningCitation: http://www.schematikon.org/Nh3D.html
Has role: Database
Id: nif-0000-21286
Is part of: University of Toronto; Ontario; Canada
Keywords: Duplication, Element, Feature, Fragment, Align, Alignment, Analysis, B-factor, Dissimilar, Homologous, Protein, Protein structure databases, Residue, Sequence, Statistical, Structurally, Structure, Topology
Label: Resource:Nh3D: A Reference Dataset of Structures of Non-homologous Proteins
ModifiedDate: 17 July 2013
SuperCategory: Resource