UniParc : UniProt Archive a non-redundant archive of protein sequences extracted from Swi

UniProt Archive (UniParc) is part of the UniProt project. It is a non-redundant archive of protein sequences extracted from the public databases UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, PIR-PSD, EMBL, EMBL WGS, Ensembl, IPI, PDB, RefSeq, FlyBase, WormBase, H-Invitational Database, TROME database, European Patent Office proteins, United States Patent and Trademark Office proteins (USPTO) and Japan Patent Office proteins. UniParc contains only pro...

Also known as:

  • UniProt Archive; a non-redundant archive of protein sequences extracted from Swiss-Prot, TrEMBL, PIR-PSD, EMBL, Ensembl, IPI, PDB, RefSeq, FlyBase, WormBase, European Patent Office, United States Patent and Trademark Office, and Japanese Patent Office

Public

Number of triples:

  • 490,000,000

Number of topics:

  • 16,695,439

SPARQL endpoint:

  • http://uniparc.bio2rdf.org/sparql
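
As a rough illustration of how an endpoint like this is typically addressed, the sketch below builds a GET request URL that carries a query in the standard SPARQL protocol `query` parameter. It only constructs the URL and does not contact the server, since the endpoint listed here may no longer be reachable. The function name `build_sparql_url` and the sample query are illustrative assumptions, and the query deliberately avoids guessing at the dataset's vocabulary.

```python
from urllib.parse import urlencode

# Endpoint as listed on this page; it may no longer be live.
ENDPOINT = "http://uniparc.bio2rdf.org/sparql"


def build_sparql_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL carrying the query in the `query` parameter.

    `build_sparql_url` is a hypothetical helper, not part of any library.
    """
    return endpoint + "?" + urlencode({"query": query})


# A generic query that assumes nothing about the predicates in the dataset.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"

url = build_sparql_url(SAMPLE_QUERY)
print(url)
```

The resulting URL could be passed to any HTTP client; the response format depends on what the server supports and on the `Accept` header sent.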

Data source size:

  • 28,060 kB

Data source file format:

SPARQL port number:

  • 8908

Reserved namespace:

  • uniparc

Description:

  • The UniProt Archive (UniParc) is a comprehensive non-redundant protein sequence archive. Its protein sequences are retrieved from predominant publicly accessible resources, including Swiss-Prot, TrEMBL, EMBL, Ensembl, RefSeq and PDB. To avoid redundancy, each unique sequence is stored only once with a stable protein identifier, which can be used later to identify the same protein in all source databases. When proteins are loaded into UniParc, database cross-references are created to link them to the origins of the sequences. As a result, performing a sequence search against UniParc is equivalent to performing the same search against all databases cross-referenced by it.
  • The UniProt Archive (UniParc), part of the UniProt databases, is an archival protein sequence collection from all major publicly accessible resources. New and revised protein sequences are added to UniParc daily, and previous versions are not deleted. A UniParc sequence version is provided and incremented each time the underlying sequence changes, making it possible to observe the history of sequence changes in all source databases. To avoid redundancy, each unique sequence is assigned a unique identifier and is stored only once. The basic information stored with each UniParc entry is the identifier, the sequence, a cyclic redundancy check number (CRC64), the source database(s) with accession and version numbers, and a time stamp; all other information must be retrieved from the source databases. Each source database accession number is tagged with its status in that database, indicating whether the sequence still exists or has been deleted at that source.
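
The storage model described above (one entry per unique sequence, cross-references back to each source, and a per-source status flag) can be sketched in a few lines. This is a toy in-memory model under stated assumptions, not UniParc's implementation: the class names, the accessions, and the sequential identifier scheme are hypothetical, the `UPI` plus ten hexadecimal digits format is inferred only from the identifier example on this page, and `zlib.crc32` stands in for CRC64 because the Python standard library has no CRC64 function.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Entry:
    upi: str          # stable identifier for the unique sequence
    sequence: str
    checksum: int     # CRC32 here, as a stand-in for CRC64
    # (database, accession) -> True while the source still holds the sequence
    xrefs: dict = field(default_factory=dict)


class MiniArchive:
    """Hypothetical toy archive: each distinct sequence is stored once."""

    def __init__(self):
        self._by_sequence = {}

    def load(self, database, accession, sequence):
        entry = self._by_sequence.get(sequence)
        if entry is None:
            # Assumed identifier shape: "UPI" + ten upper-case hex digits.
            upi = "UPI%010X" % (len(self._by_sequence) + 1)
            checksum = zlib.crc32(sequence.encode("ascii"))
            entry = Entry(upi, sequence, checksum)
            self._by_sequence[sequence] = entry
        # Record where the sequence came from, whether or not it is new.
        entry.xrefs[(database, accession)] = True
        return entry.upi

    def mark_deleted(self, database, accession):
        # Keep the cross-reference, but flag it as gone at the source.
        for entry in self._by_sequence.values():
            if (database, accession) in entry.xrefs:
                entry.xrefs[(database, accession)] = False

    def entry(self, sequence):
        return self._by_sequence[sequence]


archive = MiniArchive()
# Accessions below are made up for illustration.
a = archive.load("Swiss-Prot", "P00001", "MKT")
b = archive.load("RefSeq", "NP_000001", "MKT")   # same sequence, new source
c = archive.load("TrEMBL", "Q00001", "MKV")      # different sequence
archive.mark_deleted("RefSeq", "NP_000001")
print(a, b, c)
```

Because the identical sequence from two sources maps to one entry, `a` and `b` are the same identifier, and searching the archive covers both sources at once. The deleted cross-reference stays on record with its status set to inactive rather than being dropped.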

Provider homepage:

  • http://www.ebi.ac.uk/uniparc/

Identifier example:

  • UniParc:UPI000000000A

URL pattern:

  • http://www.ebi.ac.uk/cgi-bin/dbfetch?db=uniparc&id=%s
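
The `%s` placeholder in the URL pattern is filled with an entry identifier. A minimal sketch of that substitution follows; it assumes the `UniParc:` prefix shown in the identifier example is a namespace label that is dropped before substitution, which this page does not state explicitly. The helper name `entry_url` is hypothetical.

```python
from urllib.parse import quote

# Pattern and prefix as they appear on this page.
URL_PATTERN = "http://www.ebi.ac.uk/cgi-bin/dbfetch?db=uniparc&id=%s"
PREFIX = "UniParc:"


def entry_url(identifier: str) -> str:
    """Fill the URL pattern with an identifier (hypothetical helper).

    Assumption: a leading "UniParc:" is a namespace prefix and is stripped.
    """
    if identifier.startswith(PREFIX):
        identifier = identifier[len(PREFIX):]
    # Percent-encode the value so it is safe inside a query string.
    return URL_PATTERN % quote(identifier, safe="")


print(entry_url("UniParc:UPI000000000A"))
# → http://www.ebi.ac.uk/cgi-bin/dbfetch?db=uniparc&id=UPI000000000A
```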

Triple number:

  • 5,595,543

Namespace number:

  • 33

Freebase Attribution

Freebase data is free for use under the CC-BY license.
