close

  
Filter options:

Freebase Commons Metaweb System Types /type

Object is not asserted on this topic.

Freebase Commons Common /common

  • The UniProt NREF (UniProt Reference Clusters) database. The two major objectives of UniRef are: (i) to facilitate sequence merging in UniProt, and (ii) to allow faster and more informative sequence similarity searches. Although the UniProt Knowledgebase is much less redundant than UniParc, it still contains a certain level of redundancy because it is not possible to use fully automatic merging without risking merging of similar sequences from different proteins. However, such automatic procedures are extremely useful in compiling the UniRef databases to obtain complete coverage of sequence space while hiding redundant sequences (but not their descriptions) from view. A high level of redundancy results in several problems, including slow database searches and long lists of similar or identical alignments that can obscure novel matches in the output. Thus, a more even sampling of sequence space is advantageous. This can be addressed by clustering closely similar sequences to yield a representative subset of sequences. Therefore, we have created various non-redundant databases with different sequence identity cut-offs. In the UniRef90 and UniRef50 databases no pair of sequences in the representative set has 90% or 50% mutual sequence identity. The UniRef100 database presents identical sequences and sub-fragments as a single entry with protein IDs, sequences, bibliography, and links to protein databases.

Freebase Commons Computers /computer

Freebase Commons Internet /internet

  • The UniProt Reference Clusters are three separate datasets that compress sequence space at different resolutions, achieved by merging sequences and sub-sequences that are 100% (UniRef100), =90% (UniRef90), or =50% (UniRef50) identical, regardless of source organism. The UniRef100 database provides the most comprehensive non-redundant coverage of the known protein sequence space including not only all of UniProtKB but also splice variants that are not separated out in these databases, as well as additional active sequences from UniParc. The UniRef90 and UniRef50 databases provide a more even sampling of sequences by reducing the numbers of closely related sequence. This speeds sequence similarity searches while rendering such searches more informative. The compression of UniRef100 into UniRef90 and UniRef50 yields size reductions of approximately 40% and 65%, respectively.

Comments

Hide