Databases of clustered and deeply annotated protein sequences and alignments
The Uniclust90, Uniclust50, and Uniclust30 databases cluster UniProtKB sequences at 90%, 50%, and 30% pairwise sequence identity, respectively. The clusters show highly consistent functional annotation, owing to an optimised clustering pipeline that runs on our MMseqs2 software for fast and sensitive protein sequence searching and clustering.
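To illustrate what clustering at a pairwise sequence identity threshold means, here is a minimal toy sketch in Python. It is not the Uniclust pipeline and does not call MMseqs2: it assumes short, pre-aligned sequences of equal length, counts identical positions, and assigns each sequence greedily to the first cluster representative that meets the threshold. The function names and the example sequences are hypothetical and chosen only for illustration.

```python
def pairwise_identity(a, b):
    """Fraction of identical positions between two equal-length sequences.

    Assumption for this toy example: sequences are already aligned and have
    the same length. Real tools compute identity from an alignment.
    """
    if len(a) != len(b):
        raise ValueError("toy example expects equal-length sequences")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(seqs, min_identity):
    """Greedy clustering: each sequence joins the first representative whose
    identity to it is at least min_identity, else it starts a new cluster.

    seqs maps a sequence name to its residues; the result maps each
    representative name to the list of member names (representative first).
    """
    clusters = {}
    for name, seq in seqs.items():
        for rep in clusters:
            if pairwise_identity(seq, seqs[rep]) >= min_identity:
                clusters[rep].append(name)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


# Hypothetical example sequences, invented for this sketch.
example = {
    "A": "MKTAYIAKQR",
    "B": "MKTAYIAKQL",  # 9 of 10 positions identical to A
    "C": "MKTGYLAKHL",  # 6 of 10 positions identical to A
    "D": "GSSPWWDNEE",  # no positions identical to A
}

for threshold in (0.9, 0.5, 0.3):
    print(threshold, greedy_cluster(example, threshold))
```

In this toy data, lowering the threshold from 0.9 to 0.5 merges the more distant sequence C into the cluster represented by A, which mirrors how the 30% clustering groups more remote relatives than the 90% clustering does.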
Uniclust sequences are annotated with matches to Pfam and SCOP domains and to proteins in the PDB. Because the domain annotation uses our sensitive HHblits homology detection tool, Uniclust has annotations for many PDB/SCOP/Pfam domains that are not annotated in UniProt/InterPro.
All databases can be downloaded here.
Mirdita M.*, von den Driesch L.*, Galiez C., Martin M. J., Söding J.#, and Steinegger M.#, Uniclust databases of clustered and deeply annotated protein sequences and alignments, Nucleic Acids Res. 2016.
*shared first authors, #corresponding authors