How to assess the quality of my proteome

Assessing the quality of a proteome is a difficult task, and the PQI score should not be treated as a definitive judgement.

1. Is my proteome already present in PQI?

Search the PQI webpage to see whether your proteome of interest is already present. We try to update PQI as often as possible, but if you cannot find your proteome, please contact us. You can use the Suggest a Proteome form if you can provide the online location of the FASTA files. If your proteome has not yet been published, or you do not wish to make it publicly available, please contact us directly.

Once you have found your proteome of interest, you can read its PQI proteome page step by step. This guide uses Nanoarchaeum equitans as the example.

2. Your proteome's local clade

The first important piece of information is the expandable "Compared to" list. This list contains the species from the local clade of the assessed proteome. All clade-based metrics compare the proteome to the other proteomes in its clade. A clade contains at least 10 proteomes, and it always contains all proteomes descending from their most recent common ancestor. If a proteome has no closely related species in the dataset, that most recent common ancestor is quite distant and thus has many (often very diverse) descendants.

Nanoarchaeum equitans is compared to 51 other species. The clade is a diverse dataset; if the proteome gets a low score in one of the clade-based metrics, it means it is very different from all the other species.
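The clade rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PQI implementation: the tree is a toy child-to-parent mapping, all species and node names are hypothetical, and the minimum size is lowered so the toy tree is large enough.

```python
# Toy tree stored as child -> parent links; every name here is hypothetical.
PARENT = {
    "sp_a": "n1", "sp_b": "n1",
    "sp_c": "n2", "n1": "n2",
    "sp_d": "root", "n2": "root",
}

def leaves_under(node, parent):
    """Return the set of leaves (proteomes) descending from node."""
    children = [c for c, p in parent.items() if p == node]
    if not children:
        return {node}
    found = set()
    for child in children:
        found |= leaves_under(child, parent)
    return found

def local_clade(proteome, parent, min_size=10):
    """Climb towards the root until the ancestor has at least min_size leaves.

    The result always contains every descendant of the chosen ancestor.
    """
    node = proteome
    clade = {proteome}
    while len(clade) < min_size and node in parent:
        node = parent[node]
        clade = leaves_under(node, parent)
    return clade

# A proteome with close relatives gets a small, tight clade.
print(sorted(local_clade("sp_a", PARENT, min_size=3)))
# A proteome with no close relatives is pushed up to a distant ancestor.
print(sorted(local_clade("sp_d", PARENT, min_size=2)))
```

In the second call the only ancestor of sp_d is the root, so its clade contains every species in the toy tree. This mirrors why an isolated proteome ends up being compared with a large and diverse set of species.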

3. The User score

At the time of writing this page the user score for this proteome was 3 stars. This probably means someone has found some problems with this proteome; check the bottom of the proteome page for any user comments.

4. The PQI score

The PQI score is the average of the scores from the automated PQI metrics. The mapping from raw metric scores to human-readable star ratings was designed so that the majority of proteomes get at least 3 stars. A score of 3 stars should be considered below average (but still reasonable); scores of 2 stars or fewer are reserved for the few real outliers.
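As a minimal sketch of the averaging step, the snippet below takes per-metric star ratings and averages them. The metric names and values are made up for illustration, and skipping metrics that were not assessed (such as a eukaryote-only metric for a non-eukaryote) is an assumption, not something this page confirms.

```python
def overall_score(metric_stars):
    """Average the star ratings of the metrics that were actually assessed.

    Metrics with a value of None are skipped (an assumption for this sketch).
    """
    assessed = [stars for stars in metric_stars.values() if stars is not None]
    if not assessed:
        return None
    return sum(assessed) / len(assessed)

# Hypothetical ratings, not real PQI output.
stars = {"metric_a": 5.0, "metric_b": 1.0, "metric_c": None, "metric_d": 4.5}
print(overall_score(stars))  # 3.5
```

The example also shows why the overall score should be read together with the individual metrics: one very low rating can hide behind several high ones.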

In the following part of this guide each metric will be described separately.

This metric is self-explanatory: an X in a sequence marks an unknown residue, so the number of X's in the proteome directly indicates how complete the sequences are. Nanoarchaeum equitans has 0 X's and gets 5 stars.
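If you want to check the X content of your own FASTA file before submitting it, a rough sketch could look like the following. The function names and the example records are hypothetical, and how PQI converts the raw count into stars is not covered here.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping record identifier -> sequence."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            # Fall back to a positional name if the header line is empty.
            header = parts[0] if parts else f"seq{len(records) + 1}"
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records

def x_fraction(records):
    """Return the fraction of residues that are X (unknown)."""
    total = sum(len(seq) for seq in records.values())
    if total == 0:
        return 0.0
    unknown = sum(seq.upper().count("X") for seq in records.values())
    return unknown / total

# Hypothetical two-protein proteome: 15 residues, 2 of them unknown.
EXAMPLE = """>prot1 hypothetical protein
MKTAYIAKQR
>prot2
MXXLV
"""

records = read_fasta(EXAMPLE)
print(f"{x_fraction(records):.1%} of residues are X")
```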

This metric gauges the popularity of the proteome and hence may indicate how extensively it has been researched. Nanoarchaeum equitans was awarded over 3 stars, which suggests it is a reasonably well-studied species.

This metric applies only to eukaryotes. It shows the proportion of the core eukaryotic sequences for which homologs can be found in the assessed proteome. Nanoarchaeum equitans is not a eukaryote and has not been assessed with this metric.

UniProt aims to serve as the database of all known protein sequences, and newly identified protein sequences are constantly submitted to it. If a high percentage of a proteome's sequences is not covered by UniProt, this may suggest that the proteome is inaccurate. Over 97% of Nanoarchaeum equitans sequences can be found in UniProt, which gives it over 4 stars.

The proportion of the proteome's sequence that is annotated with SCOP domains reflects how much of the proteome is structured protein, as opposed to disordered regions and gaps. Closely related species are likely to have a similar balance between the two.

Nanoarchaeum equitans had only 56% of its sequence annotated, which makes it a far outlier compared with the other species in the local clade. It has been assigned 1 star, which strongly points to a problem with the quality of this proteome: the measured amount of disorder and gaps is highly improbable, and this could mean that the gene calls span regions that are too large.

Comparing the proportion of the proteome's proteins that were annotated with SCOP domains to that of other proteomes from the local phylogenetic clade gives a good idea of the quality of the proteome, since SCOP domains are highly conserved units in the context of evolution. Nanoarchaeum equitans has 64% of its proteins annotated, which lies within the standard range for the local phylogenetic clade; it has been awarded over 4 stars for this metric.
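The two annotation metrics above count different things: one measures the share of residues covered by domain assignments, the other the share of proteins with at least one assignment. The sketch below makes the difference concrete. It assumes assignments are given as 1-based inclusive (start, end) regions per protein; the data layout, function names and numbers are all hypothetical.

```python
def covered_residues(regions):
    """Count residues covered by (start, end) regions, counting overlaps once."""
    covered, current_end = 0, 0
    for start, end in sorted(regions):
        # Skip the part of this region already counted by earlier regions.
        start = max(start, current_end + 1)
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered

def sequence_coverage(lengths, assignments):
    """Fraction of all residues in the proteome that lie inside a domain."""
    total = sum(lengths.values())
    if total == 0:
        return 0.0
    covered = sum(covered_residues(assignments.get(pid, [])) for pid in lengths)
    return covered / total

def fraction_with_assignment(lengths, assignments):
    """Fraction of proteins that have at least one domain assignment."""
    if not lengths:
        return 0.0
    with_hit = sum(1 for pid in lengths if assignments.get(pid))
    return with_hit / len(lengths)

# Hypothetical proteome: protein lengths and assigned domain regions.
lengths = {"p1": 100, "p2": 200, "p3": 100}
assignments = {"p1": [(1, 50), (40, 80)], "p2": [(101, 200)]}

print(sequence_coverage(lengths, assignments))         # 0.45
print(fraction_with_assignment(lengths, assignments))  # two of three proteins
```

Here two of three proteins carry an assignment, yet less than half of the residues are covered. A proteome can therefore look normal on one metric and be an outlier on the other, as in the Nanoarchaeum equitans example.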

The number of Superfamilies gives an indication of the diversity of the proteome. Nanoarchaeum equitans has been assigned only 228 Superfamilies, which is much lower than the mean number in the clade. PQI gave it a score below 2 stars for this metric, implying that this proteome is a far outlier in the distribution.

The mean sequence length metric checks if the lengths of the protein sequences in the proteome are comparable with the mean lengths of proteins in proteomes of related species. Nanoarchaeum equitans was awarded almost 5 stars because its mean protein length is very similar to the mean lengths in the other species from its clade.
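One simple way to picture a clade-based comparison like this one is a z-score: how many standard deviations the proteome lies from the clade mean. This is an illustrative stand-in only. The actual mapping PQI uses from raw values to stars is not described on this page, and the lengths below are made up.

```python
from statistics import mean, stdev

def clade_z_score(value, clade_values):
    """Distance of value from the clade mean, in clade standard deviations."""
    mu = mean(clade_values)
    sigma = stdev(clade_values)
    if sigma == 0:
        return 0.0
    return (value - mu) / sigma

# Hypothetical mean protein lengths for the other proteomes in a clade.
clade_lengths = [300, 310, 290, 305, 295]

typical = clade_z_score(302, clade_lengths)   # close to the clade mean
outlier = clade_z_score(240, clade_lengths)   # far below the clade mean

print(f"typical: {typical:+.2f}, outlier: {outlier:+.2f}")
```

A value near zero corresponds to a proteome that resembles its clade, while a large negative or positive value marks the kind of outlier that receives a low rating. In a diverse clade the spread is wide, so only very unusual proteomes stand out.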

This metric gives insight into the completeness and accuracy of the protein sequences in the proteome.

In Nanoarchaeum equitans the mean hit length is 241, which is much lower than in other proteomes of the clade. This suggests the sequences of Nanoarchaeum equitans may be fragmentary.

The number of Families gives an indication of the diversity of the proteome at the SCOP family level. Nanoarchaeum equitans has been assigned only 82 Families, which is much lower than the mean number in the clade. PQI gave it a score below 2 stars for this metric, implying that this proteome is a far outlier in the distribution.

The number of Domain Architectures is a measure of the uniqueness of the sequences that make up a proteome. Nanoarchaeum equitans has been assigned only 258 Domain Architectures, which is much lower than the mean number in the clade. PQI gave it a score below 2 stars for this metric, implying that this proteome is a far outlier in the distribution.
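The diversity counts above can be illustrated with a short sketch. It assumes that each protein comes with an ordered list of domain superfamily identifiers and that a domain architecture is that ordered list taken as a whole; both the data layout and the identifiers are hypothetical.

```python
def diversity_counts(assignments):
    """Return (number of distinct superfamilies, number of distinct architectures)."""
    superfamilies = set()
    architectures = set()
    for domains in assignments.values():
        if not domains:
            continue  # proteins without any assignment contribute nothing
        superfamilies.update(domains)
        # Order matters: a-b and b-a count as different architectures.
        architectures.add(tuple(domains))
    return len(superfamilies), len(architectures)

# Hypothetical assignments: protein -> ordered superfamily identifiers.
assignments = {
    "p1": ["sf_a", "sf_b"],
    "p2": ["sf_b", "sf_a"],
    "p3": ["sf_a", "sf_b"],
    "p4": ["sf_c"],
    "p5": [],
}

n_superfamilies, n_architectures = diversity_counts(assignments)
print(n_superfamilies, n_architectures)  # 3 3
```

Proteins p1 and p3 share one architecture, so five proteins yield only three architectures. A proteome with many missing or fragmentary genes would show low counts on all of these diversity metrics at once.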

In conclusion, some PQI metrics point to incompleteness of the Nanoarchaeum equitans proteome, while others give expected measurements. Although the low PQI scores are likely to indicate real problems with this proteome, it may also be that this organism, being parasitic, is a highly unusual member of the Tree of Life.