Written by Jonathon Briggs from the Brumer lab, Michael Smith Laboratories
Genomic data is being generated at an exponential rate as sequencing costs fall, creating information-rich resources and tools for scientists in the digital age. Large sequence databases allow researchers to gain insights across multiple species, leading to a greater understanding of gene conservation and evolution, and are used to predict function from sequence information. However, vast data sets require more powerful tools to search, highlight features of, and understand the sequence data available. A recent paper published in Nature Communications from the lab of Dr. Joerg Gsponer, in collaboration with Dr. Steven Jones of the Michael Smith Genome Sciences Centre, outlines a new method to predict the deleteriousness of mutations in human cells. The method rests on the assumption that variations observed in species closely related to humans, i.e. those that share a longer taxonomic lineage with humans, are more significant when assessing conservation than variations observed in distantly related species.
While “the concept that sequences in closely-related species are more relevant to human is not new”, explained Dr. Nawar Malhis, the lead author of the study, the conservation measures introduced in the paper are novel and the results are substantially more reliable than those of existing conservation measures. Dr. Malhis gave the example of comparing human genes to those of a goldfish and a chimpanzee to look at variations in homologous genes. Human mutations observed in the chimpanzee reference genome are more likely to be benign than those observed in the goldfish reference genome.
The novel method developed by Dr. Malhis predicts the deleteriousness of human variants in protein-coding regions based on Local Identity and Shared Taxa, hence the name LIST. Unlike existing methods, LIST takes into account the taxonomic distance between the query species (human) and the reference species. LIST results are substantially more accurate than those of existing conservation methods, and even outperform methods that take genomic annotations into account. LIST's advantage over other tools is even larger for sequences with shallow alignment depth, which are often found in proteins with intrinsically disordered regions.
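The idea of weighting an observed variant by how much taxonomy its carrier shares with human can be illustrated with a toy calculation. The sketch below is not the LIST algorithm: the function names, the simplified lineages and the scoring rule (take the closest species that carries the variant residue) are all assumptions made for illustration only.

```python
from typing import List, Tuple

# Simplified, illustrative lineages (root first). Real taxonomies have more ranks.
HUMAN = ["Eukaryota", "Metazoa", "Chordata", "Mammalia", "Primates", "Hominidae", "Homo"]
CHIMP = ["Eukaryota", "Metazoa", "Chordata", "Mammalia", "Primates", "Hominidae", "Pan"]
GOLDFISH = ["Eukaryota", "Metazoa", "Chordata", "Actinopterygii", "Cypriniformes",
            "Cyprinidae", "Carassius"]


def shared_taxa(query: List[str], other: List[str]) -> int:
    """Count the leading taxonomy ranks that two lineages have in common."""
    count = 0
    for a, b in zip(query, other):
        if a != b:
            break
        count += 1
    return count


def variant_support(
    query_lineage: List[str],
    homologs: List[Tuple[List[str], str]],
    variant: str,
) -> Tuple[float, float]:
    """Return (unweighted, taxonomy_weighted) support for `variant` at one aligned position.

    homologs: (lineage, residue) pairs, one per homologous sequence.
    unweighted: fraction of homologs carrying the variant residue.
    taxonomy_weighted: shared-lineage fraction of the closest carrier
    (a hypothetical scoring rule, not the one used by LIST).
    """
    if not homologs:
        return 0.0, 0.0
    carriers = [lineage for lineage, residue in homologs if residue == variant]
    unweighted = len(carriers) / len(homologs)
    weighted = max(
        (shared_taxa(query_lineage, lineage) / len(query_lineage) for lineage in carriers),
        default=0.0,
    )
    return unweighted, weighted


if __name__ == "__main__":
    # Same variant residue "Q", seen once in a close species and once in a distant one.
    seen_in_chimp = variant_support(HUMAN, [(CHIMP, "Q"), (GOLDFISH, "R")], "Q")
    seen_in_goldfish = variant_support(HUMAN, [(CHIMP, "R"), (GOLDFISH, "Q")], "Q")
    print("observed in chimpanzee:", seen_in_chimp)
    print("observed in goldfish:  ", seen_in_goldfish)
```

In both cases the unweighted frequency is identical, so a measure that ignores taxonomy cannot tell them apart. The weighted value is higher when the chimpanzee carries the residue, matching the intuition that a variant tolerated in a close relative is more likely to be benign in human.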
A specific example given in the paper is a known deleterious variant of human recombinase harboring an arginine-to-glutamine mutation, which has been extensively studied and is known to be associated with hereditary breast cancer. Existing tools based on conservation (PROVEAN, SIFT and EVmutation) predict this variant to be benign. LIST, however, characterizes the mutation as deleterious due to arginine variants in homologues present in species taxonomically distant from humans.
While precomputed predictions for the ~20K human Swiss-Prot protein sequences are available online, Dr. Malhis is continuing work on LIST to create an even more powerful tool for assessing the deleteriousness of mutations to an organism. An online server for LIST 2.0 is now available. LIST 2.0 differs from what is described in the paper in that it replaces the multiple sequence alignments used in LIST with much faster pairwise sequence alignments, without loss in prediction quality.
This work was supported by CIHR, NSERC, Genome Canada, Genome BC and MSFHR.