HGV2014 Meeting Report, Session 6: “UNDERSTANDING THE EVOLVING GENOME”
Caveats: I have not taken notes for every talk in every session; a lack of notes for a particular speaker does not indicate disinterest on my part, as I simply took notes for the talks that were directly related to my current work. If I have misquoted, misrepresented or misunderstood anything, and you are the speaker concerned or a member of the team involved in the work, please leave a comment on the post and I will rectify the situation accordingly.
6.1 Yves Moreau, University of Leuven, Belgium: “Variant Prioritisation by genomic data fusion”
An essential part of the prioritization process is the integration of phenotype.
Critical paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3083082/ “Exome sequencing and disease-network analysis of a single family implicate a mutation in KIF1A in hereditary spastic paraparesis”
Yves introduced “Endeavour”, which takes a gene list, matches the genes against the disease of interest and ranks them; however, this requires the phenotypic information to be ‘rich’. Two main questions need to be addressed: 1) which genes are related to a phenotype? and 2) which variants in a gene are pathogenic? Candidate gene prioritization is not a new thing and has a long history in microarray analysis. Whilst it is easy to interrogate things like pathway information, GO terms and the literature, it is much harder to find relevant expression profile information or functional annotation, and existing machine learning tools do not really support these data types.
Critical paper: http://www.ncbi.nlm.nih.gov/pubmed/16680138 “Gene prioritization through genomic data fusion.”
Critical resource: http://homes.esat.kuleuven.be/~bioiuser/endeavour/tool/endeavourweb.php
Endeavour can be trained, rank genes according to various criteria, and then merge those ranks using order statistics to produce a single ordered list.
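Endeavour's rank merging is based on order statistics. As an illustration only (not Endeavour's actual code), a Stuart-et-al.-style Q statistic over a gene's rank ratios across data sources might be sketched as:

```python
from math import factorial

def q_statistic(rank_ratios):
    """Order-statistic Q for one gene, given its rank ratio
    (rank / number of genes ranked) in each of N data sources.

    A small Q means the gene is consistently highly ranked across
    sources; genes are then re-ranked by ascending Q.
    """
    r = sorted(rank_ratios)          # r_1 <= ... <= r_N
    n = len(r)
    v = [1.0]                        # V_0 = 1
    for k in range(1, n + 1):
        # recursive formula for the joint order-statistic probability
        vk = sum((-1) ** (i - 1) * v[k - i] * r[n - k] ** i / factorial(i)
                 for i in range(1, k + 1))
        v.append(vk)
    return factorial(n) * v[n]
```

Under this statistic a gene ranked in the top 10% by every source scores far better than one ranked well by only a single source.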
Next, eXtasy was introduced: another variant prioritization tool, this time for non-synonymous variants given a specific phenotype.
Critical resource: http://homes.esat.kuleuven.be/~bioiuser/eXtasy/
Critical paper: http://www.nature.com/nmeth/journal/v10/n11/abs/nmeth.2656.html “eXtasy: variant prioritization by genomic data fusion”
eXtasy allows variants to be ranked by their effects on protein structure, association in a case/control or GWAS study, and evolutionary conservation.
The problem though is one of multiscale data integration – we might know that a megabase region is interesting through one technique, a gene is interesting by another technique, and then we need to find the variant of interest from a list of variants in that gene.
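As a toy illustration of this multiscale narrowing (all names and coordinates below are hypothetical), region, gene and variant coordinates can be intersected step by step:

```python
def variants_in_focus(region, genes, variants):
    """Narrow a coarse signal (an interesting region) through a
    mid-scale signal (candidate genes in that region) down to the
    individual variants falling inside those genes.

    region: (chrom, start, end)
    genes: {name: (chrom, start, end)}
    variants: list of (chrom, pos, variant_id)
    """
    chrom, rstart, rend = region
    # genes overlapping the region of interest
    hits = {g for g, (c, s, e) in genes.items()
            if c == chrom and s < rend and e > rstart}
    # variants falling inside any of those candidate genes
    return [v for v in variants
            if any(genes[g][0] == v[0] and genes[g][1] <= v[1] <= genes[g][2]
                   for g in hits)]
```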
They have performed HGMD-to-HPO mappings (1142 HPO terms cover the HGMD mutations). It was noted that PolyPhen and SIFT are useless for distinguishing between disease-causing variants and rare, benign ones.
eXtasy produces rankings for a VCF file by taking the trained classifier data and using a random forest approach to rank the variants. One of the underlying assumptions of this approach is that any rare variant found in the 1kG dataset is benign, as these are meant to be nominally asymptomatic individuals.
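To make the idea concrete, here is a toy sketch (emphatically not eXtasy's implementation): a miniature "random forest" of bootstrap-trained decision stumps scores candidate variants by the fraction of trees voting pathogenic, with known disease mutations as positives and rare-in-1kG variants assumed benign as negatives.

```python
import random

def train_stump(data):
    """Fit one depth-1 'tree': pick the (feature, threshold) split that
    best separates pathogenic (1) from benign (0) labels."""
    best = None
    n_features = len(data[0][0])
    for f in range(n_features):
        for x, _ in data:
            t = x[f]
            hi = [y for xi, y in data if xi[f] >= t]
            lo = [y for xi, y in data if xi[f] < t]
            # reward splits whose two sides are label-pure
            score = abs(sum(hi) - len(hi) / 2) + abs(sum(lo) - len(lo) / 2)
            if best is None or score > best[0]:
                hi_is_pathogenic = sum(hi) >= len(hi) / 2
                best = (score, f, t, hi_is_pathogenic)
    _, f, t, side = best
    return lambda x: 1 if (x[f] >= t) == side else 0

def forest_scores(train, candidates, n_trees=25, seed=0):
    """Bag stumps over bootstrap samples; a candidate's score is the
    fraction of trees voting 'pathogenic' (rank by descending score)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        boot = [train[rng.randrange(len(train))] for _ in train]
        trees.append(train_stump(boot))
    return {name: sum(t(x) for t in trees) / n_trees
            for name, x in candidates.items()}
```

The feature vectors here are hypothetical (e.g. a conservation score and a structural-impact score per variant); the real tool uses many more features.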
These approaches are integrated into NGS-Logistics, a federated analysis of variants across multiple sites, which has some similarities to the Beacon approaches discussed previously. NGS-Logistics is a project looking for test and partner sites.
Critical paper: http://genomemedicine.com/content/6/9/71/abstract
Critical resource: https://ngsl.esat.kuleuven.be
However, it is clear that what is required, as much as a perfect database of pathogenic mutations, is a database of benign ones – both local population controls for ethnicity matching, and also high-MAF variants and rare variants from asymptomatic datasets.
6.2 Aoife McLysaght, Trinity College Dublin: “Dosage Sensitive Genes in Evolution and Disease”
Aoife started by saying that most CNVs in the human genome are benign. The quality that makes a CNV pathogenic is that of gene dosage. Haploinsufficiency (where half the product != half the activity) affects about 3% of genes in a systematic study in yeast. This is going to affect certain classes of genes, for instance those where concentration-dependent effects are very important (morphogens in developmental biology, for example).
This can occur through mechanisms like a propensity towards low affinity promiscuous aggregation of protein product. Consequently the relative balance of genes can be the problem where it affects the stoichiometry of the system.
This is against the background of clear genome duplication events over the course of vertebrate evolution, which would suggest that dosage-sensitive genes should be retained through the subsequent chromosomal rearrangement and gene loss. About 20-30% of human genes can be traced back to these duplication events; they are enriched for developmental genes and members of protein complexes, and are called “ohnologs”.
What is interesting is that 60% of these are never associated with CNV events, or with deletions and duplications in healthy people, and they are highly enriched for disease genes.
Critical paper: http://www.pnas.org/content/111/1/361.full “Ohnologs are overrepresented in pathogenic copy number mutations”
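Over-representation claims of this kind are typically quantified with a one-sided Fisher's exact test on a 2x2 table (ohnolog vs. not, in pathogenic CNVs vs. not). A self-contained sketch using the hypergeometric distribution; the counts in the usage example are hypothetical, not the paper's:

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]]:
    the probability, under the hypergeometric null, of seeing an
    overlap of at least `a` in the top-left cell."""
    n = a + b + c + d
    p = 0.0
    for k in range(a, min(a + b, a + c) + 1):
        p += comb(a + b, k) * comb(c + d, (a + c) - k) / comb(n, a + c)
    return p
```

For example, `fisher_one_sided(10, 0, 0, 10)` (a perfectly enriched, made-up table) gives a p-value of about 5e-6, whereas a flat table gives a p-value near 1.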
6.3 Suganthi Balasubramanian, Yale: “Making sense of nonsense: consequence of premature termination”
Under discussion in this talk was the characterization of loss-of-function (LoF) mutations. A lot of people prefer not to use this term and would rather break such mutations down into various classes, which can include:
- Truncating nonsense SNVs
- Splice disrupting mutations
- Frameshift indels
- Large structural variations
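The classes above could be captured by a simple dispatcher; the field names, consequence terms and size threshold here are illustrative, not taken from any particular annotator:

```python
def lof_class(variant):
    """Map a variant annotation dict onto the LoF classes listed above.

    `variant` is a dict with hypothetical keys: 'consequence',
    'ref', 'alt' and (for SVs) 'sv_length'.
    """
    if variant.get("consequence") == "stop_gained":
        return "truncating nonsense SNV"
    if variant.get("consequence") in ("splice_donor", "splice_acceptor"):
        return "splice-disrupting"
    ref, alt = variant.get("ref", ""), variant.get("alt", "")
    # an indel whose length change is not a multiple of 3 shifts the frame
    if abs(len(ref) - len(alt)) % 3 != 0:
        return "frameshift indel"
    if variant.get("sv_length", 0) > 50:  # arbitrary illustrative cutoff
        return "large structural variation"
    return "not LoF"
```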
The average person carries around 100 LoF mutations, of which around a fifth are in a homozygous state.
It was commented that people trying to divine information from e.g. the 1kG datasets had to contend with lots of sequencing artefacts or annotation artefacts when assessing this.
Critical paper: http://www.sciencemag.org/content/335/6070/823 “A Systematic Survey of Loss-of-Function Variants in Human Protein-Coding Genes”
Critical resource: http://macarthurlab.org/lof/
In particular, the consequences of introducing a stop codon into a transcript are hard to predict. Some of the time the stop will be masked by splicing events or handled by nonsense-mediated decay (NMD), which means it may not be pathogenic at all.
Also, stop codons in the last exon of a gene may not be of great interest, as they are unlikely to have large effects on protein conformation.
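Both observations are often rolled into the heuristic "50-nucleotide rule": a premature stop tends to escape NMD if it falls in the last exon, or within about 50 nt upstream of the final exon-exon junction. A minimal sketch, assuming transcript (spliced mRNA) coordinates:

```python
def escapes_nmd(stop_pos, exon_ends):
    """Heuristic 50-nt rule for NMD escape.

    stop_pos: position of the premature stop in transcript coordinates.
    exon_ends: cumulative transcript positions of each exon's end
               (so exon_ends[-2] is the final exon-exon junction).
    Assumes a multi-exon transcript; this is a rule of thumb, not a
    guarantee of the biological outcome.
    """
    last_junction = exon_ends[-2]
    return stop_pos >= last_junction - 50
```

A stop deep in an internal exon would be predicted to trigger NMD, while one in the last exon (or just upstream of the final junction) would be predicted to yield a truncated protein instead.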
The ALOFT pipeline was developed to annotate loss-of-function mutations. It uses a number of resources to make predictions, including information about NMD, protein domains, gene networks (shortest path to known disease genes), evolutionary conservation scores (GERP) and dN/dS information from mouse and macaque, with a random forest approach to classification. A list of benign variants is used in the training set, including homozygous stop mutations in the 1kG dataset, which are assumed to be non-pathogenic. Dominant effects are likely to occur in haploinsufficient genes with an HGMD entry.
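One of the network features mentioned, the shortest path from a candidate gene to any known disease gene, can be sketched as a breadth-first search over a gene-interaction graph (the graph format and gene names below are illustrative, not ALOFT's internals):

```python
from collections import deque

def shortest_path_to_disease_gene(graph, start, disease_genes):
    """Breadth-first search for the shortest-path distance from `start`
    to any gene in `disease_genes`.

    graph: {gene: iterable of interacting genes} (adjacency list).
    Returns 0 if `start` is itself a disease gene, None if unreachable.
    """
    if start in disease_genes:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in disease_genes:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # no path to any disease gene
```

A small distance would then feed into the classifier as evidence that a LoF variant in that gene is more likely to matter.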