We’re looking for a new member for our Project Management Team! (now filled)

Let’s cut the preamble and let you just hit the link: http://jobs.tgac.ac.uk/Details.asp?vacancyID=11099

For those seeking an alternative career path in science, this is a fantastic opportunity to engage with external customers and our Science Faculty, as well as working closely with the lab staff and bioinformaticians in my group at TGAC.

Advert blurb:

The Genome Analysis Centre (TGAC) has an exciting opportunity within the Platforms and Pipelines Group for a Customer Liaison Officer. Working in a dynamic, high-throughput genomics facility with a focus on next-generation sequencing (NGS), the post holder will ensure that high customer service levels are achieved as TGAC's customer base grows and expands, by responding to customers in a timely manner and ensuring that project outputs are produced on time and to specification. This is an opportunity to work closely with bioinformatics and laboratory-based teams in an Agile project management environment.

We’re (not) recruiting a bioinformatician!

Note: Actually we’re not – we never recruited to this position, but keep your eyes peeled; we will be re-advertising this post in the next couple of months! (Dec 15/Jan 16)


The third of our three currently open posts is for a bioinformatician within the Platforms and Pipelines bioinformatics team. The post-holder will join three other bioinformaticians and the Bioinformatics Manager to provide bioinformatics consultation and data analysis across a number of diverse sequencing/genomics platforms and projects.

This is a great opportunity to work in a cutting-edge genomics environment and experience the incredible breadth of TGAC’s work. The bioinformatics team is closely integrated with our project management and lab teams, and as the final step in project delivery it is an important part of making sure that we deliver excellent data and analysis to our customers.

TGAC has a particular remit to work with bleeding-edge technologies, and whilst next-generation sequencing is at the core of what we do, the successful candidate will be able to experience first-hand data from new platforms as they come online.

If you’re interested, hit the link: http://jobs.tgac.ac.uk/Details.asp?vacancyID=9636


We’re recruiting a Bioinformatics Manager! (now filled)

The Platforms and Pipelines Group is looking for a leader for our bioinformatics team. This is an exciting opportunity to work in an environment with a very strong focus on genomics and computational biology. The post-holder will be critical to the development of our National Capability in Genomics, driving the transfer of cutting-edge analysis techniques from Science Faculty into production for the wider research community.

The bioinformatics team is strongly aligned with our high-throughput laboratory and as such is expected to deal with data from all commercially available sequencing platforms as well as optical mapping platforms, serving applications from exome sequencing to RNA-Seq, metagenomics, genome assembly, epigenetics and more. The post holder will also have a rare opportunity to work across a huge breadth of model and non-model organism genomes.

An eye for detail, a strong focus on QC and the ability to direct the future of a team of people to support customer requirements in a rapidly changing scientific environment are a must.

Interested? Hit the link: http://jobs.tgac.ac.uk/Details.asp?vacancyID=9559

We’re recruiting an Automation Specialist! (now filled)

The Platforms and Pipelines Group at TGAC is recruiting an Automation Specialist for our high-throughput next-generation sequencing and genomics laboratory.  The role will involve leading development of new automation protocols on a number of liquid handling systems from Perkin-Elmer, Beckman Coulter and Labcyte.

Our liquid handling systems support critical parts of our DNA extraction and NGS library preparation pipelines, and the right candidate will be someone who enjoys working with cutting-edge robotics platforms and transitioning complex lab protocols onto liquid handling systems. This is a critical post for the smooth running of the group, and the post-holder will liaise closely with vendors to train on-site and prospective users of the systems. TGAC has an exciting remit to work with cutting-edge technologies and tools, and the post holder will be expected to advise on future strategies for lab automation.

Interested? Then take a look at: http://jobs.tgac.ac.uk/Details.asp?vacancyID=9623


Snuck into the lab to take a few (blurry) pictures!

So I nipped into the lab at the end of the day to drop some samples off with Chris, and ended up getting a lesson in setting up and programming the PerkinElmer Sciclone G3 automated liquid handlers from Gawain and Fiona. On the way out I thought I’d just grab a few pictures of the toys…

So top to bottom, left to right…

Ion Proton (Life Technologies), 3xMiSeqs (Illumina)

Irys (BioNano Genomics), Genome Analyzer (Illumina, historical interest only!).

HiSeqs (mix of 2000 and 2500 – Illumina, 2000’s in the foreground), RS II (PacBio)


I left out the Argus (OpGen), which sits next to the BioNano, and the 454 FLX machines, which are also no longer in use but flank the aisle next to the bank of Illuminas.

Positions open in the Platforms & Pipelines Group at TGAC (now filled)

So my new role, which I will elaborate on later, is as the Head of the Platforms and Pipelines Group at The Genome Analysis Centre in Norwich. We have a couple of positions currently vacant in the group. For those of you interested in bioinformatics posts, stay tuned for later advertisements.

For those of you unfamiliar with TGAC, it is a UK hub for innovative bioinformatics through research, analysis and interpretation of multiple, complex data sets. It hosts one of the largest computing hardware facilities dedicated to life science research in Europe. It also has a state-of-the-art DNA sequencing facility operating multiple complementary technologies for data generation, providing the foundation for analyses that further our fundamental understanding of genomes and how they function.

The positions open are in Chris Watkins’ group, the Platforms & Pipelines Project Management Team. One is a Customer Liaison Officer (Pipelines) and the other is a Customer Liaison Officer (Sales).

Full job descriptions are here:

Customer Liaison Officer (Pipelines)

Customer Liaison Officer (Sales)

The Pipelines role is a new post, and will ensure that there are timely responses to customer enquiries and that a project’s deliverables are provided on time, to a high quality, and communicated clearly to the customer. An undergraduate degree in a relevant area is essential, and candidates with some knowledge of cutting-edge bioinformatics and genomics would be welcome.

The Sales role is also a new post; where the Pipelines role deals with customer engagement after a project is secured, the Sales role will deal with securing those projects. This means dealing with sales enquiries, issuing quotes and invoices, and working on some marketing aspects for the team. Again an undergraduate degree in a relevant area is essential, and given the complexity of the projects, those with post-graduate qualifications would be welcome.

Please circulate this to people you think might be interested!

15th International Conference on Human Genome Variation – Meeting report

Last week I was lucky enough to attend the HGV2014 meeting at the Culloden Hotel in Belfast. It was my first trip to Northern Ireland and my first attendance at an HGV meeting. The meeting is small and intimate, but had a great wide-ranging programme, and I would heartily recommend attending if you get the chance and have an interest in clinical or human genomics.

Have a look at the full programme here: http://hgvmeeting.org/

Here’s a link to my write-ups for each session  (where I had notes that I could reconstruct!):

  1. Interpreting the human variome
  2. The tractable cancer genome
  3. Phenomes, genomes and archaeomes
  4. Answering the global genomics challenge
  5. Improving our health: Time to get personal
  6. Understanding the evolving genome
  7. Next-gen ‘omics and the actionable genome



Caveats: I have not taken notes in every talk of every session, a lack of notes for a particular speaker does not constitute disinterest on my part, I simply took notes for the talks that were directly related to my current work. If I have misquoted, misrepresented or misunderstood anything, and you are the speaker concerned, or a member of the team involved in the work, please leave a comment on the post, and I will rectify the situation accordingly.

7.1    Christine Eng, Baylor College of Medicine: “Clinical Exome Sequencing for the Diagnosis of Mendelian Disorders”

Christine spoke about the pipeline for clinical WES at Baylor. Samples are sequenced to 140x mean coverage, to achieve 85% of the exome covered at >40x. A SNP array is run in conjunction with each sample, and concordance between the array and the sequencing data is tested for each sample and must exceed 99%.
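
The concordance check is essentially a per-site comparison of genotype calls from the two platforms. A minimal sketch of the idea (hypothetical data structures; Baylor's actual implementation was not shown):

```python
# Sketch of a SNP-array concordance QC check. Genotypes are represented
# here as dicts mapping site ID -> genotype string, e.g. "A/G" (assumed
# encoding, purely illustrative).

def concordance(array_gt, seq_gt):
    """Fraction of sites typed on both platforms that agree."""
    shared = set(array_gt) & set(seq_gt)
    if not shared:
        return 0.0
    agree = sum(1 for site in shared if array_gt[site] == seq_gt[site])
    return agree / len(shared)

array_calls = {"rs1": "A/G", "rs2": "C/C", "rs3": "T/T", "rs4": "G/G"}
seq_calls   = {"rs1": "A/G", "rs2": "C/C", "rs3": "T/T", "rs4": "G/A"}

print(concordance(array_calls, seq_calls))  # 3 of 4 shared sites agree -> 0.75
```

A sample would pass this QC gate only if the returned fraction exceeds 0.99; the toy sample above would fail.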

BWA is the primary mapper, but variants are called with ATLAS and annotated with Cassandra (Annovar is a dependency of Cassandra).

Critical resource: https://www.hgsc.bcm.edu/software/cassandra

Critical resource: http://sourceforge.net/projects/atlas2/

Critical paper: http://genome.cshlp.org/content/20/2/273.short “A SNP discovery method to assess variant allele probability from next-generation resequencing data”

Variants are filtered against HGMD, and filtered to those with <5% MAF. 4,000 internal clinical exomes have been run, so there is a further requirement for variants to have a <2% MAF in this internal dataset.
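
The two MAF thresholds amount to a simple per-variant filter. A sketch with made-up variant records (the real pipeline operates on annotated VCFs, not dicts like these):

```python
# Sketch of the MAF-based filtering step described above.
POPULATION_MAF_MAX = 0.05  # drop variants at >=5% MAF in population data
INTERNAL_MAF_MAX = 0.02    # drop variants at >=2% MAF in the internal exomes

def passes_maf_filters(variant):
    # a variant survives only if it is rare in both datasets
    return (variant["pop_maf"] < POPULATION_MAF_MAX
            and variant["internal_maf"] < INTERNAL_MAF_MAX)

variants = [
    {"id": "v1", "pop_maf": 0.001, "internal_maf": 0.005},  # rare everywhere: kept
    {"id": "v2", "pop_maf": 0.10,  "internal_maf": 0.08},   # common in population: dropped
    {"id": "v3", "pop_maf": 0.02,  "internal_maf": 0.03},   # common internally: dropped
]
kept = [v["id"] for v in variants if passes_maf_filters(v)]
print(kept)  # ['v1']
```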

The gene list used by the system is updated weekly, and VOUS in genes related to the disorder are reported to all patients – much more extensive reporting than from those groups who feel VOUS muddy the waters.

An expanded report can be requested in addition, which also covers deleterious mutations in genes for which there is no disease/phenotype linkage. The hit rate for molecular diagnostics via clinical exome is 25%, leaving 75% not clinically solved. These patients are then asked if they would like to opt in to a research programme so that the data can be shared and aggregated for greater diagnostic power.

11/504 cases had two distinct disorders presenting at the same time. 280 cases were autosomal dominant and 86% of the dominant cases are de novo mutations. 187 cases were autosomal recessive and this was 57% compound heterozygous, 3% UPD and 37% had homozygosity due to shared ancestry.

Many initially unsolved diagnoses can be successfully resolved on revisiting the data 6-12 months later, such is the pace of new data deposition.

They use guidelines from CPIC (from PharmGKB) and data on drug/gene interactions and there is linking to a prescription database, so the pipeline is ‘end to end’.

Critical resource: http://www.pharmgkb.org/page/cpic



6.1    Yves Moreau, University of Leuven, Belgium: “Variant Prioritisation by genomic data fusion”


An essential part of the prioritization process is the integration of phenotype.

Critical paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3083082/ “Exome sequencing and disease-network analysis of a single family implicate a mutation in KIF1A in hereditary spastic paraparesis”

Yves introduced “Endeavour”, which takes a gene list, matches it to the disease of interest and ranks the genes, but this requires phenotypic information to be ‘rich’. Two main questions need to be addressed: 1) which genes are related to a phenotype? And 2) which variants in a gene are pathogenic? Candidate gene prioritization is not a new thing, and has a long history in microarray analysis. Whilst it’s easy to interrogate things like pathway information, GO terms and literature, it is much harder to find relevant expression profile information or functional annotation, and existing machine learning tools do not really support these data types.

Critical paper: http://www.ncbi.nlm.nih.gov/pubmed/16680138 “Gene prioritization through genomic data fusion.”

Critical resource: http://homes.esat.kuleuven.be/~bioiuser/endeavour/tool/endeavourweb.php

Endeavour can be trained, rank according to various criteria, and then merge the ranks using order statistics.
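
To illustrate the shape of that rank-merging step: Endeavour’s actual merge uses order statistics, but a mean-rank merge (a deliberate simplification, with invented gene lists) shows the idea:

```python
# Simplified illustration of merging per-criterion gene rankings into one
# consensus ranking. NOT Endeavour's real algorithm - a mean-rank merge
# stands in for its order-statistics merge.

def merge_rankings(rankings):
    """rankings: list of lists, each an ordered best-first gene list."""
    genes = set(g for r in rankings for g in r)
    def mean_rank(gene):
        # a gene missing from a list is penalised with rank = list length
        return sum(r.index(gene) if gene in r else len(r)
                   for r in rankings) / len(rankings)
    return sorted(genes, key=mean_rank)

# hypothetical per-criterion rankings for three candidate genes
literature = ["KIF1A", "BRCA1", "TP53"]
expression = ["KIF1A", "TP53", "BRCA1"]
pathways   = ["TP53", "KIF1A", "BRCA1"]

print(merge_rankings([literature, expression, pathways]))
# ['KIF1A', 'TP53', 'BRCA1'] - KIF1A has the best average rank
```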

Next, eXtasy was introduced: another variant prioritization tool, for non-synonymous variants given a specific phenotype.

Critical resource: http://homes.esat.kuleuven.be/~bioiuser/eXtasy/

Critical paper: http://www.nature.com/nmeth/journal/v10/n11/abs/nmeth.2656.html “eXtasy: variant prioritization by genomic data fusion”

eXtasy allows variants to be ranked by effects on structural change in the protein, association in a case/control or GWAS study, and evolutionary conservation.

The problem though is one of multiscale data integration – we might know that a megabase region is interesting through one technique, a gene is interesting by another technique, and then we need to find the variant of interest from a list of variants in that gene.

They have performed HGMD to HPO mappings (1,142 HPO terms cover the HGMD mutations). It was noted that PolyPhen and SIFT are useless for distinguishing between disease-causing and rare but benign variants.

eXtasy produces rankings for a VCF file by taking the trained classifier data and using a random forest approach to rank. One of the underlying assumptions of this approach is that any rare variant found in the 1kG dataset is benign as they are meant to be nominally asymptomatic individuals.

These approaches are integrated into NGS-Logistics, a federated analysis of variants across multiple sites, which has some similarities to the Beacon approaches discussed previously. NGS-Logistics is a project looking for test and partner sites.

Critical paper: http://genomemedicine.com/content/6/9/71/abstract

Critical resource: https://ngsl.esat.kuleuven.be

However, it’s clear that what is required, as much as a perfect database of pathogenic mutations, is a database of benign ones – both local population controls for ethnicity matching, and high-MAF variants and rare variants in asymptomatic datasets.

6.2    Aoife McLysaght, Trinity College Dublin: “Dosage Sensitive Genes in Evolution and Disease”


Aoife started by saying that most CNVs in the human genome are benign. The quality that makes a CNV pathogenic is that of gene dosage. Haploinsufficiency (where half the product != half the activity) affects about 3% of genes in a systematic study in yeast. This is going to affect certain classes of genes, for instance those where concentration-dependent effects are very important (morphogens in developmental biology, for example).

This can occur through mechanisms like a propensity towards low affinity promiscuous aggregation of protein product. Consequently the relative balance of genes can be the problem where it affects the stoichiometry of the system.

This is against the background of clear genome duplication over the course of vertebrate evolution. This would suggest that dosage sensitive genes should be retained after subsequent genome chromosomal rearrangement and loss. About 20-30% of the genes can be traced back to these duplication events and they are enriched for developmental genes and members of protein complexes. These are called “ohnologs”

What is interesting is that 60% of these are never associated with CNV events or deletions and duplications in healthy people and they are highly enriched for disease genes.

Critical paper: http://www.pnas.org/content/111/1/361.full “Ohnologs are overrepresented in pathogenic copy number mutations”

6.3    Suganthi Balasubramanian, Yale: “Making sense of nonsense: consequence of premature termination”

Under discussion in this talk was the characterization of Loss of Function (LoF) mutations. There are a lot of people who prefer not to use this term and would rather break these mutations down into various classes, which can include:

  • Truncating nonsense SNVs
  • Splice disrupting mutations
  • Frameshift indels
  • Large structural variations

The average person carries around a hundred LoF mutations of which around 1/5th are in a homozygous state.

It was commented that people trying to divine information from e.g. 1kG datasets had to contend with lots of sequencing artefacts or annotation artefacts when assessing this.

Critical paper: http://www.sciencemag.org/content/335/6070/823 “A Systematic Survey of Loss-of-Function Variants in Human Protein-Coding Genes”

Critical resource: http://macarthurlab.org/lof/

In particular, the effects of introduced stop codons in a transcript are hard to predict. Some of the time they will be masked by splicing events or controlled by nonsense-mediated decay (NMD), which means they may not be pathogenic at all.

Also, stop codons in the last exon of a gene may not be of great interest, as they are unlikely to have large effects on protein conformation.

The ALOFT pipeline was developed to annotate loss of function mutations. This uses a number of resources to make predictions including information about NMD, protein domains, gene networks (shortest path to known disease genes) as well as evolutionary conservation scores (GERP), dn/ds information from mouse and macaque and a random forest approach to classification. A list of benign variants is used in the training set including things like homozygous stop mutations in the 1kG dataset which are assumed to be non-pathogenic. Dominant effects are likely to occur in haploinsufficient genes with an HGMD entry.



5.1    Mark Lawler, QUB, Belfast: “Personalised Cancer Medicine; Are we there yet?”

Another talk from Mark, who was an excellent chair for some conference sessions as well. One of the biggest problems with personalized medicine is that some data is already siloed, or at best fragmented.

In the UK getting science into clinical practice within the NHS is really predicated on the evidence that it reduces costs, is transformational in terms of treatment and adds value to the current system. So the bar is set quite high.

This was contrasted with the INCa Tumour Molecular Profiling Programme which is running in France with colorectal and lung cancers. This is drawing on 28 labs around Europe. INCa appears to be run under the auspices of the Institut National du Cancer.

Critical resource: http://www.e-cancer.fr/en

Mark felt that empowering patient advocacy was going to be an important drive in NHS uptake of new technologies and tests. But equally important was increasing personalized medicine literacy amongst GPs, policymakers and the insurance industry.

5.2    Nazneen Rahman, ICR, London “Implementing large-scale, high-throughput cancer predisposition genomic testing in the clinic”

Nazneen is interested in testing germline mutations, unlike much of the rest of the cancer programme, which was focused on somatic mutation detection; consequently she works with blood draws and not biopsy material.

There are >100 predisposition genes implicated in 40+ cancers and there is variable contribution depending on the mutation and the cancer type. 15% of ovarian cancers result from germline variants, and this falls to 2-3% of all cancers. For this kind of screening a negative result is just as important as a positive one.

On the NHS testing for about half these predisposition genes is already available but even basic BRAF testing is not rolled out completely so tests have ‘restricted access’.

What is really needed is more samples. Increased sample throughput drives ‘mainstreaming of cancer genetics’. And three phases need to be tested – data generation, data analysis and data interpretation.

Critical resource: http://mcgprogramme.com/

They are using a targeted panel (CAPPA – which I believe is a TruSight Cancer Panel) where every base must be covered to at least 50x, which means the mean target coverage of samples approaches 1000x, even for germline detection. There’s a requirement for a <8-week turnaround time (TAT), and positive and negative calls must be made. It was acknowledged that there will be a switch to WEX/WES ‘in time’ when it is cheap.

The lab runs rapid runs on a HiSeq 2500 at a density of 48 samples per run. This gives a capacity of 500+ samples per week (so I assume there’s more than one 2500 available!). 50ng of starting DNA is required and there is a very low failure rate. 2.5k samples have been run to date. 384 of these were for BRCA1/2. 3 samples have failed and 15 required ‘Sanger filling’.

In terms of analysis Stampy is used for the aligner and Platypus for variant calling due to its superior handling of indels. A modified version of ExomeDepth is used for CNV calling and internal development produced coverage evaluation and HGVS parsers. All pathogenic mutations are still validated with Sanger or another validation method.

Data interpretation is the bottleneck now; it’s intensive work for pathogenic variants, and VOUS are an issue – they cannot be analysed in a context-independent fashion and are ‘guilty until proven innocent’ in the clinician’s mind.

They have also performed exome sequencing of 1k samples, and observed an average of 117 variants per individual of clinical significance to cancer and 16% of the population has a rare BRCA variant.

Nazneen prefers to assume that VOUS are not implicated in advance; we should stick to reporting what is known, until such time as a previous VOUS is declared to be pathogenic in some form. But we should be able to auto-classify 95% of the obvious variants, reducing some of the interpretation burden. Any interpretation pipeline needs to be dynamic and iteratively improved, with decision trees built into the software. As such, control variant data are important; ethnic variation is a common trigger for VOUS, where the variant is not in the reference sequence but is a population-level variant for an ethnic group.

Incorporating gene level information is desirable but rarely used. For instance information about how variable a gene is would be useful in assessing whether something was likely to be pathogenic – against a background which may be highly changeable vs. one that changes little.

Although variants are generally stratified into 5 levels of significance, they really need to be collapsed down into a binary state of ‘do something’ or ‘do nothing’. A number of programs help in the classification, including SIFT, PolyPhen, MAPP, Align-GVGD, NN-Splice and MutationTaster. The report also has Google Scholar link-outs (considered to be easier to query sanely than PubMed).
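
That collapse from five tiers to a binary decision is trivial to express. A sketch (the tier names follow the common five-tier convention and are my assumption, not taken from the talk):

```python
# Sketch of collapsing a 5-tier variant classification into the binary
# 'do something'/'do nothing' state described above. Tier names are
# assumed from the usual pathogenic..benign convention.

ACTIONABLE = {"pathogenic", "likely_pathogenic"}

def action(tier):
    return "do something" if tier in ACTIONABLE else "do nothing"

for tier in ["pathogenic", "likely_pathogenic", "VOUS",
             "likely_benign", "benign"]:
    print(tier, "->", action(tier))
# only the first two tiers map to 'do something'
```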

To speed analysis all the tools are used to precompute scores for every base substitution possible in the panel design.
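
Precomputing in this way just means enumerating the three possible alternative alleles at every base of the panel and storing the tool outputs in a lookup table. A sketch with a placeholder scoring function (in reality each predictor – SIFT, PolyPhen and so on – would be invoked here):

```python
# Sketch of precomputing annotation scores for every possible base
# substitution across a panel region. score_variant is a stand-in for
# the real annotation tools; the coordinates below are arbitrary.

BASES = "ACGT"

def score_variant(chrom, pos, ref, alt):
    # placeholder: a real pipeline would call out to each predictor here
    return {"sift": None, "polyphen": None}

def precompute(chrom, start, ref_seq):
    table = {}
    for offset, ref in enumerate(ref_seq):
        pos = start + offset
        for alt in BASES:
            if alt != ref:  # only true substitutions
                table[(chrom, pos, ref, alt)] = score_variant(chrom, pos, ref, alt)
    return table

lookup = precompute("chr17", 41196312, "ATG")
print(len(lookup))  # 3 bases x 3 alternative alleles = 9 entries
```

At query time, classifying a panel variant then becomes a dictionary lookup rather than a live run of each tool.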

5.3    Timothy Caulfield, University of Alberta, Canada: “Marketing the Myth of Personalised Prevention in the Age of Genomics”

No notes here, but an honorable mention for Tim, who gave what was easily the most entertaining talk of the conference, focusing on the misappropriation of genomics health by the snake oil industries of genomic-matched dating, genomic-influenced exercise regimes and variant-led diets. He also asked the dangerous question: if you 1) eat healthily, 2) don’t smoke, 3) drink in moderation and 4) exercise, is there really any value in personalized medicine except for a few edge cases? Health advice hasn’t changed much in decades, and people still live unhealthily. You won’t change this by offering them a genetic test and asking them to modify their behavior. If you ever have a chance to see Tim speak, it’s worth attending. He asked for a show of hands of who had done 23andMe. Quite shockingly for a genetics conference, only 3 people had their hands in the air: myself, Tim and one of the other speakers.
