Deanna Church, 10x Genomics: Interview Q&A

Q: First off, congratulations on your new job at 10x Genomics. Can you tell us more about your role there?

A: Thank you! It is great to be at a cutting-edge technology company. My role at 10x is to lead the applications group. It is an exciting opportunity because it means my group gets to take the technology being developed at 10x and explore what scientific questions it can address. This means getting to work with great external scientists as well as doing our own work internally. I enjoy being able to engage with leading scientists and learn new things all the time. While I still love to think about how we can improve and use the genome, I’m learning about the world of single cell analysis and now spend a lot of time thinking about the exciting new scientific questions this technology enables.

Q: What do we need to consider when analyzing an individual genome?

A: The first thing we should consider is that we carry two haplotypes, that is, two copies of each chromosome, not just one. When these two haplotypes are very different, such as when structural variation is present, analyzing these regions with traditional approaches will often fail. Most traditional analysis approaches try to average across both haplotypes, which works reasonably well when the two copies are almost identical but fails badly as the two haplotypes diverge.

The other thing we need to think about for individual genome analysis, especially in a clinical context, is that identifying rare variants is much more important than finding common ones. We want to find the things we don’t see in everyone, because those may be important to understanding a phenotype. It is important to remember that these rare variants can be more than just single-nucleotide changes (SNPs). The ability to identify a variety of variant types is critical for these N-of-one experiments. This may be true for population-based studies as well. A preprint from Harvard’s Gaurav Bhatia and colleagues last year provided evidence that haplotype-based analysis may have more power than looking at individual SNPs when studying complex disease.

Q: Why is there a need to improve our reference model and genome analysis? What are some of the major pitfalls of the current approach?

A: Absolutely. The initial model used by the Human Genome Project (HGP) involved producing a haploid consensus assembly. The notion was that the reference assembly only needed to represent a single haplotype. This model arose because it was thought that differences between human haplotypes were small and consisted largely of single-nucleotide differences. We now know that two haplotypes can differ much more than that, and in some cases creating a consensus is not possible. The Genome Reference Consortium (GRC) started providing additional haplotype representations for regions of the genome where we know this kind of diversity exists. However, this approach does not scale, and few tools can take full advantage of this information, even though there is evidence that using all the available reference assembly data will improve your analysis. Work from Benedict Paten, Alex Dilthey, Erik Garrison and others is starting to lay the groundwork for a new model of reference-based genome analysis. This involves combining the reference assembly and known variation into a single graph structure. I’m enthusiastic about this approach, but a lot of work still needs to happen before it becomes a reality.

Q: You have mentioned in the past that genome analysis is not just sequencing an individual, aligning the reads to a reference genome and coming up with a list of variants that define that genome. Rather, we should perform a de novo assembly of that genome. Why is this important?

A: De novo assembly of individual genomes provides an unbiased assessment of the genome. Reference-based analysis fails in regions where the individual genome differs substantially from the reference assembly. These differences may be common in the population, just not represented in the reference. Reads from these regions may not align to the reference assembly at all, or they will often align to related but not identical sequences. These off-target alignments can lead to errors in downstream analysis. As genomes become more different from the reference, a situation common in cancer, reference-based analysis becomes increasingly difficult. Importantly, reconstructing individual haplotypes, rather than consensus sequences, provides the most detailed picture of an individual genome.

Q: What is the advantage of 10x Genomics technology when mapping the human genome?

A: The long-range information provided by 10x Linked-Read data is critical for reconstructing individual haplotypes. We can partition long molecules and then tag the reads that are derived from these molecules. Because we only lightly sequence each molecule, we can obtain a high level of physical coverage across the genome while doing roughly the usual amount of sequencing. For example, at any given location in the genome we might have an average of 30 sequence reads covering that location, but with 10x Linked-Reads we could have 150 molecules that cover that location. Having information from more molecules provides long-range data and allows us to reconstruct individual haplotypes. Additionally, the molecular barcoding allows us to better resolve closely related sequences that are typically impossible to analyze with traditional short-read sequencing approaches. Together, these improve individual genome analysis.
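To make the coverage arithmetic in this answer concrete, here is a minimal back-of-envelope sketch in Python. It uses the standard average-depth relation (number of units x unit length / genome size) and takes the 30x sequence coverage and 150x physical coverage from the example above. The genome size, read length, and molecule length are illustrative assumptions, not 10x specifications, and the function names are hypothetical.

```python
"""Sequence coverage vs physical (molecule) coverage: a back-of-envelope sketch.

All constants below are illustrative assumptions, not 10x specifications.
"""

GENOME_BP = 3.0e9      # assumed approximate haploid human genome size
READ_BP = 150          # assumed short-read length
MOLECULE_BP = 50_000   # hypothetical average input molecule length


def depth(count: float, length_bp: float, genome_bp: float = GENOME_BP) -> float:
    """Average depth: total bases (count * length) divided by genome size."""
    return count * length_bp / genome_bp


def count_for_depth(target_depth: float, length_bp: float,
                    genome_bp: float = GENOME_BP) -> float:
    """Number of units of a given length needed to reach a target average depth."""
    return target_depth * genome_bp / length_bp


def linked_read_budget(seq_cov: float, phys_cov: float) -> dict:
    """Hypothetical helper: how reads are spread over molecules for given coverages."""
    n_reads = count_for_depth(seq_cov, READ_BP)            # reads needed for seq_cov
    n_molecules = count_for_depth(phys_cov, MOLECULE_BP)   # molecules needed for phys_cov
    reads_per_molecule = n_reads / n_molecules
    # Average depth of read coverage within a single molecule ("light" sequencing).
    per_molecule_depth = reads_per_molecule * READ_BP / MOLECULE_BP
    return {
        "reads": n_reads,
        "molecules": n_molecules,
        "reads_per_molecule": reads_per_molecule,
        "per_molecule_depth": per_molecule_depth,
    }


if __name__ == "__main__":
    budget = linked_read_budget(seq_cov=30, phys_cov=150)
    for key, value in budget.items():
        print(f"{key}: {value:,.2f}")
```

With these assumed values, each molecule receives about 67 reads, or roughly 0.2x depth per molecule, which is one way to read "lightly sequence each molecule". Per-molecule depth works out to sequence coverage divided by physical coverage (30 / 150), so it does not depend on the assumed molecule length; the molecule length only changes how many molecules are needed.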

Q: Is there anything else you would like to share with this audience?

A: The advances in genome analysis and personalized medicine over the past decade are important and should motivate us to keep moving forward in this area. We’ve seen a glimpse of what we can do, but we need to continue improving our technology and genome analysis tools. We also need to think about how to move beyond looking at just the genome to gain a better understanding of development and disease.