Diversity in Clinical Genetics Remains Poorly Defined

Diversity in clinical genetics remains poorly defined, but the Clinical Genome Resource (ClinGen) Ancestry and Diversity Working Group is working to address this important issue. In clinical genetics and genomics, many approaches depend on the ability to identify genetic variation that appears to be non-randomly distributed in a population. However, genetic variation often clusters in ways that reflect how people’s ancestors were grouped together. These historical associations are often summarized by the terms race, ethnicity, and ancestry, but what these terms mean, both semantically and biologically, is still very unclear. The paper below provides no clear solutions but is an excellent introduction to and discussion of this problem and the challenges we face in addressing it.

Clinical Genetics Lacks Standard Definitions and Protocols for the Collection and Use of Diversity Measures

Abstract

Genetics researchers and clinical professionals rely on diversity measures such as race, ethnicity, and ancestry (REA) to stratify study participants and patients for a variety of applications in research and precision medicine. However, there are no comprehensive, widely accepted standards or guidelines for collecting and using such data in clinical genetics practice. Two NIH-funded research consortia, the Clinical Genome Resource (ClinGen) and Clinical Sequencing Evidence-generating Research (CSER), have partnered to address this issue and report how REA are currently collected, conceptualized, and used. Surveying clinical genetics professionals and researchers (n = 448), we found heterogeneity in the way REA are perceived, defined, and measured, with variation in the perceived importance of REA in both clinical and research settings. The majority of respondents (>55%) felt that REA are at least somewhat important for clinical variant interpretation, ordering genetic tests, and communicating results to patients. However, there was no consensus on the relevance of REA, including how each of these measures should be used in different scenarios and what information they can convey in the context of human genetics. A lack of common definitions and applications of REA across the precision medicine pipeline may contribute to inconsistencies in data collection, missing or inaccurate classifications, and misleading or inconclusive results. Thus, our findings support the need for standardization and harmonization of REA data collection and use in clinical genetics and precision health research.

READ MORE …

Here’s a link to the group that published this paper: https://www.clinicalgenome.org/working-groups/ancestry/

Five myths about measurement error in epidemiological research

This isn’t a genomics paper but genomics and epidemiology are often natural bedfellows—and we all should be taking epidemiology a little more seriously these days.

Abstract

Epidemiologists are often confronted with datasets to analyse which contain measurement error due to, for instance, mistaken data entries, inaccurate recordings and measurement instrument or procedural errors. If the effect of measurement error is misjudged, the data analyses are hampered and the validity of the study’s inferences may be affected. In this paper, we describe five myths that contribute to misjudgments about measurement error, regarding expected structure, impact and solutions to mitigate the problems resulting from mismeasurements. The aim is to clarify these measurement error misconceptions. We show that the influence of measurement error in an epidemiological data analysis can play out in ways that go beyond simple heuristics, such as heuristics about whether or not to expect attenuation of the effect estimates. Whereas we encourage epidemiologists to deliberate about the structure and potential impact of measurement error in their analyses, we also recommend exercising restraint when making claims about the magnitude or even direction of effect of measurement error if not accompanied by statistical measurement error corrections or quantitative bias analysis. Suggestions for alleviating the problems or investigating the structure and magnitude of measurement error are given.
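The attenuation heuristic mentioned in the abstract can be made concrete with a small simulation. The sketch below is not from the paper; it assumes the simplest textbook case (a single exposure measured with classical, independent random error, with all variances set to 1 for illustration), which is exactly the kind of setting the authors warn does not generalize to every analysis.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


rng = random.Random(42)   # fixed seed so the run is repeatable
n = 20000
true_slope = 2.0          # illustrative value, not taken from the paper

# True exposure, and an outcome that depends on it linearly plus noise.
exposure = [rng.gauss(0, 1) for _ in range(n)]
outcome = [true_slope * x + rng.gauss(0, 1) for x in exposure]

# Mismeasured exposure: classical error, independent of everything else.
measured = [x + rng.gauss(0, 1) for x in exposure]

slope_clean = ols_slope(exposure, outcome)
slope_noisy = ols_slope(measured, outcome)

print(f"slope using true exposure:     {slope_clean:.2f}")
print(f"slope using measured exposure: {slope_noisy:.2f}")
```

With equal variance in the exposure and in the error, the estimated slope is expected to shrink to roughly half its true value. Under other error structures (for example, error in a confounder or error correlated with the outcome) the bias can go in either direction, which is the paper's point about not relying on this heuristic alone.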


Thank you for reading!

Germline-restricted chromosome (GRC) is widespread among songbirds

Significance

We discovered that contrary to other bird species and most other animals, all examined songbird lineages contain a different number of chromosomes in the somatic and germline genomes. Their germ cells have an additional germline-restricted chromosome (GRC). GRCs contain highly duplicated genetic material represented by repetitive elements and sequences homologous to unique regions of the somatic genome. Surprisingly, GRCs even in very closely related species, vary drastically in size and show little homology. We hypothesize that the GRC was formed as an additional parasitic microchromosome in the songbird ancestor about 35 million years ago and subsequently underwent significant changes in size and genetic content, becoming an important component of the germline genome.

Abstract

An unusual supernumerary chromosome has been reported for two related avian species, the zebra and Bengalese finches. This large, germline-restricted chromosome (GRC) is eliminated from somatic cells and spermatids and transmitted via oocytes only. Its origin, distribution among avian lineages, and function were mostly unknown so far. Using immunolocalization of key meiotic proteins, we found that GRCs of varying size and genetic content are present in all 16 songbird species investigated and absent from germline genomes of all eight examined bird species from other avian orders. Results of fluorescent in situ hybridization of microdissected GRC probes and their sequencing indicate that GRCs show little homology between songbird species and contain a variety of repetitive elements and unique sequences with paralogs in the somatic genome. Our data suggest that the GRC evolved in the common ancestor of all songbirds and underwent significant changes in the extant descendant lineages.

Suggestions for Science Communications

Hyped-up science erodes trust. Here’s how researchers can fight back.

Science is often poorly communicated. Researchers can fight back.

By Brian Resnick, Vox

In 2018, psychology PhD student William McAuliffe co-published a paper in the prestigious journal Nature Human Behavior. The study’s conclusion — that people become less generous over time when they make decisions in an environment where they don’t know or interact with other people — was fairly nuanced.

But the university’s press department, perhaps in an attempt to make the study more attractive to news outlets, amped up the finding. The headline of the press release heralding the publication of the study read “Is big-city living eroding our nice instinct?”

From there, the study took on a new life as stories in the press appeared with headlines like “City life makes humans less kind to strangers.”

This interpretation wasn’t correct: The study was conducted in a lab, not a city. And it measured investing money, not overall kindness.


READ MORE…


A Primer on Cancer from the NIH National Cancer Institute

Here is an excellent resource for understanding and explaining cancer, directly from the National Institutes of Health National Cancer Institute.

A dividing breast cancer cell. Credit: National Cancer Institute / Univ. of Pittsburgh Cancer Institute

What is Cancer? A Collection of Related Diseases

Cancer is the name given to a collection of related diseases. In all types of cancer, some of the body’s cells begin to divide without stopping and spread into surrounding tissues. Cancer can start almost anywhere in the human body, which is made up of trillions of cells. Normally, human cells grow and divide to form new cells as the body needs them. When cells grow old or become damaged, they die, and new cells take their place.

When cancer develops, however, this orderly process breaks down. As cells become more and more abnormal, old or damaged cells survive when they should die, and new cells form when they are not needed. These extra cells can divide without stopping and may form growths called tumors.

Many cancers form solid tumors, which are masses of tissue. Cancers of the blood, such as leukemias, generally do not form solid tumors.

Cancerous tumors are malignant, which means they can spread into, or invade, nearby tissues. In addition, as these tumors grow, some cancer cells can break off and travel to distant places in the body through the blood or the lymph system and form new tumors far from the original tumor.

Unlike malignant tumors, benign tumors do not spread into, or invade, nearby tissues. Benign tumors can sometimes be quite large, however. When removed, they usually don’t grow back, whereas malignant tumors sometimes do. Unlike most benign tumors elsewhere in the body, benign brain tumors can be life threatening.

Differences between Cancer Cells and Normal Cells

Cancer cells differ from normal cells in many ways that allow them to grow out of control and become invasive. One important difference is that cancer cells are less specialized than normal cells. That is, whereas normal cells mature into very distinct cell types with specific functions, cancer cells do not. This is one reason that, unlike normal cells, cancer cells continue to divide without stopping.


READ MORE …

Evolution of Cancer Cell Chromosomes Visualized using Organoids

This is free art suggesting “mutation”.

Tumor cell populations are full of mutations that provide genetic diversity, which can allow some cells to survive chemotherapeutic agents. A series of single nucleotide changes may lead to tumor growth, but the most dramatic changes in genome composition, and the largest increases in genetic variation, occur after tumor cells begin replicating without regard for their chromosome number and composition. This chromosomal instability creates variation in tumors, allowing the most aggressive subpopulations to proliferate and generating a diverse pool of genotypes, all of which need to be wiped out if the cancer is to be eradicated. While it has long been possible to identify the effects of chromosomal instability (widespread aneuploidy), studying the mechanisms directly has been difficult because the amount of genome sampling and karyotyping (chromosome imaging) that was possible was small compared to the amount of change in tumors.

A recent paper by Bolhaqueiro et al. describes a technological advance for studying chromosome instability that involves genetically engineering cancer cells to express fluorescent proteins that label chromosomes and culturing those cells into organoids, 3D clusters that more accurately mimic how cells grow in vivo than cells in a flat culture. This approach allows rapid single cell karyotyping and imaging of chromosome behavior during cell division. Paired with single cell sequencing, the direct study of chromosome instability in organoids may have broad applicability. Cancer cells all start with a fairly similar toolkit and undergo a finite number of replications, so with sufficient study more of their vulnerabilities become apparent and possible to target.

Below is the introduction and link to a longer summary of the Bolhaqueiro article; the original article is interesting but longer, geared towards an expert audience, and behind a paywall.

Watching cancer cells evolve through chromosomal instability

Chromosomal abnormalities are a hallmark of many types of human cancer, but it has been difficult to observe such changes in living cells and to study how they arise. Progress is now being made on this front.

Sarah C. Johnson & Sarah E. McClelland (2019) Nature

The genomes of cancer cells are littered with mutations (errors in individual nucleotides), some of which might contribute to growth of the cancer by activating tumour-promoting genes called oncogenes, or by switching off genes belonging to a class known as tumour suppressors, which fight cancer. Yet, arguably even more important are the genomic abnormalities that occur in tumour cells on a much larger scale. For example, such a cell might contain anomalous numbers of entire chromosomes (a situation termed aneuploidy). As the tumour evolves, chromosomal abnormalities can vary between neighbouring cancer cells. This suggests that chromosomal changes can occur by repeated chromosomal ‘shuffling’ during each cell division, resulting in a high rate of genomic change, termed chromosomal instability.

READ MORE …

You are a collection of clones

Every cell in your body has almost the same DNA as all the others. Mutations accrue with each cell division, so swaths of tissue can be traced back to the same parent cell because they all share the same distinguishing mutations. These patches of more closely related cells are clones, specifically “somatic clones” because they are all in the same body. The importance of clonal variation within individuals is becoming more explicitly recognized, but we have been worrying about clones for a long time: cancer is what we call the unregulated growth of somatic clones. Detecting somatic clones before they become diseased is getting easier, and we are beginning to understand the processes that help keep abnormal clones in line. Hopefully this shift in perspective will help medicine advance and help people to live longer and healthier lives.

For a good introduction to somatic mosaicism, written for a very broad audience, there is an excellent article in the New York Times by Carl Zimmer.

For more advanced reading, here is a recent article from Science describing how gene expression analyses were used to show that large clonal populations are spread across essentially all normal healthy tissues, with the most mutations found in the skin, lung, and esophagus, potentially explaining why these tissues are so prone to cancer.

Yizhak et al. (2019) RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science

Here is a review article on how neighboring cells with different mutations compete with each other and how this kind of battle contributes to cancer.

Di Gregorio et al. (2016) Cell Competition and Its Role in the Regulation of Cell Fitness from Development to Cancer. Developmental Cell

Transmitting Learned Behavior to Future Generations (in Worms)

Parents pass along DNA, RNA, proteins and all the contents of cytoplasm directly to their offspring, but not memories. So parental experiences don’t directly inform the behavior of their children… unless those children are roundworms. Two recent papers describe mechanisms that allow parental experiences to be transmitted to their offspring, sometimes for several generations, in the roundworm Caenorhabditis elegans (C. elegans; nobody says the full name). In both cases transgenerational behavior effects were dependent on small RNA populations, adding another function to the already diverse set of roles that RNA can play. There is no evidence that other species, especially large complex ones like humans, exhibit anything like these mechanisms, but these papers have opened a legitimate avenue for investigating whether similar pathways exist in other animals and how the experiences of parents might directly affect the behavior of their offspring.

Longer summaries can be found in Neuroscience News and Science Daily, and summaries of and links to the two original articles are listed below. Neither article is easy reading but the summaries are pretty good.

Posner et al. (2019) Neuronal Small RNAs Control Behavior Transgenerationally. Cell

Summary

It is unknown whether the activity of the nervous system can be inherited. In Caenorhabditis elegans nematodes, parental responses can transmit heritable small RNAs that regulate gene expression transgenerationally. In this study, we show that a neuronal process can impact the next generations. Neurons-specific synthesis of RDE-4-dependent small RNAs regulates germline amplified endogenous small interfering RNAs (siRNAs) and germline gene expression for multiple generations. Further, the production of small RNAs in neurons controls the chemotaxis behavior of the progeny for at least three generations via the germline Argonaute HRDE-1. Among the targets of these small RNAs, we identified the conserved gene saeg-2, which is transgenerationally downregulated in the germline. Silencing of saeg-2 following neuronal small RNA biogenesis is required for chemotaxis under stress. Thus, we propose a small-RNA-based mechanism for communication of neuronal processes transgenerationally.

Moore et al. (2019) Piwi/PRG-1 Argonaute and TGF-β Mediate Transgenerational Learned Pathogenic Avoidance. Cell

Summary

The ability to inherit learned information from parents could be evolutionarily beneficial, enabling progeny to better survive dangerous conditions. We discovered that, after C. elegans have learned to avoid the pathogenic bacteria Pseudomonas aeruginosa (PA14), they pass this learned behavior on to their progeny, through either the male or female germline, persisting through the fourth generation. Expression of the TGF-β ligand DAF-7 in the ASI sensory neurons correlates with and is required for this transgenerational avoidance behavior. Additionally, the Piwi Argonaute homolog PRG-1 and its downstream molecular components are required for transgenerational inheritance of both avoidance behavior and ASI daf-7 expression. Animals whose parents have learned to avoid PA14 display a PA14 avoidance-based survival advantage that is also prg-1 dependent, suggesting an adaptive response. Transgenerational epigenetic inheritance of pathogenic learning may optimize progeny decisions to increase survival in fluctuating environmental conditions.


Linking How Horses Run to Their Alleles

A paper in PLoS Genetics has identified a set of genetic variants that clearly distinguish Standardbred horses that pace (running with the two legs on the same side moving together) from those that trot (opposite front and back legs moving together). Though no physiological role has yet been demonstrated for these mutations, they appear to be good candidates for connecting single nucleotide changes to discrete and clearly recognizable inherited differences in behavior, and maybe a step towards understanding instincts.

McCoy, et al. (2019) Identification and validation of genetic variants predictive of gait in standardbred horses. PLoS Genetics

Author summary

Certain horse breeds have been developed over generations specifically for the ability to perform alternative patterns of movement, or gaits. Current understanding of the genetic basis for these gaits is limited to one known mutation apparently necessary, but not sufficient, for explaining variability in “gaitedness.” The Standardbred breed includes two distinct groups, trotters, which exhibit a two-beat gait in which the opposite forelimb and hind limb move together, and pacers, which exhibit an alternative two-beat gait where the legs on the same side of the body move together. Our long-term objective is to identify variants underlying the ability of certain Standardbreds to pace. In this study, we were able to identify several regions of the genome highly associated with pacing and, within these regions, a number of specific highly associated variants. Although the biological function of these variants has yet to be determined, we developed a model based on seven variants that was > 99% accurate in predicting whether an individual was a pacer or a trotter in two independent populations. This predictive model can be used by horse owners to make breeding and training decisions related to this economically important trait, and by scientists interested in understanding the biology of coordinated gait development.

READ MORE …

Coming to America: New Studies Continue to Complicate the Story

Here’s an interesting summary in the Smithsonian covering a couple of recent articles concerning the first people who settled the Americas. Well worth the read.

Ancient DNA Reveals Complex Story of Human Migration Between Siberia and North America

By Brian Handwerk

smithsonian.com 
June 5, 2019

There is plenty of evidence to suggest that humans migrated to the North American continent via Beringia, a land mass that once bridged the sea between what is now Siberia and Alaska. But exactly who crossed, or re-crossed, and who survived as ancestors of today’s Native Americans has been a matter of long debate.

READ MORE …

The Limits of What DNA Can Predict

Want remarkably clear insights into genetics and public health with a bare minimum of reading? Well, some corners of Twitter have recently become an incredible resource if you’re interested in learning something about predictive statistics, epidemiology, genomics, and population genetics. There are no better examples of this than the tweetorials that Dr. Cecile Janssens posts. Dr. Janssens is a professor of translational epidemiology in the Department of Epidemiology of the Rollins School of Public Health, Emory University, and her website, like her posts, contains insightful guides for thinking critically about DNA sequence data, heritability and health.

If you would like some key insights into predicting complex traits from DNA in a handful of tweets, follow this link: Why it is so hard to predict complex diseases and traits from DNA?

For a slightly longer read, here’s her article from WIRED on how DNA is best applied: DNA tells great stories -- about the past, not future

And a more advanced read, still aimed at a fairly general audience: Designing babies through gene editing: science or science fiction?

Humans and Domesticated Animals Got High the Same Way, Evolutionarily Speaking

Convergent evolution, when two separate groups develop similar traits in response to the same environmental factors, is one of the clearest indicators of adaptation. Think of birds and bats, separate groups that have wings adapted for flight. Convergent evolution at the molecular level can be inferred when consistent changes are seen in the same genes in different populations that have encountered similar changes in environment or selective pressure. Witt and Huerta-Sánchez have just published an exciting review article documenting how specific pathways and genes are repeatedly mutated in human and animal populations as they evolved to live in high-altitude, low-oxygen conditions in three regions: Asia (the Tibetan Plateau), Africa (the Ethiopian Highlands) and South America (the Andean Altiplano). While this story is just plain interesting for its own sake, it’s also a great illustration of how understanding evolutionary history can yield powerful insights into the adaptive fraction of our genomes. This is just one of several interesting and insightful articles published in this edition of Philosophical Transactions of the Royal Society B, a theme issue on ‘Convergent evolution in the genomics era: new insights and directions’.

Witt & Huerta-Sánchez (2019) Convergent evolution in human and domesticate adaptation to high-altitude environments. Philosophical Transactions of the Royal Society B

Abstract

Humans and their domestic animals have lived and thrived in high-altitude environments worldwide for thousands of years. These populations have developed a number of adaptations to survive in a hypoxic environment, and several genomic studies have been conducted to identify the genes that drive these adaptations. Here, we discuss the various adaptations and genetic variants that have been identified as adaptive in human and domestic animal populations and the ways in which convergent evolution has occurred as these populations have adapted to high-altitude environments. We found that human and domesticate populations have adapted to hypoxic environments in similar ways. Specific genes and biological pathways have been involved in high-altitude adaptation for multiple populations, although the specific variants differ between populations. Additionally, we found that the gene EPAS1 is often a target of selection in hypoxic environments and has been involved in multiple adaptive introgression events. High-altitude environments exert strong selective pressures, and human and animal populations have evolved in convergent ways to cope with a chronic lack of oxygen.


READ MORE …

A Boat Load of Genomes -- Saving Species Sequences

An excellent overview of why and how whole genome sequencing projects are moving to record as many species’ genomes as possible is now available for free at Lab Animal, a Nature Research journal covering in vivo studies. It provides excellent coverage of the technical advances and approaches making these efforts possible and of one of the cutting-edge campaigns, the Vertebrate Genomes Project, while still remaining clear and accessible for the average reader. Enjoy!

Michael Eisenstein (2019) Building an Annotated Arc. Lab Animal

Rapid evolution in hardware and software for DNA analysis and falling costs per experiment are making it easier for scientists to prospect the genomes of classic model organisms as well as novel species that intrigue them. Some groups are using this approach to explore biomedical questions in species with characteristics that parallel human traits, as seen with studies of cancer and behavioral disorders in domesticated dogs or vocal communication in songbirds. Others are studying species with unusual features that might nevertheless prove beneficial to human health, such as long-lived but cancer- and virus-resistant bats or the highly regenerative axolotl.

READ MORE …

Getting Genome Annotation Right: A Refreshing Criticism

Next-generation genome annotation: we still struggle to get it right

by Steven L. Salzberg, Genome Biology, 2019

Abstract

While the genome sequencing revolution has led to the sequencing and assembly of many thousands of new genomes, genome annotation still uses very nearly the same technology that we have used for the past two decades. The sheer number of genomes necessitates the use of fully automated procedures for annotation, but errors in annotation are just as prevalent as they were in the past, if not more so. How are we to solve this growing problem?

How to Train your Genomics Models

First open resource hosts trained machine-learning genomics models to facilitate their use and exchange

A powerful new resource, one that is actually a new kind of resource, has come online and, hopefully, will help accelerate advances in genomics and the fight against many types of disease. The scale of genome data is so large that computational tools are required for every major step of acquiring, organizing, and analyzing genomes. Generating useful models from large genomic datasets, the kind you generate when studying human disease, is often difficult and time-consuming, and many aspects of this work are now being automated using various types of machine learning approaches. Machine learning in this context can be roughly summarized as using computers to generate and evaluate huge numbers of statistical models in order to clarify relationships in datasets. To do this, the machine learning program needs to train on useful datasets. So for many cutting-edge applications, the program doesn’t just need to be written but also trained, and this second step can require large amounts of time and computational resources. Until now, that cost has made it less likely that these programs would be shared and applied more broadly. The Kipoi repository is the first open resource for trained machine learning models in genomics, making cutting-edge approaches available to clinicians and smaller labs. This resource is sure to speed the application of, and innovation in, machine-learning-based genomics approaches, and hopefully we will all benefit from this new site for the free exchange of ideas.
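To make the distinction between writing a program and training it concrete, here is a minimal sketch. It does not use Kipoi or any model from the repository; it assumes a deliberately tiny model (a position weight matrix fitted to a handful of made-up DNA sites) and uses JSON as a stand-in for a shared model file. The function names and the example sequences are hypothetical.

```python
import json
import math

BASES = "ACGT"


def train_pwm(sites, pseudocount=1.0):
    """Fit a log-odds position weight matrix from aligned, equal-length sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-odds of each base against a uniform 0.25 background.
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pwm


def score(pwm, seq):
    """Sum the per-position log-odds for a sequence of the same length."""
    return sum(column[base] for column, base in zip(pwm, seq))


# "Training" step: done once, by whoever has the data and the compute.
training_sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT", "TATAAT"]
model = train_pwm(training_sites)

# "Sharing" step: the trained parameters are written out as plain text...
serialized = json.dumps(model)

# ...and someone else can load and apply them without retraining.
reloaded = json.loads(serialized)
print(score(reloaded, "TATAAT") > score(reloaded, "GCGCGC"))
```

Real genomics models have millions of parameters and take far longer to train, which is why a common place to deposit and retrieve the trained result matters.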

For more information, here’s a nice summary from Technology Networks.

Here is the introduction from the original article, published in Nature Biotechnology.

Advances in machine learning, coupled with rapidly growing genome sequencing and molecular profiling datasets, are catalyzing progress in genomics [1]. In particular, predictive machine learning models, which are mathematical functions trained to map input data to output values, have found widespread usage. Prominent examples include calling variants from whole-genome sequencing data [2,3], estimating CRISPR guide activity [4,5] and predicting molecular phenotypes, including transcription factor binding, chromatin accessibility and splicing efficiency, from DNA sequence [1,6,7,8,9,10,11]. Once trained, these models can be probed in silico to infer quantitative relationships between diverse genomic data modalities, enabling several key applications such as the interpretation of functional genetic variants and rational design of synthetic genes.

However, despite the pivotal importance of predictive models in genomics, it is surprisingly difficult to share and exchange models effectively. In particular, there is no established standard for depositing and sharing trained models. This lack is in stark contrast to bioinformatics software and workflows, which are commonly shared through general-purpose software platforms such as the highly successful Bioconductor project [12]. Similarly, there exist platforms to share genomic raw data, including Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/), ArrayExpress (https://www.ebi.ac.uk/arrayexpress) and the European Nucleotide Archive (https://www.ebi.ac.uk/ena). In contrast, trained genomics models are made available via scattered channels, including code repositories, supplementary material of articles and author-maintained web pages. The lack of a standardized framework for sharing trained models in genomics hampers not only the effective use of these models—and in particular their application to new data—but also the use of existing models as building blocks to solve more complex tasks.

READ MORE …

Categorizing Cells with Machine Learning and Latent Space

Two exciting and complementary machine learning methods for assigning cell identity based on single-cell sequencing data were published in a paper from Johns Hopkins. The first program, scCoGAPS, defines latent spaces from a single-cell RNA-sequencing dataset to categorize cells and the second program, projectR, evaluates latent spaces in independent target datasets using transfer learning. These two methods are interesting advances towards a goal that is likely still far off—understanding exactly what makes each cell what it is. For an excellent summary read the press release, Finding A Cell’s True Identity.

The original article is a more complicated read but interesting throughout.

Stein-O’Brien, et al. (2019) Decomposing Cell Identity for Transfer Learning across Cellular Measurements, Platforms, Tissues, and Species. Cell Systems

Summary

Analysis of gene expression in single cells allows for decomposition of cellular states as low-dimensional latent spaces. However, the interpretation and validation of these spaces remains a challenge. Here, we present scCoGAPS, which defines latent spaces from a source single-cell RNA-sequencing (scRNA-seq) dataset, and projectR, which evaluates these latent spaces in independent target datasets via transfer learning. Application of developing mouse retina to scRNA-Seq reveals intrinsic relationships across biological contexts and assays while avoiding batch effects and other technical features. We compare the dimensions learned in this source dataset to adult mouse retina, a time-course of human retinal development, select scRNA-seq datasets from developing brain, chromatin accessibility data, and a murine-cell type atlas to identify shared biological features. These tools lay the groundwork for exploratory analysis of scRNA-seq data via latent space representations, enabling a shift in how we compare and identify cells beyond reliance on marker genes or ensemble molecular identity.
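The idea of evaluating a learned latent space in a new dataset can be sketched in a few lines. This is not the scCoGAPS or projectR implementation; it assumes the simplest possible stand-in, an ordinary least-squares fit of one new cell's expression onto two gene-loading patterns, and all gene values and pattern names are invented for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def project(expression, pattern_a, pattern_b):
    """Least-squares weights of two gene-loading patterns for one cell.

    Solves the 2x2 normal equations directly, so it only handles two patterns.
    """
    aa, bb, ab = dot(pattern_a, pattern_a), dot(pattern_b, pattern_b), dot(pattern_a, pattern_b)
    ad, bd = dot(pattern_a, expression), dot(pattern_b, expression)
    det = aa * bb - ab * ab
    return ((bb * ad - ab * bd) / det, (aa * bd - ab * ad) / det)


# Hypothetical patterns "learned" from a source dataset: one loading per gene.
pattern_a = [1.0, 0.0, 1.0, 1.0]
pattern_b = [0.0, 1.0, 0.0, 1.0]

# A cell from an independent target dataset, measured on the same four genes.
new_cell = [2.0, 0.5, 2.0, 2.5]

weight_a, weight_b = project(new_cell, pattern_a, pattern_b)
print(weight_a, weight_b)
```

The two weights say how strongly each source-derived pattern is present in the new cell, without refitting the patterns themselves, which is the sense in which the learning is transferred.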

Human Genome Reference Sequence: Summary or Example?

There is no one human genome. Each person starts life with two non-identical copies of a genome, and variations both small and large begin to accumulate each time those copies are copied. And then there are the differences between individuals. If we think of the genome as a single list of bases at specific positions, then point mutations (substitutions, small insertions, and deletions) are easy enough to map to those positions. Major structural variants (inversions, translocations, and repetitive sequences), however, complicate how we map these mutations. Reference genomes, consensus representations of deeply sequenced human genomes, have traditionally been the basis for mapping nucleotides and variants to positions on chromosomes, but long-read technologies are making it increasingly apparent that structural variants are quite common and that new methods for representing the human genome are needed.

The first of the following articles lays out why a more advanced model for capturing the variation in the human genome is needed. The article after that describes how multiple genomes and their structural variation can be summarized using graphs, a computational improvement on the current linear reference genomes. The last article discusses some of the single-molecule sequencing technology bringing this issue to the fore. There are many other articles that deal with this topic, but these are a good start.
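To make the graph idea concrete, here is a toy sketch of how a single graph can hold a reference sequence together with a SNP and an insertion. Everything in it is hypothetical: the node names, the sequences, and the helper functions are invented for illustration, exact substring matching stands in for real read alignment, and enumerating every path only works on tiny graphs. It is not the method used in any of the articles below.

```python
from typing import Dict, List

# Hypothetical toy graph: node id -> sequence segment.
NODES: Dict[str, str] = {
    "s1": "ACGT",
    "ref_snp": "A",   # reference allele
    "alt_snp": "G",   # alternate allele at the same position
    "s2": "TTC",
    "ins": "GGGG",    # insertion absent from the linear reference
    "s3": "CAT",
}
# Directed edges: node id -> successor node ids.
EDGES: Dict[str, List[str]] = {
    "s1": ["ref_snp", "alt_snp"],
    "ref_snp": ["s2"],
    "alt_snp": ["s2"],
    "s2": ["ins", "s3"],   # the s2 -> s3 edge skips the insertion
    "ins": ["s3"],
    "s3": [],
}
# The linear reference is just one particular walk through the graph.
REFERENCE_PATH = ["s1", "ref_snp", "s2", "s3"]


def spell(path: List[str]) -> str:
    """Concatenate node sequences along a path to get one haplotype."""
    return "".join(NODES[node] for node in path)


def all_paths(start: str, end: str) -> List[List[str]]:
    """Enumerate every path from start to end (fine for toy graphs only)."""
    if start == end:
        return [[end]]
    return [[start] + rest for nxt in EDGES[start] for rest in all_paths(nxt, end)]


def paths_containing(read: str) -> List[List[str]]:
    """Return the paths whose spelled haplotype contains the read exactly."""
    return [p for p in all_paths("s1", "s3") if read in spell(p)]


# A read carrying both the alternate SNP allele and the insertion.
read = "GTGTTCGGGGCA"
in_linear_reference = read in spell(REFERENCE_PATH)
matching_paths = paths_containing(read)
```

The read has no exact match in the linear reference but does match one walk through the graph, which is the intuition behind aligning against a graph rather than a single consensus sequence.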

Yang, et al. (2019) One reference genome is not enough. Genome Biology

Abstract

A recent study on human structural variation indicates insufficiencies and errors in the human reference genome, GRCh38, and argues for the construction of a human pan-genome.

########################################################################################

Here’s an article describing how structural variants can be captured in a graph.

Rakocevic, et al. (2019) Fast and accurate genomic analyses using genome graphs. Nature Genetics

Abstract

The human reference genome serves as the foundation for genomics by providing a scaffold for alignment of sequencing reads, but currently only reflects a single consensus haplotype, thus impairing analysis accuracy. Here we present a graph reference genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million insertions and deletions (indels). The pipeline processes one whole-genome sequencing sample in 6.5 h using a system with 36 CPU cores. We show that using a graph genome reference improves read mapping sensitivity and produces a 0.5% increase in variant calling recall, with unaffected specificity. Structural variations incorporated into a graph genome can be genotyped accurately under a unified framework. Finally, we show that iterative augmentation of graph genomes yields incremental gains in variant calling accuracy. Our implementation is an important advance toward fulfilling the promise of graph genomes to radically enhance the scalability and accuracy of genomic analyses.

########################################################################################

Here’s an article describing how next-next generation sequencing is illuminating the diversity of structural variants across human populations.

Chaisson, et al. (2015) Resolving the complexity of the human genome using single-molecule sequencing. Nature

Abstract

Advances in genome assembly and phasing provide an opportunity to investigate the diploid architecture of the human genome and reveal the full range of structural variation across population groups. Here we report the de novo assembly and haplotype phasing of the Korean individual AK1 using single-molecule real-time sequencing, next-generation mapping, microfluidics-based linked reads, and bacterial artificial chromosome (BAC) sequencing approaches. Single-molecule sequencing coupled with next-generation mapping generated a highly contiguous assembly, with a contig N50 size of 17.9 Mb and a scaffold N50 size of 44.8 Mb, resolving 8 chromosomal arms into single scaffolds. The de novo assembly, along with local assemblies and spanning long reads, closes 105 and extends into 72 out of 190 euchromatic gaps in the reference genome, adding 1.03 Mb of previously intractable sequence. High concordance between the assembly and paired-end sequences from 62,758 BAC clones provides strong support for the robustness of the assembly. We identify 18,210 structural variants by direct comparison of the assembly with the human reference, identifying thousands of breakpoints that, to our knowledge, have not been reported before. Many of the insertions are reflected in the transcriptome and are shared across the Asian population. We performed haplotype phasing of the assembly with short reads, long reads and linked reads from whole-genome sequencing and with short reads from 31,719 BAC clones, thereby achieving phased blocks with an N50 size of 11.6 Mb. Haplotigs assembled from single-molecule real-time reads assigned to haplotypes on phased blocks covered 89% of genes. The haplotigs accurately characterized the hypervariable major histocompatibility complex region as well as demonstrating allele configuration in clinically relevant genes such as CYP2D6. This work presents the most contiguous diploid human genome assembly so far, with extensive investigation of unreported and Asian-specific structural variants, and high-quality haplotyping of clinically relevant alleles for precision medicine.
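The abstract above reports contiguity as N50 values. For readers new to the metric: N50 is the length L such that sequences of length L or longer together cover at least half of the total assembly. A short sketch of the calculation, using made-up contig lengths rather than anything from the paper:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings coverage to half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always returns")


# Hypothetical contig lengths (arbitrary units).
contigs = [80, 70, 50, 40, 30, 20]
assembly_n50 = n50(contigs)  # total is 290; 80 + 70 = 150 passes the halfway mark of 145
```

A larger N50 means that more of the assembly sits in long, unbroken pieces, which is why the metric is used as a shorthand for contiguity.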

Thank you for reading!

Personalized medicine approaches designed with Fruit Flies

The promise of personalized medicine has been limited by at least two factors: 1) the power of our models, including our ability to get and process data, and 2) our ability to test potential therapeutic solutions in meaningful biological systems, rapidly and systematically, before testing in humans. While we're seeing rapid and predictable improvements in the power of our models, corresponding improvements in how we test therapies have been less consistent. Even if you have a good, genome-level understanding of an individual's cancer, there is no guidebook for what treatment will be effective against that particular type of cancer. One promising approach has been to use fruit flies (Drosophila) genetically modified to carry mutational loads similar to those of cancer patients to test the efficacy of drug combinations for suppressing tumor growth.

Ross Cagan is an exceptionally creative researcher who leads a lab at the Icahn School of Medicine at Mount Sinai and has been using fruit flies to develop personalized medicine approaches for over 10 years. This week, the Cagan lab published one of the first examples of a clinically effective therapy based on such an approach, with lead author Dr. Erdem Bangi (see below for abstract and link). The article describes how Drosophila were screened for drug combinations that inhibit tumors similar in composition and complexity to those of a terminally ill colorectal cancer patient, and how a specific drug combination was identified and used to effectively shrink the patient's tumors. Though this treatment did not provide a permanent cure, it did appear to extend the patient's life and is important proof that this type of approach can be effective.

Excellent, more in-depth summaries can be found HERE and HERE and HERE, so check them out.

The original article, with its somewhat difficult to penetrate title…

Bangi, E., et al. (2019) A personalized platform identifies trametinib plus zoledronate for a patient with KRAS-mutant metastatic colorectal cancer. Science Advances

Abstract

Colorectal cancer remains a leading source of cancer mortality worldwide. Initial response is often followed by emergent resistance that is poorly responsive to targeted therapies, reflecting currently undruggable cancer drivers such as KRAS and overall genomic complexity. Here, we report a novel approach to developing a personalized therapy for a patient with treatment-resistant metastatic KRAS-mutant colorectal cancer. An extensive genomic analysis of the tumor’s genomic landscape identified nine key drivers. A transgenic model that altered orthologs of these nine genes in the Drosophila hindgut was developed; a robotics-based screen using this platform identified trametinib plus zoledronate as a candidate treatment combination. Treating the patient led to a significant response: Target and nontarget lesions displayed a strong partial response and remained stable for 11 months. By addressing a disease’s genomic complexity, this personalized approach may provide an alternative treatment option for recalcitrant disease such as KRAS-mutant colorectal cancer.

Thank you for reading!

Polygenic traits should not be used for selecting embryos

These are actually sea urchin embryos …

The article below is an important perspective on the troubling potential use of polygenic trait scores to select embryos, written by one of the directors of the EMBL-EBI on his blog. Polygenic traits are directly affected by many loci and typically exhibit phenotypes that have continuous distributions, such as intelligence and height. While some pretty obvious arguments can be made for why using polygenic traits for selecting embryos would be immoral, this article helps to make clear that it would also likely be an ineffective way to guarantee your child has a certain height or IQ.
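For readers who have not seen one, a polygenic score is usually computed as a weighted sum: each variant's estimated effect size multiplied by the number of copies (0, 1, or 2) of the effect allele a person carries. A minimal sketch follows; the variant names, effect sizes, and dosages are all hypothetical and are not taken from the article or from any real study.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, int], effect_sizes: Dict[str, float]) -> float:
    """Sum each variant's effect size weighted by the allele dosage (0, 1, or 2)."""
    missing = set(effect_sizes) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for variants: {sorted(missing)}")
    return sum(beta * dosages[variant] for variant, beta in effect_sizes.items())


# Hypothetical per-allele effect sizes, as might be estimated in an association study.
effect_sizes = {"variant_a": 0.5, "variant_b": -0.25, "variant_c": 0.125}
# Hypothetical genotype: copies of the effect allele carried at each variant.
dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 1}

score = polygenic_score(dosages, effect_sizes)  # 2*0.5 - 0.25 + 0.125
```

The score is only an additive summary of the variants included. It says nothing about environment, about variants that were not measured, or about the uncertainty in the effect-size estimates themselves.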

The value of polygenic trait scores, both to medicine and for making predictions about humans, is being discussed very actively right now. Some of the most exciting conversations about polygenic traits and polygenic risk scores are happening on Twitter in real time. I strongly encourage you to follow Ewan Birney (@ewanbirney), as well as Cecile Janssens (@cecilejanssens), professor of translational epidemiology at Emory University, for her consistently clear and insightful comments on how we interpret whole genome data.

Why embryo selection for polygenic traits is wrong.

MAY 26, 2019 BY EWANBIRNEY

This week (May 20th 2019) has seen yet another splash by an American company offering a polygenic trait score on embryos including intelligence. This is wrong on a number of levels; ethically it is wrong to make this decision as an independent laboratory without broad societal buy in; scientifically it is wrong to imagine the ways we assess polygenic traits will translate into safe and effective embryo selection; for the specifics of IQ/Educational attainment trait this trait is so complex this is additionally unwise over and above any concerns.

READ MORE …


Bold Chinese Experiment Genetically Engineers Monkeys, maybe makes them Smarter, definitely raises some ethical questions

Chinese researchers are going hard lately! Following projects that include genetically modifying embryos and letting two develop into twin human babies, and cloning primates, another envelope-pushing report comes from the Chinese bioscience community, this time describing the insertion of a human version of a gene into rhesus monkeys. The gene, MCPH1, is thought to play an important role in human brain development and to contribute to distinctively human cognitive abilities. The genetically modified monkeys exhibited slower (more human-like) brain development and possibly even improved cognitive ability. This work was published by Oxford University Press on behalf of China Science Publishing & Media Ltd. in what is ostensibly a peer-reviewed journal, though not PLOS or PNAS, and it is unclear whether this work would be given the green light at an American university. There will certainly be debate in the press about this topic, which should be thrilling, and hopefully it will hasten some thoughtful conclusions.


Read the original article HERE and other summaries here and here and here.