Diversity in Clinical Genetics Remains Poorly Defined

Diversity in clinical genetics remains poorly defined, but the Clinical Genome Resource (ClinGen) Ancestry and Diversity Working Group is working to address this important issue. In clinical genetics and genomics, many approaches depend on the ability to identify genetic variation that appears to be non-randomly distributed in a population. However, genetic variation often clusters in ways that reflect how people’s ancestors were grouped together. These historical associations are often summarized by the terms race, ethnicity, and ancestry, but what these terms mean, both semantically and biologically, is still very unclear. The paper below provides no clear solutions but is an excellent introduction to, and discussion of, this problem and the challenges we face in addressing it.

Clinical Genetics Lacks Standard Definitions and Protocols for the Collection and Use of Diversity Measures

Abstract

Genetics researchers and clinical professionals rely on diversity measures such as race, ethnicity, and ancestry (REA) to stratify study participants and patients for a variety of applications in research and precision medicine. However, there are no comprehensive, widely accepted standards or guidelines for collecting and using such data in clinical genetics practice. Two NIH-funded research consortia, the Clinical Genome Resource (ClinGen) and Clinical Sequencing Evidence-generating Research (CSER), have partnered to address this issue and report how REA are currently collected, conceptualized, and used. Surveying clinical genetics professionals and researchers (n = 448), we found heterogeneity in the way REA are perceived, defined, and measured, with variation in the perceived importance of REA in both clinical and research settings. The majority of respondents (>55%) felt that REA are at least somewhat important for clinical variant interpretation, ordering genetic tests, and communicating results to patients. However, there was no consensus on the relevance of REA, including how each of these measures should be used in different scenarios and what information they can convey in the context of human genetics. A lack of common definitions and applications of REA across the precision medicine pipeline may contribute to inconsistencies in data collection, missing or inaccurate classifications, and misleading or inconclusive results. Thus, our findings support the need for standardization and harmonization of REA data collection and use in clinical genetics and precision health research.

READ MORE …

Here’s a link to the group that published this paper: https://www.clinicalgenome.org/working-groups/ancestry/

Five myths about measurement error in epidemiological research

This isn’t a genomics paper but genomics and epidemiology are often natural bedfellows—and we all should be taking epidemiology a little more seriously these days.

Abstract

Epidemiologists are often confronted with datasets to analyse which contain measurement error due to, for instance, mistaken data entries, inaccurate recordings and measurement instrument or procedural errors. If the effect of measurement error is misjudged, the data analyses are hampered and the validity of the study’s inferences may be affected. In this paper, we describe five myths that contribute to misjudgments about measurement error, regarding expected structure, impact and solutions to mitigate the problems resulting from mismeasurements. The aim is to clarify these measurement error misconceptions. We show that the influence of measurement error in an epidemiological data analysis can play out in ways that go beyond simple heuristics, such as heuristics about whether or not to expect attenuation of the effect estimates. Whereas we encourage epidemiologists to deliberate about the structure and potential impact of measurement error in their analyses, we also recommend exercising restraint when making claims about the magnitude or even direction of effect of measurement error if not accompanied by statistical measurement error corrections or quantitative bias analysis. Suggestions for alleviating the problems or investigating the structure and magnitude of measurement error are given.
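The attenuation heuristic mentioned in the abstract can be seen in a small simulation. The sketch below is an illustration added here, not something taken from the paper: it assumes the simplest textbook case (classical, random error in a single continuous exposure, with invented variances), in which the estimated slope shrinks by the ratio of true exposure variance to observed exposure variance. The paper's point is that real measurement error often does not behave this neatly.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, true_slope=2.0, sd_exposure=1.0, sd_error=1.0, seed=0):
    """Compare slopes estimated from the true and the mismeasured exposure.

    All parameter values are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    true_x = [rng.gauss(0.0, sd_exposure) for _ in range(n)]
    outcome = [true_slope * x + rng.gauss(0.0, 1.0) for x in true_x]
    # Classical error: independent noise added to the true exposure.
    observed_x = [x + rng.gauss(0.0, sd_error) for x in true_x]
    # Expected shrinkage factor under classical error.
    reliability = sd_exposure**2 / (sd_exposure**2 + sd_error**2)
    return (
        ols_slope(true_x, outcome),
        ols_slope(observed_x, outcome),
        true_slope * reliability,
    )


slope_true_x, slope_observed_x, expected_attenuated = simulate()
print(f"slope using true exposure:        {slope_true_x:.2f}")
print(f"slope using mismeasured exposure: {slope_observed_x:.2f}")
print(f"slope predicted by attenuation:   {expected_attenuated:.2f}")
```

With equal exposure and error variances, the slope from the mismeasured exposure should land near half the true value. Changing the error so that it depends on the outcome or on another covariate is one way to see the heuristic stop holding.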


Thank you for reading!

Germline-restricted chromosome (GRC) is widespread among songbirds

Significance

We discovered that contrary to other bird species and most other animals, all examined songbird lineages contain a different number of chromosomes in the somatic and germline genomes. Their germ cells have an additional germline-restricted chromosome (GRC). GRCs contain highly duplicated genetic material represented by repetitive elements and sequences homologous to unique regions of the somatic genome. Surprisingly, GRCs even in very closely related species, vary drastically in size and show little homology. We hypothesize that the GRC was formed as an additional parasitic microchromosome in the songbird ancestor about 35 million years ago and subsequently underwent significant changes in size and genetic content, becoming an important component of the germline genome.

Abstract

An unusual supernumerary chromosome has been reported for two related avian species, the zebra and Bengalese finches. This large, germline-restricted chromosome (GRC) is eliminated from somatic cells and spermatids and transmitted via oocytes only. Its origin, distribution among avian lineages, and function were mostly unknown so far. Using immunolocalization of key meiotic proteins, we found that GRCs of varying size and genetic content are present in all 16 songbird species investigated and absent from germline genomes of all eight examined bird species from other avian orders. Results of fluorescent in situ hybridization of microdissected GRC probes and their sequencing indicate that GRCs show little homology between songbird species and contain a variety of repetitive elements and unique sequences with paralogs in the somatic genome. Our data suggest that the GRC evolved in the common ancestor of all songbirds and underwent significant changes in the extant descendant lineages.

Suggestions for Science Communications

Hyped-up science erodes trust. Here’s how researchers can fight back.

Science is often poorly communicated. Researchers can fight back.

By Brian Resnick, Vox

In 2018, psychology PhD student William McAuliffe co-published a paper in the prestigious journal Nature Human Behavior. The study’s conclusion — that people become less generous over time when they make decisions in an environment where they don’t know or interact with other people — was fairly nuanced.

But the university’s press department, perhaps in an attempt to make the study more attractive to news outlets, amped up the finding. The headline of the press release heralding the publication of the study read “Is big-city living eroding our nice instinct?”

From there, the study took on a new life as stories in the press appeared with headlines like “City life makes humans less kind to strangers.”

This interpretation wasn’t correct: The study was conducted in a lab, not a city. And it measured investing money, not overall kindness.


READ MORE…


Epigenetic Atlas: A Map of Human Methylation Variants

Methylation of cytosines in CpG dinucleotides of DNA is a fundamental and long-lasting epigenetic modification that is associated with gene regulation and has been linked to human disease. Gunasekara et al. have published the first survey of human DNA methylation variation across individuals and tissues. This study conducted whole-genome bisulfite sequencing on thyroid, heart, and brain tissue collected from 10 human cadavers. These tissues were selected for study because they are derived from distinct germ layers of the embryo and therefore diverge very early in development. Computational analyses of this bisulfite sequencing found nearly 10,000 regions that showed significant variation across samples. These variable regions comprise about 0.1% of the genome and show association with transposable elements, subtelomeric regions and genes that have been implicated in a variety of human diseases. This “atlas” may serve as an important reference as we begin unravelling the relationship between DNA and chromatin modifications and specific disease states.

You can read the Baylor University press release or an overview from ScienceDaily, both of which provide an easy-to-read, full-page summary of the study. The abstract and link to the original article are listed below.

Gunasekara et al. (2019) A genomic atlas of systemic interindividual epigenetic variation in humans. Genome Biology

Abstract

Background

DNA methylation is thought to be an important determinant of human phenotypic variation, but its inherent cell type specificity has impeded progress on this question. At exceptional genomic regions, interindividual variation in DNA methylation occurs systemically. Like genetic variants, systemic interindividual epigenetic variants are stable, can influence phenotype, and can be assessed in any easily biopsiable DNA sample. We describe an unbiased screen for human genomic regions at which interindividual variation in DNA methylation is not tissue-specific.

Results

For each of 10 donors from the NIH Genotype-Tissue Expression (GTEx) program, CpG methylation is measured by deep whole-genome bisulfite sequencing of genomic DNA from tissues representing the three germ layer lineages: thyroid (endoderm), heart (mesoderm), and brain (ectoderm). We develop a computational algorithm to identify genomic regions at which interindividual variation in DNA methylation is consistent across all three lineages. This approach identifies 9926 correlated regions of systemic interindividual variation (CoRSIVs). These regions, comprising just 0.1% of the human genome, are inter-correlated over long genomic distances, associated with transposable elements and subtelomeric regions, conserved across diverse human ethnic groups, sensitive to periconceptional environment, and associated with genes implicated in a broad range of human disorders and phenotypes. CoRSIV methylation in one tissue can predict expression of associated genes in other tissues.

Conclusions

In addition to charting a previously unexplored molecular level of human individuality, this atlas of human CoRSIVs provides a resource for future population-based investigations into how interindividual epigenetic variation modulates risk of disease.
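To make the idea of the screen concrete, here is a toy sketch of what "interindividual variation that is consistent across tissues" could look like computationally. It is not the authors' algorithm: the function names, the thresholds (`min_r`, `min_range`) and the methylation values are all hypothetical, chosen only to illustrate the logic of requiring that donors differ from one another and that those differences agree between every pair of tissues.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # no variation across donors, so nothing to correlate
    return cov / sqrt(var_a * var_b)


def is_systemic(region, min_r=0.8, min_range=0.2):
    """Flag a region whose donor-to-donor differences agree across tissues.

    `region` maps tissue name -> per-donor methylation fractions, with
    donors in the same order for every tissue. Thresholds are illustrative.
    """
    tissues = list(region.values())
    varies = all(max(v) - min(v) >= min_range for v in tissues)
    consistent = all(pearson(a, b) >= min_r for a, b in combinations(tissues, 2))
    return varies and consistent


# Invented methylation fractions for five donors in three tissues.
systemic_region = {
    "thyroid": [0.10, 0.50, 0.90, 0.30, 0.70],
    "heart": [0.15, 0.45, 0.85, 0.35, 0.65],
    "brain": [0.20, 0.50, 0.95, 0.30, 0.75],
}
tissue_specific_region = {
    "thyroid": [0.10, 0.50, 0.90, 0.30, 0.70],
    "heart": [0.80, 0.20, 0.50, 0.90, 0.10],
    "brain": [0.40, 0.90, 0.20, 0.60, 0.30],
}

print(is_systemic(systemic_region))
print(is_systemic(tissue_specific_region))
```

In the first toy region the donors rank the same way in all three tissues, so it is flagged; in the second, donors differ within each tissue but the differences do not line up, so it is not.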

READ MORE …

A Primer on Cancer from the NIH National Cancer Institute

Here is an excellent resource for understanding and explaining cancer, directly from the National Institutes of Health National Cancer Institute.

A dividing breast cancer cell. Credit: National Cancer Institute / Univ. of Pittsburgh Cancer Institute

What is Cancer? A Collection of Related Diseases

Cancer is the name given to a collection of related diseases. In all types of cancer, some of the body’s cells begin to divide without stopping and spread into surrounding tissues. Cancer can start almost anywhere in the human body, which is made up of trillions of cells. Normally, human cells grow and divide to form new cells as the body needs them. When cells grow old or become damaged, they die, and new cells take their place.

When cancer develops, however, this orderly process breaks down. As cells become more and more abnormal, old or damaged cells survive when they should die, and new cells form when they are not needed. These extra cells can divide without stopping and may form growths called tumors.

Many cancers form solid tumors, which are masses of tissue. Cancers of the blood, such as leukemias, generally do not form solid tumors.

Cancerous tumors are malignant, which means they can spread into, or invade, nearby tissues. In addition, as these tumors grow, some cancer cells can break off and travel to distant places in the body through the blood or the lymph system and form new tumors far from the original tumor.

Unlike malignant tumors, benign tumors do not spread into, or invade, nearby tissues. Benign tumors can sometimes be quite large, however. When removed, they usually don’t grow back, whereas malignant tumors sometimes do. Unlike most benign tumors elsewhere in the body, benign brain tumors can be life threatening.

Differences between Cancer Cells and Normal Cells

Cancer cells differ from normal cells in many ways that allow them to grow out of control and become invasive. One important difference is that cancer cells are less specialized than normal cells. That is, whereas normal cells mature into very distinct cell types with specific functions, cancer cells do not. This is one reason that, unlike normal cells, cancer cells continue to divide without stopping.


READ MORE …

GenomeWeb: Cartana, Lunaphore Form In Situ Sequencing Technology Partnership

NEW YORK (GenomeWeb) — Swedish startup Cartana has signed an agreement to integrate its in situ RNA sequencing (ISS) technology with Lunaphore Technologies' microfluidic tissue processor technology.

Cartana's technology, which was originally developed in the lab of Stockholm University's Mats Nilsson, is based on using barcoded padlock probes to target genes of interest. The probes target cDNA and are amplified in situ using rolling circle amplification, followed by sequencing-by-ligation, also directly on the tissue.

Under the terms of the deal, the companies will work with Nilsson to join the technology with Lunaphore's Fast Fluidic Exchange rapid immunohistochemistry platform in order to develop hardware for automated sequencing and imaging cycles.

READ MORE…

Evolution of Cancer Cell Chromosomes Visualized using Organoids

This is free art suggesting “mutation”.

Tumor cell populations are full of mutations, which provide the genetic diversity that can allow some cells to survive chemotherapeutic agents. A series of single nucleotide changes may lead to tumor growth, but the most dramatic changes in genome composition, and increases in genetic variation, occur after tumor cells begin replicating without regard for their chromosome number and composition. This chromosomal instability creates variation in tumors, allowing the most aggressive subpopulations to proliferate and generating a diverse pool of genotypes, all of which need to be wiped out if the cancer is to be eradicated. While it has long been possible to identify the effects of chromosomal instability (widespread aneuploidy), studying the mechanisms directly has been difficult, because the genome sampling and karyotyping (chromosome imaging) that were possible were limited compared to the amount of change in tumors.

A recent paper by Bolhaqueiro et al. describes a technological advance for studying chromosome instability that involves genetically engineering cancer cells to express fluorescent proteins that label chromosomes and culturing those cells into organoids, 3D clusters that more accurately mimic how cells grow in vivo than cells in a flat culture. This approach allows rapid single-cell karyotyping and imaging of chromosome behavior during cell division. Paired with single-cell sequencing, the direct study of chromosome instability in organoids may have broad applicability. Cancer cells all start with a fairly similar toolkit and undergo a finite number of replications, so with sufficient study more of their vulnerabilities become apparent and possible to target.

Below is the introduction and link to a longer summary of the Bolhaqueiro article; the original article is interesting but longer, geared towards an expert audience, and behind a paywall.

Watching cancer cells evolve through chromosomal instability

Chromosomal abnormalities are a hallmark of many types of human cancer, but it has been difficult to observe such changes in living cells and to study how they arise. Progress is now being made on this front.

Sarah C. Johnson & Sarah E. McClelland (2019) Nature

The genomes of cancer cells are littered with mutations (errors in individual nucleotides), some of which might contribute to growth of the cancer by activating tumour-promoting genes called oncogenes, or by switching off genes belonging to a class known as tumour suppressors, which fight cancer. Yet, arguably even more important are the genomic abnormalities that occur in tumour cells on a much larger scale. For example, such a cell might contain anomalous numbers of entire chromosomes (a situation termed aneuploidy). As the tumour evolves, chromosomal abnormalities can vary between neighbouring cancer cells. This suggests that chromosomal changes can occur by repeated chromosomal ‘shuffling’ during each cell division, resulting in a high rate of genomic change, termed chromosomal instability.

READ MORE …

Unproven Stem Cell Therapies Earn Traction and Criticism

This is an interesting article about some of the efforts to take advantage of stem cell therapies in China, and some of the actions being taken to slow these efforts to a responsible pace.

China urged to abandon plan to sell unproven cell therapies

David Cyranoski, Nature

An international stem-cell body says the country’s proposed law could put patients at risk.

An international group of stem-cell researchers is urging China to cancel draft regulations that would permit some hospitals to sell therapies developed from patients’ own cells, without approval from the nation’s drug regulator.

The International Society for Stem Cell Research (ISSCR) sent a statement outlining its concerns to Jiao Hong, director of China’s National Medical Products Administration in Beijing, on 20 May. The society, which is based in Skokie, Illinois, represents more than 4,000 scientists, clinicians and ethicists around the world.

“We are deeply concerned that China’s newly proposed regulations will provide incentives for hospitals to market unsafe and ineffective interventions directly to consumers. This has the potential to harm the people of China, undermine public health and discredit the international standing of the Chinese regenerative medicine community,” warns the statement, which was signed by society president Doug Melton.

READ MORE …

You are a collection of clones

Every cell in your body has almost the same DNA as all the others. Mutations accrue with each cell division, so swaths of tissue can be traced back to the same parent cell because they all share the same distinguishing mutations. These patches of more closely related cells are clones, specifically “somatic clones” because they are all in the same body. The importance of clonal variation within individuals is becoming more explicitly recognized, but we have been worrying about somatic clones for a long time: cancer is what we call their unregulated growth. Detecting somatic clones before they become diseased is getting easier and we are beginning to understand the processes that help keep abnormal clones in line. Hopefully this shift in perspective will help medicine advance and help people to live longer and healthier lives.

For a good introduction to somatic mosaicism, written for a very broad audience, there is an excellent article in the New York Times by Carl Zimmer.

For more advanced reading, here is a recent article from Science describing how gene expression analyses were used to show that large clonal populations are spread across essentially all normal healthy tissues, with the most mutations found in the skin, lung, and esophagus, potentially explaining why these tissues are so prone to cancer.

Yizhak et al. (2019) RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science

Here is a review article on how neighboring cells with different mutations compete with each other and how this kind of battle contributes to cancer.

Di Gregorio et al. (2016) Cell Competition and Its Role in the Regulation of Cell Fitness from Development to Cancer. Developmental Cell

Transmitting Learned Behavior to Future Generations (in Worms)

Parents pass along DNA, RNA, proteins and all the contents of cytoplasm directly to their offspring, but not memories. So parental experiences don’t directly inform the behavior of their children… unless those children are roundworms. Two recent papers describe mechanisms that allow parental experiences to be transmitted to their offspring, sometimes for several generations, in the roundworm Caenorhabditis elegans (C. elegans; nobody says the full name). In both cases transgenerational behavior effects were dependent on small RNA populations, adding another function to the already diverse set of roles that RNA can play. There is no evidence that other species, especially large complex ones like humans, exhibit anything like these mechanisms, but these papers have created a legitimate avenue for investigating whether similar pathways exist in other animals and how the experiences of the parents might directly affect the behavior of their offspring.

Longer summaries can be found in Neuroscience News and ScienceDaily, and summaries and links to the two original articles are listed below. Neither article is easy reading but the summaries are pretty good.

Posner et al. (2019) Neuronal Small RNAs Control Behavior Transgenerationally. Cell

Summary

It is unknown whether the activity of the nervous system can be inherited. In Caenorhabditis elegans nematodes, parental responses can transmit heritable small RNAs that regulate gene expression transgenerationally. In this study, we show that a neuronal process can impact the next generations. Neuron-specific synthesis of RDE-4-dependent small RNAs regulates germline amplified endogenous small interfering RNAs (siRNAs) and germline gene expression for multiple generations. Further, the production of small RNAs in neurons controls the chemotaxis behavior of the progeny for at least three generations via the germline Argonaute HRDE-1. Among the targets of these small RNAs, we identified the conserved gene saeg-2, which is transgenerationally downregulated in the germline. Silencing of saeg-2 following neuronal small RNA biogenesis is required for chemotaxis under stress. Thus, we propose a small-RNA-based mechanism for communication of neuronal processes transgenerationally.

Moore et al. (2019) Piwi/PRG-1 Argonaute and TGF-β Mediate Transgenerational Learned Pathogenic Avoidance. Cell

Summary

The ability to inherit learned information from parents could be evolutionarily beneficial, enabling progeny to better survive dangerous conditions. We discovered that, after C. elegans have learned to avoid the pathogenic bacteria Pseudomonas aeruginosa (PA14), they pass this learned behavior on to their progeny, through either the male or female germline, persisting through the fourth generation. Expression of the TGF-β ligand DAF-7 in the ASI sensory neurons correlates with and is required for this transgenerational avoidance behavior. Additionally, the Piwi Argonaute homolog PRG-1 and its downstream molecular components are required for transgenerational inheritance of both avoidance behavior and ASI daf-7 expression. Animals whose parents have learned to avoid PA14 display a PA14 avoidance-based survival advantage that is also prg-1 dependent, suggesting an adaptive response. Transgenerational epigenetic inheritance of pathogenic learning may optimize progeny decisions to increase survival in fluctuating environmental conditions.


Linking How Horses Run to Their Alleles

A paper in PLoS Genetics has identified a set of genetic variants that clearly distinguish horses that pace (running with the two legs on the same side moving together) from those that trot (running with the opposite front and back legs moving together). Though no physiological role has yet been demonstrated for these mutations, they appear to be good candidates for connecting single nucleotide changes to discrete and clearly recognizable inherited differences in behavior, and maybe a step towards understanding instincts.

McCoy et al. (2019) Identification and validation of genetic variants predictive of gait in standardbred horses. PLoS Genetics

Author summary

Certain horse breeds have been developed over generations specifically for the ability to perform alternative patterns of movement, or gaits. Current understanding of the genetic basis for these gaits is limited to one known mutation apparently necessary, but not sufficient, for explaining variability in “gaitedness.” The Standardbred breed includes two distinct groups, trotters, which exhibit a two-beat gait in which the opposite forelimb and hind limb move together, and pacers, which exhibit an alternative two-beat gait where the legs on the same side of the body move together. Our long-term objective is to identify variants underlying the ability of certain Standardbreds to pace. In this study, we were able to identify several regions of the genome highly associated with pacing and, within these regions, a number of specific highly associated variants. Although the biological function of these variants has yet to be determined, we developed a model based on seven variants that was > 99% accurate in predicting whether an individual was a pacer or a trotter in two independent populations. This predictive model can be used by horse owners to make breeding and training decisions related to this economically important trait, and by scientists interested in understanding the biology of coordinated gait development.

READ MORE …

Coming to America: New Studies Continue to Complicate the Story

Here’s an interesting summary in the Smithsonian covering a couple of recent articles concerning the first people who settled the Americas. Well worth the read.

Ancient DNA Reveals Complex Story of Human Migration Between Siberia and North America

By Brian Handwerk

smithsonian.com 
June 5, 2019

There is plenty of evidence to suggest that humans migrated to the North American continent via Beringia, a land mass that once bridged the sea between what is now Siberia and Alaska. But exactly who crossed, or re-crossed, and who survived as ancestors of today’s Native Americans has been a matter of long debate.

READ MORE …

The Limits of What DNA Can Predict

Want remarkably clear insights into genetics and public health with a bare minimum of reading? Well, some corners of Twitter have recently become an incredible resource if you’re interested in learning something about predictive statistics, epidemiology, genomics, and population genetics. There are no better examples of this than the tweetorials that Dr. Cecile Janssens posts. Dr. Janssens is a professor of translational epidemiology in the Department of Epidemiology of the Rollins School of Public Health, Emory University, and her website, like her posts, contains insightful guides for thinking critically about DNA sequence data, heritability and health.

If you would like some key insights into predicting complex traits from DNA in a handful of tweets, follow this link: Why it is so hard to predict complex diseases and traits from DNA?

For a slightly longer read, here’s her article from WIRED on how DNA is best applied: DNA tells great stories -- about the past, not future

And a more advanced read, still aimed at a fairly general audience: Designing babies through gene editing: science or science fiction?

Humans and Domesticated Animals Got High the Same Way, Evolutionarily Speaking

Convergent evolution, when two separate groups develop similar traits in response to the same environmental factors, is one of the clearest indicators of adaptation. Think of birds and bats, separate groups that have wings adapted for flight. Convergent evolution at the molecular level can be inferred when consistent changes are seen in the same genes in different populations that have encountered similar changes in environment or selective pressure. Witt and Huerta-Sánchez have just published an exciting review article documenting how specific pathways and genes are repeatedly mutated in human and animal populations as they evolved to live in high-altitude, low-oxygen conditions in three regions spanning Asia (the Tibetan Plateau), Africa (the Ethiopian Highlands) and South America (the Andean Altiplano). While this story is just plain interesting for its own sake, it’s also a great illustration of how understanding evolutionary history can yield powerful insights into the adaptive fraction of our genomes. This is just one of several interesting and insightful articles published in this edition of Philosophical Transactions of the Royal Society B, a theme issue on ‘Convergent evolution in the genomics era: new insights and directions’.

Witt & Huerta-Sánchez (2019) Convergent evolution in human and domesticate adaptation to high-altitude environments. Philosophical Transactions of the Royal Society B

Abstract

Humans and their domestic animals have lived and thrived in high-altitude environments worldwide for thousands of years. These populations have developed a number of adaptations to survive in a hypoxic environment, and several genomic studies have been conducted to identify the genes that drive these adaptations. Here, we discuss the various adaptations and genetic variants that have been identified as adaptive in human and domestic animal populations and the ways in which convergent evolution has occurred as these populations have adapted to high-altitude environments. We found that human and domesticate populations have adapted to hypoxic environments in similar ways. Specific genes and biological pathways have been involved in high-altitude adaptation for multiple populations, although the specific variants differ between populations. Additionally, we found that the gene EPAS1 is often a target of selection in hypoxic environments and has been involved in multiple adaptive introgression events. High-altitude environments exert strong selective pressures, and human and animal populations have evolved in convergent ways to cope with a chronic lack of oxygen.


READ MORE …

A Boat Load of Genomes -- Saving Species Sequences

An excellent overview of why and how whole genome sequencing projects are moving to record as many species genomes as possible is now available for free at LabAnimal, a Nature research journal covering in vivo studies. It covers the technical advances and approaches that make these efforts possible, as well as one of the cutting-edge campaigns, the Vertebrate Genomes Project, while remaining clear and accessible for the average reader. Enjoy!

Michael Eisenstein (2019) Building an Annotated Arc. LabAnimal

“Rapid evolution in hardware and software for DNA analysis and falling costs per experiment are making it easier for scientists to prospect the genomes of classic model organisms as well as novel species that intrigue them. Some groups are using this approach to explore biomedical questions in species with characteristics that parallel human traits, as seen with studies of cancer and behavioral disorders in domesticated dogs or vocal communication in songbirds. Others are studying species with unusual features that might nevertheless prove beneficial to human health, such as long-lived but cancer- and virus-resistant bats or the highly regenerative axolotl.”

READ MORE …

Getting Genome Annotation Right: A Refreshing Criticism

Next-generation genome annotation: we still struggle to get it right

by Steven L. Salzberg, Genome Biology, 2019

Abstract

While the genome sequencing revolution has led to the sequencing and assembly of many thousands of new genomes, genome annotation still uses very nearly the same technology that we have used for the past two decades. The sheer number of genomes necessitates the use of fully automated procedures for annotation, but errors in annotation are just as prevalent as they were in the past, if not more so. How are we to solve this growing problem?

How to Train your Genomics Models

First open resource hosts trained machine-learning genomics models to facilitate their use and exchange

A powerful new resource, one that is actually a new kind of resource, has come online and will hopefully help accelerate advances in genomics and the fight against many types of disease. The scale of genome data is so large that computational tools are required for every major step of acquiring, organizing, and analyzing genomes. Generating useful models from large genomic datasets, the kind produced when studying human disease, is often difficult and time-consuming, and many aspects of this work are now being automated with various machine learning approaches. Machine learning in this context can be roughly summarized as using computers to generate and evaluate huge numbers of statistical models in order to clarify relationships in datasets. To do this, the machine learning program needs to train on useful datasets. So for many cutting-edge applications, the program doesn’t just need to be written but also trained, and this second step can require large amounts of time and computational resources. Until now, that cost has made it less likely that these programs would be passed along and applied more broadly. The Kipoi repository is the first open resource for trained machine learning models in genomics, making cutting-edge approaches available to clinicians and smaller labs. This resource should speed both the application of machine learning based genomics approaches and innovation in them, and hopefully we will all benefit from this new site for the free exchange of ideas.
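To make the "written but also trained" distinction concrete, here is a minimal Python sketch. It is not Kipoi's API or anything taken from the article: the tiny GC-content classifier, the made-up sequences, and the names `gc_fraction`, `train`, and `predict` are all assumptions chosen for illustration. The point is only that the expensive step (training) produces a small set of parameters that can be saved and handed to someone else, who can then make predictions without retraining.

```python
import json
import math


def gc_fraction(seq):
    """Return the fraction of G or C bases in a DNA string."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=2000, lr=0.5):
    """Fit a one-feature logistic regression by stochastic gradient descent.

    examples: list of (sequence, label) pairs, label is 0 or 1.
    Returns the trained parameters as a plain dict.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for seq, label in examples:
            x = gc_fraction(seq)
            error = sigmoid(w * x + b) - label
            w -= lr * error * x
            b -= lr * error
    return {"weight": w, "bias": b}


def predict(params, seq):
    """Probability that seq belongs to class 1, given trained parameters."""
    return sigmoid(params["weight"] * gc_fraction(seq) + params["bias"])


# Hypothetical toy data: label 1 = GC-rich, label 0 = AT-rich.
toy_examples = [
    ("GCGCGGCC", 1),
    ("GGGCCCGA", 1),
    ("ATATTAAT", 0),
    ("TTTAAACA", 0),
]

# The costly step: training. Done once by whoever builds the model.
params = train(toy_examples)

# The sharing step: the trained parameters are just data.
serialized = json.dumps(params)

# A different user loads the parameters and predicts without retraining.
shared_params = json.loads(serialized)
print(predict(shared_params, "GCGGCGCC") > 0.5)
print(predict(shared_params, "ATTATAAT") < 0.5)
```

Real genomics models have many more parameters and need preprocessing code and software dependencies alongside them, which is part of why a common repository for trained models is useful.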

For more information, here’s a nice summary from Technology Networks.

Here is the introduction from the original article, published in Nature Biotechnology.

Advances in machine learning, coupled with rapidly growing genome sequencing and molecular profiling datasets, are catalyzing progress in genomics [1]. In particular, predictive machine learning models, which are mathematical functions trained to map input data to output values, have found widespread usage. Prominent examples include calling variants from whole-genome sequencing data [2,3], estimating CRISPR guide activity [4,5] and predicting molecular phenotypes, including transcription factor binding, chromatin accessibility and splicing efficiency, from DNA sequence [1,6,7,8,9,10,11]. Once trained, these models can be probed in silico to infer quantitative relationships between diverse genomic data modalities, enabling several key applications such as the interpretation of functional genetic variants and rational design of synthetic genes.

However, despite the pivotal importance of predictive models in genomics, it is surprisingly difficult to share and exchange models effectively. In particular, there is no established standard for depositing and sharing trained models. This lack is in stark contrast to bioinformatics software and workflows, which are commonly shared through general-purpose software platforms such as the highly successful Bioconductor project [12]. Similarly, there exist platforms to share genomic raw data, including Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/), ArrayExpress (https://www.ebi.ac.uk/arrayexpress) and the European Nucleotide Archive (https://www.ebi.ac.uk/ena). In contrast, trained genomics models are made available via scattered channels, including code repositories, supplementary material of articles and author-maintained web pages. The lack of a standardized framework for sharing trained models in genomics hampers not only the effective use of these models, and in particular their application to new data, but also the use of existing models as building blocks to solve more complex tasks.

Heavy Data Science Startup, Tempus, Brings in $200 Million in New Funding

Tempus, a health technology company building massive data sets and knowledge bases of cancer-related information, has raised an additional $200 million in funding. This injection of cash brings the Chicago-based startup’s valuation to $3.1 billion and is reportedly intended to support further growth and the investigation of additional conditions, such as diabetes and depression. With moves like this, Tempus seems worth watching. Read more at Forbes.