Coming to America: New Studies Continue to Complicate the Story

Here’s an interesting summary in Smithsonian covering a couple of recent articles concerning the first people who settled the Americas. Well worth the read.

Ancient DNA Reveals Complex Story of Human Migration Between Siberia and North America

By Brian Handwerk

smithsonian.com 
June 5, 2019

There is plenty of evidence to suggest that humans migrated to the North American continent via Beringia, a land mass that once bridged the sea between what is now Siberia and Alaska. But exactly who crossed, or re-crossed, and who survived as ancestors of today’s Native Americans has been a matter of long debate.

READ MORE …

Human Genome Reference Sequence: Summary or Example?

There is no one human genome. Each person starts life with two non-identical copies of a genome, and variations both small and large begin to accumulate each time those copies are copied. And then there are the differences between individuals. If we think of the genome as a single list of bases at specific positions, then point mutations (substitutions, small insertions and deletions) are easy enough to map to those positions. Major structural variants (inversions, translocations and repetitive sequences), however, complicate how we map these mutations. Reference genomes, consensus representations of deeply sequenced human genomes, have traditionally been the basis of how we map nucleotides and variants to positions on chromosomes, but long-read technologies are making it increasingly apparent that structural variants are quite common and that new methods for representing the human genome are needed.

The first of the following articles lays out why a more advanced model for capturing the variation in the human genome is needed. The article after that describes how multiple genomes and their structural variation can be summarized using graphs, a computational improvement on the current linear reference genomes. The last article discusses some of the single-molecule sequencing technology bringing this issue to the fore. There are many other articles that deal with this topic, but these are a good start.

Yang, et al. (2019) One reference genome is not enough. Genome Biology

Abstract

A recent study on human structural variation indicates insufficiencies and errors in the human reference genome, GRCh38, and argues for the construction of a human pan-genome.

########################################################################################

Here’s an article describing how structural variants can be captured in a graph.

Rakocevic, et al. (2019) Fast and accurate genomic analyses using genome graphs. Nature Genetics

Abstract

The human reference genome serves as the foundation for genomics by providing a scaffold for alignment of sequencing reads, but currently only reflects a single consensus haplotype, thus impairing analysis accuracy. Here we present a graph reference genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million insertions and deletions (indels). The pipeline processes one whole-genome sequencing sample in 6.5 h using a system with 36 CPU cores. We show that using a graph genome reference improves read mapping sensitivity and produces a 0.5% increase in variant calling recall, with unaffected specificity. Structural variations incorporated into a graph genome can be genotyped accurately under a unified framework. Finally, we show that iterative augmentation of graph genomes yields incremental gains in variant calling accuracy. Our implementation is an important advance toward fulfilling the promise of graph genomes to radically enhance the scalability and accuracy of genomic analyses.
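To make the graph idea a little more concrete, here is a toy sketch in Python. It is not the implementation from the paper; the node names, sequences and the `haplotypes` helper are all made up for illustration. It only shows how a SNP and a deletion can sit side by side as alternative paths through one structure, where a linear reference could hold just one of those paths.

```python
# Toy genome graph: nodes hold sequence, edges say which node may follow which.
# Every node name and sequence here is hypothetical, chosen only for illustration.
nodes = {
    "n1": "ACGT",
    "n2": "A",       # one allele at a SNP
    "n3": "G",       # the other allele at the same SNP
    "n4": "TTC",
    "n5": "GGATCC",  # a segment that some haplotypes lack (a deletion)
    "n6": "AAG",
}
edges = {
    "n1": ["n2", "n3"],  # the SNP: two alternative next nodes
    "n2": ["n4"],
    "n3": ["n4"],
    "n4": ["n5", "n6"],  # n4 -> n6 skips n5, which encodes the deletion
    "n5": ["n6"],
    "n6": [],
}


def haplotypes(start="n1"):
    """Return every sequence spelled by a path from `start` to a node with no successors."""
    def walk(node):
        followers = edges[node]
        if not followers:
            yield nodes[node]
            return
        for nxt in followers:
            for tail in walk(nxt):
                yield nodes[node] + tail
    return sorted(walk(start))


if __name__ == "__main__":
    for seq in haplotypes():
        print(seq)
```

Two variant sites give four possible haplotypes in this small graph. A read carrying the deletion matches one of those paths exactly, instead of being forced against a single linear sequence.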

########################################################################################

Here’s an article describing how next-next generation sequencing is illuminating the diversity of structural variants across human populations.

Chaisson, et al. (2015) Resolving the complexity of the human genome using single-molecule sequencing. Nature

Abstract

Advances in genome assembly and phasing provide an opportunity to investigate the diploid architecture of the human genome and reveal the full range of structural variation across population groups. Here we report the de novo assembly and haplotype phasing of the Korean individual AK1 using single-molecule real-time sequencing, next-generation mapping, microfluidics-based linked reads, and bacterial artificial chromosome (BAC) sequencing approaches. Single-molecule sequencing coupled with next-generation mapping generated a highly contiguous assembly, with a contig N50 size of 17.9 Mb and a scaffold N50 size of 44.8 Mb, resolving 8 chromosomal arms into single scaffolds. The de novo assembly, along with local assemblies and spanning long reads, closes 105 and extends into 72 out of 190 euchromatic gaps in the reference genome, adding 1.03 Mb of previously intractable sequence. High concordance between the assembly and paired-end sequences from 62,758 BAC clones provides strong support for the robustness of the assembly. We identify 18,210 structural variants by direct comparison of the assembly with the human reference, identifying thousands of breakpoints that, to our knowledge, have not been reported before. Many of the insertions are reflected in the transcriptome and are shared across the Asian population. We performed haplotype phasing of the assembly with short reads, long reads and linked reads from whole-genome sequencing and with short reads from 31,719 BAC clones, thereby achieving phased blocks with an N50 size of 11.6 Mb. Haplotigs assembled from single-molecule real-time reads assigned to haplotypes on phased blocks covered 89% of genes. The haplotigs accurately characterized the hypervariable major histocompatibility complex region as well as demonstrating allele configuration in clinically relevant genes such as CYP2D6. This work presents the most contiguous diploid human genome assembly so far, with extensive investigation of unreported and Asian-specific structural variants, and high-quality haplotyping of clinically relevant alleles for precision medicine.
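
The abstract leans on N50 as its measure of contiguity, so a quick sketch of how that number is computed may help. N50 is the length L such that contigs (or scaffolds, or phased blocks) of length L or longer together hold at least half of the total assembled bases. The function name and the example lengths below are my own, not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest piece down until half of all bases are covered.
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("need at least one contig length")


if __name__ == "__main__":
    # Made-up contig lengths (arbitrary units): total 290, half is 145.
    # 80 + 70 = 150 is the first running sum to reach 145, so N50 is 70.
    print(n50([80, 70, 50, 40, 30, 20]))
```

A higher N50 means more of the assembly sits in long pieces, which is why a contig N50 of 17.9 Mb counts as highly contiguous.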

Thank you for reading!

Where mutations are not tolerated: a good summary of an outstanding study

Big datasets pinpoint new regions to explore the genome for disease

A dataset of more than 100,000 individuals allows researchers to identify genetic regions that are intolerant to change and may underlie developmental disorders.

Imagine rain falling on a square of sidewalk. While the raindrops appear to land randomly, over time a patch of sidewalk somehow remains dry. The emerging pattern suggests something special about this region. This analogy is akin to a new method devised by researchers at University of Utah Health. They explored more than 100,000 healthy humans to identify regions of our genes that are intolerant to change. They believe that DNA mutations in these "constrained" regions may cause severe pediatric diseases.

"Instead of focusing on where DNA changes are, we looked for parts of genes where DNA changes are not," said Aaron Quinlan, Ph.D., associate professor of Human Genetics and Biomedical Informatics at U of U Health and associate director of the USTAR Center for Genetic Discovery. "Our model searches for exceptions to the rule of dense genetic variation in this massive dataset to reveal constrained regions of genes that are devoid of variation. We believe these regions may be lethal or cause extreme phenotypes of disease when mutated."

While this approach is conceptually simple, only recently have there been enough human genomes available to make it happen. These new, invariable stretches may reveal new disease-causing genes and can be used to help pinpoint the cause of disease in patients with developmental disorders. The results of this study are available online in the December 10 issue of the journal Nature Genetics.
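
The "dry patch of sidewalk" intuition can be sketched in a few lines of Python. This is only an illustration of the idea of looking for where variation is absent, not the statistical model used in the study, which the summary above does not describe in detail. The function name, the inclusive 1-based coordinates and the example positions are all assumptions of mine.

```python
def variant_free_stretches(variant_positions, region_start, region_end, min_length):
    """Return inclusive (start, end) intervals inside the region with no observed variant.

    Hypothetical helper: positions are 1-based integers, and only stretches of at
    least `min_length` bases are reported.
    """
    inside = sorted({p for p in variant_positions if region_start <= p <= region_end})
    stretches = []
    previous = region_start - 1
    # A sentinel just past the region end closes the final stretch.
    for pos in inside + [region_end + 1]:
        if pos - previous - 1 >= min_length:
            stretches.append((previous + 1, pos - 1))
        previous = pos
    return stretches


if __name__ == "__main__":
    # Made-up variant positions in a 30-base region.
    print(variant_free_stretches([3, 5, 6, 20, 22], 1, 30, min_length=8))
```

With only a handful of genomes, long gaps like these would appear by chance all over the place. That is why a dataset of more than 100,000 individuals matters: the denser the expected variation, the more meaningful a stretch with none becomes.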


READ MORE …

Connecting chromatin states (Epigenetics) to structural variation in human genomes

Chromatin organization modulates the origin of heritable structural variations in human genome 

Tanmoy Roychowdhury and Alexej Abyzov

Nucleic Acids Research (Article)

Abstract

“Structural variations (SVs) in the human genome originate from different mechanisms related to DNA repair, replication errors, and retrotransposition. Our analyses of 26 927 SVs from the 1000 Genomes Project revealed differential distributions and consequences of SVs of different origin, e.g. deletions from non-allelic homologous recombination (NAHR) are more prone to disrupt chromatin organization while processed pseudogenes can create accessible chromatin. Spontaneous double stranded breaks (DSBs) are the best predictor of enrichment of NAHR deletions in open chromatin. This evidence, along with strong physical interaction of NAHR breakpoints belonging to the same deletion suggests that majority of NAHR deletions are non-meiotic i.e. originate from errors during homology directed repair (HDR) of spontaneous DSBs. In turn, the origin of the spontaneous DSBs is associated with transcription factor binding in accessible chromatin revealing the vulnerability of functional, open chromatin. The chromatin itself is enriched with repeats, particularly fixed Alu elements that provide the homology required to maintain stability via HDR. Through co-localization of fixed Alus and NAHR deletions in open chromatin we hypothesize that old Alu expansion had a stabilizing role on the human genome.”

Population-specific structural variation

Genome maps across 26 human populations reveal population-specific patterns of structural variation

Abstract

Large structural variants (SVs) in the human genome are difficult to detect and study by conventional sequencing technologies. With long-range genome analysis platforms, such as optical mapping, one can identify large SVs (>2 kb) across the genome in one experiment. Analyzing optical genome maps of 154 individuals from the 26 populations sequenced in the 1000 Genomes Project, we find that phylogenetic population patterns of large SVs are similar to those of single nucleotide variations in 86% of the human genome, while ~2% of the genome has high structural complexity. We are able to characterize SVs in many intractable regions of the genome, including segmental duplications and subtelomeric, pericentromeric, and acrocentric areas. In addition, we discover ~60 Mb of non-redundant genome content missing in the reference genome sequence assembly. Our results highlight the need for a comprehensive set of alternate haplotypes from different populations to represent SV patterns in the genome.

READ MORE …

A discussion of the limitations of a single, static reference genome

Buffalo gave us spicy wings and the ‘book of life.’ Here’s why that’s undermining personalized medicine

“The human reference genome, largely completed in 2001, has achieved near-mythic status. It is “the book of life,” the “operating manual for Homo sapiens.” But the reference genome falls short in ways that have become embarrassing, misleading, and, in the worst cases, emblematic of the white European dominance of science — shortcomings that are threatening the dream of genetically based personalized medicine.”

READ MORE …

Disease risk estimates need more samples from more populations (Genome Biology)

Genetic disease risks can be misestimated across global populations

Michelle S. Kim, Kane P. Patel, Andrew K. Teng, Ali J. Berens, and Joseph Lachance

Genome Biology (Research article)

Accurate assessment of health disparities requires unbiased knowledge of genetic risks in different populations. Unfortunately, most genome-wide association studies use genotyping arrays and European samples. Here, we integrate whole genome sequence data from global populations, results from thousands of genome-wide association studies (GWAS), and extensive computer simulations to identify how genetic disease risks can be misestimated. In contrast to null expectations, we find that risk allele frequencies at known disease loci are significantly different for African populations compared to other continents. 


READ MORE …

Alzheimer’s insights from the desk of the NIH Director, Dr. Francis Collins

Largest-Ever Alzheimer’s Gene Study Brings New Answers

Predicting whether someone will get Alzheimer’s disease (AD) late in life, and how to use that information for prevention, has been an intense focus of biomedical research. The goal of this work is to learn not only about the genes involved in AD, but how they work together and with other complex biological, environmental, and lifestyle factors to drive this devastating neurological disease.

It’s good news to be able to report that an international team of researchers, partly funded by NIH, has made more progress in explaining the genetic component of AD. Their analysis, involving data from more than 35,000 individuals with late-onset AD, has identified variants in five new genes that put people at greater risk of AD [1]. It also points to molecular pathways involved in AD as possible avenues for prevention, and offers further confirmation of 20 other genes that had been implicated previously in AD.

The results of this largest-ever genomic study of AD suggest key roles for genes involved in the processing of beta-amyloid peptides, which form plaques in the brain recognized as an important early indicator of AD. They also offer the first evidence for a genetic link to proteins that bind tau, the protein responsible for telltale tangles in the AD brain that track closely with a person’s cognitive decline.


READ MORE …

More populations need to be sampled from the whole human family tree

Lack of diversity hinders genetic studies. We can change that


As a geneticist, I feel fortunate to live in the post-genomic era. The sequencing of the human genome has made it possible to make advances in understanding human genetics at an unprecedented pace. Genetic research is changing our understanding of early human migration and offering tantalizing insights into human biology. I have high hopes that we will be able to use these insights to better prevent, treat, and potentially cure diseases.

READ MORE …