Cancer mutation characterization with machine learning (original article -- very cool)
Integrated structural variation and point mutation signatures in cancer genomes using correlated topic models
Loss of DNA repair mechanisms can leave specific mutation signatures in the genomes of cancer cells. To identify cancers with broken DNA-repair processes, accurate methods are needed for detecting mutation signatures and, in particular, their activities or probabilities within individual cancers. In this paper, we introduce a class of statistical modeling methods used for natural language processing, known as “topic models”, that outperform standard methods for signature analysis. We show that topic models that incorporate signature probability correlations across cancers perform best, while jointly analyzing multiple mutation types improves robustness to low mutation counts.
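To make the idea of correlated signature probabilities concrete, here is a minimal sketch of the generative assumption behind a correlated topic model in its standard form: each cancer's signature activities come from a logistic-normal draw, so a covariance matrix can make two signatures tend to be active together. This is an illustration only, not the paper's implementation, and the paper's exact model may differ. The names `SIGNATURES`, `MEAN`, `COV`, `sample_activities` and `sample_mutation_counts`, along with every number below, are hypothetical.

```python
import math
import random


def softmax(values):
    """Map real-valued scores to probabilities that sum to 1."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_activities(mean, cov, rng):
    """Logistic-normal draw: correlated Gaussian scores pushed through softmax."""
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    eta = [
        mean[i] + sum(lower[i][k] * z[k] for k in range(i + 1))
        for i in range(len(mean))
    ]
    return softmax(eta)


def sample_mutation_counts(activities, signatures, n_mutations, rng):
    """Draw each mutation by picking a signature, then a mutation category."""
    n_categories = len(signatures[0])
    counts = [0] * n_categories
    for _ in range(n_mutations):
        sig = rng.choices(range(len(signatures)), weights=activities)[0]
        cat = rng.choices(range(n_categories), weights=signatures[sig])[0]
        counts[cat] += 1
    return counts


# Hypothetical toy setup: 3 signatures over 4 mutation categories.
SIGNATURES = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
]
MEAN = [0.0, 0.0, 0.0]
# Signatures 0 and 1 are positively correlated; signature 2 is independent.
COV = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

rng = random.Random(0)
activities = sample_activities(MEAN, COV, rng)
counts = sample_mutation_counts(activities, SIGNATURES, 200, rng)
print("signature activities:", [round(a, 3) for a in activities])
print("mutation counts per category:", counts)
```

A plain (uncorrelated) topic model would instead draw the activities from a Dirichlet distribution, which cannot express that two signatures tend to co-occur. Fitting such a model to real data means inferring `SIGNATURES`, `MEAN` and `COV` from observed counts, which this sketch does not attempt.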