
MIT lab creates tool to find cancer mutation drivers 'anywhere in the genome'
Precision oncology has been heralded as the way forward in treating cancer, but the search for driver mutations has complicated that journey because existing methods can’t be applied across the entire genome.
But a new method out of MIT aims to change the way cancer driver mutations are found and, in doing so, speed up the process. A team from Bonnie Berger’s lab at the institute has developed a tool called “Dig,” an interactive map for finding driver elements and mutations “anywhere in the genome,” according to a paper published this week in Nature Biotechnology.
The group built a deep learning model and trained it to map cancer-specific somatic mutation rates using the Pan-Cancer Analysis of Whole Genomes, or PCAWG, a dataset covering 37 cancer types. The team used high-resolution epigenetic assays from healthy tissues to guide the predictions.
To dig deeper, the lab applied Dig to another task: finding new coding and non-coding candidate drivers of cancer in publicly available whole-genome, whole-exome and targeted sequencing cancer datasets.
Dig appeared to outperform many competing methods: the group reported that their tool had the highest accuracy in 24 of 32 PCAWG cohorts. Skin and blood cancers in PCAWG were excluded because of “local hypermutation processes,” they wrote.
The group’s tool also “matched or exceeded the performance” of existing methods used to search for an excess of mutations in driver elements that had already been identified.
The group attributed this accuracy in part to the deep learning model’s ability to find active transcription start sites and other epigenetic structures and then associate those structures with mutation rates.
In essence, the computer science and AI lab led by Berger, head of the Computation and Biology group, used AI to predict mutation rates and compared those predictions with real data, training the model to look for mutations throughout the genome.
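The general idea of comparing predicted mutation rates with observed mutations can be illustrated with a simple excess-mutation (burden) test. The sketch below is not Dig’s actual statistical model; it assumes, for illustration only, that an element’s expected mutation count is the sum of per-position predicted rates and that observed counts follow a Poisson distribution. The function names and the numbers in the example are hypothetical.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Accumulate P(X = i) for i = 0 .. observed - 1, then take the complement.
    cdf = 0.0
    term = math.exp(-expected)  # P(X = 0)
    for i in range(observed):
        cdf += term
        term *= expected / (i + 1)  # P(X = i + 1) computed from P(X = i)
    return max(0.0, 1.0 - cdf)


def excess_mutation_test(observed: int, predicted_rates: list[float]) -> tuple[float, float]:
    """Hypothetical helper: compare an element's observed mutation count
    with the count expected from per-position predicted rates."""
    # Assumption (illustrative only): expected count = sum of predicted per-position rates.
    expected = sum(predicted_rates)
    return expected, poisson_upper_tail(observed, expected)


if __name__ == "__main__":
    # Made-up numbers: an element of 200 positions, each with a predicted rate of 0.01.
    rates = [0.01] * 200
    expected, p_value = excess_mutation_test(12, rates)
    print(f"expected={expected:.2f}, observed=12, p={p_value:.2e}")
```

A small p-value flags an element carrying more mutations than the background model predicts. Real analyses also have to account for uncertainty in the predicted rates and for testing many elements at once.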
Finding driver mutations has typically been done within preselected regions of the genome rather than across the broader genetic landscape, a process that is time-consuming and expensive, the group wrote in their paper. Previous methods have focused on DNA sequences involved in making proteins, such as coding and promoter sequences, which leaves out a big swath of the genome.
“These limitations contribute to catalogs of cancer driver elements remaining incomplete, particularly in the non-coding genome, hindering precision oncology,” according to the paper.
The group found rare candidate driver mutations outside the already explored regions, which may shed light on as many as one in 10 tumors.
“Although the driver candidates we report — in cryptic splice sites, 5′ UTRs and rarely mutated genes — occurred at low frequencies individually, our estimates suggest that they collectively contribute to the disease pathology of up to 10% of tumors (summing across the percent of tumors predicted to carry excess mutations in each of these elements),” the group wrote in the paper.
The team claims the tool can be used quite broadly and can churn out results in a matter of minutes. They have made it publicly available.
“Through this framework, our maps enable millions of mutations to be evaluated in arbitrary cancer cohorts in minutes using the resources of a personal computer,” the MIT lab wrote in the paper.