Sandra and Edward Meyer Cancer Center

Shedding a light on "dark matter" data

Friday, March 31, 2017

How many cancer researchers does it take to change a lightbulb?

Far fewer if one of them is a computational biologist. Even fewer if she arrives with a manual mapping the immediate and remote circuitry that makes the bulb function – or, in this case, malfunction.

Unfortunately, today’s scientists are working from incomplete manuals when it comes to figuring out the complex cancer matrix. When the bulb goes out, they have been trying to find fixes by examining faults in the main wiring, or flipping obvious switches. They may be overlooking an enormous web of wiring deep within the house. Or hidden switches in other rooms. Or blown-out fuses in the basement. Or the back-up generator that has been keeping things running while other parts of the system sputter and spark and threaten to cause major destruction.

Ekta Khurana, Ph.D., assistant professor of computational genomics in the Institute for Computational Biomedicine at Weill Cornell Medicine, is trying to change that. She has created a computational tool that can mine terabytes of overlooked genomic data to uncover potentially unknown drivers of cancer.

Most major tumor sequencing projects thus far have focused on identifying genes that are frequently mutated and thereby expected to have primary roles in the development of cancer. That data is then analyzed to look for ‘driver’ mutations, primarily those in protein-coding regions that change the encoded amino acids, as opposed to ‘passenger’ mutations, which accumulate through various mutational processes but are considered irrelevant to tumor development.

Yet around 25% of patients do not show any mutations in known ‘drivers.’ And even well-studied cancer types such as non-small-cell lung cancer still have major subpopulations with no observable ‘driver’ mutations. So how can their cancers be explained – or treated properly?

Part of the problem is that we are not armed with all of the information. The protein-coding component of the genome accounts for less than 2% of the total sequence, so focusing on it alone means ignoring mutations in the remaining 98%, the non-coding regions.

These non-coding regions play important supporting roles and their mutations could contribute to cancer development or progression. In the underlying circuitry network of cancer, they are the secondary switches that control the primary protein switch, or the wires that provide ways to circumvent a power loss.   

In more specific terms, some help protect the genome by separating genes from each other with long gaps, so that the transcriptional machinery in one gene or part of the chromosome can work independently of others – these are called insulators. Other non-coding sequences determine where transcription factors attach, thereby controlling the flow of genetic information from DNA to mRNA. Or they can block transcription by acting as an operator to which a repressor protein attaches. Still others determine the expression levels of various genes, or regulate when and where genes are expressed. Thus, they can act as ‘enhancers,’ ‘silencers,’ ‘promoters,’ or in a variety of other ways, not all of which are understood.

“Some patients don’t have any obvious driver mutations. Even if one driver mutation is identified, it often takes more than one to actually cause cancer. And many known drivers still cannot be targeted with available drugs,” Khurana said. “Because of all these reasons, I think it’s important to look in non-coding regions to find new candidates, new targets.”

For decades, researchers in other chronic illnesses, such as diabetes and hypertension, have turned to non-coding regions for answers to the genetic causes or risk factors for their diseases and, in many cases, found them there, Khurana said.

“But in cancer, it’s been really overlooked,” she said.

That began to change in 2013, when Levi Garraway at Harvard Medical School and colleagues took a closer look at whole genome sequences of malignant melanomas and found two mutations in a non-coding region in 71 percent of the tumors they analyzed—making them more common than the known melanoma mutations in the coding regions of the genes BRAF and NRAS. Mutations in these regions of the telomerase reverse transcriptase (TERT) gene, which encodes a component of the telomerase enzyme that protects the ends of chromosomes and supports cell longevity, have since been found in many other cancers as well.

Now more than 1,000 researchers are collaborating in an international consortium to analyze cancer whole-genome sequences, and publication numbers on the topic are on the rise. Hundreds of labs are eager to incorporate non-coding variant data into their work, but such analyses are difficult. They don’t know where to look to find noteworthy driver mutations, or how to interpret the potential impact of their findings along the complex network. There’s no annotated catalog of non-coding drivers to help.

That’s where Khurana comes in. Her team is one of a few in the world working to develop computational models that can make sense of all the “dark matter” data currently being discarded during whole-genome sequencing, and to turn it into something useful for researchers. The same year that Garraway was publishing about TERT, Khurana published FunSeq, her computational method for prioritizing non-coding mutations that could be drivers.

She is now working alongside others to analyze around 3,000 tumor whole genomes as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) consortium, as well as optimizing FunSeq to identify non-coding drivers of many cancer types.

She said it’s been a challenge that has required lots of innovative thinking.

“We cannot think the same way as we did when we developed models to predict drivers in the coding regions. We have to use a wider scope of biological knowledge. We have to think about the bigger picture, about entire networks,” Khurana said.

But it’s also been rewarding, as she’s been able to validate her early findings by collaborating with clinical colleagues at the Sandra and Edward Meyer Cancer Center and the Caryl and Israel Englander Institute for Precision Medicine, to see if her predictions play out in cell lines and mouse models of colorectal and prostate cancer.

“Identification of functional non-coding variants that drive tumor growth remains a challenge and a bottleneck for the use of whole-genome sequencing in the clinic,” Khurana said. “I hope to help change that.” 

Khurana will be presenting her findings at the Annual Meeting of the American Association for Cancer Research in Washington, DC, on April 4. She will also be participating in multiple presentations of the pan-cancer whole-genome sequencing analysis study on April 3. Her recent review on the topic of non-coding genetic variation in cancer can be read here.

Computational method to identify non-coding cancer drivers – Tuesday, April 4, 8 a.m. – 12 p.m.