Weill Cornell team publishes details of precision medicine knowledgebase of somatic tumor mutations

Tuesday, June 28, 2016

Researchers from Weill Cornell Medicine have published a paper on bioRxiv that describes the Precision Medicine Knowledgebase (PMKB), an online platform that offers access to clinical-grade tumor mutations, annotations, and interpretations identified in patient samples.

According to the paper, the repository uses the Human Genome Variation Society (HGVS) variant description format to build a data model that links variants to both tumor- and tissue-specific interpretations. Researchers can browse the information contained in the knowledgebase by gene, tumor type, or tissue. It includes support for all major variant types, standardized authentication protocols, defined user roles, and tools for tracking user activity. It also includes an application-programming interface that grants users programmatic access to the PMKB data, letting them use third-party algorithms and programs to query the information contained in the database.
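To make the linkage concrete, here is a minimal Python sketch of how a variant could be tied to tumor- and tissue-specific interpretations and then browsed by gene, tumor type, or tissue. It is an illustration only, not the PMKB's actual schema or API: the class names, field names, and `find` method are all assumptions, and the interpretation text is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class Variant:
    """A variant identified by gene plus an HGVS-style description (hypothetical model)."""
    gene: str  # e.g. "BRAF"
    hgvs: str  # e.g. "p.V600E" (illustrative protein-level description)


@dataclass
class Interpretation:
    """An interpretation scoped to specific tumor types and tissues."""
    text: str
    tumor_types: Set[str]
    tissue_types: Set[str]
    variants: List[Variant] = field(default_factory=list)


class Knowledgebase:
    """In-memory stand-in for a variant knowledgebase (not the real PMKB)."""

    def __init__(self) -> None:
        self.interpretations: List[Interpretation] = []

    def add(self, interpretation: Interpretation) -> None:
        self.interpretations.append(interpretation)

    def find(
        self,
        gene: Optional[str] = None,
        tumor_type: Optional[str] = None,
        tissue_type: Optional[str] = None,
    ) -> List[Interpretation]:
        """Return interpretations matching every filter that was supplied."""
        results = []
        for interp in self.interpretations:
            if gene is not None and all(v.gene != gene for v in interp.variants):
                continue
            if tumor_type is not None and tumor_type not in interp.tumor_types:
                continue
            if tissue_type is not None and tissue_type not in interp.tissue_types:
                continue
            results.append(interp)
        return results


# Example entry; the tissue label and interpretation text are placeholders.
kb = Knowledgebase()
kb.add(
    Interpretation(
        text="(placeholder interpretation text)",
        tumor_types={"melanoma"},
        tissue_types={"skin"},
        variants=[Variant(gene="BRAF", hgvs="p.V600E")],
    )
)
```

In this sketch, `kb.find(gene="BRAF", tumor_type="melanoma")` returns the example entry, while the same gene queried under a different tumor type returns nothing, which is the point of scoping interpretations by tumor type.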

PMKB focuses on somatic cancer variants and germline mutations linked to cancer risks. According to recent numbers from the website, the PMKB currently contains 144 genes, 458 variant descriptions, and 286 clinical interpretations gleaned from the scientific literature. The mutations are associated with tumor subtypes such as adenocarcinoma, acute myeloid leukemia, renal cell carcinoma, and more. These are clinical-grade interpretations created by doctors who have training in molecular pathology, Olivier Elemento, associate director of Weill Cornell's Institute for Computational Biomedicine and one of the study's authors, told GenomeWeb. It is the knowledgebase that pathologists affiliated with the medical school use in clinical reports.

In fact, according to numbers from PMKB's developers, these interpretations have been used in over 1,500 hotspot mutation panels and 750 whole-exome sequencing tests conducted at Weill Cornell and NewYork-Presbyterian. This includes interpretations of mutations found in genes such as EGFR, BRAF, KRAS, and KIT, which are associated with the largest numbers of interpretable variants in the database.

Last month, Weill Cornell signed an agreement with Australia's Garvan Institute of Medical Research to share de-identified germline and somatic mutation data as well as de-identified clinical data generated at both institutions to increase statistical power of their studies, provide new insights into genotype and phenotype associations, and better assess and interpret rare somatic and germline variants. As part of those efforts, the partners agreed to collaborate on the PMKB's development.

But the Cornell researchers are casting a wider net for partners willing to contribute variants and annotations to the PMKB, hoping to create a resource akin to Wikipedia but for clinical-grade cancer variant annotations and interpretations, Elemento said. To that end, they are holding discussions with researchers at institutions other than Garvan, including New York University and the University of Pennsylvania, he told GenomeWeb.

They also hope to get contributions from the broader cancer research community, and have set up mechanisms through which qualified medical geneticists and pathologists can contribute variant descriptions and interpretations to the platform. This includes using Security Assertion Markup Language (SAML)-based authentication, which makes it possible to use institution-based logins with external web applications. This allows a wide variety of users from different institutions to access and make edits to the PMKB.

Initially, when third-party scientists submit interpretations to the PMKB, these are housed separately from the rest of the information in the repository until Weill Cornell board-certified pathologists have had a chance to review and approve them or edit them prior to approval. All changes to variant entries are tracked in an audit log. Submissions to the database have to adhere to the formats used for variants in the PMKB, and they should be backed by peer-reviewed papers that clearly support the actionability of the variants in terms of their ability to influence patient care.
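The review process described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the PMKB's implementation: the `ReviewQueue`, `Submission`, and `AuditEntry` names are hypothetical, and the citation check stands in for the paper-backed evidence requirement.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Submission:
    submitter: str
    variant: str
    interpretation: str
    citations: List[str]  # identifiers of supporting peer-reviewed papers


@dataclass
class AuditEntry:
    timestamp: datetime
    user: str
    action: str
    variant: str


class ReviewQueue:
    """Holds outside submissions apart from approved content until review."""

    def __init__(self) -> None:
        self.pending: List[Submission] = []
        self.approved: List[Submission] = []
        self.audit_log: List[AuditEntry] = []

    def _log(self, user: str, action: str, variant: str) -> None:
        # Every change is recorded with a UTC timestamp.
        entry = AuditEntry(datetime.now(timezone.utc), user, action, variant)
        self.audit_log.append(entry)

    def submit(self, submission: Submission) -> None:
        # Assumed rule: reject submissions with no supporting citation.
        if not submission.citations:
            raise ValueError("a submission needs at least one supporting citation")
        self.pending.append(submission)
        self._log(submission.submitter, "submitted", submission.variant)

    def approve(
        self,
        submission: Submission,
        reviewer: str,
        edited_text: Optional[str] = None,
    ) -> None:
        # A reviewer may edit the text before approving it.
        self.pending.remove(submission)
        if edited_text is not None:
            submission.interpretation = edited_text
            self._log(reviewer, "edited", submission.variant)
        self.approved.append(submission)
        self._log(reviewer, "approved", submission.variant)
```

Keeping pending and approved entries in separate lists mirrors the separation the article describes: nothing a third party submits is visible alongside reviewed content until a pathologist signs off.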

The Weill Cornell researchers are also exploring data-sharing partnerships with some larger existing resources. For example, Elemento said he recently joined one of the work groups organized by the Global Alliance for Genomics and Health that is essentially focused on establishing mechanisms for querying information stored in disparate repositories. That group is trying to address issues associated with reconciling information across sites where data may be stored in different formats and using different standards, as well as ways of keeping the databases current.

"Part of the reason why it makes sense to keep different knowledgebases is that every institution is different when it comes to the different kinds of cancers they treat," Elemento said. For example, at Cornell, clinicians have a lot of expertise on prostate cancers and hematological malignancies and that expertise is reflected in the content of the PMKB. Also, clinical testing laboratories have very specific needs and ways of interpreting the information they use in clinical decisions.

"Eventually, it is possible that we will all agree on interpretations but I think before that happens, we'll need to maintain this diversity of interpretations," he said. Through the GA4GH collaboration, at least, there is a way to tap into and query the information contained in all the individual knowledgbases without necessarily consolidating them. This way clinicians can access to multiple interpretations, weigh them, and select those with the strongest support.

The PMKB is similar to existing resources such as the Washington University School of Medicine's Clinical Interpretations of Variants in Cancer (CiViC) database and Vanderbilt University's My Cancer Genome, which also seek to catalogue information about clinically relevant tumor mutations. 

CiViC, released in beta last year, provides a public forum for sharing written summaries based on scientific research about the clinical relevance of mutations found in sequenced tumor samples. As with the PMKB, members of the community can read and comment on posted summaries; suggest corrections, updates, or more nuanced descriptions of variants; and submit new evidence statements about particular variants. Also, as with the PMKB, researchers can use these summaries in the clinical testing reports that they generate for clinicians, according to CiViC's developers. Similarly, Vanderbilt's My Cancer Genome provides data on genes, mutations, and treatments associated with multiple tumor subtypes including breast, bladder, ovarian, and colorectal cancers. It also includes information on National Cancer Institute-supported clinical trials searchable by disease or gene and resources that support research involving rarer cancer mutations.

However, the Weill Cornell team believes that the PMKB offers some distinct features not included in some existing repositories. It's not clear which databases the researchers looked at specifically but, according to the paper, they claim that in some resources they assessed the mutation interpretations "do not meet required levels of brevity and specificity." For example, some databases do not provide variant interpretations by tumor type while others cover only point mutations and indels and exclude other common clinically relevant mutations such as gene fusions and copy number alterations, they wrote.

Moreover, the researchers attach genotype information to each of the mutations so that clinicians can assess mutations in the context of the different tumor types, Elemento told GenomeWeb. So, for instance, BRAF mutations are important in melanomas but not necessarily in colorectal cancer, which is useful information for clinicians to know when making treatment decisions for colorectal cancer patients. Cornell pathologists also spend a "significant" amount of time updating the information in the database, including adding new interpretations and support for existing interpretations, he said. In comparison, some repositories are not as well maintained or as current.

Also, there are "clinically critical" features missing from some databases such as "whether a variant is a pertinent negative in a given tumor type," the paper states. That is something that's becoming very important when it comes to signing out cases in genomic testing, Elemento said. These are basically mutations whose status must be included in clinical reports whether or not they are present in samples. "There are quite a few mutations where [reporting that] the mutation is negative [meaning] that is there is no mutation is almost as important as calling the mutation," he explained. An example is the BRAF V600E, whose status has to be reported in clinical reports in metastatic melanoma cases whether it is positive or negative, he said.
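The pertinent-negative idea can be illustrated with a short Python sketch. It assumes a hypothetical lookup table from tumor type to the variants whose status must always be reported; only the BRAF V600E and metastatic melanoma pairing comes from the article, and the function name and report wording are invented for illustration.

```python
from typing import Dict, List, Set

# Hypothetical table of variants that must be reported for a tumor type.
# Only this one pairing is taken from the article.
PERTINENT_VARIANTS: Dict[str, List[str]] = {
    "metastatic melanoma": ["BRAF V600E"],
}


def report_lines(tumor_type: str, detected: Set[str]) -> List[str]:
    """List pertinent variants with their status, then any other detected variants."""
    required = PERTINENT_VARIANTS.get(tumor_type, [])
    lines = []
    for variant in required:
        # A pertinent variant is reported whether or not it was found.
        status = "POSITIVE" if variant in detected else "NEGATIVE"
        lines.append(f"{variant}: {status}")
    for variant in sorted(detected - set(required)):
        lines.append(f"{variant}: POSITIVE")
    return lines
```

With this table, a metastatic melanoma sample with no detected variants still yields a line for BRAF V600E, marked negative, rather than an empty report.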

The researchers also track germline variants that are important for predicting cancer predisposition, such as mutations found in the BRCA1 and BRCA2 genes. Here PMKB's developers have also identified another potential two-way data-sharing opportunity. Elemento said that his team is currently talking with members of the GA4GH's BRCA Exchange consortium about the possibility of including variants from that repository in the PMKB using the PMKB's API. Similarly, the Weill Cornell researchers would also contribute variants from the PMKB to BRCA Exchange, he said.

This article originally appeared on GenomeWeb.