Bioinformatics: Big Data Versus the Big C

Published on July 11th, 2014

This post was added by Dr P. Richardson

The torrents of data flowing out of cancer research and treatment are yielding fresh insight into the disease

In 2013, geneticist Stephen Elledge answered a question that had puzzled cancer researchers for nearly 100 years. In 1914, German biologist Theodor Boveri suggested that aneuploidy, the abnormal number of chromosomes seen in cancer cells, might drive the growth of tumours. For most of the next century, researchers made little progress on the matter. They knew that cancers often have extra or missing chromosomes or pieces of chromosomes, but they did not know whether this was important or simply a by-product of tumour growth, and they had no way of finding out.

"People had ignored it for a long time, primarily because it's really hard to understand," says Elledge, of Brigham and Women's Hospital in Boston, Massachusetts. "What we didn't know before is that it's actually driving cancer."

Elledge found that where aneuploidy had resulted in missing tumour-suppressor genes, or in extra copies of the oncogenes that promote cancer, tumours grew more aggressively (T. Davoli et al. Cell 155, 948–962; 2013). His insight, that aneuploidy is not merely an odd feature of tumours but an engine of their growth, came from mining voluminous amounts of cellular data. And, says Elledge, it shows how the ability of computers to sift through ever-growing troves of information can deepen our understanding of cancer and open the door to discoveries.

Modern cancer care has the potential to generate huge amounts of data. When a patient is diagnosed, the tumour's genome might be sequenced to see whether it is likely to respond to a particular drug. The sequencing might be repeated as treatment progresses to detect changes. The patient might have his or her normal tissue sequenced as well, a practice that is likely to grow as costs come down. The doctor will record the patient's test results and medical history, including dietary and smoking habits, in an electronic health record. The patient may also have computed tomography (CT) and magnetic resonance imaging (MRI) scans to determine the stage of the disease. Multiply all that by the nearly 1.7 million people diagnosed with cancer in 2013 in the United States alone, and it becomes clear that oncology is going to generate even more data than it does now. Computers can mine the data for patterns that may advance the understanding of cancer biology and suggest targets for therapy.

Elledge's discovery was the result of a computational method, developed with his colleagues, called the Tumor Suppressor and Oncogene Explorer. They used it to mine large data sets, including the Cancer Genome Atlas, maintained by the US National Cancer Institute in Bethesda, Maryland, and the Catalogue of Somatic Mutations in Cancer, run by the Wellcome Trust Sanger Institute in Hinxton, UK. Together, the databases contained roughly 1.2 million mutations from 8,207 tissue samples of more than 20 types of tumour.

The researchers selected a set of parameters that helped to identify the genes they were looking for, such as the mutation rate or the ratio of benign mutations to those that cause a gene to stop functioning. They then applied statistical classification methods to distinguish suppressor genes from oncogenes. About 70 suppressor genes and 50 oncogenes were already known for these tumour types, but Elledge and his colleagues increased those numbers to about 320 and 200, respectively (although the totals could fall, because some genes could turn out to be false positives). They also identified pathways in the growth process that might make good drug targets.
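To make the idea of per-gene parameters more concrete, here is a minimal sketch, assuming pooled mutation counts per gene are already available. It is not the Tumor Suppressor and Oncogene Explorer itself: the class `GeneMutations`, the functions `lof_benign_ratio`, `missense_entropy` and `classify`, the cutoffs, and the gene names and counts are all hypothetical. It uses two intuitions: tumour suppressors tend to accumulate inactivating mutations relative to benign (silent) ones, while oncogenes tend to be hit repeatedly at the same few positions. The positional-entropy feature is an assumption added for illustration and is not taken from the article.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class GeneMutations:
    """Hypothetical per-gene mutation tallies pooled across many tumour samples."""
    name: str
    loss_of_function: int  # e.g. nonsense or frameshift mutations that inactivate the gene
    benign: int            # silent mutations, treated here as a neutral background
    missense_positions: list[int] = field(default_factory=list)  # protein positions hit by missense mutations


def lof_benign_ratio(g: GeneMutations) -> float:
    """Ratio of inactivating to benign mutations; add-one smoothing avoids division by zero."""
    return (g.loss_of_function + 1) / (g.benign + 1)


def missense_entropy(g: GeneMutations) -> float:
    """Shannon entropy (bits) of missense positions; recurrent hotspots give low entropy."""
    counts = Counter(g.missense_positions)
    total = sum(counts.values())
    if total == 0:
        return float("inf")
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def classify(g: GeneMutations, lof_cutoff: float = 3.0,
             entropy_cutoff: float = 1.0, min_missense: int = 5) -> str:
    """Toy threshold classifier; the cutoffs are illustrative, not fitted to real data."""
    if lof_benign_ratio(g) >= lof_cutoff:
        return "candidate tumour suppressor"
    if len(g.missense_positions) >= min_missense and missense_entropy(g) <= entropy_cutoff:
        return "candidate oncogene"
    return "unclassified"


if __name__ == "__main__":
    # Made-up counts for three hypothetical genes, for illustration only.
    genes = [
        GeneMutations("GENE_A", loss_of_function=40, benign=5,
                      missense_positions=[10, 55, 120, 300, 410]),
        GeneMutations("GENE_B", loss_of_function=1, benign=8,
                      missense_positions=[12] * 30 + [61] * 3),
        GeneMutations("GENE_C", loss_of_function=2, benign=10,
                      missense_positions=list(range(0, 200, 20))),
    ]
    for g in genes:
        print(g.name, classify(g))
```

With these made-up counts, GENE_A is flagged as a candidate tumour suppressor, GENE_B, which has a strong missense hotspot, as a candidate oncogene, and GENE_C stays unclassified. A fixed-threshold rule is used only because it is easy to read. The published work combined its parameters with statistical classification, and any real analysis would need cutoffs calibrated against known cancer genes.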

Making this sort of finding requires large data sets. "Any individual cancer cell's a mess, but if you look at enough tumours, you get a pattern," Elledge says. "The only way you can figure this out is if you look at them globally."
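Why scale matters can be shown with a simple statistical-power argument. The following is a sketch under stated assumptions and is not taken from the article. Suppose, hypothetically, that 25% of a gene's mutations would be inactivating by chance, and that a candidate tumour suppressor shows 40% instead. An exact one-sided binomial test, chosen here as a standard textbook tool and not the method used in the study, asks how surprising that excess is. The function `binomial_upper_tail` and the constant `BACKGROUND_LOF_FRACTION` are illustrative names.

```python
import math


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# Hypothetical background: 25% of a gene's mutations are inactivating by chance.
BACKGROUND_LOF_FRACTION = 0.25

# The same observed enrichment (40% inactivating), seen at two sample sizes.
for n_mutations in (20, 500):
    n_lof = n_mutations * 40 // 100  # integer arithmetic avoids float rounding
    p_value = binomial_upper_tail(n_mutations, n_lof, BACKGROUND_LOF_FRACTION)
    print(f"{n_lof}/{n_mutations} inactivating: one-sided p = {p_value:.3g}")
```

With 20 mutations, the 40% rate gives p of about 0.1, which is easily explained by chance. With 500 mutations, the identical rate gives a p-value many orders of magnitude smaller. The pattern only becomes detectable when enough tumours are pooled.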

Originally published as "Bioinformatics: Big Data Versus the Big C".

This entry was posted in BioInformatics.