Data Science Transforming Biomedical Research at Feinberg

New data science initiatives propel Feinberg to 'the future of medicine'


  • 8.4 million patient records allow scientists to develop precision medicine
  • “No other institution is approaching big data as holistically as we are”
  • Scientists analyze genetic information to develop more personalized, effective treatments
  • Connecting faculty is key to impactful data science research

CHICAGO --- Data science is transforming biomedical research at the Feinberg School of Medicine, propelling important discoveries in rare and common diseases and beginning to translate those findings into new treatments and individualized patient care at an accelerated pace.

Findings gleaned from deep dives into data are already informing research in such fields as cardiovascular disease, cancer and care for critically ill children at Northwestern. To support the burgeoning data science field, Feinberg is recruiting faculty, creating a major new center, training graduate students and connecting scientists to each other’s data.

Data science storage

Much of the research is possible because of the oceanic depth of research and clinical data housed in Northwestern Medicine’s Enterprise Data Warehouse (NMEDW) -- one of the leading and most mature repositories in the country, with 8.4 million unique patient records. That includes 95 million inpatient admissions and outpatient visits and 101 billion data elements (a patient lab test, for example) -- a number updated by 14 million new data elements every night.

“The depth of the data on those individuals now allows us to drill down in a way that’s never been possible before and really understand individual responses to treatments,” said Dr. Donald Lloyd-Jones, chair of preventive medicine and director of Northwestern University Clinical and Translational Sciences Institute. “It’s allowing us to develop true precision medicine so we can better tailor treatments to the people who are most likely to respond and least likely to have adverse effects. That’s the end game of this. That’s the future of medicine.” 

Researchers’ use of the Enterprise Data Warehouse has soared 250 percent since 2011, supporting 858 research projects in the last four years.

Feinberg’s ‘holistic approach’ to data science

As data science in biomedical research begins to explode, Feinberg is investing heavily to position itself as a leader in the nascent field. It has recruited 20 new faculty members, created the new Center for Data Science and Informatics (CDSI), established data science classes for all graduate students and is developing a tool to connect scientists to each other's research data.

In addition, Feinberg recently netted a prestigious $1.25 million, five-year National Institutes of Health grant to train “the next generation of tool builders” to answer critical questions in data science better and faster. The Biomedical Data Driven Discovery Training Program will recruit three new students a year (and support each for two years) from Feinberg and the McCormick School of Engineering.

“I don’t know of any other institution that is approaching big data as holistically as we are,” said Justin Starren, the director of CDSI. “We are treating this as a fundamental skill that should be part of the general education of a biomedical graduate student.”

“Increasingly, biology is transitioning from a small team, small data world to a big team, big data world,” Starren said. “The way you approach a problem when you think about it as a big data problem is different. Training graduate students today in biomedicine without giving them exposure to data science would be equivalent to not giving them exposure to statistics.”

Translating findings into treatments

Most of the Feinberg faculty has not been trained in data science, so the new center, which absorbs and expands on the Northwestern University Biomedical Informatics Center, will help them integrate data science into their research by linking them to newly recruited data science collaborators. 

Research data sets will soon be shared with other scientists through the upcoming Data Index Project, an index that will compile searchable data sets from Northwestern studies. One aim of the project is to spark collaborations between scientists, a top priority throughout the University.

“The more we can connect scientists studying big data to each other, the more productive and impactful our research will be,” Starren said.

Data science and the heart

A perfect example of how data science is shaping care at Northwestern is the research of Dr. Sanjiv Shah, an associate professor of medicine in cardiology at Feinberg. He uses NMEDW electronic health records to identify patients for enrollment in a specialized heart failure clinical program and clinical trial, and then uses a combination of deep phenotyping and machine learning to discover new ways to understand the disease process and ultimately improve treatment.

“We view this as a paradigm for how we want to help a number of clinical programs evolve,” Lloyd-Jones said. “We are helping them align their clinical and research missions by harnessing the analytical power of data science.”

Data science and critically ill children

Dr. Mark Wainwright, professor of pediatrics and neurology, and his team are looking for the signals from data science to improve the outcomes of critically ill children at Anne & Robert H. Lurie Children’s Hospital of Chicago. This group is developing tools to integrate and analyze data from all the different monitoring devices attached to a critically ill child in order to provide earlier warning of changes in a patient’s condition that require intervention by the medical team. By analyzing the trajectory of thousands of pediatric patients in intensive care, they will have computers develop an algorithm of signals to warn of an unstable situation that needs immediate attention.  

“This would be invaluable, allowing us to catch much earlier the subtle signals that a child is getting worse,” Wainwright said. “We could then intervene and prevent a cardiac arrest or other serious complications.”

Data science and cancer care

Data science also is on the cusp of transforming cancer care as scientists analyze volumes of critical genetic information to develop more personalized and effective treatment for individual patients. Ramana Davuluri, director of the Informatics Cancer Core at the Robert H. Lurie Comprehensive Cancer Center of Northwestern University, is developing methods for analyses of multi-omics data sets from patients with glioblastoma, the most common and aggressive malignant brain tumor, as well as prostate, breast and ovarian cancers. The goal is to parse the genetic differences between groups of patients within each cancer to determine which treatments will best help them.

Genetics by the numbers

  • 11,620: patients in the NUgene repository at NU
  • 4,964: genetic variants associated with common complex diseases and traits
  • 6,191,387,962: bases in the human genome
  • 35: petabases per year of total global gene sequencing capacity
  • 1: petabase equals one thousand trillion base pairs of DNA sequence
  • 1,000,000: petabases per year of total global gene sequencing capacity predicted in 2025

‘Mind-Boggling Amounts of Data’ and Supercomputers

The enormous volume of data generated in genomic data mining requires a tremendous amount of computing power, far more than the average desktop computer can handle.

Dr. Elizabeth McNally, who recently took the helm of the Center for Genetic Medicine, is well acquainted with the challenges of analyzing mind-boggling amounts of data. She is leading an NIH-sponsored project to examine whole genome sequencing from 300 individuals with cardiomyopathy, a common cause of heart failure. As a clinician, McNally leads the Program in Cardiovascular Genetics at the Bluhm Cardiovascular Institute at Northwestern Medicine. She works with a team of physicians and genetic counselors who routinely use genetic testing in patients and families with inherited cardiovascular and neuromuscular diseases to determine the gene variants contributing to their diseases. This information helps establish a diagnosis and guides a therapeutic approach.

“Each genome is composed of 3 billion base pairs, the building units of the genome, and each of us have four or five million differences between us,” McNally said. “We are trying to figure out which one or two of these variants causes disease in that individual. It’s like looking for a needle in a pile of needles. It would be impossible without powerful computers.”  

“If I tried to analyze 250 genomes and the millions of differences in genes on my desktop computer, it would take 50 years to do it,” McNally said. 

She uses the supercomputer at Argonne National Laboratory for the computing power to quickly sequence and analyze her patients’ genomes in just a few days. Now she is looking forward to working with Quest, Northwestern’s high-performance computing cluster.

Data Science and Electronic Medical Records

Other important research driven by data science is NIH’s Electronic Medical Records and Genomics (eMERGE) project led at Northwestern by Rex Chisholm, the vice dean for scientific affairs and graduate education, and Maureen Smith, clinical director of NUgene. Chisholm, Smith and their team have analyzed clinical data and genome sequences from individuals in the NUgene biobank to learn which genetic abnormalities cause certain diseases and determine which drug is most effective for individuals, depending on their gene variant. Now eMERGE is building and testing a computer-decision support system — integrating genetic variant data into the electronic health records — to help doctors prescribe the correct drug.

What's next in the data science race?

Scientists are racing to keep up with statistical methods and develop ever-more sophisticated techniques to extract true meaning from the data. “The analytical skills and the methodological skills are being invented every day, because there are new problems and challenges,” Lloyd-Jones said. “This is a massive amount of data and we need to understand what is random variation and what is important variation that can lead to disease.”
