Data Science Transforming Northwestern Research

‘Seismic shift’ in how faculty are doing research, how students are learning



  • ‘Every single field of academia will be affected by data science’
  • ‘In 10 to 15 years, data science basics will be as important as learning algebra’
  • Data science propelled astrophysicist’s work on biggest breakthrough in modern physics
  • Read four-part series about how Northwestern is using data science as it changes the world

EVANSTON, Ill. --- Northwestern Medicine cardiologist Dr. Sanjiv Shah was able to search a database of 3 million unique patient records to zero in on Larry Sherman’s rare form of heart disease.

Sherman couldn’t even lift a bag of groceries before he enrolled in Shah’s heart drug trial in 2008. Today, Sherman is toting boxes of LPs, 45s and 78s to the used record store he recently opened on the South Side of Chicago.

The data science that helped diagnose his illness and led to such an extraordinary change in Sherman’s life also is at the heart of a major transformation at Northwestern University. Data science is affecting every aspect of Northwestern’s learning and research enterprises — among other things, leading to breakthroughs in precision medicine; contributing to a revolution in astronomy, with profound insights about the universe; transforming the scope and depth of social science research with significant policy implications; changing the way humanities scholarship is conducted; and fueling research about consumer behavior that is affecting how companies do business locally and globally.

The tsunami of data pouring into Northwestern today, powered by extraordinary recent advances in computer speed and capacity, is transforming scholarship and changing the way this institution conducts research, teaches students, solves problems, promotes learning and extends the frontiers of human knowledge. 

In her commencement speech at Northwestern last year, alumna Virginia Rometty, chairman, president and chief executive officer of IBM, told the Class of 2015 that they were graduating at “a truly unique moment in history” — the dawn of a new era of computing made possible by big data. 

Every important decision mankind makes will be informed by a cognitive computer, predicted Rometty, a 1979 graduate of Northwestern’s McCormick School of Engineering and Applied Science. “These systems, like humans, reason; they deal in the gray areas; they build hypotheses; they test them with vast amounts of data, and they rapidly determine possible answers with explicit degrees of confidence,” she said.

These systems, Rometty observed, represent a transformational industry now driving global progress: “What steam was to the 18th century, electricity to the 19th and hydrocarbons to the 20th, data will be to the 21st century. That’s why I call data a new natural resource.” 

While data science has long been part of research at the University, what’s changing is data’s digital reach into practically every discipline and Northwestern’s robust and growing ability to analyze and compute the data. The University wants to make sure all faculty and students are riding the wave.

Leading the charge

Luis Amaral, professor of chemical and biological engineering at the McCormick School of Engineering, is helping lead the charge at Northwestern to instruct fellow educators and students how to use the tools of this technology across disciplines, departments, institutes and schools.  

“Every single field of academia will be affected by data science, and everyone needs to understand it,” he said. “It’s a seismic shift in how faculty think about and do research, and it involves essential skills undergraduates and graduate students must learn to work for companies or launch their own startups in the 21st century.”

Largely through the Northwestern Institute on Complex Systems (NICO), Amaral has been central to the University’s efforts to train scholars and students alike in the basics of data science. A popular boot camp he ran for the last three years has evolved into a one-credit course for undergraduates that will launch this fall. He cited the matchmaking of scholars from disparate disciplines that occurs in the data science training and informal get-togethers he organizes. That matchmaking opens up exciting possibilities for the research collaborations that characterize Northwestern’s emphasis on experiential, whole-brain learning that transcends strict academic boundaries.

“Few students come in with data science expertise, and not many faculty are trained in it,” said Amaral, who stressed that people with no programming knowledge can learn relatively quickly how to turn their laptops into powerful tools. “In 10 to 15 years, data science basics will be as important as learning algebra.”

Northwestern now has the computational ability to handle large volumes of data and to aggregate data the University couldn’t manage before. The next layer is the analytics being built to develop the science of data and reach a deeper understanding of this wealth of digital information.

Transforming scholarly research

In one historic example, Northwestern astrophysicist Vicky Kalogera used Northwestern’s Quest High-Performance Computing Cluster in pathbreaking work in 2015 that helped lead to one of the biggest discoveries in modern physics. The director of Northwestern’s Center for Interdisciplinary Exploration and Research in Astrophysics (CIERA), Kalogera is a key researcher in the worldwide consortium of scientists involved in the first detection here on Earth of gravitational waves and of a binary black hole more than a billion light years away. An expert in black-hole formation in binary systems and related data analysis, Kalogera worked for nearly two decades to reach this moment. The detection of gravitational waves at long last confirmed a key prediction of Albert Einstein’s general theory of relativity, opening up an unprecedented new window onto the cosmos.

“The idea that now, 100 years later, not only are we confirming his predictions, but we also are using these observations to learn about black holes, was unimaginable to Einstein, first of all, but even to us,” said Kalogera, who attended the February press conference about the announcement in Washington, D.C., and was cited in the worldwide media coverage. She is the Erastus O. Haven Professor and associate chair of the department of physics and astronomy in the Weinberg College of Arts and Sciences.

Though data in the social sciences tends to be smaller in scope, the work that scholars such as Northwestern’s David Figlio are doing from their offices, on relatively inexpensive desktop computers but with powerful technology, also is having a transformative effect on research and policy.

For nearly a decade, Figlio, the director of the Institute for Policy Research and the Orrington Lunt Professor of Education and Social Policy, has been compiling an unprecedented database of birth, education and early childhood records for more than 2 million Florida children. As computational power gets faster, easier to use and smarter, his research is questioning basic assumptions. One recent Figlio study featured in The New York Times, for example, found a correlation of higher birth weights with better test scores when the babies reached school age. The findings raised questions about America’s medicalized approach to childbirth.

“Up until a few years ago, tracking children over time like this was impossible,” Figlio said. “Now we’re able to study new questions, revisit experiments and follow them for longer periods.”

Though best known for transforming the physical and life sciences, the data science revolution also has inspired researchers who traditionally work with smaller data sets in fields such as economics, political science, geography and sociology. The use of big data in the social sciences has the potential to change long-standing paradigms, because it allows scientists to ask — and try to answer — previously unfathomable questions, including ethical and philosophical concerns raised by technological advancements. It has the potential to help researchers find the answers to fundamental questions of businesses, governments and social sciences, as well as health and other fields. 

Digging for answers in data

Larry Sherman, the patient with the rare form of heart disease — which, like the proverbial needle in a haystack, was found among millions of patient records — puts a very human face on the extraordinary capabilities of data science. His condition emerged thanks to expanding abilities of researchers to generate and store unimaginable amounts of data and then to drill down, with superfast and powerful computing analytics, to extract the finely detailed information that is driving precision medicine today.

The massive treasure trove of data stored in Northwestern Medicine’s Enterprise Data Warehouse, now 8.4 million patient records, is propelling important discoveries in rare and common diseases — allowing scientists to perform individualized patient treatments tailored to the people who are most likely to respond and least likely to have adverse effects.

Elizabeth McNally, who recently took the helm of the Center for Genetic Medicine and who works with mind-boggling amounts of data, couldn’t possibly do her research without today’s powerful computers. With a team of physicians and genetic counselors, McNally routinely uses genetic testing in patients and families with inherited cardiovascular and neuromuscular diseases to determine gene variants contributing to their diseases.

“If I tried to analyze 250 genomes and the millions of differences in genes on my desktop computer, it would take 50 years to do it,” McNally said.

Infographic: Northwestern Medicine Enterprise Data Warehouse (EDW) by the numbers

  • More than 8.4 million distinct patients in the data warehouse, 2016
  • More than 529 million lab results since 1998
  • More than 14.7 million data elements updated each day in the data warehouse
  • More than 44.5 million prescriptions since 1996
  • 129 different computer systems feeding data to the data warehouse
  • More than 19.2 inpatient admissions in the data warehouse
  • More than 75.8 million outpatient visits in the data warehouse
  • 35 terabytes of data in the data warehouse
  • More than 1.5 petabytes total NU data stored
  • 1 terabyte equals 1,024 gigabytes; 1 petabyte equals 1,024 terabytes

Data science is changing the scale of research across all disciplines, Amaral stressed. “Naturally, changing the scale of the research greatly affects the way it needs to be done,” he said. “If you’re a historian used to looking at a few sources for your research but now want to integrate analyses from 1,000 sources, you can’t do it the way you used to.” 

In engineering, for example, data science is helping Northwestern scholars form new collaborations with the humanities to visualize data to communicate insight into art. In cognitive sciences, it is helping researchers do computational modeling and analyze how machines process information to study and understand how human brains process information, he said.

The McCormick School of Engineering is leading the way in data science with a fast-growing computer science department, strength in data analytics and optimization and the invention of new data science tools. Highly reputable employers, such as IBM, Google, Facebook and PayPal, are beating down the doors to hire graduates of Northwestern’s Master of Science in Analytics Program.

“In the not-so-distant past, a competitive advantage was who had data,” McCormick Dean Julio Ottino said. “Now the advantage is who can make sense of the data.” 

Crossing academic boundaries

Big data in itself is not really new to Northwestern, observed Jay Walsh, vice president for research. “But now we have the computational ability to handle ever-increasing amounts of data. We have the ability to aggregate and generate data in ways we couldn’t before, and we’re able to develop the science of data with the analytics that do the understanding.” 

Walsh meets monthly with associate deans for research and other top administrators to get briefed on what is new with data science across the University.

“At Northwestern, we have long emphasized and supported an interdisciplinary culture that encourages faculty and students to work across schools and departments to foster innovation and discovery,” said Provost Daniel Linzer. “And data science is opening up possibilities for learning that are breaking down boundaries in unprecedented ways — whether in medicine, engineering, law, business or the life or physical sciences — and expanding our research options while contributing to a deeper understanding of the world.”

Across the nation, colleges and universities increasingly are hiring new professors to focus on data science research. Expanding speed and storage capacity in computing, along with more precise methods of analysis, are rapidly transforming research at universities and colleges around the globe.

The implications of extracting ever-deeper insight from massive volumes of data are profound. They are changing the world in myriad ways, along with the means by which Northwestern makes sense of it all and furthers knowledge.

In addition to the rapid expansion of data science learning, student demand and research opportunities in computer science are skyrocketing. On June 1, Northwestern University announced it will hire an additional 20 faculty members and substantially expand its commitment to this field in the years ahead. Half of the new faculty appointments will be in core computer science areas and half structured as collaborative “CS+X” appointments with other disciplines. The University is making initial investments in advance of fundraising to support the overall effort, which is expected to exceed $150 million.

“This is an investment in the future of the University. Computer science has become a foundational discipline for many of our students, and faculty across the University are increasingly using computational thinking,” Northwestern President Morton Schapiro said. “It is important for Northwestern to continue to enable new paths for exploration. The time is right to make this commitment.”

Interest in computer science among students at Northwestern has increased significantly. In the last five years, the number of computer science majors has more than tripled. Overall course enrollments have more than doubled, with non-majors taking many advanced classes. Basic computer science skills have become a prerequisite for many jobs for new graduates, and demand for such knowledge will only increase in the future.

Computer science and computational thinking have the potential to touch nearly every field at the University. The explosion of available data combined with increased computing power has resulted in research growth in areas such as artificial intelligence and machine learning, robotics and data analytics. While this has led to a wealth of new research within computer science, the power of the field lies in its ability to provide new collaborations and points of view to many academic disciplines. This collaboration has a long history at the University: Several Northwestern computer science professors have joint appointments in areas such as music, journalism and education, and many of the new faculty positions will be at new intersections of disciplines. These CS+X appointments will accelerate collaborative research.

- Storer H. Rowley, director, and Marla Paul, health sciences editor, in Media Relations, contributed to this article.
