Spreading the Word and Sharing the Tools of Data Science

Initiatives provide tools, knowledge and connections to take research further

Highlights

  • Big data is about the scale of data now available in different disciplines
  • Programming as a ‘second language’ is critical to doing data science
  • Researchers must take advantage of deeper, smarter, faster ways to analyze data
  • Linking scholars from very different disciplines who might not otherwise meet

Luis Amaral — as detective — has long used data science for his research to uncover the most valuable information hidden in piles of data. 

As matchmaker, he also spends much of his time plotting new ways to fix up other Northwestern researchers and data scientists from different disciplines, with the hope of stimulating new collaborations and innovation. 

Amaral learned early on from his experiences as co-director of the Northwestern Institute on Complex Systems (NICO) that data science and collaboration are game changers, and he now is eagerly spreading the word to faculty and students across the University.

Researchers, he warned, need to take advantage of the highly advanced computing power that in recent years has made it possible to analyze ever-proliferating data in deeper, smarter and faster ways.

“Data science is going to affect pretty much every discipline and every scholarly endeavor, because the technology is making much more information available to scholars,” Amaral said. “To take advantage, they need to learn new approaches and techniques. If they don’t, they will fall behind.”

The professor of chemical and biological engineering at the McCormick School of Engineering and Applied Science has been leading efforts at Northwestern to provide the training, tools and resources faculty, students and postdoctoral fellows need to get started or hone their skills in data science.

The programming “boot camp” that Amaral ran for the last couple of years, and that has evolved into an undergraduate course to be launched this fall, has been a big part of that training. Amaral also hosts lunch “dating events” for researchers, in hopes that creative collaborations will evolve, much as he and Kellogg’s Brian Uzzi came together, almost accidentally, more than a decade ago to combine the know-how of their distinctly different disciplines in a high-impact study of creative teams.

Read the following Q&A for Amaral’s take on data science, what sets the University apart and where efforts are headed.

Q. What is data science — and how is it changing research? 

A. Often, when people think about big data they just think about size — petabytes and terabytes of data, gigantic amounts that the regular person cannot understand. We believe this is a very wrong perspective.

Big data is not only about the scale of Google and collecting all the websites in the world; it is also about the scale of data now available to a historian who is interested in a single period in history, for example.

In past years, the historian would have visited a university in Europe and read two or three books, sources that only exist in that one library. But now, because those books are being digitized, she can stay in her Evanston office and access maybe 1,000 books about that particular period. The challenge becomes very different when dealing with 1,000 books versus two or three. She never needed to use computers and digitized versions to do her scholarship before, but now she will. The data is big for her, and what she’s doing is data science. 

Our goal is to enable scholars from all disciplines to make that jump and to give them the resources and opportunities they need to be successful. 

“We are using the University’s history of interdisciplinarity and collaboration to create an environment in which people can find good matches to do data science.”

Luis Amaral
Co-director of the Northwestern Institute on Complex Systems (NICO)

Q. What kind of data science training is available at Northwestern?

A. The necessary basic skill is being able to program — to write code and algorithms so you can work productively with your information and make discoveries. Programming is like learning a second language: You learn it by using it. This need for immersion led me to create a programming “boot camp” for students and postdoctoral fellows with no prior programming experience. Last September, 230 attendees from 10 Northwestern schools came with just their laptop computers. We taught them how to transform their laptop into a tool they can use to conduct research. Now the boot camp has evolved into a one-credit class for undergraduates, starting this fall. We plan to eventually expand it to include opportunities for graduate students, postdocs, faculty and staff.

The number of data science training opportunities at Northwestern continues to grow. For example, we have space in The Garage for students looking to develop their data programming skills and find entrepreneurial collaborators. The annual Computational Research Day showcases computational research as well as the related support the University provides for such work. The Feinberg School of Medicine recently launched a new training program for graduate students.

Knowing it’s important for managers to have a working knowledge of data science for decision-making, Kellogg offers a Program on Data Analytics, and McCormick’s Master of Science in Analytics Program educates multi-dimensional data science experts for business, who are in great demand.

Q. How is data science creating new, cross-disciplinary connections?

A. We are using the University’s history of interdisciplinarity and collaboration to create an environment in which people can find good matches to do data science. And we do this best by talking to one another. NICO provides an important hub in that respect. It has existed for more than a decade, so we are using its existing infrastructure and philosophy to bring together people from disparate fields. With faculty and students who know who is doing what research across the University, NICO can connect people who might not otherwise meet.

Earlier this year, we started hosting small monthly lunches, getting faculty to discuss their research informally over a turkey sandwich. Each lunch has a general theme, such as water and cities or neuroscience. Our goals are to find common interest, spark new ideas and — ultimately — make lasting connections. 

The administration is funding numerous initiatives to encourage data science collaborations at all levels. One provides seed money to collaborative faculty teams who are using data science approaches to solve research problems. Another supports talented postdoctoral fellows and students already at Northwestern who are knowledgeable in data science, and a third recruits top postdocs and students with data science skills to the University. The postdocs and students will be particularly instrumental in stimulating new data science collaborations among faculty.
