
Big Data to Transform Social Science Research

Huge amounts of data have the potential to change long-standing paradigms



  • Social scientists asking previously unanswerable questions
  • Researchers use data to analyze human behavior
  • Humanities scholars explore ethical, philosophical questions
  • Making inroads in health policy, medical malpractice

EVANSTON, Ill. --- Until recently, it was nearly impossible for social scientists like David Figlio to study and track large groups of children over time. Governments couldn’t maintain the data, and regardless, they weren’t keen on sharing.

As computing power increased, however, Northwestern University’s Figlio figured out how to access and merge a remarkable repository of government statistics involving 2 million Florida schoolchildren. He has since created the nation’s first massive data set containing matching birth and education records, information that is changing the type of real-world problems social science researchers can solve.

“Data science is an extremely important scholarly pursuit in its own right,” said Figlio, an economist and director of Northwestern’s Institute for Policy Research. “But it’s also a tool that allows social scientists, business scholars, legal scholars, humanists and others to do their work differently.”

Though best known for transforming the physical and life sciences, the data science revolution also has inspired researchers who traditionally work with smaller data sets in fields such as economics, political science, geography and sociology. The use of big data in the social sciences has the potential to change long-standing paradigms, experts say, because it allows scientists to ask — and try to answer — previously unfathomable questions, including ethical and philosophical concerns raised by technological advancements.


The promise of the “big data” movement is that the numbers hold the answers to fundamental questions of businesses, governments and social sciences.

“It’s not the data per se that are so revolutionary; it’s finding what we can do with them,” said Figlio, the Orrington Lunt Professor of Education and Social Policy. 

One of his studies indicated that children with heavier birth weights did better in school later in life. Results like these could have far-reaching effects, such as prompting doctors to reassess the need for early inductions.

Infographic: Advances in cloud storage capacity and data-transfer speed, combined with government birth and education records for more than 2 million Florida children over 10 years, produced the nation’s first large-scale data set containing matching birth and education records. Researchers used it to analyze the effect of birth weight on cognitive development: babies born at around 4.5 lbs scored in the 39th percentile, whereas babies born at around 8.4 lbs scored in the 56th percentile.

Across Northwestern, researchers and students in the humanities, education policy, marketing, social media, law and other areas are measuring how humans behave. They rely on the vast computing power that has emerged over the last few years to analyze new types of social, archival and wearable data.

They’re arming police officers with a novel crime-fighting tool: advanced data analysis and potentially life-saving intelligence. They’re looking at how people manage their availability in the 24/7 digital world. And they’re working with real-time data and partnering with companies, disrupting the traditional relationship between marketers and consumers.

The experts at Northwestern University Information Technology (NUIT) host popular training sessions and workshops to identify skill gaps, propose remedies and connect researchers who may have very different backgrounds. Northwestern University Libraries also offers workshops on data science, and librarians help faculty curate and manage unwieldy data.

“We can see what a humanist is doing and think about how that applies to a computer science researcher,” said Joseph Paris, associate director for research at NUIT. “Is there an opportunity for collaboration?” 

Figlio, who spent more than a decade building relationships and trust, was one of the first people to obtain government records with an eye to improving education policy. His team is examining the construction of “next-generation” data sets that link administrative data, such as welfare and school records, to population data, such as births and deaths.
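To make the idea of linking records concrete, here is a minimal Python sketch of deterministic record linkage. It assumes that each birth record and each school record already carries a shared identifier. The field names (`person_id`, `birth_weight_lbs`, `test_percentile`) and the sample values are hypothetical and made up for illustration. Real linkages of government records, including the Florida data, may use different or probabilistic matching methods that this sketch does not show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BirthRecord:
    person_id: str          # hypothetical shared identifier
    birth_weight_lbs: float


@dataclass(frozen=True)
class SchoolRecord:
    person_id: str          # same hypothetical identifier
    test_percentile: int


def link_records(births, schools):
    """Return (birth, school) pairs whose person_id matches exactly (an inner join).

    Assumes person_id is unique among birth records; if it is not,
    the last birth record seen for an ID wins.
    """
    births_by_id = {b.person_id: b for b in births}
    linked = []
    for s in schools:
        b = births_by_id.get(s.person_id)
        if b is not None:  # unmatched school records are dropped
            linked.append((b, s))
    return linked


if __name__ == "__main__":
    # Made-up toy records, not real data.
    births = [BirthRecord("A1", 6.8), BirthRecord("B2", 8.1), BirthRecord("C3", 5.2)]
    schools = [SchoolRecord("B2", 61), SchoolRecord("A1", 47), SchoolRecord("Z9", 70)]
    for birth, school in link_records(births, schools):
        print(birth.person_id, birth.birth_weight_lbs, school.test_percentile)
```

Only records present in both sources survive the join. That is one reason linked data sets are powerful, and also a reason to check which people drop out of them.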

Old-school surveys are still useful. “But for the price of just one survey, we can do enormous amounts of social science research that can directly translate into policies and practices that people wouldn’t have been able to uncover previously,” Figlio said.

Bringing high-tech to legal research

Northwestern Law’s Bernie Black uses a high-security NUIT computer for his research, which focuses on health policy and medical malpractice.


A behavioral approach to big data

Can we figure out how humans behave by analyzing big data? It’s a central question for researchers in the School of Communication’s Social Media Lab. They are using statistics to sort out how people behave in the digital world, including how they cope with around-the-clock availability and how they use anonymous platforms.

One series of studies looking at text messages and deception explores why, for example, some people lie to their friends about when they first read a text. The phenomenon has been dubbed “butler lies,” after the fibs a butler might have told to cover for his employer.

“The lies are important because they allow us a way to preserve relationships,” said Jeremy Birnholtz, director of Northwestern’s Social Media Lab. “By understanding how people use and craft butler lies to manage their availability, we can devise better ways to cope with the instantaneous nature of the system, which is offering more information about when and where you’ve opened a message.”

Birnholtz’s team also has amassed more than 2 million “yaks,” or posts on the anonymous platform Yik Yak, from 35 college campuses. The data suggests that anonymous online behavior “depends in part on whether you feel like a disconnected individual or part of a group,” said Birnholtz, an associate professor in the School of Communication who has worked with Facebook’s core data science team.

“We’re looking for linguistic cues like using ‘we’ versus ‘I’ or positive vs. negative emotion words to see which identity people are drawing on,” Birnholtz said.
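The kind of cue Birnholtz describes can be illustrated with a simple word count. The Python sketch below is a simplified assumption, not the Social Media Lab’s actual method. It tallies first-person plural pronouns (a possible sign of group identity) against first-person singular ones (a possible sign of individual identity) in a single post. The word lists are hypothetical and far smaller than a real lexicon, and emotion words are left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists; a real study would use a validated lexicon.
GROUP_WORDS = {"we", "us", "our", "ours", "ourselves"}
INDIVIDUAL_WORDS = {"i", "me", "my", "mine", "myself"}


def identity_cues(post: str) -> Counter:
    """Count group-identity and individual-identity pronouns in one post."""
    # Lowercase, then split on anything that is not a letter,
    # so contractions like "we'll" yield "we" and "I'm" yields "i".
    tokens = re.findall(r"[a-z]+", post.lower())
    counts = Counter()
    for token in tokens:
        if token in GROUP_WORDS:
            counts["group"] += 1
        elif token in INDIVIDUAL_WORDS:
            counts["individual"] += 1
    return counts


if __name__ == "__main__":
    post = "We lost the game but our team played hard. I think we'll win next week."
    print(identity_cues(post))
```

For the sample post, the sketch finds three group cues ("We", "our", "we'll") and one individual cue ("I"). Aggregating such counts over millions of posts is what turns a crude tally into a usable signal.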

As the data sets grow, however, new problems crop up. Researchers are limited to what a server can see, yet servers rarely capture a complete picture because not everyone uses a search engine or Twitter, or has a Facebook page.

Moreover, researchers normally try to stand apart from the world they’re observing. But in the social sciences, the widespread use of big data is breaking down the wall between the researcher and reality, said James Webster, a professor in the department of communication studies.

In the physical world, using data science to predict the weather won’t change the weather. But in the social world, a prediction based on an analysis can affect the very behavior the researcher is observing.

“Google doesn’t just measure the popularity of websites. It creates popularity,” said Webster, author of “The Marketplace of Attention: How Audiences Take Shape in a Digital Age.”

“We just have to overcome the irrational exuberance and realize the data isn’t perfect and that it can be a self-fulfilling prophecy,” Webster added. “While at the same time, we have to stay mindful that it’s capable of generating insights not otherwise possible.” 

Helping companies reach consumers

In the commercial world, big data has upended the traditional marketing paradigm, according to studies from the Spiegel Research Center of Medill’s Integrated Marketing Communications (IMC) program that analyzed staggering amounts of data.

For more than a century, marketers completely controlled their messages, creating and broadcasting advertisements to passive consumers. Today, consumers often unwittingly act as foot soldiers when they write reviews about good or bad experiences with brands or post to social media groups.

Working collaboratively with other schools, including Northwestern’s Kellogg School of Management, as well as with major companies like IBM, Peapod and comScore, IMC researchers have studied how participating in social media discussions and writing or reading negative comments about a product influence future purchases.

The data are far more accurate and precise than the self-reports of the past because researchers are analyzing actual consumer behavior.

“We’re able to test ideas in the real world and give them evidence-based suggestions on how to respond to negative customer comments,” said Ed Malthouse, Theodore R. and Annie Laurie Sills Professor of Integrated Marketing Communications. “We’re also able to help stimulate discussions that will make customers more loyal in the future.”

Reviews matter, IMC researchers have found, and shoppers are more likely to buy a product rated between 4.2 and 4.5 stars in online reviews. A perfect 5.0, however, is a little “too good to be true,” the study indicated.

Apps are also important; people who shop with an app place larger orders — and shop more often — than those who don’t, according to a study using data from the online grocer Peapod. The research also suggests that we shop differently on mobile devices because of the small screen; we’re more likely to use phones or tablets for routine or habitual purchases and desktop computers for things that require research or consideration. 

“This environment is very different from before, and companies are struggling to adapt to the new conditions,” Malthouse said. 

Aiding law enforcement

Police departments also are struggling to find their way, but they often lack staff with the advanced statistical training, or the resources, to analyze the overwhelming amount of data available to them.

Under the guidance of adjunct lecturer Mark Iris, mathematically gifted students in the Mathematical Methods in the Social Sciences (MMSS) program help police departments analyze large amounts of data. Their analyses can help expose misconceptions and better inform policy and operational decisions.

One recent MMSS project, which examined crime by location in Houston through an analysis of “micro hotspots,” was published in The Police Chief, the main publication of the International Association of Chiefs of Police.

Since Iris began matching students with police agencies that have high volumes of data, the teams have worked on more than 30 projects, initially in Chicago and more recently in Los Angeles and Long Beach, California, and in Philadelphia and Houston.

The Impact on How We Work

The Kellogg School of Management believes that business leaders need a working knowledge of data science to reap its rewards.


Asking the tough questions

The overriding question, of course, is: What about the humanities? Scholars like Sylvester Johnson believe that data scientists of the future, including those trained at Northwestern, should be prepared to weigh ethical and philosophical ramifications intertwined with technological advancements.

“What does it mean to be human in a world dominated by intelligent machines?” asked Johnson, an associate professor of African American studies and religious studies. “The humanist perspective is vital to the development of artificial intelligence.”

Johnson became fascinated with data science while devising an algorithm to help analyze an old, particularly voluminous text. He was struck by the fact that he was using artificial intelligence to categorize words and meaning.

“The ability to be cognitive, to think and reason — soon those characteristics, long thought to be distinctly human, may no longer belong exclusively to us,” Johnson said. “The whole point of the science behind big data is really to make the machines do that on a greater scale with greater efficiency.”

Listen: Professor Sylvester Johnson Discusses Using Artificial Intelligence to Analyze a Large Volume of Text and Categorize its Words and Meaning


Read More

See the other installments of our four-part multimedia series.
