
New MS&E Professor Johan Ugander Tackles Social Science Questions on a Massive Scale Using “Big Experimentation” Techniques

For a few years now, Big Data has been the belle of the social scientist’s ball, allowing the study of human behavior at a previously unheard-of scale. But according to new MS&E Professor Johan Ugander, who joined the faculty this past fall, Big Data will have to share the limelight with what he calls “Big Experimentation.”

Nobody can deny that Big Data has led to stunning advancements across healthcare, finance, optimization and more, allowing business leaders, policy makers, doctors and academics to make better decisions for products, governments and research.

Recently, social scientists entered the mix, seeing interesting new applications for Big Data and analytics.

“I started grad school at Cornell in Applied Math right in the middle of the Big Data storm,” said Ugander.

But he became more interested in questions of cause and effect. While Big Data allows researchers to identify correlations (i.e., see two things that happen at the same time), finding causal relationships (where one event causes the other) requires experimentation.

According to Ugander, this is the future of academia in social science. “The research tide has turned to trying to perform causal, not correlational inferences.”

Ugander said he considers himself “a methodologist trying to get to the bottom of difficult questions—to tease apart cause and effect through experimentation.”

 

Testing Group and Control Group… and Never the Two Shall Meet

You may remember this point from basic statistics: correlation doesn’t imply causation. Just because two things happen at the same time doesn’t mean that one caused the other.

Observational studies with large data sets can help identify those correlations but can’t determine causation with the certainty of a randomized trial, such as an A/B test.

Ideally, said Ugander, you create two separate populations with no cross-pollination—a control group and a treatment group. These two separate groups would have no contact and look similar demographically.

This seems simple enough. For instance, to see if a blue purchase button or a green purchase button on a website would encourage more sales, you simply give two separate groups two different versions of your website, track the purchase behavior, and you’re done.
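For readers who want to see the arithmetic behind such a button test, here is a minimal sketch in Python. It assumes a standard two-proportion z-test, which the article does not name, and the visitor and purchase counts are invented purely for illustration.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_a / n_a: purchases and visitors for version A (e.g. blue button).
    conv_b / n_b: purchases and visitors for version B (e.g. green button).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts, not taken from any real experiment.
z, p_value = two_proportion_z(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
z_same, p_same = two_proportion_z(conv_a=100, n_a=1000, conv_b=100, n_b=1000)
```

A test like this assumes that each visitor’s behavior is independent of every other visitor’s, which is exactly the assumption that breaks down on a social network.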

But with social network experiments, things get more complicated, because everyone is so interconnected. And these complications are at the heart of Professor Ugander’s expertise.

Consider the example of Facebook, where “my experience is strongly affected by your experience,” said Ugander. Imagine a new type of photo upload feature that Facebook would like to test. “If you make the website really good for a person in the testing group, they will upload more content, which will make the website more engaging for their Facebook friends,” he said.

But if their friends happen to be in the control group, then Facebook has created what Ugander calls “a spillover effect.” Basically, the test group’s content impacts the experience for their friends in the control group, which then contaminates the experiment.

To prevent this, Ugander said he can use his extensive background in graph clustering to find “dense communities of people that are strongly connected” to become a testing or a control group. The groups are isolated from one another, so there is no spillover effect.
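The idea of randomizing whole communities rather than individuals can be sketched in a few lines of Python. This is only an illustration: the cluster labels are assumed to come from some graph clustering step that is not shown, and the function names and toy friendship list are hypothetical.

```python
import random

def assign_clusters(cluster_of, seed=0):
    """Randomize whole clusters, not individuals, into treatment or control."""
    rng = random.Random(seed)
    clusters = sorted(set(cluster_of.values()))
    arm_of_cluster = {c: rng.choice(["treatment", "control"]) for c in clusters}
    # Every user inherits the arm of the cluster they belong to.
    return {user: arm_of_cluster[c] for user, c in cluster_of.items()}

def cross_arm_edges(edges, arm_of):
    """Count friendships linking a treated user to a control user."""
    return sum(1 for u, v in edges if arm_of[u] != arm_of[v])

# Toy network: two tight groups of friends joined by a single friendship (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
# Assumed output of a clustering step: one label per user.
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

arm_of = assign_clusters(cluster_of, seed=1)
spill = cross_arm_edges(edges, arm_of)
```

Because friends in the same dense group always land in the same arm, the only friendships that can carry a spillover are the few that cross cluster boundaries.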

Since Facebook is a perfect example of a highly networked environment, it’s no surprise that Ugander honed his craft as an employee on Facebook’s Data Science Team.

 

Around the World in 80 Machines

During Professor Ugander’s work with Facebook, he undertook a massive effort to divide up Facebook’s data stores, specifically for their friend graph.

It’s hard to imagine, but Facebook has graphed the personal relationships and connections of more than one billion users—roughly 1/7th of the world’s population. Friends, and friends of friends, and friends of friends of friends… you get the picture.

In the past, that web of worldwide personal relationships was stored on machines. Since this massive graph was far too big to store on a single machine, one of Ugander’s early jobs at Facebook was to use his graph partitioning knowledge to divide up the data among 80 machines. And the division had to make sense.

In order to make speedy and accurate friend recommendations, it was best to make them from one machine (rather than across all 80). “I cut up the friendship graph, so that as many of people’s connections as possible were on one machine as opposed to multiple machines,” said Ugander. In effect, he divided the world into 80 functional parts.
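One simple way to judge such a split is to measure what share of friendships have both people stored on the same machine. The Python sketch below does only that scoring step, on a toy graph with two machines instead of 80. The function name and the two example assignments are hypothetical, and the partitioning method itself is not shown.

```python
def local_edge_fraction(edges, machine_of):
    """Fraction of friendships whose two endpoints live on the same machine."""
    local = sum(1 for u, v in edges if machine_of[u] == machine_of[v])
    return local / len(edges)

# Toy network: two tight groups of friends joined by a single friendship (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

# A split that respects the two groups, and one that ignores them.
community_split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
alternating_split = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}

good = local_edge_fraction(edges, community_split)    # 6 of 7 friendships local
bad = local_edge_fraction(edges, alternating_split)   # 2 of 7 friendships local
```

The higher the fraction, the more often a friend lookup can be answered without leaving the machine that holds the user.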

As a side note, Ugander said it was from this experience that his team was able to determine “how short the path is between two arbitrary people.” Most of us have heard of the concept of six degrees of separation (probably in conjunction with actor Kevin Bacon). Using the Facebook friend graph, the team determined that there are actually about four (not six) degrees of separation, on average, between two people.
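On a small graph, the length of the shortest path between two people can be computed exactly with a breadth-first search. The Python sketch below does that on a toy network. It is an illustration only: running an exact search from every user would not be practical on a graph with a billion people, and the article does not describe how the team carried out its measurement.

```python
from collections import deque

def build_adjacency(edges):
    """Turn a list of friendships into a dict of neighbor sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, source):
    """Number of friendship hops from source to every reachable person."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def average_distance(adj):
    """Mean hop count over all ordered pairs of distinct, connected people."""
    total = 0
    pairs = 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs

# Toy network: two tight groups of friends joined by a single friendship (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
adj = build_adjacency(edges)
hops_a_to_f = bfs_distances(adj, "a")["f"]   # a -> c -> d -> f
avg = average_distance(adj)
```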

We are all more closely linked to each other than we think. The fact that we are so connected makes Ugander’s methods of isolating dense clusters of people for testing purposes even more important.

Ugander said these methods for identifying groups for testing are in use at Facebook, as well as other companies “where the experience is highly networked,” but he points to broader applications for these concepts. His examples include cell phone plan marketers, web auction technologies and developmental economics projects.

According to Ugander, these concepts can be applied anywhere you need to create “balanced clusters of individuals” who won’t influence each other during the experiment.

 

How Many People Does It Take to Convince You?

When asked if any of his research findings surprised him, Ugander discussed one particular social science study about “complex contagion” from his Facebook days.

Complex contagion is the notion that if more people try to get you to do something, such as to donate to a particular cause, you have a much greater chance of doing it.

“With most social phenomena, there is a diminishing return when more and more people try to convince you to take a particular action; after a while, it falls on deaf ears,” said Ugander.

But in the past five or ten years, there has been a debate about whether there are some situations in which it “may take three or four people trying to get you to take the same action before you’ll even consider it,” said Ugander.

Ugander, with coauthors Lars Backstrom, Cameron Marlow (both at Facebook) and Jon Kleinberg (Cornell), determined that complex contagion is strongest when people who are from different areas of your life try to get you to sign up for Facebook.

Their study, “Structural Diversity in Complex Contagion,” shows that a person is much more likely to join Facebook when multiple people who don’t know each other—for instance, a work colleague, a surfing buddy, and a college roommate—all try to convince that person.
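One way to put a number on this kind of diversity is to count how many separate groups a person’s inviters fall into, where two inviters belong to the same group if they are linked to each other by friendships among the inviters. The Python sketch below counts those groups as connected components. The function name and example names are hypothetical, and this is a simplified reading of the idea rather than the study’s exact procedure.

```python
def structural_diversity(contacts, friendships):
    """Count connected components among a person's contacts.

    Only friendships where both people are contacts are considered.
    """
    contacts = set(contacts)
    adj = {c: set() for c in contacts}
    for u, v in friendships:
        if u in contacts and v in contacts:
            adj[u].add(v)
            adj[v].add(u)

    seen = set()
    components = 0
    for start in contacts:
        if start in seen:
            continue
        # Found a contact not reached yet: explore its whole group.
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components

# Three inviters from different parts of someone's life: three separate groups.
diverse = structural_diversity(["colleague", "surfer", "roommate"], [])

# Three inviters who know one another: a single group.
clustered = structural_diversity(
    ["c1", "c2", "c3"], [("c1", "c2"), ("c2", "c3")]
)

# Friendships with people outside the contact list are ignored.
mixed = structural_diversity(["x", "y", "z"], [("x", "y"), ("y", "outsider")])
```

In the article’s example, the colleague, the surfing buddy and the college roommate would count as three groups, while three coworkers who all know each other would count as one.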

Since that study, Ugander has published additional articles about social network assembly, graph clustering and crowdsourcing, which are available on his MS&E profile page.

Prior to joining MS&E, Ugander spent a year as a post-doctoral researcher collaborating with Eric Horvitz, director of Microsoft Research, who also visited MS&E last spring as part of the New Directions Lecture Series.

Ugander also completed a six-week expedition to Northern Alaska with his wife, funded in part by the American Alpine Club and Patagonia. Their expedition report was just accepted for publication in the American Alpine Journal.

Now that he’s firmly entrenched at MS&E, Ugander is focused on intensive expeditions into big data sets and big experiments. He said, “I’m excited about being part of MS&E’s foray into blending analytics and social sciences. This work has far-reaching implications in our increasingly networked world. There is much to do.”

Written by: Rachel Street

Tuesday, February 23, 2016