How AI and crowdsourcing help social scientists sample diverse populations

In 2010, three psychologists from the University of British Columbia published a paper with an intriguing title: The WEIRDest people in the world? Paradoxically, the paper was about Americans. The three scientists had devoted their research careers to the cross-cultural variability of human psychology and had traveled the seven seas to study small-scale tribal societies. In the paper, they voiced a growing concern about how heavily the social sciences (psychology, economics, sociology, political science and others) were relying on samples of Americans. From lab experiments to panel studies, data collection from people meant, by and large, data collection from American people.

The rich, the poor and the barely surviving

In science, to say that you learned something about people should imply that you have randomly sampled people around the globe, not just from one country. Voluminous evidence shows how differently people think and behave across the world’s cultures — from strategies in financial games to basic cognition, e.g., spatial orientation or susceptibility to visual illusions.

But if you are sampling from only one country, your best bet is to not sample from the U.S.: In every single distribution, the U.S. is on a tail, never in the middle. Along with a few other developed countries, mainly in Western Europe, Americans stand out as being very different from the rest of the world. You can even say weird. Beautifully weird in many respects: forward-looking, cooperative, secure — but not at all representative of the world’s population.

Look at the world’s wealth distribution, and you’ll easily see why Westerners are so different. They live longer lives in stable environments, they eat well and breathe relatively clean air, they own homes and cars, they have jobs, bank accounts and insurance. None of this is the case for most other inhabitants of the planet, who have a substantially lower standard of living. On top of that, close to 700 million people, around 10% of the global population, live in extreme poverty, on less than $2 a day, with a looming risk of dying from famine or disease.

What is WEIRD?

The term WEIRD doesn’t just mean “odd.” In social sciences, it also stands for Western, Educated, Industrialized, Rich, Democratic — an original acronym the paper’s authors introduced to describe the world’s “golden billion.” This term refers to individuals from largely developed and wealthy post-industrial societies who are oblivious to everyday occurrences still ubiquitous today in many other parts of the globe, e.g., husbands routinely beating their wives, children dying in infancy, or people practicing open defecation.

If you’re reading this piece, chances are you’re WEIRD, too, and so are your coworkers, family, friends and possibly everyone else you know. And, when you hear the word “diversity,” you probably think about it in the modern American sense: five ethnicities, with poverty defined as annual household income below $20,000. Well, the world has 650 ethnicities, and there are countries where the median annual household income is $200, which is the median daily wage for American workers. Yes, including African Americans, Native Americans, Asian Americans, and Latinx Americans in research is crucial for scientific diversity, as is studying populations in low-income areas of the U.S. But it’s not enough. By the world’s standards, that will still be the diversity of the wealthy: Even if in America these people aren’t considered rich, they’re much richer than 95% of the world’s population.

This leads us to one simple conclusion: to make science truly and globally diverse, we must go beyond WEIRD samples.

The rise and fall of MTurk

In fact, just a little over a decade ago, things were even worse: Within the “golden billion,” researchers had been mostly getting their data from an even smaller subset of Westerners: undergraduates. Many of the coolest discoveries about the “nature of people” have come from U.S. student samples. Cognitive dissonance? Students. The prisoner’s dilemma? Students. Marshmallow test? OK, that was Stanford faculty’s kids; not much better in terms of sample diversity.

To be fair, it hasn’t really been the fault of researchers, who have limited resources for recruiting participants. Most scholars have tiny research budgets; some get grants, but it takes years, while most research ideas never get funded at all. Academic timing is tight, with one shot to get tenured, so most researchers can’t really afford to think outside the box about how to obtain their research subjects. They need simple solutions, and undergrads are one such solution: They’re around, and you don’t have to pay them since they do it for credits. This is the reason young scholars typically start their research journey by testing their hypotheses on students — and often continue doing so for the rest of their careers.

Since the late 2000s, this has changed. Quite accidentally, the change was brought about by Amazon. Academic researchers noticed Mechanical Turk (MTurk), a platform originally created to label data for machine learning algorithms using crowdsourcing. Crowdsourcing essentially means receiving labeled data from a large group of online contributors and aggregating their results — as opposed to a smaller group of narrowly trained in-house specialists. As a byproduct, MTurk had hundreds of thousands of registered Americans waiting for new tasks to earn money from.
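To make “aggregating their results” concrete, here is a minimal sketch of the simplest aggregation rule, majority voting. It is an illustration only, not a description of how MTurk or any other platform actually works: the function name, the item IDs and the labels are all made up for the example.

```python
from collections import Counter


def aggregate_majority(labels_by_item):
    """Pick the most common label for each item (simple majority vote).

    labels_by_item maps a hypothetical item ID to the list of labels
    submitted by different contributors. Ties are broken in favor of
    the label that was submitted first.
    """
    aggregated = {}
    for item_id, labels in labels_by_item.items():
        top_label, top_count = Counter(labels).most_common(1)[0]
        aggregated[item_id] = {
            "label": top_label,
            # Share of contributors who agreed with the winning label.
            "agreement": top_count / len(labels),
        }
    return aggregated


# Made-up example: three contributors label each of two images.
raw_labels = {
    "image_1": ["cat", "cat", "dog"],
    "image_2": ["dog", "dog", "dog"],
}
result = aggregate_majority(raw_labels)
print(result)
```

The agreement share gives a rough sense of how much to trust each aggregated label. Real pipelines often go further, for instance by weighting contributors by their track record.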

Some open-minded researchers tried running an academic survey on MTurk. It worked. Moreover, the data came in within a day, whereas running a single study often takes a whole semester. MTurk was cheap, and it was fast. What else could you wish for if you’re a tenure-track professor eager to get published?

The word spread, and within a decade, MTurk became a go-to tool for academic data collection. Social sciences changed, too: They were no longer about students but about housewives, retired people and blue-collar workers, new population samples that are far more representative than your typical college kids. For all its issues and downsides, from underpaying participants to not controlling data quality properly, MTurk deserves a tribute: It revolutionized social sciences by empowering scientists to collect data from non-student samples easily and affordably.

Today, MTurk is gradually giving way to solutions customized for social sciences, such as those from Prolific, CloudResearch, Qualtrics and Toloka. But they all got a shot because Amazon pioneered this space by changing the very idea of academic data collection.

Beyond WEIRD

So, in the last decade, social scientists went beyond student samples, and most importantly, they managed to do so at scale. However, the problem remains: Those samples are still WEIRD; that is, they’re limited to Americans or Western Europeans at best. Researchers who want to go beyond WEIRD have been facing the same problem: no quick or affordable way to do so.

Say you want to test your hypothesis on people from Botswana, Malaysia and Poland. You must either find a collaborator (a challenge in and of itself) or turn to panel agencies, a feasible solution only for those who have a lot of money to play with, as a quote can easily reach $15,000 for one study. To afford this, a researcher would have to find a big grant in their field (if such a grant is even available), apply, wait for months to hear back and likely not get it anyway. In short, there’s just no way your average scholar could afford international panels for routine hypothesis testing.

Fortunately, this state of affairs has also been undergoing a major change, and not only because researchers now have access to non-students as their research subjects. Crucially, crowdsourcing platforms today aren’t as homogeneous as MTurk was when it first launched. Getting participants from South America, Africa or Asia — even from largely rural areas — is quite doable now, provided these people have internet access, which today is becoming less and less of an issue.

Dr. Philipp Chapkovsky, a behavioral economist at WZB Berlin Social Science Center, studies how external information shapes group polarization, trust and altruism. One of his interests is the nature and consequences of corruption.

“Corruption indices of countries and regions are a valuable tool for policymakers, but they may result in statistical discrimination — people from a more ‘corrupt’ region may be perceived as less trustworthy or more inclined to dishonest behaviors,” Dr. Chapkovsky explains.

In one experiment, Dr. Chapkovsky and his team investigated how information about corruption levels may harm intergroup relations. The scientists faced a problem: All leading data collection platforms provided access only to American and Western European participants — that is, to people who likely never experienced corruption in their everyday lives.

“We needed access to participants from developing countries who know what corruption is — not from Netflix shows featuring imaginary politicians but from real-life experience. When you study corruption, it makes sense to research people from Venezuela, Nigeria, Iran, or Bangladesh. You can’t study day-to-day corruption on American or British participants, it’s just not there. Moreover, to test our particular hypothesis, we needed specific countries with large interregional variation of corruption levels, so we could keep the country factor fixed.”

By chance, Dr. Chapkovsky came across a social sciences offering by one of the newer options mentioned above, Toloka. Focusing on data-centric AI development through its large fleet of contributors from 120 countries, the platform was able to give the researcher exactly what he had been after: previously silent voices from cultures other than the U.S. and the UK.

“We manipulated the information people had about three different geographical regions of their home country. Then we had them play two simple behavioral games: ‘Cheating game’ and ‘Trust game’. We found that, indeed, information about a certain region being ‘corrupt’ decreased trust towards anyone from that region and made people substantially overestimate the degree of dishonesty of their fellow players.”

Another researcher, Dr. Paul Conway, an Associate Professor at University of Southampton School of Psychology and a lecturer at the Centre for Research on Self and Identity, studies the psychology of morality. “I am interested in factors that influence how people decide what is right or wrong, who is good and bad, and how to assign blame and punishment.”

Like other researchers in moral psychology, Dr. Conway has found that some factors influencing moral judgment appear widely or even universally endorsed, whereas others may be culture-dependent.

“All known human cultures agree that it is wrong to intentionally harm an innocent target,” Dr. Conway explains. “Yet, people might disagree over who is innocent or whether harm was intentional. People view some factors as more important than others in upholding moral norms: for example, harming one innocent person to save several people is often acceptable.”

Dr. Conway had been testing his hypotheses on research participants from the United States and Great Britain until he came to realize that this was not painting a full picture of human moral perceptions. Although there were a few cross-cultural studies in his field, those were often massive, expensive and challenging undertakings, impractical for testing many questions about the psychology behind moral decisions. “In science, you need large samples — until recently, you couldn’t easily get those outside Western countries. Even with the right grant to fund studies, it can still be a logistical challenge to access large diverse samples,” he admits. “Researchers who wanted to access more cultural diversity were often forced to trade off quantity and quality of data.”

Dr. Conway had been seeking a way to quickly, easily and affordably access respondents from different cultures, especially from less developed regions of the world. It turned out to be easier than he had anticipated:

“Crowdsourcing has become a game changer for psychologists like myself. For over a decade, I’ve been using crowdsourcing platforms like MTurk and Prolific to tap into Western populations beyond college undergrads. Recently, I also started using crowdsourcing to obtain quick access to participants from secluded regions of the globe that are of interest to my research. This is helpful to test whether the findings in Western populations hold in other regions around the globe.”

Crowdsourcing platforms are still not representative in a rigorous scientific sense: Participants must have internet access and spare time to perform tasks, which biases the sample. Not all of them are attentive or read well enough to provide quality responses. Be that as it may, these samples are still much more diverse than the convenience samples of students that social sciences had to rely on until recently. Originally designed to assist machine learning engineers, crowdsourcing platforms are gradually changing the way social sciences operate, bringing real diversity into what scientists are learning about human nature.

Elena Brandt is Toloka for Social Sciences PhD Candidate in Social Psychology.