Let’s say your research question involves a sensitive subject. How many people accept bribes? Cheat on tests? Hold racist views? Wouldn’t vote for a female presidential candidate?
If you just ask this question, straight up, you will run head-on into social desirability bias, which is:
the tendency of survey respondents to answer questions in a manner that will be viewed favorably by others
Unfortunately, this effect can bias responses even in anonymous surveys, which are the rule at least for us (when Gradient runs a survey, we don’t know the name or any other identifying information of any respondent).
When doing research on sensitive issues, one method we like to use is called the list experiment (or item-count technique). Here’s how it works:
Split your sample into two conditions, A and B. In condition A, you show a list of prompts and ask how many of them the respondent agrees with. Here’s an example from the World Bank website, which references this paper:
“How many of the following statements are true for you?”
I remember where I was the day of the Challenger space shuttle disaster
I spent a lot of time playing video games as a kid
I would vote to legalize marijuana if there was a ballot question in my state
I have voted for a political candidate who was pro-life
In Condition B, we mix in an additional statement, which is the one that we actually care about:
I remember where I was the day of the Challenger space shuttle disaster
I spent a lot of time playing video games as a kid
I would vote to legalize marijuana if there was a ballot question in my state
I have voted for a political candidate who was pro-life
I consider myself to be heterosexual
The respondent supplies only a number representing the count of statements they agree with, and knows that there’s no way to figure out which statements those are. This embeds privacy and anonymity into the design of the survey itself, which (hopefully) gives the respondent the sense of security they need to answer completely truthfully.
Although we can’t map respondents’ answers to specific prompts, we can use the average count from Group A to estimate Group B’s average count over the first four prompts, since those prompts are identical in both groups. E.g. if the average number of agreed-with statements in Group A was 2.3, then we can assume that the average number of agreed-with statements among the first four prompts in Group B was also 2.3 (or close to it).
Anything above 2.3 would then have to come from the additional question. Say, for the sake of example, that the average from Group B was 3.1, leading to a difference of 0.8 between the two groups. Our estimate, then, of the percentage of people that agreed with the additional prompt (“I consider myself to be heterosexual”) would be 80%.
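The arithmetic above can be sketched in a short simulation. The agreement rates below are made-up numbers for illustration, not real survey results; the point is that the difference in mean counts between the two groups recovers the prevalence of the sensitive item.

```python
import random

random.seed(0)

# Assumed (hypothetical) true agreement rates for this sketch.
control_rates = [0.55, 0.60, 0.45, 0.70]  # the four shared control prompts
sensitive_rate = 0.80                      # the sensitive item, shown only to Group B
n_per_group = 2000

def count_response(include_sensitive):
    """Simulate one respondent, who reports only a total count."""
    count = sum(random.random() < p for p in control_rates)
    if include_sensitive:
        count += random.random() < sensitive_rate
    return count

group_a = [count_response(False) for _ in range(n_per_group)]
group_b = [count_response(True) for _ in range(n_per_group)]

mean_a = sum(group_a) / n_per_group
mean_b = sum(group_b) / n_per_group

# The difference in means estimates the share agreeing with the
# sensitive item (here it should land near the assumed 0.80).
estimate = mean_b - mean_a
print(f"estimated prevalence: {estimate:.2f}")
```

With a couple thousand respondents per group, the estimate lands close to the assumed 80%, mirroring the 3.1 − 2.3 = 0.8 example above.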
Unfortunately, though, this method — like any advanced statistical and/or survey method — can’t just be applied without substantial critical thinking about the specific problem. Here’s a good overview of why you might want to “think again” when thinking about doing a list experiment. In particular, you have to make sure that it’s not obvious which item you care about. E.g. it wouldn’t work to have a setup like the following:
“How many of the following statements are true for you?”
I like baseball
I always use my turn signal when switching lanes
I like dogs more than cats
I like coffee more than tea
I commit felonies by accepting bribes
You also need to take care to avoid control statements that are too highly positively correlated (in fact, you want them to be negatively correlated). In the example above, if someone reports agreeing with all five statements, they have effectively revealed their agreement with the sensitive fifth statement. For this reason you want to avoid the situation where someone agrees with all four control statements, because then agreeing with the sensitive item makes their answer non-anonymous. Positively correlated control statements make this more likely; negatively correlated ones make it less likely.
Finally, it takes a lot of data and advanced modeling to do this well. Simply taking the difference in means (in the example above, this was 3.1-2.3 = 0.8) is a very noisy estimate — more advanced multivariate regression techniques, along with a large sample size, are generally required to obtain a reasonably tight uncertainty interval.
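To see how noisy the simple difference-in-means estimator is, we can repeat the simulated survey many times and look at how the estimates spread out. The rates below are again assumptions for illustration only.

```python
import random

random.seed(1)

control_rates = [0.55, 0.60, 0.45, 0.70]  # assumed control-item rates
sensitive_rate = 0.30                      # assumed sensitive-item rate

def run_survey(n_per_group):
    """Run one simulated list experiment; return the difference in means."""
    def count(include_sensitive):
        c = sum(random.random() < p for p in control_rates)
        if include_sensitive:
            c += random.random() < sensitive_rate
        return c
    a = [count(False) for _ in range(n_per_group)]
    b = [count(True) for _ in range(n_per_group)]
    return sum(b) / n_per_group - sum(a) / n_per_group

for n in (200, 2000):
    estimates = [run_survey(n) for _ in range(500)]
    mean_est = sum(estimates) / len(estimates)
    sd = (sum((e - mean_est) ** 2 for e in estimates) / len(estimates)) ** 0.5
    print(f"n={n} per group: mean estimate {mean_est:.2f}, std dev {sd:.3f}")
```

With 200 respondents per group, the standard deviation of the estimate is on the order of ±10 percentage points, which is why larger samples and regression-based adjustments are needed for tight intervals.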
But, done well, the list experiment technique is one of the best options known today for obtaining estimates on sensitive questions. It’s a method that we (at Gradient) spend a lot of time improving, including offering different response options to remove the “floor” and “ceiling” effects. For example, we would combine the “4” and “5” responses into a single “4 or 5” response option. This way, even a respondent who agreed with all of the statements, and who in a normal survey would lose their anonymity by answering truthfully, can respond “4 or 5” and keep their anonymity intact. This makes the model more complicated to estimate, but is worth it in order to preserve anonymity. We also maintain a battery of negatively correlated control items, so that we can re-use data across different surveys. Finally, we don’t just hold out the sensitive item; we randomly hold out different items, even the control items, when showing the survey.
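The collapsed response-option idea can be sketched in a few lines. This is an illustrative helper (the function name is ours, not an established API): for a five-item list, the top two counts are merged so a respondent who agrees with everything is never distinguishable from one who agrees with all but one.

```python
def collapsed_options(num_items):
    """Response options for a list of `num_items` statements, with the
    top two counts merged into one option to protect the ceiling."""
    options = [str(k) for k in range(num_items - 1)]
    options.append(f"{num_items - 1} or {num_items}")
    return options

print(collapsed_options(5))  # ['0', '1', '2', '3', '4 or 5']
```

A respondent choosing “4 or 5” reveals agreement with at least four statements but never which ones, so the sensitive item stays protected even at the ceiling.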
Have something sensitive you need to research?