How Accurate Data can Make or Break (and how to fudge the numbers to further your personal agenda)

According to a recent study by the American Automobile Association (AAA), the number of adults in the United States who would be afraid to ride in a self-driving vehicle has decreased year over year. 63% of respondents reported fear of riding in a self-driving car, compared to 78% of respondents at the same time last year.

While this sounds impressive, and even exciting, to proponents of self-driving tech, in reality it is only a magic trick. Except that it isn’t magic; it’s manipulation. Thinly veiled manipulation at that, and I’m gonna give you an overview of how they did it, why they did it, and how you can do the same.

Let’s say that you’re surveying 1,000 people on their favourite chocolate bar out of three possible answers: Twix, M&Ms, and ‘other’. 13% like Twix, 4% like M&Ms, and 83% like some other brand of chocolate bar. A year later, you want to run the same survey to see whether people’s attitudes have changed, but how do you pick your sample? The best way to get reliable data would be to survey the same 1,000 people that you surveyed originally, which keeps sample-to-sample differences out of your study and actually tracks the change in the hearts and minds of individuals over time.

That would be the best way to do it, but let’s say that you don’t like the results that your survey showed last time and want to show that M&Ms is gaining popularity, since it is also your favourite. You could choose a new sample of 1,000 from a biased source, such as by finding respondents in pro-M&M communities and groups, but that might bring about suspicious results, such as an unusually high number of respondents reporting that they prefer M&Ms when compared to the previous results.

Another thing that you could do is plan your sample-selection process in a way that looks unbiased while not entirely being unbiased. In other words, choose a new random sample of 1,000 respondents. You might not get the result you want, but there’s a better chance of it than if you polled the same respondents as before, who might not have changed their minds in the relatively short space of 12 months. That way, you get a fresh shot at the results you want while still having an unbiased questionnaire that is administered in the same way every time the survey is conducted. Since the sampling method is also random, nobody can tell that you’ve fudged your study because you didn’t like the results, at least not without looking at the raw data from your survey.
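To make the "fresh shot" idea concrete, here is a minimal simulation sketch. It assumes a made-up population of 100,000 people whose chocolate preferences never change (13% Twix, 4% M&Ms, 83% other, borrowed from the example above), and it compares re-polling the same 1,000 people with drawing a brand-new random 1,000. The function names, the population size, and the seed are all hypothetical choices for illustration, not anything taken from a real survey.

```python
import random


def share(sample, brand):
    """Fraction of respondents in the sample who picked the given brand."""
    return sum(1 for answer in sample if answer == brand) / len(sample)


def simulate(seed=0, n=1000):
    """Poll a population whose opinions never change, two different ways."""
    rng = random.Random(seed)

    # Hypothetical population of 100,000 with fixed preferences.
    population = ["Twix"] * 13_000 + ["M&Ms"] * 4_000 + ["other"] * 83_000

    # Year one: a random sample of n people.
    year_one = rng.sample(population, n)

    # Year two, option A: go back to the very same people.
    # Nobody changed their mind, so the answers are identical.
    year_two_panel = list(year_one)

    # Year two, option B: draw a brand-new random sample.
    year_two_fresh = rng.sample(population, n)

    return {
        "year_one": share(year_one, "M&Ms"),
        "year_two_panel": share(year_two_panel, "M&Ms"),
        "year_two_fresh": share(year_two_fresh, "M&Ms"),
    }


if __name__ == "__main__":
    for label, value in simulate().items():
        print(f"{label}: {value:.1%} prefer M&Ms")
```

In this sketch the panel figure always matches year one exactly, because nobody's opinion moved. The fresh-sample figure is free to drift a little in either direction purely by chance, and trying different seeds shows how much it wobbles.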

Without looking at the raw data of the AAA self-driving survey, it is impossible to tell whether the same people were surveyed both times, but it still makes a good case study in how polling studies don’t necessarily paint an accurate picture of the world around us. If the survey was anonymous, then there truly is no way of knowing what happened.

Had the same sample of people been surveyed twice, 12 months apart, we could get an idea of how public perception of self-driving tech changes over time. Survey two completely different groups, however, and the results are meaningless, even if they are surveyed 12 months after the original group.

We have no idea how the second group in the AAA study felt 12 months ago, so we have no idea whether their minds have changed in that time. It’s entirely possible that they were already more in favour of self-driving than the first group at the time of the original polling, and hadn’t changed their minds when it came time to poll them.

There’s just no way to know what happened, which renders this study basically worthless for anything other than convincing people that public attitudes toward self-driving tech have changed.

Do you want to fudge your own studies to further your agenda while appearing to be an impartial and unbiased data scientist? Well, there you have it.
