Posted on 2014/10/13 by

Big data is small data, too

Big Data comic

I’m the first to admit it: we might be popular, we might create a lot of great relationships, we might blah blah blah. But OkCupid doesn’t really know what it’s doing. Neither does any other website. It’s not like people have been building these things for very long, or you can go look up a blueprint or something. Most ideas are bad. Even good ideas could be better. Experiments are how you sort all this out.

(Rudder, We Experiment On Human Beings!)

The (by now forgotten) scandal caused by the paper Experimental evidence of massive-scale emotional contagion through social networks triggered a provocative response from OkCupid co-founder and data analytics leader Christian Rudder. OkCupid, whose slogan is ‘we do math to get your dates’, is one of the most popular online dating websites, with almost 2,000,000 people using it per month. While the main purpose of the service is to produce ‘matches’ between individuals that will lead to dates, the mechanics behind it (to use an old metaphor) are less simple. The declared basis for those matches is an algorithm built upon the answers people have explicitly given to a set of questions. It considers not only their own answers but also how they would like someone else to answer, and the importance they assign to each question. In addition, OkCupid collects data, finds patterns and formulates hypotheses to improve its algorithms. And it experiments.
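To make those three ingredients concrete (a person's own answer, the answers they would accept from someone else, and the importance they give each question), here is a minimal sketch in Python. It is not OkCupid's actual code: the importance weights, the data layout, the names and the choice of a geometric mean to combine the two directions are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical importance weights; the post does not give the real values.
IMPORTANCE = {"irrelevant": 0, "a little": 1, "somewhat": 10, "very": 50}


def satisfaction(rater, other):
    """Share of the rater's importance points earned by the other's answers.

    Each profile maps a question to a tuple:
    (own answer, set of answers accepted from a partner, importance label).
    Only questions both people answered are counted.
    """
    earned = 0
    possible = 0
    for question, (_, accepted, importance) in rater.items():
        if question not in other:
            continue
        weight = IMPORTANCE[importance]
        possible += weight
        if other[question][0] in accepted:
            earned += weight
    return earned / possible if possible else 0.0


def match_percentage(a, b):
    """Combine both directions with a geometric mean (an assumption)."""
    return 100 * sqrt(satisfaction(a, b) * satisfaction(b, a))


# Two invented profiles, used only to exercise the functions.
alice = {
    "smoke": ("no", {"no"}, "very"),
    "tidy": ("yes", {"yes", "no"}, "a little"),
}
bob = {
    "smoke": ("no", {"no", "yes"}, "a little"),
    "tidy": ("no", {"no"}, "somewhat"),
}

print(round(match_percentage(alice, bob)))
```

In this invented example Bob satisfies everything Alice cares about, but Alice misses the question Bob weights most, so the combined score comes out low. The point of the sketch is that the match percentage is a designed quantity: change the weights or the way the two directions are combined and the number shown to users changes with it.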

Probably the most controversial part of Rudder’s article was the unapologetic confession that, as part of an experiment, they mismatched people on purpose, telling them that their affinity value (probably the most important feature on the site) was higher or lower than the real one. Or, more precisely, than the value given by the algorithm they designed.

To test this, we took pairs of bad matches (actual 30% match) and told them they were exceptionally good for each other (displaying a 90% match.)† Not surprisingly, the users sent more first messages when we said they were compatible. After all, that’s what the site teaches you to do.

But we took the analysis one step deeper. We asked: does the displayed match percentage cause more than just that first message—does the mere suggestion cause people to actually like each other? As far as we can measure, yes, it does.

When we tell people they are a good match, they act as if they are. Even when they should be wrong for each other.

After this article, Rudder was confronted on a radio show by two journalists:

AG: Have you thought about bringing in, say, like an ethicist to, to vet your experiments?
CR: To wring his hands all day for a hundred thousand dollars a year?
AG: Well, y’know, you could pay him, y’know, on a case by case basis, maybe not a hundred thousand a year.
CR: Sure, yeah, I was making a joke. No we have not thought about that.

Leaving aside, if that is possible, the ethical aspect of the problem (or, freely paraphrasing Jonathan Sterne: ‘as a research topic, I prefer you dead’), what Rudder was arguing for was the provisional value of an algorithm, always perfectible and valid only in relation to a predetermined goal. Following Underwood, Rudder could also claim that ‘this is a work in progress’ and that ‘we don’t fully understand the terms we’re working with’ (2013). A new object of study, as Jockers says, requires ‘a new methodology, a new way of thinking about our object of study’ (4). And a new methodology demands speculation, tests and failures. An algorithm ‘can give us a starting point for discussion’ (Underwood, 2013), a discussion between the hypothesis embodied by the algorithm and the data gathered (or what is also called trial and error).

CR: I think part of what’s confusing people about this experiment is the result. The algorithm does kind of work, y’know and power of suggestion is also there. But like, what if it had gone the other way? What if our algorithm was far worse than random? Then if we hadn’t had run that experiment we basically are doing something terrible to all the users. Like this is the only way to find this stuff out, if you guys have an alternative to the scientific method I’m all ears.

(…) But again like I haven’t really heard an answer from you guys about how do you change the algorithm in any way, whether it’s this extreme way of basically making it random, or just changing the way a variable is weighted. How do you make a change and test it?

(…)If you think that OkCupid has unlocked the mysteries of love and has an ironclad algorithm, prophetically can tell you exactly who is right for you, you’re a crazy. Y’know? So like, we’re doing our best, for sure, and it’s the same thing. Like I think people will realize that that’s how these sites work, that’s how they evolve, they’re doing the best job that they can, and they also have their own interests as well. And, and maybe that’s the process that we’re looking at. And that’s the kind of, again the kind of conversation that I think Facebook on accident, and OkCupid on purpose is trying to kickstart.

Although Rudder defends their right to experiment in order to improve the algorithm they use (in the end, isn’t OkCupid an experiment itself?), this particular experiment had a very different purpose: people were deceived in order to understand how much they are influenced by the announcement of a value, regardless of the truth of that announcement. The discussion Rudder was proposing was not really about how useful a particular algorithm was but about the impact of the algorithm itself. In other words, he was asking about the value of the algorithm they produce, the value of Big Data, as a cultural object. In van Dijck’s term, its Dataism value. Like DNA (Thacker) or neuroscience, an algorithm has a cultural meaning that is not completely independent of its declared one but is different from it. An algorithm is an algorithm is an algorithm. As boyd and Crawford outlined, Big Data is an interplay of technology, analysis and mythology (2012), and mythology was exactly what this experiment was about. However, mythology is not that distant from the ethical aspect Rudder was trying to avoid, and it is also related to one of the problems both Jockers and Underwood mention in their texts: consent requires knowledge; critical thinking requires knowledge; compelling database interrogations require knowledge. The ability to cross-check or manipulate Big Data depends on understanding how the information was gathered, the criteria (and ideology) behind the building of the database, and the analysis applied to that information.

Or maybe we are just asking what we talk about when we talk about love.

Dear [nameA]
Because of a diagnostic test, your match percentage with [nameB] was misstated as [%]. It is actually [%]. We wanted to let you know!
Best,

OkCupid

References

-boyd, danah and Kate Crawford. (2012). “Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon.” Information, Communication, & Society 15:5, p. 662-679.
-Jockers, Matthew L. Macroanalysis: Digital methods and literary history. University of Illinois Press, 2013.
-Kramer, Adam DI, Jamie E. Guillory, and Jeffrey T. Hancock. “Experimental evidence of massive-scale emotional contagion through social networks.” Proceedings of the National Academy of Sciences (2014): 201320040.
-Rettberg, Jill Walker. Seeing Ourselves through Technology: How We Use Selfies, Blogs and Wearable Devices to See and Shape Ourselves, 2014.
-“OkCupid Lied To Users About Their Compatibility As An Experiment – Forbes.” Accessed October 9, 2014. http://www.forbes.com/sites/kashmirhill/2014/07/28/okcupid-experiment-compatibility-deception/.
-“We Experiment On Human Beings! « OkTrends.” N.p., n.d. Web. 9 Oct. 2014.
-“Where to Start with Text Mining. | The Stone and the Shell.” Accessed October 9, 2014. http://tedunderwood.com/2012/08/14/where-to-start-with-text-mining/.
-“We Don’t Already Understand the Broad Outlines of Literary History. | The Stone and the Shell.” Accessed October 9, 2014. http://tedunderwood.com/2013/02/08/we-dont-already-know-the-broad-outlines-of-literary-history/.