A reply to ‘Facebook: You are your “Likes”’


This week Charles Foster discussed the recent study showing that Facebook ‘Likes’ can be plugged into an algorithm to predict things about people (their demographics, their habits and their personalities) that they didn’t explicitly disclose. Charles argued that, even though the individual ‘Likes’ were published voluntarily, using an algorithm to generate further predictions would be unethical because individuals have not consented to it, and that doing it anyway is therefore a violation of their privacy.

I wish to make three points contesting his strong conclusion and instead offer a more qualified position: simply running the algorithm on publicly available ‘Likes’ data is not unethical, even if no consent has been given. Doing particular things based on the output of the algorithm, however, might be.

Point one: the algorithm does not generate ‘knowledge’

Charles argued that running the algorithm is a violation of privacy, akin to using x-ray glasses to see beyond the walls of the hall and kitchen into which you were invited. This, however, overstates what the algorithm can achieve. There is a crucial distinction between the fine-grained knowledge that x-ray glasses would provide and the coarse-grained statistical prediction that the algorithm generates. The closer analogy would be this: I invite you into my hall and kitchen, whereupon you estimate the state of my bedroom from your experience of the patterns in which rooms within houses tend to be kept; you are the Sherlock Holmes of bedroom-state induction. You see that my hall and kitchen are untidy and predict that my bedroom probably is too. It may be that this pattern does hold in my house, but you don’t know this, and you don’t know the particular ways in which my bedroom is untidy: you have not actually seen my heaps of clothes and scattered papers. Your prediction, even if it turns out to be accurate, does not violate me.

This outcome suggests that violations of privacy necessarily involve the unauthorised acquisition of knowledge, and that generating mere predictions therefore cannot intrude on my interests. Whether generating statistical predictions can ever be a violation of privacy is an interesting question. Perhaps there is a level of predictive accuracy that would put it ethically on a par with illicitly acquiring knowledge of personal details, but my intuition is that there is not. The way in which one arrives at a prediction is importantly different from the way in which one comes to have knowledge, and a prediction will always offer less precise detail.

Point two: what matters ethically is what is done with the prediction the algorithm generates

The example Charles cited, of targeting women who are likely to become mothers with advertising, is indeed troubling. It involves a US retail network that used customer shopping records to predict pregnancies among its female customers and sent them well-timed, well-targeted offers. The worry is that these offers may reach women who have a particular interest in avoiding even the suggestion of pregnancy: for example, unmarried women in a culture where pregnancy outside marriage is unacceptable. If the Facebook ‘Like’ algorithm could make such a prediction (the study in question did not suggest that it can), then targeted advertising based on that prediction would indeed be ethically problematic. However, the concern does not attach to the running of the algorithm per se. The consent issues raised by targeted advertising are not the same as those raised by the mere statistical analysis of voluntarily given data. It might be asked why anyone would bother to make statistical predictions if they did not intend to exploit them, but the fact remains that simply making the predictions (running a linear or logistic regression model on public data) is not unethical.
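To make concrete what “simply making the predictions” amounts to, here is a minimal sketch in Python of a logistic regression trained by gradient descent on a toy matrix of binary ‘Like’ indicators. The pages, labels, data and function names (`train_logistic`, `predict_proba`) are all hypothetical and purely illustrative; this is not the model used in the study Charles cited. What it shows is that the model’s only input is the set of published ‘Likes’, and its only output is a probability.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log-loss.

    X: rows of 0/1 'Like' indicators for hypothetical pages.
    y: 0/1 labels for a hypothetical trait being predicted.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)
            err = p - label  # derivative of the log-loss w.r.t. the linear score
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, row):
    # The output is an estimated probability, not an observed fact about the person.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)


# Hypothetical toy data: three pages per user, plus a hypothetical trait label.
X = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
p_liker = predict_proba(w, b, [1, 0, 0])
p_non = predict_proba(w, b, [0, 1, 0])
print(round(p_liker, 2), round(p_non, 2))
```

Nothing in this sketch reaches beyond the ‘Likes’ matrix it is given. Whether the resulting probabilities are then acted on, for example to target advertising, is a separate step, and that is where Point two locates the ethical question.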

Point three: the information voluntarily published is generally not that dissociated from the predicted information

The concern emphasised in Charles’ piece was that the algorithm might be able to predict things about me that I had absolutely no idea my public ‘Likes’ even hinted at. He reported that fewer than 5% of gay users were connected with explicitly gay groups. However, ‘Likes’ that were moderately indicative of male homosexuality included ‘Britney Spears’ and ‘Desperate Housewives’. Of course, a whole variety of men like Britney, but the finding that, of the men who ‘Like’ Britney Spears on Facebook, more are gay than straight is perhaps not a massive surprise. Significantly, ‘Liking’ something on Facebook is a public, expressive act: there will be many men who like Britney but have not proclaimed that they ‘Like’ her. It is also possible (though not essential to my argument) that some of these proclamations are motivated precisely by people’s desire to define and present themselves in the public sphere, intending individual ‘Likes’ to express more about their wider identities than would a tick on an anonymous music survey. Thus, the statistical prediction should upset neither those for whom it is true nor those for whom it is false: it is a description of one piece of data predicting another, accompanied by a value for the proportion of instances in which this prediction will be correct.
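The “value for the proportion of instances in which this prediction will be correct” is simply a conditional frequency. A minimal sketch, assuming a hypothetical list of `(likes_page, has_trait)` records; the function name `rule_precision` and all counts are invented for illustration and are not figures from the study.

```python
from typing import List, Tuple


def rule_precision(records: List[Tuple[bool, bool]]) -> float:
    """Proportion of people who 'Like' the page for whom the predicted trait holds.

    Each record is (likes_page, has_trait). Raises ValueError if nobody
    in the records 'Likes' the page, since the proportion is then undefined.
    """
    with_like = [has_trait for likes_page, has_trait in records if likes_page]
    if not with_like:
        raise ValueError("no records with the 'Like'; proportion undefined")
    return sum(with_like) / len(with_like)


# Invented toy records: 10 people 'Like' the page, 6 of whom have the trait;
# 90 people do not 'Like' it.
records = (
    [(True, True)] * 6
    + [(True, False)] * 4
    + [(False, True)] * 5
    + [(False, False)] * 85
)
precision = rule_precision(records)
print(f"Prediction correct for {precision:.0%} of those who 'Like' the page")
```

In this toy example the rule is wrong for the remaining 40% of those who ‘Like’ the page, and the proportion alone does not say which individuals fall on which side.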

So, far from running unrestricted through our private lives, the algorithm in fact knows nothing more about us than the public ‘Likes’ it is fed. Running it will generate predictions of lesser or greater accuracy but, unless I confirm or deny the truth of a prediction, it knows nothing new about me, remaining a speculating but ignorant guest in my messy hall and kitchen.