Accountable Algorithms: A Research Agenda

May 12th, 2015 by Christian

(or, Caillou Sucks)

What should people who are interested in accountability and algorithms be thinking about? Here is one answer: My eleven-minute remarks are now online from a recent event at NYU. I’ve edited them to intersperse my slides.

This talk was partly motivated by the ethics work being done in the machine learning community. That is very exciting and interesting work and I love, love, love it. My remarks are an attempt to think through the other things we might also need to do. Let me know how to replace the “??” in my slides with something more meaningful!

Preview: My remarks contain a minor attempt at a Michael Jackson joke.

Here is the video: https://www.youtube.com/embed/rJfDKx2fjdE

A number of fantastic Social Media Collective people were at this conference — you can hear Kate Crawford in the opening remarks.  For more videos from the conference, see:

Algorithms and Accountabilityhttp://www.law.nyu.edu/centers/ili/algorithmsconference

Thanks to Joris van Hoboken, Helen Nissenbaum and Elana Zeide for organizing such a fab event.

If you bought this 11-minute presentation you might also buy: Auditing Algorithms, a forthcoming workshop at Oxford.

http://auditingalgorithms.wordpress.com

 

(This post was cross-posted to The Social Media Collective.)

 

 


The Facebook “It’s Not Our Fault” Study

May 7th, 2015 by Christian

Today in Science, members of the Facebook data science team released a provocative study about adult Facebook users in the US “who volunteer their ideological affiliation in their profile.” The study “quantified the extent to which individuals encounter comparatively more or less diverse” hard news “while interacting via Facebook’s algorithmically ranked News Feed.”*

  • The research found that the user’s click rate on hard news is affected by the positioning of the content on the page by the filtering algorithm. The same link placed at the top of the feed is about 10-15% more likely to get a click than a link at position #40 (figure S5).
  • The Facebook news feed curation algorithm, “based on many factors,” removes hard news from diverse sources that you are less likely to agree with but it does not remove the hard news that you are likely to agree with (S7). They call news from a source you are less likely to agree with “cross-cutting.”*
  • The study then found that the algorithm filters out 1 in 20 cross-cutting hard news stories that a self-identified conservative sees (or 5%) and 1 in 13 cross-cutting hard news stories that a self-identified liberal sees (8%).
  • Finally, the research then showed that “individuals’ choices about what to consume” further limits their “exposure to cross-cutting content.” Conservatives will click on only 17% a little less than 30% of cross-cutting hard news, while liberals will click 7% a little more than 20% (figure 3).

My interpretation in three sentences:

  1. We would expect that people who are given the choice of what news they want to read will select sources they tend to agree with–more choice leads to more selectivity and polarization in news sources.
  2. Increasing political polarization is normatively a bad thing.
  3. Selectivity and polarization are happening on Facebook, and the news feed curation algorithm acts to modestly accelerate selectivity and polarization.

I think this should not be hugely surprising. For example, what else would a good filter algorithm be doing other than filtering for what it thinks you will like?

But what’s really provocative about this research is the unusual framing. This may go down in history as the “it’s not our fault” study.

 

Facebook: It’s not our fault.

I carefully wrote the above based on my interpretation of the results. Now that I’ve got that off my chest, let me tell you about how the Facebook data science team interprets these results. To start, my assumption was that news polarization is bad.  But the end of the Facebook study says:

“we do not pass judgment on the normative value of cross-cutting exposure”

This is strange, because there is a wide consensus that exposure to diverse news sources is foundational to democracy. Scholarly research about social media has–almost universally–expressed concern about the dangers of increasing selectivity and polarization. But it may be that you do not want to say that polarization is bad when you have just found that your own product increases it. (Modestly.)

And the sources cited just after this quote sure do say that exposure to diverse news sources is important. But the Facebook authors write:

“though normative scholars often argue that exposure to a diverse ‘marketplace of ideas’ is key to a healthy democracy (25), a number of studies find that exposure to cross-cutting viewpoints is associated with lower levels of political participation (22, 26, 27).”

So the authors present reduced exposure to diverse news as a “could be good, could be bad” but that’s just not fair. It’s just “bad.” There is no gang of political scientists arguing against exposure to diverse news sources.**

The Facebook study says it is important because:

“our work suggests that individuals are exposed to more cross-cutting discourse in social media they would be under the digital reality envisioned by some

Why so defensive? If you look at what is cited here, this quote is saying that this study showed that Facebook is better than a speculative dystopian future.*** Yet the people referred to by this word “some” didn’t provide any sort of point estimates that were meant to allow specific comparisons. On the subject of comparisons, the study goes on to say that:

“we conclusively establish that…individual choices more than algorithms limit exposure to attitude-challenging content.”

compared to algorithmic ranking, individuals’ choices about what to consume had a stronger effect”

Alarm bells are ringing for me. The tobacco industry might once have funded a study that says that smoking is less dangerous than coal mining, but here we have a study about coal miners smoking. Probably while they are in the coal mine. What I mean to say is that there is no scenario in which “user choices” vs. “the algorithm” can be traded off, because they happen together (Fig. 3 [top]). Users select from what the algorithm already filtered for them. It is a sequence.**** I think the proper statement about these two things is that they’re both bad — they both increase polarization and selectivity. As I said above, the algorithm appears to modestly increase the selectivity of users.

The only reason I can think of that the study is framed this way is as a kind of alibi. Facebook is saying: It’s not our fault! You do it too!

 

Are we the 4%?

In my summary at the top of this post, I wrote that the study was about people “who volunteer their ideological affiliation in their profile.” But the study also describes itself by saying:

“we utilize a large, comprehensive dataset from Facebook.”

“we examined how 10.1 million U.S. Facebook users interact”

These statements may be factually correct but I found them to be misleading. At first, I read this quickly and I took this to mean that out of the at least 200 million Americans who have used Facebook, the researchers selected a “large” sample that was representative of Facebook users, although this would not be representative of the US population. The “limitations” section discusses the demographics of “Facebook’s users,” as would be the normal thing to do if they were sampled. There is no information about the selection procedure in the article itself.

Instead, after reading down in the appendices, I realized that “comprehensive” refers to the survey research concept: “complete,” meaning that this was a non-probability, non-representative sample that included everyone on the Facebook platform. But out of hundreds of millions, we ended up with a study of 10.1m because users were excluded unless they met these four criteria:

  1. “18 or older”
  2. “log in at least 4/7 days per week”
  3. “have interacted with at least one link shared on Facebook that we classified as hard news”
  4. “self-report their ideological affiliation” in a way that was “interpretable”

That #4 is very significant. Who reports their ideological affiliation on their profile?

 

add your political views

 

It turns out that only 9% of Facebook users do that. Of those that report an affiliation, only 46% reported an affiliation in a way that was “interpretable.” That means this is a study about the 4% of Facebook users unusual enough to want to tell people their political affiliation on the profile page. That is a rare behavior.

More important than the frequency, though, is the fact that this selection procedure confounds the findings. We would expect that a small minority who publicly identifies an interpretable political orientation to be very likely to behave quite differently than the average person with respect to consuming ideological political news.  The research claims just don’t stand up against the selection procedure.

But the study is at pains to argue that (italics mine):

“we conclusively establish that on average in the context of Facebook, individual choices more than algorithms limit exposure to attitude-challenging content.”

The italicized portion is incorrect because the appendices explain that this is actually a study of a specific, unusual group of Facebook users. The study is designed in such a way that the selection for inclusion in the study is related to the results. (“Conclusively” therefore also feels out of place.)

Algorithmium: A Natural Element?

Last year there was a tremendous controversy about Facebook’s manipulation of the news feed for research. In the fracas it was revealed by one of the controversial study’s co-authors that based on the feedback received after the event, many people didn’t realize that the Facebook news feed was filtered at all. We also recently presented research with similar findings.

I mention this because when the study states it is about selection of content, who does the selection is important. There is no sense in this study that a user who chooses something is fundamentally different from the algorithm hiding something from them. While in fact the the filtering algorithm is driven by user choices (among other things), users don’t understand the relationship that their choices have to the outcome.

 

not sure if i hate facebook or everyone i know

 

In other words, the article’s strange comparison between “individual’s choices” and “the algorithm,” should be read as “things I choose to do” vs. the effect of “a process Facebook has designed without my knowledge or understanding.” Again, they can’t be compared in the way the article proposes because they aren’t equivalent.

I struggled with the framing of the article because the research talks about “the algorithm” as though it were an element of nature, or a naturally occurring process like convection or mitosis. There is also no sense that it changes over time or that it could be changed intentionally to support a different scenario.*****

Facebook is a private corporation with a terrible public relations problem. It is periodically rated one of the least popular companies in existence. It is currently facing serious government investigations into illegal practices in many countries, some of which stem from the manipulation of its news feed algorithm. In this context, I have to say that it doesn’t seem wise for these Facebook researchers to have spun these data so hard in this direction, which I would summarize as: the algorithm is less selective and less polarizing. Particularly when the research finding in their own study is actually that the Facebook algorithm is modestly more selective and more polarizing than living your life without it.

 

 

Update: (6pm Eastern)

Wow, if you think I was critical have a look at these. It turns out I am the moderate one.

Eszter Hargittai from Northwestern posted on Crooked Timber that we should “stop being mesmerized by large numbers and go back to taking the fundamentals of social science seriously.” And (my favorite): “I thought Science was a serious peer-reviewed publication.”

Nathan Jurgenson from Maryland and Snapchat wrote on Cyborgology (“in a fury“) that Facebook is intentionally “evading” its own role in the production of the news feed. “Facebook cannot take its own role in news seriously.” He accuses the authors of using the “Big-N trick” to intentionally distract from methodological shortcomings. He tweeted that “we need to discuss how very poor corporate big data research gets fast tracked into being published.”

Zeynep Tufekci from UNC wrote on Medium that “I cannot remember a worse apples to oranges comparison” and that the key take-away from the study is actually the ordering effects of the algorithm (which I did not address in this post). “Newsfeed placement is a profoundly powerful gatekeeper for click-through rates.”

 

Update: (5/10)

A comment helpfully pointed out that I used the wrong percentages in my fourth point when summarizing the piece. Fixed it, with changes marked.

 

Update: (5/15)

It’s now one week since the Science study. This post has now been cited/linked in The New York Times, Fortune, Time, Wired, Ars Technica, Fast Company, Engaget, and maybe even a few more. I am still getting emails. The conversation has fixated on the <4% sample, often saying something like: “So, Facebook said this was a study about cars, but it was actually only about blue cars.” That’s fine, but the other point in my post is about what is being claimed at all, no matter the sample.

I thought my “coal mine” metaphor about the algorithm would work but it has not always worked. So I’ve clamped my Webcam to my desk lamp and recorded a four-minute video to explain it again, this time with a drawing.******

Here’s the video:
https://www.youtube.com/watch?v=eBPntMSDGSs

If the coal mine metaphor failed me, what would be a better metaphor? I’m not sure. Suggestions?

 

 

 

Notes:

* Diversity in hard news, in their study, would be a self-identified liberal who receives a story from FoxNews.com, or a self-identified conservative who receives one from the HuffingtonPost.com, where the stories are about “national news, politics, [or] world affairs.” In more precise terms, for each user “cross-cutting content” was defined as stories that are more likely to be shared by partisans who do not have the same self-identified ideological affiliation that you do.

** I don’t want to make this even more nitpicky, so I’ll put this in a footnote. The paper’s citations to Mutz and Huckfeldt et al. to mean that “exposure to cross-cutting viewpoints is associated with lower levels of political participation” is just bizarre. I hope it is a typo. These authors don’t advocate against exposure to cross-cutting viewpoints.

*** Perhaps this could be a new Facebook motto used in advertising: “Facebook: Better than one speculative dystopian future!”

**** In fact, algorithm and user form a coupled system of at least two feedback loops. But that’s not helpful to measure “amount” in the way the study wants to, so I’ll just tuck it away down here.

***** Facebook is behind the algorithm but they are trying to peer-review research about it without disclosing how it works — which is a key part of the study. There is also no way to reproduce the research (or do a second study on a primary phenomenon under study, the algorithm) without access to the Facebook platform.

****** In this video, I intentionally conflate (1) the number of posts filtered and (2) the magnitude of the bias of the filtering. I did so because the difficulty with the comparison works the same way for both, and I was trying to make the example simpler. Thanks to Cedric Langbort for pointing out that “baseline error” is the clearest way of explaining this.

 

(This was cross-posted to The Social Media Collective and Wired.)


Should You Boycott Traditional Journals?

March 30th, 2015 by Christian

(Or, Should I Stay or Should I Go?)

Is it time to boycott “traditional” scholarly publishing? Perhaps you are an academic researcher, just like me. Perhaps, just like me, you think that there are a lot of exciting developments in scholarly publishing thanks to the Internet. And you want to support them. And you also want people to read your research. But you also still need to be sure that your publication venues are held in high regard.

Or maybe you just receive research funding that is subject to new open access requirements.

Ask me about OPEN ACCESS

Academia is a funny place. We are supposedly self-governing. So if we don’t like how our scholarly communications are organized we should be able to fix this ourselves. If we are dissatisfied with the journal system, we’re going to have to do something about it. The question of whether or not it is now time to eschew closed access journals is something that comes up a fair amount among my peers.

It comes up often enough that a group of us at Michigan decided to write an article on the topic. Here’s the article.  It just came out yesterday (open access, of course):

Carl Lagoze, Paul Edwards, Christian Sandvig, & Jean-Christophe Plantin. (2015). Should I stay or Should I Go? Alternative Infrastructures in Scholarly Publishing. International Journal of Communication 9: 1072-1081.

The article is intended for those who want some help figuring out the answer to the question the article title poses: Should I stay or should I go? It’s meant help you decipher the unstable landscape of scholarly publishing these days. (Note that we restrict our topic to journal publishing.)

Researching it was a lot of fun, and I learned quite a bit about how scholarly communication works.

  • It contains a mention of the first journal. Yes, the first one that we would recognize as a journal in today’s terms. It’s Philosophical Transactions published by the Royal Society of London. It’s on Volume 373.
  • It should teach you about some of the recent goings-on in this area. Do you know what a green repository is? What about an overlay journal? Or the “serials crisis“?
  • It addresses a question I’ve had for a while: What the heck are those arXiv people up to? If it’s so great, why hasn’t it spread to all disciplines?
  • There’s some fun discussion of influential experiments in scholarly publishing. Remember the daring foundation of the Electronic Journal of Communication? Vectors? Were you around way-back-in-the-day when the pioneering, Web-based JCMC looked like this hot mess below? Little did we know that we were actually looking at the future.(*)

jcmc-1-1

(JCMC circa 1995)

(*): Unless we were looking at the Gopher version, then in that case we were not looking at the future.

Ultimately, we adapt a framework from Hirschman that we found to be an aid to our thinking about what is going on today in scholarly communication. Feel free to play this song on a loop as you read it.

 

(This post has been cross-posted on The Social Media Collective.)


Eco’s “How to Write a Thesis” in 15 Maxims

March 24th, 2015 by Christian

(or, Thesis Advice, Click-Bait Style)

Italian semiotician and novelist Umberto Eco released How to Write a Thesis in 1977, well before his rise to international intellectual stardom. It has just been released in English for the first time by MIT Press. I’ve just read it.

9780262527132_0_0

I was thinking of assigning it in doctoral seminars, but I regret that a great deal of the book involves scholarly practices that are no longer relevant to anyone. For instance: Is it OK to insert an unnecessary footnote in the middle of your text so that your footnote numbering matches up correctly with what you’ve already typed? (Meaning: So you don’t have to re-type the entire manuscript. On a typewriter.)

It turns out that it is not OK to insert unnecessary footnotes.

And there’s a whole bunch of things about index card management, diacritical marks, and library union indices. And some stuff about the laurea.

However, even if I do not find the book relevant to assign as a whole, Eco’s great wit and strong opinions did lead me to compile the best quotes from the book. I present them to you here:

Eco’s 15 Maxims for PhD Students:

From How to Write a Thesis [1977/2015], selected by me. These are slightly paraphrased to make them work in a list. I hope you like them as much as I did.

  1. Academic humility is the knowledge that anyone can teach us something. Practice it.
  2. A thesis is like a chess game that requires a player to plan in advance all the moves he will make to checkmate his opponent.
  3. How long does it take to write a thesis? No longer than three years and no less than six months.
  4. Imagine that you have a week to take a 600-mile car trip. Even if you are on vacation, you will not leave your house and begin driving indiscriminately in a random direction. A provisional table of contents will function as your work plan.
  5. You must write a thesis that you are able to write.
  6. Your thesis exists to prove the hypothesis that you devised at the outset, not to show the breadth of your knowledge.
  7. What you should never do is quote from an indirect source pretending that you have read the original.
  8. Quote the object of your interpretive analysis with reasonable abundance.
  9. Use notes to pay your debts.
  10. You should not become so paranoid that you believe you have been plagiarized every time a professor or another student addresses a topic related to your thesis.
  11. If you read the great scientists or the great critics you will see that, with a few exceptions, they are quite clear and are not ashamed of explaining things well.
  12. You are not Proust. Do not write long sentences.
  13. The language of a thesis is a metalanguage, that is, a language that speaks of other languages. A psychiatrist who describes the mentally ill does not express himself in the manner of his patients.
  14. If you do not feel qualified, do not defend your thesis.
  15. Do not whine and be complex-ridden, because it is annoying.

 


The Google Algorithm as a Robotic Nose

January 16th, 2015 by Christian

Algorithms, in the view of author Christopher Steiner, are poised to take over everything.  Algorithms embedded in software are now everywhere: Netflix recommendations, credit scores, driving directions, stock trading, Google search, Facebook’s news feed, the TSA’s process to decide who gets searched, the Home Depot prices you are quoted online, and so on. Just a few weeks ago, Ashtan Soltani, the new Chief Technologist of the FTC, has said that algorithmic transparency  is his central priority for the US government agency that is tasked with administration of fairness and justice in trade. Commentators are worried that the rise of hidden algorithmic automation is leading to a problematic new “black box society.”

But given that we want to achieve these “transparent” algorithms, how would we do that? Manfred Broy, writing in the context of software engineering, has said that one of the frustrations of working with software is that it is “almost intangible.”  Even if we suddenly obtained the source code for anything we wanted (which is unlikely) it usually not clear what code is doing.  How can we begin to have a meaningful conversation about the consequences of “an algorithm” by achieving some broad, shared understanding of what it is and what it is doing?

06-Sandvig-Seeing-the-Sort-2014-WEB.jpg

(An Ask.com advertising campaign.)

The answer, even among experts, is that we use metaphor, cartoons, diagrams, and abstraction. As a small beginning to tackling this problem of representing the algorithm, this week I have a new journal article out in the open access journal Media-N, titled “Seeing the Sort.” In it, I try for a critical consideration of how we represent algorithms visually. From flowcharts to cartoons, I go through examples of “algorithm public relations,” meaning both how algorithms are revealed to the public and also what spin the visualizers are trying for.

The most fun of writing the piece was choosing the examples, which include The Algo-Rythmics (an effort to represent algorithms in dance), an algorithm represented as a 19th century grist mill, and this Google cartoon that represents its algorithm as a robotic nose that smells Web pages:

(The Google algorithm as a robotic nose that smells Web pages.)

Read the article:

Sandvig, Christian. (2015). Seeing the Sort: The Aesthetic and Industrial Defense of “The Algorithm.” Media-N. vol. 10, no. 1. http://median.newmediacaucus.org/art-infrastructures-information/seeing-the-sort-the-aesthetic-and-industrial-defense-of-the-algorithm/ (this was also cross-posted to the Social Media Collective.)


Bad Behavior has blocked 178 access attempts in the last 7 days.