Should You Boycott Traditional Journals?

March 30th, 2015 by Christian

(Or, Should I Stay or Should I Go?)

Is it time to boycott “traditional” scholarly publishing? Perhaps you are an academic researcher, just like me. Perhaps, just like me, you think that there are a lot of exciting developments in scholarly publishing thanks to the Internet. And you want to support them. And you also want people to read your research. But you also still need to be sure that your publication venues are held in high regard.

Or maybe you just receive research funding that is subject to new open access requirements.

Ask me about OPEN ACCESS

Academia is a funny place. We are supposedly self-governing. So if we don’t like how our scholarly communications are organized we should be able to fix this ourselves. If we are dissatisfied with the journal system, we’re going to have to do something about it. The question of whether or not it is now time to eschew closed access journals is something that comes up a fair amount among my peers.

It comes up often enough that a group of us at Michigan decided to write an article on the topic. Here’s the article.  It just came out yesterday (open access, of course):

Carl Lagoze, Paul Edwards, Christian Sandvig, & Jean-Christophe Plantin. (2015). Should I Stay or Should I Go? Alternative Infrastructures in Scholarly Publishing. International Journal of Communication 9: 1072-1081.

The article is intended for those who want some help figuring out the answer to the question the article title poses: Should I stay or should I go? It’s meant to help you decipher the unstable landscape of scholarly publishing these days. (Note that we restrict our topic to journal publishing.)

Researching it was a lot of fun, and I learned quite a bit about how scholarly communication works.

  • It contains a mention of the first journal. Yes, the first one that we would recognize as a journal in today’s terms. It’s Philosophical Transactions published by the Royal Society of London. It’s on Volume 373.
  • It should teach you about some of the recent goings-on in this area. Do you know what a green repository is? What about an overlay journal? Or the “serials crisis”?
  • It addresses a question I’ve had for a while: What the heck are those arXiv people up to? If it’s so great, why hasn’t it spread to all disciplines?
  • There’s some fun discussion of influential experiments in scholarly publishing. Remember the daring foundation of the Electronic Journal of Communication? Vectors? Were you around way-back-in-the-day when the pioneering, Web-based JCMC looked like this hot mess below? Little did we know that we were actually looking at the future.(*)

jcmc-1-1

(JCMC circa 1995)

(*): Unless we were looking at the Gopher version, then in that case we were not looking at the future.

Ultimately, we adapt a framework from Hirschman that we found to be an aid to our thinking about what is going on today in scholarly communication. Feel free to play this song on a loop as you read it.

 

(This post has been cross-posted on The Social Media Collective.)


Eco’s “How to Write a Thesis” in 15 Maxims

March 24th, 2015 by Christian

(or, Thesis Advice, Click-Bait Style)

Italian semiotician and novelist Umberto Eco released How to Write a Thesis in 1977, well before his rise to international intellectual stardom. It has just been released in English for the first time by MIT Press. I’ve just read it.


I was thinking of assigning it in doctoral seminars, but I regret that a great deal of the book involves scholarly practices that are no longer relevant to anyone. For instance: Is it OK to insert an unnecessary footnote in the middle of your text so that your footnote numbering matches up correctly with what you’ve already typed? (Meaning: So you don’t have to re-type the entire manuscript. On a typewriter.)

It turns out that it is not OK to insert unnecessary footnotes.

And there’s a whole bunch of things about index card management, diacritical marks, and library union indices. And some stuff about the laurea.

However, even if I do not find the book relevant to assign as a whole, Eco’s great wit and strong opinions did lead me to compile the best quotes from the book. I present them to you here:

Eco’s 15 Maxims for PhD Students:

From How to Write a Thesis [1977/2015], selected by me. These are slightly paraphrased to make them work in a list. I hope you like them as much as I did.

  1. Academic humility is the knowledge that anyone can teach us something. Practice it.
  2. A thesis is like a chess game that requires a player to plan in advance all the moves he will make to checkmate his opponent.
  3. How long does it take to write a thesis? No longer than three years and no less than six months.
  4. Imagine that you have a week to take a 600-mile car trip. Even if you are on vacation, you will not leave your house and begin driving indiscriminately in a random direction. A provisional table of contents will function as your work plan.
  5. You must write a thesis that you are able to write.
  6. Your thesis exists to prove the hypothesis that you devised at the outset, not to show the breadth of your knowledge.
  7. What you should never do is quote from an indirect source pretending that you have read the original.
  8. Quote the object of your interpretive analysis with reasonable abundance.
  9. Use notes to pay your debts.
  10. You should not become so paranoid that you believe you have been plagiarized every time a professor or another student addresses a topic related to your thesis.
  11. If you read the great scientists or the great critics you will see that, with a few exceptions, they are quite clear and are not ashamed of explaining things well.
  12. You are not Proust. Do not write long sentences.
  13. The language of a thesis is a metalanguage, that is, a language that speaks of other languages. A psychiatrist who describes the mentally ill does not express himself in the manner of his patients.
  14. If you do not feel qualified, do not defend your thesis.
  15. Do not whine and be complex-ridden, because it is annoying.

 


The Google Algorithm as a Robotic Nose

January 16th, 2015 by Christian

Algorithms, in the view of author Christopher Steiner, are poised to take over everything. Algorithms embedded in software are now everywhere: Netflix recommendations, credit scores, driving directions, stock trading, Google search, Facebook’s news feed, the TSA’s process to decide who gets searched, the Home Depot prices you are quoted online, and so on. Just a few weeks ago, Ashkan Soltani, the new Chief Technologist of the FTC, said that algorithmic transparency is his central priority for the US government agency that is tasked with administration of fairness and justice in trade. Commentators are worried that the rise of hidden algorithmic automation is leading to a problematic new “black box society.”

But given that we want to achieve these “transparent” algorithms, how would we do that? Manfred Broy, writing in the context of software engineering, has said that one of the frustrations of working with software is that it is “almost intangible.” Even if we suddenly obtained the source code for anything we wanted (which is unlikely), it is usually not clear what the code is doing. How can we begin to have a meaningful conversation about the consequences of “an algorithm” by achieving some broad, shared understanding of what it is and what it is doing?


(An Ask.com advertising campaign.)

The answer, even among experts, is that we use metaphor, cartoons, diagrams, and abstraction. As a small beginning to tackling this problem of representing the algorithm, this week I have a new journal article out in the open access journal Media-N, titled “Seeing the Sort.” In it, I try for a critical consideration of how we represent algorithms visually. From flowcharts to cartoons, I go through examples of “algorithm public relations,” meaning both how algorithms are revealed to the public and also what spin the visualizers are trying for.

The most fun part of writing the piece was choosing the examples, which include The Algo-Rythmics (an effort to represent algorithms in dance), an algorithm represented as a 19th century grist mill, and this Google cartoon that represents its algorithm as a robotic nose that smells Web pages:

(The Google algorithm as a robotic nose that smells Web pages.)

Read the article:

Sandvig, Christian. (2015). Seeing the Sort: The Aesthetic and Industrial Defense of “The Algorithm.” Media-N. vol. 10, no. 1. http://median.newmediacaucus.org/art-infrastructures-information/seeing-the-sort-the-aesthetic-and-industrial-defense-of-the-algorithm/ (this was also cross-posted to the Social Media Collective.)


Corrupt Personalization

June 26th, 2014 by Christian

(“And also Bud Light.”)

In my last two posts I’ve been writing about my attempt to convince a group of sophomores with no background in my field that there has been a shift to the algorithmic allocation of attention — and that this is important. In this post I’ll respond to a student question. My favorite: “Sandvig says that algorithms are dangerous, but what are the most serious repercussions that he envisions?” What is the coming social media apocalypse we should be worried about?

google flames

This is an important question because people who study this stuff are NOT as interested in this student question as they should be. Frankly, we are specialists who study media and computers and things — therefore we care about how algorithms allocate attention among cultural products almost for its own sake. Because this is the central thing that we study, we don’t spend a lot of time justifying it.

And our field’s most common response to the query “what are the dangers?” often lacks the required sense of danger. The most frequent response is: “extensive personalization is bad for democracy.” (a.k.a. Pariser’s “filter bubble,” Sunstein’s “egocentric” Internet, and so on). This framing lacks a certain house-on-fire urgency, doesn’t it?

(sarcastic tone:) “Oh, no! I’m getting to watch, hear, and read exactly what I want. Help me! Somebody do something!”

Sometimes (as Hindman points out) the contention is the opposite, that Internet-based concentration is bad for democracy.  But remember that I’m not speaking to political science majors here. The average person may not be as moved by an abstract, long-term peril to democracy as the average political science professor. As David Weinberger once said after I warned about the increasing reliance on recommendation algorithms, “So what?” Personalization sounds like a good thing.

As a side note, the second most frequent response I see is that algorithms are now everywhere. And they work differently than what came before. This also lacks a required sense of danger! Yes, they’re everywhere, but if they are a good thing…

So I really like this question “what are the most serious repercussions?” because I think there are some elements of the shift to attention-sorting algorithms that are genuinely “dangerous.” I can think of at least two, probably more, and they don’t get enough attention. In the rest of this post I’ll spell out the first one, which I’ll call “corrupt personalization.”

Here we go.

Common-sense reasoning about algorithms and culture tells us that the purveyors of personalized content have the same interests we do. That is, if Netflix started recommending only movies we hate or Google started returning only useless search results we would stop using them. However: Common sense is wrong in this case. Our interests are often not the same as those of the providers of these selection algorithms. As in my last post, let’s work through a few concrete examples to make the case.

In this post I’ll use Facebook examples, but the general problem of corrupt personalization is present on all of our media platforms in wide use that employ the algorithmic selection of content.

(1) Facebook “Like” Recycling

Screen Shot 2012-12-10 at 12.44.34 PM

(Image from ReadWriteWeb.)

On Facebook, in addition to advertisements along the side of the interface, perhaps you’ve noticed “featured,” “sponsored,” or “suggested” stories that appear inside your news feed, intermingled with status updates from your friends. It could be argued that this is not in your interest as a user (did you ever say, “gee, I’d like ads to look just like messages from my friends”?), but I have bigger fish to fry.

Many ads on Facebook resemble status updates in that there can be messages endorsing the ads with “likes.” For instance, here is an older screenshot from ReadWriteWeb:

pages you may like on facebook

Another example: a “suggested” post was mixed into my news feed just this morning, recommending World Cup coverage on Facebook itself. It’s a Facebook ad for Facebook, in other words. It had this intriguing addendum:

CENSORED likes facebook

So, wait… I have hundreds of friends and eleven of them “like” Facebook?  Did they go to http://www.facebook.com and click on a button like this:

Facebook like button magnified

But facebook.com doesn’t even have a “Like” button!  Did they go to Facebook’s own Facebook page (yes, there is one) and click “Like”? I know these people and that seems unlikely. And does Nicolala really like Walmart? Hmmm…

What does this “like” statement mean? Welcome to the strange world of “like” recycling. Facebook has defined “like” in ways that depart from English usage.  For instance, in the past Facebook has determined that:

  1. Anyone who clicks on a “like” button is considered to have “liked” all future content from that source. So if you clicked a “like” button because someone shared a “Fashion Don’t” from Vice magazine, you may be surprised when your dad logs into Facebook three years later and is shown a current sponsored story from Vice.com like “Happy Masturbation Month!” or “How to Make it in Porn” with the endorsement that you like it. (The Vice.com example is from Craig Condon [NSFW].)
  2. Anyone who “likes” a comment on a shared link is considered to “like” wherever that link points to, a.k.a. “liking a share.” So if you see a (real) FB status update from a (real) friend that says “Yuck! The McLobster is a disgusting product idea!” and your (real) friend includes a (real) link like this one, then if you click “like” your friends may see McDonald’s ads in the future that include the phrase “(Your Name) likes McDonalds.” (This example is from ReadWriteWeb.)
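The two rules above can be sketched in a few lines of Python. This is only an illustration of the logic as described in this post, not Facebook's actual code: the `LikeClick` record, its field names, and the `recycled_endorsements` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LikeClick:
    """One click on a "like" button (hypothetical record, not a real Facebook type)."""
    user: str
    item: str                          # the specific thing the user clicked "like" on
    source: Optional[str] = None       # publisher of that item, if it came from one
    link_target: Optional[str] = None  # brand or site a shared link points to, if any


def recycled_endorsements(click: LikeClick) -> List[str]:
    """Return the broader endorsements that could be shown to the user's friends."""
    claimed = []
    # Rule 1: liking one item is treated as liking the source and its future content.
    if click.source:
        claimed.append(f"{click.user} likes {click.source}")
    # Rule 2: liking a friend's comment on a link is treated as liking the link's target.
    if click.link_target:
        claimed.append(f"{click.user} likes {click.link_target}")
    return claimed
```

Note what the function never looks at: the text of the item itself. In the McLobster case the click would be recorded with `link_target="McDonalds"`, and the "Yuck!" plays no part in the endorsement that comes out.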

fauxLike_mcdonalds

This has led to some interesting results, like dead people “liking” current news stories on Facebook.

There is already controversy about advertiser “like” inflation, “like” spam, and fake “likes,” — and these things may be a problem too, but that’s not what we are talking about here.  In the examples above the system is working as Facebook designed it to. A further caveat: note that the definition of “like” in Facebook’s software changes periodically and when they are sued. Facebook now has an opt-out setting for the above two “features.”

But these incendiary examples are exceptional fiascoes — on the whole the system probably works well. You likely didn’t know that your “like” clicks are merrily producing ads on your friends’ pages and in your name because you cannot see them. These “stories” do not appear on your news feed and cannot be individually deleted.

Unlike the examples from my last post you can’t quickly reproduce these results with certainty on your own account. Still, if you want to try, make a new Facebook account under a fake name (warning! dangerous!) and friend your real account. Then use the new account to watch your status updates.

Why would Facebook do this? Obviously it is a controversial practice that is not going to be popular with users. Yet Facebook’s business model is to produce attention for advertisers, not to help you — silly rabbit. So they must have felt that using your reputation to produce more ad traffic from your friends was worth the risk of irritating you. Or perhaps they thought that the practice could be successfully hidden from users — that strategy has mostly worked!

In sum this is a personalization scheme that does not serve your goals, it serves Facebook’s goals at your expense.

(2) “Organic” Content

This second group of examples concerns content that we consider to be “not advertising,” a.k.a. “organic” content. Funnily enough, algorithmic culture has produced this new use of the word “organic” — but has also made the boundary between “advertising” and “not advertising” very blurry.

funny-organic-food-ad

 

The general problem is that there are many ways in which algorithms act as mixing valves between things that can be easily valued with money (like ads) and things that can’t. And this kind of mixing is a normative problem (what should we do) and not a technical problem (how do we do it).

For instance, for years Facebook has encouraged nonprofits, community-based organizations, student clubs, other groups, and really anyone to host content on facebook.com.  If an organization creates a Facebook page for itself, the managers can update the page as though it were a profile.

Most page managers expect that people who “like” that page get to see the updates… which was true until January of this year. At that time Facebook modified its algorithm so that text updates from organizations were not widely shared. This is interesting for our purposes because Facebook clearly states that it wants page operators to run Facebook ad campaigns, and not to count on getting traffic from “organic” status updates, as it will no longer distribute as many of them.

This change likely has a very differential effect on, say, Nike’s Facebook page, a small local business’s Facebook page, Greenpeace International’s Facebook page, and a small local church congregation’s Facebook page. If you start a Facebook page for a school club, you might be surprised that you are spending your labor writing status updates that are never shown to anyone. Maybe you should buy an ad. Here’s an analytic for a page I manage:

this week page likes facebook

 

The impact isn’t just about size — at some level businesses might expect to have to insert themselves into conversations via persuasive advertising that they pay for, but it is not as clear that people expect Facebook to work this way for their local church or other domains of their lives. It’s as if on Facebook, people were using the yellow pages but they thought they were using the white pages.  And also there are no white pages.

(Oh, wait. No one knows what yellow pages and white pages are anymore. Scratch that reference, then.)

No need to stop here: in the future perhaps Facebook can monetize my family relationships. It could suggest that if I really want anyone to know about the birth of my child, or I really want my “insightful” status updates to reach anyone, I should turn to Facebook advertising.

Let me also emphasize that this mixing problem extends to the content of our personal social media conversations as well. A few months back, I posted a Facebook status update that I thought was humorous. I shared a link highlighting the hilarious product reviews for the Bic “Cristal For Her” ballpoint pen on Amazon. It’s a pen designed just for women.

bic crystal for her

The funny thing is that I happened to look at a friend of mine’s Facebook feed over their shoulder, and my status update didn’t go away. It remained, pegged at the top of my friend’s news feed, for as long as 14 days in one instance. What great exposure for my humor, right? But it did seem a little odd… I queried my other friends on Facebook and some confirmed that the post was also pegged at the top of their news feed.

I was unknowingly participating in another Facebook program that converts organic status updates into ads. It does this by changing their order in the news feed and adding the text “Sponsored” in light gray, which is very hard to see. Otherwise at least some updates are not changed. I suspect Facebook’s algorithm thought I was advertising Amazon (since that’s where the link pointed), but I am not sure.

This is similar to Twitter’s “Promoted Tweets” but there is one big difference. In the Facebook case the advertiser promotes content — my content — that they did not write. In effect Facebook is re-ordering your conversations with your friends and family on the basis of whether or not someone mentioned Coke, Levi’s, or Anheuser Busch (confirmed advertisers in the program).
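To make the re-ordering concrete, here is a minimal sketch in Python. How Facebook actually implements this is not public, so everything here is an assumption for illustration: the `Post` record, the plain substring match against advertiser names, and the sort key are all hypothetical. Only the three advertiser names come from this post.

```python
from dataclasses import dataclass, replace
from typing import List

# Advertiser names taken from the post; matching by lowercase substring is an assumption.
ADVERTISERS = ("coke", "levi's", "anheuser busch")


@dataclass(frozen=True)
class Post:
    """A status update in a friend's news feed (hypothetical record)."""
    author: str
    text: str
    age_days: int
    sponsored: bool = False


def mentions_advertiser(post: Post) -> bool:
    """True if the post text contains any advertiser name."""
    text = post.text.lower()
    return any(name in text for name in ADVERTISERS)


def order_feed(posts: List[Post]) -> List[Post]:
    """Label posts that mention an advertiser, then pin them above everything else."""
    labeled = [replace(p, sponsored=mentions_advertiser(p)) for p in posts]
    # Sponsored posts sort first; within each group, newer posts come before older ones.
    return sorted(labeled, key=lambda p: (not p.sponsored, p.age_days))
```

Under these assumptions a two-week-old update that happens to mention Coke would sit above a wedding announcement posted today, and the only visible difference would be the "Sponsored" label.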

Sounds like a great personal social media strategy there–if you really want people to know about your forthcoming wedding, maybe just drop a few names? Luckily the algorithms aren’t too clever about this yet so you can mix up the word order for humorous effect.

(Facebook status update:) “I am so delighted to be engaged to this wonderful woman that I am sitting here in my Michelob drinking a Docker’s Khaki Collection. And also Coke.”

Be sure to use links. I find the interesting thing about this mixing of the commercial and non-commercial to be that it sounds to my ears like some sort of corny, unrealistic science fiction scenario and yet with the current Facebook platform I believe the above example would work. We are living in the future.

So to recap, if Nike makes a Facebook page and posts status updates to it, that’s “organic” content because they did not pay Facebook to distribute it. Although any rational human being would see it as an ad. If my school group does the same thing, that’s also organic content, but they are encouraged to buy distribution — which would make it inorganic. If I post a status update or click “like” in reaction to something that happens in my life and that happens to involve a commercial product, my action starts out as organic, but then it becomes inorganic (paid for) because a company can buy my words and likes and show them to other people without telling me. Got it? This paragraph feels like we are rethinking CHEM 402.

The upshot is that control of the content selection algorithm is used by Facebook to get people to pay for things they wouldn’t expect to pay for, and to show people personalized things that they don’t think are paid for. But these things were in fact paid for.  In sum this is again a scheme that does not serve your goals, it serves Facebook’s goals at your expense.

The Danger: Corrupt Personalization

With these concrete examples behind us, I can now more clearly answer this student question. What are the most serious repercussions of the algorithmic allocation of attention?

I’ll call this first repercussion “corrupt personalization” after C. Edwin Baker. (Baker, a distinguished legal philosopher, coined the phrase “corrupt segmentation” in 1998 as an extension of the theories of philosopher Jürgen Habermas.)

Here’s how it works: You have legitimate interests that we’ll call “authentic.” These interests arise from your values, your community, your work, your family, how you spend your time, and so on. A good example might be that as a person who is enrolled in college you might identify with the category “student,” among your many other affiliations. As a student, you might be authentically interested in an upcoming tuition increase or, more broadly, about the contention that “there are powerful forces at work in our society that are actively hostile to the college ideal.”

However, you might also be authentically interested in the fact that your cousin is getting married. Or in pictures of kittens.

Grumpy-Cat-meme-610x405

Corrupt personalization is the process by which your attention is drawn to interests that are not your own. This is a little tricky because it is impossible to clearly define an “authentic” interest. However, let’s put that off for the moment.

In the prior examples we saw some (I hope) obvious places where my interests diverged from those of algorithmic social media systems. Highlights for me were:

  • When I express my opinion about something to my friends and family, I do not want that opinion re-sold without my knowledge or consent.
  • When I explicitly endorse something, I don’t want that endorsement applied to other things that I did not endorse.
  • If I want to read a list of personalized status updates about my friends and family, I do not want my friends and family sorted by how often they mention advertisers.
  • If a list of things is chosen for me, I want the results organized by some measure of goodness for me, not by how much money someone has paid.
  • I want paid content to be clearly identified.
  • I do not want my information technology to sort my life into commercial and non-commercial content and systematically de-emphasize the noncommercial things that I do, or turn these things toward commercial purposes.

More generally, I think the danger of corrupt personalization is manifest in three ways.

  1. Things that are not necessarily commercial become commercial because of the organization of the system. (Merton called this “pseudo-gemeinschaft,” Habermas called it “colonization of the lifeworld.”)
  2. Money is used as a proxy for “best” and it does not work. That is, those with the most money to spend can prevail over those with the most useful information. The creation of a salable audience takes priority over your authentic interests. (Smythe called this the “audience commodity,” it is Baker’s “market filter.”)
  3. Over time, if people are offered things that are not aligned with their interests often enough, they can be taught what to want. That is, they may come to wrongly believe that these are their authentic interests, and it may be difficult to see the world any other way. (Similar to Chomsky and Herman’s [not Lippmann’s] arguments about “manufacturing consent.”)

There is nothing inherent in the technologies of algorithmic allocation that is doing this to us; instead, the economic organization of the system is producing these pressures. In fact, we could design a system to support our authentic interests, but we would then need to fund it. (Thanks, late capitalism!)

To conclude, let’s get some historical perspective. What are the other options, anyway? If cultural selection is governed by computer algorithms now, you might answer, “who cares?” It’s always going to be governed somehow. If I said in a talk about “algorithmic culture” that I don’t like the Netflix recommender algorithm, what is supposed to replace it?

This all sounds pretty bad, so you might think I am asking for a return to “pre-algorithmic” culture: Let’s reanimate the corpse of Louis B. Mayer and he can decide what I watch. That doesn’t seem good either and I’m not recommending it. We’ve always had selection systems and we could even call some of the earlier ones “algorithms” if we want to.  However, we are constructing something new and largely unprecedented here and it isn’t ideal. It isn’t that I think algorithms are inherently dangerous, or bad — quite the contrary. To me this seems like a case of squandered potential.

With algorithmic culture, computers and algorithms are allowing a new level of real-time personalization and content selection on an individual basis that just wasn’t possible before. But rather than use these tools to serve our authentic interests, we have built a system that often serves a commercial interest that is often at odds with our interests — that’s corrupt personalization.

If I use the dominant forms of communication online today (Facebook, Google, Twitter, YouTube, etc.) I can expect content customized for others to use my name and my words without my consent, in ways I wouldn’t approve of. Content “personalized” for me includes material I don’t want, and obscures material that I do want. And it does so in a way that I may not be aware of.

This isn’t an abstract problem like a long-term threat to democracy, it’s more like a mugging — or at least a confidence game or a fraud. It’s violence being done to you right now, under your nose. Just click “like.”

In answer to your question, dear student, that’s my first danger.

* * *

ADDENDUM:

This blog post is already too long, but here is a TL;DR addendum for people who already know about all this stuff.

I’m calling this corrupt personalization because I can’t just apply Baker’s excellent ideas about corrupt segments — the world has changed since he wrote them. Although this post’s reasoning is an extension of Baker, it is not a straightforward extension.

Algorithmic attention is a big deal because we used to think about media and identity using categories, but the algorithms in wide use are not natively organized that way. Baker’s ideas were premised on the difference between authentic and inauthentic categories (“segments”), yet segments are just not that important anymore. Bermejo calls this the era of post-demographics.

Advertisers used to group demographics together to make audiences comprehensible, but it may no longer be necessary to buy and sell demographics or categories as they are a crude proxy for purchasing behavior. If I want to sell a Subaru, why buy access to “Brite Lights, Li’l City” (my PRIZM marketing demographic from the 1990s) when I can directly detect “intent to purchase a station wagon” or “shopping for a Subaru right now”? This complicates Baker’s idea of authentic segments quite a bit. See also Gillespie’s concept of calculated publics.

Also Baker was writing in an era where content was inextricably linked to advertising because it was not feasible to decouple them. But today algorithmic attention sorting has often completely decoupled advertising from content. Online we see ads from networks that are based on user behavior over time, rather than what content the user is looking at right now. The relationship between advertising support and content is therefore more subtle than in the previous era, and this bears more investigation.

Okay, okay I’ll stop now.

(This post was cross-posted to The Social Media Collective.)


Show and Tell: Algorithmic Culture

March 25th, 2014 by Christian

(or, What you need to know about “puppy dog hate”)

(or, “It’s not that I’m uninterested in hygiene…”)

Last week I tried to get a group of random sophomores to care about algorithmic culture. I argued that software algorithms are transforming communication and knowledge. The jury is still out on my success at that, but in this post I’ll continue the theme by reviewing the interactive examples I used to make my point. I’m sharing them because they are fun to try. I’m also hoping the excellent readers of this blog can think of a few more.

I’ll call my three examples “puppy dog hate,” “top stories fail,” and “your DoubleClick cookie filling.”  They should highlight the ways in which algorithms online are selecting content for your attention. And ideally they will be good fodder for discussion. Let’s begin:

Three Ways to Demonstrate Algorithmic Culture

(1.) puppy dog hate (Google Instant)

You’ll want to read the instructions fully before trying this. Go to http://www.google.com/ and type “puppy”, then [space], then “dog”, then [space], but don’t hit [Enter].  That means you should have typed “puppy dog ” (with a trailing space). Results should appear without the need to press [Enter]. I got this:

Now repeat the above instructions but instead of “puppy” use the word “bitch” (so: “bitch dog “).  Right now you’ll get nothing. I got nothing. (The blank area below is intentionally blank.) No matter how many words you type, if one of the words is “bitch” you’ll get no instant results.

What’s happening? Google Instant is the Google service that displays results while you are still typing your query. In the algorithm for Google Instant, it appears that your query is checked against a list of forbidden words. If the query contains one of the forbidden words (like “bitch”) no “instant” results will be shown, but you can still search Google the old-fashioned way by pressing [Enter].

This is an interesting example because it is incredibly mild censorship, and that is typical of algorithmic sorting on the Internet. Things aren’t made impossible; some things are just made a little harder than others. We can discuss whether or not this actually matters to anyone. After all, you can still search for anything you want to, but some searches are made slightly more time-consuming because you have to press [Enter] and you do not receive real-time feedback as you construct your search query.

It’s also a good example that makes clear how problematic algorithmic censorship can be. The hackers over at 2600 reverse engineered Google Instant’s blacklist (NSFW) and it makes absolutely no sense. The blocked words I tried (like “bitch”) produce perfectly inoffensive search results (sometimes because of other censorship algorithms, like Google SafeSearch). It is not clear to me why they should be blocked. For instance, anatomical terms for some parts of the female anatomy are blocked while other parts of the female anatomy are not blocked.

Some of the blocking is just silly. For instance, “hate” is blocked. This means you can make the Google Instant results disappear by adding “hate” to the end of an otherwise acceptable query. e.g., “puppy dog hate ” will make the search results I got earlier disappear as soon as I type the trailing space. (Remember not to press [Enter].)
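The check I am describing can be sketched in a few lines of Python. This is only my guess at the logic: the function name `instant_results_allowed` and the two-word `BLOCKED_WORDS` set are made up for illustration, and none of it is Google's actual code or list.

```python
# Hypothetical sketch of a word-level blocklist check; not Google's actual code.
BLOCKED_WORDS = {"bitch", "hate"}  # illustrative subset, taken from the examples in this post


def instant_results_allowed(query: str) -> bool:
    """Return False if any completed word in the query is on the blocklist."""
    words = query.split()
    # Only a word followed by a space counts as completed, which matches the
    # observation that results vanish once the trailing space is typed.
    if not query.endswith(" "):
        words = words[:-1]
    return not any(word.lower() in BLOCKED_WORDS for word in words)


print(instant_results_allowed("puppy dog "))
print(instant_results_allowed("puppy dog hate "))
```

The first call leaves instant results on and the second turns them off. Note that nothing in the sketch looks at what the search would actually return.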

This is such a simple implementation that it barely qualifies as an algorithm. It also differs from my other examples because it appears that an actual human compiled this list of blocked words. That might be useful to highlight because we typically think that companies like Google do everything with complicated math and not with site-by-site or word-by-word rules. They have claimed as much, but this example shows that this crude sort of blacklist censorship still goes on.

Google does censor actual search results (what you get after pressing [Enter]) in a variety of ways but that is a topic for another time. This exercise with Google Instant at least gets us started thinking about algorithms, whose interests they are serving, and whether or not they are doing their job well.

(2.) Top Stories Fail (Facebook)

In this example, you’ll need a Facebook account.  Go to http://www.facebook.com/ and look for the tiny little toggle that appears under the text “News Feed.” This allows you to switch between two different sorting algorithms: the Facebook proprietary EdgeRank algorithm (this is the default), and “most recent.” (On my interface this toggle is in the upper left, but Facebook has multiple user interfaces at any given time and for some people it appears in the center of the page at the top.)

Switch this toggle back and forth and look at how your feed changes.

What’s happening? Okay, we know that among 18-29 year-old Facebook users the median number of friends is now 300. Even given that most people are not over-sharers, with some simple arithmetic it is clear that some of the things posted to Facebook may never be seen by anyone. A status update is certainly unlikely to be seen by anywhere near your entire friend network. Facebook’s “Top Stories” (EdgeRank) algorithm is the solution to the oversupply of status updates and the undersupply of attention to them. It determines what appears on your news feed and how it is sorted.

We know that Facebook’s “Top Stories” sorting algorithm uses a heavy hand. It is quite likely that you have people in your friend network that post to Facebook A LOT but that Facebook has decided to filter out ALL of their posts. These might be called your “silenced Facebook friends.” Sometimes when people do this toggling-the-algorithm exercise they exclaim: “Oh, I forgot that so-and-so was even on Facebook.”
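To make the difference between the two toggle positions concrete, here is a minimal Python sketch. The “most recent” ordering is just a sort by age. The “top stories” ordering uses a made-up score: EdgeRank has been described publicly only in outline (roughly, affinity times content weight times a time decay), so the field names, the numbers, and the cutoff below are all my assumptions for illustration and not Facebook's actual formula.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    hours_old: float  # age of the post
    affinity: float   # hypothetical: how much you interact with this author (0 to 1)
    weight: float     # hypothetical: how much this type of post counts


def most_recent(posts):
    """Newest first; nothing is hidden."""
    return sorted(posts, key=lambda p: p.hours_old)


def top_stories(posts, cutoff=0.1):
    """Score each post, hide anything under the cutoff, highest score first."""
    def score(p):
        return p.affinity * p.weight / (1.0 + p.hours_old)

    visible = [p for p in posts if score(p) >= cutoff]
    return sorted(visible, key=score, reverse=True)


feed = [
    Post("close friend", hours_old=5.0, affinity=0.9, weight=1.0),
    Post("frequent poster you never like", hours_old=0.5, affinity=0.05, weight=1.0),
    Post("acquaintance", hours_old=1.0, affinity=0.4, weight=1.0),
]

print([p.author for p in most_recent(feed)])
print([p.author for p in top_stories(feed)])
```

In this toy feed the frequent poster comes first under “most recent” and disappears entirely under the scored ordering, which is the “silenced friend” effect in miniature.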

Since we don’t know the exact details of EdgeRank, it isn’t clear exactly how Facebook is deciding which of your friends you should hear from and which should be ignored. Even though the algorithm might be well-constructed, it’s interesting that when I’ve done this toggling exercise with a large group a significant number of people say that Facebook’s algorithm produces a much more interesting list of posts than “Most Recent,” while a significant number of people say the opposite — that Facebook’s algorithm makes their news feed worse. (Personally, I find “Most Recent” produces a far more interesting news feed than “Top Stories.”)

It is an interesting intellectual exercise to try and reverse-engineer Facebook’s EdgeRank on your own by doing this toggling. Why is so-and-so hidden from you? What is it they are doing that Facebook thinks you wouldn’t like? For example, I think that EdgeRank doesn’t work well for me because I select my friends carefully, then I don’t provide much feedback that counts toward EdgeRank after that. So my initial decision about who to friend works better as a sort without further filtering (“most recent”) than Facebook’s decision about what to hide. (In contrast, some people I spoke with will friend anyone, and they do a lot more “liking” than I do.)

What does it mean that your relationship to your friends is mediated by this secret algorithm? A minor note: some people have reported that if you switch to “most recent,” after a while Facebook will switch you back to its “Top Stories” algorithm without asking.

There are deeper things to say about Facebook, but this is enough to start with. Onward. 

(3.) Your DoubleClick Cookie Filling (DoubleClick)

This example will only work if you browse the Web regularly from the same Web browser on the same computer and you have cookies turned on. (That describes most people.) Go to the Google Ads settings page — the URL is a mess so here’s a shortcut: http://bit.ly/uc256google

Look at the right column, headed “Google Ads Across The Web,” then scroll down and look for the section marked “Interests.” The other parts may be interesting too, such as Google’s estimate of your Gender, Age, and the language you speak — all of which may or may not be correct.  Here’s a screen shot:

If you have “interests” listed, click on “Edit” to see a list of topics.

What’s happening? Google is the largest advertising clearinghouse on the Web. (It bought DoubleClick in 2007 for over $3 billion.) When you visit a Web site that runs Google Ads (this is likely quite common), your visit is noted and a pattern of all of your Web site visits is then compiled and aggregated with other personal information that Google may know about you.

What a big departure from some old media! In comparison, in most states it is illegal to gather a list of books you’ve read at the library because this would reveal too much information about you. Yet for Web sites this data collection is the norm.

This settings page won’t reveal Google’s ad placement algorithm, but it shows you part of the result: a list of the categories that the algorithm is currently using to choose advertising content to display to you. Your attention will be sold to advertisers in these categories and you will see ads that match these categories.

This list is quite volatile and this is linked to the way Google hopes to connect advertisers with people who are interested in a particular topic RIGHT NOW. Unlike demographics that are presumed to change slowly (age) or not to change at all (gender), Google appears to base a lot of its algorithm on your recent browsing history. That means if you browse the Web differently you can change this list fairly quickly (in a matter of days, at least).
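One way to picture a profile that tracks recent browsing is a tally in which each visit counts for less as it ages. The Python sketch below is a guess at the general idea, not Google's method: the seven-day half-life, the category labels attached to each visit, and the function name `interest_profile` are all hypothetical.

```python
from collections import defaultdict


def interest_profile(visits, half_life_days=7.0, top_n=3):
    """Rank categories by visits, halving each visit's weight every half_life_days.

    visits: list of (category, days_ago) pairs. How a Web site gets mapped
    to a category is assumed to happen somewhere else.
    """
    scores = defaultdict(float)
    for category, days_ago in visits:
        scores[category] += 0.5 ** (days_ago / half_life_days)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [category for category, _ in ranked[:top_n]]


history = [
    ("Roleplaying Games", 30), ("Roleplaying Games", 28), ("Roleplaying Games", 25),
    ("Dishwashers", 1), ("Dishwashers", 0),
]
print(interest_profile(history))
```

Two visits this week outrank three visits from a month ago, so a few days of different browsing is enough to reshuffle the list.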

Many people find the list uncannily accurate, while some are surprised at how inaccurate it is. Usually it is a mixture. Note that some categories are very specific (“Currency Exchange”), while others are very broad (“Humor”).  Right now it thinks I am interested in 27 things. Some of them are:

  • Standardized & Admissions Tests (Yes.)
  • Roleplaying Games (Yes.)
  • Dishwashers (No.)
  • Dresses (No.)

You can also type in your own interests to save Google the trouble of profiling you.

Again this is an interesting algorithm to speculate about. I’ve been checking this for a few years and I persistently get “Hygiene & Toiletries.” I am insulted by this. It’s not that I’m uninterested in hygiene but I think I am no more interested in hygiene than the average person. I don’t visit any Web sites about hygiene or toiletries. So I’d guess this means… what exactly? I must visit Web sites that are visited by other people who visit sites about hygiene and toiletries. Not a group I really want to be a part of, to be honest.
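That guess, that an interest can be inferred from what similar visitors look at, can be sketched as a simple co-visitation tally in Python. Everything here is hypothetical, including the site names and the data. It illustrates the kind of inference I am speculating about, not how Google actually assigns categories.

```python
from collections import Counter


def inferred_interests(my_sites, other_profiles):
    """Count the interests of people who share at least one site with me.

    other_profiles: list of (sites_visited, interests) pairs, both sets.
    """
    tally = Counter()
    for sites, interests in other_profiles:
        if my_sites & sites:  # this person overlaps with my browsing
            tally.update(interests)
    return tally.most_common()


me = {"news.example", "games.example"}  # no hygiene sites here
others = [
    ({"news.example", "soap.example"}, {"Hygiene & Toiletries"}),
    ({"games.example", "soap.example"}, {"Hygiene & Toiletries", "Roleplaying Games"}),
    ({"cars.example"}, {"Dishwashers"}),
]
print(inferred_interests(me, others))
```

In this made-up data I never visit the soap site, yet “Hygiene & Toiletries” tops my list because the people who overlap with me do.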

These were three examples of algorithm-ish activities that I’ve used. Any other ideas? I was thinking of trying something with an item-to-item recommender system but I could not come up with a great example. I tried anonymized vs. normal Web searching to highlight location-specific results but I could not think of a search term that did a great job showing a contrast.  I also tried personalized Twitter trends vs. location-based Twitter trends but the differences were quite subtle. Maybe you can do better.

In my next post I’ll write about how the students reacted to all this.

 

(This was also cross-posted to The Social Media Collective.)

 

