You are viewing a read-only archive of the Blogs.Harvard network.

Eco’s “How to Write a Thesis” in 15 Maxims

March 24th, 2015 by Christian

(or, Thesis Advice, Click-Bait Style)

Italian semiotician and novelist Umberto Eco released How to Write a Thesis in 1977, well before his rise to international intellectual stardom. It has just been released in English for the first time by MIT Press. I’ve just read it.


I was thinking of assigning it in doctoral seminars, but I regret that a great deal of the book involves scholarly practices that are no longer relevant to anyone. For instance: Is it OK to insert an unnecessary footnote in the middle of your text so that your footnote numbering matches up correctly with what you’ve already typed? (Meaning: So you don’t have to re-type the entire manuscript. On a typewriter.)

It turns out that it is not OK to insert unnecessary footnotes.

And there’s a whole bunch of things about index card management, diacritical marks, and library union indices. And some stuff about the laurea.

However, even if I do not find the book relevant to assign as a whole, Eco’s great wit and strong opinions did lead me to compile the best quotes from the book. I present them to you here:

Eco’s 15 Maxims for PhD Students:

From How to Write a Thesis [1977/2015], selected by me. These are slightly paraphrased to make them work in a list. I hope you like them as much as I did.

  1. Academic humility is the knowledge that anyone can teach us something. Practice it.
  2. A thesis is like a chess game that requires a player to plan in advance all the moves he will make to checkmate his opponent.
  3. How long does it take to write a thesis? No longer than three years and no less than six months.
  4. Imagine that you have a week to take a 600-mile car trip. Even if you are on vacation, you will not leave your house and begin driving indiscriminately in a random direction. A provisional table of contents will function as your work plan.
  5. You must write a thesis that you are able to write.
  6. Your thesis exists to prove the hypothesis that you devised at the outset, not to show the breadth of your knowledge.
  7. What you should never do is quote from an indirect source pretending that you have read the original.
  8. Quote the object of your interpretive analysis with reasonable abundance.
  9. Use notes to pay your debts.
  10. You should not become so paranoid that you believe you have been plagiarized every time a professor or another student addresses a topic related to your thesis.
  11. If you read the great scientists or the great critics you will see that, with a few exceptions, they are quite clear and are not ashamed of explaining things well.
  12. You are not Proust. Do not write long sentences.
  13. The language of a thesis is a metalanguage, that is, a language that speaks of other languages. A psychiatrist who describes the mentally ill does not express himself in the manner of his patients.
  14. If you do not feel qualified, do not defend your thesis.
  15. Do not whine and be complex-ridden, because it is annoying.

 


The Google Algorithm as a Robotic Nose

January 16th, 2015 by Christian

Algorithms, in the view of author Christopher Steiner, are poised to take over everything. Algorithms embedded in software are now everywhere: Netflix recommendations, credit scores, driving directions, stock trading, Google search, Facebook's news feed, the TSA's process to decide who gets searched, the Home Depot prices you are quoted online, and so on. Just a few weeks ago, Ashkan Soltani, the new Chief Technologist of the FTC, said that algorithmic transparency is his central priority for the US government agency tasked with administering fairness and justice in trade. Commentators are worried that the rise of hidden algorithmic automation is leading to a problematic new "black box society."

But given that we want to achieve these "transparent" algorithms, how would we do that? Manfred Broy, writing in the context of software engineering, has said that one of the frustrations of working with software is that it is "almost intangible." Even if we suddenly obtained the source code for anything we wanted (which is unlikely), it is usually not clear what the code is doing. How can we have a meaningful conversation about the consequences of "an algorithm" without first achieving some broad, shared understanding of what it is and what it is doing?


(An Ask.com advertising campaign.)

The answer, even among experts, is that we use metaphor, cartoons, diagrams, and abstraction. As a small beginning to tackling this problem of representing the algorithm, this week I have a new journal article out in the open access journal Media-N, titled “Seeing the Sort.” In it, I try for a critical consideration of how we represent algorithms visually. From flowcharts to cartoons, I go through examples of “algorithm public relations,” meaning both how algorithms are revealed to the public and also what spin the visualizers are trying for.

The most fun part of writing the piece was choosing the examples, which include The Algo-Rythmics (an effort to represent algorithms in dance), an algorithm represented as a 19th century grist mill, and this Google cartoon that represents its algorithm as a robotic nose that smells Web pages:

(The Google algorithm as a robotic nose that smells Web pages.)

Read the article:

Sandvig, Christian. (2015). Seeing the Sort: The Aesthetic and Industrial Defense of “The Algorithm.” Media-N. vol. 10, no. 1. http://median.newmediacaucus.org/art-infrastructures-information/seeing-the-sort-the-aesthetic-and-industrial-defense-of-the-algorithm/ (this was also cross-posted to the Social Media Collective.)


Corrupt Personalization

June 26th, 2014 by Christian

(“And also Bud Light.”)

In my last two posts I've been writing about my attempt to convince a group of sophomores with no background in my field that there has been a shift to the algorithmic allocation of attention — and that this is important. In this post I'll respond to a student question. My favorite: "Sandvig says that algorithms are dangerous, but what are the most serious repercussions that he envisions?" What is the coming social media apocalypse we should be worried about?


This is an important question because people who study this stuff are NOT as interested in this student question as they should be. Frankly, we are specialists who study media and computers and things — therefore we care about how algorithms allocate attention among cultural products almost for its own sake. Because this is the central thing that we study, we don’t spend a lot of time justifying it.

And our field’s most common response to the query “what are the dangers?” often lacks the required sense of danger. The most frequent response is: “extensive personalization is bad for democracy.” (a.k.a. Pariser’s “filter bubble,” Sunstein’s “egocentric” Internet, and so on). This framing lacks a certain house-on-fire urgency, doesn’t it?

(sarcastic tone:) “Oh, no! I’m getting to watch, hear, and read exactly what I want. Help me! Somebody do something!”

Sometimes (as Hindman points out) the contention is the opposite, that Internet-based concentration is bad for democracy.  But remember that I’m not speaking to political science majors here. The average person may not be as moved by an abstract, long-term peril to democracy as the average political science professor. As David Weinberger once said after I warned about the increasing reliance on recommendation algorithms, “So what?” Personalization sounds like a good thing.

As a side note, the second most frequent response I see is that algorithms are now everywhere, and that they work differently than what came before. This also lacks the required sense of danger! Yes, they're everywhere, but if they are a good thing, why should we worry?

So I really like this question "what are the most serious repercussions?" because I think there are some elements of the shift to attention-sorting algorithms that are genuinely "dangerous." I can think of at least two, probably more, and they don't get enough attention. In the rest of this post I'll spell out the first one, which I'll call "corrupt personalization."

Here we go.

Common-sense reasoning about algorithms and culture tells us that the purveyors of personalized content have the same interests we do. That is, if Netflix started recommending only movies we hate or Google started returning only useless search results we would stop using them. However: Common sense is wrong in this case. Our interests are often not the same as the providers of these selection algorithms.  As in my last post, let’s work through a few concrete examples to make the case.

In this post I'll use Facebook examples, but the general problem of corrupt personalization is present on all of the widely used media platforms that employ algorithmic selection of content.

(1) Facebook “Like” Recycling


(Image from ReadWriteWeb.)

On Facebook, in addition to advertisements along the side of the interface, perhaps you’ve noticed “featured,” “sponsored,” or “suggested” stories that appear inside your news feed, intermingled with status updates from your friends. It could be argued that this is not in your interest as a user (did you ever say, “gee, I’d like ads to look just like messages from my friends”?), but I have bigger fish to fry.

Many ads on Facebook resemble status updates in that there can be messages endorsing the ads with “likes.” For instance, here is an older screenshot from ReadWriteWeb:

(Screenshot: "Pages You May Like" suggestions on Facebook.)

Another example: a "suggested" post was mixed into my news feed just this morning, recommending World Cup coverage on Facebook itself. It's a Facebook ad for Facebook, in other words. It had this intriguing addendum:

(Screenshot: eleven of my friends "like" Facebook; names censored.)

So, wait… I have hundreds of friends and eleven of them “like” Facebook?  Did they go to http://www.facebook.com and click on a button like this:

(A magnified Facebook "Like" button.)

But facebook.com doesn’t even have a “Like” button!  Did they go to Facebook’s own Facebook page (yes, there is one) and click “Like”? I know these people and that seems unlikely. And does Nicolala really like Walmart? Hmmm…

What does this “like” statement mean? Welcome to the strange world of “like” recycling. Facebook has defined “like” in ways that depart from English usage.  For instance, in the past Facebook has determined that:

  1. Anyone who clicks on a “like” button is considered to have “liked” all future content from that source. So if you clicked a “like” button because someone shared a “Fashion Don’t” from Vice magazine, you may be surprised when your dad logs into Facebook three years later and is shown a current sponsored story from Vice.com like “Happy Masturbation Month!” or “How to Make it in Porn” with the endorsement that you like it. (Vice.com example is from Craig Condon [NSFW].)
  2. Anyone who "likes" a comment on a shared link is considered to "like" wherever that link points, a.k.a. "liking a share." So if you see a (real) FB status update from a (real) friend that says "Yuck! The McLobster is a disgusting product idea!" and your (real) friend includes a (real) link like this one — that means if you clicked "like" your friends may see McDonald's ads in the future that include the phrase "(Your Name) likes McDonalds." (This example is from ReadWriteWeb.)
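The two rules above amount to a very permissive definition of endorsement. A toy sketch of the logic (the data model and names are mine, purely illustrative; this is not Facebook's code):

```python
# Toy model of "like" recycling: one click becomes a standing endorsement
# of a source. Data model and names are illustrative, not Facebook's code.
from collections import defaultdict

endorsements = defaultdict(set)  # source -> users whose names can endorse it

def click_like(user, item_source):
    # Rule 1: liking one item from a source is recorded as liking the
    # source itself, so it attaches to all of the source's future content.
    endorsements[item_source].add(user)

def click_like_on_share(user, link_target):
    # Rule 2: liking a friend's share of a link counts as liking wherever
    # the link points -- even if the friend's comment was negative.
    endorsements[link_target].add(user)

click_like("you", "vice.com")                # liked one "Fashion Don't"
click_like_on_share("you", "mcdonalds.com")  # liked "Yuck! McLobster!"

# Years later, any sponsored story from these sources can carry your name:
print(endorsements["vice.com"], endorsements["mcdonalds.com"])
```

Note that nothing in this model records *what* was liked or *why* — which is exactly the problem.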


This has led to some interesting results, like dead people “liking” current news stories on Facebook.

There is already controversy about advertiser "like" inflation, "like" spam, and fake "likes" — and these things may be a problem too, but that's not what we are talking about here. In the examples above the system is working as Facebook designed it to. A further caveat: note that the definition of "like" in Facebook's software changes periodically, sometimes in response to lawsuits. Facebook now has an opt-out setting for the two "features" above.

But these incendiary examples are exceptional fiascoes — on the whole the system probably works well. You likely didn't know that your "like" clicks are merrily producing ads on your friends' pages and in your name, because you cannot see them. These "stories" do not appear on your news feed and cannot be individually deleted.

Unlike the examples from my last post you can’t quickly reproduce these results with certainty on your own account. Still, if you want to try, make a new Facebook account under a fake name (warning! dangerous!) and friend your real account. Then use the new account to watch your status updates.

Why would Facebook do this? Obviously it is a controversial practice that is not going to be popular with users. Yet Facebook’s business model is to produce attention for advertisers, not to help you — silly rabbit. So they must have felt that using your reputation to produce more ad traffic from your friends was worth the risk of irritating you. Or perhaps they thought that the practice could be successfully hidden from users — that strategy has mostly worked!

In sum, this is a personalization scheme that does not serve your goals; it serves Facebook's goals at your expense.

(2) “Organic” Content

This second group of examples concerns content that we consider to be “not advertising,” a.k.a. “organic” content. Funnily enough, algorithmic culture has produced this new use of the word “organic” — but has also made the boundary between “advertising” and “not advertising” very blurry.


 

The general problem is that there are many ways in which algorithms act as mixing valves between things that can be easily valued with money (like ads) and things that can’t. And this kind of mixing is a normative problem (what should we do) and not a technical problem (how do we do it).

For instance, for years Facebook has encouraged nonprofits, community-based organizations, student clubs, other groups, and really anyone to host content on facebook.com.  If an organization creates a Facebook page for itself, the managers can update the page as though it were a profile.

Most page managers expect that people who “like” that page get to see the updates… which was true until January of this year. At that time Facebook modified its algorithm so that text updates from organizations were not widely shared. This is interesting for our purposes because Facebook clearly states that it wants page operators to run Facebook ad campaigns, and not to count on getting traffic from “organic” status updates, as it will no longer distribute as many of them.

This change likely has a very differential effect on, say, Nike's Facebook page, a small local business's Facebook page, Greenpeace International's Facebook page, and a small local church congregation's Facebook page. If you start a Facebook page for a school club, you might be surprised to find that you are spending your labor writing status updates that are never shown to anyone. Maybe you should buy an ad. Here's an analytics snapshot for a page I manage:

(Screenshot: Facebook's weekly "page likes" analytics.)

 

The impact isn’t just about size — at some level businesses might expect to have to insert themselves into conversations via persuasive advertising that they pay for, but it is not as clear that people expect Facebook to work this way for their local church or other domains of their lives. It’s as if on Facebook, people were using the yellow pages but they thought they were using the white pages.  And also there are no white pages.

(Oh, wait. No one knows what yellow pages and white pages are anymore. Scratch that reference, then.)

No need to stop here: in the future, perhaps Facebook can monetize my family relationships. It could suggest that if I really want anyone to know about the birth of my child, or I really want my "insightful" status updates to reach anyone, I should turn to Facebook advertising.

Let me also emphasize that this mixing problem extends to the content of our personal social media conversations as well. A few months back, I posted a Facebook status update that I thought was humorous. I shared a link highlighting the hilarious product reviews for the Bic “Cristal For Her” ballpoint pen on Amazon. It’s a pen designed just for women.


The funny thing is that I happened to glance over a friend's shoulder at their Facebook feed, and my status update didn't go away. It remained, pegged at the top of my friend's news feed, for as long as 14 days in one instance. What great exposure for my humor, right? But it did seem a little odd… I queried my other friends on Facebook and some confirmed that the post was also pegged at the top of their news feeds.

I was unknowingly participating in another Facebook program that converts organic status updates into ads. It does this by changing their order in the news feed and adding the text "Sponsored" in light gray, which is very hard to see; the update is otherwise unchanged. I suspect Facebook's algorithm thought I was advertising Amazon (since that's where the link pointed), but I am not sure.

This is similar to Twitter's "Promoted Tweets," but there is one big difference. In the Facebook case the advertiser promotes content — my content — that they did not write. In effect Facebook is re-ordering your conversations with your friends and family on the basis of whether or not someone mentioned Coke, Levi's, or Anheuser-Busch (confirmed advertisers in the program).

Sounds like a great personal social media strategy there: if you really want people to know about your forthcoming wedding, maybe just drop a few brand names? Luckily the algorithms aren't too clever about this yet, so you can mix up the word order for humorous effect.

(Facebook status update:) “I am so delighted to be engaged to this wonderful woman that I am sitting here in my Michelob drinking a Docker’s Khaki Collection. And also Coke.”

Be sure to use links. The interesting thing about this mixing of the commercial and the non-commercial is that it sounds to my ears like some corny, unrealistic science fiction scenario, and yet with the current Facebook platform I believe the above example would work. We are living in the future.

So to recap: if Nike makes a Facebook page and posts status updates to it, that's "organic" content because Nike did not pay Facebook to distribute it, although any rational human being would see it as an ad. If my school group does the same thing, that's also organic content, but they are encouraged to buy distribution, which would make it inorganic. If I post a status update or click "like" in reaction to something that happens in my life and that happens to involve a commercial product, my action starts out as organic, but then it becomes inorganic (paid for) because a company can buy my words and likes and show them to other people without telling me. Got it? This paragraph feels like we are rethinking CHEM 402.

The upshot is that Facebook uses its control of the content selection algorithm to get people to pay for things they wouldn't expect to pay for, and to show people personalized things that they don't think are paid for, when these things were in fact paid for. In sum, this is again a scheme that does not serve your goals; it serves Facebook's goals at your expense.

The Danger: Corrupt Personalization

With these concrete examples behind us, I can now more clearly answer this student question. What are the most serious repercussions of the algorithmic allocation of attention?

I’ll call this first repercussion “corrupt personalization” after C. Edwin Baker. (Baker, a distinguished legal philosopher, coined the phrase “corrupt segmentation” in 1998 as an extension of the theories of philosopher Jürgen Habermas.)

Here’s how it works: You have legitimate interests that we’ll call “authentic.” These interests arise from your values, your community, your work, your family, how you spend your time, and so on. A good example might be that as a person who is enrolled in college you might identify with the category “student,” among your many other affiliations. As a student, you might be authentically interested in an upcoming tuition increase or, more broadly, about the contention that “there are powerful forces at work in our society that are actively hostile to the college ideal.”

However, you might also be authentically interested in the fact that your cousin is getting married. Or in pictures of kittens.


Corrupt personalization is the process by which your attention is drawn to interests that are not your own. This is a little tricky because it is impossible to clearly define an “authentic” interest. However, let’s put that off for the moment.

In the prior examples we saw some (I hope) obvious places where my interests diverged from those of algorithmic social media systems. Highlights for me were:

  • When I express my opinion about something to my friends and family, I do not want that opinion re-sold without my knowledge or consent.
  • When I explicitly endorse something, I don’t want that endorsement applied to other things that I did not endorse.
  • If I want to read a list of personalized status updates about my friends and family, I do not want my friends and family sorted by how often they mention advertisers.
  • If a list of things is chosen for me, I want the results organized by some measure of goodness for me, not by how much money someone has paid.
  • I want paid content to be clearly identified.
  • I do not want my information technology to sort my life into commercial and non-commercial content and systematically de-emphasize the noncommercial things that I do, or turn these things toward commercial purposes.

More generally, I think the danger of corrupt personalization is manifest in three ways.

  1. Things that are not necessarily commercial become commercial because of the organization of the system. (Merton called this "pseudo-Gemeinschaft"; Habermas called it "colonization of the lifeworld.")
  2. Money is used as a proxy for "best" and it does not work. That is, those with the most money to spend can prevail over those with the most useful information. The creation of a salable audience takes priority over your authentic interests. (Smythe called this the "audience commodity"; it is Baker's "market filter.")
  3. Over time, if people are offered things that are not aligned with their interests often enough, they can be taught what to want. That is, they may come to wrongly believe that these are their authentic interests, and it may be difficult to see the world any other way. (Similar to Chomsky and Herman's [not Lippmann's] arguments about "manufacturing consent.")

There is nothing inherent in the technologies of algorithmic allocation that is doing this to us; instead, the economic organization of the system is producing these pressures. In fact, we could design a system to support our authentic interests, but we would then need to fund it. (Thanks, late capitalism!)

To conclude, let’s get some historical perspective. What are the other options, anyway? If cultural selection is governed by computer algorithms now, you might answer, “who cares?” It’s always going to be governed somehow. If I said in a talk about “algorithmic culture” that I don’t like the Netflix recommender algorithm, what is supposed to replace it?

This all sounds pretty bad, so you might think I am asking for a return to “pre-algorithmic” culture: Let’s reanimate the corpse of Louis B. Mayer and he can decide what I watch. That doesn’t seem good either and I’m not recommending it. We’ve always had selection systems and we could even call some of the earlier ones “algorithms” if we want to.  However, we are constructing something new and largely unprecedented here and it isn’t ideal. It isn’t that I think algorithms are inherently dangerous, or bad — quite the contrary. To me this seems like a case of squandered potential.

With algorithmic culture, computers and algorithms are allowing a new level of real-time personalization and content selection on an individual basis that just wasn’t possible before. But rather than use these tools to serve our authentic interests, we have built a system that often serves a commercial interest that is often at odds with our interests — that’s corrupt personalization.

If I use the dominant forms of communication online today (Facebook, Google, Twitter, YouTube, etc.) I can expect content customized for others to use my name and my words without my consent, in ways I wouldn’t approve of. Content “personalized” for me includes material I don’t want, and obscures material that I do want. And it does so in a way that I may not be aware of.

This isn’t an abstract problem like a long-term threat to democracy, it’s more like a mugging — or at least a confidence game or a fraud. It’s violence being done to you right now, under your nose. Just click “like.”

In answer to your question, dear student, that’s my first danger.

* * *

ADDENDUM:

This blog post is already too long, but here is a TL;DR addendum for people who already know about all this stuff.

I'm calling this corrupt personalization because I can't just apply Baker's excellent ideas about corrupt segmentation — the world has changed since he wrote them. Although this post's reasoning is an extension of Baker, it is not a straightforward extension.

Algorithmic attention is a big deal because we used to think about media and identity using categories, but the algorithms in wide use are not natively organized that way. Baker's ideas were premised on the difference between authentic and inauthentic categories ("segments"), yet segments are just not that important anymore. Bermejo calls this the era of post-demographics.

Advertisers used to group demographics together to make audiences comprehensible, but it may no longer be necessary to buy and sell demographics or categories, as they are a crude proxy for purchasing behavior. If I want to sell a Subaru, why buy access to "Brite Lights, Li'l City" (my PRIZM marketing demographic from the 1990s) when I can directly detect "intent to purchase a station wagon" or "shopping for a Subaru right now"? This complicates Baker's idea of authentic segments quite a bit. See also Gillespie's concept of calculated publics.

Also Baker was writing in an era where content was inextricably linked to advertising because it was not feasible to decouple them. But today algorithmic attention sorting has often completely decoupled advertising from content. Online we see ads from networks that are based on user behavior over time, rather than what content the user is looking at right now. The relationship between advertising support and content is therefore more subtle than in the previous era, and this bears more investigation.

Okay, okay I’ll stop now.

(This post was cross-posted to The Social Media Collective.)


Show and Tell: Algorithmic Culture

March 25th, 2014 by Christian

(or, What you need to know about “puppy dog hate”)

(or, "It's not that I'm uninterested in hygiene…")

Last week I tried to get a group of random sophomores to care about algorithmic culture. I argued that software algorithms are transforming communication and knowledge. The jury is still out on my success at that, but in this post I’ll continue the theme by reviewing the interactive examples I used to make my point. I’m sharing them because they are fun to try. I’m also hoping the excellent readers of this blog can think of a few more.

I’ll call my three examples “puppy dog hate,” “top stories fail,” and “your DoubleClick cookie filling.”  They should highlight the ways in which algorithms online are selecting content for your attention. And ideally they will be good fodder for discussion. Let’s begin:

Three Ways to Demonstrate Algorithmic Culture

(1.) puppy dog hate (Google Instant)

You’ll want to read the instructions fully before trying this. Go to http://www.google.com/ and type “puppy”, then [space], then “dog”, then [space], but don’t hit [Enter].  That means you should have typed “puppy dog ” (with a trailing space). Results should appear without the need to press [Enter]. I got this:

Now repeat the above instructions but instead of “puppy” use the word “bitch” (so: “bitch dog “).  Right now you’ll get nothing. I got nothing. (The blank area below is intentionally blank.) No matter how many words you type, if one of the words is “bitch” you’ll get no instant results.

What’s happening? Google Instant is the Google service that displays results while you are still typing your query. In the algorithm for Google Instant, it appears that your query is checked against a list of forbidden words. If the query contains one of the forbidden words (like “bitch”) no “instant” results will be shown, but you can still search Google the old-fashioned way by pressing [Enter].
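The visible behavior is consistent with a check as simple as the following sketch. To be clear, the blocklist contents, function name, and implementation here are my illustrative guesses, not Google's actual code:

```python
# Minimal sketch of how Google Instant's suppression appears to behave.
# The blocklist contents and this implementation are assumptions for
# illustration; the real list (reverse engineered by 2600) is far longer.
BLOCKLIST = {"bitch", "hate"}

def show_instant_results(partial_query):
    """Return True if instant results should appear while typing."""
    words = partial_query.lower().split()
    # One forbidden word anywhere in the query suppresses all instant
    # results; a plain [Enter] search still works.
    return not any(word in BLOCKLIST for word in words)

print(show_instant_results("puppy dog"))       # True: results appear
print(show_instant_results("puppy dog hate"))  # False: suppressed
```

The key design point is that the check is all-or-nothing over the whole query, which explains why appending a single blocked word makes otherwise innocuous results vanish.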

This is an interesting example because it is incredibly mild censorship, and that is typical of algorithmic sorting on the Internet. Things aren't made impossible; some things are just a little harder than others. We can discuss whether or not this actually matters to anyone. After all, you could still search for anything you wanted to, but some searches are made slightly more time-consuming because you have to press [Enter] and you do not receive real-time feedback as you construct your search query.

It's also a good example that makes clear how problematic algorithmic censorship can be. The hackers over at 2600 reverse engineered Google Instant's blacklist (NSFW) and it makes absolutely no sense. The blocked words I tried (like "bitch") produce perfectly inoffensive search results (sometimes because of other censorship algorithms, like Google SafeSearch). It is not clear to me why they should be blocked. For instance, some anatomical terms for parts of the female anatomy are blocked while others are not.

Some of the blocking is just silly. For instance, “hate” is blocked. This means you can make the Google Instant results disappear by adding “hate” to the end of an otherwise acceptable query. e.g., “puppy dog hate ” will make the search results I got earlier disappear as soon as I type the trailing space. (Remember not to press [Enter].)

This is such a simple implementation that it barely qualifies as an algorithm. It also differs from my other examples because it appears that an actual human compiled this list of blocked words. That might be useful to highlight because we typically think that companies like Google do everything with complicated math and not site-by-site or word-by-word rules (they have claimed as much), but this example shows that in fact this crude sort of blacklist censorship still goes on.

Google does censor actual search results (what you get after pressing [Enter]) in a variety of ways but that is a topic for another time. This exercise with Google Instant at least gets us started thinking about algorithms, whose interests they are serving, and whether or not they are doing their job well.

(2.) Top Stories Fail (Facebook)

In this example, you’ll need a Facebook account.  Go to http://www.facebook.com/ and look for the tiny little toggle that appears under the text “News Feed.” This allows you to switch between two different sorting algorithms: the Facebook proprietary EdgeRank algorithm (this is the default), and “most recent.” (On my interface this toggle is in the upper left, but Facebook has multiple user interfaces at any given time and for some people it appears in the center of the page at the top.)

Switch this toggle back and forth and look at how your feed changes.

What's happening? Okay, we know that among 18-29 year-old Facebook users the median number of friends is now 300. Even given that most people are not over-sharers, with some simple arithmetic it is clear that some of the things posted to Facebook may never be seen by anyone. A status update is certainly unlikely to be seen by anywhere near your entire friend network. Facebook's "Top Stories" (EdgeRank) algorithm is the solution to the oversupply of status updates and the undersupply of attention to them; it determines what appears on your news feed and how it is sorted.
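A back-of-the-envelope version of that arithmetic. Only the friend count comes from the post; the posting rate and attention budget are assumptions chosen for illustration:

```python
# Back-of-the-envelope: why some feed filtering is inevitable.
# Only the friend count comes from the post (the median for 18-29
# year-olds); the other numbers are assumptions for illustration.
friends = 300
posts_per_friend_per_day = 2   # assumed average
items_read_per_day = 50        # assumed attention budget

supply = friends * posts_per_friend_per_day   # 600 candidate stories
fraction_seen = items_read_per_day / supply

print(f"{supply} stories compete for {items_read_per_day} slots; "
      f"at most {fraction_seen:.0%} of posts can be seen")
```

Under any plausible numbers the supply of stories dwarfs anyone's attention, so *some* selection rule must exist; the question is whose interests it serves.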

We know that Facebook’s “Top Stories” sorting algorithm uses a heavy hand. It is quite likely that you have people in your friend network who post to Facebook A LOT but whose posts Facebook has decided to filter out entirely. These might be called your “silenced Facebook friends.” Sometimes when people do this toggling-the-algorithm exercise they exclaim: “Oh, I forgot that so-and-so was even on Facebook.”

Since we don’t know the exact details of EdgeRank, it isn’t clear exactly how Facebook is deciding which of your friends you should hear from and which should be ignored. Even though the algorithm might be well-constructed, it’s interesting that when I’ve done this toggling exercise with a large group a significant number of people say that Facebook’s algorithm produces a much more interesting list of posts than “Most Recent,” while a significant number of people say the opposite — that Facebook’s algorithm makes their news feed worse. (Personally, I find “Most Recent” produces a far more interesting news feed than “Top Stories.”)

It is an interesting intellectual exercise to try to reverse-engineer Facebook’s EdgeRank on your own by doing this toggling. Why is so-and-so hidden from you? What is it they are doing that Facebook thinks you wouldn’t like? For example, I think that EdgeRank doesn’t work well for me because I select my friends carefully, then don’t provide much feedback that counts toward EdgeRank after that. So my initial decision about whom to friend works better as a sort without further filtering (“most recent”) than Facebook’s decisions about what to hide. (In contrast, some people I spoke with will friend anyone, and they do a lot more “liking” than I do.)
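Facebook described EdgeRank publicly (circa 2010) as combining three factors per story: your affinity with the poster, a weight for the type of action, and a time decay. The sketch below is a toy version of that public description, not Facebook's actual code; the particular numbers and the exponential decay are my own assumptions.

```python
import math

def edgerank_style_score(affinity, edge_weight, age_hours, decay_rate=0.1):
    """Toy EdgeRank-style score: affinity x action weight x time decay."""
    return affinity * edge_weight * math.exp(-decay_rate * age_hours)

# (affinity with poster, weight of action type, hours since posted)
stories = {
    "close friend's photo, 2h old":  (0.9, 1.0, 2),
    "acquaintance's status, 1h old": (0.1, 0.5, 1),
    "close friend's like, 30h old":  (0.9, 0.2, 30),
}
ranked = sorted(stories, key=lambda s: edgerank_style_score(*stories[s]),
                reverse=True)
# Under these weights, a fresh post from a low-affinity acquaintance still
# outranks a stale story from a close friend, and a "Most Recent" sort
# would order the feed differently than this ranking does.
```

Even this toy version shows why the toggle matters: two reasonable-sounding weightings of the same stories produce different feeds, and Facebook's real weighting is secret.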

What does it mean that your relationship to your friends is mediated by this secret algorithm? A minor note: some people have reported that if you switch to “most recent,” after a while Facebook will switch you back to the “Top Stories” algorithm without asking.

There are deeper things to say about Facebook, but this is enough to start with. Onward. 

(3.) Your DoubleClick Cookie Filling (DoubleClick)

This example will only work if you browse the Web regularly from the same Web browser on the same computer and you have cookies turned on. (That describes most people.) Go to the Google Ads settings page — the URL is a mess so here’s a shortcut: http://bit.ly/uc256google

Look at the right column, headed “Google Ads Across The Web,” then scroll down and look for the section marked “Interests.” The other parts may be interesting too, such as Google’s estimate of your Gender, Age, and the language you speak — all of which may or may not be correct.

If you have “interests” listed, click on “Edit” to see a list of topics.

What’s Happening? Google is the largest advertising clearinghouse on the Web. (It bought DoubleClick in 2007 for over $3 billion.) When you visit a Web site that runs Google Ads — this is likely quite common — your visit is noted and a pattern of all of your Web site visits is then compiled and aggregated with other personal information that Google may know about you.

What a big departure from some old media! In comparison, in most states it is illegal to gather a list of books you’ve read at the library because this would reveal too much information about you. Yet for Web sites this data collection is the norm.

This settings page won’t reveal Google’s ad placement algorithm, but it shows you part of the result: a list of the categories that the algorithm is currently using to choose advertising content to display to you. Your attention will be sold to advertisers in these categories and you will see ads that match these categories.

This list is quite volatile, which is linked to the way Google hopes to connect advertisers with people who are interested in a particular topic RIGHT NOW. Unlike demographics that are presumed to change slowly (age) or not to change at all (gender), Google appears to base a lot of its algorithm on your recent browsing history. That means that if you browse the Web differently you can change this list fairly quickly (within a matter of days).
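Google has not published how it turns browsing history into the “Interests” list, but the volatility suggests some kind of recency weighting. Here is one hypothetical way such a profile could be computed; the half-life, the scoring scheme, and the function names are all my assumptions, not Google's.

```python
import math

def interest_scores(visits, today, half_life_days=7.0):
    """visits: (category, day_seen) pairs. Recent visits count for more."""
    decay = math.log(2) / half_life_days  # a visit's score halves each half-life
    scores = {}
    for category, day in visits:
        scores[category] = scores.get(category, 0.0) + math.exp(-decay * (today - day))
    return scores

# A month-old visit fades to near zero; two recent visits dominate the profile.
visits = [("Dresses", 1), ("Roleplaying Games", 29), ("Roleplaying Games", 30)]
scores = interest_scores(visits, today=30)
```

Under a scheme like this, browsing differently for a few days really would rewrite the list, which matches the volatility described above.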

Many people find the list uncannily accurate, while others are surprised at how inaccurate it is. Usually it is a mixture. Note that some categories are very specific (“Currency Exchange”), while others are very broad (“Humor”). Right now it thinks I am interested in 27 things; some of them are:

  • Standardized & Admissions Tests (Yes.)
  • Roleplaying Games (Yes.)
  • Dishwashers (No.)
  • Dresses (No.)

You can also type in your own interests to save Google the trouble of profiling you.

Again this is an interesting algorithm to speculate about. I’ve been checking this for a few years and I persistently get “Hygiene & Toiletries.” I am insulted by this. It’s not that I’m uninterested in hygiene but I think I am no more interested in hygiene than the average person. I don’t visit any Web sites about hygiene or toiletries. So I’d guess this means… what exactly? I must visit Web sites that are visited by other people who visit sites about hygiene and toiletries. Not a group I really want to be a part of, to be honest.

These were three examples of algorithm-ish activities that I’ve used. Any other ideas? I was thinking of trying something with an item-to-item recommender system but I could not come up with a great example. I tried anonymized vs. normal Web searching to highlight location-specific results but I could not think of a search term that did a great job showing a contrast.  I also tried personalized twitter trends vs. location-based twitter trends but the differences were quite subtle. Maybe you can do better.

In my next post I’ll write about how the students reacted to all this.

 

(This was also cross-posted to The Social Media Collective.)

 


Think About New Media Algorithmically

March 20th, 2014 by Christian

(or: How to Explain Yourself to a General Audience of Sophomores)

I recently gave a guest lecture to the University of Michigan sophomore special topics course “22 Ways to Think About New Media.”  This is a course intended for students who have not yet declared a major, where each week a faculty member from a different discipline describes a “way” that they think about “New Media.”  One goal of this is “a richer appreciation of the liberal arts and sciences,” and so I was asked to consider my remarks in the context of questions like: “What is the place of your work in society? What kinds of questions do you ask? How, in short, do you think?”

Wow, that’s a tall order. Explain and defend your field — communication and information studies — to people who have never encountered it before. Tell (for example) an undergraduate interested in chemistry why they should care about your work. And say something interesting about new media. Well, I’ll give it a shot. Here’s a summary of my attempt.

I decided that the way I want people to think about New Media is “algorithmically.” I meant that as a one-word shorthand for “I am interested in algorithms,” or “I think about new media algorithms and try to understand their implications,” and not “I am an algorithm.” (*)

A central question in the study of communication is this one: How do communication and information systems and institutions organize and shape what we know and think? That is, there is a great amount of material that could be watched, read, and heard but of course we each only have time to experience a small fraction of the whole. While we have some freedom to choose what we experience, there are also processes in media systems that shape what music, movies, news, and even conversations we pay attention to. This shaping ultimately helps to determine our shared culture, and new media are now transforming these processes — and therefore our shared culture.

(For instance, Twitter’s algorithms currently think I should pay attention to #NCAAMarchMadness2014 [which is trending]. They tell me this is a recommendation “just for me” [see below]. In fact I hate sports, so perhaps Twitter hates me.)

I used the example of trashy pop bands – a student suggested One Direction – to illustrate this. There may be a large number of musicians with enough skill to form a trashy pop band, but only a few trashy pop bands are successful at any given time. Musical talent is far more widely distributed than attention to specific bands. Even a casual music listener will agree that talent does not necessarily determine popularity. So what does?

The same is true of more serious topics—consider news. There is enough serious news to fill many newspapers but somehow it comes to be that we hear about certain topics over and over again, while other topics are ignored. How is it that the same events might get more coverage at one moment but less at another moment? It does not seem to be about the “quality” of the news story or the importance of the events, taken in isolation. At this point I employed Ethan Zuckerman’s comparison of attention to Kim Kardashian vs. famine.

Google Trends: Interest in Kardashian vs. famine


Ultimately this shaping and organization of communication and information determines who we are as a collective, as a public, as a society. A central problem in the study of communication and information has been: how do communication and information systems and institutions shape our knowledge and attention?

This is a particularly interesting moment to consider this topic because, while this is a perennial research problem in the study of communication (cf. Gatekeeping Theory, Agenda-Setting Theory, Framing, Priming, Cultivation Theory, Theories of the Public Sphere, etc.), the new prevalence of attention sorting algorithms on the Internet is transforming the way that attention and knowledge are shaped. A useful phrase naming the overall phenomenon is “Algorithmic Culture,” coined by Alex Galloway.

Decades ago, decisions made by a few behind-the-scenes industry professionals like legendary music producer John Hammond would be instrumental in selecting and promoting specific media content (like the musical acts of Count Basie, Bob Dylan, and Aretha Franklin), and newspaper owners like Joseph Pulitzer decided what should be spread as news (such as color comic strips or crusading investigative reporting exposing government corruption).

They may or may not have done a good job, but it is interesting that today they do not wield power in the same way. Today on the Internet many decisions about media content and advertising are made by algorithms. An algorithm is a step-by-step procedure for accomplishing something; in this context, it is typically a piece of computer software that uses some data about you to determine what you will watch, hear, or read. A simple algorithm might be “show the most recent thing any friend of mine has posted,” but most algorithms in use are much more complex.

Algorithms sort both content and advertising. Older media industries often promoted content quite broadly, but now these decisions may be individualized to you, meaning that no two people might see the same Web page. Although algorithms are written by people, they often have effects that are hard for any single person to anticipate.
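The “simple algorithm” quoted above really is just a sort. With toy data (the posts and field names are invented for illustration):

```python
# "Show the most recent thing any friend of mine has posted":
# merge all friends' posts and sort newest-first. That's the whole algorithm.
posts = [
    {"author": "alice", "text": "hello again", "timestamp": 1710000500},
    {"author": "bob",   "text": "lunch",       "timestamp": 1710000100},
    {"author": "alice", "text": "hello",       "timestamp": 1710000300},
]
feed = sorted(posts, key=lambda p: p["timestamp"], reverse=True)
```

Everything a more complex feed algorithm adds (per-person weights, predicted engagement, demotions) amounts to replacing that one transparent `key` function with something opaque.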

To introduce this topic, I suggested two online readings that are intended to be accessible to a general audience. They both consider how new media are now re-shaping the selection of content online by focusing on the idea of the algorithm. I decided to forward these two from The Atlantic:

(1.) “The Algorithm Economy: Inside the Formulas of Facebook and Amazon,” by Derek Thompson, 12 March 2014, The Atlantic

This very short blog post introduces the idea that algorithms (meaning, repeatable step-by-step procedures for accomplishing something) now drive much of our experience with new media. It contrasts two major algorithms that most people are familiar with: (1) Amazon.com product recommendations (technically called item-to-item collaborative filtering) and (2) the Facebook news feed (called EdgeRank). A key point is that not all algorithms are equal — these two implementations of algorithmic sorting of content are quite different in their implications and effects.
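For readers who have not met item-to-item collaborative filtering: the published Amazon version (Linden, Smith, and York, 2003) computes similarity between item vectors, but even a crude co-occurrence count captures the idea. The baskets and names below are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy purchase baskets. Item-to-item CF recommends, for the item you are
# viewing, the items that most often appear in the same baskets with it.
baskets = [
    {"book", "bookmark"},
    {"book", "bookmark", "lamp"},
    {"book", "lamp"},
    {"kettle", "tea"},
]

co_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1

def recommend(item, k=2):
    """The k items most often bought together with `item`."""
    pairs = [(other, n) for (i, other), n in co_counts.items() if i == item]
    return [other for other, _ in sorted(pairs, key=lambda p: -p[1])[:k]]
```

Here `recommend("book")` returns bookmark and lamp. The recommendation depends only on aggregate purchase patterns, not on a model of you personally, which is one reason its implications differ from a news feed ranking like EdgeRank's.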

(2.) “A Guide to the Digital Advertising Industry That’s Watching Your Every Click,” by Joseph Turow, 7 Feb 2012, The Atlantic

Most content on the Internet is available for free and supported by online advertising. This longer article is an excerpt from the introduction of Turow’s book The Daily You. It introduces the new ways that the online advertising industry operates and describes how firms match customer data to online content and advertising. The article focuses on the audience data that must be gathered and analyzed in order to provide personalized advertising, then raises the question of whether people know about this large-scale data collection and considers how they feel about it.

Optional extra: For a more in-depth treatment of the topic, see Tarleton Gillespie’s “The Relevance of Algorithms,” recently released in Media Technologies.

Okay, I’ll stop here for now. But in my next post, I’ll consider how to demonstrate the effects of algorithmic sorting in a simple and easy-to-understand way. Then I’ll tell you how the students reacted to all this.

(*) – Although some days I do feel like an algorithm.
