
Archive for the 'Geeking' Category

Show and Tell: Algorithmic Culture

Tuesday, March 25th, 2014

(or, What you need to know about “puppy dog hate”)

(or, “It’s not that I’m uninterested in hygiene…”)

Last week I tried to get a group of random sophomores to care about algorithmic culture. I argued that software algorithms are transforming communication and knowledge. The jury is still out on my success at that, but in this post I’ll continue the theme by reviewing the interactive examples I used to make my point. I’m sharing them because they are fun to try. I’m also hoping the excellent readers of this blog can think of a few more.

I’ll call my three examples “puppy dog hate,” “top stories fail,” and “your DoubleClick cookie filling.”  They should highlight the ways in which algorithms online are selecting content for your attention. And ideally they will be good fodder for discussion. Let’s begin:

Three Ways to Demonstrate Algorithmic Culture

(1.) puppy dog hate (Google Instant)

You’ll want to read the instructions fully before trying this. Go to http://www.google.com/ and type “puppy”, then [space], then “dog”, then [space], but don’t hit [Enter].  That means you should have typed “puppy dog ” (with a trailing space). Results should appear without the need to press [Enter]. I got this:

Now repeat the above instructions but instead of “puppy” use the word “bitch” (so: “bitch dog ”).  Right now you’ll get nothing. I got nothing. (The blank area below is intentionally blank.) No matter how many words you type, if one of the words is “bitch” you’ll get no instant results.

What’s happening? Google Instant is the Google service that displays results while you are still typing your query. In the algorithm for Google Instant, it appears that your query is checked against a list of forbidden words. If the query contains one of the forbidden words (like “bitch”) no “instant” results will be shown, but you can still search Google the old-fashioned way by pressing [Enter].

This is an interesting example because it is incredibly mild censorship, and that is typical of algorithmic sorting on the Internet. Things aren’t made impossible; some things are just made a little harder than others. We can discuss whether or not this actually matters to anyone. After all, you could still search for anything you wanted to, but some searches are made slightly more time-consuming because you will have to press [Enter] and you do not receive real-time feedback as you construct your search query.

It’s also a good example that makes clear how problematic algorithmic censorship can be. The hackers over at 2600 reverse engineered Google Instant’s blacklist (NSFW) and it makes absolutely no sense. The blocked words I tried (like “bitch”) produce perfectly inoffensive search results (sometimes because of other censorship algorithms, like Google SafeSearch). It is not clear to me why they should be blocked. For instance, anatomical terms for some parts of the female anatomy are blocked while other parts of the female anatomy are not blocked.

Some of the blocking is just silly. For instance, “hate” is blocked. This means you can make the Google Instant results disappear by adding “hate” to the end of an otherwise acceptable query. e.g., “puppy dog hate ” will make the search results I got earlier disappear as soon as I type the trailing space. (Remember not to press [Enter].)
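The behavior described above is consistent with a plain word-list check. Here is a minimal sketch in Python, assuming a case-insensitive whole-word match against a fixed list. The function name and the two-word list are hypothetical stand-ins that use only the words mentioned in this post; this is a guess at the observed behavior, not Google's actual code, and it does not model the timing of the trailing space.

```python
# Hypothetical stand-in for the blocked-word list; it contains only the
# two blocked words mentioned in this post.
BLOCKED_WORDS = {"bitch", "hate"}


def show_instant_results(query: str) -> bool:
    """Return True if instant results would be shown for this query.

    Assumption: the check is a case-insensitive whole-word match
    against a fixed list.
    """
    words = query.lower().split()
    return not any(word in BLOCKED_WORDS for word in words)


print(show_instant_results("puppy dog "))       # True
print(show_instant_results("bitch dog "))       # False
print(show_instant_results("puppy dog hate "))  # False
```

Note that one blocked word anywhere in the query is enough to suppress the results, no matter how harmless the rest of the query is.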

This is such a simple implementation that it barely qualifies as an algorithm. It also differs from my other examples because it appears that an actual human compiled this list of blocked words. That might be useful to highlight because we typically think that companies like Google do everything with complicated math and not site-by-site or word-by-word rules–they have claimed as much, but this example shows that in fact this crude sort of blacklist censorship still goes on.

Google does censor actual search results (what you get after pressing [Enter]) in a variety of ways but that is a topic for another time. This exercise with Google Instant at least gets us started thinking about algorithms, whose interests they are serving, and whether or not they are doing their job well.

(2.) Top Stories Fail (Facebook)

In this example, you’ll need a Facebook account.  Go to http://www.facebook.com/ and look for the tiny little toggle that appears under the text “News Feed.” This allows you to switch between two different sorting algorithms: the Facebook proprietary EdgeRank algorithm (this is the default), and “most recent.” (On my interface this toggle is in the upper left, but Facebook has multiple user interfaces at any given time and for some people it appears in the center of the page at the top.)

Switch this toggle back and forth and look at how your feed changes.

What’s happening? Okay, we know that among 18-29 year-old Facebook users the median number of friends is now 300. Even given that most people are not over-sharers, with some simple arithmetic it is clear that some of the things posted to Facebook may never be seen by anyone. A status update is certainly unlikely to be seen by anywhere near your entire friend network. Facebook’s “Top Stories” (EdgeRank) algorithm is the solution to the oversupply of status updates and the undersupply of attention to them: it determines what appears on your news feed and how it is sorted.
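The “simple arithmetic” can be sketched in a few lines of Python. Only the figure of 300 friends comes from the text above; the posting rate and the number of stories a person reads per day are made-up assumptions for illustration, so treat the result as a rough bound rather than a measurement.

```python
MEDIAN_FRIENDS = 300          # figure cited above (users aged 18-29)
POSTS_PER_FRIEND_PER_DAY = 1  # assumption, for illustration only
STORIES_READ_PER_DAY = 50     # assumption, for illustration only


def share_of_posts_seen(friends, posts_per_friend, stories_read):
    """Upper bound on the fraction of friends' posts one reader can see."""
    supply = friends * posts_per_friend
    return min(1.0, stories_read / supply)


share = share_of_posts_seen(
    MEDIAN_FRIENDS, POSTS_PER_FRIEND_PER_DAY, STORIES_READ_PER_DAY
)
print(f"At most {share:.0%} of friends' posts get seen")  # At most 17% ...
```

Under these assumptions, five out of every six posts go unread by any given friend, which is the gap a sorting algorithm steps in to manage.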

We know that Facebook’s “Top Stories” sorting algorithm uses a heavy hand. It is quite likely that you have people in your friend network that post to Facebook A LOT but that Facebook has decided to filter out ALL of their posts. These might be called your “silenced Facebook friends.” Sometimes when people do this toggling-the-algorithm exercise they exclaim: “Oh, I forgot that so-and-so was even on Facebook.”

Since we don’t know the exact details of EdgeRank, it isn’t clear exactly how Facebook is deciding which of your friends you should hear from and which should be ignored. Even though the algorithm might be well-constructed, it’s interesting that when I’ve done this toggling exercise with a large group a significant number of people say that Facebook’s algorithm produces a much more interesting list of posts than “Most Recent,” while a significant number of people say the opposite — that Facebook’s algorithm makes their news feed worse. (Personally, I find “Most Recent” produces a far more interesting news feed than “Top Stories.”)
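To make the contrast between the two settings concrete, here is a toy sketch in Python. It is not EdgeRank, whose details are unknown as noted above. The `affinity` score, the 0.2 cutoff, and the sample posts are all hypothetical; the point is only to show how a ranked feed can hide a frequent poster entirely while a chronological feed shows everything.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    minutes_ago: int
    affinity: float  # hypothetical 0..1 score: how much you interact with the author


def most_recent(posts):
    """Chronological feed: newest first, nothing hidden."""
    return sorted(posts, key=lambda p: p.minutes_ago)


def top_stories(posts, min_affinity=0.2):
    """Toy ranked feed: hide low-affinity authors, then sort by affinity."""
    visible = [p for p in posts if p.affinity >= min_affinity]
    return sorted(visible, key=lambda p: p.affinity, reverse=True)


posts = [
    Post("close_friend", 240, 0.9),
    Post("frequent_poster", 5, 0.05),
    Post("frequent_poster", 20, 0.05),
    Post("coworker", 60, 0.4),
]

print([p.author for p in most_recent(posts)])
# ['frequent_poster', 'frequent_poster', 'coworker', 'close_friend']
print([p.author for p in top_stories(posts)])
# ['close_friend', 'coworker']
```

In this toy version the frequent poster is a “silenced friend”: present in the chronological feed, absent from the ranked one.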

It is an interesting intellectual exercise to try and reverse-engineer Facebook’s EdgeRank on your own by doing this toggling. Why is so-and-so hidden from you? What is it they are doing that Facebook thinks you wouldn’t like? For example, I think that EdgeRank doesn’t work well for me because I select my friends carefully, then I don’t provide much feedback that counts toward EdgeRank after that. So my initial decision about who to friend works better as a sort without further filtering (“most recent”) than Facebook’s decision about what to hide. (In contrast, some people I spoke with will friend anyone, and they do a lot more “liking” than I do.)

What does it mean that your relationship to your friends is mediated by this secret algorithm? A minor note: some people have reported that if you switch to “most recent,” after a while Facebook will switch you back to its “Top Stories” algorithm without asking.

There are deeper things to say about Facebook, but this is enough to start with. Onward. 

(3.) Your DoubleClick Cookie Filling (DoubleClick)

This example will only work if you browse the Web regularly from the same Web browser on the same computer and you have cookies turned on. (That describes most people.) Go to the Google Ads settings page — the URL is a mess so here’s a shortcut: http://bit.ly/uc256google

Look at the right column, headed “Google Ads Across The Web,” then scroll down and look for the section marked “Interests.” The other parts may be interesting too, such as Google’s estimate of your Gender, Age, and the language you speak — all of which may or may not be correct.  Here’s a screen shot:

If you have “interests” listed, click on “Edit” to see a list of topics.

What’s happening? Google is the largest advertising clearinghouse on the Web. (It bought DoubleClick in 2007 for over $3 billion.) When you visit a Web site that runs Google Ads (this is likely quite common) your visit is noted and a pattern of all of your Web site visits is then compiled and aggregated with other personal information that Google may know about you.

What a big departure from some old media! In comparison, in most states it is illegal to gather a list of books you’ve read at the library because this would reveal too much information about you. Yet for Web sites this data collection is the norm.

This settings page won’t reveal Google’s ad placement algorithm, but it shows you part of the result: a list of the categories that the algorithm is currently using to choose advertising content to display to you. Your attention will be sold to advertisers in these categories and you will see ads that match these categories.

This list is quite volatile and this is linked to the way Google hopes to connect advertisers with people who are interested in a particular topic RIGHT NOW. Unlike demographics that are presumed to change slowly (age) or not to change at all (gender), Google appears to base a lot of its algorithm on your recent browsing history. That means if you browse the Web differently you can change this list fairly quickly (in a matter of days, at least).
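One way to picture a profile that tracks recent browsing is to let each visit count for less as it ages. The Python sketch below assumes exponential decay with a seven-day half-life; that weighting, the function name, and the sample visits are assumptions for illustration, not a description of Google's actual algorithm.

```python
from collections import defaultdict


def interest_scores(visits, half_life_days=7.0):
    """Score categories from (category, days_ago) visits.

    Each visit counts for half as much every `half_life_days` days
    (an assumed weighting, chosen only for illustration).
    """
    scores = defaultdict(float)
    for category, days_ago in visits:
        scores[category] += 0.5 ** (days_ago / half_life_days)
    return dict(scores)


# Hypothetical browsing history: two recent visits, three old ones.
visits = [
    ("Roleplaying Games", 1),
    ("Roleplaying Games", 2),
    ("Dishwashers", 60),
    ("Dishwashers", 61),
    ("Dishwashers", 62),
]

scores = interest_scores(visits)
top = max(scores, key=scores.get)
print(top)  # Roleplaying Games
```

Two visits this week outweigh three visits from two months ago, which matches the observation that browsing differently for a few days is enough to change the list.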

Many people find the list uncannily accurate, while some are surprised at how inaccurate it is. Usually it is a mixture. Note that some categories are very specific (“Currency Exchange”), while others are very broad (“Humor”).  Right now it thinks I am interested in 27 things. Some of them are:

  • Standardized & Admissions Tests (Yes.)
  • Roleplaying Games (Yes.)
  • Dishwashers (No.)
  • Dresses (No.)

You can also type in your own interests to save Google the trouble of profiling you.

Again this is an interesting algorithm to speculate about. I’ve been checking this for a few years and I persistently get “Hygiene & Toiletries.” I am insulted by this. It’s not that I’m uninterested in hygiene but I think I am no more interested in hygiene than the average person. I don’t visit any Web sites about hygiene or toiletries. So I’d guess this means… what exactly? I must visit Web sites that are visited by other people who visit sites about hygiene and toiletries. Not a group I really want to be a part of, to be honest.
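The guess above (interests borrowed from people with overlapping browsing) can be written down as a toy Python sketch. This is purely speculative: the function, the user names, and the example sites are hypothetical, and nothing here is known to reflect how Google actually assigns categories.

```python
def inferred_interests(my_sites, other_users):
    """Guess interests from users who share at least one visited site.

    `other_users` maps a user name to (sites_visited, known_interests).
    """
    inferred = set()
    for sites, interests in other_users.values():
        if my_sites & sites:  # any overlap in browsing history
            inferred |= interests
    return inferred


# Hypothetical data: I never visit a hygiene site, but user_a does,
# and we both read the same news site.
me = {"news.example", "games.example"}
others = {
    "user_a": ({"news.example", "soap.example"}, {"Hygiene & Toiletries"}),
    "user_b": ({"cars.example"}, {"Autos"}),
}

print(inferred_interests(me, others))  # {'Hygiene & Toiletries'}
```

A single shared site is enough here to inherit someone else's category, which is one way an interest could show up without any direct visits.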

These were three examples of algorithm-ish activities that I’ve used. Any other ideas? I was thinking of trying something with an item-to-item recommender system but I could not come up with a great example. I tried anonymized vs. normal Web searching to highlight location-specific results but I could not think of a search term that did a great job showing a contrast.  I also tried personalized Twitter trends vs. location-based Twitter trends but the differences were quite subtle. Maybe you can do better.

In my next post I’ll write about how the students reacted to all this.

 

(This was also cross-posted to The Social Media Collective.)

 

No Dial Tone

Thursday, August 4th, 2011

(or: The End of Reliability)

(or: Why is the FCC Broadband Study Good News?)

(or: Comcast Digital Voice Gets me One Service Outage Every 63 Days)

I just spent my morning troubleshooting my Comcast digital voice telephone. I get my phone service via my Comcast cable modem… which is to say, over the Internet.  It wasn’t working.

I followed the instructions on the Web troubleshooting wizard, which required me to dig up the only corded telephone that I still own.  I finally found it in the basement. It dates from the late 1980s and it still has the speed-dial list written in pencil… speed dial #1 is “Live 105 Request Line.” Anyone get that reference?

Anyway, after switching to corded phones and re-wiring my home entertainment center so that I could easily get to the back of my Comcast cable modem with a paperclip, nothing had changed. I still had no dial tone.  Finally, after a chat session with Comcast customer support, they sent a mysterious reset signal to my house that solved the issue.

But this made me reflect… I’ve had Comcast digital voice service since March 28 (see my previous post about how hard it was to get).  So that’s four months and one week.  In that time I’ve had two major telephone outages.

The first was a neighborhood-wide outage that was corrected two hours after I noticed it.  This one went on for two days until we noticed it — according to reports from a friend that couldn’t reach us.  (Since we didn’t dial out during that time, we just thought no one was calling us.)

Almost a tangent: I’m also concerned about my backup battery, as the battery light on my modem sometimes turns off and on by itself.  (In the old phone system backup batteries used to be centralized but with digital voice over the cable network each cable modem has to have one.) I haven’t gotten around to complaining about that — I’m not sure if I have the energy.

The whole experience screams: not-ready-for-prime-time. Cheap-looking flimsy gray plastic boxes that have to be reset with paperclips. Nothing like the good old Model 500 telephone from Western Electric (pictured). That thing was solid as a rock–and as heavy as one.

The whole time I had plain-old-telephone-service from AT&T I never had any service problems.  Currently my average with Comcast digital voice is one service outage every sixty-three days, and those are only the ones that I noticed.
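The outage rate is easy to check in Python. The start date (March 28), the date of this post, and the two outages come from the text; the year of the start date is assumed to match the post, and the 30-day-month reading of “four months and one week” is only one plausible way to arrive at the rounded figure.

```python
from datetime import date

service_start = date(2011, 3, 28)  # "since March 28"; year assumed from the post date
post_date = date(2011, 8, 4)       # date of this post
outages = 2

# Exact calendar count.
days_of_service = (post_date - service_start).days
print(days_of_service)             # 129
print(days_of_service / outages)   # 64.5

# "Four months and one week", assuming 30-day months.
approx_days = 4 * 30 + 7
print(approx_days / outages)       # 63.5
```

Either way the answer is about one outage every two months.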

Is this the way of modern telecommunications? Reduced regulatory requirements lead to the death of reliability?

Crappy cell phone service quality has softened us up to expect poor quality across other areas of telecommunications. The quirky and opaque nature of Internet service is no help.  Low quality there seems to be lowering standards elsewhere as well.

This week in the media an FCC study of broadband speeds has been trumpeted across all major outlets.  The finding that made the news? ISPs now deliver 80-90% of their advertised speeds. This is hailed as a triumph. (And it’s an increase since the 2009 report.)

Yet another way to present the same numbers is this: of all the ISPs studied by the government, only two actually provided the speeds that they advertise (see p. 15 of the report). In telecom that’s the kind of news that we’re happy about these days. It’s a new era.

Slow home network? Check the router.

Sunday, March 27th, 2011

(or, A geeky interlude from our regular blogging.)

I’m trying to improve my Internet connection speed at home and my home network generally.  (That partly explains my last post, too.)

After a series of tests last week, I was astonished to discover that one of the key bottlenecks was my router.  I bought a router that says 10/100 Ethernet on the box, meaning it supports both 10Base-T and 100Base-TX. I was assuming that if I connected Fast Ethernet devices to it with an ordinary Category 5 cable I would get 100 Mbit/s of throughput in each direction.  But I was getting about 12.


(Cat 5 cable. Image credit: Wikimedia commons.)

The Eureka moment came when I found this chart over at smallnetbuilder.com. My router came in 63rd out of 64 routers tested on total throughput.  On WAN to LAN download speed, a key metric, it came in 59th out of 64 routers tested.

Although it said 100 Mbit/s on the box, it actually maxes out at 20 according to these tests.  (Like I said, I was getting about 12.)
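To put those three numbers in everyday terms, here is a short Python sketch that converts each rate into the time needed to move one file. The 700 MB file size is a hypothetical example, and the calculation ignores protocol overhead and uses decimal megabytes, so real transfers would be somewhat slower.

```python
def transfer_seconds(size_megabytes, rate_mbit_per_s):
    """Seconds to move a file at a given line rate (1 byte = 8 bits)."""
    return size_megabytes * 8 / rate_mbit_per_s


FILE_MB = 700  # hypothetical file size, roughly one CD image

rates = [("measured", 12), ("tested maximum", 20), ("advertised", 100)]
for label, rate in rates:
    seconds = transfer_seconds(FILE_MB, rate)
    print(f"{label:>15}: {rate:>3} Mbit/s -> {seconds:6.0f} s")
```

At the measured rate the same file takes more than eight times as long as the number on the box would suggest.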

The offender was a PepLink Balance 30 — actually a load balancing switch.  Maybe it is the load balancing that makes it so terrible? I don’t know. At one time I was so desperate for throughput I had multiple ISPs at the same time and I was aggregating them. I have abandoned the idea of load balancing, so there’s no point to it now.

Time to treat myself to an ASUS Black Diamond Dual-Band Gigabit Wireless-N Router. I’m worth it. I deserve it. Tested maximum throughput 1,268 Mbit/s.  We should notice a difference between that and 12. Damn it.

Confessions of a Spy Car Driver

Friday, May 28th, 2010

(or: Inadvertently Illegal Programming, A Primer)

Earlier this month, Google’s official engineering blog confessed that the company’s Street View cars and bikes have “inadvertently” gathered personal data in transit on unencrypted Wi-Fi networks for the past three years (see the post: Wi-Fi Data Collection).  As chronicled in major news stories in the past three weeks, Google’s actions are under scrutiny by government regulators everywhere (see links to news stories at the end of this post).

[One of Google’s Ominous-Looking Spy Cars. Photo by byrion.]

This is a topic close to my heart because my research group has been conducting similar surveys of wireless signals for the past five years as part of a project funded by the US National Science Foundation.  Here’s a picture of our own slightly less obtrusive Wi-Fi sampling car in South Central Los Angeles in 2005.  (On second thought, we shouldn’t have chosen a black SUV.  Too scary.)

Read the rest of this entry »

What I learned from ROFLcon

Thursday, May 6th, 2010

(or: You’re Internet Famous, I’m Internet Serious)

O hai dear reader!

(The ROFLcon II official T-Shirt.)

At ROFLcon this year I had the honor and privilege of moderating the panel “And the Internet Swooped In.”  What a lineup!  I got to moderate the following Internet celebrities:

Mahir Cagri (of “I kiss you” fame), the author of the most famous personal home page on the World Wide Web and perhaps the first individual to become “Internet Famous.” (The dancing baby and hamsterdance were not really people, after all.)  Mahir doesn’t speak English well; he speaks Turkish.  He appeared with his dodgy-looking manager.

David DeVore (Jr. and Sr.), of David After Dentist — one of the most popular home movies ever produced and one of the most popular videos on YouTube (58m views+).  David DeVore, Jr. is now 9 years old and I have never moderated a panel with a 9-year-old before.

Charlie Schmidt, creator of the original Keyboard Cat video.  He also created one of the most popular home movies of all time — but it was recorded in 1984 on videotape, then uploaded to YouTube and subsequently discovered by Brad O’Farrell, who turned the footage into the “Play Him Off, Keyboard Cat” meme.  There are now over 4,000 derivative “Play Him Off, Keyboard Cat” videos on YouTube.  (Brad was in the audience.)

(Photo by extraface on Flickr.)

With a Turkish-speaker, a 9-year-old, and no irony apparent anywhere on the panel, it was one of the most difficult moderation assignments I’ve ever been given. The video will be available in about a week so you can see for yourself how it went.  I loved it!  Until then, Alex Leavitt liveblogged my panel.  (Here is his summary:  http://roflcon.org/2010/05/04/liveblog-and-the-internet-swooped-in/ )

But beyond my panel, here’s my big list of…

What I Learned From ROFLcon

Read the rest of this entry »
