What They Know


(Cross-posted from the ProjectVRM blog.)

For as long as we’ve had economies, demand and supply have been attracted to each other like a pair of magnets. Ideally, they should match up evenly and produce good outcomes. But sometimes one side comes to dominate the other, with bad effects along with good ones.

Such has been the case on the Web ever since it went commercial with the invention of the cookie in 1995, resulting in a calf-cow system in which the demand side (that’s you and me) plays the submissive role of mere “users,” who pretty much have to put up with whatever rules websites set on the supply side.

Consistent with Lord Acton’s dictum (“Power corrupts; absolute power corrupts absolutely”), the near-absolute power of website cows over user calves has resulted in near-absolute corruption of website ethics with respect to personal privacy.

This has been a subject of productive obsession by Julia Angwin and her team of reporters at The Wall Street Journal, who have been producing the What They Know series (shortcut: http://wsj.com/wtk) since July 30, 2010, when Julia bylined the first piece in the series. The next day I called that piece a turning point. And I still believe that.

Today came another one, again in the Journal, in Julia’s latest, titled Web Firms to Adopt ‘No Track’ Button. She begins,

A coalition of Internet giants including Google Inc. has agreed to support a do-not-track button to be embedded in most Web browsers—a move that the industry had been resisting for more than a year.

The reversal is being announced as part of the White House’s call for Congress to pass a “privacy bill of rights,” that will give people greater control over the personal data collected about them.

The long White House press release headline reads,

We Can’t Wait: Obama Administration Unveils Blueprint for a “Privacy Bill of Rights” to Protect Consumers Online

Internet Advertising Networks Announces Commitment to “Do-Not-Track” Technology to Allow Consumers to Control Online Tracking

Obviously, government and industry have been working together on this one. Which is good, as far as it goes. Toward that point, Julia adds,

The new do-not-track button isn’t going to stop all Web tracking. The companies have agreed to stop using the data about people’s Web browsing habits to customize ads, and have agreed not to use the data for employment, credit, health-care or insurance purposes. But the data can still be used for some purposes such as “market research” and “product development” and can still be obtained by law enforcement officers.

The do-not-track button also wouldn’t block companies such as Facebook Inc. from tracking their members through “Like” buttons and other functions.

“It’s a good start,” said Christopher Calabrese, legislative counsel at the American Civil Liberties Union. “But we want you to be able to not be tracked at all if you so choose.”

In the New York Times piece “White House, Consumers in Mind, Offers Online Privacy Guidelines,” Edward Wyatt writes,

The framework for a new privacy code moves electronic commerce closer to a one-click, one-touch process by which users can tell Internet companies whether they want their online activity tracked.

Much remains to be done before consumers can click on a button in their Web browser to set their privacy standards. Congress will probably have to write legislation governing the collection and use of personal data, officials said, something that is unlikely to occur this year. And the companies that make browsers — Google, Microsoft, Apple and others — will have to agree to the new standards.

No, they won’t. Buttons can be plug-ins to existing browsers. And work has already been done. VRM developers are on the case, and their ranks are growing. We have dozens of developers (at that last link) working on equipping both the demand and the supply side with tools for engaging as independent and respectful parties. In fact we already have a button that can say “Don’t track me,” plus much more, for both sides. It’s called the r-button, and it looks like this: ⊂ ⊃. (And yes, those symbols are real characters. Took a long time to find them, but they do exist.)

Yours — the user’s — is on the left. The website’s is on the right. On a browser it might look like this:

[Image: the r-buttons in a browser]

Underneath both those buttons can go many things, including preferences, policies, terms, offers, or anything else — on both sides. One of those terms can be “do not track me.” It might point to a fourth party (see explanations here and here) which, on behalf of the user or customer, maintains settings that control sharing of personal data, including the conditions that must be met. A number of development projects and companies are already on this case. Some have personal data stores (PDSes), also called “lockers” or “vaults.” These include:

Three of those are in the U.S., one in Austria, one in France, one in South Africa, and three in the U.K. (All helping drive the Midata project by the U.K. government, by the way.) And those are just companies with PDSes. There are many others working on allied technologies, standards, protocols and much more. They’re all just flying below media radar because media like to look at what big suppliers and governments are doing. Speaking of which… :-)

Here’s Julia again:

Google is expected to enable do-not-track in its Chrome Web browser by the end of this year.

Susan Wojcicki, senior vice president of advertising at Google, said the company is pleased to join “a broad industry agreement to respect the ‘Do Not Track’ header in a consistent and meaningful way that offers users choice and clearly explained browser controls.”

White House Deputy Chief Technology Officer Daniel Weitzner said the do-not-track option should clear up confusion among consumers who “think they are expressing a preference and it ends up, for a set of technical reasons, that they are not.”

Some critics said the industry’s move could throw a wrench in a separate year-long effort by the World Wide Web consortium to set an international standard for do-not-track. But Mr. Ingis said he hopes the consortium could “build off of” the industry’s approach.
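Mechanically, the “Do Not Track” header Google says it will respect is a single HTTP request header, `DNT: 1`, that the browser sends with each request; honoring it is entirely up to the server. The minimal Python sketch below assumes a hypothetical site that serves behavioral ads by default and falls back to contextual ads when the header is present. The function names and that fallback policy are illustrations only, not anything Google or the ad networks have published. As the Journal notes, the agreement still allows data to be used for things like “market research,” so a header check like this does not by itself stop collection.

```python
from typing import Mapping


def tracking_allowed(headers: Mapping[str, str]) -> bool:
    """Return False when the request carries the header 'DNT: 1'.

    HTTP header names are case-insensitive, so names are lowercased
    (and values trimmed) before the lookup.
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"


def choose_ad_mode(headers: Mapping[str, str]) -> str:
    # Hypothetical site policy: behavioral targeting only when the user
    # has not asked not to be tracked; otherwise contextual ads.
    return "behavioral" if tracking_allowed(headers) else "contextual"


if __name__ == "__main__":
    print(choose_ad_mode({"DNT": "1"}))               # contextual
    print(choose_ad_mode({"User-Agent": "example"}))  # behavioral
```

Note that the button lives in the browser while the decision lives on the server, which is exactly why a header alone depends on the supply side choosing to listen.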

So here’s an invitation to the White House, Google, the W3C, interested BigCos (including CRM companies), developers of all sizes, and journalists who are interested in building out genuine and cooperative relationships between demand and supply:

Join us at IIW — the Internet Identity Workshop — in Mountain View, May 1-3. This is the unconference where developers and other helpful parties gather to talk things over and move development forward. No speakers, no panels, no BS. Just good conversation and productive work. It’s our fourteenth one, and they’ve all been highly productive.

As for the r-button, take it and run with it. It’s there to be developed. It’s meaningful. We’re past square one. We’d love to have all the participation we can get, from the big guys as well as the little ones listed above and here.

To help get your thinking started, visit this presentation of one r-button scenario, by Adam Marcus of MIT. Here’s another view of the same work, which came out of a Google Summer of Code project through ProjectVRM and the Berkman Center:

(Props to Oshani Seneviratne and David Karger, both also of MIT, and Ahmad Bakhiet, of King’s College London, for work on that project.)

If we leave fixing the calf-cow problem entirely up to the BigCos and BigGov, it won’t get fixed. We have to work from the demand side as well. In economies, customers are the 100%.

Here are some other stories, mostly gathered by Zemanta:

All look at the symptoms, and supply-side cures. Time for the demand side to demand answers from itself. Fortunately, we’ve been listening, and the answers are coming.


In The Data Bubble, I told readers to mark the day: 31 July 2010. That’s when The Wall Street Journal published The Web’s Gold Mine: Your Secrets, subtitled A Journal investigation finds that one of the fastest-growing businesses on the Internet is the business of spying on consumers. First in a series. That same series is now nine stories long, not counting the introduction and a long list of related pieces. Here’s the current list:

  1. The Web’s Gold Mine: What They Know About You
  2. Microsoft Quashed Bid to Boost Web Privacy
  3. On the Web’s Cutting Edge: Anonymity in Name Only
  4. Stalking by Cell Phone
  5. Google Agonizes Over Privacy
  6. Kids Face Intensive Tracking on Web
  7. ‘Scrapers’ Dig Deep for Data on the Web
  8. Facebook in Privacy Breach
  9. A Web Pioneer Profiles Users By Name

Related pieces—

Two things I especially like about all this. First, Julia Angwin and her team are doing a terrific job of old-fashioned investigative journalism here. Kudos for that. Second, the whole series stands on the side of readers. The second person voice (you, your) is directed to individual persons—the same persons who do not sit at the tables of decision-makers in this crazy new hyper-personalized advertising business.

To measure the delta of change in that business, start with John Battelle’s Conversational Marketing series (post 1, post 2, post 3) from early 2007, and then his post Identity and the Independent Web, from last week. In the former he writes about how the need for companies to converse directly with customers and prospects is both inevitable and transformative. He even kindly links to The Cluetrain Manifesto (behind the phrase “brands are conversations”).

In his latest he observes some changes in the Web itself:

Here’s one major architectural pattern I’ve noticed: the emergence of two distinct territories across the web landscape. One I’ll call the “Dependent Web,” the other is its converse: The “Independent Web.”

The Dependent Web is dominated by companies that deliver services, content and advertising based on who that service believes you to be: What you see on these sites “depends” on their proprietary model of your identity, including what you’ve done in the past, what you’re doing right now, what “cohorts” you might fall into based on third- or first-party data and algorithms, and any number of other robust signals.

The Independent Web, for the most part, does not shift its content or services based on who you are. However, in the past few years, a large group of these sites have begun to use Dependent Web algorithms and services to deliver advertising based on who you are.

A Shift In How The Web Works?

And therein lies the itch I’m looking to scratch: With Facebook’s push to export its version of the social graph across the Independent Web; Google’s efforts to personalize display via AdSense and Doubleclick; AOL, Yahoo and Demand building search-driven content farms, and the rise of data-driven ad exchanges and “Demand Side Platforms” to manage revenue for it all, it’s clear that we’re in the early phases of a major shift in the texture and experience of the web.

He goes on to talk about how “these services match their model of your identity to an extraordinary machinery of marketing dollars,” and how

When we’re “on” Facebook, Google, or Twitter, we’re plugged into an infrastructure (in the case of the latter two, it may be a distributed infrastructure) that locks onto us, serving us content and commerce in an automated but increasingly sophisticated fashion. Sure, we navigate around, in control of our experience, but the fact is, the choices provided to us as we navigate are increasingly driven by algorithms modeled on the service’s understanding of our identity.

And here is where we get to the deepest, most critical problem: Their understanding of our identity is not the same as our understanding of our identity. What they have are a bunch of derived assumptions that may or may not be correct; and even if they are, they are not ours. This is a difference in kind, not degree. It doesn’t matter how personalized anybody makes advertising targeted at us. Who we are is something we possess and control—or would at least like to think we do—no matter how well some of us (such as advertisers) rationalize the “socially derived” natures of our identities in the world.

It is standard for people in the ad business to equate assent with approval, and John’s take on this is a good example of that. Sez he,

We know this, and we’re cool with the deal.

In fact we don’t know, we’re not cool with it, and it isn’t a deal.

If we knew, the Wall Street Journal wouldn’t have a reason to clue us in at such length.

We’re cool with it only to the degree that we are uncomplaining about it—so far.

And it isn’t a “deal” because nothing was ever negotiated.

On that last point, our “deals” with vendors on the Web are agreements in name only. Specifically, they are a breed of assent called contracts of adhesion. Also called standard form or boilerplate contracts, they are what you get when a dominant party sets all the terms, there is no room for negotiation, and the submissive party has a choice only to accept the terms or walk away. The term “adhesion” refers to the nailed-down nature of the submissive party’s position, while the dominant party is free to change the terms any time it wishes. Next time you “agree” to terms you haven’t read, go read them and see where it says the other party reserves the right to change the terms.

There is a good reason why we have had these kinds of agreements since the dawn of e-commerce. It’s because that’s the way the Web was built. Only one party—the one with the servers and the services—was in a position to say what was what. It’s still that way. The best slide I’ve seen in the last several years is one of Phil Windley’s. It says,

HISTORY OF E-COMMERCE

1995: Invention of the Cookie.

The End.

About all we’ve done since 1995 on the sell side is improve the cookie-based system of “relating” to users. This is a one-way take-it-or-leave-it system that has become lame and pernicious in the extreme. We can and should do better than that.

Phil’s own company, Kynetx, has come up with a whole new schema. Besides clients and servers (which don’t go away), you’ve got end points, events, rules and rules engines to execute the rules. David Siegel’s excellent book, The Power of Pull, describes how the Semantic Web also offers a rich and far more flexible and useful alternative to the Web’s old skool model. His post yesterday is a perfect example of liberated thinking and planning that transcends the old cookie-limited world. The man is on fire. Dig his first paragraph:

Monday I talked about the social networking bubble. Marketers are getting sucked into the social-networking vortex and can’t find their way out. The problem is that most companies are trying small tactical improvements, hoping to improve sales a bit and trying tactical savings programs, hoping to improve margins a bit. Yet there’s a whole new curve of efficiency waiting in the world of pull. It’s time to start talking about saving trillions, not millions. Companies should think in terms of big, strategic, double-digit improvements, new markets, and new ways to cooperate. Here is a road map

Read on. (I love that he calls social networking a “bubble”. I’m with that.)

This week at IIW in Mountain View, we’re going to be talking about, and working on, improving markets from the buyers’ side. (Through VRM and other means.) On the table will be whole new ways of relating, starting with systems by which users and customers can offer their own terms of engagement, their own policies, their own preferences (even their own prices and payment options)—and by which sellers and site operators can signal their openness to those terms (even if they’re not yet ready to accept them). The idea here is to get buyers out of their shells and sellers out of their silos, so they can meet and deal for real in a truly open marketplace. (This doesn’t have to be complicated. A lot of it can be automated. And, if we do it right, we can skip a lot of the pointless one-sided agreement-clicking friction we now take for granted.)

Right now it’s hard to argue against all the money being spent (and therefore made) in the personalized advertising business—just like it was hard to argue against the bubble in tech stock prices in 1999 and in home prices in 2004. But we need to come to our senses here, and develop new and better systems by which demand and supply can meet and deal with each other as equally powerful parties in the open marketplace. Some of the tech we need for that is coming into being right now. That’s what we should be following. Not just whether Google, Facebook or Twitter will do the best job of putting crosshairs on our backs.

John’s right that the split is between dependence and independence. But the split that matters most is between yesterday’s dependence and tomorrow’s independence—for ourselves. If we want a truly conversational economy, we’re going to need individuals who are independent and self-empowered. Once we have that, the level of economic activity that follows will be a lot higher, and a lot more productive, than we’re getting now just by improving the world’s biggest guesswork business.


Back on July 31 I posted The Data Bubble in response to the first of The Wall Street Journal’s landmark series of articles and Web postings on the topic of unwelcome (and, to their targets, mostly unknown) user tracking.

A couple days ago I began to get concerned about how much time had passed since the last posting, on August 12. So I tweeted, Hey @whattheyknow, is your Wall Street Journal series done? If not, when are we going to see more entries? Last I saw was >1 month ago.

Then yesterday @WhatTheyKnow tweeted back, @dsearls: Ask and ye shall receive: http://on.wsj.com/9DTpdP. Nice!

The piece is titled On the Web, Children Face Intensive Tracking, by Steve Stecklow, and it’s a good one indeed. To start,

The Journal examined 50 sites popular with U.S. teens and children to see what tracking tools they installed on a test computer. As a group, the sites placed 4,123 “cookies,” “beacons” and other pieces of tracking technology. That is 30% more than were found in an analysis of the 50 most popular U.S. sites overall, which are generally aimed at adults.

The most prolific site: Snazzyspace.com, which helps teens customize their social-networking pages, installed 248 tracking tools. Its operator described the site as a “hobby” and said the tracking tools come from advertisers.

Should we call cookies for kids “candy”? Hey, why not?

Once again we see the beginning of the end of unfettered user tracking. Such as right here:

Many kids’ sites are heavily dependent on advertising, which likely explains the presence of so many tracking tools. Research has shown children influence hundreds of billions of dollars in annual family purchases.

Google Inc. placed the most tracking files overall on the 50 sites examined. A Google spokesman said “a small proportion” of the files may be used to determine computer users’ interests. He also said Google doesn’t include “topics solely of interest to children” in its profiles.

Still, Google’s Ads Preferences page displays what Google has determined about web users’ interests. There, Google accurately identified a dozen pastimes of 10-year-old Jenna Maas—including pets, photography, “virtual worlds” and “online goodies” such as little animated graphics to decorate a website.

“It is a real eye opener,” said Jenna’s mother, Kate Maas, a schoolteacher in Charleston, S.C., viewing that data.

Jenna, now in fifth grade, said: “I don’t like everyone knowing what I’m doing and stuff.”

A Google spokesman said its preference lists are “based on anonymous browser activity. We don’t know if it’s one user or four using a particular browser, or who those users are.” He said users can adjust the privacy settings on their browser or use the Ads Preferences page to limit data collection.

I went and checked my own Ads Preferences page (http://www.google.com/ads/preferences) and found that I had opted out of Google’s interest-based advertising sometime in the past. I barely remember doing that, but I’m not surprised I did. On the whole I think most people would opt to turn that kind of stuff off, just to get a small measure of shelter amidst the advertising blizzard that the commercial Web has become.

Finding Google’s opt-out control box without a flashlight, however, is a bit of a chore. Worse, Google is just one company. The average user has to deal with dozens or hundreds of other (forgive me) cookie monsters, each with its own opt-out/in control boxes (or lack of them). And I suspect that most of those others are far less disclosing about their practices (and respectful of users) than Google is.

(But I have no research to back that up—yet. If anybody does, please let me have it. There’s a whole chapter in a book I’m writing that’s all about this kind of stuff.)

Meanwhile, says the Journal,

Parents hoping to let their kids use the Internet, while protecting them from snooping, are in a bind. That’s because many sites put the onus on visitors to figure out how data companies use the information they collect.

Exactly. And what are we to do? Depend on the site owners and their partners? Not in the absence of help, that’s for sure. The Journal again:

Gaiaonline.com—where teens hang out together in a virtual world—says in its privacy policy that it “cannot control the activities” of other companies that install tracking files on its users’ computers. It suggests that users consult the privacy policies of 11 different companies.

In a statement, gaiaonline.com said, “It is standard industry practice that advertisers and ad networks are bound by their own privacy policy, which is why we recommend that our users review those.” The Journal’s examination found that gaiaonline.com installed 131 tracking files from third parties, such as ad networks.

An executive at a company that installed several of those 131 files, eXelate Media Ltd., said in an email that his firm wasn’t collecting or selling teen-related data. “We currently are not specifically capturing or promoting any ‘teen’ oriented segments for marketing purposes,” wrote Mark S. Zagorski, eXelate’s chief revenue officer.

But the Journal found that eXelate was offering data for sale on 5.9 million people it described as “Age: 13-17.” In a later interview, Mr. Zagorski confirmed eXelate was selling teen data. He said it was a small part of its business and didn’t include personal details such as names.

BlueKai Inc., which auctions data on Internet users, also said it wasn’t offering for sale data on minors. “We are not selling data on kids,” chief executive Omar Tawakol wrote in an email. “Let there be no doubt on what we do.”

However, another data-collecting company, Lotame Solutions Inc., told the Journal that it was selling what it labeled “teeny bopper” data on kids age 13 to 19 via BlueKai’s auctions. “If you log into BlueKai, you’ll see ‘teeny boppers’ available for sale,” said Eric L. Porres, Lotame’s chief marketing officer.

Mr. Tawakol of BlueKai later confirmed the “teeny bopper” data had been for sale on BlueKai’s exchange but no one had ever bought it. He said as a result of the Journal’s inquiries, BlueKai had removed it.

The FTC is reviewing the only federal law that limits data collection about kids, the Children’s Online Privacy Protection Act, or Coppa. That law requires sites aimed at children under 13 to obtain parental permission before collecting, using or disclosing a child’s “personal information” such as name, home or email address, and phone and Social Security number. The law also applies to general-audience sites that knowingly collect personal information from kids.

So we have pots and kettles calling each other black while copping out of responsibility in any case—and then, naturally, turning toward government for help.

My own advice: let’s not be so fast with that. Let’s continue to expose bad practices, but let’s also fix the problem on the users’ end. Because what we really need here are tools by which individuals (including parents) can issue their own global preferences, their own terms of engagement, their own controls, and their own ends of relationships with companies that serve them.

These tools need to be based on open standards, code and protocols, and independent of any seller. Where they require trusted intermediaries, those parties should be substitutable, so individuals are not locked in again.

And guess what? We’re working on those. Here’s what I wrote last month in Cooperation vs. Coercion:

What we need now is for vendors to discover that free customers are more valuable than captive ones. For that we need to equip customers with better ways to enjoy and express their freedom, including ways of engaging that work consistently for many vendors, rather than in as many different ways as there are vendors, which is the “system” (that isn’t) we have now.

There are lots of VRM development efforts working on both the customer and vendor sides of this challenge. In this post I want to draw attention to the symbols that represent those two sides, which we call r-buttons, two of which appear [in the example below]. Yours is the left one. The vendor’s is the right one. They face each other like magnets, and are open on the facing ends.

These are designed to support what Steve Gillmor calls gestures, which he started talking about back in 2005 or so. I paid some respect to gestures (though I didn’t yet understand what he meant) in The Intention Economy, a piece I wrote for Linux Journal in 2006. (That same title is also the one for a book I’m writing for Harvard Business Press. The subtitle is What happens when customers get real power.) On the sell side, in a browser environment, the vendor puts some RDFa in its HTML that says “We welcome free customers.” That can mean many things, but the most important is this: Free customers bring their own means of engagement. It also means they bring their own terms of engagement.

Being open to free customers doesn’t mean that a vendor has to accept the customer’s terms. It does mean that the vendor doesn’t believe it has to provide all those terms itself, through the currently defaulted contracts of adhesion that most of us click “accept” for, almost daily. We have those because from the dawn of e-commerce sellers have assumed that they alone have full responsibility for relationships with customers. Maybe now that dawn has passed, we can get some daylight on other ways of getting along in a free and open marketplace.

The gesture shown here is the vendor (in this case the public radio station KQED, which I’m just using as an example here) expressing openness to the user, through that RDFa code in its HTML. Without that code, the right-side r-button would be gray. The red color on the left side shows that the user has his or her own code for engagement, ready to go. (I unpack some of this stuff here.)

Putting in that RDFa would be trivial for a CRM system. Or even for a CMS (content management system). Next step: (I have Craig Burton leading me on this… he’s on the phone with me right now…) RESTful APIs for customer data. Check slide 69 here. Also slides 98 and 99. And 122, 124, 133 and 153.

If I’m not mistaken, a little bit of RDFa can populate a pop-down menu on the site’s side that might look like this:

All the lower stuff is typical “here are our social links” jive. The important new one is that item at the top. It’s the new place for “legal” (the symbol is one side of a “scale of justice”), but it doesn’t say “these are our non-negotiable terms of service” (or privacy policies, or other contracts of adhesion). Just by appearing there it says, “We’re open to what you bring to the table. Click here to see how.” This in turn opens the door to a whole new way for buyers and sellers to relate: one that doesn’t need to start with the buyer (or the user) just “accepting” terms he or she doesn’t bother to read because they give all advantages to the seller and are not negotiable. Instead it is an open door like one in a store. Much can be implicit, casual and free of obligation. No new law is required here. Just new practice. This worked for Creative Commons (which neither offered nor required new copyright law), and it can work for r-commerce (a term I just made up). As with Creative Commons, what happens behind that symbol can be machine-, lawyer- or human-readable. You don’t have to click on it. If your policy as a buyer is that you don’t want to be tracked by advertisers, you can specify that, and the site can hear and respond to it. The system is, as Renee Lloyd puts it, the difference between a handcuff and a handshake.

Giving customers means for showing up in the marketplace with their own terms of engagement is a core job right now for VRM. Being ready to deal with customers who bring their own terms is equally important for CRM. What I wrote here goes into some of the progress being made for both. Much more is going on as well. (I’m writing about this stuff because these are the development projects I’m involved with personally. There are many others.)

You can check out some of those others here.
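The r-button gesture described in the Cooperation vs. Coercion excerpt comes down to two checks: does the page carry the vendor’s RDFa signal, and does the user bring terms of his or her own? A minimal Python sketch of that logic follows, using only the standard library’s `html.parser`. The post does not name an RDFa vocabulary, so the property `vrm:welcomesFreeCustomers`, the `content="true"` convention, the “active” label, and the gray state for a user without terms are all hypothetical. Only the rule that the right button stays gray without the vendor’s markup, and that red on the left means the user has code ready to go, comes from the text.

```python
from html.parser import HTMLParser

# Hypothetical RDFa property; the post does not specify a vocabulary.
WELCOME_PROPERTY = "vrm:welcomesFreeCustomers"


class WelcomeDetector(HTMLParser):
    """Scans a page for an element declaring WELCOME_PROPERTY with content="true"."""

    def __init__(self) -> None:
        super().__init__()
        self.welcomes_free_customers = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        attributes = dict(attrs)
        if (attributes.get("property") == WELCOME_PROPERTY
                and attributes.get("content") == "true"):
            self.welcomes_free_customers = True


def r_button_colors(page_html: str, user_has_terms: bool) -> tuple[str, str]:
    """Return (left, right) button states for the user and the vendor."""
    detector = WelcomeDetector()
    detector.feed(page_html)
    detector.close()
    left = "red" if user_has_terms else "gray"  # "gray" here is an assumption
    right = "active" if detector.welcomes_free_customers else "gray"
    return left, right


if __name__ == "__main__":
    page = '<head><meta property="vrm:welcomesFreeCustomers" content="true"></head>'
    print(r_button_colors(page, user_has_terms=True))
    print(r_button_colors("<head></head>", user_has_terms=True))
```

The vendor-side check only reads a signal of openness; nothing in it accepts the user’s terms, which matches the point that welcoming free customers does not oblige a vendor to agree to whatever they bring.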

Bonus link: Tracking the Companies that Track You Online. That’s a Fresh Air interview by Dave Davies of Julia Angwin, senior technology editor of The Wall Street Journal and the lead reporter on the What They Know series.


“I make my living off the Evening News
Just give me something: something I can use
People love it when you lose
They love dirty laundry.”

Don Henley, “Dirty Laundry”

Look up “Wikipedia loses” (with the quotes) and you get 20,800 results. Look up “Wikipedia has lost” and you get 56,900. (Or at least that’s what I got this morning.) Most of those results tell a story, which is what news reports do. “What’s the story?” may be the most common question asked of reporters by their managing editors. As humans, we are interested in stories — even if they’re contrived, which is what we have with all “reality” television shows.

Lately Wikipedia itself is the subject of a story about losing editors. The coverage snowball apparently started rolling with Volunteers Log Off as Wikipedia Ages, by Julia Angwin and Geoffrey A. Fowler in The Wall Street Journal. It begins,

Wikipedia.org is the fifth-most-popular Web site in the world, with roughly 325 million monthly visitors. But unprecedented numbers of the millions of online volunteers who write, edit and police it are quitting.

That could have significant implications for the brand of democratization that Wikipedia helped to unleash over the Internet — the empowerment of the amateur.

Volunteers have been departing the project that bills itself as “the free encyclopedia that anyone can edit” faster than new ones have been joining, and the net losses have accelerated over the past year. In the first three months of 2009, the English-language Wikipedia …

That’s all you get without paying. Still, it’s enough.

Three elements make stories interesting: 1) a protagonist we know, or who is at least interesting; 2) a struggle of some kind; and 3) movement (or possible movement) toward a resolution. Struggle is at the heart of a story. There has to be a problem (what to do with Afghanistan), a conflict (a game between good teams, going to the final seconds), a mystery (wtf was Tiger Woods’ accident all about?), a wealth of complications (Brad and Angelina), a crazy success (the iPhone), failings of the mighty (Nixon and Watergate). The Journal‘s Wikipedia story is of the Mighty Falling variety.

The Journal’s source is Wikipedia: A Quantitative Analysis, a doctoral thesis by José Felipe Ortega of Universidad Rey Juan Carlos in Madrid. (The graphic at the top of this post is one among many from the study.) In Wikipedia’s Volunteer Story, Erik Moeller and Erik Zachte of the Wikimedia Foundation write,

First, it’s important to note that Dr. Ortega’s study of editing patterns defines as an editor anyone who has made a single edit, however experimental. This results in a total count of three million editors across all languages.  In our own analytics, we choose to define editors as people who have made at least 5 edits. By our narrower definition, just under a million people can be counted as editors across all languages combined.  Both numbers include both active and inactive editors.  It’s not yet clear how the patterns observed in Dr. Ortega’s analysis could change if focused only on editors who have moved past initial experimentation.

Even more importantly, the findings reported by the Wall Street Journal are not a measure of the number of people participating in a given month. Rather, they come from the part of Dr. Ortega’s research that attempts to measure when individual Wikipedia volunteers start editing, and when they stop. Because it’s impossible to make a determination that a person has left and will never edit again, there are methodological challenges with determining the long term trend of joining and leaving: Dr. Ortega qualifies as the editor’s “log-off date” the last time they contributed. This is a snapshot in time and doesn’t predict whether the same person will make an edit in the future, nor does it reflect the actual number of active editors in that month.

Dr. Ortega supplements this research with data about the actual participation (number of changes, number of editors) in the different language editions of our projects. His findings regarding actual participation are generally consistent with our own, as well as those of other researchers such as Xerox PARC’s Augmented Social Cognition research group.

What do those numbers show?  Studying the number of actual participants in a given month shows that Wikipedia participation as a whole has declined slightly from its peak 2.5 years ago, and has remained stable since then. (See WikiStats data for all Wikipedia languages combined.) On the English Wikipedia, the peak number of active editors (5 edits per month) was 54,510 in March 2007. After a more significant decline by about 25%, it has been stable over the last year at a level of approximately 40,000. (See WikiStats data for the English Wikipedia.) Many other Wikipedia language editions saw a rise in the number of editors in the same time period. As a result the overall number of editors on all projects combined has been stable at a high level over recent years. We’re continuing to work with Dr. Ortega to specifically better understand the long-term trend in editor retention, and whether this trend may result in a decrease of the number of editors in the future.

They add details that amount to not much of a story, if you consider all the factors involved, including the maturity of Wikipedia itself.

As it happens, I’m an editor of Wikipedia, at least by the organization’s own definitions. I’ve made fourteen contributions, starting with one in April 2006 and ending, for the moment, with one I made this morning. Most involve a subject I know something about: radio. In particular, radio stations, and rules around broadcast engineering. The one this morning involved edits to the WQXR-FM entry. The edits took a lot longer than I intended (about an hour, total) and were less extensive than I would have made, had I given the job more time and had I been more adept at editing references and citations. (It’s pretty freaking complicated.) The preview method of copy editing is also time-consuming and endlessly iterative. It was sobering to see how many times I needed to go back and forth between edits and previews before I felt comfortable that I had contributed accurate and well-written copy.

In fact, as I look back over my fourteen editing efforts, I can see that most of them were to some degree experimental. I wanted to see if I had what it took to be a dedicated Wikipedia editor, because I regard that as a High Calling. The answer so far is a qualified no. I’ll continue to help where I can. But on the whole my time is better spent doing other things, some of which also have leverage with Wikipedia, but not of the sort that Dr. Ortega measured in his study.

For example, photography.

As of today you can find 113 photos on Wikimedia Commons that I shot. Most of these have also found use in Wikipedia. (Click “Check Usage” at the top of any shot to see how it’s been used, and where.) I didn’t put any of these shots in Wikimedia Commons, nor have I put any of them in Wikipedia. Other people did all of that. To the limited degree I can bother to tell, I don’t know anybody who has done any of that work. All I do is upload shots to my Flickr site, caption and tag them as completely as time allows, and let nature take its course. I have confidence that at least some of the shots I take will be useful. And the labor involved on my part is low.

I also spent about half an hour looking through Dr. Ortega’s study. My take-away is that Wikipedia has reached a kind of maturity, and that the fall-off in participation is no big deal. This is not to say that Wikipedia doesn’t have problems. It has plenty. But I see most of those as features rather than as bugs, even if they sometimes manifest, at least superficially, as the latter. That’s not much of a story, but it’s a hell of an accomplishment.
