A publisher tries the OpenAI chatbot

December 3, 2022 technology software

I’ve been hearing a lot about advanced AI content creation models, and decided to give one of them – OpenAI and the OpenAI chatbot tool – a spin. Over the past 30+ years, I have seen advanced digital technologies upend the ways in which creators make and sell media, and audiences consume it. Creators generally embrace such technologies and associated tools, even as media businesses have struggled.

So can OpenAI (and similar AI-based applications) be viewed as yet another powerful technology in the media creator’s toolbox, much like other transformational technologies have served in the past? Or will it outright replace writers, designers, musicians, and other creative professionals and the publishing businesses that employ them? Some artists are indeed very worried, after seeing AI-created examples that mimic human artists so well that one “painting” even won an award.

There is no doubt that OpenAI is very powerful. Check out what it spat out when I played in the OpenAI API playground, using the prompt “explain the Lean Media framework” and the “text-davinci-003” model:

The Lean Media framework is a framework for making media production more efficient and cost-effective. It is based on the Lean Manufacturing principles developed by Toyota in the late 1980s. The Lean Media framework focuses on a continuous process of eliminating waste and optimizing efficiency in media production. It is designed to help media companies reduce costs, improve quality, and increase productivity. The framework emphasizes the importance of using data and analytics to inform decisions, streamlining processes, and focusing on value creation rather than cost cutting. It also encourages collaboration among teams and stakeholders to ensure that everyone is on the same page with regards to goals and objectives. The Lean Media framework is designed to help media production become faster, more efficient, and more cost-effective.

That’s very good! I should know, because I created the Lean Media framework and wrote a book about it. Readers may even remember that some of the early concepts were fleshed out on this blog 10 years ago. I believe the summary is based on various public sources, including blog posts, articles, press releases, possibly the Google Books excerpt, or samples of the book (legal and otherwise) from other sources.

But then things started to go off the rails. I asked the OpenAI chatbot “who invented the lean media framework”?

The Lean Media Framework was developed by the Lean Media Lab, a research and design collective based in Los Angeles. The lab was founded by media scholar and professor Dr. Juliette De Maeyer and media makers and technologists Esther Robinson and Sarah Szalavitz.

Hmmm. Is there another Lean Media framework floating around? I couldn’t find it online, or even the “Lean Media Lab.”

So I re-entered the prompt “who invented the lean media framework.” It gave a completely different answer:

[Image: OpenAI media creation example]

I started re-asking questions about the framework, using slightly different phrasings. More unfamiliar answers came up that completely contradicted earlier answers.

What’s going on here? How can the AI give different answers to the same questions, or even apparently “wrong” answers? It’s a hard question, because the “black box” design of most AI systems means that even its creators are unable to explain how certain answers were obtained:

In machine learning, these black box models are created directly from data by an algorithm, meaning that humans, even those who design them, cannot understand how variables are being combined to make predictions. Even if one has a list of the input variables, black box predictive models can be such complicated functions of the variables that no human can understand how the variables are jointly related to each other to reach a final prediction.

There are exceptions to the black box problem, such as leela.ai. Some AI failures are small and amusing, such as the ones Google’s AI generates in response to search queries. But the potential for harm is real, as I pointed out last year when I queried Google about “When Neil Armstrong set foot on Mars.”

AI researchers, including OpenAI itself, acknowledge there is a problem:

The OpenAI API is powered by GPT-3 language models which can be coaxed to perform natural language tasks using carefully engineered text prompts. But these models can also generate outputs that are untruthful, toxic, or reflect harmful sentiments. This is in part because GPT-3 is trained to predict the next word on a large dataset of Internet text, rather than to safely perform the language task that the user wants. In other words, these models aren’t aligned with their users.
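
The “predict the next word” training described above also hints at why re-entering the same prompt can produce a different answer. Text is typically generated by sampling each next word from a probability distribution, and the Playground exposes a “temperature” setting that controls how random that sampling is. The following is a minimal Python sketch of the idea, not OpenAI’s implementation. The candidate words and their scores are invented for illustration, and the function names are hypothetical.

```python
import math
import random

# Made-up scores ("logits") for candidate next words after a prompt such as
# "The Lean Media framework was invented by". These numbers are assumptions
# for illustration only; they do not come from any real model.
TOY_LOGITS = {
    "Lamont": 2.0,
    "researchers": 1.6,
    "Toyota": 1.2,
    "Lab": 0.8,
}


def next_word_probabilities(logits, temperature):
    """Turn scores into probabilities with a temperature-scaled softmax."""
    scaled = {word: score / temperature for word, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {word: math.exp(score - top) for word, score in scaled.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


def sample_next_word(logits, temperature, rng):
    """Pick one next word; temperature 0 always takes the top-scoring word."""
    if temperature == 0:
        return max(logits, key=logits.get)
    probs = next_word_probabilities(logits, temperature)
    words = list(probs)
    weights = [probs[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random()
    for temperature in (0, 0.7, 1.5):
        picks = [sample_next_word(TOY_LOGITS, temperature, rng) for _ in range(8)]
        print(temperature, picks)
```

In this toy setup, temperature 0 returns the same word every time, while higher temperatures flatten the distribution so that repeated runs of the same prompt can start with different words. Real models repeat this step for every word in the answer, so small differences early on can lead to very different answers.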

Clearly, there is still a lot of work to be done. But there are a few important conclusions:

AI models will improve.
AI tools for media creators will improve.
We will see AI-generated content with a higher degree of quality (editorial, visual, and so on).
“Accuracy” based upon existing inputs will improve.
Humans will attempt to “game” AIs to produce desired communication, business, or creative outcomes. This may be done by training them on unusual data/inputs (including data/inputs at scale) or tweaks to the models.
Creators will have to monitor AIs to protect their intellectual property and creative rights. We are already seeing this emerge as an issue with GitHub Copilot, a tool that generates quick, generic code blocks for developers to use but can also reproduce copyrighted code with no attribution.
Media creators will learn to harness AI, just as they have done with earlier technologies and tools.

Ancestry’s indexing experiment with firms in China

May 11, 2022 / July 10, 2022 genealogy, technology Business, Corporations, research, software

I follow genealogist Michele Lewis on TikTok. She recently found an unusual Ancestry.com transcription from the 1820 Federal Census. Check out the handwritten first name. What does it look like to you?

Now, I get it that a 200-year-old handwritten scrawl can be hard to read. But how could a transcriber even consider “Elizabether” in this case?

I think I know the answer. In 2008, I worked for an online technology publication, The Industry Standard (no longer online). I interviewed Tim Sullivan, CEO of The Generations Network, which was Ancestry.com’s official corporate name until 2009. The article was published on October 3, 2008, on the website of The Industry Standard (see image below).

In the interview, Sullivan noted that computers were “not even close” to being able to read handwritten records, especially those from disparate sources such as census records which have many different styles of handwriting.

So Ancestry turned to human transcriptionists. Paid transcriptionists, not volunteers like on FamilySearch. Sullivan told me:

“The vast majority of the investment we’ve made in the last 10 years is not in acquisitions costs or imaging costs, it’s in the indexing costs.”

At the time, Sullivan said Ancestry was paying $10 million per year to transcribe old records. To cut costs, Ancestry hired overseas partners in China, where English was not widely spoken but census records could be transcribed for less money:

So how did The Generations Network import the data from millions of old census forms into its online database? Sullivan says the company spent about $75 million over 10 years to build its “content assets” including the census data, and much of that cost went into partnering with Chinese firms whose employees read the data and entered it into Ancestry.com’s database. The Chinese staff are specially trained to read the cursive and other handwriting styles from digitized paper records and microfilm. The task is ongoing with other handwritten records, at a cost of approximately $10 million per year, he adds.

If you have ever tried to read old handwriting in an unfamiliar language, I am sure you can appreciate how difficult this task would be. But the lack of quality checks, and the nonsensical transcriptions that result, are stunning. Keep in mind that Ancestry charges customers lots of money (up to 25% more as of January) but its main focus is generating profit for a string of private equity firms. Its current owner is a Wall Street PE firm, Blackstone Inc. It’s not clear if Ancestry still outsources its transcriptions to overseas firms, or if the OCR technology is good enough to hand off the task to computers.

Regardless, what’s especially frustrating is that Ancestry customers have attempted to correct this particular error. The actual name is “Christopher Orr.” They’ve added the correct annotation multiple times, but Ancestry still shows the name from that 200-year-old census return as “Elizabether Orr.” Lots of people searching for this ancestor will never find him, thanks to Ancestry’s cost-cutting moves 15 years ago and the lack of quality checks to correct such errors.

As Lewis notes at the end of her video, “Maybe you’re going to have to hand-search the indexes one at a time” to determine what the actual name is.

Archive of “Google stays mum on plans for public documents, Ancestry.com points to OCR hurdle.” By Ian Lamont. Published 10/3/2008, The Industry Standard.


Publishing a jQuery programming guide

April 13, 2014 / June 20, 2020 e-books, technology Books, html, javascript, jquery, software

Over the past year, I have done several content experiments or expansions in the In 30 Minutes series, ranging from cooking to health and medicine. In this post, I’ll be talking about the jQuery Plugin book that my company released this month. While software has been a focus of the series since the beginning, this is the first title that gets into making software as opposed to using it.

The story begins last summer. I am a long-term member of the Hacker News community, and on a thread about ebook publishing I left this comment about best practices for experimental publishing. It got 16 upvotes, which was a nice validation. I am not a hacker, but I like to be able to contribute positively to Hacker News when I can. But the thread moved out of sight, and after a few days I forgot about my comment.

Six months later, I received an email out of the blue. It started:

I’ve been checking out your “30 Minutes” series and was originally inspired to write my own ebook after reading your post on HN a few months ago. I have since wrote a small 48 page guide on “jQuery Plugin Development”. I haven’t launched it yet, just waiting for some feedback after sending it to a few friends first.

The author was Robert Duchnik, a Canadian developer who was living in Thailand. We began corresponding, and tossed around the idea of releasing a programming title as an In 30 Minutes guide. This was an interesting area to expand into. Most In 30 Minutes titles are written for mainstream audiences. They range from Melanie Pinola’s book about LinkedIn to the experimental easy Chinese recipes cookbook on the iPad authored by Shiao-jang Kung. The jQuery Plugin guide was focused on a much narrower, highly technical niche audience. Marketing to this group would be a challenge.

Moving Forward With jQuery Plugin Development In 30 Minutes

Rob’s book had some big things going for it:

He’s a jQuery Plugin expert with many years of experience in the field, and he operates Websanova, an online resource devoted to jQuery Plugins.
Rob has an existing audience, via Websanova. From previous releases by Melanie and Tim Fisher (author of Windows 8 Basics In 30 Minutes), I have found that those authors who already have existing online audiences have a huge advantage right out of the gate. Not only can they turn to their fans to purchase copies and help spread the word, but by virtue of the fact that they have already interacted with the audience over time they have an innate knowledge of the problems that readers face, and what people want to know. This makes for better books and a better author/reader relationship going forward.
There was already a draft manuscript. It needed some light editing and a proofreader, but otherwise it was in pretty good shape.
The manuscript was short. This is an asset, as we want readers to be able to understand the topic at hand in less than 30 minutes.
The market for books about jQuery plugin development had a hole. Through discussions with Rob and a quick analysis of competing titles, I determined that there was a need for this type of resource (a high-quality, quick-start programming guide) on this topic, especially if it were priced right.

This last point is important. I am not talking about low-balling the competition. There are already lots of free online resources about how to write jQuery plugins. There are also a small number of books about jQuery plugins, but most of them are long and somewhat expensive. There was not much in the middle, in terms of length or price. This is where jQuery Plugin Development In 30 Minutes would live.

Rob and I came to an agreement in January, and we moved forward with preparing the manuscript for publication. There were some new writing tools to try out, and some difficulties related to producing code blocks in Scrivener (my primary book production tool), but we established a workflow based on Markdown and GitHub and published the title at the beginning of April.

You can read the table of contents for the jQuery plugin book here. The title is available for the Kindle, iPad, Nook, and Google Play, as well as a paperback and a PDF.

In addition, you may be interested in reading some of Rob’s blog posts about jQuery plugin development:

A proposal for a Lean Media Framework: Input and iteration required

October 1, 2012 / January 15, 2018 Lean, media, Other Business, Lean, music, software, startup, TV

(Updated) I’m a media guy. I’ve been involved as a producer and manager in various sectors of the media industry my entire adult life, including the music industry, broadcasting (radio and TV), newspapers, magazines, and, starting in the 1990s, online media. I’ve experienced the shift from analog to digital, and the many struggles that have resulted from this sea change.

More recently, I’ve become a startup guy. I co-founded a mobile software startup that released a classifieds app. I’m currently trying to bootstrap an e-publishing venture around In 30 Minutes® guides, and have released more than a half-dozen titles on Amazon, iTunes, Kobo, and other ebook distribution platforms. These guides are aimed at mainstream audiences who need help getting up to speed with mildly complex subjects, ranging from health to technology. The guides include ebooks/books as well as online components. See the guide which people mistakenly compare with Google Docs for Dummies and posts such as What Is Dropbox? to get an idea of the products and information being offered.

The Lean Startup

A few years ago, before the mobile startup, I heard Eric Ries give his Lean Startup stump speech at MIT. It immediately clicked with me. His focus was software development, but I realized that the things he was saying about product development, feedback cycles, and speed applied not only to software, but to media content as well. I had seen it with my own eyes. Print content, websites, video, music and other products/projects that were developed with these qualities in mind had many positive qualities. They were cheaper to produce, they made it to market more quickly, user feedback loops started sooner, and if they were new brands, they got a huge head start. They were also more fun to work on.

Conversely, products that took the big media approach — bloated teams, top-down directives, planned by committee, limited feedback cycles, etc. — encountered problems. They required huge staff and budget commitments, took years to complete, and seemed to have a higher rate of failure.

But I also realized that there were some problems with applying the Lean Startup framework to media content.

First, out of all of the “Lean” media products that I had been a part of or had seen close-up, very few could be considered successes. My blog about the Harvard Extension School is one (more than a half-million page views, thousands of dollars in revenue) and an online community for Computerworld (probably 10 million visits before it was retired) is another. But other products floundered or failed out of the gate, and even after iteration, they failed.

Second, it wasn’t hard to find examples of fat big media products that were hits. Turn on the TV, and you can see examples on every channel. A reality TV talent show that takes millions to produce, is planned for at least a year, and follows a format of a three-judge panel with at least one British judge, has a very high chance of success. In the music world, there have been many albums that have taken years to produce and have broken every Lean rule in the book, yet have sold millions of copies. To illustrate, Def Leppard started writing the songs for Hysteria in 1984, yet the album wasn’t released until 1987. The songs on Hysteria didn’t take long to write. But finessing them, producing them, marketing them, and launching them took years. This is the exact opposite of Ries’ Minimum Viable Product (MVP) concept, or even the variation known as Minimum Delightful Product.

Third, it was hard to isolate certain factors that are commonly found in media products but are seldom seen in the software world. “Brand” and “star power” can be hugely important in new product launches for media, but in the software world (aside from Apple) it’s more about the product and what it can do. For media products, another difference relates to creative processes and team dynamics, and the feedback cycles that exist within teams (think of the Beatles in the studio, the New York Times editorial processes, or the Saturday Night Live script readings). There is also the huge disruption that is taking place around business models, which clouds everything around media.

Lean Media: From Theory To Practice

When I launched a mobile software startup, I finally had a chance to put Lean methodologies to work with my co-founder. We made mistakes, especially at the beginning, but eventually released a product that proved to be very popular with consumers, and had high engagement and retention rates. I felt that when we followed the Lean philosophy, it worked very well for product development.

When I started my second venture this summer, the ebook experiment, I pledged to myself that I would attempt to actively follow the Lean philosophy. Get products out to the marketplace as soon as possible. Measure. Iterate. Improve. Some of these processes were already ingrained, owing to my earlier experiences with rapid product development in the online media and music industries, as well as the mobile software startup, and my grad school experience, which emphasized iterative product development. But I was more methodical with measuring and incorporating feedback. I also paid a lot of attention to revenue, something that I had not been focused on with any previous venture or media experiment.

As the ebook venture progresses, my mind has been circling back to the inconsistencies I observed earlier. Yes, Lean methodologies do work for media content. They can lead to better products, and better sales. However, the Lean approach does not take into account important factors — such as brand and creative processes — that can determine the success or failure of media ventures.

The Opportunity For Lean Media

Therefore, I believe there is an opportunity to build a new Lean framework that is specific to media ventures — a Lean “mod” for media, if you will. The goal of building a Lean Media Framework is to help startups and established companies build innovative products, platforms, and business models that have a higher chance of success and can contribute to new models of creation, distribution, and consumption.

In the old media world, an idea like this would have been developed by a single writer or a small team of collaborators. An essay would appear in a communications journal or The New Yorker. If it got traction, the author(s) would get a book deal.

In the spirit of Lean development and distributed knowledge, I am starting with a simple blog post (which took two hours to write) and throwing these concepts out to my favorite forums for discussion and iteration. Share your thoughts below, tweet to @ilamont, write a blog post, or do whatever you think is appropriate to carry the discussion forward and iterate until we have something that we can share with a wider audience.

November 2015 Update: I am expanding Lean Media into a book. Read sample chapters here. I have also launched a newsletter about industrial automation using Lean Media principles.

Update: More thoughts and discussion here:

Ipso Facto

By the fact itself. An award-winning blog by a Harvard Extension School alumnus.
