Monday, February 6, 2023

Whispers of A.I.’s Modular Future

One day in late December, I downloaded a program called Whisper.cpp onto my laptop, hoping to use it to transcribe an interview I’d done. I fed it an audio file and, every few seconds, it produced one or two lines of eerily accurate transcript, writing down exactly what had been said with a precision I’d never seen before. As the lines piled up, I could feel my computer getting hotter. This was one of the few times in recent memory that my laptop had actually computed something complicated—mostly I just use it to browse the Web, watch TV, and write. Now it was running cutting-edge A.I.

Despite being one of the more sophisticated programs ever to run on my laptop, Whisper.cpp is also one of the simplest. If you showed its source code to A.I. researchers from the early days of speech recognition, they might laugh in disbelief, or cry—it would be like revealing to a nuclear physicist that the process for achieving cold fusion can be written on a napkin. Whisper.cpp is intelligence distilled. It’s rare among modern software in that it has virtually no dependencies—in other words, it works without the help of other programs. Instead, it is ten thousand lines of stand-alone code, most of which does little more than fairly complicated arithmetic. It was written in five days by Georgi Gerganov, a Bulgarian programmer who, by his own admission, knows next to nothing about speech recognition. Gerganov adapted it from a program called Whisper, released in September, 2022, by OpenAI, the same organization behind ChatGPT and DALL-E. Whisper transcribes speech in more than ninety languages. In some of them, the software is capable of superhuman performance—that is, it can actually parse what somebody’s saying better than a human can.
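
Most of that arithmetic, in turn, is one kind of work: multiplying and adding grids of numbers, the basic operation of neural-network inference. As a toy illustration only (Whisper.cpp itself is hand-optimized C/C++, not Python), the heart of it looks something like this naive matrix multiply:

    # A toy, unoptimized matrix multiply, included only to show the flavor
    # of the arithmetic; Whisper.cpp's real loops are written in C/C++ and
    # heavily optimized for speed.
    def matmul(A, B):
        rows, inner, cols = len(A), len(B), len(B[0])
        C = [[0.0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                for k in range(inner):
                    C[i][j] += A[i][k] * B[k][j]
        return C

    print(matmul([[1.0, 2.0]], [[3.0], [4.0]]))  # [[11.0]]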

What’s so unusual about Whisper is that OpenAI open-sourced it, releasing not just the code but a detailed description of its architecture. They also included the all-important “model weights”: a giant file of numbers specifying the synaptic strength of every connection in the software’s neural network. In so doing, OpenAI made it possible for anyone, including an amateur like Gerganov, to modify the program. Gerganov converted Whisper to C++, a widely supported programming language, to make it easier to download and run on practically any device. This sounds like a logistical detail, but it’s actually the mark of a wider sea change. Until recently, world-beating A.I.s like Whisper were the exclusive province of the big tech firms that developed them. They existed behind the scenes, subtly powering search results, recommendations, chat assistants, and the like. When outsiders have been allowed to use them directly, their usage has been metered and controlled.
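
That sea change is easy to demonstrate. Because the weights are public, a few lines of code are enough to pull down the network and run it on your own machine. Here is a minimal sketch using OpenAI’s own open-source “whisper” package for Python; it assumes the package and ffmpeg are installed, and “interview.wav” is a hypothetical file:

    # A minimal local-transcription sketch with OpenAI's open-source
    # "whisper" Python package (assumes "pip install openai-whisper" and
    # ffmpeg are installed; "interview.wav" is a hypothetical audio file).
    import whisper

    # Downloads the published model weights on first use, then loads the
    # neural network into memory on the local machine.
    model = whisper.load_model("base")

    # Transcription runs entirely on this computer; no server is involved.
    result = model.transcribe("interview.wav")
    print(result["text"])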

There have been a few other open-source A.I.s in the past few years, but most of them have been developed by reverse engineering proprietary projects. Leela Chess Zero, a chess engine, is a crowdsourced version of DeepMind’s AlphaZero, one of the world’s best computer players; because DeepMind didn’t release AlphaZero’s model weights, Leela Chess Zero had to be trained from scratch, by individual users—a strategy that was only workable because the program could learn by playing chess against itself. Similarly, Stable Diffusion, which conjures images from descriptions, is a hugely popular clone of OpenAI’s DALL-E and Google’s Imagen, but trained with publicly available data. Whisper may be the first A.I. in this class that was simply gifted to the public. In an era of cloud-based software, when all of our programs are essentially rented from the companies that make them, I find it somewhat electrifying that, now that I’ve downloaded Whisper.cpp, no one can take it away from me—not even Gerganov. His little program has transformed my laptop from a device that accesses A.I. to something of an intelligent machine in itself. (...)

A textbook from 1999, which described a then state-of-the-art speech-recognition system similar to Dragon NaturallySpeaking, ran to more than four hundred pages; to understand it, one had to master complicated math that was sometimes specific to sound—hidden Markov models, spectral analysis, and something called “cepstral compensation.” The book came with a CD-ROM containing thirty thousand lines of code, much of it devoted to the vagaries of speech and sound. In its embrace of statistics, speech recognition had become a deep, difficult field. It appeared that progress would now come only incrementally, and with increasing pain.
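
To give a flavor of that machinery, here is a toy version of the forward algorithm, the basic hidden-Markov-model computation that such systems ran constantly. Every state, probability, and observation below is invented for illustration; a real recognizer had thousands of states and far richer acoustic models:

    # A toy HMM "forward algorithm," the core recurrence of 1990s-era
    # speech recognizers. All numbers here are invented for illustration.
    states = ["vowel", "consonant"]                    # hidden sound classes
    start = {"vowel": 0.5, "consonant": 0.5}           # initial probabilities
    trans = {                                          # state transitions
        "vowel":     {"vowel": 0.6, "consonant": 0.4},
        "consonant": {"vowel": 0.7, "consonant": 0.3},
    }
    emit = {                                           # acoustic observations
        "vowel":     {"loud": 0.8, "quiet": 0.2},
        "consonant": {"loud": 0.3, "quiet": 0.7},
    }

    def forward(observations):
        """Probability of an acoustic sequence under the model."""
        # alpha[s]: probability of the observations so far, ending in state s
        alpha = {s: start[s] * emit[s][observations[0]] for s in states}
        for obs in observations[1:]:
            alpha = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][obs]
                     for s in states}
        return sum(alpha.values())

    print(round(forward(["loud", "quiet", "loud"]), 3))  # 0.135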

But, in fact, the opposite happened. As the computer scientist Richard Sutton put it in his 2019 essay, seventy years of A.I. research had revealed that “general methods that leverage computation are ultimately the most effective, and by a large margin.” Sutton called this “the bitter lesson”: it was bitter because there was something upsetting about the fact that packing more cleverness and technical arcana into your A.I. programs was not only inessential to progress but actually an impediment. It was better to have a simpler program that knew how to learn, running on a fast computer, and to task it with solving a complicated problem for itself. The lesson kept having to be relearned, Sutton wrote, because jamming everything you knew into an A.I. often yielded improvements at first. With each new bit of knowledge, your program would get marginally better—but, in the long run, the added complexity would make it harder to find the way to faster progress. Methods that took a step back and stripped expert knowledge in favor of raw computation always won out. Sutton concluded that the goal of A.I. research should be to build “agents that can discover like we can” rather than programs “which contain what we have discovered.” In recent years, A.I. researchers seem to have learned the bitter lesson once and for all. The result has been a parade of astonishing new programs.

by James Somers, New Yorker |  Read more:
Image: Pierre Buttin
[ed. For creating images similar to DALL-E, see also: the free, open source program Stable Diffusion Online (no sign-up required). And, in other AI news: California congressman proposes a new government agency to regulate various AI issues; another wants to create general operating standards, including digital watermarks; BuzzFeed says it will use AI to create content (stock jumps 150 percent); and, Mostly Skeptical Thoughts On The Chatbot Propaganda Apocalypse (ACX). Why does everything AI suddenly seem like it's all moving too damn fast?]
"Imagine a world where autonomous weapons roam the streets, decisions about your life are made by AI systems that perpetuate societal biases and hackers use AI to launch devastating cyberattacks. This dystopian future may sound like science fiction, but the truth is that without proper regulations for the development and deployment of Artificial Intelligence (AI), it could become a reality. The rapid advancements in AI technology have made it clear that the time to act is now to ensure that AI is used in ways that are safe, ethical and beneficial for society. Failure to do so could lead to a future where the risks of AI far outweigh its benefits.

I didn’t write the above paragraph. It was generated in a few seconds by an A.I. program called ChatGPT, which is available on the internet. I simply logged into the program and entered the following prompt: “Write an attention grabbing first paragraph of an Op-Ed on why artificial intelligence should be regulated.”"


I’m a Congressman Who Codes. A.I. Freaks Me Out. (NY Times)