Tuesday, February 16, 2016

Proposals Toward the End of Writing

I. The Solution to Cliché

…whenever thought for a time runs along an accepted groove—there is an opportunity for the machine.
—Vannevar Bush


Some writers morbidly fixate on computer interference in their working lives: it’s a distraction, an unwanted convenience; it debases the written word, revolutionizes form, or is “making us stupid.” Often it’s framed as anathema to serious writing: Philip Roth worries that books “can’t compete with the screen”; Zadie Smith credits the Internet-blocking app Freedom in the Acknowledgements of NW; Doris Grumbach grumbles that word processors allow people to write too much; Jonathan Franzen squirts superglue into his laptop’s Ethernet port.

For all this handwringing, there’s less discussion about technology’s direct interventions in the writing itself, especially in an editorial capacity. Consider spellcheck, whose influence is obscure but probably quietly tremendous, not just on writing but on writers themselves—a 2012 British survey found that two-thirds of people used spellcheck “all or most of the time,” and one-third misspelled “definitely.” (The organization blamed this on the “auto-correct generation,” though the causal link appears baseless.) And can we ever measure contemporary literature’s debt to cut-and-paste, find-replace, versioned backup, web research, online correspondence, Track Changes?

Of course, these general-purpose functions influence far more than just literature, but we’re also beginning to see text analysis tools with a specific literary focus. These tools promise to show us our true reflections in the form of hard statistical data, insights beyond the reach of mere human editors. They range from word counters and sentence-length analyzers to hundreds of more boutique gizmos: Gender Guesser tries to ascertain an author’s gender by comparing a writing sample against the word frequency trends in prose written by women and by men, while MetaMind can be programmed to assess a writing sample’s “viewpoint,” from its political leanings to its “positivity.” Services like Turnitin detect plagiarism, while others like PhraseExpress insert entire sentences right under your fingertips.

The recent Hemingway app goes even further, offering dogmatic editorial guidance to make your prose “bold and clear”:
Hemingway highlights long, complex sentences and common errors; if you see a yellow sentence, shorten or split it. If you see a red highlight, your sentence is so dense and complicated that your readers will get lost trying to follow its meandering, splitting logic — try editing this sentence to remove the red.
It also recommends the indiscriminate excision of adverbs and passive constructions. Tallying up all the infelicities, it assigns the passage a numerical grade, representing “the lowest education level needed to understand your text,” which oddly equates boldness and clarity with legibility to young children (presumably, the best score would be “Illiterate”). Ernest Hemingway’s own prose often fails the test, though, as Ian Crouch observes, Hemingway is usually making a stylistic point wherever he trespasses against his own putative rules. Meanwhile, Nabokov’s “Spring in Fialta” gets the worst possible score of 25 (a second-year post-doc?).

With inventions like these, many of which are intended to improve prose’s suitability to a particular purpose, it seems inevitable that we’ll soon have programs aimed at broader literary purposes. Imagine, for instance, a computer program that detects clichés at the sentence level. Existing attempts are based on small databases of fixed idioms. Suppose our cliché detector is a simple extension of the language-checking features already baked into most word processing software, underlining each trite phrase with a baby-blue squiggle. It analyzes the text for any sequences of words that statistically tend to accompany each other—and the statistical database of clichés, in turn, is based on a Zipfian distribution of word groupings obtained from the quantitative analysis of a large prose corpus. Every phrase ranked above a certain score is flagged as a cliché. No more “in any case” or “at this rate,” no more “battling cancer” or “wry grin” or “boisterous laughter”—though the program might forgive idioms that lack basic synonyms, like “walking the dog.”
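The detector imagined above can be roughed out in a few lines. What follows is only a sketch under loud assumptions: the function names (`build_phrase_counts`, `flag_cliches`), the `min_count` cutoff, and the toy corpus are all hypothetical, and a raw frequency threshold stands in for the Zipfian ranking the paragraph describes. It counts every two- and three-word sequence in a reference corpus, then flags any sequence in a draft that the corpus has already used too often.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_phrase_counts(corpus_texts, sizes=(2, 3)):
    """Count how often each word sequence appears across the corpus."""
    counts = Counter()
    for text in corpus_texts:
        tokens = tokenize(text)
        for n in sizes:
            counts.update(ngrams(tokens, n))
    return counts


def flag_cliches(text, phrase_counts, min_count=3, sizes=(2, 3)):
    """List phrases in the text seen at least min_count times in the corpus.

    min_count is an arbitrary, hypothetical cutoff, not a tuned value.
    """
    tokens = tokenize(text)
    flagged = []
    for n in sizes:
        for gram in ngrams(tokens, n):
            phrase = " ".join(gram)
            if phrase_counts[gram] >= min_count and phrase not in flagged:
                flagged.append(phrase)
    return flagged


# Toy corpus, invented for illustration only.
corpus = [
    "He gave a wry grin.",
    "She answered with a wry grin.",
    "A wry grin crossed his face.",
    "The dog slept.",
]
counts = build_phrase_counts(corpus)
print(flag_cliches("He flashed a wry grin at the dog.", counts))
```

A raw count like this also flags unremarkable fragments such as “a wry,” which is one reason a serious version would presumably score phrases by how strongly their words attract each other rather than by frequency alone.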

The larger the corpus, the better; Google could team up with the NSA to digitize and index every word ever written or recorded, and make this omni-corpus available for indexing, mining, and categorizing. Or by being trained on a personal corpus of writing samples, the detector could be adapted to learn an author’s pet phrases. Zadie Smith pointed out that in all of her novels someone “rummages in their purse”; our program would flag each instance, as well as any variations: “they had rummaged through their purses,” “purses were rummaged,” etc. And it could be tailored to specific genres: “heaving bosoms” in romance, “throughout history” in student papers, “please advise” in business emails. (...)

II. Art in the Age of Mechanical Production

But I have no native language,
I can’t judge, I suspect I write garbage.
—Eugene Ostashevsky, Iterature


One can never assume that tools will only be used for their intended purposes. In the same way that people have appropriated plagiarism detectors to gather research citations, it’s easy to imagine people using the cliché detector as a composition aid. Picture an uninspired-yet-tenacious user—the Lazy Student—slamming out a cliché-infested rough draft, then methodically stepping through it, iterating through elegant variations like a slot machine until he finds one that sounds right, repeating as necessary. All that’s required is a decent ear to produce sentences that will be, statistically speaking, highly original. If this method of editing proved more efficient and at least as good as traditional writing, it would put taste at a premium, and render talent as unnecessary and quaint as good penmanship. Better writing produced by worse writers.

Being particular to sentence-level flaws, the cliché detector we’ve described is just a rudimentary line-editor, a nose-hair trimmer. It doesn’t address the larger problems of clichéd sentiment, faddish style, stylistic vampirism, stereotyped characters, shopworn narrative devices. It has no sense of context, and wouldn’t be able to distinguish clichés from quotations, allusions, parodies, or collage pieces.

But you could suppose that each of these problems is just an engineering hurdle waiting to be jumped. If spellcheckers correct words and cliché detectors fix phrases, what would a coarser-grained cliché detector that addressed whole sentences and scenes look like?

by Tony Tulathimutte, The Believer
Image: Bucky Miller