The Apocryphal Twain: “Data is like garbage. You’d better know what you’re going to do with it before you collect it.”

There is perhaps no greater testament to Twain’s lasting reputation than the habitual misattribution of miscellaneous wit and wisdom to his name. The circulation of such apocryphal aphorisms was common enough in the 20th century, and it has only increased with the popularization of social media. The most common question addressed to the Center for Mark Twain Studies is some variety of “Did he really say that?” Whenever possible, we track down the original source and attempt to trace how those words came to be put in Twain’s mouth.


This misattribution is surging in usage, largely because several companies, including some well-established consultancies, have been using it in their marketing materials.

Strong evidence that Mark Twain never said this lies in the language of the quote itself. The word data was very uncommon throughout Twain’s lifetime.

Twain did use the term, though quite sparingly, and in the dozen instances I found, he always associates it with one of two specific tasks: either the compiling of basic biographical details (names, ages, birthplaces, etc.) or the evaluation of author contracts (sales figures, prices, royalty percentages, etc.).

This is hardly surprising, given that the connotation of data in the contested aphorism – as part of processes of data collection, data acquisition, or data hygiene – emerged with specific methods of social science and computer science that did not develop until well after Twain’s death.

The word data is more common in Twain’s vocabulary, however, than the word garbage. I could find only three instances of that term across Twain’s whole corpus of public and private writings. He clearly favored synonyms that would have been more common in the American vernacular of his youth, such as trash, waste, and refuse.

Another strong indication that this aphorism is apocryphal is that the association with Twain does not appear until long after the author’s death. The earliest attribution I found was in the computer science magazine Interfaces in 1986. Even in this case, and in several others like it from the 1990s, the authors acknowledge the uncertainty of their attribution, saying “Twain is reported to have said.”

But the circulation of the aphorism is much older than its attribution to Twain. The earliest printed version I found was in a Quality Control Conference handbook from 1966, while an automotive industry consultant refers to it as “a 1970s expression,” as though it were a business cliché of the era.

Robert Rodin, CEO of Marshall Industries, a supplier of semiconductors and other small parts for computers and electronics, makes the aphorism a centerpiece of his 1999 memoir, Free, Perfect, & Now, and attributes it to Michael Tveite, a very successful management consultant of the 1960s and 1970s. Both Rodin and Tveite are disciples of the influential management theorist W. Edwards Deming.

Deming wrote, in an essay titled “On A Classification of the Problems of Statistical Inference,” published in 1942 while he was working for the U.S. Census Bureau:

It seems most likely that this is where the “data is like garbage” aphorism originated. It was then given a humorous twist, perhaps by Tveite, to make it more effective in speeches, and was later falsely attributed to Twain, perhaps as awareness of the influence of Deming and his followers faded.