On the word list

Please read this before you download the word list.
The file words.dat contains a list of 5757 five-letter English words. It comes from the Stanford GraphBase, a collection of programs and datasets for generating and manipulating graphs and networks, created by Donald Knuth at Stanford University. In his book on the Stanford GraphBase [2], Knuth provides a brief history of this set of words:
The nucleus of this collection was a dictionary for playing a game called Jotto, compiled by Michael Beeler prior to 1971 [1] and extended shortly afterward to a list of 6627 words corresponding to the contents of the Webster's 7th New Collegiate Dictionary. Beeler estimated that 16,000 words would have been present if he had used an unabridged dictionary. But the 6627 five-letter words in Webster's Collegiate already included plenty of esoterica, so the author pared Beeler's list down by removing whatever he could not remember seeing previously.

Additional words were added to the culled file during the next 20 years whenever the author came across a bona fide five-letter word that was not present.

Knuth also mentions that he did not do any "censorship" on this list of words. Neither have I. The word list may therefore contain words that some of you find inappropriate. Note, however, that for this project you don't have to read the words; your program does. I hope that you will all view this word list in the broader context of the project, as simply an interesting data set that provides a way to generate an interesting graph. Here is the word list: words.dat.
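
Since your program is the one reading the file, a small loader along the following lines may be a useful starting point. This is only a sketch: the function names parse_words and load_words are made up for illustration, and the file layout is an assumption (one entry per line, with the word in the first five characters, possibly followed by extra annotations, and with comment lines beginning with an asterisk). Check your copy of words.dat and adjust accordingly.

```python
from typing import Iterable, List


def parse_words(lines: Iterable[str]) -> List[str]:
    """Extract five-letter words from the lines of a word-list file.

    Assumed layout (not confirmed for every copy of words.dat):
    blank lines and lines starting with '*' are skipped, and the
    word occupies the first five characters of every other line.
    """
    words = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("*"):
            continue  # skip blanks and (assumed) comment lines
        word = line[:5].lower()
        if len(word) != 5 or not word.isalpha():
            raise ValueError(f"unexpected entry: {line!r}")
        words.append(word)
    return words


def load_words(path: str = "words.dat") -> List[str]:
    """Read the word list from disk and return it as a list of strings."""
    with open(path, encoding="ascii") as f:
        return parse_words(f)
```

After calling load_words(), it is worth checking that the length of the result is 5757, the count stated above. A different count suggests that the assumed layout does not match your file.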

  1. Michael Beeler, "Information theory and the game of Jotto", A.I. Memo 218 (Cambridge, MA: M.I.T. Artificial Intelligence Laboratory, August 1971), 3 pp.

  2. Donald E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, ACM Press and Addison-Wesley Publishing Company, 1993.