Create a Spell Check Dictionary

The create-dict.pl script creates a "local" dictionary from Voyager for use with Jim Robinson's Jazzy-based Tomcat WebVoyage spell checker.

Spell checkers (e.g. Aspell or Jazzy) come with one or more dictionaries. If a spell checker is integrated into something like an editing app, these "off-the-shelf" dictionaries are the best starting point.

However, if a spell checker is integrated with something like a library catalog, these off-the-shelf dictionaries present two potential problems. When a search returns no hits, the spell checker may 1) suggest words that are not contained in any record in the catalog, or 2) fail to suggest words that are contained in the catalog. For instance, the Jazzy off-the-shelf dictionary does not contain the word Shakespeare (much less Angewandte).

Creating a local dictionary is not as easy as it might initially seem. It is not actually desirable to have every single word that's in your catalog appear in the dictionary. This may be counter-intuitive, but you will likely find it to be true. For instance, most library catalogs have multiple (entirely legitimate) spelling variations for "Shakespeare": Shakespeare, Shakespear, Shakspeare, Shakspere. If a user misspells the name as Shakespeer, we would prefer to return Shakespeare as the suggestion, rather than one of the archaic variant spellings.

The create-dict.pl script extracts name and subject headings that appear in at least some minimum number of records, and then generates the dictionary file from the words occurring in those headings. Rare headings, and the rare spellings they carry, are thereby left out. This approach is a compromise and the author is interested in alternative approaches.
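
The threshold idea can be illustrated with a short Python sketch. This is not the code in create-dict.pl: the function name build_dictionary, the min_records parameter, the word-splitting rule, and the sample headings and counts are all assumptions made for illustration, and the real script reads its headings from Voyager rather than from an in-memory table.

```python
import re

def build_dictionary(heading_counts, min_records=2):
    """Return words from headings used in at least min_records records.

    heading_counts is a hypothetical mapping of heading text to the number
    of records that carry the heading. Words are de-duplicated without
    regard to case, keeping the first spelling encountered.
    """
    words = {}
    for heading, count in heading_counts.items():
        if count < min_records:
            continue  # skip rare headings and their rare spellings
        # Runs of letters only; digits such as life dates are ignored.
        for word in re.findall(r"[^\W\d_]+", heading):
            words.setdefault(word.lower(), word)
    return [words[key] for key in sorted(words)]

# Invented sample data, for illustration only.
sample = {
    "Shakespeare, William, 1564-1616": 412,
    "Shakspere, William, 1564-1616": 1,
    "Angewandte Chemie": 7,
}
print(build_dictionary(sample, min_records=5))
# → ['Angewandte', 'Chemie', 'Shakespeare', 'William']
```

With a threshold of 5, the single-record variant Shakspere never reaches the dictionary, so a search for Shakespeer can only be corrected to the common spelling.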

After generating the output file, you can tighten the list up by running the shell command below:

cat dictionary.2 | sort -uf > english.0

...which sorts the terms alphabetically without regard to case (-f), drops duplicate terms (-u), and writes the result to english.0, the same file name as Jim Robinson's Spell Checker dictionary.
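
For sites where a Unix sort is not at hand, the same tidy-up can be approximated in Python. This is a rough sketch and not an exact equivalent of the shell command: the function name sort_unique_fold is an assumption, it also trims whitespace and drops blank lines, and its ordering may differ from sort for non-alphabetic characters or under other locales.

```python
def sort_unique_fold(lines):
    """Sort terms case-insensitively and drop case-insensitive duplicates.

    Keeps the first spelling seen for each term. Blank lines are discarded,
    which the shell command does not do.
    """
    seen = {}
    for line in lines:
        term = line.strip()
        if term:
            seen.setdefault(term.casefold(), term)
    return [seen[key] for key in sorted(seen)]

# Hypothetical usage with the file names from the shell command above:
#
#   with open("dictionary.2", encoding="utf-8") as src:
#       terms = sort_unique_fold(src)
#   with open("english.0", "w", encoding="utf-8") as dst:
#       dst.write("\n".join(terms) + "\n")

print(sort_unique_fold(["shakespeare", "Angewandte", "Shakespeare", "", "chemie"]))
# → ['Angewandte', 'chemie', 'shakespeare']
```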