A "sounds like" word generator

The slike tool takes a word, or any sequence of letters, as input and generates other words or letter sequences that sound like it. "Sounds like" can also mean sounding like the input as spoken by somebody with a non-English accent, e.g., German.

One blog article gives an overview of the generation process from the language point of view. Another blog article discusses the functioning of the major internal components.

The source code is known to build under Linux (at least on Suse 11.3).

All sources are available for download under a GPL license.

Current version (28 Mar 12)
slike.0.1.3.tgz

Changelog

README

Support tools, libraries and empirical data

The Perl Compatible Regular Expressions (PCRE) library handles all the rule matching.

The technical report "Letter-to-Sound rules for Automatic Translation of English Text to Phonetics" by Elovitz, Johnson, McHugh and Shore contains the letter/phoneme mapping rules.

The paper "Case-sensitive letter and bigram frequency counts from large-scale English corpora" by Jones and Mewhort contains the letter bigram frequencies that are used.
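
The rule matching described above can be illustrated with a small sketch. This is not slike's implementation: Python's re module stands in for PCRE, and the RULES table, the to_phonemes function and the phoneme names are all hypothetical, made up for illustration rather than taken from the technical report. It only assumes the general shape of a letter-to-sound rule: a letter fragment maps to phonemes when the text to its left and right matches the rule's context patterns, and the first matching rule wins.

```python
import re

# Hypothetical rules: (left context, letter fragment, right context, phonemes).
# Contexts are regular expressions; an empty context always matches.
# These are illustrative only, NOT the rules from the technical report.
RULES = [
    ("", "ph", "", ["f"]),               # "ph" sounds like "f"
    ("", "c", "[eiy]", ["s"]),           # soft "c" before e, i or y
    ("", "c", "", ["k"]),                # hard "c" otherwise
    ("[aeiou]", "s", "[aeiou]", ["z"]),  # "s" between vowels
]


def to_phonemes(word, rules=RULES):
    """Scan the word left to right, applying the first rule that matches."""
    word = word.lower()
    pos = 0
    out = []
    while pos < len(word):
        for left, frag, right, phonemes in rules:
            if not word.startswith(frag, pos):
                continue
            # Left context must match at the end of the text before the fragment.
            if not re.search("(?:" + left + ")$", word[:pos]):
                continue
            # Right context must match at the start of the text after the fragment.
            if not re.match("(?:" + right + ")", word[pos + len(frag):]):
                continue
            out.extend(phonemes)
            pos += len(frag)
            break
        else:
            # No rule applies: pass the letter through unchanged.
            out.append(word[pos])
            pos += 1
    return out


print(to_phonemes("city"))
print(to_phonemes("cat"))
print(to_phonemes("rose"))
```

Rule order matters in this sketch: the soft "c" rule is listed before the hard "c" rule, so the more specific context is tried first.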

Other related tools

Phonetisaurus, a WFST-driven (Weighted Finite-State Transducer) grapheme-to-phoneme framework.

Sequitur, a trainable grapheme-to-phoneme converter.

Chris Pound's name generation tools and multiple language data files.

The Mary text-to-speech system.

Brett Kessler has written lots of papers on the connection between sound/syllables/phonemes and spelling.

Suggestions and comments welcome

slike "at" coding-guidelines dot com

