Peer reviewed
ERIC Number: EJ845102
Record Type: Journal
Publication Date: 2009
Pages: 27
Abstractor: As Provided
Reference Count: 32
ISBN: N/A
ISSN: ISSN-0023-8309
Automatic Syllabification in English: A Comparison of Different Algorithms
Marchand, Yannick; Adsett, Connie R.; Damper, Robert I.
Language and Speech, v52 n1 p1-27 2009
Automatic syllabification of words is challenging, not least because the syllable is not easy to define precisely. Consequently, no accepted standard algorithm for automatic syllabification exists. There are two broad approaches: rule-based and data-driven. The rule-based method effectively embodies some theoretical position regarding the syllable, whereas the data-driven paradigm tries to infer "new" syllabifications from examples assumed to be correctly syllabified already. This article compares the performance of several variants of the two basic approaches. Given the problems of definition, it is difficult to determine a correct syllabification in all cases, and so to establish the quality of the "gold standard" corpus used either to quantitatively evaluate the output of an automatic algorithm or as the example set on which data-driven methods crucially depend. Thus, we look for consensus among the entries in multiple lexical databases of pre-syllabified words. In this work, we used two independent lexicons and extracted from them the same 18,016 words with their corresponding (possibly different) syllabifications. We also created a third lexicon corresponding to the 13,594 words that share the same syllabifications in these two sources. As well as two rule-based approaches (Hammond's, and Fisher's implementation of Kahn's), three data-driven techniques are evaluated: a look-up procedure, an exemplar-based generalization technique, and syllabification by analogy (SbA). The results on the three databases show consistent and robust patterns. First, the data-driven techniques outperform the rule-based systems in word and juncture accuracies by a very significant margin, but they require training data and are slower. Second, syllabification in the pronunciation domain is easier than in the spelling domain. Finally, the best results are consistently obtained with SbA. (Contains 9 tables, 4 figures and 7 footnotes.)
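The consensus lexicon and the two evaluation measures named in the abstract (word accuracy and juncture accuracy) can be illustrated with a short sketch. This is a minimal illustration under assumptions not stated in the record: syllabifications are written as strings with "-" marking syllable boundaries, and juncture accuracy is taken to be the proportion of positions between adjacent symbols whose boundary/no-boundary status matches the gold standard. The function names are hypothetical; this is not the authors' code.

```python
# Assumed format (for illustration only): "-" marks a syllable boundary,
# e.g. "syl-la-ble". The lexicons in the article may use another encoding.


def consensus_lexicon(lex_a: dict[str, str], lex_b: dict[str, str]) -> dict[str, str]:
    """Keep only words present in both lexicons with identical syllabifications."""
    return {w: s for w, s in lex_a.items() if w in lex_b and lex_b[w] == s}


def junctures(syllabified: str) -> list[bool]:
    """Return one flag per gap between adjacent symbols: True if a boundary falls there."""
    flags: list[bool] = []
    pending = False      # a "-" was seen since the last symbol
    seen_symbol = False  # no juncture exists before the first symbol
    for ch in syllabified:
        if ch == "-":
            pending = True
            continue
        if seen_symbol:
            flags.append(pending)
        pending = False
        seen_symbol = True
    return flags


def accuracies(predicted: dict[str, str], gold: dict[str, str]) -> tuple[float, float]:
    """Return (word accuracy, juncture accuracy) of predicted against gold."""
    words_correct = 0
    junc_correct = 0
    junc_total = 0
    for word, gold_syl in gold.items():
        g = junctures(gold_syl)
        p = junctures(predicted[word])
        if len(g) != len(p):
            raise ValueError(f"symbol sequences differ for {word!r}")
        words_correct += int(g == p)  # a word counts only if every juncture is right
        junc_correct += sum(a == b for a, b in zip(g, p))
        junc_total += len(g)
    word_acc = words_correct / len(gold) if gold else 0.0
    junc_acc = junc_correct / junc_total if junc_total else 0.0
    return word_acc, junc_acc


if __name__ == "__main__":
    lex_a = {"syllable": "syl-la-ble", "lemon": "lem-on", "cat": "cat"}
    lex_b = {"syllable": "syl-la-ble", "lemon": "le-mon", "cat": "cat"}
    gold = consensus_lexicon(lex_a, lex_b)  # "lemon" is dropped: the sources disagree
    predicted = {"syllable": "sy-lla-ble", "cat": "cat"}
    print(gold)
    print(accuracies(predicted, gold))
```

With these toy entries, "sy-lla-ble" gets 5 of its 7 junctures right but is wrong as a whole word, which shows why juncture accuracy is typically higher than word accuracy for the same system.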
SAGE Publications. 2455 Teller Road, Thousand Oaks, CA 91320. Tel: 800-818-7243; Tel: 805-499-9774; Fax: 800-583-2665; e-mail: journals@sagepub.com; Web site: http://sagepub.com
Publication Type: Journal Articles; Reports - Evaluative
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A