ERIC Number: ED519480
Record Type: Non-Journal
Publication Date: 2010
Pages: 134
Abstractor: As Provided
ISBN: 978-1-1241-9819-4
Improving Tone Recognition with Nucleus Modeling and Sequential Learning
Wang, Siwei
ProQuest LLC, Ph.D. Dissertation, The University of Chicago
Mandarin Chinese and many other tonal languages use tones, defined as specific pitch patterns, to distinguish syllables that would otherwise be ambiguous. It has been shown that tones carry at least as much information as vowels in Mandarin Chinese [Surendran et al., 2005]. Surprisingly, though, many speech recognition systems for Mandarin Chinese have not performed explicit tone recognition. Systems that initially tried to incorporate tone information by recognizing tonal syllables often found that errors at the tone-recognition stage outweighed any benefit from the correctly recognized tones. Instead, these systems recognized toneless syllables and used large-scale language models for disambiguation. We attribute these problems in early tone-recognition approaches to several limitations of their tone modeling strategies, which our approaches aim to correct. The first limitation we identify is that most early tone-recognition approaches did not model the effect of neighboring syllables in continuous speech, known as coarticulation, which often severely distorts the realizations of the underlying tones. We propose two nucleus modeling techniques to locate the most reliably produced portion of the syllable and remove the regions affected by coarticulation. The first method manipulates a geometric representation of both pitch and amplitude to locate the tone nucleus as the region with the greatest articulatory effort. The second method models tone nuclei based on the likely output of a landmark-based vowel detector, linking the sonority profile of the speech signal to the stability of tone production. Tone recognition using both nucleus modeling techniques outperforms tone recognition using either baseline nucleus approaches or the full syllable. Furthermore, the second method--the landmark-based nucleus modeling--shows a 15% improvement over two published tone-recognition frameworks.
The second limitation we identify is that most early tone-recognition approaches provided limited or no modeling of tone variation derived from phrase, sentence, and topic-level intonation. Most previous modeling techniques aimed to modify the pitch levels and ranges of tones to adjust for the impact of these intonation events. However, we demonstrate that none of these approaches adequately compensates for the boundary effects of these intonational units. Our approach employs sequential learning frameworks, both to investigate how they model tone production under intonation events and to model the effect of the broader intonation context on tone realization and recognition. Our analyses also show that all three factors--tone identity, level of intonation event, and structure of sequential graphical modeling--can influence recognition performance. This is because different tones show pitch patterns that vary across phrase, sentence, and story-level boundaries, and because the structure of a sequential graphical model determines how it encodes these variations. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600.]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600.
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A