ERIC Number: ED546039
Record Type: Non-Journal
Publication Date: 2012
Pages: 215
Abstractor: As Provided
Reference Count: N/A
ISBN: 978-1-2675-8038-2
Exploration of Acoustic Features for Automatic Vowel Discrimination in Spontaneous Speech
Tyson, Na'im R.
ProQuest LLC, Ph.D. Dissertation, The Ohio State University
In an attempt to understand what acoustic/auditory feature sets motivated transcribers towards certain labeling decisions, I built machine learning models that were capable of discriminating between canonical and non-canonical vowels excised from the Buckeye Corpus. Specifically, I wanted to model when the dictionary form and the transcribed form of a vowel would match one another. I defined the transcribed form of a vowel as an intended production from a speaker "X" labeled as "Y" by a transcriber. With specific acoustic/auditory feature sets extracted from a vowel, a pattern recognizer was used to produce a result indicating whether the transcribed form is an example of a citation form of a vowel. The second purpose was to compare the discrimination performance of models using static vowel measures with that of models using measurements taken along a trajectory, which consisted of measurements from 20%, 50% and 80% of a vowel's duration. The hypothesis was that trajectory-based measures would have notable performance gains over static vowel measures. Static and trajectory-based measurements were then divided into formant-based and cepstral measurements. A further hypothesis was that cepstral representations of vowels should outperform resonant frequencies of the vocal tract (formants) simply because there is more acoustic/auditory information encoded within cepstral representations compared to formants, thereby facilitating vowel discrimination in spontaneous speech. To model this type of vowel discrimination process, I used a Support Vector Machine (SVM) and Discriminant Analysis, since such pattern recognition models showed encouraging results in classifying vowel data, as shown by Clarkson and Moreno (1999) in the case of SVMs and Hillenbrand et al. (1995) for Discriminant Analysis.
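The contrast between static and trajectory-based measures described above can be illustrated with a minimal Python sketch. It is not the dissertation's code: the function names, the per-frame track layout (one tuple of measurements per analysis frame), and the nearest-frame rounding rule are all assumptions made for illustration.

```python
from typing import List, Sequence

def frame_index(n_frames: int, fraction: float) -> int:
    """Return the index of the frame nearest to `fraction` of the vowel's duration.

    Assumption: frames are evenly spaced from vowel onset (0.0) to offset (1.0).
    """
    if n_frames < 1:
        raise ValueError("need at least one frame")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    return min(n_frames - 1, int(round(fraction * (n_frames - 1))))

def static_features(track: Sequence[Sequence[float]]) -> List[float]:
    """Static measure: the measurements at the vowel midpoint only."""
    return list(track[frame_index(len(track), 0.5)])

def trajectory_features(
    track: Sequence[Sequence[float]],
    fractions: Sequence[float] = (0.2, 0.5, 0.8),
) -> List[float]:
    """Trajectory measure: measurements at 20%, 50% and 80% of the duration,
    concatenated into one feature vector."""
    features: List[float] = []
    for fraction in fractions:
        features.extend(track[frame_index(len(track), fraction)])
    return features

# Hypothetical (F1, F2) track in Hz over 11 frames, invented for this example.
example_track = [(300 + 10 * i, 2000 - 20 * i) for i in range(11)]
static_vec = static_features(example_track)
trajectory_vec = trajectory_features(example_track)
```

With two measurements per frame, the static vector has two dimensions and the trajectory vector has six. Either vector could then be passed to a classifier such as an SVM or a discriminant analysis model.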
Input parameters came in the form of either formant measures (and transformations of formants into log and Bark scales) or cepstral measures such as Mel Frequency Cepstral Coefficients and Perceptual Linear Predictive (PLP) Coefficients. Both were computed from the midpoint and at distinct time points of a vowel's duration (20%, 50% and 80%) for CVC syllables, where I chose only stop consonants /p, b, t, d, k, g/ and one of the vowels /[low back unrounded vowel], e, ?, [near-high near-back vowel]/ because of their high numbers of mismatches between the canonical and transcribed forms. Results substantiated my hypothesis that trajectory-based, cepstral measures had the highest accuracies for both male and female speakers. Auditory features like Bark transformations and PLP coefficients were also the most effective for both classifiers, with percentages of agreement (with the human transcribers) upwards of 80%. However, the differences in performance between formant-based and cepstral-based features were not as substantial as I had originally postulated. This finding suggests that formant transformations to the auditory scale were sufficient for the task of vowel discrimination within the Buckeye Corpus. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page:]
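The log and Bark transformations of formants mentioned in the abstract can be sketched in Python as follows. The abstract does not say which Bark formula was used; this sketch assumes the Traunmüller (1990) approximation, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

def hz_to_bark(f_hz: float) -> float:
    """Convert a frequency in Hz to the Bark scale.

    Assumption: Traunmueller's (1990) approximation, z = 26.81 f / (1960 + f) - 0.53.
    The dissertation may have used a different formula.
    """
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def hz_to_log(f_hz: float) -> float:
    """Convert a frequency in Hz to natural-log Hz."""
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    return math.log(f_hz)

def transform_formants(formants_hz: Sequence[float], scale: str = "bark") -> List[float]:
    """Apply one of the scale transformations to a list of formant values."""
    if scale == "bark":
        return [hz_to_bark(f) for f in formants_hz]
    if scale == "log":
        return [hz_to_log(f) for f in formants_hz]
    if scale == "hz":
        return [float(f) for f in formants_hz]
    raise ValueError(f"unknown scale: {scale}")

# Hypothetical F1/F2 values in Hz, invented for this example.
bark_formants = transform_formants([500.0, 1500.0], scale="bark")
```

The Bark scale compresses higher frequencies relative to lower ones, in line with auditory frequency resolution, which is why it counts as an auditory feature in the comparison above.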
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Identifiers - Location: Ohio