ERIC Number: ED561666
Record Type: Non-Journal
Publication Date: 2013
Pages: 167
Abstractor: As Provided
Reference Count: N/A
ISBN: 978-1-3034-6795-0
Representation, Classification and Information Fusion for Robust and Efficient Multimodal Human States Recognition
Li, Ming
ProQuest LLC, Ph.D. Dissertation, University of Southern California
The goal of this work is to enhance the robustness and efficiency of the multimodal human states recognition task. Human states recognition can be considered a joint term for identifying or verifying various kinds of human-related states, such as biometric identity, language spoken, age, gender, emotion, intoxication level, physical activity, vocal tract patterns, ECG QT intervals and so on. I performed research on the aforementioned state recognition problems, and my focus is to increase performance while reducing the computational cost. I start by extending the well-known total variability i-vector modeling (a factor analysis on the concatenated GMM mean supervectors) to simplified supervised i-vector modeling to enhance robustness and efficiency. First, by concatenating the label vector and the linear classifier matrix at the end of the mean supervector and the i-vector factor loading matrix, respectively, the traditional i-vectors are extended to label-regularized supervised i-vectors. These supervised i-vectors are optimized not only to reconstruct the mean supervectors well but also to minimize the mean square error between the original and the reconstructed label vectors, which makes them more discriminative with respect to the label information used for regularization. Second, I perform the factor analysis (FA) on the pre-normalized GMM first-order statistics supervector to ensure that each Gaussian component's statistics sub-vector is treated equally in the FA, which reduces the computational cost by a factor of 25. Inspired by the recent success of sparse representation in face recognition, I explored the possibility of adopting sparse representation for both representation and classification in this multimodal human states recognition problem.
First, for classification purposes, a sparse representation computed by l1-minimization (to approximate the l0 minimization) with quadratic constraints was proposed to replace the SVM on the GMM mean supervectors, and by fusing the sparse representation based classification (SRC) method with the SVM, the overall system performance was improved. Second, by adding a redundant identity matrix at the end of the original over-complete dictionary, the sparse representation is made more robust to variability and noise. Third, both the l1 norm ratio and the background normalized (BNorm) l2 residual ratio are used and shown to outperform the conventional l2 residual ratio in the verification task. I also present an automatic speaker affective state recognition approach that models the factor vectors in the latent factor analysis framework, improving upon the Gaussian Mixture Model (GMM) baseline performance. I consider the affective speech signal as the original normal average speech signal corrupted by affective channel effects. Rather than reducing the channel variability to enhance robustness, as in the speaker verification task, I directly model the speaker state on the channel factors under the factor analysis framework. Experimental results show that the proposed speaker state factor vector modeling system achieved unweighted and weighted accuracy improvements over the GMM baseline on the intoxicated speech detection task and the emotion recognition task, respectively. To summarize the methods for representation, I propose a general optimization framework. The aforementioned methods, such as traditional factor analysis, i-vector, supervised i-vector, simplified i-vector and s-vectors, are all special cases of this general optimization problem. In the future, I plan to investigate other kinds of distance measures, cost functions and constraints in this unified general optimization problem. (Abstract shortened by UMI.)
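The sparse representation based classification step described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract mentions l1-minimization with quadratic constraints, whereas this sketch swaps in the closely related penalized (lasso) form solved by iterative soft thresholding, and it uses the conventional class-wise l2 residual rather than the l1 norm ratio or BNorm scores. The function names `ista_l1` and `src_classify`, the penalty `lam`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def ista_l1(D, y, lam=0.01, n_iter=2000):
    """Approximately minimize 0.5*||y - D a||^2 + lam*||a||_1 (lasso form, an assumption)."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - y))                        # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return a

def src_classify(D, labels, y, lam=0.01):
    """Pick the class whose atoms give the smallest l2 reconstruction residual."""
    n_dim, n_atoms = D.shape
    # Append an identity matrix to the dictionary so that noise can be
    # absorbed by the extra coefficients instead of the class atoms.
    D_ext = np.hstack([D, np.eye(n_dim)])
    a = ista_l1(D_ext, y, lam)
    coef, err = a[:n_atoms], a[n_atoms:]
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        coef_c = np.where(labels == c, coef, 0.0)  # keep only class-c coefficients
        residuals.append(np.linalg.norm(y - err - D @ coef_c))
    residuals = np.array(residuals)
    return classes[int(np.argmin(residuals))], residuals

# Toy data (hypothetical): two classes with orthogonal mean directions.
rng = np.random.default_rng(0)
means = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
atoms, labels = [], []
for c in (0, 1):
    for _ in range(5):
        v = means[c] + 0.1 * rng.standard_normal(6)
        atoms.append(v / np.linalg.norm(v))  # unit-norm dictionary atoms
        labels.append(c)
D = np.array(atoms).T
labels = np.array(labels)

query = means[1] + 0.1 * rng.standard_normal(6)
query /= np.linalg.norm(query)
predicted, residuals = src_classify(D, labels, query)
print(predicted, residuals)
```

In a verification setting the per-class residuals would be turned into a score rather than an arg-min decision; the scoring variants named in the abstract are left out of this sketch.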
[The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page:]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A