Peer reviewed
ERIC Number: EJ1168730
Record Type: Journal
Publication Date: 2017-Dec
Pages: 13
Abstractor: As Provided
ISSN: EISSN-2330-8516
Using Vision and Speech Features for Automated Prediction of Performance Metrics in Multimodal Dialogs. Research Report. ETS RR-17-20
Ramanarayanan, Vikram; Lange, Patrick; Evanini, Keelan; Molloy, Hillary; Tsuprun, Eugene; Qian, Yao; Suendermann-Oeft, David
ETS Research Report Series, Dec 2017
Predicting and analyzing multimodal dialog user experience (UX) metrics, such as overall call experience, caller engagement, and latency, on an ongoing basis is important for evaluating multimodal dialog systems. We investigate automated prediction of multiple such metrics collected from crowdsourced interactions with an open-source, cloud-based multimodal dialog system in the educational domain. We extract features from both the audio and video signals and examine the efficacy of multiple machine learning algorithms in predicting these performance metrics. The best performing audio features consist of multiple low-level audio descriptors--intensity, loudness, cepstra, pitch, and so on--and their functionals, extracted using the OpenSMILE toolkit, while the video features are bag-of-visual-words representations built from 3D Scale-Invariant Feature Transform (SIFT) descriptors. We find that our proposed methods outperform the majority-vote classification baseline in predicting various UX metrics rated by both the user and experts. Our results suggest that such automated prediction of performance metrics can not only inform the qualitative and quantitative analysis of dialogs but also potentially be incorporated into dialog management routines to positively impact UX and other metrics during the course of the interaction.
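The evaluation protocol described in the abstract — training classifiers on per-dialog features and comparing them against a majority-vote baseline — can be sketched as follows. This is a minimal illustration, not the authors' code: the feature matrix here is synthetic stand-in data (the report uses OpenSMILE audio descriptors and 3D-SIFT visual-word counts), and the random-forest classifier is one plausible choice among the "multiple machine learning algorithms" the abstract mentions.

```python
# Hedged sketch: classifier vs. majority-vote baseline for predicting a
# dialog-level UX rating from pre-extracted features. All data below is
# synthetic; real inputs would be OpenSMILE functionals and bag-of-visual-
# words counts computed per dialog.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_dialogs, n_features = 200, 20
X = rng.normal(size=(n_dialogs, n_features))  # stand-in feature matrix
# Synthetic 3-point UX rating, loosely tied to the first few features so
# that there is signal for the classifier to find.
y = (X[:, :3].sum(axis=1) > 0).astype(int) + (X[:, 0] > 1).astype(int)

# Majority-vote baseline: always predict the most frequent rating.
baseline = cross_val_score(
    DummyClassifier(strategy="most_frequent"), X, y, cv=5
).mean()
# One candidate learner; the report compares several algorithms.
model = cross_val_score(
    RandomForestClassifier(random_state=0), X, y, cv=5
).mean()

print(f"majority-vote baseline accuracy: {baseline:.2f}")
print(f"random-forest accuracy:          {model:.2f}")
```

On data with genuine feature–rating signal, the learned model should exceed the majority-vote baseline, which is the comparison the report's results rest on.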
Educational Testing Service. Rosedale Road, MS19-R, Princeton, NJ 08541. Tel: 609-921-9000; Fax: 609-734-5410.
Publication Type: Journal Articles; Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A