ERIC Number: ED459830
Record Type: Non-Journal
Publication Date: 2001-Jun
Reference Count: N/A
Building Searchable Collections of Enterprise Speech Data.
Cooper, James W.; Viswanathan, Mahesh; Byron, Donna; Chan, Margaret
This study applied speech recognition and text-mining technologies to a set of recorded outbound marketing calls and analyzed the results. Since speaker-independent speech recognition technology results in a significantly lower recognition rate than that found when the recognizer is trained for a particular speaker, a number of post-processing algorithms were applied to the output of the recognizer to render it suitable for the Textract text-mining system. The call transcripts were indexed using a search engine, and Textract and associated Java technologies were used to place the relevant terms for each document in a relational database. Following a search query, a thumbnail display of the results of each call was generated with the salient terms highlighted. These results are illustrated and their utility is discussed. Building on the results of these experiments, the analysis was extended to a set of talks and presentations. A distinct document genre is described, based on the note-taking concept of document content, and a significant new method is proposed for measuring speech recognition accuracy. This procedure is generally relevant to the problem of capturing meetings and talks and providing a searchable index of these presentations on the Web. (Contains 19 references.) (Author/AEF)
Descriptors: Databases, Information Processing, Information Retrieval, Information Seeking, Internet, Online Searching, World Wide Web
Association for Computing Machinery, 1515 Broadway, New York NY 10036. Tel: 800-342-6626 (Toll Free); Tel: 212-626-0500; e-mail: firstname.lastname@example.org. For full text: http://www1.acm.org/pubs/contents/proceedings/dl/379437/.
Publication Type: Reports - Research; Speeches/Meeting Papers
Education Level: N/A
Authoring Institution: N/A