ERIC Number: ED459832
Record Type: Non-Journal
Publication Date: 2001-Jun
Pages: 10
Abstractor: N/A
Reference Count: N/A
Hierarchical Indexing and Document Matching in BoW.
Geffet, Maayan; Feitelson, Dror G.
An obvious and natural approach to organizing a large corpus of data is a hierarchical index--akin to a book's table of contents. The type of corpus dealt with here is a bibliographical repository, with entries from a limited domain. Given such an index, it is desirable that search results point to relevant locations in the hierarchy, rather than just providing a flat list of entries. This is useful not only to support user searching, but also as an aid suggesting possible places to link new entries that are inserted into the repository. BoW is an online bibliographical repository based on a hierarchical concept index to which entries are linked. Searching in the repository should therefore return matching topics from the hierarchy, rather than just a list of entries. Likewise, when new entries are inserted, a search for relevant topics to which they should be linked is required. The study develops a vector-based algorithm that creates keyword vectors for the set of competing topics at each node in the hierarchy, and shows how its performance improves when domain-specific features are added (such as special handling of topic titles and author names). The results of a 7-fold cross validation on a corpus of some 3,500 entries with a 5-level index are hit ratios in the range of 89-95%, and most of the misclassifications are indeed ambiguous to begin with. (Contains 34 references.) (Author/AEF)
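Illustrative sketch (not taken from the paper): the abstract does not give the weighting scheme or the similarity measure, so the Python below is only one plausible reading of "keyword vectors for the set of competing topics at each node". It assumes raw term counts, cosine similarity, and a hypothetical title_weight parameter standing in for the "special handling of topic titles"; author-name handling is omitted. The Topic class, the classify function, and the toy hierarchy are all invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real keyword extraction."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Topic:
    """A node of the hierarchical index (hypothetical structure)."""

    def __init__(self, title, entries=(), children=()):
        self.title = title
        self.entries = list(entries)    # text of entries linked directly here
        self.children = list(children)  # sub-topics

    def vector(self, title_weight=3):
        """Keyword vector for this topic and everything beneath it.

        Title words are counted title_weight times (an assumed value).
        """
        vec = Counter()
        for tok in tokenize(self.title):
            vec[tok] += title_weight
        for entry in self.entries:
            vec.update(tokenize(entry))
        for child in self.children:
            vec.update(child.vector(title_weight))
        return vec


def classify(root, text):
    """Descend from the root, picking the best-matching child at each node."""
    query = Counter(tokenize(text))
    path = [root.title]
    node = root
    while node.children:
        scored = [(cosine(query, c.vector()), c) for c in node.children]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score == 0.0:
            break  # no competing topic matches; stop at the current node
        path.append(best.title)
        node = best
    return path


# Toy three-level index, invented for this example.
root = Topic("Computer Science", children=[
    Topic("Operating Systems",
          entries=["scheduling parallel jobs on clusters"],
          children=[
              Topic("Scheduling",
                    entries=["gang scheduling for parallel machines"]),
              Topic("File Systems",
                    entries=["log structured file system design"]),
          ]),
    Topic("Information Retrieval",
          entries=["vector space model for document ranking"]),
])

print(classify(root, "backfilling scheduling for parallel jobs"))
```

Only the sibling topics under the current node compete at each step, which is the property the abstract emphasizes; the returned path is a location in the hierarchy rather than a flat list of entries.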
Association for Computing Machinery, 1515 Broadway, New York NY 10036. Tel: 800-342-6626 (Toll Free); Tel: 212-626-0500; e-mail: For full text:
Publication Type: Numerical/Quantitative Data; Reports - Research; Speeches/Meeting Papers
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A