ERIC Number: ED041591
Record Type: RIE
Publication Date: 1969-Oct
Reference Count: N/A
Statistical Information Retrieval System.
DiFondi, Nicholas M.
An information retrieval system was developed using technical word occurrences as a basis for classification. A set of words, designated a vocabulary, was selected from the middle range of a frequency listing of the words occurring in an experimental sample of 94 documents. The selection produced 115 non-function words with technical definitions that did not allow ambiguous usage, and each word was assigned one of eighty concept numbers. The frequencies of these concepts served as data for factor analysis, and 39 factors were extracted to represent the orthogonal axes of a geometric subject-content space. The locations of concepts in this space were used to locate the geometric position of each document according to the frequencies of the concepts in that document. A total of 194 documents was used to measure system effectiveness. Requests formulated for a previous experiment using the same data base were processed. Precision and recall measures were calculated, and on average 66% precision and 80% recall were attained with one of three dissemination thresholds. Overall analysis of the results supports the theory that statistical data about word occurrences is sufficient to accurately represent documents relative to their subject content. (Author)
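The precision and recall measures named in the abstract can be illustrated with a minimal sketch. It assumes each document has already been given a numeric match score against a request and that a document is disseminated when its score meets a threshold. The function names, scores, and relevance judgments below are hypothetical and do not come from the report, which may have scored and thresholded documents differently.

```python
def retrieve(scores, threshold):
    """Return the documents whose score meets the dissemination threshold."""
    return {doc for doc, score in scores.items() if score >= threshold}


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical match scores for four documents and one request.
scores = {"d1": 0.9, "d2": 0.7, "d3": 0.4, "d4": 0.2}
# Hypothetical relevance judgments for the same request.
relevant = {"d1", "d3"}

for threshold in (0.8, 0.5, 0.3):
    p, r = precision_recall(retrieve(scores, threshold), relevant)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

Lowering the threshold retrieves more documents, which tends to raise recall at the cost of precision. That trade-off is why results are reported for a particular dissemination threshold.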
Descriptors: Classification, Information Processing, Information Retrieval, Information Systems, Relevance (Information Retrieval), Search Strategies, Statistical Analysis, Vocabulary
National Technical Information Service, Springfield, Va. 22151 (AD-697 403, MF $.65, HC $3.00)
Publication Type: N/A
Education Level: N/A
Authoring Institution: Rome Air Development Center, Griffiss AFB, NY.