ERIC Number: ED063007
Record Type: RIE
Publication Date: 1972-May
Reference Count: 0
Document Retrieval Based on Clustered Files.
Murray, Daniel McClure
A retrieval system is considered in which document descriptions are stored and accessed in groups called clusters. All items in a cluster meet common similarity criteria and are represented by a composite entity called a profile. In large collections, the profiles themselves are clustered, and additional levels of profiles are generated. This process establishes a file organization for the system: records are arranged in a logical structure, and a directory (the profile hierarchy) facilitates searching. Clustered files have the following advantages over other organizations: complete document information is stored in the same location; storage overhead is low; and flexible, economical searches can be realized. The problems investigated in clustered file organization are profile definition, updating, hierarchy storage, and secondary profile uses. A comparison with an inverted file is included. Nearly all of the work has an experimental base and uses the SMART retrieval system. The proposed organization compares favorably in terms of speed and storage economy. Various request-document matching procedures and feedback schemes are easily implemented. Search precision is lower, but this is compensated for by a flexible level of recall, low or high. (Author/SJ)
Publication Type: N/A
Education Level: N/A
Sponsor: National Library of Medicine (DHEW), Bethesda, MD.; National Science Foundation, Washington, DC.
Authoring Institution: Cornell Univ., Ithaca, NY. Dept. of Computer Science.
Note: 97 References; A Thesis Presented to the Faculty of the Graduate School of Cornell University for the Degree of Doctor of Philosophy
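Illustration: the clustered organization described in the abstract can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the thesis's implementation and not the SMART system's interface. It assumes that a profile is the summed term-weight vector (centroid) of its cluster, that matching uses cosine similarity, and that there is only one level of profiles. All names (`make_profile`, `build_clustered_file`, `search`, `expand`) and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def make_profile(vectors):
    """Assumed profile definition: the sum of the member vectors."""
    profile = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            profile[term] += weight
    return dict(profile)


def build_clustered_file(docs, assignment):
    """Group documents by cluster and attach one profile per cluster."""
    members = defaultdict(list)
    for doc_id, cluster_id in assignment.items():
        members[cluster_id].append(doc_id)
    return {
        cid: {"profile": make_profile(docs[d] for d in ids), "docs": sorted(ids)}
        for cid, ids in members.items()
    }


def search(query, docs, clustered_file, expand=1):
    """Match the query against profiles, then rank only the documents
    stored in the `expand` best-matching clusters."""
    ranked_clusters = sorted(
        clustered_file,
        key=lambda cid: cosine(query, clustered_file[cid]["profile"]),
        reverse=True,
    )
    candidates = [
        d for cid in ranked_clusters[:expand] for d in clustered_file[cid]["docs"]
    ]
    return sorted(candidates, key=lambda d: cosine(query, docs[d]), reverse=True)


# Hypothetical four-document collection with a given cluster assignment.
docs = {
    "d1": {"cluster": 1.0, "file": 1.0},
    "d2": {"cluster": 1.0, "profile": 1.0},
    "d3": {"inverted": 1.0, "index": 1.0},
    "d4": {"index": 1.0, "storage": 1.0},
}
assignment = {"d1": "A", "d2": "A", "d3": "B", "d4": "B"}
clustered_file = build_clustered_file(docs, assignment)

query = {"cluster": 1.0, "profile": 1.0}
narrow = search(query, docs, clustered_file, expand=1)  # one cluster examined
broad = search(query, docs, clustered_file, expand=2)   # both clusters examined
```

In this sketch the hypothetical `expand` parameter stands in for the flexible level of recall mentioned in the abstract: examining more clusters retrieves more candidate documents at the cost of more comparisons.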