ERIC Number: ED031261
Record Type: RIE
Publication Date: 1969-May
Reference Count: N/A
Determination of Statistical Clumps.
The exponential growth in the literature of most fields has produced a near-crisis situation for the people who provide storage and retrieval facilities for that literature. One useful tool suggested for anyone interested in mechanizing information storage and retrieval requires that the vocabulary used in the system be divided into groups of words, each group representing a different subarea of the original field. The present paper intends to show why the problem of subdividing a vocabulary is best handled by computer. From a number of existing techniques, one that seems appropriate is selected and modified, and certain improvements are suggested. The results described in this report are equally valid for any collection of objects (obeying a minimum set of requirements) that must be divided into smaller groups, the groups being defined in a statistical sense. (Author)
Descriptors: Automation, Cluster Grouping, Correlation, Indexing, Information Retrieval, Information Storage, Statistical Analysis, Subject Index Terms, Vocabulary, Word Frequency
Clearinghouse for Federal Scientific and Technical Information, Springfield, Va. 22151 (PB 184 136, MF-$0.65, HC-$3.00)
Publication Type: N/A
Education Level: N/A
Sponsor: National Science Foundation, Washington, DC.
Authoring Institution: Pennsylvania Univ., Philadelphia. Moore School of Electrical Engineering.
Note: Master's thesis, Moore School of Electrical Engineering, University of Pennsylvania, 1969.