ERIC Number: ED017286
Record Type: RIE
Publication Date: 1967-Jan
Pages: 1
Abstractor: N/A
Reference Count: N/A
ISBN: N/A
ISSN: N/A
COMPUTER CLASSIFICATION OF DOCUMENTS. ANNUAL PROGRESS REPORT.
WILLIAMS, J.H., JR.
CLASSIFICATION OF DOCUMENTS INVOLVES THREE DISTINCT PROCESSES: (1) DEFINING A STRUCTURE OF CATEGORIES, (2) DETERMINING A BASIS FOR A CLASSIFICATION DECISION, AND (3) CLASSIFYING DOCUMENTS INTO CATEGORIES. OF THE THREE, COMPUTER TECHNIQUES ARE DISCUSSED FOR THE LAST TWO. A WORD SELECTION MEASURE IS USED TO DELETE THOSE TERMS IN THE DOCUMENT THAT OCCUR RARELY AND THOSE THAT HAVE A LOW CONDITIONAL PROBABILITY OF OCCURRING IN THE CATEGORY UNDER CONSIDERATION. A SET OF SAMPLE DOCUMENTS KNOWN TO BELONG TO EACH CATEGORY IS USED AS A BASIS FOR COMPUTING THE DISCRIMINANT FUNCTIONS, WHICH PROVIDE WEIGHTING COEFFICIENTS FOR EACH TERM. A NEW DOCUMENT IS CLASSIFIED BY COUNTING THE FREQUENCIES OF THE SELECTED TERMS OCCURRING IN IT AND COMPUTING ITS PROBABILITY OF MEMBERSHIP IN EACH CATEGORY. THE DOCUMENT IS THEN ASSIGNED TO THE CATEGORY HAVING THE HIGHEST PROBABILITY OF MEMBERSHIP, OR, IF ASSIGNMENT TO ONE CATEGORY IS NOT DESIRABLE, A MULTI-CATEGORY ASSIGNMENT CAN BE INDICATED. A THESAURUS CAPABILITY ALLOWS VARIOUS TYPES OF WORDS TO BE CONSIDERED EQUIVALENT, INCLUDING INFLECTED WORDS AND COMPOUND WORDS. IF A SAMPLE SET OF DOCUMENTS IN ANY LANGUAGE IS AVAILABLE, OTHER DOCUMENTS IN THAT LANGUAGE CAN ALSO BE CLASSIFIED.

THIS PAPER WAS DELIVERED AT THE INTERNATIONAL FEDERATION FOR DOCUMENTATION/INTERNATIONAL FEDERATION FOR INFORMATION PROCESSING CONFERENCE ON MECHANIZED INFORMATION STORAGE, RETRIEVAL AND DISSEMINATION (ROME, ITALY, JUNE 15, 1967). IT IS AVAILABLE AS AD-663-178 FROM THE CLEARINGHOUSE FOR FEDERAL SCIENTIFIC AND TECHNICAL INFORMATION, SPRINGFIELD, VIRGINIA 22151, $3.00 FOR HARD COPY, $0.65 FOR MICROFICHE, 25 PAGES. (AUTHOR/CM)
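The abstract outlines a pipeline: thesaurus normalization, word selection by frequency and conditional probability, discriminant weights estimated from sample documents, membership probabilities, and single- or multi-category assignment. The Python sketch below shows one way these steps could fit together. It is not the author's program. Everything concrete in it is a labeled assumption: the thesaurus entries, the thresholds `min_count`, `min_cond_prob` and `min_var`, the use of a linear discriminant with a diagonal pooled covariance, the softmax conversion of scores into probabilities (the posterior under a Gaussian shared-covariance model), and the toy sample documents.

```python
import math
from collections import Counter

# Hypothetical thesaurus: maps inflected or variant forms to one canonical term.
THESAURUS = {"computers": "computer", "programs": "program", "acids": "acid"}


def normalize(text, thesaurus):
    """Lowercase, split on whitespace, and replace each word by its thesaurus entry."""
    return [thesaurus.get(w, w) for w in text.lower().split()]


def select_terms(samples, min_count=2, min_cond_prob=0.5):
    """Drop rare terms and terms whose estimated P(term in doc | category)
    is below min_cond_prob for every category. samples: category -> token lists."""
    total = Counter()
    doc_freq = {c: Counter() for c in samples}
    for c, docs in samples.items():
        for toks in docs:
            total.update(toks)
            doc_freq[c].update(set(toks))  # count each term once per document
    selected = []
    for term, n in total.items():
        if n < min_count:
            continue
        best = max(doc_freq[c][term] / len(docs) for c, docs in samples.items())
        if best >= min_cond_prob:
            selected.append(term)
    return sorted(selected)


def vectorize(tokens, terms):
    """Frequency of each selected term in one document."""
    counts = Counter(tokens)
    return [counts[t] for t in terms]


def fit(samples, terms, min_var=0.05):
    """Linear discriminant functions with a diagonal pooled covariance
    (a simplifying assumption). Returns category -> (weights, constant)."""
    vecs = {c: [vectorize(d, terms) for d in docs] for c, docs in samples.items()}
    n_docs = sum(len(vs) for vs in vecs.values())
    means = {c: [sum(col) / len(vs) for col in zip(*vs)] for c, vs in vecs.items()}
    dof = max(n_docs - len(vecs), 1)
    var = []
    for j in range(len(terms)):
        ss = sum((v[j] - means[c][j]) ** 2 for c, vs in vecs.items() for v in vs)
        var.append(max(ss / dof, min_var))  # floor avoids division by zero
    model = {}
    for c, vs in vecs.items():
        weights = [m / s for m, s in zip(means[c], var)]  # weighting coefficient per term
        const = -0.5 * sum(m * m / s for m, s in zip(means[c], var))
        model[c] = (weights, const + math.log(len(vs) / n_docs))  # add log prior
    return model


def membership_probabilities(model, x):
    """Probability of membership in each category (softmax of discriminant scores)."""
    scores = {c: sum(w * xj for w, xj in zip(ws, x)) + b for c, (ws, b) in model.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def assign(probs, threshold=None):
    """Best single category, or every category at or above threshold."""
    if threshold is None:
        return [max(probs, key=probs.get)]
    return [c for c, p in probs.items() if p >= threshold]


# Toy sample documents (invented for illustration only).
samples = {
    "chemistry": [normalize(t, THESAURUS) for t in (
        "acid reaction acids solution", "solution acid base reaction", "reaction base acid")],
    "computing": [normalize(t, THESAURUS) for t in (
        "computer program memory programs", "program computers storage", "memory computer program")],
}
terms = select_terms(samples)
model = fit(samples, terms)
probs = membership_probabilities(model, vectorize(
    normalize("the computers run programs stored in memory", THESAURUS), terms))
print(terms, assign(probs))
```

Passing a `threshold` to `assign` returns every category whose probability reaches it, which is one way to express the multi-category option the abstract mentions. On real sample sets the thresholds and the variance floor would need tuning, and a full (non-diagonal) covariance would be closer to classical discriminant analysis.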
Publication Type: N/A
Education Level: N/A
Audience: N/A
Language: N/A
Sponsor: N/A
Authoring Institution: International Business Machines Corp., Gaithersburg, MD. Federal Services Div.
Identifiers: International Federation for Documentation; International Federation for Information Processing