A Distance Measure for Automatic Sequential Document Classification.

Kar, B. Gautam; White, Lee J.

ERIC Number: ED160041

Record Type: Non-Journal

Publication Date: 1975-Aug

Pages: 168

Abstractor: N/A

ISBN: N/A

ISSN: N/A

EISSN: N/A

This study examined the feasibility of using a distance measure, called the Bayesian distance, for automatic sequential document classification. Results indicate that noisy keywords can be detected by observing how this distance measure varies as keywords are extracted sequentially from a document. This property was used to design a sequential classification algorithm that works in two phases. In the first phase, the keywords extracted from a document are partitioned into two groups: the good keyword group and the noisy keyword group. In the second phase, the two groups are analyzed separately to assign primary and secondary classes to the document. The algorithm was applied to the SPIN database, with results the authors describe as very encouraging. Appendices include descriptions and mathematical models of (1) Bayesian distance and classification error, (2) Bayesian distance and alpha-j values, (3) Bayesian distance and keyword vectors, and (4) the classification algorithm. (Author/CMV)
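
The abstract does not define the Bayesian distance or the partitioning rule, so the following Python sketch rests on stated assumptions rather than on the report itself. It takes the Bayesian distance to be the sum of squared class posterior probabilities (one definition used in the pattern-recognition literature), updates the posteriors with a naive Bayes rule, and treats a keyword as noisy when adding it lowers the distance. The class names, keyword likelihoods, and function names are hypothetical and are not taken from the report or from the SPIN database.

```python
from typing import Dict, List, Optional, Tuple


def bayesian_distance(posterior: List[float]) -> float:
    """Assumed definition: sum of squared class posteriors.

    Equals 1/k for a uniform posterior over k classes and 1.0 when one
    class is certain, so larger values mean a more confident decision.
    """
    return sum(p * p for p in posterior)


def update_posterior(posterior: List[float], likelihoods: List[float]) -> List[float]:
    """Naive Bayes update: multiply by P(keyword | class) and renormalize."""
    unnormalized = [p * l for p, l in zip(posterior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def argmax(values: List[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


def classify(
    keywords: List[str],
    likelihood_table: Dict[str, List[float]],
    prior: List[float],
    classes: List[str],
) -> Tuple[str, Optional[str], List[str], List[str]]:
    # Phase 1 (assumed rule): accept a keyword only if it does not
    # lower the Bayesian distance; otherwise set it aside as noisy.
    posterior = list(prior)
    distance = bayesian_distance(posterior)
    good: List[str] = []
    noisy: List[str] = []
    for kw in keywords:
        candidate = update_posterior(posterior, likelihood_table[kw])
        candidate_distance = bayesian_distance(candidate)
        if candidate_distance >= distance:
            good.append(kw)
            posterior, distance = candidate, candidate_distance
        else:
            noisy.append(kw)

    # Phase 2 (assumed rule): the primary class comes from the good
    # keywords; the secondary class comes from the noisy keywords alone.
    primary = classes[argmax(posterior)]
    secondary = None
    if noisy:
        noisy_posterior = list(prior)
        for kw in noisy:
            noisy_posterior = update_posterior(noisy_posterior, likelihood_table[kw])
        secondary = classes[argmax(noisy_posterior)]
    return primary, secondary, good, noisy


# Hypothetical two-class example; the likelihoods are invented for illustration.
classes = ["optics", "acoustics"]
likelihood_table = {
    "laser": [0.8, 0.1],
    "lens": [0.7, 0.2],
    "sound": [0.1, 0.9],
}
primary, secondary, good, noisy = classify(
    ["laser", "sound", "lens"], likelihood_table, [0.5, 0.5], classes
)
print(primary, secondary, good, noisy)
```

In this toy run the keyword "sound" pulls the posterior back toward an even split, so the distance drops and the keyword is set aside; it then determines the secondary class on its own. The actual thresholds and phase-two analysis in the report may differ.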

Descriptors: Algorithms, Automatic Indexing, Bayesian Statistics, Classification, Cluster Grouping, Databases, Documentation, Feasibility Studies, Mathematical Models, Measurement Techniques, Probability, Sequential Approach, Statistical Analysis

Publication Type: Reports - Research

Education Level: N/A

Audience: N/A

Language: English

Sponsor: National Science Foundation, Washington, DC. Div. of Science Information.

Authoring Institution: Ohio State Univ., Columbus. Computer and Information Science Research Center.

Grant or Contract Numbers: N/A