Clustering Binary Data in the Presence of Masking Variables.

Brusco, Michael J.

Peer reviewed

ERIC Number: EJ685079

Record Type: Journal

Publication Date: 2004-Dec

Pages: 13

Abstractor: Author

ISBN: N/A

ISSN: ISSN-1082-989X

EISSN: N/A

Psychological Methods, v9 n4 p510-523 Dec 2004

A number of important applications require the clustering of binary data sets. Traditional nonhierarchical cluster analysis techniques, such as the popular K-means algorithm, can often be successfully applied to these data sets. However, the presence of masking variables in a data set can impede the ability of the K-means algorithm to recover the true cluster structure. The author presents a heuristic procedure that selects an appropriate subset from among the set of all candidate clustering variables. Specifically, this procedure attempts to select only those variables that contribute to the definition of true cluster structure while eliminating variables that can hide (or mask) that true structure. Experimental testing of the proposed variable-selection procedure reveals that it is extremely successful at accomplishing this goal.
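The abstract does not describe the author's heuristic step by step. The Python below is a purely illustrative sketch, not the published procedure. It runs K-means on 0/1 data and greedily adds variables for as long as the between-cluster to total sum-of-squares ratio stays above a threshold, so variables that dilute the cluster structure are left out. The function names (`kmeans_binary`, `between_ratio`, `select_variables`, `toy_data`), the farthest-point start, the best-pair starting rule, and the 0.9 threshold are all assumptions of this sketch.

```python
from itertools import combinations


def kmeans_binary(rows, cols, k=2, max_iter=100):
    """Lloyd-style K-means on selected columns of a 0/1 data matrix.

    Assumed deterministic farthest-point start: object 0 seeds the first
    centroid; each further centroid is the object farthest (squared
    Euclidean = Hamming distance on 0/1 data) from those already chosen.
    """
    pts = [[row[c] for c in cols] for row in rows]

    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [[float(x) for x in pts[0]]]
    while len(centroids) < k:
        far = max(pts, key=lambda p: min(d2(p, c) for c in centroids))
        centroids.append([float(x) for x in far])

    labels = None
    for _ in range(max_iter):
        # Assign every object to its nearest centroid (ties -> lower index).
        new = [min(range(k), key=lambda j: d2(p, centroids[j])) for p in pts]
        if new == labels:
            break  # converged: assignments did not change
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            if members:  # an emptied cluster keeps its old centroid
                centroids[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels


def between_ratio(rows, cols, labels):
    """Between-cluster SS / total SS, summed over the chosen columns."""
    n = len(rows)
    total = between = 0.0
    for c in cols:
        vals = [row[c] for row in rows]
        mean = sum(vals) / n
        total += sum((v - mean) ** 2 for v in vals)
        for g in set(labels):
            grp = [v for v, lab in zip(vals, labels) if lab == g]
            between += len(grp) * (sum(grp) / len(grp) - mean) ** 2
    return between / total if total else 0.0


def select_variables(rows, k=2, threshold=0.9):
    """Hypothetical greedy forward selection (assumes at least 2 columns)."""
    p = len(rows[0])

    def score(cols):
        return between_ratio(rows, cols, kmeans_binary(rows, cols, k))

    # Start from the best-separating pair, then add one variable at a time.
    selected = list(max(combinations(range(p), 2), key=score))
    while True:
        rest = [v for v in range(p) if v not in selected]
        if not rest:
            break
        best = max(rest, key=lambda v: score(selected + [v]))
        if score(selected + [best]) < threshold:
            break  # adding any remaining variable weakens the structure
        selected.append(best)
    return selected, kmeans_binary(rows, selected, k)


def toy_data():
    """Two clusters of 4 objects. Columns 0-3 define the clusters; columns
    4-5 are balanced noise columns standing in for masking variables."""
    rows = []
    for true_bit in (1, 0):
        for idx in range(4):
            rows.append([true_bit] * 4 + [idx & 1, (idx >> 1) & 1])
    return rows


if __name__ == "__main__":
    print(select_variables(toy_data()))
    # ([0, 1, 2, 3], [0, 0, 0, 0, 1, 1, 1, 1])
```

On this toy matrix each cluster-defining column keeps the ratio at 1.0 when added, while either noise column pulls it down to 0.8, so the sketch stops with columns 0-3. With real data one would likely want several random starts and a threshold calibrated by simulation. A greedy search of this kind can also lock onto a poor starting pair.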

Descriptors: Mathematics, Multivariate Analysis, Statistical Data, Statistical Analysis, Evaluation Research, Evaluation Methods

American Psychological Association, 750 First Street, NE, Washington, DC 20002-4242. Tel: 800-374-2721 (Toll Free); Tel: 202-336-5510; TDD/TTY: 202-336-6123; Fax: 202-336-5502; e-mail: journals@apa.org

Publication Type: Journal Articles; Reports - Research

Education Level: N/A

Audience: N/A

Language: English

Sponsor: N/A

Authoring Institution: N/A

Grant or Contract Numbers: N/A

Privacy | Copyright | Contact Us | Selection Policy | API