ERIC Number: ED382654
Record Type: Non-Journal
Publication Date: 1994-May
Reference Count: N/A
Variable Screening for Cluster Analysis.
Donoghue, John R.
Inclusion of irrelevant variables in a cluster analysis adversely affects subgroup recovery. This paper examines using moment-based statistics to screen variables; only variables that pass the screening are then used in clustering. Normal mixtures are shown analytically to often possess negative kurtosis. Two related measures, "m" and the coefficient of bimodality "b," are also examined. A Monte Carlo study compared the screening measures to no selection, De Soete's (1988) ultrametric weights, and the forward selection procedure of Fowlkes, Gnanadesikan, and Kettenring (1988). Screening based on kurtosis degraded recovery and is not recommended. In contrast, screening on "m" or on "b" improved recovery over both no selection and forward selection, and screening performed as well as ultrametric weights. Combining screening with ultrametric weights performed extremely well. All methods were found to be somewhat sensitive to other types of error. Screening variables appears to be a viable alternative to both ultrametric weights and forward selection. The potential advantages and disadvantages of screening are considered. Eleven tables and six figures illustrate the analyses. An appendix provides supplemental detail. (Contains 51 references.) (Author/SLD)
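The claim that normal mixtures often have negative kurtosis can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the equal-weight two-component mixture with unit variances, the separation d = 2, and the sample size are all assumptions chosen for illustration. For that specific mixture, 0.5 N(-d, 1) + 0.5 N(d, 1), the excess kurtosis works out to -2d^4 / (1 + d^2)^2, which is negative for any d > 0. The sketch does not reproduce the paper's "m" or "b" measures, and the abstract itself reports that screening on kurtosis alone degraded recovery.

```python
import random


def excess_kurtosis(xs):
    """Sample excess kurtosis (fourth standardized moment minus 3)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 ** 2) - 3.0


def mixture_excess_kurtosis(d):
    """Analytic excess kurtosis of 0.5*N(-d, 1) + 0.5*N(d, 1) (assumed setup)."""
    return -2.0 * d ** 4 / (1.0 + d ** 2) ** 2


def simulate_mixture(n, d, rng):
    """Draw n values from the equal-weight two-component normal mixture."""
    return [rng.gauss(d if rng.random() < 0.5 else -d, 1.0) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# A hypothetical "cluster-defining" variable: two well-separated subgroups.
clustered = simulate_mixture(20000, 2.0, rng)

# A hypothetical irrelevant variable: a single normal population.
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]

print("mixture, sample:  ", excess_kurtosis(clustered))
print("mixture, analytic:", mixture_excess_kurtosis(2.0))
print("single normal:    ", excess_kurtosis(noise))
```

Under these assumptions the mixture variable shows clearly negative excess kurtosis while the single-population variable stays near zero, which is the contrast a moment-based screen would try to exploit.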
Publication Type: Reports - Evaluative
Education Level: N/A
Authoring Institution: Educational Testing Service, Princeton, NJ.