ERIC Number: ED546259
Record Type: Non-Journal
Publication Date: 2012
Pages: 164
Abstractor: As Provided
Reference Count: N/A
ISBN: 978-1-2676-0864-2
Effective and Efficient Correlation Analysis with Application to Market Basket Analysis and Network Community Detection
Duan, Lian
ProQuest LLC, Ph.D. Dissertation, The University of Iowa
Finding the most interesting correlations among items is essential for problems in many commercial, medical, and scientific domains. For example, what kinds of items should be recommended given what a customer has already purchased? How should store shelves be arranged to increase sales? How should a whole social network be partitioned into communities for successful advertising campaigns? Which set of individuals on a social network should we target in order to trigger a large cascade of further adoptions? When conducting correlation analysis, traditional methods have both effectiveness and efficiency problems, which are addressed in this dissertation. Here, we explore the effectiveness problem in three ways. First, we expand the set of desirable properties and study how well different correlation measures satisfy them. Second, we study different techniques for adjusting the original correlation measures, and propose two new correlation measures: the Simplified χ² with Continuity Correction and the Simplified χ² with Support. Third, we study the upper and lower bounds of different measures and categorize them by the bound differences. Combining these three directions, we provide guidelines for users to choose the proper measure according to their situations. With the proper correlation measure, we then address the efficiency problem for large datasets. Here, we propose a fully-correlated itemset (FCI) framework to decouple the correlation measure from the need for efficient search. By wrapping the desired measure in our FCI framework, we take advantage of the desired measure's superiority in evaluating itemsets, eliminate itemsets with irrelevant items, and achieve good computational performance.
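As a rough illustration of what a pairwise correlation measure and its upper bound look like in market basket data, the sketch below uses lift, a standard measure. It is not one of the measures proposed in the dissertation, and the function name `pair_lift` and the sample baskets are hypothetical. The bound follows from P(A,B) ≤ min(P(A), P(B)), which gives lift ≤ 1 / max(P(A), P(B)).

```python
from itertools import combinations


def pair_lift(transactions):
    """Return {(a, b): (lift, upper_bound)} for every co-occurring item pair.

    Lift is a standard correlation measure, used here only as an example;
    it is not one of the measures proposed in the dissertation.
    """
    n = len(transactions)
    item_count = {}
    pair_count = {}
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicate items within a basket
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for a, b in combinations(items, 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    result = {}
    for (a, b), count in pair_count.items():
        p_a = item_count[a] / n
        p_b = item_count[b] / n
        p_ab = count / n
        lift = p_ab / (p_a * p_b)
        # Since P(A,B) <= min(P(A), P(B)), lift can never exceed this value.
        upper_bound = 1 / max(p_a, p_b)
        result[(a, b)] = (lift, upper_bound)
    return result


if __name__ == "__main__":
    baskets = [["bread", "milk"], ["bread", "milk"], ["bread"], ["eggs"]]
    for pair, (lift, bound) in pair_lift(baskets).items():
        print(pair, round(lift, 3), round(bound, 3))
```

Because the bound depends only on single-item frequencies, a search procedure could in principle skip any pair whose bound already falls below the threshold, without counting its co-occurrences.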
In addition, we identify a 1-dimensional monotone property of the upper bound of any good correlation measure, and different 2-dimensional monotone properties for different types of correlation measures. We can use either the 2-dimensional search algorithm to retrieve correlated pairs above a certain threshold, or our new Token-Ring algorithm to find the top-k correlated pairs, pruning many pairs without computing their correlations. In order to speed up FCI search, we build an enumeration tree to save the fully-correlated value (FCV) for all the FCIs under an initial threshold. We can either efficiently retrieve the desired FCIs for any given threshold above the initial threshold or incrementally grow the tree if the given threshold is below the initial threshold. Building on this theoretical analysis of correlation search, we apply our research to two typical applications: Market Basket Analysis and Network Community Detection. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page:]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A