ERIC Number: ED561760
Record Type: Non-Journal
Publication Date: 2013
Pages: 165
Abstractor: As Provided
Reference Count: N/A
ISBN: 978-1-3034-7253-4
Improving the Scalability of an Exact Approach for Frequent Item Set Hiding
LaMacchia, Carolyn
ProQuest LLC, Ph.D. Dissertation, Nova Southeastern University
Technological advances have led to the generation of large databases of organizational data recognized as an information-rich, strategic asset for internal analysis and sharing with trading partners. Data mining techniques can discover patterns in large databases, including relationships considered strategically relevant to the owner of the data. The frequent item set hiding problem is an area of active research that studies approaches for hiding sensitive knowledge patterns before the data are disclosed outside the organization. Several methods address hiding sensitive item sets, including an exact approach that generates an extension to the original database that, when combined with the original database, limits the discovery of sensitive association rules without impacting other non-sensitive information. To generate the database extension, this method formulates a constraint optimization problem (COP). Solving the COP formulation is the dominant factor in the computational resource requirements of the exact approach. This dissertation developed heuristics that address the scalability of the exact hiding method. The heuristics are directed at improving the performance of the COP solver by reducing the size of the COP formulation without significantly affecting the quality of the solutions generated. The first heuristic decomposes the COP formulation into multiple smaller problem instances that are processed separately by the COP solver to generate partial extensions of the database. The smaller database extensions are then combined to form a database extension that is close to the database extension generated with the original, larger COP formulation. The second heuristic evaluates the revised border used to formulate the COP and reduces the number of variables and constraints by selectively substituting multiple item sets with composite variables. Solving the COP with fewer variables and constraints reduces the computational cost of the processing.
Results of heuristic processing were compared with an existing exact approach based on the size of the database extension, the ability to hide sensitive data, and the impact on non-sensitive data. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by Telephone 1-800-521-0600. Web page:]
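The idea of hiding an item set by extending the database can be illustrated with a toy example. The sketch below is not the dissertation's COP formulation or either heuristic: the transactions, the threshold, the hand-picked extension, and the names `support` and `is_frequent` are all assumptions made for illustration. It shows only that appending transactions that omit a sensitive item set can push that item set's relative support below the mining threshold while a non-sensitive item set stays frequent.

```python
def support(db, itemset):
    """Fraction of transactions in db that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in db) / len(db)


def is_frequent(db, itemset, min_support):
    """An item set is frequent when its support reaches the threshold."""
    return support(db, itemset) >= min_support


# Hypothetical original database of four transactions.
original = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a", "c"}),
]
min_support = 0.5           # assumed mining threshold
sensitive = {"b", "c"}      # support 2/4 in the original database
non_sensitive = {"a", "b"}  # support 3/4 in the original database

# Hand-picked extension: one transaction that supports the non-sensitive
# item set but not the sensitive one. The exact approach would instead
# derive the extension by solving a COP.
extension = [frozenset({"a", "b"})]
sanitized = original + extension

print(is_frequent(original, sensitive, min_support))       # True
print(is_frequent(sanitized, sensitive, min_support))      # False
print(is_frequent(sanitized, non_sensitive, min_support))  # True
```

In this toy case a single added transaction is enough; in realistic databases the size and content of the extension must be chosen carefully, which is the part the abstract attributes to the COP.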
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A