ERIC Number: ED528627
Record Type: Non-Journal
Publication Date: 2009
Pages: 102
Abstractor: As Provided
Reference Count: 0
ISBN: ISBN-978-1-1093-9715-4
Hybrid Matching and Risk Assessment of the Misspelled Names [HMRA]
Varol, Cihan
ProQuest LLC, Ph.D. Dissertation, University of Arkansas at Little Rock
Companies acquire personal information by phone, the World Wide Web, or email in order to sell or advertise their products. However, when this information is acquired, moved, copied, or edited, the data may lose its quality. Relying on data administrators, or on a tool with limited capabilities for correcting mistyped information, can cause many problems. Moreover, most correction techniques are implemented specifically for the words used in daily conversation. Since personal names have different characteristics from general text, we first propose a hybrid matching algorithm (PNRS) that employs phonetic encoding, string matching, and statistical facts to provide a possible candidate for a misspelled name. The "SoundD Phonetic Strategy" is created to provide name suggestions based on the phonetic structure of the misspelled name, the "Restricted Near Miss" Strategy is built to produce name suggestions based on the pattern of the ill-defined data, and the "Weighted Census Score" is used to produce the final suggestion based on the frequency of usage of the candidate names. Compared with the other algorithms available in the literature, the PNRS system makes it possible to suggest the closest match for the ill-defined data. Secondly, in order to justify the effectiveness of PNRS, we attempt to check the correctness of its results without looking at the reference table. To that end, a decision support system is embedded in the PNRS structure. This support system contains a similarity-based name cluster created with the k-medoids method. Finally, the PNRS Distance Metric (PNRSDM) is mathematically modeled in order to provide a confidence level for the results achieved by PNRS. Thirdly, in order to identify the impact on customer satisfaction caused by ill-defined or dirty data (misspelled or mistyped data), we define a mathematical model for error propagation in Information Quality Products and create an N×N matrix. A framework and business case are created based on Talend Open Studio 3.0. Unified Modeling Language (UML) Activity Diagrams are used to model the estimation of error propagation, where the messages and associated attributes are used to calculate the single-step and multi-step error propagation in the workflow. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by Telephone (800) 1-800-521-0600. Web page:]
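The three-stage idea described in the abstract (phonetic candidates, near-miss candidates, then a frequency-based final choice) can be illustrated with a minimal sketch. This is not the dissertation's PNRS implementation: the abstract does not specify the "SoundD" encoding, the restrictions of the "Restricted Near Miss" strategy, or the weighting in the "Weighted Census Score", so the sketch substitutes standard American Soundex, unrestricted single-edit candidates, and a plain highest-count rule. The function names and the `TOY_COUNTS` table are hypothetical, and the counts are invented placeholders rather than census figures.

```python
import string

# Standard American Soundex digit groups (a stand-in for "SoundD", an assumption).
CODES = {}
for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                     ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in group:
        CODES[letter] = digit


def soundex(name):
    """Return the 4-character Soundex code of a name ("" if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return (encoded + "000")[:4]


def edits1(word):
    """All lowercase strings one delete, transpose, replace, or insert away."""
    word = word.lower()
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b
                for c in string.ascii_lowercase]
    inserts = [a + c + b for a, b in splits for c in string.ascii_lowercase]
    return set(deletes + transposes + replaces + inserts)


def suggest(misspelled, name_counts):
    """Pick the most frequent reference name among phonetic and near-miss candidates."""
    by_lower = {n.lower(): n for n in name_counts}
    if misspelled.lower() in by_lower:
        return by_lower[misspelled.lower()]  # already a known name
    target = soundex(misspelled)
    phonetic = {n for n in name_counts if soundex(n) == target}
    near_miss = {by_lower[e] for e in edits1(misspelled) if e in by_lower}
    candidates = phonetic | near_miss
    if not candidates:
        return None
    # Plain frequency ranking; the real "Weighted Census Score" is not specified here.
    return max(candidates, key=lambda n: (name_counts[n], n))


# Hypothetical reference table with invented counts (not census data).
TOY_COUNTS = {"Smith": 1000, "Smyth": 10, "Schmidt": 50,
              "Johnson": 800, "Jonson": 5}
```

With this toy table, `suggest("Smiht", TOY_COUNTS)` gathers Smith, Smyth, and Schmidt as phonetic candidates and Smith as a near miss, then returns the candidate with the highest count. A confidence measure such as the PNRSDM described above would sit on top of this step and is not attempted here.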
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A