ERIC Number: ED550422
Record Type: Non-Journal
Publication Date: 2012
Pages: 130
Abstractor: As Provided
Reference Count: N/A
ISBN: 978-1-2678-2419-6
Person Name Disambiguation in the Multicultural and Online Setting
Treeratpituk, Pucktada
ProQuest LLC, Ph.D. Dissertation, The Pennsylvania State University
With the recent rise in popularity of social network sites, more and more personal information is becoming available online. Since a person's information is generally available in various formats across multiple sites, there is ever-increasing interest in consolidating such personal information from multiple information sources. The goal of person name disambiguation is to group references to people by the real-world person each refers to. These references can range from personal homepages to names mentioned in news articles. This dissertation examines the person name disambiguation problem in three different settings: (1) name-based person name disambiguation, (2) metadata-based person name disambiguation, and (3) person name disambiguation in an online setting. In the simplest setting, name-based person name disambiguation, records are disambiguated based purely on personal names. Since personal names are culture-dependent, we propose a novel name matching similarity that takes the ethnicity of the names into consideration. More specifically, we propose a name-ethnicity classifier based on multinomial logistic regression and an ethnicity-sensitive name matching similarity based on the Smith-Waterman alignment algorithm, where different cost matrices are applied depending on the ethnicity of the names being compared. In the second setting, we examine the person name disambiguation problem where additional information other than personal names is also available. This additional information includes both association information, such as one's affiliation and social network, and contextual information, such as the content of the document where one's name is mentioned. We propose a random forest-based method for aggregating multiple types of metadata information in determining whether two or more person name records should be linked.
In the last setting, we consider the person name disambiguation problem from the perspective of a real system, where the number of person references to be disambiguated is not static but ever increasing. Here we propose an online clustering method with constraints for person name disambiguation, where the integrity of each person cluster is continuously enforced. Our experiment shows that our method outperforms the previous static clustering approach without constraints. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by Telephone 1-800-521-0600. Web page:]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A