ERIC Number: ED536548
Record Type: Non-Journal
Publication Date: 2011
Pages: 120
Abstractor: As Provided
Reference Count: 0
ISBN: 978-1-2670-6499-8
Predicting Audience Demographics of Web Sites Using Local Cues
Kim, Iljoo
ProQuest LLC, Ph.D. Dissertation, The University of Utah
The size and dynamism of the Web pose challenges for all its stakeholders, which include producers and consumers of content as well as advertisers who want to place advertisements next to relevant content. A critical piece of information for these stakeholders is the demographics of the consumers who are likely to visit a given web site. However, predicting the demographics of consumers who are "likely" to visit a given web site, while essential, remains a challenging task. Hence, in this dissertation, we ask the following questions: Is it possible to deduce the audience demographics of a web site based solely on local cues such as the design or the content of the web site? If so, is it design, content, or a combination of the two that provides a good predictive model? In addition to the design or the content, is it also possible to use the semantics embedded within the content to further improve prediction performance? We explore these questions with statistical analyses as well as predictive models using various modeling schemes. From the results, we find that it is indeed possible to effectively predict the demographics of consumers of a web site using cues embedded in the design or the content of its homepage. In addition, we build and evaluate an ensemble classifier that combines the predictions from both design and content cues. An analysis of the ensemble suggests the possible use of the approach for better prediction. In addition to the classification-based predictive model that predicts a discrete demographic class (e.g., female) within each demographic dimension (e.g., gender), we also explore a regression-based predictive model that predicts the demographic composition (e.g., 63.5% female) of a web site, which is a continuous dependent variable. We show that this model also works effectively with good estimation performance.
Finally, we suggest a feature selection approach using the Latent Dirichlet Allocation (LDA) method and show that semantics extracted from web site content using this method can also be utilized to achieve competitive prediction performance while significantly improving prediction efficiency. The approaches in this study serve as low-burden complements to the more intrusive and costly registration/cookie-based techniques. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by Telephone 1-800-521-0600. Web page:]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A