ERIC Number: ED533241
Record Type: Non-Journal
Publication Date: 2011
Pages: 197
Abstractor: As Provided
Reference Count: 0
ISBN: 978-1-1248-9068-5
A Holistic, Similarity-Based Approach for Personalized Ranking in Web Databases
Telang, Aditya
ProQuest LLC, Ph.D. Dissertation, The University of Texas at Arlington
With the advent of the Web, the notion of "information retrieval" has acquired a completely new connotation and currently encompasses several disciplines, ranging from traditional forms of text and data retrieval in unstructured and structured repositories to retrieval of static and dynamic information from the contents of the surface and deep Web. From the point of view of the end user, a common thread that binds all these areas is the need to support appropriate alternatives for allowing users to specify their intent (i.e., the user input) and for displaying the resulting output ranked in an order relevant to the users. In the context of specifying a user's intent, the paradigms of "querying" and "searching" have served well as the staple mechanisms of information retrieval over structured and unstructured repositories. Processing queries over known, structured repositories (e.g., traditional and Web databases) is well understood, and search has become ubiquitous for unstructured repositories (e.g., document collections and the surface Web). Furthermore, searching structured repositories has been explored to a limited extent. However, there is not much work on querying unstructured sources, which, we believe, is the next step in performing focused retrievals. Correspondingly, one of the important contributions of this dissertation is a novel semantic-guided approach, termed Query-By-Keywords (or QBK), to generate queries from search-like inputs for unstructured repositories. Instead of burdening the user with schema details, this approach utilizes pre-discovered semantic information in the form of taxonomies, relationships among keywords based on context, and attribute and operator compatibility to generate query skeletons that are subsequently transformed into queries. Additionally, progressive feedback from users is used to further improve the accuracy of these query skeletons. The overall focus, thus, is to propose an alternative paradigm for the generation of queries on unstructured repositories using as little information from the user as possible.
Irrespective of the template for intent specification (i.e., either a search or a query request), the number of results typically returned in response to such intents is often extremely large. This is particularly true in the context of the deep Web, where a large number of results are returned for queries on Web databases and choosing the most useful answer(s) becomes a tedious and time-consuming task. Most of the time the user is not interested in all answers; instead, s/he would prefer that the results ranked highest according to her/his interests, characteristics, and past usage be displayed before the rest. Furthermore, these preferences vary as users and queries change. Accordingly, in this dissertation, we propose a novel "similarity"-based framework for supporting user- and query-dependent ranking of query results in Web databases. This framework is based on the intuition that, for the results of a given query, similar users display comparable ranking preferences, and a user displays analogous ranking preferences over the results of similar queries. Consequently, this framework is supported by two novel and comprehensive models, (1) Query Similarity and (2) User Similarity, proposed as part of this work. In addition, this ranking framework relies on the availability of a small yet representative set of ranking functions collected across several user-query pairs in order to rank the results of a given user query at runtime. To that end, we address the subsequent problem of establishing a relevant "workload" of ranking functions that assists the similarity model in the best possible way to achieve the goal of user- and query-dependent ranking. Furthermore, we advance a novel "probabilistic learning model" that infers individual ranking functions (for this workload) based on the implicit browsing behavior displayed by users. We establish the effectiveness of this complete ranking framework by experimentally evaluating it on Google Base's "vehicle" and "real estate" databases with the aid of Amazon's Mechanical Turk users.
[The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page:]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A