Mongrel 

Supporting Effective Access Through User- and Topic-Based Language Models


In today’s networked information environment, tools to support information retrieval and filtering have become common. Despite their general utility and popularity, these tools perform poorly in many important respects. Text search engines and agent-based filtering systems make mistakes that are obvious and aggravating to users, and relevant documents are usually mixed with many others that are totally unrelated. These problems significantly lower the productivity and effectiveness of people using the tools, whether in education, science, business, or government. We believe that the fundamental issue underlying all of these problems is the lack of adequate models of the user and the domain. To achieve breakthroughs in retrieval and filtering accuracy, the tools need to use more information about the context of the query, better models of the user, and more knowledge about the domain.

User models and models of topics or domains are not new. A number of studies in the past 20 years have examined different approaches and implementations. In general, these studies did not have a significant impact on the design of retrieval and filtering systems, despite the obvious relevance of user modeling to such systems. We believe there are three main reasons for this lack of impact: previous studies were unable to specify precisely how such models would be used to affect performance, there were severe problems with how the data for such models would be elicited, and there was no well-defined structure within which such models could be implemented.

In this proposal, we describe a new approach to user and domain or topic modeling that has the potential to significantly improve the effectiveness of information access and filtering. The approach builds on recent research on language models for information retrieval. It assumes that every document or group of documents has one or more associated probability distributions that model how the text in the document can be generated. This generative model is quite different from the standard probabilistic retrieval models and has a number of advantages. For this project, the key advantages are that language models appear to capture the important aspects of user and domain modeling observed in earlier experiments, and that retrieval techniques based on document language models have been shown to be very effective.
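
To make the generative view concrete, here is a minimal sketch of one common way document language models are used for ranking: each document gets a unigram model, and documents are ordered by the probability that their model generates the query, with the document model interpolated against a collection-wide model (Jelinek-Mercer smoothing). This is an illustration only, not the method the project commits to. The toy documents, the function names `unigram_model` and `query_log_likelihood`, and the mixing weight `lam` are all assumptions introduced for this example.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram distribution over a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def query_log_likelihood(query, doc_model, collection_model, lam=0.5):
    """Log-probability that the smoothed document model generates the query.

    lam is a hypothetical mixing weight: lam * P(term | document)
    plus (1 - lam) * P(term | collection).
    """
    score = 0.0
    for term in query:
        p = lam * doc_model.get(term, 0.0) + (1 - lam) * collection_model.get(term, 0.0)
        if p == 0.0:
            # The term occurs nowhere in the collection, so no model can generate it.
            return float("-inf")
        score += math.log(p)
    return score


# Toy collection (illustrative data, not taken from the project).
docs = {
    "d1": "language models for information retrieval".split(),
    "d2": "user studies in interactive systems".split(),
}
collection_model = unigram_model([t for tokens in docs.values() for t in tokens])
doc_models = {name: unigram_model(tokens) for name, tokens in docs.items()}

query = "language models".split()
ranking = sorted(
    docs,
    key=lambda name: query_log_likelihood(query, doc_models[name], collection_model),
    reverse=True,
)
print(ranking)
```

In the same spirit, a user or topic model could in principle be treated as one more distribution mixed into the score, but how such models are estimated and combined is exactly the kind of research question the proposal leaves open.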

The project we propose combines the expertise and experience of one group in the development and testing of information retrieval models and systems with that of another in user modeling and user studies in interactive systems. These two groups have a history of successful collaboration in related domains, which provides a solid basis for the proposed collaborative project. We describe a number of research issues, potential solutions, and a comprehensive experimental program that will establish the impact of the proposed approach. Part of the evaluation of the new techniques can use standard test collections such as TREC, but it will also involve a number of user studies in a laboratory setting and studies of the impact on an operational Web search application with large numbers of users.

Slide show (PowerPoint) of the Mongrel project.


This material is based upon work supported by the National Science Foundation under Grant No. 9911942. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).