Recommended readings for IR

This schedule is subject to alterations.

(Legend for the readings: MBK = Meadow, Boyce and Kraft, vR = van Rijsbergen, RB = Rik Belew, SJW = Sparck-Jones and Willett)

Topics Readings

Introduction to IR. Information vs data retrieval.

What do we want from IR? Introduction to evaluation.


vR, ch.1: "Introduction"; RB, ch.1: "Overview"; SJW, ch.1, "Overall introduction", ch.2: "Introduction"; Croft, W.B. (1995) "What do people want from information retrieval?", D-Lib Magazine, November; Belkin N.J. & Croft, W.B. (1992) "Information filtering and information retrieval: Two sides of the same coin?" Communications of the ACM, v. 35 no. 12: 29-38.

IR concepts. Aboutness. Relevance.

Rationalist vs. empiricist approaches (AI vs. Stats)

SJW, ch.3: "Introduction". Belkin, N.J. (1978) Information concepts for information science. Journal of Documentation, v. 34, no.1: 55-85. Hutchins, W.J. (1978) "The concept of "aboutness" in subject indexing". Journal of Informatics, vol.1, no.1, April 1977, pp.17-35 (also Aslib Proceedings, vol. 30: 172-181) (also SJW, 93-97).

Saracevic, T. (1975) "Relevance: a review of and a framework for the thinking on the topic", Journal of the American Society for Information Science, vol. 26: 321-343 (SJW, 143-165).

Document and query representation. Manual vs. automatic indexing.

J. D. Anderson & J. Perez-Carballo, “The nature of indexing: how humans and machines analyze messages and texts for retrieval. Part I: Research, and the nature of human indexing; Part II: Machine indexing, and the allocation of human versus machine effort”, Information Processing and Management, vol. 37 (2001), 231-254, 255-277.

Furnas, G.W., Landauer, T. K., Gomez, L. M., Dumais, S. T. (1987) "The vocabulary problem in human-system communication", Communications of the ACM, 30(11), 964-971.

Automatic indexing. Lexical analysis. Weighting. Data structures.

vR, ch.2: “Automatic text analysis”; RB, ch.2: “Extracting lexical features”; SJW, ch.6, "Introduction" (esp. the section on Indexing). Salton, G. & Buckley, C. (1988) “Term weighting approaches in automatic text retrieval”, Information Processing and Management, vol. 24: 513-523 (SJW, pp. 323-328). Robertson, S. E. and Sparck Jones, K. (1997), “Simple, proven approaches to text retrieval”, University of Cambridge Computer Laboratory Technical Report no. 356, 1994 (updated 1996,1997).

For stemming code or a demo, see Martin Porter’s site.

Mikheev, Andrei “Document Centered Approach to Text Normalization”, SIGIR 2000, Athens.

Models of IR.

Interaction models. Indexing models. Language models. Topic models. User models.

Relevance feedback.

SJW, intro to ch.5 ("Models") and ch.6 ("Techniques"). Cooper, W.S. "Getting beyond Boole", Information Processing and Management, vol. 24: 243-248 (also in SJW, 265-267); van Rijsbergen, C. J. "A new theoretical framework for information retrieval", SIGIR'86; Robertson, S.E. "The probability ranking principle in IR", Journal of Documentation, vol 33: 294-304, 1977 (also in SJW, 281-286). Salton, G., Wong, A. & Yang, C.S. (1975) “A vector space model for automatic indexing”, Communications of the ACM, vol 18: 613-620 (also in SJW, 273-280); N. J. Belkin, R. N. Oddy, and H. M. Brooks "ASK for information retrieval: Part I. Background and theory.", Journal of Documentation, 38(2):61-71, 1982; Saracevic, T. "Interactive models in information retrieval (IR): Progress, problems, proposal", in Proceedings of the 1996 ASIS Annual Meeting, Medford, NJ. Turtle, H. & Croft, W.B. (1990) “Inference networks for document retrieval”, SIGIR 1990, New York: ACM, 1-24; Ponte, J. and Croft, W.B. "A Language Modeling Approach to Information Retrieval", SIGIR'98; Lavrenko, V. "Language models", tutorial at SIGIR 2003.

Robertson, S. and Sparck Jones, K. "Simple, proven approaches to text retrieval", Technical Report TR356, Cambridge University Computer Laboratory, 1997; Salton, G. and Buckley, C. "Term-weighting approaches in automatic text retrieval", Information Processing and Management, 24(5):513-523, 1988.

Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., Harshman R. (1990) "Indexing by Latent Semantic Analysis", JASIST, 41(6), 391-407 (a less mathematical alternative to Deerwester's original LSA paper is: Landauer, T. K., Foltz, P. W., Laham, D. "An Introduction to Latent Semantic Analysis", Discourse Processes, 25, 259-284, 1998).

Bates, Marcia J. “The Design of Browsing and Berrypicking Techniques for the Online Search Interface”, Online Review 13 (October 1989): 407-424.

Robert Ash (1990) "Information Theory", Dover Publications, ISBN: 0486665216.

Information Retrieval as interaction. Evaluation of interactive systems.

Intro to Statistics for IR.

Case study: Rutgers at TREC, Interactive Track (here's the presentation from Interactive TREC 2002).

Preece et al, "Interaction design" and/or complementary website, esp. chapters on evaluation.

Any book or online tutorial on Statistics (concentrate on hypothesis testing, t-tests, ANOVA, Chi-square, correlation, ...); Hull, D. (1993) "Using statistical testing in the evaluation of retrieval experiments", SIGIR'93; Buckley, C. and Voorhees E.M. (2000) "Evaluating evaluation measure stability", SIGIR'00; Voorhees E.M. and Buckley, C. (2002) "The effect of topic set size on retrieval experiment error", SIGIR'02.

TREC, Rutgers' recent work at TREC (see Muresan's publications, and TREC 2003 webpage), and previous work (and actually the whole 37 (3), May 2001 issue of Information Processing and Management, focussing on Interactive TREC, would be interesting for people who choose to do an evaluation for the final project).

User interfaces for IR systems. Part I: Interaction models.

Chapter 10: “User Interfaces and Visualization” by Marti Hearst in “Modern Information Retrieval”.

Journal of the American Society of Information Science, vol. 43, issue 2, 1992, special issue on Human-Computer Interface: “Introduction and Overview” by Lunin and Harman, “Interfaces for end-user information seeking” by Gary Marchionini, “User-friendly systems instead of user-friendly front-ends” by Donna Harman, “Intelligent information retrieval: An introduction” by Susan Gauch, “Models for hypertext” by Mark F. Frisse and Steve B. Cousins; Muresan, G. and Harper, D. J. “Document Clustering and Language Models for System-Mediated Information Access”, ECDL’01, Darmstadt, p. 438-449.

Journal of the American Society of Information Science, vol 57, issue 6, 2006 - selection of papers on "Perspectives on Search User Interfaces: Best Practices and Future Visions".

Bates, M. (1990) “Where should the person stop and the information search interface start?” Information Processing and Management, v 26(5): 575-591.

O’Day, V. L. and Jeffries, R. “Orienteering in an information landscape: how information seekers get from here to there”, InterCHI’93, Amsterdam.

Brajnik, G., Mizzaro, S., Tasso, C. and Venuti, F. “Strategic Help in User Interfaces for Information Retrieval”, JASIST, 53(5), 2002, p. 343-358.

Campbell, I. “Supporting Information Needs by Ostensive Definition in an Adaptive Information Space”, MIRO’95, or "The Ostensive Model of Developing Information Needs", CoLIS, Copenhagen, 1996.

Ryen W. White, Ian Ruthven, Joemon M. Jose, C. J. Van Rijsbergen, "Evaluating implicit feedback models using searcher simulations", ACM Transactions on Information Systems, 23 (3): 325-361, July 2005;
Ryen W. White, Ian Ruthven "A study of interface support mechanisms for interactive information retrieval", JASIST, 57 (7), 2006.

User interfaces for IR systems. Part II: Tools and techniques.

Shneiderman, Ben, ch. “Information Search and Visualization” in “Designing the User Interface”, 3rd ed., 1997 (see webpage); Belkin, N.J., Marchetti, P.-G., Cool, C. (1993) BRAQUE: Design of an interface to support user interaction in information retrieval. Information Processing and Management, 29 (3): 325-344; Chalmers, M. and Chitson, P. “Bead: Exploration in information visualization”, SIGIR’92, p. 330-337; Nowell, L.T., France, R.K., Hix, D., Heath, L.S., Fox, E.A. (1996) “Visualizing search results: Some alternatives to query-document similarity”, SIGIR’96, p. 67-75; Williamson, C., Shneiderman, B. (1992) “The Dynamic HomeFinder: Evaluating dynamic queries in a real-estate information exploration system”, SIGIR’92, p. 338-346; Lin, Xia “Map displays for information retrieval”, JASIS, 48(1), 1997, p. 40-54; George Robertson (2000) "The Task Gallery: a 3D window manager", SIGCHI.

Korfhage, Robert R. “To see, or not to see - is that the query?”, SIGIR’91, p. 134-141;

Gary Marchionini, “Interfaces for end-user information seeking”, JASIS, 43(2), 1992.

Cutting, D. R., Pedersen, J. O., Karger, D. and Tukey, J. W. “Scatter/Gather: A cluster-based approach to browsing large document collections”, SIGIR’92, p. 318-329;

Hearst, M. and Karadi, C. "Cat-a-Cone: An Interactive Interface for Specifying Searches and Viewing Retrieval Results using a Large Category Hierarchy", SIGIR'97, Philadelphia, PA;

Chen, M., Hearst, M., Hong, J., and Lin, J. "Cha-Cha: A System for Organizing Intranet Search Results" in the Proceedings of the 2nd USENIX Symposium on Internet Technologies and Systems (USITS), Boulder, CO, 1999.

Human Computer Interaction (HCI). Preece, J., Rogers, Y. and Sharp, H. (2002) – “Interaction Design – Beyond Human-Computer Interaction” (and associated webpage).
Information Visualization. Spence, R. (2000) – “Information Visualization”, ISBN: 0201596261; Chen, C. (1999) – “Information Visualisation and Virtual Environments”, ISBN: 1852331364; Card, S. K., MacKinlay, J. D. and Shneiderman, B. (1999) – “Readings in Information Visualization: Using Vision to Think”, ISBN: 1558605339. Also, University of Maryland’s HCI Lab website, and InfoViz, a repository for IV.

Evaluation of IR systems. Experimental vs operational IR systems.

Evaluation of interactive IR systems. IR evaluation in context.

In Baeza-Yates & Ribeiro-Neto “Modern Information Retrieval”, ch.3: “Retrieval Evaluation”; in RB, ch.4: "Assessing the Retrieval".
In Sparck Jones & Willett, from Chapter 4, the "Introduction" and the articles by Saracevic, et al., Lancaster, and Harman;
Su, L. (1992) "Evaluation measures for interactive information retrieval", Information Processing and Management, 28(4): 503-516;
Harman, Donna “Overview of the first TREC conference”, SIGIR’93, Pittsburgh.

In JASIS, 47(1), January 1996, Special Issue: Evaluation of Information Retrieval :- Tague-Sutcliffe, J. M. – “Some perspectives on the evaluation of information retrieval systems”, Blair, D. C. – “STAIRS redux: Thoughts on the STAIRS evaluation, ten years after”, Hersh, W. et al. – “A task-oriented approach to information retrieval evaluation”; Ellis, D. – “The dilemma of measurement in information retrieval research”; Beaulieu, M. et al. – “Evaluating interactive systems in TREC”.

In Information Processing and Management, 31 (3), May-June 1995, Special issue: TREC :- Harman, D. - “Overview of the Second Text Retrieval Conference (TREC-2)”; Sparck Jones, K. – “Reflections on TREC”; Robertson, S. E. et al. – “Large Test Collection Experiments on an Operational, Interactive System: Okapi at TREC”; Belkin, N. et al. – “Combining the Evidence of Multiple Query Representations for Information Retrieval”.
Ellen M. Voorhees - "TREC: Improving Information Access through Evaluation" in the ASIST Bulletin, Vol. 32, No. 1, October/November 2005.
Ellen M. Voorhees and Donna Harman, "TREC: Experiment and Evaluation in Information Retrieval", MIT Press, 2005, ISBN 0-262-22073-3.

In Information Processing and Management, 36 (1), January 2000, Special issue: TREC :- Harman, D. - “Overview of the Sixth Text REtrieval Conference (TREC-6)”; Sparck Jones, K. – “Further reflections on TREC”; Robertson, S. E. et al. – “Experimentation as a way of life: Okapi at TREC”.

Borlund, P. and Ingwersen, P. (1997) “The development of a method for the evaluation of interactive information retrieval systems”, Journal of Documentation, 53(3).

In Information Processing and Management, 37 (3), May 2001, Special issue: Interactive TREC :- Hersh, W. and Over, P. - “Interactivity at the Text Retrieval Conference (TREC)”; Over, P. - “The TREC interactive track: an annotated bibliography”; Hersh et al. – “Challenging conventional assumptions of automated information retrieval with real users: Boolean searching and batch retrieval evaluations”; Belkin, N. et al. “Iterative exploration, design and evaluation of support for query reformulation in interactive information retrieval”; Allan, J. et al. – “Evaluating combinations of ranked lists and visualizations of inter-document similarity”; Wu, M. et al. – “Using clustering and classification approaches in interactive retrieval”; Larson, R. R. - “TREC interactive with Cheshire II”; Bodner, R. C. et al. – “The impact of text browsing on text retrieval performance”; Yang, K. - “Passage feedback with IRIS”.

The Text Retrieval Conference (TREC) webpage. In TREC2001, Belkin et al. “Rutgers' TREC 2001 Interactive Track Experience”. Nicholas J. Belkin and Gheorghe Muresan, "Measuring Web Search Effectiveness: Rutgers at Interactive TREC", in Measuring Web Search Effectiveness: The User Perspective, workshop at WWW 2004, May 2004, New York (paper, presentation).

Jakob Nielsen's Alertbox, March 1, 2004: "Risks of Quantitative Studies"

Saracevic, T. “Evaluation of Evaluation in Information Retrieval”, SIGIR’95.

Reid, J. “A Task-Oriented Non-Interactive Evaluation Methodology for Information Retrieval Systems”, Information Retrieval, 2(1), Feb 2000.

Ying Sun, Paul B. Kantor "Cross-Evaluation: A new model for information system evaluation", JASIST, 57(5): 614-628, March 2006.

Structure. Document and query structure. Links. Categorization vs. clustering. Filtering. XML & INEX.

vR, ch.3: “Automatic classification”;
SJW, from ch.6 the article by Griffiths, Luckhurst & Willett; from ch.8, the article by Hayes, Knecht and Cellio and the article by Rau;
Leuski, Anton "Evaluating Document Clustering for Interactive Information Retrieval", CIKM'01, 33-40;
Hearst, Marti “The Use of Categories and Clusters in Information Access Interfaces”, in Natural Language Information Retrieval, Strzalkowski (ed.), Kluwer Academic Publishers, 1999;
Sanderson, M. and Croft, W. B. “Deriving concept hierarchies from text”, SIGIR 1999, Berkeley; Hideo Joho et al (2002) "Hierarchical presentation of expansion terms", Proceedings of the 2002 ACM symposium on Applied Computing;
Tombros, A., Villa, R. and Van Rijsbergen, C. J. (2002) “The effectiveness of query-specific hierarchic clustering in information retrieval”, Information Processing and Management, 38(4);
Yang, Yiming “An Evaluation of Statistical Approaches to Text Categorization”, Information Retrieval 1, 1999, p.69-90;
Fabrizio Sebastiani (2002) "Machine learning in automated text categorization", ACM Computing Surveys (CSUR), 34(1):1-47.

Borlund, P. “Experimental Components for the evaluation of interactive information retrieval systems”, Journal of Documentation, Vol. 56, no. 1, 2000, 71-90.

Hearst, M. A. and Pedersen, J. O. “Reexamining the cluster hypothesis: scatter/gather on retrieval results”, SIGIR’96, Zurich, p. 76-84

Y. Kural, S. Robertson and S. Jones (2001) "Deciphering cluster representations", Information Processing and Management 37, 593-601.

Anastasios Tombros, C.J. van Rijsbergen (2004) "Query-sensitive similarity measures for information retrieval" (invited paper), Knowledge and Information Systems, 6(5):617-642, September 2004;
Anastasios Tombros, Robert Villa, C.J. van Rijsbergen (2002) "The effectiveness of query-specific hierarchic clustering in information retrieval", Information Processing & Management, 38(4):559-582, July 2002.

Weili Wu; Hui Xiong; Shekhar, S. (Eds.) (2004) "Clustering and Information Retrieval", ISBN: 1-4020-7682-7.
Berry, Michael W. (Ed.) (2004) "Survey of Text Mining: Clustering, Classification, and Retrieval", ISBN: 0-387-95563-1.

IR on the Web.

Journal of the American Society for Information Science and Technology, 53(2), 2002 - Special issue on Web research;
Almind, T. C. and Ingwersen, P. (1997) “Informetric Analysis on the World Wide Web: Methodological Approaches to Webometrics”, Journal of Documentation, 53(4);
Chu, H. and Rosenthal, M (1996) “Search Engines for the World Wide Web: A Comparative Study and Evaluation Methodology”, Proceedings of ASIS’96;
Mei Kobayashi and Koichi Takeda (2000) "Information retrieval on the web", ACM Computing Surveys (CSUR), 32(2); Arvind Arasu et al (2001) "Searching the Web", ACM Transactions on Internet Technology (TOIT), 1(1): 2-43.

Hao Chen and Susan Dumais (2000) “Bringing Order from Chaos: automatically categorizing search results”, SIGCHI; Susan Dumais and Hao Chen (2000) "Hierarchical classification of Web content", SIGIR; Susan Dumais, Edward Cutrell, Hao Chen (2001) "Optimizing search by showing results in context", SIGCHI; Ed H. Chi, Peter Pirolli, James Pitkow (2000), "The scent of a site: a system for analyzing and predicting information scent, usage, and usability of a Web site", SIGCHI.

Larry Page, Sergey Brin et al “PageRank: Bringing Order to the Web”, Stanford Uni. report (the model behind Google).
Krishna Bharat and Monika R. Henzinger (1998) "Improved algorithms for topic distillation in a hyperlinked environment", SIGIR.
Spink, Amanda (2002) “A user-centered approach to evaluating human interaction with Web search engines: an exploratory study”, Information Processing and Management, 38(3).
Ellis, D., Ford, N. and Furner, J. (1998) “In search of the unknown user: indexing, hypertext and the World Wide Web”, Journal of Documentation, 54(1).

Wendy Lucas, Heikki Topi (2004) "Training for Web search: Will it get you in shape?", JASIST, 55(13):1183-1198.

Search Engine Watch
How Internet Search Engines Work
Pew report: Internet and American life

Informetrics. Dietmar Wolfram, "Applied Informetrics for Information Retrieval Research", in New Directions in Information Management, no.36, Libraries Unlimited, 2003.

Artificial Intelligence

Machine Learning

Data Mining

Stuart J. Russell, Peter Norvig "Artificial Intelligence: A Modern Approach", Prentice Hall, ISBN: 0137903952.

Tom M. Mitchell (1997) "Machine Learning", McGraw-Hill, ISBN: 0070428077 (scan of ch.1).

Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000 [DHS 2000] (scans of ch.1 and ch.5).

Ian H. Witten, Eibe Frank (2000) "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations", Morgan-Kaufmann, ISBN: 1558605525;
Jiawei Han, Micheline Kamber (2000) "Data Mining: Concepts and Techniques", Morgan-Kaufmann, ISBN: 1558604898.

Fabrizio Sebastiani (2002) "Machine Learning in Automated Text Categorization", ACM Computing Surveys, Vol. 34, No. 1, March 2002, pp. 1–47.

Burges' SVM tutorial.

Cristianini, N., Shawe-Taylor, J. (2000) "An Introduction to Support Vector Machines", Cambridge University Press.

More resources in John Platt’s webpage.

Natural Language Processing in IR.

Information extraction.

Document summarization.

Robert Krovetz and W. Bruce Croft (1992) "Lexical ambiguity and information retrieval", ACM Transactions on Information Systems, 10(2):115-141;
Smeaton, A. F. (1997) "Using NLP or NLP Resources for Information Retrieval Tasks" In Strzalkowski, T., editor, Natural Language Information Retrieval. Kluwer Academic Publishers;
T. Strzalkowski, L. Guthrie, J. Karlgren, J. Leistensnider, F. Lin, J. Perez-Carballo, T. Straszheim, J. Wang and J. Wilding (1996) "Natural Language Information Retrieval: TREC-5 Report" In D. K. Harman, editor, Proceedings of the Fifth Text REtrieval Conference (TREC-5);
Voorhees, EM, "Natural Language Processing and Information Retrieval" in Pazienza, M.T. (ed.), Information Extraction: Towards Scalable, Adaptable Systems, New York: Springer, 1999, pp. 32-48.

Advanced topics: Multimedia IR (image, video, music, ...).

Eakins, J. P. and Graham, M. E. "Content-based Image Retrieval: A Report to the JISC Technology Applications Programme";
Joemon Jose (1998) An integrated approach for multimedia information retrieval, PhD thesis, School of Computing and Mathematical Sciences, Robert Gordon University, Aberdeen, UK;
Jana Urban and Joemon M. Jose and C.J. van Rijsbergen (2003) "An Adaptive Approach Towards Content-Based Image Retrieval", Proc. of the Third International Workshop on Content-Based Multimedia Indexing (CBMI'03), pages 119-126, Rennes, France.

ACM Transactions on Information Systems (TOIS), vol 22, no.1, January 2004 - Joseph A. Konstan: "Introduction to recommender systems: Algorithms and Evaluation"; J.L. Herlocker, J.A. Konstan, L.G. Terveen, J.T. Riedl: "Evaluating collaborative filtering recommender systems";

JASIST 55 (12), October 2004 - "Perspectives on ... Music Information Retrieval".

Personalization and user modeling.

Collaborative systems. Recommender systems.

Susan Dumais et al (2003) "Stuff I've seen: a system for personal information retrieval and re-use", SIGIR, 72-79.

Bruce, H. (2005): "Personal, anticipated information need", Information Research, 10(3) paper 232.
Jones, William (2004) "Finders, keepers: the present and future perfect in support of personal information management", First Monday, 9(3), March 2004.

AI and IR.

Agents.