The Information Retrieval Series

Chirag Shah · Gerald J. Kowalski · James Z. Wang · Marie-Francine Moens · Massimo Melucci · W. Bruce Croft

Latest release: March 16, 2023

Books

Information Retrieval Systems: Theory and Implementation

Book 1·Aug 2007

The growth of the Internet and the availability of enormous volumes of data in digital form have necessitated intense interest in techniques to assist the user in locating data of interest. The Internet has over 350 million pages of data and is expected to reach over one billion pages by the year 2000. Buried on the Internet are both valuable nuggets that answer questions and a large quantity of information the average person does not care about. The Digital Library effort is also progressing, with the goal of migrating from the traditional book environment to a digital library environment. The challenge, both for authors of new publications that will reside in this information domain and for developers of systems that locate information, is to provide the information and capabilities needed to sort out the non-relevant items from those the consumer wants. In effect, as we proceed down this path, it will be the computer, rather than the human being, that determines what we see. The days of going to a library and browsing the new-book shelf are being replaced by electronically searching the Internet or library catalogs. Whatever the search engines return will constrain our knowledge of what information is available. An understanding of Information Retrieval Systems puts this new environment into perspective for both the creator of documents and the consumer trying to locate information.

Cross-Language Information Retrieval

Book 2·Dec 2012

Most of the papers in this volume were first presented at the Workshop on Cross-Linguistic Information Retrieval, held on August 22, 1996, during the SIGIR'96 Conference. Alan Smeaton of Dublin University and Paraic Sheridan of the ETH, Zurich, were the two other members of the Scientific Committee for this workshop. SIGIR is the Association for Computing Machinery (ACM) Special Interest Group on Information Retrieval, and it has held conferences yearly since 1977. Three additional papers have been added: Chapter 4, "Distributed Cross-Lingual Information Retrieval", describes the EMIR retrieval system, one of the first general cross-language systems to be implemented and evaluated; Chapter 6, "Mapping Vocabularies Using Latent Semantic Indexing", which originally appeared as a technical report in the Laboratory for Computational Linguistics at Carnegie Mellon University in 1991, is included here because it was one of the earliest, though hard-to-find, publications showing the application of Latent Semantic Indexing to the problem of cross-language retrieval; and Chapter 10, "A Weighted Boolean Model for Cross Language Text Retrieval", describes a recent approach to solving the translation term weighting problem specific to Cross-Language Information Retrieval.

Gregory Grefenstette

Contributors: Lisa Ballesteros and W. Bruce Croft (Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts); David Hull and Gregory Grefenstette (Xerox Research Centre Europe, Grenoble Laboratory); Mark W. Davis (Computing Research Lab, New Mexico State University); Thomas K. Landauer (Department of Psychology and Institute of Cognitive Science, University of Colorado, Boulder); Michael L. Littman; Bonnie J.

Text Retrieval and Filtering: Analytic Models of Performance

Book 3·Dec 2012

Text Retrieval and Filtering: Analytical Models of Performance is the first book that addresses the problem of analytically computing the performance of retrieval and filtering systems. The book describes means by which retrieval may be studied analytically, allowing one to describe current performance, predict future performance, and understand why systems perform as they do. The focus is on retrieving and filtering natural language text, with material addressing retrieval performance for the simple case of queries with a single term, the more complex case with multiple terms (both with term independence and with term dependence), and the use of grammatical information to improve performance. Unambiguous statements of the conditions under which one method or system will be more effective than another are developed.
Text Retrieval and Filtering: Analytical Models of Performance focuses on the performance of systems that retrieve natural language text, considering full sentences as well as phrases and individual words. The last chapter explicitly addresses how grammatical constructs and methods may be studied in the context of retrieval or filtering system performance. The book builds toward solving this problem, although the material in earlier chapters is as useful to those addressing non-linguistic, statistical concerns as it is to linguists. Those interested in grammatical information should be cautioned to carefully examine earlier chapters, especially Chapters 7 and 8, which discuss purely statistical relationships between terms, before moving on to Chapter 10, which explicitly addresses linguistic issues.
Text Retrieval and Filtering: Analytical Models of Performance is suitable as a secondary text for a graduate-level course on Information Retrieval or Linguistics, and as a reference for researchers and practitioners in industry.

Information Retrieval: Uncertainty and Logics: Advanced Models for the Representation and Retrieval of Information

Book 4·Dec 2012

In recent years, there have been several attempts to define a logic for information retrieval (IR). The aim was to provide a rich and uniform representation of information and its semantics with the goal of improving retrieval effectiveness. The basis of a logical model for IR is the assumption that queries and documents can be represented effectively by logical formulae. To retrieve a document, an IR system has to infer the formula representing the query from the formula representing the document. This logical interpretation of query and document emphasizes that relevance in IR is an inference process.
The use of logic to build IR models enables one to obtain models that are more general than earlier well-known IR models. Indeed, some logical models are able to represent within a uniform framework various features of IR systems such as hypermedia links, multimedia data, and users' knowledge. Logic also provides a common approach to the integration of IR systems with logical database systems. Finally, logic makes it possible to reason about an IR model and its properties. This latter possibility is becoming increasingly important, since conventional evaluation methods, although good indicators of the effectiveness of IR systems, often give results that cannot be predicted or, for that matter, satisfactorily explained.
However, logic by itself cannot fully model IR. The success or failure of the inference of the query formula from the document formula is not enough to model relevance in IR. It is necessary to take into account the uncertainty inherent in such an inference process. In 1986, Van Rijsbergen proposed the logical uncertainty principle to model relevance as an uncertain inference process. When proposing the principle, Van Rijsbergen was not specific about which logic and which uncertainty theory to use. As a consequence, various logics and uncertainty theories have been proposed and investigated. The choice of an appropriate logic and uncertainty mechanism has been a main research theme in logical IR modeling, leading to a number of logical IR models over the years.
Information Retrieval: Uncertainty and Logics contains a collection of exciting papers proposing, developing and implementing logical IR models. This book is appropriate for use as a text for a graduate-level course on Information Retrieval or Database Systems, and as a reference for researchers and practitioners in industry.

Document Computing: Technologies for Managing Electronic Document Collections

Book 5·Dec 2012

Document Computing: Technologies for Managing Electronic Document Collections discusses the important aspects of document computing and recommends technologies and techniques for document management, with an emphasis on the processes that are appropriate when computers are used to create, access, and publish documents. This book includes descriptions of the nature of documents, their components and structure, and how they can be represented; examines how documents are used and controlled; explores the issues and factors affecting design and implementation of a document management strategy; and gives a detailed case study. The analysis and recommendations are grounded in the findings of the latest research.
Document Computing: Technologies for Managing Electronic Document Collections brings together concepts, research, and practice from diverse areas including document computing, information retrieval, librarianship, records management, and business process re-engineering. It will be of value to anyone working in these areas, whether as a researcher, a developer, or a user.
Document Computing: Technologies for Managing Electronic Document Collections can be used for graduate classes in document computing and related fields, by developers and integrators of document management systems and document management applications, and by anyone wishing to understand the processes of document management.

Automatic Indexing and Abstracting of Document Texts

Book 6·Dec 2005

Automatic Indexing and Abstracting of Document Texts summarizes the latest techniques of automatic indexing and abstracting, and the results of their application. It also places the techniques in the context of the study of text, manual indexing and abstracting, and the use of the indexing descriptions and abstracts in systems that select documents or information from large collections. Important sections of the book consider the development of new techniques for indexing and abstracting. These techniques include using text grammars; learning the themes of texts, including identifying representative sentences or paragraphs by means of suitable clustering algorithms; and learning classification patterns of texts. In addition, the book attempts to illuminate new avenues for future research.
Automatic Indexing and Abstracting of Document Texts is an excellent reference for researchers and professionals working in the field of content management and information retrieval.

Advances in Information Retrieval: Recent Research from the Center for Intelligent Information Retrieval

Book 7·Apr 2006

The Center for Intelligent Information Retrieval (CIIR) was formed in the Computer Science Department of the University of Massachusetts, Amherst, in 1992. The core support for the Center came from a National Science Foundation State/Industry/University Cooperative Research Center (S/IUCRC) grant, although there had been a sizeable information retrieval (IR) research group for over 10 years prior to that grant. The basic goal of these Centers is to combine basic research, applied research, and technology transfer. The CIIR has been successful in each of these areas, in that it has produced over 270 research papers, has been involved in many successful government and industry collaborations, and has had a significant role in high-visibility Internet sites and start-ups. As a result of these efforts, the CIIR has become known internationally as one of the leading research groups in the area of information retrieval. The CIIR focuses on research that results in more effective and efficient access and discovery in large, heterogeneous, distributed, text and multimedia databases. The scope of the work that is done in the CIIR is broad and goes significantly beyond “traditional” areas of information retrieval such as retrieval models, cross-lingual search, and automatic query expansion. The research includes both low-level systems issues such as the design of protocols and architectures for distributed search, as well as more human-centered topics such as user interface design, visualization and data mining with text, and multimedia retrieval.

Information Storage and Retrieval Systems: Theory and Implementation, Edition 2

Book 8·Nov 2005

Chapter 1 places into perspective a total Information Storage and Retrieval System. This perspective introduces new challenges to the problems that need to be theoretically addressed and commercially implemented. Ten years ago, commercial implementation of the algorithms being developed was not realistic, allowing theoreticians to limit their focus to very specific areas. Bounding a problem is still essential in deriving theoretical results. But the commercialization and insertion of this technology into widely used systems like the Internet changes the way problems are bounded. From a theoretical perspective, efficient scalability of algorithms to systems with gigabytes and terabytes of data, operating with minimal user search statement information, and making maximum use of all functional aspects of an information system need to be considered. Dissemination systems that use persistent indexes or mail files to modify ranking algorithms, and systems that combine the search of structured information fields and free text into a consolidated weighted output, are examples of potential new areas of investigation. The best way for the theoretician or the commercial developer to understand the importance of problems to be solved is to place them in the context of a total vision of a complete system. Understanding the differences between Digital Libraries and Information Retrieval Systems will add an additional dimension to the potential future development of systems. The collaborative aspects of digital libraries can be viewed as a new source of information that could interact dynamically with information retrieval techniques.

Perspectives on Content-Based Multimedia Systems

Book 9·Apr 2006

Multimedia data comprising images, audio, and video is becoming increasingly common. The decreasing costs of consumer electronic devices such as digital cameras and digital camcorders, along with the ease of transportation facilitated by the Internet, have led to a phenomenal rise in the amount of multimedia data generated and distributed. Given that this trend of increased use of multimedia data is likely to accelerate, there is an urgent need for a clear means of capturing, storing, indexing, retrieving, analyzing, and summarizing such data.
Content-based access to multimedia data is of primary importance since it is the natural way by which human beings interact with such information. To facilitate content-based access to multimedia information, the first step is to derive feature measures from these data so that a feature space representation of the data content can be formed. This can subsequently allow for mapping the feature space to the symbol space (semantics), either automatically or through human intervention. Thus, signal-to-symbol mapping, useful for any practical system, can be successfully achieved.
Perspectives on Content-Based Multimedia Systems provides a comprehensive set of techniques to tackle these important issues. This book offers detailed solutions to a wide range of practical problems in building real systems by providing specifics of three systems built by the authors. While providing a systems focus, it also equips the reader with a keen understanding of the fundamental issues, including a formalism for content-based multimedia database systems, multimedia feature extraction, object-based techniques, signature-based techniques, and fuzzy retrieval techniques. The performance evaluation issues of practical systems are also explained. This book brings together essential elements of building a content-based multimedia database system in a way that makes them accessible to practitioners in computer science and electrical engineering. It can also serve as a textbook for graduate-level courses.

Mining the World Wide Web: An Information Search Approach

Book 10·Dec 2012

Rating: 4.0

Mining the World Wide Web: An Information Search Approach explores the concepts and techniques of Web mining, a promising and rapidly growing field of computer science research. Web mining is a multidisciplinary field, drawing on such areas as artificial intelligence, databases, data mining, data warehousing, data visualization, information retrieval, machine learning, markup languages, pattern recognition, statistics, and Web technology. Mining the World Wide Web presents the Web mining material from an information search perspective, focusing on issues relating to the efficiency, feasibility, scalability and usability of searching techniques for Web mining.
Mining the World Wide Web is designed for researchers and developers of Web information systems and also serves as an excellent supplemental reference to advanced level courses in data mining, databases and information retrieval.

Integrated Region-Based Image Retrieval

Book 11·Dec 2012

Content-based image retrieval is the set of techniques for retrieving relevant images from an image database on the basis of automatically derived image features. The need for efficient content-based image retrieval has increased tremendously in many application areas such as biomedicine, the military, commerce, education, and Web image classification and searching. In the biomedical domain, content-based image retrieval can be used in patient digital libraries, clinical diagnosis, searching of 2-D electrophoresis gels, and pathology slides.

I started my work on content-based image retrieval in 1995 when I was with Stanford University. The project was initiated by the Stanford University Libraries and later funded by a research grant from the National Science Foundation. The goal was to design and implement a computer system capable of indexing and retrieving large collections of digitized multimedia data available in the libraries based on the media contents. At the time, it seemed reasonable to me that I should discover the solution to the image retrieval problem during the project. Experience has certainly demonstrated how far we are as yet from solving this basic problem.

Topic Detection and Tracking: Event-based Information Organization

Book 12·Dec 2012

Topic Detection and Tracking: Event-based Information Organization brings together in one place state-of-the-art research in Topic Detection and Tracking (TDT). This collection of technical papers from leading researchers in the field not only provides several chapters devoted to the research program and its evaluation paradigm, but also presents the most current research results and describes some of the remaining open challenges. Topic Detection and Tracking: Event-based Information Organization is an excellent reference for researchers and practitioners in a variety of fields related to TDT, including information retrieval, automatic speech recognition, machine learning, and information extraction.

Language Modeling for Information Retrieval

Book 13·Apr 2013

A statistical language model, or more simply a language model, is a probabilistic mechanism for generating text. Such a definition is general enough to include an endless variety of schemes. However, a distinction should be made between generative models, which can in principle be used to synthesize artificial text, and discriminative techniques to classify text into predefined categories. The first statistical language modeler was Claude Shannon. In exploring the application of his newly founded theory of information to human language, Shannon considered language as a statistical source, and measured how well simple n-gram models predicted or, equivalently, compressed natural text. To do this, he estimated the entropy of English through experiments with human subjects, and also estimated the cross-entropy of the n-gram models on natural text. The ability of language models to be quantitatively evaluated in this way is one of their important virtues. Of course, estimating the true entropy of language is an elusive goal, aiming at many moving targets, since language is so varied and evolves so quickly. Yet fifty years after Shannon's study, language models remain, by all measures, far from the Shannon entropy limit in terms of their predictive power. However, this has not kept them from being useful for a variety of text processing tasks, and the remaining gap can be viewed as encouragement that there is still great room for improvement in statistical language modeling.
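
To make the cross-entropy measure mentioned above concrete, here is a minimal Python sketch; it is not taken from the book, and the toy corpus, the add-one (Laplace) smoothing, the reserved unknown-word slot, and the function names (`train_bigram`, `bigram_prob`, `cross_entropy`) are all illustrative assumptions. It estimates a bigram model from training tokens and reports the average of -log2 P(w_i | w_{i-1}) over held-out text, in bits per word; a lower value means the model predicts, and could therefore compress, the text better.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Estimate bigram counts from a list of training tokens.

    Returns (history_counts, bigram_counts, vocabulary). History counts
    exclude the final token, since it is never followed by another word.
    """
    history = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))
    return history, bigrams, set(tokens)


def bigram_prob(model, prev, word):
    """Add-one (Laplace) smoothed P(word | prev).

    One extra vocabulary slot is reserved for unseen words (an assumption),
    so any word outside the training vocabulary gets the 'unknown' probability.
    """
    history, bigrams, vocab = model
    v = len(vocab) + 1  # +1 for the unknown-word slot
    return (bigrams[(prev, word)] + 1) / (history[prev] + v)


def cross_entropy(model, tokens):
    """Average -log2 P(w_i | w_{i-1}) over held-out bigrams, in bits per word."""
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        raise ValueError("need at least two tokens to score bigrams")
    total = sum(-math.log2(bigram_prob(model, p, w)) for p, w in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    train = "the cat sat on the mat the dog sat on the rug".split()
    held_out = "the cat sat on the rug".split()
    model = train_bigram(train)
    _, _, vocab = model
    print(f"cross-entropy: {cross_entropy(model, held_out):.3f} bits/word")
    print(f"uniform baseline: {math.log2(len(vocab) + 1):.3f} bits/word")
```

On this toy data the model scores about 2.16 bits per word, below the 3-bit baseline of a uniform guess over the eight vocabulary slots. Add-one smoothing is used here only because it is the simplest way to avoid zero probabilities; it is a crude choice compared with the smoothing methods used in practice.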

Machine Learning and Statistical Modeling Approaches to Image Retrieval

Book 14·Apr 2006

In the early 1990s, the establishment of the Internet brought forth a revolutionary viewpoint of information storage, distribution, and processing: the World Wide Web is becoming an enormous and expanding distributed digital library. Along with the development of the Web, image indexing and retrieval have grown into research areas sharing a vision of intelligent agents. Far beyond Web searching, image indexing and retrieval can potentially be applied to many other areas, including biomedicine, space science, biometric identification, digital libraries, the military, education, commerce, culture and entertainment.
Machine Learning and Statistical Modeling Approaches to Image Retrieval describes several approaches to integrating machine learning and statistical modeling into an image retrieval and indexing system, with promising results. The topics of this book reflect the authors' experience with machine learning and statistical modeling based image indexing and retrieval. This book also contains detailed references for further reading and research in this field.

Information Retrieval: Algorithms and Heuristics, Edition 2

Book 15·Nov 2012

Interested in how an efficient search engine works? Want to know what algorithms are used to rank resulting documents in response to user requests? The authors answer these and other key information retrieval design and implementation questions.

This book is not yet another high-level text. Instead, algorithms are thoroughly described, making this book ideally suited for both computer science students and practitioners who work on search-related applications. As stated in the foreword, this book provides a current, broad, and detailed overview of the field and is the only one that does so. Examples are used throughout to illustrate the algorithms.

The authors explain how a query is ranked against a document collection using either a single or a combination of retrieval strategies, and how an assortment of utilities are integrated into the query processing scheme to improve these rankings. Methods for building and compressing text indexes, querying and retrieving documents in multiple languages, and using parallel or distributed processing to expedite the search are likewise described.

This edition is a major expansion of the one published in 1998. Besides updating the entire book with current techniques, it includes new sections on language models, cross-language information retrieval, peer-to-peer processing, XML search, mediators, and duplicate document detection.

Charting a New Course: Natural Language Processing and Information Retrieval: Essays in Honour of Karen Spärck Jones

Book 16·Aug 2005

Karen Spärck Jones is one of the major figures of 20th- and early 21st-century computing and information processing. Her ideas have had an important influence on the development of Internet search engines. Her contribution has been recognized by awards from the natural language processing, information retrieval and artificial intelligence communities, including being asked to present the prestigious Grace Hopper lecture. She continues to be an active and influential researcher. Her contribution to the scientific evaluation of the effectiveness of such computer systems has been quite outstanding.

This book celebrates the life and work of Karen Spärck Jones in her seventieth year. It consists of fifteen new and original chapters written by leading international authorities reviewing the state of the art and her influence in the areas in which Karen Spärck Jones has been active. Although she has a publication record that goes back over forty years, it is clear that even the very early work reviewed in the book can be read with profit by those working on recent developments in information processing such as bioinformatics and the semantic web.

Intelligent Document Retrieval: Exploiting Markup Structure

Book 17·Jan 2006

Collections of digital documents can nowadays be found everywhere in institutions, universities or companies. Examples are Web sites or intranets. But searching them for information can still be painful. Searches often return either large numbers of matches or no suitable matches at all.

Such document collections can vary a lot in size and in how much structure they carry. What they have in common is that they typically do have some structure and that they cover a limited range of topics. The second point distinguishes them significantly from documents on the Web in general.

The type of search system that we propose in this book can suggest ways of refining or relaxing the query to assist a user in the search process. In order to suggest sensible query modifications we would need to know what the documents are about. Explicit knowledge about the document collection encoded in some electronic form is what we need. However, typically such knowledge is not available.

This book describes how that knowledge can be constructed automatically.

This book demonstrates how document markup structure can be used to construct domain models for collections of partially structured documents, shows how such knowledge can be utilized when searching the document collections, and presents two implemented search systems which demonstrate the usefulness of this approach.

The Turn: Integration of Information Seeking and Retrieval in Context

Book 18·Nov 2005

The Turn analyzes the research of information seeking and retrieval (IS&R) and proposes a new direction of integrating research in these two areas: the fields should turn off their separate and narrow paths and construct a new avenue of research. An essential direction for this avenue is context as given in the subtitle Integration of Information Seeking and Retrieval in Context. Other essential themes in the book include:

IS&R research models, frameworks and theories; search and work tasks and situations in context; interaction between humans and machines; information acquisition, relevance and information use; research design and methodology based on a structured set of explicit variables - all set into the holistic cognitive approach. The present monograph invites the reader into a construction project - there is much research to do for a contextual understanding of IS&R.

The Turn represents a wide-ranging perspective on IS&R by providing a novel and unique research framework, covering both individual and social aspects of information behavior, including the generation, searching, retrieval and use of information. Regarding traditional laboratory information retrieval research, the monograph proposes the extension of research toward actors, search and work tasks, IR interaction and utility of information. Regarding traditional information seeking research, it proposes the extension toward information access technology and work task contexts.

The Turn is the first synthesis of research in the broad area of IS&R, ranging from systems-oriented laboratory IR research to social-science-oriented information seeking studies.

New Directions in Cognitive Information Retrieval

Book 19·Aug 2006

Rating: 5.0

New Directions in Cognitive Information Retrieval presents an exciting new direction for cognitive-oriented information retrieval (IR) research, a direction based on an analysis of the user’s problem situation and cognitive behavior when using the IR system. This contrasts with the current dominant IR research paradigm, which concentrates on improving IR system matching performance.

The chapters describe the leading edge concepts and models of cognitive IR that explore the nexus between human cognition, information and the social conditions that drive humans to seek information using IR systems. Chapter topics include: Polyrepresentation, cognitive overlap and the boomerang effect, Multitasking while conducting the search, Knowledge Diagram Visualizations of the topic space to facilitate user assimilation of information, Task, relevance, selection state, knowledge need and knowledge behavior, search training built into the search, children’s collaboration for school projects, and other cognitive perspectives on IR concepts and issues.

This book is directly relevant to information scientists, librarians, social scientists and computer scientists interested in Human Computer Interaction (HCI) usability issues, as well as to undergraduate and graduate students and academics.

Computing Attitude and Affect in Text: Theory and Applications

Book 20·Jan 2006

Rating: 2.0

Human Language Technology (HLT) and Natural Language Processing (NLP) systems have typically focused on the “factual” aspect of content analysis. Other aspects, including pragmatics, opinion, and style, have received much less attention. However, to achieve an adequate understanding of a text, these aspects cannot be ignored. The chapters in this book address the aspect of subjective opinion, which includes identifying different points of view, identifying different emotive dimensions, and classifying text by opinion. Various conceptual models and computational methods are presented. The models explored in this book include the following: distinguishing attitudes from simple factual assertions; distinguishing the author’s reports from reports of other people’s opinions; and distinguishing between explicitly and implicitly stated attitudes. In addition, many applications are described that promise to benefit from the ability to understand attitudes and affect, including indexing and retrieval of documents by opinion; automatic question answering about opinions; analysis of sentiment in the media and in discussion groups about consumer products, political issues, etc.; brand and reputation management; discovering and predicting consumer and voting trends; analyzing client discourse in therapy and counseling; determining relations between scientific texts by finding reasons for citations; generating more appropriate texts and making agents more believable; and creating writers’ aids. The studies reported here are carried out on different languages such as English, French, Japanese, and Portuguese. Difficult challenges remain, however. It can be argued that analyzing attitude and affect in text is an “NLP”-complete problem.

Free of charge

Entity-Oriented Search

Krisztian Balog

Book 39•Computers & technology

Free

Learning to Quantify

Andrea Esuli

Book 47•Computers & technology

Rating: 5.0

Free