The Information Retrieval Series

Latest release: March 16, 2023
Series
43
Books
Cross-Language Information Retrieval
Book 2·Dec 2012
0.0
·
Most of the papers in this volume were first presented at the Workshop on Cross-Linguistic Information Retrieval that was held August 22, 1996 during the SIGIR'96 Conference. Alan Smeaton of Dublin University and Paraic Sheridan of the ETH, Zurich, were the two other members of the Scientific Committee for this workshop. SIGIR is the Association for Computing Machinery (ACM) Special Interest Group on Information Retrieval, which has held conferences yearly since 1977. Three additional papers have been added: Chapter 4, Distributed Cross-Lingual Information Retrieval, describes the EMIR retrieval system, one of the first general cross-language systems to be implemented and evaluated; Chapter 6, Mapping Vocabularies Using Latent Semantic Indexing, which originally appeared as a technical report in the Laboratory for Computational Linguistics at Carnegie Mellon University in 1991, is included here because it was one of the earliest, though hard-to-find, publications showing the application of Latent Semantic Indexing to the problem of cross-language retrieval; and Chapter 10, A Weighted Boolean Model for Cross-Language Text Retrieval, describes a recent approach to solving the translation term weighting problem, specific to Cross-Language Information Retrieval. Gregory Grefenstette. Contributors: Lisa Ballesteros and W. Bruce Croft (Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts); David Hull and Gregory Grefenstette (Xerox Research Centre Europe, Grenoble Laboratory); Thomas K. Landauer (Department of Psychology and Institute of Cognitive Science, University of Colorado, Boulder); Mark W. Davis (Computing Research Lab, New Mexico State University); Michael L. Littman; Bonnie J.
Text Retrieval and Filtering: Analytical Models of Performance
Book 3·Dec 2012
0.0
·
Text Retrieval and Filtering: Analytical Models of Performance is the first book that addresses the problem of analytically computing the performance of retrieval and filtering systems. The book describes means by which retrieval may be studied analytically, allowing one to describe current performance, predict future performance, and to understand why systems perform as they do. The focus is on retrieving and filtering natural language text, with material addressing retrieval performance for the simple case of queries with a single term, the more complex case with multiple terms, both with term independence and term dependence, and for the use of grammatical information to improve performance. Unambiguous statements of the conditions under which one method or system will be more effective than another are developed.
Text Retrieval and Filtering: Analytical Models of Performance focuses on the performance of systems that retrieve natural language text, considering full sentences as well as phrases and individual words. The last chapter explicitly addresses how grammatical constructs and methods may be studied in the context of retrieval or filtering system performance. The book builds toward solving this problem, although the material in earlier chapters is as useful to those addressing non-linguistic, statistical concerns as it is to linguists. Those interested in grammatical information should be cautioned to carefully examine earlier chapters, especially Chapters 7 and 8, which discuss purely statistical relationships between terms, before moving on to Chapter 10, which explicitly addresses linguistic issues.
Text Retrieval and Filtering: Analytical Models of Performance is suitable as a secondary text for a graduate level course on Information Retrieval or Linguistics, and as a reference for researchers and practitioners in industry.
Information Retrieval: Uncertainty and Logics: Advanced Models for the Representation and Retrieval of Information
Book 4·Dec 2012
0.0
·
In recent years, there have been several attempts to define a logic for information retrieval (IR). The aim was to provide a rich and uniform representation of information and its semantics with the goal of improving retrieval effectiveness. The basis of a logical model for IR is the assumption that queries and documents can be represented effectively by logical formulae. To retrieve a document, an IR system has to infer the formula representing the query from the formula representing the document. This logical interpretation of query and document emphasizes that relevance in IR is an inference process.
The use of logic to build IR models enables one to obtain models that are more general than earlier well-known IR models. Indeed, some logical models are able to represent within a uniform framework various features of IR systems such as hypermedia links, multimedia data, and the user's knowledge. Logic also provides a common approach to the integration of IR systems with logical database systems. Finally, logic makes it possible to reason about an IR model and its properties. This last possibility is becoming increasingly important, since conventional evaluation methods, although good indicators of the effectiveness of IR systems, often give results that cannot be predicted or, for that matter, satisfactorily explained.
However, logic by itself cannot fully model IR. The success or the failure of the inference of the query formula from the document formula is not enough to model relevance in IR. It is necessary to take into account the uncertainty inherent in such an inference process. In 1986, Van Rijsbergen proposed the uncertainty logical principle to model relevance as an uncertain inference process. When proposing the principle, Van Rijsbergen was not specific about which logic and which uncertainty theory to use. As a consequence, various logics and uncertainty theories have been proposed and investigated. The choice of an appropriate logic and uncertainty mechanism has been a main research theme in logical IR modeling leading to a number of logical IR models over the years.
Information Retrieval: Uncertainty and Logics contains a collection of exciting papers proposing, developing and implementing logical IR models. This book is appropriate for use as a text for a graduate-level course on Information Retrieval or Database Systems, and as a reference for researchers and practitioners in industry.
Document Computing: Technologies for Managing Electronic Document Collections
Book 5·Dec 2012
0.0
·
Document Computing: Technologies for Managing Electronic Document Collections discusses the important aspects of document computing and recommends technologies and techniques for document management, with an emphasis on the processes that are appropriate when computers are used to create, access, and publish documents. This book includes descriptions of the nature of documents, their components and structure, and how they can be represented; examines how documents are used and controlled; explores the issues and factors affecting design and implementation of a document management strategy; and gives a detailed case study. The analysis and recommendations are grounded in the findings of the latest research.
Document Computing: Technologies for Managing Electronic Document Collections brings together concepts, research, and practice from diverse areas including document computing, information retrieval, librarianship, records management, and business process re-engineering. It will be of value to anyone working in these areas, whether as a researcher, a developer, or a user.
Document Computing: Technologies for Managing Electronic Document Collections can be used for graduate classes in document computing and related fields, by developers and integrators of document management systems and document management applications, and by anyone wishing to understand the processes of document management.
Advances in Information Retrieval: Recent Research from the Center for Intelligent Information Retrieval
Book 7·Apr 2006
0.0
·
The Center for Intelligent Information Retrieval (CIIR) was formed in the Computer Science Department of the University of Massachusetts, Amherst in 1992. The core support for the Center came from a National Science Foundation State/Industry/University Cooperative Research Center (S/IUCRC) grant, although there had been a sizeable information retrieval (IR) research group for over 10 years prior to that grant. The basic goal of these Centers is to combine basic research, applied research, and technology transfer. The CIIR has been successful in each of these areas, in that it has produced over 270 research papers, has been involved in many successful government and industry collaborations, and has had a significant role in high-visibility Internet sites and start-ups. As a result of these efforts, the CIIR has become known internationally as one of the leading research groups in the area of information retrieval. The CIIR focuses on research that results in more effective and efficient access and discovery in large, heterogeneous, distributed, text and multimedia databases. The scope of the work that is done in the CIIR is broad and goes significantly beyond “traditional” areas of information retrieval such as retrieval models, cross-lingual search, and automatic query expansion. The research includes both low-level systems issues such as the design of protocols and architectures for distributed search, as well as more human-centered topics such as user interface design, visualization and data mining with text, and multimedia retrieval.
Information Storage and Retrieval Systems: Theory and Implementation, Edition 2
Book 8·Nov 2005
0.0
·
Chapter 1 places into perspective a total Information Storage and Retrieval System. This perspective introduces new challenges to the problems that need to be theoretically addressed and commercially implemented. Ten years ago commercial implementation of the algorithms being developed was not realistic, allowing theoreticians to limit their focus to very specific areas. Bounding a problem is still essential in deriving theoretical results. But the commercialization and insertion of this technology into widely used systems such as the Internet changes the way problems are bounded. From a theoretical perspective, efficient scalability of algorithms to systems with gigabytes and terabytes of data, operating with minimal user search statement information, and making maximum use of all functional aspects of an information system need to be considered. Dissemination systems that use persistent indexes or mail files to modify ranking algorithms, and systems that combine the search of structured information fields and free text into a consolidated weighted output, are examples of potential new areas of investigation. The best way for the theoretician or the commercial developer to understand the importance of problems to be solved is to place them in the context of a total vision of a complete system. Understanding the differences between Digital Libraries and Information Retrieval Systems adds a further dimension to the potential future development of systems. The collaborative aspects of digital libraries can be viewed as a new source of information that could interact dynamically with information retrieval techniques.
Perspectives on Content-Based Multimedia Systems
Book 9·Apr 2006
0.0
·
Multimedia data comprising images, audio and video is becoming increasingly common. The decreasing costs of consumer electronic devices such as digital cameras and digital camcorders, along with the ease of transportation facilitated by the Internet, have led to a phenomenal rise in the amount of multimedia data generated and distributed. Given that this trend of increased use of multimedia data is likely to accelerate, there is an urgent need for providing a clear means of capturing, storing, indexing, retrieving, analyzing and summarizing such data.
Content-based access to multimedia data is of primary importance since it is the natural way by which human beings interact with such information. To facilitate the content-based access of multimedia information, the first step is to derive feature measures from these data so that a feature space representation of the data content can be formed. This can subsequently allow for mapping the feature space to the symbol space (semantics) either automatically or through human intervention. Thus, signal to symbol mapping, useful for any practical system, can be successfully achieved.
Perspectives on Content-Based Multimedia Systems provides a comprehensive set of techniques to tackle these important issues. This book offers detailed solutions to a wide range of practical problems in building real systems by providing specifics of three systems built by the authors. While providing a systems focus, it also equips the reader with a keen understanding of the fundamental issues, including a formalism for content-based multimedia database systems, multimedia feature extraction, object-based techniques, signature-based techniques and fuzzy retrieval techniques. The performance evaluation issues of practical systems are also explained. This book brings together essential elements of building a content-based multimedia database system in a way that makes them accessible to practitioners in computer science and electrical engineering. It can also serve as a textbook for graduate-level courses.
Language Modeling for Information Retrieval
Book 13·Apr 2013
0.0
·
A statistical language model, or more simply a language model, is a probabilistic mechanism for generating text. Such a definition is general enough to include an endless variety of schemes. However, a distinction should be made between generative models, which can in principle be used to synthesize artificial text, and discriminative techniques to classify text into predefined categories. The first statistical language modeler was Claude Shannon. In exploring the application of his newly founded theory of information to human language, Shannon considered language as a statistical source, and measured how well simple n-gram models predicted or, equivalently, compressed natural text. To do this, he estimated the entropy of English through experiments with human subjects, and also estimated the cross-entropy of the n-gram models on natural text. The ability of language models to be quantitatively evaluated in this way is one of their important virtues. Of course, estimating the true entropy of language is an elusive goal, aiming at many moving targets, since language is so varied and evolves so quickly. Yet fifty years after Shannon's study, language models remain, by all measures, far from the Shannon entropy limit in terms of their predictive power. However, this has not kept them from being useful for a variety of text processing tasks, and moreover can be viewed as encouragement that there is still great room for improvement in statistical language modeling.
The Turn: Integration of Information Seeking and Retrieval in Context
Book 18·Nov 2005
0.0
·
The Turn analyzes the research of information seeking and retrieval (IS&R) and proposes a new direction of integrating research in these two areas: the fields should turn off their separate and narrow paths and construct a new avenue of research. An essential direction for this avenue is context as given in the subtitle Integration of Information Seeking and Retrieval in Context. Other essential themes in the book include:

IS&R research models, frameworks and theories; search and work tasks and situations in context; interaction between humans and machines; information acquisition, relevance and information use; research design and methodology based on a structured set of explicit variables - all set into the holistic cognitive approach. The present monograph invites the reader into a construction project - there is much research to do for a contextual understanding of IS&R.

The Turn offers a wide-ranging perspective on IS&R by providing a novel research framework covering both individual and social aspects of information behavior, including the generation, searching, retrieval and use of information. Regarding traditional laboratory information retrieval research, the monograph proposes the extension of research toward actors, search and work tasks, IR interaction and utility of information. Regarding traditional information seeking research, it proposes the extension toward information access technology and work task contexts.

The Turn is the first synthesis of research in the broad area of IS&R ranging from systems oriented laboratory IR research to social science oriented information seeking studies.

Computing Attitude and Affect in Text: Theory and Applications
Book 20·Jan 2006
2.0
·
Human Language Technology (HLT) and Natural Language Processing (NLP) systems have typically focused on the “factual” aspect of content analysis. Other aspects, including pragmatics, opinion, and style, have received much less attention. However, to achieve an adequate understanding of a text, these aspects cannot be ignored. The chapters in this book address the aspect of subjective opinion, which includes identifying different points of view, identifying different emotive dimensions, and classifying text by opinion. Various conceptual models and computational methods are presented. The models explored in this book include the following: distinguishing attitudes from simple factual assertions; distinguishing the author’s reports from reports of other people’s opinions; and distinguishing between explicitly and implicitly stated attitudes. In addition, many applications are described that promise to benefit from the ability to understand attitudes and affect, including indexing and retrieval of documents by opinion; automatic question answering about opinions; analysis of sentiment in the media and in discussion groups about consumer products, political issues, etc.; brand and reputation management; discovering and predicting consumer and voting trends; analyzing client discourse in therapy and counseling; determining relations between scientific texts by finding reasons for citations; generating more appropriate texts and making agents more believable; and creating writers’ aids. The studies reported here are carried out on different languages such as English, French, Japanese, and Portuguese. Difficult challenges remain, however. It can be argued that analyzing attitude and affect in text is an “NLP”-complete problem.