Like the first edition, voted the most popular data mining book by KDnuggets readers, this book explores concepts and techniques for the discovery of patterns hidden in large data sets, focusing on issues relating to their feasibility, usefulness, effectiveness, and scalability. However, since the publication of the first edition, great progress has been made in the development of new data mining methods, systems, and applications. This new edition substantially enhances the first edition, and new chapters have been added to address recent developments on mining complex types of data, including stream data, sequence data, graph structured data, social network data, and multi-relational data.
Jiawei Han is Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Well known for his research in the areas of data mining and database systems, he has received many awards for his contributions in the field, including the 2004 ACM SIGKDD Innovations Award. He has served as Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data, and on editorial boards of several journals, including IEEE Transactions on Knowledge and Data Engineering and Data Mining and Knowledge Discovery.
Jian Pei is currently a Canada Research Chair (Tier 1) in Big Data Science and a Professor in the School of Computing Science at Simon Fraser University. He is also an associate member of the Department of Statistics and Actuarial Science. He is a well-known leading researcher in the general areas of data science, big data, data mining, and database systems. His expertise is in developing effective and efficient data analysis techniques for novel data-intensive applications. He is recognized as a Fellow of the Association for Computing Machinery (ACM) for his “contributions to the foundation, methodology and applications of data mining” and as a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) for his “contributions to data mining and knowledge discovery.” He is the editor-in-chief of the IEEE Transactions on Knowledge and Data Engineering (TKDE), a director of the Special Interest Group on Knowledge Discovery in Data (SIGKDD) of the Association for Computing Machinery (ACM), and a general co-chair or program committee co-chair of many premier conferences.
Micheline Kamber is a researcher with a passion for writing in easy-to-understand terms. She has a master's degree in computer science (specializing in artificial intelligence) from Concordia University, Canada.
They present new and innovative developments and applications, divided into technical stream sections on Knowledge Discovery and Data Mining I, Knowledge Discovery and Data Mining II, Intelligent Agents, Representation and Reasoning, and Machine Learning and Constraint Programming, followed by application stream sections on Medical Applications, Applications in Education and Information Science, and AI Applications. The volume also includes the text of short papers presented as posters at the conference.
This is the thirtieth volume in the Research and Development in Intelligent Systems series, which also incorporates the twenty-first volume in the Applications and Innovations in Intelligent Systems series. These series are essential reading for those who wish to keep up to date with developments in this important field.
Today many information sources—including sensor networks, financial markets, social networks, and healthcare monitoring—are so-called data streams, arriving sequentially and at high speed. Analysis must take place in real time, with partial data and without the capacity to store the entire data set. This book presents algorithms and techniques used in data stream mining and real-time analytics. Taking a hands-on approach, the book demonstrates the techniques using MOA (Massive Online Analysis), a popular, freely available open-source software framework, allowing readers to try out the techniques after reading the explanations.
The book first offers a brief introduction to the topic, covering big data mining, basic methodologies for mining data streams, and a simple example of MOA. More detailed discussions follow, with chapters on sketching techniques, change, classification, ensemble methods, regression, clustering, and frequent pattern mining. Most of these chapters include exercises, an MOA-based lab session, or both. Finally, the book discusses the MOA software, covering the MOA graphical user interface, the command line, use of its API, and the development of new methods within MOA. The book will be an essential reference for readers who want to use data stream mining as a tool, researchers in innovation or data stream mining, and programmers who want to create new algorithms for MOA.
Algorithms and evaluation: These chapters discuss the fundamental algorithms in recommender systems, including collaborative filtering methods, content-based methods, knowledge-based methods, ensemble-based methods, and evaluation.
Recommendations in specific domains and contexts: The context of a recommendation can be viewed as important side information that affects the recommendation goals. Different types of context such as temporal data, spatial data, social data, tagging data, and trustworthiness are explored.
Advanced topics and applications: Various robustness aspects of recommender systems, such as shilling systems, attack models, and their defenses are discussed.
In addition, recent topics, such as learning to rank, multi-armed bandits, group systems, multi-criteria systems, and active learning systems, are introduced together with applications. Although this book primarily serves as a textbook, it will also appeal to industrial practitioners and researchers due to its focus on applications and references. Numerous examples and exercises have been provided, and a solution manual is available for instructors.
Internet technology and data engineering should reinforce the efficiency and effectiveness of business processes. These technologies should help people make better and more accurate decisions by presenting the necessary information and the possible consequences of those decisions. Intelligent information systems should help us better understand and manage information through ubiquitous data repositories and cloud computing.
This book is a compilation of recent research findings in Internet Technology and Data Engineering. It provides state-of-the-art accounts of computational algorithms and tools, database management and database technologies, intelligent information systems, data engineering applications, internet security, internet data management, web search, data grids, cloud computing, web-based applications, and other related topics.
Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research.
The book is targeted at information systems practitioners, programmers, consultants, developers, information technology managers, specification writers, data analysts, data modelers, database R&D professionals, data warehouse engineers, and data mining professionals. The book will also be useful for professors and students of upper-level undergraduate and graduate-level data mining and machine learning courses who want to incorporate data mining as part of their data management knowledge base and expertise.
Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects.
Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods.
Includes a downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface. Algorithms in the toolkit cover data pre-processing, classification, regression, clustering, association rules, and visualization.
Divided into three parts, the book explains how the fundamental algorithms and methods of both physics-based and data-driven approaches effectively address systems health management. The first part of the text describes data-driven methods for anomaly detection, diagnosis, and prognosis of massive data streams and associated performance metrics. It also illustrates the analysis of text reports using novel machine learning approaches that help detect and discriminate between failure modes. The second part focuses on physics-based methods for diagnostics and prognostics, exploring how these methods adapt to observed data. It covers physics-based, data-driven, and hybrid approaches to studying damage propagation and prognostics in composite materials and solid rocket motors. The third part discusses the use of machine learning and physics-based approaches in distributed data centers, aircraft engines, and embedded real-time software systems.
Reflecting the interdisciplinary nature of the field, this book shows how various machine learning and knowledge discovery techniques are used in the analysis of complex engineering systems. It emphasizes the importance of these techniques in managing the intricate interactions within and between the systems to maintain a high degree of reliability.
Blending the informed analysis of The Signal and the Noise with the instructive iconoclasm of Think Like a Freak, a fascinating, illuminating, and witty look at what the vast amounts of information now instantly available to us reveal about ourselves and our world, provided we ask the right questions.
By the end of an average day in the early twenty-first century, human beings searching the internet will amass eight trillion gigabytes of data. This staggering amount of information, unprecedented in history, can tell us a great deal about who we are: the fears, desires, and behaviors that drive us, and the conscious and unconscious decisions we make. From the profound to the mundane, we can gain astonishing knowledge about the human psyche that, less than twenty years ago, seemed unfathomable.
Everybody Lies offers fascinating, surprising, and sometimes laugh-out-loud insights into everything from economics to ethics to sports to race to sex, gender, and more, all drawn from the world of big data. What percentage of white voters didn’t vote for Barack Obama because he’s black? Does where you go to school affect how successful you are in life? Do parents secretly favor boy children over girls? Do violent films affect the crime rate? Can you beat the stock market? How regularly do we lie about our sex lives and who’s more self-conscious about sex, men or women?
Investigating these questions and a host of others, Seth Stephens-Davidowitz offers revelations that can help us understand ourselves and our lives better. Drawing on studies and experiments on how we really live and think, he demonstrates in fascinating and often funny ways the extent to which all the world is indeed a lab. With conclusions ranging from strange-but-true to thought-provoking to disturbing, he explores the power of this digital truth serum and its deeper potential: revealing biases deeply embedded within us, information we can use to change our culture, and the questions we’re afraid to ask that might be essential to our health, both emotional and physical. All of us are touched by big data every day, and its influence is multiplying. Everybody Lies challenges us to think differently about how we see it and the world.