The authors demonstrate how treating text as data frames enables you to manipulate, summarize, and visualize characteristics of text. You’ll also learn how to integrate natural language processing (NLP) into effective workflows. Practical code examples and data explorations will help you generate real insights from literature, news, and social media.
Julia Silge is a data scientist at Stack Overflow; her work involves analyzing complex datasets and communicating about technical topics with diverse audiences. She has a PhD in astrophysics and loves Jane Austen and making beautiful charts. Julia worked in academia and ed tech before moving into data science and discovering the statistical programming language R.
David Robinson is a data scientist at Stack Overflow with a PhD in Quantitative and Computational Biology from Princeton University. He enjoys developing open source R packages, including broom, gganimate, fuzzyjoin and widyr, as well as blogging about statistics, R, and text mining on his blog, Variance Explained.
Through a series of recent breakthroughs, deep learning has boosted the entire field of machine learning. Now, even programmers who know close to nothing about this technology can use simple, efficient tools to implement programs capable of learning from data. This practical book shows you how.
By using concrete examples, minimal theory, and two production-ready Python frameworks, scikit-learn and TensorFlow, author Aurélien Géron helps you gain an intuitive understanding of the concepts and tools for building intelligent systems. You’ll learn a range of techniques, starting with simple linear regression and progressing to deep neural networks. With exercises in each chapter to help you apply what you’ve learned, all you need is programming experience to get started.
- Explore the machine learning landscape, particularly neural nets
- Use scikit-learn to track an example machine-learning project end-to-end
- Explore several training models, including support vector machines, decision trees, random forests, and ensemble methods
- Use the TensorFlow library to build and train neural nets
- Dive into neural net architectures, including convolutional nets, recurrent nets, and deep reinforcement learning
- Learn techniques for training and scaling deep neural nets
- Apply practical code examples without acquiring excessive machine learning theory or algorithm details
This book teaches the principles of natural language processing. It first covers practical linguistics issues such as encoding and annotation schemes and the definition of words, tokens, parts of speech, and morphology, as well as key machine learning concepts such as entropy, regression, and classification, which are used throughout the book. It then details the language-processing functions involved: part-of-speech tagging using rules and stochastic techniques, using Prolog to write phrase-structure grammars, syntactic formalisms and parsing techniques, semantics, predicate logic and lexical semantics, and analysis of discourse with applications in dialogue systems. A key feature of the book is the author's hands-on approach throughout, with sample code in Prolog and Perl, extensive exercises, and a detailed introduction to Prolog. The reader is supported with a companion website that contains teaching slides, programs, and additional material.
The second edition is a complete revision of the techniques presented in the book to reflect advances in the field. The author redesigned or updated all the chapters, added two new ones, and considerably expanded the sections on machine-learning techniques.
Sources of the African Past is designed for use in a wide variety of courses and in conjunction with other texts. The authors have kept their own interpretations to a minimum and invited scrutiny of their decisions of selection and arrangement. They chose the cases on the basis of several criteria: geographical coverage, abundance and diversity of primary sources, importance in the secondary literature, and relevance to important historical problems. All the studies emphasize political change. All witness some growth in European intervention.
In selecting the documents, the authors sought a balance of perspective without sacrificing accuracy and relevance. This means a conscious effort to present a variety of views: African and European, internal and external, participant and observer, those of the victims as well as those of the victors, those of the "people" as well as those of the elite. Within the limitations of space, they have made the excerpts sufficiently long to allow the reader to examine the author's style, purpose, and other characteristics. Keeping in mind the limitations of libraries, they have attempted to make each chapter self-contained.
If you are an R programmer, analyst, or data scientist who wants to gain experience in performing text data mining and analytics with R, then this book is for you. Exposure to working with statistical methods and language processing would be helpful.
What You Will Learn
- Get acquainted with some of the highly efficient R packages such as OpenNLP and RWeka to perform various steps in the text mining process
- Access and manipulate data from different sources such as JSON and HTTP
- Process text using regular expressions
- Get to know the different approaches of tagging texts, such as POS tagging, to get started with text analysis
- Explore different dimensionality reduction techniques, such as Principal Component Analysis (PCA), and understand their implementation in R
- Discover the underlying themes or topics that are present in an unstructured collection of documents, using common topic models such as Latent Dirichlet Allocation (LDA)
- Build a baseline sentence-completing application
- Perform entity extraction and named entity recognition using R
In Detail
Text mining (or text data mining or text analytics) is the process of extracting useful and high-quality information from text by deriving patterns and trends. R provides an extensive ecosystem to mine text through its many frameworks and packages.
Starting with basic information about the statistics concepts used in text mining, this book will teach you how to access, cleanse, and process text using the R language and will equip you with the tools and the associated knowledge about different tagging, chunking, and entailment approaches and their usage in natural language processing. Moving on, this book will teach you different dimensionality reduction techniques and their implementation in R. Next, we will cover pattern recognition in text data utilizing classification mechanisms, perform entity recognition, and develop an ontology learning framework.
By the end of the book, you will develop a practical application from the concepts learned, and will understand how text mining can be leveraged to analyze the vast amount of data available on social media.
Style and approach
This book takes a hands-on, example-driven approach to the text mining process with lucid implementation in R.