Text Mining with R: A Tidy Approach

"O'Reilly Media, Inc."

Much of the data available today is unstructured and text-heavy, making it challenging for analysts to apply their usual data wrangling and visualization tools. With this practical book, you’ll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson developed using the tidy principles behind R packages like ggraph and dplyr. You’ll learn how tidytext and other tidy tools in R can make text analysis easier and more effective.

The authors demonstrate how treating text as data frames enables you to manipulate, summarize, and visualize characteristics of text. You’ll also learn how to integrate natural language processing (NLP) into effective workflows. Practical code examples and data explorations will help you generate real insights from literature, news, and social media.

  • Learn how to apply the tidy text format to NLP
  • Use sentiment analysis to mine the emotional content of text
  • Identify a document’s most important terms with frequency measurements
  • Explore relationships and connections between words with the ggraph and widyr packages
  • Convert back and forth between R’s tidy and non-tidy text formats
  • Use topic modeling to classify document collections into natural groups
  • Examine case studies that compare Twitter archives, dig into NASA metadata, and analyze thousands of Usenet messages

About the authors

Julia Silge is a data scientist at Stack Overflow; her work involves analyzing complex datasets and communicating about technical topics with diverse audiences. She has a PhD in astrophysics and loves Jane Austen and making beautiful charts. Julia worked in academia and ed tech before moving into data science and discovering the statistical programming language R.

David Robinson is a data scientist at Stack Overflow with a PhD in Quantitative and Computational Biology from Princeton University. He enjoys developing open source R packages, including broom, gganimate, fuzzyjoin, and widyr, as well as writing about statistics, R, and text mining on his blog, Variance Explained.

Additional Information

Publisher
"O'Reilly Media, Inc."
Published on
Jun 12, 2017
Pages
194
ISBN
9781491981603
Language
English
Genres
Computers / Data Visualization
Computers / Databases / Data Mining
Computers / Databases / General
Computers / Intelligence (AI) & Semantics
Computers / Natural Language Processing
Content Protection
This content is DRM free.
Read Aloud
Available on Android devices
Eligible for Family Library

The areas of natural language processing and computational linguistics have continued to grow in recent years, driven by the demand to automatically process text and spoken data. With the processing power and techniques now available, research is scaling up from lab prototypes to real-world, proven applications.

This book teaches the principles of natural language processing, first covering practical linguistics issues such as encoding and annotation schemes, defining words, tokens and parts of speech and morphology, as well as key concepts in machine learning, such as entropy, regression and classification, which are used throughout the book. It then details the language-processing functions involved, including part-of-speech tagging using rules and stochastic techniques, using Prolog to write phrase-structure grammars, syntactic formalisms and parsing techniques, semantics, predicate logic and lexical semantics, and analysis of discourse and applications in dialogue systems. A key feature of the book is the author's hands-on approach throughout, with sample code in Prolog and Perl, extensive exercises, and a detailed introduction to Prolog. The reader is supported with a companion website that contains teaching slides, programs and additional material.

The second edition is a complete revision of the techniques presented in the book to reflect advances in the field: the author redesigned or updated all the chapters, added two new ones, and considerably expanded the sections on machine-learning techniques.

Sources of the African Past combines a case-study approach with an emphasis on primary and orally transmitted sources to accomplish three objectives: to tell a story in some depth, to portray major themes, and to raise basic questions of analysis and interpretation. The case studies are set in the nineteenth century and deal with critical periods in the fortunes of five societies in different parts of the continent (South, East, and West Africa). The authors want students to work with the "raw" materials of history and, to that end, have provided a workbook for a "laboratory" experience.

Sources of the African Past is designed for use in a wide variety of courses and in conjunction with other texts. The authors have kept their own interpretations to a minimum and invite scrutiny of their decisions on selection and arrangement. They chose the cases on the basis of several criteria: geographical coverage, abundance and diversity of primary sources, importance in the secondary literature, and relevance to important historical problems. All the studies emphasize political change. All witness some growth in European intervention.

In selecting the documents, the authors sought a balance of perspective without sacrificing accuracy and relevance. This meant a conscious effort to present a variety of views: African and European, internal and external, participant and observer, those of the victims as well as those of the victors, those of the "people" as well as those of the elite. Within the limitations of space, they have made the excerpts long enough to allow the reader to examine each author's style, purpose, and other characteristics. Keeping in mind the limitations of libraries, they have attempted to make each chapter self-contained.

Master text-taming techniques and build effective text-processing applications with R

About This Book

  • Develop all the relevant skills for building text-mining apps in R with this easy-to-follow guide
  • Gain an in-depth understanding of the text mining process with lucid implementation in the R language
  • Example-rich guide that lets you gain high-quality information from text data

Who This Book Is For

If you are an R programmer, analyst, or data scientist who wants to gain experience in performing text data mining and analytics with R, then this book is for you. Exposure to working with statistical methods and language processing would be helpful.

What You Will Learn

  • Get acquainted with some of the highly efficient R packages, such as OpenNLP and RWeka, to perform various steps in the text mining process
  • Access and manipulate data from different sources, such as JSON and HTTP
  • Process text using regular expressions
  • Get to know the different approaches to tagging texts, such as POS tagging, to get started with text analysis
  • Explore different dimensionality reduction techniques, such as Principal Component Analysis (PCA), and understand their implementation in R
  • Discover the underlying themes or topics present in an unstructured collection of documents, using common topic models such as Latent Dirichlet Allocation (LDA)
  • Build a baseline sentence-completion application
  • Perform entity extraction and named entity recognition using R

In Detail

Text mining (also called text data mining or text analytics) is the process of extracting useful, high-quality information from text by identifying patterns and trends. R provides an extensive ecosystem for mining text through its many frameworks and packages.

Starting with the basic statistical concepts used in text mining, this book will teach you how to access, cleanse, and process text using the R language, and will equip you with the tools and knowledge needed to apply different tagging, chunking, and entailment approaches in natural language processing. Moving on, it will teach you different dimensionality reduction techniques and their implementation in R. Next, it covers pattern recognition in text data using classification methods, entity recognition, and the development of an ontology learning framework.

By the end of the book, you will have developed a practical application from the concepts learned and will understand how text mining can be used to analyze the vast amount of data available on social media.

Style and approach

This book takes a hands-on, example-driven approach to the text mining process with lucid implementation in R.
