Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data

Apress


Derive useful insights from your data using Python. You will learn both basic and advanced concepts, including text and language syntax, structure, and semantics. You will focus on algorithms and techniques, such as text classification, clustering, topic modeling, and text summarization.

Text Analytics with Python teaches you the techniques related to natural language processing and text analytics, and you will gain the skills to know which technique is best suited to solve a particular problem. You will look at each technique and algorithm with both a bird's eye view to understand how it can be used as well as with a microscopic view to understand the mathematical concepts and to implement them to solve your own problems.

What You Will Learn:

  • Understand the major concepts and techniques of natural language processing (NLP) and text analytics, including syntax and structure
  • Build a text classification system to categorize news articles, analyze app or game reviews using topic modeling and text summarization, and cluster popular movie synopses and analyze the sentiment of movie reviews
  • Implement Python and popular open source libraries in NLP and text analytics, such as the natural language toolkit (nltk), gensim, scikit-learn, spaCy and Pattern


Who This Book Is For:
IT professionals, analysts, developers, linguistic experts, data scientists, and anyone with a keen interest in linguistics, analytics, and generating insights from textual data

About the author

Dipanjan Sarkar is a Data Scientist at Intel, the world's largest silicon company, which is on a mission to make the world more connected and productive. He primarily works on Analytics, Business Intelligence, Application Development, and building large-scale Intelligent Systems. He received his master's degree in Information Technology from the International Institute of Information Technology, Bangalore, with a focus on Data Science and Software Engineering. He is also an avid supporter of self-learning, especially Massive Open Online Courses, and holds a Data Science Specialization from Johns Hopkins University on Coursera.

He has been an analytics practitioner for over four years, specializing in statistical, predictive, and text analytics. He has also authored a couple of books on R and Machine Learning, occasionally reviews technical books, and acts as a course beta tester for Coursera. Dipanjan's interests include learning about new technology, financial markets, disruptive start-ups, data science, and, more recently, artificial intelligence and deep learning. In his spare time he loves reading, gaming, and watching popular sitcoms and football.

Additional Information

Publisher: Apress
Published on: Nov 30, 2016
Pages: 385
ISBN: 9781484223888
Language: English
Genres: Computers / Databases / Data Mining; Computers / Databases / General; Computers / Networking / General; Computers / Programming Languages / General
Content Protection: This content is DRM protected.
Read Aloud: Available on Android devices

Automate the predictive analytics process using Oracle Data Miner and Oracle R Enterprise. This book talks about how both these technologies can provide a framework for in-database predictive analytics. You'll see a unified architecture and embedded workflow to automate various analytics steps such as data preprocessing, model creation, and storing final model output to tables. You'll take a deep dive into various statistical models commonly used in businesses and how they can be automated for predictive analytics using various SQL, PLSQL, ORE, ODM, and native R packages. You'll get to know various options available in the ODM workflow for driving automation. Also, you'll get an understanding of various ways to integrate ODM packages, ORE, and native R packages using PLSQL for automating the processes.

Data Science Automation Using Oracle Data Miner and Oracle R Enterprise starts with an introduction to business analytics, covering why automation is necessary and the level of complexity in automation at each analytic stage. Then, it focuses on how predictive analytics can be automated by using Oracle Data Miner and Oracle R Enterprise. Also, it explains when and why ODM and ORE are to be used together for automation.

The subsequent chapters detail various statistical processes used for predictive analytics such as calculating attribute importance, clustering methods, regression analysis, classification techniques, ensemble models, and neural networks. In these chapters you will also get to understand the automation processes for each of these statistical processes using ODM and ORE along with their application in a real-life business use case.

What you'll learn

Discover the functionality of Oracle Data Miner and Oracle R Enterprise
Gain methods to perform in-database predictive analytics
Use Oracle's SQL and PLSQL APIs for building analytical solutions
Acquire knowledge of common and widely-used business statistical analysis techniques

Who this book is for

IT executives, BI architects, Oracle architects and developers, R users and statisticians.


Learn advanced analytical techniques and leverage existing tool kits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of classification, clustering, and recommendation.

Pro Hadoop Data Analytics emphasizes best practices to ensure coherent, efficient development. A complete example system will be developed using standard third-party components that consist of the tool kits, libraries, visualization and reporting code, as well as support glue to provide a working and extensible end-to-end system.

The book also highlights the importance of end-to-end, flexible, configurable, high-performance data pipeline systems with analytical components as well as appropriate visualization results. You'll discover the importance of mix-and-match or hybrid systems, using different analytical components in one application. This hybrid approach will be prominent in the examples.

What You'll Learn

Build big data analytic systems with the Hadoop ecosystem
Use libraries, tool kits, and algorithms to make development easier and more effective
Apply metrics to measure performance and efficiency of components and systems
Connect to standard relational databases, NoSQL data sources, and more
Follow case studies with example components to create your own systems

Who This Book Is For
Software engineers, architects, and data scientists with an interest in the design and implementation of big data analytical systems using Hadoop, the Hadoop ecosystem, and other associated technologies.
Utilize this practical and easy-to-follow guide to modernize traditional enterprise data warehouse and business intelligence environments with next-generation big data technologies.

Next-Generation Big Data takes a holistic approach, covering the most important aspects of modern enterprise big data. The book covers not only the main technology stack but also the next-generation tools and applications used for big data warehousing, data warehouse optimization, real-time and batch data ingestion and processing, real-time data visualization, big data governance, data wrangling, big data cloud deployments, and distributed in-memory big data computing. Finally, the book has extensive and detailed coverage of big data case studies from Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard.

What You’ll Learn

Install Apache Kudu, Impala, and Spark to modernize enterprise data warehouse and business intelligence environments, complete with real-world, easy-to-follow examples, and practical advice
Integrate HBase, Solr, Oracle, SQL Server, MySQL, Flume, Kafka, HDFS, and Amazon S3 with Apache Kudu, Impala, and Spark
Use StreamSets, Talend, Pentaho, and CDAP for real-time and batch data ingestion and processing
Utilize Trifacta, Alteryx, and Datameer for data wrangling and interactive data processing
Turbocharge Spark with Alluxio, a distributed in-memory storage platform
Deploy big data in the cloud using Cloudera Director
Perform real-time data visualization and time series analysis using Zoomdata, Apache Kudu, Impala, and Spark
Understand enterprise big data topics such as big data governance, metadata management, data lineage, impact analysis, and policy enforcement, and how to use Cloudera Navigator to perform common data governance tasks
Implement big data use cases such as big data warehousing, data warehouse optimization, Internet of Things, real-time data ingestion and analytics, complex event processing, and scalable predictive modeling
Study real-world big data case studies from innovative companies, including Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard

Who This Book Is For

BI and big data warehouse professionals interested in gaining practical and real-world insight into next-generation big data processing and analytics using Apache Kudu, Impala, and Spark; and those who want to learn more about other advanced enterprise topics

Understand the fundamentals of machine learning with R and build your own dynamic algorithms to tackle complicated real-world problems successfully

About This Book

Get to grips with the concepts of machine learning through exciting real-world examples
Visualize and solve complex problems by using power-packed R constructs and its robust packages for machine learning
Learn to build your own machine learning system with this example-based practical guide

Who This Book Is For

If you are interested in mining useful information from data using state-of-the-art techniques to make data-driven decisions, this is a go-to guide for you. No prior experience with data science is required, although basic knowledge of R is highly desirable. Prior knowledge in machine learning would be helpful but is not necessary.

What You Will Learn

Utilize the power of R to handle data extraction, manipulation, and exploration techniques
Use R to visualize data spread across multiple dimensions and extract useful features
Explore the underlying mathematical and logical concepts that drive machine learning algorithms
Dive deep into the world of analytics to predict situations correctly
Implement R machine learning algorithms from scratch and be amazed to see the algorithms in action
Write reusable code and build complete machine learning systems from the ground up
Solve interesting real-world problems using machine learning and R as the journey unfolds
Harness the power of robust and optimized R packages to work on projects that solve real-world problems in machine learning and data science

In Detail

Data science and machine learning are some of the top buzzwords in the technical world today. From retail stores to Fortune 500 companies, everyone is working hard to make machine learning deliver data-driven insights that grow their business. With powerful data manipulation features, machine learning packages, and an active developer community, R empowers users to build sophisticated machine learning systems to solve real-world data problems.

This book takes you on a data-driven journey that starts with the very basics of R and machine learning and gradually builds upon the concepts to work on projects that tackle real-world problems.

You'll begin by getting an understanding of the core concepts and definitions required to appreciate machine learning algorithms and concepts. Building upon the basics, you will then work on three different projects to apply the concepts of machine learning, following current trends and covering major algorithms as well as popular R packages in detail. These projects have been neatly divided into six different chapters covering the worlds of e-commerce, finance, and social media, which are at the very core of this data-driven revolution. Each of the projects will help you to understand, explore, visualize, and derive insights depending upon the domain and algorithms.

Through this book, you will learn to apply the concepts of machine learning to deal with data-related problems and solve them using the powerful yet simple language, R.

Style and approach

The book is an enticing journey that starts from the very basics and gradually picks up pace as the story unfolds. Each concept is first defined succinctly in its larger context, followed by a detailed explanation of its application. Each topic is explained with the help of a project that solves a real-world problem involving hands-on work, thus giving you a deep insight into the world of machine learning.

Foreword by Steven Pinker

Blending the informed analysis of The Signal and the Noise with the instructive iconoclasm of Think Like a Freak, a fascinating, illuminating, and witty look at what the vast amounts of information now instantly available to us reveal about ourselves and our world, provided we ask the right questions.

By the end of an average day in the early twenty-first century, human beings searching the internet will amass eight trillion gigabytes of data. This staggering amount of information, unprecedented in history, can tell us a great deal about who we are: the fears, desires, and behaviors that drive us, and the conscious and unconscious decisions we make. From the profound to the mundane, we can gain astonishing knowledge about the human psyche that, less than twenty years ago, seemed unfathomable.

Everybody Lies offers fascinating, surprising, and sometimes laugh-out-loud insights into everything from economics to ethics to sports to race to sex, gender and more, all drawn from the world of big data. What percentage of white voters didn’t vote for Barack Obama because he’s black? Does where you go to school affect how successful you are in life? Do parents secretly favor boy children over girls? Do violent films affect the crime rate? Can you beat the stock market? How regularly do we lie about our sex lives, and who’s more self-conscious about sex, men or women?

Investigating these questions and a host of others, Seth Stephens-Davidowitz offers revelations that can help us understand ourselves and our lives better. Drawing on studies and experiments on how we really live and think, he demonstrates in fascinating and often funny ways the extent to which all the world is indeed a lab. With conclusions ranging from strange-but-true to thought-provoking to disturbing, he explores the power of this digital truth serum and its deeper potential, revealing biases deeply embedded within us, information we can use to change our culture, and the questions we’re afraid to ask that might be essential to our health, both emotional and physical. All of us are touched by big data every day, and its influence is multiplying. Everybody Lies challenges us to think differently about how we see it and the world.

Big Data Analytics with Spark is a step-by-step guide for learning Spark, a fast, general-purpose, open-source cluster computing framework for large-scale data analysis. You will learn how to use Spark for different types of big data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. In addition, this book will help you become a much sought-after Spark expert.

Spark is one of the hottest Big Data technologies. The amount of data generated today by devices, applications, and users is exploding. Therefore, there is a critical need for tools that can analyze large-scale data and unlock value from it. Spark is a powerful technology that meets that need. You can, for example, use Spark to perform low-latency computations through the use of efficient caching and iterative algorithms; leverage the features of its shell for easy and interactive data analysis; employ its fast batch processing and low-latency features to process your real-time data streams; and so on. As a result, adoption of Spark is growing rapidly, and Spark is replacing Hadoop MapReduce as the technology of choice for big data analytics.

This book provides an introduction to Spark and related big-data technologies. It covers Spark core and its add-on libraries, including Spark SQL, Spark Streaming, GraphX, and MLlib. Big Data Analytics with Spark is therefore written for busy professionals who prefer learning a new technology from a consolidated source instead of spending countless hours on the Internet trying to pick bits and pieces from different sources.

The book also provides a chapter on Scala, the hottest functional programming language and the language that underlies Spark. You’ll learn the basics of functional programming in Scala, so that you can write Spark applications in it.

What's more, Big Data Analytics with Spark provides an introduction to other big data technologies that are commonly used along with Spark, like Hive, Avro, Kafka and so on. So the book is self-sufficient; all the technologies that you need to know to use Spark are covered. The only thing that you are expected to know is programming in any language.

There is a critical shortage of people with big data expertise, so companies are willing to pay top dollar for people with skills in areas like Spark and Scala. So reading this book and absorbing its principles will provide a boost—possibly a big boost—to your career.

Master the essential skills needed to recognize and solve complex problems with machine learning and deep learning. Using real-world examples that leverage the popular Python machine learning ecosystem, this book is your perfect companion for learning the art and science of machine learning to become a successful practitioner. The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think, design, build, and execute machine learning systems and projects successfully.

Practical Machine Learning with Python follows a structured and comprehensive three-tiered approach packed with hands-on examples and code.

Part 1 focuses on understanding machine learning concepts and tools. This includes machine learning basics with a broad overview of algorithms, techniques, concepts and applications, followed by a tour of the entire Python machine learning ecosystem. Brief guides for useful machine learning tools, libraries and frameworks are also covered.

Part 2 details standard machine learning pipelines, with an emphasis on data processing analysis, feature engineering, and modeling. You will learn how to process, wrangle, summarize and visualize data in its various forms. Feature engineering and selection methodologies will be covered in detail with real-world datasets followed by model building, tuning, interpretation and deployment.

Part 3 explores multiple real-world case studies spanning diverse domains and industries like retail, transportation, movies, music, marketing, computer vision and finance. For each case study, you will learn the application of various machine learning techniques and methods. The hands-on examples will help you become familiar with state-of-the-art machine learning tools and techniques and understand what algorithms are best suited for any problem.

Practical Machine Learning with Python will empower you to start solving your own problems with machine learning today!

What You'll Learn

Execute end-to-end machine learning projects and systems
Implement hands-on examples with industry standard, open source, robust machine learning tools and frameworks
Review case studies depicting applications of machine learning and deep learning on diverse domains and industries
Apply a wide range of machine learning models including regression, classification, and clustering
Understand and apply the latest models and methodologies from deep learning including CNNs, RNNs, LSTMs and transfer learning

Who This Book Is For
IT professionals, analysts, developers, data scientists, engineers, graduate students