- Provides fun and insightful caselets drawn from real-world stories at the beginning of every chapter, for example, the IBM Watson case study and Google Flu
- Provides a running case study across the chapters as exercises, e.g., Google Query Architecture and How Google Search Works
- Includes dedicated chapters on Data Mining and Big Data Programming, plus appendices on the installation of Hadoop, Spark, and Amazon Web Services
Professor of CSE and MIS at MUM in Fairfield, Iowa
This book is for Java developers who are looking to perform data analysis in a production environment. Those who wish to implement data analysis in their big data applications will find this book helpful.

What You Will Learn

- Start with simple analytic tasks on big data
- Move on to more complex tasks with predictive analytics on big data using machine learning
- Learn real-time analytic tasks
- Understand the concepts through examples and case studies
- Prepare and refine data for analysis
- Create charts to understand the data
- Explore various real-world datasets

In Detail
This book covers case studies such as sentiment analysis on a tweet dataset, recommendations on a MovieLens dataset, customer segmentation on an e-commerce dataset, and graph analysis on a real flight dataset.
This book is an end-to-end guide to implementing analytics on big data with Java. Java is the de facto language for major big data environments, including Hadoop. This book will teach you how to perform analytics on big data with production-friendly Java. The book is divided into two parts. The first part is an introduction that helps readers get acquainted with big data environments, while the second part offers an in-depth discussion of all the core concepts in big data analytics. It takes you from data analysis and data visualization to the core concepts and advantages of machine learning, real-life usage of regression and classification using Naive Bayes, a deep discussion of clustering concepts, and a review of simple neural networks on big data using Deeplearning4j or plain Java Spark code. This book is a must-have for Java developers who want to start learning big data analytics and use it in the real world.

Style and approach
The book's approach is to deliver practical learning modules in manageable chunks. Each chapter is a self-contained unit covering one concept in big data analytics, and the book builds competency in the area step by step. Examples based on real-world case studies show how the techniques are used in real applications. The examples and case studies are presented with both theory and code.
Drawing on extensive experience as a researcher, practitioner, and instructor, Dr. Dursun Delen delivers an optimal balance of concepts, techniques and applications. Without compromising either simplicity or clarity, he provides enough technical depth to help readers truly understand how data mining technologies work. Coverage includes: processes, methods, techniques, tools, and metrics; the role and management of data; text and web mining; sentiment analysis; and Big Data integration. Throughout, Delen's conceptual coverage is complemented with application case studies (examples of both successes and failures), as well as simple, hands-on tutorials.
Real-World Data Mining will be valuable to professionals on analytics teams; professionals seeking certification in the field; and undergraduate or graduate students in any analytics program: concentrations, certificate-based, or degree-based.
Big data is ubiquitous but heterogeneous. Big data can be used to tally clicks and traffic on web pages, find patterns in stock trades, track consumer preferences, and identify linguistic correlations in large corpora of text. This book examines big data not as an undifferentiated whole but contextually, investigating the varied challenges posed by big data for health, science, law, commerce, and politics. Taken together, the chapters reveal a complex set of problems, practices, and policies.
The advent of big data methodologies has challenged the theory-driven approach to scientific knowledge in favor of a data-driven one. Social media platforms and self-tracking tools change the way we see ourselves and others. The collection of data by corporations and government threatens privacy while promoting transparency. Meanwhile, politicians, policy makers, and ethicists are ill-prepared to deal with big data's ramifications. The contributors look at big data's effect on individuals as it exerts social control through monitoring, mining, and manipulation; big data and society, examining both its empowering and its constraining effects; big data and science, considering issues of data governance, provenance, reuse, and trust; and big data and organizations, discussing data responsibility, “data harm,” and decision making.
Ryan Abbott, Cristina Alaimo, Kent R. Anderson, Mark Andrejevic, Diane E. Bailey, Mike Bailey, Mark Burdon, Fred H. Cate, Jorge L. Contreras, Simon DeDeo, Hamid R. Ekbia, Allison Goodwell, Jannis Kallinikos, Inna Kouper, M. Lynne Markus, Michael Mattioli, Paul Ohm, Scott Peppet, Beth Plale, Jason Portenoy, Julie Rennecker, Katie Shilton, Dan Sholler, Cassidy R. Sugimoto, Isuru Suriarachchi, Jevin D. West
If you are a data scientist or data analyst who wants to learn big data processing using Apache Spark and Python, this book is for you. If you have some programming experience in Python and want to learn how to process large amounts of data using Apache Spark, Frank Kane's Taming Big Data with Apache Spark and Python will also help you.

What You Will Learn

- Find out how you can identify big data problems as Spark problems
- Install and run Apache Spark on your computer or on a cluster
- Analyze large data sets across many CPUs using Spark's Resilient Distributed Datasets
- Implement machine learning on Spark using the MLlib library
- Process continuous streams of data in real time using the Spark Streaming module
- Perform complex network analysis using Spark's GraphX library
- Use Amazon's Elastic MapReduce service to run your Spark jobs on a cluster

In Detail
Frank Kane's Taming Big Data with Apache Spark and Python is your companion to learning Apache Spark in a hands-on manner. Frank will start you off by teaching you how to set up Spark on a single system or on a cluster, and you'll soon move on to analyzing large data sets using Spark RDD, and developing and running effective Spark jobs quickly using Python.
Apache Spark has emerged as the next big thing in the big data domain, quickly rising from an emerging technology to an established superstar in just a few years. Spark allows you to quickly extract actionable insights from large amounts of data in real time, making it an essential tool in many modern businesses.
Frank has packed this book with over 15 interactive, fun-filled examples relevant to the real world, and he will empower you to understand the Spark ecosystem and implement production-grade real-time Spark projects with ease.

Style and approach
Frank Kane's Taming Big Data with Apache Spark and Python is a hands-on tutorial with over 15 real-world examples carefully explained by Frank in a step-by-step manner. The examples vary in complexity, and you can move through them at your own pace.