In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.
Martin is a researcher in distributed systems at the University of Cambridge. Previously he was a software engineer and entrepreneur at Internet companies including LinkedIn and Rapportive, where he worked on large-scale data infrastructure. In the process he learned a few things the hard way, and he hopes this book will save you from repeating the same mistakes.
Martin is a regular conference speaker, blogger, and open source contributor. He believes that profound technical ideas should be accessible to everyone, and that deeper understanding will help us develop better software.
Author Russell Jurney demonstrates how to compose a data platform for building, deploying, and refining analytics applications with Apache Kafka, MongoDB, ElasticSearch, d3.js, scikit-learn, and Apache Airflow. You’ll learn an iterative approach that lets you quickly change the kind of analysis you’re doing, depending on what the data is telling you. Publish data science work as a web application, and affect meaningful change in your organization.Build value from your data in a series of agile sprints, using the data-value pyramidExtract features for statistical models from a single datasetVisualize data with charts, and expose different aspects through interactive reportsUse historical data to predict the future via classification and regressionTranslate predictions into actionsGet feedback from users after each sprint to keep your project on track
If you are a data scientist who wants to learn how to perform deep learning on Hadoop, this is the book for you. Knowledge of the basic machine learning concepts and some understanding of Hadoop is required to make the best use of this book.What You Will LearnExplore Deep Learning and various models associated with itUnderstand the challenges of implementing distributed deep learning with Hadoop and how to overcome itImplement Convolutional Neural Network (CNN) with deeplearning4jDelve into the implementation of Restricted Boltzmann Machines (RBM)Understand the mathematical explanation for implementing Recurrent Neural Networks (RNN)Get hands on practice of deep learning and their implementation with Hadoop.In Detail
This book will teach you how to deploy large-scale dataset in deep neural networks with Hadoop for optimal performance.
Starting with understanding what deep learning is, and what the various models associated with deep neural networks are, this book will then show you how to set up the Hadoop environment for deep learning. In this book, you will also learn how to overcome the challenges that you face while implementing distributed deep learning with large-scale unstructured datasets. The book will also show you how you can implement and parallelize the widely used deep learning models such as Deep Belief Networks, Convolutional Neural Networks, Recurrent Neural Networks, Restricted Boltzmann Machines and autoencoder using the popular deep learning library deeplearning4j.
Get in-depth mathematical explanations and visual representations to help you understand the design and implementations of Recurrent Neural network and Denoising AutoEncoders with deeplearning4j. To give you a more practical perspective, the book will also teach you the implementation of large-scale video processing, image processing and natural language processing on Hadoop.
By the end of this book, you will know how to deploy various deep neural networks in distributed systems using Hadoop.Style and approach
This book takes a comprehensive, step-by-step approach to implement efficient deep learning models on Hadoop. It starts from the basics and builds the readers' knowledge as they strengthen their understanding of the concepts. Practical examples are included in every step of the way to supplement the theory.
If you are new to the NoSQL document system or have little or no experience in NoSQL development and administration and are planning to deploy Couchbase for your next project, then this book is for you. It would be helpful to have a bit of familiarity with Java.What You Will LearnGet acquainted with the concept of NoSQL databases and configure your Couchbase database clusterMaintain Couchbase effectively using the web-based administrative console with easeEnable partition capabilities by making use of BucketsAnalyze important design considerations for maintaining relationship between various documentsUse Couchbase SDK Java API to store and retrieve documentWrite views using map/reduce to retrieve documents efficientlyGet familiar with N1QL and how to use it in Java applicationsIntegrate Couchbase with Elasticsearch to implement full text searchConfigure XDCR for disaster recovery and develop ecommerce application using CouchbaseIn Detail
NoSQL database systems have changed application development in terms of adaptability to dynamics schema and scalability. Compared with the currently available NoSQL database systems, Couchbase is the fastest. Its ease of configuration and powerful features for storing different schema structures, retrieval using map reduce and inbuilt disaster recovery by replicating document across the geographical region, make it one of the most powerful, scalable and comprehensive NoSQL in the market. Couchbase also introduces smart client API for various programming language to integrate the database with the application easily, yet providing very complex features like cluster health awareness.
This book achieves its goal by taking up an end-to-end development structure, right from understanding NOSQL document design to implementing full fledged eCommerce application design using Couchbase as a backend.
Starting with the architecture of Couchbase to get you up and running, this book quickly takes you through designing a NoSQL document and implementing highly scalable applications using Java API. You will then be introduced to document design and get to know the various ways to administer Couchbase. Followed by this, learn to store documents using bucket. Moving on, you will then learn to store, retrieve and delete documents using smart client base on Java API. You will then retrieve documents using SQL like syntax call N1QL. Next, you will learn how to write map reduce base views. Finally, you will configure XDCR for disaster recovery and implement an eCommerce application using Couchbase.Style and approach
The book starts from absolute basics and slowly moves to more advanced topics ensuring at every step that all concepts and terms are understood by the reader to have complete understanding at every stage. Technical and complex terms are explained in clear and simple language, thus making this book a perfect companion for those who have started their journey to NoSQL using Couchbase
This guide organizes APIs by the subjects they cover—such as websites, people, or places—so you can quickly locate the best resources for augmenting the data you handle in your own service. Categories include:Website tools such as WHOIS, bit.ly, and CompeteServices that use email addresses as search terms, including GithubFinding information from just a name, with APIs such as WhitePagesServices, such as Klout, for locating people with Facebook and Twitter accountsSearch APIs, including BOSS and WikipediaGeographical data sources, including SimpleGeo and U.S. CensusCompany information APIs, such as CrunchBase and ZoomInfoAPIs that list IP addresses, such as MaxMindServices that list books, films, music, and products
You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—including classification, clustering, collaborative filtering, and anomaly detection—to fields such as genomics, security, and finance.
If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find the book’s patterns useful for working on your own data applications.
With this book, you will:Familiarize yourself with the Spark programming modelBecome comfortable within the Spark ecosystemLearn general approaches in data scienceExamine complete implementations that analyze large public data setsDiscover which machine learning tools make sense for particular problemsAcquire code that can be adapted to many uses
Based on his popular blog posts, LinkedIn principal engineer Jay Kreps shows you how logs work in distributed systems, and then delivers practical applications of these concepts in a variety of common uses—data integration, enterprise architecture, real-time stream processing, data system design, and abstract computing models.
Go ahead and take the plunge with logs; you’re going love them.Learn how logs are used for programmatic access in databases and distributed systemsDiscover solutions to the huge data integration problem when more data of more varieties meet more systemsUnderstand why logs are at the heart of real-time stream processingLearn the role of a log in the internals of online data systemsExplore how Jay Kreps applies these ideas to his own work on data infrastructure systems at LinkedIn
Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You’ll not only learn how to improve communication between business stakeholders and data scientists, but also how participate intelligently in your company’s data science projects. You’ll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making.Understand how data science fits in your organization—and how you can use it for competitive advantageTreat data as a business asset that requires careful investment if you’re to gain real valueApproach business problems data-analytically, using the data-mining process to gather good data in the most appropriate wayLearn general concepts for actually extracting knowledge from dataApply data science principles when interviewing data science job candidates
Engineers from Confluent and LinkedIn who are responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream-processing applications with this platform. Through detailed examples, you’ll learn Kafka’s design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer.Understand publish-subscribe messaging and how it fits in the big data ecosystem.Explore Kafka producers and consumers for writing and reading messagesUnderstand Kafka patterns and use-case requirements to ensure reliable data deliveryGet best practices for building data pipelines and applications with KafkaManage Kafka in production, and learn to perform monitoring, tuning, and maintenance tasksLearn the most critical metrics among Kafka’s operational measurementsExplore how Kafka’s stream delivery capabilities make it a perfect source for stream processing systems
Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.Use the IPython shell and Jupyter notebook for exploratory computingLearn basic and advanced features in NumPy (Numerical Python)Get started with data analysis tools in the pandas libraryUse flexible tools to load, clean, transform, merge, and reshape dataCreate informative visualizations with matplotlibApply the pandas groupby facility to slice, dice, and summarize datasetsAnalyze and manipulate regular and irregular time series dataLearn how to solve real-world data analysis problems with thorough, detailed examples