An audacious, irreverent investigation of human behavior—and a first look at a revolution in the making
Our personal data has been used to spy on us, hire and fire us, and sell us stuff we don’t need. In Dataclysm, Christian Rudder uses it to show us who we truly are.
For centuries, we’ve relied on polling or small-scale lab experiments to study human behavior. Today, a new approach is possible. As we live more of our lives online, researchers can finally observe us directly, in vast numbers, and without filters. Data scientists have become the new demographers.
In this daring and original book, Rudder explains how Facebook "likes" can predict, with surprising accuracy, a person’s sexual orientation and even intelligence; how attractive women receive exponentially more interview requests; and why you must have haters to be hot. He charts the rise and fall of America’s most reviled word through Google Search and examines the new dynamics of collaborative rage on Twitter. He shows how people express themselves, both privately and publicly. What is the least Asian thing you can say? Do people bathe more in Vermont or New Jersey? What do black women think about Simon & Garfunkel? (Hint: they don’t think about Simon & Garfunkel.) Rudder also traces human migration over time, showing how groups of people move from certain small towns to the same big cities across the globe. And he grapples with the challenge of maintaining privacy in a world where these explorations are possible.
Visually arresting and full of wit and insight, Dataclysm is a new way of seeing ourselves—a brilliant alchemy, in which math is made human and numbers become the narrative of our time.
From the Hardcover edition.
The book begins with a summary of the nontechnical aspects of interviewing, such as common mistakes, strategies for a great interview, perspectives from the other side of the table, tips on negotiating the best offer, and a guide to the best ways to use EPI.
The technical core of EPI is a sequence of chapters on basic and advanced data structures, searching, sorting, broad algorithmic principles, concurrency, and system design. Each chapter consists of a brief review, followed by a broad and thought-provoking series of problems. We include a summary of data structure, algorithm, and problem solving patterns.
Inside, you'll learn about:
Interaction design and physical computingThe Arduino hardware and software development environmentBasics of electricity and electronicsPrototyping on a solderless breadboardDrawing a schematic diagram
And more. With inexpensive hardware and open-source software components that you can download free, getting started with Arduino is a snap. To use the introductory examples in this book, all you need is a USB Arduino, USB A-B cable, and an LED.
Join the tens of thousands of hobbyists who have discovered this incredible (and educational) platform. Written by the co-founder of the Arduino project, with illustrations by Elisa Canducci, Getting Started with Arduino gets you in on the fun! This 128-page book is a greatly expanded follow-up to the author's original short PDF that's available on the Arduino website.
This is an engineering book. You will not find much prose in here (the author’s English is broken anyway.) Instead, this book has only bit of text and plenty of drawings attempting to describe in great detail the Wolfenstein 3D game engine and its hardware, the IBM PC with an Intel 386 CPU and a VGA graphic card.
Game Engine Black Book details techniques such as raycasting, compiled scalers, deferred rendition, VGA Mode-Y, linear feedback shift register, fixed point arithmetic, pulse width modulation, runtime generated code, self-modifying code, and many others tricks. Open up to discover the architecture of the software which pioneered the First Person Shooter genre.
Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You’ll not only learn how to improve communication between business stakeholders and data scientists, but also how participate intelligently in your company’s data science projects. You’ll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making.Understand how data science fits in your organization—and how you can use it for competitive advantageTreat data as a business asset that requires careful investment if you’re to gain real valueApproach business problems data-analytically, using the data-mining process to gather good data in the most appropriate wayLearn general concepts for actually extracting knowledge from dataApply data science principles when interviewing data science job candidates
SQLite is a small, embeddable, SQL-based, relational database management system. It has been widely used in low- to medium-tier database applications, especially in embedded devices. This book provides a comprehensive description of SQLite database system. It describes design principles, engineering trade-offs, implementation issues, and operations of SQLite.
In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.Peer under the hood of the systems you already use, and learn how to use and operate them more effectivelyMake informed decisions by identifying the strengths and weaknesses of different toolsNavigate the trade-offs around consistency, scalability, fault tolerance, and complexityUnderstand the distributed systems research upon which modern databases are builtPeek behind the scenes of major online services, and learn from their architectures
Why do some games become boring quickly, while others remain fun for years? How do games serve as fundamental and powerful learning tools? Whether you’re a game developer, dedicated gamer, or curious observer, this illustrated, fully updated edition helps you understand what drives this major cultural force, and inspires you to take it further.
You’ll discover that:Games play into our innate ability to seek patterns and solve puzzlesMost successful games are built upon the same elementsSlightly more females than males now play gamesMany games still teach primitive survival skillsFictional dressing for modern games is more developed than the conceptual elementsTruly creative designers seldom use other games for inspirationGames are beginning to evolve beyond their prehistoric origins
Each chapter focuses on a specific problem in machine learning, such as classification, prediction, optimization, and recommendation. Using the R programming language, you’ll learn how to analyze sample datasets and write simple machine learning algorithms. Machine Learning for Hackers is ideal for programmers from any background, including business, government, and academic research.Develop a naïve Bayesian classifier to determine if an email is spam, based only on its textUse linear regression to predict the number of page views for the top 1,000 websitesLearn optimization techniques by attempting to break a simple letter cipherCompare and contrast U.S. Senators statistically, based on their voting recordsBuild a “whom to follow” recommendation system from Twitter data
The algorithms in this book represent a body of knowledge developed over the last 50 years that has become indispensable, not just for professional programmers and computer science students but for any student with interests in science, mathematics, and engineering, not to mention students who use computation in the liberal arts.
The companion web site, algs4.cs.princeton.edu, containsAn online synopsis Full Java implementations Test data Exercises and answers Dynamic visualizations Lecture slides Programming assignments with checklists Links to related material
The MOOC related to this book is accessible via the "Online Course" link at algs4.cs.princeton.edu. The course offers more than 100 video lecture segments that are integrated with the text, extensive online assessments, and the large-scale discussion forums that have proven so valuable. Offered each fall and spring, this course regularly attracts tens of thousands of registrants.
Robert Sedgewick and Kevin Wayne are developing a modern approach to disseminating knowledge that fully embraces technology, enabling people all around the world to discover new ways of learning and teaching. By integrating their textbook, online content, and MOOC, all at the state of the art, they have built a unique resource that greatly expands the breadth and depth of the educational experience.
This book covers:Arrays and lists: the most common data structuresStacks and queues: more complex list-like data structuresLinked lists: how they overcome the shortcomings of arraysDictionaries: storing data as key-value pairsHashing: good for quick insertion and retrievalSets: useful for storing unique elements that appear only onceBinary Trees: storing data in a hierarchical mannerGraphs and graph algorithms: ideal for modeling networksAlgorithms: including those that help you sort or search dataAdvanced algorithms: dynamic programming and greedy algorithms
Updated for R 2.14 and 2.15, this second edition includes new and expanded chapters on R performance, the ggplot2 data visualization package, and parallel R computing with Hadoop.Get started quickly with an R tutorial and hundreds of examplesExplore R syntax, objects, and other language detailsFind thousands of user-contributed R packages online, including BioconductorLearn how to use R to prepare data for analysisVisualize your data with R’s graphics, lattice, and ggplot2 packagesUse R to calculate statistical fests, fit models, and compute probability distributionsSpeed up intensive computations by writing parallel R programs for HadoopGet a complete desktop reference to R
By working with a single case study throughout this thoroughly revised book, you’ll learn the entire process of exploratory data analysis—from collecting data and generating statistics to identifying patterns and testing hypotheses. You’ll explore distributions, rules of probability, visualization, and many other tools and concepts.
New chapters on regression, time series analysis, survival analysis, and analytic methods will enrich your discoveries.Develop an understanding of probability and statistics by writing and testing codeRun experiments to test statistical behavior, such as generating samples from several distributionsUse simulations to understand concepts that are hard to grasp mathematicallyImport data from most sources with Python, rather than rely on data that’s cleaned and formatted for statistics toolsUse statistical inference to answer questions about real-world data
Companies such as Google, Microsoft, and Facebook are actively growing in-house deep-learning teams. For the rest of us, however, deep learning is still a pretty complex and difficult subject to grasp. If you’re familiar with Python, and have a background in calculus, along with a basic understanding of machine learning, this book will get you started.Examine the foundations of machine learning and neural networksLearn how to train feed-forward neural networksUse TensorFlow to implement your first neural networkManage problems that arise as you begin to make networks deeperBuild neural networks that analyze complex imagesPerform effective dimensionality reduction using autoencodersDive deep into sequence analysis to examine languageLearn the fundamentals of reinforcement learning
Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.Use the IPython shell and Jupyter notebook for exploratory computingLearn basic and advanced features in NumPy (Numerical Python)Get started with data analysis tools in the pandas libraryUse flexible tools to load, clean, transform, merge, and reshape dataCreate informative visualizations with matplotlibApply the pandas groupby facility to slice, dice, and summarize datasetsAnalyze and manipulate regular and irregular time series dataLearn how to solve real-world data analysis problems with thorough, detailed examples
If you want to find out how to use Python to start answering critical questions of your data, pick up Python Machine Learning – whether you want to get started from scratch or want to extend your data science knowledge, this is an essential and unmissable resource.What You Will LearnExplore how to use different machine learning models to ask different questions of your dataLearn how to build neural networks using Keras and TheanoFind out how to write clean and elegant Python code that will optimize the strength of your algorithmsDiscover how to embed your machine learning model in a web application for increased accessibilityPredict continuous target outcomes using regression analysisUncover hidden patterns and structures in data with clusteringOrganize data using effective pre-processing techniquesGet to grips with sentiment analysis to delve deeper into textual and social media dataIn Detail
Machine learning and predictive analytics are transforming the way businesses and other organizations operate. Being able to understand trends and patterns in complex data is critical to success, becoming one of the key strategies for unlocking growth in a challenging contemporary marketplace. Python can help you deliver key insights into your data – its unique capabilities as a language let you build sophisticated algorithms and statistical models that can reveal new perspectives and answer key questions that are vital for success.
Python Machine Learning gives you access to the world of predictive analytics and demonstrates why Python is one of the world's leading data science languages. If you want to ask better questions of data, or need to improve and extend the capabilities of your machine learning systems, this practical data science book is invaluable. Covering a wide range of powerful Python libraries, including scikit-learn, Theano, and Keras, and featuring guidance and tips on everything from sentiment analysis to neural networks, you'll soon be able to answer some of the most important questions facing you and your organization.Style and approach
Python Machine Learning connects the fundamental theoretical principles behind machine learning to their practical application in a way that focuses you on asking and answering the right questions. It walks you through the key elements of Python and its powerful machine learning libraries, while demonstrating how to get to grips with a range of statistical models.
Bradley Holt, co-founder of the creative services firm Found Line, is a web developer and entrepreneur ten years of PHP and MySQL experience. He began using CouchDB before the release of version 1.0. Bradley is an active member of the PHP community, and can be reached at bradley-holt.com.
The code-packed examples in this book will help you learn how to work with documents, populate a simple database, replicate data from one database to another, and a host of other tasks.Install CouchDB on Linux, Mac OS X, Windows, or (if you must) from the source codeInteract with data through CouchDB’s RESTful API, and use standard HTTP operations, such as PUT, GET, POST, and DELETEUse Futon—CouchDB’s web-based interface— to manage databases and documents, and to configure replicationsLearn how to create, update, and delete documents in JSON format, and how to create and delete databasesWork with design documents to get the formatting and indexing your application requires
The CWNA: Certified Wireless Network Administrator OfficialStudy Guide: Exam CWNA-106 is the officially endorsed CWNA testprep for the leading wireless certification. Expert authors andCWNEs David D. Coleman and David A. Westcott guide readers throughthe skills and concepts candidates need to know for the exam, usinghands-on methods to convey an in-depth understanding of wirelessnetwork administration. Readers should have a basic knowledge ofRadio Frequency behavior, experience with WLAN hardware peripheralsand protocols, and an interest in designing, installing, andmanaging wireless networks.
Wireless technology is taking over the tech industry, and thedemand for competent, certified professionals is far outpacing thesupply. A CWNA certification denotes advanced-level proficiency inthe field, with a complete understanding of wireless LANcomponents, features, and function—but the only way to passthe exam is to truly understand the material, not just the talkingpoints. The CWNA: Certified Wireless Network AdministratorOfficial Study Guide thoroughly covers each exam objective, andincludes review questions, assessment tests, and exercises to testyour skills. Topics include:Radio Frequency technologies, regulations, and standards802.11 protocolsNetwork implementation and security802.11 RF site surveying
Readers also get access to a suite of study tools including anelectronic test engine with hundreds or practice test questions,electronic flashcards, exercise peripherals, and industry WhitePapers, which serve as valuable backup references. In preparing forthe CWNA-106 exam, the ideal study guide should cover all of theexam topics in depth—CWNA: Certified Wireless NetworkAdministrator Official Study Guide does just that, making it anexcellent, comprehensive study guide.
Written by the founders of Processing, this book takes you through the learning process one step at a time to help you grasp core programming concepts. You'll learn how to sketch with code -- creating a program with one a line of code, observing the result, and then adding to it. Join the thousands of hobbyists, students, and professionals who have discovered this free and educational community platform.Quickly learn programming basics, from variables to objects Understand the fundamentals of computer graphics Get acquainted with the Processing software development environment Create interactive graphics with easy-to-follow projects Use the Arduino open source prototyping platform to control your Processing graphics
It includes Matlab code of the most common methods and algorithms in the book, together with a descriptive summary and solved examples, and including real-life data sets in imaging and audio recognition.
This text is designed for electronic engineering, computer science, computer engineering, biomedical engineering and applied mathematics students taking graduate courses on pattern recognition and machine learning as well as R&D engineers and university researchers in image and signal processing/analyisis, and computer vision.Matlab code and descriptive summary of the most common methods and algorithms in Theodoridis/Koutroumbas, Pattern Recognition, Fourth EditionSolved examples in Matlab, including real-life data sets in imaging and audio recognitionAvailable separately or at a special package price with the main text (ISBN for package: 978-0-12-374491-3)
The Concept and Object Modeling Notation (COMN) is able to cover the full spectrum of analysis and design. A single COMN model can represent the objects and concepts in the problem space, logical data design, and concrete NoSQL and SQL document, key-value, columnar, and relational database implementations. COMN models enable an unprecedented level of traceability of requirements to implementation. COMN models can also represent the static structure of software and the predicates that represent the patterns of meaning in databases.
This book will teach you:the simple and familiar graphical notation of COMN with its three basic shapes and four line styles how to think about objects, concepts, types, and classes in the real world, using the ordinary meanings of English words that aren’t tangled with confused techno-speak how to express logical data designs that are freer from implementation considerations than is possible in any other notation how to understand key-value, document, columnar, and table-oriented database designs in logical and physical terms how to use COMN to specify physical database implementations in any NoSQL or SQL database with the precision necessary for model-driven development
Implementing Splunk Second Edition is a learning guide that introduces you to all the latest features and improvements of Splunk 6.2. The book starts by introducing you to various concepts such as charting, reporting, clustering, and visualization. Every chapter is dedicated to enhancing your knowledge of a specific concept, including data models and pivots, speeding up your queries, backfilling, data replication, and so on. By the end of the book, you'll have a very good understanding of Splunk and be able to perform efficient data analysis.
Inside, you’ll learn about:Interaction design and physical computing The Arduino hardware and software development environment Basics of electricity and electronics Prototyping on a solderless breadboard Drawing a schematic diagram
Getting started with Arduino is a snap. To use the introductory examples in this guide, all you need an Arduino Uno or earlier model, along with USB A-B cable and an LED. The easy-to-use Arduino development environment is free to download.
Join hundreds of thousands of hobbyists who have discovered this incredible (and educational) platform. Written by the co-founder of the Arduino project, Getting Started with Arduino gets you in on all the fun!
If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.Get a crash course in PythonLearn the basics of linear algebra, statistics, and probability—and understand how and when they're used in data scienceCollect, explore, clean, munge, and manipulate dataDive into the fundamentals of machine learningImplement models such as k-nearest Neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clusteringExplore recommender systems, natural language processing, network analysis, MapReduce, and databases
This updated second edition provides guidance for database developers, advanced configuration for system administrators, and an overview of the concepts and use cases for other people on your project. Ideal for NoSQL newcomers and experienced MongoDB users alike, this guide provides numerous real-world schema design examples.Get started with MongoDB core concepts and vocabularyPerform basic write operations at different levels of safety and speedCreate complex queries, with options for limiting, skipping, and sorting resultsDesign an application that works well with MongoDBAggregate data, including counting, finding distinct values, grouping documents, and using MapReduceGather and interpret statistics about your collections and databasesSet up replica sets and automatic failover in MongoDBUse sharding to scale horizontally, and learn how it impacts applicationsDelve into monitoring, security and authentication, backup/restore, and other administrative tasks
Project-oriented and classroom-tested, the book presents a number of important algorithms supported by motivating examples that bring meaning to the problems faced by computer programmers. The idea of computational complexity is also introduced, demonstrating what can and cannot be computed efficiently so that the programmer can make informed judgements about the algorithms they use. The text assumes some basic experience in computer programming and familiarity in an object-oriented language, but not necessarily with Python.
Topics and features: includes both introductory and advanced data structures and algorithms topics, with suggested chapter sequences for those respective courses provided in the preface; provides learning goals, review questions and programming exercises in each chapter, as well as numerous illustrative examples; offers downloadable programs and supplementary files at an associated website, with instructor materials available from the author; presents a primer on Python for those coming from a different language background; reviews the use of hashing in sets and maps, along with an examination of binary search trees and tree traversals, and material on depth first search of graphs; discusses topics suitable for an advanced course, such as membership structures, heaps, balanced binary search trees, B-trees and heuristic search.
Students of computer science will find this clear and concise textbook to be invaluable for undergraduate courses on data structures and algorithms, at both introductory and advanced levels. The book is also suitable as a refresher guide for computer programmers starting new jobs working with Python.
· Introduces the concept of discrete event Monte Carlo simulation, the most commonly used methodology for modeling and analysis of complex systems
· Covers essential workings of the popular animated simulation language, ARENA, including set-up, design parameters, input data, and output analysis, along with a wide variety of sample model applications from production lines to transportation systems
· Reviews elements of statistics, probability, and stochastic processes relevant to simulation modeling
* Ample end-of-chapter problems and full Solutions Manual
* Includes CD with sample ARENA modeling programs
Each recipe addresses a specific problem, with a discussion that explains the solution and offers insight into how it works. If you’re a beginner, R Cookbook will help get you started. If you’re an experienced data programmer, it will jog your memory and expand your horizons. You’ll get the job done faster and learn more about R in the process.Create vectors, handle variables, and perform other basic functionsInput and output dataTackle data structures such as matrices, lists, factors, and data framesWork with probability, probability distributions, and random variablesCalculate statistics and confidence intervals, and perform statistical testsCreate a variety of graphic displaysBuild statistical models with linear regressions and analysis of variance (ANOVA)Explore advanced statistical techniques, such as finding clusters in your data
"Wonderfully readable, R Cookbook serves not only as a solutions manual of sorts, but as a truly enjoyable way to explore the R language—one practical example at a time."—Jeffrey Ryan, software consultant and R package author
This innovative, comprehensive book examines the user-centered design process from the perspective of a designer. With rich imagery,Interactive Designintroduces the different UX players, outlines the user-centered design process from user research to user testing, and explains through various examples how user-centered design has been successfully integrated into the design process of a variety of design studios worldwide.
Author Bob DuCharme has you writing simple queries right away before providing background on how SPARQL fits into RDF technologies. Using short examples that you can run yourself with open source software, you’ll learn how to update, add to, and delete data in RDF datasets.Get the big picture on RDF, linked data, and the semantic webUse SPARQL to find bad data and create new data from existing dataUse datatype metadata and functions in your queriesLearn techniques and tools to help your queries run more efficientlyUse RDF Schemas and OWL ontologies to extend the power of your queriesDiscover the roles that SPARQL can play in your applications
How to Cheat in Unity 5takes a no-nonsense approach to help you achieve fast and effective results with Unity 5. Geared towards the intermediate user, HTC in Unity 5 provides content beyond what an introductory book offers, and allows you to work more quickly and powerfully in Unity. Packed full with easy-to-follow methods to get the most from Unity, this book explores time-saving features for interface customization and scene management, along with productivity-enhancing ways to work with rendering and optimization. In addition, this book features a companion website at www.alanthorn.net, where you can download the book’s companion files and also watch bonus tutorial video content.
Learn bite-sized tips and tricks for effective Unity workflows
Become a more powerful Unity user through interface customization
Enhance your productivity with rendering tricks, better scene organization and more
Better understand Unity asset and import workflows
Learn techniques to save you time and money during development
Inspired by Lean and Agile development theories, Lean UX lets you focus on the actual experience being designed, rather than deliverables. This book shows you how to collaborate closely with other members of your Agile product team, and gather feedback early and often. You’ll learn how to drive the design in short, iterative cycles to assess what works best for the business and the user. Lean UX shows you how to make this change—for the better.Frame a vision of the problem you’re solving and focus your team on the right outcomesBring the designers’ toolkit to the rest of your product teamShare your insights with your team much earlier in the processCreate Minimum Viable Products to determine which ideas are validIncorporate the voice of the customer throughout the project cycleMake your team more productive: combine Lean UX with Agile’s Scrum frameworkUnderstand the organizational shifts necessary to integrate Lean UX
The pervasiveness and range of capabilities of today’s mobile devices have enabled a wide spectrum of mobile applications that are transforming our daily lives, from smartphones equipped with GPS to integrated mobile sensors that acquire physiological data. Human Activity Recognition: Using Wearable Sensors and Smartphones focuses on the automatic identification of human activities from pervasive wearable sensors—a crucial component for health monitoring and also applicable to other areas, such as entertainment and tactical operations.
Developed from the authors’ nearly four years of rigorous research in the field, the book covers the theory, fundamentals, and applications of human activity recognition (HAR). The authors examine how machine learning and pattern recognition tools help determine a user’s activity during a certain period of time. They propose two systems for performing HAR: Centinela, an offline server-oriented HAR system, and Vigilante, a completely mobile real-time activity recognition system. The book also provides a practical guide to the development of activity recognition applications in the Android framework.
This book is for intermediate Python developers who want to engage with the use of public APIs to collect data from social media platforms and perform statistical analysis in order to produce useful insights from data. The book assumes a basic understanding of the Python Standard Library and provides practical examples to guide you toward the creation of your data analysis project based on social data.What You Will LearnInteract with a social media platform via their public API with PythonStore social data in a convenient format for data analysisSlice and dice social data using Python tools for data scienceApply text analytics techniques to understand what people are talking about on social mediaApply advanced statistical and analytical techniques to produce useful insights from dataBuild beautiful visualizations with web technologies to explore data and present data productsIn Detail
Your social media is filled with a wealth of hidden data – unlock it with the power of Python. Transform your understanding of your clients and customers when you use Python to solve the problems of understanding consumer behavior and turning raw data into actionable customer insights.
This book will help you acquire and analyze data from leading social media sites. It will show you how to employ scientific Python tools to mine popular social websites such as Facebook, Twitter, Quora, and more. Explore the Python libraries used for social media mining, and get the tips, tricks, and insider insight you need to make the most of them. Discover how to develop data mining tools that use a social media API, and how to create your own data analysis projects using Python for clear insight from your social data.Style and approach
This practical, hands-on guide will help you learn everything you need to perform data mining for social media. Throughout the book, we take an example-oriented approach to use Python for data analysis and provide useful tips and tricks that you can use in day-to-day tasks.
Data scientists and analysts will learn how to perform a wide range of techniques, from writing MapReduce and Spark applications with Python to using advanced modeling and data management with Spark MLlib, Hive, and HBase. You’ll also learn about the analytical processes and data systems available to build and empower data products that can handle—and actually require—huge amounts of data.Understand core concepts behind Hadoop and cluster computingUse design patterns and parallel analytical algorithms to create distributed data analysis jobsLearn about data management, mining, and warehousing in a distributed context using Apache Hive and HBaseUse Sqoop and Apache Flume to ingest data from relational databasesProgram complex Hadoop and Spark applications with Apache Pig and Spark DataFramesPerform machine learning techniques such as classification, clustering, and collaborative filtering with Spark’s MLlib
This valuable handbook has attracted scores of contributors since the European Journalism Centre and the Open Knowledge Foundation launched the project at MozFest 2011. Through a collection of tips and techniques from leading journalists, professors, software developers, and data analysts, you’ll learn how data can be either the source of data journalism or a tool with which the story is told—or both.Examine the use of data journalism at the BBC, the Chicago Tribune, the Guardian, and other news organizationsExplore in-depth case studies on elections, riots, school performance, and corruptionLearn how to find data from the Web, through freedom of information laws, and by "crowd sourcing"Extract information from raw data with tips for working with numbers and statistics and using data visualizationDeliver data through infographics, news apps, open data platforms, and download links