Machine learning and data science are large disciplines, requiring years of study in order to gain proficiency. This book can be viewed as a set of essential tools we need for a long-term career in the data science field – recommendations are provided for further study in order to build advanced skills in tackling important data problem domains.
The R statistical environment was chosen for use in this book. R is a growing phenomenon worldwide, with many data scientists using it exclusively for their project work. All of the code examples for the book are written in R. In addition, many popular R packages and data sets will be used.
Daniel D. Gutierrez is a practicing data scientist through his Santa Monica, Calif. consulting firm AMULET Analytics. Daniel also serves as Managing Editor for insideBIGDATA.com where he keeps a pulse on this dynamic industry. He is also an educator and teaches classes in data science, machine learning and R for universities and large enterprises. Daniel holds a BS degree in mathematics and computer science from UCLA.
Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.
This textbook provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly.
Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results. You’ll get a complete, big-picture understanding of the data science cycle, along with basic tools you need to manage the details. Each section of the book is paired with exercises to help you practice what you’ve learned along the way.
You’ll learn how to:Wrangle—transform your datasets into a form convenient for analysisProgram—learn powerful R tools for solving data problems with greater clarity and easeExplore—examine your data, generate hypotheses, and quickly test themModel—provide a low-dimensional summary that captures true "signals" in your datasetCommunicate—learn R Markdown for integrating prose, code, and results
Blending the informed analysis of The Signal and the Noise with the instructive iconoclasm of Think Like a Freak, a fascinating, illuminating, and witty look at what the vast amounts of information now instantly available to us reveals about ourselves and our world—provided we ask the right questions.
By the end of an average day in the early twenty-first century, human beings searching the internet will amass eight trillion gigabytes of data. This staggering amount of information—unprecedented in history—can tell us a great deal about who we are—the fears, desires, and behaviors that drive us, and the conscious and unconscious decisions we make. From the profound to the mundane, we can gain astonishing knowledge about the human psyche that less than twenty years ago, seemed unfathomable.
Everybody Lies offers fascinating, surprising, and sometimes laugh-out-loud insights into everything from economics to ethics to sports to race to sex, gender and more, all drawn from the world of big data. What percentage of white voters didn’t vote for Barack Obama because he’s black? Does where you go to school effect how successful you are in life? Do parents secretly favor boy children over girls? Do violent films affect the crime rate? Can you beat the stock market? How regularly do we lie about our sex lives and who’s more self-conscious about sex, men or women?
Investigating these questions and a host of others, Seth Stephens-Davidowitz offers revelations that can help us understand ourselves and our lives better. Drawing on studies and experiments on how we really live and think, he demonstrates in fascinating and often funny ways the extent to which all the world is indeed a lab. With conclusions ranging from strange-but-true to thought-provoking to disturbing, he explores the power of this digital truth serum and its deeper potential—revealing biases deeply embedded within us, information we can use to change our culture, and the questions we’re afraid to ask that might be essential to our health—both emotional and physical. All of us are touched by big data everyday, and its influence is multiplying. Everybody Lies challenges us to think differently about how we see it and the world.