The Data Science Design Manual

Springer
Free sample

This engaging and clearly written textbook/reference provides a must-have introduction to the rapidly emerging interdisciplinary field of data science. It focuses on the principles fundamental to becoming a good data scientist and the key skills needed to build systems for collecting, analyzing, and interpreting data.

The Data Science Design Manual is a source of practical insights that highlights what really matters in analyzing data, and provides an intuitive understanding of how these core concepts can be used. The book does not emphasize any particular programming language or suite of data-analysis tools, focusing instead on high-level discussion of important design principles.

This easy-to-read text ideally serves the needs of undergraduate and early graduate students embarking on an “Introduction to Data Science” course. It reveals how this discipline sits at the intersection of statistics, computer science, and machine learning, with a distinct heft and character of its own. Practitioners in these and related fields will find this book perfect for self-study as well.

Additional learning tools:

  • Contains “War Stories,” offering perspectives on how data science applies in the real world
  • Includes “Homework Problems,” providing a wide range of exercises and projects for self-study
  • Provides a complete set of lecture slides and online video lectures at www.data-manual.com
  • Provides “Take-Home Lessons,” emphasizing the big-picture concepts to learn from each chapter
  • Recommends exciting “Kaggle Challenges” from the online platform Kaggle
  • Highlights “False Starts,” revealing the subtle reasons why certain approaches fail
  • Offers examples taken from the data science television show “The Quant Shop” (www.quant-shop.com)

Read more
Collapse

About the author

Dr. Steven S. Skiena is Distinguished Teaching Professor of Computer Science at Stony Brook University, with research interests in data science, natural language processing, and algorithms. He was awarded the IEEE Computer Science and Engineering Undergraduate Teaching Award “for outstanding contributions to undergraduate education ...and for influential textbooks and software.” Dr. Skiena is the author of six books, including the popular Springer titles The Algorithm Design Manual and Programming Challenges: The Programming Contest Training Manual.

Read more
Collapse
Loading...

Additional Information

Publisher
Springer
Read more
Collapse
Published on
Jul 1, 2017
Read more
Collapse
Pages
445
Read more
Collapse
ISBN
9783319554440
Read more
Collapse
Read more
Collapse
Best For
Read more
Collapse
Language
English
Read more
Collapse
Genres
Business & Economics / Business Mathematics
Business & Economics / Industries / Computers & Information Technology
Computers / Computer Vision & Pattern Recognition
Computers / Databases / Data Mining
Computers / Information Technology
Computers / Mathematical & Statistical Software
Computers / Optical Data Processing
Mathematics / Combinatorics
Mathematics / Graphic Methods
Read more
Collapse
Content Protection
This content is DRM protected.
Read more
Collapse

Reading information

Smartphones and Tablets

Install the Google Play Books app for Android and iPad/iPhone. It syncs automatically with your account and allows you to read online or offline wherever you are.

Laptops and Computers

You can read books purchased on Google Play using your computer's web browser.

eReaders and other devices

To read on e-ink devices like the Sony eReader or Barnes & Noble Nook, you'll need to download a file and transfer it to your device. Please follow the detailed Help center instructions to transfer the files to supported eReaders.
Methods of dimensionality reduction provide a way to understand and visualize the structure of complex data sets. Traditional methods like principal component analysis and classical metric multidimensional scaling suffer from being based on linear models. Until recently, very few methods were able to reduce the data dimensionality in a nonlinear way. However, since the late nineties, many new methods have been developed and nonlinear dimensionality reduction, also called manifold learning, has become a hot topic. New advances that account for this rapid growth are, e.g. the use of graphs to represent the manifold topology, and the use of new metrics like the geodesic distance. In addition, new optimization schemes, based on kernel techniques and spectral decomposition, have lead to spectral embedding, which encompasses many of the recently developed methods.

This book describes existing and advanced methods to reduce the dimensionality of numerical databases. For each method, the description starts from intuitive ideas, develops the necessary mathematical details, and ends by outlining the algorithmic implementation. Methods are compared with each other with the help of different illustrative examples.

The purpose of the book is to summarize clear facts and ideas about well-known methods as well as recent developments in the topic of nonlinear dimensionality reduction. With this goal in mind, methods are all described from a unifying point of view, in order to highlight their respective strengths and shortcomings.

The book is primarily intended for statisticians, computer scientists and data analysts. It is also accessible to other practitioners having a basic background in statistics and/or computational learning, like psychologists (in psychometry) and economists.

Remarkable advances in computation and data storage and the ready availability of huge data sets have been the keys to the growth of the new disciplines of data mining and machine learning, while the enormous success of the Human Genome Project has opened up the field of bioinformatics.

These exciting developments, which led to the introduction of many innovative statistical tools for high-dimensional data analysis, are described here in detail. The author takes a broad perspective; for the first time in a book on multivariate analysis, nonlinear methods are discussed in detail as well as linear methods. Techniques covered range from traditional multivariate methods, such as multiple regression, principal components, canonical variates, linear discriminant analysis, factor analysis, clustering, multidimensional scaling, and correspondence analysis, to the newer methods of density estimation, projection pursuit, neural networks, multivariate reduced-rank regression, nonlinear manifold learning, bagging, boosting, random forests, independent component analysis, support vector machines, and classification and regression trees. Another unique feature of this book is the discussion of database management systems.

This book is appropriate for advanced undergraduate students, graduate students, and researchers in statistics, computer science, artificial intelligence, psychology, cognitive sciences, business, medicine, bioinformatics, and engineering. Familiarity with multivariable calculus, linear algebra, and probability and statistics is required. The book presents a carefully-integrated mixture of theory and applications, and of classical and modern multivariate statistical techniques, including Bayesian methods. There are over 60 interesting data sets used as examples in the book, over 200 exercises, and many color illustrations and photographs.

This textbook on practical data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. But practical data analytics requires more than just the foundations. Problems and data are enormously variable and only the most elementary of algorithms can be used without modification. Programming fluency and experience with real and challenging data is indispensable and so the reader is immersed in Python and R and real data analysis. By the end of the book, the reader will have gained the ability to adapt algorithms to new problems and carry out innovative analyses.
This book has three parts:(a) Data Reduction: Begins with the concepts of data reduction, data maps, and information extraction. The second chapter introduces associative statistics, the mathematical foundation of scalable algorithms and distributed computing. Practical aspects of distributed computing is the subject of the Hadoop and MapReduce chapter.(b) Extracting Information from Data: Linear regression and data visualization are the principal topics of Part II. The authors dedicate a chapter to the critical domain of Healthcare Analytics for an extended example of practical data analytics. The algorithms and analytics will be of much interest to practitioners interested in utilizing the large and unwieldly data sets of the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System.(c) Predictive Analytics Two foundational and widely used algorithms, k-nearest neighbors and naive Bayes, are developed in detail. A chapter is dedicated to forecasting. The last chapter focuses on streaming data and uses publicly accessible data streams originating from the Twitter API and the NASDAQ stock market in the tutorials.
This book is intended for a one- or two-semester course in data analytics for upper-division undergraduate and graduate students in mathematics, statistics, and computer science. The prerequisites are kept low, and students with one or two courses in probability or statistics, an exposure to vectors and matrices, and a programming course will have no difficulty. The core material of every chapter is accessible to all with these prerequisites. The chapters often expand at the close with innovations of interest to practitioners of data science. Each chapter includes exercises of varying levels of difficulty. The text is eminently suitable for self-study and an exceptional resource for practitioners.

Learn how to build a data science team within your organization rather than hiring from the outside. Teach your team to ask the right questions to gain actionable insights into your business.

Most organizations still focus on objectives and deliverables. Instead, a data science team is exploratory. They use the scientific method to ask interesting questions and run small experiments. Your team needs to see if the data illuminate their questions. Then, they have to use critical thinking techniques to justify their insights and reasoning. They should pivot their efforts to keep their insights aligned with business value. Finally, your team needs to deliver these insights as a compelling story.

Insight!: How to Build Data Science Teams that Deliver Real Business Value shows that the most important thing you can do now is help your team think about data. Management coach Doug Rose walks you through the process of creating and managing effective data science teams. You will learn how to find the right people inside your organization and equip them with the right mindset. The book has three overarching concepts:

You should mine your own company for talent. You can’t change your organization by hiring a few data science superheroes.
You should form small, agile-like data teams that focus on delivering valuable insights early and often.
You can make real changes to your organization by telling compelling data stories. These stories are the best way to communicate your insights about your customers, challenges, and industry.
What Your Will Learn:Create data science teams from existing talent in your organization to cost-efficiently extract maximum business value from your organization’s data
Understand key data science terms and concepts
Follow practical guidance to create and integrate an effective data science team with key roles and the responsibilities for each team member
Utilize the data science life cycle (DSLC) to model essential processes and practices for delivering value
Use sprints and storytelling to help your team stay on track and adapt to new knowledge

Who This Book Is For
Data science project managers and team leaders. The secondary readership is data scientists, DBAs, analysts, senior management, HR managers, and performance specialists.

NATIONAL BESTSELLER

Developing video games—hero's journey or fool's errand? The creative and technical logistics that go into building today's hottest games can be more harrowing and complex than the games themselves, often seeming like an endless maze or a bottomless abyss. In Blood, Sweat, and Pixels, Jason Schreier takes readers on a fascinating odyssey behind the scenes of video game development, where the creator may be a team of 600 overworked underdogs or a solitary geek genius. Exploring the artistic challenges, technical impossibilities, marketplace demands, and Donkey Kong-sized monkey wrenches thrown into the works by corporate, Blood, Sweat, and Pixels reveals how bringing any game to completion is more than Sisyphean—it's nothing short of miraculous.

Taking some of the most popular, bestselling recent games, Schreier immerses readers in the hellfire of the development process, whether it's RPG studio Bioware's challenge to beat an impossible schedule and overcome countless technical nightmares to build Dragon Age: Inquisition; indie developer Eric Barone's single-handed efforts to grow country-life RPG Stardew Valley from one man's vision into a multi-million-dollar franchise; or Bungie spinning out from their corporate overlords at Microsoft to create Destiny, a brand new universe that they hoped would become as iconic as Star Wars and Lord of the Rings—even as it nearly ripped their studio apart.

Documenting the round-the-clock crunches, buggy-eyed burnout, and last-minute saves, Blood, Sweat, and Pixels is a journey through development hell—and ultimately a tribute to the dedicated diehards and unsung heroes who scale mountains of obstacles in their quests to create the best games imaginable.

©2019 GoogleSite Terms of ServicePrivacyDevelopersArtistsAbout Google|Location: United StatesLanguage: English (United States)
By purchasing this item, you are transacting with Google Payments and agreeing to the Google Payments Terms of Service and Privacy Notice.