Data Scientist: The Definitive Guide to Becoming a Data Scientist

Technics Publications

As our society transforms into a data-driven one, the role of the Data Scientist is becoming more and more important. If you want to be on the leading edge of what is sure to become a major profession in the not-too-distant future, this book can show you how. Each chapter is filled with practical information that will help you reap the fruits of big data and become a successful Data Scientist:

• Learn what big data is and how it differs from traditional data through its main characteristics: volume, variety, velocity, and veracity.
• Explore the different types of Data Scientists and the skillset each one has.
• Dig into what the role of the Data Scientist requires in terms of the relevant mindset, technical skills, experience, and how the Data Scientist connects with other people.
• Be a Data Scientist for a day, examining the problems you may encounter and how you tackle them, what programs you use, and how you expand your knowledge and know-how.
• See how you can become a Data Scientist, based on where you are starting from: a programming, machine learning, or data-related background.
• Follow step-by-step through the process of landing a Data Scientist job: where you need to look, how you would present yourself to a potential employer, and what it takes to follow a freelancer path.
• Read the case studies of experienced, senior-level Data Scientists, in an attempt to get a better perspective of what this role is, in practice.

At the end of the book, there is a glossary of the most important terms that have been introduced, as well as three appendices: a list of useful sites, some relevant articles on the web, and a list of offline resources for further reading.

About the author

Dr. Zacharias Voulgaris was born and raised in Greece. Upon completing a 5-year Engineering degree at the Technical University of Crete, he enrolled at City University of London for a Masters course in Information Systems and Technology. Afterwards, he pursued a PhD in Machine Learning at Birkbeck College (University of London), under the joint supervision of Prof. G. Magoulas and Prof. B. Mirkin. Upon receiving his doctorate, he was recruited by the Georgia Institute of Technology as a research fellow. Since January 2013 he has been working as a Data Scientist.

Additional Information

Publisher: Technics Publications
Published on: May 1, 2014
Pages: 278
ISBN: 9781634620284
Language: English
Genres: Computers / Databases / Data Mining; Computers / Mathematical & Statistical Software
Content Protection: This content is DRM protected.

Learn how to use the Julia language to solve business-critical data science challenges. After covering the importance of Julia to the data science community and several essential data science principles, we start with the basics, including how to install Julia and its powerful libraries. Many examples are provided as we illustrate how to leverage each Julia command, dataset, and function.

Specialized script packages are introduced and described. Hands-on problems representative of those commonly encountered throughout the data science pipeline are provided, and we guide you in the use of Julia in solving them using published datasets. Many of these scenarios make use of existing packages and built-in functions, as we cover:

1. An overview of the data science pipeline along with an example illustrating the key points, implemented in Julia
2. Options for Julia IDEs
3. Programming structures and functions
4. Engineering tasks, such as importing, cleaning, formatting and storing data, as well as performing data preprocessing
5. Data visualization and some simple yet powerful statistics for data exploration purposes
6. Dimensionality reduction and feature evaluation
7. Machine learning methods, ranging from unsupervised (different types of clustering) to supervised ones (decision trees, random forests, basic neural networks, regression trees, and Extreme Learning Machines)
8. Graph analysis including pinpointing the connections among the various entities and how they can be mined for useful insights.

Each chapter concludes with a series of questions and exercises to reinforce what you learned. The last chapter of the book will guide you in creating a data science application from scratch using Julia.

 

Foreword by Steven Pinker

Blending the informed analysis of The Signal and the Noise with the instructive iconoclasm of Think Like a Freak, a fascinating, illuminating, and witty look at what the vast amounts of information now instantly available to us reveal about ourselves and our world, provided we ask the right questions.

By the end of an average day in the early twenty-first century, human beings searching the internet will amass eight trillion gigabytes of data. This staggering amount of information, unprecedented in history, can tell us a great deal about who we are: the fears, desires, and behaviors that drive us, and the conscious and unconscious decisions we make. From the profound to the mundane, we can gain astonishing knowledge about the human psyche that, less than twenty years ago, seemed unfathomable.

Everybody Lies offers fascinating, surprising, and sometimes laugh-out-loud insights into everything from economics to ethics to sports to race to sex, gender, and more, all drawn from the world of big data. What percentage of white voters didn’t vote for Barack Obama because he’s black? Does where you go to school affect how successful you are in life? Do parents secretly favor boy children over girls? Do violent films affect the crime rate? Can you beat the stock market? How regularly do we lie about our sex lives and who’s more self-conscious about sex, men or women?

Investigating these questions and a host of others, Seth Stephens-Davidowitz offers revelations that can help us understand ourselves and our lives better. Drawing on studies and experiments on how we really live and think, he demonstrates in fascinating and often funny ways the extent to which all the world is indeed a lab. With conclusions ranging from strange-but-true to thought-provoking to disturbing, he explores the power of this digital truth serum and its deeper potential: revealing biases deeply embedded within us, information we can use to change our culture, and the questions we’re afraid to ask that might be essential to our health, both emotional and physical. All of us are touched by big data every day, and its influence is multiplying. Everybody Lies challenges us to think differently about how we see it and the world.

An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open-source statistical software platform.

Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.

