Blending the informed analysis of The Signal and the Noise with the instructive iconoclasm of Think Like a Freak, a fascinating, illuminating, and witty look at what the vast amounts of information now instantly available to us reveals about ourselves and our world—provided we ask the right questions.
By the end of an average day in the early twenty-first century, human beings searching the internet will amass eight trillion gigabytes of data. This staggering amount of information—unprecedented in history—can tell us a great deal about who we are—the fears, desires, and behaviors that drive us, and the conscious and unconscious decisions we make. From the profound to the mundane, we can gain astonishing knowledge about the human psyche that less than twenty years ago, seemed unfathomable.
Everybody Lies offers fascinating, surprising, and sometimes laugh-out-loud insights into everything from economics to ethics to sports to race to sex, gender and more, all drawn from the world of big data. What percentage of white voters didn’t vote for Barack Obama because he’s black? Does where you go to school effect how successful you are in life? Do parents secretly favor boy children over girls? Do violent films affect the crime rate? Can you beat the stock market? How regularly do we lie about our sex lives and who’s more self-conscious about sex, men or women?
Investigating these questions and a host of others, Seth Stephens-Davidowitz offers revelations that can help us understand ourselves and our lives better. Drawing on studies and experiments on how we really live and think, he demonstrates in fascinating and often funny ways the extent to which all the world is indeed a lab. With conclusions ranging from strange-but-true to thought-provoking to disturbing, he explores the power of this digital truth serum and its deeper potential—revealing biases deeply embedded within us, information we can use to change our culture, and the questions we’re afraid to ask that might be essential to our health—both emotional and physical. All of us are touched by big data everyday, and its influence is multiplying. Everybody Lies challenges us to think differently about how we see it and the world.
This textbook provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly.
Let's face it, SQL is a deceptively simple language to learn, and many database developers never go far beyond the simple statement: SELECT columns FROM table WHERE conditions. But there is so much more you can do with the language. In the SQL Cookbook, experienced SQL developer Anthony Molinaro shares his favorite SQL techniques and features. You'll learn about:
Window functions, arguably the most significant enhancement to SQL in the past decade. If you're not using these, you're missing out
Powerful, database-specific features such as SQL Server's PIVOT and UNPIVOT operators, Oracle's MODEL clause, and PostgreSQL's very useful GENERATE_SERIES function
Pivoting rows into columns, reverse-pivoting columns into rows, using pivoting to facilitate inter-row calculations, and double-pivoting a result set
Bucketization, and why you should never use that term in Brooklyn.
How to create histograms, summarize data into buckets, perform aggregations over a moving range of values, generate running-totals and subtotals, and other advanced, data warehousing techniques
The technique of walking a string, which allows you to use SQL to parse through the characters, words, or delimited elements of a string
Written in O'Reilly's popular Problem/Solution/Discussion style, the SQL Cookbook is sure to please. Anthony's credo is: "When it comes down to it, we all go to work, we all have bills to pay, and we all want to go home at a reasonable time and enjoy what's still available of our days." The SQL Cookbook moves quickly from problem to solution, saving you time each step of the way.
Updated to reflect recent advances in MySQL and InnoDB performance, features, and tools, this third edition not only offers specific examples of how MySQL works, it also teaches you why this system works as it does, with illustrative stories and case studies that demonstrate MySQL’s principles in action. With this book, you’ll learn how to think in MySQL.Learn the effects of new features in MySQL 5.5, including stored procedures, partitioned databases, triggers, and viewsImplement improvements in replication, high availability, and clusteringAchieve high performance when running MySQL in the cloudOptimize advanced querying features, such as full-text searchesTake advantage of modern multi-core CPUs and solid-state disksExplore backup and recovery strategies—including new tools for hot online backups
Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You’ll not only learn how to improve communication between business stakeholders and data scientists, but also how participate intelligently in your company’s data science projects. You’ll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making.Understand how data science fits in your organization—and how you can use it for competitive advantageTreat data as a business asset that requires careful investment if you’re to gain real valueApproach business problems data-analytically, using the data-mining process to gather good data in the most appropriate wayLearn general concepts for actually extracting knowledge from dataApply data science principles when interviewing data science job candidates
Using the open source R language, you can build powerful statistical models to answer many of your most challenging questions. R has traditionally been difficult for non-statisticians to learn, and most R books assume far too much knowledge to be of help. R for Everyone, Second Edition, is the solution.
Drawing on his unsurpassed experience teaching new users, professional data scientist Jared P. Lander has written the perfect tutorial for anyone new to statistical programming and modeling. Organized to make learning easy and intuitive, this guide focuses on the 20 percent of R functionality you’ll need to accomplish 80 percent of modern data tasks.
Lander’s self-contained chapters start with the absolute basics, offering extensive hands-on practice and sample code. You’ll download and install R; navigate and use the R environment; master basic program control, data import, manipulation, and visualization; and walk through several essential tests. Then, building on this foundation, you’ll construct several complete models, both linear and nonlinear, and use some data mining techniques. After all this you’ll make your code reproducible with LaTeX, RMarkdown, and Shiny.
By the time you’re done, you won’t just know how to write R programs, you’ll be ready to tackle the statistical problems you care about most.
Coverage includesExplore R, RStudio, and R packages Use R for math: variable types, vectors, calling functions, and more Exploit data structures, including data.frames, matrices, and lists Read many different types of data Create attractive, intuitive statistical graphics Write user-defined functions Control program flow with if, ifelse, and complex checks Improve program efficiency with group manipulations Combine and reshape multiple datasets Manipulate strings using R’s facilities and regular expressions Create normal, binomial, and Poisson probability distributions Build linear, generalized linear, and nonlinear models Program basic statistics: mean, standard deviation, and t-tests Train machine learning models Assess the quality of models and variable selection Prevent overfitting and perform variable selection, using the Elastic Net and Bayesian methods Analyze univariate and multivariate time series data Group data via K-means and hierarchical clustering Prepare reports, slideshows, and web pages with knitr Display interactive data with RMarkdown and htmlwidgets Implement dashboards with Shiny Build reusable R packages with devtools and Rcpp
Register your product at informit.com/register for convenient access to downloads, updates, and corrections as they become available.
Companies such as Google, Microsoft, and Facebook are actively growing in-house deep-learning teams. For the rest of us, however, deep learning is still a pretty complex and difficult subject to grasp. If you’re familiar with Python, and have a background in calculus, along with a basic understanding of machine learning, this book will get you started.Examine the foundations of machine learning and neural networksLearn how to train feed-forward neural networksUse TensorFlow to implement your first neural networkManage problems that arise as you begin to make networks deeperBuild neural networks that analyze complex imagesPerform effective dimensionality reduction using autoencodersDive deep into sequence analysis to examine languageLearn the fundamentals of reinforcement learning
If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.Get a crash course in PythonLearn the basics of linear algebra, statistics, and probability—and understand how and when they're used in data scienceCollect, explore, clean, munge, and manipulate dataDive into the fundamentals of machine learningImplement models such as k-nearest Neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clusteringExplore recommender systems, natural language processing, network analysis, MapReduce, and databases
If you want to find out how to use Python to start answering critical questions of your data, pick up Python Machine Learning – whether you want to get started from scratch or want to extend your data science knowledge, this is an essential and unmissable resource.What You Will LearnExplore how to use different machine learning models to ask different questions of your dataLearn how to build neural networks using Keras and TheanoFind out how to write clean and elegant Python code that will optimize the strength of your algorithmsDiscover how to embed your machine learning model in a web application for increased accessibilityPredict continuous target outcomes using regression analysisUncover hidden patterns and structures in data with clusteringOrganize data using effective pre-processing techniquesGet to grips with sentiment analysis to delve deeper into textual and social media dataIn Detail
Machine learning and predictive analytics are transforming the way businesses and other organizations operate. Being able to understand trends and patterns in complex data is critical to success, becoming one of the key strategies for unlocking growth in a challenging contemporary marketplace. Python can help you deliver key insights into your data – its unique capabilities as a language let you build sophisticated algorithms and statistical models that can reveal new perspectives and answer key questions that are vital for success.
Python Machine Learning gives you access to the world of predictive analytics and demonstrates why Python is one of the world's leading data science languages. If you want to ask better questions of data, or need to improve and extend the capabilities of your machine learning systems, this practical data science book is invaluable. Covering a wide range of powerful Python libraries, including scikit-learn, Theano, and Keras, and featuring guidance and tips on everything from sentiment analysis to neural networks, you'll soon be able to answer some of the most important questions facing you and your organization.Style and approach
Python Machine Learning connects the fundamental theoretical principles behind machine learning to their practical application in a way that focuses you on asking and answering the right questions. It walks you through the key elements of Python and its powerful machine learning libraries, while demonstrating how to get to grips with a range of statistical models.
This is the eBook of the printed book and may not include any media, website access codes, or print supplements that may come packaged with the bound book.
Master business modeling and analysis techniques with Microsoft Excel 2016, and transform data into bottom-line results. Written by award-winning educator Wayne Winston, this hands on, scenario-focused guide helps you use Excel’s newest tools to ask the right questions and get accurate, actionable answers. This edition adds 150+ new problems with solutions, plus a chapter of basic spreadsheet models to make sure you’re fully up to speed.
Solve real business problems with Excel–and build your competitive advantageQuickly transition from Excel basics to sophisticated analytics Summarize data by using PivotTables and Descriptive Statistics Use Excel trend curves, multiple regression, and exponential smoothing Master advanced functions such as OFFSET and INDIRECT Delve into key financial, statistical, and time functions Leverage the new charts in Excel 2016 (including box and whisker and waterfall charts) Make charts more effective by using Power View Tame complex optimizations by using Excel Solver Run Monte Carlo simulations on stock prices and bidding models Work with the AGGREGATE function and table slicers Create PivotTables from data in different worksheets or workbooks Learn about basic probability and Bayes’ Theorem Automate repetitive tasks by using macros
This book is ideal for data scientists who are familiar with C++ or Python and perform machine learning activities on a day-to-day basis. Intermediate and advanced machine learning implementers who need a quick guide they can easily navigate will find it useful.What You Will LearnBecome familiar with the basics of the TensorFlow machine learning libraryGet to know Linear Regression techniques with TensorFlowLearn SVMs with hands-on recipesImplement neural networks and improve predictionsApply NLP and sentiment analysis to your dataMaster CNN and RNN through practical recipesTake TensorFlow into productionIn Detail
TensorFlow is an open source software library for Machine Intelligence. The independent recipes in this book will teach you how to use TensorFlow for complex data computations and will let you dig deeper and gain more insights into your data than ever before. You'll work through recipes on training models, model evaluation, sentiment analysis, regression analysis, clustering analysis, artificial neural networks, and deep learning – each using Google's machine learning library TensorFlow.
This guide starts with the fundamentals of the TensorFlow library which includes variables, matrices, and various data sources. Moving ahead, you will get hands-on experience with Linear Regression techniques with TensorFlow. The next chapters cover important high-level concepts such as neural networks, CNN, RNN, and NLP.
Once you are familiar and comfortable with the TensorFlow ecosystem, the last chapter will show you how to take it to production.Style and approach
This book takes a recipe-based approach where every topic is explicated with the help of a real-world example.
Bayesian methods of inference are deeply natural and extremely powerful. However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making it inaccessible to anyone without a strong mathematical background. Now, though, Cameron Davidson-Pilon introduces Bayesian inference from a computational perspective, bridging theory to practice–freeing you to get results using computing power.
Bayesian Methods for Hackers illuminates Bayesian inference through probabilistic programming with the powerful PyMC language and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments, without extensive mathematical intervention.
Davidson-Pilon begins by introducing the concepts underlying Bayesian inference, comparing it with other techniques and guiding you through building and training your first Bayesian model. Next, he introduces PyMC through a series of detailed examples and intuitive explanations that have been refined after extensive user feedback. You’ll learn how to use the Markov Chain Monte Carlo algorithm, choose appropriate sample sizes and priors, work with loss functions, and apply Bayesian inference in domains ranging from finance to marketing. Once you’ve mastered these techniques, you’ll constantly turn to this guide for the working PyMC code you need to jumpstart future projects.
• Learning the Bayesian “state of mind” and its practical implications
• Understanding how computers perform Bayesian inference
• Using the PyMC Python library to program Bayesian analyses
• Building and debugging models with PyMC
• Testing your model’s “goodness of fit”
• Opening the “black box” of the Markov Chain Monte Carlo algorithm to see how and why it works
• Leveraging the power of the “Law of Large Numbers”
• Mastering key concepts, such as clustering, convergence, autocorrelation, and thinning
• Using loss functions to measure an estimate’s weaknesses based on your goals and desired outcomes
• Selecting appropriate priors and understanding how their influence changes with dataset size
• Overcoming the “exploration versus exploitation” dilemma: deciding when “pretty good” is good enough
• Using Bayesian inference to improve A/B testing
• Solving data science problems when only small amounts of data are available
Cameron Davidson-Pilon has worked in many areas of applied mathematics, from the evolutionary dynamics of genes and diseases to stochastic modeling of financial prices. His contributions to the open source community include lifelines, an implementation of survival analysis in Python. Educated at the University of Waterloo and at the Independent University of Moscow, he currently works with the online commerce leader Shopify.
NoSQL Distilled is a concise but thorough introduction to this rapidly emerging technology. Pramod J. Sadalage and Martin Fowler explain how NoSQL databases work and the ways that they may be a superior alternative to a traditional RDBMS. The authors provide a fast-paced guide to the concepts you need to know in order to evaluate whether NoSQL databases are right for your needs and, if so, which technologies you should explore further.
The first part of the book concentrates on core concepts, including schemaless data models, aggregates, new distribution models, the CAP theorem, and map-reduce. In the second part, the authors explore architectural and design issues associated with implementing NoSQL. They also present realistic use cases that demonstrate NoSQL databases at work and feature representative examples using Riak, MongoDB, Cassandra, and Neo4j.
In addition, by drawing on Pramod Sadalage’s pioneering work, NoSQL Distilled shows how to implement evolutionary design with schema migration: an essential technique for applying NoSQL databases. The book concludes by describing how NoSQL is ushering in a new age of Polyglot Persistence, where multiple data-storage worlds coexist, and architects can choose the technology best optimized for each type of data access.
Predictive analytics and Data Mining techniques covered: Exploratory Data Analysis, Visualization, Decision trees, Rule induction, k-Nearest Neighbors, Naïve Bayesian, Artificial Neural Networks, Support Vector machines, Ensemble models, Bagging, Boosting, Random Forests, Linear regression, Logistic regression, Association analysis using Apriori and FP Growth, K-Means clustering, Density based clustering, Self Organizing Maps, Text Mining, Time series forecasting, Anomaly detection and Feature selection. Implementation files can be downloaded from the book companion site at www.LearnPredictiveAnalytics.comDemystifies data mining concepts with easy to understand languageShows how to get up and running fast with 20 commonly used powerful techniques for predictive analysisExplains the process of using open source RapidMiner toolsDiscusses a simple 5 step process for implementing algorithms that can be used for performing predictive analyticsIncludes practical use cases and examples
For each marketing problem, the authors help you:Identify the right data and analytics techniques Conduct the analysis and obtain insights from it Outline what-if scenarios and define optimal solutions Connect your insights to strategic decision-making
Each chapter contains technical notes, statistical knowledge, case studies, and real data you can use to perform the analysis yourself. As you proceed, you'll gain an in-depth understanding of:The real value of marketing analytics How to integrate quantitative analysis with managerial sensibility How to apply linear regression, logistic regression, cluster analysis, and Anova models The crucial role of careful experimental design
For all marketing professionals specializing in marketing analytics and/or business intelligence; and for students and faculty in all graduate-level business courses covering Marketing Analytics, Marketing Effectiveness, or Marketing Metrics
This book offers practical answers to some of the hardest questions faced by PL/SQL developers, including:What is the best way to write the SQL logic in my application code?
How should I write my packages so they can be leveraged by my entire team of developers?
How can I make sure that all my team's programs handle and record errors consistently?Oracle PL/SQL Best Practices summarizes PL/SQL best practices in nine major categories: overall PL/SQL application development; programming standards; program testing, tracing, and debugging; variables and data structures; control logic; error handling; the use of SQL in PL/SQL; building procedures, functions, packages, and triggers; and overall program performance.
This book is a concise and entertaining guide that PL/SQL developers will turn to again and again as they seek out ways to write higher quality code and more successful applications.
"This book presents ideas that make the difference between a successful project and one that never gets off the ground. It goes beyond just listing a set of rules, and provides realistic scenarios that help the reader understand where the rules come from. This book should be required reading for any team of Oracle database professionals."
--Dwayne King, President, KRIDAN Consulting
Collier introduces platform-agnostic Agile solutions for integrating infrastructures consisting of diverse operational, legacy, and specialty systems that mix commercial and custom code. Using working examples, he shows how to manage analytics development teams with widely diverse skill sets and how to support enormous and fast-growing data volumes. Collier’s techniques offer optimal value whether your projects involve “back-end” data management, “front-end” business analysis, or both.
Part I focuses on Agile project management techniques and delivery team coordination, introducing core practices that shape the way your Agile DW/BI project community can collaborate toward success
Part II presents technical methods for enabling continuous delivery of business value at production-quality levels, including evolving superior designs; test-driven DW development; version control; and project automation
Collier brings together proven solutions you can apply right now—whether you’re an IT decision-maker, data warehouse professional, database administrator, business intelligence specialist, or database developer. With his help, you can mitigate project risk, improve business alignment, achieve better results—and have fun along the way.
This book’s straightforward, step-by-step approach shows you how to deploy, program, optimize, manage, integrate, and extend Spark–now, and for years to come. You’ll discover how to create powerful solutions encompassing cloud computing, real-time stream processing, machine learning, and more. Every lesson builds on what you’ve already learned, giving you a rock-solid foundation for real-world success.
Whether you are a data analyst, data engineer, data scientist, or data steward, learning Spark will help you to advance your career or embark on a new career in the booming area of Big Data.
Learn how to
• Discover what Apache Spark does and how it fits into the Big Data landscape
• Deploy and run Spark locally or in the cloud
• Interact with Spark from the shell
• Make the most of the Spark Cluster Architecture
• Develop Spark applications with Scala and functional Python
• Program with the Spark API, including transformations and actions
• Apply practical data engineering/analysis approaches designed for Spark
• Use Resilient Distributed Datasets (RDDs) for caching, persistence, and output
• Optimize Spark solution performance
• Use Spark with SQL (via Spark SQL) and with NoSQL (via Cassandra)
• Leverage cutting-edge functional programming techniques
• Extend Spark with streaming, R, and Sparkling Water
• Start building Spark-based machine learning and graph-processing applications
• Explore advanced messaging technologies, including Kafka
• Preview and prepare for Spark’s next generation of innovations
Instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Spark to solve a wide spectrum of Big Data problems.
Engineers from Confluent and LinkedIn who are responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream-processing applications with this platform. Through detailed examples, you’ll learn Kafka’s design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer.Understand publish-subscribe messaging and how it fits in the big data ecosystem.Explore Kafka producers and consumers for writing and reading messagesUnderstand Kafka patterns and use-case requirements to ensure reliable data deliveryGet best practices for building data pipelines and applications with KafkaManage Kafka in production, and learn to perform monitoring, tuning, and maintenance tasksLearn the most critical metrics among Kafka’s operational measurementsExplore how Kafka’s stream delivery capabilities make it a perfect source for stream processing systems
Detailing the hows and the whys of successful Essbase implementation, the book arms you with simple yet powerful tools to meet your immediate needs, as well as the theoretical knowledge to proceed to the next level with Essbase. Infrastructure, data sourcing and transformation, database design, calculations, automation, APIs, reporting, and project implementation are covered by subject matter experts who work with the tools and techniques on a daily basis. In addition to practical cases that illustrate valuable lessons learned, the book offers:
Undocumented Secrets—Dan Pressman describes the previously unpublished and undocumented inner workings of the ASO Essbase engine. Authoritative Experts—If you have questions that no one else can solve, these 12 Essbase professionals are the ones who can answer them. Unpublished—Includes the only third-party guide to infrastructure. Infrastructure is easy to get wrong and can doom any Essbase project. Comprehensive—Let there never again be a question on how to create blocks or design BSO databases for performance—Dave Farnsworth provides the answers within. Innovative—Cameron Lackpour and Joe Aultman bring new and exciting solutions to persistent Essbase problems.
With a list of contributors as impressive as the program of presenters at a leading Essbase conference, this book offers unprecedented access to the insights and experiences of those at the forefront of the field. The previously unpublished material presented in these pages will give you the practical knowledge needed to use this powerful and intuitive tool to build highly useful analytical models, reporting systems, and forecasting applications.
This book will help you:Become a contributor on a data science team Deploy a structured lifecycle approach to data analytics problems Apply appropriate analytic techniques and tools to analyzing big data Learn how to tell a compelling story with data to drive business action Prepare for EMC Proven Professional Data Science Certification
Corresponding data sets are available at www.wiley.com/go/9781118876138.
Get started discovering, analyzing, visualizing, and presenting data in a meaningful way today!
Updated for R 2.14 and 2.15, this second edition includes new and expanded chapters on R performance, the ggplot2 data visualization package, and parallel R computing with Hadoop.Get started quickly with an R tutorial and hundreds of examplesExplore R syntax, objects, and other language detailsFind thousands of user-contributed R packages online, including BioconductorLearn how to use R to prepare data for analysisVisualize your data with R’s graphics, lattice, and ggplot2 packagesUse R to calculate statistical fests, fit models, and compute probability distributionsSpeed up intensive computations by writing parallel R programs for HadoopGet a complete desktop reference to R
Like the first edition, voted the most popular data mining book by KD Nuggets readers, this book explores concepts and techniques for the discovery of patterns hidden in large data sets, focusing on issues relating to their feasibility, usefulness, effectiveness, and scalability. However, since the publication of the first edition, great progress has been made in the development of new data mining methods, systems, and applications. This new edition substantially enhances the first edition, and new chapters have been added to address recent developments on mining complex types of data— including stream data, sequence data, graph structured data, social network data, and multi-relational data.A comprehensive, practical look at the concepts and techniques you need to know to get the most out of real business dataUpdates that incorporate input from readers, changes in the field, and more material on statistics and machine learningDozens of algorithms and implementation examples, all in easily understood pseudo-code and suitable for use in real-world, large-scale data mining projectsComplete classroom support for instructors at www.mkp.com/datamining2e companion site
Maybe you've written some simple SQL queries to interact with databases. But now you want more, you want to really dig into those databases and work with your data. Head First SQL will show you the fundamentals of SQL and how to really take advantage of it. We'll take you on a journey through the language, from basic INSERT statements and SELECT queries to hardcore database manipulation with indices, joins, and transactions. We all know "Data is Power" - but we'll show you how to have "Power over your Data". Expect to have fun, expect to learn, and expect to be querying, normalizing, and joining your data like a pro by the time you're finished reading!
The authors demonstrate how treating text as data frames enables you to manipulate, summarize, and visualize characteristics of text. You’ll also learn how to integrate natural language processing (NLP) into effective workflows. Practical code examples and data explorations will help you generate real insights from literature, news, and social media.Learn how to apply the tidy text format to NLPUse sentiment analysis to mine the emotional content of textIdentify a document’s most important terms with frequency measurementsExplore relationships and connections between words with the ggraph and widyr packagesConvert back and forth between R’s tidy and non-tidy text formatsUse topic modeling to classify document collections into natural groupsExamine case studies that compare Twitter archives, dig into NASA metadata, and analyze thousands of Usenet messages
Transforming the immense potential of workforce analytics into reality isn’t easy. Pioneering practitioners have learned crucial lessons that can help you succeed. The Power of People shares their journeys—and their indispensable insights.
Drawing on incisive case studies and vignettes, three experts help you bring purpose and clarity to any workforce analytics project, with robust research design and analysis to get reliable insights. They reveal where to start, where to find stakeholder support, and how to earn “quick wins” to build upon.
You’ll learn how to sustain success through best-practice data management, technology usage, partnering, and skill building. Finally, you’ll discover how to earn even more value by establishing an analytical mindset throughout HR, and building two key skills: storytelling and visualization.
The Power of People will be invaluable to HR executives establishing or leading analytics functions; HR professionals planning analytics projects; and any business executive who wants more value from HR.
Build agile and responsive business intelligence solutions
Create a semantic model and analyze data using the tabular model in SQL Server 2016 Analysis Services to create corporate-level business intelligence (BI) solutions. Led by two BI experts, you will learn how to build, deploy, and query a tabular model by following detailed examples and best practices. This hands-on book shows you how to use the tabular model’s in-memory database to perform rapid analytics—whether you are new to Analysis Services or already familiar with its multidimensional model.
Discover how to:
• Determine when a tabular or multidimensional model is right for your project
• Build a tabular model using SQL Server Data Tools in Microsoft Visual Studio 2015
• Integrate data from multiple sources into a single, coherent view of company information
• Choose a data-modeling technique that meets your organization’s performance and usability requirements
• Implement security by establishing administrative and data user roles
• Define and implement partitioning strategies to reduce processing time
• Use Tabular Model Scripting Language (TMSL) to execute and automate administrative tasks
• Optimize your data model to reduce the memory footprint for VertiPaq
• Choose between in-memory (VertiPaq) and pass-through (DirectQuery) engines for tabular models
• Select the proper hardware and virtualization configurations
• Deploy and manipulate tabular models from C# and PowerShell using AMO and TOM libraries
Get code samples, including complete apps, at: https://aka.ms/tabular/downloads
About This Book
• For BI professionals who are new to SQL Server 2016 Analysis Services or already familiar with previous versions of the product, and who want the best reference for creating and maintaining tabular models.
• Assumes basic familiarity with database design and business analytics concepts.
Authors Adam Gibson and Josh Patterson provide theory on deep learning before introducing their open-source Deeplearning4j (DL4J) library for developing production-class workflows. Through real-world examples, you’ll learn methods and strategies for training deep network architectures and running deep learning workflows on Spark and Hadoop with DL4J.Dive into machine learning concepts in general, as well as deep learning in particularUnderstand how deep networks evolved from neural network fundamentalsExplore the major deep network architectures, including Convolutional and RecurrentLearn how to map specific deep networks to the right problemWalk through the fundamentals of tuning general neural networks and specific deep network architecturesUse vectorization techniques for different data types with DataVec, DL4J’s workflow toolLearn how to use DL4J natively on Spark and Hadoop
Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research.
The book is targeted at information systems practitioners, programmers, consultants, developers, information technology managers, specification writers, data analysts, data modelers, database R&D professionals, data warehouse engineers, data mining professionals. The book will also be useful for professors and students of upper-level undergraduate and graduate-level data mining and machine learning courses who want to incorporate data mining as part of their data management knowledge base and expertise.Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projectsOffers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methodsIncludes downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks—in an updated, interactive interface. Algorithms in toolkit cover: data pre-processing, classification, regression, clustering, association rules, visualization
Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you.Use graphics to describe data with one, two, or dozens of variablesDevelop conceptual models using back-of-the-envelope calculations, as well asscaling and probability argumentsMine data with computationally intensive methods such as simulation and clusteringMake your conclusions understandable through reports, dashboards, and other metrics programsUnderstand financial calculations, including the time-value of moneyUse dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situationsBecome familiar with different open source programming environments for data analysis
"Finally, a concise reference for understanding how to conquer piles of data."--Austin King, Senior Web Developer, Mozilla
"An indispensable text for aspiring data scientists."--Michael E. Driscoll, CEO/Founder, Dataspora
This book is for intermediate Python developers who want to engage with the use of public APIs to collect data from social media platforms and perform statistical analysis in order to produce useful insights from data. The book assumes a basic understanding of the Python Standard Library and provides practical examples to guide you toward the creation of your data analysis project based on social data.What You Will LearnInteract with a social media platform via their public API with PythonStore social data in a convenient format for data analysisSlice and dice social data using Python tools for data scienceApply text analytics techniques to understand what people are talking about on social mediaApply advanced statistical and analytical techniques to produce useful insights from dataBuild beautiful visualizations with web technologies to explore data and present data productsIn Detail
Your social media is filled with a wealth of hidden data – unlock it with the power of Python. Transform your understanding of your clients and customers when you use Python to solve the problems of understanding consumer behavior and turning raw data into actionable customer insights.
This book will help you acquire and analyze data from leading social media sites. It will show you how to employ scientific Python tools to mine popular social websites such as Facebook, Twitter, Quora, and more. Explore the Python libraries used for social media mining, and get the tips, tricks, and insider insight you need to make the most of them. Discover how to develop data mining tools that use a social media API, and how to create your own data analysis projects using Python for clear insight from your social data.Style and approach
This practical, hands-on guide will help you learn everything you need to perform data mining for social media. Throughout the book, we take an example-oriented approach to use Python for data analysis and provide useful tips and tricks that you can use in day-to-day tasks.
This book is intended for Data Analysts, Scientists, Data Engineers, Statisticians, Researchers, who want to integrate R with their current or future Big Data workflows.
It is assumed that readers have some experience in data analysis and understanding of data management and algorithmic processing of large quantities of data, however they may lack specific skills related to R.What You Will LearnLearn about current state of Big Data processing using R programming language and its powerful statistical capabilitiesDeploy Big Data analytics platforms with selected Big Data tools supported by R in a cost-effective and time-saving mannerApply the R language to real-world Big Data problems on a multi-node Hadoop cluster, e.g. electricity consumption across various socio-demographic indicators and bike share scheme usageExplore the compatibility of R with Hadoop, Spark, SQL and NoSQL databases, and H2O platformIn Detail
Big Data analytics is the process of examining large and complex data sets that often exceed the computational capabilities. R is a leading programming language of data science, consisting of powerful functions to tackle all problems related to Big Data processing.
The book will begin with a brief introduction to the Big Data world and its current industry standards. With introduction to the R language and presenting its development, structure, applications in real world, and its shortcomings. Book will progress towards revision of major R functions for data management and transformations. Readers will be introduce to Cloud based Big Data solutions (e.g. Amazon EC2 instances and Amazon RDS, Microsoft Azure and its HDInsight clusters) and also provide guidance on R connectivity with relational and non-relational databases such as MongoDB and HBase etc. It will further expand to include Big Data tools such as Apache Hadoop ecosystem, HDFS and MapReduce frameworks. Also other R compatible tools such as Apache Spark, its machine learning library Spark MLlib, as well as H2O.Style and approach
This book will serve as a practical guide to tackling Big Data problems using R programming language and its statistical environment. Each section of the book will present you with concise and easy-to-follow steps on how to process, transform and analyse large data sets.
This book is intended for Computer Science students, application developers, business professionals, and researchers who seek information on data mining.Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projectsAddresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fieldsProvides a comprehensive, practical look at the concepts and techniques you need to get the most out of your data
Drawing on extensive experience as a researcher, practitioner, and instructor, Dr. Dursun Delen delivers an optimal balance of concepts, techniques and applications. Without compromising either simplicity or clarity, he provides enough technical depth to help readers truly understand how data mining technologies work. Coverage includes: processes, methods, techniques, tools, and metrics; the role and management of data; text and web mining; sentiment analysis; and Big Data integration. Throughout, Delen's conceptual coverage is complemented with application case studies (examples of both successes and failures), as well as simple, hands-on tutorials.
Real-World Data Mining will be valuable to professionals on analytics teams; professionals seeking certification in the field; and undergraduate or graduate students in any analytics program: concentrations, certificate-based, or degree-based.
Written in a highly accessible style, Introduction to Statistics through Resampling Methods and R, Second Edition guides students in the understanding of descriptive statistics, estimation, hypothesis testing, and model building. The book emphasizes the discovery method, enabling readers to ascertain solutions on their own rather than simply copy answers or apply a formula by rote. The Second Edition utilizes the R programming language to simplify tedious computations, illustrate new concepts, and assist readers in completing exercises. The text facilitates quick learning through the use of:
More than 250 exercises—with selected "hints"—scattered throughout to stimulate readers' thinking and to actively engage them in applying their newfound skills
An increased focus on why a method is introduced
Multiple explanations of basic concepts
Real-life applications in a variety of disciplines
Dozens of thought-provoking, problem-solving questions in the final chapter to assist readers in applying statistics to real-life applications
Introduction to Statistics through Resampling Methods and R, Second Edition is an excellent resource for students and practitioners in the fields of agriculture, astrophysics, bacteriology, biology, botany, business, climatology, clinical trials, economics, education, epidemiology, genetics, geology, growth processes, hospital administration, law, manufacturing, marketing, medicine, mycology, physics, political science, psychology, social welfare, sports, and toxicology who want to master and learn to apply statistical methods.
Using data-driven business analytics to understand customers and improve results is a great idea in theory, but in today's busy offices, marketers and analysts need simple, low-cost ways to process and make the most of all that data. This expert book offers the perfect solution. Written by data analysis expert Wayne L. Winston, this practical resource shows you how to tap a simple and cost-effective tool, Microsoft Excel, to solve specific business problems using powerful analytic techniques—and achieve optimum results.
Practical exercises in each chapter help you apply and reinforce techniques as you learn.Shows you how to perform sophisticated business analyses using the cost-effective and widely available Microsoft Excel instead of expensive, proprietary analytical tools Reveals how to target and retain profitable customers and avoid high-risk customers Helps you forecast sales and improve response rates for marketing campaigns Explores how to optimize price points for products and services, optimize store layouts, and improve online advertising Covers social media, viral marketing, and how to exploit both effectively
Improve your marketing results with Microsoft Excel and the invaluable techniques and ideas in Marketing Analytics: Data-Driven Techniques with Microsoft Excel.
Today, analysts must manage data characterized by extraordinary variety, velocity, and volume. Using the open source Pandas library, you can use Python to rapidly automate and perform virtually any data analysis task, no matter how large or complex. Pandas can help you ensure the veracity of your data, visualize it for effective decision-making, and reliably reproduce analyses across multiple datasets.
Pandas for Everyone brings together practical knowledge and insight for solving real problems with Pandas, even if you’re new to Python data analysis. Daniel Y. Chen introduces key concepts through simple but practical examples, incrementally building on them to solve more difficult, real-world problems.
Chen gives you a jumpstart on using Pandas with a realistic dataset and covers combining datasets, handling missing data, and structuring datasets for easier analysis and visualization. He demonstrates powerful data cleaning techniques, from basic string manipulation to applying functions simultaneously across dataframes.
Once your data is ready, Chen guides you through fitting models for prediction, clustering, inference, and exploration. He provides tips on performance and scalability, and introduces you to the wider Python data analysis ecosystem.Work with DataFrames and Series, and import or export data Create plots with matplotlib, seaborn, and pandas Combine datasets and handle missing data Reshape, tidy, and clean datasets so they’re easier to work with Convert data types and manipulate text strings Apply functions to scale data manipulations Aggregate, transform, and filter large datasets with groupby Leverage Pandas’ advanced date and time capabilities Fit linear models using statsmodels and scikit-learn libraries Use generalized linear modeling to fit models with different response variables Compare multiple models to select the “best” Regularize to overcome overfitting and improve performance Use clustering in unsupervised machine learning
As you’ll discover in Enterprise Information Management in Practice, EIM deals with both structured data (e.g. sales data and customer data) as well as unstructured data (like customer satisfaction forms, emails, documents, social network sentiments, and so forth). With the deluge of information that enterprises face given their global operations and complex business models, as well as the advent of big data technology, it is not surprising that making sense of the large piles of data is of paramount importance. Enterprises must therefore put much greater emphasis on managing and monetizing both structured and unstructured data.
As Saumya Chaki—an information management expert and consultant with IBM—explains in Enterprise Information Management in Practice, it is now more important than ever before to have an enterprise information strategy that covers the entire life cycle of information and its consumption while providing security controls.
With Fortune 100 consultant Saumya Chaki as your guide, Enterprise Information Management in Practice covers each of these and the other pillars of EIM in depth, which provide readers with a comprehensive view of the building blocks for EIM.
Enterprises today deal with complex business environments where information demands take place in real time, are complex, and often serve as the differentiator among competitors. The effective management of information is thus crucial in managing enterprises. EIM has evolved as a specialized discipline in the business intelligence and enterprise data warehousing space to address the complex needs of information processing and delivery—and to ensure the enterprise is making the most of its information assets.
—Devdatt Dubhashi, Professor, Department of Computer Science and Engineering, Chalmers University, Sweden
"This textbook manages to be easier to read than other comparable books in the subject while retaining all the rigorous treatment needed. The new chapters put it at the forefront of the field by covering topics that have become mainstream in machine learning over the last decade."
—Daniel Barbara, George Mason University, Fairfax, Virginia, USA
"The new edition of A First Course in Machine Learning by Rogers and Girolami is an excellent introduction to the use of statistical methods in machine learning. The book introduces concepts such as mathematical modeling, inference, and prediction, providing ‘just in time’ the essential background on linear algebra, calculus, and probability theory that the reader needs to understand these concepts."
—Daniel Ortiz-Arroyo, Associate Professor, Aalborg University Esbjerg, Denmark
"I was impressed by how closely the material aligns with the needs of an introductory course on machine learning, which is its greatest strength...Overall, this is a pragmatic and helpful book, which is well-aligned to the needs of an introductory course and one that I will be looking at for my own students in coming months."
—David Clifton, University of Oxford, UK
"The first edition of this book was already an excellent introductory text on machine learning for an advanced undergraduate or taught masters level course, or indeed for anybody who wants to learn about an interesting and important field of computer science. The additional chapters of advanced material on Gaussian process, MCMC and mixture modeling provide an ideal basis for practical projects, without disturbing the very clear and readable exposition of the basics contained in the first part of the book."
—Gavin Cawley, Senior Lecturer, School of Computing Sciences, University of East Anglia, UK
"This book could be used for junior/senior undergraduate students or first-year graduate students, as well as individuals who want to explore the field of machine learning...The book introduces not only the concepts but the underlying ideas on algorithm implementation from a critical thinking perspective."
—Guangzhi Qu, Oakland University, Rochester, Michigan, USA
Expert Oracle Exadata will give you a look under the covers at how the combination of hardware and software that comprise Exadata actually work. Authors Kerry Osborne, Randy Johnson, and Tanel Põder share their real-world experience, gained through multiple Exadata implementations with the goal of opening up the internals of the Exadata platform. This book is intended for readers who want to understand what makes the platform tick and for whom—"how" it does what it is does is as important as what it does. By being exposed to the features that are unique to Exadata, you will gain an understanding of the mechanics that will allow you to fully benefit from the advantages that the platform provides.Changes the way you think about managing SQL performance and processing Provides a roadmap to laying out the Exadata platform to best support your existing systems Dives deeply into the internals, removing the "black box" mystique and showing how Exadata actually works
Author Thomas Nield provides exercises throughout the book to help you practice your newfound SQL skills at home, without having to use a database server environment. Not only will you learn how to use key SQL statements to find and manipulate your data, but you’ll also discover how to efficiently design and manage databases to meet your needs.
You’ll also learn how to:Explore relational databases, including lightweight and centralized modelsUse SQLite and SQLiteStudio to create lightweight databases in minutesQuery and transform data in meaningful ways by using SELECT, WHERE, GROUP BY, and ORDER BYJoin tables to get a more complete view of your business dataBuild your own tables and centralized databases by using normalized design principlesManage data by learning how to INSERT, DELETE, and UPDATE records
Specialized script packages are introduced and described. Hands-on problems representative of those commonly encountered throughout the data science pipeline are provided, and we guide you in the use of Julia in solving them using published datasets. Many of these scenarios make use of existing packages and built-in functions, as we cover:
1. 1. An overview of the data science pipeline along with an example illustrating the key points, implemented in Julia
2. 2. Options for Julia IDEs
3. 3. Programming structures and functions
4. 4. Engineering tasks, such as importing, cleaning, formatting and storing data, as well as performing data preprocessing
5. 5. Data visualization and some simple yet powerful statistics for data exploration purposes
6. 6. Dimensionality reduction and feature evaluation
7. 7. Machine learning methods, ranging from unsupervised (different types of clustering) to supervised ones (decision trees, random forests, basic neural networks, regression trees, and Extreme Learning Machines)
8. 8. Graph analysis including pinpointing the connections among the various entities and how they can be mined for useful insights.Each chapter concludes with a series of questions and exercises to reinforce what you learned. The last chapter of the book will guide you in creating a data science application from scratch using Julia.
The second edition of a highly praised, successful reference on data mining, with thorough coverage of big data applications, predictive analytics, and statistical analysis. Includes new chapters on Multivariate Statistics, Preparing to Model the Data, and Imputation of Missing Data, and an Appendix on Data Summarization and Visualization Offers extensive coverage of the R statistical programming language Contains 280 end-of-chapter exercises Includes a companion website for university instructors who adopt the book
Does the subject of data analysis make you dizzy? You've come to the right place! Statistics For Big Data For Dummies breaks this often-overwhelming subject down into easily digestible parts, offering new and aspiring data analysts the foundation they need to be successful in the field. Inside, you'll find an easy-to-follow introduction to exploratory data analysis, the lowdown on collecting, cleaning, and organizing data, everything you need to know about interpreting data using common software and programming languages, plain-English explanations of how to make sense of data in the real world, and much more.
Data has never been easier to come by, and the tools students and professionals need to enter the world of big data are based on applied statistics. While the word "statistics" alone can evoke feelings of anxiety in even the most confident student or professional, it doesn't have to. Written in the familiar and friendly tone that has defined the For Dummies brand for more than twenty years, Statistics For Big Data For Dummies takes the intimidation out of the subject, offering clear explanations and tons of step-by-step instruction to help you make sense of data mining—without losing your cool.Helps you to identify valid, useful, and understandable patterns in data Provides guidance on extracting previously unknown information from large databases Shows you how to discover patterns available in big data Gives you access to the latest tools and techniques for working in big data
If you're a student enrolled in a related Applied Statistics course or a professional looking to expand your skillset, Statistics For Big Data For Dummies gives you access to everything you need to succeed.
Each chapter presents a self-contained lesson on a key SQL concept or technique, with numerous illustrations and annotated examples. Exercises at the end of each chapter let you practice the skills you learn. With this book, you will:
Move quickly through SQL basics and learn several advanced featuresUse SQL data statements to generate, manipulate, and retrieve dataCreate database objects, such as tables, indexes, and constraints, using SQL schema statementsLearn how data sets interact with queries, and understand the importance of subqueriesConvert and manipulate data with SQL's built-in functions, and use conditional logic in data statements
Knowledge of SQL is a must for interacting with data. With Learning SQL, you'll quickly learn how to put the power and flexibility of this language to work.
Packed with case studies and real-world examples, advice on how to build data competencies in an organization and crucial coverage of how to ensure your data doesn't become a liability, Data Strategy will equip any organization with the tools and strategies it needs to profit from big data, analytics and the Internet of Things.
Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra’s non-relational design, with special attention to data modeling. If you’re a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandra’s speed and flexibility.Understand Cassandra’s distributed and decentralized structureUse the Cassandra Query Language (CQL) and cqlsh—the CQL shellCreate a working data model and compare it with an equivalent relational modelDevelop sample applications using client drivers for languages including Java, Python, and Node.jsExplore cluster topology and learn how nodes exchange dataMaintain a high level of performance in your clusterDeploy Cassandra on site, in the Cloud, or with DockerIntegrate Cassandra with Spark, Hadoop, Elasticsearch, Solr, and Lucene
Data mining is quickly becoming integral to creating value and business momentum. The ability to detect unseen patterns hidden in the numbers exhaustively generated by day-to-day operations allows savvy decision-makers to exploit every tool at their disposal in the pursuit of better business. By creating models and testing whether patterns hold up, it is possible to discover new intelligence that could change your business's entire paradigm for a more successful outcome.
Data Mining for Dummies shows you why it doesn't take a data scientist to gain this advantage, and empowers average business people to start shaping a process relevant to their business's needs. In this book, you'll learn the hows and whys of mining to the depths of your data, and how to make the case for heavier investment into data mining capabilities. The book explains the details of the knowledge discovery process including:Model creation, validity testing, and interpretation Effective communication of findings Available tools, both paid and open-source Data selection, transformation, and evaluation
Data Mining for Dummies takes you step-by-step through a real-world data-mining project using open-source tools that allow you to get immediate hands-on experience working with large amounts of data. You'll gain the confidence you need to start making data mining practices a routine part of your successful business. If you're serious about doing everything you can to push your company to the top, Data Mining for Dummies is your ticket to effective data mining.
The Concept and Object Modeling Notation (COMN) is able to cover the full spectrum of analysis and design. A single COMN model can represent the objects and concepts in the problem space, logical data design, and concrete NoSQL and SQL document, key-value, columnar, and relational database implementations. COMN models enable an unprecedented level of traceability of requirements to implementation. COMN models can also represent the static structure of software and the predicates that represent the patterns of meaning in databases.
This book will teach you:the simple and familiar graphical notation of COMN with its three basic shapes and four line styles how to think about objects, concepts, types, and classes in the real world, using the ordinary meanings of English words that aren’t tangled with confused techno-speak how to express logical data designs that are freer from implementation considerations than is possible in any other notation how to understand key-value, document, columnar, and table-oriented database designs in logical and physical terms how to use COMN to specify physical database implementations in any NoSQL or SQL database with the precision necessary for model-driven development
Did You Know?
-Knowledge of SQL is an important skill to display on your resume.
-With the growth of digital information, Database Administrator is one of the fastest growing careers.
-SQL can be learned in hours and used for decades.
Learn to script Transact SQL using Microsoft SQL Server.
-Create tables and databases
-create views, stored procedures and more.
Over 100 examples of SQL queries and statements along with images of results will help you learn T SQL.
A special section included in this illustrated guide will help you test your skills and get ahead in the workplace.
Now is the time to learn SQL.
Click the 'buy button' and start scripting SQL TODAY!
The book starts with an introduction to Microsoft Azure and how it differs from Office 365—Microsoft’s ‘other’ cloud. You'll also get a useful overview of the services available. Part II then takes you through setting up your Azure account, and gets you up-and-running on some of the core Azure services, including creating web sites and virtual machines, and choosing between fully cloud-based and hybrid storage solutions, depending on your needs.
Part III now takes an in-depth look at how to integrate Azure with your existing infrastructure. The authors, Anthony Puca, Mike Manning, Brent Rush, Marshall Copeland and Julian Soh, bring their depth of experience in cloud technology and customer support to guide you through the whole process, through each layer of your infrastructure from networking to operations. High availability and disaster recovery are the topics on everyone’s minds when considering a move to the cloud, and this book provides keyinsights and step-by-step guidance to help you set up and manage your resources correctly to optimize for these scenarios. You’ll also get expert advice on migrating your existing VMs to Azure using InMage, mail-in and the best 3rd party tools available, helping you ensure continuity of service with minimum disruption to the business.
In the book’s final chapters, you’ll find cutting edge examples of cloud technology in action, from machine learning to business intelligence, for a taste of some exciting ways your business could benefit from your new Microsoft Azure deployment.
"A must-have book for anyone expecting to do research and/or applications in categorical data analysis."
—Statistics in Medicine
"It is a total delight reading this book."
"If you do any analysis of categorical data, this is an essential desktop reference."
The use of statistical methods for analyzing categorical data has increased dramatically, particularly in the biomedical, social sciences, and financial industries. Responding to new developments, this book offers a comprehensive treatment of the most important methods for categorical data analysis.
Categorical Data Analysis, Third Edition summarizes the latest methods for univariate and correlated multivariate categorical responses. Readers will find a unified generalized linear models approach that connects logistic regression and Poisson and negative binomial loglinear models for discrete data with normal regression for continuous data. This edition also features:An emphasis on logistic and probit regression methods for binary, ordinal, and nominal responses for independent observations and for clustered data with marginal models and random effects models Two new chapters on alternative methods for binary response data, including smoothing and regularization methods, classification methods such as linear discriminant analysis and classification trees, and cluster analysis New sections introducing the Bayesian approach for methods in that chapter More than 100 analyses of data sets and over 600 exercises Notes at the end of each chapter that provide references to recent research and topics not covered in the text, linked to a bibliography of more than 1,200 sources A supplementary website showing how to use R and SAS; for all examples in the text, with information also about SPSS and Stata and with exercise solutions
Categorical Data Analysis, Third Edition is an invaluable tool for statisticians and methodologists, such as biostatisticians and researchers in the social and behavioral sciences, medicine and public health, marketing, education, finance, biological and agricultural sciences, and industrial quality control.