Bayesian methods of inference are deeply natural and extremely powerful. However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making it inaccessible to anyone without a strong mathematical background. Now, though, Cameron Davidson-Pilon introduces Bayesian inference from a computational perspective, bridging theory to practice–freeing you to get results using computing power.
Bayesian Methods for Hackers illuminates Bayesian inference through probabilistic programming with the powerful PyMC language and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments, without extensive mathematical intervention.
Davidson-Pilon begins by introducing the concepts underlying Bayesian inference, comparing it with other techniques and guiding you through building and training your first Bayesian model. Next, he introduces PyMC through a series of detailed examples and intuitive explanations that have been refined after extensive user feedback. You’ll learn how to use the Markov Chain Monte Carlo algorithm, choose appropriate sample sizes and priors, work with loss functions, and apply Bayesian inference in domains ranging from finance to marketing. Once you’ve mastered these techniques, you’ll constantly turn to this guide for the working PyMC code you need to jumpstart future projects.
• Learning the Bayesian “state of mind” and its practical implications
• Understanding how computers perform Bayesian inference
• Using the PyMC Python library to program Bayesian analyses
• Building and debugging models with PyMC
• Testing your model’s “goodness of fit”
• Opening the “black box” of the Markov Chain Monte Carlo algorithm to see how and why it works
• Leveraging the power of the “Law of Large Numbers”
• Mastering key concepts, such as clustering, convergence, autocorrelation, and thinning
• Using loss functions to measure an estimate’s weaknesses based on your goals and desired outcomes
• Selecting appropriate priors and understanding how their influence changes with dataset size
• Overcoming the “exploration versus exploitation” dilemma: deciding when “pretty good” is good enough
• Using Bayesian inference to improve A/B testing
• Solving data science problems when only small amounts of data are available
Cameron Davidson-Pilon has worked in many areas of applied mathematics, from the evolutionary dynamics of genes and diseases to stochastic modeling of financial prices. His contributions to the open source community include lifelines, an implementation of survival analysis in Python. Educated at the University of Waterloo and at the Independent University of Moscow, he currently works with the online commerce leader Shopify.
Cameron Davidson-Pilon has seen many fields of applied mathematics, from evolutionary dynamics of genes and diseases to stochastic modeling of financial prices. His main contributions to the open-source community include Bayesian Methods for Hackers and lifelines. Cameron was raised in Guelph, Ontario, but was educated at the University of Waterloo and Independent University of Moscow. He currently lives in Ottawa, Ontario, working with the online commerce leader Shopify.
This IBM Redbooks® publication provides a guide for deploying the Guardium solutions.
This book also provides a roadmap process for implementing an InfoSphere Guardium solution that is based on years of experience and best practices that were collected from various Guardium experts. We describe planning, installation, configuration, monitoring, and administrating an InfoSphere Guardium environment. We also describe use cases and how InfoSphere Guardium integrates with other IBM products.
The guidance can help you successfully deploy and manage an IBM InfoSphere Guardium system.
This book is intended for the system administrators and support staff who are responsible for deploying or supporting an InfoSphere Guardium environment.
Although universal access to data is desirable, safeguards are necessary to protect people's privacy, prevent data leakage, and detect suspicious activity.
The data reservoir is a reference architecture that balances the desire for easy access to data with information governance and security. The data reservoir reference architecture describes the technical capabilities necessary for a system of insight, while being independent of specific technologies. Being technology independent is important, because most organizations already have investments in data platforms that they want to incorporate in their solution. In addition, technology is continually improving, and the choice of technology is often dictated by the volume, variety, and velocity of the data being managed.
A system of insight needs more than technology to succeed. The data reservoir reference architecture includes description of governance and management processes and definitions to ensure the human and business systems around the technology support a collaborative, self-service, and safe environment for data use.
The data reservoir reference architecture was first introduced in Governing and Managing Big Data for Analytics and Decision Makers, REDP-5120, which is available at:
This IBM® Redbooks publication, Designing and Operating a Data Reservoir, builds on that material to provide more detail on the capabilities and internal workings of a data reservoir.
The authors describe the systems they have designed and developed: email worm detection using data mining, a scalable multi-level feature extraction technique to detect malicious executables, detecting remote exploits using data mining, and flow-based identification of botnet traffic by mining multiple log files. For each of these tools, they detail the system architecture, algorithms, performance results, and limitations.
Discusses data mining for emerging applications, including adaptable malware detection, insider threat detection, firewall policy analysis, and real-time data mining Includes four appendices that provide a firm foundation in data management, secure systems, and the semantic web Describes the authors’ tools for stream data mining
From algorithms to experimental results, this is one of the few books that will be equally valuable to those in industry, government, and academia. It will help technologists decide which tools to select for specific applications, managers will learn how to determine whether or not to proceed with a data mining project, and developers will find innovative alternative designs for a range of applications.
The Handbook of Research on Information Security and Assurance includes 47 chapters offering comprehensive definitions and explanations on topics such as firewalls, information warfare, encryption standards, and social and ethical concerns in enterprise security. Edited by over 90 scholars in information science, this reference provides tools to combat the growing risk associated with technology.
This textbook provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly.
Bayesian statistical methods are becoming more common and more important, but not many resources are available to help beginners. Based on undergraduate classes taught by author Allen Downey, this book’s computational approach helps you get a solid start.Use your existing programming skills to learn and understand Bayesian statisticsWork with problems involving estimation, prediction, decision analysis, evidence, and hypothesis testingGet started with simple examples, using coins, M&Ms, Dungeons & Dragons dice, paintball, and hockeyLearn computational methods for solving real-world problems, such as interpreting SAT scores, simulating kidney tumors, and modeling the human microbiome.
If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.Get a crash course in PythonLearn the basics of linear algebra, statistics, and probability—and understand how and when they're used in data scienceCollect, explore, clean, munge, and manipulate dataDive into the fundamentals of machine learningImplement models such as k-nearest Neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clusteringExplore recommender systems, natural language processing, network analysis, MapReduce, and databases
Highlighting both underlying concepts and practical computational skills, Data Mining and Business Analytics with R begins with coverage of standard linear regression and the importance of parsimony in statistical modeling. The book includes important topics such as penalty-based variable selection (LASSO); logistic regression; regression and classification trees; clustering; principal components and partial least squares; and the analysis of text and network data. In addition, the book presents:
• A thorough discussion and extensive demonstration of the theory behind the most useful data mining tools
• Illustrations of how to use the outlined concepts in real-world situations
• Readily available additional data sets and related R code allowing readers to apply their own analyses to the discussed materials
• Numerous exercises to help readers with computing skills and deepen their understanding of the material
Data Mining and Business Analytics with R is an excellent graduate-level textbook for courses on data mining and business analytics. The book is also a valuable reference for practitioners who collect and analyze data in the fields of finance, operations management, marketing, and the information sciences.
Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You’ll not only learn how to improve communication between business stakeholders and data scientists, but also how participate intelligently in your company’s data science projects. You’ll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making.Understand how data science fits in your organization—and how you can use it for competitive advantageTreat data as a business asset that requires careful investment if you’re to gain real valueApproach business problems data-analytically, using the data-mining process to gather good data in the most appropriate wayLearn general concepts for actually extracting knowledge from dataApply data science principles when interviewing data science job candidates