This book explains and explores the principal techniques of Data Mining: for classification, generation of association rules and clustering. It is written for readers without a strong background in mathematics or statistics and focuses on detailed examples and explanations of the algorithms given. This should prove of value to readers of all kinds, from those whose only use of data mining techniques will be via commercial packages right through to academic researchers.
This book aims to help the general reader develop the necessary understanding to use commercial data mining packages discriminatingly, as well as enabling the advanced reader to understand or contribute to future technical advances in the field. Each chapter has practical exercises to enable readers to check their progress. A full glossary of technical terms used is included.
Principles of Data Mining explains and explores the principal techniques of Data Mining: for classification, association rule mining and clustering. Each topic is clearly explained and illustrated by detailed worked examples, with a focus on algorithms rather than mathematical formalism. It is written for readers without a strong background in mathematics or statistics, and any formulae used are explained in detail.
This second edition has been expanded to include additional chapters on using frequent pattern trees for Association Rule Mining, comparing classifiers, ensemble classification and dealing with very large volumes of data.
Principles of Data Mining aims to help general readers develop the necessary understanding of what is inside the 'black box' so they can use commercial data mining packages discriminatingly, as well as enabling advanced readers or academic researchers to understand or contribute to future technical advances in the field.
Suitable as a textbook to support courses at undergraduate or postgraduate levels in a wide range of subjects including Computer Science, Business Studies, Marketing, Artificial Intelligence, Bioinformatics and Forensic Science.
They present new and innovative developments and applications, divided into technical stream sections on Knowledge Discovery and Data Mining I, Knowledge Discovery and Data Mining II, Intelligent Agents, Representation and Reasoning, and Machine Learning and Constraint Programming, followed by application stream sections on Medical Applications, Applications in Education and Information Science, and AI Applications. The volume also includes the text of short papers presented as posters at the conference.
This is the thirtieth volume in the Research and Development in Intelligent Systems series, which also incorporates the twenty-first volume in the Applications and Innovations in Intelligent Systems series. These series are essential reading for those who wish to keep up to date with developments in this important field.
They present new and innovative developments in the field, divided into sections on CBR and Classification, AI Techniques, Argumentation and Negotiation, Intelligent Systems, From Machine Learning To E-Learning and Decision Making. The volume also includes the text of short papers presented as posters at the conference.
This is the twenty-fifth volume in the Research and Development series. The series is essential reading for those who wish to keep up to date with developments in this important field.
The Application Stream papers are published as a companion volume under the title Applications and Innovations in Intelligent Systems XVI.
The papers present new and innovative developments in the field, divided into sections on Synthesis and Prediction, Scheduling and Search, Diagnosis and Monitoring, Classification and Design, and Analysis and Evaluation.
This is the fifteenth volume in the Applications and Innovations series. The series serves as a key reference on the use of AI Technology to enable organisations to solve complex problems and gain significant business benefits.
The Technical Stream papers are published as a companion volume under the title Research and Development in Intelligent Systems XXIV.
Logic Programming with Prolog does not assume that the reader is an experienced programmer or has a background in Mathematics, Logic or Artificial Intelligence. It starts from scratch and aims to arrive at the point where quite powerful programs can be written in the language. It is intended both as a textbook for an introductory course and as a self-study book. On completion readers will know enough to use Prolog in their own research or practical projects.
Each chapter has self-assessment exercises so that readers may check their own progress. A glossary of the technical terms used completes the book.
This second edition has been revised to be fully compatible with SWI-Prolog, a popular multi-platform public domain implementation of the language. Additional chapters have been added covering the use of Prolog to analyse English sentences and to illustrate how Prolog can be used to implement applications of an 'Artificial Intelligence' kind.
Max Bramer is Emeritus Professor of Information Technology at the University of Portsmouth, England. He has taught Prolog to undergraduate computer science students and used Prolog in his own work for many years.
They present new and innovative developments and applications, divided into technical stream sections on Data Mining, Data Mining and Machine Learning, Planning and Optimisation, and Knowledge Management and Prediction, followed by application stream sections on Language and Classification, Recommendation, Practical Applications and Systems, and Data Mining and Machine Learning. The volume also includes the text of short papers presented as posters at the conference.
This is the twenty-ninth volume in the Research and Development in Intelligent Systems series, which also incorporates the twentieth volume in the Applications and Innovations in Intelligent Systems series. These series are essential reading for those who wish to keep up to date with developments in this important field.
Each chapter includes aims, a summary and practical exercises (with solutions) to support learning. Chapters are designed to stand alone as far as possible, so that they can be studied independently of the rest of the text by those with some previous knowledge of the languages. There is a comprehensive glossary of technical terms, together with extensive appendices for quick reference of language features.
They present new and innovative developments and applications, divided into technical stream sections on Knowledge Discovery and Data Mining, Machine Learning, and Agents, Ontologies and Genetic Programming, followed by application stream sections on Evolutionary Algorithms/Dynamic Modelling, Planning and Optimisation, and Machine Learning and Data Mining. The volume also includes the text of short papers presented as posters at the conference.
This is the thirty-first volume in the Research and Development in Intelligent Systems series, which also incorporates the twenty-second volume in the Applications and Innovations in Intelligent Systems series. These series are essential reading for those who wish to keep up to date with developments in this important field.
This state-of-the-art survey not only serves as a "position paper" on the field from the viewpoint of expert members of the IFIP Technical Committee 12, its Working Groups and their colleagues, but also presents overviews of current work in different countries.
The chapters describe important relatively new or emerging areas of work in which the authors are personally involved, including text and hypertext categorization; autonomous systems; affective intelligence; AI in electronic healthcare systems; artifact-mediated society and social intelligence design; multilingual knowledge management; agents, intelligence and tools; intelligent user profiling; and supply chain business intelligence. They provide an interesting international perspective on where this significant field is going at the end of the first decade of the twenty-first century.
They present new and innovative developments and applications, divided into technical stream sections on Planning, Evolutionary Algorithms, Speech and Vision, and Machine Learning, followed by application stream sections on Knowledge Discovery and Data Mining, Machine Learning, Evolutionary Algorithms and AI in Action. The volume also includes the text of short papers presented as posters at the conference.
This is the twenty-eighth volume in the Research and Development in Intelligent Systems series, which also incorporates the nineteenth volume in the Applications and Innovations in Intelligent Systems series. These series are essential reading for those who wish to keep up to date with developments in this important field.
Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You’ll not only learn how to improve communication between business stakeholders and data scientists, but also how to participate intelligently in your company’s data science projects. You’ll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making.
Understand how data science fits in your organization, and how you can use it for competitive advantage
Treat data as a business asset that requires careful investment if you’re to gain real value
Approach business problems data-analytically, using the data-mining process to gather good data in the most appropriate way
Learn general concepts for actually extracting knowledge from data
Apply data science principles when interviewing data science job candidates
Jeff Hawkins, the man who created the PalmPilot, Treo smart phone, and other handheld devices, has reshaped our relationship to computers. Now he stands ready to revolutionize both neuroscience and computing in one stroke, with a new understanding of intelligence itself.
Hawkins develops a powerful theory of how the human brain works, explaining why computers are not intelligent and how, based on this new theory, we can finally build intelligent machines.
The brain is not a computer, but a memory system that stores experiences in a way that reflects the true structure of the world, remembering sequences of events and their nested relationships and making predictions based on those memories. It is this memory-prediction system that forms the basis of intelligence, perception, creativity, and even consciousness.
In an engaging style that will captivate audiences from the merely curious to the professional scientist, Hawkins shows how a clear understanding of how the brain works will make it possible for us to build intelligent machines, in silicon, that will exceed our human ability in surprising ways.
Written with acclaimed science writer Sandra Blakeslee, On Intelligence promises to completely transfigure the possibilities of the technology age. It is a landmark book in its scope and clarity.
NoSQL Distilled is a concise but thorough introduction to this rapidly emerging technology. Pramod J. Sadalage and Martin Fowler explain how NoSQL databases work and the ways that they may be a superior alternative to a traditional RDBMS. The authors provide a fast-paced guide to the concepts you need to know in order to evaluate whether NoSQL databases are right for your needs and, if so, which technologies you should explore further.
The first part of the book concentrates on core concepts, including schemaless data models, aggregates, new distribution models, the CAP theorem, and map-reduce. In the second part, the authors explore architectural and design issues associated with implementing NoSQL. They also present realistic use cases that demonstrate NoSQL databases at work and feature representative examples using Riak, MongoDB, Cassandra, and Neo4j.
In addition, by drawing on Pramod Sadalage’s pioneering work, NoSQL Distilled shows how to implement evolutionary design with schema migration: an essential technique for applying NoSQL databases. The book concludes by describing how NoSQL is ushering in a new age of Polyglot Persistence, where multiple data-storage worlds coexist, and architects can choose the technology best optimized for each type of data access.
This textbook provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly.
Artificial Intelligence helps choose what books you buy, what movies you see, and even who you date. It puts the "smart" in your smartphone and soon it will drive your car. It makes most of the trades on Wall Street, and controls vital energy, water, and transportation infrastructure. But Artificial Intelligence can also threaten our existence.
In as little as a decade, AI could match and then surpass human intelligence. Corporations and government agencies are pouring billions into achieving AI's Holy Grail—human-level intelligence. Once AI has attained it, scientists argue, it will have survival drives much like our own. We may be forced to compete with a rival more cunning, more powerful, and more alien than we can imagine.
Through profiles of tech visionaries, industry watchdogs, and groundbreaking AI systems, Our Final Invention explores the perils of the heedless pursuit of advanced AI. Until now, human intelligence has had no rival. Can we coexist with beings whose intelligence dwarfs our own? And will they allow us to?
Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general -- all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. This book explains:
Collaborative filtering techniques that enable online retailers to recommend products or media
Methods of clustering to detect groups of similar items in a large dataset
Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm
Optimization algorithms that search millions of possible solutions to a problem and choose the best one
Bayesian filtering, used in spam filters for classifying documents based on word types and other features
Using decision trees not only to make predictions, but to model the way decisions are made
Predicting numerical values rather than classifications to build price models
Support vector machines to match people in online dating sites
Non-negative matrix factorization to find the independent features in a dataset
Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game
Each chapter includes exercises for extending the algorithms to make them more powerful. Go beyond simple database-backed applications and put the wealth of Internet data to work for you.
"Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details."
-- Dan Russell, Google
"Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths."
-- Tim Wolters, CTO, Collective Intellect
Let's face it, SQL is a deceptively simple language to learn, and many database developers never go far beyond the simple statement: SELECT columns FROM table WHERE conditions. But there is so much more you can do with the language. In the SQL Cookbook, experienced SQL developer Anthony Molinaro shares his favorite SQL techniques and features. You'll learn about:
Window functions, arguably the most significant enhancement to SQL in the past decade. If you're not using these, you're missing out.
Powerful, database-specific features such as SQL Server's PIVOT and UNPIVOT operators, Oracle's MODEL clause, and PostgreSQL's very useful GENERATE_SERIES function
Pivoting rows into columns, reverse-pivoting columns into rows, using pivoting to facilitate inter-row calculations, and double-pivoting a result set
Bucketization, and why you should never use that term in Brooklyn.
How to create histograms, summarize data into buckets, perform aggregations over a moving range of values, generate running totals and subtotals, and other advanced data warehousing techniques
The technique of walking a string, which allows you to use SQL to parse through the characters, words, or delimited elements of a string
Written in O'Reilly's popular Problem/Solution/Discussion style, the SQL Cookbook is sure to please. Anthony's credo is: "When it comes down to it, we all go to work, we all have bills to pay, and we all want to go home at a reasonable time and enjoy what's still available of our days." The SQL Cookbook moves quickly from problem to solution, saving you time each step of the way.
If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.
Get a crash course in Python
Learn the basics of linear algebra, statistics, and probability, and understand how and when they're used in data science
Collect, explore, clean, munge, and manipulate data
Dive into the fundamentals of machine learning
Implement models such as k-nearest Neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering
Explore recommender systems, natural language processing, network analysis, MapReduce, and databases
Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.
Updated to reflect recent advances in MySQL and InnoDB performance, features, and tools, this third edition not only offers specific examples of how MySQL works, it also teaches you why this system works as it does, with illustrative stories and case studies that demonstrate MySQL’s principles in action. With this book, you’ll learn how to think in MySQL.
Learn the effects of new features in MySQL 5.5, including stored procedures, partitioned databases, triggers, and views
Implement improvements in replication, high availability, and clustering
Achieve high performance when running MySQL in the cloud
Optimize advanced querying features, such as full-text searches
Take advantage of modern multi-core CPUs and solid-state disks
Explore backup and recovery strategies, including new tools for hot online backups
Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:
Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce
Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
Discover common pitfalls and advanced features for writing real-world MapReduce programs
Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
Use Pig, a high-level query language for large-scale data processing
Take advantage of HBase, Hadoop's database for structured and semi-structured data
Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems
If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject.
"Now you have the opportunity to learn about Hadoop from a master -- not only of the technology, but also of common sense and plain talk."
-- Doug Cutting, Hadoop Founder, Yahoo!
Each chapter presents a self-contained lesson on a key SQL concept or technique, with numerous illustrations and annotated examples. Exercises at the end of each chapter let you practice the skills you learn. With this book, you will:
Move quickly through SQL basics and learn several advanced features
Use SQL data statements to generate, manipulate, and retrieve data
Create database objects, such as tables, indexes, and constraints, using SQL schema statements
Learn how data sets interact with queries, and understand the importance of subqueries
Convert and manipulate data with SQL's built-in functions, and use conditional logic in data statements
Knowledge of SQL is a must for interacting with data. With Learning SQL, you'll quickly learn how to put the power and flexibility of this language to work.
If you want to find out how to use Python to start answering critical questions of your data, pick up Python Machine Learning – whether you want to get started from scratch or want to extend your data science knowledge, this is an essential and unmissable resource.
What You Will Learn
Explore how to use different machine learning models to ask different questions of your data
Learn how to build neural networks using Keras and Theano
Find out how to write clean and elegant Python code that will optimize the strength of your algorithms
Discover how to embed your machine learning model in a web application for increased accessibility
Predict continuous target outcomes using regression analysis
Uncover hidden patterns and structures in data with clustering
Organize data using effective pre-processing techniques
Get to grips with sentiment analysis to delve deeper into textual and social media data
In Detail
Machine learning and predictive analytics are transforming the way businesses and other organizations operate. Being able to understand trends and patterns in complex data is critical to success, becoming one of the key strategies for unlocking growth in a challenging contemporary marketplace. Python can help you deliver key insights into your data – its unique capabilities as a language let you build sophisticated algorithms and statistical models that can reveal new perspectives and answer key questions that are vital for success.
Python Machine Learning gives you access to the world of predictive analytics and demonstrates why Python is one of the world's leading data science languages. If you want to ask better questions of data, or need to improve and extend the capabilities of your machine learning systems, this practical data science book is invaluable. Covering a wide range of powerful Python libraries, including scikit-learn, Theano, and Keras, and featuring guidance and tips on everything from sentiment analysis to neural networks, you'll soon be able to answer some of the most important questions facing you and your organization.
Style and approach
Python Machine Learning connects the fundamental theoretical principles behind machine learning to their practical application in a way that focuses you on asking and answering the right questions. It walks you through the key elements of Python and its powerful machine learning libraries, while demonstrating how to get to grips with a range of statistical models.
But how does one exactly do data science? Do you have to hire one of these priests of the dark arts, the "data scientist," to extract this gold from your data? Nope.
Data science is little more than using straightforward steps to process raw data into actionable insight. And in Data Smart, author and data scientist John Foreman will show you how that's done within the familiar environment of a spreadsheet.
Why a spreadsheet? It's comfortable! You get to look at the data every step of the way, building confidence as you learn the tricks of the trade. Plus, spreadsheets are a vendor-neutral place to learn data science without the hype.
But don't let the Excel sheets fool you. This is a book for those serious about learning the analytic techniques, the math and the magic, behind big data.
Each chapter will cover a different technique in a spreadsheet so you can follow along:
Mathematical optimization, including non-linear programming and genetic algorithms
Clustering via k-means, spherical k-means, and graph modularity
Data mining in graphs, such as outlier detection
Supervised AI through logistic regression, ensemble models, and bag-of-words models
Forecasting, seasonal adjustments, and prediction intervals through Monte Carlo simulation
Moving from spreadsheets into the R programming language
You get your hands dirty as you work alongside John through each technique. But never fear, the topics are readily applicable and the author laces humor throughout. You'll even learn what a dead squirrel has to do with optimization modeling, which you no doubt are dying to know.
In science fiction, artificial intelligence takes the shape of computers that can speak like people, think for themselves, and sometimes act against us. Sometimes the machines seem to know everything, and symbolize implacable and unknowable power, as in The Matrix. Such machines can also embody the limits of logic, and by extension our own powers of reason. In Arthur C. Clarke's 2001: A Space Odyssey, HAL was a computer of vast capability driven insane by the demands of his programming – to honestly and completely report information – when those instructions conflicted with orders to keep state secrets. Star Trek has given us the android, Lieutenant Commander Data, who strives to be more human. None of these visions came true in quite the way science fiction writers imagined, even though in many ways computers surpass their fictional counterparts. This eBook reviews work in the field and covers topics from chess-playing to quantum computing. The writers tackle how to make computers more powerful, how we define consciousness, what the hard problems are and even how computers might be built once the limits of silicon chips have been reached. Artificial intelligence also raises some thorny ethical questions, such as whether morality can be programmed. These are the kinds of issues that make artificial intelligence and computing fascinating. Building an intelligent machine brings together the human desire to create and the question of what makes us what we are. If anyone ever builds a true thinking machine, that last question becomes much more complicated, not less. Data and HAL would probably agree.
This updated second edition provides guidance for database developers, advanced configuration for system administrators, and an overview of the concepts and use cases for other people on your project. Ideal for NoSQL newcomers and experienced MongoDB users alike, this guide provides numerous real-world schema design examples.
Get started with MongoDB core concepts and vocabulary
Perform basic write operations at different levels of safety and speed
Create complex queries, with options for limiting, skipping, and sorting results
Design an application that works well with MongoDB
Aggregate data, including counting, finding distinct values, grouping documents, and using MapReduce
Gather and interpret statistics about your collections and databases
Set up replica sets and automatic failover in MongoDB
Use sharding to scale horizontally, and learn how it impacts applications
Delve into monitoring, security and authentication, backup/restore, and other administrative tasks
Detailing the hows and the whys of successful Essbase implementation, the book arms you with simple yet powerful tools to meet your immediate needs, as well as the theoretical knowledge to proceed to the next level with Essbase. Infrastructure, data sourcing and transformation, database design, calculations, automation, APIs, reporting, and project implementation are covered by subject matter experts who work with the tools and techniques on a daily basis. In addition to practical cases that illustrate valuable lessons learned, the book offers:
Undocumented Secrets: Dan Pressman describes the previously unpublished and undocumented inner workings of the ASO Essbase engine.
Authoritative Experts: If you have questions that no one else can solve, these 12 Essbase professionals are the ones who can answer them.
Unpublished: Includes the only third-party guide to infrastructure. Infrastructure is easy to get wrong and can doom any Essbase project.
Comprehensive: Let there never again be a question on how to create blocks or design BSO databases for performance; Dave Farnsworth provides the answers within.
Innovative: Cameron Lackpour and Joe Aultman bring new and exciting solutions to persistent Essbase problems.
With a list of contributors as impressive as the program of presenters at a leading Essbase conference, this book offers unprecedented access to the insights and experiences of those at the forefront of the field. The previously unpublished material presented in these pages will give you the practical knowledge needed to use this powerful and intuitive tool to build highly useful analytical models, reporting systems, and forecasting applications.
Rather than run through all possible scenarios, this pragmatic operations guide calls out what works, as demonstrated in critical deployments.
Get a high-level overview of HDFS and MapReduce: why they exist and how they work
Plan a Hadoop deployment, from hardware and OS selection to network requirements
Learn setup and configuration details with a list of critical properties
Manage resources by sharing a cluster across multiple groups
Get a runbook of the most common cluster maintenance tasks
Monitor Hadoop clusters, and learn troubleshooting with the help of real-world war stories
Use basic tools and techniques to handle backup and catastrophic failure
This book offers practical answers to some of the hardest questions faced by PL/SQL developers, including:
What is the best way to write the SQL logic in my application code?
How should I write my packages so they can be leveraged by my entire team of developers?
How can I make sure that all my team's programs handle and record errors consistently?
Oracle PL/SQL Best Practices summarizes PL/SQL best practices in nine major categories: overall PL/SQL application development; programming standards; program testing, tracing, and debugging; variables and data structures; control logic; error handling; the use of SQL in PL/SQL; building procedures, functions, packages, and triggers; and overall program performance.
This book is a concise and entertaining guide that PL/SQL developers will turn to again and again as they seek out ways to write higher quality code and more successful applications.
"This book presents ideas that make the difference between a successful project and one that never gets off the ground. It goes beyond just listing a set of rules, and provides realistic scenarios that help the reader understand where the rules come from. This book should be required reading for any team of Oracle database professionals."
--Dwayne King, President, KRIDAN Consulting
This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.
Implementations, as well as interesting, real-world examples of each data structure and algorithm, are included.
Using both a programming style and a writing style that are exceptionally clean, Kyle Loudon shows you how to use such essential data structures as lists, stacks, queues, sets, trees, heaps, priority queues, and graphs. He explains how to use algorithms for sorting, searching, numerical analysis, data compression, data encryption, common graph problems, and computational geometry. And he describes the relative efficiency of all implementations. The compression and encryption chapters not only give you working code for reasonably efficient solutions, they offer explanations of concepts in an approachable manner for people who never have had the time or expertise to study them in depth.
Anyone with a basic understanding of the C language can use this book. In order to provide maintainable and extendible code, an extra level of abstraction (such as pointers to functions) is used in examples where appropriate. Understanding that these techniques may be unfamiliar to some programmers, Loudon explains them clearly in the introductory chapters.
Contents include:
• Pointers
• Recursion
• Analysis of algorithms
• Data structures (lists, stacks, queues, sets, hash tables, trees, heaps, priority queues, graphs)
• Sorting and searching
• Numerical methods
• Data compression
• Data encryption
• Graph algorithms
• Geometric algorithms
Your freemium product generates vast volumes of data, but using that data to maximize conversion, boost retention, and deliver revenue can be challenging if you don't fully understand the impact that small changes can have on revenue. In this book, author Eric Seufert provides clear guidelines for using data and analytics through all stages of development to optimize your implementation of the freemium model. Freemium Economics de-mystifies the freemium model through an exploration of its core, data-oriented tenets, so that you can apply it methodically rather than hoping that conversion and revenue will naturally follow product launch.
By reading Freemium Economics, you will:
• Learn how to apply data science and big data principles in freemium product design and development to maximize conversion, boost retention, and deliver revenue
• Gain a broad introduction to the conceptual economic pillars of freemium and a complete understanding of the unique approaches needed to acquire users and convert them from free to paying customers
• Get practical tips and analytical guidance to successfully implement the freemium model
• Understand the metrics and infrastructure required to measure the success of a freemium product and improve it post-launch
The book also includes a detailed explanation of the lifetime customer value (LCV) calculation and step-by-step instructions for implementing key performance indicators in a simple, universally accessible tool like Excel.
This book will help you:
• Become a contributor on a data science team
• Deploy a structured lifecycle approach to data analytics problems
• Apply appropriate analytic techniques and tools to analyzing big data
• Learn how to tell a compelling story with data to drive business action
• Prepare for EMC Proven Professional Data Science Certification
Corresponding data sets are available at www.wiley.com/go/9781118876138.
Get started discovering, analyzing, visualizing, and presenting data in a meaningful way today!
The example code for this unique data science book is maintained in a public GitHub repository. It’s designed to be easily accessible through a turnkey virtual machine that facilitates interactive learning with an easy-to-use collection of IPython Notebooks.
Bayesian methods of inference are deeply natural and extremely powerful. However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making it inaccessible to anyone without a strong mathematical background. Now, though, Cameron Davidson-Pilon introduces Bayesian inference from a computational perspective, bridging theory to practice–freeing you to get results using computing power.
Bayesian Methods for Hackers illuminates Bayesian inference through probabilistic programming with the powerful PyMC language and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments, without extensive mathematical intervention.
Davidson-Pilon begins by introducing the concepts underlying Bayesian inference, comparing it with other techniques and guiding you through building and training your first Bayesian model. Next, he introduces PyMC through a series of detailed examples and intuitive explanations that have been refined after extensive user feedback. You’ll learn how to use the Markov Chain Monte Carlo algorithm, choose appropriate sample sizes and priors, work with loss functions, and apply Bayesian inference in domains ranging from finance to marketing. Once you’ve mastered these techniques, you’ll constantly turn to this guide for the working PyMC code you need to jumpstart future projects.
• Learning the Bayesian “state of mind” and its practical implications
• Understanding how computers perform Bayesian inference
• Using the PyMC Python library to program Bayesian analyses
• Building and debugging models with PyMC
• Testing your model’s “goodness of fit”
• Opening the “black box” of the Markov Chain Monte Carlo algorithm to see how and why it works
• Leveraging the power of the “Law of Large Numbers”
• Mastering key concepts, such as clustering, convergence, autocorrelation, and thinning
• Using loss functions to measure an estimate’s weaknesses based on your goals and desired outcomes
• Selecting appropriate priors and understanding how their influence changes with dataset size
• Overcoming the “exploration versus exploitation” dilemma: deciding when “pretty good” is good enough
• Using Bayesian inference to improve A/B testing
• Solving data science problems when only small amounts of data are available
Cameron Davidson-Pilon has worked in many areas of applied mathematics, from the evolutionary dynamics of genes and diseases to stochastic modeling of financial prices. His contributions to the open source community include lifelines, an implementation of survival analysis in Python. Educated at the University of Waterloo and at the Independent University of Moscow, he currently works with the online commerce leader Shopify.
Maybe you've written some simple SQL queries to interact with databases. But now you want more: you want to really dig into those databases and work with your data. Head First SQL will show you the fundamentals of SQL and how to really take advantage of it. We'll take you on a journey through the language, from basic INSERT statements and SELECT queries to hardcore database manipulation with indices, joins, and transactions. We all know "Data is Power", but we'll show you how to have "Power over your Data". Expect to have fun, expect to learn, and expect to be querying, normalizing, and joining your data like a pro by the time you're finished reading!
This book is an in-depth guide to the use of pandas for data analysis, for either the seasoned data analysis practitioner or the novice user. It provides a basic introduction to the pandas framework, and takes users through the installation of the library and the IPython interactive environment. Thereafter, you will learn basic as well as advanced features, such as MultiIndexing, modifying data structures, and sampling data, which provide powerful capabilities for data analysis.
The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as short courses on sparse modeling, deep learning, and probabilistic graphical models.
• All major classical techniques: Mean/Least-Squares regression and filtering, Kalman filtering, stochastic approximation and online learning, Bayesian classification, decision trees, logistic regression and boosting methods.
• The latest trends: Sparsity, convex analysis and optimization, online distributed algorithms, learning in RKH spaces, Bayesian inference, graphical and hidden Markov models, particle filtering, deep learning, dictionary learning and latent variables modeling.
• Case studies (protein folding prediction, optical character recognition, text authorship identification, fMRI data analysis, change point detection, hyperspectral image unmixing, target localization, channel equalization and echo cancellation) show how the theory can be applied.
• MATLAB code for all the main algorithms is available on an accompanying website, enabling the reader to experiment with the code.