The book begins with a summary of the nontechnical aspects of interviewing, such as common mistakes, strategies for a great interview, perspectives from the other side of the table, tips on negotiating the best offer, and a guide to the best ways to use EPI.
The technical core of EPI is a sequence of chapters on basic and advanced data structures, searching, sorting, broad algorithmic principles, concurrency, and system design. Each chapter consists of a brief review, followed by a broad and thought-provoking series of problems. We include a summary of data structure, algorithm, and problem solving patterns.
An audacious, irreverent investigation of human behavior—and a first look at a revolution in the making
Our personal data has been used to spy on us, hire and fire us, and sell us stuff we don’t need. In Dataclysm, Christian Rudder uses it to show us who we truly are.
For centuries, we’ve relied on polling or small-scale lab experiments to study human behavior. Today, a new approach is possible. As we live more of our lives online, researchers can finally observe us directly, in vast numbers, and without filters. Data scientists have become the new demographers.
In this daring and original book, Rudder explains how Facebook "likes" can predict, with surprising accuracy, a person’s sexual orientation and even intelligence; how attractive women receive exponentially more interview requests; and why you must have haters to be hot. He charts the rise and fall of America’s most reviled word through Google Search and examines the new dynamics of collaborative rage on Twitter. He shows how people express themselves, both privately and publicly. What is the least Asian thing you can say? Do people bathe more in Vermont or New Jersey? What do black women think about Simon & Garfunkel? (Hint: they don’t think about Simon & Garfunkel.) Rudder also traces human migration over time, showing how groups of people move from certain small towns to the same big cities across the globe. And he grapples with the challenge of maintaining privacy in a world where these explorations are possible.
Visually arresting and full of wit and insight, Dataclysm is a new way of seeing ourselves—a brilliant alchemy, in which math is made human and numbers become the narrative of our time.
From the Hardcover edition.
The algorithms in this book represent a body of knowledge developed over the last 50 years that has become indispensable, not just for professional programmers and computer science students but for any student with interests in science, mathematics, and engineering, not to mention students who use computation in the liberal arts.
The companion web site, algs4.cs.princeton.edu, contains:
An online synopsis
Full Java implementations
Test data
Exercises and answers
Dynamic visualizations
Lecture slides
Programming assignments with checklists
Links to related material
The MOOC related to this book is accessible via the "Online Course" link at algs4.cs.princeton.edu. The course offers more than 100 video lecture segments that are integrated with the text, extensive online assessments, and the large-scale discussion forums that have proven so valuable. Offered each fall and spring, this course regularly attracts tens of thousands of registrants.
Robert Sedgewick and Kevin Wayne are developing a modern approach to disseminating knowledge that fully embraces technology, enabling people all around the world to discover new ways of learning and teaching. By integrating their textbook, online content, and MOOC, all at the state of the art, they have built a unique resource that greatly expands the breadth and depth of the educational experience.
“Artfully envisions a breathtakingly better world.” —Los Angeles Times
“Elaborate, smart and persuasive.” —The Boston Globe
“A pleasure to read.” —The Wall Street Journal
One of CBS News’s Best Fall Books of 2005 • Among St Louis Post-Dispatch’s Best Nonfiction Books of 2005 • One of Amazon.com’s Best Science Books of 2005
A radical and optimistic view of the future course of human development from the bestselling author of How to Create a Mind and The Age of Spiritual Machines, whom Bill Gates calls “the best person I know at predicting the future of artificial intelligence.”
For over three decades, Ray Kurzweil has been one of the most respected and provocative advocates of the role of technology in our future. In his classic The Age of Spiritual Machines, he argued that computers would soon rival the full range of human intelligence at its best. Now he examines the next step in this inexorable evolutionary process: the union of human and machine, in which the knowledge and skills embedded in our brains will be combined with the vastly greater capacity, speed, and knowledge-sharing ability of our creations.
From the Trade Paperback edition.
A Huffington Post Definitive Tech Book of 2013
Artificial Intelligence helps choose what books you buy, what movies you see, and even who you date. It puts the "smart" in your smartphone and soon it will drive your car. It makes most of the trades on Wall Street, and controls vital energy, water, and transportation infrastructure. But Artificial Intelligence can also threaten our existence.
In as little as a decade, AI could match and then surpass human intelligence. Corporations and government agencies are pouring billions into achieving AI's Holy Grail—human-level intelligence. Once AI has attained it, scientists argue, it will have survival drives much like our own. We may be forced to compete with a rival more cunning, more powerful, and more alien than we can imagine.
Through profiles of tech visionaries, industry watchdogs, and groundbreaking AI systems, Our Final Invention explores the perils of the heedless pursuit of advanced AI. Until now, human intelligence has had no rival. Can we coexist with beings whose intelligence dwarfs our own? And will they allow us to?
Ray Kurzweil is arguably today’s most influential—and often controversial—futurist. In How to Create a Mind, Kurzweil presents a provocative exploration of the most important project in human-machine civilization—reverse engineering the brain to understand precisely how it works and using that knowledge to create even more intelligent machines.
Kurzweil discusses how the brain functions, how the mind emerges from the brain, and the implications of vastly increasing the powers of our intelligence in addressing the world’s problems. He thoughtfully examines emotional and moral intelligence and the origins of consciousness and envisions the radical possibilities of our merging with the intelligent technology we are creating.
Certain to be one of the most widely discussed and debated science books of the year, How to Create a Mind is sure to take its place alongside Kurzweil’s previous classics which include Fantastic Voyage: Live Long Enough to Live Forever and The Age of Spiritual Machines.
From the Hardcover edition.
Jeff Hawkins, the man who created the PalmPilot, Treo smart phone, and other handheld devices, has reshaped our relationship to computers. Now he stands ready to revolutionize both neuroscience and computing in one stroke, with a new understanding of intelligence itself.
Hawkins develops a powerful theory of how the human brain works, explaining why computers are not intelligent and how, based on this new theory, we can finally build intelligent machines.
The brain is not a computer, but a memory system that stores experiences in a way that reflects the true structure of the world, remembering sequences of events and their nested relationships and making predictions based on those memories. It is this memory-prediction system that forms the basis of intelligence, perception, creativity, and even consciousness.
In an engaging style that will captivate audiences from the merely curious to the professional scientist, Hawkins shows how a clear understanding of how the brain works will make it possible for us to build intelligent machines, in silicon, that will exceed our human ability in surprising ways.
Written with acclaimed science writer Sandra Blakeslee, On Intelligence promises to completely transfigure the possibilities of the technology age. It is a landmark book in its scope and clarity.
This textbook provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly.
The Art of Computer Programming, Volumes 1-4A Boxed Set, 3/e
ISBN: 0321751043
The Art of Computer Programming, Volume 4, Fascicle 4: Generating All Trees--History of Combinatorial Generation
This multivolume work on the analysis of algorithms has long been recognized as the definitive description of classical computer science. The three complete volumes published to date already comprise a unique and invaluable resource in programming theory and practice. Countless readers have spoken about the profound personal influence of Knuth's writings. Scientists have marveled at the beauty and elegance of his analysis, while practicing programmers have successfully applied his “cookbook” solutions to their day-to-day problems. All have admired Knuth for the breadth, clarity, accuracy, and good humor found in his books.
To begin the fourth and later volumes of the set, and to update parts of the existing three, Knuth has created a series of small books called fascicles, which will be published at regular intervals. Each fascicle will encompass a section or more of wholly new or revised material. Ultimately, the content of these fascicles will be rolled up into the comprehensive, final versions of each volume, and the enormous undertaking that began in 1962 will be complete.
Volume 4, Fascicle 4
This latest fascicle covers the generation of all trees, a basic topic that has surprisingly rich ties to the first three volumes of The Art of Computer Programming. In thoroughly discussing this well-known subject, while providing 124 new exercises, Knuth continues to build a firm foundation for programming. To that same end, this fascicle also covers the history of combinatorial generation. Spanning many centuries, across many parts of the world, Knuth tells a fascinating story of interest and relevance to every artful programmer, much of it never before told. The story even includes a touch of suspense: two problems that no one has yet been able to solve.
We are living in the computer age, in a world increasingly designed and engineered by computer programmers and software designers, by people who call themselves hackers. Who are these people, what motivates them, and why should you care?
Consider these facts: Everything around us is turning into computers. Your typewriter is gone, replaced by a computer. Your phone has turned into a computer. So has your camera. Soon your TV will. Your car was not only designed on computers, but has more processing power in it than a room-sized mainframe did in 1970. Letters, encyclopedias, newspapers, and even your local store are being replaced by the Internet.
Hackers & Painters: Big Ideas from the Computer Age, by Paul Graham, explains this world and the motivations of the people who occupy it. In clear, thoughtful prose that draws on illuminating historical examples, Graham takes readers on an unflinching exploration into what he calls "an intellectual Wild West."
The ideas discussed in this book will have a powerful and lasting impact on how we think, how we work, how we develop technology, and how we live. Topics include the importance of beauty in software design, how to make wealth, heresy and free speech, the programming language renaissance, the open-source movement, digital design, internet startups, and more.
SQLite is a small, embeddable, SQL-based, relational database management system. It has been widely used in low- to medium-tier database applications, especially in embedded devices. This book provides a comprehensive description of the SQLite database system. It describes the design principles, engineering trade-offs, implementation issues, and operations of SQLite.
The first edition became a widely used text in universities worldwide as well as the standard reference for professionals. The second edition featured new chapters on the role of algorithms, probabilistic analysis and randomized algorithms, and linear programming. The third edition has been revised and updated throughout. It includes two completely new chapters, on van Emde Boas trees and multithreaded algorithms, substantial additions to the chapter on recurrence (now called "Divide-and-Conquer"), and an appendix on matrices. It features improved treatment of dynamic programming and greedy algorithms and a new notion of edge-based flow in the material on flow networks. Many new exercises and problems have been added for this edition. The international paperback edition is no longer available; the hardcover is available worldwide.
Each chapter focuses on a specific problem in machine learning, such as classification, prediction, optimization, and recommendation. Using the R programming language, you’ll learn how to analyze sample datasets and write simple machine learning algorithms. Machine Learning for Hackers is ideal for programmers from any background, including business, government, and academic research.
Develop a naïve Bayesian classifier to determine if an email is spam, based only on its text
Use linear regression to predict the number of page views for the top 1,000 websites
Learn optimization techniques by attempting to break a simple letter cipher
Compare and contrast U.S. Senators statistically, based on their voting records
Build a “whom to follow” recommendation system from Twitter data
If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.
Get a crash course in Python
Learn the basics of linear algebra, statistics, and probability, and understand how and when they're used in data science
Collect, explore, clean, munge, and manipulate data
Dive into the fundamentals of machine learning
Implement models such as k-nearest neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering
Explore recommender systems, natural language processing, network analysis, MapReduce, and databases
Updated to reflect recent advances in MySQL and InnoDB performance, features, and tools, this third edition not only offers specific examples of how MySQL works, it also teaches you why this system works as it does, with illustrative stories and case studies that demonstrate MySQL’s principles in action. With this book, you’ll learn how to think in MySQL.
Learn the effects of new features in MySQL 5.5, including stored procedures, partitioned databases, triggers, and views
Implement improvements in replication, high availability, and clustering
Achieve high performance when running MySQL in the cloud
Optimize advanced querying features, such as full-text searches
Take advantage of modern multi-core CPUs and solid-state disks
Explore backup and recovery strategies, including new tools for hot online backups
Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You’ll not only learn how to improve communication between business stakeholders and data scientists, but also how to participate intelligently in your company’s data science projects. You’ll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making.
Understand how data science fits in your organization, and how you can use it for competitive advantage
Treat data as a business asset that requires careful investment if you’re to gain real value
Approach business problems data-analytically, using the data-mining process to gather good data in the most appropriate way
Learn general concepts for actually extracting knowledge from data
Apply data science principles when interviewing data science job candidates
From the Trade Paperback edition.
If you want to find out how to use Python to start answering critical questions of your data, pick up Python Machine Learning – whether you want to get started from scratch or want to extend your data science knowledge, this is an essential and unmissable resource.
What You Will Learn
Explore how to use different machine learning models to ask different questions of your data
Learn how to build neural networks using Keras and Theano
Find out how to write clean and elegant Python code that will optimize the strength of your algorithms
Discover how to embed your machine learning model in a web application for increased accessibility
Predict continuous target outcomes using regression analysis
Uncover hidden patterns and structures in data with clustering
Organize data using effective pre-processing techniques
Get to grips with sentiment analysis to delve deeper into textual and social media data
In Detail
Machine learning and predictive analytics are transforming the way businesses and other organizations operate. Being able to understand trends and patterns in complex data is critical to success, becoming one of the key strategies for unlocking growth in a challenging contemporary marketplace. Python can help you deliver key insights into your data – its unique capabilities as a language let you build sophisticated algorithms and statistical models that can reveal new perspectives and answer key questions that are vital for success.
Python Machine Learning gives you access to the world of predictive analytics and demonstrates why Python is one of the world's leading data science languages. If you want to ask better questions of data, or need to improve and extend the capabilities of your machine learning systems, this practical data science book is invaluable. Covering a wide range of powerful Python libraries, including scikit-learn, Theano, and Keras, and featuring guidance and tips on everything from sentiment analysis to neural networks, you'll soon be able to answer some of the most important questions facing you and your organization.
Style and approach
Python Machine Learning connects the fundamental theoretical principles behind machine learning to their practical application in a way that focuses you on asking and answering the right questions. It walks you through the key elements of Python and its powerful machine learning libraries, while demonstrating how to get to grips with a range of statistical models.
Let's face it, SQL is a deceptively simple language to learn, and many database developers never go far beyond the simple statement: SELECT columns FROM table WHERE conditions. But there is so much more you can do with the language. In the SQL Cookbook, experienced SQL developer Anthony Molinaro shares his favorite SQL techniques and features. You'll learn about:
Window functions, arguably the most significant enhancement to SQL in the past decade. If you're not using these, you're missing out
Powerful, database-specific features such as SQL Server's PIVOT and UNPIVOT operators, Oracle's MODEL clause, and PostgreSQL's very useful GENERATE_SERIES function
Pivoting rows into columns, reverse-pivoting columns into rows, using pivoting to facilitate inter-row calculations, and double-pivoting a result set
Bucketization, and why you should never use that term in Brooklyn.
How to create histograms, summarize data into buckets, perform aggregations over a moving range of values, generate running-totals and subtotals, and other advanced, data warehousing techniques
The technique of walking a string, which allows you to use SQL to parse through the characters, words, or delimited elements of a string
Written in O'Reilly's popular Problem/Solution/Discussion style, the SQL Cookbook is sure to please. Anthony's credo is: "When it comes down to it, we all go to work, we all have bills to pay, and we all want to go home at a reasonable time and enjoy what's still available of our days." The SQL Cookbook moves quickly from problem to solution, saving you time each step of the way.
Sams Teach Yourself SQL in 10 Minutes, Fourth Edition
New full-color code examples help you see how SQL statements are structured
Whether you're an application developer, database administrator, web application designer, mobile app developer, or Microsoft Office user, a good working knowledge of SQL is an important part of interacting with databases. And Sams Teach Yourself SQL in 10 Minutes offers the straightforward, practical answers you need to help you do your job.
Expert trainer and popular author Ben Forta teaches you just the parts of SQL you need to know–starting with simple data retrieval and quickly going on to more complex topics including the use of joins, subqueries, stored procedures, cursors, triggers, and table constraints.
You'll learn methodically, systematically, and simply–in 22 short, quick lessons that will each take only 10 minutes or less to complete.
With the Fourth Edition of this worldwide bestseller, the book has been thoroughly updated, expanded, and improved. Lessons now cover the latest versions of IBM DB2, Microsoft Access, Microsoft SQL Server, MySQL, Oracle, PostgreSQL, SQLite, MariaDB, and Apache Open Office Base. And new full-color SQL code listings help the beginner clearly see the elements and structure of the language.
10 minutes is all you need to learn how to...
Use the major SQL statements
Construct complex SQL statements using multiple clauses and operators
Retrieve, sort, and format database contents
Pinpoint the data you need using a variety of filtering techniques
Use aggregate functions to summarize data
Join two or more related tables
Insert, update, and delete data
Create and alter database tables
Work with views, stored procedures, and more
Table of Contents
1 Understanding SQL
2 Retrieving Data
3 Sorting Retrieved Data
4 Filtering Data
5 Advanced Data Filtering
6 Using Wildcard Filtering
7 Creating Calculated Fields
8 Using Data Manipulation Functions
9 Summarizing Data
10 Grouping Data
11 Working with Subqueries
12 Joining Tables
13 Creating Advanced Joins
14 Combining Queries
15 Inserting Data
16 Updating and Deleting Data
17 Creating and Manipulating Tables
18 Using Views
19 Working with Stored Procedures
20 Managing Transaction Processing
21 Using Cursors
22 Understanding Advanced SQL Features
Appendix A: Sample Table Scripts
Appendix B: Working in Popular Applications
Appendix C: SQL Statement Syntax
Appendix D: Using SQL Datatypes
Appendix E: SQL Reserved Words
In the world's top research labs and universities, the race is on to invent the ultimate learning algorithm: one capable of discovering any knowledge from data, and doing anything we want, before we even ask. In The Master Algorithm, Pedro Domingos lifts the veil to give us a peek inside the learning machines that power Google, Amazon, and your smartphone. He assembles a blueprint for the future universal learner--the Master Algorithm--and discusses what it will mean for business, science, and society. If data-ism is today's philosophy, this book is its bible.
Determine which data structures and algorithms are most appropriate for the problems you’re trying to solve, and understand the tradeoffs when using them in a JavaScript program. An overview of the JavaScript features used throughout the book is also included.
This book covers:
Arrays and lists: the most common data structures
Stacks and queues: more complex list-like data structures
Linked lists: how they overcome the shortcomings of arrays
Dictionaries: storing data as key-value pairs
Hashing: good for quick insertion and retrieval
Sets: useful for storing unique elements that appear only once
Binary Trees: storing data in a hierarchical manner
Graphs and graph algorithms: ideal for modeling networks
Algorithms: including those that help you sort or search data
Advanced algorithms: dynamic programming and greedy algorithms
Readers will learn what computer algorithms are, how to describe them, and how to evaluate them. They will discover simple ways to search for information in a computer; methods for rearranging information in a computer into a prescribed order ("sorting"); how to solve basic problems that can be modeled in a computer with a mathematical structure called a "graph" (useful for modeling road networks, dependencies among tasks, and financial relationships); how to solve problems that ask questions about strings of characters such as DNA structures; the basic principles behind cryptography; fundamentals of data compression; and even that there are some problems that no one has figured out how to solve on a computer in a reasonable amount of time.
Updated for R 2.14 and 2.15, this second edition includes new and expanded chapters on R performance, the ggplot2 data visualization package, and parallel R computing with Hadoop.
Get started quickly with an R tutorial and hundreds of examples
Explore R syntax, objects, and other language details
Find thousands of user-contributed R packages online, including Bioconductor
Learn how to use R to prepare data for analysis
Visualize your data with R’s graphics, lattice, and ggplot2 packages
Use R to calculate statistical tests, fit models, and compute probability distributions
Speed up intensive computations by writing parallel R programs for Hadoop
Get a complete desktop reference to R
The Art of Computer Programming, Volume 1, Fascicle 1: MMIX -- A RISC Computer for the New Millennium
This multivolume work on the analysis of algorithms has long been recognized as the definitive description of classical computer science. The three complete volumes published to date already comprise a unique and invaluable resource in programming theory and practice. Countless readers have spoken about the profound personal influence of Knuth's writings. Scientists have marveled at the beauty and elegance of his analysis, while practicing programmers have successfully applied his "cookbook" solutions to their day-to-day problems. All have admired Knuth for the breadth, clarity, accuracy, and good humor found in his books.
To begin the fourth and later volumes of the set, and to update parts of the existing three, Knuth has created a series of small books called fascicles, which will be published at regular intervals. Each fascicle will encompass a section or more of wholly new or revised material. Ultimately, the content of these fascicles will be rolled up into the comprehensive, final versions of each volume, and the enormous undertaking that began in 1962 will be complete.
Volume 1, Fascicle 1
This first fascicle updates The Art of Computer Programming, Volume 1, Third Edition: Fundamental Algorithms, and ultimately will become part of the fourth edition of that book. Specifically, it provides a programmer's introduction to the long-awaited MMIX, a RISC-based computer that replaces the original MIX, and describes the MMIX assembly language. The fascicle also presents new material on subroutines, coroutines, and interpretive routines.
Ebook (PDF version) produced by Mathematical Sciences Publishers (MSP), http://msp.org
Employ the Natural Language Toolkit, NetworkX, and other scientific computing tools to mine popular social web sites
Apply advanced text-mining techniques, such as clustering and TF-IDF, to extract meaning from human language data
Bootstrap interest graphs from GitHub by discovering affinities among people, programming languages, and coding projects
Build interactive visualizations with D3.js, an extraordinarily flexible HTML5 and JavaScript toolkit
Take advantage of more than two dozen Twitter recipes, presented in O’Reilly’s popular "problem/solution/discussion" cookbook format
The example code for this unique data science book is maintained in a public GitHub repository. It’s designed to be easily accessible through a turnkey virtual machine that facilitates interactive learning with an easy-to-use collection of IPython Notebooks.
Algorithms in C++, Third Edition, Part 5: Graph Algorithms is the second book in Sedgewick's thoroughly revised and rewritten series. The first book, Parts 1-4, addresses fundamental algorithms, data structures, sorting, and searching. A forthcoming third book will focus on strings, geometry, and a range of advanced algorithms. Each book's expanded coverage features new algorithms and implementations, enhanced descriptions and diagrams, and a wealth of new exercises for polishing skills. A focus on abstract data types makes the programs more broadly useful and relevant for the modern object-oriented programming environment.
Coverage includes:
A complete overview of graph properties and types
Digraphs and DAGs
Minimum spanning trees
Shortest paths
Network flows
Diagrams, sample C++ code, and detailed algorithm descriptions
The Web site for this book (http://www.cs.princeton.edu/~rs/) provides additional source code for programmers along with a wide range of academic support materials for educators.
A landmark revision, Algorithms in C++, Third Edition, Part 5 provides a complete tool set for programmers to implement, debug, and use graph algorithms across a wide range of computer applications.
By working with a single case study throughout this thoroughly revised book, you’ll learn the entire process of exploratory data analysis—from collecting data and generating statistics to identifying patterns and testing hypotheses. You’ll explore distributions, rules of probability, visualization, and many other tools and concepts.
New chapters on regression, time series analysis, survival analysis, and analytic methods will enrich your discoveries.
Develop an understanding of probability and statistics by writing and testing code
Run experiments to test statistical behavior, such as generating samples from several distributions
Use simulations to understand concepts that are hard to grasp mathematically
Import data from most sources with Python, rather than rely on data that’s cleaned and formatted for statistics tools
Use statistical inference to answer questions about real-world data
Many new algorithms are presented, and the explanations of each algorithm are much more detailed than in previous editions. A new text design and detailed, innovative figures, with accompanying commentary, greatly enhance the presentation. The third edition retains the successful blend of theory and practice that has made Sedgewick's work an invaluable resource for more than 250,000 programmers!
This particular book, Parts 1-4, represents the essential first half of Sedgewick's complete work. It provides extensive coverage of fundamental data structures and algorithms for sorting, searching, and related applications. Although the substance of the book applies to programming in any language, the implementations by Van Wyk and Sedgewick also exploit the natural match between C++ classes and ADT implementations.
Highlights
Expanded coverage of arrays, linked lists, strings, trees, and other basic data structures
Greater emphasis on abstract data types (ADTs), modular programming, object-oriented programming, and C++ classes than in previous editions
Over 100 algorithms for sorting, selection, priority queue ADT implementations, and symbol table ADT (searching) implementations
New implementations of binomial queues, multiway radix sorting, randomized BSTs, splay trees, skip lists, multiway tries, B trees, extendible hashing, and much more
Increased quantitative information about the algorithms, giving you a basis for comparing them
Over 1000 new exercises to help you learn the properties of algorithms
Whether you are learning the algorithms for the first time or wish to have up-to-date reference material that incorporates new programming styles with classic and new algorithms, you will find a wealth of useful information in this book.
This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.
The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research.
The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB codes and links to sample web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text.
Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided.
• Many illustrative examples and entertaining asides
• MATLAB code
• Accessible and informal style
• Complete and self-contained section for mathematics review
It used to be that to diagnose an illness, interpret legal documents, analyze foreign policy, or write a newspaper article you needed a human being with specific skills—and maybe an advanced degree or two. These days, high-level tasks are increasingly being handled by algorithms that can do precise work not only with speed but also with nuance. These “bots” started with human programming and logic, but now their reach extends beyond what their creators ever expected.
In this fascinating, frightening book, Christopher Steiner tells the story of how algorithms took over—and shows why the “bot revolution” is about to spill into every aspect of our lives, often silently, without our knowledge.
The May 2010 “Flash Crash” exposed Wall Street’s reliance on trading bots to the tune of a 998-point market drop and $1 trillion in vanished market value. But that was just the beginning. In Automate This, we meet bots that are driving cars, penning haiku, and writing music mistaken for Bach’s. They listen in on our customer service calls and figure out what Iran would do in the event of a nuclear standoff. There are algorithms that can pick out the most cohesive crew of astronauts for a space mission or identify the next Jeremy Lin. Some can even ingest statistics from baseball games and spit out pitch-perfect sports journalism indistinguishable from that produced by humans.
The interaction of man and machine can make our lives easier. But what will the world look like when algorithms control our hospitals, our roads, our culture, and our national security? What happens to businesses when we automate judgment and eliminate human instinct? And what role will be left for doctors, lawyers, writers, truck drivers, and many others?
Who knows? Maybe there’s a bot learning to do your job this minute.
You'll get step-by-step instructions and lots of sample code to create and explore several MapReduce views through the course of the book, using an example database you construct. To work with these different views, you’ll learn how to use the Futon web administration console and the cURL command line tool that come with CouchDB.
• Learn how the Map and Reduce steps work independently and together to index your data
• Use the example database to create several temporary views based on different criteria
• Discover the uses of Map and Reduce JavaScript functions
• Convert your temporary views to permanent views within a design document
• Learn several options for querying the data within your views
• Limit the number of results returned, skip some results, or reverse the order of the output
• Group your results by exact keys or by parts of keys
Bradley Holt, co-founder of the creative services firm Found Line, is a web developer and entrepreneur with ten years of PHP and MySQL experience. He began using CouchDB before the release of version 1.0. Bradley is an active member of the PHP community, and can be reached at bradley-holt.com.
This updated second edition provides guidance for database developers, advanced configuration for system administrators, and an overview of the concepts and use cases for other people on your project. Ideal for NoSQL newcomers and experienced MongoDB users alike, this guide provides numerous real-world schema design examples.
• Get started with MongoDB core concepts and vocabulary
• Perform basic write operations at different levels of safety and speed
• Create complex queries, with options for limiting, skipping, and sorting results
• Design an application that works well with MongoDB
• Aggregate data, including counting, finding distinct values, grouping documents, and using MapReduce
• Gather and interpret statistics about your collections and databases
• Set up replica sets and automatic failover in MongoDB
• Use sharding to scale horizontally, and learn how it impacts applications
• Delve into monitoring, security and authentication, backup/restore, and other administrative tasks
Rather than run through all possible scenarios, this pragmatic operations guide calls out what works, as demonstrated in critical deployments.
• Get a high-level overview of HDFS and MapReduce: why they exist and how they work
• Plan a Hadoop deployment, from hardware and OS selection to network requirements
• Learn setup and configuration details with a list of critical properties
• Manage resources by sharing a cluster across multiple groups
• Get a runbook of the most common cluster maintenance tasks
• Monitor Hadoop clusters, and learn troubleshooting with the help of real-world war stories
• Use basic tools and techniques to handle backup and catastrophic failure
Detailing the hows and the whys of successful Essbase implementation, the book arms you with simple yet powerful tools to meet your immediate needs, as well as the theoretical knowledge to proceed to the next level with Essbase. Infrastructure, data sourcing and transformation, database design, calculations, automation, APIs, reporting, and project implementation are covered by subject matter experts who work with the tools and techniques on a daily basis. In addition to practical cases that illustrate valuable lessons learned, the book offers:
• Undocumented Secrets: Dan Pressman describes the previously unpublished and undocumented inner workings of the ASO Essbase engine.
• Authoritative Experts: If you have questions that no one else can solve, these 12 Essbase professionals are the ones who can answer them.
• Unpublished: Includes the only third-party guide to infrastructure. Infrastructure is easy to get wrong and can doom any Essbase project.
• Comprehensive: Let there never again be a question on how to create blocks or design BSO databases for performance; Dave Farnsworth provides the answers within.
• Innovative: Cameron Lackpour and Joe Aultman bring new and exciting solutions to persistent Essbase problems.
With a list of contributors as impressive as the program of presenters at a leading Essbase conference, this book offers unprecedented access to the insights and experiences of those at the forefront of the field. The previously unpublished material presented in these pages will give you the practical knowledge needed to use this powerful and intuitive tool to build highly useful analytical models, reporting systems, and forecasting applications.
The code-packed examples in this book will help you learn how to work with documents, populate a simple database, replicate data from one database to another, and a host of other tasks.
• Install CouchDB on Linux, Mac OS X, Windows, or (if you must) from the source code
• Interact with data through CouchDB’s RESTful API, and use standard HTTP operations, such as PUT, GET, POST, and DELETE
• Use Futon, CouchDB’s web-based interface, to manage databases and documents, and to configure replications
• Learn how to create, update, and delete documents in JSON format, and how to create and delete databases
• Work with design documents to get the formatting and indexing your application requires
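The HTTP operations named above map onto CouchDB's RESTful API roughly as follows. This is only a minimal sketch, not taken from the book: it builds the requests without sending them, and it assumes a local server on CouchDB's default port 5984, a hypothetical database called `books`, a hypothetical document id `doc1`, and a placeholder revision string.

```python
import json
from urllib.request import Request

BASE = "http://127.0.0.1:5984"  # assumption: local CouchDB on its default port
DB = "books"                    # hypothetical database name


def build(method, path, body=None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE + path, data=data, headers=headers, method=method)


requests_to_send = [
    build("PUT", f"/{DB}"),                             # create the database
    build("PUT", f"/{DB}/doc1", {"title": "Example"}),  # create a document with a chosen id
    build("POST", f"/{DB}", {"title": "Another"}),      # create a document, server picks the id
    build("GET", f"/{DB}/doc1"),                        # read the document back
    build("DELETE", f"/{DB}/doc1?rev=1-abc"),           # delete it; "1-abc" is a placeholder revision
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would actually send it, which requires a running server and, for the delete, the document's real current revision.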
Using a practical, hands-on approach, this book will take you through all the facets of developing Access-based solutions, such as data modeling, complex form development, and user interface customizations. You'll then deploy your solution to the web and integrate it with other external data sources. This book is full of handy tricks to help you get the most out of what Access has to offer, including its comprehensive set of features and tools for collecting, using, and acting on business data, whether your data is in Access or stored on another platform. You'll also see how to smoothly integrate your applications with SQL Server databases and other Office programs, such as Outlook.
Bayesian methods of inference are deeply natural and extremely powerful. However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making it inaccessible to anyone without a strong mathematical background. Now, though, Cameron Davidson-Pilon introduces Bayesian inference from a computational perspective, bridging theory to practice–freeing you to get results using computing power.
Bayesian Methods for Hackers illuminates Bayesian inference through probabilistic programming with the powerful PyMC language and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments, without extensive mathematical intervention.
Davidson-Pilon begins by introducing the concepts underlying Bayesian inference, comparing it with other techniques and guiding you through building and training your first Bayesian model. Next, he introduces PyMC through a series of detailed examples and intuitive explanations that have been refined after extensive user feedback. You’ll learn how to use the Markov Chain Monte Carlo algorithm, choose appropriate sample sizes and priors, work with loss functions, and apply Bayesian inference in domains ranging from finance to marketing. Once you’ve mastered these techniques, you’ll constantly turn to this guide for the working PyMC code you need to jumpstart future projects.
Coverage includes
• Learning the Bayesian “state of mind” and its practical implications
• Understanding how computers perform Bayesian inference
• Using the PyMC Python library to program Bayesian analyses
• Building and debugging models with PyMC
• Testing your model’s “goodness of fit”
• Opening the “black box” of the Markov Chain Monte Carlo algorithm to see how and why it works
• Leveraging the power of the “Law of Large Numbers”
• Mastering key concepts, such as clustering, convergence, autocorrelation, and thinning
• Using loss functions to measure an estimate’s weaknesses based on your goals and desired outcomes
• Selecting appropriate priors and understanding how their influence changes with dataset size
• Overcoming the “exploration versus exploitation” dilemma: deciding when “pretty good” is good enough
• Using Bayesian inference to improve A/B testing
• Solving data science problems when only small amounts of data are available
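As a rough illustration of the A/B testing item in the list above, here is a minimal sketch of a Bayesian comparison of two conversion rates. It is not the book's code: the book works in PyMC, whereas this uses only Python's standard library, a uniform Beta(1, 1) prior chosen for simplicity, and made-up visitor and conversion counts.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Estimate P(rate_B > rate_A) by sampling from each Beta posterior.

    With a Beta(1, 1) prior and binomial data, the posterior for a
    conversion rate is Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        p_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        p_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if p_b > p_a:
            wins += 1
    return wins / draws


# Hypothetical data: 120 of 1000 visitors converted on page A, 150 of 1000 on page B.
print(prob_b_beats_a(120, 1000, 150, 1000))
```

The returned number is a direct probability statement about which variant is better, which is the kind of output a Bayesian treatment of A/B testing aims for; a different prior or more draws would shift it slightly.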
Cameron Davidson-Pilon has worked in many areas of applied mathematics, from the evolutionary dynamics of genes and diseases to stochastic modeling of financial prices. His contributions to the open source community include lifelines, an implementation of survival analysis in Python. Educated at the University of Waterloo and at the Independent University of Moscow, he currently works with the online commerce leader Shopify.
This book will help you:
• Become a contributor on a data science team
• Deploy a structured lifecycle approach to data analytics problems
• Apply appropriate analytic techniques and tools to analyzing big data
• Learn how to tell a compelling story with data to drive business action
• Prepare for EMC Proven Professional Data Science Certification
Corresponding data sets are available at www.wiley.com/go/9781118876138.
Get started discovering, analyzing, visualizing, and presenting data in a meaningful way today!
Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You’ll learn about recent changes to Hadoop, and explore new case studies on Hadoop’s role in healthcare systems and genomics data processing.
• Learn fundamental components such as MapReduce, HDFS, and YARN
• Explore MapReduce in depth, including steps for developing applications with it
• Set up and maintain a Hadoop cluster running HDFS and MapReduce on YARN
• Learn two data formats: Avro for data serialization and Parquet for nested data
• Use data ingestion tools such as Flume (for streaming data) and Sqoop (for bulk data transfer)
• Understand how high-level data processing tools like Pig, Hive, Crunch, and Spark work with Hadoop
• Learn the HBase distributed database and the ZooKeeper distributed configuration service
"The authors have appreciated that MDM is a complex multidimensional area, and have set out to cover each of these dimensions in sufficient detail to provide adequate practical guidance to anyone implementing MDM. While this necessarily makes the book rather long, it means that the authors achieve a comprehensive treatment of MDM that is lacking in previous works."
-- Malcolm Chisholm, Ph.D., President, AskGet.com Consulting, Inc.
Regain control of your master data and maintain a master-entity-centric enterprise data framework using the detailed information in this authoritative guide. Master Data Management and Data Governance, Second Edition provides up-to-date coverage of the most current architecture and technology views and system development and management methods. Discover how to construct an MDM business case and roadmap, build accurate models, deploy data hubs, and implement layered security policies. Legacy system integration, cross-industry challenges, and regulatory compliance are also covered in this comprehensive volume.
• Plan and implement enterprise-scale MDM and Data Governance solutions
• Develop a master data model
• Identify, match, and link master records for various domains through entity resolution
• Improve efficiency and maximize integration using SOA and Web services
• Ensure compliance with local, state, federal, and international regulations
• Handle security using authentication, authorization, roles, entitlements, and encryption
• Defend against identity theft, data compromise, spyware attack, and worm infection
• Synchronize components and test data quality and system performance
This book offers practical answers to some of the hardest questions faced by PL/SQL developers, including:
• What is the best way to write the SQL logic in my application code?
• How should I write my packages so they can be leveraged by my entire team of developers?
• How can I make sure that all my team's programs handle and record errors consistently?
Oracle PL/SQL Best Practices summarizes PL/SQL best practices in nine major categories: overall PL/SQL application development; programming standards; program testing, tracing, and debugging; variables and data structures; control logic; error handling; the use of SQL in PL/SQL; building procedures, functions, packages, and triggers; and overall program performance.
This book is a concise and entertaining guide that PL/SQL developers will turn to again and again as they seek out ways to write higher quality code and more successful applications.
"This book presents ideas that make the difference between a successful project and one that never gets off the ground. It goes beyond just listing a set of rules, and provides realistic scenarios that help the reader understand where the rules come from. This book should be required reading for any team of Oracle database professionals."
--Dwayne King, President, KRIDAN Consulting
Beginning ASP.NET 4.5 Databases is a comprehensive introduction to connecting a Web site to many different data sources, not just databases, and using the data to create dynamic page content. It also shows you how to build a relational database, use SQL to communicate with it, and understand how they differ from each other.
With in-depth, on-target coverage of the new data access features of .NET Framework 4.5, this book is your guide to using ASP.NET to build responsive, easy-to-update data-driven Web sites.
It includes Matlab code for the most common methods and algorithms in the book, together with a descriptive summary and solved examples, including real-life data sets in imaging and audio recognition.
This text is designed for electronic engineering, computer science, computer engineering, biomedical engineering, and applied mathematics students taking graduate courses on pattern recognition and machine learning, as well as R&D engineers and university researchers in image and signal processing/analysis and computer vision.
• Matlab code and descriptive summary of the most common methods and algorithms in Theodoridis/Koutroumbas, Pattern Recognition, Fourth Edition
• Solved examples in Matlab, including real-life data sets in imaging and audio recognition
• Available separately or at a special package price with the main text (ISBN for package: 978-0-12-374491-3)