Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You’ll not only learn how to improve communication between business stakeholders and data scientists, but also how to participate intelligently in your company’s data science projects. You’ll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making.
• Understand how data science fits in your organization, and how you can use it for competitive advantage
• Treat data as a business asset that requires careful investment if you’re to gain real value
• Approach business problems data-analytically, using the data-mining process to gather good data in the most appropriate way
• Learn general concepts for actually extracting knowledge from data
• Apply data science principles when interviewing data science job candidates
NoSQL Distilled is a concise but thorough introduction to this rapidly emerging technology. Pramod J. Sadalage and Martin Fowler explain how NoSQL databases work and the ways that they may be a superior alternative to a traditional RDBMS. The authors provide a fast-paced guide to the concepts you need to know in order to evaluate whether NoSQL databases are right for your needs and, if so, which technologies you should explore further.
The first part of the book concentrates on core concepts, including schemaless data models, aggregates, new distribution models, the CAP theorem, and map-reduce. In the second part, the authors explore architectural and design issues associated with implementing NoSQL. They also present realistic use cases that demonstrate NoSQL databases at work and feature representative examples using Riak, MongoDB, Cassandra, and Neo4j.
In addition, by drawing on Pramod Sadalage’s pioneering work, NoSQL Distilled shows how to implement evolutionary design with schema migration: an essential technique for applying NoSQL databases. The book concludes by describing how NoSQL is ushering in a new age of Polyglot Persistence, where multiple data-storage worlds coexist, and architects can choose the technology best optimized for each type of data access.
This textbook provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly.
Let's face it, SQL is a deceptively simple language to learn, and many database developers never go far beyond the simple statement: SELECT columns FROM table WHERE conditions. But there is so much more you can do with the language. In the SQL Cookbook, experienced SQL developer Anthony Molinaro shares his favorite SQL techniques and features. You'll learn about:
Window functions, arguably the most significant enhancement to SQL in the past decade. If you're not using these, you're missing out
Powerful, database-specific features such as SQL Server's PIVOT and UNPIVOT operators, Oracle's MODEL clause, and PostgreSQL's very useful GENERATE_SERIES function
Pivoting rows into columns, reverse-pivoting columns into rows, using pivoting to facilitate inter-row calculations, and double-pivoting a result set
Bucketization, and why you should never use that term in Brooklyn.
How to create histograms, summarize data into buckets, perform aggregations over a moving range of values, generate running-totals and subtotals, and other advanced, data warehousing techniques
The technique of walking a string, which allows you to use SQL to parse through the characters, words, or delimited elements of a string
Written in O'Reilly's popular Problem/Solution/Discussion style, the SQL Cookbook is sure to please. Anthony's credo is: "When it comes down to it, we all go to work, we all have bills to pay, and we all want to go home at a reasonable time and enjoy what's still available of our days." The SQL Cookbook moves quickly from problem to solution, saving you time each step of the way.
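As a rough illustration of the window functions and running totals mentioned above, the following sketch uses Python's built-in sqlite3 module. The `sales` table, its columns, and the sample rows are invented for this example and are not taken from the book. It also assumes the bundled SQLite is version 3.25 or later, the release that introduced window functions.

```python
import sqlite3


def running_totals(rows):
    """Return (day, amount, running_total) tuples using a SQL window function.

    `rows` is a list of (day, amount) pairs; the table and column names
    are hypothetical and exist only for this illustration.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # SUM(...) OVER (ORDER BY day) accumulates amounts from the first
        # row up to the current row, which yields a running total.
        query = """
            SELECT day,
                   amount,
                   SUM(amount) OVER (ORDER BY day) AS running_total
            FROM sales
            ORDER BY day
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


sample = [(1, 10), (2, 20), (3, 5)]
for row in running_totals(sample):
    print(row)
```

Unlike GROUP BY, the window function keeps one output row per input row, which is why the original `amount` can sit next to the accumulated value.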
If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.
• Get a crash course in Python
• Learn the basics of linear algebra, statistics, and probability, and understand how and when they're used in data science
• Collect, explore, clean, munge, and manipulate data
• Dive into the fundamentals of machine learning
• Implement models such as k-nearest neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering
• Explore recommender systems, natural language processing, network analysis, MapReduce, and databases
Updated to reflect recent advances in MySQL and InnoDB performance, features, and tools, this third edition not only offers specific examples of how MySQL works, it also teaches you why this system works as it does, with illustrative stories and case studies that demonstrate MySQL’s principles in action. With this book, you’ll learn how to think in MySQL.
• Learn the effects of new features in MySQL 5.5, including stored procedures, partitioned databases, triggers, and views
• Implement improvements in replication, high availability, and clustering
• Achieve high performance when running MySQL in the cloud
• Optimize advanced querying features, such as full-text searches
• Take advantage of modern multi-core CPUs and solid-state disks
• Explore backup and recovery strategies, including new tools for hot online backups
Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:
• Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce
• Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
• Discover common pitfalls and advanced features for writing real-world MapReduce programs
• Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
• Use Pig, a high-level query language for large-scale data processing
• Take advantage of HBase, Hadoop's database for structured and semi-structured data
• Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems
If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject.
"Now you have the opportunity to learn about Hadoop from a master, not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!
Each chapter presents a self-contained lesson on a key SQL concept or technique, with numerous illustrations and annotated examples. Exercises at the end of each chapter let you practice the skills you learn. With this book, you will:
• Move quickly through SQL basics and learn several advanced features
• Use SQL data statements to generate, manipulate, and retrieve data
• Create database objects, such as tables, indexes, and constraints, using SQL schema statements
• Learn how data sets interact with queries, and understand the importance of subqueries
• Convert and manipulate data with SQL's built-in functions, and use conditional logic in data statements
Knowledge of SQL is a must for interacting with data. With Learning SQL, you'll quickly learn how to put the power and flexibility of this language to work.
If you want to find out how to use Python to start answering critical questions of your data, pick up Python Machine Learning – whether you want to get started from scratch or want to extend your data science knowledge, this is an essential and unmissable resource.
What You Will Learn
• Explore how to use different machine learning models to ask different questions of your data
• Learn how to build neural networks using Keras and Theano
• Find out how to write clean and elegant Python code that will optimize the strength of your algorithms
• Discover how to embed your machine learning model in a web application for increased accessibility
• Predict continuous target outcomes using regression analysis
• Uncover hidden patterns and structures in data with clustering
• Organize data using effective pre-processing techniques
• Get to grips with sentiment analysis to delve deeper into textual and social media data
In Detail
Machine learning and predictive analytics are transforming the way businesses and other organizations operate. Being able to understand trends and patterns in complex data is critical to success, becoming one of the key strategies for unlocking growth in a challenging contemporary marketplace. Python can help you deliver key insights into your data – its unique capabilities as a language let you build sophisticated algorithms and statistical models that can reveal new perspectives and answer key questions that are vital for success.
Python Machine Learning gives you access to the world of predictive analytics and demonstrates why Python is one of the world's leading data science languages. If you want to ask better questions of data, or need to improve and extend the capabilities of your machine learning systems, this practical data science book is invaluable. Covering a wide range of powerful Python libraries, including scikit-learn, Theano, and Keras, and featuring guidance and tips on everything from sentiment analysis to neural networks, you'll soon be able to answer some of the most important questions facing you and your organization.
Style and approach
Python Machine Learning connects the fundamental theoretical principles behind machine learning to their practical application in a way that focuses you on asking and answering the right questions. It walks you through the key elements of Python and its powerful machine learning libraries, while demonstrating how to get to grips with a range of statistical models.
But how does one exactly do data science? Do you have to hire one of these priests of the dark arts, the "data scientist," to extract this gold from your data? Nope.
Data science is little more than using straightforward steps to process raw data into actionable insight. And in Data Smart, author and data scientist John Foreman will show you how that's done within the familiar environment of a spreadsheet.
Why a spreadsheet? It's comfortable! You get to look at the data every step of the way, building confidence as you learn the tricks of the trade. Plus, spreadsheets are a vendor-neutral place to learn data science without the hype.
But don't let the Excel sheets fool you. This is a book for those serious about learning the analytic techniques, the math and the magic, behind big data.
Each chapter will cover a different technique in a spreadsheet so you can follow along:
• Mathematical optimization, including non-linear programming and genetic algorithms
• Clustering via k-means, spherical k-means, and graph modularity
• Data mining in graphs, such as outlier detection
• Supervised AI through logistic regression, ensemble models, and bag-of-words models
• Forecasting, seasonal adjustments, and prediction intervals through Monte Carlo simulation
• Moving from spreadsheets into the R programming language
You get your hands dirty as you work alongside John through each technique. But never fear, the topics are readily applicable and the author laces humor throughout. You'll even learn what a dead squirrel has to do with optimization modeling, which you no doubt are dying to know.
This updated second edition provides guidance for database developers, advanced configuration for system administrators, and an overview of the concepts and use cases for other people on your project. Ideal for NoSQL newcomers and experienced MongoDB users alike, this guide provides numerous real-world schema design examples.
• Get started with MongoDB core concepts and vocabulary
• Perform basic write operations at different levels of safety and speed
• Create complex queries, with options for limiting, skipping, and sorting results
• Design an application that works well with MongoDB
• Aggregate data, including counting, finding distinct values, grouping documents, and using MapReduce
• Gather and interpret statistics about your collections and databases
• Set up replica sets and automatic failover in MongoDB
• Use sharding to scale horizontally, and learn how it impacts applications
• Delve into monitoring, security and authentication, backup/restore, and other administrative tasks
Detailing the hows and the whys of successful Essbase implementation, the book arms you with simple yet powerful tools to meet your immediate needs, as well as the theoretical knowledge to proceed to the next level with Essbase. Infrastructure, data sourcing and transformation, database design, calculations, automation, APIs, reporting, and project implementation are covered by subject matter experts who work with the tools and techniques on a daily basis. In addition to practical cases that illustrate valuable lessons learned, the book offers:
• Undocumented Secrets: Dan Pressman describes the previously unpublished and undocumented inner workings of the ASO Essbase engine.
• Authoritative Experts: If you have questions that no one else can solve, these 12 Essbase professionals are the ones who can answer them.
• Unpublished: Includes the only third-party guide to infrastructure. Infrastructure is easy to get wrong and can doom any Essbase project.
• Comprehensive: Let there never again be a question on how to create blocks or design BSO databases for performance; Dave Farnsworth provides the answers within.
• Innovative: Cameron Lackpour and Joe Aultman bring new and exciting solutions to persistent Essbase problems.
With a list of contributors as impressive as the program of presenters at a leading Essbase conference, this book offers unprecedented access to the insights and experiences of those at the forefront of the field. The previously unpublished material presented in these pages will give you the practical knowledge needed to use this powerful and intuitive tool to build highly useful analytical models, reporting systems, and forecasting applications.
Rather than run through all possible scenarios, this pragmatic operations guide calls out what works, as demonstrated in critical deployments.
• Get a high-level overview of HDFS and MapReduce: why they exist and how they work
• Plan a Hadoop deployment, from hardware and OS selection to network requirements
• Learn setup and configuration details with a list of critical properties
• Manage resources by sharing a cluster across multiple groups
• Get a runbook of the most common cluster maintenance tasks
• Monitor Hadoop clusters, and learn troubleshooting with the help of real-world war stories
• Use basic tools and techniques to handle backup and catastrophic failure
This book offers practical answers to some of the hardest questions faced by PL/SQL developers, including:
• What is the best way to write the SQL logic in my application code?
• How should I write my packages so they can be leveraged by my entire team of developers?
• How can I make sure that all my team's programs handle and record errors consistently?
Oracle PL/SQL Best Practices summarizes PL/SQL best practices in nine major categories: overall PL/SQL application development; programming standards; program testing, tracing, and debugging; variables and data structures; control logic; error handling; the use of SQL in PL/SQL; building procedures, functions, packages, and triggers; and overall program performance.
This book is a concise and entertaining guide that PL/SQL developers will turn to again and again as they seek out ways to write higher quality code and more successful applications.
"This book presents ideas that make the difference between a successful project and one that never gets off the ground. It goes beyond just listing a set of rules, and provides realistic scenarios that help the reader understand where the rules come from. This book should be required reading for any team of Oracle database professionals."
--Dwayne King, President, KRIDAN Consulting
This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for “wide” data (p bigger than n), including multiple testing and false discovery rates.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.
Your freemium product generates vast volumes of data, but using that data to maximize conversion, boost retention, and deliver revenue can be challenging if you don't fully understand the impact that small changes can have on revenue. In this book, author Eric Seufert provides clear guidelines for using data and analytics through all stages of development to optimize your implementation of the freemium model. Freemium Economics de-mystifies the freemium model through an exploration of its core, data-oriented tenets, so that you can apply it methodically rather than hoping that conversion and revenue will naturally follow product launch.
By reading Freemium Economics, you will:
• Learn how to apply data science and big data principles in freemium product design and development to maximize conversion, boost retention, and deliver revenue
• Gain a broad introduction to the conceptual economic pillars of freemium and a complete understanding of the unique approaches needed to acquire users and convert them from free to paying customers
• Get practical tips and analytical guidance to successfully implement the freemium model
• Understand the metrics and infrastructure required to measure the success of a freemium product and improve it post-launch
The book also includes a detailed explanation of the lifetime customer value (LCV) calculation and step-by-step instructions for implementing key performance indicators in a simple, universally accessible tool like Excel.
This book will help you:
• Become a contributor on a data science team
• Deploy a structured lifecycle approach to data analytics problems
• Apply appropriate analytic techniques and tools to analyzing big data
• Learn how to tell a compelling story with data to drive business action
• Prepare for EMC Proven Professional Data Science Certification
Corresponding data sets are available at www.wiley.com/go/9781118876138.
Get started discovering, analyzing, visualizing, and presenting data in a meaningful way today!
Get the Access 2010 information you need to succeed with this comprehensive reference. If this is your first encounter with Access, you'll appreciate the thorough attention to database fundamentals and terminology. If you're familiar with earlier versions, you can jump right into Access 2010 enhancements such as the new Access user interface and wider use of XML and Web services.
• Takes you under the hood of Microsoft Access 2010, the database application included with Microsoft Office 2010
• Explores the latest enhancements, such as a new user interface and wider use of XML and Web services; also, how to exchange data with Word, Excel, PowerPoint, and other Office apps
• Covers how to create tables, manipulate datasheets, and work with multiple tables
• Explains the seven database objects and how to use a seven-step design method to build a database tailored to your needs
• Shows you how to build forms, use Visual Basic and the VBA Editor, automate query parameters, create functions and subroutines, use XML to create data access pages, and more
• Includes a CD with all source code from the book and working examples, plus bonus shareware, freeware, trial, demo and evaluation programs that work with or enhance Microsoft Office
You’ll want to keep this soup-to-nuts Access reference close at hand!
Note: CD-ROM/DVD and other supplementary materials are not included as part of eBook file.
The example code for this unique data science book is maintained in a public GitHub repository. It’s designed to be easily accessible through a turnkey virtual machine that facilitates interactive learning with an easy-to-use collection of IPython Notebooks.
Bayesian methods of inference are deeply natural and extremely powerful. However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making it inaccessible to anyone without a strong mathematical background. Now, though, Cameron Davidson-Pilon introduces Bayesian inference from a computational perspective, bridging theory to practice, freeing you to get results using computing power.
Bayesian Methods for Hackers illuminates Bayesian inference through probabilistic programming with the powerful PyMC library and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments, without extensive mathematical intervention.
Davidson-Pilon begins by introducing the concepts underlying Bayesian inference, comparing it with other techniques and guiding you through building and training your first Bayesian model. Next, he introduces PyMC through a series of detailed examples and intuitive explanations that have been refined after extensive user feedback. You’ll learn how to use the Markov Chain Monte Carlo algorithm, choose appropriate sample sizes and priors, work with loss functions, and apply Bayesian inference in domains ranging from finance to marketing. Once you’ve mastered these techniques, you’ll constantly turn to this guide for the working PyMC code you need to jumpstart future projects.
• Learning the Bayesian “state of mind” and its practical implications
• Understanding how computers perform Bayesian inference
• Using the PyMC Python library to program Bayesian analyses
• Building and debugging models with PyMC
• Testing your model’s “goodness of fit”
• Opening the “black box” of the Markov Chain Monte Carlo algorithm to see how and why it works
• Leveraging the power of the “Law of Large Numbers”
• Mastering key concepts, such as clustering, convergence, autocorrelation, and thinning
• Using loss functions to measure an estimate’s weaknesses based on your goals and desired outcomes
• Selecting appropriate priors and understanding how their influence changes with dataset size
• Overcoming the “exploration versus exploitation” dilemma: deciding when “pretty good” is good enough
• Using Bayesian inference to improve A/B testing
• Solving data science problems when only small amounts of data are available
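To make the A/B testing topic in the list above a little more concrete, here is a minimal sketch of one common Bayesian approach. It uses only Python's standard library rather than PyMC, so it is not the book's code; the function name, the uniform Beta(1, 1) priors, and the conversion counts are all assumptions chosen for illustration.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A).

    Assumes independent uniform Beta(1, 1) priors on each conversion
    rate, so each posterior is Beta(1 + conversions, 1 + failures).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Draw one plausible conversion rate for each variant
        # from its posterior distribution.
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


# Hypothetical data: 100 of 1000 visitors converted on A, 150 of 1000 on B.
estimate = prob_b_beats_a(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"Estimated P(B > A): {estimate:.3f}")
```

The result is a probability that B's true rate exceeds A's, which can be read directly as a degree of belief instead of being compared against a significance threshold.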
Cameron Davidson-Pilon has worked in many areas of applied mathematics, from the evolutionary dynamics of genes and diseases to stochastic modeling of financial prices. His contributions to the open source community include lifelines, an implementation of survival analysis in Python. Educated at the University of Waterloo and at the Independent University of Moscow, he currently works with the online commerce leader Shopify.
Maybe you've written some simple SQL queries to interact with databases. But now you want more, you want to really dig into those databases and work with your data. Head First SQL will show you the fundamentals of SQL and how to really take advantage of it. We'll take you on a journey through the language, from basic INSERT statements and SELECT queries to hardcore database manipulation with indices, joins, and transactions. We all know "Data is Power" - but we'll show you how to have "Power over your Data". Expect to have fun, expect to learn, and expect to be querying, normalizing, and joining your data like a pro by the time you're finished reading!
This book is an in-depth guide to the use of pandas for data analysis, for either the seasoned data analysis practitioner or the novice user. It provides a basic introduction to the pandas framework, and takes users through the installation of the library and the IPython interactive environment. Thereafter, you will learn basic as well as advanced features, such as MultiIndexing, modifying data structures, and sampling data, which provide powerful capabilities for data analysis.
Create and distribute dynamic, feature-rich data visualizations and highly interactive BI dashboards, quickly and easily! Tableau 8: The Official Guide provides the hands-on instruction and best practices you need to meet your business intelligence objectives and drive better decision making. Discover how to work from the Tableau GUI, load BI from disparate sources, drag and drop to analyze data, set up custom visualizations, and build robust dashboards. This practical guide shows you, step by step, how to design and publish meaningful business communications to end users across your enterprise.
• Navigate the Tableau user interface and data window
• Connect to spreadsheets, databases, and other sources
• Select data fields and drag them to desired screen locations
• Work with pre-defined visualizations and sample workbooks
• Display background maps and perform geographic analysis
• Add calculated fields, graphs, charts, tables, and statistics
• Combine multiple data sources into real-time dashboards
• Export your visualizations to the Web or in various file formats
Electronic content includes:
• Videos that demonstrate the techniques presented in the book
• Sample Tableau workbooks
The authors use task-oriented descriptions and concrete end-to-end examples to ensure that the reader can immediately begin using this new service. The book describes all aspects of the service, from data ingress to applying machine learning, evaluating the models, and deploying them as web services.
Learn how you can quickly build and deploy sophisticated predictive models with the new Azure Machine Learning from Microsoft.
What’s New in the Second Edition?
Five new chapters have been added with practical detailed coverage of:
• Python Integration – a new feature announced February 2015
• Data preparation and feature selection
• Data visualization with Power BI
• Recommendation engines
• Selling your models on Azure Marketplace
Design, implement, manage, and maintain a highly flexible service-oriented computing infrastructure across your enterprise using the detailed information in this Oracle Press guide. Written by an Oracle ACE director, Oracle SOA Suite 12c Handbook uses a start-to-finish case study to illustrate each concept and technique. Learn expert techniques for designing and implementing components, assembling composite applications, integrating Java, handling complex business logic, and maximizing code reuse. Runtime administration, governance, and security are covered in this practical resource.
• Get started with the Oracle SOA Suite 12c development and runtime environment
• Deploy and manage SOA composite applications
• Expose SOAP/XML REST/JSON through Oracle Service Bus
• Establish interactions through adapters for Database, JMS, File/FTP, UMS, LDAP, and Coherence
• Embed custom logic using Java and the Spring component
• Perform fast data analysis in real time with Oracle Event Processor
• Implement Event-Driven Architecture based on the Event Delivery Network (EDN)
• Use Oracle Business Rules to encapsulate logic and automate decisions
• Model complex processes using BPEL, BPMN, and human task components
• Establish KPIs and evaluate performance using Oracle Business Activity Monitoring
• Control traffic, audit system activity, and encrypt sensitive data
Now, in just 24 lessons of one hour or less, you can learn how to leverage MongoDB's immense power. Each short, easy lesson builds on all that's come before, teaching NoSQL concepts and MongoDB techniques from the ground up.
Sams Teach Yourself NoSQL with MongoDB in 24 Hours covers all this, and much more:
Predictive analytics and Data Mining techniques covered: Exploratory Data Analysis, Visualization, Decision trees, Rule induction, k-Nearest Neighbors, Naïve Bayesian, Artificial Neural Networks, Support Vector machines, Ensemble models, Bagging, Boosting, Random Forests, Linear regression, Logistic regression, Association analysis using Apriori and FP Growth, K-Means clustering, Density based clustering, Self Organizing Maps, Text Mining, Time series forecasting, Anomaly detection and Feature selection. Implementation files can be downloaded from the book companion site at www.LearnPredictiveAnalytics.com
• Demystifies data mining concepts with easy to understand language
• Shows how to get up and running fast with 20 commonly used powerful techniques for predictive analysis
• Explains the process of using open source RapidMiner tools
• Discusses a simple 5 step process for implementing algorithms that can be used for performing predictive analytics
• Includes practical use cases and examples
“Cindi has created, with her typical attention to details that matter, a contemporary forward-looking guide that organizations could use to evaluate existing or create a foundation for evolving business intelligence / analytics programs. The book touches on strategy, value, people, process, and technology, all of which must be considered for program success. Among other topics, the data, data warehousing, and ROI comments were spot on. The ‘technobabble’ chapter was brilliant!” —Bill Frank, Business Intelligence and Data Warehousing Program Manager, Johnson & Johnson
“If you want to be an analytical competitor, you’ve got to go well beyond business intelligence technology. Cindi Howson has wrapped up the needed advice on technology, organization, strategy, and even culture in a neat package. It’s required reading for quantitatively oriented strategists and the technologists who support them.” —Thomas H. Davenport, President’s Distinguished Professor, Babson College and co-author, Competing on Analytics
“Cindi has created an exceptional, authoritative description of the end-to-end business intelligence ecosystem. This is a great read for those who are just trying to better understand the business intelligence space, as well as for the seasoned BI practitioner.” —Sully McConnell, Vice President, Business Intelligence and Information Management, Time Warner Cable
“Cindi’s book succinctly yet completely lays out what it takes to deliver BI successfully. IT and business leaders will benefit from Cindi’s deep BI experience, which she shares through helpful, real-world definitions, frameworks, examples, and stories. This is a must-read for companies engaged in – or considering – BI.” —Barbara Wixom, PhD, Principal Research Scientist, MIT Sloan Center for Information Systems Research
Expanded to cover the latest advances in business intelligence such as big data, cloud, mobile, visual data discovery, and in-memory computing, this fully updated bestseller by BI guru Cindi Howson provides cutting-edge techniques to exploit BI for maximum value. Successful Business Intelligence: Unlock the Value of BI & Big Data, Second Edition describes best practices for an effective BI strategy. Find out how to:
• Garner executive support to foster an analytic culture
• Align the BI strategy with business goals
• Develop an analytic ecosystem to exploit data warehousing, analytic appliances, and Hadoop for the right BI workload
• Continuously improve the quality, breadth, and timeliness of data
• Find the relevance of BI for everyone in the company
• Use agile development processes to deliver BI capabilities and improvements at the pace of business change
• Select the right BI tools to meet user and business needs
• Measure success in multiple ways
• Embrace innovation, promote successes and applications, and invest in training
• Monitor your evolution and maturity across various factors for impact
Exclusive industry survey data and real-world case studies from Medtronic, Macy’s, 1-800 CONTACTS, The Dow Chemical Company, Netflix, Constant Contact, and other companies show successful BI initiatives in action.
From Moneyball to Nate Silver, BI and big data have permeated our cultural, political, and economic landscape. This timely, up-to-date guide reveals how to plan and deploy an agile, state-of-the-art BI solution that links insight to action and delivers a sustained competitive advantage.
Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You’ll learn about recent changes to Hadoop, and explore new case studies on Hadoop’s role in healthcare systems and genomics data processing.
* Learn fundamental components such as MapReduce, HDFS, and YARN
* Explore MapReduce in depth, including steps for developing applications with it
* Set up and maintain a Hadoop cluster running HDFS and MapReduce on YARN
* Learn two data formats: Avro for data serialization and Parquet for nested data
* Use data ingestion tools such as Flume (for streaming data) and Sqoop (for bulk data transfer)
* Understand how high-level data processing tools like Pig, Hive, Crunch, and Spark work with Hadoop
* Learn the HBase distributed database and the ZooKeeper distributed configuration service
Did You Know?
-Knowledge of SQL is an important skill to display on your resume.
-With the growth of digital information, Database Administrator is one of the fastest growing careers.
-SQL can be learned in hours and used for decades.
Learn to script Transact-SQL using Microsoft SQL Server.
-Create tables and databases
-Create views, stored procedures, and more.
Over 100 examples of SQL queries and statements, along with images of the results, will help you learn T-SQL.
A special section included in this illustrated guide will help you test your skills and get ahead in the workplace.
Now is the time to learn SQL.
Click the 'buy button' and start scripting SQL TODAY!
Written by Oracle ACE Director and MySQL expert Ronald Bradford, Effective MySQL: Optimizing SQL Statements is filled with detailed explanations and practical examples that can be applied immediately to improve database and application performance. Featuring a step-by-step approach to SQL optimization, this Oracle Press book helps you to analyze and tune problematic SQL statements.
* Identify the essential analysis commands for gathering and diagnosing issues
* Learn how different index theories are applied and represented in MySQL
* Plan and execute informed SQL optimizations
* Create MySQL indexes to improve query performance
* Master the MySQL query execution plan
* Identify key configuration variables that impact SQL execution and performance
* Apply the SQL optimization lifecycle to capture, identify, confirm, analyze, and optimize SQL statements and verify the results
* Improve index utilization with covering indexes and partial indexes
* Learn hidden performance tips for improving index efficiency and simplifying SQL statements
—From the Foreword by Raymie Stata, CEO of Altiscale
The Insider’s Guide to Building Distributed, Big Data Applications with Apache Hadoop™ YARN
Apache Hadoop is helping drive the Big Data revolution. Now, its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. And now in Apache Hadoop™ YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to fully leverage these revolutionary advances.
YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment.
You’ll find many examples drawn from the authors’ cutting-edge experience—first as Hadoop’s earliest developers and implementers at Yahoo! and now as Hortonworks developers moving the platform forward and helping customers succeed with it.
Coverage includes:
* YARN’s goals, design, architecture, and components, and how it expands the Apache Hadoop ecosystem
* Exploring YARN on a single node
* Administering YARN clusters and Capacity Scheduler
* Running existing MapReduce applications
* Developing a large-scale clustered YARN application
* Discovering new open source frameworks that run under YARN
Complete with illustrations and helpful hints, this fifth edition provides a valuable one-stop overview of Oracle Database 12c, including an introduction to Oracle and cloud computing. Oracle Essentials provides the conceptual background you need to understand how Oracle truly works.
Topics include:
* A complete overview of Oracle databases and data stores, and Fusion Middleware products and features
* Core concepts and structures in Oracle’s architecture, including pluggable databases
* Oracle objects and the various datatypes Oracle supports
* System and database management, including Oracle Enterprise Manager 12c
* Security options, basic auditing capabilities, and options for meeting compliance needs
* Performance characteristics of disk, memory, and CPU tuning
* Basic principles of multiuser concurrency
* Oracle’s online transaction processing (OLTP)
* Data warehouses, Big Data, and Oracle’s business intelligence tools
* Backup and recovery, and high availability and failover solutions
Hacking Web Intelligence shows you how to dig into the Web and uncover the information many don't even know exists. The book takes a holistic approach that is not only about using tools to find information online but also how to link all the information and transform it into presentable and actionable intelligence. You will also learn how to secure your information online to prevent it being discovered by these reconnaissance methods.
Hacking Web Intelligence is an in-depth technical reference covering the methods and techniques you need to unearth open source information from the Internet and utilize it for the purpose of targeted attack during a security assessment. This book will introduce you to many new and leading-edge reconnaissance, information gathering, and open source intelligence methods and techniques, including metadata extraction tools, advanced search engines, advanced browsers, power searching methods, online anonymity tools such as TOR and i2p, OSINT tools such as Maltego, Shodan, Creepy, SearchDiggity, Recon-ng, Social Network Analysis (SNA), Darkweb/Deepweb, data visualization, and much more.
* Provides a holistic approach to OSINT and Web recon, showing you how to fit all the data together into actionable intelligence
* Focuses on hands-on tools such as TOR, i2p, Maltego, Shodan, Creepy, SearchDiggity, Recon-ng, FOCA, EXIF, Metagoofil, MAT, and many more
* Covers key technical topics such as metadata searching, advanced browsers and power searching, online anonymity, Darkweb / Deepweb, Social Network Analysis (SNA), and how to manage, analyze, and visualize the data you gather
* Includes hands-on technical examples and case studies, as well as a Python chapter that shows you how to create your own information-gathering tools and modify existing APIs
Master Oracle Database 11g fundamentals quickly and easily. Using self-paced tutorials, this book covers core database essentials, the role of the administrator, high availability, and large database features. Oracle Database 11g: A Beginner's Guide walks you, step by step, through database setup, administration, programming, backup, and recovery. In-depth introductions to SQL and PL/SQL are included. Designed for easy learning, this exclusive Oracle Press guide offers:
* Core Concepts--Oracle Database 11g topics presented in logically organized chapters
* Critical Skills--Lists of specific skills covered in each chapter
* Projects--Practical exercises that show how to apply the critical skills learned in each chapter
* Progress Checks--Quick self-assessment sections to check your progress
* Notes--Extra information related to the topic being covered
* Mastery Checks--Chapter-ending quizzes to test your knowledge
Updated for the latest versions of this popular database, this edition covers many complex features that have been added to MySQL 5.0 and 5.1, including a section dedicated to stored procedures and triggers. After a brief introduction on installation and initial setup, the book explains:
* How to configure MySQL, such as setting the root password
* MySQL data types, including numerics, strings, dates, and complex types
* SQL syntax, commands, data types, operators, and functions
* Arithmetic, comparison, and logical operators
* Aggregate and general functions
* Stored procedures and triggers, including procedure definition, procedure calls, procedure management, cursors, and triggers
You don't have time to stop and thumb through an exhaustive reference when you're hard at work. This portable and affordable guide is small enough to fit into your pocket, and gives you a convenient reference that you can consult anywhere. When you reach a sticking point and need to get to a solution quickly, the MySQL Pocket Reference is the book you want to have.
Master Data Management equips you with a deeply practical, business-focused way of thinking about MDM—an understanding that will greatly enhance your ability to communicate with stakeholders and win their support. Moreover, it will help you deserve their support: you’ll master all the details involved in planning and executing an MDM project that leads to measurable improvements in business productivity and effectiveness.
* Presents a comprehensive roadmap that you can adapt to any MDM project.
* Emphasizes the critical goal of maintaining and improving data quality.
* Provides guidelines for determining which data to “master.”
* Examines special issues relating to master data metadata.
* Considers a range of MDM architectural styles.
* Covers the synchronization of master data across the application infrastructure.
This comprehensive new volume shows you how to compile PostgreSQL from source, create a database, and configure PostgreSQL to accept client-server connections. It also covers the many advanced features, such as transactions, versioning, replication, and referential integrity, that enable developers and DBAs to use PostgreSQL for serious business applications. The thorough introduction to PostgreSQL's PL/pgSQL programming language explains how you can use this very useful but under-documented feature to develop stored procedures and triggers. The book includes a complete command reference, and database administrators will appreciate the chapters on user management, database maintenance, and backup & recovery. With Practical PostgreSQL, you will quickly discover why this open source database is such a great alternative to proprietary products from Oracle, IBM, and Microsoft.
Crystal Reports 2008 For Dummies is a quick and easy guide to get you going with the latest version of this bestselling report-writing software. In fact, it’s so popular that previous editions have made it a bestseller too. Crystal Reports 2008 For Dummies gives you just what you should know to produce the reports you’ll need most often, including how to:
* Pull specific information from your database, sort and group it, and find the details you need
* Use dynamic or cascading prompts
* Troubleshoot and print reports and save time with templates
* View reports on your LAN
* Write formulas to retrieve specific information
* Create and update OLAP reports
* Format reports, control page breaks, and even add graphics or Flash files
* Enhance your reports with charts and maps
* Use Crystal Reports in the enterprise
There’s also a companion Web site with sample reports from the book and links to sites with more related information. With Crystal Reports 2008 For Dummies by your side, you’ll soon be able to create reports from simple to spectacular, whenever the need arises.
Peter Christen’s book is divided into three parts: Part I, “Overview”, introduces the subject by presenting several sample applications and their special challenges, as well as a general overview of a generic data matching process. Part II, “Steps of the Data Matching Process”, then details its main steps like pre-processing, indexing, field and record comparison, classification, and quality evaluation. Lastly, Part III, “Further Topics”, deals with specific aspects like privacy, real-time matching, or matching unstructured data. Finally, it briefly describes the main features of many research and open source systems available today.
By providing the reader with a broad range of data matching concepts and techniques and touching on all aspects of the data matching process, this book helps researchers as well as students specializing in data quality or data matching aspects to familiarize themselves with recent research advances and to identify open research challenges in the area of data matching. To this end, each chapter of the book includes a final section that provides pointers to further background and research material. Practitioners will better understand the current state of the art in data matching as well as the internal workings and limitations of current systems. In particular, they will learn that it is often not feasible to simply implement an existing off-the-shelf data matching system without substantial adaptation and customization. Such practical considerations are discussed for each of the major steps in the data matching process.
Liu has written a comprehensive text on Web mining, which consists of two parts. The first part covers the data mining and machine learning foundations, where all the essential concepts and algorithms of data mining and machine learning are presented. The second part covers the key topics of Web mining, where Web crawling, search, social network analysis, structured data extraction, information integration, opinion mining and sentiment analysis, Web usage mining, query log mining, computational advertising, and recommender systems are all treated both in breadth and in depth. His book thus brings all the related concepts and algorithms together to form an authoritative and coherent text.
The book offers a rich blend of theory and practice. It is suitable for students, researchers and practitioners interested in Web mining and data mining both as a learning text and as a reference book. Professors can readily use it for classes on data mining, Web mining, and text mining. Additional teaching materials such as lecture slides, datasets, and implemented algorithms are available online.
—Jeff Lenamon, CIBC World Markets
Updated edition with exciting new Access 2007 features!
Harness the power of Access 2007 with the expert guidance in this comprehensive reference. Beginners will appreciate the thorough attention to database fundamentals and terminology. Experienced users can jump right into Access 2007 enhancements like the all-new user interface and wider use of XML and Web services. Each of the book's six parts thoroughly focuses on key elements in a logical sequence, so you have what you need, when you need it. Designed as both a reference and a tutorial, Access 2007 Bible is a powerful tool for developers needing to make the most of the new features in Access 2007.
* Build Access tables using good relational database techniques
* Construct efficient databases using a five-step design method
* Design efficient data-entry and data display forms
* Utilize the improved Access report designer
* Use Visual Basic® for Applications and the VBA Editor to automate applications
* Build and customize Access 2007 ribbons
* Seamlessly exchange Access data with SharePoint®
* Employ advanced techniques such as the Windows® API and object-oriented programming
* Add security and use data replication in your Access applications
What's on the CD-ROM?
Follow the examples in the book chapter by chapter using the bonus materials on the CD-ROM. You'll find separate Microsoft Access database files for each chapter and other working files, including:
* All the examples and databases used in the book, including database files, images, data files in various formats, and icon files used in the book's examples
* A complete sample application file, including queries, reports, objects, and modules, that you can use as a reference
See the CD-ROM appendix for details and complete system requirements.
Note: CD-ROM/DVD and other supplementary materials are not included as part of eBook file.
most complex database environments. The provided examples and sample code provide plenty of hands-on opportunities to learn more about SQL Server and create your own viable solutions.
Four leading SQL Server experts present deep practical insights for administering SQL Server, analyzing and optimizing queries, implementing data warehouses, ensuring high availability, tuning performance, and much more. You will benefit from their behind-the-scenes look into SQL Server, showing what goes on behind the various wizards and GUI-based tools. You’ll learn how to use the underlying SQL commands to fully unlock the power and capabilities of SQL Server.
Writing for all intermediate-to-advanced-level SQL Server professionals, the authors draw on immense production experience with SQL Server. Throughout, they focus on successfully applying SQL Server 2014’s most powerful capabilities and its newest tools and features.
Detailed information on how to…
The important stuff you need to know:
* Dive into relational data. Solve problems quickly by connecting and combining data from different tables.
* Create professional documents. Publish reports, charts, invoices, catalogs, and other documents with ease.
* Access data anywhere. Use FileMaker Go on your iPad or iPhone, or share data on the Web.
* Harness processing power. Use new calculation and scripting tools to crunch numbers, search text, and automate tasks.
* Run your database on a secure server. Learn the high-level features of FileMaker Pro Advanced.
* Keep your data safe. Set privileges and allow data sharing with FileMaker’s streamlined security features.
–Charles Carr, Reviews Editor, ComputorEdge Magazine
Create Forms for Business
Ensure Data Entry Accuracy
Build Elegant Form Interfaces
Collect Data Via Email
Design Effective Business Reports
Make an Invoice Report
Create Mailing Labels
Work with Multiple Tables
Develop your Microsoft Access expertise instantly with proven techniques
Let’s face it: Microsoft Access is a large, intimidating program. Most people never progress beyond creating simple tables and using wizards to build basic forms and reports. At the same time, you need information and you know that what you seek is embedded somewhere in your Access database. Without a more sophisticated knowledge of how to extract and present that data, you’re forced to rely on office gurus and overworked IT people to provide canned reports or one-size-fits-all solutions.
This book changes all that by giving you the skills to build efficient front-ends for data (forms), publish the results in an attractive and easy-to-read format (reports), and extract the data you need (queries). This book shuns the big Access picture and instead focuses intently on forms, reports, and queries. This in-depth approach will give you the knowledge and understanding you need to get at the data and prove the old saw that knowledge is power.
· Focuses on the three technologies that you must master to get the most out of Access: forms, reports, and queries.
· Avoids database theory in favor of practical know-how that you can put to use right away.
· Packed full of real-world examples and techniques to help you learn and understand the importance of each section.
· Covers what’s new and changed in Microsoft Access 2007.
Part I: Creating Forms
Chapter 1 Creating and Using a Form
Chapter 2 Working with Form Controls
Chapter 3 Designing Forms for Efficient and Accurate Data Entry
Chapter 4 Designing Forms for Business Use
Chapter 5 Creating Specialized Forms
Part II: Designing and Customizing Reports
Chapter 6 Creating and Publishing a Report
Chapter 7 Designing Effective Business Reports
Chapter 8 Designing Advanced Reports
Chapter 9 Creating Specialized Reports
Part III: Creating Powerful Queries
Chapter 10 Creating a Basic Query
Chapter 11 Building Criteria Expressions
Chapter 12 Working with Multiple-Table Queries
Chapter 13 Creating Advanced Queries
Chapter 14 Creating PivotTable Queries
Chapter 15 Querying with SQL Statements
You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—classification, collaborative filtering, and anomaly detection among others—to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find these patterns useful for working on your own data applications.
Patterns include:
* Recommending music and the Audioscrobbler data set
* Predicting forest cover with decision trees
* Anomaly detection in network traffic with K-means clustering
* Understanding Wikipedia with Latent Semantic Analysis
* Analyzing co-occurrence networks with GraphX
* Geospatial and temporal data analysis on the New York City Taxi Trips data
* Estimating financial risk through Monte Carlo simulation
* Analyzing genomics data and the BDG project
* Analyzing neuroimaging data with PySpark and Thunder
In The Art of SQL, author and SQL expert Stephane Faroult argues that this "safe approach" only leads to disaster. His insightful book, named after The Art of War by Sun Tzu, contends that writing quick, inefficient code is sweeping the dirt under the rug. SQL code may run for 5 to 10 years, surviving several major releases of the database management system and several generations of hardware. The code must be fast and sound from the start, and that requires a firm understanding of SQL and relational theory.
The Art of SQL offers best practices that teach experienced SQL users to focus on strategy rather than specifics. Faroult's approach takes a page from Sun Tzu's classic treatise by viewing database design as a military campaign. You need knowledge, skills, and talent. Talent can't be taught, but every strategist from Sun Tzu to modern-day generals believed that it can be nurtured through the experience of others. They passed on their experience acquired in the field through basic principles that served as guiding stars amid the sound and fury of battle. This is what Faroult does with SQL.
Like a successful battle plan, good architectural choices are based on contingencies. What if the volume of this or that table increases unexpectedly? What if, following a merger, the number of users doubles? What if you want to keep several years of data online? Faroult's way of looking at SQL performance may be unconventional and unique, but he's deadly serious about writing good SQL and using SQL well. The Art of SQL is not a cookbook, listing problems and giving recipes. The aim is to get you, and your manager, to raise good questions.
The #1 Easy, Commonsense Guide to Database Design! Michael J. Hernandez’s best-selling Database Design for Mere Mortals® has earned worldwide respect as the clearest, simplest way to learn relational database design. Now, he’s made this hands-on, software-independent tutorial even easier, while ensuring that his design methodology is still relevant to the latest databases, applications, and best practices. Step by step, Database Design for Mere Mortals®, Third Edition, shows you how to design databases that are soundly structured, reliable, and flexible, even in modern web applications. Hernandez guides you through everything from database planning to defining tables, fields, keys, table relationships, business rules, and views. You’ll learn practical ways to improve data integrity, how to avoid common mistakes, and when to break the rules.
Understanding database types, models, and design terminology
Discovering what good database design can do for you—and why bad design can make your life miserable
Setting objectives for your database, and transforming those objectives into real designs
Analyzing a current database so you can identify ways to improve it
Establishing table structures and relationships, assigning primary keys, setting field specifications, and setting up views
Ensuring the appropriate level of data integrity for each application
Identifying and establishing business rules
Whatever relational database systems you use, Hernandez will help you design databases that are robust and trustworthy. Never designed a database before? Settling for inadequate generic designs? Running existing databases that need improvement? Start here.