Plan, build, and distribute data-rich business intelligence reports with ease. Oracle CRM On Demand Reporting details the entire report-building process and explains how to use the powerful features available in Answers On Demand to create and share comprehensive, integrated reports. The book includes proven strategies for report design and development, real-world examples, formatting tips, advanced reporting techniques, and sample formulas. Convert disparate corporate data into actionable business intelligence with help from this detailed guide.Maximize the features of Answers On Demand Strategically plan your reports Format, sort, filter, and modify report data Use column formulas with numeric, date and time, and text data Apply logic and transform data with column formulas Format report title and table views Add charts, graphs, pivot tables, filter prompts, and interactivity to reports Configure advanced views and create custom dashboards Manage reports and control access
Create custom, interactive dashboards to deliver actionable business intelligence directly to end users with help from this Oracle Press guide. Oracle CRM On Demand Dashboards provides comprehensive coverage of the versatile features available in Oracle Answers On Demand coupled with detailed planning and design strategies for building and deploying reports and dashboards with Oracle CRM On Demand. Real-world examples and time-saving formatting tips are included throughout this practical resource.Configure administrative settings to access report and dashboard development tools Take advantage of built-in dashboards Plan dashboards and reports based on business needs Create, edit, manage, save, and delete custom dashboards Configure dashboard properties and pages Add and configure dashboard objects, such as guided navigation, links, images, and folders Display and arrange reports on dashboards Develop dashboard filter prompts Deploy dashboards to Oracle CRM On Demand end users
Integrate business intelligence directly into day-to-day business processes by presenting analytic reports to end users in applications. Oracle CRM On Demand Embedded Analytics shows you how to link and embed reports throughout Oracle CRM On Demand. This Oracle Press guide explains how to plan, design, and develop reports; make them available through hyperlinks; and embed reports in a variety of locations easily accessible to end users. Real-world examples and time-saving tips are included throughout this practical resource.Navigate the report development environment within Oracle CRM On Demand Configure administrative settings to access report development and embedding tools Design reports for linking and embedding in applications Format columns, filter data, and add and edit views, including pivot tables and pivot charts Add link functionality to Oracle CRM On Demand and your reports Embed reports using HTML and URL encoding Display reports inside dashboards and on home page screens Embed reports inside other reports Add reports and dashboards to custom Web tabs and desktop applications Include external sites in reports
This textbook provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly.
If you want to find out how to use Python to start answering critical questions of your data, pick up Python Machine Learning – whether you want to get started from scratch or want to extend your data science knowledge, this is an essential and unmissable resource.What You Will LearnExplore how to use different machine learning models to ask different questions of your dataLearn how to build neural networks using Keras and TheanoFind out how to write clean and elegant Python code that will optimize the strength of your algorithmsDiscover how to embed your machine learning model in a web application for increased accessibilityPredict continuous target outcomes using regression analysisUncover hidden patterns and structures in data with clusteringOrganize data using effective pre-processing techniquesGet to grips with sentiment analysis to delve deeper into textual and social media dataIn Detail
Machine learning and predictive analytics are transforming the way businesses and other organizations operate. Being able to understand trends and patterns in complex data is critical to success, becoming one of the key strategies for unlocking growth in a challenging contemporary marketplace. Python can help you deliver key insights into your data – its unique capabilities as a language let you build sophisticated algorithms and statistical models that can reveal new perspectives and answer key questions that are vital for success.
Python Machine Learning gives you access to the world of predictive analytics and demonstrates why Python is one of the world's leading data science languages. If you want to ask better questions of data, or need to improve and extend the capabilities of your machine learning systems, this practical data science book is invaluable. Covering a wide range of powerful Python libraries, including scikit-learn, Theano, and Keras, and featuring guidance and tips on everything from sentiment analysis to neural networks, you'll soon be able to answer some of the most important questions facing you and your organization.Style and approach
Python Machine Learning connects the fundamental theoretical principles behind machine learning to their practical application in a way that focuses you on asking and answering the right questions. It walks you through the key elements of Python and its powerful machine learning libraries, while demonstrating how to get to grips with a range of statistical models.
Let's face it, SQL is a deceptively simple language to learn, and many database developers never go far beyond the simple statement: SELECT columns FROM table WHERE conditions. But there is so much more you can do with the language. In the SQL Cookbook, experienced SQL developer Anthony Molinaro shares his favorite SQL techniques and features. You'll learn about:
Window functions, arguably the most significant enhancement to SQL in the past decade. If you're not using these, you're missing out
Powerful, database-specific features such as SQL Server's PIVOT and UNPIVOT operators, Oracle's MODEL clause, and PostgreSQL's very useful GENERATE_SERIES function
Pivoting rows into columns, reverse-pivoting columns into rows, using pivoting to facilitate inter-row calculations, and double-pivoting a result set
Bucketization, and why you should never use that term in Brooklyn.
How to create histograms, summarize data into buckets, perform aggregations over a moving range of values, generate running-totals and subtotals, and other advanced, data warehousing techniques
The technique of walking a string, which allows you to use SQL to parse through the characters, words, or delimited elements of a string
Written in O'Reilly's popular Problem/Solution/Discussion style, the SQL Cookbook is sure to please. Anthony's credo is: "When it comes down to it, we all go to work, we all have bills to pay, and we all want to go home at a reasonable time and enjoy what's still available of our days." The SQL Cookbook moves quickly from problem to solution, saving you time each step of the way.
Updated to reflect recent advances in MySQL and InnoDB performance, features, and tools, this third edition not only offers specific examples of how MySQL works, it also teaches you why this system works as it does, with illustrative stories and case studies that demonstrate MySQL’s principles in action. With this book, you’ll learn how to think in MySQL.Learn the effects of new features in MySQL 5.5, including stored procedures, partitioned databases, triggers, and viewsImplement improvements in replication, high availability, and clusteringAchieve high performance when running MySQL in the cloudOptimize advanced querying features, such as full-text searchesTake advantage of modern multi-core CPUs and solid-state disksExplore backup and recovery strategies—including new tools for hot online backups
Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You’ll not only learn how to improve communication between business stakeholders and data scientists, but also how participate intelligently in your company’s data science projects. You’ll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making.Understand how data science fits in your organization—and how you can use it for competitive advantageTreat data as a business asset that requires careful investment if you’re to gain real valueApproach business problems data-analytically, using the data-mining process to gather good data in the most appropriate wayLearn general concepts for actually extracting knowledge from dataApply data science principles when interviewing data science job candidates
Updated for R 2.14 and 2.15, this second edition includes new and expanded chapters on R performance, the ggplot2 data visualization package, and parallel R computing with Hadoop.Get started quickly with an R tutorial and hundreds of examplesExplore R syntax, objects, and other language detailsFind thousands of user-contributed R packages online, including BioconductorLearn how to use R to prepare data for analysisVisualize your data with R’s graphics, lattice, and ggplot2 packagesUse R to calculate statistical fests, fit models, and compute probability distributionsSpeed up intensive computations by writing parallel R programs for HadoopGet a complete desktop reference to R
The example code for this unique data science book is maintained in a public GitHub repository. It’s designed to be easily accessible through a turnkey virtual machine that facilitates interactive learning with an easy-to-use collection of IPython Notebooks.
This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for ``wide'' data (p bigger than n), including multiple testing and false discovery rates.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.
This updated second edition provides guidance for database developers, advanced configuration for system administrators, and an overview of the concepts and use cases for other people on your project. Ideal for NoSQL newcomers and experienced MongoDB users alike, this guide provides numerous real-world schema design examples.Get started with MongoDB core concepts and vocabularyPerform basic write operations at different levels of safety and speedCreate complex queries, with options for limiting, skipping, and sorting resultsDesign an application that works well with MongoDBAggregate data, including counting, finding distinct values, grouping documents, and using MapReduceGather and interpret statistics about your collections and databasesSet up replica sets and automatic failover in MongoDBUse sharding to scale horizontally, and learn how it impacts applicationsDelve into monitoring, security and authentication, backup/restore, and other administrative tasks
Detailing the hows and the whys of successful Essbase implementation, the book arms you with simple yet powerful tools to meet your immediate needs, as well as the theoretical knowledge to proceed to the next level with Essbase. Infrastructure, data sourcing and transformation, database design, calculations, automation, APIs, reporting, and project implementation are covered by subject matter experts who work with the tools and techniques on a daily basis. In addition to practical cases that illustrate valuable lessons learned, the book offers:
Undocumented Secrets—Dan Pressman describes the previously unpublished and undocumented inner workings of the ASO Essbase engine. Authoritative Experts—If you have questions that no one else can solve, these 12 Essbase professionals are the ones who can answer them. Unpublished—Includes the only third-party guide to infrastructure. Infrastructure is easy to get wrong and can doom any Essbase project. Comprehensive—Let there never again be a question on how to create blocks or design BSO databases for performance—Dave Farnsworth provides the answers within. Innovative—Cameron Lackpour and Joe Aultman bring new and exciting solutions to persistent Essbase problems.
With a list of contributors as impressive as the program of presenters at a leading Essbase conference, this book offers unprecedented access to the insights and experiences of those at the forefront of the field. The previously unpublished material presented in these pages will give you the practical knowledge needed to use this powerful and intuitive tool to build highly useful analytical models, reporting systems, and forecasting applications.
Rather than run through all possible scenarios, this pragmatic operations guide calls out what works, as demonstrated in critical deployments.Get a high-level overview of HDFS and MapReduce: why they exist and how they workPlan a Hadoop deployment, from hardware and OS selection to network requirementsLearn setup and configuration details with a list of critical propertiesManage resources by sharing a cluster across multiple groupsGet a runbook of the most common cluster maintenance tasksMonitor Hadoop clusters—and learn troubleshooting with the help of real-world war storiesUse basic tools and techniques to handle backup and catastrophic failure
This book will help you:Become a contributor on a data science team Deploy a structured lifecycle approach to data analytics problems Apply appropriate analytic techniques and tools to analyzing big data Learn how to tell a compelling story with data to drive business action Prepare for EMC Proven Professional Data Science Certification
Corresponding data sets are available at www.wiley.com/go/9781118876138.
Get started discovering, analyzing, visualizing, and presenting data in a meaningful way today!
Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You’ll learn about recent changes to Hadoop, and explore new case studies on Hadoop’s role in healthcare systems and genomics data processing.Learn fundamental components such as MapReduce, HDFS, and YARNExplore MapReduce in depth, including steps for developing applications with itSet up and maintain a Hadoop cluster running HDFS and MapReduce on YARNLearn two data formats: Avro for data serialization and Parquet for nested dataUse data ingestion tools such as Flume (for streaming data) and Sqoop (for bulk data transfer)Understand how high-level data processing tools like Pig, Hive, Crunch, and Spark work with HadoopLearn the HBase distributed database and the ZooKeeper distributed configuration service
This book offers practical answers to some of the hardest questions faced by PL/SQL developers, including:What is the best way to write the SQL logic in my application code?
How should I write my packages so they can be leveraged by my entire team of developers?
How can I make sure that all my team's programs handle and record errors consistently?Oracle PL/SQL Best Practices summarizes PL/SQL best practices in nine major categories: overall PL/SQL application development; programming standards; program testing, tracing, and debugging; variables and data structures; control logic; error handling; the use of SQL in PL/SQL; building procedures, functions, packages, and triggers; and overall program performance.
This book is a concise and entertaining guide that PL/SQL developers will turn to again and again as they seek out ways to write higher quality code and more successful applications.
"This book presents ideas that make the difference between a successful project and one that never gets off the ground. It goes beyond just listing a set of rules, and provides realistic scenarios that help the reader understand where the rules come from. This book should be required reading for any team of Oracle database professionals."
--Dwayne King, President, KRIDAN Consulting
Bayesian methods of inference are deeply natural and extremely powerful. However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making it inaccessible to anyone without a strong mathematical background. Now, though, Cameron Davidson-Pilon introduces Bayesian inference from a computational perspective, bridging theory to practice–freeing you to get results using computing power.
Bayesian Methods for Hackers illuminates Bayesian inference through probabilistic programming with the powerful PyMC language and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments, without extensive mathematical intervention.
Davidson-Pilon begins by introducing the concepts underlying Bayesian inference, comparing it with other techniques and guiding you through building and training your first Bayesian model. Next, he introduces PyMC through a series of detailed examples and intuitive explanations that have been refined after extensive user feedback. You’ll learn how to use the Markov Chain Monte Carlo algorithm, choose appropriate sample sizes and priors, work with loss functions, and apply Bayesian inference in domains ranging from finance to marketing. Once you’ve mastered these techniques, you’ll constantly turn to this guide for the working PyMC code you need to jumpstart future projects.
• Learning the Bayesian “state of mind” and its practical implications
• Understanding how computers perform Bayesian inference
• Using the PyMC Python library to program Bayesian analyses
• Building and debugging models with PyMC
• Testing your model’s “goodness of fit”
• Opening the “black box” of the Markov Chain Monte Carlo algorithm to see how and why it works
• Leveraging the power of the “Law of Large Numbers”
• Mastering key concepts, such as clustering, convergence, autocorrelation, and thinning
• Using loss functions to measure an estimate’s weaknesses based on your goals and desired outcomes
• Selecting appropriate priors and understanding how their influence changes with dataset size
• Overcoming the “exploration versus exploitation” dilemma: deciding when “pretty good” is good enough
• Using Bayesian inference to improve A/B testing
• Solving data science problems when only small amounts of data are available
Cameron Davidson-Pilon has worked in many areas of applied mathematics, from the evolutionary dynamics of genes and diseases to stochastic modeling of financial prices. His contributions to the open source community include lifelines, an implementation of survival analysis in Python. Educated at the University of Waterloo and at the Independent University of Moscow, he currently works with the online commerce leader Shopify.
The Silicon Jungle is a cautionary fictional tale of data mining’s promise and peril. Baluja raises ethical questions about contemporary technological innovations, and how minute details can be routinely pieced together into rich profiles that reveal our habits, goals, and secret desires—all ready to be exploited.
The Concept and Object Modeling Notation (COMN) is able to cover the full spectrum of analysis and design. A single COMN model can represent the objects and concepts in the problem space, logical data design, and concrete NoSQL and SQL document, key-value, columnar, and relational database implementations. COMN models enable an unprecedented level of traceability of requirements to implementation. COMN models can also represent the static structure of software and the predicates that represent the patterns of meaning in databases.
This book will teach you:the simple and familiar graphical notation of COMN with its three basic shapes and four line styles how to think about objects, concepts, types, and classes in the real world, using the ordinary meanings of English words that aren’t tangled with confused techno-speak how to express logical data designs that are freer from implementation considerations than is possible in any other notation how to understand key-value, document, columnar, and table-oriented database designs in logical and physical terms how to use COMN to specify physical database implementations in any NoSQL or SQL database with the precision necessary for model-driven development
This book is an in-depth guide to the use of pandas for data analysis, for either the seasoned data analysis practitioner or the novice user. It provides a basic introduction to the pandas framework, and takes users through the installation of the library and the IPython interactive environment. Thereafter, you will learn basic as well as advanced features, such as MultiIndexing, modifying data structures, and sampling data, which provide powerful capabilities for data analysis.
Let SQL Hacks serve as your toolbox for digging up and manipulating data. If you love to tinker and optimize, SQL is the perfect technology and SQL Hacks is the must-have book for you.
Ideal for beginners and professional database and web developers, this updated third edition covers powerful features in MySQL 5.6 (and some in 5.7). The book focuses on programming APIs in Python, PHP, Java, Perl, and Ruby. With more than 200+ recipes, you’ll learn how to:Use the mysql client and write MySQL-based programsCreate, populate, and select data from tablesStore, retrieve, and manipulate stringsWork with dates and timesSort query results and generate summariesUse stored routines, triggers, and scheduled eventsImport, export, validate, and reformat dataPerform transactions and work with statisticsProcess web input, and generate web content from query resultsUse MySQL-based web session managementProvide security and server administration
In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.
Topics include:Statistical inference, exploratory data analysis, and the data science processAlgorithmsSpam filters, Naive Bayes, and data wranglingLogistic regressionFinancial modelingRecommendation engines and causalityData visualizationSocial networks and data journalismData engineering, MapReduce, Pregel, and Hadoop
Doing Data Science is collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O’Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.
Collier introduces platform-agnostic Agile solutions for integrating infrastructures consisting of diverse operational, legacy, and specialty systems that mix commercial and custom code. Using working examples, he shows how to manage analytics development teams with widely diverse skill sets and how to support enormous and fast-growing data volumes. Collier’s techniques offer optimal value whether your projects involve “back-end” data management, “front-end” business analysis, or both.
Part I focuses on Agile project management techniques and delivery team coordination, introducing core practices that shape the way your Agile DW/BI project community can collaborate toward success
Part II presents technical methods for enabling continuous delivery of business value at production-quality levels, including evolving superior designs; test-driven DW development; version control; and project automation
Collier brings together proven solutions you can apply right now—whether you’re an IT decision-maker, data warehouse professional, database administrator, business intelligence specialist, or database developer. With his help, you can mitigate project risk, improve business alignment, achieve better results—and have fun along the way.
Master modern web and network data modeling: both theory and applications.In Web and Network Data Science, a top faculty member of Northwestern University’s prestigious analytics program presents the first fully-integrated treatment of both the business and academic elements of web and network modeling for predictive analytics.
Some books in this field focus either entirely on business issues (e.g., Google Analytics and SEO); others are strictly academic (covering topics such as sociology, complexity theory, ecology, applied physics, and economics). This text gives today's managers and students what they really need: integrated coverage of concepts, principles, and theory in the context of real-world applications.
Building on his pioneering Web Analytics course at Northwestern University, Thomas W. Miller covers usability testing, Web site performance, usage analysis, social media platforms, search engine optimization (SEO), and many other topics. He balances this practical coverage with accessible and up-to-date introductions to both social network analysis and network science, demonstrating how these disciplines can be used to solve real business problems.
—From the Foreword by Raymie Stata, CEO of Altiscale
The Insider’s Guide to Building Distributed, Big Data Applications with Apache Hadoop™ YARN
Apache Hadoop is helping drive the Big Data revolution. Now, its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. And now in Apache Hadoop™ YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to fully leverage these revolutionary advances.
YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment.
You’ll find many examples drawn from the authors’ cutting-edge experience—first as Hadoop’s earliest developers and implementers at Yahoo! and now as Hortonworks developers moving the platform forward and helping customers succeed with it.
Coverage includesYARN’s goals, design, architecture, and components—how it expands the Apache Hadoop ecosystem Exploring YARN on a single node Administering YARN clusters and Capacity Scheduler Running existing MapReduce applications Developing a large-scale clustered YARN application Discovering new open source frameworks that run under YARN
The book offers a rich blend of theory and practice. It is suitable for students, researchers and practitioners interested in Web mining and data mining both as a learning text and as a reference book. Professors can readily use it for classes on data mining, Web mining, and text mining. Additional teaching materials such as lecture slides, datasets, and implemented algorithms are available online.
As the data deluge continues in today’s world, the need to master data mining, predictive analytics, and business analytics has never been greater. These techniques and tools provide unprecedented insights into data, enabling better decision making and forecasting, and ultimately the solution of increasingly complex problems.
Learn from the Creators of the RapidMiner Software
Written by leaders in the data mining community, including the developers of the RapidMiner software, RapidMiner: Data Mining Use Cases and Business Analytics Applications provides an in-depth introduction to the application of data mining and business analytics techniques and tools in scientific research, medicine, industry, commerce, and diverse other sectors. It presents the most powerful and flexible open source software solutions: RapidMiner and RapidAnalytics. The software and their extensions can be freely downloaded at www.RapidMiner.com.
Understand Each Stage of the Data Mining Process
The book and software tools cover all relevant steps of the data mining process, from data loading, transformation, integration, aggregation, and visualization to automated feature selection, automated parameter and process optimization, and integration with other tools, such as R packages or your IT infrastructure via web services. The book and software also extensively discuss the analysis of unstructured data, including text and image mining.
Easily Implement Analytics Approaches Using RapidMiner and RapidAnalytics
Each chapter describes an application, how to approach it with data mining methods, and how to implement it with RapidMiner and RapidAnalytics. These application-oriented chapters give you not only the necessary analytics to solve problems and tasks, but also reproducible, step-by-step descriptions of using RapidMiner and RapidAnalytics. The case studies serve as blueprints for your own data mining applications, enabling you to effectively solve similar problems.
Written by one of the most respected consultants in the area of data mining and security, Data Mining for Intelligence, Fraud & Criminal Detection: Advanced Analytics & Information Sharing Technologies reviews the tangible results produced by these systems and evaluates their effectiveness. While CSI-type shows may depict information sharing and analysis that are accomplished with the push of a button, this sort of proficiency is more fiction than reality. Going beyond a discussion of the various technologies, the author outlines the issues of information sharing and the effective interpretation of results, which are critical to any integrated homeland security effort.
Organized into three main sections, the book fully examines and outlines the future of this field with an insider’s perspective and a visionary’s insight.
Section 1 provides a fundamental understanding of the types of data that can be used in current systems. It covers approaches to analyzing data and clearly delineates how to connect the dots among different data elements Section 2 provides real-world examples derived from actual operational systems to show how data is used, manipulated, and interpreted in domains involving human smuggling, money laundering, narcotics trafficking, and corporate fraud Section 3 provides an overview of the many information-sharing systems, organizations, and task forces as well as data interchange formats. It also discusses optimal information-sharing and analytical architectures
Currently, there is very little published literature that truly defines real-world systems. Although politics and other factors all play into how much one agency is willing to support the sharing of its resources, many now embrace the wisdom of that path. This book will provide those individuals with an understanding of what approaches are currently available and how they can be most effectively employed.
Drawing on extensive experience as a researcher, practitioner, and instructor, Dr. Dursun Delen delivers an optimal balance of concepts, techniques and applications. Without compromising either simplicity or clarity, he provides enough technical depth to help readers truly understand how data mining technologies work. Coverage includes: processes, methods, techniques, tools, and metrics; the role and management of data; text and web mining; sentiment analysis; and Big Data integration. Throughout, Delen's conceptual coverage is complemented with application case studies (examples of both successes and failures), as well as simple, hands-on tutorials.
Real-World Data Mining will be valuable to professionals on analytics teams; professionals seeking certification in the field; and undergraduate or graduate students in any analytics program: concentrations, certificate-based, or degree-based.
Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving illustrates the details involved in solving real computational problems encountered in data analysis. It reveals the dynamic and iterative process by which data analysts approach a problem and reason about different ways of implementing solutions.
The book’s collection of projects, comprehensive sample solutions, and follow-up exercises encompass practical topics pertaining to data processing, including:
Non-standard, complex data formats, such as robot logs and email messages Text processing and regular expressions Newer technologies, such as Web scraping, Web services, Keyhole Markup Language (KML), and Google Earth Statistical methods, such as classification trees, k-nearest neighbors, and naïve Bayes Visualization and exploratory data analysis Relational databases and Structured Query Language (SQL) Simulation Algorithm implementation Large data and efficiency
Suitable for self-study or as supplementary reading in a statistical computing course, the book enables instructors to incorporate interesting problems into their courses so that students gain valuable experience and data science skills. Students learn how to acquire and work with unstructured or semistructured data as well as how to narrow down and carefully frame the questions of interest about the data.
Blending computational details with statistical and data analysis concepts, this book provides readers with an understanding of how professional data scientists think about daily computational tasks. It will improve readers’ computational reasoning of real-world data analyses.
Peter Christen’s book is divided into three parts: Part I, “Overview”, introduces the subject by presenting several sample applications and their special challenges, as well as a general overview of a generic data matching process. Part II, “Steps of the Data Matching Process”, then details its main steps like pre-processing, indexing, field and record comparison, classification, and quality evaluation. Lastly, part III, “Further Topics”, deals with specific aspects like privacy, real-time matching, or matching unstructured data. Finally, it briefly describes the main features of many research and open source systems available today.By providing the reader with a broad range of data matching concepts and techniques and touching on all aspects of the data matching process, this book helps researchers as well as students specializing in data quality or data matching aspects to familiarize themselves with recent research advances and to identify open research challenges in the area of data matching. To this end, each chapter of the book includes a final section that provides pointers to further background and research material. Practitioners will better understand the current state of the art in data matching as well as the internal workings and limitations of current systems. Especially, they will learn that it is often not feasible to simply implement an existing off-the-shelf data matching system without substantial adaption and customization. Such practical considerations are discussed for each of the major steps in the data matching process.
Data Mining Mobile Devices defines the collection of machine-sensed environmental data pertaining to human social behavior. It explains how the integration of data mining and machine learning can enable the modeling of conversation context, proximity sensing, and geospatial location throughout large communities of mobile users. Examines the construction and leveraging of mobile sites Describes how to use mobile apps to gather key data about consumers’ behavior and preferences Discusses mobile mobs, which can be differentiated as distinct marketplaces—including Apple®, Google®, Facebook®, Amazon®, and Twitter® Provides detailed coverage of mobile analytics via clustering, text, and classification AI software and techniques
Mobile devices serve as detailed diaries of a person, continuously and intimately broadcasting where, how, when, and what products, services, and content your consumers desire. The future is mobile—data mining starts and stops in consumers' pockets.
Describing how to analyze Wi-Fi and GPS data from websites and apps, the book explains how to model mined data through the use of artificial intelligence software. It also discusses the monetization of mobile devices’ desires and preferences that can lead to the triangulated marketing of content, products, or services to billions of consumers—in a relevant, anonymous, and personal manner.
Design, implement, manage, and maintain a highly flexible service-oriented computing infrastructure across your enterprise using the detailed information in this Oracle Press guide. Written by an Oracle ACE director, Oracle SOA Suite 12c Handbook uses a start-to-finish case study to illustrate each concept and technique. Learn expert techniques for designing and implementing components, assembling composite applications, integrating Java, handling complex business logic, and maximizing code reuse. Runtime administration, governance, and security are covered in this practical resource.Get started with the Oracle SOA Suite 12c development and run time environment Deploy and manage SOA composite applications Expose SOAP/XML REST/JSON through Oracle Service Bus Establish interactions through adapters for Database, JMS, File/FTP, UMS, LDAP, and Coherence Embed custom logic using Java and the Spring component Perform fast data analysis in real time with Oracle Event Processor Implement Event Drive Architecture based on the Event Delivery Network (EDN) Use Oracle Business Rules to encapsulate logic and automate decisions Model complex processes using BPEL, BPMN, and human task components Establish KPIs and evaluate performance using Oracle Business Activity Monitoring Control traffic, audit system activity, and encrypt sensitive data
"A must-have book for anyone expecting to do research and/or applications in categorical data analysis."
—Statistics in Medicine
"It is a total delight reading this book."
"If you do any analysis of categorical data, this is an essential desktop reference."
The use of statistical methods for analyzing categorical data has increased dramatically, particularly in the biomedical, social sciences, and financial industries. Responding to new developments, this book offers a comprehensive treatment of the most important methods for categorical data analysis.
Categorical Data Analysis, Third Edition summarizes the latest methods for univariate and correlated multivariate categorical responses. Readers will find a unified generalized linear models approach that connects logistic regression and Poisson and negative binomial loglinear models for discrete data with normal regression for continuous data. This edition also features:An emphasis on logistic and probit regression methods for binary, ordinal, and nominal responses for independent observations and for clustered data with marginal models and random effects models Two new chapters on alternative methods for binary response data, including smoothing and regularization methods, classification methods such as linear discriminant analysis and classification trees, and cluster analysis New sections introducing the Bayesian approach for methods in that chapter More than 100 analyses of data sets and over 600 exercises Notes at the end of each chapter that provide references to recent research and topics not covered in the text, linked to a bibliography of more than 1,200 sources A supplementary website showing how to use R and SAS; for all examples in the text, with information also about SPSS and Stata and with exercise solutions
Categorical Data Analysis, Third Edition is an invaluable tool for statisticians and methodologists, such as biostatisticians and researchers in the social and behavioral sciences, medicine and public health, marketing, education, finance, biological and agricultural sciences, and industrial quality control.
Stop being limited by the Query Wizard in Microsoft Office Access.
Go beyond creating queries in Design View!
Over 100 SQL queries and statements along with images of results will help you learn to:
EXTRA: A special section in this guide is designed to test your skills.
Did You Know:
Learning Access’ JET SQL is a great way to become familiar and learn other forms of SQL, like Transact.
Knowledge of SQL is an important skill to display on your resume.
SQL can be learned in hours and used for decades.
Follow along with the exercises in this guide!
Download the free Northwind database from Microsoft.
The Northwind database is available in the sample templates area of Access.
Exercises and examples apply to Access 2007, 2010 and 2013.
Are You Ready To Learn SQL?
Download now and start writing SQL queries TODAY!
Scroll to the top of this page and click the 'buy button'.
Are you new to programming languages?
This is Transact SQL – To The Point!
Gain the fundamentals you will need to query your databases using Transact SQL in Microsoft SQL Server.
Learn how to:
• and delete records in tables
• create databases
• create tables
• … and more.
Illustrated examples and a special section for test questions will reinforce your new knowledge of SQL.
With Beautiful Data, you will:Explore the opportunities and challenges involved in working with the vast number of datasets made available by the WebLearn how to visualize trends in urban crime, using maps and data mashupsDiscover the challenges of designing a data processing system that works within the constraints of space travelLearn how crowdsourcing and transparency have combined to advance the state of drug researchUnderstand how new data can automatically trigger alerts when it matches or overlaps pre-existing dataLearn about the massive infrastructure required to create, capture, and process DNA data
That's only small sample of what you'll find in Beautiful Data. For anyone who handles data, this is a truly fascinating book. Contributors include:
Complete with illustrations and helpful hints, this fifth edition provides a valuable one-stop overview of Oracle Database 12c, including an introduction to Oracle and cloud computing. Oracle Essentials provides the conceptual background you need to understand how Oracle truly works.
Topics include:A complete overview of Oracle databases and data stores, and Fusion Middleware products and featuresCore concepts and structures in Oracle’s architecture, including pluggable databasesOracle objects and the various datatypes Oracle supportsSystem and database management, including Oracle Enterprise Manager 12cSecurity options, basic auditing capabilities, and options for meeting compliance needsPerformance characteristics of disk, memory, and CPU tuningBasic principles of multiuser concurrencyOracle’s online transaction processing (OLTP)Data warehouses, Big Data, and Oracle’s business intelligence toolsBackup and recovery, and high availability and failover solutions
If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.Get a crash course in PythonLearn the basics of linear algebra, statistics, and probability—and understand how and when they're used in data scienceCollect, explore, clean, munge, and manipulate dataDive into the fundamentals of machine learningImplement models such as k-nearest Neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clusteringExplore recommender systems, natural language processing, network analysis, MapReduce, and databases
Get Started Fast with Apache Hadoop® 2, YARN, and Today’s Hadoop Ecosystem
With Hadoop 2.x and YARN, Hadoop moves beyond MapReduce to become practical for virtually any type of data processing. Hadoop 2.x and the Data Lake concept represent a radical shift away from conventional approaches to data usage and storage. Hadoop 2.x installations offer unmatched scalability and breakthrough extensibility that supports new and existing Big Data analytics processing methods and models.
Hadoop® 2 Quick-Start Guide is the first easy, accessible guide to Apache Hadoop 2.x, YARN, and the modern Hadoop ecosystem. Building on his unsurpassed experience teaching Hadoop and Big Data, author Douglas Eadline covers all the basics you need to know to install and use Hadoop 2 on personal computers or servers, and to navigate the powerful technologies that complement it.
Eadline concisely introduces and explains every key Hadoop 2 concept, tool, and service, illustrating each with a simple “beginning-to-end” example and identifying trustworthy, up-to-date resources for learning more.
This guide is ideal if you want to learn about Hadoop 2 without getting mired in technical details. Douglas Eadline will bring you up to speed quickly, whether you’re a user, admin, devops specialist, programmer, architect, analyst, or data scientist.
Coverage IncludesUnderstanding what Hadoop 2 and YARN do, and how they improve on Hadoop 1 with MapReduce Understanding Hadoop-based Data Lakes versus RDBMS Data Warehouses Installing Hadoop 2 and core services on Linux machines, virtualized sandboxes, or clusters Exploring the Hadoop Distributed File System (HDFS) Understanding the essentials of MapReduce and YARN application programming Simplifying programming and data movement with Apache Pig, Hive, Sqoop, Flume, Oozie, and HBase Observing application progress, controlling jobs, and managing workflows Managing Hadoop efficiently with Apache Ambari–including recipes for HDFS to NFSv3 gateway, HDFS snapshots, and YARN configuration Learning basic Hadoop 2 troubleshooting, and installing Apache Hue and Apache Spark
Written in a highly accessible style, Introduction to Statistics through Resampling Methods and R, Second Edition guides students in the understanding of descriptive statistics, estimation, hypothesis testing, and model building. The book emphasizes the discovery method, enabling readers to ascertain solutions on their own rather than simply copy answers or apply a formula by rote. The Second Edition utilizes the R programming language to simplify tedious computations, illustrate new concepts, and assist readers in completing exercises. The text facilitates quick learning through the use of:
More than 250 exercises—with selected "hints"—scattered throughout to stimulate readers' thinking and to actively engage them in applying their newfound skills
An increased focus on why a method is introduced
Multiple explanations of basic concepts
Real-life applications in a variety of disciplines
Dozens of thought-provoking, problem-solving questions in the final chapter to assist readers in applying statistics to real-life applications
Introduction to Statistics through Resampling Methods and R, Second Edition is an excellent resource for students and practitioners in the fields of agriculture, astrophysics, bacteriology, biology, botany, business, climatology, clinical trials, economics, education, epidemiology, genetics, geology, growth processes, hospital administration, law, manufacturing, marketing, medicine, mycology, physics, political science, psychology, social welfare, sports, and toxicology who want to master and learn to apply statistical methods.
SQL Server 2008 puts an incredible amount of BI functionality at your disposal. But how do you take advantage of it? That’s what this book is all about. Authors Guy Fouché and Lynn Langit show how to implement end-to-end BI solutions using SQL Server Analysis Services (SSAS), SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), and other tools in the Microsoft business intelligence toolkit. You’ll learn about all-features such as PowerPivot and Report Builder 3.0. Also provided are clear examples of predictive analysis made possible through powerful data mining features in SQL Server.
If you’re an analyst or developer working with SQL Server 2008 who is charged with delivering results that drive business success, you can’t afford to be without this book; you can’t afford to ignore the powerful business intelligence suite that Microsoft has placed at your disposal.Provides the "big picture" of Microsoft’s BI tool suite Covers PowerPivot and other game-changing technologies introduced alongside SQL Server 2008 Release 2 Gives a practical analysis of features based on real-world practices
The first part provides an introduction to basic procedures for handling and operating with text strings. Then, it reviews major mathematical modeling approaches. Statistical and geometrical models are also described along with main dimensionality reduction methods. Finally, it presents some specific applications such as document clustering, classification, search and terminology extraction.
All descriptions presented are supported with practical examples that are fully reproducible. Further reading, as well as additional exercises and projects, are proposed at the end of each chapter for those readers interested in conducting further experimentation.
The patient revolution has just begun and an exciting journey awaits us.
Praise for the patient revolution
"A full 50% of the US population has at least one chronic disease that requires ongoing monitoring and treatment. Our current health care system is woefully inadequate in providing these individuals with the treatment and support they need. This disparity can only be addressed through empowering patients to better care for themselves and giving providers better tools to care for their patients. Both of those solutions will require the development and application of novel technologies. In Krisa Tailor's book The Patient Revolution, a blueprint is articulated for how this could be achieved, culminating in a vision for a learning health system within 10 years."
—Ricky Bloomfield, MD, Director, Mobile Technology Strategy; Assistant Professor, Duke Medicine
"In The Patient Revolution, Krisa Tailor astutely points out that 80% of health is impacted by factors outside of the health care system. Amazon unfortunately knows more about our patients than we do. The prescriptive analytics she describes will allow health care providers to use big data to optimize interventions at the level of the individual patient. The use of analytics will allow providers to improve quality, shape care coordination, and contain costs. Advanced analytics will lead to personalized care and ultimately empowered patients!"
—Linda Butler, MD, Vice President of Medical Affairs/Chief Medical Officer/Chief Medical Information Officer, Rex Healthcare
"The Patient Revolution provides a practical roadmap on how the industry can capture value by making health and care more personalized, anticipatory, and intuitive to patient needs."
—Ash Damle, CEO, Lumiata
"Excellent read. For me, health care represents a unique economy—one focused on technology, but requiring a deep understanding of humanity. Ms. Tailor begins the exploration of how we provide care via the concepts of design thinking, asking how we might redesign care with an eye toward changing the experience. She does an excellent job deconstructing this from the patient experience. I look forward to a hopeful follow-up directed at changing the provider culture."
—Alan Pitt, MD, Chief Medical Officer, Avizia
"Whether you're a health care provider looking to gain an understanding of the health care landscape, a health data scientist, or a seasoned business pro, you'll come away with a deeper, nuanced understanding of today's evolving health care system with this book. Krisa Tailor ties together—in a comprehensive, unique way—the worlds of health care administration, clinical practice, design thinking, and business strategy and innovation."
—Steven Chan, MD, MBA, University of California, Davis
ONDUX (On Demand Unsupervised Information Extraction) is an unsupervised probabilistic approach for IETS that relies on content-based features to bootstrap the learning of structure-based features. JUDIE (Joint Unsupervised Structure Discovery and Information Extraction) aims at automatically extracting several semi-structured data records in the form of continuous text and having no explicit delimiters between them. In comparison with other IETS methods, including ONDUX, JUDIE faces a task considerably harder that is, extracting information while simultaneously uncovering the underlying structure of the implicit records containing it. iForm applies the authors’ approach to the task of Web form filling. It aims at extracting segments from a data-rich text given as input and associating these segments with fields from a target Web form.
All of these methods were evaluated considering different experimental datasets, which are used to perform a large set of experiments in order to validate the presented approach and methods. These experiments indicate that the proposed approach yields high quality results when compared to state-of-the-art approaches and that it is able to properly support IETS methods in a number of real applications. The findings will prove valuable to practitioners in helping them to understand the current state-of-the-art in unsupervised information extraction techniques, as well as to graduate and undergraduate students of web data management.