"Building a Scalable Data Warehouse" covers everything one needs to know to create a scalable data warehouse end to end, including a presentation of the Data Vault modeling technique, which provides the foundations to create a technical data warehouse layer. The book discusses how to build the data warehouse incrementally using the agile Data Vault 2.0 methodology. In addition, readers will learn how to create the input layer (the stage layer) and the presentation layer (data mart) of the Data Vault 2.0 architecture including implementation best practices. Drawing upon years of practical experience and using numerous examples and an easy to understand framework, Dan Linstedt and Michael Olschimke discuss:
How to load each layer using SQL Server Integration Services (SSIS), including automation of the Data Vault loading processes.
Important data warehouse technologies and practices.
Data Quality Services (DQS) and Master Data Services (MDS) in the context of the Data Vault architecture.Provides a complete introduction to data warehousing, applications, and the business context so readers can get-up and running fast Explains theoretical concepts and provides hands-on instruction on how to build and implement a data warehouseDemystifies data vault modeling with beginning, intermediate, and advanced techniquesDiscusses the advantages of the data vault approach over other techniques, also including the latest updates to Data Vault 2.0 and multiple improvements to Data Vault 1.0
Extensive updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including substantial new chapters on probabilistic methods and on deep learning. Accompanying the book is a new version of the popular WEKA machine learning software from the University of Waikato. Authors Witten, Frank, Hall, and Pal include today's techniques coupled with the methods at the leading edge of contemporary research.
Please visit the book companion website at http://www.cs.waikato.ac.nz/ml/weka/book.html
It containsPowerpoint slides for Chapters 1-12. This is a very comprehensive teaching resource, with many PPT slides covering each chapter of the bookOnline Appendix on the Weka workbench; again a very comprehensive learning aid for the open source software that goes with the bookTable of contents, highlighting the many new sections in the 4th edition, along with reviews of the 1st edition, errata, etc.Provides a thorough grounding in machine learning concepts, as well as practical advice on applying the tools and techniques to data mining projectsPresents concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methodsIncludes a downloadable WEKA software toolkit, a comprehensive collection of machine learning algorithms for data mining tasks-in an easy-to-use interactive interfaceIncludes open-access online courses that introduce practical applications of the material in the book
To help realize Big Data’s full potential, the book addresses numerous challenges, offering the conceptual and technological solutions for tackling them. These challenges include life-cycle data management, large-scale storage, flexible processing infrastructure, data modeling, scalable machine learning, data analysis algorithms, sampling techniques, and privacy and ethical issues.Covers computational platforms supporting Big Data applicationsAddresses key principles underlying Big Data computingExamines key developments supporting next generation Big Data platformsExplores the challenges in Big Data computing and ways to overcome themContains expert contributors from both academia and industry