The Data Model Scorecard is a data model quality scoring tool containing ten categories aimed at improving the quality of your organization’s data models. Many of my consulting assignments are dedicated to applying the Data Model Scorecard to my clients’ data models – I will show you how to apply the Scorecard in this book.
This book, written for people who build, use, or review data models, contains the Data Model Scorecard template and an explanation along with many examples of each of the ten Scorecard categories. There are three sections:
In Section I, Data Modeling and the Need for Validation, you will receive a short data modeling primer in Chapter 1, understand why it is important to get the data model right in Chapter 2, and learn about the Data Model Scorecard in Chapter 3.
In Section II, Data Model Scorecard Categories, we will explain each of the ten categories of the Data Model Scorecard. There are ten chapters in this section, each chapter dedicated to a specific Scorecard category:
· Chapter 4: Correctness
· Chapter 5: Completeness
· Chapter 6: Scheme
· Chapter 7: Structure
· Chapter 8: Abstraction
· Chapter 9: Standards
· Chapter 10: Readability
· Chapter 11: Definitions
· Chapter 12: Consistency
· Chapter 13: Data
In Section III, Validating Data Models, we will prepare for the model review (Chapter 14), cover tips to help during the model review (Chapter 15), and then review a data model based upon an actual project (Chapter 16).
Steve Hoberman has trained more than 10,000 people in data modeling since 1992. Steve is known for his entertaining and interactive teaching style, and organizations around the globe have brought Steve in to teach his Data Modeling Master Class, which is recognized as the most comprehensive data modeling course in the industry. Steve is the author of nine books on data modeling, including the bestseller Data Modeling Made Simple. One of Steve's frequent data modeling consulting assignments is to review data models using his Data Model Scorecard® technique. He is the founder of the Design Challenges group, Conference Chair of the Data Modeling Zone conference, recipient of the 2012 Data Administration Management Association (DAMA) International Professional Achievement Award, and highest rated tutorial presenter at Enterprise Data World 2014 and Enterprise Data World 2015.
The code-packed examples in this book will help you learn how to work with documents, populate a simple database, replicate data from one database to another, and handle a host of other tasks.
· Install CouchDB on Linux, Mac OS X, Windows, or (if you must) from the source code
· Interact with data through CouchDB’s RESTful API, and use standard HTTP operations, such as PUT, GET, POST, and DELETE
· Use Futon – CouchDB’s web-based interface – to manage databases and documents, and to configure replications
· Learn how to create, update, and delete documents in JSON format, and how to create and delete databases
· Work with design documents to get the formatting and indexing your application requires
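To make the RESTful API concrete, here is a minimal sketch using Python’s requests library. This is not from the book: it assumes a local CouchDB instance on the default port (5984) with open write permissions, and the database and document names are hypothetical.

    import requests

    base = "http://localhost:5984"  # assumed local CouchDB instance

    # PUT creates a database...
    requests.put(f"{base}/recipes")

    # ...and PUT with a JSON body creates (or updates) a document
    requests.put(f"{base}/recipes/pancakes", json={"servings": 4, "flour_g": 200})

    # GET retrieves the document, including its CouchDB-assigned _rev
    rev = requests.get(f"{base}/recipes/pancakes").json()["_rev"]

    # DELETE removes the document; CouchDB requires the current _rev
    requests.delete(f"{base}/recipes/pancakes", params={"rev": rev})

Each operation maps one-to-one onto an HTTP verb, which is what makes the API easy to explore from any language or from the command line.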
Understand the language and vocabulary of Data Architecture.
The Data Architecture field is rife with terms that have become “fashionable”. Some of the terms began with very specific, specialized meanings – but as their use spread, they lost the precision of their technical definitions and became, well, “buzzwords”.
A buzzword is “a word or expression from a particular subject area that has become fashionable because it has been used a lot”. Compliance is “the obeying of an accepted principle or instruction that states the way things are or should be done.”
The assignment, then, is to take these buzzwords and follow rules to use them correctly. We cut through the hype to arrive at buzzword compliance – the state where you fully understand the words that in fact have real meaning in the data architecture industry. This book will rationalize the various ways all these terms are defined.
Of necessity, the book must address all aspects of describing an enterprise and its data management technologies. This includes a wide range of subjects, from entity/relationship modeling, through the semantic web, to database issues like relational and “beyond relational” (“NoSQL”) approaches. In each case, the definitions for the subject are meant to be detailed enough to make it possible to understand basic principles—while recognizing that a full understanding will require consulting the sources where they are more completely described.
The book’s Glossary contains a catalogue of definitions and its Bibliography contains a comprehensive set of references.
Each pattern is explained in context, with pitfalls and caveats clearly identified to help you avoid common design mistakes when modeling your big data architecture. This book also provides a complete overview of MapReduce that explains its origins and implementations, and why design patterns are so important. All code examples are written for Hadoop.
· Summarization patterns: get a top-level view by summarizing and grouping data
· Filtering patterns: view data subsets such as records generated from one user
· Data organization patterns: reorganize data to work with other systems, or to make MapReduce analysis easier
· Join patterns: analyze different datasets together to discover interesting relationships
· Metapatterns: piece together several patterns to solve multi-stage problems, or to perform several analytics in the same job
· Input and output patterns: customize the way you use Hadoop to load or store data
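To give a feel for the first of these, here is a minimal Python sketch of the summarization pattern’s map-shuffle-reduce shape. It is a stand-in, not one of the book’s Hadoop examples: the log lines and user names are made up, and a real job would run the map and reduce steps on separate cluster nodes.

    from itertools import groupby
    from operator import itemgetter

    records = ["alice home", "bob search", "alice checkout"]  # hypothetical log lines

    # Map: emit one (user, 1) pair per record
    mapped = [(line.split()[0], 1) for line in records]

    # Shuffle: group pairs by key (Hadoop does this between map and reduce)
    mapped.sort(key=itemgetter(0))

    # Reduce: sum the counts per user - a top-level summary of the data
    for user, pairs in groupby(mapped, key=itemgetter(0)):
        print(user, sum(count for _, count in pairs))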
"A clear exposition of MapReduce programs for common data processing patterns—this book is indespensible for anyone using Hadoop."
--Tom White, author of Hadoop: The Definitive Guide
If you are a developer, data architect, or data scientist looking for information on how to integrate the Big Data stack architecture and how to choose the correct technology in every layer, this book is what you are looking for.
What You Will Learn
· Design and implement a fast data pipeline architecture
· Think about and solve programming challenges in a functional way with Scala
· Learn to use Akka, the actor model implementation for the JVM
· Perform in-memory processing and data analysis with Spark to solve modern business demands
· Build a powerful and effective cluster infrastructure with Mesos and Docker
· Manage and consume unstructured and NoSQL data sources with Cassandra
· Consume and produce messages in a massive way with Kafka
In Detail
SMACK is an open source full stack for big data architecture. It is a combination of Spark, Mesos, Akka, Cassandra, and Kafka. This stack is the newest technique developers have begun to use to tackle critical real-time analytics for big data. This highly practical guide will teach you how to integrate these technologies to create a highly efficient data analysis system for fast data processing.
We'll start off with an introduction to SMACK and show you when to use it. First, you'll get to grips with functional thinking and problem solving using Scala. Next, you'll come to understand the Akka architecture. Then, you'll learn how to improve the data structure architecture and optimize resources using Apache Spark.
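As a small taste of that in-memory processing, here is a hedged sketch using Spark’s Python API (the book itself works in Scala, and the sensor readings below are invented for illustration):

    from pyspark.sql import SparkSession

    # Run Spark locally, using all available cores
    spark = SparkSession.builder.master("local[*]").appName("smack-sketch").getOrCreate()

    df = spark.createDataFrame(
        [("sensor-1", 20.5), ("sensor-1", 21.0), ("sensor-2", 19.8)],
        ["sensor", "reading"],
    )

    # The aggregation happens in memory across the cluster (here, one machine)
    df.groupBy("sensor").avg("reading").show()

    spark.stop()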
Moving forward, you'll learn how to achieve linear scalability in databases with Apache Cassandra. You'll grasp high-throughput distributed messaging using Apache Kafka. We'll show you how to build a cheap but effective cluster infrastructure with Apache Mesos. Finally, you will deep dive into the different aspects of SMACK using a few case studies.
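For the messaging layer, here is a minimal produce-and-consume sketch with the kafka-python client. The broker address and topic name are assumptions for illustration, not details from the book:

    from kafka import KafkaProducer, KafkaConsumer

    # Produce one message to the (hypothetical) "events" topic
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("events", b"page_view:home")
    producer.flush()

    # Consume from the beginning of the topic, giving up after 5 seconds of silence
    consumer = KafkaConsumer(
        "events",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,
    )
    for message in consumer:
        print(message.value)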
By the end of the book, you will be able to integrate all the components of the SMACK stack and use them together to achieve highly effective and fast data processing.
Style and approach
With the help of various industry examples, you will learn about the full stack of big data architecture, taking in the important aspects of every technology. Rather than presenting incomplete information on single technologies, this book shows you how to integrate these open source technologies to build cheap, fast, and effective data processing systems.
In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.
· Peer under the hood of the systems you already use, and learn how to use and operate them more effectively
· Make informed decisions by identifying the strengths and weaknesses of different tools
· Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
· Understand the distributed systems research upon which modern databases are built
· Peek behind the scenes of major online services, and learn from their architectures
Now, what if you had a time machine and could go back and read this book? You would learn that even NoSQL databases like MongoDB require some level of data modeling. Data modeling is the process of learning about the data, and regardless of technology, this process must be performed for a successful application. You would learn the value of conceptual, logical, and physical data modeling and how each stage increases our knowledge of the data and reduces assumptions and poor design decisions.
Read this book to learn how to do data modeling for MongoDB applications, and accomplish these five objectives:
· Understand how data modeling contributes to the process of learning about the data, and is, therefore, a required technique, even when the resulting database is not relational. That is, NoSQL does not mean NoDataModeling!
· Know how NoSQL databases differ from traditional relational databases, and where MongoDB fits.
· Explore each MongoDB object and comprehend how each compares to its data modeling and traditional relational database counterparts, and learn the basics of adding, querying, updating, and deleting data in MongoDB.
· Practice a streamlined, template-driven approach to performing conceptual, logical, and physical data modeling. Recognize that data modeling does not always have to lead to traditional data models!
· Distinguish top-down from bottom-up development approaches and complete a top-down case study which ties all of the modeling techniques together.
This book is written for anyone who is working with, or will be working with, MongoDB, including business analysts, data modelers, database administrators, developers, project managers, and data scientists. There are three sections:
In Section I, Getting Started, we will reveal the power of data modeling and the tight connections to data models that exist when designing any type of database (Chapter 1), compare NoSQL with traditional relational databases and show where MongoDB fits (Chapter 2), explore each MongoDB object and comprehend how each compares to its data modeling and traditional relational database counterparts (Chapter 3), and explain the basics of adding, querying, updating, and deleting data in MongoDB (Chapter 4).
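As a preview of those Chapter 4 basics, here is a minimal CRUD sketch using the pymongo driver. It assumes a MongoDB instance on localhost, and the database and collection names are hypothetical, not taken from the book:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/")
    db = client["bookstore"]  # hypothetical database

    # Add: insert a JSON-like document
    db.titles.insert_one({"title": "Data Modeling for MongoDB", "format": "paperback"})

    # Query: find the document again
    print(db.titles.find_one({"title": "Data Modeling for MongoDB"}))

    # Update: modify a field in place
    db.titles.update_one(
        {"title": "Data Modeling for MongoDB"},
        {"$set": {"format": "ebook"}},
    )

    # Delete: remove the document
    db.titles.delete_one({"title": "Data Modeling for MongoDB"})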
In Section II, Levels of Granularity, we cover Conceptual Data Modeling (Chapter 5), Logical Data Modeling (Chapter 6), and Physical Data Modeling (Chapter 7). Notice the “ing” at the end of each of these chapters. We focus on the process of building each of these models, which is where we gain essential business knowledge.
In Section III, Case Study, we will explain both top-down and bottom-up development approaches and go through a top-down case study where we start with business requirements and end with the MongoDB database. This case study will tie together all of the techniques in the previous seven chapters.
Nike Senior Data Architect Ryan Smith wrote the foreword. Key points are included at the end of each chapter as a way to reinforce concepts. In addition, this book is loaded with hands-on exercises, along with their answers provided in Appendix A. Appendix B contains all of the book’s references and Appendix C contains a glossary of the terms used throughout the text.