Understand the language and vocabulary of Data Architecture.
The Data Architecture field is rife with terms that have become “fashionable”. Some of the terms began with very specific, specialized meanings, but as their use spread, they lost the precision of their technical definitions and became, well, “buzzwords”.
A buzzword is “a word or expression from a particular subject area that has become fashionable because it has been used a lot”. Compliance is “the obeying of an accepted principle or instruction that states the way things are or should be done.”
The assignment is to take buzzwords and follow rules to use them correctly. We cut through the hype to arrive at buzzword compliance – the state where you fully understand the words that in fact have real meaning in the data architecture industry. This book will rationalize the various ways all these terms are defined.
Of necessity, the book must address all aspects of describing an enterprise and its data management technologies. This includes a wide range of subjects, from entity/relationship modeling, through the semantic web, to database issues like relational and “beyond relational” (“NoSQL”) approaches. In each case, the definitions are meant to be detailed enough to convey the basic principles, while recognizing that a full understanding will require consulting the sources where they are described more completely.
The book’s Glossary contains a catalogue of definitions and its Bibliography contains a comprehensive set of references.
Since the early 1980s, David Hay has been a pioneer in the use of process and data models to support strategic planning, requirements analysis and system design. He has developed enterprise models for many industries, including, among others, pharmaceutical research, oil refining and production, film and television, and nuclear energy. In each case, he found the relatively simple structures hidden in formidably complex situations. In addition to being a frequent speaker at international conferences, Mr. Hay has published several books and numerous articles.
The Data Model Scorecard is a data model quality scoring tool containing ten categories aimed at improving the quality of your organization’s data models. Many of my consulting assignments are dedicated to applying the Data Model Scorecard to my clients’ data models, and I will show you how to apply the Scorecard in this book.
This book, written for people who build, use, or review data models, contains the Data Model Scorecard template and an explanation along with many examples of each of the ten Scorecard categories. There are three sections:
In Section I, Data Modeling and the Need for Validation, you will receive a short data modeling primer in Chapter 1, understand why it is important to get the data model right in Chapter 2, and learn about the Data Model Scorecard in Chapter 3.
In Section II, Data Model Scorecard Categories, we will explain each of the ten categories of the Data Model Scorecard. There are ten chapters in this section, each chapter dedicated to a specific Scorecard category:
· Chapter 4: Correctness
· Chapter 5: Completeness
· Chapter 6: Scheme
· Chapter 7: Structure
· Chapter 8: Abstraction
· Chapter 9: Standards
· Chapter 10: Readability
· Chapter 11: Definitions
· Chapter 12: Consistency
· Chapter 13: Data
In Section III, Validating Data Models, we will prepare for the model review (Chapter 14), cover tips to help during the model review (Chapter 15), and then review a data model based upon an actual project (Chapter 16).
The code-packed examples in this book will help you learn how to work with documents, populate a simple database, replicate data from one database to another, and a host of other tasks.
• Install CouchDB on Linux, Mac OS X, Windows, or (if you must) from the source code
• Interact with data through CouchDB’s RESTful API, and use standard HTTP operations, such as PUT, GET, POST, and DELETE
• Use Futon, CouchDB’s web-based interface, to manage databases and documents, and to configure replications
• Learn how to create, update, and delete documents in JSON format, and how to create and delete databases
• Work with design documents to get the formatting and indexing your application requires
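The RESTful style described above maps each operation onto an HTTP method and a URL path. The following is a minimal sketch using only Python's standard library. It builds the requests without sending them and assumes a local CouchDB on its default port 5984. The database name, document IDs, and helper function names are hypothetical, and a real server may also require authentication.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: a local CouchDB on its default port; adjust for your server.
BASE = "http://localhost:5984"


def _path(*parts: str) -> str:
    """Join URL path segments, escaping each one."""
    return "/".join(urllib.parse.quote(p, safe="") for p in parts)


def create_db(db: str) -> urllib.request.Request:
    """PUT /{db} creates a database."""
    return urllib.request.Request(f"{BASE}/{_path(db)}", method="PUT")


def put_doc(db: str, doc_id: str, doc: dict,
            rev: Optional[str] = None) -> urllib.request.Request:
    """PUT /{db}/{doc_id} creates a JSON document, or updates it if rev is given."""
    body = dict(doc)
    if rev is not None:
        body["_rev"] = rev  # an update must name the revision it replaces
    return urllib.request.Request(
        f"{BASE}/{_path(db, doc_id)}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_doc(db: str, doc_id: str) -> urllib.request.Request:
    """GET /{db}/{doc_id} fetches a document."""
    return urllib.request.Request(f"{BASE}/{_path(db, doc_id)}", method="GET")


def delete_doc(db: str, doc_id: str, rev: str) -> urllib.request.Request:
    """DELETE /{db}/{doc_id}?rev=... removes a specific revision."""
    query = urllib.parse.urlencode({"rev": rev})
    return urllib.request.Request(f"{BASE}/{_path(db, doc_id)}?{query}",
                                  method="DELETE")
```

To send a request, pass it to `urllib.request.urlopen(...)` against a running server. The `_rev` and `rev` values come from the server's earlier responses.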
Apache Ignite is a distributed in-memory platform designed to scale and process large volumes of data. It can be integrated with microservices as well as monolithic systems, and can be used as a scalable, highly available and performant deployment platform for microservices. This book will teach you to use Apache Ignite for building a high-performance, scalable, highly available system architecture with data integrity.
The book takes you through the basics of Apache Ignite and in-memory technologies. You will learn about installing and clustering Ignite nodes, caching topologies, and various caching strategies, such as cache-aside, read-through and write-through, and write-behind. Next, you will delve into detailed aspects of Ignite’s data grid: web session clustering and querying data.
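Two of the caching strategies named above differ in who keeps the cache and the backing store in step. With cache-aside, the application reads from the store on a miss and invalidates on write. With write-through, every write updates both the cache and the store. The following is a minimal sketch in plain Python, not the Ignite API. The `Database` class is a hypothetical stand-in, with a dict in place of a real database.

```python
from typing import Dict, Optional


class Database:
    """Hypothetical backing store; counts reads so caching effects are visible."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.reads = 0

    def get(self, key: str) -> Optional[str]:
        self.reads += 1
        return self.rows.get(key)

    def put(self, key: str, value: str) -> None:
        self.rows[key] = value


class CacheAside:
    """The application checks the cache first and loads from the store on a miss."""

    def __init__(self, db: Database) -> None:
        self.db = db
        self.cache: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:          # hit: no database read
            return self.cache[key]
        value = self.db.get(key)       # miss: read from the store
        if value is not None:
            self.cache[key] = value    # populate for later reads
        return value

    def put(self, key: str, value: str) -> None:
        self.db.put(key, value)        # write the store...
        self.cache.pop(key, None)      # ...then drop the now-stale entry


class WriteThrough:
    """Each write updates the cache and, synchronously, the store."""

    def __init__(self, db: Database) -> None:
        self.db = db
        self.cache: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self.cache[key] = value
        self.db.put(key, value)        # returns only after the store is updated

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        return self.db.get(key)
```

With cache-aside, two reads of the same key hit the database only once. Write-behind would instead queue the `db.put` calls and flush them later, trading durability for write latency.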
You will learn how to process large volumes of data using the compute grid and Ignite’s map-reduce and executor service. You will learn about the memory architecture of Apache Ignite and how to monitor memory and caches. You will use Ignite for complex event processing, event streaming, and time-series predictions of opportunities and threats. Additionally, you will go through off-heap and on-heap caching, swapping, and native and Spring framework integration with Apache Ignite.
By the end of this book, you will be confident using all the features of Apache Ignite 2.x that can be used to build a high-performance system architecture.
What you will learn
• Use Apache Ignite’s data grid and implement web session clustering
• Gain high performance and linear scalability with in-memory distributed data processing
• Create a microservice on top of Apache Ignite that can scale and perform
• Perform ACID-compliant CRUD operations on an Ignite cache
• Retrieve data from Apache Ignite’s data grid using SQL, Scan and Lucene text queries
• Explore complex event processing concepts and event streaming
• Integrate your Ignite app with the Spring framework
Who this book is for
The book is for Big Data professionals who want to learn the essentials of Apache Ignite. Prior experience in Java is necessary.
Each pattern is explained in context, with pitfalls and caveats clearly identified to help you avoid common design mistakes when modeling your big data architecture. This book also provides a complete overview of MapReduce that explains its origins and implementations, and why design patterns are so important. All code examples are written for Hadoop.
• Summarization patterns: get a top-level view by summarizing and grouping data
• Filtering patterns: view data subsets, such as records generated from one user
• Data organization patterns: reorganize data to work with other systems, or to make MapReduce analysis easier
• Join patterns: analyze different datasets together to discover interesting relationships
• Metapatterns: piece together several patterns to solve multi-stage problems, or to perform several analytics in the same job
• Input and output patterns: customize the way you use Hadoop to load or store data
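The summarization and filtering patterns listed above can be sketched without Hadoop by running the map, shuffle, and reduce phases in one process. The following is a minimal sketch in plain Python, not Hadoop code. The input records (a user ID and a comment length) and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, int]  # hypothetical (user_id, comment_length)


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    """Emit one (key, value) pair per record: key = user, value = length."""
    for user, length in records:
        yield user, length


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, Tuple[int, int, int]]:
    """Summarization: (count, min, max) of the values for each key."""
    return {key: (len(vals), min(vals), max(vals)) for key, vals in groups.items()}


def filter_user(records: Iterable[Record], user: str) -> List[Record]:
    """Filtering: a map-only job that keeps one user's records."""
    return [r for r in records if r[0] == user]


records = [("alice", 120), ("bob", 40), ("alice", 30)]
print(reduce_phase(shuffle(map_phase(records))))
# {'alice': (2, 30, 120), 'bob': (1, 40, 40)}
```

Count, min, and max are used here because they can also be computed by a combiner on partial groups before the shuffle. A mean cannot be combined that way without carrying counts along.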
"A clear exposition of MapReduce programs for common data processing patterns. This book is indispensable for anyone using Hadoop."
--Tom White, author of Hadoop: The Definitive Guide
In 1995, David Hay published Data Model Patterns: Conventions of Thought, the groundbreaking book on how to use standard data models to describe standard business situations. Enterprise Model Patterns: Describing the World builds on the concepts presented there, adds 15 years of practical experience, and presents a more comprehensive view. You will learn how to apply both the abstract and concrete elements of your enterprise’s architectural data model through four levels of abstraction:
Level 0: An abstract template that underlies the Level 1 model that follows, plus two meta models:
• Information Resources. In addition to books, articles, and e-mail notes, it also includes photographs, videos, and sound recordings.
• Accounting. Accounting is remarkable because it is itself a modeling language. Its approach is very different from that of data modelers: instead of using entities and entity classes that represent things in the world, it is concerned with accounts that represent bits of value to the organization.
Level 1: An enterprise model that is generic enough to apply to any company or government agency, but concrete enough to be readily understood by all. It describes:
• People and Organization. Who is involved with the business? The people involved are not only the employees within the organization, but customers, agents, and others with whom the organization comes in contact. Organizations of interest include the enterprise itself and its own internal departments, as well as customers, competitors, government agencies, and the like.
• Geographic Locations. Where is business conducted? A geographic location may be either a geographic area (defined as any bounded area on the Earth), a geographic point (used to identify a particular location), or, if you are an oil company for example, a geographic solid (such as an oil reserve).
• Assets. What tangible items are used to carry out the business? These are any physical things that are manipulated, sometimes as products, but also as the means to producing products and services.
• Activities. How is the business carried out? This model not only covers services offered, but also projects and any other kinds of activities. In addition, the model describes the events that cause activities to happen.
• Time. All data is positioned in time, but some more than others.
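The Geographic Locations entry above describes a supertype with three subtypes: area, point, and solid. The following is a minimal sketch of that supertype/subtype structure using Python dataclasses. The class names, the attributes, and the (latitude, longitude) coordinate representation are hypothetical illustrations, not the model from the book.

```python
from dataclasses import dataclass
from typing import List, Tuple

Coordinate = Tuple[float, float]  # hypothetical (latitude, longitude)


@dataclass
class GeographicLocation:
    """Supertype: a place where business is conducted."""
    name: str


@dataclass
class GeographicPoint(GeographicLocation):
    """Identifies one particular location."""
    position: Coordinate


@dataclass
class GeographicArea(GeographicLocation):
    """Any bounded area on the Earth, here given as a boundary polygon."""
    boundary: List[Coordinate]


@dataclass
class GeographicSolid(GeographicLocation):
    """A bounded volume, such as an oil reserve."""
    boundary: List[Coordinate]
    top_depth_m: float
    bottom_depth_m: float

    def thickness_m(self) -> float:
        """Vertical extent of the solid."""
        return self.bottom_depth_m - self.top_depth_m
```

Code that only needs "a location" can accept `GeographicLocation` and handle all three subtypes uniformly. That is the practical benefit of modeling them under a common supertype.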
Level 2: A more detailed model describing specific functional areas:
• Human Resources
• Communications and Marketing
• The Laboratory
Level 3: Examples of the details a model can have to address what is truly unique in a particular industry. Here you see how to address the unique bits in areas as diverse as:
• Criminal Justice. The model presented here is based on the “Global Justice XML Data Model” (GJXDM).
• Banking. The model presented here is the result of working for four different banks and then adding some thought to come up with something different from what is currently in any of them.
• Highways. The model here is derived from a project in a Canadian Provincial Highway Department, and addresses the question “what is a road?”
In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.
• Peer under the hood of the systems you already use, and learn how to use and operate them more effectively
• Make informed decisions by identifying the strengths and weaknesses of different tools
• Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
• Understand the distributed systems research upon which modern databases are built
• Peek behind the scenes of major online services, and learn from their architectures
It offers a view of the world being addressed by all the techniques, methods, and tools of the information processing industry (for example, object-oriented design, CASE, business process re-engineering, etc.) and presents several concepts that need to be addressed by such tools.
This book is timely: companies and government agencies, realizing that the data they use represent a significant corporate resource, recognize the need to integrate data that has traditionally been available only from disparate sources. An important component of this integration is management of the "metadata" that describe, catalogue, and provide access to the various forms of underlying business data. The "metadata repository" is essential to keep track of the various physical components of these systems and their semantics.
The book is ideal for data management professionals, data modeling and design professionals, and data warehouse and database repository designers.
• A comprehensive work based on the Zachman Framework for information architecture, encompassing the Business Owner's, Architect's, and Designer's views for all columns (data, activities, locations, people, timing, and motivation)
• Provides a step-by-step description of the model and is organized so that different readers can benefit from different parts
• Provides a view of the world being addressed by all the techniques, methods and tools of the information processing industry (for example, object-oriented design, CASE, business process re-engineering, etc.)
• Presents many concepts that are not currently being addressed by such tools, and should be
Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.
• Use the IPython shell and Jupyter notebook for exploratory computing
• Learn basic and advanced features in NumPy (Numerical Python)
• Get started with data analysis tools in the pandas library
• Use flexible tools to load, clean, transform, merge, and reshape data
• Create informative visualizations with matplotlib
• Apply the pandas groupby facility to slice, dice, and summarize datasets
• Analyze and manipulate regular and irregular time series data
• Learn how to solve real-world data analysis problems with thorough, detailed examples