Data Mesh: The next generation of data platforms
Data mesh represents a new way of looking at information. It grew out of the idea that data is itself a product, a tool, a means to an end, not simply something businesses gather and analyze later in a backward-looking attempt to understand things that have already happened.
What is Data Mesh?
Data mesh is a strategic approach to modern data management and a way to strengthen an organization’s digital transformation journey, centered on serving up valuable and secure data products. Its main objective is to evolve beyond traditional centralized data management built on data warehouses and data lakes. It emphasizes organizational agility by giving data producers and data consumers the ability to access and manage data directly, without routing every request through the data lake or data warehouse team. This decentralized approach assigns data ownership to domain-specific groups that serve, own, and manage data as a product.
Why Data Mesh?
The sad truth is that the monolithic data architectures of the past are cumbersome, expensive, and inflexible. Over the years, it’s become clear that most of the time and cost of a digital business platform, from applications to analytics, is sunk into integration efforts. Consequently, most platform initiatives fail.
While data mesh is not a silver bullet for centralized, monolithic data architectures, the principles, practices, and technologies of the data mesh strategy are designed to address some of the most pressing and still unmet modernization objectives for data-driven business initiatives.
How does it work?
A data mesh involves a cultural shift in the way that companies think about their data. Instead of being a by-product of a process, data becomes the product, and data producers act as data product owners. Historically, a centralized infrastructure team would maintain data ownership across domains, but the product-thinking focus of a data mesh shifts this ownership to the producers, since they are the subject matter experts. Because they understand who the primary data consumers are and how those consumers use the domain’s operational and analytical data, producers can design APIs with the consumers’ interests in mind.
This domain-driven design also makes data producers responsible for documenting semantic definitions, cataloging metadata, and setting policies for permissions and usage, but a centralized data governance team still enforces these standards and procedures. Likewise, although domain teams become responsible for their own ETL data pipelines, a data mesh doesn’t eliminate the need for a centralized data engineering team. That team’s responsibility instead becomes more focused on determining the best data infrastructure solutions for the data products being stored.
Similar to how a microservices architecture couples lightweight services together to provide functionality to a business or consumer-facing application, a data mesh uses functional domains as a way to set parameters around the data, enabling it to be treated as a product that can be accessed by users across the organization. In this way, a data mesh allows for more flexible data integration and interoperable functionality, where data from multiple domains can be immediately consumed by users for business analytics, data science experimentation, and more.
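To make the idea of "data as a product" more concrete, here is a minimal sketch of what a domain team might publish alongside its data. Everything in it is an assumption for illustration: the DataProduct and Catalog names, their fields, and the role-based visibility rule are hypothetical and not part of any particular data mesh tool.

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes for one data product."""
    name: str
    domain: str                   # owning domain, e.g. "sales"
    owner: str                    # accountable data product owner
    schema: dict[str, str]        # column name -> type
    definitions: dict[str, str]   # column name -> semantic definition
    allowed_roles: frozenset[str] = frozenset()


class Catalog:
    """Illustrative in-memory catalog; a real mesh would use a catalog service."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # Producers are expected to document every column they expose.
        undocumented = set(product.schema) - set(product.definitions)
        if undocumented:
            raise ValueError(f"undocumented columns: {sorted(undocumented)}")
        self._products[f"{product.domain}.{product.name}"] = product

    def discover(self, role: str) -> list[str]:
        # Consumers only see products their role is permitted to use.
        return sorted(
            key for key, p in self._products.items() if role in p.allowed_roles
        )


catalog = Catalog()
catalog.publish(
    DataProduct(
        name="orders",
        domain="sales",
        owner="sales-data-team",
        schema={"order_id": "string", "total": "decimal"},
        definitions={
            "order_id": "Unique identifier of a customer order",
            "total": "Order value including tax",
        },
        allowed_roles=frozenset({"analyst"}),
    )
)
print(catalog.discover("analyst"))
```

The design choice worth noting is that the producer, not a central team, supplies the schema, definitions, and permissions, while the catalog only enforces a shared minimum standard at publish time.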
Benefits of data mesh in data management
- Agility and scalability: Data mesh powers decentralized data operations, improving time-to-market, scalability, and business domain agility.
- Flexibility and independence: Enterprises that take on data mesh architecture avoid becoming locked into one data platform or data product.
- Faster access to critical data: Data mesh offers a shared, self-service data infrastructure, so consumers can find and query data (for example, with SQL) without waiting on a central team.
- Transparency for cross-functional use across teams: Centralized data ownership on traditional data platforms leaves expert data teams isolated and leaves everyone else heavily dependent on them, which creates a lack of transparency. Data mesh decentralizes data ownership and distributes it among cross-functional domain teams.
Use cases of a Data Mesh
While distributed data mesh architectures are still gaining adoption, they’re helping teams attain their goals of scalability for common big data use cases. These include:
- Business intelligence dashboards: As new initiatives arise, teams commonly require customized data views to understand the performance of these projects. Data mesh architectures can support this need for flexibility and customization by making data more available to data consumers.
- Automated virtual assistants: Businesses commonly use chatbots to support call centers and customer service teams. As frequently asked questions can touch on various datasets, a distributed data architecture can make more data assets available to these virtual agent systems.
- Customer experience: Customer data helps businesses better understand their users and provide more personalized experiences. This has been observed in a variety of industries, from marketing to healthcare.
- Machine learning projects: By standardizing domain-agnostic data, data scientists can more easily stitch together data from various data sources, reducing the time spent on data processing. The time saved can increase the number of models that move into a production environment, helping teams reach their automation goals.
Some Challenges
The main challenges of a data mesh stem from the complexities inherent to managing multiple data products (and their dependencies) across multiple autonomous domains. Here are the key considerations:
Multi-domain data duplication
Redundancy, which may occur when the data of one domain is repurposed to serve the business needs of another domain, could potentially impact resource utilization and data management costs.
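One way to keep an eye on this kind of redundancy is to fingerprint the contents of each data product and flag products that turn out to be identical copies. The sketch below is only an illustration under simplifying assumptions: products are small enough to hold in memory as lists of row dictionaries, the product names are made up, and only exact copies (ignoring row order) are detected.

```python
import hashlib
import json


def fingerprint(rows: list[dict]) -> str:
    """Hash a dataset so that row order and key order do not matter."""
    canonical_rows = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return hashlib.sha256("\n".join(canonical_rows).encode("utf-8")).hexdigest()


def find_duplicates(products: dict[str, list[dict]]) -> list[list[str]]:
    """Group product names whose contents are identical."""
    by_hash: dict[str, list[str]] = {}
    for name, rows in products.items():
        by_hash.setdefault(fingerprint(rows), []).append(name)
    return [sorted(names) for names in by_hash.values() if len(names) > 1]


# Hypothetical products from three domains.
products = {
    "sales.customers": [{"id": 1}, {"id": 2}],
    "marketing.customers_copy": [{"id": 2}, {"id": 1}],
    "finance.invoices": [{"id": 9}],
}
print(find_duplicates(products))
```

In practice, repurposed data is usually transformed rather than copied verbatim, so a check like this catches only the simplest case.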
Federated data governance and quality assurance
Different domains may require different data governance tools and policies, which must be taken into account when data products and pipelines are shared across domains. The resulting differences (deltas) between domains must be identified and reconciled under a federated governance model.
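As a rough illustration of identifying governance deltas, the sketch below compares each domain's policy settings against a shared standard and reports where they differ. The policy keys (pii_masked, retention_days), the standard values, and the domain names are all hypothetical.

```python
# Hypothetical organization-wide standard agreed by the governance group.
GLOBAL_STANDARD = {"pii_masked": True, "retention_days": 365}


def governance_deltas(domain_policies: dict[str, dict]) -> dict[str, dict]:
    """Return, per domain, each setting that differs from the global standard.

    Each difference is reported as (domain value, expected value).
    """
    deltas: dict[str, dict] = {}
    for domain, policy in domain_policies.items():
        diff = {
            key: (policy.get(key), expected)
            for key, expected in GLOBAL_STANDARD.items()
            if policy.get(key) != expected
        }
        if diff:
            deltas[domain] = diff
    return deltas


policies = {
    "sales": {"pii_masked": True, "retention_days": 365},
    "marketing": {"pii_masked": False, "retention_days": 365},
}
print(governance_deltas(policies))
```

A report like this only surfaces the differences; deciding whether a domain should conform or the standard should change remains a governance decision.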
Change management
Adopting a data mesh means decentralizing data management, which requires significant change management in organizations whose data management practices are highly centralized.
Cost and risk
Existing data and analytics tools should be adapted and augmented to support a data mesh architecture. Establishing a data management infrastructure to support a data mesh – including data integration, virtualization, preparation, masking, governance, orchestration, cataloging, and delivery – can be a very large, costly, and risky undertaking.
Cross-domain analytics
An enterprise-wide data model must be defined to consolidate the various data products and make them available to authorized users in one central location.
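The practical core of such a model is agreement on shared keys, so that products from different domains can be combined. The sketch below assumes two hypothetical products, one from a sales domain and one from a customer domain, that both expose a customer_id column, and joins them with plain Python rather than any specific query engine.

```python
def join_products(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Inner-join two lists of row dicts on a shared key column."""
    # Index the right-hand product by key for fast lookup.
    index: dict = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        for match in index.get(row[key], []):
            # On a column-name clash, the left-hand value wins.
            joined.append({**match, **row})
    return joined


# Hypothetical data products published by two different domains.
orders = [
    {"customer_id": 1, "total": 50},
    {"customer_id": 2, "total": 20},
    {"customer_id": 3, "total": 5},
]
customers = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "wholesale"},
]
print(join_products(orders, customers, "customer_id"))
```

The join works only because both domains use the same name and meaning for customer_id, which is the kind of agreement the enterprise-wide model has to provide.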
Conclusion
Data mesh does, in fact, advocate a highly coordinated, cross-domain governance program. That point is easy to overlook, not least because many organizations see data mesh as an opportunity to avoid the hard work of cross-domain coordination, much as agile methods were often misunderstood and misused to jettison timeless program and project management principles.
So, if you’re among the many enterprise data professionals intrigued by data mesh or are beginning to apply data mesh ideas in your organization, be careful to retain the professionalism required to coordinate activities across domains, applying the stitching you’ll need to hold the mesh together.