What is Data Mesh?
With the growing importance of data over the years, many ideas have emerged around data management, data processing, and data storage. Among these, Data Mesh stands out as an innovative concept, both for its novel perspective and for the operational benefits it promises.
First introduced by Zhamak Dehghani in 2019, the concept has been developed and shaped into clear principles. Instead of a traditional centralized architecture like the data warehouse, it aims to create a decentralized, self-service platform where data can be accessed seamlessly across domains, inspired by microservice architectures in software engineering.
The approach expects data to be treated not as a by-product, but as a product in its own right (Data as a Product). Dehghani built the Data Mesh paradigm on four key principles:
1. Principle of Domain Ownership
At the core of Data Mesh is the idea of decentralization: distributing ownership of data to the teams closest to it. This provides advantages such as faster adaptation to changes. However, this raises key questions: How will the data be divided? Where do the boundaries lie? How will responsibilities be distributed?
Data Mesh answers these questions by aligning data ownership with the organization's business domains rather than with a central team or tool.
Example:
In a bank, if an ad hoc environment needs to be prepared for ATM transactions, this task should not fall to the central analytics reporting team. The reporting team does not have deep knowledge of ATM transaction data. Instead, the ATM domain team, which owns and understands the data, should prepare it themselves.
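In traditional data warehouse architectures, ownership is tool-driven: whichever team operates the warehouse, here the analytics team, would be responsible for such a task. This mismatch between who understands the data and who is asked to prepare it often causes inefficiencies.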
2. Principle of Data as a Product
This principle emphasizes that domains must treat data as a product and data users as customers — even if those customers are other domains. As a result, data products must carry certain qualities:
- Discoverable: Users should be able to easily discover data products for new projects.
- Addressable: Each data product must have a unique, accessible address.
- Understandable: Users must quickly understand the entities and relationships in a dataset.
- Trustworthy: Users must trust the product. The data must be consistent and reliable.
- Accessible: Access to the data product must be straightforward.
- Interoperable: Data products must work together across multiple domains.
- Valuable: A data product must have meaning and serve a useful purpose.
- Secure: Access must be safe and privacy-compliant.
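Some of these qualities can be recorded as metadata that travels with the data product. The following is a minimal sketch, not a standard format: the `DataProduct` class, its field names, the `mesh://` address scheme, and the `missing_qualities` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for a data product; all field names are illustrative."""
    name: str
    domain: str                            # owning domain team
    address: str                           # unique address (addressable)
    description: str                       # plain-language summary (understandable)
    schema: dict[str, str]                 # column name -> type (understandable)
    owner: str                             # data product owner contact
    tags: tuple[str, ...] = ()             # search keywords (discoverable)
    contains_personal_data: bool = False   # input for privacy rules (secure)


def missing_qualities(product: DataProduct) -> list[str]:
    """Return the qualities that the descriptor fails to document."""
    problems = []
    if not product.tags:
        problems.append("discoverable: no tags")
    if "://" not in product.address:
        problems.append("addressable: address is not a URI")
    if not product.description or not product.schema:
        problems.append("understandable: missing description or schema")
    return problems


# Example values are made up, following the ATM example from the first principle.
atm_withdrawals = DataProduct(
    name="atm_withdrawals_daily",
    domain="atm",
    address="mesh://atm/atm_withdrawals_daily",
    description="Daily ATM withdrawal totals per machine.",
    schema={"atm_id": "string", "date": "date", "total_amount": "decimal"},
    owner="atm-data-owner@example.com",
    tags=("atm", "withdrawals"),
)

print(missing_qualities(atm_withdrawals))  # prints []
```

Qualities such as trustworthy or valuable cannot be verified from metadata alone; they depend on data quality checks and on feedback from the product's users.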
This approach also introduces new roles:
Data Product Developer
Builds, maintains, and serves data products over their lifecycle.
Data Product Owner
Acts like a product manager: balancing user requests, guiding the team, and ensuring product value.
3. Principle of the Self-Serve Data Platform
Since many domains coexist in a Data Mesh, there must be a common platform that allows them to work autonomously while sharing data products. The goal is to empower domains to use and publish data without depending on other teams’ workflows.
Example:
In a traditional warehouse, a shared table may be populated by two different teams. Each team’s workflow can block or slow down the other. A self-serve platform eliminates such dependencies.
The platform should also reduce complexity to the minimum, simplifying the exchange of data products across domains. Tools like self-service authorization systems can help eliminate bureaucracy and let domains focus on delivering value.
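To make the self-service idea concrete, here is a toy sketch of a catalog where a domain publishes a product and another domain obtains access without filing a request with a central team. The `SelfServeCatalog` class, its method names, and the `mesh://` address scheme are hypothetical; a real platform would add persistence, authentication, and policy checks.

```python
class SelfServeCatalog:
    """Toy in-memory catalog; names and behavior are illustrative assumptions."""

    def __init__(self):
        self._products = {}   # address -> owning domain
        self._grants = set()  # (address, consumer domain) pairs

    def publish(self, domain: str, name: str) -> str:
        """A domain publishes a product on its own and gets its address back."""
        address = f"mesh://{domain}/{name}"
        if address in self._products:
            raise ValueError(f"{address} is already published")
        self._products[address] = domain
        return address

    def request_access(self, consumer_domain: str, address: str) -> bool:
        """Grant read access immediately if the product exists."""
        if address not in self._products:
            return False
        self._grants.add((address, consumer_domain))
        return True

    def can_read(self, consumer_domain: str, address: str) -> bool:
        """Owners can always read; other domains need a grant."""
        owner = self._products.get(address)
        return owner == consumer_domain or (address, consumer_domain) in self._grants


catalog = SelfServeCatalog()
addr = catalog.publish("atm", "atm_withdrawals_daily")
catalog.request_access("fraud", addr)
print(catalog.can_read("fraud", addr))      # prints True
print(catalog.can_read("marketing", addr))  # prints False
```

In this sketch every request is granted automatically. That is a simplification: in practice the grant step is where the platform would apply the organization's access rules.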
4. Principle of Federated Computational Governance
The final missing piece is governance and security. While domains operate independently, the entire mesh must still adhere to organizational policies. This requires a federated governance model.
Governance teams are responsible for ensuring that data is high-quality, consistent, and secure. They also enforce global rules (e.g., compliance, privacy, interoperability) across the platform.
So, how are the rules defined in such a federated system?
- Local autonomy: Domains can make local decisions within their scope.
- Global policies: All domains must comply with agreed-upon organizational rules.
These global rules are shaped by legal, security, and interoperability requirements.
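The word "computational" suggests that global rules are expressed as code and checked automatically rather than by manual review. A minimal sketch of that idea, assuming product metadata is available as a dictionary: the two policies, the metadata keys, and the `check_compliance` function are hypothetical examples, not rules taken from any specific organization.

```python
# Each global policy is a function: product metadata -> error message or None.
def require_owner(product):
    if not product.get("owner"):
        return "every data product must name an owner"
    return None


def require_masking_for_personal_data(product):
    if product.get("contains_personal_data") and not product.get("masking_enabled"):
        return "personal data must be masked"
    return None


# Agreed on by the federated governance group and applied to every domain.
GLOBAL_POLICIES = [require_owner, require_masking_for_personal_data]


def check_compliance(product):
    """Run all global policies and collect the violations."""
    violations = []
    for policy in GLOBAL_POLICIES:
        message = policy(product)
        if message is not None:
            violations.append(message)
    return violations


atm_product = {
    "name": "atm_withdrawals_daily",
    "owner": "atm-data-owner@example.com",
    "contains_personal_data": True,
    "masking_enabled": False,
    "refresh_schedule": "hourly",  # local decision, not governed globally
}

print(check_compliance(atm_product))  # prints ['personal data must be masked']
```

The refresh schedule is deliberately not checked: it stands for a local decision that the owning domain makes on its own, while the two policies apply to every product in the mesh.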
Summary
Data Mesh is a concept we will hear more and more about in the near future. It holds great promise as a solution to the limitations of centralized data architectures. I believe that many organizations will gradually move away from data lakes and data warehouses, adopting Data Mesh as a new way of managing and scaling data.