Building a Data Mesh with Starburst

Sanjeev Varma – CEO / Alex Scroggins – Solution Architect / Jasmohan Singh Narula – Solution Architect

Data mesh is a new, decentralized approach to data that lets end-users access data where it lives, without requiring a data lake or data warehouse. Domain-specific teams manage their data and serve it as a product for others to consume. The objective is to let teams create data products from virtually any data source while minimizing intervention from data engineers.

Data mesh rests on four principles:

  1. Domain-Oriented Ownership
  2. Data as a Product
  3. Self-Service Data Infrastructure
  4. Federated Computational Governance

Starburst can be used to build a data mesh. The following sections outline how it aligns with each of the four principles.

Domain-Oriented Ownership

In a data mesh, data teams are organized by domain, that is, by subject area. Teams publish data products that other teams can access and use to derive new data products of their own. Starburst's goal is to let teams spend less effort building infrastructure and data pipelines to serve data products, and more on using familiar tools such as SQL to prepare data products for end-users.

To achieve this, Starburst provides a large set of connectors that allow each domain to connect to its data through a SQL query interface, wherever that data lives and whatever its format.

Figure 1 shows various example data sources that can be accessed from Starburst.

Figure 1: Example data sources in Starburst

Data as a Product

After you connect to a data source, Starburst lets you curate data products from it for other users to access.

Users can browse the published data products as shown in Figure 2.

Figure 2: Example data products created from data sources in Starburst
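
As a rough illustration of what curating a data product can involve, the sketch below publishes a cleaned-up view over a raw table. It makes two loud assumptions: that a curated dataset can be expressed as a SQL view, and that SQLite (from the Python standard library) is an acceptable stand-in for a real Starburst cluster. The table, column, and view names are hypothetical.

```python
import sqlite3

# SQLite stands in for the query engine here; this is NOT the Starburst API.
conn = sqlite3.connect(":memory:")

# Hypothetical raw source table owned by a domain team.
conn.execute(
    "CREATE TABLE raw_orders ("
    "order_id INTEGER, customer_email TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [
        (1, "a@example.com", 120.0, "complete"),
        (2, "b@example.com", 80.0, "cancelled"),
        (3, "c@example.com", 45.5, "complete"),
    ],
)

# The curated dataset: completed orders only, with the email column left out.
conn.execute(
    "CREATE VIEW completed_orders AS "
    "SELECT order_id, amount FROM raw_orders WHERE status = 'complete'"
)

# Consumers query the view, not the raw table.
cursor = conn.execute("SELECT * FROM completed_orders ORDER BY order_id")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```

The point of the view is that consumers see only the columns and rows the owning domain chooses to expose, while the raw table stays under that domain's control.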

Self-Service Data Infrastructure

Starburst’s SQL query interface allows users to discover, understand, and evaluate the trustworthiness of data products.

Figure 3 shows an example of using the SQL query interface to query an Amazon S3-based data product.

Figure 3: Querying an S3 file in Starburst

The SQL query interface also lets you join data products built on different technologies. For example, Figure 4 joins a PostgreSQL-based data product with an Amazon S3-based data product on a common field. The result of this join can be treated as a new, derived data product and registered in Starburst as well.

Figure 4: Deriving a new data product from different technologies using Starburst
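
The shape of such a cross-source join can be sketched in a few lines. The example below is only an analogy: it uses SQLite from the Python standard library, with two attached in-memory databases playing the role of two data sources. The names postgres_sales and s3_lake, and the customers and orders tables, are hypothetical and do not come from a real Starburst deployment.

```python
import sqlite3

# SQLite stand-in: each attached database plays the role of one data source.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS postgres_sales")  # hypothetical name
conn.execute("ATTACH DATABASE ':memory:' AS s3_lake")         # hypothetical name

# One table per "source", sharing the customer_id field.
conn.execute(
    "CREATE TABLE postgres_sales.customers (customer_id INTEGER, name TEXT)"
)
conn.execute(
    "CREATE TABLE s3_lake.orders "
    "(order_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO postgres_sales.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO s3_lake.orders VALUES (?, ?, ?)",
    [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)],
)

# A single SQL statement joins across both sources on the common field.
JOIN_SQL = """
SELECT c.name, SUM(o.amount) AS total_spend
FROM postgres_sales.customers AS c
JOIN s3_lake.orders AS o ON o.customer_id = c.customer_id
GROUP BY c.name
ORDER BY c.name
"""
rows = conn.execute(JOIN_SQL).fetchall()
print(rows)
```

The query names each table by its source, so the join reads like ordinary SQL even though the two tables live in separate databases. The aggregated result is the kind of output that could serve as a derived data product.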

Federated Computational Governance

Data mesh proposes a federated model for data governance that focuses on shared responsibility between the domains and the central IT organization in order to adhere to governance, risk, and compliance concerns while allowing adequate autonomy for the domains.

Starburst provides connectors and access to various data governance and data catalog tools such as Collibra and Alation to help users discover, understand, and evaluate the trustworthiness of data products.

Starburst also significantly reduces the need to copy data between systems, because its query engine can read across data sources and can replace or reduce a traditional ETL/ELT pipeline. Every copy of a dataset needs its entitlements reapplied, and each one is another opportunity for a data breach. With Starburst that risk is reduced simply because fewer copies exist: data is mostly queried at the source. This concept, known as data minimization, makes data privacy, security, and governance more achievable goals for organizations that adopt Starburst together with data mesh.

Sources

https://www.starburst.io/resources/starburst-data-products/

https://blog.starburst.io/data-mesh-and-starburst-domain-oriented-ownership-architecture

https://blog.starburst.io/data-mesh-and-starburst-data-as-a-product

https://blog.starburst.io/data-mesh-starburst-self-service-data-infrastructure

https://blog.starburst.io/data-mesh-federated-computational-governance
