MCP server for discovering data products and executing queries with governance in Data Mesh Manager

Data Product MCP

A Model Context Protocol (MCP) server for discovering data products and requesting access in Data Mesh Manager, and executing queries on the data platform to access business data.

https://github.com/user-attachments/assets/8c8cd04d-33f6-4e33-856f-6141a41af2bb

Concept

Idea: Enable AI agents to find and access any data product for semantic business context while enforcing data governance policies.

or, if you prefer:

Enable AI to answer any business question.

Data products are managed, high-quality business data sets that are shared with other teams within an organization and specified by data contracts. Data contracts describe the structure, semantics, quality, and terms of use. Data products provide the semantic context AI needs to understand not just what data exists, but what it means and how to use it correctly. We use Data Mesh Manager as a data product marketplace to search for available data products and to evaluate whether they are relevant for the task by analyzing their metadata.

Once a data product is identified, data governance ensures that access to the data product is controlled, that queries are in line with the data contract's terms of use, and that they comply with the organization's global policies. If necessary, the AI agent can request access to the data product's output port, which may require manual approval from the data product owner.

Finally, the LLM can generate SQL queries based on the data model descriptions and semantics in the data contract. The SQL queries are executed with security guardrails in place to ensure that no sensitive data is misused and that attack vectors (such as prompt injections) are mitigated. The results are returned to the AI agent, which can then use them to answer the original business question.

Steps:

  1. Discovery: Find relevant data products for the task in the data product marketplace.
  2. Governance: Check and request access to data products.
  3. Query: Use platform-specific MCP servers to execute SQL statements.

Data Mesh Manager serves as the central data product marketplace and governance layer, providing metadata, access controls, and data contracts for all data products in your organization.

Data Platforms (Snowflake, Databricks, etc.) host the actual data and execute queries. The MCP server connects to these platforms to run SQL queries against the data products you have access to.
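The three steps can be sketched as a small driver function. This is a minimal illustration, not the server's implementation: `call_tool` is a hypothetical callable that invokes one MCP tool and returns its decoded result, and the result field names (`id`, `output_ports`, `access_status`, `status`) as well as the value `"active"` are assumptions made for this example. Only the tool names and their input names are taken from the Tools section; the actual response shapes may differ.

```python
from typing import Any, Callable, Optional

# Hypothetical: a function that invokes one MCP tool by name with its
# arguments and returns the decoded result.
ToolCaller = Callable[[str, dict], Any]


def run_workflow(call_tool: ToolCaller, search_term: str,
                 purpose: str, sql: str) -> Optional[Any]:
    """Discovery -> governance -> query. Returns None if a step is blocked."""
    # 1. Discovery: search the data product marketplace.
    products = call_tool("dataproduct_search", {"search_term": search_term})
    if not products:
        return None
    product_id = products[0]["id"]  # assumed result field

    # 2. Governance: inspect the product, request access if not yet granted.
    product = call_tool("dataproduct_get", {"data_product_id": product_id})
    port = product["output_ports"][0]  # assumed result field
    ids = {"data_product_id": product_id, "output_port_id": port["id"]}
    if port.get("access_status") != "active":  # assumed field and value
        request = call_tool("dataproduct_request_access",
                            {**ids, "purpose": purpose})
        if request.get("status") != "active":  # assumed value
            return None  # e.g. waiting for the owner's manual approval

    # 3. Query: execute the SQL on the output port.
    return call_tool("dataproduct_query", {**ids, "query": sql})
```

Passing the tool caller in as a parameter keeps the sketch independent of any particular MCP client library. In practice the AI agent performs these steps itself, choosing the data product and writing the SQL from the data contract.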

Tools

  1. dataproduct_search

    • Search data products based on the search term. Uses multiple search approaches (list, semantic search) for comprehensive results. Only returns active data products.
    • Optional inputs:
      • search_term (string): Search term to filter data products. Searches in the ID, title, and description. Multiple search terms are supported, separated by spaces.
    • Returns: Structured list of data products with their ID, name and description, owner information, and source of the result.
  2. dataproduct_get

    • Get a data product by its ID. The data product contains all its output ports and server information. The response includes access status for each output port and inlines any data contracts.
    • Required inputs:
      • data_product_id (string): The data product ID.
    • Returns: Data product details with enhanced output ports, including access status and inlined data contracts
  3. dataproduct_request_access

    • Request access to a specific output port of a data product. This creates an access request. Based on the data product configuration, purpose, and data governance rules, the access will be automatically granted, or it will be reviewed by the data product owner.
    • Required inputs:
      • data_product_id (string): The data product ID.
      • output_port_id (string): The output port ID.
      • purpose (string): The specific purpose for which the user needs the data, that is, what they intend to do with it and why they need access. If the access request must be approved by the data owner, the owner uses the purpose to decide whether access is justified from a business, technical, and governance point of view.
    • Returns: Access request details including access_id, status, and approval information
  4. dataproduct_query

    • Execute a SQL query on a data product's output port. This tool connects to the underlying data platform and executes the provided SQL query. You must have active access to the output port to execute queries.
    • Required inputs:
      • data_product_id (string): The data product ID.
      • output_port_id (string): The output port ID.
      • query (string): The SQL query to execute.
    • Returns: Query results as structured data (limited to 100 rows)
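On the wire, an MCP client invokes each of these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below only builds such request messages from the documented input names; an MCP client normally does this for you. The helper name `tool_call` and the IDs `orders`, `snowflake_port`, and the table name `orders` are hypothetical placeholders, not values from this project.

```python
import itertools
import json

# JSON-RPC requires an id per request; a simple counter is enough here.
_request_ids = itertools.count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for one MCP tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# One request per tool, using the input names documented above.
# All ID values are hypothetical placeholders.
search = tool_call("dataproduct_search", {"search_term": "customer orders"})
get = tool_call("dataproduct_get", {"data_product_id": "orders"})
request_access = tool_call("dataproduct_request_access", {
    "data_product_id": "orders",
    "output_port_id": "snowflake_port",
    "purpose": "Analyze monthly order volume for a sales report.",
})
query = tool_call("dataproduct_query", {
    "data_product_id": "orders",
    "output_port_id": "snowflake_port",
    "query": "SELECT COUNT(*) FROM orders",
})
```

Because `dataproduct_query` returns at most 100 rows, aggregating in SQL (as in the last request) is usually more useful than fetching raw rows.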

Configuration

Add this entry to your MCP client configuration:

{
  "mcpServers": {
    "dataproduct": {
      "command": "uvx",
      "args": ["dataproduct_mcp"],
      "env": {
        "DATAMESH_MANAGER_API_KEY": "dmm_live_user_...",
        "SNOWFLAKE_USER": "",
        "SNOWFLAKE_PASSWORD": "",
        "SNOWFLAKE_ROLE": "",
        "SNOWFLAKE_WAREHOUSE": "COMPUTE_WH",
        "DATABRICKS_HOST": "adb-xxx.azuredatabricks.net",
        "DATABRICKS_HTTP_PATH": "/sql/1.0/warehouses/xxx",
        "DATABRICKS_CLIENT_ID": "",
        "DATABRICKS_CLIENT_SECRET": ""
      }
    }
  }
}

This is the format for Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json); other MCP clients have similar configuration options.

In Data Mesh Manager, create an API Key with scope "User (personal access token)".

Add the properties for Snowflake, Databricks, etc. as needed.

(Yes, we will work on OAuth2-based authentication to get rid of these access tokens.)
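If the client configuration file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. The following is a minimal sketch of such a merge using only the Python standard library; the function name `add_dataproduct_server` is hypothetical, and the sketch assumes the file, if it exists, contains valid JSON.

```python
import json
from pathlib import Path


def add_dataproduct_server(config_path: Path, env: dict) -> dict:
    """Add or replace the "dataproduct" entry, keeping all other settings."""
    # Start from the existing configuration if there is one.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["dataproduct"] = {
        "command": "uvx",
        "args": ["dataproduct_mcp"],
        "env": env,  # API key and platform credentials, as in the entry above
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

For Claude Desktop on macOS, `config_path` would be `Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"`, and `env` would contain `DATAMESH_MANAGER_API_KEY` plus the Snowflake or Databricks variables you need. Back up the file before overwriting it.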

Supported Server Types

The dataproduct_query tool supports executing queries on data products. The MCP client formulates SQL queries based on the data contract with its data model structure and semantics.

The following server types are currently supported out-of-the-box:

| Server Type | Status      | Notes |
|-------------|-------------|-------|
| Snowflake   | Supported   | Requires SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, SNOWFLAKE_WAREHOUSE, SNOWFLAKE_ROLE environment variables |
| Databricks  | Supported   | Requires DATABRICKS_HOST, DATABRICKS_HTTP_PATH, DATABRICKS_CLIENT_ID, DATABRICKS_CLIENT_SECRET environment variables |
| S3          | Coming soon | Implemented through DuckDB client |
| BigQuery    | Coming soon | |
| Fabric      | Coming soon | |

Note: For other data platform types (e.g., BigQuery, Redshift, PostgreSQL), add platform-specific MCP servers to your MCP client.
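A quick way to check a setup is to compare the environment against the variables listed in the table above. The sketch below is a hypothetical helper, not part of the server; it treats empty strings as missing, since the sample configuration uses empty placeholders.

```python
import os
from typing import List, Mapping, Optional

# Required environment variables per server type, as listed in the table.
REQUIRED_ENV = {
    "snowflake": [
        "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD",
        "SNOWFLAKE_WAREHOUSE", "SNOWFLAKE_ROLE",
    ],
    "databricks": [
        "DATABRICKS_HOST", "DATABRICKS_HTTP_PATH",
        "DATABRICKS_CLIENT_ID", "DATABRICKS_CLIENT_SECRET",
    ],
}


def missing_env(server_type: str,
                env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[server_type.lower()]
            if not env.get(name)]
```

Calling `missing_env("snowflake")` before starting the MCP client lists any variables that still need values; an empty list means the documented requirements for that server type are met.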

Contributing

See CONTRIBUTING.md for development setup and contribution guidelines.

Credits

Maintained by Simon Harrer, André Deuerling, and Jochen Christ.
