# ChatGPT MCP Security Review

> Security review package for Provalytics MCP access in ChatGPT, including system architecture, data schema, trust boundaries, and what data is and is not exposed.

## Overview

This document is intended for security, data governance, and architecture review teams evaluating the use of Provalytics through a custom MCP connector in ChatGPT.

[Download Security Review PDF](/images/security/chatgpt-mcp-security-review.pdf)

It covers:

* the system architecture for the integration
* the data architecture exposed through the MCP server
* the detailed operating model for how ChatGPT accesses Provalytics data
* the controls and limitations built into the implementation

## Executive summary

The Provalytics ChatGPT integration does **not** send a full database dump to ChatGPT.

Instead, ChatGPT connects to the Prova MCP server and makes **on-demand, read-only tool calls** for specific reporting and modeling questions. Each API key is scoped to a single client workspace, and the MCP server returns only the data that the authenticated key is permitted to access.

Key design points:

* access is **read-only**
* access is **client-scoped**
* tool calls are made **on demand**, not as a persistent bulk sync
* ChatGPT does **not** connect directly to Provalytics databases
* the customer tool surface exposes reporting and model outputs, not administrative controls or write operations

## 1. Detailed description of data used for ChatGPT integration

The ChatGPT integration uses the Prova MCP server as a controlled data-access layer.

When a user asks a question in ChatGPT, ChatGPT calls one or more Provalytics MCP tools, such as:

* `get_incrementality`
* `get_recommendations`
* `get_campaign_performance`
* `get_model_predictions`
* `get_model_statistics`
* `get_days_to_conversion`
* `get_cpm`
* `get_categories`
* `get_methodology`

The MCP server authenticates the user's Provalytics API key, determines the client workspace associated with that key, queries the appropriate reporting or model data source, and returns only the result for that request.

This means the integration behaves like a **query interface** over approved Provalytics data, not a replication process.
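To make the query-interface behavior concrete, the sketch below builds the kind of JSON-RPC message an MCP client sends for a single tool invocation. The `tools/call` method and the `name` / `arguments` parameters follow the MCP specification. The argument names `kpi` and `timeframe` are borrowed from the returned fields listed later in this document and are assumptions about the tool's inputs, not a documented signature.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP tool invocation as a JSON-RPC 2.0 request."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(payload)


# Hypothetical arguments; the real tool inputs may differ.
message = build_tool_call(
    1, "get_incrementality", {"kpi": "revenue", "timeframe": "last_90_days"}
)
```

Each message names exactly one tool and one set of parameters, which is why the response is bounded by that request rather than by the size of the underlying data store.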

### What data is typically returned

The current customer-facing MCP tool set is designed around aggregated reporting and model outputs, including:

* channel-level spend
* channel-level incrementality
* recommendation and forecast outputs
* campaign or hierarchy-level performance rollups
* model validation metrics such as `R²` and `MAPE`
* model-predicted vs actual time series
* CPM and impression-share analysis
* days-to-conversion metrics
* category and subcategory mappings
* static methodology text
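For reviewers unfamiliar with the model validation metrics named in the list above, the sketch below computes `R²` and `MAPE` using their conventional textbook definitions. Whether Provalytics applies exactly these formulas (for example, plain rather than adjusted `R²`) is not stated in this document, so treat the functions as illustrative.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mape_pct(actual, predicted):
    """Mean absolute percentage error, expressed as a percentage."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


# Made-up values for illustration only.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
r2 = r_squared(actual, predicted)
mape = mape_pct(actual, predicted)
```

Both metrics are aggregate summaries of model fit and carry no user-level information.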

### What is not sent by design

The MCP integration is not designed to expose:

* write access into Provalytics
* raw database access
* unrestricted SQL access
* administrative UI actions
* source-system credentials
* Provalytics user passwords
* connector secrets
* cross-client data for a client-scoped key

### PII and user-level data note

The exposed customer tool set is oriented around aggregated reporting and modeled outputs, not user-level identity data.

Typical responses contain channel, campaign, KPI, date, spend, impression, click, forecast, and model-quality fields rather than cookies, device IDs, hashed emails, phone numbers, or person-level event streams.

That said, campaign names, category labels, and other customer-defined taxonomy values are client-authored strings. As a best practice, customers should avoid embedding personal or otherwise sensitive information in naming conventions.
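A customer could audit its own naming conventions before enabling the connector. The sketch below is a minimal, heuristic check, assuming simple email and US-style phone patterns. The patterns and the function name `flag_sensitive_labels` are illustrative and not part of Provalytics; a production check would need broader coverage and will produce some false positives and negatives.

```python
import re

# Heuristic patterns only; they do not cover every email or phone format.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[\s.-]\d{3}[\s.-]\d{4}\b")


def flag_sensitive_labels(labels):
    """Return the labels that appear to contain an email address or phone number."""
    return [
        label
        for label in labels
        if EMAIL_RE.search(label) or PHONE_RE.search(label)
    ]


# Made-up campaign names for illustration.
sample_labels = [
    "Brand_Search_US_2024Q1",
    "Retarget_jane.doe@example.com",
    "Callback 415-555-0134",
]
flagged = flag_sensitive_labels(sample_labels)
```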

## 2. Data schema and data architecture

### Access pattern

The MCP server exposes a controlled tool interface over Provalytics data.

For a customer-scoped key:

1. ChatGPT calls the MCP endpoint
2. The Prova MCP server validates the API key
3. The MCP server resolves the client scope for that key
4. The MCP server calls a specific read-only tool
5. The tool reads from the approved reporting table, restored model bundle, or static methodology content
6. The result is returned to ChatGPT

No general browse-all-data operation is exposed to customer keys.
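The steps above can be sketched as a single request handler. This is a simplified illustration under stated assumptions, not the Prova implementation: `KEY_TABLE`, `REPORTING`, `TOOLS`, and `handle_tool_call` are hypothetical names, and in-memory dictionaries stand in for the key table and reporting layer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ApiKey:
    client_id: str


# Hypothetical in-memory stand-ins for the key table and reporting layer.
KEY_TABLE: Dict[str, ApiKey] = {"key-abc": ApiKey(client_id="client-1")}
REPORTING: Dict[str, List[dict]] = {
    "client-1": [{"channel": "search", "spend": 1200.0}],
    "client-2": [{"channel": "social", "spend": 900.0}],
}


def get_campaign_performance(client_id: str) -> List[dict]:
    # Reads only the rows that belong to the resolved client scope.
    return list(REPORTING.get(client_id, []))


# Only read-only tools are registered; there is no write or browse-all entry.
TOOLS: Dict[str, Callable[[str], List[dict]]] = {
    "get_campaign_performance": get_campaign_performance,
}


def handle_tool_call(api_key: str, tool_name: str) -> List[dict]:
    key = KEY_TABLE.get(api_key)       # step 2: validate the API key
    if key is None:
        raise PermissionError("invalid API key")
    tool = TOOLS.get(tool_name)        # step 4: select a registered tool
    if tool is None:
        raise LookupError(f"unknown tool: {tool_name}")
    return tool(key.client_id)         # steps 3, 5, 6: scoped read, return result
```

In this sketch the client scope is derived from the key on the server side and is never accepted as a caller-supplied parameter, so a request cannot name another client's workspace.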

### Tool surface and source architecture

| MCP tool                   | Data domain                              | Primary source              | Example returned fields                                                                                                                                                           |
| -------------------------- | ---------------------------------------- | --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `get_categories`           | Category mapping / funnel organization   | Provalytics database        | `category`, `subcategory`, mapped channel/campaign combinations                                                                                                                   |
| `get_incrementality`       | Channel contribution above baseline      | Provalytics database        | `kpi`, `kpis_available`, `timeframe`, `channel`, `incremental_units`, `share_pct`, `spend`                                                                                        |
| `get_recommendations`      | Optimizer recommendation output          | Provalytics database        | `scenario`, `forecast_period`, `total_current_spend`, `total_recommended_spend`, `channel`, `current_spend`, `recommended_spend`, `change`, `change_pct`, `inc_share_pct`, `roas` |
| `get_marginal_response`    | Response curves / efficiency comparison  | Provalytics database        | `channel`, `current_spend`, `current_response`, `roas_per_dollar`, `curve_points`                                                                                                 |
| `get_model_statistics`     | Model quality summary                    | Provalytics database        | `model`, `observations`, `r_squared`, `rmse`, `mape_pct`, optional coefficients                                                                                                   |
| `get_model_predictions`    | Predicted vs actual time series          | Provalytics database        | `model`, `dates`, `actual`, `predicted`, `mape_pct`                                                                                                                               |
| `get_campaign_performance` | Channel / campaign hierarchy performance | Provalytics database        | `date_range`, `level`, `channel`, `spend`, `incremental_units`, `impressions`, `clicks`, `roas`, `ctr_pct`, `cpm`                                                                 |
| `get_days_to_conversion`   | Conversion timing by channel             | Provalytics database        | `kpi`, `as_of_date`, `timeframe`, `channel`, `days_to_conversion`, `impressions`                                                                                                  |
| `get_cpm`                  | CPM and impression-share reporting       | Provalytics database        | `date_range`, `blended_cpm`, `total_impressions`, `total_spend`, `daily_trend`, `channel`, `impression_share_pct`, `yoy`                                                          |
| `get_methodology`          | Static model explanation                 | In-code methodology content | methodology sections, summaries, definitions, academic references                                                                                                                 |
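A review team that wants to verify responses against the table above could compare the field names in a response row with an allow-list. The sketch below assumes responses are available as flat dictionaries and uses the field names from two rows of the table. Because the table lists example fields, the allow-list may be incomplete, and the helper name `unexpected_fields` is hypothetical.

```python
# Field names copied from the table above; the real responses may include more.
EXPECTED_FIELDS = {
    "get_campaign_performance": {
        "date_range", "level", "channel", "spend", "incremental_units",
        "impressions", "clicks", "roas", "ctr_pct", "cpm",
    },
    "get_days_to_conversion": {
        "kpi", "as_of_date", "timeframe", "channel",
        "days_to_conversion", "impressions",
    },
}


def unexpected_fields(tool: str, row: dict) -> set:
    """Return the field names in `row` that are not on the allow-list for `tool`."""
    return set(row) - EXPECTED_FIELDS[tool]


# Made-up row with one field that should not appear in an aggregated response.
suspicious = unexpected_fields(
    "get_days_to_conversion",
    {"channel": "search", "days_to_conversion": 3.2, "email": "x@example.com"},
)
```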

### Source-of-truth behavior

The implementation intentionally aligns several MCP tools with the same persistent reporting tables used by the Provalytics dashboard.

Examples:

* `get_incrementality` reads from the Provalytics incrementality reporting layer
* `get_campaign_performance` reads from the Provalytics campaign-performance reporting layer
* `get_days_to_conversion` reads from the Provalytics days-to-conversion reporting layer
* `get_cpm` reads from the Provalytics cost and impressions reporting layer

As a result, the values returned through MCP are intended to match the numbers shown in the dashboard for the same client and time window.

### Data minimization characteristics

The data returned to ChatGPT is limited by:

* the selected tool
* the parameters supplied to that tool
* the client scope attached to the API key
* report visibility configuration where applicable

The MCP server returns the result of the request rather than a broad export of unrelated tables.

## 3. Systems architecture diagram

```mermaid
flowchart LR
    U["Authorized user"] --> CG["ChatGPT workspace"]
    CG --> CC["Custom MCP connector"]
    CC -->|HTTPS + authenticated request| MCP["Prova MCP server"]

    subgraph Provalytics["Provalytics trust boundary"]
        MCP --> APP["Provalytics application layer\nauthentication, access control, request handling"]
    end

    MCP -->|Read-only tool response| CG
```

## Trust boundaries

### ChatGPT boundary

ChatGPT acts as the client application invoking MCP tools. It does not connect directly to Provalytics databases.

### MCP server boundary

The Prova MCP server is the enforcement layer for:

* authentication
* client scoping
* tool selection
* response shaping

### Data-source boundary

Approved reporting tables, model bundles, and static methodology content remain inside the Provalytics environment. ChatGPT receives only the response payload produced by the selected tool call.

## Authentication and authorization controls

### API key model

The MCP server uses user-specific Provalytics API keys for authenticated access.

### Client scoping

Each key is scoped to a single client workspace.

### Revocation

The MCP server rejects revoked keys. A key stops working as soon as it is marked revoked in the key table.
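One way to achieve immediate revocation is to consult the key table on every request instead of caching validation results. The sketch below illustrates that pattern with an in-memory dictionary; the structure of the key table and the function names are assumptions, not a description of the actual Prova storage layer.

```python
# Hypothetical key table; a real deployment would back this with a database.
KEY_TABLE = {"key-abc": {"client_id": "client-1", "revoked": False}}


def is_key_active(api_key: str) -> bool:
    """Look the key up on every call so a revocation applies to the next request."""
    record = KEY_TABLE.get(api_key)
    return record is not None and not record["revoked"]


def revoke(api_key: str) -> None:
    """Mark the key revoked; no cached state needs to be invalidated."""
    KEY_TABLE[api_key]["revoked"] = True


active_before = is_key_active("key-abc")
revoke("key-abc")
active_after = is_key_active("key-abc")
```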

### Rate limiting

The server enforces per-key request rate limits to protect the service boundary.
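This document does not specify the limiting algorithm or thresholds. As an illustration of what a per-key control can look like, the sketch below implements a fixed-window limiter; the class name, the limit, and the window length are all assumptions.

```python
import time


class PerKeyRateLimiter:
    """Fixed-window limiter: at most `limit` requests per key per window."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._windows = {}  # api_key -> (window_start, request_count)

    def allow(self, api_key: str) -> bool:
        now = self.clock()
        start, count = self._windows.get(api_key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the previous window expired; start a new one
        if count >= self.limit:
            return False
        self._windows[api_key] = (start, count + 1)
        return True
```

Because counters are tracked per key, heavy use of one key does not consume the allowance of another.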

### Monitoring

The implementation tracks:

* request count
* last used timestamp
* per-tool request volume
* session count
* session duration
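A minimal sketch of how the per-key counters in this list could be recorded is shown below. It covers request count, last used timestamp, and per-tool volume; session count and duration are omitted for brevity. The `KeyUsage` class is hypothetical and does not describe the actual Prova monitoring schema.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class KeyUsage:
    """Usage counters for a single API key."""

    request_count: int = 0
    last_used: Optional[datetime] = None
    per_tool: Counter = field(default_factory=Counter)

    def record(self, tool_name: str, when: Optional[datetime] = None) -> None:
        # Update all three counters for one tool invocation.
        self.request_count += 1
        self.last_used = when or datetime.now(timezone.utc)
        self.per_tool[tool_name] += 1


usage = KeyUsage()
usage.record("get_cpm")
usage.record("get_cpm")
usage.record("get_categories")
```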

## Transport and interaction model

### In transit

The MCP endpoint is served over `HTTPS`, so traffic between ChatGPT and the MCP server is encrypted with `TLS` in transit.

### Interaction style

The MCP server supports both:

* stateless request/response message handling
* streaming / SSE transport for compatible clients

For ChatGPT security review purposes, the important point is that the server remains a controlled API boundary in either mode.

## Data handling characteristics relevant to approval

### Read-only design

The customer-exposed MCP tool set is read-only. No tool writes back into Provalytics data stores.

### No direct database access from ChatGPT

ChatGPT does not authenticate directly against Provalytics internal data stores.

### No customer-scoped admin surface

Customer keys do not expose administrative controls.

### On-demand responses rather than bulk sync

Data is returned only when a tool is invoked. The integration is not built as a background replication process into ChatGPT.

## Example approval language

If the customer security team wants a concise summary, the following language is accurate:

> The Provalytics ChatGPT integration uses a client-scoped, read-only MCP server. ChatGPT does not connect directly to Provalytics databases and does not receive a full data export. Instead, ChatGPT makes authenticated, on-demand tool calls to the Prova MCP server, which validates the user key, enforces client scoping, queries approved reporting or model-output sources, and returns only the requested result set. Customer-scoped keys cannot modify data and cannot traverse other clients’ workspaces.

## Questions security teams commonly ask

### Does ChatGPT receive a full copy of our Provalytics data?

No. The integration is request-driven. ChatGPT receives only the response payload for the specific MCP tool call that was made.

### Can the connector write back into Provalytics?

No. The current customer tool set is read-only.

### Can one customer key access another client’s data?

No. Customer keys are client-scoped.

### Is raw source-system credential material exposed?

No. The customer MCP tool surface is designed around reporting and model outputs, not connector secret retrieval.

### Are model methodology explanations also available?

Yes. `get_methodology` returns static methodology content and does not require client-specific reporting data.

## Recommended next step

For formal security review, this document should be paired with:

* the ChatGPT connector setup guide
* the MCP overview page
* any customer-specific internal policy language around AI usage and retention
