ChatGPT MCP Security Review

Overview

This document is intended for security, data governance, and architecture review teams evaluating the use of Provalytics through a custom MCP connector in ChatGPT. It covers:

the system architecture for the integration
the data architecture and data schema exposed through the MCP server
the detailed operating model for how ChatGPT accesses Provalytics data
the controls and limitations built into the implementation

Executive summary

The Provalytics ChatGPT integration does not send a full database dump to ChatGPT. Instead, ChatGPT connects to the Prova MCP server and makes on-demand, read-only tool calls for specific reporting and modeling questions. Each API key is scoped to a single client workspace, and the MCP server returns only the data that the authenticated key is permitted to access.

Key design points:

access is read-only
access is client-scoped
tool calls are made on demand, not as a persistent bulk sync
ChatGPT does not connect directly to Provalytics databases
the customer tool surface exposes reporting and model outputs, not administrative controls or write operations

1. Detailed description of data used for ChatGPT integration

The ChatGPT integration uses the Provalytics MCP server as a controlled data-access layer. When a user asks a question in ChatGPT, ChatGPT calls one or more Provalytics MCP tools, such as:

get_incrementality
get_recommendations
get_campaign_performance
get_model_predictions
get_model_statistics
get_days_to_conversion
get_cpm
get_categories
get_methodology

The MCP server authenticates the user’s Provalytics API key, determines the client workspace associated with that key, queries the appropriate reporting or model data source, and returns only the result for that request. This means the integration behaves like a query interface over approved Provalytics data, not a replication process.
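To make the "query interface, not replication" point concrete, the sketch below builds the kind of message an MCP client sends for a single tool call. The JSON-RPC 2.0 envelope and the `tools/call` method come from the MCP specification; the argument names (`kpi`, `timeframe`) and their values are hypothetical, since the actual input parameters are defined by each tool's schema.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical arguments: the real parameter names come from the tool's schema.
request = build_tool_call(
    1, "get_incrementality", {"kpi": "revenue", "timeframe": "last_90_days"}
)
print(json.dumps(request, indent=2))
```

Each question in ChatGPT results in one or more discrete messages of this shape, and each response carries only the result for that one call.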

What data is typically returned

The current customer-facing MCP tool set is designed around aggregated reporting and model outputs, including:

channel-level spend
channel-level incrementality
recommendation and forecast outputs
campaign or hierarchy-level performance rollups
model validation metrics such as R² and MAPE
model-predicted vs actual time series
CPM and impression-share analysis
days-to-conversion metrics
category and subcategory mappings
static methodology text

What is not sent by design

The MCP integration is not designed to expose:

write access into Provalytics
raw database access
unrestricted SQL access
administrative UI actions
source-system credentials
Provalytics user passwords
connector secrets
cross-client data for a client-scoped key

PII and user-level data note

The exposed customer tool set is oriented around aggregated reporting and modeled outputs, not user-level identity data. Typical responses contain channel, campaign, KPI, date, spend, impression, click, forecast, and model-quality fields rather than cookies, device IDs, hashed emails, phone numbers, or person-level event streams. That said, campaign names, category labels, and other customer-defined taxonomy values are client-authored strings. As a best practice, customers should avoid embedding personal or otherwise sensitive information in naming conventions.
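As an illustration of the naming best practice, a customer could screen campaign names and category labels before they are loaded. The sketch below is deliberately narrow and only flags email-like strings; the function name and sample labels are hypothetical, and this is not a Provalytics feature.

```python
import re
from typing import Iterable, List

# Simple email-like pattern; intentionally conservative to limit false positives.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def flag_names(names: Iterable[str]) -> List[str]:
    """Return the names that contain an email-like substring."""
    return [name for name in names if EMAIL_RE.search(name)]


# Hypothetical taxonomy values for illustration only.
flagged = flag_names(["Q3_Search_Brand", "retarget_jane.doe@example.com"])
```

A real screening step would likely cover more patterns (phone numbers, customer IDs), tuned to the customer's own naming conventions.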

2. Data schema and data architecture

Access pattern

The MCP server exposes a controlled tool interface over Provalytics data. For a customer-scoped key:

ChatGPT calls the MCP endpoint
The Prova MCP server validates the API key
The MCP server resolves the client scope for that key
The MCP server calls a specific read-only tool
The tool reads from the approved reporting table, restored model bundle, or static methodology content
The result is returned to ChatGPT

No general browse-all-data operation is exposed to customer keys.
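The access pattern above can be sketched as a small dispatch function. This is a minimal illustration under stated assumptions, not the actual server code: the in-memory key map, the sample rows, and the function names are all hypothetical. The point it shows is that the client scope is derived from the key, never from caller-supplied arguments, and that only registered read-only tools can be invoked.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-ins for the key store and the reporting layer.
KEY_TO_CLIENT: Dict[str, str] = {"prova_example_key": "client_a"}
REPORTING_ROWS: Dict[str, list] = {
    "client_a": [{"channel": "search", "spend": 1000.0}],
    "client_b": [{"channel": "social", "spend": 2500.0}],
}


def get_campaign_performance(client_id: str, **_: Any) -> list:
    """Read-only tool: returns rows for the resolved client only."""
    return list(REPORTING_ROWS.get(client_id, []))


# Only tools registered here can be called; there is no generic query tool.
READ_ONLY_TOOLS: Dict[str, Callable[..., Any]] = {
    "get_campaign_performance": get_campaign_performance,
}


def handle_call(token: str, tool_name: str, arguments: dict) -> dict:
    client_id = KEY_TO_CLIENT.get(token)  # validate key, resolve client scope
    if client_id is None:
        return {"error": "unauthorized"}
    tool = READ_ONLY_TOOLS.get(tool_name)  # select a specific read-only tool
    if tool is None:
        return {"error": "unknown tool"}
    # Drop any caller-supplied client_id so scope always comes from the key.
    safe_args = {k: v for k, v in arguments.items() if k != "client_id"}
    return {"result": tool(client_id=client_id, **safe_args)}
```

In this sketch, a call that tries to pass `client_id="client_b"` with a key bound to `client_a` still receives only `client_a` rows.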

Tool surface and source architecture

MCP tool	Data domain	Primary source	Example returned fields
`get_categories`	Category mapping / funnel organization	Provalytics database	`category`, `subcategory`, mapped channel/campaign combinations
`get_incrementality`	Channel contribution above baseline	Provalytics database	`kpi`, `kpis_available`, `timeframe`, `channel`, `incremental_units`, `share_pct`, `spend`
`get_recommendations`	Optimizer recommendation output	Provalytics database	`scenario`, `forecast_period`, `total_current_spend`, `total_recommended_spend`, `channel`, `current_spend`, `recommended_spend`, `change`, `change_pct`, `inc_share_pct`, `roas`
`get_marginal_response`	Response curves / efficiency comparison	Provalytics database	`channel`, `current_spend`, `current_response`, `roas_per_dollar`, `curve_points`
`get_model_statistics`	Model quality summary	Provalytics database	`model`, `observations`, `r_squared`, `rmse`, `mape_pct`, optional coefficients
`get_model_predictions`	Predicted vs actual time series	Provalytics database	`model`, `dates`, `actual`, `predicted`, `mape_pct`
`get_campaign_performance`	Channel / campaign hierarchy performance	Provalytics database	`date_range`, `level`, `channel`, `spend`, `incremental_units`, `impressions`, `clicks`, `roas`, `ctr_pct`, `cpm`
`get_days_to_conversion`	Conversion timing by channel	Provalytics database	`kpi`, `as_of_date`, `timeframe`, `channel`, `days_to_conversion`, `impressions`
`get_cpm`	CPM and impression-share reporting	Provalytics database	`date_range`, `blended_cpm`, `total_impressions`, `total_spend`, `daily_trend`, `channel`, `impression_share_pct`, `yoy`
`get_methodology`	Static model explanation	In-code methodology content	methodology sections, summaries, definitions, academic references

Source-of-truth behavior

The implementation intentionally aligns several MCP tools with the same persistent reporting tables used by the Provalytics dashboard. Examples:

get_incrementality reads from the Provalytics incrementality reporting layer
get_campaign_performance reads from the Provalytics campaign-performance reporting layer
get_days_to_conversion reads from the Provalytics days-to-conversion reporting layer
get_cpm reads from the Provalytics cost and impressions reporting layer

As a result, the values returned through MCP are intended to match the numbers shown in the dashboard for the same client and time window.

Internal-only separation

The MCP server contains an internal-only tool called list_clients, but that tool is restricted to internal-scoped keys and is not available to customer-scoped keys. For customer ChatGPT integrations, the relevant operating assumption is:

one user key
one client scope
no cross-client listing or traversal
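The separation between internal and customer keys can be pictured as a scope check applied before any tool runs. The sketch below is illustrative only; the scope string values `"internal"` and `"customer"` are assumptions, as the document names the two scopes but not how they are stored.

```python
# Tools restricted to internal-scoped keys (list_clients is named in this document).
INTERNAL_ONLY_TOOLS = {"list_clients"}


def is_tool_allowed(scope: str, tool_name: str) -> bool:
    """Return True if a key with the given scope may call the tool."""
    if tool_name in INTERNAL_ONLY_TOOLS:
        return scope == "internal"  # assumed scope value
    return True
```

With this check, a customer-scoped key asking for `list_clients` is refused before any data source is touched.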

Data minimization characteristics

The data returned to ChatGPT is limited by:

the selected tool
the parameters supplied to that tool
the client scope attached to the API key
report visibility configuration where applicable

The MCP server returns the result of the request rather than a broad export of unrelated tables.

3. Systems architecture diagram

Trust boundaries

ChatGPT boundary

ChatGPT acts as the client application invoking MCP tools. It does not connect directly to Provalytics databases.

MCP server boundary

The Prova MCP server is the enforcement layer for:

authentication
client scoping
tool selection
rate limiting
response shaping

Data-source boundary

Approved reporting tables, model bundles, and static methodology content remain inside the Provalytics environment. ChatGPT receives only the response payload produced by the selected tool call.

Authentication and authorization controls

API key model

The MCP server accepts credentials via:

Authorization: Bearer <token>
?token=<token> query parameter

The query parameter path exists to support connector environments such as Claude Desktop. For ChatGPT, the expected pattern is token-based authentication configured in the connector.
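A minimal sketch of how a server might read the credential from either location follows. The function name, the preference for the header over the query parameter, and the placeholder URL are assumptions for illustration; the document does not state which source takes precedence. Header lookup here is case-sensitive for simplicity.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit


def extract_token(headers: Mapping[str, str], url: str) -> Optional[str]:
    """Return the bearer token from the header, else from ?token=, else None."""
    auth = headers.get("Authorization", "")
    scheme, _, value = auth.partition(" ")
    if scheme.lower() == "bearer" and value.strip():
        return value.strip()
    # Fall back to the query parameter (assumed ordering).
    tokens = parse_qs(urlsplit(url).query).get("token")
    return tokens[0] if tokens else None
```

Reviewers may want to note that tokens carried in a query string can end up in URL logs, which is one reason header-based authentication is generally preferred where the client supports it.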

Key format and storage

The implementation validates that keys use the prova_ prefix and performs a SHA-256 hash lookup against the stored key record. This means the server checks a hashed representation of the key rather than retrieving a stored plaintext key value from the database.

Client scoping

The key record includes:

key_id
client_id
scope
name

For customer use, the important field is client_id, which binds the key to a single client workspace.

Revocation

Revoked keys are rejected by the MCP server and stop working immediately once marked revoked in the key table.
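The key handling described in the preceding subsections (prefix check, SHA-256 lookup, client binding, revocation) can be sketched as follows. The in-memory table, the hex digest encoding, and the `revoked` flag are assumptions; only the `prova_` prefix, the use of SHA-256, and the `key_id`, `client_id`, `scope`, and `name` fields come from this document.

```python
import hashlib
from typing import Dict, Optional

KEY_PREFIX = "prova_"


def hash_key(api_key: str) -> str:
    """Hash the presented key; only this digest is compared against storage."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


# Hypothetical key table, indexed by digest. No plaintext key is stored.
KEY_TABLE: Dict[str, dict] = {
    hash_key("prova_example_key"): {
        "key_id": "k1",
        "client_id": "client_a",
        "scope": "customer",
        "name": "demo key",
        "revoked": False,  # assumed representation of revocation
    },
}


def lookup_key(api_key: str) -> Optional[dict]:
    """Return the key record, or None if malformed, unknown, or revoked."""
    if not api_key.startswith(KEY_PREFIX):
        return None
    record = KEY_TABLE.get(hash_key(api_key))
    if record is None or record["revoked"]:
        return None
    return record
```

Because the revocation flag is checked on every lookup in this sketch, marking a key revoked takes effect on its next request.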

Rate limiting

The server enforces a per-key rate limit of 60 requests per minute.
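One common way to enforce a limit of this kind is a sliding-window counter per key. The document states the limit but not the algorithm, so the sketch below is an assumption about one possible approach, with a hypothetical class name and an injectable clock to make it testable.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerKeyRateLimiter:
    """Sliding-window limiter: at most `limit` requests per key per window."""

    def __init__(
        self,
        limit: int = 60,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key_id: str) -> bool:
        now = self.clock()
        hits = self._hits[key_id]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Because counters are held per key, one key exhausting its budget does not affect requests made with a different key.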

Monitoring

The implementation tracks:

request count
last used timestamp
per-tool request volume
session count
session duration
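A per-key usage record covering the first three items in the list above might look like the sketch below. The class and field names are hypothetical, and session count and duration are omitted because they depend on how the transport defines a session.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class KeyUsage:
    request_count: int = 0
    last_used: Optional[datetime] = None
    per_tool: Counter = field(default_factory=Counter)

    def record(self, tool_name: str, when: Optional[datetime] = None) -> None:
        """Update counters for one tool call made with this key."""
        self.request_count += 1
        self.last_used = when or datetime.now(timezone.utc)
        self.per_tool[tool_name] += 1
```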

Transport and interaction model

In transit

The MCP endpoint is served over HTTPS. The live endpoint supports TLS 1.2 and TLS 1.3. TLS 1.0 and TLS 1.1 are not accepted by the endpoint.
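A reviewer who wants to confirm the negotiated protocol from the client side could use a check along these lines. It is a sketch using Python's standard `ssl` module; the function names are hypothetical, and the endpoint hostname is left for the caller to supply because it is not stated in this document.

```python
import socket
import ssl
from typing import Optional


def make_client_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def negotiated_tls_version(host: str, port: int = 443) -> Optional[str]:
    """Connect to the host and return the negotiated protocol, e.g. 'TLSv1.3'."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Calling `negotiated_tls_version` with the MCP endpoint's hostname should report TLS 1.2 or TLS 1.3 if the endpoint behaves as described.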

Interaction style

The MCP server supports both:

stateless request/response message handling
streaming / SSE transport for compatible clients

For ChatGPT security review purposes, the important point is that the server remains a controlled API boundary in either mode.

Data handling characteristics relevant to approval

Read-only design

The customer-exposed MCP tool set is read-only. No tool writes back into Provalytics data stores.

No direct database access from ChatGPT

ChatGPT does not authenticate directly against Provalytics internal data stores.

No customer-scoped admin surface

Customer keys do not expose internal-only client enumeration or administrative controls.

On-demand responses rather than bulk sync

Data is returned only when a tool is invoked. The integration is not built as a background replication process into ChatGPT.

Example approval language

If the customer security team wants a concise summary, the following language is accurate:

The Provalytics ChatGPT integration uses a client-scoped, read-only MCP server. ChatGPT does not connect directly to Provalytics databases and does not receive a full data export. Instead, ChatGPT makes authenticated, on-demand tool calls to the Prova MCP server, which validates the user key, enforces client scoping, queries approved reporting or model-output sources, and returns only the requested result set. Customer-scoped keys cannot modify data and cannot traverse other clients’ workspaces.

Questions security teams commonly ask

Does ChatGPT receive a full copy of our Provalytics data?

No. The integration is request-driven. ChatGPT receives only the response payload for the specific MCP tool call that was made.

Can the connector write back into Provalytics?

No. The current customer tool set is read-only.

Can one customer key access another client’s data?

No. Customer keys are client-scoped.

Is raw source-system credential material exposed?

No. The customer MCP tool surface is designed around reporting and model outputs, not connector secret retrieval.

Are model methodology explanations also available?

Yes. get_methodology returns static methodology content and does not require client-specific reporting data.

Recommended next step

For formal security review, this document should be paired with:

the ChatGPT connector setup guide
the MCP overview page
any customer-specific internal policy language around AI usage and retention