The AI Data Stack

Data infrastructure that leverages human context across all models and AI modalities

Chapter 01

Two Environments, Two Trust Models

When we talk about "AI for analytics," we're actually talking about two completely different problems that happen to share some vocabulary. Getting them confused is why most solutions feel wrong for at least one use case.

🔬 The IDE for Analytics

Exploratory mode: grad students helping experts

  • User is a sophisticated analyst or data scientist
  • Knows what questions they're asking and why
  • Can evaluate whether the agent's approach makes sense
  • Wants the agent to surface possibilities, try things
  • Comfortable dipping into raw data if that's where the signal is

📊 The Dashboard Companion

Constrained mode: a translator for a curated view

  • User is a business stakeholder looking at a dashboard
  • Trusts the dashboard (someone else built it)
  • Wants to understand what they're seeing, not go deeper
  • Should not get answers that contradict or exceed the view
  • Agent is a translator, not an explorer

When you're working closely with the model, you want exploration and novelty. You want discovery of new facts, which inherently comes with risk of hallucination and incorrect conclusions. In many other workflows, you just want answers.

The key distinction

These two environments imply different trust boundaries, not just different prompts. The exploratory agent's ceiling is human judgment. The constrained agent's ceiling is the dashboard itself.

Chapter 02

The Toolbag

The AI Data Stack provides four categories of infrastructure that any agent can consume. Think of these as the primitives: the things an agent needs to know to make good decisions about data, or to have its decisions audited.

[Diagram: three user personas sit above the model layer, which consumes the AI Data Stack]

  • 🔬 Data Scientist: exploratory, full access
  • 📊 Business User: dashboard-scoped
  • 🤖 Automated Pipeline: AAA metrics only

Model layer: any LLM or agent (Claude, GPT, Gemini, custom agents)
Stack layer: Trust Ratings · Composition Rules · SQL Compilation · Lineage

The stack sits beneath the model. Different users get different experiences, but the infrastructure is the same. The model consumes the stack; the user's context determines how much of it surfaces.

Trust Ratings

Not all data is created equal. A governed metric with clear ownership and lineage is fundamentally different from a raw log table someone created for debugging. The agent needs to know the difference.

  • AAA · Governed Metrics (deterministic): fully governed, auditable, repeatable, with clear ownership, stable definitions, and strong typing.
  • AA · Semantic Models (structured): curated dimensions and relationships; structured, but less strictly governed.
  • A · Raw Tables & Logs (exploratory): lowest trust; may be sampled, ephemeral, or schema-unstable.
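A sketch of the simplest way an agent might consume these ratings (the catalog entries and tier ranks here are illustrative, not a real API): given several candidate sources, prefer the highest-rated one and report the rating alongside the answer.

# Hypothetical trust-tier lookup; tier names mirror the ratings above.
TIER_RANK = {"AAA": 3, "AA": 2, "A": 1}

# Illustrative catalog entries: source name -> trust rating.
CATALOG = {
    "metrics.weekly_active_users": "AAA",
    "semantic.user_activity": "AA",
    "raw.event_logs": "A",
}

def pick_source(candidates):
    """Prefer the highest-trust candidate; return it with its rating."""
    best = max(candidates, key=lambda name: TIER_RANK[CATALOG[name]])
    return best, CATALOG[best]

source, rating = pick_source(["raw.event_logs", "metrics.weekly_active_users"])
print(f"Using {source} ({rating})")  # Using metrics.weekly_active_users (AAA)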

Composition Rules

This is where most semantic layers fail. They tell you what things are, but not what combinations are valid. An agent that doesn't know composition rules will happily join a sampled table to an unsampled table and give you nonsense with confidence.

Composition rules encode things like:

  • Which tables share a grain and can be joined without double counting
  • Which metrics are additive, and across which dimensions (revenue isn't additive across currencies)
  • Which sources are sampled, and therefore can't be combined with unsampled data without normalization
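Once those properties are machine-readable, the sampled-join failure above becomes a cheap mechanical check. A toy sketch; the table properties and names are hypothetical.

# Illustrative table properties; in practice these live in the metadata catalog.
TABLE_PROPS = {
    "clickstream_sample": {"sampled": True, "grain": "event"},
    "orders": {"sampled": False, "grain": "order"},
}

def join_is_safe(left, right):
    """Block joins that mix sampled with unsampled tables, or mismatched grains."""
    l, r = TABLE_PROPS[left], TABLE_PROPS[right]
    if l["sampled"] != r["sampled"]:
        return False, "Cannot join sampled and unsampled tables without normalization"
    if l["grain"] != r["grain"]:
        return False, "Tables have different grains; aggregate to a common grain first"
    return True, "ok"

print(join_is_safe("clickstream_sample", "orders"))
# (False, 'Cannot join sampled and unsampled tables without normalization')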

Deterministic SQL Compilation

This is the core insight: governed metrics must compile down to inspectable SQL. Not "generate SQL probabilistically." Not "approximate with LLM reasoning." Compile: deterministically, repeatably, cacheably.

-- The metric definition (high-level representation)
METRIC weekly_active_users:
  COUNT DISTINCT user_id
  WHERE event_type IN ('login', 'action')
    AND event_date BETWEEN @week_start AND @week_end
  FROM events_table  -- AAA source

-- Compiles to (inspectable, cacheable, auditable)
SELECT COUNT(DISTINCT user_id) AS weekly_active_users
FROM analytics.events_table
WHERE event_type IN ('login', 'action')
  AND event_date BETWEEN '2025-01-06' AND '2025-01-12'

Why does this matter? Because if you can't show the SQL, you can't trust the number. And if the agent can't trust the number, you can't trust the agent's reasoning about it.

Lineage & Provenance

Every piece of data has a story: where it came from, what transformations it went through, who owns it, when it was last refreshed. Lineage makes that story machine-readable.

For exploratory agents, lineage is about explanation: the agent can tell you why it trusts (or doesn't trust) a particular number. For constrained agents, lineage is about validation: the agent can verify that its answer is grounded in the same sources as the dashboard.
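As a sketch of what machine-readable might look like here (the field names are illustrative, not a standard), a lineage record and the constrained-mode grounding check could be as simple as:

# Illustrative lineage record for a single metric.
lineage = {
    "metric": "weekly_active_users",
    "sources": ["analytics.events_table"],
    "transformations": [
        "filter: event_type IN ('login', 'action')",
        "aggregate: COUNT(DISTINCT user_id)",
    ],
    "owner": "analytics-core",
    "last_refreshed": "2025-01-12T06:00:00Z",
    "trust_rating": "AAA",
}

def shares_grounding(answer_sources, dashboard_sources):
    """Constrained mode: is the answer built only from the dashboard's own sources?"""
    return set(answer_sources) <= set(dashboard_sources)

print(shares_grounding(lineage["sources"], ["analytics.events_table"]))  # True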

  • Trust Ratings: gives the agent "How much should I believe this?" The agent can prefer governed metrics and explain when it can't.
  • Composition Rules: gives the agent "Can I join these two things?" Prevents semantically invalid combinations.
  • SQL Compilation: gives the agent "Show me exactly what this means." Auditability, caching, no hidden magic.
  • Lineage: gives the agent "Where did this come from?" The agent can explain its reasoning, and a human can verify it.
Chapter 03

How Each Environment Uses the Stack

Same four tools, completely different usage patterns. The AI Data Stack doesn't change, but the agent's posture does.

Trust Ratings

  • Exploratory / IDE mode: inform choices. The agent consults ratings but isn't bound by them; it might deliberately go to A-tier data: "This is raw logs, here's my hypothesis."
  • Constrained / Dashboard mode: hard boundaries. Only AAA metrics, only what's rendered in the dashboard. No exceptions.

Composition Rules

  • Exploratory / IDE mode: flag risks. The agent warns about invalid joins but lets the human decide: "These tables have different grains, want me to proceed anyway?"
  • Constrained / Dashboard mode: validation checks. If the dashboard computed something one way, don't recompute it differently. Refuse invalid compositions outright.

SQL Compilation

  • Exploratory / IDE mode: show your work. The agent generates SQL, shows it to the human, and iterates. The SQL is part of the conversation.
  • Constrained / Dashboard mode: verify alignment. The agent checks that its answers use the same SQL as the dashboard's underlying queries. No drift allowed.

Lineage

  • Exploratory / IDE mode: explain reasoning. The agent traces its logic back to sources, and the human can audit the full chain.
  • Constrained / Dashboard mode: confirm grounding. "This number comes from the same source as the chart you're looking at."
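One way to make "posture" concrete: the same stack sits underneath, and a small policy object decides what the agent may touch and how rule violations are handled. A minimal sketch; the Posture fields and tier ranks are assumptions for illustration, not part of any product.

from dataclasses import dataclass
from typing import List, Optional

TIER_RANK = {"AAA": 3, "AA": 2, "A": 1}

@dataclass
class Posture:
    min_trust: str              # lowest trust tier the agent may query
    on_rule_violation: str      # "warn" lets the human decide; "block" refuses
    scope: Optional[List[str]]  # dashboard metrics only, or None for full access

EXPLORATORY = Posture(min_trust="A", on_rule_violation="warn", scope=None)
CONSTRAINED = Posture(min_trust="AAA", on_rule_violation="block",
                      scope=["regional_revenue"])

def may_query(posture, metric, rating):
    """Same infrastructure underneath; posture decides what surfaces."""
    tier_ok = TIER_RANK[rating] >= TIER_RANK[posture.min_trust]
    scope_ok = posture.scope is None or metric in posture.scope
    return tier_ok and scope_ok

print(may_query(EXPLORATORY, "raw.event_logs", "A"))  # True: exploration allowed
print(may_query(CONSTRAINED, "raw.event_logs", "A"))  # False: below AAA, outside scope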

See it in action

Exploratory: "Why did our conversion rate drop last week? Dig into it."

Agent response:

  1. Pulling conversion_rate from governed metrics. [AAA]
  2. Week-over-week: 3.2% → 2.8% (−12.5%). Confirmed against source.
  3. Checking funnel stages... checkout_started also dropped. Investigating.
  4. Raw clickstream logs show a spike in 504 errors on Wednesday afternoon. [A-tier]
  5. Hypothesis: checkout service degradation. Here's the SQL I used...

Constrained: "I'm looking at the revenue dashboard. Why is APAC down?"

Agent response:

  1. The dashboard shows APAC revenue from the regional_revenue metric. [AAA]
  2. APAC is down 8% vs last month. This matches what you see in the chart.
  3. Within APAC, Japan accounts for most of the decline (−12%).
  4. The dashboard doesn't break down further. Want me to flag this for the analytics team?

Notice the difference: the exploratory agent went beyond the governed metrics into raw logs, and said so explicitly. The constrained agent stayed within the dashboard's scope, and stopped when it hit the boundary.

Exploratory agents can do clever things that would be reckless in constrained mode: mining old queries from experts to discover how tables join, inferring relationships from historical usage patterns, trying novel combinations to surface insights. These are features, not bugs, when an expert is reviewing the work. Constrained agents need to be much more careful. They're not discovering, they're translating.

Both agents used the same infrastructure. The difference is posture.

Chapter 04

What Happens Without the Stack

When agents don't have access to trust ratings, composition rules, and deterministic compilation, you get predictable failure modes. These aren't edge cases; they're the default.

  • Without Trust Ratings, agents can't prioritize sources. The agent treats a debug log table the same as a governed metric and gives you "answers" from unreliable data without flagging the risk.
  • Without Composition Rules, you get semantically invalid joins. The agent joins a 10% sampled table with unsampled data, or sums a non-additive metric across regions. The number is precise but meaningless.
  • Without SQL Compilation, you get hidden magic and no audit trail. The semantic layer "figures out" the join path with LLM reasoning: sometimes right, sometimes wrong, and you can't tell which without running queries and eyeballing results.
  • Without Lineage, you get unexplainable answers. The agent joins three tables and gives you a number; you ask "where did that come from?" and it generates a plausible-sounding but wrong explanation. No verifiable trail.
  • Without Dashboard Scope, constrained agents drift. You want the agent to only use dashboard data, but "what's in this dashboard" isn't machine-readable, so the agent hallucinates metrics that sound right but don't match the view.

The cost of not building the stack

Without trust ratings, agents can't prioritize. Without composition rules, they can't avoid invalid combinations. Without deterministic compilation, you can't audit their work. Without lineage, you can't verify their reasoning. Every failure mode traces back to missing infrastructure.

Chapter 05

What You Actually Build

The AI Data Stack isn't a product you buy. It's a set of capabilities you build into your data infrastructure. Here's what each piece looks like in practice.

Trust Ratings: A Metadata Layer

Every table, column, and metric gets a trust rating stored in your metadata catalog. This isn't just documentation; it's machine-readable configuration that agents consume at query time.

# In your metric definitions
metrics:
  weekly_active_users:
    trust_rating: "AAA"
    owner: "analytics-core"
    definition: "COUNT DISTINCT user_id WHERE..."
    refresh_sla: "daily by 6am UTC"
    certified_date: "2025-01-01"
  experimental_engagement_score:
    trust_rating: "A"
    owner: "growth-experiments"
    warning: "Definition under review. Do not use for reporting."

Composition Rules: A Constraint System

Encode which combinations are valid, and which aren't. This is where you capture tribal knowledge that currently lives in people's heads.

# Composition rules
rules:
  - name: "no_sampled_unsampled_join"
    when:
      join_tables:
        - has_property: "sampled"
        - has_property: "unsampled"
    action: "block"
    message: "Cannot join sampled and unsampled tables without normalization"
  - name: "revenue_not_additive_across_currency"
    when:
      aggregate: "SUM(revenue)"
      group_by_includes: "currency"
    action: "warn"
    message: "Revenue is not additive across currencies. Convert first."
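A minimal sketch of an engine enforcing the two rules above, assuming the proposed query has already been normalized into a simple dict (that shape is an assumption, and the rules are hard-coded here for brevity rather than parsed from the YAML):

# Hypothetical normalized description of a proposed query.
proposed = {
    "join_table_properties": ["sampled", "unsampled"],
    "aggregate": "SUM(revenue)",
    "group_by": ["currency", "region"],
}

def evaluate(query):
    """Return an (action, message) pair for every composition rule the query trips."""
    findings = []
    props = query["join_table_properties"]
    if "sampled" in props and "unsampled" in props:
        findings.append(("block",
                         "Cannot join sampled and unsampled tables without normalization"))
    if query["aggregate"] == "SUM(revenue)" and "currency" in query["group_by"]:
        findings.append(("warn",
                         "Revenue is not additive across currencies. Convert first."))
    return findings

for action, message in evaluate(proposed):
    print(action.upper(), message)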

The Compiler: From Metrics to SQL

This is the heart of it. A governed metric is a high-level representation that compiles to SQL. The SQL isn't probabilistically generated; it's deterministically transformed from the definition.

The compilation step is where you enforce composition rules, inject trust ratings, and produce auditable output. If the metric can't compile cleanly, the agent knows something is wrong before it runs anything.
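To make the idea concrete, here is a minimal sketch of such a compiler, assuming metric definitions shaped like the YAML above (the Metric fields and names are illustrative). The point is that compilation is a pure function of the definition and its parameters, which is exactly what makes the output cacheable and auditable; a real compiler would also run composition-rule checks at this step.

from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    name: str
    expression: str  # e.g. "COUNT(DISTINCT user_id)"
    source: str      # fully qualified AAA table
    filters: tuple   # frozen so the definition can't drift at runtime

WAU = Metric(
    name="weekly_active_users",
    expression="COUNT(DISTINCT user_id)",
    source="analytics.events_table",
    filters=("event_type IN ('login', 'action')",),
)

def compile_metric(m, week_start, week_end):
    """Pure function of (definition, parameters): same input, same SQL, every time."""
    where = " AND ".join(
        m.filters + (f"event_date BETWEEN '{week_start}' AND '{week_end}'",)
    )
    return (f"SELECT {m.expression} AS {m.name}\n"
            f"FROM {m.source}\n"
            f"WHERE {where}")

print(compile_metric(WAU, "2025-01-06", "2025-01-12"))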

The Agent Interface

Finally, expose all of this through an interface agents can consume:

# What the agent calls
data_stack.get_metric("weekly_active_users")
# Returns: definition, trust rating, owner, compiled SQL, lineage

data_stack.validate_query(proposed_sql)
# Returns: valid/invalid, rule violations, warnings

data_stack.get_dashboard_scope("revenue_overview")
# Returns: list of metrics/dimensions the dashboard uses

The agent doesn't need to understand your data warehouse. It needs to understand the contracts your data warehouse exposes. That's what the AI Data Stack provides.
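Putting the contracts together, a constrained agent's answer path might look like the sketch below. The return shapes (keys like compiled_sql, valid, and violations) are assumptions about the hypothetical interface above, not a defined API.

def answer_from_dashboard(data_stack, dashboard_id, metric_name):
    """Constrained-mode sketch against the hypothetical data_stack interface."""
    # 1. Only answer from what the dashboard actually renders.
    scope = data_stack.get_dashboard_scope(dashboard_id)
    if metric_name not in scope:
        return "That's outside this dashboard. Want me to flag it for the analytics team?"

    # 2. Pull the governed definition: trust rating, compiled SQL, lineage.
    metric = data_stack.get_metric(metric_name)

    # 3. Refuse anything the composition rules reject.
    verdict = data_stack.validate_query(metric["compiled_sql"])
    if not verdict["valid"]:
        return f"Can't answer safely: {verdict['violations']}"

    # 4. Ground the answer in the same sources as the chart.
    return (f"{metric_name} ({metric['trust_rating']}) comes from "
            f"{metric['lineage']['sources']}, the same sources as this dashboard.")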

Start small, expand from trust

You don't need to catalog everything on day one. Start with your most-used metrics, the ones that show up in executive dashboards. Get those to AAA. Add composition rules as you discover invalid joins in the wild. The stack grows organically from the metrics that matter most.