The AI Data Stack

Data infrastructure that leverages human context across all models and AI modalities

Chapter 01

Two Environments, Two Trust Models

When we talk about "AI for analytics," we're actually talking about two completely different problems that happen to share some vocabulary. Getting them confused is why most solutions feel wrong for at least one use case.

🔬 The IDE for Analytics

Exploratory mode: grad students helping experts

  • User is a sophisticated analyst or data scientist
  • Knows what questions they're asking and why
  • Can evaluate whether the agent's approach makes sense
  • Wants the agent to surface possibilities, try things
  • Comfortable dipping into raw data if that's where the signal is

📊 The Dashboard Companion

Constrained mode: a translator for a curated view

  • User is a business stakeholder looking at a dashboard
  • Trusts the dashboard (someone else built it)
  • Wants to understand what they're seeing, not go deeper
  • Should not get answers that contradict or exceed the view
  • Agent is a translator, not an explorer

When you're working closely with the model, you want exploration and novelty. You want discovery of new facts, which inherently comes with risk of hallucination and incorrect conclusions. In many other workflows, you just want answers.

The key distinction

These two environments imply different trust boundaries, not just different prompts. The exploratory agent's ceiling is human judgment. The constrained agent's ceiling is the dashboard itself.

Chapter 02

The Toolbag

The AI Data Stack provides four categories of infrastructure that any agent can consume. Think of these as the primitives: the things an agent needs to know to make good decisions about data, or to have its decisions audited.

[Diagram: three user personas sit above the model layer, which consumes the AI Data Stack]

  • 🔬 Data Scientist: exploratory, full access
  • 📊 Business User: dashboard-scoped
  • 🤖 Automated Pipeline: AAA metrics only

Model layer: any LLM or agent (Claude, GPT, Gemini, custom agents)
Stack layer: Trust Ratings · Composition Rules · SQL Compilation · Lineage

The stack sits beneath the model. Different users get different experiences, but the infrastructure is the same. The model consumes the stack; the user's context determines how much of it surfaces.

Trust Ratings

Not all data is created equal. A governed metric with clear ownership and lineage is fundamentally different from a raw log table someone created for debugging. The agent needs to know the difference.

  • AAA · Governed Metrics (deterministic): fully governed, auditable, repeatable, with clear ownership, stable definitions, and strong typing.
  • AA · Semantic Models (structured): curated dimensions and relationships; structured, but less strictly governed.
  • A · Raw Tables & Logs (exploratory): lowest trust; may be sampled, ephemeral, or schema-unstable.
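A sketch of the simplest way an agent might consume these ratings (the catalog entries and tier ranks here are illustrative, not a real API): given several candidate sources, prefer the highest-rated one and report the rating alongside the answer.

# Hypothetical trust-tier lookup; tier names mirror the ratings above.
TIER_RANK = {"AAA": 3, "AA": 2, "A": 1}

# Illustrative catalog entries: source name -> trust rating.
CATALOG = {
    "metrics.weekly_active_users": "AAA",
    "semantic.user_activity": "AA",
    "raw.event_logs": "A",
}

def pick_source(candidates):
    """Prefer the highest-trust candidate; return it with its rating."""
    best = max(candidates, key=lambda name: TIER_RANK[CATALOG[name]])
    return best, CATALOG[best]

source, rating = pick_source(["raw.event_logs", "metrics.weekly_active_users"])
print(f"Using {source} ({rating})")  # Using metrics.weekly_active_users (AAA)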

Composition Rules

This is where most semantic layers fail. They tell you what things are, but not what combinations are valid. An agent that doesn't know composition rules will happily join a sampled table to an unsampled table and give you nonsense with confidence.

Composition rules encode things like:

  • Which tables share a grain and can be joined without double counting
  • Which metrics are additive, and across which dimensions (revenue isn't additive across currencies)
  • Which sources are sampled, and therefore can't be combined with unsampled data without normalization
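Once those properties are machine-readable, the sampled-join failure above becomes a cheap mechanical check. A toy sketch; the table properties and names are hypothetical.

# Illustrative table properties; in practice these live in the metadata catalog.
TABLE_PROPS = {
    "clickstream_sample": {"sampled": True, "grain": "event"},
    "orders": {"sampled": False, "grain": "order"},
}

def join_is_safe(left, right):
    """Block joins that mix sampled with unsampled tables, or mismatched grains."""
    l, r = TABLE_PROPS[left], TABLE_PROPS[right]
    if l["sampled"] != r["sampled"]:
        return False, "Cannot join sampled and unsampled tables without normalization"
    if l["grain"] != r["grain"]:
        return False, "Tables have different grains; aggregate to a common grain first"
    return True, "ok"

print(join_is_safe("clickstream_sample", "orders"))
# (False, 'Cannot join sampled and unsampled tables without normalization')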

Deterministic SQL Compilation

This is the core insight: governed metrics must compile down to inspectable SQL. Not "generate SQL probabilistically." Not "approximate with LLM reasoning." Compile: deterministically, repeatably, cacheably.

-- The metric definition (high-level representation)
METRIC weekly_active_users:
  COUNT DISTINCT user_id
  WHERE event_type IN ('login', 'action')
    AND event_date BETWEEN @week_start AND @week_end
  FROM events_table  -- AAA source

-- Compiles to (inspectable, cacheable, auditable)
SELECT COUNT(DISTINCT user_id) AS weekly_active_users
FROM analytics.events_table
WHERE event_type IN ('login', 'action')
  AND event_date BETWEEN '2025-01-06' AND '2025-01-12'

Why does this matter? Because if you can't show the SQL, you can't trust the number. And if the agent can't trust the number, you can't trust the agent's reasoning about it.

Lineage & Provenance

Every piece of data has a story: where it came from, what transformations it went through, who owns it, when it was last refreshed. Lineage makes that story machine-readable.

For exploratory agents, lineage is about explanation: the agent can tell you why it trusts (or doesn't trust) a particular number. For constrained agents, lineage is about validation: the agent can verify that its answer is grounded in the same sources as the dashboard.
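As a sketch of what machine-readable might look like here (the field names are illustrative, not a standard), a lineage record and the constrained-mode grounding check could be as simple as:

# Illustrative lineage record for a single metric.
lineage = {
    "metric": "weekly_active_users",
    "sources": ["analytics.events_table"],
    "transformations": [
        "filter: event_type IN ('login', 'action')",
        "aggregate: COUNT(DISTINCT user_id)",
    ],
    "owner": "analytics-core",
    "last_refreshed": "2025-01-12T06:00:00Z",
    "trust_rating": "AAA",
}

def shares_grounding(answer_sources, dashboard_sources):
    """Constrained mode: is the answer built only from the dashboard's own sources?"""
    return set(answer_sources) <= set(dashboard_sources)

print(shares_grounding(lineage["sources"], ["analytics.events_table"]))  # True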

  • Trust Ratings: gives the agent "How much should I believe this?" The agent can prefer governed metrics and explain when it can't.
  • Composition Rules: gives the agent "Can I join these two things?" Prevents semantically invalid combinations.
  • SQL Compilation: gives the agent "Show me exactly what this means." Auditability, caching, no hidden magic.
  • Lineage: gives the agent "Where did this come from?" The agent can explain its reasoning, and a human can verify it.
Chapter 03

How Each Environment Uses the Stack

Same four tools, completely different usage patterns. The AI Data Stack doesn't change, but the agent's posture does.

Trust Ratings

  • Exploratory / IDE mode: inform choices. The agent consults ratings but isn't bound by them; it might deliberately go to A-tier data: "This is raw logs, here's my hypothesis."
  • Constrained / Dashboard mode: hard boundaries. Only AAA metrics, only what's rendered in the dashboard. No exceptions.

Composition Rules

  • Exploratory / IDE mode: flag risks. The agent warns about invalid joins but lets the human decide: "These tables have different grains, want me to proceed anyway?"
  • Constrained / Dashboard mode: validation checks. If the dashboard computed something one way, don't recompute it differently. Refuse invalid compositions outright.

SQL Compilation

  • Exploratory / IDE mode: show your work. The agent generates SQL, shows it to the human, and iterates. The SQL is part of the conversation.
  • Constrained / Dashboard mode: verify alignment. The agent checks that its answers use the same SQL as the dashboard's underlying queries. No drift allowed.

Lineage

  • Exploratory / IDE mode: explain reasoning. The agent traces its logic back to sources, and the human can audit the full chain.
  • Constrained / Dashboard mode: confirm grounding. "This number comes from the same source as the chart you're looking at."
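One way to make "posture" concrete: the same stack sits underneath, and a small policy object decides what the agent may touch and how rule violations are handled. A minimal sketch; the Posture fields and tier ranks are assumptions for illustration, not part of any product.

from dataclasses import dataclass
from typing import List, Optional

TIER_RANK = {"AAA": 3, "AA": 2, "A": 1}

@dataclass
class Posture:
    min_trust: str              # lowest trust tier the agent may query
    on_rule_violation: str      # "warn" lets the human decide; "block" refuses
    scope: Optional[List[str]]  # dashboard metrics only, or None for full access

EXPLORATORY = Posture(min_trust="A", on_rule_violation="warn", scope=None)
CONSTRAINED = Posture(min_trust="AAA", on_rule_violation="block",
                      scope=["regional_revenue"])

def may_query(posture, metric, rating):
    """Same infrastructure underneath; posture decides what surfaces."""
    tier_ok = TIER_RANK[rating] >= TIER_RANK[posture.min_trust]
    scope_ok = posture.scope is None or metric in posture.scope
    return tier_ok and scope_ok

print(may_query(EXPLORATORY, "raw.event_logs", "A"))  # True: exploration allowed
print(may_query(CONSTRAINED, "raw.event_logs", "A"))  # False: below AAA, outside scope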

See it in action

Exploratory: "Why did our conversion rate drop last week? Dig into it."

Agent response:

  1. Pulling conversion_rate from governed metrics. [AAA]
  2. Week-over-week: 3.2% → 2.8% (−12.5%). Confirmed against source.
  3. Checking funnel stages... checkout_started also dropped. Investigating.
  4. Raw clickstream logs show a spike in 504 errors on Wednesday afternoon. [A-tier]
  5. Hypothesis: checkout service degradation. Here's the SQL I used...

Constrained: "I'm looking at the revenue dashboard. Why is APAC down?"

Agent response:

  1. The dashboard shows APAC revenue from the regional_revenue metric. [AAA]
  2. APAC is down 8% vs last month. This matches what you see in the chart.
  3. Within APAC, Japan accounts for most of the decline (−12%).
  4. The dashboard doesn't break down further. Want me to flag this for the analytics team?

Notice the difference: the exploratory agent went beyond the governed metrics into raw logs, and said so explicitly. The constrained agent stayed within the dashboard's scope, and stopped when it hit the boundary.

Exploratory agents can do clever things that would be reckless in constrained mode: mining old queries from experts to discover how tables join, inferring relationships from historical usage patterns, trying novel combinations to surface insights. These are features, not bugs, when an expert is reviewing the work. Constrained agents need to be much more careful. They're not discovering, they're translating.

Both agents used the same infrastructure. The difference is posture.

Chapter 04

What Happens Without the Stack

When agents don't have access to trust ratings, composition rules, and deterministic compilation, you get predictable failure modes. These aren't edge cases; they're the default.

  • Without Trust Ratings, agents can't prioritize sources. The agent treats a debug log table the same as a governed metric and gives you "answers" from unreliable data without flagging the risk.
  • Without Composition Rules, you get semantically invalid joins. The agent joins a 10% sampled table with unsampled data, or sums a non-additive metric across regions. The number is precise but meaningless.
  • Without SQL Compilation, you get hidden magic and no audit trail. The semantic layer "figures out" the join path with LLM reasoning: sometimes right, sometimes wrong, and you can't tell which without running queries and eyeballing results.
  • Without Lineage, you get unexplainable answers. The agent joins three tables and gives you a number; you ask "where did that come from?" and it generates a plausible-sounding but wrong explanation. No verifiable trail.
  • Without Dashboard Scope, constrained agents drift. You want the agent to only use dashboard data, but "what's in this dashboard" isn't machine-readable, so the agent hallucinates metrics that sound right but don't match the view.

The cost of not building the stack

Without trust ratings, agents can't prioritize. Without composition rules, they can't avoid invalid combinations. Without deterministic compilation, you can't audit their work. Without lineage, you can't verify their reasoning. Every failure mode traces back to missing infrastructure.

Chapter 05

What You Actually Build

The AI Data Stack isn't a product you buy. It's a set of capabilities you build into your data infrastructure. Here's what each piece looks like in practice.

Trust Ratings: A Metadata Layer

Every table, column, and metric gets a trust rating stored in your metadata catalog. This isn't just documentation; it's machine-readable configuration that agents consume at query time.

# In your metric definitions
metrics:
  weekly_active_users:
    trust_rating: "AAA"
    owner: "analytics-core"
    definition: "COUNT DISTINCT user_id WHERE..."
    refresh_sla: "daily by 6am UTC"
    certified_date: "2025-01-01"
  experimental_engagement_score:
    trust_rating: "A"
    owner: "growth-experiments"
    warning: "Definition under review. Do not use for reporting."

Composition Rules: A Constraint System

Encode which combinations are valid, and which aren't. This is where you capture tribal knowledge that currently lives in people's heads.

# Composition rules
rules:
  - name: "no_sampled_unsampled_join"
    when:
      join_tables:
        - has_property: "sampled"
        - has_property: "unsampled"
    action: "block"
    message: "Cannot join sampled and unsampled tables without normalization"
  - name: "revenue_not_additive_across_currency"
    when:
      aggregate: "SUM(revenue)"
      group_by_includes: "currency"
    action: "warn"
    message: "Revenue is not additive across currencies. Convert first."
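A minimal sketch of an engine enforcing the two rules above, assuming the proposed query has already been normalized into a simple dict (that shape is an assumption, and the rules are hard-coded here for brevity rather than parsed from the YAML):

# Hypothetical normalized description of a proposed query.
proposed = {
    "join_table_properties": ["sampled", "unsampled"],
    "aggregate": "SUM(revenue)",
    "group_by": ["currency", "region"],
}

def evaluate(query):
    """Return an (action, message) pair for every composition rule the query trips."""
    findings = []
    props = query["join_table_properties"]
    if "sampled" in props and "unsampled" in props:
        findings.append(("block",
                         "Cannot join sampled and unsampled tables without normalization"))
    if query["aggregate"] == "SUM(revenue)" and "currency" in query["group_by"]:
        findings.append(("warn",
                         "Revenue is not additive across currencies. Convert first."))
    return findings

for action, message in evaluate(proposed):
    print(action.upper(), message)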

The Compiler: From Metrics to SQL

This is the heart of it. A governed metric is a high-level representation that compiles to SQL. The SQL isn't probabilistically generated; it's deterministically transformed from the definition.

The compilation step is where you enforce composition rules, inject trust ratings, and produce auditable output. If the metric can't compile cleanly, the agent knows something is wrong before it runs anything.
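To make the idea concrete, here is a minimal sketch of such a compiler, assuming metric definitions shaped like the YAML above (the Metric fields and names are illustrative). The point is that compilation is a pure function of the definition and its parameters, which is exactly what makes the output cacheable and auditable; a real compiler would also run composition-rule checks at this step.

from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    name: str
    expression: str  # e.g. "COUNT(DISTINCT user_id)"
    source: str      # fully qualified AAA table
    filters: tuple   # frozen so the definition can't drift at runtime

WAU = Metric(
    name="weekly_active_users",
    expression="COUNT(DISTINCT user_id)",
    source="analytics.events_table",
    filters=("event_type IN ('login', 'action')",),
)

def compile_metric(m, week_start, week_end):
    """Pure function of (definition, parameters): same input, same SQL, every time."""
    where = " AND ".join(
        m.filters + (f"event_date BETWEEN '{week_start}' AND '{week_end}'",)
    )
    return (f"SELECT {m.expression} AS {m.name}\n"
            f"FROM {m.source}\n"
            f"WHERE {where}")

print(compile_metric(WAU, "2025-01-06", "2025-01-12"))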

The Agent Interface

Finally, expose all of this through an interface agents can consume:

# What the agent calls
data_stack.get_metric("weekly_active_users")
# Returns: definition, trust rating, owner, compiled SQL, lineage

data_stack.validate_query(proposed_sql)
# Returns: valid/invalid, rule violations, warnings

data_stack.get_dashboard_scope("revenue_overview")
# Returns: list of metrics/dimensions the dashboard uses

The agent doesn't need to understand your data warehouse. It needs to understand the contracts your data warehouse exposes. That's what the AI Data Stack provides.
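Putting the contracts together, a constrained agent's answer path might look like the sketch below. The return shapes (keys like compiled_sql, valid, and violations) are assumptions about the hypothetical interface above, not a defined API.

def answer_from_dashboard(data_stack, dashboard_id, metric_name):
    """Constrained-mode sketch against the hypothetical data_stack interface."""
    # 1. Only answer from what the dashboard actually renders.
    scope = data_stack.get_dashboard_scope(dashboard_id)
    if metric_name not in scope:
        return "That's outside this dashboard. Want me to flag it for the analytics team?"

    # 2. Pull the governed definition: trust rating, compiled SQL, lineage.
    metric = data_stack.get_metric(metric_name)

    # 3. Refuse anything the composition rules reject.
    verdict = data_stack.validate_query(metric["compiled_sql"])
    if not verdict["valid"]:
        return f"Can't answer safely: {verdict['violations']}"

    # 4. Ground the answer in the same sources as the chart.
    return (f"{metric_name} ({metric['trust_rating']}) comes from "
            f"{metric['lineage']['sources']}, the same sources as this dashboard.")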

Start small, expand from trust

You don't need to catalog everything on day one. Start with your most-used metrics, the ones that show up in executive dashboards. Get those to AAA. Add composition rules as you discover invalid joins in the wild. The stack grows organically from the metrics that matter most.