// the_problem

AI can write SQL. It can write good SQL. But it writes SQL for a generic company, because it has no idea how your company works.

Your warehouse has two columns called user_id, and one of them is the right one. Your revenue metric excludes internal accounts. Your main events table has duplicated rows unless you filter on is_primary = true.

And so every time someone asks Claude to write a query, the same ritual plays out. The AI produces something plausible, the analyst spends an afternoon fixing it, and nobody captures the fix.


// our_principles

1. Agentic data science should be agnostic to your current tools.

Dante is not a product feature. It's a system of record for a company's collective data knowledge.

2. Humans should not have to create or maintain context.

Your organization already has context about its data, embedded in the work your team has already done. Any system that depends on people hand-maintaining a golden set of documentation will not be maintained indefinitely.

3. Context should compound, not decay.

Context should improve as your team works, not degrade when someone leaves. The work itself should be the input.


// our_solution: dante_for_data_science

Dante is a two-part system that solves this by becoming the central nervous system for your data science context.

Part 1: dante-ds (the library)

dante-ds is a free, open-source Python library. You pip install it and point it at your data sources. Right now it connects to Databricks and Looker, with more connectors coming for anything that exposes an API.

Here is what it does:

It scrapes your existing charts and dashboards. Every visualization your team has already built has validated SQL behind it. dante-ds pulls that SQL and pairs it with a plain-language description of what the chart shows. "What are our weekly active users?" gets paired with the query that answers that question in your warehouse, with your tables, your joins, your filters.

These pairs become embeddings in a searchable database where every entry is a question matched to the SQL that answers it. When Claude Code needs to write a query, it searches this database first and finds the closest existing examples from your own work.
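A minimal sketch of that question-to-SQL store, assuming a toy similarity measure: dante-ds would use real embedding vectors, so the bag-of-words cosine here, and every name in this snippet, are invented stand-ins for illustration only.

```python
import math
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    """Bag-of-words term counts over lowercased word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Each entry pairs a plain-language question with validated warehouse SQL.
PAIRS = [
    ("What are our weekly active users?",
     "SELECT date_trunc('week', ts) AS wk, count(DISTINCT user_id) AS wau "
     "FROM analytics.events WHERE is_primary = true GROUP BY 1"),
    ("What is our monthly revenue?",
     "SELECT date_trunc('month', ts) AS mo, sum(amount) AS revenue "
     "FROM analytics.orders WHERE account_type <> 'internal' GROUP BY 1"),
]

def closest_sql(question: str) -> str:
    """Return the stored SQL whose paired question is most similar."""
    q = vectorize(question)
    return max(PAIRS, key=lambda pair: cosine(q, vectorize(pair[0])))[1]
```

Asking `closest_sql("weekly active users")` retrieves the WAU query, duplicate-row filter and all, which is exactly the kind of grounding the real system provides before a single line of new SQL is written.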

dante-ds also generates a rules library: a set of persistent instructions (a CLAUDE.md file) that encode your team's conventions. Things like "always use analytics.users instead of raw.user_table" or "revenue excludes rows where account_type = 'internal' after 2024-04-01." These rules load automatically every time Claude Code starts a session.
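A sketch of what generating that rules file could look like. The rule strings come from the examples above; the function name, markdown layout, and output format are assumptions, not the actual dante-ds output.

```python
import os
import tempfile
from pathlib import Path

# Conventions of the kind dante-ds would encode (wording from the text above).
RULES = [
    "Always use analytics.users instead of raw.user_table.",
    "Revenue excludes rows where account_type = 'internal' after 2024-04-01.",
    "Filter events on is_primary = true to avoid duplicated rows.",
]

def write_rules(rules: list[str], path: str = "CLAUDE.md") -> str:
    """Render rules as a markdown bullet list and write them to disk."""
    body = "# Data conventions\n\n" + "\n".join(f"- {r}" for r in rules) + "\n"
    Path(path).write_text(body)
    return body

# Demo: write to a temp location rather than the working directory.
out = write_rules(RULES, os.path.join(tempfile.gettempdir(), "CLAUDE.md"))
```

Because Claude Code loads CLAUDE.md at session start, anything written this way becomes standing instruction without anyone pasting context by hand.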

The result: you ask Claude Code "what were our top customers last quarter?" and it writes SQL that actually works, because it already knows your schema, your business logic, and the patterns your team has validated.

Part 2: Dante Studio (the brain)

dante-ds is powerful on its own, but it gets dramatically better at the team level.

When multiple people across your company run dante-ds, each one builds their own local embedding database. Dante Studio is the central system that consolidates all of them. It watches what's working well across the team and updates a shared embedding database that everyone draws from.

This means the system gets smarter as people use it. Studio groups similar queries together and distills each group down to the single best example for Claude to reference. When one analyst discovers a better way to calculate churn, the whole team gets it. When someone fixes a subtle join issue, that correction propagates everywhere.
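A toy sketch of that consolidation step, under stated assumptions: keyword grouping stands in for Studio's real similarity logic, "shortest SQL wins" stands in for its real quality criteria, and all names are invented.

```python
STOPWORDS = {"what", "is", "are", "our", "the", "by", "how", "many"}

def keyword_key(question: str) -> frozenset:
    """Reduce a question to its content words so near-duplicates collide."""
    words = {w.strip("?,.").lower() for w in question.split()}
    return frozenset(words - STOPWORDS)

def consolidate(pairs):
    """Keep one canonical (question, sql) example per keyword group."""
    groups = {}
    for question, sql in pairs:
        key = keyword_key(question)
        # Stand-in quality rule: prefer the shortest SQL in each group.
        if key not in groups or len(sql) < len(groups[key][1]):
            groups[key] = (question, sql)
    return list(groups.values())

# Two analysts stored near-duplicate churn queries; one WAU query.
team_pairs = [
    ("What is monthly churn?",
     "SELECT month, churned_users / starting_users AS churn FROM metrics.churn_monthly"),
    ("monthly churn",
     "SELECT month, churn FROM metrics.churn_monthly"),
    ("What are our weekly active users?",
     "SELECT wk, wau FROM metrics.wau_weekly"),
]
```

Running `consolidate(team_pairs)` collapses the two churn variants into one shared example, which is the mechanism by which one analyst's improvement reaches everyone.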

Instead of ten analysts each maintaining their own tribal knowledge, there is a single shared system that continuously improves. Studio can also enforce presentation standards, so when people create dashboards or decks, they use consistent colors, fonts, and formatting.


// how_it_works_mechanically

The architecture is simple. dante-ds installs as a Python package and registers itself as an MCP server for Claude Code. This gives Claude a set of skills: it can search your embedding database, look up table schemas, and retrieve validated query patterns, all before it writes a single line of SQL.

When you ask Claude a data question, it does not start from scratch. It searches your embeddings for similar questions, pulls the closest matching SQL, reads the relevant rules from your CLAUDE.md, and then writes a query grounded in real examples from your company. The difference is immediate.
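The lookup flow above can be sketched as assembling one context bundle before any SQL is written. This is an illustration of the idea, not the actual MCP tool interface; the function names, the word-overlap matcher, and the dictionary structure are all assumptions.

```python
def shared_words(a: str, b: str) -> int:
    """Count words two questions have in common (crude stand-in for embeddings)."""
    clean = lambda s: set(s.lower().replace("?", "").split())
    return len(clean(a) & clean(b))

def build_context(question, pairs, rules):
    """Gather the closest stored example plus persistent rules for the model."""
    best_question, best_sql = max(pairs, key=lambda p: shared_words(question, p[0]))
    return {
        "question": question,
        "closest_example": {"question": best_question, "sql": best_sql},
        "rules": rules,
    }

ctx = build_context(
    "top customers last quarter",
    [("Who are our top customers?",
      "SELECT name, sum(amount) FROM analytics.orders GROUP BY 1 ORDER BY 2 DESC"),
     ("Weekly active users?",
      "SELECT wk, wau FROM metrics.wau_weekly")],
    ["Revenue excludes rows where account_type = 'internal'."],
)
```

The point of the bundle is ordering: retrieval and rules come first, generation comes last, so the model starts from your validated work instead of a blank page.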

The embedding database is local. Your SQL, your business logic, your data stays on your machine. Dante Studio is the optional layer that aggregates and shares across the team, and that runs on infrastructure you control.


// why_this_compounds

Manual context management degrades over time. Somebody writes documentation, it goes stale, nobody updates it. Semantic layers need governance committees. dbt docs need someone to remember they exist.

Dante works the other way. Every query your team writes teaches the system something. Every validated pattern gets captured. Every time an analyst fixes an AI-generated query, that correction becomes a data point. Over weeks and months, the AI's understanding of your data goes from approximate to precise to deeply expert.

The trajectory is fundamentally different from any context system that requires human maintenance. Dante gets better because your team is doing work every day, and the work itself is the input.


// get_started

Start with dante-ds. It is one pip install dante-ds away. Point it at a Databricks workspace or a Looker instance, let it build your embeddings and rules, and see what happens the next time you ask Claude Code a data question.