How we helped a global data integrity leader build a config-driven agentic AI suite for autonomous data operations
A global leader in data integrity replaced manual, expert-dependent data work with a config-driven agentic AI framework and six production 'lighthouse' agents that make rule authoring, classification, standardization, enrichment, search, and pipeline design guided, conversational, and fully governed.

At a Glance
Our client is a global leader in data integrity, trusted by thousands of organizations worldwide, including much of the Fortune 100, to keep their most complex, regulated, and mission-critical data accurate, consistent, and contextual. As enterprises race to make their data ready for AI, the client set out to embed intelligent, autonomous agents directly into its data integrity platform.
Challenge
Turning expert-dependent data work into scalable, AI-ready operations
As the client looked to evolve its platform, the focus shifted from isolated, task-level features toward AI that could run complex data operations end-to-end, safely and at enterprise scale. Key challenges included:
- High-value data work was manual and expertise-bound: authoring quality rules, classifying sensitive and critical data, standardizing records, and enriching data all depended on scarce specialists and tribal knowledge.
- Results were slow and inconsistent: manual processes produced governance gaps, duplicated effort, and compliance risk, and couldn't keep pace with growing data volumes.
- Users couldn't easily find or trust their data: discovering the right asset and understanding its lineage, quality, and classification required deep catalog and query expertise.
- Automation had to stay governed: any AI operating on regulated, production data needed transparency, guardrails, and human control, not a black box.
- One-off agents wouldn't scale: the client needed an approach that could grow to many agents over time without re-engineering each one.
Solution
Designing a config-driven agentic AI suite
We partnered with the client to architect and implement a unified agentic AI suite: a single foundational framework, plus six production-grade 'lighthouse' agents built on top of it and surfaced through the platform's in-product conversational assistant. The framework handles configuration, orchestration, guardrails, memory, tool access, and human-in-the-loop control, so every agent behaves consistently, and new agents can be added through configuration alone.
A config-driven agentic foundation
Rather than building isolated point solutions, we designed a declarative, configuration-driven framework (on PydanticAI) that instantiates agents at runtime from version-controlled definitions. Agent behavior, tools, memory, guardrails, and model connections are all expressed as configuration, separating what an agent does from the code that runs it.
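To make the idea concrete, here is a minimal sketch of how an agent might be instantiated at runtime from a registry entry. It uses plain Python dataclasses rather than the PydanticAI API, and every name in it (`AGENT_REGISTRY`, `AgentSpec`, `build_agent`, the tool functions, the config keys) is a hypothetical stand-in, not the client's actual schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical tools; real ones would call catalog or pipeline services.
def search_catalog(query: str) -> str:
    return f"search results for: {query}"


def get_lineage(asset: str) -> str:
    return f"lineage for: {asset}"


# Code-side lookup of tools that configuration is allowed to reference.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search_catalog": search_catalog,
    "get_lineage": get_lineage,
}

# Hypothetical registry entry. In practice this would be loaded from a
# version-controlled file rather than defined inline.
AGENT_REGISTRY = {
    "search_assistant": {
        "version": "1.2.0",
        "instructions": "Help users find catalog assets.",
        "model": "customer-configured-model",
        "tools": ["search_catalog", "get_lineage"],
        "guardrails": ["scope_check", "pii_redaction"],
    }
}


@dataclass
class AgentSpec:
    name: str
    version: str
    instructions: str
    model: str
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    guardrails: List[str] = field(default_factory=list)


def build_agent(name: str, registry: dict = AGENT_REGISTRY) -> AgentSpec:
    """Build an agent spec at runtime from its registry definition."""
    cfg = registry[name]
    unknown = [t for t in cfg["tools"] if t not in TOOLS]
    if unknown:
        # Fail fast when configuration names a tool the runtime does not have.
        raise ValueError(f"unknown tools in config: {unknown}")
    return AgentSpec(
        name=name,
        version=cfg["version"],
        instructions=cfg["instructions"],
        model=cfg["model"],
        tools={t: TOOLS[t] for t in cfg["tools"]},
        guardrails=list(cfg["guardrails"]),
    )
```

In this sketch, adding an agent means adding a registry entry, and rolling back means reverting that entry to an earlier version.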
Key capabilities included:
- Zero-code agent expansion: new agents are defined entirely through configuration in a central, version-controlled registry, enabling rapid prototyping and safe rollback.
- Central orchestration with auditable hand-offs: a Foundation Service routes every request, coordinates multi-agent hand-offs through a tracked call stack, and anchors each conversation back to the co-pilot as a single entry and exit point.
- Guardrails on every turn: layered checks run before and after each interaction, covering scope and jailbreak validation, language safety, and automatic PII detection and redaction.
- Human-in-the-loop by design: any agent can pause for user confirmation on ambiguous or high-impact actions, so automation never overrides human judgment on production data.
- Real action through MCP: dedicated Model Context Protocol servers give agents governed access to read, create, and update catalog and pipeline assets, so agents can take action rather than only generate text.
- Natural-language retrieval and built-in domain knowledge: a vector index makes any asset findable in plain language, and knowledge bases seeded with standard rules and enrichment patterns give agents a strong reasoning foundation.
- Provider-agnostic LLMs (BYOLLM) with continuous evaluation: customers connect their own model of choice, and a dedicated evaluation framework scores agent quality at scale using an LLM-as-a-Judge method, surfaced on live observability dashboards.
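The guardrail and human-in-the-loop capabilities listed above can be sketched as a wrapper around a single conversational turn. This is an illustrative simplification, not the framework's implementation: the regex patterns, the blocked-phrase list, and the names `run_turn` and `TurnResult` are all assumptions, and production-grade PII detection is considerably more involved.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Simplified, illustrative patterns; not a complete PII detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
BLOCKED_PHRASES = ("ignore previous instructions",)  # toy jailbreak check


def redact_pii(text: str) -> str:
    """Replace matched emails and SSN-shaped numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)


@dataclass
class TurnResult:
    status: str  # "ok", "blocked", or "needs_confirmation"
    message: str


def run_turn(
    user_input: str,
    agent: Callable[[str], str],
    high_impact: bool = False,
    confirmed: bool = False,
) -> TurnResult:
    # Pre-check: reject input that trips the scope/jailbreak check.
    if any(p in user_input.lower() for p in BLOCKED_PHRASES):
        return TurnResult("blocked", "Request is out of scope.")
    # Redact PII before the agent (and the model behind it) sees the input.
    safe_input = redact_pii(user_input)
    # Human-in-the-loop: pause high-impact actions until the user confirms.
    if high_impact and not confirmed:
        return TurnResult("needs_confirmation", "Please confirm this change.")
    reply = agent(safe_input)
    # Post-check: redact again in case the reply contains PII.
    return TurnResult("ok", redact_pii(reply))
```

Because the checks wrap the turn rather than live inside any one agent, every agent that goes through `run_turn` gets the same behavior.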
Conversational Search Assistant
The search assistant lets users find datasets, fields, rules, lineage, quality scores, and relationships using plain language, with no query expertise required. It interprets intent, runs semantic search across the catalog, and orchestrates a suite of metadata tools, returning results with clear citations and reasoning for transparency. It also serves as the orchestration layer that other agents call on.
Key capabilities included:
- Natural-language discovery across datasets, fields, rules, lineage, profiling, and relationships
- Transparent results with catalog citations and reasoning for every match
- Role-based access so users only see assets they're authorized to view
- Conversational continuity across follow-up questions, with graceful redirects for out-of-scope requests
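As a rough illustration of role-filtered discovery with citations, the sketch below ranks a tiny in-memory catalog against a plain-language query. A bag-of-words cosine similarity stands in for a real embedding model and vector index, and the `ASSETS` records, role names, and `search` function are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical catalog entries with the roles allowed to see each one.
ASSETS = [
    {"id": "ds-001", "text": "customer orders dataset with shipping address",
     "roles": {"analyst", "steward"}},
    {"id": "ds-002", "text": "employee payroll dataset with salary",
     "roles": {"hr"}},
    {"id": "ds-003", "text": "customer support tickets",
     "roles": {"analyst"}},
]


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: simple token counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, role: str, top_k: int = 2) -> List[Dict]:
    q = embed(query)
    # Apply access control before ranking so hidden assets never surface.
    visible = [a for a in ASSETS if role in a["roles"]]
    scored = [(cosine(q, embed(a["text"])), a) for a in visible]
    scored = [(s, a) for s, a in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Each hit carries the asset id as its citation.
    return [{"citation": a["id"], "score": round(s, 3)} for s, a in scored[:top_k]]
```

Filtering before scoring, rather than after, means an unauthorized asset cannot influence what a user sees, even indirectly through ranking.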
Data Classification Agent
This agent automates the discovery and validation of Critical Data Elements (CDEs) and Personally Identifiable Information (PII) across thousands of datasets, work that was previously manual and tribal-knowledge-dependent. It generates domain-specific critical-data lists, validates whether a dataset or field qualifies, and updates catalog classifications only after user approval.
Key capabilities included:
- Autonomous CDE and PII classification of datasets, fields, and columns
- Domain-driven generation of critical business terms to seed the catalog
- Clear reasoning for every classification, supporting audit readiness
- Permission-checked catalog updates confirmed by a human
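The propose-then-approve flow described above might look roughly like the following. The name-based heuristic is a deliberately naive stand-in for the agent's reasoning, and `PII_HINTS`, `Proposal`, `propose_classifications`, and `apply_approved` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set

# Toy heuristic: a substring in the column name maps to a reasoning string.
PII_HINTS = {
    "email": "column name suggests email addresses",
    "ssn": "column name suggests national identifiers",
    "phone": "column name suggests phone numbers",
}


@dataclass
class Proposal:
    column: str
    label: str
    reasoning: str  # kept with the proposal to support audit review


def propose_classifications(columns: Iterable[str]) -> List[Proposal]:
    """Suggest classifications without touching the catalog."""
    proposals = []
    for col in columns:
        for hint, why in PII_HINTS.items():
            if hint in col.lower():
                proposals.append(Proposal(col, "PII", why))
                break
    return proposals


def apply_approved(
    catalog: Dict[str, str],
    proposals: List[Proposal],
    approved: Set[str],
    can_edit: bool,
) -> Dict[str, str]:
    """Write only the proposals the user approved, after a permission check."""
    if not can_edit:
        raise PermissionError("user lacks permission to update classifications")
    updated = dict(catalog)
    for p in proposals:
        if p.column in approved:
            updated[p.column] = p.label
    return updated
```

Separating proposal from application is what keeps the human in control: nothing reaches the catalog unless it appears in `approved`.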
Data Quality Rule Discovery Agent
Instead of authoring rules by hand, users describe what they need and the agent discovers high-impact data quality rules for the targeted fields, checking existing rules first to avoid duplication. Each recommendation is validated and ranked before the user applies or edits it.
Key capabilities included:
- Discovery of unique rules, each with name, description, pass condition, dimension, and reasoning
- Critical-field detection when users are unsure which fields to target
- Confidence-based prioritization so the highest-impact rules surface first
- One-click apply or edit; nothing is created without explicit user approval
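A simplified sketch of the de-duplication and confidence-based ranking described above follows. The `RuleCandidate` fields mirror the attributes listed in the text, but the `confidence` scale, the whitespace-and-case normalization, and the function names are assumptions made for illustration; matching equivalent rules in practice would need more than string normalization.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RuleCandidate:
    name: str
    description: str
    pass_condition: str
    dimension: str
    reasoning: str
    confidence: float  # hypothetical score between 0.0 and 1.0


def normalize(condition: str) -> str:
    # Treat conditions that differ only in case or spacing as the same rule.
    return " ".join(condition.lower().split())


def rank_new_rules(
    candidates: Iterable[RuleCandidate],
    existing_conditions: Iterable[str],
) -> List[RuleCandidate]:
    """Drop rules that already exist, then sort by confidence, highest first."""
    seen = {normalize(c) for c in existing_conditions}
    unique = []
    for rule in candidates:
        key = normalize(rule.pass_condition)
        if key in seen:
            continue  # duplicate of an existing rule or an earlier candidate
        seen.add(key)
        unique.append(rule)
    return sorted(unique, key=lambda r: r.confidence, reverse=True)
```

The ranked list is only a set of recommendations; creating a rule would still go through the explicit apply-or-edit step.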
Normalization & Standardization Agent
This agent turns a conversation into a data quality pipeline. It recommends the right normalization and standardization techniques for a dataset, aware of each column's semantic type, and assembles them step-by-step directly on the visual pipeline canvas.
Key capabilities included:
- Natural-language pipeline construction on the visual editor
- Semantic-type-aware standardization (e.g., country-code conversions, name normalization, date formats)
- Suggested steps shown with description, reasoning, and a visual preview
- Full user control: steps are applied only on accept, at the user's chosen point in the pipeline
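To illustrate semantic-type-aware standardization with user acceptance, the sketch below maps each semantic type to one transformation and applies only the steps the user accepted. The country table, the date formats, the type names, and all function names are illustrative assumptions rather than the agent's actual techniques.

```python
from datetime import datetime
from typing import Callable, Dict, Iterable, List

COUNTRY_CODES = {"united states": "US", "germany": "DE", "india": "IN"}  # illustrative subset
DATE_FORMATS = ("%m/%d/%Y", "%d.%m.%Y", "%Y-%m-%d")  # assumed input formats


def to_country_code(value: str) -> str:
    # Unknown countries are left unchanged rather than guessed.
    return COUNTRY_CODES.get(value.strip().lower(), value)


def normalize_name(value: str) -> str:
    return " ".join(value.split()).title()


def to_iso_date(value: str) -> str:
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return value  # leave unparseable values untouched


# One standardization step per semantic type.
STEP_BY_TYPE: Dict[str, Callable[[str], str]] = {
    "country": to_country_code,
    "person_name": normalize_name,
    "date": to_iso_date,
}


def suggest_steps(semantic_types: Dict[str, str]) -> List[dict]:
    """Recommend a step for each column whose semantic type is recognized."""
    return [
        {"column": col, "step": STEP_BY_TYPE[t].__name__}
        for col, t in semantic_types.items()
        if t in STEP_BY_TYPE
    ]


def apply_accepted(
    row: Dict[str, str],
    semantic_types: Dict[str, str],
    accepted_columns: Iterable[str],
) -> Dict[str, str]:
    """Apply steps only to the columns the user accepted."""
    out = dict(row)
    for col in accepted_columns:
        step = STEP_BY_TYPE.get(semantic_types.get(col))
        if step and col in out:
            out[col] = step(out[col])
    return out
```

For example, accepting the suggestions for `country` and `signup` but not `name` would standardize the first two columns and leave the name exactly as entered.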
Location Intelligence & Enrichment Agent
This agent adds real-world geographic context to first-party data. It recommends the most relevant enrichment datasets based on the user's region and business context, and coordinates the address-verification prerequisites needed to enrich records accurately.
Key capabilities included:
- AI-driven enrichment recommendations tailored to region and business use case
- Human-in-the-loop validation of inferred region and business context
- Enrichment integrated directly into the data quality pipeline
- Address-verification prerequisites handled to generate reliable location identifiers
Automated Replication Pipeline Designer
This agent designs and configures data replication pipelines through guided dialogue, discovering cataloged assets, validating source-to-target compatibility, and standing up the pipeline structure automatically.
Key capabilities included:
- Conversational, multi-turn pipeline setup with validation at each decision point
- Connection-matrix checks to confirm valid source-to-target combinations
- Automatic component selection and regex-based table mapping
- Automatic target dataset creation, with human confirmation throughout
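The connection-matrix check and regex-based table mapping can be sketched in a few lines. The matrix contents, system names, and table names below are invented for illustration and do not reflect the platform's actual supported combinations.

```python
import re
from typing import Dict, Iterable

# Hypothetical matrix: each source maps to the targets it can replicate to.
CONNECTION_MATRIX = {
    "oracle": {"snowflake", "kafka"},
    "db2": {"snowflake"},
}


def is_supported(source: str, target: str) -> bool:
    """Confirm a source-to-target combination is valid before building anything."""
    return target in CONNECTION_MATRIX.get(source, set())


def map_tables(tables: Iterable[str], pattern: str, replacement: str) -> Dict[str, str]:
    """Map each source table that fully matches the pattern to a target name."""
    rx = re.compile(pattern)
    return {t: rx.sub(replacement, t) for t in tables if rx.fullmatch(t)}
```

With these assumptions, `map_tables(["SALES.ORDERS", "HR.PAYROLL"], r"SALES\.(\w+)", r"STAGE_\1")` selects only the `SALES` table and maps it to `STAGE_ORDERS`; a real designer would show that mapping to the user for confirmation before creating target datasets.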
Governed, enterprise-ready execution
Because every agent runs on the same framework, governance isn't bolted on; it's inherited. Guardrails, PII redaction, role-based access, auditable orchestration, and human-in-the-loop confirmation apply uniformly, letting the client deploy autonomous agents on regulated, production data with confidence.
Benefits
Intelligent, governed automation that scales
The implementation delivered clear business value for the client and its customers:
- Complex work becomes conversational: rule authoring, classification, standardization, enrichment, search, and pipeline design are now guided, natural-language experiences, accessible to business users, not just specialists.
- New agents in days, not quarters: because agents are configuration, not bespoke code, the client can prototype and ship new capabilities without engineering rework.
- Governance and trust built in: guardrails, PII redaction, transparency, and human-in-the-loop confirmation let automation run safely on production data, with users always in control.
- Consistency at scale: every agent inherits the same framework, orchestration, and quality controls, and continuous evaluation keeps quality visible in production.
- A foundation for AI-ready data: by combining automation with human oversight, the agents help the organization build the accurate, consistent, enriched data that autonomous AI depends on.
A reusable foundation for scaling AI across the enterprise
By investing in a config-driven agentic foundation first, and building six lighthouse agents on top of it, we helped a global data integrity leader turn slow, expert-dependent data work into guided, conversational, and fully governed operations. The result isn't just six agents; it's a reusable platform for building many more, positioning the organization to scale AI across the enterprise with confidence.