Data Quality Monitoring
What is DataBridge Data Quality?
DataBridge monitors data quality for your data warehouse at two stages: in flight (as events are ingested) and at rest (after data lands in your warehouse). Unlike event tracking tools that validate data only during ingestion, DataBridge also continuously monitors your warehouse tables to verify that data quality holds over time.
Two Layers of Quality Assurance
1. In-Flight Validation (During Ingestion)
- Real-time schema validation against your JSON schemas
- Type checking and required field validation
- Invalid events automatically routed to dead-letter queues
- Prevents bad data from reaching your warehouse
2. At-Rest Monitoring (In Your Warehouse)
- Continuous profiling and validation of warehouse tables
- Automated anomaly detection and alerting
- Schema drift detection
- Data freshness and completeness monitoring
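To make the in-flight layer above concrete, here is a minimal Python sketch of required-field and type validation with dead-letter routing. It is an illustration only, not DataBridge's implementation: the `SCHEMA` mapping, the field names, and the `validate_event` and `route_events` helpers are all hypothetical, and a plain list stands in for a real dead-letter queue.

```python
from typing import Any

# Hypothetical event schema: field name -> (expected type, required?)
SCHEMA: dict[str, tuple[type, bool]] = {
    "event_id": (str, True),
    "user_id": (int, True),
    "amount": (float, False),
}


def validate_event(event: dict[str, Any]) -> list[str]:
    """Return a list of validation errors; an empty list means the event is valid."""
    errors = []
    for field, (expected_type, required) in SCHEMA.items():
        if field not in event:
            if required:
                errors.append(f"missing required field: {field}")
            continue
        # Strict type check: an int is not accepted where a float is expected.
        if not isinstance(event[field], expected_type):
            errors.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return errors


def route_events(events: list[dict[str, Any]]):
    """Split events into accepted ones and dead-lettered ones (kept with their errors)."""
    accepted, dead_letter = [], []
    for event in events:
        errors = validate_event(event)
        if errors:
            dead_letter.append({"event": event, "errors": errors})
        else:
            accepted.append(event)
    return accepted, dead_letter
```

Keeping the error list alongside each dead-lettered event makes it possible to inspect and replay rejected events later, while only the accepted events continue to the warehouse.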
Key Features
📊 Comprehensive Data Profiling
Automatically profile tables to understand their structure, distribution and quality characteristics:
- Column Statistics: Null/blank counts, cardinality, unique values
- Distribution Metrics: Min/Max/Avg values, standard deviation, percentiles
- Most Frequent Values: Top values by occurrence
- Row Sampling: Preview actual data for context
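As a rough illustration of what these profile metrics mean, the following Python sketch computes them for a single column held in memory. It assumes the column is a list of scalar values with `None` for nulls; the `profile_column` function and its output keys are hypothetical and say nothing about how DataBridge computes or names its metrics.

```python
import statistics
from collections import Counter


def profile_column(values, top_n=3):
    """Compute basic profile metrics for one column given as a list of values."""
    non_null = [v for v in values if v is not None]
    # A blank is a non-null string that is empty or whitespace only.
    blanks = sum(1 for v in non_null if isinstance(v, str) and v.strip() == "")
    profile = {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "blank_count": blanks,
        "cardinality": len(set(non_null)),
        "most_frequent": Counter(non_null).most_common(top_n),
    }
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric:
        profile["min"] = min(numeric)
        profile["max"] = max(numeric)
        profile["avg"] = statistics.fmean(numeric)
        # Sample standard deviation needs at least two data points.
        profile["stddev"] = statistics.stdev(numeric) if len(numeric) > 1 else 0.0
        # The median is the 50th percentile.
        profile["median"] = statistics.median(numeric)
    return profile
```

In a warehouse these metrics would normally be computed with SQL aggregates rather than by loading rows into memory; the sketch only shows what each number represents.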
✅ Flexible Validation Rules
Define data quality checks at multiple levels:
- Schema-Level Checks: Column presence, order and structure
- Table-Level Checks: Row counts, custom SQL queries
- Column-Level Checks: Null checks, uniqueness, ranges, freshness
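The three check levels can be sketched in Python as follows. This is an assumption-laden illustration, not DataBridge's check syntax: it uses an in-memory SQLite database as a stand-in for a supported warehouse, and the `orders` table and the `check_columns_present` and `check_sql` helpers are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for a real warehouse connection (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10, 25.0), (2, 11, 40.0), (2, NULL, -5.0);
""")


def check_columns_present(conn, table, expected):
    """Schema-level check: the table has exactly these columns, in this order."""
    # PRAGMA table_info is SQLite-specific; column 1 of each row is the column name.
    actual = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return actual == expected


def check_sql(conn, query, predicate):
    """Table- or column-level check: run a single-value query and test the result."""
    value = conn.execute(query).fetchone()[0]
    return predicate(value)


results = {
    "schema.columns_match": check_columns_present(
        conn, "orders", ["order_id", "customer_id", "amount"]
    ),
    "table.row_count_min_1": check_sql(
        conn, "SELECT COUNT(*) FROM orders", lambda n: n >= 1
    ),
    "column.customer_id_not_null": check_sql(
        conn, "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL", lambda n: n == 0
    ),
    "column.order_id_unique": check_sql(
        conn, "SELECT COUNT(*) - COUNT(DISTINCT order_id) FROM orders", lambda n: n == 0
    ),
    "column.amount_non_negative": check_sql(
        conn, "SELECT COUNT(*) FROM orders WHERE amount < 0", lambda n: n == 0
    ),
}

for name, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}  {name}")
```

Expressing each check as a query that returns one number plus a pass condition keeps table-level and column-level checks uniform, which is one reason custom SQL checks are a natural extension point.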
🔔 Real-Time Alerts
Get notified immediately when data quality issues are detected:
- Email, Slack, or webhook notifications
- Custom thresholds for any quality metric
- Volume anomaly detection
- Schema change alerts
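One common way to implement a threshold-based volume alert is a z-score test against recent history. The sketch below shows that technique in Python; it is not necessarily the method DataBridge uses, and the `is_volume_anomaly` function and its default threshold of 3 standard deviations are assumptions chosen for illustration.

```python
import statistics


def is_volume_anomaly(history, today, z_threshold=3.0):
    """Flag today's row count if it is more than z_threshold standard
    deviations away from the mean of the historical counts."""
    if len(history) < 2:
        return False  # Not enough history to estimate normal variation.
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # Perfectly constant history: any change at all is unexpected.
        return today != mean
    return abs(today - mean) / stdev > z_threshold
```

For example, with daily counts of `[1000, 1020, 980, 1010, 990]`, a day with 1005 rows is within normal variation, while a day with 400 rows is flagged. A real alerting setup would then send the flagged result to email, Slack, or a webhook.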
🎯 Multiple Validation Options
Cloud Version (Managed Service)
- Visual UI for defining and managing checks
- Scheduled validation runs
- Team collaboration features
- Built-in alerting and reporting
Community Version (Open Source CLI)
- Free, open-source tool: dbqctl
- Run checks from YAML configuration
- Integrate into CI/CD pipelines
- Perfect for automated workflows

Use Cases
Event-Heavy Applications
Validate that high-volume event data maintains quality standards:
- Gaming apps: Verify player action completeness
- IoT platforms: Monitor sensor data accuracy
- FinTech: Ensure transaction data integrity
Data Engineering Teams
- Catch schema drift before it breaks downstream pipelines
- Monitor data freshness for time-sensitive workflows
- Validate ETL pipeline outputs automatically
- Maintain data contracts across teams
Analytics & Business Intelligence
- Ensure report accuracy with continuous quality checks
- Detect anomalies in business metrics
- Validate assumptions about data distributions
- Prevent bad data from affecting decisions
Supported Databases
DataBridge supports quality monitoring for:
- ClickHouse - High-performance analytics
- PostgreSQL - General-purpose relational database
- MySQL - Popular open-source database
Available Quality Dimensions
DataBridge monitors these critical data quality dimensions:
- Completeness: Track null and blank value counts
- Uniqueness: Ensure unique identifiers are truly unique
- Freshness: Monitor data recency and staleness
- Validity: Check that values fall within expected ranges
- Consistency: Detect schema drift and type changes
- Volume: Identify unexpected spikes or drops
- Accuracy: Validate that data meets business rules
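Two of these dimensions, consistency and freshness, reduce to simple comparisons. The Python sketch below illustrates both under stated assumptions: schemas are represented as plain `{column: type}` dictionaries, the type names are examples, and the `detect_schema_drift` and `is_stale` helpers are hypothetical rather than part of DataBridge.

```python
from datetime import datetime, timedelta, timezone


def detect_schema_drift(expected, actual):
    """Compare two {column: type} mappings and report what changed."""
    shared = set(expected) & set(actual)
    return {
        "added": sorted(set(actual) - set(expected)),
        "removed": sorted(set(expected) - set(actual)),
        "type_changed": sorted(c for c in shared if expected[c] != actual[c]),
    }


def is_stale(latest_timestamp, max_age, now=None):
    """Freshness check: True if the newest record is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return now - latest_timestamp > max_age


# Usage with hypothetical schemas and timestamps.
expected = {"id": "Int64", "email": "String", "created_at": "DateTime"}
actual = {"id": "String", "email": "String", "signup_source": "String"}
drift = detect_schema_drift(expected, actual)

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
latest = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
stale = is_stale(latest, timedelta(hours=2), now=now)
```

Here `drift` reports `signup_source` as added, `created_at` as removed, and `id` as retyped, and `stale` is `True` because the newest record is three hours old against a two-hour limit.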
Getting Started
Choose your preferred way to implement data quality monitoring:
- Quick Start Guide - Install and run your first quality checks
- Core Concepts - Understand the fundamentals
- User Guide - Detailed how-tos and CLI reference
- Cloud Version - Learn about the managed service
Next Steps
- Install dbqctl and run your first data profile
- Learn core concepts to build a mental model
- Explore check types to see all available validations
- Join the community to get help and share ideas