ClassificationAccuracyDrift
Compares classification accuracy metrics between reference and monitoring datasets.
Purpose
The Classification Accuracy Drift test evaluates changes in the model’s predictive accuracy over time. By comparing key accuracy metrics between reference and monitoring datasets, it helps identify whether the model maintains its performance levels in production. This is crucial for confirming that the model’s predictions remain reliable and for detecting significant degradation in its overall effectiveness.
Test Mechanism
This test proceeds by calculating comprehensive accuracy metrics for both reference and monitoring datasets. It computes overall accuracy, per-label precision, recall, and F1 scores, as well as macro-averaged metrics. The test quantifies drift as percentage changes in these metrics between datasets, providing both granular and aggregate views of accuracy changes. Special attention is paid to per-label performance to identify class-specific degradation.
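The sketch below illustrates this kind of computation outside the library; it is not the test’s actual implementation, and the helper names, metric keys, and threshold value are assumptions for illustration. It builds per-label and macro-averaged metrics with scikit-learn for both datasets and reports the percentage change of each metric.

```python
# Illustrative sketch only: compute accuracy metrics on reference and
# monitoring predictions, then report the percentage change per metric.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def accuracy_metrics(y_true, y_pred, labels):
    """Overall accuracy plus per-label and macro precision/recall/F1."""
    metrics = {"accuracy": accuracy_score(y_true, y_pred)}
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=labels, zero_division=0
    )
    for i, label in enumerate(labels):
        metrics[f"precision_{label}"] = prec[i]
        metrics[f"recall_{label}"] = rec[i]
        metrics[f"f1_{label}"] = f1[i]
    metrics["precision_macro"] = prec.mean()
    metrics["recall_macro"] = rec.mean()
    metrics["f1_macro"] = f1.mean()
    return metrics

def accuracy_drift(ref_metrics, mon_metrics, threshold_pct=5.0):
    """Percentage change per metric; flag changes beyond the threshold.

    threshold_pct is a hypothetical default, not the test's configured value.
    """
    rows = []
    for name, ref_val in ref_metrics.items():
        mon_val = mon_metrics[name]
        drift_pct = (mon_val - ref_val) / ref_val * 100 if ref_val else np.nan
        rows.append({
            "metric": name,
            "reference": round(ref_val, 4),
            "monitoring": round(mon_val, 4),
            "drift_pct": round(drift_pct, 2),
            "exceeds_threshold": abs(drift_pct) > threshold_pct,
        })
    return rows

# Example usage with hypothetical labels and predictions
labels = [0, 1]
ref = accuracy_metrics([0, 1, 1, 0, 1], [0, 1, 0, 0, 1], labels)
mon = accuracy_metrics([0, 1, 1, 0, 1], [1, 1, 0, 0, 1], labels)
for row in accuracy_drift(ref, mon):
    print(row)
```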
Signs of High Risk
- Large drifts in accuracy metrics exceeding the threshold
- Inconsistent changes across different labels
- Significant drops in macro-averaged metrics
- Systematic degradation in specific class performance
- Unexpected improvements suggesting data quality issues
- Divergent trends between precision and recall
Strengths
- Provides comprehensive accuracy assessment
- Identifies class-specific performance changes
- Enables early detection of model degradation
- Combines overall, per-label, and macro-averaged perspectives
- Supports multi-class classification evaluation
- Uses interpretable percentage-based drift thresholds
Limitations
- May be sensitive to class distribution changes
- Does not account for prediction confidence
- Cannot identify root causes of accuracy drift
- Limited to accuracy-based metrics
- Requires sufficient samples per class
- May not capture subtle performance changes