ROCCurveDrift
Compares ROC curves between reference and monitoring datasets.
Purpose
The ROC Curve Drift test is designed to evaluate changes in the model’s discriminative ability over time. By comparing Receiver Operating Characteristic (ROC) curves between reference and monitoring datasets, this test helps identify whether the model maintains its ability to distinguish between classes across different decision thresholds. This is crucial for understanding whether the model’s trade-off between sensitivity and specificity remains stable in production.
Test Mechanism
This test generates ROC curves for both the reference and monitoring datasets, plotting the True Positive Rate against the False Positive Rate across all classification thresholds. It also computes AUC scores for each dataset and visualizes the difference between the two curves, providing both graphical and numerical assessments of discrimination stability. Special attention is paid to regions where the curves diverge significantly.
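As a rough illustration of this mechanism, the sketch below compares the two curves using scikit-learn. The function name `roc_curve_drift`, its signature, and the shared-grid interpolation are assumptions made for illustration, not the test’s actual implementation.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def roc_curve_drift(y_ref, p_ref, y_mon, p_mon, grid_size=200):
    """Compare ROC curves between reference and monitoring data.

    y_* are binary labels, p_* are predicted probabilities for the
    positive class; all names here are illustrative.
    """
    fpr_ref, tpr_ref, _ = roc_curve(y_ref, p_ref)
    fpr_mon, tpr_mon, _ = roc_curve(y_mon, p_mon)

    # Interpolate both TPR curves onto a shared FPR grid so they can
    # be compared point by point (roc_curve returns increasing FPR,
    # which np.interp requires).
    grid = np.linspace(0.0, 1.0, grid_size)
    tpr_ref_i = np.interp(grid, fpr_ref, tpr_ref)
    tpr_mon_i = np.interp(grid, fpr_mon, tpr_mon)

    return {
        "auc_ref": roc_auc_score(y_ref, p_ref),
        "auc_mon": roc_auc_score(y_mon, p_mon),
        "fpr_grid": grid,
        # Negative values mean the monitoring curve sits below the
        # reference curve, i.e. lost sensitivity at that FPR.
        "tpr_diff": tpr_mon_i - tpr_ref_i,
    }
```

Plotting `tpr_diff` against `fpr_grid` gives the difference curve described above; a line holding near zero indicates stable discrimination across thresholds.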
Signs of High Risk
- Large differences between reference and monitoring ROC curves
- Significant drop in AUC score on the monitoring dataset
- Systematic differences in specific FPR regions (one way to flag both of these is sketched after this list)
- Changes in optimal operating points
- Inconsistent performance across different thresholds
- Unexpected crossovers between curves
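As a rough illustration, the helper below consumes the output of `roc_curve_drift` from the previous sketch to flag the AUC drop and diverging FPR regions. The thresholds `auc_drop_tol` and `tpr_gap_tol` are hypothetical tolerances, not values defined by the test.

```python
import numpy as np

def flag_roc_drift(result, auc_drop_tol=0.05, tpr_gap_tol=0.10):
    """Flag drift signs from the roc_curve_drift output above.

    auc_drop_tol and tpr_gap_tol are hypothetical tolerances; suitable
    values depend on the use case and class balance.
    """
    auc_drop = result["auc_ref"] - result["auc_mon"]
    gap = np.abs(result["tpr_diff"])

    # FPR values where the two curves diverge by more than the tolerance.
    diverging = result["fpr_grid"][gap > tpr_gap_tol]

    return {
        "auc_drop": auc_drop,
        "auc_degraded": auc_drop > auc_drop_tol,
        # Range of FPR values with material divergence, if any.
        "diverging_fpr_range": (
            (float(diverging.min()), float(diverging.max()))
            if diverging.size else None
        ),
    }
```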
Strengths
- Provides a comprehensive view of discriminative ability
- Identifies specific threshold ranges with drift
- Enables visualization of performance differences
- Includes AUC comparison for overall assessment
- Supports threshold-independent evaluation
- Maintains interpretable performance metrics
Limitations
- Limited to binary classification problems
- May be sensitive to class distribution changes
- Cannot suggest optimal threshold adjustments
- Requires visual inspection for detailed analysis
- Differences between curves can be complex to interpret
- May not capture subtle performance changes