Intro to Assign Scores
The assign_scores()
method is a powerful feature that allows you to compute and add unit metric scores as new columns in your dataset. This method takes a model and metric(s) as input, computes the specified metrics from the ValidMind unit_metrics library, and adds them as new columns. The computed metrics can be scalar values that apply to the entire dataset or per-row values, providing flexibility in how performance is measured and tracked.
In this interactive notebook, we demonstrate how to use the assign_scores()
method effectively. We'll walk through a complete example using a customer churn dataset, showing how to compute and assign both dataset-level metrics (like overall F1 score) and row-level metrics (like prediction probabilities). You'll learn how to work with single and multiple unit metrics, pass custom parameters, and handle different metric types - all while maintaining a clean, organized dataset structure. Currently, assign_scores() supports all metrics available in the validmind.unit_metrics module.
The Power of Integrated Scoring
Traditional model evaluation workflows often involve computing metrics separately from your core dataset, leading to fragmented analysis and potential data misalignment. The assign_scores()
method addresses this challenge by:
- Seamless Integration: Directly embedding computed metrics as dataset columns using a consistent naming convention
- Enhanced Traceability: Maintaining clear links between model predictions and performance metrics
- Simplified Analysis: Enabling straightforward comparison of metrics across different models and datasets
- Standardized Workflow: Providing a unified approach to metric computation and storage
Understanding assign_scores()
The assign_scores() method computes unit metrics for a given model-dataset combination and adds the results as new columns to your dataset. Each new column follows the naming convention {model.input_id}_{metric_name}, ensuring clear identification of which model and metric combination generated each score.
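For example, assuming a dataset and a model initialized with input_id="xgboost_model" (as set up later in this notebook), a minimal sketch of the convention looks like this:
# Minimal sketch - vm_test_ds and vm_xgb_model are created later in this notebook
vm_test_ds.assign_scores(vm_xgb_model, "F1")

# The model's input_id is "xgboost_model", so the F1 score lands in a column
# named "xgboost_model_F1"
print("xgboost_model_F1" in vm_test_ds.df.columns)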
Key features:
- Flexible Input: Accepts single metrics or lists of metrics
- Parameter Support: Allows passing additional parameters to underlying metric implementations
- Multi-Model Support: Can assign scores from multiple models to the same dataset
- Type Agnostic: Works with classification, regression, and other model types
This approach streamlines your model evaluation workflow, making performance metrics an integral part of your dataset rather than external calculations.
About ValidMind
ValidMind is a suite of tools for managing model risk, including risk associated with AI and statistical models.
You use the ValidMind Library to automate documentation and validation tests, and then use the ValidMind Platform to collaborate on model documentation. Together, these products simplify model risk management, facilitate compliance with regulations and institutional standards, and enhance collaboration between yourself and model validators.
Before you begin
This notebook assumes you have basic familiarity with Python, including an understanding of how functions work. If you are new to Python, you can still run the notebook but we recommend further familiarizing yourself with the language.
If you encounter errors due to missing modules in your Python environment, install the modules with pip install, and then re-run the notebook. For more help, refer to Installing Python Modules.
New to ValidMind?
If you haven't already seen our documentation on the ValidMind Library, we recommend you begin by exploring the available resources in this section. There, you can learn more about documenting models and running tests, as well as find code samples and our Python Library API reference.
Register with ValidMind
Install the ValidMind Library
To install the library:
%pip install -q validmind
Initialize the ValidMind Library
ValidMind generates a unique code snippet for each registered model to connect with your developer environment. You initialize the ValidMind Library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.
Get your code snippet
In a browser, log in to ValidMind.
In the left sidebar, navigate to Model Inventory and click + Register Model.
Enter the model details and click Continue. (Need more help?)
For example, to register a model for use with this notebook, select:
- Documentation template: Binary classification
- Use case: Marketing/Sales - Analytics

You can fill in other options according to your preference.
Go to Getting Started and click Copy snippet to clipboard.
Next, load your model identifier credentials from an .env
file or replace the placeholder with your own code snippet:
# Load your model identifier credentials from an `.env` file
%load_ext dotenv
%dotenv .env
# Or replace with your code snippet
import validmind as vm
vm.init(
    # api_host="...",
    # api_key="...",
    # api_secret="...",
    # model="...",
)
Load the demo dataset
In this example, we load a demo dataset to demonstrate the assign_scores functionality with customer churn prediction models.
from validmind.datasets.classification import customer_churn as demo_dataset
print(
    f"Loaded demo dataset with: \n\n\t• Target column: '{demo_dataset.target_column}' \n\t• Class labels: {demo_dataset.class_labels}"
)

raw_df = demo_dataset.load_data()
raw_df.head()
Train models for testing
We'll train two different customer churn models to demonstrate the assign_scores functionality with multiple models.
import xgboost as xgb
from sklearn.ensemble import RandomForestClassifier
# Preprocess the data
train_df, validation_df, test_df = demo_dataset.preprocess(raw_df)

# Prepare training data
x_train = train_df.drop(demo_dataset.target_column, axis=1)
y_train = train_df[demo_dataset.target_column]
x_val = validation_df.drop(demo_dataset.target_column, axis=1)
y_val = validation_df[demo_dataset.target_column]

# Train XGBoost model
xgb_model = xgb.XGBClassifier(early_stopping_rounds=10, random_state=42)
xgb_model.set_params(
    eval_metric=["error", "logloss", "auc"],
)
xgb_model.fit(
    x_train,
    y_train,
    eval_set=[(x_val, y_val)],
    verbose=False,
)

# Train Random Forest model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(x_train, y_train)
print("Models trained successfully!")
print(f"XGBoost training accuracy: {xgb_model.score(x_train, y_train):.3f}")
print(f"Random Forest training accuracy: {rf_model.score(x_train, y_train):.3f}")
Initialize ValidMind objects
We initialize ValidMind dataset
and model
objects. The input_id
parameter is crucial for the assign_scores functionality as it determines the column naming convention for assigned scores.
# Initialize datasets
vm_train_ds = vm.init_dataset(
    input_id="train_dataset",
    dataset=train_df,
    target_column=demo_dataset.target_column,
)

vm_test_ds = vm.init_dataset(
    input_id="test_dataset",
    dataset=test_df,
    target_column=demo_dataset.target_column,
)

# Initialize models with descriptive input_ids
vm_xgb_model = vm.init_model(model=xgb_model, input_id="xgboost_model")
vm_rf_model = vm.init_model(model=rf_model, input_id="random_forest_model")
print("ValidMind objects initialized successfully!")
print(f"XGBoost model ID: {vm_xgb_model.input_id}")
print(f"Random Forest model ID: {vm_rf_model.input_id}")
Assign predictions
Before we can use assign_scores(), we need to assign predictions to our datasets. This step is essential as many unit metrics require both actual and predicted values.
# Assign predictions for both models to both datasets
vm_train_ds.assign_predictions(model=vm_xgb_model)
vm_train_ds.assign_predictions(model=vm_rf_model)

vm_test_ds.assign_predictions(model=vm_xgb_model)
vm_test_ds.assign_predictions(model=vm_rf_model)
print("Predictions assigned successfully!")
print(f"Test dataset now has {len(vm_test_ds.df.columns)} columns")
Using assign_scores()
Now we'll explore the various ways to use the assign_scores() method to integrate performance metrics directly into your dataset.
Basic Usage
The assign_scores() method has a simple interface:
dataset.assign_scores(model, metrics, **kwargs)
- model: A ValidMind model object
- metrics: Single metric ID or list of metric IDs (can use short names or full IDs; see the sketch after this list)
- kwargs: Additional parameters passed to the underlying metric implementations
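As a quick illustration of these call forms, here's a hedged sketch. The short metric names match the ones used later in this notebook; the fully qualified ID is an assumed path under the validmind.unit_metrics module and may differ in your installed version:
# Single metric by short name
vm_test_ds.assign_scores(vm_xgb_model, "F1")

# List of metrics, plus a keyword argument forwarded to the underlying implementation
vm_test_ds.assign_scores(vm_xgb_model, ["Precision", "Recall"], average="macro")

# Fully qualified metric ID (assumed path - verify against your validmind.unit_metrics module)
vm_test_ds.assign_scores(vm_xgb_model, "validmind.unit_metrics.classification.F1")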
Let's first check what columns we currently have in our test dataset:
print("Current columns in test dataset:")
for i, col in enumerate(vm_test_ds.df.columns, 1):
    print(f"{i:2d}. {col}")

print(f"\nDataset shape: {vm_test_ds.df.shape}")
Single Metric Assignment
Let's start by assigning a single metric - the F1 score - for our XGBoost model on the test dataset.
# Assign F1 score for XGBoost model
"F1")
vm_test_ds.assign_scores(vm_xgb_model,
print("After assigning F1 score:")
print(f"New column added: {vm_test_ds.df.columns}")
Multiple Metrics Assignment
We can assign multiple metrics at once by passing a list of metric names. This is more efficient than calling assign_scores() multiple times.
# Assign multiple classification metrics for the Random Forest model
= ["Precision", "Recall", "Accuracy", "ROC_AUC"]
classification_metrics
vm_test_ds.assign_scores(vm_rf_model, classification_metrics)
print("After assigning multiple metrics for Random Forest:")
rf_columns = [col for col in vm_test_ds.df.columns if 'random_forest_model' in col]
print(f"Random Forest columns: {rf_columns}")
# Display the metric values
for metric in classification_metrics:
    col_name = f"random_forest_model_{metric}"
    if col_name in vm_test_ds.df.columns:
        value = vm_test_ds.df[col_name].iloc[0]
        print(f"{metric}: {value:.4f}")
Passing Parameters to Metrics
Many unit metrics accept additional parameters that are passed through to the underlying sklearn implementations. Let's demonstrate this with the ROC_AUC metric.
# Assign ROC_AUC with different averaging strategies
"ROC_AUC", average="macro")
vm_test_ds.assign_scores(vm_xgb_model,
# We can also assign with different parameters by calling assign_scores again
# Note: This will overwrite the previous column with the same name
print("ROC_AUC assigned with macro averaging")
# Let's also assign precision and recall with different averaging
"Precision", "Recall"], average="weighted")
vm_test_ds.assign_scores(vm_xgb_model, [
print("Precision and Recall assigned with weighted averaging")
# Display current XGBoost metric columns
xgb_columns = [col for col in vm_test_ds.df.columns if 'xgboost_model' in col]
print(f"\nXGBoost model columns: {xgb_columns}")
Multi-Model Scoring
One of the powerful features of assign_scores() is the ability to assign scores from multiple models to the same dataset, enabling easy model comparison.
# Let's assign a comprehensive set of metrics for both models
= ["F1", "Precision", "Recall", "Accuracy", "ROC_AUC"]
comprehensive_metrics
# Assign for XGBoost model
vm_test_ds.assign_scores(vm_xgb_model, comprehensive_metrics)
# Assign for Random Forest model
vm_test_ds.assign_scores(vm_rf_model, comprehensive_metrics)
print("Comprehensive metrics assigned for both models!")
Individual Metrics
The next section demonstrates how to assign individual metrics that compute scores per row, rather than aggregate metrics. We'll use two important metrics:
- Brier Score: Measures how well calibrated the model's probability predictions are for each individual prediction
- Log Loss: Evaluates how well the predicted probabilities match the true labels on a per-prediction basis
Both metrics provide more granular insights into model performance at the individual prediction level.
# Let's add some individual metrics that compute per-row scores
print("Adding individual metrics...")
# Add Brier Score - measures accuracy of probabilistic predictions per row
"BrierScore")
vm_test_ds.assign_scores(vm_xgb_model, print("Added Brier Score - lower values indicate better calibrated probabilities")
# Add Log Loss - measures how well the predicted probabilities match true labels per row
"LogLoss")
vm_test_ds.assign_scores(vm_xgb_model, print("Added Log Loss - lower values indicate better probability estimates")
# Create a comparison summary showing first few rows of individual metrics
print("\nFirst few rows of individual metrics:")
individual_metrics = [col for col in vm_test_ds.df.columns if any(m in col for m in ['BrierScore', 'LogLoss'])]
print(vm_test_ds.df[individual_metrics].head())
vm_test_ds.df.head()
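Beyond inspecting the first few rows, a quick summary of the per-row columns can show how well calibrated the model is overall and which predictions it got most confidently wrong. This sketch reuses the individual_metrics list built above and assumes the xgboost_model_LogLoss column produced by the naming convention:
# Summary statistics for the per-row metric columns (Brier Score and Log Loss)
print(vm_test_ds.df[individual_metrics].describe())

# Rows with the largest log loss are the predictions the model was most confidently wrong about
print(vm_test_ds.df.nlargest(5, "xgboost_model_LogLoss")[individual_metrics])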
Next steps
You can explore the assigned scores right in the notebook as demonstrated above. However, there's even more value in using the ValidMind Platform to work with your model documentation and monitoring.
Work with your model documentation
From the Model Inventory in the ValidMind Platform, go to the model you registered earlier. (Need more help?)
Click and expand the Model Development section.
The scores you've assigned using assign_scores()
become part of your model's documentation and can be used in ongoing monitoring workflows. You can view these metrics over time, set up alerts for performance drift, and compare models systematically. Learn more ...
Discover more learning resources
We offer many interactive notebooks to help you work with model scoring and evaluation:
Or, visit our documentation to learn more about ValidMind.
Upgrade ValidMind
Retrieve the information for the currently installed version of ValidMind:
%pip show validmind
If the version returned is lower than the version indicated in our production open-source code, restart your notebook and run:
%pip install --upgrade validmind
You may need to restart your kernel after upgrading the package for the changes to take effect.