# Make sure the ValidMind Library is installed
%pip install -q validmind
# Load your model identifier credentials from an `.env` file
%load_ext dotenv
%dotenv .env
# Or replace with your code snippet
import validmind as vm
vm.init(
    # api_host="...",
    # api_key="...",
    # api_secret="...",
    # model="...",
)
ValidMind for model development — 104 Finalize testing and documentation
Learn how to use ValidMind for your end-to-end model documentation process with our introductory notebook series. In this last notebook, finalize the testing and documentation of your model and have a fully documented sample model ready for review.
We’ll first use run_documentation_tests(), previously covered in 102 Start the model development process, to ensure that your custom test results generated in 103 Integrate custom tests are included in your documentation. Then, we’ll view and update the configuration for the entire model documentation template to suit your needs.
Prerequisites
In order to finalize the testing and documentation for your sample model, you’ll need to have first completed the steps covered in the earlier notebooks in this series.
Need help with those steps?
Refer to the first three notebooks in this series.
Setting up
This section should be very familiar to you now — as we performed the same actions in the previous two notebooks in this series.
Initialize the ValidMind Library
As usual, let’s first connect the ValidMind Library to the model we previously registered in the ValidMind Platform:
1. In a browser, log in to ValidMind.
2. In the left sidebar, navigate to Inventory and select the model you registered for this “ValidMind for model development” series of notebooks.
3. Go to Getting Started and click Copy snippet to clipboard.
Next, load your model identifier credentials from an .env file or replace the placeholder with your own code snippet (see the setup cells at the top of this notebook).
Import sample dataset
Next, we’ll import the same public Bank Customer Churn Prediction dataset from Kaggle we used in the last notebook so that we have something to work with:
from validmind.datasets.classification import customer_churn as demo_dataset
print(
f"Loaded demo dataset with: \n\n\t• Target column: '{demo_dataset.target_column}' \n\t• Class labels: {demo_dataset.class_labels}"
)
raw_df = demo_dataset.load_data()
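To quickly confirm what was loaded, you can preview the first few rows of the raw dataset; this optional check doesn’t change anything downstream:
# Optional: preview the raw dataset we just loaded
raw_df.head()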
We’ll apply a simple rebalancing technique to the dataset before continuing:
import pandas as pd
raw_copy_df = raw_df.sample(frac=1)  # Create a copy of the raw dataset

# Create a balanced dataset with the same number of exited and not exited customers
exited_df = raw_copy_df.loc[raw_copy_df["Exited"] == 1]
not_exited_df = raw_copy_df.loc[raw_copy_df["Exited"] == 0].sample(n=exited_df.shape[0])

balanced_raw_df = pd.concat([exited_df, not_exited_df])
balanced_raw_df = balanced_raw_df.sample(frac=1, random_state=42)
Train the model
We’ll then train a simple logistic regression model on our prepared dataset with scikit-learn:
# First remove the highly correlated Age feature, then encode the remaining categorical features
balanced_raw_no_age_df = balanced_raw_df.drop("Age", axis=1)
balanced_raw_no_age_df = pd.get_dummies(
    balanced_raw_no_age_df, columns=["Geography", "Gender"], drop_first=True
)
balanced_raw_no_age_df.head()
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
# Split the input and target variables
X = balanced_raw_no_age_df.drop("Exited", axis=1)
y = balanced_raw_no_age_df["Exited"]

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
)
# Logistic Regression grid params
log_reg_params = {
    "penalty": ["l1", "l2"],
    "C": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "solver": ["liblinear"],
}

# Grid search for Logistic Regression
from sklearn.model_selection import GridSearchCV

grid_log_reg = GridSearchCV(LogisticRegression(), log_reg_params)
grid_log_reg.fit(X_train, y_train)

# Logistic Regression best estimator
log_reg = grid_log_reg.best_estimator_
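If you’re curious which hyperparameters the grid search selected, you can inspect the fitted search object through standard scikit-learn attributes; this optional check doesn’t affect the rest of the notebook:
# Optional: inspect the hyperparameters selected by the grid search
print("Best parameters:", grid_log_reg.best_params_)
print(f"Best cross-validation accuracy: {grid_log_reg.best_score_:.3f}")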
Initialize the ValidMind objects
Let’s initialize the ValidMind Dataset and Model objects in preparation for assigning model predictions to each dataset:
train_df = X_train
train_df["Exited"] = y_train
test_df = X_test
test_df["Exited"] = y_test

# Initialize the datasets into their own dataset objects
vm_train_ds = vm.init_dataset(
    input_id="train_dataset_final",
    dataset=train_df,
    target_column="Exited",
)

vm_test_ds = vm.init_dataset(
    input_id="test_dataset_final",
    dataset=test_df,
    target_column="Exited",
)

# Initialize a model object
vm_model = vm.init_model(log_reg, input_id="log_reg_model_v1")
Assign predictions
Once the model is registered, we’ll assign predictions to the training and test datasets:
vm_train_ds.assign_predictions(model=vm_model)
vm_test_ds.assign_predictions(model=vm_model)
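To spot-check what was just assigned, you can read the stored predictions back from a dataset object with y_pred(), the same accessor our custom test uses below; an optional sanity check might look like this:
# Optional: retrieve the predictions assigned to the test dataset
test_predictions = vm_test_ds.y_pred(model=vm_model)
print(test_predictions[:10])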
Add custom tests
We’ll also add the same custom tests we implemented in the previous notebook so that this session has access to the same custom inline test and local test provider.
Implement custom inline test
Let’s set up a custom inline test that calculates the confusion matrix for a binary classification model:
# First create a confusion matrix plot
import matplotlib.pyplot as plt
from sklearn import metrics
# Get the predicted classes
y_pred = log_reg.predict(vm_test_ds.x)

confusion_matrix = metrics.confusion_matrix(y_test, y_pred)

cm_display = metrics.ConfusionMatrixDisplay(
    confusion_matrix=confusion_matrix, display_labels=[False, True]
)
cm_display.plot()
# Create the reusable ConfusionMatrix inline test with normalized matrix
@vm.test("my_custom_tests.ConfusionMatrix")
def confusion_matrix(dataset, model, normalize=False):
"""The confusion matrix is a table that is often used to describe the performance of a classification model on a set of data for which the true values are known.
The confusion matrix is a 2x2 table that contains 4 values:
- True Positive (TP): the number of correct positive predictions
- True Negative (TN): the number of correct negative predictions
- False Positive (FP): the number of incorrect positive predictions
- False Negative (FN): the number of incorrect negative predictions
The confusion matrix can be used to assess the holistic performance of a classification model by showing the accuracy, precision, recall, and F1 score of the model on a single figure.
"""
= dataset.y
y_true = dataset.y_pred(model=model)
y_pred
if normalize:
= metrics.confusion_matrix(y_true, y_pred, normalize="all")
confusion_matrix else:
= metrics.confusion_matrix(y_true, y_pred)
confusion_matrix
= metrics.ConfusionMatrixDisplay(
cm_display =confusion_matrix, display_labels=[False, True]
confusion_matrix
)
cm_display.plot()
# close the plot to avoid displaying it
plt.close()
return cm_display.figure_ # return the figure object itself
# Test dataset with normalize=True
result = vm.tests.run_test(
    "my_custom_tests.ConfusionMatrix:test_dataset_normalized",
    inputs={"model": vm_model, "dataset": vm_test_ds},
    params={"normalize": True},
)
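If you also want this inline test result to appear in the ValidMind Platform, you can log it the same way as other test results; a minimal example, assuming you want to push it right away:
# Optional: log the inline test result to the ValidMind Platform
result.log()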
Add a local test provider
Finally, let’s save our custom inline test to our local test provider:
# Create custom tests folder
= "my_tests"
tests_folder
import os
# create tests folder
=True)
os.makedirs(tests_folder, exist_ok
# remove existing tests
for f in os.listdir(tests_folder):
# remove files and pycache
if f.endswith(".py") or f == "__pycache__":
f"rm -rf {tests_folder}/{f}") os.system(
# Save custom inline test to custom tests folder
confusion_matrix.save(
    tests_folder,
    imports=["import matplotlib.pyplot as plt", "from sklearn import metrics"],
)
# Register local test provider
from validmind.tests import LocalTestProvider
# initialize the test provider with the tests folder we created earlier
my_test_provider = LocalTestProvider(tests_folder)

vm.tests.register_test_provider(
    namespace="my_test_provider",
    test_provider=my_test_provider,
)
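To confirm the provider is wired up correctly, you can run the saved test through its new my_test_provider namespace with the same run_test() call used earlier; this optional check simply reuses the inputs and parameters from above:
# Optional: run the saved test via the local test provider namespace
result = vm.tests.run_test(
    "my_test_provider.ConfusionMatrix",
    inputs={"model": vm_model, "dataset": vm_test_ds},
    params={"normalize": True},
)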
Reconnect to ValidMind
After you insert test-driven blocks into your model documentation, changes should persist and become available every time you call vm.preview_template().
However, if you added test-driven blocks while the connection to the ValidMind Platform was already established, you’ll need to reload the connection using reload():
vm.reload()
Now, when you run preview_template() again, the three test-driven blocks you added to your documentation in the last two notebooks should show up in the template in sections 2.3 Correlations and Interactions and 3.2 Model Evaluation:
vm.preview_template()
Include custom test results
Since your custom test IDs are now part of your documentation template, you can run tests for an entire section and all additional custom tests will be loaded without any issues.
Let’s run all tests in the Model Evaluation section of the documentation. Note that we have been running the sample custom confusion matrix with normalize=True to demonstrate the ability to provide custom parameters.
In the Run the model evaluation tests section of 102 Start the model development process, you learned how to assign inputs to individual tests with run_documentation_tests(). Assigning parameters is similar: you only need to assign a params dictionary to a given test ID, my_test_provider.ConfusionMatrix in this case.
test_config = {
    "validmind.model_validation.sklearn.ClassifierPerformance:in_sample": {
        "inputs": {
            "dataset": vm_train_ds,
            "model": vm_model,
        },
    },
    "my_test_provider.ConfusionMatrix": {
        "params": {"normalize": True},
        "inputs": {"dataset": vm_test_ds, "model": vm_model},
    },
}

results = vm.run_documentation_tests(
    section=["model_evaluation"],
    inputs={
        "dataset": vm_test_ds,  # Any test that requires a single dataset will use vm_test_ds
        "model": vm_model,
        "datasets": (
            vm_train_ds,
            vm_test_ds,
        ),  # Any test that requires multiple datasets will use vm_train_ds and vm_test_ds
    },
    config=test_config,
)
Documentation template configuration
Let’s call the utility function vm.get_test_suite().get_default_config(), which returns the default configuration for the entire documentation template as a dictionary:
- This configuration contains all the test IDs and their default parameters.
- You can then modify this configuration as needed and pass it to run_documentation_tests() to run all tests in the documentation template.
- You still have the option to continue running tests for one section at a time; get_default_config() simply provides a useful reference for providing default parameters to every test.
import json
model_test_suite = vm.get_test_suite()
config = model_test_suite.get_default_config()

print("Suite Config: \n", json.dumps(config, indent=2))
Update the config
The default config does not assign any inputs to a test, but you can assign inputs to individual tests as needed, depending on the datasets and models you want to pass to them.
For this particular documentation template (binary classification), the ValidMind Library provides a sample configuration that can be used to populate the entire model documentation using the following inputs as placeholders:
- A raw_dataset raw dataset
- A train_dataset training dataset
- A test_dataset test dataset
- A trained model instance
As part of updating the config, you will need to ensure the correct input_ids are used in the final config passed to run_documentation_tests().
from validmind.datasets.classification import customer_churn
from validmind.utils import preview_test_config
test_config = customer_churn.get_demo_test_config()
preview_test_config(test_config)
Using this sample configuration, let’s finish populating model documentation by running all tests for the Model Development section of the documentation.
Recall that the training and test datasets in our exercise have the following input_id values:
- train_dataset_final for the training dataset
- test_dataset_final for the test dataset
config = {
    "validmind.model_validation.ModelMetadata": {
        "inputs": {"model": "log_reg_model_v1"},
    },
    "validmind.data_validation.DatasetSplit": {
        "inputs": {"datasets": ["train_dataset_final", "test_dataset_final"]},
    },
    "validmind.model_validation.sklearn.PopulationStabilityIndex": {
        "inputs": {
            "model": "log_reg_model_v1",
            "datasets": ["train_dataset_final", "test_dataset_final"],
        },
        "params": {"num_bins": 10, "mode": "fixed"},
    },
    "validmind.model_validation.sklearn.ConfusionMatrix": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"},
    },
    "my_test_provider.ConfusionMatrix": {
        "inputs": {"dataset": "test_dataset_final", "model": "log_reg_model_v1"},
    },
    "my_custom_tests.ConfusionMatrix:test_dataset_normalized": {
        "inputs": {"dataset": "test_dataset_final", "model": "log_reg_model_v1"},
    },
    "validmind.model_validation.sklearn.ClassifierPerformance:in_sample": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "train_dataset_final"}
    },
    "validmind.model_validation.sklearn.ClassifierPerformance:out_of_sample": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"}
    },
    "validmind.model_validation.sklearn.PrecisionRecallCurve": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"},
    },
    "validmind.model_validation.sklearn.ROCCurve": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"},
    },
    "validmind.model_validation.sklearn.TrainingTestDegradation": {
        "inputs": {
            "model": "log_reg_model_v1",
            "datasets": ["train_dataset_final", "test_dataset_final"],
        },
        "params": {
            "metrics": ["accuracy", "precision", "recall", "f1"],
            "max_threshold": 0.1,
        },
    },
    "validmind.model_validation.sklearn.MinimumAccuracy": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"},
        "params": {"min_threshold": 0.7},
    },
    "validmind.model_validation.sklearn.MinimumF1Score": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"},
        "params": {"min_threshold": 0.5},
    },
    "validmind.model_validation.sklearn.MinimumROCAUCScore": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"},
        "params": {"min_threshold": 0.5},
    },
    "validmind.model_validation.sklearn.PermutationFeatureImportance": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"},
    },
    "validmind.model_validation.sklearn.SHAPGlobalImportance": {
        "inputs": {"model": "log_reg_model_v1", "dataset": "test_dataset_final"},
        "params": {"kernel_explainer_samples": 10},
    },
    "validmind.model_validation.sklearn.WeakspotsDiagnosis": {
        "inputs": {
            "model": "log_reg_model_v1",
            "datasets": ["train_dataset_final", "test_dataset_final"],
        },
        "params": {
            "thresholds": {"accuracy": 0.75, "precision": 0.5, "recall": 0.5, "f1": 0.7}
        },
    },
    "validmind.model_validation.sklearn.OverfitDiagnosis": {
        "inputs": {
            "model": "log_reg_model_v1",
            "datasets": ["train_dataset_final", "test_dataset_final"],
        },
        "params": {"cut_off_percentage": 4},
    },
    "validmind.model_validation.sklearn.RobustnessDiagnosis": {
        "inputs": {
            "model": "log_reg_model_v1",
            "datasets": ["train_dataset_final", "test_dataset_final"],
        },
        "params": {
            "scaling_factor_std_dev_list": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
            "accuracy_decay_threshold": 4,
        },
    },
}

full_suite = vm.run_documentation_tests(
    section="model_development",
    config=config,
)
In summary
In this final notebook, you finalized the testing and documentation of your model, leaving you with a fully documented sample model ready for review.
With our ValidMind for model development series of notebooks, you learned how to document a model end-to-end with the ValidMind Library by running through some common scenarios in a typical model development setting:
- Running out-of-the-box tests
- Documenting your model by adding evidence to model documentation
- Extending the capabilities of the ValidMind Library by implementing custom tests
- Ensuring that the documentation is complete by running all tests in the documentation template
Next steps
Work with your model documentation
Now that you’ve logged all your test results and generated a draft for your model documentation, head to the ValidMind Platform to make qualitative edits, view guidelines, collaborate with validators, and submit your model documentation for approval when it’s ready. Learn more: Working with model documentation
Learn more
Now that you’re familiar with the basics, you can explore the following notebooks to get a deeper understanding of how the ValidMind Library allows you to generate model documentation for any use case:
Use cases
More how-to guides and code samples
Discover more learning resources
All notebook samples can be found in the following directories of the ValidMind Library GitHub repository: