Testing

testing

model documentation

customization

custom data

explainability

ongoing monitoring

validmind library

Published

August 8, 2025

How do the out-of-the-box tests developed by ValidMind work?

All the default tests are developed using open-source Python and R libraries.

The ValidMind Library¹ test interface is a light wrapper that defines utility functions to agnostically interact with different dataset and model backends, and contains functions to collect and post results to the ValidMind Platform² using a generic results schema.

¹ ValidMind Library

² Accessing ValidMind

When do I use tests and tests suites?

While you have the flexibility to decide when to use which ValidMind tests, here are a few typical scenarios:³

³ When do I use tests and test suites?

Dataset testing — To document and validate your dataset.
Model testing — To document and validate your model.
End-to-end testing — To document a binary classification model and the relevant dataset end-to-end.

Can we configure, customize, or add our own tests?

Yes, ValidMind allows tests to be manipulated at several levels:

You can configure which tests are required to run programmatically depending on the model use case.⁴
You can change the thresholds and parameters for default tests already available in the library — for instance, changing the threshold parameter for the class imbalance flag.⁵
You can also connect your own custom tests with the ValidMind Library. These custom tests are configurable and are able to run programmatically, just like the rest of the library.⁶
Personalize tests further for your use case by using ValidMind’s RawData feature⁷ to customize the output of tests.

⁴ run_documentation_tests()

⁵ ClassImbalance()

⁶ Can I use my own tests?

⁷ Understand and utilize RawData in ValidMind tests

In addition to custom tests, you can also add use case and test-specific context for any test to enhance the LLM-generated test descriptions using the ValidMind Library.⁸

⁸ Add context to LLM-generated test descriptions

How do I log tests as a developer?

You use the ValidMind Library to run and log tests during model development, the results of which are then inserted your model documentation within the ValidMind Platform.⁹ The library also automatically generates draft test descriptions for your test results — generations that can be modified for your custom use cases.¹⁰

⁹ Work with test results

¹⁰ Can we configure, customize, or add our own tests?

To log tests as a developer with the ValidMind Library:

You must have the Developer role¹¹ or another role with sufficient permissions to create and own models, and to work with model documentation.
You must be the model owner or model developer, but not the model validator,¹² for the model you want to log tests and update documentation for.

¹¹ Manage roles

¹² Manage model stakeholder types

Want to learn how to use ValidMind as a developer?

Check our our introductory series — ValidMind for model development

How do I log tests as a validator?

You use the ValidMind Library to run and log tests during model validation, the results of which are then inserted your validation report within the ValidMind Platform.¹³ The library also automatically generates draft test descriptions for your test results — generations that can be modified for your custom use cases.¹⁴

¹³ Assess compliance

¹⁴ Can we configure, customize, or add our own tests?

To log tests as a validator with the ValidMind Library:

You must have the Validator role¹⁵ or another role with sufficient permissions to access models for validation, to review model documentation, and to work with validation reports and model findings.
You must be the model validator, but not the model owner or model developer,¹⁶ for the model you want to log tests and update documentation for.

¹⁵ Manage roles

¹⁶ Manage model stakeholder types

Want to learn how to use ValidMind as a validator?

Check our our introductory series — ValidMind for model validation

Do you include explainability-related testing and documentation?

Yes, ValidMind includes explainability-related testing and documentation as part of our offerings. Our approach incorporates a comprehensive suite of tests designed to evaluate model interpretability and identify potential risks, ensuring transparency and reliability in model outcomes.

Below is an overview of our key explainability-related tests:

Features AUC¹⁷ — Assesses the discriminatory power of individual features in binary classification models, providing insights into how well each feature differentiates between classes. This test supports explainability by isolating the contribution of each feature to the classification task.
Feature Importance¹⁸ — Generates feature importance scores to identify and compare impactful features across different models and datasets. By highlighting the relative significance of features, this test clarifies how inputs influence model predictions.
Overfit Diagnosis¹⁹ — Detects potential overfitting by comparing performance between training and testing sets for specific feature segments, highlighting areas of significant deviation. This test aids explainability by revealing where model behavior is inconsistent, offering insights into its generalization capability.
Permutation Feature Importance²⁰ — Measures feature significance by analyzing the impact of randomly rearranging feature values on model performance. This test quantifies the dependency of model performance on each feature, making it clear which inputs drive the predictions.
SHAP Global Importance²¹ — Uses SHAP (SHapley Additive exPlanations) values to assign global importance to features, offering a clear explanation of model outcomes and supporting risk identification. SHAP values provide a mathematically sound attribution of model predictions to specific features, enhancing interpretability.
Weakspots Diagnosis²² — Identifies and visualizes regions of suboptimal model performance across the feature space, highlighting areas that may require further attention. This test explains where and why the model struggles by connecting poor performance to specific feature regions.

¹⁷ FeaturesAUC

¹⁸ FeatureImportance

¹⁹ OverfitDiagnosis

²⁰ PermutationFeatureImportance

²¹ SHAPGlobalImportance

²² WeakspotsDiagnosis

When logged for documentation, each test automatically generates a comprehensive report as soon as it is executed.

ValidMind leverages generative AI to produce tailored, detailed summaries that include the test description, key insights, and a concise summary of results.
This automated documentation ensures that every test outcome is transparently recorded, clearly communicated, and immediately actionable.

Does ValidMind support using synthetic datasets?

The ValidMind Library supports you bringing your own datasets, including synthetic datasets, for testing and benchmarking purposes such as for fair lending and bias testing.²³
If you are unable to share your real-world data with us, ValidMind is happy to work with you to generate custom synthetic datasets based on characteristics of your data, or provide scripts to assist with synthetic dataset generation if details cannot be shared.

²³ Document a Credit Risk Model

Does ValidMind support monitoring models after deployment?

Yes, ValidMind offers ongoing monitoring support to help you regularly assess a model’s accuracy, stability, and robustness to ensure it remains reliable after deployment:

You can enable monitoring for both new and existing models.²⁴
You use the ValidMind Library to automatically populate the monitoring template for your model with data, providing a comprehensive view of your model’s performance over time.
You then access and examine these results within the ValidMind Platform, allowing you to identify any deviations from expected performance and take corrective actions as needed.²⁵
Once generated via the ValidMind Library, view and add metrics over time to your ongoing monitoring plans in the ValidMind Platform.²⁶

²⁴ Enable monitoring

²⁵ Review monitoring results

²⁶ Work with metrics over time

Learn more

How do the out-of-the-box tests developed by ​ValidMind work?

When do I use tests and tests suites?

Can we configure, customize, or add our own tests?

How do I log tests as a developer?

How do I log tests as a validator?

Do you include explainability-related testing and documentation?

Does ​ValidMind support using synthetic datasets?

Does ​ValidMind support monitoring models after deployment?

Learn more

Run tests and test suites

Test descriptions

Ongoing monitoring

How do the out-of-the-box tests developed by ValidMind work?

Does ValidMind support using synthetic datasets?

Does ValidMind support monitoring models after deployment?