ValidMind for model development 2 — Start the model development process

Learn how to use ValidMind for your end-to-end model documentation process with our series of four introductory notebooks. In this second notebook, you'll run tests and investigate results, then add the results or evidence to your documentation.

You'll become familiar with the individual tests available in ValidMind, as well as how to run them and change parameters as necessary. Using ValidMind's repository of individual tests as building blocks helps you ensure that a model is being built appropriately.

For a full list of out-of-the-box tests, refer to our Test descriptions or try the interactive Test sandbox.

Learn by doing

Our course tailor-made for developers new to ValidMind combines this series of notebooks with a more in-depth introduction to the ValidMind Platform — Developer Fundamentals

Prerequisites

In order to log test results or evidence to your model documentation with this notebook, you'll need to first have:

  • Registered a model in the ValidMind Platform
  • Installed and initialized the ValidMind Library in your development environment

Need help with the above steps?

Refer to the first notebook in this series: 1 — Set up the ValidMind Library

Setting up

Initialize the ValidMind Library

First, let's connect the ValidMind Library to the model we previously registered in the ValidMind Platform:

  1. In a browser, log in to ValidMind.

  2. In the left sidebar, navigate to Inventory and select the model you registered for this "ValidMind for model development" series of notebooks.

  3. Go to Getting Started and click Copy snippet to clipboard.

Next, load your model identifier credentials from an .env file or replace the placeholder with your own code snippet:

# Make sure the ValidMind Library is installed

%pip install -q validmind

# Load your model identifier credentials from an `.env` file

%load_ext dotenv
%dotenv .env

# Or replace with your code snippet

import validmind as vm

vm.init(
    # api_host="...",
    # api_key="...",
    # api_secret="...",
    # model="...",
)
Note: you may need to restart the kernel to use updated packages.
2026-01-10 02:02:06,518 - INFO(validmind.api_client): 🎉 Connected to ValidMind!
📊 Model: [ValidMind Academy] Model development (ID: cmalgf3qi02ce199qm3rdkl46)
📁 Document Type: model_documentation
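
If you use the .env approach, the library will look for its connection details in environment variables when no arguments are passed to vm.init(). A minimal sketch of what such a file might contain is shown below; the variable names are an assumption based on the placeholder arguments above, so confirm them against the code snippet you copied from the ValidMind Platform:

# Example .env file (values come from your model's code snippet)
VM_API_HOST=<api_host from your snippet>
VM_API_KEY=<api_key from your snippet>
VM_API_SECRET=<api_secret from your snippet>
VM_API_MODEL=<model identifier from your snippet>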

Import sample dataset

Then, let's import the public Bank Customer Churn Prediction dataset from Kaggle.

In the example below, note that:

  • The target column, Exited, has a value of 1 when a customer has churned and 0 otherwise.
  • The ValidMind Library provides a wrapper to automatically load the dataset as a Pandas DataFrame object. A Pandas DataFrame is a two-dimensional tabular data structure organized into rows and columns.
from validmind.datasets.classification import customer_churn as demo_dataset

print(
    f"Loaded demo dataset with: \n\n\t• Target column: '{demo_dataset.target_column}' \n\t• Class labels: {demo_dataset.class_labels}"
)

raw_df = demo_dataset.load_data()
raw_df.head()
Loaded demo dataset with: 

    • Target column: 'Exited' 
    • Class labels: {'0': 'Did not exit', '1': 'Exited'}
CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
0 619 France Female 42 2 0.00 1 1 1 101348.88 1
1 608 Spain Female 41 1 83807.86 1 0 1 112542.58 0
2 502 France Female 42 8 159660.80 3 1 0 113931.57 1
3 699 France Female 39 1 0.00 2 0 0 93826.63 0
4 850 Spain Female 43 2 125510.82 1 1 1 79084.10 0
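
Optionally, before running any ValidMind tests, you can sanity-check the target distribution directly on the raw DataFrame with plain Pandas. This is just a quick look and is not logged anywhere:

# Optional: inspect the raw class balance of the target column
raw_df["Exited"].value_counts(normalize=True)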

Identify qualitative tests

Next, let's say we want to do some data quality assessments by running a few individual tests.

Use the vm.tests.list_tests() function introduced in the first notebook of this series in combination with vm.tests.list_tags() and vm.tests.list_tasks() to find which prebuilt tests are relevant for data quality assessment:

  • tasks represent the kind of modeling task associated with a test. Here we'll focus on classification tasks.
  • tags are free-form descriptions providing more details about the test, for example, what category the test falls into. Here we'll focus on the data_quality tag.
# Get the list of available task types
sorted(vm.tests.list_tasks())
['classification',
 'clustering',
 'data_validation',
 'feature_extraction',
 'monitoring',
 'nlp',
 'regression',
 'residual_analysis',
 'text_classification',
 'text_generation',
 'text_qa',
 'text_summarization',
 'time_series_forecasting',
 'visualization']
# Get the list of available tags
sorted(vm.tests.list_tags())
['AUC',
 'analysis',
 'anomaly_detection',
 'bias_and_fairness',
 'binary_classification',
 'calibration',
 'categorical_data',
 'classification',
 'classification_metrics',
 'clustering',
 'correlation',
 'credit_risk',
 'data_analysis',
 'data_distribution',
 'data_quality',
 'data_validation',
 'descriptive_statistics',
 'dimensionality_reduction',
 'distribution',
 'embeddings',
 'feature_importance',
 'feature_selection',
 'few_shot',
 'forecasting',
 'frequency_analysis',
 'kmeans',
 'linear_regression',
 'llm',
 'logistic_regression',
 'metadata',
 'model_comparison',
 'model_diagnosis',
 'model_explainability',
 'model_interpretation',
 'model_performance',
 'model_predictions',
 'model_selection',
 'model_training',
 'model_validation',
 'multiclass_classification',
 'nlp',
 'normality',
 'numerical_data',
 'outliers',
 'qualitative',
 'rag_performance',
 'ragas',
 'regression',
 'retrieval_performance',
 'scorecard',
 'seasonality',
 'senstivity_analysis',
 'sklearn',
 'stationarity',
 'statistical_test',
 'statistics',
 'statsmodels',
 'tabular_data',
 'text_data',
 'threshold_optimization',
 'time_series_data',
 'unit_root_test',
 'visualization',
 'zero_shot']

You can pass tags and tasks as parameters to the vm.tests.list_tests() function to filter the available tests by tag and task type.

For example, to find tests related to tabular data quality for classification models, you can call list_tests() like this:

vm.tests.list_tests(task="classification", tags=["tabular_data", "data_quality"])
ID Name Description Has Figure Has Table Required Inputs Params Tags Tasks
validmind.data_validation.ClassImbalance Class Imbalance Evaluates and quantifies class distribution imbalance in a dataset used by a machine learning model.... True True ['dataset'] {'min_percent_threshold': {'type': 'int', 'default': 10}} ['tabular_data', 'binary_classification', 'multiclass_classification', 'data_quality'] ['classification']
validmind.data_validation.DescriptiveStatistics Descriptive Statistics Performs a detailed descriptive statistical analysis of both numerical and categorical data within a model's... False True ['dataset'] {} ['tabular_data', 'time_series_data', 'data_quality'] ['classification', 'regression']
validmind.data_validation.Duplicates Duplicates Tests dataset for duplicate entries, ensuring model reliability via data quality verification.... False True ['dataset'] {'min_threshold': {'type': '_empty', 'default': 1}} ['tabular_data', 'data_quality', 'text_data'] ['classification', 'regression']
validmind.data_validation.HighCardinality High Cardinality Assesses the number of unique values in categorical columns to detect high cardinality and potential overfitting.... False True ['dataset'] {'num_threshold': {'type': 'int', 'default': 100}, 'percent_threshold': {'type': 'float', 'default': 0.1}, 'threshold_type': {'type': 'str', 'default': 'percent'}} ['tabular_data', 'data_quality', 'categorical_data'] ['classification', 'regression']
validmind.data_validation.HighPearsonCorrelation High Pearson Correlation Identifies highly correlated feature pairs in a dataset suggesting feature redundancy or multicollinearity.... False True ['dataset'] {'max_threshold': {'type': 'float', 'default': 0.3}, 'top_n_correlations': {'type': 'int', 'default': 10}, 'feature_columns': {'type': 'list', 'default': None}} ['tabular_data', 'data_quality', 'correlation'] ['classification', 'regression']
validmind.data_validation.MissingValues Missing Values Evaluates dataset quality by ensuring missing value ratio across all features does not exceed a set threshold.... False True ['dataset'] {'min_threshold': {'type': 'int', 'default': 1}} ['tabular_data', 'data_quality'] ['classification', 'regression']
validmind.data_validation.MissingValuesBarPlot Missing Values Bar Plot Assesses the percentage and distribution of missing values in the dataset via a bar plot, with emphasis on... True False ['dataset'] {'threshold': {'type': 'int', 'default': 80}, 'fig_height': {'type': 'int', 'default': 600}} ['tabular_data', 'data_quality', 'visualization'] ['classification', 'regression']
validmind.data_validation.Skewness Skewness Evaluates the skewness of numerical data in a dataset to check against a defined threshold, aiming to ensure data... False True ['dataset'] {'max_threshold': {'type': '_empty', 'default': 1}} ['data_quality', 'tabular_data'] ['classification', 'regression']
validmind.plots.BoxPlot Box Plot Generates customizable box plots for numerical features in a dataset with optional grouping using Plotly.... True False ['dataset'] {'columns': {'type': 'Optional', 'default': None}, 'group_by': {'type': 'Optional', 'default': None}, 'width': {'type': 'int', 'default': 1800}, 'height': {'type': 'int', 'default': 1200}, 'colors': {'type': 'Optional', 'default': None}, 'show_outliers': {'type': 'bool', 'default': True}, 'title_prefix': {'type': 'str', 'default': 'Box Plot of'}} ['tabular_data', 'visualization', 'data_quality'] ['classification', 'regression', 'clustering']
validmind.plots.HistogramPlot Histogram Plot Generates customizable histogram plots for numerical features in a dataset using Plotly.... True False ['dataset'] {'columns': {'type': 'Optional', 'default': None}, 'bins': {'type': 'Union', 'default': 30}, 'color': {'type': 'str', 'default': 'steelblue'}, 'opacity': {'type': 'float', 'default': 0.7}, 'show_kde': {'type': 'bool', 'default': True}, 'normalize': {'type': 'bool', 'default': False}, 'log_scale': {'type': 'bool', 'default': False}, 'title_prefix': {'type': 'str', 'default': 'Histogram of'}, 'width': {'type': 'int', 'default': 1200}, 'height': {'type': 'int', 'default': 800}, 'n_cols': {'type': 'int', 'default': 2}, 'vertical_spacing': {'type': 'float', 'default': 0.15}, 'horizontal_spacing': {'type': 'float', 'default': 0.1}} ['tabular_data', 'visualization', 'data_quality'] ['classification', 'regression', 'clustering']
validmind.stats.DescriptiveStats Descriptive Stats Provides comprehensive descriptive statistics for numerical features in a dataset.... False True ['dataset'] {'columns': {'type': 'Optional', 'default': None}, 'include_advanced': {'type': 'bool', 'default': True}, 'confidence_level': {'type': 'float', 'default': 0.95}} ['tabular_data', 'statistics', 'data_quality'] ['classification', 'regression', 'clustering']
Want to learn more about navigating ValidMind tests?

Refer to our notebook outlining the utilities available for viewing and understanding available ValidMind tests: Explore tests

Initialize the ValidMind datasets

With the individual tests we want to run identified, the next step is to connect your data with a ValidMind Dataset object. This step is necessary whenever you want to connect a dataset to documentation and produce test results through ValidMind, but you only need to do it once per dataset.

Initialize a ValidMind dataset object using the init_dataset function from the ValidMind (vm) module. For this example, we'll pass in the following arguments:

  • dataset — The raw dataset that you want to provide as input to tests.
  • input_id — A unique identifier that allows tracking what inputs are used when running each individual test.
  • target_column — A required argument if tests require access to true values. This is the name of the target column in the dataset.
# vm_raw_dataset is now a VMDataset object that you can pass to any ValidMind test
vm_raw_dataset = vm.init_dataset(
    dataset=raw_df,
    input_id="raw_dataset",
    target_column="Exited",
)

Running tests

Now that we know how to initialize a ValidMind dataset object, we're ready to run some tests!

You run individual tests by calling the run_test function provided by the validmind.tests module. For the examples below, we'll pass in the following arguments:

  • test_id — The ID of the test to run, as seen in the ID column when you run list_tests.
  • params — A dictionary of parameters for the test. These will override any default_params set in the test definition.

Run tabular data tests

The inputs expected by a test can also be found in the test definition — let's take validmind.data_validation.DescriptiveStatistics as an example.

Note that the output of the describe_test() function below shows that this test expects a dataset as input:

vm.tests.describe_test("validmind.data_validation.DescriptiveStatistics")
Test: Descriptive Statistics ('validmind.data_validation.DescriptiveStatistics')

Now, let's run a few tests to assess the quality of the dataset:

result = vm.tests.run_test(
    test_id="validmind.data_validation.DescriptiveStatistics",
    inputs={"dataset": vm_raw_dataset},
)

Descriptive Statistics

Descriptive Statistics is designed to provide a comprehensive summary of both numerical and categorical data within a dataset, offering key insights into the distribution, central tendency, variability, and diversity of the variables. The primary purpose of this test is to facilitate a clear understanding of the dataset’s structure and characteristics, which is essential for interpreting model behavior and anticipating performance.

The test operates by applying established statistical functions to the dataset. For numerical variables, it uses a summary statistics approach, calculating metrics such as count, mean, standard deviation, minimum, maximum, and several percentiles (including the 25th, 50th, 75th, 90th, and 95th). These metrics collectively describe the central tendency, spread, and range of the data. The mean provides an average value, while the standard deviation quantifies the typical deviation from the mean, indicating variability. Percentiles offer a view of the data’s distribution, highlighting where most values fall and identifying potential skewness or outliers. For categorical variables, the test counts the total number of entries, determines the number of unique categories, identifies the most frequent category (the mode), and calculates both the frequency and proportion of this top value. This approach reveals the diversity and dominance within categorical fields. The metrics for numerical data typically range from the minimum to the maximum observed values, while categorical metrics focus on counts and proportions, with the top value frequency percentage ranging from just above 0% to 100%. High proportions for a single category may indicate a lack of diversity, while large differences between mean and median in numerical data may suggest skewness or outliers.

The primary advantages of this test include its ability to deliver a thorough and accessible overview of the dataset, making it easier to detect patterns, anomalies, and potential data quality issues. By summarizing both numerical and categorical variables, the test supports a holistic understanding of the data landscape, which is particularly valuable during model development, validation, and monitoring. The inclusion of multiple percentiles and measures of spread allows for the identification of skewness, heavy tails, or clustering, which can impact model assumptions and performance. For categorical data, the test’s focus on unique values and dominant categories helps to quickly assess whether the data is sufficiently diverse or if certain categories are overrepresented, which could affect model fairness or generalizability. This versatility makes the test applicable across a wide range of scenarios, from initial data exploration to ongoing model risk management.

It should be noted that while this test provides a high-level summary of the data, it does not capture relationships or dependencies between variables, nor does it detect subtle or complex patterns that may influence model outcomes. The test is limited to univariate analysis, meaning it examines each variable in isolation. As a result, it cannot identify multicollinearity, interactions, or conditional distributions. Additionally, the test may not fully reveal the impact of rare but influential outliers, especially if they do not significantly affect summary statistics. For categorical variables, a high frequency of a single category or a low number of unique values may signal a lack of diversity, which could introduce bias or limit model robustness. For numerical variables, significant differences between the mean and median, or extreme values in the minimum and maximum, may indicate skewness or outliers, which could distort model training or evaluation. Therefore, while the test is a valuable starting point, it should be complemented by more detailed analyses to ensure a comprehensive understanding of the data.

This test shows the results in two tables: one summarizing numerical variables and the other summarizing categorical variables. The numerical table lists each variable alongside its count, mean, standard deviation, minimum, several percentiles (25th, 50th, 75th, 90th, 95th), and maximum. This format allows the reader to quickly assess the central tendency, spread, and range for each variable. For example, the “CreditScore” variable has a mean of 650.16, a standard deviation of 96.85, and values ranging from 350 to 850, with the 50th percentile (median) at 652, indicating a relatively symmetric distribution. The “Balance” variable shows a mean of 76,434.10 and a standard deviation of 62,612.25, with a minimum of 0 and a maximum of 250,898, suggesting a wide range and potential skewness, as the 25th percentile is 0 while the 75th is 128,045. The categorical table presents each variable with its total count, number of unique values, the most common value, its frequency, and the percentage this represents. For instance, “Geography” has three unique values, with “France” accounting for 50.12% of entries, while “Gender” has two unique values, with “Male” representing 54.95%. These tables provide a clear, at-a-glance summary of the dataset’s structure, highlighting both the diversity and concentration within each variable, as well as any notable patterns such as skewness, outliers, or dominance of specific categories.

The test results reveal the following key insights:

  • Numerical variables exhibit varied distributions and ranges: Variables such as “CreditScore” and “Age” display relatively symmetric distributions, as indicated by close mean and median values, while “Balance” shows significant skewness with a large gap between the 25th percentile (0) and higher percentiles, and a high maximum value.
  • Presence of potential outliers and skewness in certain variables: The “Balance” variable, with a minimum of 0 and a maximum of 250,898, and a mean substantially lower than the 75th and 90th percentiles, suggests a right-skewed distribution with a concentration of lower values and a long tail of higher balances.
  • Categorical variables show moderate to high dominance of top categories: “Geography” is dominated by “France” at 50.12%, and “Gender” by “Male” at 54.95%, indicating that while there is some diversity, a single category accounts for over half of the observations in each case.
  • Limited diversity in categorical variables: With only three unique values for “Geography” and two for “Gender,” the categorical variables are relatively limited in diversity, which may have implications for model generalizability and fairness.
  • Stability in binary and low-cardinality variables: Variables such as “HasCrCard” and “IsActiveMember” are binary, with means close to 0.7 and 0.52, respectively, and standard deviations near 0.5, indicating balanced distributions without extreme dominance.

Based on these results, the dataset demonstrates a mix of symmetric and skewed distributions among numerical variables, with “CreditScore” and “Age” showing balanced central tendencies and moderate variability, while “Balance” stands out for its pronounced skewness and wide range. The presence of a substantial proportion of zero balances, as indicated by the 25th percentile, suggests a significant segment of the population with no account balance, while higher percentiles and the maximum highlight a smaller group with much larger balances. Categorical variables are characterized by moderate dominance of a single category, particularly in “Geography” and “Gender,” which may influence model behavior if these categories are associated with different outcomes. The limited number of unique values in categorical fields points to a relatively homogeneous dataset in these dimensions. Binary variables such as “HasCrCard” and “IsActiveMember” are well balanced, reducing the risk of bias from class imbalance. Overall, the descriptive statistics provide a clear and detailed view of the dataset’s structure, revealing both areas of stability and potential sources of skewness or concentration that could affect model performance and interpretation.

Tables

Numerical Variables

Name Count Mean Std Min 25% 50% 75% 90% 95% Max
CreditScore 8000.0 650.1596 96.8462 350.0 583.0 652.0 717.0 778.0 813.0 850.0
Age 8000.0 38.9489 10.4590 18.0 32.0 37.0 44.0 53.0 60.0 92.0
Tenure 8000.0 5.0339 2.8853 0.0 3.0 5.0 8.0 9.0 9.0 10.0
Balance 8000.0 76434.0965 62612.2513 0.0 0.0 97264.0 128045.0 149545.0 162488.0 250898.0
NumOfProducts 8000.0 1.5325 0.5805 1.0 1.0 1.0 2.0 2.0 2.0 4.0
HasCrCard 8000.0 0.7026 0.4571 0.0 0.0 1.0 1.0 1.0 1.0 1.0
IsActiveMember 8000.0 0.5199 0.4996 0.0 0.0 1.0 1.0 1.0 1.0 1.0
EstimatedSalary 8000.0 99790.1880 57520.5089 12.0 50857.0 99505.0 149216.0 179486.0 189997.0 199992.0

Categorical Variables

Name Count Number of Unique Values Top Value Top Value Frequency Top Value Frequency %
Geography 8000.0 3.0 France 4010.0 50.12
Gender 8000.0 2.0 Male 4396.0 54.95
result2 = vm.tests.run_test(
    test_id="validmind.data_validation.ClassImbalance",
    inputs={"dataset": vm_raw_dataset},
    params={"min_percent_threshold": 30},
)

❌ Class Imbalance

Class Imbalance is designed to evaluate and quantify the distribution of target classes within a dataset used by a machine learning model, with the primary purpose of identifying whether any class is under-represented to a degree that could introduce bias into the model’s predictions. By systematically assessing the proportion of each class, the test aims to ensure that the dataset is sufficiently balanced to support robust and fair model training and evaluation.

The test operates by calculating the frequency of each class in the target column, expressing these frequencies as percentages of the total dataset. It then compares each class’s percentage to a predefined minimum threshold, which in this case is set at 30%. If any class falls below this threshold, it is flagged as failing the test, indicating a potential imbalance. The methodology involves straightforward counting of class occurrences, division by the total number of records, and conversion to a percentage. The resulting values range from 0% to 100%, where higher percentages indicate greater representation of a class. A class that meets or exceeds the threshold is considered adequately represented, while a class below the threshold is considered under-represented. The test outputs both a tabular summary and a visual plot, making it easy to interpret the distribution and identify any imbalances at a glance.

The primary advantages of this test include its ability to quickly and clearly identify under-represented classes that could impact model performance, especially in scenarios where class balance is critical for predictive accuracy and fairness. The test’s simplicity and speed make it suitable for routine use in data preprocessing pipelines. Its quantitative output, both in tabular and graphical form, provides immediate insight into the degree of imbalance, supporting transparent communication with stakeholders. The adjustable threshold parameter allows the test to be tailored to specific domain requirements, making it flexible for a wide range of applications. The visual plot enhances interpretability, enabling users to intuitively grasp the class proportions and spot potential issues without needing to parse raw numbers.

It should be noted that the test has several limitations. It may be less informative for datasets with a large number of classes, where some degree of imbalance is expected or unavoidable. The choice of threshold is subjective and can influence the test’s sensitivity; setting it too high may result in false positives for imbalance, while setting it too low may overlook meaningful disparities. The test does not account for the varying costs or consequences of misclassifying different classes, which can be significant in certain domains. Additionally, while the test highlights imbalances, it does not provide guidance or methods for addressing them. Its applicability is limited to classification tasks and does not extend to regression or clustering problems. Importantly, the test flags any class below the threshold as high risk, which should be interpreted in the context of the specific modeling objectives and domain requirements.

This test shows the results in both a tabular format and a bar plot. The table, titled "Exited Class Imbalance," lists each class in the target variable ("Exited"), the percentage of total rows that each class represents, and a pass/fail status based on the 30% minimum threshold. The first row corresponds to class 0, which comprises 79.80% of the dataset and passes the threshold. The second row corresponds to class 1, which comprises 20.20% of the dataset and fails the threshold. The accompanying bar plot visually represents these proportions, with the x-axis indicating the class (0 or 1) and the y-axis showing the percentage of the dataset each class occupies. The height of each bar directly corresponds to the class’s share of the data, making it easy to compare the relative sizes. The plot clearly shows a substantial difference between the two classes, with class 0 dominating the dataset and class 1 being significantly under-represented relative to the threshold. The scale of the y-axis ranges from 0 to 1 (or 0% to 100%), and the bars are colored for visual clarity. This dual presentation allows for both precise numerical interpretation and intuitive visual assessment of class distribution.

The test results reveal the following key insights:

  • Majority Class Dominates Dataset: Class 0 constitutes 79.80% of the total records, indicating a strong dominance in the dataset.
  • Minority Class Fails Threshold: Class 1 represents only 20.20% of the dataset, falling below the 30% minimum threshold and thus failing the test.
  • Clear Visual Disparity in Class Distribution: The bar plot visually emphasizes the imbalance, with class 0’s bar being nearly four times the height of class 1’s bar.
  • Binary Target Structure: The dataset contains only two classes, simplifying the interpretation but also highlighting the stark contrast in representation.
  • Threshold Sensitivity Evident: The choice of a 30% threshold is critical, as class 1 would pass at a lower threshold but fails under the current setting, demonstrating the impact of parameter selection on test outcomes.

Based on these results, the dataset exhibits a pronounced class imbalance, with the majority class (class 0) substantially outnumbering the minority class (class 1). The minority class does not meet the minimum representation threshold of 30%, as specified by the test parameters, and is therefore flagged as under-represented. This imbalance is clearly reflected in both the tabular summary and the bar plot, which together provide a comprehensive view of the class distribution. The results suggest that the dataset’s current structure may influence the model’s ability to learn patterns associated with the minority class, potentially affecting predictive performance and fairness. The observed class proportions and the pass/fail outcomes for each class offer a transparent and quantitative basis for understanding the dataset’s composition and its implications for model development. The test’s sensitivity to the threshold parameter is also evident, underscoring the importance of aligning test settings with domain-specific requirements and modeling objectives.

Parameters:

{
  "min_percent_threshold": 30
}

Tables

Exited Class Imbalance

Exited Percentage of Rows (%) Pass/Fail
0 79.80% Pass
1 20.20% Fail

Figures

ValidMind Figure validmind.data_validation.ClassImbalance:3a6f

The output above shows that the class imbalance test did not pass according to the value we set for min_percent_threshold.

To address this issue, we'll re-run the test on some processed data. In this case, let's apply a very simple rebalancing technique (undersampling the majority class) to the dataset:

import pandas as pd

raw_copy_df = raw_df.sample(frac=1)  # Create a shuffled copy of the raw dataset

# Create a balanced dataset with the same number of exited and not exited customers
exited_df = raw_copy_df.loc[raw_copy_df["Exited"] == 1]
not_exited_df = raw_copy_df.loc[raw_copy_df["Exited"] == 0].sample(n=exited_df.shape[0])

balanced_raw_df = pd.concat([exited_df, not_exited_df])
balanced_raw_df = balanced_raw_df.sample(frac=1, random_state=42)
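
As a quick optional check before going back to ValidMind, you can confirm that the undersampling worked as intended using plain Pandas:

# Optional: both classes should now have the same number of rows
balanced_raw_df["Exited"].value_counts()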

With this new balanced dataset, you can re-run the individual test to see if it now passes the class imbalance test requirement.

As this is technically a different dataset, remember to first initialize a new ValidMind Dataset object to pass in as the input required by run_test():

# Register the new data; 'balanced_raw_dataset' is now the dataset object of interest
vm_balanced_raw_dataset = vm.init_dataset(
    dataset=balanced_raw_df,
    input_id="balanced_raw_dataset",
    target_column="Exited",
)
# Pass the initialized `balanced_raw_dataset` as input into the test run
result = vm.tests.run_test(
    test_id="validmind.data_validation.ClassImbalance",
    inputs={"dataset": vm_balanced_raw_dataset},
    params={"min_percent_threshold": 30},
)

✅ Class Imbalance

Class Imbalance is designed to evaluate and quantify the distribution of target classes within a dataset used by a machine learning model, with the primary purpose of identifying whether any class is under-represented to a degree that could introduce bias into the model’s predictions. By ensuring that each class meets a minimum representation threshold, the test helps safeguard against the risk of the model favoring the majority class and underperforming on the minority class, which is critical for maintaining fairness and predictive reliability in classification tasks.

The test operates by calculating the frequency of each class in the target column, expressing these frequencies as percentages of the total dataset. It then compares each class’s percentage to a configurable minimum threshold, which in this instance is set at 30%. If any class falls below this threshold, it is flagged as not meeting the balance criterion. The methodology involves straightforward counting of class occurrences, division by the total number of records, and conversion to a percentage, making the process transparent and easy to interpret. The resulting percentages typically range from 0% to 100%, where higher values indicate greater representation. A class is considered adequately represented if its percentage meets or exceeds the threshold, while lower values signal potential imbalance. The test outputs both tabular and visual representations, providing a clear and immediate understanding of class proportions.

The primary advantages of this test include its ability to quickly and clearly identify under-represented classes, which is essential for preventing model bias and ensuring robust performance across all categories. The test’s simplicity and speed make it suitable for routine checks during data preparation and model development. Its quantitative output not only highlights the presence of imbalance but also measures its extent, supporting informed decision-making. The adjustable threshold parameter allows the test to be tailored to specific domain requirements or regulatory standards, and the inclusion of visual plots enhances interpretability, making it easier for stakeholders to grasp the class distribution at a glance.

It should be noted that the test has several limitations. It may be less informative for datasets with a large number of classes, where some degree of imbalance is often unavoidable due to natural class frequencies. The sensitivity of the test to the chosen threshold means that inappropriate settings could either mask genuine imbalance or overstate minor deviations. The test does not account for the varying costs or impacts of misclassifying different classes, which can be significant in certain applications. Additionally, while the test identifies imbalance, it does not provide solutions or corrective actions. Its applicability is limited to classification problems and does not extend to regression or clustering tasks. High risk is indicated when any class falls below the threshold, but this does not capture more nuanced aspects of class distribution or model performance.

This test shows the results in both tabular and graphical formats. The table titled "Exited Class Imbalance" lists each class in the target variable "Exited," displaying the percentage of rows each class represents and whether it passes the minimum threshold criterion. The columns include the class label, the percentage of total records attributed to that class, and a pass/fail indicator based on the 30% threshold. The accompanying bar plot visually depicts the proportion of each class, with the x-axis representing the class labels (0 and 1) and the y-axis showing the corresponding percentage, ranging from 0 to 0.5 (or 0% to 50%). Both classes are shown to occupy exactly 50% of the dataset, as indicated by the equal bar heights and the table values. This balanced distribution is visually apparent, with no class falling below the threshold. The plot and table together provide a comprehensive view of class representation, making it easy to assess whether the dataset meets the balance requirement.

The test results reveal the following key insights:

  • Both classes meet the minimum representation threshold: Each class, labeled as 0 and 1, constitutes exactly 50% of the dataset, which is well above the 30% minimum threshold set for this test.
  • No evidence of class imbalance: The pass/fail column in the table indicates that both classes pass the test, confirming that neither class is under-represented.
  • Symmetrical class distribution: The bar plot visually reinforces the tabular data, showing two bars of equal height, which reflects a perfectly balanced class distribution.
  • Stable and uniform dataset composition: The absence of variation between class proportions suggests that the dataset is stable with respect to the target variable, reducing the risk of model bias due to class imbalance.
  • Clear interpretability of results: The combination of tabular and graphical outputs allows for immediate and unambiguous interpretation of class proportions and their compliance with the threshold.

Based on these results, the dataset used for the model demonstrates a perfectly balanced class distribution for the target variable "Exited," with both classes equally represented at 50%. This uniformity ensures that the model is not predisposed to favor one class over the other due to data imbalance, supporting fair and reliable predictive performance. The test’s outputs, both in tabular and graphical form, provide clear evidence that the dataset meets the specified minimum threshold for class representation, with no class falling below the 30% criterion. The stability and symmetry observed in the class proportions indicate that the risk of bias arising from class imbalance is minimal in this context. These results suggest that the dataset is well-suited for training classification models without the need for additional balancing interventions, and the observed characteristics support the integrity of subsequent modeling efforts.

Parameters:

{
  "min_percent_threshold": 30
}

Tables

Exited Class Imbalance

Exited Percentage of Rows (%) Pass/Fail
0 50.00% Pass
1 50.00% Pass

Figures

ValidMind Figure validmind.data_validation.ClassImbalance:9726
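
Note that run_test() returns a result object. If you want a passing result like this one to appear as evidence in your model documentation, the result can typically be logged back to the ValidMind Platform; a minimal sketch, assuming the result object exposes a log() method as in other ValidMind examples:

# Send this test result to your model documentation (sketch)
result.log()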

Utilize test output

You can use the output from a ValidMind test in further processing, for example, to remove highly correlated features. Removing highly correlated features helps make the model simpler, more stable, and easier to understand.

Below we demonstrate how to retrieve the list of features with the highest correlation coefficients and use them to reduce the final list of features for modeling.

First, we'll run validmind.data_validation.HighPearsonCorrelation with the balanced_raw_dataset we initialized previously as input, unchanged, to establish a baseline for comparison with later runs:

corr_result = vm.tests.run_test(
    test_id="validmind.data_validation.HighPearsonCorrelation",
    params={"max_threshold": 0.3},
    inputs={"dataset": vm_balanced_raw_dataset},
)

❌ High Pearson Correlation

High Pearson Correlation is designed to identify pairs of features within a dataset that exhibit strong linear relationships, with the primary purpose of detecting potential feature redundancy or multicollinearity. This is crucial for ensuring that the predictive model remains interpretable and robust, as highly correlated features can obscure the true impact of individual variables and may lead to overfitting or instability in model estimates.

The test operates by calculating the Pearson correlation coefficient for every possible pair of features in the dataset. The Pearson correlation coefficient is a statistical measure that quantifies the strength and direction of a linear relationship between two continuous variables, producing values that range from -1 to 1. A value of 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship. The test systematically computes these coefficients for all feature pairs, removes self-correlations and duplicate pairs, and then sorts the results by the absolute value of the coefficient. Each pair is evaluated against a predefined threshold (in this case, 0.3), and a Pass or Fail status is assigned depending on whether the absolute correlation exceeds this threshold. The test then returns the top n pairs with the strongest correlations, providing a focused view of the most significant relationships in the data.

The primary advantages of this test include its efficiency and transparency in highlighting linear dependencies between features. By surfacing the most strongly correlated pairs, the test enables data scientists and risk managers to quickly identify areas where feature redundancy or multicollinearity may be present, which is particularly valuable during the early stages of model development and feature selection. The clear tabular output, which includes the feature pairs, their correlation coefficients, and Pass/Fail status, supports straightforward interpretation and documentation. This approach is especially useful in regulated environments where model interpretability and transparency are paramount, as it provides a defensible record of the relationships present in the data.

It should be noted that the test is limited to detecting linear relationships and does not capture nonlinear dependencies, which may also be relevant in some modeling contexts. The Pearson correlation coefficient is sensitive to outliers, meaning that a small number of extreme values can disproportionately influence the results and potentially mask or exaggerate true relationships. Additionally, the test only considers pairwise relationships and may not detect more complex interactions involving three or more features. High correlation coefficients, particularly those exceeding the set threshold, are indicative of potential multicollinearity, which can undermine the stability and interpretability of model coefficients. Care must be taken in interpreting these results, as the presence of high correlations does not necessarily imply causation or redundancy without further domain-specific analysis.

This test shows its results in a tabular format, where each row represents a unique pair of features from the dataset. The columns include the feature pair, the calculated Pearson correlation coefficient (rounded to four decimal places), and a Pass/Fail status indicating whether the absolute value of the coefficient exceeds the threshold of 0.3. The coefficients range from -0.3341 to 0.3341, with both positive and negative values indicating the direction of the linear relationship. The Pass/Fail column provides a quick reference for identifying which pairs surpass the threshold, with only one pair failing the test. The table is sorted by the absolute value of the coefficient, so the strongest relationships appear at the top. Notably, the pair (Age, Exited) has the highest absolute correlation at 0.3341 and is the only pair marked as Fail, indicating a moderate positive linear relationship that exceeds the threshold. All other pairs have coefficients below the threshold, with values ranging from -0.1984 to 0.0474, and are marked as Pass. This structure allows for rapid assessment of the most significant linear relationships in the dataset and highlights any pairs that may warrant further investigation.

The test results reveal the following key insights:

  • Only One Feature Pair Exceeds the Correlation Threshold: The pair (Age, Exited) has a Pearson correlation coefficient of 0.3341, which is the only value exceeding the threshold of 0.3, resulting in a Fail status for this pair.
  • All Other Feature Pairs Remain Below the Threshold: The remaining nine feature pairs have coefficients ranging from -0.1984 to 0.0474, all of which are below the 0.3 threshold and are marked as Pass, indicating no other strong linear relationships.
  • Predominance of Weak Linear Relationships: Most feature pairs exhibit weak correlations, with absolute values well below the threshold, suggesting limited risk of multicollinearity among these variables.
  • Balanced Distribution of Positive and Negative Correlations: The coefficients include both positive and negative values, reflecting a mix of direct and inverse linear relationships, but none are strong enough to raise immediate concerns except for the (Age, Exited) pair.
  • Clear Tabular Presentation Facilitates Rapid Assessment: The sorted table format, with explicit Pass/Fail indicators, enables efficient identification of the most relevant feature relationships and supports transparent documentation.

Based on these results, the dataset demonstrates a generally low level of linear correlation among most feature pairs, with only the (Age, Exited) pair exhibiting a moderate positive relationship that exceeds the predefined threshold. This suggests that, aside from this single pair, the risk of feature redundancy or multicollinearity is minimal within the current feature set. The presence of both positive and negative coefficients across the pairs indicates a diverse range of relationships, but none approach the level that would typically warrant concern for model interpretability or stability, except for the identified pair. The clear separation between the one failing pair and the others supports the conclusion that the dataset is largely free from problematic linear dependencies, with the exception of the moderate association between Age and Exited. This observation provides a focused area for further review while affirming the overall suitability of the feature set for modeling from a linear correlation perspective.

Parameters:

{
  "max_threshold": 0.3
}

Tables

Columns Coefficient Pass/Fail
(Age, Exited) 0.3341 Fail
(IsActiveMember, Exited) -0.1984 Pass
(Balance, NumOfProducts) -0.1565 Pass
(Balance, Exited) 0.1376 Pass
(NumOfProducts, IsActiveMember) 0.0575 Pass
(Age, Balance) 0.0474 Pass
(HasCrCard, IsActiveMember) -0.0426 Pass
(NumOfProducts, Exited) -0.0399 Pass
(Tenure, IsActiveMember) -0.0390 Pass
(Age, NumOfProducts) -0.0345 Pass

The output above shows that the test did not pass according to the value we set for max_threshold.

corr_result is an object of type TestResult. We can inspect the result object to see what the test has produced:

print(type(corr_result))
print("Result ID: ", corr_result.result_id)
print("Params: ", corr_result.params)
print("Passed: ", corr_result.passed)
print("Tables: ", corr_result.tables)
<class 'validmind.vm_models.result.result.TestResult'>
Result ID:  validmind.data_validation.HighPearsonCorrelation
Params:  {'max_threshold': 0.3}
Passed:  False
Tables:  [ResultTable]

Let's remove the highly correlated features and create a new VM dataset object.

We'll begin by inspecting the table in the result and extracting the list of feature pairs that failed the test:

# Extract table from `corr_result.tables`
features_df = corr_result.tables[0].data
features_df
Columns Coefficient Pass/Fail
0 (Age, Exited) 0.3341 Fail
1 (IsActiveMember, Exited) -0.1984 Pass
2 (Balance, NumOfProducts) -0.1565 Pass
3 (Balance, Exited) 0.1376 Pass
4 (NumOfProducts, IsActiveMember) 0.0575 Pass
5 (Age, Balance) 0.0474 Pass
6 (HasCrCard, IsActiveMember) -0.0426 Pass
7 (NumOfProducts, Exited) -0.0399 Pass
8 (Tenure, IsActiveMember) -0.0390 Pass
9 (Age, NumOfProducts) -0.0345 Pass
# Extract list of features that failed the test
high_correlation_features = features_df[features_df["Pass/Fail"] == "Fail"]["Columns"].tolist()
high_correlation_features
['(Age, Exited)']

Next, extract the feature name from each failing pair string (example: (Age, Exited) → Age):

high_correlation_features = [feature.split(",")[0].strip("()") for feature in high_correlation_features]
high_correlation_features
['Age']
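
The list comprehension above keeps only the first feature of each failing pair. If you ever want every column involved in a failing pair (for example, to review both sides before deciding what to drop), a slightly more general sketch is shown below; note that in this example the second element is the target column Exited, which you would keep regardless:

# Hypothetical variant: collect every column name that appears in a failing pair
failed_pairs = features_df[features_df["Pass/Fail"] == "Fail"]["Columns"].tolist()
columns_in_failed_pairs = sorted(
    {name.strip(" ()") for pair in failed_pairs for name in pair.split(",")}
)
columns_in_failed_pairs  # ['Age', 'Exited'] for the single failing pair above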

Now, it's time to re-initialize the dataset with the highly correlated features removed.

Note the use of a different input_id. This allows tracking the inputs used when running each individual test.

# Remove the highly correlated features from the dataset
balanced_raw_no_age_df = balanced_raw_df.drop(columns=high_correlation_features)

# Re-initialize the dataset object
vm_raw_dataset_preprocessed = vm.init_dataset(
    dataset=balanced_raw_no_age_df,
    input_id="raw_dataset_preprocessed",
    target_column="Exited",
)

Re-running the test with the reduced feature set should now pass:

corr_result = vm.tests.run_test(
    test_id="validmind.data_validation.HighPearsonCorrelation",
    params={"max_threshold": 0.3},
    inputs={"dataset": vm_raw_dataset_preprocessed},
)

✅ High Pearson Correlation

High Pearson Correlation is designed to identify pairs of features within a dataset that exhibit strong linear relationships, with the primary purpose of detecting potential feature redundancy or multicollinearity. By highlighting highly correlated feature pairs, this test supports model developers and risk management teams in understanding dependencies that may affect model performance, interpretability, and robustness.

The test operates by calculating the Pearson correlation coefficient for every possible pair of features in the dataset. The Pearson correlation coefficient quantifies the strength and direction of the linear relationship between two variables, with values ranging from -1 (perfect negative linear relationship) to 1 (perfect positive linear relationship), and 0 indicating no linear relationship. The test systematically excludes self-correlations and duplicate pairs to ensure each unique feature pair is evaluated only once. It then compares the absolute value of each coefficient to a predefined threshold, which in this case is set at 0.3. Pairs with coefficients exceeding this threshold are flagged as potentially problematic due to high correlation, while those below the threshold are considered to pass. The test outputs the top n strongest correlations, regardless of whether they pass or fail, providing a transparent view of the most significant linear relationships in the data. This approach enables users to quickly assess the extent of linear dependencies and prioritize further investigation or mitigation steps as needed.

The primary advantages of this test include its efficiency and clarity in surfacing linear relationships between features, which is particularly valuable in the early stages of model development and risk assessment. By providing a ranked list of the strongest correlations, the test allows practitioners to focus on the most relevant feature pairs that may contribute to multicollinearity or redundancy. This transparency aids in model interpretability, as it becomes easier to understand which variables may be providing overlapping information. The test is also straightforward to implement and interpret, making it accessible for both technical and non-technical stakeholders. Its ability to quickly flag potential issues supports proactive risk management and helps ensure that models are built on a foundation of well-understood, non-redundant features.

It should be noted that the test is limited to detecting linear relationships and does not capture nonlinear dependencies, which may also be relevant in some modeling contexts. The Pearson correlation coefficient is sensitive to outliers, meaning that a few extreme values can disproportionately influence the results and potentially mask or exaggerate true relationships. Additionally, the test only evaluates pairwise relationships and may not identify more complex interactions involving three or more features. High correlation coefficients, particularly those exceeding the set threshold, are indicative of potential multicollinearity, which can undermine model stability and the interpretability of individual feature contributions. However, the presence of high correlations does not automatically imply a problem; further analysis is often required to determine the practical impact on model performance.

This test shows the results in a tabular format, where each row represents a unique pair of features, the calculated Pearson correlation coefficient for that pair, and a Pass/Fail status based on whether the absolute value of the coefficient exceeds the threshold of 0.3. The "Columns" field specifies the feature pair, "Coefficient" provides the numerical value of the correlation (ranging from -1 to 1), and "Pass/Fail" indicates whether the pair is below (Pass) or above (Fail) the threshold. In this particular output, all coefficients are well below the threshold, with the highest absolute value being -0.1984 for the pair (IsActiveMember, Exited). The coefficients span both positive and negative values, indicating both direct and inverse relationships, but none approach the threshold that would suggest a strong linear dependency. The table is sorted by the absolute value of the coefficient, presenting the top ten strongest correlations in the dataset. This allows for a straightforward assessment of the degree of linear association among the most relevant feature pairs, with all observed values falling within a relatively narrow range and no extreme outliers present.

The test results reveal the following key insights:

  • No Feature Pairs Exceed Correlation Threshold: All observed Pearson correlation coefficients are below the 0.3 threshold, with the highest absolute value being -0.1984 for (IsActiveMember, Exited), indicating no strong linear relationships among the top feature pairs.
  • Distribution of Correlation Coefficients Is Narrow: The coefficients for the top ten pairs range from -0.1984 to 0.0331, suggesting that the dataset does not contain any feature pairs with moderate or high linear association.
  • Both Positive and Negative Relationships Present: The results include both positive and negative coefficients, such as 0.1376 for (Balance, Exited) and -0.1565 for (Balance, NumOfProducts), reflecting a mix of direct and inverse linear relationships, though all are weak in magnitude.
  • Pass Status Uniform Across All Pairs: Every feature pair in the output is marked as "Pass," confirming that none of the evaluated relationships meet the criteria for high correlation as defined by the test parameters.
  • No Evidence of Multicollinearity Among Top Features: The absence of coefficients near or above the threshold suggests that multicollinearity is not a significant characteristic among the most strongly correlated feature pairs in this dataset.

Based on these results, the dataset exhibits a low degree of linear association among its top feature pairs, as evidenced by the uniformly low Pearson correlation coefficients and the consistent "Pass" status across all evaluated pairs. The range of coefficients, from -0.1984 to 0.0331, indicates that none of the relationships approach the threshold that would signal potential redundancy or multicollinearity. Both positive and negative associations are present, but all are weak, suggesting that the features contribute largely independent information to the model. This pattern supports the interpretability and stability of the model, as it reduces the likelihood that any single feature's predictive power is confounded by strong linear dependencies with others. The results provide a clear and objective characterization of the dataset's feature relationships, confirming that, under the current threshold, the risk of linear redundancy or multicollinearity is minimal among the most relevant feature pairs.

Parameters:

{
  "max_threshold": 0.3
}

Tables

Columns Coefficient Pass/Fail
(IsActiveMember, Exited) -0.1984 Pass
(Balance, NumOfProducts) -0.1565 Pass
(Balance, Exited) 0.1376 Pass
(NumOfProducts, IsActiveMember) 0.0575 Pass
(HasCrCard, IsActiveMember) -0.0426 Pass
(NumOfProducts, Exited) -0.0399 Pass
(Tenure, IsActiveMember) -0.0390 Pass
(Tenure, EstimatedSalary) 0.0331 Pass
(Balance, HasCrCard) -0.0268 Pass
(HasCrCard, EstimatedSalary) -0.0245 Pass

You can also plot the correlation matrix to visualize the correlation structure of the remaining features:

corr_result = vm.tests.run_test(
    test_id="validmind.data_validation.PearsonCorrelationMatrix",
    inputs={"dataset": vm_raw_dataset_preprocessed},
)

Pearson Correlation Matrix

Pearson Correlation Matrix is designed to evaluate the extent of linear dependency between all pairs of numerical variables in a dataset. Its primary purpose is to identify potential redundancy among variables by quantifying the strength and direction of their linear relationships, thereby supporting dimensionality reduction and improving model interpretability.

The test operates by calculating the Pearson correlation coefficient for every pair of numerical variables in the dataset. This coefficient measures the degree to which two variables move together in a linear fashion, with values ranging from -1 to 1. A value of 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship. The test compiles these coefficients into a correlation matrix, which is then visualized as a heat map. The heat map uses color gradients to represent the magnitude and direction of each correlation, with a specific highlight (white) for coefficients whose absolute value exceeds 0.7, signaling a high degree of correlation. This approach allows for rapid identification of variable pairs that may be redundant or highly interdependent, which is particularly useful for feature selection and multicollinearity assessment in predictive modeling.

The primary advantages of this test include its ability to provide a clear, quantitative assessment of linear relationships between variables, which is essential for detecting redundancy and potential multicollinearity in datasets. The heat map visualization makes it accessible to a wide range of users, including those who may not be comfortable interpreting raw correlation matrices. By highlighting strong correlations, the test supports informed decisions about feature selection, helping to streamline models and potentially enhance their generalizability. Additionally, the test is computationally efficient and can be applied to datasets of varying sizes, making it a practical tool for exploratory data analysis and model diagnostics.

It should be noted that the Pearson Correlation Matrix is limited to detecting linear relationships and may not capture more complex, non-linear dependencies between variables. As a result, important associations could be overlooked if they do not manifest as linear patterns. The test also does not measure the causal influence or predictive power of one variable over another, focusing solely on the degree of co-movement. The threshold of 0.7 for highlighting high correlations is somewhat arbitrary and may not be appropriate for all contexts, potentially missing meaningful relationships with lower coefficients. Furthermore, a large number of highly correlated variables can indicate redundancy and increase the risk of overfitting, which may compromise model performance if not addressed.

This test shows a heat map representation of the Pearson correlation matrix for the dataset’s numerical variables. Each cell in the matrix corresponds to the correlation coefficient between a pair of variables, with the variable names listed along both the horizontal and vertical axes. The color scale ranges from deep blue (indicating strong positive correlation) through white (no correlation) to deep red (strong negative correlation), as shown by the color bar on the right. The diagonal cells, which compare each variable with itself, are always 1 and appear as the darkest blue. The off-diagonal cells display the pairwise correlations, with values annotated within each cell for precise interpretation. Notably, none of the off-diagonal cells are highlighted in white, indicating that no pair of variables exceeds the 0.7 absolute correlation threshold. The majority of the coefficients are close to zero, suggesting weak or negligible linear relationships. The largest absolute correlation observed is -0.20 between "IsActiveMember" and "Exited," and -0.16 between "NumOfProducts" and "Balance," both of which are well below the high-correlation threshold. The heat map provides a comprehensive visual summary, making it easy to spot any strong linear dependencies or lack thereof across the dataset.

The test results reveal the following key insights:

  • No High Linear Correlations Detected: All off-diagonal correlation coefficients fall well below the 0.7 threshold, indicating an absence of strong linear relationships between any pair of variables.
  • Predominance of Weak Relationships: Most correlation values are clustered near zero, with the majority ranging between -0.20 and 0.14, suggesting that the variables are largely independent in a linear sense.
  • Notable Negative Associations: The most negative correlation is observed between "IsActiveMember" and "Exited" at -0.20, and between "NumOfProducts" and "Balance" at -0.16, indicating mild inverse relationships.
  • Minimal Positive Associations: The highest positive correlation is 0.14 between "Balance" and "Exited," which remains weak and does not suggest redundancy.
  • Diagonal Dominance: As expected, all diagonal elements are 1, confirming perfect self-correlation and serving as a reference for interpreting off-diagonal values.
  • Uniform Distribution Across Variables: No variable stands out as being consistently highly correlated with multiple others, supporting the overall independence of features.

Based on these results, the dataset exhibits a low degree of linear dependency among its numerical variables, as evidenced by the uniformly low Pearson correlation coefficients and the absence of any values exceeding the 0.7 threshold. This pattern suggests that the variables are not redundant in a linear sense and are likely to contribute distinct information to any downstream modeling efforts. The weak correlations observed, both positive and negative, indicate that multicollinearity is not a significant concern in this dataset, reducing the risk of overfitting due to redundant features. The heat map visualization confirms that the relationships between variables are generally weak and dispersed, with no clusters or groupings of highly correlated variables. This structural independence among features supports the use of the full variable set in modeling, as each variable appears to capture unique aspects of the data. Overall, the results provide a clear and objective characterization of the dataset’s linear dependency structure, informing subsequent steps in feature selection and model development.

Figures

ValidMind Figure validmind.data_validation.PearsonCorrelationMatrix:e6ad

Documenting test results

Now that we've done some analysis on two different datasets, we can use ValidMind to document why certain changes were made to our raw data, with test results as supporting evidence.

Every test result returned by the run_test() function has a .log() method that can be used to send the test results to the ValidMind Platform:

  • When using run_documentation_tests(), documentation sections will be automatically populated with the results of all tests registered in the documentation template.
  • When logging individual test results to the platform, you'll need to manually add those results to the desired section of the model documentation.

To demonstrate how to add test results to your model documentation, we'll populate the entire Data Preparation section of the documentation using the clean vm_raw_dataset_preprocessed dataset as input, and then document an additional individual result for the highly correlated dataset vm_balanced_raw_dataset.

Run and log multiple tests

run_documentation_tests() allows you to run multiple tests at once and automatically log the results to your documentation. Below, we'll run the tests using the previously initialized vm_raw_dataset_preprocessed as input — this will populate the entire Data Preparation section for every test that is part of the documentation template.

For this example, we'll pass in the following arguments:

  • inputs: Any inputs to be passed to the tests.
  • config: A dictionary <test_id>:<test_config> that allows configuring each test individually. Each test config requires the following:
    • params: Individual test parameters.
    • inputs: Individual test inputs. This overrides any inputs passed from the run_documentation_tests() function.

When including explicit configuration for individual tests, you'll need to specify the inputs even if they mirror what is included in your global configuration.

# Individual test config with inputs specified
test_config = {
    "validmind.data_validation.ClassImbalance": {
        "params": {"min_percent_threshold": 30},
        "inputs": {"dataset": vm_raw_dataset_preprocessed},
    },
    "validmind.data_validation.HighPearsonCorrelation": {
        "params": {"max_threshold": 0.3},
        "inputs": {"dataset": vm_raw_dataset_preprocessed},
    },
}

# Global test config
tests_suite = vm.run_documentation_tests(
    inputs={
        "dataset": vm_raw_dataset_preprocessed,
    },
    config=test_config,
    section=["data_preparation"],
)
Test suite complete!
26/26 (100.0%)

Test Suite Results: Binary Classification V2


Check out the updated documentation on ValidMind.

Template for binary classification models.

Data Preparation
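
The summary above lists the sections that were populated. If you'd like to see which tests the documentation template registers for each section before (or after) running them, the library can render the template structure. This assumes your installed version exposes vm.preview_template(); check the library reference if it doesn't:

# Render the documentation template structure, including the tests registered in each section
vm.preview_template()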

Run and log an individual test

Next, we'll use the previously initialized vm_balanced_raw_dataset (that still has a highly correlated Age column) as input to run an individual test, then log the result to the ValidMind Platform.

When running individual tests, you can use a custom result_id to tag the individual result with a unique identifier:

  • This result_id can be appended to test_id with a : separator.
  • The balanced_raw_dataset result identifier will correspond to the balanced_raw_dataset input, the dataset that still has the Age column.
result = vm.tests.run_test(
    test_id="validmind.data_validation.HighPearsonCorrelation:balanced_raw_dataset",
    params={"max_threshold": 0.3},
    inputs={"dataset": vm_balanced_raw_dataset},
)
result.log()

❌ High Pearson Correlation Balanced Raw Dataset

High Pearson Correlation is designed to identify highly correlated feature pairs in a dataset, with the primary purpose of detecting potential feature redundancy or multicollinearity. This is crucial for ensuring that the features used in a machine learning model do not exhibit strong linear relationships that could compromise model interpretability or performance. By systematically evaluating the linear associations between all pairs of features, the test provides transparency into the structure of the dataset and highlights areas where further feature engineering or selection may be warranted.

The test operates by calculating the Pearson correlation coefficient for every possible pair of features in the dataset. The Pearson correlation coefficient is a statistical measure that quantifies the strength and direction of a linear relationship between two continuous variables, ranging from -1 (perfect negative linear relationship) to 1 (perfect positive linear relationship), with 0 indicating no linear relationship. The test removes self-correlations and duplicate pairs, then sorts the results by the absolute value of the correlation coefficient. Each pair is evaluated against a predefined threshold (in this case, 0.3), and a Pass or Fail status is assigned depending on whether the absolute value of the coefficient exceeds this threshold. The test outputs the top n strongest correlations, providing a clear view of the most significant linear relationships present in the data. This approach enables users to quickly identify pairs of features that may introduce multicollinearity or redundancy, which can affect model stability and interpretability.
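
The ranking logic described above can be reproduced with a few lines of pandas, which is handy when you want to double-check why a particular pair passes or fails. This is an illustrative sketch run against the raw_df loaded at the start of the notebook, not the ValidMind implementation:

import numpy as np

# Pairwise Pearson coefficients, keeping each pair once (upper triangle, no self-correlations)
corr = raw_df.select_dtypes(include="number").corr(method="pearson")
pairs = (
    corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    .stack()
    .sort_values(key=lambda s: s.abs(), ascending=False)
)

# Compare each coefficient against the same 0.3 threshold used in the test above
max_threshold = 0.3
report = pairs.round(4).to_frame("Coefficient")
report["Pass/Fail"] = np.where(report["Coefficient"].abs() > max_threshold, "Fail", "Pass")
print(report.head(10))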

The primary advantages of this test include its simplicity and transparency in revealing linear dependencies between features. It provides a direct and interpretable output that lists the most correlated feature pairs, their correlation coefficients, and a clear Pass or Fail status based on the specified threshold. This makes it particularly useful in the early stages of data exploration and preprocessing, where understanding the relationships between variables is essential for effective feature selection and engineering. The test is also computationally efficient, making it suitable for large datasets, and its results can be easily communicated to both technical and non-technical stakeholders. By highlighting potential multicollinearity, the test supports the development of more robust and interpretable models.

It should be noted that the test is limited to detecting linear relationships and does not capture nonlinear dependencies between features. As a result, important associations that are not linear in nature may go undetected. The Pearson correlation coefficient is also sensitive to outliers, which can distort the measure and lead to misleading conclusions about the strength of relationships. Additionally, the test only considers pairwise relationships and does not account for more complex interactions involving three or more variables. High correlation coefficients, particularly those exceeding the threshold, may indicate a risk of multicollinearity, which can undermine the reliability of model coefficients and reduce interpretability. Care must be taken in interpreting the results, as the presence of high correlations does not necessarily imply causation or redundancy without further analysis.

This test shows its results in the form of a table, where each row represents a unique pair of features from the dataset. The columns include the feature pair, the Pearson correlation coefficient (rounded to four decimal places), and a Pass or Fail status based on whether the absolute value of the coefficient exceeds the threshold of 0.3. The displayed coefficients range from -0.1984 to 0.3341, indicating both positive and negative linear relationships of varying strengths. The table is sorted by the absolute value of the correlation coefficient, with the strongest relationships listed first. Notably, only one feature pair, (Age, Exited), exceeds the threshold and is marked as Fail, while all other pairs are marked as Pass. The remaining coefficients are relatively low, with most values clustered between -0.1984 and 0.0474, suggesting generally weak linear relationships among the other feature pairs. The Pass/Fail column provides an immediate visual cue for identifying pairs that may warrant further investigation. The table format allows for straightforward comparison of the strength and direction of relationships across all evaluated pairs.

The test results reveal the following key insights:

  • Only One Feature Pair Exceeds the Correlation Threshold: The pair (Age, Exited) has a Pearson correlation coefficient of 0.3341, surpassing the threshold of 0.3 and resulting in a Fail status, indicating a moderate positive linear relationship between these features.
  • All Other Feature Pairs Show Weak Linear Relationships: The remaining nine feature pairs have coefficients ranging from -0.1984 to 0.0474, all below the threshold, and are marked as Pass, suggesting minimal risk of multicollinearity among these pairs.
  • Negative and Positive Correlations Are Both Present: The coefficients include both positive and negative values, with the strongest negative correlation observed between (IsActiveMember, Exited) at -0.1984, though still below the threshold.
  • No Evidence of Widespread Redundancy: The distribution of coefficients indicates that, aside from the (Age, Exited) pair, the dataset does not exhibit strong linear dependencies among the top feature pairs evaluated.
  • Clear Pass/Fail Delineation Facilitates Interpretation: The Pass/Fail status provides an immediate and unambiguous indication of which feature pairs may require further scrutiny, streamlining the review process.

Based on these results, the dataset demonstrates a generally low level of linear association among its features, with only the (Age, Exited) pair exhibiting a moderate correlation that exceeds the predefined threshold. This suggests that, with the exception of this pair, the risk of multicollinearity or feature redundancy is minimal for the evaluated features. The presence of both positive and negative coefficients reflects a balanced distribution of relationships, and the clear Pass/Fail delineation aids in quickly identifying areas of potential concern. The overall pattern indicates that the dataset is well-structured with respect to linear dependencies, supporting the development of interpretable and stable models. The single pair exceeding the threshold may warrant further examination to assess its impact on model behavior, but the absence of additional high correlations suggests that the feature set is largely free from problematic linear relationships.

Parameters:

{
  "max_threshold": 0.3
}
            

Tables

Columns Coefficient Pass/Fail
(Age, Exited) 0.3341 Fail
(IsActiveMember, Exited) -0.1984 Pass
(Balance, NumOfProducts) -0.1565 Pass
(Balance, Exited) 0.1376 Pass
(NumOfProducts, IsActiveMember) 0.0575 Pass
(Age, Balance) 0.0474 Pass
(HasCrCard, IsActiveMember) -0.0426 Pass
(NumOfProducts, Exited) -0.0399 Pass
(Tenure, IsActiveMember) -0.0390 Pass
(Age, NumOfProducts) -0.0345 Pass
2026-01-10 02:06:45,733 - INFO(validmind.vm_models.result.result): Test driven block with result_id validmind.data_validation.HighPearsonCorrelation:balanced_raw_dataset does not exist in model's document

Note the output returned indicating that a test-driven block doesn't currently exist in your model's documentation for this particular test ID.

That's expected: when you run individual tests, the logged results need to be manually added to the desired section of your model documentation within the ValidMind Platform.

Add individual test results to model documentation

With the test results logged, let's head to the model we connected to at the beginning of this notebook and insert our test results into the documentation (Need more help?):

  1. From the Inventory in the ValidMind Platform, go to the model you connected to earlier.

  2. In the left sidebar that appears for your model, click Documentation under Documents.

  3. Locate the Data Preparation section and click on 2.3. Correlations and Interactions to expand that section.

  4. Hover under the Pearson Correlation Matrix content block until a horizontal dashed line with a + button appears, indicating that you can insert a new block.

    Screenshot showing insert block button in model documentation

  5. Click + and then select Test-Driven Block under FROM LIBRARY:

    • Click on VM Library under TEST-DRIVEN in the left sidebar.
    • In the search bar, type in HighPearsonCorrelation.
    • Select HighPearsonCorrelation:balanced_raw_dataset as the test.

    A preview of the test gets shown:

    Screenshot showing the HighPearsonCorrelation test selected

  6. Finally, click Insert 1 Test Result to Document to add the test result to the documentation.

Confirm that the individual result for the high correlation test has been correctly inserted into section 2.3. Correlations and Interactions of the documentation.

  7. Finalize the documentation by editing the test result's description block to explain the changes you made to the raw data and the reasons behind them as shown in the screenshot below:

    Screenshot showing the inserted High Pearson Correlation block

Model testing

So far, we've focused on the data assessment and pre-processing that usually occurs prior to any models being built. Now, let's instead assume we have already built a model and we want to incorporate some model results into our documentation.

Train simple logistic regression model

We'll train a simple logistic regression model on our dataset using the LogisticRegression class from sklearn.linear_model, then evaluate its performance with ValidMind tests.

To start, let's grab the first few rows of the balanced_raw_no_age_df dataset we initialized earlier, which has the highly correlated Age column removed:

balanced_raw_no_age_df.head()
CreditScore Geography Gender Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
3610 548 Spain Male 0 178056.54 2 1 0 38434.73 0
3064 646 Spain Male 1 0.00 2 1 0 183289.22 0
404 638 Spain Male 9 77637.35 2 1 1 111346.22 0
7772 625 France Female 3 0.00 2 1 0 41295.10 1
1635 654 France Male 6 0.00 1 0 0 183872.88 1

Before training the model, we need to encode the categorical features in the dataset:

  • The categorical features in the dataset are Geography and Gender.
  • Below we use pandas' get_dummies function with drop_first=True to one-hot encode them; the OneHotEncoder class from the sklearn.preprocessing module is an equivalent alternative (see the sketch after the encoded preview).
balanced_raw_no_age_df = pd.get_dummies(
    balanced_raw_no_age_df, columns=["Geography", "Gender"], drop_first=True
)
balanced_raw_no_age_df.head()
CreditScore Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited Geography_Germany Geography_Spain Gender_Male
3610 548 0 178056.54 2 1 0 38434.73 0 False True True
3064 646 1 0.00 2 1 0 183289.22 0 False True True
404 638 9 77637.35 2 1 1 111346.22 0 False True True
7772 625 3 0.00 2 1 0 41295.10 1 False False False
1635 654 6 0.00 1 0 0 183872.88 1 False False True
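
If you prefer the OneHotEncoder class mentioned above (for example, so the same encoding can later be reused inside a scikit-learn Pipeline), the equivalent looks roughly like the sketch below. It's shown on a small toy frame because balanced_raw_no_age_df has already been encoded at this point:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Scikit-learn equivalent of pd.get_dummies(..., drop_first=True), shown on a toy frame
toy = pd.DataFrame(
    {"Geography": ["France", "Spain", "Germany"], "Gender": ["Male", "Female", "Male"]}
)
encoder = OneHotEncoder(drop="first", sparse_output=False)  # sparse_output needs scikit-learn >= 1.2
encoded = pd.DataFrame(
    encoder.fit_transform(toy),
    columns=encoder.get_feature_names_out(),
)
print(encoded)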

We'll split our preprocessed dataset into training and test sets to help assess how well the model generalizes to unseen data:

  • We start by dividing our balanced_raw_no_age_df dataset into training and test subsets using train_test_split, with 80% of the data allocated to training (train_df) and 20% to testing (test_df).
  • From each subset, we separate the features (all columns except "Exited") into X_train and X_test, and the target column ("Exited") into y_train and y_test.
from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(balanced_raw_no_age_df, test_size=0.20)

X_train = train_df.drop("Exited", axis=1)
y_train = train_df["Exited"]
X_test = test_df.drop("Exited", axis=1)
y_test = test_df["Exited"]
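
Because the split above is random (no stratification or fixed random_state), it's worth a quick check that both subsets keep a similar churn rate before tuning hyperparameters:

# Compare the class balance of the training and test targets
print("Train churn rate:", round(y_train.mean(), 3))
print("Test churn rate:", round(y_test.mean(), 3))

If the rates drift apart noticeably, train_test_split's stratify argument can preserve the original class proportions in both subsets.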

Then, using GridSearchCV, we'll find the best-performing hyperparameter settings and save the resulting best estimator:

from sklearn.linear_model import LogisticRegression

# Logistic Regression grid params
log_reg_params = {
    "penalty": ["l1", "l2"],
    "C": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "solver": ["liblinear"],
}

# Grid search for Logistic Regression
from sklearn.model_selection import GridSearchCV

grid_log_reg = GridSearchCV(LogisticRegression(), log_reg_params)
grid_log_reg.fit(X_train, y_train)

# Logistic Regression best estimator
log_reg = grid_log_reg.best_estimator_
/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1135: FutureWarning:

'penalty' was deprecated in version 1.8 and will be removed in 1.10. To avoid this warning, leave 'penalty' set to its default value and use 'l1_ratio' or 'C' instead. Use l1_ratio=0 instead of penalty='l2', l1_ratio=1 instead of penalty='l1', and C=np.inf instead of penalty=None.

/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:1160: UserWarning:

Inconsistent values: penalty=l1 with l1_ratio=0.0. penalty is deprecated. Please use l1_ratio only.
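
These warnings come from scikit-learn deprecating the penalty argument and repeat for every fit in the grid search. If you want to silence them and confirm which hyperparameters were selected, the sketch below uses the standard GridSearchCV attributes best_params_ and best_score_ (with the default scoring for a classifier, best_score_ is the mean cross-validated accuracy):

import warnings

# Optionally silence the deprecation chatter emitted during the grid search
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=UserWarning)

# Inspect the hyperparameters and cross-validated score of the selected estimator
print("Best parameters:", grid_log_reg.best_params_)
print("Best CV accuracy:", round(grid_log_reg.best_score_, 3))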

Initialize model evaluation objects

The next step in evaluating the model's performance is to initialize the ValidMind Dataset and Model objects in preparation for assigning model predictions to each dataset.

# Initialize the datasets into their own dataset objects
vm_train_ds = vm.init_dataset(
    input_id="train_dataset_final",
    dataset=train_df,
    target_column="Exited",
)

vm_test_ds = vm.init_dataset(
    input_id="test_dataset_final",
    dataset=test_df,
    target_column="Exited",
)

You'll also need to initialize a ValidMind model object (vm_model) that can be passed to other functions for analysis and tests on the data for our model.

You simply initialize this model object with vm.init_model():

# Register the model
vm_model = vm.init_model(log_reg, input_id="log_reg_model_v1")

Assign predictions

Once the model has been registered, you can assign model predictions to the training and testing datasets.

  • The assign_predictions() method from the Dataset object can link existing predictions to any number of models.
  • This method links the model's class prediction values and probabilities to our vm_train_ds and vm_test_ds datasets.

If no prediction values are passed, the method will compute predictions automatically:

vm_train_ds.assign_predictions(model=vm_model)
vm_test_ds.assign_predictions(model=vm_model)
2026-01-10 02:06:46,929 - INFO(validmind.vm_models.dataset.utils): Running predict_proba()... This may take a while
2026-01-10 02:06:46,931 - INFO(validmind.vm_models.dataset.utils): Done running predict_proba()
2026-01-10 02:06:46,931 - INFO(validmind.vm_models.dataset.utils): Running predict()... This may take a while
2026-01-10 02:06:46,934 - INFO(validmind.vm_models.dataset.utils): Done running predict()
2026-01-10 02:06:46,936 - INFO(validmind.vm_models.dataset.utils): Running predict_proba()... This may take a while
2026-01-10 02:06:46,938 - INFO(validmind.vm_models.dataset.utils): Done running predict_proba()
2026-01-10 02:06:46,940 - INFO(validmind.vm_models.dataset.utils): Running predict()... This may take a while
2026-01-10 02:06:46,940 - INFO(validmind.vm_models.dataset.utils): Done running predict()
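
If you have already computed predictions elsewhere, they can be linked instead of recomputed, as noted in the bullets above. The sketch below illustrates that pattern; the prediction_values and prediction_probabilities keyword names shown here are assumptions, so verify them against the assign_predictions() reference for your installed library version before relying on them:

# Hypothetical sketch: linking precomputed predictions instead of recomputing them.
# NOTE: the prediction_values / prediction_probabilities keywords are assumptions;
# check the assign_predictions() reference for your installed ValidMind version.
precomputed_classes = log_reg.predict(X_test)
precomputed_probabilities = log_reg.predict_proba(X_test)[:, 1]

vm_test_ds.assign_predictions(
    model=vm_model,
    prediction_values=precomputed_classes,
    prediction_probabilities=precomputed_probabilities,
)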

Run the model evaluation tests

In this next example, we'll focus on running the tests within the Model Development section of the model documentation. Only tests associated with this section will be executed, and the corresponding results will be updated in the model documentation.

test_config = {
    "validmind.model_validation.sklearn.ClassifierPerformance:in_sample": {
        "inputs": {
            "dataset": vm_train_ds,
            "model": vm_model,
        },
    }
}
results = vm.run_documentation_tests(
    section=["model_development"],
    inputs={
        "dataset": vm_test_ds,  # Any test that requires a single dataset will use vm_test_ds
        "model": vm_model,
        "datasets": (
            vm_train_ds,
            vm_test_ds,
        ),  # Any test that requires multiple datasets will use vm_train_ds and vm_test_ds
    },
    config=test_config,
)
Test suite complete!
34/34 (100.0%)

Test Suite Results: Binary Classification V2


Check out the updated documentation on ValidMind.

Template for binary classification models.

Model Development

In summary

In this second notebook, you learned how to:

  • Import a sample dataset and identify relevant data quality tests
  • Run individual tests and adjust their parameters and inputs
  • Run and log multiple tests at once with run_documentation_tests()
  • Log individual test results and insert them into your model documentation as test-driven blocks
  • Train a simple logistic regression model, initialize ValidMind dataset and model objects, and assign predictions
  • Run the model evaluation tests for the Model Development section of your documentation

Next steps

Integrate custom tests

Now that you're familiar with the basics of using the ValidMind Library to run and log tests to provide evidence for your model documentation, let's learn how to incorporate your own custom tests into ValidMind: 3 — Integrate custom tests