from validmind.tests import (
    describe_test,
    list_tests,
    list_tasks,
    list_tags,
    list_tasks_and_tags,
)
Explore tests
View and learn more about the tests available in the ValidMind Library, including code examples and usage of key functions.
In this notebook, we'll dive deep into the utilities available for viewing and understanding the various tests that ValidMind provides through the `tests` module. Whether you're just getting started or looking for advanced tips, you'll find clear examples and explanations to assist you every step of the way.
Before we go into the details, let's import the `describe_test` and `list_tests` functions from the `validmind.tests` module. These two functions let you filter the available tests and view the details of an individual test.
Listing All Tests
The `list_tests` function provides a convenient way to retrieve all available tests in the `validmind.tests` module. When invoked without any parameters, it returns a pandas DataFrame containing detailed information about each test.
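Because the result is described as a pandas DataFrame, ordinary pandas filtering should apply to it. The following is a minimal sketch rather than part of the ValidMind API: it builds a small stand-in DataFrame by hand from four rows of the `list_tests()` output in this notebook, so it runs without the library installed. The helper `filter_by` is a hypothetical name introduced here, and the sketch assumes the `Tags` and `Tasks` columns hold Python lists, as their printed form suggests.

```python
import pandas as pd

# Stand-in rows copied from the list_tests() output in this notebook.
# In practice the DataFrame would come from list_tests() rather than
# being built by hand (assumption: same column names, list-valued cells).
tests_df = pd.DataFrame(
    [
        {
            "ID": "validmind.data_validation.ADF",
            "Tags": ["time_series_data", "statsmodels", "forecasting",
                     "statistical_test", "stationarity"],
            "Tasks": ["regression"],
        },
        {
            "ID": "validmind.data_validation.ClassImbalance",
            "Tags": ["tabular_data", "binary_classification",
                     "multiclass_classification", "data_quality"],
            "Tasks": ["classification"],
        },
        {
            "ID": "validmind.data_validation.nlp.Hashtags",
            "Tags": ["nlp", "text_data", "visualization", "frequency_analysis"],
            "Tasks": ["text_classification", "text_summarization"],
        },
        {
            "ID": "validmind.model_validation.RegressionResidualsPlot",
            "Tags": ["model_performance", "visualization"],
            "Tasks": ["regression"],
        },
    ]
)


def filter_by(df, column, value):
    """Return the rows whose list-valued `column` contains `value`.

    Hypothetical helper for illustration; not a ValidMind function.
    """
    mask = df[column].apply(lambda items: value in items)
    return df[mask]


# Tests that apply to regression tasks.
regression_tests = filter_by(tests_df, "Tasks", "regression")

# Tests tagged as producing a visualization.
visual_tests = filter_by(tests_df, "Tags", "visualization")

print(regression_tests["ID"].tolist())
print(visual_tests["ID"].tolist())
```

If the real columns turn out to hold strings rather than lists, the same `in` check would match substrings instead of whole labels, so the cell types are worth inspecting first.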
list_tests()
ID | Name | Description | Required Inputs | Params | Tags | Tasks |
---|---|---|---|---|---|---|
validmind.data_validation.ACFandPACFPlot | AC Fand PACF Plot | Analyzes time series data using Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots to... | ['dataset'] | {} | ['time_series_data', 'forecasting', 'statistical_test', 'visualization'] | ['regression'] |
validmind.data_validation.ADF | ADF | Assesses the stationarity of a time series dataset using the Augmented Dickey-Fuller (ADF) test.... | ['dataset'] | {} | ['time_series_data', 'statsmodels', 'forecasting', 'statistical_test', 'stationarity'] | ['regression'] |
validmind.data_validation.AutoAR | Auto AR | Automatically identifies the optimal Autoregressive (AR) order for a time series using BIC and AIC criteria.... | ['dataset'] | {'max_ar_order': {'type': 'int', 'default': 3}} | ['time_series_data', 'statsmodels', 'forecasting', 'statistical_test'] | ['regression'] |
validmind.data_validation.AutoMA | Auto MA | Automatically selects the optimal Moving Average (MA) order for each variable in a time series dataset based on... | ['dataset'] | {'max_ma_order': {'type': 'int', 'default': 3}} | ['time_series_data', 'statsmodels', 'forecasting', 'statistical_test'] | ['regression'] |
validmind.data_validation.AutoStationarity | Auto Stationarity | Automates Augmented Dickey-Fuller test to assess stationarity across multiple time series in a DataFrame.... | ['dataset'] | {'max_order': {'type': 'int', 'default': 5}, 'threshold': {'type': 'float', 'default': 0.05}} | ['time_series_data', 'statsmodels', 'forecasting', 'statistical_test'] | ['regression'] |
validmind.data_validation.BivariateScatterPlots | Bivariate Scatter Plots | Generates bivariate scatterplots to visually inspect relationships between pairs of numerical predictor variables... | ['dataset'] | {} | ['tabular_data', 'numerical_data', 'visualization'] | ['classification'] |
validmind.data_validation.BoxPierce | Box Pierce | Detects autocorrelation in time-series data through the Box-Pierce test to validate model performance.... | ['dataset'] | {} | ['time_series_data', 'forecasting', 'statistical_test', 'statsmodels'] | ['regression'] |
validmind.data_validation.ChiSquaredFeaturesTable | Chi Squared Features Table | Assesses the statistical association between categorical features and a target variable using the Chi-Squared test.... | ['dataset'] | {'p_threshold': {'type': '_empty', 'default': 0.05}} | ['tabular_data', 'categorical_data', 'statistical_test'] | ['classification'] |
validmind.data_validation.ClassImbalance | Class Imbalance | Evaluates and quantifies class distribution imbalance in a dataset used by a machine learning model.... | ['dataset'] | {'min_percent_threshold': {'type': 'int', 'default': 10}} | ['tabular_data', 'binary_classification', 'multiclass_classification', 'data_quality'] | ['classification'] |
validmind.data_validation.DatasetDescription | Dataset Description | Provides comprehensive analysis and statistical summaries of each column in a machine learning model's dataset.... | ['dataset'] | {} | ['tabular_data', 'time_series_data', 'text_data'] | ['classification', 'regression', 'text_classification', 'text_summarization'] |
validmind.data_validation.DatasetSplit | Dataset Split | Evaluates and visualizes the distribution proportions among training, testing, and validation datasets of an ML... | ['datasets'] | {} | ['tabular_data', 'time_series_data', 'text_data'] | ['classification', 'regression', 'text_classification', 'text_summarization'] |
validmind.data_validation.DescriptiveStatistics | Descriptive Statistics | Performs a detailed descriptive statistical analysis of both numerical and categorical data within a model's... | ['dataset'] | {} | ['tabular_data', 'time_series_data', 'data_quality'] | ['classification', 'regression'] |
validmind.data_validation.DickeyFullerGLS | Dickey Fuller GLS | Assesses stationarity in time series data using the Dickey-Fuller GLS test to determine the order of integration.... | ['dataset'] | {} | ['time_series_data', 'forecasting', 'unit_root_test'] | ['regression'] |
validmind.data_validation.Duplicates | Duplicates | Tests dataset for duplicate entries, ensuring model reliability via data quality verification.... | ['dataset'] | {'min_threshold': {'type': '_empty', 'default': 1}} | ['tabular_data', 'data_quality', 'text_data'] | ['classification', 'regression'] |
validmind.data_validation.EngleGrangerCoint | Engle Granger Coint | Assesses the degree of co-movement between pairs of time series data using the Engle-Granger cointegration test.... | ['dataset'] | {'threshold': {'type': 'float', 'default': 0.05}} | ['time_series_data', 'statistical_test', 'forecasting'] | ['regression'] |
validmind.data_validation.FeatureTargetCorrelationPlot | Feature Target Correlation Plot | Visualizes the correlation between input features and the model's target output in a color-coded horizontal bar... | ['dataset'] | {'fig_height': {'type': '_empty', 'default': 600}} | ['tabular_data', 'visualization', 'correlation'] | ['classification', 'regression'] |
validmind.data_validation.HighCardinality | High Cardinality | Assesses the number of unique values in categorical columns to detect high cardinality and potential overfitting.... | ['dataset'] | {'num_threshold': {'type': 'int', 'default': 100}, 'percent_threshold': {'type': 'float', 'default': 0.1}, 'threshold_type': {'type': 'str', 'default': 'percent'}} | ['tabular_data', 'data_quality', 'categorical_data'] | ['classification', 'regression'] |
validmind.data_validation.HighPearsonCorrelation | High Pearson Correlation | Identifies highly correlated feature pairs in a dataset suggesting feature redundancy or multicollinearity.... | ['dataset'] | {'max_threshold': {'type': 'float', 'default': 0.3}, 'top_n_correlations': {'type': 'int', 'default': 10}, 'feature_columns': {'type': 'list', 'default': None}} | ['tabular_data', 'data_quality', 'correlation'] | ['classification', 'regression'] |
validmind.data_validation.IQROutliersBarPlot | IQR Outliers Bar Plot | Visualizes outlier distribution across percentiles in numerical data using the Interquartile Range (IQR) method.... | ['dataset'] | {'threshold': {'type': 'float', 'default': 1.5}, 'fig_width': {'type': 'int', 'default': 800}} | ['tabular_data', 'visualization', 'numerical_data'] | ['classification', 'regression'] |
validmind.data_validation.IQROutliersTable | IQR Outliers Table | Determines and summarizes outliers in numerical features using the Interquartile Range method.... | ['dataset'] | {'threshold': {'type': 'float', 'default': 1.5}} | ['tabular_data', 'numerical_data'] | ['classification', 'regression'] |
validmind.data_validation.IsolationForestOutliers | Isolation Forest Outliers | Detects outliers in a dataset using the Isolation Forest algorithm and visualizes results through scatter plots.... | ['dataset'] | {'random_state': {'type': 'int', 'default': 0}, 'contamination': {'type': 'float', 'default': 0.1}, 'feature_columns': {'type': 'list', 'default': None}} | ['tabular_data', 'anomaly_detection'] | ['classification'] |
validmind.data_validation.JarqueBera | Jarque Bera | Assesses normality of dataset features in an ML model using the Jarque-Bera test.... | ['dataset'] | {} | ['tabular_data', 'data_distribution', 'statistical_test', 'statsmodels'] | ['classification', 'regression'] |
validmind.data_validation.KPSS | KPSS | Assesses the stationarity of time-series data in a machine learning model using the KPSS unit root test.... | ['dataset'] | {} | ['time_series_data', 'stationarity', 'unit_root_test', 'statsmodels'] | ['data_validation'] |
validmind.data_validation.LJungBox | L Jung Box | Assesses autocorrelations in dataset features by performing a Ljung-Box test on each feature.... | ['dataset'] | {} | ['time_series_data', 'forecasting', 'statistical_test', 'statsmodels'] | ['regression'] |
validmind.data_validation.LaggedCorrelationHeatmap | Lagged Correlation Heatmap | Assesses and visualizes correlation between target variable and lagged independent variables in a time-series... | ['dataset'] | {'num_lags': {'type': 'int', 'default': 10}} | ['time_series_data', 'visualization'] | ['regression'] |
validmind.data_validation.MissingValues | Missing Values | Evaluates dataset quality by ensuring missing value ratio across all features does not exceed a set threshold.... | ['dataset'] | {'min_threshold': {'type': 'int', 'default': 1}} | ['tabular_data', 'data_quality'] | ['classification', 'regression'] |
validmind.data_validation.MissingValuesBarPlot | Missing Values Bar Plot | Assesses the percentage and distribution of missing values in the dataset via a bar plot, with emphasis on... | ['dataset'] | {'threshold': {'type': 'int', 'default': 80}, 'fig_height': {'type': 'int', 'default': 600}} | ['tabular_data', 'data_quality', 'visualization'] | ['classification', 'regression'] |
validmind.data_validation.MutualInformation | Mutual Information | Calculates mutual information scores between features and target variable to evaluate feature relevance.... | ['dataset'] | {'min_threshold': {'type': 'float', 'default': 0.01}, 'task': {'type': 'str', 'default': 'classification'}} | ['feature_selection', 'data_analysis'] | ['classification', 'regression'] |
validmind.data_validation.PearsonCorrelationMatrix | Pearson Correlation Matrix | Evaluates linear dependency between numerical variables in a dataset via a Pearson Correlation coefficient heat map.... | ['dataset'] | {} | ['tabular_data', 'numerical_data', 'correlation'] | ['classification', 'regression'] |
validmind.data_validation.PhillipsPerronArch | Phillips Perron Arch | Assesses the stationarity of time series data in each feature of the ML model using the Phillips-Perron test.... | ['dataset'] | {} | ['time_series_data', 'forecasting', 'statistical_test', 'unit_root_test'] | ['regression'] |
validmind.data_validation.ProtectedClassesDescription | Protected Classes Description | Visualizes the distribution of protected classes in the dataset relative to the target variable... | ['dataset'] | {'protected_classes': {'type': '_empty', 'default': None}} | ['bias_and_fairness', 'descriptive_statistics'] | ['classification', 'regression'] |
validmind.data_validation.RollingStatsPlot | Rolling Stats Plot | Evaluates the stationarity of time series data by plotting its rolling mean and standard deviation over a specified... | ['dataset'] | {'window_size': {'type': 'int', 'default': 12}} | ['time_series_data', 'visualization', 'stationarity'] | ['regression'] |
validmind.data_validation.RunsTest | Runs Test | Executes Runs Test on ML model to detect non-random patterns in output data sequence.... | ['dataset'] | {} | ['tabular_data', 'statistical_test', 'statsmodels'] | ['classification', 'regression'] |
validmind.data_validation.ScatterPlot | Scatter Plot | Assesses visual relationships, patterns, and outliers among features in a dataset through scatter plot matrices.... | ['dataset'] | {} | ['tabular_data', 'visualization'] | ['classification', 'regression'] |
validmind.data_validation.ScoreBandDefaultRates | Score Band Default Rates | Analyzes default rates and population distribution across credit score bands.... | ['dataset', 'model'] | {'score_column': {'type': 'str', 'default': 'score'}, 'score_bands': {'type': 'list', 'default': None}} | ['visualization', 'credit_risk', 'scorecard'] | ['classification'] |
validmind.data_validation.SeasonalDecompose | Seasonal Decompose | Assesses patterns and seasonality in a time series dataset by decomposing its features into foundational components.... | ['dataset'] | {'seasonal_model': {'type': 'str', 'default': 'additive'}} | ['time_series_data', 'seasonality', 'statsmodels'] | ['regression'] |
validmind.data_validation.ShapiroWilk | Shapiro Wilk | Evaluates feature-wise normality of training data using the Shapiro-Wilk test.... | ['dataset'] | {} | ['tabular_data', 'data_distribution', 'statistical_test'] | ['classification', 'regression'] |
validmind.data_validation.Skewness | Skewness | Evaluates the skewness of numerical data in a dataset to check against a defined threshold, aiming to ensure data... | ['dataset'] | {'max_threshold': {'type': '_empty', 'default': 1}} | ['data_quality', 'tabular_data'] | ['classification', 'regression'] |
validmind.data_validation.SpreadPlot | Spread Plot | Assesses potential correlations between pairs of time series variables through visualization to enhance... | ['dataset'] | {} | ['time_series_data', 'visualization'] | ['regression'] |
validmind.data_validation.TabularCategoricalBarPlots | Tabular Categorical Bar Plots | Generates and visualizes bar plots for each category in categorical features to evaluate the dataset's composition.... | ['dataset'] | {} | ['tabular_data', 'visualization'] | ['classification', 'regression'] |
validmind.data_validation.TabularDateTimeHistograms | Tabular Date Time Histograms | Generates histograms to provide graphical insight into the distribution of time intervals in a model's datetime... | ['dataset'] | {} | ['time_series_data', 'visualization'] | ['classification', 'regression'] |
validmind.data_validation.TabularDescriptionTables | Tabular Description Tables | Summarizes key descriptive statistics for numerical, categorical, and datetime variables in a dataset.... | ['dataset'] | {} | ['tabular_data'] | ['classification', 'regression'] |
validmind.data_validation.TabularNumericalHistograms | Tabular Numerical Histograms | Generates histograms for each numerical feature in a dataset to provide visual insights into data distribution and... | ['dataset'] | {} | ['tabular_data', 'visualization'] | ['classification', 'regression'] |
validmind.data_validation.TargetRateBarPlots | Target Rate Bar Plots | Generates bar plots visualizing the default rates of categorical features for a classification machine learning... | ['dataset'] | {} | ['tabular_data', 'visualization', 'categorical_data'] | ['classification'] |
validmind.data_validation.TimeSeriesDescription | Time Series Description | Generates a detailed analysis for the provided time series dataset, summarizing key statistics to identify trends,... | ['dataset'] | {} | ['time_series_data', 'analysis'] | ['regression'] |
validmind.data_validation.TimeSeriesDescriptiveStatistics | Time Series Descriptive Statistics | Evaluates the descriptive statistics of a time series dataset to identify trends, patterns, and data quality issues.... | ['dataset'] | {} | ['time_series_data', 'analysis'] | ['regression'] |
validmind.data_validation.TimeSeriesFrequency | Time Series Frequency | Evaluates consistency of time series data frequency and generates a frequency plot.... | ['dataset'] | {} | ['time_series_data'] | ['regression'] |
validmind.data_validation.TimeSeriesHistogram | Time Series Histogram | Visualizes distribution of time-series data using histograms and Kernel Density Estimation (KDE) lines.... | ['dataset'] | {'nbins': {'type': '_empty', 'default': 30}} | ['data_validation', 'visualization', 'time_series_data'] | ['regression', 'time_series_forecasting'] |
validmind.data_validation.TimeSeriesLinePlot | Time Series Line Plot | Generates and analyses time-series data through line plots revealing trends, patterns, anomalies over time.... | ['dataset'] | {} | ['time_series_data', 'visualization'] | ['regression'] |
validmind.data_validation.TimeSeriesMissingValues | Time Series Missing Values | Validates time-series data quality by confirming the count of missing values is below a certain threshold.... | ['dataset'] | {'min_threshold': {'type': 'int', 'default': 1}} | ['time_series_data'] | ['regression'] |
validmind.data_validation.TimeSeriesOutliers | Time Series Outliers | Identifies and visualizes outliers in time-series data using the z-score method.... | ['dataset'] | {'zscore_threshold': {'type': 'int', 'default': 3}} | ['time_series_data'] | ['regression'] |
validmind.data_validation.TooManyZeroValues | Too Many Zero Values | Identifies numerical columns in a dataset that contain an excessive number of zero values, defined by a threshold... | ['dataset'] | {'max_percent_threshold': {'type': 'float', 'default': 0.03}} | ['tabular_data'] | ['regression', 'classification'] |
validmind.data_validation.UniqueRows | Unique Rows | Verifies the diversity of the dataset by ensuring that the count of unique rows exceeds a prescribed threshold.... | ['dataset'] | {'min_percent_threshold': {'type': 'float', 'default': 1}} | ['tabular_data'] | ['regression', 'classification'] |
validmind.data_validation.WOEBinPlots | WOE Bin Plots | Generates visualizations of Weight of Evidence (WoE) and Information Value (IV) for understanding predictive power... | ['dataset'] | {'breaks_adj': {'type': 'list', 'default': None}, 'fig_height': {'type': 'int', 'default': 600}, 'fig_width': {'type': 'int', 'default': 500}} | ['tabular_data', 'visualization', 'categorical_data'] | ['classification'] |
validmind.data_validation.WOEBinTable | WOE Bin Table | Assesses the Weight of Evidence (WoE) and Information Value (IV) of each feature to evaluate its predictive power... | ['dataset'] | {'breaks_adj': {'type': 'list', 'default': None}} | ['tabular_data', 'categorical_data'] | ['classification'] |
validmind.data_validation.ZivotAndrewsArch | Zivot Andrews Arch | Evaluates the order of integration and stationarity of time series data using the Zivot-Andrews unit root test.... | ['dataset'] | {} | ['time_series_data', 'stationarity', 'unit_root_test'] | ['regression'] |
validmind.data_validation.nlp.CommonWords | Common Words | Assesses the most frequent non-stopwords in a text column for identifying prevalent language patterns.... | ['dataset'] | {} | ['nlp', 'text_data', 'visualization', 'frequency_analysis'] | ['text_classification', 'text_summarization'] |
validmind.data_validation.nlp.Hashtags | Hashtags | Assesses hashtag frequency in a text column, highlighting usage trends and potential dataset bias or spam.... | ['dataset'] | {'top_hashtags': {'type': 'int', 'default': 25}} | ['nlp', 'text_data', 'visualization', 'frequency_analysis'] | ['text_classification', 'text_summarization'] |
validmind.data_validation.nlp.LanguageDetection | Language Detection | Assesses the diversity of languages in a textual dataset by detecting and visualizing the distribution of languages.... | ['dataset'] | {} | ['nlp', 'text_data', 'visualization'] | ['text_classification', 'text_summarization'] |
validmind.data_validation.nlp.Mentions | Mentions | Calculates and visualizes frequencies of '@' prefixed mentions in a text-based dataset for NLP model analysis.... | ['dataset'] | {'top_mentions': {'type': 'int', 'default': 25}} | ['nlp', 'text_data', 'visualization', 'frequency_analysis'] | ['text_classification', 'text_summarization'] |
validmind.data_validation.nlp.PolarityAndSubjectivity | Polarity And Subjectivity | Analyzes the polarity and subjectivity of text data within a given dataset to visualize the sentiment distribution.... | ['dataset'] | {'threshold_subjectivity': {'type': '_empty', 'default': 0.5}, 'threshold_polarity': {'type': '_empty', 'default': 0}} | ['nlp', 'text_data', 'data_validation'] | ['nlp'] |
validmind.data_validation.nlp.Punctuations | Punctuations | Analyzes and visualizes the frequency distribution of punctuation usage in a given text dataset.... | ['dataset'] | {'count_mode': {'type': '_empty', 'default': 'token'}} | ['nlp', 'text_data', 'visualization', 'frequency_analysis'] | ['text_classification', 'text_summarization', 'nlp'] |
validmind.data_validation.nlp.Sentiment | Sentiment | Analyzes the sentiment of text data within a dataset using the VADER sentiment analysis tool.... | ['dataset'] | {} | ['nlp', 'text_data', 'data_validation'] | ['nlp'] |
validmind.data_validation.nlp.StopWords | Stop Words | Evaluates and visualizes the frequency of English stop words in a text dataset against a defined threshold.... | ['dataset'] | {'min_percent_threshold': {'type': 'float', 'default': 0.5}, 'num_words': {'type': 'int', 'default': 25}} | ['nlp', 'text_data', 'frequency_analysis', 'visualization'] | ['text_classification', 'text_summarization'] |
validmind.data_validation.nlp.TextDescription | Text Description | Conducts comprehensive textual analysis on a dataset using NLTK to evaluate various parameters and generate... | ['dataset'] | {'unwanted_tokens': {'type': 'set', 'default': {"s'", "'s", ' ', 'mr', "''", 'dollar', 'dr', 'mrs', '``', 's', 'us', 'ms'}}, 'lang': {'type': 'str', 'default': 'english'}} | ['nlp', 'text_data', 'visualization'] | ['text_classification', 'text_summarization'] |
validmind.data_validation.nlp.Toxicity | Toxicity | Assesses the toxicity of text data within a dataset to visualize the distribution of toxicity scores.... | ['dataset'] | {} | ['nlp', 'text_data', 'data_validation'] | ['nlp'] |
validmind.model_validation.BertScore | Bert Score | Assesses the quality of machine-generated text using BERTScore metrics and visualizes results through histograms... | ['dataset', 'model'] | {'evaluation_model': {'type': '_empty', 'default': 'distilbert-base-uncased'}} | ['nlp', 'text_data', 'visualization'] | ['text_classification', 'text_summarization'] |
validmind.model_validation.BleuScore | Bleu Score | Evaluates the quality of machine-generated text using BLEU metrics and visualizes the results through histograms... | ['dataset', 'model'] | {} | ['nlp', 'text_data', 'visualization'] | ['text_classification', 'text_summarization'] |
validmind.model_validation.ClusterSizeDistribution | Cluster Size Distribution | Assesses the performance of clustering models by comparing the distribution of cluster sizes in model predictions... | ['dataset', 'model'] | {} | ['sklearn', 'model_performance'] | ['clustering'] |
validmind.model_validation.ContextualRecall | Contextual Recall | Evaluates a Natural Language Generation model's ability to generate contextually relevant and factually correct... | ['dataset', 'model'] | {} | ['nlp', 'text_data', 'visualization'] | ['text_classification', 'text_summarization'] |
validmind.model_validation.FeaturesAUC | Features AUC | Evaluates the discriminatory power of each individual feature within a binary classification model by calculating... | ['dataset'] | {'fontsize': {'type': 'int', 'default': 12}, 'figure_height': {'type': 'int', 'default': 500}} | ['feature_importance', 'AUC', 'visualization'] | ['classification'] |
validmind.model_validation.MeteorScore | Meteor Score | Assesses the quality of machine-generated translations by comparing them to human-produced references using the... | ['dataset', 'model'] | {} | ['nlp', 'text_data', 'visualization'] | ['text_classification', 'text_summarization'] |
validmind.model_validation.ModelMetadata | Model Metadata | Compare metadata of different models and generate a summary table with the results.... | ['model'] | {} | ['model_training', 'metadata'] | ['regression', 'time_series_forecasting'] |
validmind.model_validation.ModelPredictionResiduals | Model Prediction Residuals | Assesses normality and behavior of residuals in regression models through visualization and statistical tests.... | ['dataset', 'model'] | {'nbins': {'type': '_empty', 'default': 100}, 'p_value_threshold': {'type': '_empty', 'default': 0.05}, 'start_date': {'type': '_empty', 'default': None}, 'end_date': {'type': '_empty', 'default': None}} | ['regression'] | ['residual_analysis', 'visualization'] |
validmind.model_validation.RegardScore | Regard Score | Assesses the sentiment and potential biases in text generated by NLP models by computing and visualizing regard... | ['dataset', 'model'] | {} | ['nlp', 'text_data', 'visualization'] | ['text_classification', 'text_summarization'] |
validmind.model_validation.RegressionResidualsPlot | Regression Residuals Plot | Evaluates regression model performance using residual distribution and actual vs. predicted plots.... | ['model', 'dataset'] | {'bin_size': {'type': 'float', 'default': 0.1}} | ['model_performance', 'visualization'] | ['regression'] |
validmind.model_validation.RougeScore | Rouge Score | Assesses the quality of machine-generated text using ROUGE metrics and visualizes the results to provide... | ['dataset', 'model'] | {'metric': {'type': '_empty', 'default': 'rouge-1'}} | ['nlp', 'text_data', 'visualization'] | ['text_classification', 'text_summarization'] |
validmind.model_validation.TimeSeriesPredictionWithCI | Time Series Prediction With CI | Assesses predictive accuracy and uncertainty in time series models, highlighting breaches beyond confidence... | ['dataset', 'model'] | {'confidence': {'type': '_empty', 'default': 0.95}} | ['model_predictions', 'visualization'] | ['regression', 'time_series_forecasting'] |
validmind.model_validation.TimeSeriesPredictionsPlot | Time Series Predictions Plot | Plot actual vs predicted values for time series data and generate a visual comparison for the model.... | ['dataset', 'model'] | {} | ['model_predictions', 'visualization'] | ['regression', 'time_series_forecasting'] |
validmind.model_validation.TimeSeriesR2SquareBySegments | Time Series R2 Square By Segments | Evaluates the R-Squared values of regression models over specified time segments in time series data to assess... | ['dataset', 'model'] | {'segments': {'type': '_empty', 'default': None}} | ['model_performance', 'sklearn'] | ['regression', 'time_series_forecasting'] |
validmind.model_validation.TokenDisparity | Token Disparity | Evaluates the token disparity between reference and generated texts, visualizing the results through histograms and... | ['dataset', 'model'] | {} | ['nlp', 'text_data', 'visualization'] | ['text_classification', 'text_summarization'] |
validmind.model_validation.ToxicityScore | Toxicity Score | Assesses the toxicity levels of texts generated by NLP models to identify and mitigate harmful or offensive content.... | ['dataset', 'model'] | {} | ['nlp', 'text_data', 'visualization'] | ['text_classification', 'text_summarization'] |
validmind.model_validation.embeddings.ClusterDistribution | Cluster Distribution | Assesses the distribution of text embeddings across clusters produced by a model using KMeans clustering.... | ['model', 'dataset'] | {'num_clusters': {'type': 'int', 'default': 5}} | ['llm', 'text_data', 'embeddings', 'visualization'] | ['feature_extraction'] |
validmind.model_validation.embeddings.CosineSimilarityComparison | Cosine Similarity Comparison | Assesses the similarity between embeddings generated by different models using Cosine Similarity, providing both... | ['dataset', 'models'] | {} | ['visualization', 'dimensionality_reduction', 'embeddings'] | ['text_qa', 'text_generation', 'text_summarization'] |
validmind.model_validation.embeddings.CosineSimilarityDistribution | Cosine Similarity Distribution | Assesses the similarity between predicted text embeddings from a model using a Cosine Similarity distribution... | ['dataset', 'model'] | {} | ['llm', 'text_data', 'embeddings', 'visualization'] | ['feature_extraction'] |
validmind.model_validation.embeddings.CosineSimilarityHeatmap | Cosine Similarity Heatmap | Generates an interactive heatmap to visualize the cosine similarities among embeddings derived from a given model.... | ['dataset', 'model'] | {'title': {'type': '_empty', 'default': 'Cosine Similarity Matrix'}, 'color': {'type': '_empty', 'default': 'Cosine Similarity'}, 'xaxis_title': {'type': '_empty', 'default': 'Index'}, 'yaxis_title': {'type': '_empty', 'default': 'Index'}, 'color_scale': {'type': '_empty', 'default': 'Blues'}} | ['visualization', 'dimensionality_reduction', 'embeddings'] | ['text_qa', 'text_generation', 'text_summarization'] |
validmind.model_validation.embeddings.DescriptiveAnalytics | Descriptive Analytics | Evaluates statistical properties of text embeddings in an ML model via mean, median, and standard deviation... | ['dataset', 'model'] | {} | ['llm', 'text_data', 'embeddings', 'visualization'] | ['feature_extraction'] |
validmind.model_validation.embeddings.EmbeddingsVisualization2D | Embeddings Visualization2 D | Visualizes 2D representation of text embeddings generated by a model using t-SNE technique.... | ['model', 'dataset'] | {'cluster_column': {'type': None, 'default': None}, 'perplexity': {'type': 'int', 'default': 30}} | ['llm', 'text_data', 'embeddings', 'visualization'] | ['feature_extraction'] |
validmind.model_validation.embeddings.EuclideanDistanceComparison | Euclidean Distance Comparison | Assesses and visualizes the dissimilarity between model embeddings using Euclidean distance, providing insights... | ['dataset', 'models'] | {} | ['visualization', 'dimensionality_reduction', 'embeddings'] | ['text_qa', 'text_generation', 'text_summarization'] |
validmind.model_validation.embeddings.EuclideanDistanceHeatmap | Euclidean Distance Heatmap | Generates an interactive heatmap to visualize the Euclidean distances among embeddings derived from a given model.... | ['dataset', 'model'] | {'title': {'type': '_empty', 'default': 'Euclidean Distance Matrix'}, 'color': {'type': '_empty', 'default': 'Euclidean Distance'}, 'xaxis_title': {'type': '_empty', 'default': 'Index'}, 'yaxis_title': {'type': '_empty', 'default': 'Index'}, 'color_scale': {'type': '_empty', 'default': 'Blues'}} | ['visualization', 'dimensionality_reduction', 'embeddings'] | ['text_qa', 'text_generation', 'text_summarization'] |
validmind.model_validation.embeddings.PCAComponentsPairwisePlots | PCA Components Pairwise Plots | Generates scatter plots for pairwise combinations of principal component analysis (PCA) components of model... | ['dataset', 'model'] | {'n_components': {'type': '_empty', 'default': 3}} | ['visualization', 'dimensionality_reduction', 'embeddings'] | ['text_qa', 'text_generation', 'text_summarization'] |
validmind.model_validation.embeddings.StabilityAnalysisKeyword | Stability Analysis Keyword | Evaluates robustness of embedding models to keyword swaps in the test dataset.... | ['dataset', 'model'] | {'keyword_dict': {'type': None, 'default': None}, 'mean_similarity_threshold': {'type': 'float', 'default': 0.7}} | ['llm', 'text_data', 'embeddings', 'visualization'] | ['feature_extraction'] |
validmind.model_validation.embeddings.StabilityAnalysisRandomNoise | Stability Analysis Random Noise | Assesses the robustness of text embeddings models to random noise introduced via text perturbations.... | ['dataset', 'model'] | {'probability': {'type': 'float', 'default': 0.02}, 'mean_similarity_threshold': {'type': 'float', 'default': 0.7}} | ['llm', 'text_data', 'embeddings', 'visualization'] | ['feature_extraction'] |
validmind.model_validation.embeddings.StabilityAnalysisSynonyms | Stability Analysis Synonyms | Evaluates the stability of text embeddings models when words in test data are replaced by their synonyms randomly.... | ['dataset', 'model'] | {'probability': {'type': 'float', 'default': 0.02}, 'mean_similarity_threshold': {'type': 'float', 'default': 0.7}} | ['llm', 'text_data', 'embeddings', 'visualization'] | ['feature_extraction'] |
validmind.model_validation.embeddings.StabilityAnalysisTranslation | Stability Analysis Translation | Evaluates robustness of text embeddings models to noise introduced by translating the original text to another... | ['dataset', 'model'] | {'source_lang': {'type': 'str', 'default': 'en'}, 'target_lang': {'type': 'str', 'default': 'fr'}, 'mean_similarity_threshold': {'type': 'float', 'default': 0.7}} | ['llm', 'text_data', 'embeddings', 'visualization'] | ['feature_extraction'] |
validmind.model_validation.embeddings.TSNEComponentsPairwisePlots | TSNE Components Pairwise Plots | Creates scatter plots for pairwise combinations of t-SNE components to visualize embeddings and highlight potential... | ['dataset', 'model'] | {'n_components': {'type': '_empty', 'default': 2}, 'perplexity': {'type': '_empty', 'default': 30}, 'title': {'type': '_empty', 'default': 't-SNE'}} | ['visualization', 'dimensionality_reduction', 'embeddings'] | ['text_qa', 'text_generation', 'text_summarization'] |
validmind.model_validation.ragas.AnswerCorrectness | Answer Correctness | Evaluates the correctness of answers in a dataset with respect to the provided ground... | ['dataset'] | {'user_input_column': {'type': '_empty', 'default': 'user_input'}, 'response_column': {'type': '_empty', 'default': 'response'}, 'reference_column': {'type': '_empty', 'default': 'reference'}} | ['ragas', 'llm'] | ['text_qa', 'text_generation', 'text_summarization'] |
validmind.model_validation.ragas.AspectCritic | Aspect Critic | Evaluates generations against the following aspects: harmfulness, maliciousness,... | ['dataset'] | {'user_input_column': {'type': '_empty', 'default': 'user_input'}, 'response_column': {'type': '_empty', 'default': 'response'}, 'retrieved_contexts_column': {'type': '_empty', 'default': None}, 'aspects': {'type': 'list', 'default': ['coherence', 'conciseness', 'correctness', 'harmfulness', 'maliciousness']}, 'additional_aspects': {'type': 'list', 'default': None}} | ['ragas', 'llm', 'qualitative'] | ['text_summarization', 'text_generation', 'text_qa'] |
validmind.model_validation.ragas.ContextEntityRecall | Context Entity Recall | Evaluates the context entity recall for dataset entries and visualizes the results.... | ['dataset'] | {'retrieved_contexts_column': {'type': 'str', 'default': 'retrieved_contexts'}, 'reference_column': {'type': 'str', 'default': 'reference'}} | ['ragas', 'llm', 'retrieval_performance'] | ['text_qa', 'text_generation', 'text_summarization'] |
validmind.model_validation.ragas.ContextPrecision | Context Precision | Context Precision is a metric that evaluates whether all of the ground-truth... | ['dataset'] | {'user_input_column': {'type': 'str', 'default': 'user_input'}, 'retrieved_contexts_column': {'type': 'str', 'default': 'retrieved_contexts'}, 'reference_column': {'type': 'str', 'default': 'reference'}} | ['ragas', 'llm', 'retrieval_performance'] | ['text_qa', 'text_generation', 'text_summarization', 'text_classification'] |
validmind.model_validation.ragas.ContextPrecisionWithoutReference | Context Precision Without Reference | Context Precision Without Reference is a metric used to evaluate the relevance of... | ['dataset'] | {'user_input_column': {'type': 'str', 'default': 'user_input'}, 'retrieved_contexts_column': {'type': 'str', 'default': 'retrieved_contexts'}, 'response_column': {'type': 'str', 'default': 'response'}} | ['ragas', 'llm', 'retrieval_performance'] | ['text_qa', 'text_generation', 'text_summarization', 'text_classification'] |
validmind.model_validation.ragas.ContextRecall | Context Recall | Context recall measures the extent to which the retrieved context aligns with the... | ['dataset'] | {'user_input_column': {'type': 'str', 'default': 'user_input'}, 'retrieved_contexts_column': {'type': 'str', 'default': 'retrieved_contexts'}, 'reference_column': {'type': 'str', 'default': 'reference'}} | ['ragas', 'llm', 'retrieval_performance'] | ['text_qa', 'text_generation', 'text_summarization', 'text_classification'] |
validmind.model_validation.ragas.Faithfulness | Faithfulness | Evaluates the faithfulness of the generated answers with respect to retrieved contexts.... | ['dataset'] | {'user_input_column': {'type': '_empty', 'default': 'user_input'}, 'response_column': {'type': '_empty', 'default': 'response'}, 'retrieved_contexts_column': {'type': '_empty', 'default': 'retrieved_contexts'}} | ['ragas', 'llm', 'rag_performance'] | ['text_qa', 'text_generation', 'text_summarization'] |
validmind.model_validation.ragas.NoiseSensitivity | Noise Sensitivity | Assesses the sensitivity of a Large Language Model (LLM) to noise in retrieved context by measuring how often it... | ['dataset'] | {'response_column': {'type': '_empty', 'default': 'response'}, 'retrieved_contexts_column': {'type': '_empty', 'default': 'retrieved_contexts'}, 'reference_column': {'type': '_empty', 'default': 'reference'}, 'focus': {'type': '_empty', 'default': 'relevant'}, 'user_input_column': {'type': '_empty', 'default': 'user_input'}} | ['ragas', 'llm', 'rag_performance'] | ['text_qa', 'text_generation', 'text_summarization'] |
validmind.model_validation.ragas.ResponseRelevancy | Response Relevancy | Assesses how pertinent the generated answer is to the given prompt.... | ['dataset'] | {'user_input_column': {'type': '_empty', 'default': 'user_input'}, 'retrieved_contexts_column': {'type': '_empty', 'default': None}, 'response_column': {'type': '_empty', 'default': 'response'}} | ['ragas', 'llm', 'rag_performance'] | ['text_qa', 'text_generation', 'text_summarization'] |
validmind.model_validation.ragas.SemanticSimilarity | Semantic Similarity | Calculates the semantic similarity between generated responses and ground truths... | ['dataset'] | {'response_column': {'type': '_empty', 'default': 'response'}, 'reference_column': {'type': '_empty', 'default': 'reference'}} | ['ragas', 'llm'] | ['text_qa', 'text_generation', 'text_summarization'] |
validmind.model_validation.sklearn.AdjustedMutualInformation | Adjusted Mutual Information | Evaluates clustering model performance by measuring mutual information between true and predicted labels, adjusting... | ['model', 'dataset'] | {} | ['sklearn', 'model_performance', 'clustering'] | ['clustering'] |
validmind.model_validation.sklearn.AdjustedRandIndex | Adjusted Rand Index | Measures the similarity between two data clusters using the Adjusted Rand Index (ARI) metric in clustering machine... | ['model', 'dataset'] | {} | ['sklearn', 'model_performance', 'clustering'] | ['clustering'] |
validmind.model_validation.sklearn.CalibrationCurve | Calibration Curve | Evaluates the calibration of probability estimates by comparing predicted probabilities against observed... | ['model', 'dataset'] | {'n_bins': {'type': 'int', 'default': 10}} | ['sklearn', 'model_performance', 'classification'] | ['classification'] |
validmind.model_validation.sklearn.ClassifierPerformance | Classifier Performance | Evaluates performance of binary or multiclass classification models using precision, recall, F1-Score, accuracy,... | ['dataset', 'model'] | {'average': {'type': 'str', 'default': 'macro'}} | ['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance'] | ['classification', 'text_classification'] |
validmind.model_validation.sklearn.ClassifierThresholdOptimization | Classifier Threshold Optimization | Analyzes and visualizes different threshold optimization methods for binary classification models.... | ['dataset', 'model'] | {'methods': {'type': None, 'default': None}, 'target_recall': {'type': None, 'default': None}} | ['model_validation', 'threshold_optimization', 'classification_metrics'] | ['classification'] |
validmind.model_validation.sklearn.ClusterCosineSimilarity | Cluster Cosine Similarity | Measures the intra-cluster similarity of a clustering model using cosine similarity.... | ['model', 'dataset'] | {} | ['sklearn', 'model_performance', 'clustering'] | ['clustering'] |
validmind.model_validation.sklearn.ClusterPerformanceMetrics | Cluster Performance Metrics | Evaluates the performance of clustering machine learning models using multiple established metrics.... | ['model', 'dataset'] | {} | ['sklearn', 'model_performance', 'clustering'] | ['clustering'] |
validmind.model_validation.sklearn.CompletenessScore | Completeness Score | Evaluates a clustering model's capacity to categorize instances from a single class into the same cluster.... | ['model', 'dataset'] | {} | ['sklearn', 'model_performance', 'clustering'] | ['clustering'] |
validmind.model_validation.sklearn.ConfusionMatrix | Confusion Matrix | Evaluates and visually represents the classification ML model's predictive performance using a Confusion Matrix... | ['dataset', 'model'] | {'threshold': {'type': 'float', 'default': 0.5}} | ['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance', 'visualization'] | ['classification', 'text_classification'] |
validmind.model_validation.sklearn.FeatureImportance | Feature Importance | Compute feature importance scores for a given model and generate a summary table... | ['dataset', 'model'] | {'num_features': {'type': 'int', 'default': 3}} | ['model_explainability', 'sklearn'] | ['regression', 'time_series_forecasting'] |
validmind.model_validation.sklearn.FowlkesMallowsScore | Fowlkes Mallows Score | Evaluates the similarity between predicted and actual cluster assignments in a model using the Fowlkes-Mallows... | ['dataset', 'model'] | {} | ['sklearn', 'model_performance'] | ['clustering'] |
validmind.model_validation.sklearn.HomogeneityScore | Homogeneity Score | Assesses clustering homogeneity by comparing true and predicted labels, scoring from 0 (heterogeneous) to 1... | ['dataset', 'model'] | {} | ['sklearn', 'model_performance'] | ['clustering'] |
validmind.model_validation.sklearn.HyperParametersTuning | Hyper Parameters Tuning | Performs exhaustive grid search over specified parameter ranges to find optimal model configurations... | ['model', 'dataset'] | {'param_grid': {'type': 'dict', 'default': None}, 'scoring': {'type': None, 'default': None}, 'thresholds': {'type': None, 'default': None}, 'fit_params': {'type': 'dict', 'default': None}} | ['sklearn', 'model_performance'] | ['clustering', 'classification'] |
validmind.model_validation.sklearn.KMeansClustersOptimization | K Means Clusters Optimization | Optimizes the number of clusters in K-means models using Elbow and Silhouette methods.... | ['model', 'dataset'] | {'n_clusters': {'type': None, 'default': None}} | ['sklearn', 'model_performance', 'kmeans'] | ['clustering'] |
validmind.model_validation.sklearn.MinimumAccuracy | Minimum Accuracy | Checks if the model's prediction accuracy meets or surpasses a specified threshold.... | ['dataset', 'model'] | {'min_threshold': {'type': 'float', 'default': 0.7}} | ['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance'] | ['classification', 'text_classification'] |
validmind.model_validation.sklearn.MinimumF1Score | Minimum F1 Score | Assesses if the model's F1 score on the validation set meets a predefined minimum threshold, ensuring balanced... | ['dataset', 'model'] | {'min_threshold': {'type': 'float', 'default': 0.5}} | ['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance'] | ['classification', 'text_classification'] |
validmind.model_validation.sklearn.MinimumROCAUCScore | Minimum ROCAUC Score | Validates model by checking if the ROC AUC score meets or surpasses a specified threshold.... | ['dataset', 'model'] | {'min_threshold': {'type': 'float', 'default': 0.5}} | ['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance'] | ['classification', 'text_classification'] |
validmind.model_validation.sklearn.ModelParameters | Model Parameters | Extracts and displays model parameters in a structured format for transparency and reproducibility.... | ['model'] | {'model_params': {'type': '_empty', 'default': None}} | ['model_training', 'metadata'] | ['classification', 'regression'] |
validmind.model_validation.sklearn.ModelsPerformanceComparison | Models Performance Comparison | Evaluates and compares the performance of multiple Machine Learning models using various metrics like accuracy,... | ['dataset', 'models'] | {} | ['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance', 'model_comparison'] | ['classification', 'text_classification'] |
validmind.model_validation.sklearn.OverfitDiagnosis | Overfit Diagnosis | Assesses potential overfitting in a model's predictions, identifying regions where performance between training and... | ['model', 'datasets'] | {'metric': {'type': 'str', 'default': None}, 'cut_off_threshold': {'type': 'float', 'default': 0.04}} | ['sklearn', 'binary_classification', 'multiclass_classification', 'linear_regression', 'model_diagnosis'] | ['classification', 'regression'] |
validmind.model_validation.sklearn.PermutationFeatureImportance | Permutation Feature Importance | Assesses the significance of each feature in a model by evaluating the impact on model performance when feature... | ['model', 'dataset'] | {'fontsize': {'type': None, 'default': None}, 'figure_height': {'type': None, 'default': None}} | ['sklearn', 'binary_classification', 'multiclass_classification', 'feature_importance', 'visualization'] | ['classification', 'text_classification'] |
validmind.model_validation.sklearn.PopulationStabilityIndex | Population Stability Index | Assesses the Population Stability Index (PSI) to quantify the stability of an ML model's predictions across... | ['datasets', 'model'] | {'num_bins': {'type': 'int', 'default': 10}, 'mode': {'type': 'str', 'default': 'fixed'}} | ['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance'] | ['classification', 'text_classification'] |
validmind.model_validation.sklearn.PrecisionRecallCurve | Precision Recall Curve | Evaluates the precision-recall trade-off for binary classification models and visualizes the Precision-Recall curve.... | ['model', 'dataset'] | {} | ['sklearn', 'binary_classification', 'model_performance', 'visualization'] | ['classification', 'text_classification'] |
validmind.model_validation.sklearn.ROCCurve | ROC Curve | Evaluates binary classification model performance by generating and plotting the Receiver Operating Characteristic... | ['model', 'dataset'] | {} | ['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance', 'visualization'] | ['classification', 'text_classification'] |
validmind.model_validation.sklearn.RegressionErrors | Regression Errors | Assesses the performance and error distribution of a regression model using various error metrics.... | ['model', 'dataset'] | {} | ['sklearn', 'model_performance'] | ['regression', 'classification'] |
validmind.model_validation.sklearn.RegressionErrorsComparison | Regression Errors Comparison | Assesses multiple regression error metrics to compare model performance across different datasets, emphasizing... | ['datasets', 'models'] | {} | ['model_performance', 'sklearn'] | ['regression', 'time_series_forecasting'] |
validmind.model_validation.sklearn.RegressionPerformance | Regression Performance | Evaluates the performance of a regression model using five different metrics: MAE, MSE, RMSE, MAPE, and MBD.... | ['model', 'dataset'] | {} | ['sklearn', 'model_performance'] | ['regression'] |
validmind.model_validation.sklearn.RegressionR2Square | Regression R2 Square | Assesses the overall goodness-of-fit of a regression model by evaluating R-squared (R2) and Adjusted R-squared (Adj... | ['dataset', 'model'] | {} | ['sklearn', 'model_performance'] | ['regression'] |
validmind.model_validation.sklearn.RegressionR2SquareComparison | Regression R2 Square Comparison | Compares R-Squared and Adjusted R-Squared values for different regression models across multiple datasets to assess... | ['datasets', 'models'] | {} | ['model_performance', 'sklearn'] | ['regression', 'time_series_forecasting'] |
validmind.model_validation.sklearn.RobustnessDiagnosis | Robustness Diagnosis | Assesses the robustness of a machine learning model by evaluating performance decay under noisy conditions.... | ['datasets', 'model'] | {'metric': {'type': 'str', 'default': None}, 'scaling_factor_std_dev_list': {'type': None, 'default': [0.1, 0.2, 0.3, 0.4, 0.5]}, 'performance_decay_threshold': {'type': 'float', 'default': 0.05}} | ['sklearn', 'model_diagnosis', 'visualization'] | ['classification', 'regression'] |
validmind.model_validation.sklearn.SHAPGlobalImportance | SHAP Global Importance | Evaluates and visualizes global feature importance using SHAP values for model explanation and risk identification.... | ['model', 'dataset'] | {'kernel_explainer_samples': {'type': 'int', 'default': 10}, 'tree_or_linear_explainer_samples': {'type': 'int', 'default': 200}, 'class_of_interest': {'type': None, 'default': None}} | ['sklearn', 'binary_classification', 'multiclass_classification', 'feature_importance', 'visualization'] | ['classification', 'text_classification'] |
validmind.model_validation.sklearn.ScoreProbabilityAlignment | Score Probability Alignment | Analyzes the alignment between credit scores and predicted probabilities.... | ['model', 'dataset'] | {'score_column': {'type': 'str', 'default': 'score'}, 'n_bins': {'type': 'int', 'default': 10}} | ['visualization', 'credit_risk', 'calibration'] | ['classification'] |
validmind.model_validation.sklearn.SilhouettePlot | Silhouette Plot | Calculates and visualizes Silhouette Score, assessing the degree of data point suitability to its cluster in ML... | ['model', 'dataset'] | {} | ['sklearn', 'model_performance'] | ['clustering'] |
validmind.model_validation.sklearn.TrainingTestDegradation | Training Test Degradation | Tests if model performance degradation between training and test datasets exceeds a predefined threshold.... | ['datasets', 'model'] | {'max_threshold': {'type': 'float', 'default': 0.1}} | ['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance', 'visualization'] | ['classification', 'text_classification'] |
validmind.model_validation.sklearn.VMeasure | V Measure | Evaluates homogeneity and completeness of a clustering model using the V Measure Score.... | ['dataset', 'model'] | {} | ['sklearn', 'model_performance'] | ['clustering'] |
validmind.model_validation.sklearn.WeakspotsDiagnosis | Weakspots Diagnosis | Identifies and visualizes weak spots in a machine learning model's performance across various sections of the... | ['datasets', 'model'] | {'features_columns': {'type': None, 'default': None}, 'metrics': {'type': None, 'default': None}, 'thresholds': {'type': None, 'default': None}} | ['sklearn', 'binary_classification', 'multiclass_classification', 'model_diagnosis', 'visualization'] | ['classification', 'text_classification'] |
validmind.model_validation.statsmodels.AutoARIMA | Auto ARIMA | Evaluates ARIMA models for time-series forecasting, ranking them using Bayesian and Akaike Information Criteria.... | ['model', 'dataset'] | {} | ['time_series_data', 'forecasting', 'model_selection', 'statsmodels'] | ['regression'] |
validmind.model_validation.statsmodels.CumulativePredictionProbabilities | Cumulative Prediction Probabilities | Visualizes cumulative probabilities of positive and negative classes for both training and testing in classification models.... | ['dataset', 'model'] | {'title': {'type': '_empty', 'default': 'Cumulative Probabilities'}} | ['visualization', 'credit_risk'] | ['classification'] |
validmind.model_validation.statsmodels.DurbinWatsonTest | Durbin Watson Test | Assesses autocorrelation in time series data features using the Durbin-Watson statistic.... | ['dataset', 'model'] | {'threshold': {'type': '_empty', 'default': [1.5, 2.5]}} | ['time_series_data', 'forecasting', 'statistical_test', 'statsmodels'] | ['regression'] |
validmind.model_validation.statsmodels.GINITable | GINI Table | Evaluates classification model performance using AUC, GINI, and KS metrics for training and test datasets.... | ['dataset', 'model'] | {} | ['model_performance'] | ['classification'] |
validmind.model_validation.statsmodels.KolmogorovSmirnov | Kolmogorov Smirnov | Assesses whether each feature in the dataset aligns with a normal distribution using the Kolmogorov-Smirnov test.... | ['model', 'dataset'] | {'dist': {'type': 'str', 'default': 'norm'}} | ['tabular_data', 'data_distribution', 'statistical_test', 'statsmodels'] | ['classification', 'regression'] |
validmind.model_validation.statsmodels.Lilliefors | Lilliefors | Assesses the normality of feature distributions in an ML model's training dataset using the Lilliefors test.... | ['dataset'] | {} | ['tabular_data', 'data_distribution', 'statistical_test', 'statsmodels'] | ['classification', 'regression'] |
validmind.model_validation.statsmodels.PredictionProbabilitiesHistogram | Prediction Probabilities Histogram | Assesses the predictive probability distribution for binary classification to evaluate model performance and... | ['dataset', 'model'] | {'title': {'type': '_empty', 'default': 'Histogram of Predictive Probabilities'}} | ['visualization', 'credit_risk'] | ['classification'] |
validmind.model_validation.statsmodels.RegressionCoeffs | Regression Coeffs | Assesses the significance and uncertainty of predictor variables in a regression model through visualization of... | ['model'] | {} | ['tabular_data', 'visualization', 'model_training'] | ['regression'] |
validmind.model_validation.statsmodels.RegressionFeatureSignificance | Regression Feature Significance | Assesses and visualizes the statistical significance of features in a regression model.... | ['model'] | {'fontsize': {'type': 'int', 'default': 10}, 'p_threshold': {'type': 'float', 'default': 0.05}} | ['statistical_test', 'model_interpretation', 'visualization', 'feature_importance'] | ['regression'] |
validmind.model_validation.statsmodels.RegressionModelForecastPlot | Regression Model Forecast Plot | Generates plots to visually compare the forecasted outcomes of a regression model against actual observed values over... | ['model', 'dataset'] | {'start_date': {'type': None, 'default': None}, 'end_date': {'type': None, 'default': None}} | ['time_series_data', 'forecasting', 'visualization'] | ['regression'] |
validmind.model_validation.statsmodels.RegressionModelForecastPlotLevels | Regression Model Forecast Plot Levels | Assesses the alignment between forecasted and observed values in regression models through visual plots... | ['model', 'dataset'] | {} | ['time_series_data', 'forecasting', 'visualization'] | ['regression'] |
validmind.model_validation.statsmodels.RegressionModelSensitivityPlot | Regression Model Sensitivity Plot | Assesses the sensitivity of a regression model to changes in independent variables by applying shocks and... | ['dataset', 'model'] | {'shocks': {'type': None, 'default': [0.1]}, 'transformation': {'type': None, 'default': None}} | ['senstivity_analysis', 'visualization'] | ['regression'] |
validmind.model_validation.statsmodels.RegressionModelSummary | Regression Model Summary | Evaluates regression model performance using metrics including R-Squared, Adjusted R-Squared, MSE, and RMSE.... | ['dataset', 'model'] | {} | ['model_performance', 'regression'] | ['regression'] |
validmind.model_validation.statsmodels.RegressionPermutationFeatureImportance | Regression Permutation Feature Importance | Assesses the significance of each feature in a model by evaluating the impact on model performance when feature... | ['dataset', 'model'] | {'fontsize': {'type': 'int', 'default': 12}, 'figure_height': {'type': 'int', 'default': 500}} | ['statsmodels', 'feature_importance', 'visualization'] | ['regression'] |
validmind.model_validation.statsmodels.ScorecardHistogram | Scorecard Histogram | The Scorecard Histogram test evaluates the distribution of credit scores between default and non-default instances,... | ['dataset'] | {'title': {'type': '_empty', 'default': 'Histogram of Scores'}, 'score_column': {'type': '_empty', 'default': 'score'}} | ['visualization', 'credit_risk', 'logistic_regression'] | ['classification'] |
validmind.ongoing_monitoring.CalibrationCurveDrift | Calibration Curve Drift | Evaluates changes in probability calibration between reference and monitoring datasets.... | ['datasets', 'model'] | {'n_bins': {'type': 'int', 'default': 10}, 'drift_pct_threshold': {'type': 'float', 'default': 20}} | ['sklearn', 'binary_classification', 'model_performance', 'visualization'] | ['classification', 'text_classification'] |
validmind.ongoing_monitoring.ClassDiscriminationDrift | Class Discrimination Drift | Compares classification discrimination metrics between reference and monitoring datasets.... | ['datasets', 'model'] | {'drift_pct_threshold': {'type': '_empty', 'default': 20}} | ['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance'] | ['classification', 'text_classification'] |
validmind.ongoing_monitoring.ClassImbalanceDrift | Class Imbalance Drift | Evaluates drift in class distribution between reference and monitoring datasets.... | ['datasets'] | {'drift_pct_threshold': {'type': 'float', 'default': 5.0}, 'title': {'type': 'str', 'default': 'Class Distribution Drift'}} | ['tabular_data', 'binary_classification', 'multiclass_classification'] | ['classification'] |
validmind.ongoing_monitoring.ClassificationAccuracyDrift | Classification Accuracy Drift | Compares classification accuracy metrics between reference and monitoring datasets.... | ['datasets', 'model'] | {'drift_pct_threshold': {'type': '_empty', 'default': 20}} | ['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance'] | ['classification', 'text_classification'] |
validmind.ongoing_monitoring.ConfusionMatrixDrift | Confusion Matrix Drift | Compares confusion matrix metrics between reference and monitoring datasets.... | ['datasets', 'model'] | {'drift_pct_threshold': {'type': '_empty', 'default': 20}} | ['sklearn', 'binary_classification', 'multiclass_classification', 'model_performance'] | ['classification', 'text_classification'] |
validmind.ongoing_monitoring.CumulativePredictionProbabilitiesDrift | Cumulative Prediction Probabilities Drift | Compares cumulative prediction probability distributions between reference and monitoring datasets.... | ['datasets', 'model'] | {} | ['visualization', 'credit_risk'] | ['classification'] |
validmind.ongoing_monitoring.FeatureDrift | Feature Drift | Evaluates changes in feature distribution over time to identify potential model drift.... | ['datasets'] | {'bins': {'type': '_empty', 'default': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}, 'feature_columns': {'type': '_empty', 'default': None}, 'psi_threshold': {'type': '_empty', 'default': 0.2}} | ['visualization'] | ['monitoring'] |
validmind.ongoing_monitoring.PredictionAcrossEachFeature | Prediction Across Each Feature | Assesses differences in model predictions across individual features between reference and monitoring datasets... | ['datasets', 'model'] | {} | ['visualization'] | ['monitoring'] |
validmind.ongoing_monitoring.PredictionCorrelation | Prediction Correlation | Assesses correlation changes between model predictions from reference and monitoring datasets to detect potential... | ['datasets', 'model'] | {'drift_pct_threshold': {'type': '_empty', 'default': 20}} | ['visualization'] | ['monitoring'] |
validmind.ongoing_monitoring.PredictionProbabilitiesHistogramDrift | Prediction Probabilities Histogram Drift | Compares prediction probability distributions between reference and monitoring datasets.... | ['datasets', 'model'] | {'title': {'type': '_empty', 'default': 'Prediction Probabilities Histogram Drift'}, 'drift_pct_threshold': {'type': 'float', 'default': 20.0}} | ['visualization', 'credit_risk'] | ['classification'] |
validmind.ongoing_monitoring.PredictionQuantilesAcrossFeatures | Prediction Quantiles Across Features | Assesses differences in model prediction distributions across individual features between reference... | ['datasets', 'model'] | {} | ['visualization'] | ['monitoring'] |
validmind.ongoing_monitoring.ROCCurveDrift | ROC Curve Drift | Compares ROC curves between reference and monitoring datasets.... | ['datasets', 'model'] | {} | ['sklearn', 'binary_classification', 'model_performance', 'visualization'] | ['classification', 'text_classification'] |
validmind.ongoing_monitoring.ScoreBandsDrift | Score Bands Drift | Analyzes drift in population distribution and default rates across score bands.... | ['datasets', 'model'] | {'score_column': {'type': 'str', 'default': 'score'}, 'score_bands': {'type': 'list', 'default': None}, 'drift_threshold': {'type': 'float', 'default': 20.0}} | ['visualization', 'credit_risk', 'scorecard'] | ['classification'] |
validmind.ongoing_monitoring.ScorecardHistogramDrift | Scorecard Histogram Drift | Compares score distributions between reference and monitoring datasets for each class.... | ['datasets'] | {'score_column': {'type': 'str', 'default': 'score'}, 'title': {'type': 'str', 'default': 'Scorecard Histogram Drift'}, 'drift_pct_threshold': {'type': 'float', 'default': 20.0}} | ['visualization', 'credit_risk', 'logistic_regression'] | ['classification'] |
validmind.ongoing_monitoring.TargetPredictionDistributionPlot | Target Prediction Distribution Plot | Assesses differences in prediction distributions between a reference dataset and a monitoring dataset to identify... | ['datasets', 'model'] | {'drift_pct_threshold': {'type': '_empty', 'default': 20}} | ['visualization'] | ['monitoring'] |
validmind.prompt_validation.Bias | Bias | Assesses potential bias in a Large Language Model by analyzing the distribution and order of exemplars in the... | ['model'] | {'min_threshold': {'type': '_empty', 'default': 7}} | ['llm', 'few_shot'] | ['text_classification', 'text_summarization'] |
validmind.prompt_validation.Clarity | Clarity | Evaluates and scores the clarity of prompts in a Large Language Model based on specified guidelines.... | ['model'] | {'min_threshold': {'type': '_empty', 'default': 7}} | ['llm', 'zero_shot', 'few_shot'] | ['text_classification', 'text_summarization'] |
validmind.prompt_validation.Conciseness | Conciseness | Analyzes and grades the conciseness of prompts provided to a Large Language Model.... | ['model'] | {'min_threshold': {'type': '_empty', 'default': 7}} | ['llm', 'zero_shot', 'few_shot'] | ['text_classification', 'text_summarization'] |
validmind.prompt_validation.Delimitation | Delimitation | Evaluates the proper use of delimiters in prompts provided to Large Language Models.... | ['model'] | {'min_threshold': {'type': '_empty', 'default': 7}} | ['llm', 'zero_shot', 'few_shot'] | ['text_classification', 'text_summarization'] |
validmind.prompt_validation.NegativeInstruction | Negative Instruction | Evaluates and grades the use of affirmative, proactive language over negative instructions in LLM prompts.... | ['model'] | {'min_threshold': {'type': '_empty', 'default': 7}} | ['llm', 'zero_shot', 'few_shot'] | ['text_classification', 'text_summarization'] |
validmind.prompt_validation.Robustness | Robustness | Assesses the robustness of prompts provided to a Large Language Model under varying conditions and contexts. This test... | ['model', 'dataset'] | {'num_tests': {'type': '_empty', 'default': 10}} | ['llm', 'zero_shot', 'few_shot'] | ['text_classification', 'text_summarization'] |
validmind.prompt_validation.Specificity | Specificity | Evaluates and scores the specificity of prompts provided to a Large Language Model (LLM), based on clarity, detail,... | ['model'] | {'min_threshold': {'type': '_empty', 'default': 7}} | ['llm', 'zero_shot', 'few_shot'] | ['text_classification', 'text_summarization'] |
validmind.unit_metrics.classification.Accuracy | Accuracy | Calculates the accuracy of a model. | ['dataset', 'model'] | {} | ['classification'] | ['classification'] |
validmind.unit_metrics.classification.F1 | F1 | Calculates the F1 score for a classification model. | ['model', 'dataset'] | {} | ['classification'] | ['classification'] |
validmind.unit_metrics.classification.Precision | Precision | Calculates the precision for a classification model. | ['model', 'dataset'] | {} | ['classification'] | ['classification'] |
validmind.unit_metrics.classification.ROC_AUC | ROC AUC | Calculates the ROC AUC for a classification model. | ['model', 'dataset'] | {} | ['classification'] | ['classification'] |
validmind.unit_metrics.classification.Recall | Recall | Calculates the recall for a classification model. | ['model', 'dataset'] | {} | ['classification'] | ['classification'] |
validmind.unit_metrics.regression.AdjustedRSquaredScore | Adjusted R Squared Score | Calculates the adjusted R-squared score for a regression model. | ['model', 'dataset'] | {} | ['regression'] | ['regression'] |
validmind.unit_metrics.regression.GiniCoefficient | Gini Coefficient | Calculates the Gini coefficient for a regression model. | ['dataset', 'model'] | {} | ['regression'] | ['regression'] |
validmind.unit_metrics.regression.HuberLoss | Huber Loss | Calculates the Huber loss for a regression model. | ['model', 'dataset'] | {} | ['regression'] | ['regression'] |
validmind.unit_metrics.regression.KolmogorovSmirnovStatistic | Kolmogorov Smirnov Statistic | Calculates the Kolmogorov-Smirnov statistic for a regression model. | ['dataset', 'model'] | {} | ['regression'] | ['regression'] |
validmind.unit_metrics.regression.MeanAbsoluteError | Mean Absolute Error | Calculates the mean absolute error for a regression model. | ['model', 'dataset'] | {} | ['regression'] | ['regression'] |
validmind.unit_metrics.regression.MeanAbsolutePercentageError | Mean Absolute Percentage Error | Calculates the mean absolute percentage error for a regression model. | ['model', 'dataset'] | {} | ['regression'] | ['regression'] |
validmind.unit_metrics.regression.MeanBiasDeviation | Mean Bias Deviation | Calculates the mean bias deviation for a regression model. | ['model', 'dataset'] | {} | ['regression'] | ['regression'] |
validmind.unit_metrics.regression.MeanSquaredError | Mean Squared Error | Calculates the mean squared error for a regression model. | ['model', 'dataset'] | {} | ['regression'] | ['regression'] |
validmind.unit_metrics.regression.QuantileLoss | Quantile Loss | Calculates the quantile loss for a regression model. | ['model', 'dataset'] | {'quantile': {'type': '_empty', 'default': 0.5}} | ['regression'] | ['regression'] |
validmind.unit_metrics.regression.RSquaredScore | R Squared Score | Calculates the R-squared score for a regression model. | ['model', 'dataset'] | {} | ['regression'] | ['regression'] |
validmind.unit_metrics.regression.RootMeanSquaredError | Root Mean Squared Error | Calculates the root mean squared error for a regression model. | ['model', 'dataset'] | {} | ['regression'] | ['regression'] |
Programmatic Use
To work with a specific set of tests programmatically, you can store the results in a variable. Passing pretty=False returns a plain list of test IDs instead of a formatted table. For instance, let's list all tests designed for text summarization tasks and store them in text_summarization_tests
for further use.
text_summarization_tests = list_tests(task="text_summarization", pretty=False)
text_summarization_tests
['validmind.data_validation.DatasetDescription',
'validmind.data_validation.DatasetSplit',
'validmind.data_validation.nlp.CommonWords',
'validmind.data_validation.nlp.Hashtags',
'validmind.data_validation.nlp.LanguageDetection',
'validmind.data_validation.nlp.Mentions',
'validmind.data_validation.nlp.Punctuations',
'validmind.data_validation.nlp.StopWords',
'validmind.data_validation.nlp.TextDescription',
'validmind.model_validation.BertScore',
'validmind.model_validation.BleuScore',
'validmind.model_validation.ContextualRecall',
'validmind.model_validation.MeteorScore',
'validmind.model_validation.RegardScore',
'validmind.model_validation.RougeScore',
'validmind.model_validation.TokenDisparity',
'validmind.model_validation.ToxicityScore',
'validmind.model_validation.embeddings.CosineSimilarityComparison',
'validmind.model_validation.embeddings.CosineSimilarityHeatmap',
'validmind.model_validation.embeddings.EuclideanDistanceComparison',
'validmind.model_validation.embeddings.EuclideanDistanceHeatmap',
'validmind.model_validation.embeddings.PCAComponentsPairwisePlots',
'validmind.model_validation.embeddings.TSNEComponentsPairwisePlots',
'validmind.model_validation.ragas.AnswerCorrectness',
'validmind.model_validation.ragas.AspectCritic',
'validmind.model_validation.ragas.ContextEntityRecall',
'validmind.model_validation.ragas.ContextPrecision',
'validmind.model_validation.ragas.ContextPrecisionWithoutReference',
'validmind.model_validation.ragas.ContextRecall',
'validmind.model_validation.ragas.Faithfulness',
'validmind.model_validation.ragas.NoiseSensitivity',
'validmind.model_validation.ragas.ResponseRelevancy',
'validmind.model_validation.ragas.SemanticSimilarity',
'validmind.prompt_validation.Bias',
'validmind.prompt_validation.Clarity',
'validmind.prompt_validation.Conciseness',
'validmind.prompt_validation.Delimitation',
'validmind.prompt_validation.NegativeInstruction',
'validmind.prompt_validation.Robustness',
'validmind.prompt_validation.Specificity']
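Because the result is an ordinary Python list of ID strings, standard list and string operations apply to it. The sketch below groups IDs by the segment that follows the validmind. prefix. It is only an illustration: sample_test_ids and group_by_category are hypothetical names, not part of the ValidMind Library, and the sample holds a handful of IDs copied from the output above so the snippet runs without the library installed.

```python
from collections import defaultdict

# A few IDs copied from the output above. In the notebook you would pass
# the full `text_summarization_tests` list instead of this sample.
sample_test_ids = [
    "validmind.data_validation.nlp.CommonWords",
    "validmind.model_validation.BertScore",
    "validmind.model_validation.ragas.Faithfulness",
    "validmind.prompt_validation.Bias",
    "validmind.prompt_validation.Clarity",
]


def group_by_category(test_ids):
    """Group test IDs by the segment after the 'validmind.' prefix.

    Hypothetical helper for illustration; not a ValidMind function.
    """
    groups = defaultdict(list)
    for test_id in test_ids:
        # "validmind.prompt_validation.Clarity" -> "prompt_validation"
        category = test_id.split(".")[1]
        groups[category].append(test_id)
    return dict(groups)


grouped = group_by_category(sample_test_ids)
for category, ids in grouped.items():
    print(f"{category}: {len(ids)} test(s)")
```

Grouping this way makes it easier to see at a glance how many data validation, model validation, and prompt validation tests apply to a task before describing any of them individually.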
Delving into Test Details with describe_test
After identifying a set of potential tests, you might want to explore the specifics of an individual test. The describe_test
function provides a deep dive into the details of a test. It reveals the test name, description, ID, test type, and required inputs. Below, we showcase how to describe a test using its ID:
describe_test("validmind.model_validation.sklearn.OverfitDiagnosis")
Next steps
With the functions covered in this guide, you can list and filter all of ValidMind's available tests and find the ones you want to run against your model, your dataset, or both. The next step is to take the IDs of those tests and either create a test suite for reuse or run them directly to try them out. See the other notebooks for tutorials on both approaches.
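When shortlisting IDs, it can help to check each test's Required Inputs against what you actually have on hand. The following is a minimal sketch under stated assumptions: the required_inputs mapping is typed in by hand from the Required Inputs column of the table shown earlier, and runnable_with is a hypothetical helper, not a ValidMind function.

```python
# Required inputs copied by hand from the "Required Inputs" column of the
# table above. This mapping is an illustrative sample, not library output.
required_inputs = {
    "validmind.prompt_validation.Clarity": ["model"],
    "validmind.prompt_validation.Robustness": ["model", "dataset"],
    "validmind.unit_metrics.classification.F1": ["model", "dataset"],
}


def runnable_with(available, required_by_test):
    """Return the test IDs whose required inputs are all available.

    Hypothetical helper for illustration; not a ValidMind function.
    """
    available = set(available)
    return [
        test_id
        for test_id, required in required_by_test.items()
        if set(required) <= available  # every required input is on hand
    ]


# With only a model, the tests that also need a dataset are excluded.
only_model = runnable_with(["model"], required_inputs)
print(only_model)

# With both a model and a dataset, every test in the sample qualifies.
model_and_dataset = runnable_with(["model", "dataset"], required_inputs)
print(model_and_dataset)
```

The resulting lists contain only test IDs, so they can be carried forward to whichever approach you choose for running the tests.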
Discover more learning resources
We offer many interactive notebooks to help you document models:
Or, visit our documentation to learn more about ValidMind.