September 27, 2023
In this release, we’ve added support for large language models (LLMs) to enhance the capabilities of the ValidMind Library in preparation for the closed beta, along with a number of new demo notebooks that you can try out.
Other enhancements improve the developer experience and our documentation site.
Release highlights
ValidMind Library (v1.19.0)
Large language model (LLM) support
We added initial support for large language models (LLMs) in ValidMind via the new FoundationModel class.
- You can now create an instance of a FoundationModel, specify a predict_fn and a prompt, and pass that instance into any test suite, for example.
- The predict_fn must be defined by the user and implements the logic for calling the Foundation LLM, usually via the Python Library API.
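As a rough illustration of the predict_fn idea described above, the following sketch wraps a user-defined function and a prompt in a small container. Everything here is an assumption made for illustration: FoundationModelSketch is a hypothetical stand-in and not the library's FoundationModel class, fake_llm_api replaces a real LLM client call, and the prompt text is invented.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical prompt template, invented for this sketch.
PROMPT_TEMPLATE = (
    "Classify the sentiment of this sentence as positive or negative: {text}"
)


def fake_llm_api(prompt: str) -> str:
    """Stand-in for a real LLM client call (an assumption, not a real API)."""
    return "positive" if "good" in prompt.lower() else "negative"


def predict_fn(text: str) -> str:
    """User-defined logic: fill in the prompt, then call the LLM."""
    return fake_llm_api(PROMPT_TEMPLATE.format(text=text))


@dataclass
class FoundationModelSketch:
    """Hypothetical container pairing a predict_fn with its prompt."""

    predict_fn: Callable[[str], str]
    prompt: str

    def predict(self, texts: List[str]) -> List[str]:
        # Delegate every input to the user-supplied function.
        return [self.predict_fn(text) for text in texts]


model = FoundationModelSketch(predict_fn=predict_fn, prompt=PROMPT_TEMPLATE)
predictions = model.predict(["The quarter was good", "Revenue fell sharply"])
```

Keeping the call logic in a plain function means the same wrapper could sit in front of any provider, since only predict_fn needs to know how the model is reached.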
To demonstrate the capabilities of LLM support, this release also includes new demo notebooks:
Prompt validation demo notebook for LLMs
As a proof of concept, we added initial native prompt validation tests to the library, including a notebook and a simple template for trying out these metrics on a sentiment analysis LLM we built.
Text summarization model demo notebook for LLMs
We added a new notebook in the library that includes the financial news dataset, initializes a Hugging Face summarization model using the init_model interface, implements relevant metrics for testing, and demonstrates how to run a text summarization metrics test suite for an LLM instructed as a financial news summarizer.
Support for Hugging Face models
ValidMind can now validate pre-trained models from the Hugging Face Hub, including any language model compatible with the Hugging Face transformers API.
To illustrate this new feature, we have included a financial news sentiment analysis demo that runs documentation tests for a Hugging Face text classification model using the financial_phrasebank dataset.
A better developer experience with run_test()
We added a new run_test() helper function that streamlines running tests for you. This function allows executing any individual test independent of a test suite or a documentation template. A one-line command can execute a test, making it easier to run tests with various parameters and options.
For example:
run_test("ClassImbalance", dataset=dataset, params=params, send=True)
We also updated the QuickStart notebook to provide a consistent experience.
This notebook:
- Now runs vm.preview_template() after initializing ValidMind
- Now runs vm.run_documentation_tests() instead of running a test suite that is not connected to the template
Example usage for run_test
Discover existing tests by calling list_tests() or describe_test().
View the tests associated with a documentation template by running preview_template().
Using the test ID, run a given test and pass in additional configuration parameters and inputs:
# No params
test_results = vm.tests.run_test(
    "class_imbalance",
    dataset=vm_dataset
)

# Custom params
test_results = vm.tests.run_test(
    "class_imbalance",
    params={"min_percent_threshold": 30},
    dataset=vm_dataset
)
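To give a feel for what a parameter such as min_percent_threshold could control, here is a minimal standalone sketch of a class imbalance check. It does not reproduce the library's implementation: the function name, the default threshold of 10, the result layout, and the pass rule (each class must make up at least the given percentage of rows) are all assumptions for illustration.

```python
from collections import Counter


def class_imbalance_check(labels, min_percent_threshold=10):
    """Report each class's share of rows (in percent) and whether it
    reaches the threshold. Hypothetical logic, not the library's test."""
    counts = Counter(labels)
    total = len(labels)
    results = {}
    for cls, count in counts.items():
        percent = 100.0 * count / total
        results[cls] = {
            "percent": percent,
            "passed": percent >= min_percent_threshold,
        }
    return results


# Toy target column: 80 rows of class 0 and 20 rows of class 1.
labels = [0] * 80 + [1] * 20

default_result = class_imbalance_check(labels)
strict_result = class_imbalance_check(labels, min_percent_threshold=30)
```

With the assumed default, the minority class at 20% passes. Raising the threshold to 30 flags it, which is the kind of change that passing custom params makes easy to try.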
Enhancements
ValidMind Library (v1.19.0)
Multi-class test improvements
We made a number of changes to tests to improve the developer experience:
- A new fail_fast argument can be passed to run_test_plan(), run_test_suite(), and run_documentation_tests(), used to fail and raise an exception on the first error encountered. This change is useful for debugging.
- The ClassifierPerformance test now determines if you are testing a binary or a multi-class model. When testing a multi-class model, we now report additional per-class, macro, and weighted average tests.
- Fixed F1 score test so it works correctly for binary and multi-class models.
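The fail_fast behavior described above can be pictured with a small generic runner. This is only a sketch of the general pattern. The run_all function and the toy tests are hypothetical and do not reflect how run_test_plan(), run_test_suite(), or run_documentation_tests() are implemented.

```python
def run_all(tests, fail_fast=False):
    """Run each callable in order.

    With fail_fast=True the first exception propagates immediately;
    otherwise errors are collected and the remaining tests still run.
    """
    results, errors = {}, {}
    for name, test in tests.items():
        try:
            results[name] = test()
        except Exception as exc:
            if fail_fast:
                raise
            errors[name] = exc
    return results, errors


def passing_test():
    return "ok"


def broken_test():
    raise ValueError("bad input")


tests = {"first": passing_test, "broken": broken_test, "last": passing_test}

# Default: the failure is recorded and the last test still runs.
results, errors = run_all(tests)

# fail_fast: the ValueError surfaces at the first failing test.
try:
    run_all(tests, fail_fast=True)
    stopped_early = False
except ValueError:
    stopped_early = True
```

Raising immediately keeps the original traceback close to the failing test, which is why this mode tends to suit debugging better than a collected error summary.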
Added multi-class classification support
- The library now supports a multi-class version of some of the existing tests, such as confusion matrix, accuracy, precision, recall, and more.
- Also, the dataset and model interfaces now support dealing with multiple targets.
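For readers less familiar with multi-class reporting, the sketch below shows how per-class, macro, and weighted averages of an F1 score relate to each other. It uses plain Python and toy labels chosen for illustration. It is not the library's code, and the function names are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """One-vs-rest F1 score for every class seen in the labels."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


def macro_average(scores):
    """Unweighted mean: every class counts equally."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores, y_true):
    """Mean weighted by each class's number of true samples (support)."""
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


y_true = ["a", "a", "a", "a", "b", "b", "c", "c"]
y_pred = ["a", "a", "a", "b", "b", "b", "c", "a"]

scores = per_class_f1(y_true, y_pred)
macro_f1 = macro_average(scores)
weighted_f1 = weighted_average(scores, y_true)
```

The two averages diverge when classes are unevenly represented, since the weighted version leans toward the larger classes while the macro version treats a rare class the same as a common one.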
Implemented classification model comparison tests
- Added a model performance comparison test for classification tasks.
- The test includes metrics such as accuracy, F1, precision, recall, and roc_auc score.
Track additional test metadata
- Added a metadata property to every ValidMind test class.
- The metadata property includes a task_types field and a tags field, which both serve to categorize the tests based on what data and model types they work with, what category of test they fall into, and more.
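One way such metadata could be put to use is filtering a collection of tests by task type or tag. The sketch below assumes a dictionary with task_types and tags keys, as described above. The class names, the field values, and the find_tests helper are hypothetical and are not taken from the library.

```python
class ClassImbalanceSketch:
    # Hypothetical metadata values, invented for this example.
    metadata = {
        "task_types": ["classification"],
        "tags": ["tabular_data", "data_quality"],
    }


class TokenDisparitySketch:
    metadata = {
        "task_types": ["text_summarization"],
        "tags": ["nlp", "model_performance"],
    }


class ConfusionMatrixSketch:
    metadata = {
        "task_types": ["classification"],
        "tags": ["tabular_data", "model_performance"],
    }


ALL_TESTS = [ClassImbalanceSketch, TokenDisparitySketch, ConfusionMatrixSketch]


def find_tests(tests, task_type=None, tag=None):
    """Return the names of tests whose metadata matches every given filter."""
    matches = []
    for test in tests:
        meta = test.metadata
        if task_type is not None and task_type not in meta["task_types"]:
            continue
        if tag is not None and tag not in meta["tags"]:
            continue
        matches.append(test.__name__)
    return matches


classification_tests = find_tests(ALL_TESTS, task_type="classification")
performance_tests = find_tests(ALL_TESTS, tag="model_performance")
```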
Documentation
User guide updates
User journey improvements
We enhanced the architecture and content of our external docs site to make the user journey more efficient for model developers and model validators who are new to our products:
- Reworked the “Get Started” section to include more conceptual information and an overview of the high-level workflows.
- Revised the “What is the ValidMind Library?” section to provide an end-to-end overview of the workflow that model developers should follow as they adopt the library.
Site enhancements
Docs site improvements
We made a number of incremental improvements to our user guide:
- New dropdown for the ValidMind Library that gives faster access to the most important bits, such as our code samples and the reference documentation. Click on Developers in the top navigation bar to see it in action!
- Publication date for each page that reflects the last time the source file was touched.
- Previous and next topic footers for related topics that make it easier to keep reading.
- Expanded overview for key ValidMind concepts with some additional information.
- Lighter background for diagrams that improves legibility.
How to upgrade
ValidMind Platform
To access the latest version of the ValidMind Platform, hard refresh your browser tab:
- Windows: Ctrl + Shift + R OR Ctrl + F5
- MacOS: ⌘ Cmd + Shift + R OR hold down ⌘ Cmd and click the Reload button
ValidMind Library
To upgrade the ValidMind Library:
In your Jupyter Notebook:
Then within a code cell or your terminal, run:
%pip install --upgrade validmind
You may need to restart your kernel after upgrading the package for the changes to be applied.