Amazon SageMaker Clarify is a new machine learning (ML) feature that enables ML developers and data scientists to detect possible bias in their data and ML models and explain model predictions. It’s part of Amazon SageMaker, an end-to-end platform to build, train, and deploy your ML models. Clarify was made available at AWS re:Invent 2020.
In this post, we focus on the explainability capabilities in Clarify. Using Shapley values, Clarify helps identify how much each feature contributes to model predictions, both overall and for individual inferences. These feature importance graphs are available after the model has been trained, and Clarify can also help identify shifts in feature importance over time through Amazon SageMaker Model Monitor.
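To make the idea of Shapley values concrete, the following is a minimal sketch that computes them exactly for a single prediction by enumerating every feature subset. It is not how Clarify is implemented (Clarify relies on a sampling-based Kernel SHAP approximation, because enumeration grows exponentially with the number of features). The names `shapley_values` and `toy_model`, the all-zero baseline, and the sample values are assumptions chosen for illustration only.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are replaced by their baseline value.
    Cost is exponential in the number of features, so this is only
    practical for small illustrations.
    """
    n = len(instance)

    def value(coalition):
        # Build an input that uses the real value for features in the
        # coalition and the baseline value for all others.
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(x):
    # Hypothetical model: two linear terms plus one interaction term.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


instance = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
print(phi)
```

The attributions always sum to the difference between the prediction for the instance and the prediction for the baseline, which is why they can be read as a breakdown of an individual inference. In this toy model, the interaction term is split evenly between the two features that take part in it.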
At AWS, we take our mission to put ML in the hands of every developer seriously. For this reason, we want to present a way of using Clarify with the scikit-learn pre-built container. This approach generalizes across all pre-built and custom-built containers. We also show how to integrate this new feature into your ML pipeline using the AWS Step Functions Data Science Python SDK.
In this post, we provide a step-by-step guide for using these capabilities and provide code that helps you get started.
For this walkthrough, you should have the following prerequisites:
- An AWS account
Deploy your resources
Before we get started, we need certain roles and policies that let your SageMaker notebooks interact with AWS Step Functions. We create these with an AWS CloudFormation template. The template also creates a SageMaker notebook instance that automatically downloads this GitHub repository. Launch the stack with the following link:
During stack creation, specify the Amazon Simple Storage Service (Amazon S3) bucket that you use for storing your data.
After your stack is deployed, navigate to the SageMaker notebook instances on the SageMaker console. There you will find an up-and-running notebook that you can start using.
Use Clarify with a pre-built SKLearn container
One of the most-used Python libraries in the ML space is scikit-learn. For this reason, AWS offers a pre-built scikit-learn container in SageMaker for training and deploying ML models. With Clarify, you can now explain why your scikit-learn model predicts the way it does. For more information about getting started with Clarify using the built-in XGBoost model, see Fairness and Explainability with SageMaker Clarify.
Depending on whether you use a pre-built container or bring your own container, the way to integrate Clarify is slightly different. In the first part of this post, we explain the steps involved in working with Clarify and the pre-built SKLearn container.
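As rough orientation before the walkthrough, the following sketch assembles the kind of analysis configuration that a Clarify explainability job reads: the dataset layout, the SHAP settings, and the model to query. It uses plain Python only, so it runs without an AWS account; in practice the SageMaker Python SDK builds this configuration for you. The helper name `build_shap_analysis_config`, the column names, the model name, the instance type, and the baseline values are all hypothetical placeholders, not values from this post.

```python
import json


def build_shap_analysis_config(headers, label, baseline_row, model_name,
                               num_samples=15, agg_method="mean_abs"):
    """Assemble a Clarify-style analysis configuration for SHAP (illustrative)."""
    feature_count = len([h for h in headers if h != label])
    if len(baseline_row) != feature_count:
        raise ValueError("baseline_row must contain one value per feature column")
    return {
        "dataset_type": "text/csv",
        "headers": headers,
        "label": label,
        "methods": {
            "shap": {
                # Reference row against which feature contributions are measured.
                "baseline": [baseline_row],
                "num_samples": num_samples,
                # How per-instance values are aggregated into global importance.
                "agg_method": agg_method,
            }
        },
        "predictor": {
            # Placeholder values; substitute your own model and instance type.
            "model_name": model_name,
            "instance_type": "ml.m5.xlarge",
            "initial_instance_count": 1,
            "content_type": "text/csv",
            "accept_type": "text/csv",
        },
    }


config = build_shap_analysis_config(
    headers=["target", "feature_a", "feature_b"],  # hypothetical columns
    label="target",
    baseline_row=[0.0, 0.0],
    model_name="my-sklearn-model",  # hypothetical model name
)
print(json.dumps(config, indent=2))
```

The baseline row matters most here: every reported feature contribution is relative to the prediction for that row, so a representative choice (for example, per-feature means of the training data) tends to give more interpretable results than arbitrary zeros.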
Get started by setting global variables
Source - Continue Reading: https://aws.amazon.com/blogs/machine-learning/use-amazon-sagemaker-clarify-with-the-sklearn-pre-built-container/