Build multi-class classification models with Amazon Redshift ML

 Build multi-class classification models with Amazon Redshift ML

Amazon Redshift ML simplifies the use of machine learning (ML) by using simple SQL statements to create and train ML models from data in Amazon Redshift. You can use Amazon Redshift ML to solve binary classification, multi-class classification, and regression problems and can use either AutoML or XGBoost directly.

This post is part of a series that describes the use of Amazon Redshift ML. For more information about building regression using Amazon Redshift ML, see Build regression models with Amazon Redshift ML.

You can use Amazon Redshift ML to automate data preparation, pre-processing, and selection of problem type as depicted in this blog post. We assume that you have a good understanding of your data and what problem type is most applicable for your use case. This post specifically focuses on creating models in Amazon Redshift using the multi-class classification problem type, which consists on classifying instances into one of three or more classes. For example, you can predict whether a transaction is fraudulent, failed or successful, whether a customer will remain active for 3 months, six months, nine months, 12 months, or whether a news is tagged as sports, world news, business.


As a prerequisite for implementing this solution, you need to set up an Amazon Redshift cluster with ML enabled on it. For the preliminary steps to get started, see Create, train, and deploy machine learning models in Amazon Redshift using SQL with Amazon Redshift ML.

Use case

For our use case, we want to target our most active customers for a special customer loyalty program. We use Amazon Redshift ML and multi-class classification to predict how many months a customer will be active over a 13-month period. This translates into up to 13 possible classes, which makes this a better fit for multi-class classification. Customers with predicted activity of 7 months or greater are targeted for a special customer loyalty program.

Input raw data

To prepare the raw data for this model, we populated the table ecommerce_sales in Amazon Redshift using the public data set E-Commerce Sales Forecast, which includes sales data of an online UK retailer.

Enter the following statements to load the data to Amazon Redshift:

	invoiceno VARCHAR(30)   
	,stockcode VARCHAR(30)   
	,description VARCHAR(60)    
	,quantity DOUBLE PRECISION   
	,invoicedate VARCHAR(30)    
	,unitprice    DOUBLE PRECISION
	,customerid BIGINT    
	,country VARCHAR(25)    
Copy ecommerce_sales
From 's3://redshift-ml-multiclass/ecommerce_data.txt'
iam_role '<<your-amazon-redshift-sagemaker-iam-role-arn>>' delimiter 't' IGNOREHEADER 1 region 'us-east-1' maxerror 100;

To reproduce this script in your environment, replace <<your-ama


Source - Continue Reading:


Related post