
Updates to notebooks and readme
brightsparc committed Sep 16, 2021
1 parent 12d3d50 commit f4785c8
Showing 7 changed files with 32 additions and 28 deletions.
2 changes: 1 addition & 1 deletion BUILD.md
@@ -1,4 +1,4 @@
# Amazon SageMaker Drift Detection Pipeline
# Amazon SageMaker Drift Detection

This page has details on how to build a custom SageMaker MLOps template from source.

24 changes: 12 additions & 12 deletions README.md
@@ -1,4 +1,4 @@
# Amazon SageMaker Drift Detection Pipeline
# Amazon SageMaker Drift Detection

This sample demonstrates how to set up an Amazon SageMaker MLOps deployment pipeline for drift detection.

@@ -26,9 +26,9 @@ Follow are the list of the parameters.
| PortfolioOwner | The owner of the portfolio |
| ProductVersion | The product version to deploy |

You can copy the the required `ExecutionRoleArn` role from the Studio dashboard.
You can copy the required `ExecutionRoleArn` value from your **User Details** in the SageMaker Studio dashboard.

![Execution Role](docs/drift-execution-role.png)
![Execution Role](docs/studio-execution-role.png)

Alternatively see [BUILD.md](BUILD.md) for instructions on how to build the MLOps template from source.

@@ -39,21 +39,21 @@ Once your MLOps project template is registered in **AWS Service Catalog** you ca
1. Switch back to the Launcher
2. Click **New Project** from the **ML tasks and components** section.

On the Create project page, SageMaker templates is chosen by default. This option lists the built-in templates. However, you want to use the template you published for the Amazon SageMaker Drift Detection Pipeline.
On the **Create project** page, **SageMaker templates** is selected by default. This option lists the built-in templates; however, you want to use the template you published for Amazon SageMaker drift detection.

6. Choose **Organization templates**.
7. Choose **Amazon SageMaker Drift Detection Pipeline**.
8. Choose **Select project template**.
3. Choose **Organization templates**.
4. Choose **Amazon SageMaker drift detection template for real-time deployment**.
5. Choose **Select project template**.

![Select Template](docs/drift-select-template.png)

`NOTE`: If you have recently updated your AWS Service Catalog Project, you may need to refresh SageMaker Studio to ensure it picks up the latest version of your template.

9. In the **Project details** section, for **Name**, enter **drift-pipeline**.
6. In the **Project details** section, for **Name**, enter **drift-pipeline**.
- The project name must have 32 characters or fewer.
10. In the Project template parameters
- For **RetrainSchedule**, input a validate [Cron Schedule](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-schedule-expression.html) which defaults to `cron(0 12 1 * ? *)` - the first day of every month.
11. Choose **Create project**.
7. In the **Project template parameters** section, for **RetrainSchedule**, enter a valid [cron schedule expression](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-schedule-expression.html).
- This defaults to `cron(0 12 1 * ? *)`, which runs at 12:00 UTC on the first day of every month.
8. Choose **Create project**.

![Create Project](docs/drift-create-project.png)
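A minimal sketch of how the default `RetrainSchedule` value breaks down, assuming the six-field AWS `cron(...)` format (minutes, hours, day-of-month, month, day-of-week, year) used by Amazon EventBridge schedule rules. The helper name `parse_aws_cron` is hypothetical; it only checks the field count and the documented rule that one of day-of-month or day-of-week must be `?` while the other is not, and it does not validate value ranges.

```python
# Field order assumed for six-field AWS cron expressions (Amazon EventBridge style).
FIELDS = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def parse_aws_cron(expression: str) -> dict:
    """Split an expression such as 'cron(0 12 1 * ? *)' into named fields.

    Hypothetical helper: checks the field count and the '?' rule only,
    not the allowed ranges of each field.
    """
    expression = expression.strip()
    if not (expression.startswith("cron(") and expression.endswith(")")):
        raise ValueError("expected an expression of the form cron(...)")
    parts = expression[len("cron("):-1].split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    fields = dict(zip(FIELDS, parts))
    # Day-of-month and day-of-week cannot both be set: exactly one must be '?'.
    if (fields["day_of_month"] == "?") == (fields["day_of_week"] == "?"):
        raise ValueError("exactly one of day_of_month or day_of_week must be '?'")
    return fields


if __name__ == "__main__":
    # The default schedule: minute 0, hour 12, day 1 of every month.
    print(parse_aws_cron("cron(0 12 1 * ? *)"))
```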

@@ -69,7 +69,7 @@ The MLOps Drift Detection template will create the following AWS services and re
- The first repository provides code to create a multi-step model building pipeline using [AWS CloudFormation](https://aws.amazon.com/cloudformation/). The pipeline includes the following steps: data processing, model baseline, model training, model evaluation, and conditional model registration based on accuracy. The pipeline trains a regression model using the XGBoost algorithm on trip data from the [NYC Taxi Dataset](https://registry.opendata.aws/nyc-tlc-trip-records-pds/). This repository also includes the [drift-detection.ipynb](build_pipeline/drift-detection.ipynb) notebook to [Run the Pipeline](#run-the-pipeline) (see below).
- The second repository contains code and configuration files for model deployment and monitoring. This repo also uses [AWS CodePipeline](https://aws.amazon.com/codepipeline/) and [CodeBuild](https://aws.amazon.com/codebuild/), which run an [AWS CloudFormation](https://aws.amazon.com/cloudformation/) template to create model endpoints for staging and production. This repository includes the [prod-config.json](deployment_pipeline/prod-config.json) configuration file, which sets the metrics and thresholds for drift detection.

3. Two CodePipeline pipelines:
3. Two AWS CodePipeline pipelines:
- The [model build pipeline](build_pipeline) creates or updates the pipeline definition and then starts a new execution with a custom [AWS Lambda](https://aws.amazon.com/lambda/) function whenever a new commit is made to the ModelBuild CodeCommit repository. The first time the CodePipeline is started, it will fail to complete because it expects input data to be uploaded to the Amazon S3 artifact bucket.
- The [deployment pipeline](deployment_pipeline/README.md) automatically triggers whenever a new model version is added to the model registry and the status is marked as Approved. Models that are registered with Pending or Rejected statuses aren’t deployed.
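Because only versions marked Approved are deployed, it can help to check which registered version that currently is. A minimal sketch using the boto3 SageMaker client's `list_model_packages` call; the function name and the model package group name you pass in are hypothetical, and the project's deployment pipeline may select models in a different way.

```python
def latest_approved_model_package(sm_client, group_name: str):
    """Return the ARN of the newest Approved package in a model package group, or None.

    `sm_client` is expected to be a boto3 SageMaker client (or a compatible stub).
    """
    response = sm_client.list_model_packages(
        ModelPackageGroupName=group_name,
        ModelApprovalStatus="Approved",  # Pending and Rejected versions are filtered out
        SortBy="CreationTime",
        SortOrder="Descending",  # newest first
        MaxResults=1,
    )
    summaries = response.get("ModelPackageSummaryList", [])
    return summaries[0]["ModelPackageArn"] if summaries else None
```

For example, `latest_approved_model_package(boto3.client("sagemaker"), "<your-model-package-group>")` returns `None` until at least one version has been approved.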

16 changes: 8 additions & 8 deletions build_pipeline/batch-pipeline.ipynb
@@ -78,8 +78,6 @@
"source": [
"## Data Prep\n",
"\n",
"A staging SageMaker Pipeline is created by AWS CloudFormation in the `Batch_CFN_Staging` stage of the above pipeline.\n",
"\n",
"Download the test dataset output from the pre-processing job in our build pipeline, which we will use for input to batch scoring."
]
},
@@ -150,7 +148,9 @@
"source": [
"## Test Staging\n",
"\n",
"Now let's start the batch staging pipeline"
"A staging SageMaker Pipeline is created by AWS CloudFormation in the `Batch_CFN_Staging` stage of the above AWS CodePipeline.\n",
"\n",
"Once it is created, run the next cell to start the pipeline."
]
},
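A minimal sketch of what starting the staging pipeline and waiting for it to finish could look like with the boto3 SageMaker client (`start_pipeline_execution` and `describe_pipeline_execution`). The function name and the pipeline name you pass in are hypothetical (use the name created in the `Batch_CFN_Staging` stage), and the 30-second polling interval is an arbitrary choice.

```python
import time

# Statuses after which a SageMaker Pipeline execution no longer changes.
TERMINAL_STATUSES = ("Succeeded", "Failed", "Stopped")


def run_pipeline(sm_client, pipeline_name: str, poll_seconds: float = 30.0) -> str:
    """Start a SageMaker Pipeline execution and poll until it reaches a terminal status."""
    started = sm_client.start_pipeline_execution(PipelineName=pipeline_name)
    arn = started["PipelineExecutionArn"]
    while True:
        described = sm_client.describe_pipeline_execution(PipelineExecutionArn=arn)
        status = described["PipelineExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)  # still Executing or Stopping
```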
{
@@ -178,7 +178,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Once this has completed, download the batch scoring results"
"Once this has completed, download the batch scoring results and see the new `fare_amount_prediction` column."
]
},
{
@@ -342,7 +342,7 @@
"source": [
"## Monitor\n",
"\n",
"Let's let the files produced by the Model Monitor job"
"Let's download the files produced by the Model Monitor job"
]
},
{
@@ -366,7 +366,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"If the job has produced any `constraint_violations.json` let's output this"
"If the job has output a `constraint_violations.json` file, let's load it and print the violations."
]
},
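A minimal sketch of loading the Model Monitor report, assuming it has already been downloaded locally. It relies only on the report's top-level `violations` list, where each entry carries `feature_name`, `constraint_check_type` and `description`; the function names and the local path are hypothetical.

```python
import json


def parse_violations(text: str) -> list:
    """Return the 'violations' list from the text of a constraint_violations.json report."""
    return json.loads(text).get("violations", [])


def load_violations(path: str) -> list:
    """Read a downloaded report from a (hypothetical) local path."""
    with open(path) as f:
        return parse_violations(f.read())


def format_violation(violation: dict) -> str:
    # Each violation names the feature, the failed check type and a readable description.
    return (
        f"{violation.get('feature_name')}: "
        f"{violation.get('constraint_check_type')} ({violation.get('description')})"
    )
```

For example, `for v in load_violations("constraint_violations.json"): print(format_violation(v))` prints one line per violated constraint.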
{
@@ -396,7 +396,7 @@
"\n",
"The `EvaluateDrift` Lambda will read the contents of `constraint_violations.json` and will publish Amazon [CloudWatch Metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-interpreting-cloudwatch.html). \n",
"\n",
"If drift is detected above threshold of `0.5` for the target metric then a Amazon CloudWatch metric will Alarm resulting in the SageMaker pipeline to be re-trained.\n",
"If a drift metric exceeds the threshold defined in `prod-config.json` for the batch pipeline, an Amazon CloudWatch alarm is raised, which triggers the SageMaker pipeline to retrain the model.\n",
"\n",
"To see the CloudWatch metric Alarm click on the link below."
]
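The alarm behaviour described above can be sketched as the keyword arguments for CloudWatch's `put_metric_alarm`, using the `GreaterThanThreshold` comparison so the alarm fires only when the metric is strictly above the threshold. This is a minimal sketch: the function names, namespace, metric name, statistic and one-hour period are hypothetical, and the project's actual alarm is defined by its CloudFormation templates and `prod-config.json`.

```python
def drift_alarm_request(alarm_name, namespace, metric_name, threshold, dimensions=None):
    """Build put_metric_alarm keyword arguments for a 'drift above threshold' alarm."""
    return {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": dimensions or [],
        "Statistic": "Maximum",  # react to the worst value in each period
        "Period": 3600,  # seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data published means no alarm
    }


def breaches(value, threshold):
    # Mirrors GreaterThanThreshold: a value equal to the threshold does not alarm.
    return value > threshold
```

With boto3 this would be applied as `boto3.client("cloudwatch").put_metric_alarm(**drift_alarm_request(...))`.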
@@ -527,7 +527,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can return to the [build-pipeline](build-pipeeline.ipynb) notebook to complete the cleanup."
"You can return to the [build-pipeline](build-pipeline.ipynb) notebook to complete the cleanup."
]
}
],
2 changes: 1 addition & 1 deletion build_pipeline/build-pipeline.ipynb
@@ -234,7 +234,7 @@
"source": [
"### Approve Model\n",
"\n",
"🛑 Once we are happy with this training job, we can [Update the Approval Status](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-approve.html) of a model."
"ℹ️ Once we are happy with this training job, we can [Update the Approval Status](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-approve.html) of a model."
]
},
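Updating the approval status can also be done programmatically. A minimal sketch with the boto3 SageMaker client's `update_model_package` call; the function name and the model package ARN you pass in are hypothetical, and setting the status to Approved is what lets the deployment pipeline pick the version up.

```python
def approve_model_package(sm_client, model_package_arn: str, description: str = "") -> dict:
    """Mark a registered model version as Approved so downstream deployment can pick it up."""
    kwargs = {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": "Approved",
    }
    if description:
        # Optional free-text note recorded alongside the approval.
        kwargs["ApprovalDescription"] = description
    return sm_client.update_model_package(**kwargs)
```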
{
16 changes: 10 additions & 6 deletions build_pipeline/deployment-pipeline.ipynb
@@ -78,8 +78,6 @@
"source": [
"## Data Prep\n",
"\n",
"The staging SageMaker endpoint is created by AWS CloudFormation in the `Batch_CFN_Staging` stage of the above pipeline.\n",
"\n",
"Download the test dataset output from the pre-processing job in our build pipeline, which we will use for input to batch scoring."
]
},
@@ -133,7 +131,9 @@
"source": [
"## Test Staging\n",
"\n",
"Run the next cell to wait for the staging endpoint to be in service."
"The staging SageMaker endpoint is created by AWS CloudFormation in the `Batch_CFN_Staging` stage of the AWS CodePipeline.\n",
"\n",
"Once it's created, run the next cell to wait for the staging endpoint to be in service."
]
},
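A minimal sketch of waiting for the staging endpoint with the boto3 SageMaker `endpoint_in_service` waiter, then reading back the endpoint status. The function name and endpoint name are hypothetical, and the delay and attempt count are arbitrary values passed through `WaiterConfig`.

```python
def wait_for_endpoint(sm_client, endpoint_name: str, delay: int = 30, max_attempts: int = 120) -> str:
    """Block until the endpoint is InService (the waiter raises if it fails or times out)."""
    waiter = sm_client.get_waiter("endpoint_in_service")
    waiter.wait(
        EndpointName=endpoint_name,
        WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
    )
    # Confirm the final status reported by the service.
    return sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
```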
{
@@ -309,7 +309,9 @@
" lambda x: 70 * random.betavariate(2.5, 2)\n",
")\n",
"\n",
"tweaked_rows = test_df.drop(0, axis=1).to_csv(header=False, index=False).split(\"\\n\")"
"tweaked_rows = (\n",
" test_df.drop(\"fare_amount\", axis=1).to_csv(header=False, index=False).split(\"\\n\")\n",
")"
]
},
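The drift simulation in the cell above can be sketched with the standard library only: replace one feature with values drawn from `70 * Beta(2.5, 2)` and drop the `fare_amount` target before serializing header-less CSV rows for the endpoint. The function name and the `column` argument are hypothetical (the column the notebook tweaks is not shown in this diff). `splitlines()` is used instead of `split("\n")`, which can leave a trailing empty string when the CSV text ends with a newline.

```python
import csv
import io
import random


def tweak_rows(rows, column, target="fare_amount", seed=None):
    """Return header-less CSV lines with `column` replaced by skewed values and `target` removed.

    `rows` is a list of dicts that all share the same keys in the same order.
    """
    rng = random.Random(seed)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        tweaked = dict(row)
        # Same distribution as the notebook cell: 70 * Beta(2.5, 2), so values fall in [0, 70].
        tweaked[column] = 70 * rng.betavariate(2.5, 2)
        tweaked.pop(target, None)  # the endpoint receives features only, not the label
        writer.writerow(tweaked.values())
    # splitlines() drops the final newline without producing an empty trailing row.
    return out.getvalue().splitlines()
```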
{
@@ -415,7 +417,9 @@
"source": [
"## Retrain\n",
"\n",
"When the model monitoring schedule runs it will publish Amazon [CloudWatch Metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-interpreting-cloudwatch.html). If drift is detected about a pre-configured threshold then an Amazon CloudWatch metric will Alarm resulting in the SageMaker pipeline to be re-trained.\n",
"When the model monitoring schedule runs it will publish Amazon [CloudWatch Metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-interpreting-cloudwatch.html). \n",
"\n",
"If a drift metric exceeds the threshold defined in `prod-config.json` for the deployment pipeline, an Amazon CloudWatch alarm is raised, which triggers the SageMaker pipeline to retrain the model.\n",
"\n",
"You can simulate drift by putting a metric value above the threshold of `0.5` directly into CloudWatch."
]
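A minimal sketch of simulating drift by publishing a single data point with CloudWatch's `put_metric_data`, assuming an alarm on the same metric that uses `GreaterThanThreshold` with the `0.5` threshold. The function names, namespace, metric name and dimensions are hypothetical; use the ones your project's alarm actually watches.

```python
from datetime import datetime, timezone


def simulated_drift_datum(metric_name, value, dimensions=None):
    """Build one MetricData entry for cloudwatch.put_metric_data."""
    return {
        "MetricName": metric_name,
        "Dimensions": dimensions or [],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": "None",
    }


def put_simulated_drift(cw_client, namespace, metric_name, value=0.6, dimensions=None):
    # A value above the 0.5 threshold should move the assumed alarm into ALARM once evaluated.
    datum = simulated_drift_datum(metric_name, value, dimensions)
    cw_client.put_metric_data(Namespace=namespace, MetricData=[datum])
    return datum
```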
@@ -616,7 +620,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can return to the [build-pipeline](build-pipeeline.ipynb) notebook to complete the cleanup."
"You can return to the [build-pipeline](build-pipeline.ipynb) notebook to complete the cleanup."
]
}
],
Binary file removed docs/drift-execution-role.png
Binary file not shown.
Binary file modified docs/drift-select-template.png
