Updates to notebooks and readme

aws-samples · Sep 16, 2021 · f4785c8
1 parent 12d3d50

Showing 7 changed files with 32 additions and 28 deletions.
diff --git a/BUILD.md b/BUILD.md
@@ -1,4 +1,4 @@
-# Amazon SageMaker Drift Detection Pipeline
+# Amazon SageMaker Drift Detection
 
 This page has details on how to build a custom SageMaker MLOps template from source.
 

diff --git a/README.md b/README.md
@@ -1,4 +1,4 @@
-# Amazon SageMaker Drift Detection Pipeline
+# Amazon SageMaker Drift Detection
 
 This sample demonstrates how to set up an Amazon SageMaker MLOps deployment pipeline for drift detection.
 
@@ -26,9 +26,9 @@ Follow are the list of the parameters.
 | PortfolioOwner     | The owner of the portfolio                     |
 | ProductVersion     | The product version to deploy                  |
 
-You can copy the the required `ExecutionRoleArn` role from the Studio dashboard.
+You can copy the required `ExecutionRoleArn` from your **User Details** in the SageMaker Studio dashboard.
 
-![Execution Role](docs/drift-execution-role.png)
+![Execution Role](docs/studio-execution-role.png)
 
 Alternatively see [BUILD.md](BUILD.md) for instructions on how to build the MLOps template from source.
 
@@ -39,21 +39,21 @@ Once your MLOps project template is registered in **AWS Service Catalog** you ca
 1. Switch back to the Launcher
 2. Click **New Project** from the **ML tasks and components** section.
 
-On the Create project page, SageMaker templates is chosen by default. This option lists the built-in templates. However, you want to use the template you published for the Amazon SageMaker Drift Detection Pipeline.
+On the **Create project** page, **SageMaker templates** is chosen by default. This option lists the built-in templates; instead, you want the template you published for Amazon SageMaker drift detection.
 
-6. Choose **Organization templates**.
-7. Choose **Amazon SageMaker Drift Detection Pipeline**.
-8. Choose **Select project template**.
+3. Choose **Organization templates**.
+4. Choose **Amazon SageMaker drift detection template for real-time deployment**.
+5. Choose **Select project template**.
 
 ![Select Template](docs/drift-select-template.png)
 
 `NOTE`: If you have recently updated your AWS Service Catalog Project, you may need to refresh SageMaker Studio to ensure it picks up the latest version of your template.
 
-9. In the **Project details** section, for **Name**, enter **drift-pipeline**.
+6. In the **Project details** section, for **Name**, enter **drift-pipeline**.
   - The project name must have 32 characters or fewer.
-10. In the Project template parameters
-  - For **RetrainSchedule**, input a validate [Cron Schedule](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-schedule-expression.html) which defaults to `cron(0 12 1 * ? *)` - the first day of every month.
-11. Choose **Create project**.
+7. In the **Project template parameters** section, for **RetrainSchedule**, enter a valid [cron schedule](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-schedule-expression.html).
+  - The default is `cron(0 12 1 * ? *)`, which runs at 12:00 on the first day of every month.
+8. Choose **Create project**.
 
 ![Create Project](docs/drift-create-project.png)
 
@@ -69,7 +69,7 @@ The MLOps Drift Detection template will create the following AWS services and re
   -  The first repository provides code to create a multi-step model building pipeline using [AWS CloudFormation](https://aws.amazon.com/cloudformation/).  The pipeline includes the following steps: data processing, model baseline, model training, model evaluation, and conditional model registration based on accuracy. The pipeline trains a regression model using the XGBoost algorithm on trip data from the [NYC Taxi Dataset](https://registry.opendata.aws/nyc-tlc-trip-records-pds/). This repository also includes the [drift-detection.ipynb](build_pipeline/drift-detection.ipynb) notebook to [Run the Pipeline](#run-the-pipeline) (see below).
   - The second repository contains code and configuration files for model deployment and monitoring. This repo also uses [AWS CodePipeline](https://aws.amazon.com/codepipeline/) and [CodeBuild](https://aws.amazon.com/codebuild/), which run an [AWS CloudFormation](https://aws.amazon.com/cloudformation/) template to create model endpoints for staging and production.  This repository includes the [prod-config.json](deployment_pipeline/prod-config.json) configuration file, which sets the metrics and thresholds for drift detection.
 
-3. Two CodePipeline pipelines:
+3. Two AWS CodePipeline pipelines:
   - The [model build pipeline](build_pipeline) creates or updates the pipeline definition and then starts a new execution with a custom [AWS Lambda](https://aws.amazon.com/lambda/) function whenever a new commit is made to the ModelBuild CodeCommit repository. The first time the CodePipeline is started, it will fail to complete because it expects input data to be uploaded to the Amazon S3 artifact bucket.
   - The [deployment pipeline](deployment_pipeline/README.md) automatically triggers whenever a new model version is added to the model registry and the status is marked as Approved. Models that are registered with Pending or Rejected statuses aren’t deployed.
 

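The `RetrainSchedule` parameter in the README changes above takes a six-field AWS cron expression (minutes, hours, day-of-month, month, day-of-week, year). A minimal sketch for sanity-checking a value before creating the project; `parse_aws_cron` is a hypothetical helper, not part of the template, and it only checks the overall shape plus the rule that one of day-of-month or day-of-week must be `?`, not the full value syntax of each field.

```python
# Field order of a six-field AWS cron expression such as the RetrainSchedule default.
CRON_FIELDS = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def parse_aws_cron(expression: str) -> dict:
    """Split an expression like 'cron(0 12 1 * ? *)' into named fields.

    Hypothetical helper: checks the cron(...) wrapper, the field count and the
    '?' rule only; it does not validate the allowed values of each field.
    """
    expression = expression.strip()
    if not (expression.startswith("cron(") and expression.endswith(")")):
        raise ValueError(f"not a cron(...) expression: {expression!r}")
    parts = expression[len("cron("):-1].split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    fields = dict(zip(CRON_FIELDS, parts))
    # AWS cron expressions need '?' in either day-of-month or day-of-week.
    if "?" not in (fields["day_of_month"], fields["day_of_week"]):
        raise ValueError("one of day_of_month or day_of_week must be '?'")
    return fields


if __name__ == "__main__":
    print(parse_aws_cron("cron(0 12 1 * ? *)"))
```

For the default value this yields minute `0`, hour `12`, day-of-month `1`, and `*` for month and year, matching the "first day of every month" description.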
diff --git a/build_pipeline/batch-pipeline.ipynb b/build_pipeline/batch-pipeline.ipynb
@@ -78,8 +78,6 @@
    "source": [
     "## Data Prep\n",
     "\n",
-    "A staging SageMaker Pipeline is created by AWS CloudFormation in the `Batch_CFN_Staging` stage of the above pipeline.\n",
-    "\n",
     "Download the test dataset output from the pre-processing job in our build pipeline, which we will use for input to batch scoring."
    ]
   },
@@ -150,7 +148,9 @@
    "source": [
     "## Test Staging\n",
     "\n",
-    "Now let's start the batch staging pipeline"
+    "A staging SageMaker Pipeline is created by AWS CloudFormation in the `Batch_CFN_Staging` stage of the above AWS CodePipeline.\n",
+    "\n",
+    "Once it is created, run the next cell to start the pipeline."
    ]
   },
   {
@@ -178,7 +178,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Once this has completed, download the batch scoring results"
+    "Once this has completed, download the batch scoring results and see the new `fare_amount_prediction` column."
    ]
   },
   {
@@ -342,7 +342,7 @@
    "source": [
     "## Monitor\n",
     "\n",
-    "Let's let the files produced by the Model Monitor job"
+    "Let's download the files produced by the Model Monitor job."
    ]
   },
   {
@@ -366,7 +366,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "If the job has produced any `constraint_violations.json` let's output this"
+    "If the job has output a `constraint_violations.json` file, let's load it and print the violations."
    ]
   },
   {
@@ -396,7 +396,7 @@
     "\n",
     "The `EvaluateDrift` Lambda will read the contents of `constraint_violations.json` and will publish Amazon [CloudWatch Metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-interpreting-cloudwatch.html).  \n",
     "\n",
-    "If drift is detected above threshold of `0.5` for the target metric then a Amazon CloudWatch metric will Alarm resulting in the SageMaker pipeline to be re-trained.\n",
+    "If drift for a metric exceeds the threshold defined in `prod-config.json` in the batch pipeline, an Amazon CloudWatch alarm is triggered, which starts the SageMaker pipeline to retrain the model.\n",
     "\n",
     "To see the CloudWatch metric Alarm click on the link below."
    ]
@@ -527,7 +527,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "You can return to the [build-pipeline](build-pipeeline.ipynb) notebook to complete the cleanup."
+    "You can return to the [build-pipeline](build-pipeline.ipynb) notebook to complete the cleanup."
    ]
   }
  ],
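The batch notebook loads `constraint_violations.json` when the Model Monitor job produces one. A minimal sketch of that step, assuming the file has already been downloaded to a local path and follows the Model Monitor report layout (a top-level `violations` list whose entries carry `feature_name`, `constraint_check_type` and `description`); `load_violations` and `summarize` are hypothetical helpers, not code from the notebook.

```python
import json
from pathlib import Path


def load_violations(path) -> list:
    """Return the violations list from a downloaded constraint_violations.json.

    Returns an empty list if the file does not exist, i.e. the monitoring job
    did not output a violations report.
    """
    path = Path(path)
    if not path.exists():
        return []
    with path.open() as f:
        report = json.load(f)
    return report.get("violations", [])


def summarize(violations) -> list:
    # One readable line per violation: feature, check type and description.
    return [
        f"{v.get('feature_name')}: {v.get('constraint_check_type')} - {v.get('description', '')}"
        for v in violations
    ]
```

Returning an empty list for a missing file keeps the "no violations" case on the same code path as a report that lists none.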

diff --git a/build_pipeline/build-pipeline.ipynb b/build_pipeline/build-pipeline.ipynb
@@ -234,7 +234,7 @@
    "source": [
     "### Approve Model\n",
     "\n",
-    "🛑 Once we are happy with this training job, we can [Update the Approval Status](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-approve.html) of a model."
+    "ℹ️ Once we are happy with this training job, we can [Update the Approval Status](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-approve.html) of a model."
    ]
   },
   {

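The build notebook links to [Update the Approval Status](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-approve.html) for approving a model, and the README notes that marking a model version as Approved is what triggers the deployment pipeline. A sketch using the SageMaker client's `list_model_packages` and `update_model_package` calls from boto3; the helper name and the model package group name are assumptions, and the client is passed in so the function can be tried without AWS credentials.

```python
def approve_latest_pending(sm_client, model_package_group_name, description="Approved from notebook"):
    """Approve the most recently created pending model package in a group.

    `sm_client` is expected to be a SageMaker client, e.g. boto3.client("sagemaker").
    Returns the approved ARN, or None if no package is pending approval.
    """
    response = sm_client.list_model_packages(
        ModelPackageGroupName=model_package_group_name,
        ModelApprovalStatus="PendingManualApproval",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response.get("ModelPackageSummaryList", [])
    if not summaries:
        return None
    arn = summaries[0]["ModelPackageArn"]
    # Setting the status to Approved is the event the deployment pipeline reacts to.
    sm_client.update_model_package(
        ModelPackageArn=arn,
        ModelApprovalStatus="Approved",
        ApprovalDescription=description,
    )
    return arn
```

For example, `approve_latest_pending(boto3.client("sagemaker"), "<model-package-group-name>")`, where the group name is whichever group your project registers models under.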
diff --git a/build_pipeline/deployment-pipeline.ipynb b/build_pipeline/deployment-pipeline.ipynb
@@ -78,8 +78,6 @@
    "source": [
     "## Data Prep\n",
     "\n",
-    "The staging SageMaker endpoint is created by AWS CloudFormation in the `Batch_CFN_Staging` stage of the above pipeline.\n",
-    "\n",
     "Download the test dataset output from the pre-processing job in our build pipeline, which we will use for input to batch scoring."
    ]
   },
@@ -133,7 +131,9 @@
    "source": [
     "## Test Staging\n",
     "\n",
-    "Run the next cell to wait for the staging endpoint to be in service."
+    "The staging SageMaker endpoint is created by AWS CloudFormation in the `Batch_CFN_Staging` stage of the AWS CodePipeline.\n",
+    "\n",
+    "Once it is created, run the next cell to wait for the staging endpoint to be in service."
    ]
   },
   {
@@ -309,7 +309,9 @@
     "    lambda x: 70 * random.betavariate(2.5, 2)\n",
     ")\n",
     "\n",
-    "tweaked_rows = test_df.drop(0, axis=1).to_csv(header=False, index=False).split(\"\\n\")"
+    "tweaked_rows = (\n",
+    "    test_df.drop(\"fare_amount\", axis=1).to_csv(header=False, index=False).split(\"\\n\")\n",
+    ")"
    ]
   },
   {
@@ -415,7 +417,9 @@
    "source": [
     "## Retrain\n",
     "\n",
-    "When the model monitoring schedule runs it will publish Amazon [CloudWatch Metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-interpreting-cloudwatch.html).  If drift is detected about a pre-configured threshold then an Amazon CloudWatch metric will Alarm resulting in the SageMaker pipeline to be re-trained.\n",
+    "When the model monitoring schedule runs it will publish Amazon [CloudWatch Metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-interpreting-cloudwatch.html).  \n",
+    "\n",
+    "If drift for a metric exceeds the threshold defined in `prod-config.json` in the deployment pipeline, an Amazon CloudWatch alarm is triggered, which starts the SageMaker pipeline to retrain the model.\n",
     "\n",
     "You can simulate drift by putting a metric value above the threshold of `0.5` directly into CloudWatch."
    ]
@@ -616,7 +620,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "You can return to the [build-pipeline](build-pipeeline.ipynb) notebook to complete the cleanup."
+    "You can return to the [build-pipeline](build-pipeline.ipynb) notebook to complete the cleanup."
    ]
   }
  ],
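This commit changes `test_df.drop(0, axis=1)` to `test_df.drop("fare_amount", axis=1)`. `DataFrame.drop` with `axis=1` removes columns by label, so `0` only matches a column literally labelled `0` (for example when the CSV is read with `header=None`); with named columns the target has to be dropped by name. The sketch below wraps that step in a hypothetical `to_payload_rows` helper, assuming `fare_amount` is the target column. It uses `str.splitlines()` instead of `split("\n")`, which leaves a trailing empty string because `to_csv` ends its output with a line terminator.

```python
import pandas as pd


def to_payload_rows(df: pd.DataFrame, target: str = "fare_amount") -> list:
    """Drop the target column and return one CSV line (no header/index) per row."""
    csv_text = df.drop(target, axis=1).to_csv(header=False, index=False)
    # splitlines() drops the final line terminator, so no empty trailing row
    # is produced (split("\n") would leave one).
    return csv_text.splitlines()
```

Each element can then be sent to the endpoint as a single `text/csv` record.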

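The Retrain section suggests simulating drift by putting a metric value above the threshold of `0.5` directly into CloudWatch. A sketch of that call with the CloudWatch client's `put_metric_data` from boto3; the helper name, namespace, metric name and dimensions are placeholders (assumptions), and they must match exactly what the drift alarm watches for the alarm to change state.

```python
def put_drift_metric(cw_client, namespace, metric_name, value, dimensions=None):
    """Publish one data point so the drift alarm on this metric can be tested.

    `cw_client` is expected to be a CloudWatch client, e.g. boto3.client("cloudwatch").
    `namespace`, `metric_name` and `dimensions` are placeholders: use the values
    configured on the project's drift alarm.
    """
    metric = {"MetricName": metric_name, "Value": float(value), "Unit": "None"}
    if dimensions:
        metric["Dimensions"] = [{"Name": k, "Value": v} for k, v in dimensions.items()]
    cw_client.put_metric_data(Namespace=namespace, MetricData=[metric])
    return metric
```

For example, `put_drift_metric(boto3.client("cloudwatch"), "<namespace>", "<metric-name>", 0.6, {"Endpoint": "<endpoint-name>"})`. The alarm evaluates over its configured period, so it may take a few minutes to go into the `ALARM` state.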
diff --git a/docs/drift-execution-role.png b/docs/drift-execution-role.png
diff --git a/docs/drift-select-template.png b/docs/drift-select-template.png