Update to batch and deployment README
brightsparc committed Sep 16, 2021
1 parent f046a85 commit a79e232
Showing 3 changed files with 25 additions and 8 deletions.
15 changes: 12 additions & 3 deletions batch_pipeline/README.md
@@ -9,11 +9,20 @@ The model build pipeline contains three stages:
1. Source: This stage pulls the latest code from the **AWS CodeCommit** repository.
2. Build: The **AWS CodeBuild** action creates an Amazon SageMaker Pipeline definition and stores this definition as JSON on S3. See the pipeline definition in `pipelines/pipeline.py` in the CodeCommit repository. The build also creates an **AWS CloudFormation** template using the AWS CDK; see the respective CDK app `app.py`.
3. BatchStaging: This stage executes the staging CloudFormation template to create/update a **SageMaker Pipeline** based on the latest approved model. The pipeline includes a manual approval gate, which triggers the deployment of the model to production.
4. BatchProd: This stage creates or updates a **SageMaker Pipeline** which includes a **SageMaker Model Monitor** job that will output `constraint_violations.json` when drift is detected. A [CloudWatch Event](https://docs.aws.amazon.com/codepipeline/latest/userguide/create-cloudtrail-S3-source-cfn.html) rule is set up to trigger re-training when this file is output to S3.
4. BatchProd: This stage creates or updates a **SageMaker Pipeline** that includes a **SageMaker Model Monitor** job and an **Evaluate Drift Lambda**. These emit [CloudWatch Metrics](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-interpreting-cloudwatch.html) (see below), which trigger a **CloudWatch Alarm** when drift is detected against the previously queried data quality baseline.

![Batch Pipeline](../docs/drift-batch-pipeline.png)

The batch transform pipeline will be triggered when a new file is uploaded to S3 or on a regular schedule.
### Metrics Published

CloudWatch Metrics are emitted with the following attributes:
* Namespace `aws/sagemaker/ModelBuildingPipeline/data-metrics`
* MetricName `feature_baseline_drift_<<feature_name>>`
* MetricValue `distance` from the baseline
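
As an illustration of the metric shape listed above, here is a minimal Python sketch that builds one datum per feature. The function name `drift_metric_data` and the sample feature names are hypothetical, and the repository's Evaluate Drift Lambda may assemble its metrics differently; only the namespace and the `feature_baseline_drift_` prefix come from this README.

```python
from typing import Any, Dict, List

# Namespace taken from the README; everything else here is illustrative.
NAMESPACE = "aws/sagemaker/ModelBuildingPipeline/data-metrics"


def drift_metric_data(distances: Dict[str, float]) -> List[Dict[str, Any]]:
    """Build one CloudWatch metric datum per feature (hypothetical helper).

    `distances` maps a feature name to its distance from the baseline.
    """
    return [
        {
            "MetricName": f"feature_baseline_drift_{name}",
            "Value": float(distance),
            "Unit": "None",
        }
        for name, distance in sorted(distances.items())
    ]


if __name__ == "__main__":
    data = drift_metric_data({"age": 0.12, "income": 0.4})
    for datum in data:
        print(datum["MetricName"], datum["Value"])
    # With boto3 installed and AWS credentials configured, the data could be
    # published with:
    #   boto3.client("cloudwatch").put_metric_data(
    #       Namespace=NAMESPACE, MetricData=data)
```

The payload is built separately from the publishing call so that it can be inspected or unit tested without AWS access.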

### Starting the Batch Pipeline

The batch pipeline outlined above will be started when code is committed to the **AWS CodeCommit** repository or when a model is approved in the **SageMaker Model Registry**.

## Testing

@@ -31,6 +40,6 @@ export SAGEMAKER_PROJECT_ID="<<project_id>>"
export AWS_REGION="<<region>>"
export ARTIFACT_BUCKET="sagemaker-project-<<project_id>>-build-<<region>>"
export SAGEMAKER_PIPELINE_ROLE_ARN="<<service_catalog_product_use_role>>"
export EVENT_ROLE_ARN="<<service_catalog_product_use_role>>"
export EVALUATE_DRIFT_FUNCTION_ARN="sagemaker-<<project_name>>-evaluate-drift"
cdk synth
```
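
Before running `cdk synth`, it can help to confirm that the variables are actually set. The following Python sketch is one possible pre-flight check; the helper names `missing_vars` and `expected_artifact_bucket` are hypothetical, and the list covers only the variables visible in the snippet above (the full README may require more).

```python
import os
from typing import List, Mapping

# Variables visible in the snippet above; the full README may list others.
REQUIRED_VARS = [
    "SAGEMAKER_PROJECT_ID",
    "AWS_REGION",
    "ARTIFACT_BUCKET",
    "SAGEMAKER_PIPELINE_ROLE_ARN",
    "EVALUATE_DRIFT_FUNCTION_ARN",
]


def missing_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def expected_artifact_bucket(project_id: str, region: str) -> str:
    """Bucket name following the pattern shown in the export above."""
    return f"sagemaker-project-{project_id}-build-{region}"


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required variables are set; ready for `cdk synth`.")
```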
2 changes: 2 additions & 0 deletions build_pipeline/README.md
@@ -3,6 +3,8 @@

This folder contains the code to create a model build pipeline that includes a SageMaker Baseline and Training Job.

To exercise this pipeline, open the [build-pipeline.ipynb](build-pipeline.ipynb) notebook.

## Build Pipeline

The model build pipeline contains three stages:
16 changes: 11 additions & 5 deletions deployment_pipeline/README.md
@@ -9,13 +9,20 @@ The deployment pipeline contains four stages:
1. Source: This stage pulls the latest code from the **AWS CodeCommit** repository.
2. Build: The **AWS CodeBuild** action runs the AWS CDK app that queries the **SageMaker Model Registry** for the latest approved model and the respective **SageMaker Pipeline** execution for the [Data Quality Baseline](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-create-baseline.html). Using `staging-config.json` and `prod-config.json`, the CDK app creates two **AWS CloudFormation** templates for the staging and production deployments respectively. Have a look at the CDK app `deployment_pipeline/app.py`.
3. DeployStaging Pipeline: This pipeline executes the staging CloudFormation template to create/update a **SageMaker Endpoint** based on the latest approved model. The pipeline includes a manual approval gate, which triggers the deployment of the model to production.
4. DeployProd Pipeline: This deployment creates or updates a **SageMaker Endpoint** with [Data Capture](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-data-capture.html) enabled, and also creates a [Model Monitoring Schedule](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-scheduling.html) and optionally a **CloudWatch Alarm** for drift detection against the previously queried data quality baseline.
4. DeployProd Pipeline: This deployment creates or updates a **SageMaker Endpoint** with [Data Capture](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-data-capture.html) enabled. It also creates a [Model Monitoring Schedule](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-scheduling.html), which outputs **CloudWatch Metrics** (see below), and a **CloudWatch Alarm** for drift detection against the previously queried data quality baseline.

![Deploy Pipeline](../docs/drift-deploy-pipeline.png)

### Triggering the Deployment Pipeline
### Metrics Published

The deployment pipeline outlined above will be triggered when code is committed to the **AWS CodeCommit** repository or when a model is approved in the **SageMaker Model Registry**. See below the CloudWatch event and the EventBridge rule used for triggering the deployment pipeline for the latter.
CloudWatch Metrics are emitted with the following attributes:
* Namespace `aws/sagemaker/Endpoints/data-metrics`
* MetricName `feature_baseline_drift_<<feature_name>>`
* MetricValue `distance` from the baseline
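
To show how an alarm could be attached to one of these metrics, here is a hedged Python sketch that assembles keyword arguments for CloudWatch's `put_metric_alarm`. The helper name `drift_alarm_kwargs`, the alarm name, the statistic, the period, and the threshold are all assumptions for illustration; the CDK app in this repository may configure its alarm differently. The dimension names and values are left to the caller because they depend on the endpoint and monitoring schedule in use.

```python
from typing import Any, Dict, Mapping

# Namespace taken from the README; other values below are illustrative.
ENDPOINT_NAMESPACE = "aws/sagemaker/Endpoints/data-metrics"


def drift_alarm_kwargs(
    feature_name: str,
    threshold: float,
    dimensions: Mapping[str, str],
    period_seconds: int = 3600,
) -> Dict[str, Any]:
    """Build arguments for a drift alarm on one feature (hypothetical helper)."""
    metric_name = f"feature_baseline_drift_{feature_name}"
    return {
        "AlarmName": f"{metric_name}-alarm",  # assumed naming scheme
        "Namespace": ENDPOINT_NAMESPACE,
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": key, "Value": value}
            for key, value in sorted(dimensions.items())
        ],
        "Statistic": "Average",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        # Alarm when the distance from the baseline exceeds the threshold.
        "ComparisonOperator": "GreaterThanThreshold",
    }


if __name__ == "__main__":
    kwargs = drift_alarm_kwargs("age", 0.2, {"Endpoint": "my-endpoint"})
    print(kwargs["AlarmName"], kwargs["Threshold"])
    # With boto3 installed and AWS credentials configured:
    #   boto3.client("cloudwatch").put_metric_alarm(**kwargs)
```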

### Starting the Deployment Pipeline

The deployment pipeline outlined above will be started when code is committed to the **AWS CodeCommit** repository or when a model is approved in the **SageMaker Model Registry** (see the CloudWatch event below).

```
{
@@ -27,7 +34,7 @@ The deployment pipeline outlined above will be triggered when code is committed
"time": "2021-06-03T04:45:23Z",
"region": "<<region>>",
"resources": [
"arn:aws:sagemaker:<<region>>:<<account>>:model-package/<<project_name>>/26"
"arn:aws:sagemaker:<<region>>:<<account>>:model-package/<<project_name>>/<<version>>"
],
"detail": {
"ModelPackageName": "<<project_name>>/<<version>>",
@@ -51,7 +58,6 @@ The deployment pipeline outlined above will be triggered when code is committed
```
![EventBridge Model Registry Rule](../docs/deploy-model-approved-rule.png)
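
The matching logic of such a rule can be sketched in Python as follows. This is an approximation for illustration, not the rule defined in the repository. The `source`, `detail-type`, and `ModelApprovalStatus` fields are elided in the diff above and are assumed here from the SageMaker model package state change event format; the function name `is_model_approved` is hypothetical.

```python
from typing import Any, Mapping


def is_model_approved(event: Mapping[str, Any], project_name: str) -> bool:
    """Return True if the event reports an approved model for the project.

    Assumes ModelPackageName has the form "<project_name>/<version>",
    as in the sample event above.
    """
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.sagemaker"
        and event.get("detail-type") == "SageMaker Model Package State Change"
        and str(detail.get("ModelPackageName", "")).startswith(project_name + "/")
        and detail.get("ModelApprovalStatus") == "Approved"
    )


if __name__ == "__main__":
    sample = {
        "source": "aws.sagemaker",
        "detail-type": "SageMaker Model Package State Change",
        "detail": {
            "ModelPackageName": "my-project/3",
            "ModelApprovalStatus": "Approved",
        },
    }
    print(is_model_approved(sample, "my-project"))
```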


## Testing

Once you have created a SageMaker Project, you can test the **Build** stage locally by setting some environment variables.
