Initial commit
jmeisele committed Mar 31, 2021
0 parents commit 60d5c52
Showing 149 changed files with 8,470 additions and 0 deletions.
9 changes: 9 additions & 0 deletions .env
@@ -0,0 +1,9 @@
COMPOSE_PROJECT_NAME=ml_ops
FEAST_VERSION=develop
FEAST_CORE_CONFIG=./feast/core/core.yml
FEAST_ONLINE_SERVING_CONFIG=./feast/serving/online-serving.yml
GCP_SERVICE_ACCOUNT=./feast/gcp-service-accounts/placeholder.json
INGESTION_JAR_PATH=https://storage.googleapis.com/feast-jobs/spark/ingestion/feast-ingestion-spark-develop.jar
MLFLOW_S3_ENDPOINT_URL=http://minio:9000
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin
129 changes: 129 additions & 0 deletions .gitignore
@@ -0,0 +1,129 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
# .env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
139 changes: 139 additions & 0 deletions README.md
@@ -0,0 +1,139 @@
# MLOps
Cloud-agnostic tech stack for starting an MLOps platform (Level 1)

"We'll build a pipeline - after we deploy the model."

![Wink](docs/wink.gif)

```Model drift will hit when it's least convenient for you```


__To run__:
Make sure Docker is running and you have [Docker Compose](https://docs.docker.com/compose/install/) installed.

1. Clone the project
```bash
git clone https://github.com/jmeisele/ml-ops.git
```
2. Change directories into the repo
```bash
cd ml-ops
```
3. Run database migrations and create the first Airflow user account.
```bash
docker-compose up airflow-init
```

4. Pull the images and launch the stack with Docker Compose
```bash
docker-compose pull && docker-compose up
```
5. Open a browser and log in to [MinIO](http://localhost:9090)

user: _minioadmin_

password: _minioadmin_

Create a bucket called ```mlflow```

![MinIO](docs/minio.gif)
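
If you prefer the command line over the web console, the bucket can also be created with the MinIO client (`mc`) — a hedged sketch, assuming `mc` is installed and the MinIO S3 API is published on `localhost:9000` (check `docker-compose.yaml` for the actual port mapping):

```bash
# Assumption: MinIO's S3 API is mapped to localhost:9000 on the host
mc alias set mlops http://localhost:9000 minioadmin minioadmin
mc mb mlops/mlflow
```
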
6. Open a browser and log in to [Grafana](http://localhost:3000)

user: _admin_

password: _admin_

![Grafana](docs/grafana_login.gif)
7. Add the Prometheus data source

URL: ```http://prometheus:9090```

![Prometheus](docs/prometheus.gif)
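
If you'd rather script this step, Grafana's HTTP API can create the data source — a hedged sketch, assuming the default _admin_/_admin_ credentials and that Grafana is reachable on `localhost:3000` (the InfluxDB source from the next step can be added analogously):

```bash
# Create the Prometheus data source via Grafana's HTTP API instead of the UI
curl -X POST http://localhost:3000/api/datasources \
  -u admin:admin \
  -H "Content-Type: application/json" \
  -d '{"name": "Prometheus", "type": "prometheus", "url": "http://prometheus:9090", "access": "proxy"}'
```
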
8. Add the InfluxDB data source

URL: ```http://influxdb:8086```

Basic Auth

User: _ml-ops-admin_

Password: _ml-ops-pwd_

Database: _mlopsdemo_

![InfluxDB](docs/influxdb.gif)

9. Import the MLOps Demo Dashboard from the Grafana directory in this repo
![MLOps_Dashboard](docs/mlopsdashboard.gif)

10. Create an Alarm Notification channel

URL: ```http://bridge_server:8002/route```

![Alarm_Channel](docs/alarm_channel.gif)

11. Add the alarm channel to some panels
![Panels](docs/alarms_to_panels.gif)

12. Start the ```send_data.py``` script, which sends a POST request every 0.1 seconds (a minimal stand-in is sketched below)
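
If you just want to generate traffic without the script, here is a hedged bash equivalent — it assumes the same endpoint and payload shape as the curl example in step 16:

```bash
# Sketch: POST the example payload to the model API every 0.1 seconds.
# Assumption: nginx exposes the prediction endpoint at http://localhost/model/predict.
while true; do
  curl -s -X POST http://localhost/model/predict \
    -H "Content-Type: application/json" \
    -d '{"median_income_in_block": 8.3252, "median_house_age_in_block": 41,
         "average_rooms": 6, "average_bedrooms": 1, "population_per_block": 322,
         "average_house_occupancy": 2.55, "block_latitude": 37.88,
         "block_longitude": -122.23}' > /dev/null
  sleep 0.1
done
```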

13. Open a browser and turn on the Airflow DAG used to retrain our ML model

user: _airflow_

password: _airflow_

![Airflow](docs/airflow_login.gif)

14. Lower the alarm threshold to see the Airflow DAG pipeline get triggered

![Threshold](docs/lower_threshold.gif)

15. Check [MLFlow](http://localhost:5000) after the Airflow DAG has run to see the model artifacts stored using MinIO as the object storage layer.
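
To inspect the stored artifacts from your host, a hedged sketch using the MLflow CLI — assuming the tracking server is at `localhost:5000`, the MinIO S3 API is reachable on `localhost:9000`, and `<RUN_ID>` is replaced with a real run id from the MLflow UI:

```bash
export MLFLOW_TRACKING_URI=http://localhost:5000
export MLFLOW_S3_ENDPOINT_URL=http://localhost:9000   # assumption: host port mapping for MinIO's S3 API
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
mlflow artifacts list --run-id <RUN_ID>
```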

16. (Optional) Send a POST request to our model service API endpoint
```bash
curl -v -X POST http://localhost/model/predict \
  -H "Content-Type: application/json" \
  -d '{
        "median_income_in_block": 8.3252,
        "median_house_age_in_block": 41,
        "average_rooms": 6,
        "average_bedrooms": 1,
        "population_per_block": 322,
        "average_house_occupancy": 2.55,
        "block_latitude": 37.88,
        "block_longitude": -122.23
      }'
```
17. (Optional) If you are so bold, you can also simulate production traffic using Locust, __but__ keep in mind that you have a lot of services running on your local machine; you would never deploy a production ML API on your local machine to handle production traffic.

## Level 1 Workflow & Platform Architecture
![MLOps](docs/mlops_level1.drawio.svg)

## Model Serving Architecture
![API worker architecture](docs/ml_api_architecture.drawio.svg)

## Services
- nginx: Load Balancer
- python-model-service1: FastAPI Machine Learning API 1
- python-model-service2: FastAPI Machine Learning API 2
- postgresql: RDBMS
- rabbitmq: Message Queue
- rabbitmq workers: Workers listening to RabbitMQ
- locust: Load testing to simulate production traffic
- prometheus: Metrics scraping
- minio: Object storage
- mlflow: Machine Learning Experiment Management
- influxdb: Time Series Database
- chronograf: Admin & WebUI for InfluxDB
- grafana: Performance Monitoring
- redis: Cache
- airflow: Workflow Orchestrator
- bridge server: Receives webhooks from Grafana and translates them into Airflow REST API calls (see the sketch below)
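
You can exercise the bridge server's job by hand. The call below is a hedged sketch of roughly what it does, assuming Airflow's webserver is published on `localhost:8080` with the basic-auth API backend enabled (both defaults in the official Docker Compose setup), the _airflow_/_airflow_ account from step 13, and `<DAG_ID>` as a hypothetical placeholder for the retraining DAG's id:

```bash
# Trigger a DAG run through Airflow's stable REST API
curl -X POST "http://localhost:8080/api/v1/dags/<DAG_ID>/dagRuns" \
  -u "airflow:airflow" \
  -H "Content-Type: application/json" \
  -d '{"conf": {}}'
```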

## Gotchas

### Postgres

_Warning: scripts in /docker-entrypoint-initdb.d are only run if you start the container with a data directory that is empty; any pre-existing database will be left untouched on container startup._
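
If you change the init scripts and need them to run again, one option (destructive — it wipes all persisted data for the stack) is to remove the project's volumes so Postgres starts from an empty data directory:

```bash
# Destructive: removes the compose project's named volumes (Postgres data, MinIO objects, etc.)
docker-compose down --volumes
docker-compose up airflow-init && docker-compose up
```
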
28 changes: 28 additions & 0 deletions airflow.sh
@@ -0,0 +1,28 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

#
# Run airflow command in container
#
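#
# Example usage (assumption: the compose stack from this repo is up and the
# script is executable, e.g. `chmod +x airflow.sh`):
#   ./airflow.sh info
#   ./airflow.sh bash   # the official Airflow image's entrypoint accepts `bash` for an interactive shell
#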

PROJECT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

set -euo pipefail

export COMPOSE_FILE=${PROJECT_DIR}/docker-compose.yaml
exec docker-compose run airflow-worker "${@}"