Initial commit
jmeisele committed Mar 31, 2021
0 parents commit 60d5c52
Showing 149 changed files with 8,470 additions and 0 deletions.
9 changes: 9 additions & 0 deletions .env
@@ -0,0 +1,9 @@
COMPOSE_PROJECT_NAME=ml_ops
FEAST_VERSION=develop
FEAST_CORE_CONFIG=./feast/core/core.yml
FEAST_ONLINE_SERVING_CONFIG=./feast/serving/online-serving.yml
GCP_SERVICE_ACCOUNT=./feast/gcp-service-accounts/placeholder.json
INGESTION_JAR_PATH=https://storage.googleapis.com/feast-jobs/spark/ingestion/feast-ingestion-spark-develop.jar
MLFLOW_S3_ENDPOINT_URL=http://minio:9000
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin
129 changes: 129 additions & 0 deletions .gitignore
@@ -0,0 +1,129 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
# .env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
139 changes: 139 additions & 0 deletions README.md
@@ -0,0 +1,139 @@
# MLOps
Cloud-agnostic tech stack for starting an MLOps platform (Level 1)

"We'll build a pipeline - after we deploy the model."

![Wink](docs/wink.gif)

```Model drift will hit when it's least convenient for you```


__To run__:
Make sure Docker is running and you have [Docker Compose](https://docs.docker.com/compose/install/) installed.

1. Clone the project
```bash
git clone https://github.com/jmeisele/ml-ops.git
```
2. Change directories into the repo
```bash
cd ml-ops
```
3. Run database migrations and create the first Airflow user account.
```bash
docker-compose up airflow-init
```

4. Pull the images and launch the stack with Docker Compose
```bash
docker-compose pull && docker-compose up
```
5. Open a browser and log in to [MinIO](http://localhost:9090)

user: _minioadmin_

password: _minioadmin_

Create a bucket called ```mlflow```

![MinIO](docs/minio.gif)
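
If you prefer the command line over the web console, the bucket can also be created with the MinIO client (`mc`) — a hedged sketch, assuming `mc` is installed and the MinIO S3 API is published on `localhost:9000` (check `docker-compose.yaml` for the actual port mapping):

```bash
# Assumption: MinIO's S3 API is mapped to localhost:9000 on the host
mc alias set mlops http://localhost:9000 minioadmin minioadmin
mc mb mlops/mlflow
```
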
6. Open a browser and log in to [Grafana](http://localhost:3000)

user: _admin_

password: _admin_

![Grafana](docs/grafana_login.gif)
7. Add the Prometheus data source

URL: ```http://prometheus:9090```

![Prometheus](docs/prometheus.gif)
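
If you'd rather script this step, Grafana's HTTP API can create the data source — a hedged sketch, assuming the default _admin_/_admin_ credentials and that Grafana is reachable on `localhost:3000` (the InfluxDB source from the next step can be added analogously):

```bash
# Create the Prometheus data source via Grafana's HTTP API instead of the UI
curl -X POST http://localhost:3000/api/datasources \
  -u admin:admin \
  -H "Content-Type: application/json" \
  -d '{"name": "Prometheus", "type": "prometheus", "url": "http://prometheus:9090", "access": "proxy"}'
```
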
8. Add the InfluxDB data source

URL: ```http://influxdb:8086```

Basic Auth

User: _ml-ops-admin_

Password: _ml-ops-pwd_

Database: _mlopsdemo_

![InfluxDB](docs/influxdb.gif)

9. Import the MLOps Demo Dashboard from the Grafana directory in this repo
![MLOps_Dashboard](docs/mlopsdashboard.gif)

10. Create an Alarm Notification channel

URL: ```http://bridge_server:8002/route```

![Alarm_Channel](docs/alarm_channel.gif)

11. Add the alarm channel to some panels
![Panels](docs/alarms_to_panels.gif)

12. Start the ```send_data.py``` script, which sends a POST request every 0.1 seconds (a minimal stand-in is sketched below)
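
If you just want to generate traffic without the script, here is a hedged bash equivalent — it assumes the same endpoint and payload shape as the curl example in step 16:

```bash
# Sketch: POST the example payload to the model API every 0.1 seconds.
# Assumption: nginx exposes the prediction endpoint at http://localhost/model/predict.
while true; do
  curl -s -X POST http://localhost/model/predict \
    -H "Content-Type: application/json" \
    -d '{"median_income_in_block": 8.3252, "median_house_age_in_block": 41,
         "average_rooms": 6, "average_bedrooms": 1, "population_per_block": 322,
         "average_house_occupancy": 2.55, "block_latitude": 37.88,
         "block_longitude": -122.23}' > /dev/null
  sleep 0.1
done
```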

13. Open a browser and turn on the Airflow DAG used to retrain our ML model

user: _airflow_

password: _airflow_

![Airflow](docs/airflow_login.gif)

14. Lower the alarm threshold to see the Airflow DAG pipeline get triggered

![Threshold](docs/lower_threshold.gif)

15. Check [MLFlow](http://localhost:5000) after the Airflow DAG has run to see the model artifacts stored using MinIO as the object storage layer.
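
To inspect the stored artifacts from your host, a hedged sketch using the MLflow CLI — assuming the tracking server is at `localhost:5000`, the MinIO S3 API is reachable on `localhost:9000`, and `<RUN_ID>` is replaced with a real run id from the MLflow UI:

```bash
export MLFLOW_TRACKING_URI=http://localhost:5000
export MLFLOW_S3_ENDPOINT_URL=http://localhost:9000   # assumption: host port mapping for MinIO's S3 API
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
mlflow artifacts list --run-id <RUN_ID>
```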

16. (Optional) Send a POST request to our model service API endpoint
```bash
curl -v -X POST http://localhost/model/predict \
  -H "Content-Type: application/json" \
  -d '{
        "median_income_in_block": 8.3252,
        "median_house_age_in_block": 41,
        "average_rooms": 6,
        "average_bedrooms": 1,
        "population_per_block": 322,
        "average_house_occupancy": 2.55,
        "block_latitude": 37.88,
        "block_longitude": -122.23
      }'
```
17. (Optional) If you are so bold, you can also simulate production traffic using Locust, __but__ keep in mind that you have a lot of services running on your local machine; you would never deploy a production ML API on your local machine to handle production traffic.

## Level 1 Workflow & Platform Architecture
![MLOps](docs/mlops_level1.drawio.svg)

## Model Serving Architecture
![API worker architecture](docs/ml_api_architecture.drawio.svg)

## Services
- nginx: Load Balancer
- python-model-service1: FastAPI Machine Learning API 1
- python-model-service2: FastAPI Machine Learning API 2
- postgresql: RDBMS
- rabbitmq: Message Queue
- rabbitmq workers: Workers listening to RabbitMQ
- locust: Load testing to simulate production traffic
- prometheus: Metrics scraping
- minio: Object storage
- mlflow: Machine Learning Experiment Management
- influxdb: Time Series Database
- chronograf: Admin & WebUI for InfluxDB
- grafana: Performance Monitoring
- redis: Cache
- airflow: Workflow Orchestrator
- bridge server: Receives webhooks from Grafana and translates them into Airflow REST API calls (see the sketch below)
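
You can exercise the bridge server's job by hand. The call below is a hedged sketch of roughly what it does, assuming Airflow's webserver is published on `localhost:8080` with the basic-auth API backend enabled (both defaults in the official Docker Compose setup), the _airflow_/_airflow_ account from step 13, and `<DAG_ID>` as a hypothetical placeholder for the retraining DAG's id:

```bash
# Trigger a DAG run through Airflow's stable REST API
curl -X POST "http://localhost:8080/api/v1/dags/<DAG_ID>/dagRuns" \
  -u "airflow:airflow" \
  -H "Content-Type: application/json" \
  -d '{"conf": {}}'
```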

## Gotchas

### Postgres

_Warning: scripts in /docker-entrypoint-initdb.d are only run if you start the container with a data directory that is empty; any pre-existing database will be left untouched on container startup._
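
If you change the init scripts and need them to run again, one option (destructive — it wipes all persisted data for the stack) is to remove the project's volumes so Postgres starts from an empty data directory:

```bash
# Destructive: removes the compose project's named volumes (Postgres data, MinIO objects, etc.)
docker-compose down --volumes
docker-compose up airflow-init && docker-compose up
```
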
28 changes: 28 additions & 0 deletions airflow.sh
@@ -0,0 +1,28 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

#
# Run airflow command in container
#
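#
# Example usage (assumption: the compose stack from this repo is up and the
# script is executable, e.g. `chmod +x airflow.sh`):
#   ./airflow.sh info
#   ./airflow.sh bash   # the official Airflow image's entrypoint accepts `bash` for an interactive shell
#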

PROJECT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

set -euo pipefail

export COMPOSE_FILE=${PROJECT_DIR}/docker-compose.yaml
exec docker-compose run airflow-worker "${@}"