This repository has been archived by the owner on May 25, 2020. It is now read-only.

updated with Cloud Filestore
EamonKeane committed Jul 17, 2018
1 parent 584475f commit 68778e5
Showing 11 changed files with 162 additions and 49 deletions.
111 changes: 74 additions & 37 deletions Readme.md
@@ -1,36 +1,37 @@

## Cost-effective, scalable and stateless airflow

Deploy an auto-scaling, stateless airflow cluster with the kubernetes executor and CloudSQL with an SSL airflow admin page, google Oauth2 login, and an NFS server for dags in under 20 minutes. Airflow logs are stored on a google cloud bucket. The monthly cost is approximately $150 fixed, plus $0.015 per vCPU hour <https://cloud.google.com/products/calculator/#id=22a2fecd-fc97-412f-8560-1ce1f70bb44f>:
Deploy a highly-available, auto-scaling, stateless airflow cluster with the kubernetes executor and CloudSQL in under 20 minutes. This also includes an SSL airflow admin page, Google OAuth2 login, and Cloud Filestore for storing dags and logs. The monthly fixed cost ranges from approximately $150 for the cheapest setup to $500 for a highly-available one, plus $0.015 per vCPU hour <https://cloud.google.com/products/calculator/#id=22a2fecd-fc97-412f-8560-1ce1f70bb44f>:

Cheapest:

* $30/month for pre-emptible scheduler/web server node
* $70/month for 1 core CloudSQL instance
* $50/month for logs and storage

Cost per CPU Hour:

* Auto-scaling, pre-emptible `n1-highcpu-4` cost of $30/month, or $40/month assuming 75% utilisation.
* $40/(730 hours per month * 4 vCPU) = $0.015/vCPU hour

This calculation assumes you have idempotent dags, for non-idempotent dags the cost is circa $250/month + $0.05/vCPU hour. This compares with approximately $300 + $0.20/(vCPU + DB) hour with Cloud Composer <https://cloud.google.com/composer/pricing>. This tutorial installs on the free Google account ($300 over 12 months).
This calculation assumes you have idempotent dags; for non-idempotent dags the cost is circa $0.05/vCPU hour. This compares with approximately $300 + $0.20/(vCPU + DB) hour with Cloud Composer <https://cloud.google.com/composer/pricing>. This tutorial fits within the Google free trial credit ($300 over 12 months).
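
As a rough sanity check on the per-vCPU-hour figure above (a back-of-the-envelope calculation only, assuming ~$40/month for the node and 730 hours per month):

```bash
# ~$40/month spread over 730 hours and 4 vCPUs
echo "scale=4; 40 / (730 * 4)" | bc
# => .0136, i.e. in the same ballpark as the ~$0.015/vCPU hour quoted above
```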

## Installation instructions

![airflow-gke-deployed](images/airflow-gke.png "Airflow GKE Helm")

Pre-requisites:

* Ensure you have helm (v2.9.1), kubectl (v1.11.0), openssl and the gcloud SDK (v208.0.1) installed (a quick version check is sketched below)
* Ensure the Cloud SQL Admin API has been enabled on your project (<https://cloud.google.com/sql/docs/mysql/admin-api/>)
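
A quick way to confirm the tools are on your PATH (the versions above are what this repo was tested with):

```bash
helm version --client
kubectl version --client
openssl version
gcloud version
```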

Installation instructions:

```bash
git clone https://github.com/EamonKeane/airflow-GKE-k8sExecutor-helm.git
cd airflow-GKE-k8sExecutor-helm
```

These options are parsed by `gcloud-sql-k8s-install.sh` (the same `case` pattern used for the other flags):

```bash
    -cloud-filestore-location=*|--cloud-filestore-location=*)
    CLOUD_FILESTORE_LOCATION="${i#*=}"
    ;;
    -highly-available=*|--highly-available=*)
    HIGHLY_AVAILABLE="${i#*=}"
    ;;
```

```bash
# NOTE: Cloud Filestore is only available in the following regions, so choose another region
# as necessary if your currently configured region is not listed:
# asia-east1, europe-west1, europe-west3, europe-west4, us-central1
# Set PROJECT, ACCOUNT, GCE_ZONE, REGION, DATABASE_INSTANCE_NAME and CLOUD_FILESTORE_ZONE for your environment
HIGHLY_AVAILABLE=TRUE

./gcloud-sql-k8s-install.sh \
    --project=$PROJECT \
    --account=$ACCOUNT \
    --gce-zone=$GCE_ZONE \
    --region=$REGION \
    --database-instance-name=$DATABASE_INSTANCE_NAME \
    --cloud-filestore-zone=$CLOUD_FILESTORE_ZONE \
    --highly-available=$HIGHLY_AVAILABLE
```
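
Optionally confirm that the script created the GKE cluster and the CloudSQL instance before continuing (a quick check only; the resource names come from the script's variables):

```bash
gcloud container clusters list --project=$PROJECT
gcloud sql instances list --project=$PROJECT
```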

Get the IP address of the Cloud Filestore instance:

```bash
CLOUD_FILESTORE_IP=$(gcloud beta filestore instances describe airflow \
    --project=$PROJECT \
    --location=$CLOUD_FILESTORE_ZONE \
    --format json | jq '.networks[0].ipAddresses[0]' --raw-output)
```
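
This should return a private address on your VPC (e.g. `10.x.x.x`); a quick check that the lookup worked:

```bash
echo "Cloud Filestore IP: $CLOUD_FILESTORE_IP"
```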

For airflow to be able to write to Cloud Filestore, you need to change the permissions on the NFS share (<https://cloud.google.com/filestore/docs/quickstart-console>).
Follow the instructions in [Cloud Filestore Permissions](#Setting-file-permissions-on-Cloud-Filestore) below.

If you are not using Cloud Filestore, see the instructions below for installing a Google Cloud [NFS Server](#NFS-Server).

## Setting file permissions on Cloud Filestore

Create a VM to mount the file share and make the required changes.

```bash
VM_NAME=change-permissions
gcloud compute --project=$PROJECT instances create $VM_NAME --zone=$GCE_ZONE
```

SSH into the machine

```bash
gcloud compute ssh $VM_NAME --zone=$GCE_ZONE --project=$PROJECT
```

Copy and paste the following into the terminal:

```bash
sudo apt-get -y update
sudo apt-get -y install nfs-common
```

Then copy and paste the following (substituting your `$CLOUD_FILESTORE_IP` for the ip address):

```bash
CLOUD_FILESTORE_IP=
sudo mkdir /mnt/test
sudo mount $CLOUD_FILESTORE_IP:/airflow /mnt/test
sudo mkdir /mnt/test/dags
sudo mkdir /mnt/test/logs
sudo chmod go+rw /mnt/test/dags
sudo chmod go+rw /mnt/test/logs
```
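
Optionally verify the permissions and unmount the share before deleting the VM (both directories should now be group- and world-writable):

```bash
ls -ld /mnt/test/dags /mnt/test/logs
cd / && sudo umount /mnt/test
```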

Then delete the VM:

```bash
gcloud compute instances delete $VM_NAME --zone=$GCE_ZONE --project=$PROJECT
```

## Install the helm chart:

```bash
helm upgrade \
--install \
  # remaining flags are collapsed in the diff view; an illustrative full command is shown below
```

You can change airflow/airflow.cfg and re-run the above `helm upgrade --install` command to redeploy the changes. This takes approximately 30 seconds.
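
Since the full command above is collapsed, an illustrative invocation might look like the following (the release name, namespace and values file are assumptions based on the files in this repo, not the exact command):

```bash
# Copy my-values.example.yaml to my-values.yaml and fill in your values first
helm upgrade \
    --install \
    --namespace default \
    --values my-values.yaml \
    airflow \
    ./airflow
```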

Quickly copy the example dags folder in this repo to the NFS share using `kubectl cp`:

```bash
NAMESPACE=default
DAGS_FOLDER_LOCAL=/Users/Eamon/kubernetes/airflow-GKE-k8sExecutor-helm/dags
DAGS_FOLDER_REMOTE=/usr/local/airflow/dags
export POD_NAME=$(kubectl get pods --namespace $NAMESPACE -l "app=airflow,tier=scheduler" -o jsonpath="{.items[0].metadata.name}")
kubectl cp $DAGS_FOLDER_LOCAL $POD_NAME:$DAGS_FOLDER_REMOTE
```
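
To confirm the dags arrived, list the remote folder (an optional check):

```bash
kubectl exec --namespace $NAMESPACE $POD_NAME -- ls $DAGS_FOLDER_REMOTE
```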

View the dashboard using the instructions below and you should see the examples in the dags folder of this repo.

```bash
export POD_NAME=$(kubectl get pods --namespace default -l "app=airflow,tier=web" -o jsonpath="{.items[0].metadata.name}")
echo "Visit http://127.0.0.1:8080 to use your application"
kubectl port-forward $POD_NAME 8080:8080
```

Set `webScheduler.web.authenticate` to True and complete the SSL section if you want the [SSL UI](#Exposing-oauth2-Google-ingress-with-cert-manager-and-nginx-ingress).
Alternatively, the port-forward above lets you view the Dashboard UI with no authentication or SSL.

## SSL Admin UI Webpage

To expose the web server behind an HTTPS URL with Google OAuth, see the google-oauth, cert-manager and nginx-ingress install instructions: [SSL UI](#Exposing-oauth2-Google-ingress-with-cert-manager-and-nginx-ingress).
@@ -94,8 +152,8 @@

The easiest way to tidy up is to delete the project and make a new one if re-deploying.

There are a few elements to the chart:

* This chart only focuses on the kubernetes executor and is tailored to run on GKE, but with some effort could be modified to run on premise or EKS/AKS.
* An NFS server is used for dags as GCE does not have a ReadWriteMany option yet (Cloud Filestore coming soon will be similar to Amazon Elastic File System and Azure File System. You need to populate this separately using e.g. Jenkins.
* Pre-install hooks add the airflow-RBAC account, dags PV, dags PVC and CloudSQL service. If the step fails at this point, you will need to remove everything before running helm again. See `tidying-up.sh` for details.
* Google Cloud Filestore (beta; the equivalent of EFS on AWS and Azure Files on Azure) stores the dags and logs. You need to populate the dags separately using e.g. Jenkins (see the sample Jenkinsfile and instructions in the [Jenkins](#Setup-Jenkins-to-sync-dags) section below).
* Pre-install hooks add the airflow-RBAC account, dags/logs PV, dags/logs PVC and CloudSQL service. If the step fails at this point, you will need to remove everything before running helm again. See `tidying-up.sh` for details.
* Pre-install and pre-upgrade hook to run the alembic migrations
* A separate, templated airflow.cfg, a change to which triggers a redeployment of both the scheduler and the web server. This works because the configmap name has the current seconds appended (`-{{ .Release.Time.Seconds }}`), so a new configmap gets deployed on every upgrade. You may want to delete old configmaps from time to time (a quick example is sketched below).
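
For example, to spot and remove configmaps left behind by previous releases (the configmap name below is illustrative; check what is actually in your namespace first):

```bash
# Stale configmaps carry an old seconds suffix
kubectl get configmaps --namespace default | grep airflow
# Delete any that are no longer mounted by the running scheduler/web pods, e.g.
# kubectl delete configmap airflow-cfg-1531234567 --namespace default
```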

@@ -256,16 +314,6 @@ serverPath: $STORAGE_NAME

Set up Jenkins to trigger a build on each git push of this repository (see here for example instructions: <https://github.com/eamonkeane/jenkins-blue>). The dags folder will then appear synced in your webscheduler pods.

## Copy files to NFS

```bash
NAMESPACE=airflow
DAGS_FOLDER_LOCAL=/Users/Eamon/kubernetes/airflow-GKE-k8sExecutor-helm/dags
DAGS_FOLDER_REMOTE=/usr/local/airflow/dags
export POD_NAME=$(kubectl get pods --namespace $NAMESPACE -l "app=airflow,tier=scheduler" -o jsonpath="{.items[0].metadata.name}")
kubectl cp $DAGS_FOLDER_LOCAL $POD_NAME:$DAGS_FOLDER_REMOTE
```

## NFS Server

```bash
@@ -306,14 +354,3 @@ dagVolume:
```

Set up Jenkins per the instructions [below](#Setup-Jenkins-to-sync-dags), or alternatively copy the example pod operator in this repo to the $STORAGE_NAME share on the NFS server (connection instructions are at <https://console.cloud.google.com/dm/deployments/details/$NFS_DEPLOYMENT_NAME?project=$PROJECT>).

## Setting file permissions

Shell into the pod and change the permissions:

```bash
sudo chmod go+rwx /dags
sudo chmod go+rwx /logs
sudo useradd airflow
sudo usermod -a -G root airflow
```
2 changes: 1 addition & 1 deletion airflow/templates/pv-log.yaml
@@ -1,4 +1,4 @@
{{- if .Values.dagVolume.installPV -}}
{{- if .Values.logVolume.installPV -}}
apiVersion: v1
kind: PersistentVolume
metadata:
2 changes: 1 addition & 1 deletion airflow/templates/pvc-log.yaml
@@ -1,4 +1,4 @@
{{- if .Values.dagVolume.installPVC -}}
{{- if .Values.logVolume.installPVC -}}
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
2 changes: 1 addition & 1 deletion airflow/values.yaml
@@ -5,7 +5,7 @@ namespace: default
google:
  project:
  region:
  databaseInstance: airflow2
  databaseInstance: airflow
  databaseName: airflow

createWorkerRBAC: true
2 changes: 2 additions & 0 deletions dags/test-python.py
@@ -0,0 +1,2 @@
for i in range(5):
    print("testing")
9 changes: 5 additions & 4 deletions gcloud-sql-k8s-install.sh
@@ -34,7 +34,7 @@ case ${i} in
-region=*|--region=*)
REGION="${i#*=}"
;;
-gce_zone=*|--gce_zone=*)
-gce-zone=*|--gce-zone=*)
GCE_ZONE="${i#*=}"
;;
-database-instance-name=*|--database-instance-name=*)
@@ -55,7 +55,7 @@ gcloud config set container/new_scopes_behavior true
#https://cloud.google.com/filestore/docs/quickstart-gcloud
# If not creating, see the readme for how to create your own single-file NFS server
CREATE_CLOUD_FILESTORE=TRUE
CLOUD_FILESTORE_NAME=airflow-dags
CLOUD_FILESTORE_NAME=airflow
# The name of the mount directory on cloud filestore (referenced in helm chart)
CLOUD_FILESTORE_SHARE_NAME="airflow"
# Use default so that it is on the same VPC as most of your other resources
@@ -74,7 +74,7 @@ CREATE_GOOGLE_STORAGE_BUCKET=FALSE
GOOGLE_LOG_STORAGE_BUCKET=$PROJECT-airflow

#### DATABASE OPTIONS ####
CREATE_CLOUDSQL_DATABASE=TRUE
CREATE_CLOUDSQL_DATABASE=FALSE
ACTIVATION_POLICY=always
if [ "$HIGHLY_AVAILABLE" = "TRUE" ]
then
@@ -310,6 +310,7 @@ FERNET_KEY=$(dd if=/dev/urandom bs=32 count=1 2>/dev/null | openssl base64)
# If you want to save the secret below for future reference
# You can add a --output jsonpath-file=airflow-secret.json to the end
# kubectl create secret generic --help
# The google logs storage bucket is added for convenience but is ignored in the chart if .Values.airflowCfg.remoteLogging isn't set to true

kubectl create secret generic airflow \
--from-literal=fernet-key=$FERNET_KEY \
@@ -337,8 +338,8 @@ fi
if [ $CREATE_CLOUD_FILESTORE = "TRUE" ]
then
gcloud beta filestore instances create $CLOUD_FILESTORE_NAME \
--location $CLOUD_FILESTORE_ZONE \
--project=$PROJECT \
--location=$CLOUD_FILESTORE_ZONE \
--tier=$CLOUD_FILESTORE_TIER \
--file-share=name=$CLOUD_FILESTORE_SHARE_NAME,capacity=$CLOUD_FILESTORE_CAPACITY \
--network=name=$CLOUD_FILESTORE_NETWORK,reserved-ip-range=$CLOUD_FILESTORE_RESERVED_IP
46 changes: 46 additions & 0 deletions kubernetes-yaml/nginx-pod.yaml
@@ -0,0 +1,46 @@
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
    volumeMounts:
    - name: airflow-initial
      mountPath: /dags
  volumes:
  - name: airflow-initial
    persistentVolumeClaim:
      claimName: airflow-initial
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-initial
spec:
  capacity:
    storage: 10Gi
  persistentVolumeReclaimPolicy: Retain
  accessModes:
  - ReadWriteMany
  nfs:
    server: 10.0.0.2
    path: /airflow

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: airflow-initial
spec:
  storageClassName: ""
  accessModes:
  # accessModes do not enforce access rights, but rather act as labels to match a PV to a PVC.
  - "ReadWriteMany"
  volumeName: airflow-initial
  resources:
    requests:
      storage: 10Gi
17 changes: 17 additions & 0 deletions kubernetes-yaml/nginx.yaml
@@ -0,0 +1,17 @@
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
    volumeMounts:
    - name: airflow-initial
      mountPath: /dags
  volumes:
  - name: airflow-initial
    persistentVolumeClaim:
      claimName: airflow
10 changes: 8 additions & 2 deletions my-values.example.yaml
@@ -20,8 +20,14 @@ webScheduler:
dagVolume:
  installPV: true
  installPVC: true
  nfsServer: "12.345.6.7"
  nfsPath: /dags
  nfsServer: "10.0.0.2"
  nfsPath: /airflow

logVolume:
  nfsServer: "10.0.0.2"
  nfsPath: /airflow
  installPV: true
  installPVC: true

createWorkerRBAC: true
installPostgresService: true
2 changes: 0 additions & 2 deletions scripts/tidying-up-private.sh
@@ -10,5 +10,3 @@ DATABASE_INSTANCE_NAME=airflow
--gce_zone=$GCE_ZONE \
--region=$REGION \
--database_instance_name=$DATABASE_INSTANCE_NAME


8 changes: 7 additions & 1 deletion tidying-up.sh
@@ -18,6 +18,10 @@ STORAGE_ROLE='roles/storage.admin'

NFS_DEPLOYMENT_NAME=dags-airflow

CLOUD_FILESTORE_INSTANCE=airflow
CLOUD_FILESTORE_LOCATION=europe-west1-b
PROJECT=icabbi-test-210421

for i in "$@"
do
case ${i} in
@@ -48,7 +52,9 @@ gcloud iam service-accounts delete $SERVICE_ACCOUNT_NAME@$PROJECT.iam.gserviceaccount.com

gsutil rm -r gs://$PROJECT-airflow

gsutil rm -r gs://$PROJECT-airflow
gcloud beta filestore instances delete $CLOUD_FILESTORE_INSTANCE \
--location=$CLOUD_FILESTORE_LOCATION \
--project=$PROJECT

### Permission denied, so had to do this in the dashboard
gcloud iam service-accounts remove-iam-policy-binding $SERVICE_ACCOUNT_FULL \
