diff --git a/Readme.md b/Readme.md
index c7493f9..70f9e03 100644
--- a/Readme.md
+++ b/Readme.md
@@ -1,23 +1,30 @@
 ## Cost-effect, scalable and stateless airflow
 
-Deploy an auto-scaling, stateless airflow cluster with the kubernetes executor and CloudSQL with an SSL airflow admin page, google Oauth2 login, and an NFS server for dags in under 20 minutes. Airflow logs are stored on a google cloud bucket. The monthly cost is approximately $150 fixed, plus $0.015 per vCPU hour :
+Deploy a highly-available, auto-scaling, stateless airflow cluster with the kubernetes executor and CloudSQL in under 20 minutes. This also includes an SSL airflow admin page, Google OAuth2 login, and Cloud Filestore for storing dags and logs. The monthly fixed cost ranges from approximately $150 at the cheapest to $500/month for HA, plus $0.015 per vCPU hour:
+
+ Cheapest:
 
 * $30/month for pre-emptible scheduler/web server node
 * $70/month for 1 core CloudSQL instance
 * $50/month for logs and storage
 
-Cost per CPU Hour:
+ Cost per CPU Hour:
 
 * Auto-scaling, pre-emptible `n1-highcpu-4` cost of $30/month, or $40/month assuming 75% utilisation.
 * $40/(730 hours per month * 4 vCPU) = $0.015/vCPU hour
 
-This calculation assumes you have idempotent dags, for non-idempotent dags the cost is circa $250/month + $0.05/vCPU hour. This compares with approximately $300 + $0.20/(vCPU + DB) hour with Cloud Composer . This tutorial installs on the free Google account ($300 over 12 months).
+This calculation assumes you have idempotent dags; for non-idempotent dags the cost is circa $0.05/vCPU hour. This compares with approximately $300 + $0.20/(vCPU + DB) hour with Cloud Composer. This tutorial installs on the free Google account ($300 of credit over 12 months).
 
 ## Installation instructions
 
 ![airflow-gke-deployed](images/airflow-gke.png "Airflow GKE Helm")
 
+Pre-requisites:
+
+* Ensure you have helm (v2.9.1), kubectl (v1.11.0), openssl and the gcloud SDK (v208.0.1) installed
+* Ensure the Cloud SQL Admin API has been enabled on your project ()
+
 Installation instructions:
 
 ```bash
@@ -25,12 +32,6 @@ git clone https://github.com/EamonKeane/airflow-GKE-k8sExecutor-helm.git
 cd airflow-GKE-k8sExecutor-helm
 ```
 
-    -cloud-filestore-location=*|--cloud-filestore-location=*)
-    CLOUD_FILESTORE_LOCATION="${i#*=}"
-    ;;
-    -highly-available-=*|--highly-available=*)
-    HIGHLY_AVAILABLE="${i#*=}"
-
 ```bash
 # NOTE cloud filestore is only available in the following areas, so choose another region as necessary if your currently configured region is not listed
 # asia-eas1, europe-west1, europe-west3, europe-west4, us-central1
@@ -45,20 +46,65 @@ HIGHLY_AVAILABLE=TRUE
 ./gcloud-sql-k8s-install.sh \
     --project=$PROJECT \
     --account=$ACCOUNT \
-    --gce_zone=$GCE_ZONE \
+    --gce-zone=$GCE_ZONE \
     --region=$REGION \
     --database-instance-name=$DATABASE_INSTANCE_NAME \
-    --cloud-filestore-zone=$CLOUD_FILESTORE_LOCATION \
+    --cloud-filestore-zone=$CLOUD_FILESTORE_ZONE \
     --highly-available=$HIGHLY_AVAILABLE
 ```
 
-CLOUD_FILESTORE_IP=$(gcloud beta filestore instances describe airflow-dags \
+CLOUD_FILESTORE_IP=$(gcloud beta filestore instances describe airflow \
     --project=$PROJECT \
     --location=$CLOUD_FILESTORE_ZONE \
     --format json | jq .networks[0].ipAddresses[0] --raw-output)
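The `CLOUD_FILESTORE_IP` captured above is not referenced again until the helm values file, where both the dag and log volumes need to point at the share (see `my-values.example.yaml` further down in this diff). A minimal sketch of wiring it in, assuming you keep a local copy called `my-values.yaml` (the file name is illustrative, not something this change prescribes):

```bash
# Illustrative only: copy the example values file and point both NFS volumes at the new share.
cp my-values.example.yaml my-values.yaml
sed -i.bak "s|nfsServer: \".*\"|nfsServer: \"$CLOUD_FILESTORE_IP\"|" my-values.yaml
grep nfsServer my-values.yaml   # should now show the Filestore IP for dagVolume and logVolume
```

Both `dagVolume.nfsServer` and `logVolume.nfsServer` match the same pattern, so a single substitution covers them.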
+For airflow to be able to write to Cloud Filestore, you need to change the permissions on the NFS share ().
+Follow the instructions below under [Cloud Filestore Permissions](#Setting-file-permissions-on-Cloud-Filestore).
+
+If not using Cloud Filestore, see below for instructions on installing a Google Cloud [NFS Server](#NFS-Server).
+
+## Setting file permissions on Cloud Filestore
+
+Create a VM to mount the file share and make the required changes.
+
+```bash
+VM_NAME=change-permissions
+gcloud compute --project=$PROJECT instances create $VM_NAME --zone=$GCE_ZONE
+```
+
+SSH into the machine:
+
+```bash
+gcloud compute ssh $VM_NAME --zone=$GCE_ZONE --project=$PROJECT
+```
+
+Copy and paste the following into the terminal:
+
+```bash
+sudo apt-get -y update
+sudo apt-get -y install nfs-common
+```
+
+Then copy and paste the following (substituting your `$CLOUD_FILESTORE_IP` value for the IP address):
+
+```bash
+CLOUD_FILESTORE_IP=
+sudo mkdir /mnt/test
+sudo mount $CLOUD_FILESTORE_IP:/airflow /mnt/test
+sudo mkdir /mnt/test/dags
+sudo mkdir /mnt/test/logs
+sudo chmod go+rw /mnt/test/dags
+sudo chmod go+rw /mnt/test/logs
+```
+
+Then delete the VM:
+
+```bash
+gcloud compute instances delete $VM_NAME --zone=$GCE_ZONE --project=$PROJECT
+```
+
+## Install the helm chart
+
 ```bash
 helm upgrade \
 --install \
@@ -72,8 +118,17 @@ helm upgrade \
 
 You can change airflow/airflow.cfg and re-run the above `helm upgrade --install` command to redeploy the changes. This takes approximately 30 seconds.
 
-Set `webScheduler.web.authenticate` to True and complete the section for SSL if you want this [SSL UI](#Exposing-oauth2-Google-ingress-with-cert-manager-and-nginx-ingress).
-Alternatively to view the Dashboard UI with no authentication or SSL view:
+Quickly copy the example dags folder in this repo to the NFS share using `kubectl cp`:
+
+```bash
+NAMESPACE=default
+DAGS_FOLDER_LOCAL=/Users/Eamon/kubernetes/airflow-GKE-k8sExecutor-helm/dags
+DAGS_FOLDER_REMOTE=/usr/local/airflow/dags
+export POD_NAME=$(kubectl get pods --namespace $NAMESPACE -l "app=airflow,tier=scheduler" -o jsonpath="{.items[0].metadata.name}")
+kubectl cp $DAGS_FOLDER_LOCAL $POD_NAME:$DAGS_FOLDER_REMOTE
+```
+
+View the dashboard using the instructions below and you should see the examples in the dags folder of this repo.
 
 ```bash
 export POD_NAME=$(kubectl get pods --namespace default -l "app=airflow,tier=web" -o jsonpath="{.items[0].metadata.name}")
@@ -81,6 +136,9 @@ echo "Visit http://127.0.0.1:8080 to use your application"
 kubectl port-forward $POD_NAME 8080:8080
 ```
 
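Before opening the dashboard, it can be worth confirming the scheduler actually sees the copied files. A quick sketch, assuming the default namespace used above and that the scheduler image has the `airflow` CLI on its PATH (which it should, since it runs the scheduler):

```bash
export POD_NAME=$(kubectl get pods --namespace default -l "app=airflow,tier=scheduler" -o jsonpath="{.items[0].metadata.name}")
# list the files that landed on the shared dags volume
kubectl exec $POD_NAME -- ls /usr/local/airflow/dags
# ask airflow which DAGs it has parsed from that folder
kubectl exec $POD_NAME -- airflow list_dags
```

Only files that actually define a DAG object show up in `list_dags`; plain scripts are listed by `ls` but ignored by the scheduler.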
+Set `webScheduler.web.authenticate` to True and complete the SSL section if you want the [SSL UI](#Exposing-oauth2-Google-ingress-with-cert-manager-and-nginx-ingress).
+Alternatively, to view the Dashboard UI with no authentication or SSL, use the port-forward commands above.
+
 ## SSL Admin UI Webpage
 
 To expose the web server behind a https url with google oauth, please see the section for google-oauth, cert-manager and nginx-ingress install instructions [SSL UI](#Exposing-oauth2-Google-ingress-with-cert-manager-and-nginx-ingress).
@@ -94,8 +152,8 @@ The easiest way to tidy-up is to delete the project and make a new one if re-dep
 
 There are a few elements to the chart:
 
 * This chart only focuses on the kubernetes executor and is tailored to run on GKE, but with some effort could be modified to run on premise or EKS/AKS.
-* An NFS server is used for dags as GCE does not have a ReadWriteMany option yet (Cloud Filestore coming soon will be similar to Amazon Elastic File System and Azure File System. You need to populate this separately using e.g. Jenkins.
-* Pre-install hooks add the airflow-RBAC account, dags PV, dags PVC and CloudSQL service. If the step fails at this point, you will need to remove everything before running helm again. See `tidying-up.sh` for details.
+* Google Cloud Filestore (beta; the equivalent of EFS and AFS on AWS and Azure respectively) is used for dags and logs. You need to populate the dags separately using e.g. Jenkins (see the sample Jenkins file and instructions below: [Jenkins](#Setup-Jenkins-to-sync-dags)).
+* Pre-install hooks add the airflow-RBAC account, dags/logs PV, dags/logs PVC and CloudSQL service. If the step fails at this point, you will need to remove everything before running helm again. See `tidying-up.sh` for details.
 * Pre-install and pre-upgrade hook to run the alembic migrations
 * Separate, templated airflow.cfg a change of which triggers a redeployment of both the web scheduler and the web server. This is due to the name of the configmap being appended with the current seconds (-{{ .Release.Time.Seconds }}) so a new configmap gets deployed each time. You may want to delete old configmaps from time to time.
@@ -256,16 +314,6 @@ serverPath: $STORAGE_NAME
 
 Set up Jenkins to trigger a build on each git push of this repository (see here for example instructions: ). The dags folder will then appear synced in your webscheduler pods.
 
-## Copy files to NFS
-
-```bash
-NAMESPACE=airflow
-DAGS_FOLDER_LOCAL=/Users/Eamon/kubernetes/airflow-GKE-k8sExecutor-helm/dags
-DAGS_FOLDER_REMOTE=/usr/local/airflow/dags
-export POD_NAME=$(kubectl get pods --namespace $NAMESPACE -l "app=airflow,tier=scheduler" -o jsonpath="{.items[0].metadata.name}")
-kubectl cp $DAGS_FOLDER_LOCAL $POD_NAME:$DAGS_FOLDER_REMOTE
-```
-
 ## NFS Server
 
 ```bash
@@ -306,14 +354,3 @@ dagVolume:
 ```
 
 Setup jenkins per the instructions [below](#Setup-Jenkins-to-sync-dags), or alternatively, copy the example pod operator in this repo to the $STORAGE_NAME of the NFS server (you can get connection instructions at this url )
-
-# Setting file permissions
-
-Shell into pod and change mode
-
-```bash
-sudo chmod go+rwx /dags
-sudo chmod go+rwx /logs
-sudo useradd airflow
-usermod -a -G root airflow
-```
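The Readme above leans on Jenkins to keep the dags folder in sync but leaves the actual sync step to the sample Jenkins file elsewhere in the repo. As a rough, hypothetical sketch (not the repo's actual Jenkinsfile), the shell step such a job runs on each push can be the same `kubectl cp` used manually earlier:

```bash
# Hypothetical Jenkins shell step: push the repo's dags folder to the scheduler's shared volume.
export POD_NAME=$(kubectl get pods --namespace default -l "app=airflow,tier=scheduler" -o jsonpath="{.items[0].metadata.name}")
kubectl cp dags $POD_NAME:/usr/local/airflow/dags
```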
diff --git a/airflow/templates/pv-log.yaml b/airflow/templates/pv-log.yaml
index fcbced6..84d5a6d 100644
--- a/airflow/templates/pv-log.yaml
+++ b/airflow/templates/pv-log.yaml
@@ -1,4 +1,4 @@
-{{- if .Values.dagVolume.installPV -}}
+{{- if .Values.logVolume.installPV -}}
 apiVersion: v1
 kind: PersistentVolume
 metadata:
diff --git a/airflow/templates/pvc-log.yaml b/airflow/templates/pvc-log.yaml
index 9eadde6..2c19861 100644
--- a/airflow/templates/pvc-log.yaml
+++ b/airflow/templates/pvc-log.yaml
@@ -1,4 +1,4 @@
-{{- if .Values.dagVolume.installPVC -}}
+{{- if .Values.logVolume.installPVC -}}
 kind: PersistentVolumeClaim
 apiVersion: v1
 metadata:
diff --git a/airflow/values.yaml b/airflow/values.yaml
index 2091849..17bd70f 100644
--- a/airflow/values.yaml
+++ b/airflow/values.yaml
@@ -5,7 +5,7 @@ namespace: default
 google:
   project:
   region:
-  databaseInstance: airflow2
+  databaseInstance: airflow
   databaseName: airflow
 
 createWorkerRBAC: true
diff --git a/dags/test-python.py b/dags/test-python.py
new file mode 100644
index 0000000..7f986d5
--- /dev/null
+++ b/dags/test-python.py
@@ -0,0 +1,2 @@
+for i in range(5):
+    print("testing")
\ No newline at end of file
diff --git a/gcloud-sql-k8s-install.sh b/gcloud-sql-k8s-install.sh
index 2cdea73..bb0cfba 100755
--- a/gcloud-sql-k8s-install.sh
+++ b/gcloud-sql-k8s-install.sh
@@ -34,7 +34,7 @@ case ${i} in
     -region=*|--region=*)
     REGION="${i#*=}"
     ;;
-    -gce_zone=*|--gce_zone=*)
+    -gce-zone=*|--gce-zone=*)
     GCE_ZONE="${i#*=}"
     ;;
     -database-instance-name=*|--database-instance-name=*)
@@ -55,7 +55,7 @@ gcloud config set container/new_scopes_behavior true
 #https://cloud.google.com/filestore/docs/quickstart-gcloud
 # If not creating, see the readme for how to create your own single-file NFS server
 CREATE_CLOUD_FILESTORE=TRUE
-CLOUD_FILESTORE_NAME=airflow-dags
+CLOUD_FILESTORE_NAME=airflow
 # The name of the mount directory on cloud filestore (referenced in helm chart)
 CLOUD_FILESTORE_SHARE_NAME="airflow"
 # Use default so that it is on the same VPC as most of your other resources
@@ -74,7 +74,7 @@ CREATE_GOOGLE_STORAGE_BUCKET=FALSE
 GOOGLE_LOG_STORAGE_BUCKET=$PROJECT-airflow
 
 #### DATABASE OPTIONS ####
-CREATE_CLOUDSQL_DATABASE=TRUE
+CREATE_CLOUDSQL_DATABASE=FALSE
 ACTIVATION_POLICY=always
 if [ HIGHLY_AVAILABLE = "TRUE" ]
 then
@@ -310,6 +310,7 @@ FERNET_KEY=$(dd if=/dev/urandom bs=32 count=1 2>/dev/null | openssl base64)
 # If you want to save the secret below for future reference
 # You can add a --output jsonpath-file=airflow-secret.json to the end
 # kubectl create secret generic --help
+# The google logs storage bucket is added for convenience but is ignored in the chart if .Values.airflowCfg.remoteLogging isn't set to true
 
 kubectl create secret generic airflow \
 --from-literal=fernet-key=$FERNET_KEY \
@@ -337,8 +338,8 @@ fi
 if [ $CREATE_CLOUD_FILESTORE = "TRUE" ]
 then
 gcloud beta filestore instances create $CLOUD_FILESTORE_NAME \
+    --location $CLOUD_FILESTORE_ZONE \
     --project=$PROJECT \
-    --location=$CLOUD_FILESTORE_ZONE \
     --tier=$CLOUD_FILESTORE_TIER \
     --file-share=name=$CLOUD_FILESTORE_SHARE_NAME,capacity=$CLOUD_FILESTORE_CAPACITY \
     --network=name=$CLOUD_FILESTORE_NETWORK,reserved-ip-range=$CLOUD_FILESTORE_RESERVED_IP
diff --git a/kubernetes-yaml/nginx-pod.yaml b/kubernetes-yaml/nginx-pod.yaml
new file mode 100644
index 0000000..011f6a3
--- /dev/null
+++ b/kubernetes-yaml/nginx-pod.yaml
@@ -0,0 +1,46 @@
+apiVersion: v1
+kind: Pod
+metadata:
+  name: nginx
+spec:
+  containers:
+  - name: nginx
+    image: nginx:1.7.9
+    ports:
+    - containerPort: 80
+    volumeMounts:
+    - name: airflow-initial
+      mountPath: /dags
+  volumes:
+  - name: airflow-initial
+    persistentVolumeClaim:
+      claimName: airflow-initial
+---
+apiVersion: v1
+kind: PersistentVolume
+metadata:
+  name: airflow-initial
+spec:
+  capacity:
+    storage: 10Gi
+  persistentVolumeReclaimPolicy: Retain
+  accessModes:
+    - ReadWriteMany
+  nfs:
+    server: 10.0.0.2
+    path: /airflow
+
+---
+kind: PersistentVolumeClaim
+apiVersion: v1
+metadata:
+  name: airflow-initial
+spec:
+  storageClassName: ""
+  accessModes:
+  # accessModes do not enforce access rights, but rather act as labels to match a PV to a PVC.
+  - "ReadWriteMany"
+  volumeName: airflow-initial
+  resources:
+    requests:
+      storage: 10Gi
\ No newline at end of file
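The new `kubernetes-yaml/nginx-pod.yaml` above looks like a convenience for seeding the share without creating a VM: a throwaway nginx pod mounts the NFS export at `/dags`. A sketch of how it could be used; note that the PersistentVolume in that file hard-codes `server: 10.0.0.2`, so point it at your `$CLOUD_FILESTORE_IP` first, and that the exact target directory depends on how your values file carves up the share:

```bash
# Hypothetical usage of the helper pod; edit the PV's server field to your Filestore IP before applying.
kubectl apply -f kubernetes-yaml/nginx-pod.yaml
# wait until `kubectl get pod nginx` reports Running, then inspect what is on the share
kubectl exec nginx -- ls /dags
# copy the repo's dags in (here into the dags/ sub-directory created during the permissions step)
kubectl cp dags nginx:/dags/dags
# the PV uses a Retain policy, so removing the pod, PV and PVC leaves the files on the share
kubectl delete -f kubernetes-yaml/nginx-pod.yaml
```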
+ - "ReadWriteMany" + volumeName: airflow-initial + resources: + requests: + storage: 10Gi \ No newline at end of file diff --git a/kubernetes-yaml/nginx.yaml b/kubernetes-yaml/nginx.yaml new file mode 100644 index 0000000..da397e3 --- /dev/null +++ b/kubernetes-yaml/nginx.yaml @@ -0,0 +1,17 @@ +apiVersion: v1 +kind: Pod +metadata: + name: nginx +spec: + containers: + - name: nginx + image: nginx:1.7.9 + ports: + - containerPort: 80 + volumeMounts: + - name: airflow-initial + mountPath: /dags + volumes: + - name: airflow-initial + persistentVolumeClaim: + claimName: airflow \ No newline at end of file diff --git a/my-values.example.yaml b/my-values.example.yaml index ffd2946..d0a49a4 100644 --- a/my-values.example.yaml +++ b/my-values.example.yaml @@ -20,8 +20,14 @@ webScheduler: dagVolume: installPV: true installPVC: true - nfsServer: "12.345.6.7" - nfsPath: /dags + nfsServer: "10.0.0.2" + nfsPath: /airflow + +logVolume: + nfsServer: "10.0.0.2" + nfsPath: /airflow + installPV: true + installPVC: true createWorkerRBAC: true installPostgresService: true diff --git a/scripts/tidying-up-private.sh b/scripts/tidying-up-private.sh index b53eddf..16dd18b 100755 --- a/scripts/tidying-up-private.sh +++ b/scripts/tidying-up-private.sh @@ -10,5 +10,3 @@ DATABASE_INSTANCE_NAME=airflow --gce_zone=$GCE_ZONE \ --region=$REGION \ --database_instance_name=$DATABASE_INSTANCE_NAME - - diff --git a/tidying-up.sh b/tidying-up.sh index c2d0ccb..bd8d435 100755 --- a/tidying-up.sh +++ b/tidying-up.sh @@ -18,6 +18,10 @@ STORAGE_ROLE='roles/storage.admin' NFS_DEPLOYMENT_NAME=dags-airflow +CLOUD_FILESTORE_INSTANCE=airflow +CLOUD_FILESTORE_LOCATION=europe-west1-b +PROJECT=icabbi-test-210421 + for i in "$@" do case ${i} in @@ -48,7 +52,9 @@ gcloud iam service-accounts delete $SERVICE_ACCOUNT_NAME@$PROJECT.iam.gserviceac gsutil rm -r gs://$PROJECT-airflow -gsutil rm -r gs://$PROJECT-airflow +gcloud beta filestore instances delete $CLOUD_FILESTORE_INSTANCE \ + --location=$CLOUD_FILESTORE_LOCATION \ + --project=$PROJECT ### Permission denied, so had to do this in the dashboard gcloud iam service-accounts remove-iam-policy-binding $SERVICE_ACCOUNT_FULL \