From 08c1a47e41e5572469b68193ae691d8250c1d1de Mon Sep 17 00:00:00 2001 From: "yunl.zheng" Date: Sun, 12 Aug 2018 11:00:34 +0800 Subject: [PATCH 01/13] update summary --- SUMMARY.md | 1 - kubernetes/operator-architecture.md | 79 ------------- kubernetes/use-operator-manage-prometheus.md | 116 +++++++++++++++++++ 3 files changed, 116 insertions(+), 80 deletions(-) diff --git a/SUMMARY.md b/SUMMARY.md index edffb08..40ba1af 100644 --- a/SUMMARY.md +++ b/SUMMARY.md @@ -78,7 +78,6 @@ * [监控Kubernetes集群](./kubernetes/use-prometheus-monitor-kubernetes.md) * [使用Grafana创建可视化仪表盘](./kubernetes/use-grafana-in-k8s.md) * [使用Opertor管理Prometheus](./kubernetes/use-operator-manage-prometheus.md) - * [Prometheus Operator架构](./kubernetes/operator-architecture.md) * [使用Prometheus Operator监控用户应用](./kubernetes/use-operator-monitor-app.md) * [使用Prometheus Operator监控集群](./kubernetes/use-operator-monitor-app.md) * [Prometheus Operator下的告警处理](./kubernetes/use-operator-alerting.md) diff --git a/kubernetes/operator-architecture.md b/kubernetes/operator-architecture.md index 5ae777b..e69de29 100644 --- a/kubernetes/operator-architecture.md +++ b/kubernetes/operator-architecture.md @@ -1,79 +0,0 @@ -## Prometheus Opterator架构 - -Prometheus Operator建立在Kubernetes的资源以及控制器的概念之上,通过在Kubernetes中添加自定义资源类型,通过声明式的方式,Operator可以自动部署和管理Prometheus实例的运行状态,并且根据监控目标管理并重新加载Prometheus的配置文件,大大简化Prometheus这类有状态应用运维管理的复杂度。 - -![Prometheus Operator架构](http://p2n2em8ut.bkt.clouddn.com/prometheus-architecture.png) - -如上所示,是Prometheus Operator的架构示意图。为了能够通过声明式的对Prometheus进行自动化管理。Prometheus Operator通过自定义资源类型的方式定义了一下3个主要自定义资源类型: - -* Prometheus:声明式的管理Prometheus实例 -* ServiceMonitor:声明式的管理监控目标,并自定生成监控配置文件 -* Alertmanager:声明式的管理Alertmanager实例 - -除了上图中展示的3大类型以外,还有自定义资源类型PrometheusRule,用于声明式的管理高级规则。 - -### Prometheus - -自定义资源`Prometheus`中声明式的定义了在Kubernetes集群中所需运行的Prometheus的设置。如下所示: - -``` -apiVersion: monitoring.coreos.com/v1 -kind: Prometheus -metadata: - name: prometheus -spec: - serviceMonitorSelector: - matchLabels: - team: 
frontend - resources: - requests: - memory: 400Mi -``` - -在该Yaml中我们可以定义Prometheus实例所使用的资源,以及需要关联的ServiceMonitor等。除此以外,还可以定义如Replica,Storage,以及关联的Alertmanager实例等信息。 - -对于每一个Promtheus资源而言,Operator会自动通过StatefulSet的方式部署Prometheus实例。Operator会根据ServiceMonitor定义的自动将Prometheus的配置信息通过Secret的方式进行保存。当ServiceMonitor或者Promtheus更新时,Operator会确保Prometheus实例自动加载最新的配置内容。 - -如果Prometheus未关联ServiceMonitor,用户则可以自行管理Secret中的配置内容。Operator会确保这些配置内容被加载到Prometheus实例当中。 - -### ServiceMonitor - -通过自定义资源类型`ServiceMonitor`用户可以通过声明式的方式定义需要监控集群中的哪些资源。如下所示: - -``` -apiVersion: monitoring.coreos.com/v1 -kind: ServiceMonitor -metadata: - name: example-app - labels: - team: frontend -spec: - selector: - matchLabels: - app: example-app - endpoints: - - port: web -``` - -在ServiceMonitor中声明了如何从标签选择器匹配到的这些服务中获取监控指标数据。通过将ServiceMonitor关联到Prometheus从而实现对监控配置的自动管理。在默认情况下ServiceMonitor与Prometheus必须位于相同的命名空间中,而当Prometheus需要跨命名空间获取监控数据时,可以在ServiceMonitor中声明namespaceSelector,如下所示: - -``` -spec: - namespaceSelector: - any: true -``` - -### Alertmanager - -通过自定义资源类型`Alertmanager`,用户可以声明式的定义在Kubernetes集群中所需要运行的Alertmanager信息,如下所示: - -``` -apiVersion: monitoring.coreos.com/v1 -kind: Alertmanager -metadata: - name: example -spec: - replicas: 3 -``` - -在Yaml文件中,我们可以定义Alertmanager的实例数量以及持久化相关的配置,Operator会自动通过StatefulSet的方式部署Alertmanager实例,对于当存在多个Alertmanager副本时,Operator会自动以高可用的模式运行Alertmanager实例。而Alertmanager的配置文件则通过Secret的方式进行管理 \ No newline at end of file diff --git a/kubernetes/use-operator-manage-prometheus.md b/kubernetes/use-operator-manage-prometheus.md index fe40c85..9bfeed6 100644 --- a/kubernetes/use-operator-manage-prometheus.md +++ b/kubernetes/use-operator-manage-prometheus.md @@ -4,3 +4,119 @@ 为了能够自动化的处理这些复杂操作,CoreOS引入了Opterator。简单来说,Opterator就是通过扩展Kubernetes API,帮助用户部署,配置和管理复杂的有状态应用程序示例,通过软件定义的方式来管理运维操作。 +## 安装Prometheus Operator + +在Kubernetes中安装Prometheus Operator非常简单,用户可以从以下地址中过去Prometheus Operator的源码: + +``` +git clone https://github.com/coreos/prometheus-operator.git +``` + +通过运行一下命令安装Prometheus 
Operator的Deployment实例: + +``` +kubectl apply -f bundle.yaml +``` + +由于Prometheus Operator中需要获取当前集群中运行资源的运行情况,因此在bundle.yaml中定义了名为prometheus-operator的ServiceAccount并且绑定了相应的集群访问权限。 + +## Prometheus Opterator架构 + +Prometheus Operator建立在Kubernetes的资源以及控制器的概念之上,通过在Kubernetes中添加自定义资源类型,通过声明式的方式,Operator可以自动部署和管理Prometheus实例的运行状态,并且根据监控目标管理并重新加载Prometheus的配置文件,大大简化Prometheus这类有状态应用运维管理的复杂度。 + +![Prometheus Operator架构](http://p2n2em8ut.bkt.clouddn.com/prometheus-architecture.png) + +如上所示,是Prometheus Operator的架构示意图。为了能够通过声明式的对Prometheus进行自动化管理。Prometheus Operator通过自定义资源类型的方式定义了一下3个主要自定义资源类型: + +* Prometheus + +自定义资源`Prometheus`中声明式的定义了在Kubernetes集群中所需运行的Prometheus的设置。如下所示: + +``` +apiVersion: monitoring.coreos.com/v1 +kind: Prometheus +metadata: + name: prometheus +spec: + serviceMonitorSelector: + matchLabels: + team: frontend + resources: + requests: + memory: 400Mi +``` + +在该Yaml中我们可以定义Prometheus实例所使用的资源,以及需要关联的ServiceMonitor等。除此以外,还可以定义如Replica,Storage,以及关联的Alertmanager实例等信息。 + +对于每一个Promtheus资源而言,Operator会自动通过StatefulSet的方式部署Prometheus实例。Operator会根据ServiceMonitor定义的自动将Prometheus的配置信息通过Secret的方式进行保存。当ServiceMonitor或者Promtheus更新时,Operator会确保Prometheus实例自动加载最新的配置内容。 + +如果Prometheus未关联ServiceMonitor,用户则可以自行管理Secret中的配置内容。Operator会确保这些配置内容被加载到Prometheus实例当中。 + +* ServiceMonitor + +通过自定义资源类型`ServiceMonitor`用户可以通过声明式的方式定义需要监控集群中的哪些资源。如下所示: + +``` +apiVersion: monitoring.coreos.com/v1 +kind: ServiceMonitor +metadata: + name: example-app + labels: + team: frontend +spec: + selector: + matchLabels: + app: example-app + endpoints: + - port: web +``` + +在ServiceMonitor中声明了如何从标签选择器匹配到的这些服务中获取监控指标数据。通过将ServiceMonitor关联到Prometheus从而实现对监控配置的自动管理。在默认情况下ServiceMonitor与Prometheus必须位于相同的命名空间中,而当Prometheus需要跨命名空间获取监控数据时,可以在ServiceMonitor中声明namespaceSelector,如下所示: + +``` +spec: + namespaceSelector: + any: true +``` + +* Alertmanager + +通过自定义资源类型`Alertmanager`,用户可以声明式的定义在Kubernetes集群中所需要运行的Alertmanager信息,如下所示: + +``` +apiVersion: monitoring.coreos.com/v1 +kind: Alertmanager +metadata: + 
name: example +spec: + replicas: 3 +``` + +在Yaml文件中,我们可以定义Alertmanager的实例数量以及持久化相关的配置,Operator会自动通过StatefulSet的方式部署Alertmanager实例,对于当存在多个Alertmanager副本时,Operator会自动以高可用的模式运行Alertmanager实例。而Alertmanager的配置文件则通过Secret的方式进行管理 + +除了以上3大类型以外,还有自定义资源类型PrometheusRule,用于声明式的管理高级规则。 + +如果查看Prometheus Operator Pod实例的日志,在初始化完成后可以看到以下输出内容: + +``` +ts=2018-08-12T02:57:38.014620397Z caller=main.go:130 msg="Starting Prometheus Operator version '0.23.0'." +level=info ts=2018-08-12T02:57:38.119754166Z caller=operator.go:176 component=alertmanageroperator msg="connection established" cluster-version=v1.10.4 +level=info ts=2018-08-12T02:57:38.119944014Z caller=operator.go:314 component=prometheusoperator msg="connection established" cluster-version=v1.10.4 +level=info ts=2018-08-12T02:57:38.604914616Z caller=operator.go:1338 component=prometheusoperator msg="CRD updated" crd=Prometheus +level=info ts=2018-08-12T02:57:38.604978262Z caller=operator.go:566 component=alertmanageroperator msg="CRD updated" crd=Alertmanager +level=info ts=2018-08-12T02:57:38.617738839Z caller=operator.go:1338 component=prometheusoperator msg="CRD updated" crd=ServiceMonitor +level=info ts=2018-08-12T02:57:38.710804217Z caller=operator.go:1338 component=prometheusoperator msg="CRD updated" crd=PrometheusRule +level=info ts=2018-08-12T02:57:41.622981601Z caller=operator.go:192 component=alertmanageroperator msg="CRD API endpoints ready" +level=info ts=2018-08-12T02:57:47.755480463Z caller=operator.go:330 component=prometheusoperator msg="CRD API endpoints ready" +``` + +查看集群中的自定义资源内容: + +``` +$ kubectl get customresourcedefinition +NAME AGE +alertmanagers.monitoring.coreos.com 6d +prometheuses.monitoring.coreos.com 6d +prometheusrules.monitoring.coreos.com 6d +servicemonitors.monitoring.coreos.com 6d +``` \ No newline at end of file From bfe06b1001fc64f3d8f9bf86b295914fcb05c9a6 Mon Sep 17 00:00:00 2001 From: "yunl.zheng" Date: Sun, 12 Aug 2018 11:01:07 +0800 Subject: [PATCH 02/13] update READE.md --- 
 README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/README.md b/README.md
index b2ee765..cef167d 100644
--- a/README.md
+++ b/README.md
@@ -73,8 +73,7 @@ Prometheus操作指南:云原生监控之道
 * [Monitoring a Kubernetes Cluster](./kubernetes/use-prometheus-monitor-kubernetes.md)
 * [Creating Dashboards with Grafana](./kubernetes/use-grafana-in-k8s.md)
 * [Using Prometheus Operator](./kubernetes/use-prometheus-operator.md)
- * [Managing Prometheus with the Operator](./kubernetes/use-operator-manage-prometheus.md)
- * [Prometheus Operator Architecture](./kubernetes/operator-architecture.md)
+ * [Managing Prometheus with the Operator](./kubernetes/use-operator-manage-prometheus.md)
 * [Monitoring User Applications with Prometheus Operator](./kubernetes/use-operator-monitor-app.md)
 * [Monitoring the Cluster with Prometheus Operator](./kubernetes/use-operator-monitor-app.md)
 * [Alert Handling with Prometheus Operator](./kubernetes/use-operator-alerting.md)

From bc3edc11c444079a22b8ca88e4182b825ab2193f Mon Sep 17 00:00:00 2001
From: "yunl.zheng"
Date: Sun, 12 Aug 2018 11:14:25 +0800
Subject: [PATCH 03/13] update

---
 README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/README.md b/README.md
index cef167d..c16c45b 100644
--- a/README.md
+++ b/README.md
@@ -72,8 +72,7 @@ Prometheus操作指南:云原生监控之道
 * [Service Discovery in Kubernetes](./kubernetes/service-discovery-with-kubernetes.md)
 * [Monitoring a Kubernetes Cluster](./kubernetes/use-prometheus-monitor-kubernetes.md)
 * [Creating Dashboards with Grafana](./kubernetes/use-grafana-in-k8s.md)
- * [Using Prometheus Operator](./kubernetes/use-prometheus-operator.md)
- * [Managing Prometheus with the Operator](./kubernetes/use-operator-manage-prometheus.md)
+ * [Managing Prometheus with the Operator](./kubernetes/use-operator-manage-prometheus.md)
 * [Monitoring User Applications with Prometheus Operator](./kubernetes/use-operator-monitor-app.md)
 * [Monitoring the Cluster with Prometheus Operator](./kubernetes/use-operator-monitor-app.md)
 * [Alert Handling with Prometheus Operator](./kubernetes/use-operator-alerting.md)

From b2c9cb57d4218f82a7d7cf53af3298a12d9f9179 Mon Sep 17 00:00:00 2001
From: "yunl.zheng"
Date: Sun, 12 Aug 2018 18:05:14 +0800
Subject: [PATCH 04/13] use operator 
monitor app --- .../prometheus-operator/00prometheus.yaml | 9 + .../prometheus-operator/01prometheus.yaml | 12 ++ .../example-app-monitor.yaml | 12 ++ examples/prometheus-operator/example-app.yaml | 30 ++++ .../prometheus-operator.yaml | 129 ++++++++++++++ .../prometheus-rbac-setup.yaml | 36 ++++ .../prometheus-operator/prometheus-svc.yaml | 13 ++ kubernetes/use-operator-monitor-app.md | 163 +++++++++++++++++- 8 files changed, 400 insertions(+), 4 deletions(-) create mode 100644 examples/prometheus-operator/00prometheus.yaml create mode 100644 examples/prometheus-operator/01prometheus.yaml create mode 100644 examples/prometheus-operator/example-app-monitor.yaml create mode 100644 examples/prometheus-operator/example-app.yaml create mode 100644 examples/prometheus-operator/prometheus-operator.yaml create mode 100644 examples/prometheus-operator/prometheus-rbac-setup.yaml create mode 100644 examples/prometheus-operator/prometheus-svc.yaml diff --git a/examples/prometheus-operator/00prometheus.yaml b/examples/prometheus-operator/00prometheus.yaml new file mode 100644 index 0000000..271176a --- /dev/null +++ b/examples/prometheus-operator/00prometheus.yaml @@ -0,0 +1,9 @@ +apiVersion: monitoring.coreos.com/v1 +kind: Prometheus +metadata: + name: prometheus + labels: + prometheus: prometheus +spec: + replicas: 2 + serviceAccountName: prometheus \ No newline at end of file diff --git a/examples/prometheus-operator/01prometheus.yaml b/examples/prometheus-operator/01prometheus.yaml new file mode 100644 index 0000000..ef74d44 --- /dev/null +++ b/examples/prometheus-operator/01prometheus.yaml @@ -0,0 +1,12 @@ +apiVersion: monitoring.coreos.com/v1 +kind: Prometheus +metadata: + name: prometheus + labels: + prometheus: prometheus +spec: + replicas: 2 + serviceAccountName: prometheus + serviceMonitorSelector: + matchLabels: + team: frontend \ No newline at end of file diff --git a/examples/prometheus-operator/example-app-monitor.yaml 
b/examples/prometheus-operator/example-app-monitor.yaml new file mode 100644 index 0000000..d2a68fc --- /dev/null +++ b/examples/prometheus-operator/example-app-monitor.yaml @@ -0,0 +1,12 @@ +apiVersion: monitoring.coreos.com/v1 +kind: ServiceMonitor +metadata: + name: example-app + labels: + team: frontend +spec: + selector: + matchLabels: + app: example-app + endpoints: + - port: web diff --git a/examples/prometheus-operator/example-app.yaml b/examples/prometheus-operator/example-app.yaml new file mode 100644 index 0000000..3ea42d8 --- /dev/null +++ b/examples/prometheus-operator/example-app.yaml @@ -0,0 +1,30 @@ +apiVersion: extensions/v1beta1 +kind: Deployment +metadata: + name: example-app +spec: + replicas: 3 + template: + metadata: + labels: + app: example-app + spec: + containers: + - name: example-app + image: fabxc/instrumented_app + ports: + - name: web + containerPort: 8080 +--- +kind: Service +apiVersion: v1 +metadata: + name: example-app + labels: + app: example-app +spec: + selector: + app: example-app + ports: + - name: web + port: 8080 \ No newline at end of file diff --git a/examples/prometheus-operator/prometheus-operator.yaml b/examples/prometheus-operator/prometheus-operator.yaml new file mode 100644 index 0000000..92c537d --- /dev/null +++ b/examples/prometheus-operator/prometheus-operator.yaml @@ -0,0 +1,129 @@ +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: prometheus-operator +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: prometheus-operator +subjects: +- kind: ServiceAccount + name: prometheus-operator + namespace: default +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: prometheus-operator +rules: +- apiGroups: + - apiextensions.k8s.io + resources: + - customresourcedefinitions + verbs: + - '*' +- apiGroups: + - monitoring.coreos.com + resources: + - alertmanagers + - prometheuses + - prometheuses/finalizers + - alertmanagers/finalizers 
+ - servicemonitors + - prometheusrules + verbs: + - '*' +- apiGroups: + - apps + resources: + - statefulsets + verbs: + - '*' +- apiGroups: + - "" + resources: + - configmaps + - secrets + verbs: + - '*' +- apiGroups: + - "" + resources: + - pods + verbs: + - list + - delete +- apiGroups: + - "" + resources: + - services + - endpoints + verbs: + - get + - create + - update +- apiGroups: + - "" + resources: + - nodes + verbs: + - list + - watch +- apiGroups: + - "" + resources: + - namespaces + verbs: + - list + - watch +--- +apiVersion: apps/v1beta2 +kind: Deployment +metadata: + labels: + k8s-app: prometheus-operator + name: prometheus-operator + namespace: default +spec: + replicas: 1 + selector: + matchLabels: + k8s-app: prometheus-operator + template: + metadata: + labels: + k8s-app: prometheus-operator + spec: + containers: + - args: + - --kubelet-service=kube-system/kubelet + - -logtostderr=true + - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1 + - --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.23.0 + image: quay.io/coreos/prometheus-operator:v0.23.0 + name: prometheus-operator + ports: + - containerPort: 8080 + name: http + resources: + limits: + cpu: 200m + memory: 200Mi + requests: + cpu: 100m + memory: 100Mi + securityContext: + allowPrivilegeEscalation: false + readOnlyRootFilesystem: true + nodeSelector: + beta.kubernetes.io/os: linux + securityContext: + runAsNonRoot: true + runAsUser: 65534 + serviceAccountName: prometheus-operator +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: prometheus-operator + namespace: default diff --git a/examples/prometheus-operator/prometheus-rbac-setup.yaml b/examples/prometheus-operator/prometheus-rbac-setup.yaml new file mode 100644 index 0000000..f6beb7a --- /dev/null +++ b/examples/prometheus-operator/prometheus-rbac-setup.yaml @@ -0,0 +1,36 @@ +apiVersion: v1 +kind: ServiceAccount +metadata: + name: prometheus +--- +apiVersion: 
rbac.authorization.k8s.io/v1beta1 +kind: ClusterRole +metadata: + name: prometheus +rules: +- apiGroups: [""] + resources: + - nodes + - services + - endpoints + - pods + verbs: ["get", "list", "watch"] +- apiGroups: [""] + resources: + - configmaps + verbs: ["get"] +- nonResourceURLs: ["/metrics"] + verbs: ["get"] +--- +apiVersion: rbac.authorization.k8s.io/v1beta1 +kind: ClusterRoleBinding +metadata: + name: prometheus +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: prometheus +subjects: +- kind: ServiceAccount + name: prometheus + namespace: default \ No newline at end of file diff --git a/examples/prometheus-operator/prometheus-svc.yaml b/examples/prometheus-operator/prometheus-svc.yaml new file mode 100644 index 0000000..5f4864b --- /dev/null +++ b/examples/prometheus-operator/prometheus-svc.yaml @@ -0,0 +1,13 @@ +apiVersion: v1 +kind: Service +metadata: + name: prometheus +spec: + ports: + - name: web + port: 9090 + targetPort: 9090 + protocol: TCP + selector: + prometheus: prometheus + type: ClusterIP \ No newline at end of file diff --git a/kubernetes/use-operator-monitor-app.md b/kubernetes/use-operator-monitor-app.md index 38120d1..3f05f7a 100644 --- a/kubernetes/use-operator-monitor-app.md +++ b/kubernetes/use-operator-monitor-app.md @@ -1,10 +1,165 @@ -## 使用Prometheus Operator监控集群 +# 使用Prometheus Operator监控用户应用 + +本小节将展示,如何通过Prometheus Operator部署Prometheus实例并且实现对部署在Kubernetes中应用程序的监控。 + +## 部署Prometheus Server + +为了能够让Prometheus实例能够正常的使用服务发现能力,我们首先需要基于Kubernetes的RBAC模型为Prometheus创建ServiceAccount并赋予相应的集群访问权限。如下所示: + +``` +apiVersion: v1 +kind: ServiceAccount +metadata: + name: prometheus +--- +apiVersion: rbac.authorization.k8s.io/v1beta1 +kind: ClusterRole +metadata: + name: prometheus +rules: +- apiGroups: [""] + resources: + - nodes + - services + - endpoints + - pods + verbs: ["get", "list", "watch"] +- apiGroups: [""] + resources: + - configmaps + verbs: ["get"] +- nonResourceURLs: ["/metrics"] + verbs: ["get"] +--- 
+apiVersion: rbac.authorization.k8s.io/v1beta1 +kind: ClusterRoleBinding +metadata: + name: prometheus +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: prometheus +subjects: +- kind: ServiceAccount + name: prometheus + namespace: default +``` + +将以上内容保存为prometheus-rbac-setup.yaml文件,并在Kubrnetes集群中创建相应的资源: + +``` +$ kubectl create -f prometheus-rbac-setup.yaml +serviceaccount "prometheus" created +clusterrole "prometheus" created +clusterrolebinding "prometheus" created +``` + +在上一小节中已经介绍过Prometheus Operator通过在Kubernetes下实现自定义资源类型,将原本需要手动管理和维护的工作,转换为声明式的管理方式,为了创建Prometheus实例,我们需要创建一个类型为Prometheus的资源,如下所示: + +``` +apiVersion: monitoring.coreos.com/v1 +kind: Prometheus +metadata: + name: prometheus + labels: + prometheus: prometheus +spec: + replicas: 2 + serviceAccountName: prometheus +``` + +将文件保存为prometheus.yaml,并且通过Kubectl命令行工具创建相关资源: + +``` +$ kubectl create -f prometheus.yaml +prometheus "prometheus" created +``` + +此时如果查看Prometheus Operator的日志的话,可以看到类似于以下内容: + +``` +level=info ts=2018-08-12T07:43:54.696691736Z caller=operator.go:893 component=prometheusoperator msg="sync prometheus" key=default/prometheus +``` + +Prometheus Operator监听到Prometheus资源的变化后,会通过Statefulset的方式自动创建Prometheus实例,如下所示: + +``` +$ kubectl get statefulsets +NAME DESIRED CURRENT AGE +prometheus-prometheus 2 2 4m +``` + +为了能够访问通过Prometheus Operator创建的Prometheus实例,需要定义相应的Service资源,如下所示: + +``` +apiVersion: v1 +kind: Service +metadata: + name: prometheus + namespace: default +spec: + ports: + - name: web + port: 9090 + protocol: TCP + targetPort: 9090 + selector: + prometheus: prometheus + type: NodePort +``` + +在Service创建完成后,用户可以通过浏览器访问到通过Prometheus Operator创建的实例: + +![Prometheus实例](http://p2n2em8ut.bkt.clouddn.com/prometheus-operator-instance.png) + +当然,如上所示,目前为止我们的Prometheus还没有包含任何的监控配置信息。 + +## 监控Kubernetes中部署的服务 ``` -git clone https://github.com/coreos/prometheus-operator.git -git checkout v0.22.1 +apiVersion: extensions/v1beta1 +kind: Deployment +metadata: + name: 
example-app +spec: + replicas: 3 + template: + metadata: + labels: + app: example-app + spec: + containers: + - name: example-app + image: fabxc/instrumented_app + ports: + - name: web + containerPort: 8080 +--- +kind: Service +apiVersion: v1 +metadata: + name: example-app + labels: + app: example-app +spec: + selector: + app: example-app + ports: + - name: web + port: 8080 ``` ``` -cd contrib/kube-prometheus/manifests +apiVersion: monitoring.coreos.com/v1 +kind: ServiceMonitor +metadata: + name: example-app + labels: + team: frontend +spec: + selector: + matchLabels: + app: example-app + endpoints: + - port: web ``` \ No newline at end of file From 064fb82ae075337da15e92fab9e75f73929e6bce Mon Sep 17 00:00:00 2001 From: "yunl.zheng" Date: Sun, 12 Aug 2018 19:06:20 +0800 Subject: [PATCH 05/13] update use operator monitor app --- kubernetes/use-operator-monitor-app.md | 58 +++++++++++++++++++++++++- 1 file changed, 57 insertions(+), 1 deletion(-) diff --git a/kubernetes/use-operator-monitor-app.md b/kubernetes/use-operator-monitor-app.md index 3f05f7a..7e93204 100644 --- a/kubernetes/use-operator-monitor-app.md +++ b/kubernetes/use-operator-monitor-app.md @@ -116,6 +116,8 @@ spec: ## 监控Kubernetes中部署的服务 +为了能够模拟应用监控的场景,首先需要在Kubernetes中安装一个测试应用,如下所示: + ``` apiVersion: extensions/v1beta1 kind: Deployment @@ -149,6 +151,18 @@ spec: port: 8080 ``` +将以上内容保存为example-app.yaml,并在Kubernetes中创建相应的资源: + +``` +$ kubectl create -f example-app.yaml +deployment "example-app" created +service "example-app" created +``` + +访问示例应用的8080端口下的/metrics路径可以获取该应用的监控样本数据。在Prometheus Operator下所有与Prometheus相关的操作都是通过自定义资源类型实现的,对于监控配置也是相同的方式,用户只需要通过ServiceMonitor声明监控目标,并且关联到Prometheus资源即可。 + +如下所示,定义类型为ServiceMonitor的资源对象,并且通过selector选择需要监控的目标服务标签: + ``` apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor @@ -162,4 +176,46 @@ spec: app: example-app endpoints: - port: web -``` \ No newline at end of file +``` + +将以上内容保存为example-app-monitor.yaml文件,并且创建相应的资源: + +``` +$ kubectl create -f 
example-app-monitor.yaml
+servicemonitor "example-app" created
+
+$ kubectl get servicemonitor
+NAME          AGE
+example-app   5s
+```
+
+To tell Prometheus to use this ServiceMonitor, update prometheus.yaml as follows:
+
+```
+apiVersion: monitoring.coreos.com/v1
+kind: Prometheus
+metadata:
+  name: prometheus
+  labels:
+    prometheus: prometheus
+spec:
+  replicas: 2
+  serviceAccountName: prometheus
+  serviceMonitorSelector:
+    matchLabels:
+      team: frontend
+```
+
+Adding serviceMonitorSelector to the Prometheus resource selects, by label, the ServiceMonitor resources to monitor. From then on, Prometheus Operator automatically generates the Prometheus scrape configuration from the matching ServiceMonitor resources and reloads it without recreating the Pods.
+
+Viewing the Prometheus configuration in the UI shows that Prometheus Operator has automatically created a scrape job named default/example-app/0, which collects metrics from the example application:
+
+![Automatically generated Prometheus configuration](http://p2n2em8ut.bkt.clouddn.com/prometheus-config-with-servermonitor.png)
+
+Open the Targets page to see all current monitoring targets
+
+![Monitoring targets](http://p2n2em8ut.bkt.clouddn.com/prometheus-operator-targets.png)
+
+So far, the Prometheus Operator custom resource types Prometheus and ServiceMonitor have been used to declare the Prometheus instances to deploy in the Kubernetes cluster and the corresponding monitoring configuration.
+
+By watching for changes to Prometheus and ServiceMonitor resources, Prometheus Operator automatically creates and manages the Prometheus configuration, which achieves declarative, automated management of Prometheus.
\ No newline at end of file

From b4d9825fd9e8c79e82efdf8a99ad03db80bbf501 Mon Sep 17 00:00:00 2001
From: "yunl.zheng"
Date: Sun, 12 Aug 2018 19:12:41 +0800
Subject: [PATCH 06/13] update servicemonitor content

---
 kubernetes/use-operator-monitor-app.md | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kubernetes/use-operator-monitor-app.md b/kubernetes/use-operator-monitor-app.md
index 7e93204..6c5866c 100644
--- a/kubernetes/use-operator-monitor-app.md
+++ b/kubernetes/use-operator-monitor-app.md
@@ -212,10 +212,8 @@ spec:
 
 ![Automatically generated Prometheus configuration](http://p2n2em8ut.bkt.clouddn.com/prometheus-config-with-servermonitor.png)
 
-Open the Targets page to see all current monitoring targets
+Open the Targets page to see all current monitoring targets:
 
 ![Monitoring targets](http://p2n2em8ut.bkt.clouddn.com/prometheus-operator-targets.png)
 
-So far, the Prometheus 
Operator custom resource types Prometheus and ServiceMonitor have been used to declare the Prometheus instances to deploy in the Kubernetes cluster and the corresponding monitoring configuration.
-
-By watching for changes to Prometheus and ServiceMonitor resources, Prometheus Operator automatically creates and manages the Prometheus configuration, which achieves declarative, automated management of Prometheus.
\ No newline at end of file
+So far, the Prometheus Operator custom resource types Prometheus and ServiceMonitor have been used to declare the Prometheus instances to deploy in the Kubernetes cluster and the corresponding monitoring configuration. By watching for changes to Prometheus and ServiceMonitor resources, the Operator automatically creates and manages the Prometheus configuration, which achieves declarative, automated management of Prometheus.
\ No newline at end of file

From 433312557fa3056487ae5ded8e6d8cdd00f93207 Mon Sep 17 00:00:00 2001
From: ylzheng
Date: Sun, 12 Aug 2018 20:28:21 +0800
Subject: [PATCH 07/13] Update use-operator-monitor-app.md

---
 kubernetes/use-operator-monitor-app.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kubernetes/use-operator-monitor-app.md b/kubernetes/use-operator-monitor-app.md
index 6c5866c..77cb099 100644
--- a/kubernetes/use-operator-monitor-app.md
+++ b/kubernetes/use-operator-monitor-app.md
@@ -216,4 +216,4 @@ spec:
 
 ![Monitoring targets](http://p2n2em8ut.bkt.clouddn.com/prometheus-operator-targets.png)
 
-So far, the Prometheus Operator custom resource types Prometheus and ServiceMonitor have been used to declare the Prometheus instances to deploy in the Kubernetes cluster and the corresponding monitoring configuration. By watching for changes to Prometheus and ServiceMonitor resources, the Operator automatically creates and manages the Prometheus configuration, which achieves declarative, automated management of Prometheus.
\ No newline at end of file
+So far, the Prometheus Operator custom resource types Prometheus and ServiceMonitor have been used to declare the Prometheus instances to deploy in the Kubernetes cluster and the corresponding monitoring configuration. By watching for changes to Prometheus and ServiceMonitor resources, the Operator automatically creates and manages the Prometheus configuration, which achieves declarative, automated management of Prometheus.

From 93b495b575917ac8abd5950cdab805b4ae9b6099 Mon Sep 17 00:00:00 2001
From: "yunl.zheng"
Date: Sun, 12 Aug 2018 21:15:40 +0800
Subject: [PATCH 08/13] alert with operator

---
 README.md | 1 -
 SUMMARY.md | 1 -
 .../prometheus-operator/02prometheus.yaml | 16 +++++
 .../prometheus-operator/03prometheus.yaml | 21 ++++++
 .../alertmanager-service.yaml | 14 ++++
 .../alertmanager-setup.yaml | 6 ++
 .../prometheus-operator/alertmanager-svc.yaml | 14 ++++
 .../prometheus-operator/alertmanager.yaml | 12 ++++
 .../example-app-monitor.yaml | 1 +
.../node-exporter-daemonset.yaml | 71 ++++++++++++++++++ kubernetes/use-operator-alerting.md | 72 ++++++++++++++++++- 11 files changed, 226 insertions(+), 3 deletions(-) create mode 100644 examples/prometheus-operator/02prometheus.yaml create mode 100644 examples/prometheus-operator/03prometheus.yaml create mode 100644 examples/prometheus-operator/alertmanager-service.yaml create mode 100644 examples/prometheus-operator/alertmanager-setup.yaml create mode 100644 examples/prometheus-operator/alertmanager-svc.yaml create mode 100644 examples/prometheus-operator/alertmanager.yaml create mode 100644 examples/prometheus-operator/node-exporter-daemonset.yaml diff --git a/README.md b/README.md index c16c45b..dc412e8 100644 --- a/README.md +++ b/README.md @@ -74,7 +74,6 @@ Prometheus操作指南:云原生监控之道 * [使用Grafana创建可视化仪表盘](./kubernetes/use-grafana-in-k8s.md) * [使用Opertor管理Prometheus](./kubernetes/use-operator-manage-prometheus.md) * [使用Prometheus Operator监控用户应用](./kubernetes/use-operator-monitor-app.md) - * [使用Prometheus Operator监控集群](./kubernetes/use-operator-monitor-app.md) * [Prometheus Operator下的告警处理](./kubernetes/use-operator-alerting.md) * [第9章 使用Prometheus监控Rancher集群](./rancher/README.md) * [参考资料](./REFERENCES.md) diff --git a/SUMMARY.md b/SUMMARY.md index 40ba1af..21d5879 100644 --- a/SUMMARY.md +++ b/SUMMARY.md @@ -79,7 +79,6 @@ * [使用Grafana创建可视化仪表盘](./kubernetes/use-grafana-in-k8s.md) * [使用Opertor管理Prometheus](./kubernetes/use-operator-manage-prometheus.md) * [使用Prometheus Operator监控用户应用](./kubernetes/use-operator-monitor-app.md) - * [使用Prometheus Operator监控集群](./kubernetes/use-operator-monitor-app.md) * [Prometheus Operator下的告警处理](./kubernetes/use-operator-alerting.md) * [小结](./kubernetes/SUMMARY.md) * [第9章 使用Prometheus监控Rancher集群](./rancher/README.md) diff --git a/examples/prometheus-operator/02prometheus.yaml b/examples/prometheus-operator/02prometheus.yaml new file mode 100644 index 0000000..6195e03 --- /dev/null +++ b/examples/prometheus-operator/02prometheus.yaml 
@@ -0,0 +1,16 @@ +apiVersion: monitoring.coreos.com/v1 +kind: Prometheus +metadata: + name: prometheus + labels: + prometheus: prometheus +spec: + replicas: 2 + serviceAccountName: prometheus + serviceMonitorNamespaceSelector: + matchExpressions: + - {} + serviceMonitorSelector: + matchExpressions: + - key: k8s-app + operator: Exists \ No newline at end of file diff --git a/examples/prometheus-operator/03prometheus.yaml b/examples/prometheus-operator/03prometheus.yaml new file mode 100644 index 0000000..5401a40 --- /dev/null +++ b/examples/prometheus-operator/03prometheus.yaml @@ -0,0 +1,21 @@ +apiVersion: monitoring.coreos.com/v1 +kind: Prometheus +metadata: + name: prometheus + labels: + prometheus: prometheus +spec: + replicas: 2 + serviceAccountName: prometheus + serviceMonitorSelector: + matchLabels: + team: frontend + alerting: + alertmanagers: + - namespace: default + name: alertmanager-example + port: web + ruleSelector: + matchLabels: + role: alert-rules + prometheus: example \ No newline at end of file diff --git a/examples/prometheus-operator/alertmanager-service.yaml b/examples/prometheus-operator/alertmanager-service.yaml new file mode 100644 index 0000000..05bbbbc --- /dev/null +++ b/examples/prometheus-operator/alertmanager-service.yaml @@ -0,0 +1,14 @@ +apiVersion: v1 +kind: Service +metadata: + name: alertmanager-example +spec: + type: NodePort + ports: + - name: web + nodePort: 30903 + port: 9093 + protocol: TCP + targetPort: web + selector: + alertmanager: example \ No newline at end of file diff --git a/examples/prometheus-operator/alertmanager-setup.yaml b/examples/prometheus-operator/alertmanager-setup.yaml new file mode 100644 index 0000000..f2ffb36 --- /dev/null +++ b/examples/prometheus-operator/alertmanager-setup.yaml @@ -0,0 +1,6 @@ +apiVersion: monitoring.coreos.com/v1 +kind: Alertmanager +metadata: + name: example +spec: + replicas: 3 diff --git a/examples/prometheus-operator/alertmanager-svc.yaml 
b/examples/prometheus-operator/alertmanager-svc.yaml new file mode 100644 index 0000000..05bbbbc --- /dev/null +++ b/examples/prometheus-operator/alertmanager-svc.yaml @@ -0,0 +1,14 @@ +apiVersion: v1 +kind: Service +metadata: + name: alertmanager-example +spec: + type: NodePort + ports: + - name: web + nodePort: 30903 + port: 9093 + protocol: TCP + targetPort: web + selector: + alertmanager: example \ No newline at end of file diff --git a/examples/prometheus-operator/alertmanager.yaml b/examples/prometheus-operator/alertmanager.yaml new file mode 100644 index 0000000..5e102e1 --- /dev/null +++ b/examples/prometheus-operator/alertmanager.yaml @@ -0,0 +1,12 @@ +global: + resolve_timeout: 5m +route: + group_by: ['job'] + group_wait: 30s + group_interval: 5m + repeat_interval: 12h + receiver: 'webhook' +receivers: +- name: 'webhook' + webhook_configs: + - url: 'http://alertmanagerwh:30500/' \ No newline at end of file diff --git a/examples/prometheus-operator/example-app-monitor.yaml b/examples/prometheus-operator/example-app-monitor.yaml index d2a68fc..225279d 100644 --- a/examples/prometheus-operator/example-app-monitor.yaml +++ b/examples/prometheus-operator/example-app-monitor.yaml @@ -4,6 +4,7 @@ metadata: name: example-app labels: team: frontend + k8s-app: example-app spec: selector: matchLabels: diff --git a/examples/prometheus-operator/node-exporter-daemonset.yaml b/examples/prometheus-operator/node-exporter-daemonset.yaml new file mode 100644 index 0000000..06ae111 --- /dev/null +++ b/examples/prometheus-operator/node-exporter-daemonset.yaml @@ -0,0 +1,71 @@ +apiVersion: apps/v1beta2 +kind: DaemonSet +metadata: + labels: + app: node-exporter + name: node-exporter +spec: + selector: + matchLabels: + app: node-exporter + template: + metadata: + labels: + app: node-exporter + spec: + containers: + - args: + - --web.listen-address=127.0.0.1:9100 + - --path.procfs=/host/proc + - --path.sysfs=/host/sys + image: quay.io/prometheus/node-exporter:v0.15.2 + name: 
node-exporter + volumeMounts: + - mountPath: /host/proc + name: proc + readOnly: false + - mountPath: /host/sys + name: sys + readOnly: false + hostNetwork: true + hostPID: true + nodeSelector: + beta.kubernetes.io/os: linux + volumes: + - hostPath: + path: /proc + name: proc + - hostPath: + path: /sys + name: sys +--- +apiVersion: v1 +kind: Service +metadata: + labels: + k8s-app: node-exporter + name: node-exporter +spec: + type: ClusterIP + ports: + - name: https + port: 9100 + targetPort: https + selector: + app: node-exporter +--- +apiVersion: monitoring.coreos.com/v1 +kind: ServiceMonitor +metadata: + labels: + k8s-app: node-exporter + name: node-exporter +spec: + jobLabel: k8s-app + endpoints: + - interval: 30s + port: https + selector: + matchLabels: + k8s-app: node-exporter + diff --git a/kubernetes/use-operator-alerting.md b/kubernetes/use-operator-alerting.md index 508c9b0..ed4f8bb 100644 --- a/kubernetes/use-operator-alerting.md +++ b/kubernetes/use-operator-alerting.md @@ -1 +1,71 @@ -## Prometheus Operator下的告警处理 \ No newline at end of file +# Prometheus Operator下的告警处理 + +``` +apiVersion: monitoring.coreos.com/v1 +kind: Alertmanager +metadata: + name: example +spec: + replicas: 3 +``` + +``` +global: + resolve_timeout: 5m +route: + group_by: ['job'] + group_wait: 30s + group_interval: 5m + repeat_interval: 12h + receiver: 'webhook' +receivers: +- name: 'webhook' + webhook_configs: + - url: 'http://alertmanagerwh:30500/' +``` + +``` +$ kubectl create secret generic alertmanager-example --from-file=alertmanager.yaml +``` + +``` +apiVersion: v1 +kind: Service +metadata: + name: alertmanager-example +spec: + type: NodePort + ports: + - name: web + nodePort: 30903 + port: 9093 + protocol: TCP + targetPort: web + selector: + alertmanager: example +``` + +``` +$ kubectl create -f alertmanager-service.yaml +``` + +``` +$ kubectl apply -f prometheus.yaml +``` + +``` +apiVersion: monitoring.coreos.com/v1 +kind: PrometheusRule +metadata: + creationTimestamp: null 
+  labels:
+    prometheus: example
+    role: alert-rules
+  name: prometheus-example-rules
+spec:
+  groups:
+  - name: ./example.rules
+    rules:
+    - alert: ExampleAlert
+      expr: vector(1)
+```
\ No newline at end of file
From f7e1e87e4116798258deee783119f05e77669837 Mon Sep 17 00:00:00 2001
From: "yunl.zheng"
Date: Sun, 12 Aug 2018 22:53:19 +0800
Subject: [PATCH 09/13] use operator deploy alertmanager

---
 README.md                                     |  2 +-
 SUMMARY.md                                    |  2 +-
 .../prometheus-operator/02prometheus.yaml     | 12 +--
 .../prometheus-operator/example-rule.yaml     | 13 +++
 kubernetes/use-operator-alerting.md           | 88 +++++++++++++++----
 kubernetes/use-operator-monitor-app.md        | 58 +++++++++++-
 6 files changed, 147 insertions(+), 28 deletions(-)
 create mode 100644 examples/prometheus-operator/example-rule.yaml

diff --git a/README.md b/README.md
index dc412e8..ab826d3 100644
--- a/README.md
+++ b/README.md
@@ -74,6 +74,6 @@ Prometheus Operations Guide: The Way of Cloud-Native Monitoring
     * [Creating Visual Dashboards with Grafana](./kubernetes/use-grafana-in-k8s.md)
     * [Managing Prometheus with the Operator](./kubernetes/use-operator-manage-prometheus.md)
     * [Monitoring User Applications with Prometheus Operator](./kubernetes/use-operator-monitor-app.md)
-    * [Alert Handling with Prometheus Operator](./kubernetes/use-operator-alerting.md)
+    * [Managing Alertmanager with Prometheus Operator](./kubernetes/use-operator-alerting.md)
 * [Chapter 9: Monitoring Rancher Clusters with Prometheus](./rancher/README.md)
 * [References](./REFERENCES.md)
diff --git a/SUMMARY.md b/SUMMARY.md
index 21d5879..4629a1d 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -79,7 +79,7 @@
     * [Creating Visual Dashboards with Grafana](./kubernetes/use-grafana-in-k8s.md)
     * [Managing Prometheus with the Operator](./kubernetes/use-operator-manage-prometheus.md)
     * [Monitoring User Applications with Prometheus Operator](./kubernetes/use-operator-monitor-app.md)
-    * [Alert Handling with Prometheus Operator](./kubernetes/use-operator-alerting.md)
+    * [Managing Alertmanager with Prometheus Operator](./kubernetes/use-operator-alerting.md)
     * [Summary](./kubernetes/SUMMARY.md)
 * [Chapter 9: Monitoring Rancher Clusters with Prometheus](./rancher/README.md)
 * [References](./REFERENCES.md)
diff --git a/examples/prometheus-operator/02prometheus.yaml
b/examples/prometheus-operator/02prometheus.yaml
index 6195e03..9b3b701 100644
--- a/examples/prometheus-operator/02prometheus.yaml
+++ b/examples/prometheus-operator/02prometheus.yaml
@@ -7,10 +7,10 @@ metadata:
 spec:
   replicas: 2
   serviceAccountName: prometheus
-  serviceMonitorNamespaceSelector:
-    matchExpressions:
-    - {}
   serviceMonitorSelector:
-    matchExpressions:
-    - key: k8s-app
-      operator: Exists
\ No newline at end of file
+    matchLabels:
+      team: frontend
+  ruleSelector:
+    matchLabels:
+      role: alert-rules
+      prometheus: example
\ No newline at end of file
diff --git a/examples/prometheus-operator/example-rule.yaml b/examples/prometheus-operator/example-rule.yaml
new file mode 100644
index 0000000..3b01365
--- /dev/null
+++ b/examples/prometheus-operator/example-rule.yaml
@@ -0,0 +1,13 @@
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  labels:
+    prometheus: example
+    role: alert-rules
+  name: prometheus-example-rules
+spec:
+  groups:
+  - name: ./example.rules
+    rules:
+    - alert: ExampleAlert
+      expr: vector(1)
\ No newline at end of file
diff --git a/kubernetes/use-operator-alerting.md b/kubernetes/use-operator-alerting.md
index ed4f8bb..b341abb 100644
--- a/kubernetes/use-operator-alerting.md
+++ b/kubernetes/use-operator-alerting.md
@@ -1,4 +1,6 @@
-# Alert Handling with Prometheus Operator
+# Managing Alertmanager with Prometheus Operator
+
+To manage Alertmanager instances through Prometheus Operator, users define them with the custom resource Alertmanager, as shown below; the replicas field controls the number of Alertmanager instances:
 
 ```
 apiVersion: monitoring.coreos.com/v1
@@ -9,6 +11,31 @@ spec:
   replicas: 3
 ```
 
+When replicas is greater than 1, Prometheus Operator automatically creates Alertmanager in cluster mode. Save the content above as the file alertmanager-setup.yaml and create it with the following command:
+
+```
+$ kubectl create -f alertmanager-setup.yaml
+alertmanager "example" created
+```
+
+Checking the Pods, as shown below, we find that the Alertmanager Pod stays in the ContainerCreating state:
+
+```
+$ kubectl get pods
+NAME                     READY   STATUS              RESTARTS   AGE
+alertmanager-example-0   0/2     ContainerCreating   0          4m
+```
+
+Inspecting the Pod with the kubectl describe command shows the following:
+
+```
+$ kubectl describe
pods alertmanager-example-0
+...
+Warning FailedMount 4s (x2 over 2m) kubelet, cn-beijing.i-2ze52j61t5p9z4n60c9m Unable to mount volumes for pod "alertmanager-example-0_default(f75aff5c-9e37-11e8-9dc5-00163e124757)": timeout expired waiting for volumes to attach or mount for pod "default"/"alertmanager-example-0". list of unmounted volumes=[config-volume]. list of unattached volumes=[config-volume alertmanager-example-db default-token-tzpfg]
+```
+
+Prometheus Operator creates Alertmanager instances as a StatefulSet. By default, an Alertmanager instance looks up a Secret named according to the `alertmanager-{ALERTMANAGER_NAME}` convention and mounts the content of that Secret into the instance as its configuration file. We therefore still need to create the corresponding configuration for Alertmanager. The Alertmanager configuration file is shown below:
+
 ```
 global:
   resolve_timeout: 5m
@@ -24,10 +51,24 @@ receivers:
   - url: 'http://alertmanagerwh:30500/'
 ```
 
+Save the content above as the file alertmanager.yaml, and create a Secret named alertmanager-example with the following command:
+
 ```
 $ kubectl create secret generic alertmanager-example --from-file=alertmanager.yaml
+secret "alertmanager-example" created
 ```
 
+Once the Secret has been created, check the status of the Alertmanager Pods, as shown below:
+
+```
+$ kubectl get pods
+alertmanager-example-0   2/2   Running   0   37m
+alertmanager-example-1   2/2   Running   0   31m
+alertmanager-example-2   2/2   Running   0   31m
+```
+
+To access these Alertmanager instances, we need to create a corresponding Service, as shown below:
+
 ```
 apiVersion: v1
 kind: Service
@@ -45,27 +86,38 @@ spec:
     alertmanager: example
 ```
 
-```
-$ kubectl create -f alertmanager-service.yaml
-```
+Open the Alertmanager UI and check the current cluster status:
 
-```
-$ kubectl apply -f prometheus.yaml
-```
+![Alertmanager cluster status](http://p2n2em8ut.bkt.clouddn.com/prometheus-alert-cluster-status.png)
+
+Next, we only need to modify our Prometheus resource definition and use alerting to specify which Alertmanager resource to use:
 
 ```
 apiVersion: monitoring.coreos.com/v1
-kind: PrometheusRule
+kind: Prometheus
 metadata:
-  creationTimestamp: null
+  name: prometheus
   labels:
-    prometheus: example
-    role: alert-rules
-  name: prometheus-example-rules
+    prometheus: prometheus
 spec:
-  groups:
-  - name: ./example.rules
-    rules:
-    - alert: ExampleAlert
-      expr: vector(1)
-```
\ No newline at end of file
+
  replicas: 2
+  serviceAccountName: prometheus
+  serviceMonitorSelector:
+    matchLabels:
+      team: frontend
+  alerting:
+    alertmanagers:
+    - namespace: default
+      name: alertmanager-example
+      port: web
+  ruleSelector:
+    matchLabels:
+      role: alert-rules
+      prometheus: example
+```
+
+After Prometheus finishes reloading its configuration, the latest Prometheus configuration can be viewed in the UI, as shown below:
+
+![Prometheus configuration]](http://p2n2em8ut.bkt.clouddn.com/prometheus-alerting-auto.png)
+
+At this point, using the custom resources provided by Prometheus Operator, we have declaratively created and managed the Prometheus instances and the Alertmanager cluster.
\ No newline at end of file
diff --git a/kubernetes/use-operator-monitor-app.md b/kubernetes/use-operator-monitor-app.md
index 77cb099..44d8b16 100644
--- a/kubernetes/use-operator-monitor-app.md
+++ b/kubernetes/use-operator-monitor-app.md
@@ -2,7 +2,7 @@
 
 This section shows how to deploy a Prometheus instance through Prometheus Operator and monitor applications deployed in Kubernetes.
 
-## Deploying Prometheus Server
+## Deploying a Prometheus Instance
 
 For the Prometheus instance to use service discovery properly, we first need to create a ServiceAccount for Prometheus based on the Kubernetes RBAC model and grant it the corresponding cluster access permissions, as shown below:
 
@@ -114,7 +114,7 @@ spec:
 
 Of course, as shown above, our Prometheus does not yet contain any monitoring configuration.
 
-## Monitoring Services Deployed in Kubernetes
+## Managing Monitoring Targets with ServiceMonitor
 
 To simulate an application monitoring scenario, we first need to install a test application in Kubernetes, as shown below:
 
@@ -216,4 +216,58 @@ spec:
 
 ![Monitored targets](http://p2n2em8ut.bkt.clouddn.com/prometheus-operator-targets.png)
 
+## Managing Alerting Rules with PrometheusRule
+
+With the traditional way of managing Prometheus, we also have to maintain the alerting rule files by hand and manually notify Prometheus to reload them whenever they change. With Prometheus Operator, we only need to declare the rules through the custom resource type PrometheusRule:
+
+```
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  labels:
+    prometheus: example
+    role: alert-rules
+  name: prometheus-example-rules
+spec:
+  groups:
+  - name: ./example.rules
+    rules:
+    - alert: ExampleAlert
+      expr: vector(1)
+```
+
+Save the content above as example-rule.yaml and create the corresponding resource with kubectl:
+
+```
+$ kubectl create -f example-rule.yaml
+prometheusrule "prometheus-example-rules" created
+```
+
+Once the alerting rule has been created, use ruleSelector to select the PrometheusRule resources to associate:
+
+```
+apiVersion: monitoring.coreos.com/v1
+kind: Prometheus
+metadata:
+  name: prometheus
+  labels:
+    prometheus:
prometheus
+spec:
+  replicas: 2
+  serviceAccountName: prometheus
+  serviceMonitorSelector:
+    matchLabels:
+      team: frontend
+  ruleSelector:
+    matchLabels:
+      role: alert-rules
+      prometheus: example
+```
+
+After Prometheus reloads its configuration, the alerting rules created automatically from the PrometheusRule can be seen in the UI:
+
+![Prometheus alerting rules](http://p2n2em8ut.bkt.clouddn.com/prometheus-rule.png)
+
 So far, we have used the custom resource types Prometheus and ServiceMonitor defined by Prometheus Operator to declare the Prometheus instances to deploy in the Kubernetes cluster and their monitoring configuration. By watching for changes to Prometheus and ServiceMonitor resources, the Operator automatically creates and manages the Prometheus configuration, which gives us declarative, automated management of Prometheus.
+
+So far, we have used the custom resource types of Prometheus Operator to manage Prometheus instances, monitoring configuration, and alerting rules. Prometheus Operator turns work that used to be done by hand into declarative management, which greatly reduces the complexity of operating Prometheus on Kubernetes. Next, we will continue to use Prometheus Operator to define and manage Alertmanager.
From affb9cf8f60fd4d97e7c310b69aab08169cd4839 Mon Sep 17 00:00:00 2001
From: "yunl.zheng"
Date: Sun, 12 Aug 2018 23:15:47 +0800
Subject: [PATCH 10/13] fixed image ref error

---
 kubernetes/use-operator-alerting.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kubernetes/use-operator-alerting.md b/kubernetes/use-operator-alerting.md
index b341abb..a05f4c9 100644
--- a/kubernetes/use-operator-alerting.md
+++ b/kubernetes/use-operator-alerting.md
@@ -118,6 +118,6 @@ spec:
 
 After Prometheus finishes reloading its configuration, the latest Prometheus configuration can be viewed in the UI, as shown below:
 
-![Prometheus configuration]](http://p2n2em8ut.bkt.clouddn.com/prometheus-alerting-auto.png)
+![Prometheus configuration](http://p2n2em8ut.bkt.clouddn.com/prometheus-alerting-auto2.png)
 
 At this point, using the custom resources provided by Prometheus Operator, we have declaratively created and managed the Prometheus instances and the Alertmanager cluster.
\ No newline at end of file
From 55bb2a8d069f50f012398bf8228a9b0588568eee Mon Sep 17 00:00:00 2001
From: "yunl.zheng"
Date: Sun, 19 Aug 2018 11:03:59 +0800
Subject: [PATCH 11/13] remove unused content

---
 kubernetes/use-operator-manage-prometheus.md | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/kubernetes/use-operator-manage-prometheus.md b/kubernetes/use-operator-manage-prometheus.md
index 9bfeed6..e78c715 100644
---
a/kubernetes/use-operator-manage-prometheus.md
+++ b/kubernetes/use-operator-manage-prometheus.md
@@ -108,15 +108,4 @@ level=info ts=2018-08-12T02:57:38.617738839Z caller=operator.go:1338 component=p
 level=info ts=2018-08-12T02:57:38.710804217Z caller=operator.go:1338 component=prometheusoperator msg="CRD updated" crd=PrometheusRule
 level=info ts=2018-08-12T02:57:41.622981601Z caller=operator.go:192 component=alertmanageroperator msg="CRD API endpoints ready"
 level=info ts=2018-08-12T02:57:47.755480463Z caller=operator.go:330 component=prometheusoperator msg="CRD API endpoints ready"
-```
-
-List the custom resource definitions in the cluster:
-
-```
-$ kubectl get customresourcedefinition
-NAME                                    AGE
-alertmanagers.monitoring.coreos.com     6d
-prometheuses.monitoring.coreos.com      6d
-prometheusrules.monitoring.coreos.com   6d
-servicemonitors.monitoring.coreos.com   6d
 ```
\ No newline at end of file
From 8b10ce021fab4c83dbe93eb0e10f3143d5110cf7 Mon Sep 17 00:00:00 2001
From: wilelm
Date: Wed, 19 Sep 2018 15:19:07 +0800
Subject: [PATCH 12/13] Update prometheus-promql-operators-v2.md

update, bytes / 1024 is KB, not MB, correct it
---
 promql/prometheus-promql-operators-v2.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/promql/prometheus-promql-operators-v2.md b/promql/prometheus-promql-operators-v2.md
index c58b3e5..10031d3 100644
--- a/promql/prometheus-promql-operators-v2.md
+++ b/promql/prometheus-promql-operators-v2.md
@@ -7,7 +7,7 @@
 For example, the metric node_memory_free_bytes_total gives the amount of free memory on the current host, with samples in bytes. If a client then asks for the data in MB, we only need to convert the units of the sample values of the queried time series:
 
 ```
-node_memory_free_bytes_total / 1024
+node_memory_free_bytes_total / (1024 * 1024)
 ```
 
 The expression node_memory_free_bytes_total returns all time series that match it. In the previous section we called such an expression an instant vector expression, and the result it returns an instant vector.
@@ -204,4 +204,4 @@ method_code:http_errors:rate5m / ignoring(code) group_left method:http_requests:
 {method="post", code="404"} 0.175 // 21 / 120
 ```
 
-> Note: the group modifiers can only be used with comparison and arithmetic operators. The logical operations and, unless, and or match against all elements in the right-hand vector by default.
\ No newline at end of file
+>
Note: the group modifiers can only be used with comparison and arithmetic operators. The logical operations and, unless, and or match against all elements in the right-hand vector by default.
From ac6e1c65c8349f235eba22b4784045a6fe1a8330 Mon Sep 17 00:00:00 2001
From: "yunl.zheng"
Date: Tue, 9 Oct 2018 23:11:49 +0800
Subject: [PATCH 13/13] add hap-with-prometheus

---
 SUMMARY.md                        | 3 +--
 grafana/grafana-intro.md          | 4 +---
 kubernetes/hap-with-prometheus.md | 3 +++
 3 files changed, 5 insertions(+), 5 deletions(-)
 create mode 100644 kubernetes/hap-with-prometheus.md

diff --git a/SUMMARY.md b/SUMMARY.md
index 4629a1d..57b9409 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -52,8 +52,6 @@
     * [Tables: the Table panel](./grafana/use_table_panel.md)
     * [Templating dashboards](./grafana/templating.md)
     * [Sharing dashboards](./grafana/share_dashboard.md)
-    * Alerting
-    * Team and permission management
     * [Summary](./grafana/SUMMARY.md)
 * [Chapter 6: Clustering and High Availability](./ha/READMD.md)
     * [Local storage](./ha/prometheus-local-storage.md)
@@ -80,6 +78,7 @@
     * [Managing Prometheus with the Operator](./kubernetes/use-operator-manage-prometheus.md)
     * [Monitoring User Applications with Prometheus Operator](./kubernetes/use-operator-monitor-app.md)
     * [Managing Alertmanager with Prometheus Operator](./kubernetes/use-operator-alerting.md)
+    * [Autoscaling Based on Prometheus](./kubernetes/hap-with-prometheus.md)
     * [Summary](./kubernetes/SUMMARY.md)
 * [Chapter 9: Monitoring Rancher Clusters with Prometheus](./rancher/README.md)
 * [References](./REFERENCES.md)
diff --git a/grafana/grafana-intro.md b/grafana/grafana-intro.md
index 323c83f..8b5caaa 100644
--- a/grafana/grafana-intro.md
+++ b/grafana/grafana-intro.md
@@ -38,6 +38,4 @@ Grafana provides a multi-tenant-like model through organizations (Organization). Orga
 
 Multiple users and teams can be added to an Organization. Grafana uses a role-and-permission model to manage access to dashboards, with three built-in roles: Viewer, Editor, and Admin. For an individual dashboard, an Admin user can assign permissions to other users (User) or teams (Team).
 
-For a group of related dashboards, recent versions of Grafana also provide folders (Folder) to manage their permissions together.
-
-> TODO: add a figure showing how dashboards are organized.
\ No newline at end of file
+For a group of related dashboards, recent versions of Grafana also provide folders (Folder) to manage their permissions together.
\ No newline at end of file
diff --git a/kubernetes/hap-with-prometheus.md b/kubernetes/hap-with-prometheus.md
new file mode 100644
index 0000000..003b192
--- /dev/null
+++ b/kubernetes/hap-with-prometheus.md
@@
-0,0 +1,3 @@
+# Autoscaling Based on Prometheus
+
+Autoscaling is the ability of an application to automatically scale out or in horizontally based on its current resource usage.
\ No newline at end of file
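
Reviewer note, not part of the patch: the definition added in the last hunk could be illustrated with a HorizontalPodAutoscaler manifest. The following is only a minimal sketch under stated assumptions. The Deployment name example-app, the replica bounds, and the 60% CPU target are hypothetical values chosen for illustration. The autoscaling/v2 API shown here is available in recent Kubernetes releases; clusters contemporary with this patch series would likely have used a beta version of the API instead.

```yaml
# Hypothetical example: scale the Deployment "example-app" between 2 and 10
# replicas, aiming for an average CPU utilization of 60% across its Pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app            # assumed name, not taken from the patch
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app          # assumed target workload
  minReplicas: 2               # illustrative lower bound
  maxReplicas: 10              # illustrative upper bound
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60 # illustrative threshold
```

Scaling on Prometheus metrics rather than CPU would additionally need a metrics adapter in the cluster, which this sketch does not cover.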