A Prometheus + Thanos Cluster Solution

1. Prometheus Shortcomings and Limitations

First, let's lay out the problems with the current single-instance Prometheus architecture.

Single point of failure

A single Prometheus instance is a single point of failure, and the availability of the monitoring system itself matters just as much as that of the systems it monitors.

Data capacity

By default, samples stored in the local TSDB are retained for only 15 days. The retention is configurable via --storage.tsdb.retention.time, and the larger the value, the more disk space is needed. Even though we can always expand a single node's disk to keep metrics longer, the data is still not safe on one machine, and that is a serious problem in scenarios where metrics must be kept long-term for analysis and statistics.

2. Prometheus Federation

Federation is a clustering approach built into Prometheus; its core mechanism is cascading scrapes. For example, a company may run N Kubernetes clusters or operate N data centers; each data center (k8s cluster) gets its own Prometheus instance, and at that point the data in these instances is isolated from one another.

Here we could create a Prometheus data source for each instance in Grafana or Nightingale and switch dashboards to view data for different data centers (k8s clusters), but in essence the data across the Prometheus instances is still not connected.

Federation can pull data from the different instances together into one central instance, at which point the central instance becomes the bottleneck (single point, capacity). To ease the central instance's capacity pressure, it should be configured to scrape only the metrics that need aggregate computation or that other teams also care about; the bulk of the metrics stays down in the edge instances, letting them absorb most of the metric volume.

Example configuration:

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 30s
    # The central node scrapes metrics from the edge nodes' /federate endpoint
    metrics_path: '/federate'
    # On label conflicts, keep the labels already present in the scraped (source) data
    honor_labels: true
    params:
      # Regex-match all metrics whose names start with aggr: (our naming convention for recording rules)
      'match[]':
        - '{__name__=~"aggr:.*"}'
    static_configs:
      - targets:
          - 'prometheus-edge1:9090'
          - 'prometheus-edge2:9090'

When to use federation

  • The edge Prometheus instances absorb the vast majority of the metric data and cover day-to-day alerting and dashboard needs
  • The central instance scrapes only the small set of metrics that need cross-instance aggregation, thereby staying clear of Prometheus's capacity ceiling

In short, federation does not solve the single-point and capacity problems Prometheus has to begin with; it only offers a simple way (a technique) to bridge data across instances. That is why, in practice, a remote-storage solution is the more common choice.

3. Prometheus Remote Storage

Thanos

The core characteristic of Thanos is using object storage for massive time-series retention; this can be a public-cloud OSS offering or a private Minio deployment.

Thanos integrates with Prometheus in either sidecar or receiver mode, and consists mainly of the following components:

  • Sidecar: runs next to Prometheus and uploads its TSDB blocks to object storage; the Sidecar also implements the Thanos Store API, so that Thanos Querier can query recent metric data straight from Prometheus
  • Receiver: Prometheus instances write to the Receiver via Remote Write; the Receiver buffers the data in a local TSDB and periodically uploads it to object storage; the Receiver likewise implements the Thanos Store API for use by Thanos Querier
  • Querier: Thanos Querier provides the single global query entry point; it deduplicates data across replicated instances and merges data from the underlying components (object storage, Sidecar, Receiver)
  • Store: exposes the data held in object storage to the Querier component for metric queries
  • Compactor (optional): compacts and rotates the data in object storage, i.e. compacting blocks (merging blocks), downsampling (producing lower-resolution series for long time ranges), and deleting samples past their retention
  • Ruler (optional): evaluates Prometheus recording rules and alerting rules
  • Query Frontend (optional): improves query performance by splitting large queries and caching results

Sidecar mode

Sidecar mode only requires adding a Sidecar container to the Prometheus Pod. The Sidecar can upload a TSDB block to object storage every 2 hours, and it is also exposed as a service to the Thanos Querier component for querying recent metric data; long-term data is queried from object storage via the Store component.

The read path is as follows:

  1. The client sends a query to the Query component via the query API; Query translates it into Store API requests and fans them out to the sidecar, rule, and store components
  2. On receiving a query from the Query component, the Sidecar converts it into a query API request to its attached Prometheus, which reads from local storage and responds (short-term data)
  3. On receiving a query from the Query component, the Ruler reads from its own local storage and responds, returning recent locally evaluated data
  4. On receiving a query from the Query component, the Store first walks each block's meta.json in the object storage bucket, filtering by the recorded time range and labels, then reads the block's index and chunks from the bucket to execute the query (frequently queried index data is cached); it returns long-term historical scraped and evaluated metrics

The write path is as follows:

  1. Prometheus scrapes metrics from the /metrics endpoints at the configured scrape interval (and evaluates recording rules); samples are stored locally in TSDB blocks. Inside the current window the data lives in the WAL; once the window (2 hours) closes, a TSDB block is written
  2. When the Sidecar detects a new block in Prometheus's data directory, it uploads the block to object storage, amending its meta.json with Thanos-specific fields (such as external_labels)
  3. The Ruler periodically queries the Query component according to its configured recording rules, fetches the metric values it needs for evaluation, and stores the results locally as TSDB blocks; when a new local block appears, it uploads the block to object storage as long-term history
  4. The Compactor periodically compacts and downsamples the blocks in object storage. Each compaction increments level in the block's meta.json (initially 1); downsampling creates a new block, sampling values out of the original block at the downsampling step, and records that step as resolution in meta.json
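
To make the meta.json manipulation above concrete, here is a trimmed sketch of what a block's metadata might look like after upload and one compaction pass (all field values are illustrative, not taken from a real bucket):

```json
{
  "version": 1,
  "ulid": "01ABCDEFGHJKMNPQRSTVWXYZ00",
  "minTime": 1680912000000,
  "maxTime": 1680919200000,
  "stats": {"numSamples": 1200000, "numSeries": 4000, "numChunks": 10000},
  "compaction": {"level": 2, "sources": ["..."]},
  "thanos": {
    "labels": {"cluster": "dayo-thanos-demo", "replica": "prometheus-0"},
    "downsample": {"resolution": 0},
    "source": "sidecar"
  }
}
```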

The alerting path is as follows:

  1. Prometheus periodically evaluates its configured alerting rules against the metrics it scrapes, and fires alerts to Alertmanager when the conditions are met
  2. The Ruler periodically queries the Query component as directed by its alerting rules to fetch the metrics needed for evaluation, and fires alerts to Alertmanager when the conditions are met
  3. Alertmanager receives the alerts from Prometheus and the Ruler, groups and merges them, and finally sends notifications out through the configured channels

Receiver mode

The overall flow differs only slightly from sidecar mode, so we will just skim it:

  • Reads: the Query component reads the recent TSDB blocks buffered by the Receiver component
  • Writes: Prometheus writes over the network via Remote Write; the Receiver routes each request to the appropriate instance according to the hashring configuration, and also watches for newly completed local blocks, uploading each new block to object storage
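
On the Prometheus side, the write path in this mode is nothing more than an ordinary remote_write section pointing at the Receiver's remote-write port (the service name and port here assume the deployment shown later in this document):

```yaml
remote_write:
  # the Receiver's remote-write endpoint (port 19291 in this deployment)
  - url: "http://thanos-receiver:19291/api/v1/receive"
```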

4. Deploying Thanos

Kubernetes cluster initialization

Provision ECS nodes

$ python aliyun-ecs-sdk.py apply

Success. Instance creation succeed. InstanceIds: i-hp31frvfcnik5l3t3wdo, i-hp31frvfcnik5l3t3wdp, i-hp31frvfcnik5l3t3wdq
Instance boot successfully: node00001 39.104.22.155  172.16.0.13
Instance boot successfully: node00002 39.104.66.174  172.16.0.11
Instance boot successfully: node00003 39.104.169.161 172.16.0.12

Set the master node IP in roles/kubernetes/vars/main.yml

K8S_MASTER_INTERNAL_ADVERTISE_ADDRESS: "172.16.0.13"

Update the hosts file roles/initial/files/hosts

::1	localhost	localhost.localdomain	localhost6	localhost6.localdomain6
127.0.0.1	 localhost	localhost.localdomain	localhost4	localhost4.localdomain4
172.16.0.13	 k8s-master01
172.16.0.11	 k8s-worker02
172.16.0.12  k8s-worker03

Update the ECS hostnames

$ ap -i alicloud.py --tags=initial setup.yml

Refresh the Alibaba Cloud inventory cache

$ ./alicloud.py --refresh-cache

Create the Kubernetes cluster

ap -i alicloud.py --tags=kubernetes setup.yml

Check the cluster status

kc get cs; kc get node; kc get pods -A; kc get --raw='/readyz?verbose'

When everything is healthy, the output looks like this:

NAME    STATUS    MESSAGE    ERROR
scheduler    Healthy   ok    
controller-manager   Healthy   ok    
etcd-0    Healthy   {"health":"true","reason":""}   

NAME    STATUS   ROLES    AGE   VERSION
k8s-master01   Ready    control-plane,master   10m   v1.22.2
k8s-worker02   Ready    <none>    10m   v1.22.2
k8s-worker03   Ready    <none>    10m   v1.22.2


NAMESPACE    NAME    READY   STATUS    RESTARTS    AGE
kube-flannel   kube-flannel-ds-2bhp5    1/1     Running   0    10m
kube-flannel   kube-flannel-ds-4m9zq    1/1     Running   0    10m
kube-flannel   kube-flannel-ds-jmvsf    1/1     Running   0    10m
kube-system    coredns-5485ddfd7b-drwnn    1/1     Running   0    10m
kube-system    coredns-5485ddfd7b-twc6v    1/1     Running   0    10m
kube-system    etcd-k8s-master01    1/1     Running   0    10m
kube-system    kube-apiserver-k8s-master01    1/1     Running   0    10m
kube-system    kube-controller-manager-k8s-master01   1/1     Running   1 (10m ago)   10m
kube-system    kube-proxy-dr2xj    1/1     Running   0    10m
kube-system    kube-proxy-tgsxr    1/1     Running   0    10m
kube-system    kube-proxy-vkf5d    1/1     Running   0    10m
kube-system    kube-scheduler-k8s-master01    1/1     Running   1 (10m ago)   10m


[+]ping ok
[+]log ok
[+]etcd ok
[+]informer-sync ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]shutdown ok
readyz check passed

Thanos Receiver Mode

1. namespace

Create the data directories

# k8s-worker02
$ mkdir -p /data/k8s/{prometheus,thanos-store-gateway-cache}

# k8s-worker03
$ mkdir -p /data/k8s/{grafana,minio,thanos-receiver}

Create the kube-mon namespace

$ kc apply -f ns.yml
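
ns.yml itself is not reproduced in this document; a minimal version would be:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: kube-mon
```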

2. RBAC

Create the ServiceAccount, ClusterRole, and ClusterRoleBinding for Prometheus

$ kc apply -f rbac.yml
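
rbac.yml is likewise not reproduced here. A sketch of what it has to grant, assuming the ServiceAccount is named prometheus: read access to the nodes, services, endpoints, and pods discovered by the kubernetes_sd_configs later in this document, plus the non-resource metrics URLs:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-mon
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/metrics", "nodes/proxy", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: kube-mon
```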

3. StorageClass

Resource definition

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
# Delayed binding: wait until the first Pod consuming the PVC is being scheduled before binding
volumeBindingMode: WaitForFirstConsumer

Create the resource

$ kc apply -f storageclass.yml

4. minio oss

MinIO is an open-source, high-performance distributed object storage service designed for large-scale private cloud infrastructure. Minio is compatible with the Amazon S3 API and suits storing images, videos, log files, data archives and backups, and container or VM images; objects can be of any size, from a few KB up to 5 TiB.

Official docs: https://docs.min.io/cn/deploy-minio-on-kubernetes.html

Below is the manifest for a standalone-mode Minio deployment

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: minio-local
  labels:
    app: minio
spec:
  accessModes:
    # read-write, mountable by only a single node
    - ReadWriteOnce
  capacity:
    storage: 10Gi
  storageClassName: local-storage
  local:
    path: /data/k8s/minio
  persistentVolumeReclaimPolicy: Retain
  # node affinity: pin to the worker03 node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - k8s-worker03
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-pvc
spec:
  accessModes:
    # read-write, mountable by only a single node
    - ReadWriteOnce
  resources:
    requests:
      storage: 5G
  storageClassName: local-storage # a local PV works best here
---
apiVersion: v1
kind: Service
metadata:
  name: minio
spec:
  selector:
    app: minio
  type: NodePort
  ports:
    - name: console
      port: 9001
      targetPort: 9001
      nodePort: 30091
    - name: api
      port: 9000
      targetPort: 9000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
spec:
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      volumes:
        - name: data
          persistentVolumeClaim:
              claimName: minio-pvc
      containers:
        - name: minio
          image: minio/minio:latest
          imagePullPolicy: IfNotPresent
          args: ["server", "--console-address", ":9001", "/data"]
          env:
            # MINIO_ROOT_USER: WebUI login username
            - name: MINIO_ROOT_USER
              value: "m1n10_AccessKey"
            # MINIO_ROOT_PASSWORD: WebUI login password
            - name: MINIO_ROOT_PASSWORD
              value: "m1n10_SecretKey"
          ports:
            # API
            - containerPort: 9000
            # WebUI
            - containerPort: 9001
          readinessProbe:
            httpGet:
              path: /minio/health/ready
              port: 9000
          livenessProbe:
            httpGet:
              path: /minio/health/live
              port: 9000
            initialDelaySeconds: 10
            periodSeconds: 10
          volumeMounts:
            - mountPath: /data
              name: data

Deploy the Minio object store

$ kc apply -f minio-deploy.yml

Verify

$ kc get deploy minio
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
minio   1/1     1            1           65m
$ kc logs -l app=minio

Create the credentials secret

$ kc create secret generic thanos-objectstorage --from-file=thanos.yaml=thanos-minio.yml -n kube-mon
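
The thanos-minio.yml packed into the secret is the Thanos object-storage client configuration. For the Minio deployment above it would look roughly like this (S3-compatible endpoint, credentials matching the MINIO_ROOT_* values; adjust the endpoint to wherever the minio Service actually lives):

```yaml
type: S3
config:
  bucket: thanos
  endpoint: minio.kube-mon.svc.cluster.local:9000
  access_key: m1n10_AccessKey
  secret_key: m1n10_SecretKey
  # plain HTTP inside the cluster; the Minio service has no TLS
  insecure: true
```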

Access the WebUI

# Username: m1n10_AccessKey
# Password: m1n10_SecretKey

Create a bucket named thanos

5. node_exporter

Resource definition

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-mon
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      nodeSelector:
        kubernetes.io/os: linux
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: root
          hostPath:
            path: /
        - name: system-dbus-socket
          hostPath:
            path: /var/run/dbus/system_bus_socket
      containers:
        - name: node-exporter
          image: prom/node-exporter:v1.5.0
          args:
            - --web.listen-address=0.0.0.0:9110
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
            - --path.rootfs=/host/root
            - --collector.filesystem.mount-points-exclude=^/(proc|var/lib/containerd/.+|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
            - --collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
            - --collector.textfile
            - --collector.netdev.device-exclude=^(lo|docker[0-9]|veth.+)$
            - --collector.systemd
            - --collector.systemd.unit-include=(docker|ssh).service
            - --collector.conntrack
            - --collector.cpu
            - --collector.diskstats
            - --collector.filefd
            - --collector.filesystem
            - --collector.loadavg
            - --collector.meminfo
            - --collector.netdev
            - --collector.netstat
            - --collector.ntp
            - --collector.sockstat
            - --collector.stat
            - --collector.time
            - --collector.uname
            - --collector.vmstat
            - --collector.tcpstat
            - --collector.xfs
            - --collector.zfs
            - --no-collector.arp
            - --no-collector.bcache
            - --no-collector.bonding
            - --no-collector.buddyinfo
            - --no-collector.drbd
            - --no-collector.edac
            - --no-collector.entropy
            - --no-collector.hwmon
            - --no-collector.infiniband
            - --no-collector.interrupts
            - --no-collector.ipvs
            - --no-collector.ksmd
            - --no-collector.logind
            - --no-collector.mdadm
            - --no-collector.meminfo_numa
            - --no-collector.mountstats
            - --no-collector.nfs
            - --no-collector.nfsd
            - --no-collector.qdisc
            - --no-collector.runit
            - --no-collector.supervisord
            - --no-collector.timex
            - --no-collector.wifi
          ports:
            - containerPort: 9110
          env:
            - name: HOSTIP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
          resources:
            requests:
              cpu: 150m
              memory: 180Mi
            limits:
              cpu: 150m
              memory: 180Mi
          securityContext:
            runAsUser: 65534
            runAsNonRoot: true
          volumeMounts:
            - mountPath: /host/proc
              name: proc
            - mountPath: /host/sys
              name: sys
            - mountPath: /host/root
              name: root
              readOnly: true
              mountPropagation: HostToContainer
            - mountPath: /var/run/dbus/system_bus_socket
              name: system-dbus-socket
              readOnly: true
      tolerations:
        - operator: "Exists"

Create the resource

$ kc apply -f node-exporter.yml

Verify

$ kc get ds -A
NAMESPACE      NAME              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-flannel   kube-flannel-ds   3         3         3       3            3           <none>                   84m
kube-mon       node-exporter     3         3         3       3            3           kubernetes.io/os=linux   16m
kube-system    kube-proxy        3         3         3       3            3           kubernetes.io/os=linux   85m

$ kc -n kube-mon logs -l app=node-exporter

6. thanos store

The Thanos Store component (store gateway) serves the historical metrics held in object storage (Minio) to the Querier

Resource definition

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: thanos-store-gateway-local
  namespace: kube-mon
  labels:
    app: thanos-store-gateway
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 2Gi
  storageClassName: local-storage
  local:
    path: /data/k8s/thanos-store-gateway-cache
  persistentVolumeReclaimPolicy: Retain
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - k8s-worker02

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: thanos-store-gateway-pvc
  namespace: kube-mon
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2G

---
# This headless service gives the querier DNS (SRV) records for discovering Store API endpoints
apiVersion: v1
kind: Service
metadata:
  name: thanos-store-gateway
  namespace: kube-mon
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - name: grpc
      port: 10901
      targetPort: grpc
  selector:
    # select StatefulSet Pods labeled thanos-store-api=true
    thanos-store-api: "true"
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store-gateway
  namespace: kube-mon
  labels:
    app: thanos-store-gateway
spec:
  replicas: 2
  selector:
    matchLabels:
      app: thanos-store-gateway
  serviceName: thanos-store-gateway
  template:
    metadata:
      labels:
        app: thanos-store-gateway
        # label the Pod so the headless Service above can discover it
        thanos-store-api: "true"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - thanos-store-gateway
      containers:
        - name: thanos
          image: thanosio/thanos:v0.31.0
          imagePullPolicy: IfNotPresent
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          args:
            - "store"
            - "--log.level=debug"
            - "--data-dir=/data/$(POD_NAME)"
            - "--objstore.config-file=/etc/secret/thanos.yaml"
            - "--index-cache-size=500MB"
            - "--chunk-pool-size=500MB"
          ports:
            - name: http
              containerPort: 10902
            - name: grpc
              containerPort: 10901
          livenessProbe:
            httpGet:
              port: 10902
              path: /-/healthy
          readinessProbe:
            httpGet:
              port: 10902
              path: /-/ready
          volumeMounts:
            - name: object-storage-config
              mountPath: /etc/secret
              readOnly: false
            - mountPath: /data
              name: thanos-store-gateway-cache-volume
      volumes:
        - name: object-storage-config
          secret:
            secretName: thanos-objectstorage
        - name: thanos-store-gateway-cache-volume
          persistentVolumeClaim:
            claimName: thanos-store-gateway-pvc

Create the resources

$ kc apply -f thanos-store.yml

Verify

$ kc get sts -A                             
NAMESPACE   NAME                   READY   AGE
kube-mon    thanos-store-gateway   2/2     93s

$ kc -n kube-mon logs -l app=thanos-store-gateway

# ...
level=info ts=2023-04-08T03:30:03.084382377Z caller=store.go:370 msg="bucket store ready" init_duration=10.13259ms
level=debug ts=2023-04-08T03:30:03.084430366Z caller=fetcher.go:319 component=block.BaseFetcher msg="fetching meta data" concurrency=32
level=info ts=2023-04-08T03:30:03.084660673Z caller=intrumentation.go:56 msg="changing probe status" status=ready
level=info ts=2023-04-08T03:30:03.084716398Z caller=grpc.go:131 service=gRPC/server component=store msg="listening for serving gRPC" address=0.0.0.0:10901
level=info ts=2023-04-08T03:30:03.08623262Z caller=fetcher.go:470 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=1.826716ms duration_ms=1 cached=0 returned=0 partial=0

7. thanos receiver

The Thanos Receiver component accepts remote write requests from any Prometheus instance and stores the data in its local TSDB; optionally, it periodically uploads those local TSDB blocks to object storage. Since the Receiver also exposes the Thanos Store API, the Thanos Querier component can fetch recent data from it.

Resource definition

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: thanos-receiver
  namespace: kube-mon
  labels:
    app: thanos-receiver
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 2Gi
  storageClassName: local-storage
  local:
    path: /data/k8s/thanos-receiver
  persistentVolumeReclaimPolicy: Retain
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["k8s-worker03"]
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: thanos-receiver-pvc
  namespace: kube-mon
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 2G
  storageClassName: local-storage
---
apiVersion: v1
kind: Service
metadata:
  name: thanos-receiver
  namespace: kube-mon
spec:
  clusterIP: None
  ports:
    - name: grpc
      port: 10901
      targetPort: 10901
    - name: http
      port: 10902
      targetPort: 10902
    - name: remote-write
      port: 19291
      targetPort: 19291
  selector:
    app: thanos-receiver
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: hashring-config
  namespace: kube-mon
data:
  # Routing for Remote Write requests: with no tenant configured, the default tenant applies, and its requests are handled by the three Receiver instances below
  hashring.json: |-
    [
      {
        "endpoints": [
            "thanos-receiver-0.thanos-receiver:10901",
            "thanos-receiver-1.thanos-receiver:10901",
            "thanos-receiver-2.thanos-receiver:10901"
        ]
      }
    ]
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: thanos-receiver
  name: thanos-receiver
  namespace: kube-mon
spec:
  selector:
    matchLabels:
      app: thanos-receiver
  serviceName: thanos-receiver
  # number of receiver instances
  replicas: 3
  template:
    metadata:
      labels:
        app: thanos-receiver
        # lets thanos-receiver be service-discovered by the querier component
        thanos-store-api: "true"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - thanos-receiver
      volumes:
        - name: object-storage-config
          secret:
            secretName: thanos-objectstorage
        - name: hashring-config
          configMap:
            name: hashring-config
        - name: data-volume
          persistentVolumeClaim:
            claimName: thanos-receiver-pvc
      containers:
        - name: thanos-receiver
          image: thanosio/thanos:v0.31.0
          imagePullPolicy: IfNotPresent
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          args:
            - receive
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
            - --remote-write.address=0.0.0.0:19291
            - --receive.replication-factor=1
            - --objstore.config-file=/etc/secret/thanos.yaml
            - --tsdb.path=/var/thanos/receiver/$(POD_NAME)
            - --tsdb.retention=1d
            - --label=receive_replica="$(POD_NAME)"
            - --receive.local-endpoint=$(POD_NAME).thanos-receiver:10901 # this instance's endpoint; hosts recorded in the hashring must match it
            - --receive.hashrings-file=/var/lib/thanos-receive/hashring.json # hashring file describing the ring of Receiver nodes in the cluster
          ports:
            - containerPort: 10901
              name: grpc
            - containerPort: 10902
              name: http
            - containerPort: 19291
              name: remote-write
          livenessProbe:
            failureThreshold: 8
            periodSeconds: 30
            httpGet:
              port: 10902
              scheme: HTTP
              path: /-/healthy
          readinessProbe:
            failureThreshold: 20
            periodSeconds: 5
            httpGet:
              port: 10902
              scheme: HTTP
              path: /-/ready
          volumeMounts:
            # data storage
            - name: data-volume
              mountPath: /var/thanos/receiver
              readOnly: false
            # hashring configuration
            - name: hashring-config
              mountPath: /var/lib/thanos-receive
            # minio object-storage credentials
            - name: object-storage-config
              mountPath: /etc/secret
              readOnly: false

Create the resources

$ kc apply -f thanos-receiver.yml

persistentvolume/thanos-receiver created
persistentvolumeclaim/thanos-receiver-pvc created
service/thanos-receiver created
configmap/hashring-config created
statefulset.apps/thanos-receiver created

Verify

$ kc get sts -A                  
NAMESPACE   NAME                   READY   AGE
kube-mon    thanos-receiver        3/3     83s
kube-mon    thanos-store-gateway   2/2     11m


$ kc -n kube-mon logs --tail 10 thanos-receiver-0
# ...
level=info ts=2023-04-08T03:40:50.145702787Z caller=intrumentation.go:56 component=receive msg="changing probe status" status=ready
level=info ts=2023-04-08T03:40:50.145736805Z caller=receive.go:555 component=receive msg="storage started, and server is ready to receive web requests"
level=info ts=2023-04-08T03:40:50.146315543Z caller=receive.go:363 component=receive msg="listening for StoreAPI and WritableStoreAPI gRPC" address=0.0.0.0:10901
level=info ts=2023-04-08T03:40:50.146408006Z caller=grpc.go:131 component=receive service=gRPC/server component=receive msg="listening for serving gRPC" address=0.0.0.0:10901

8. prometheus

ConfigMap with the Prometheus configuration file

apiVersion: v1
kind: ConfigMap
metadata:
  name: configmap-prom-config
  namespace: kube-mon
data:
  # the .tmpl suffix is deliberate: the file gets rendered (variable substitution) before Prometheus loads it
  prometheus.yaml.tmpl: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
      # For Thanos
      external_labels:
        cluster: dayo-thanos-demo
        # each Prometheus replica carries a unique label
        replica: $(POD_NAME)
    # remote write target
    remote_write:
      - url: "http://thanos-receiver:19291/api/v1/receive"
    # rule files
    rule_files:
      - /etc/prometheus/rules/*.yml
    alerting:
      # alert deduplication: drop the per-replica label
      alert_relabel_configs:
        - regex: replica
          action: labeldrop
      alertmanagers:
        - scheme: http
          path_prefix: /
          static_configs:
            - targets: ['alertmanager:9193']
    # standard scrape configs, unchanged
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
    - job_name: 'node'
      kubernetes_sd_configs:
        - role: node
      relabel_configs:
        # rewrite the target address to use our custom node-exporter port
        - source_labels: [__address__]
          action: replace
          regex: ([^:]+):.*
          replacement: $1:9110
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubelet'
      kubernetes_sd_configs:
        - role: node
      # scrape over https
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        # skip certificate verification
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'cadvisor'
      kubernetes_sd_configs:
        - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
          replacement: $1
        - replacement: /metrics/cadvisor
          target_label: __metrics_path__
    - job_name: 'apiserver'
      kubernetes_sd_configs:
        # endpoints
        - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
        - source_labels: [__meta_kubernetes_service_label_component]
          action: keep
          # regex-keep only the endpoints of the apiserver service component
          regex: apiserver
    - job_name: 'pod'
      kubernetes_sd_configs:
        - role: endpoints
      relabel_configs:
        # discover Endpoints (Pods) whose Service carries the annotation prometheus.io/scrape: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        # read http or https from the prometheus.io/scheme annotation
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          # set the scrape scheme label
          target_label: __scheme__
          regex: (https?)
        # read the metrics endpoint path from the prometheus.io/path annotation
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          # set the metrics path label
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          # ([^:]+)   one or more non-colon characters: the IP address
          # (?::\d+)? an optional :port, non-capturing
          # (\d+)     the port taken from the annotation
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        # map the Service's labels onto the target
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        # map the namespace to a label
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        # map the Service name to a label
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_service_name
        # map the Pod name to a label
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
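
For the 'pod' job above, a Service that those relabel rules would discover might be annotated like this (the workload name and port are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app          # hypothetical workload
  namespace: default
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/scheme: "http"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "8080"
spec:
  selector:
    app: my-app
  ports:
    - port: 8080
```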

ConfigMap with the Prometheus rule files

apiVersion: v1
kind: ConfigMap
metadata:
  name: configmap-prom-rules
  namespace: kube-mon
data:
  node_records.yml: |+
    groups:
      - name: "node_rules"
        interval: 15s
        rules:
          #################
          #  CPU
          #################
          # node CPU usage over the last minute
          - record: node:cpu:cpu_usage
            expr: (1 - sum(irate(node_cpu_seconds_total{mode="idle"}[1m])) by (instance) / sum(irate(node_cpu_seconds_total[1m])) by (instance) )
          # per-core CPU usage over the last minute
          - record: node:cpu:per_cpu_usage
            expr: (1 - sum(irate(node_cpu_seconds_total{mode="idle"}[1m])) by (instance, cpu)  / sum(irate(node_cpu_seconds_total[1m])) by (instance, cpu))
          #################
          #  Memory
          #################
          # node memory usage
          - record: node:mem:memory_usage
            expr: (1 - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes) / node_memory_MemTotal_bytes)
          # tmpfs / devtmpfs memory usage (MiB)
          - record: node:mem:tmpfs_used
            expr: (node_filesystem_size_bytes{fstype=~".*tmpfs"} - node_filesystem_free_bytes{fstype=~".*tmpfs"}) / 1024 / 1024
          # 1-minute average of unreclaimable slab memory (MiB)
          - record: node:mem:slab_sunreclaim
            expr: avg_over_time(node_memory_SUnreclaim_bytes[1m]) / 1024 / 1024
          # 1-minute average of unevictable memory on the LRU lists (MiB)
          - record: node:mem:lru_unevictable
            expr: avg_over_time(node_memory_Unevictable_bytes[1m]) / 1024 / 1024
          #################
          #  Disk
          #################
          # fraction of disk space used
          - record: node:disk:disk_space_usage
            expr: (1 - (node_filesystem_avail_bytes{fstype=~"ext.*|xfs|btrfs",device=~"/dev/vd.*"} / node_filesystem_size_bytes{fstype=~"ext.*|xfs|btrfs",device=~"/dev/vd.*"}))
          # fraction of inodes used
          - record: node:disk:inode_space_usage
            expr: (1 - (node_filesystem_files_free{fstype="ext4"} / node_filesystem_files{fstype="ext4"}))
          #################
          #  DiskIO
          #################
          # disk read requests per second averaged over 1 minute; r/s in iostat -dxk
          - record: node:disk:read_iops
            expr: sum by (instance) (rate(node_disk_reads_completed_total{device=~"vd.*"}[1m]))
          # disk write requests per second averaged over 1 minute; w/s in iostat -dxk
          - record: node:disk:write_iops
            expr: sum by (instance) (rate(node_disk_writes_completed_total{device=~"vd.*"}[1m]))
          # disk read bandwidth averaged over 1 minute; rkB/s in iostat -dxk
          - record: node:disk:read_bandwidth
            expr: sum by (instance) (irate(node_disk_read_bytes_total{device=~"vd.*"}[1m]))
          # 计算 1 分钟内平均每秒处理磁盘写带宽,对应 iostat -dxk 中的 wkB/s
          - record: node:disk:write_bandwidth
            expr: sum by (instance) (irate(node_disk_written_bytes_total{device=~"vd.*"}[1m]))
          # 计算 1 分钟内平均读请求延迟 ms,对应 iostat -dxk 中的 r_await
          - record: node:disk:read_await
            expr: sum by (instance) (rate(node_disk_read_time_seconds_total{device=~"vd.*"}[1m]) / rate(node_disk_reads_completed_total{device=~"vd.*"}[1m]) * 1000)
          # 计算 1 分钟内平均写请求延迟,对应 iostat -dxk 中的 w_await
          - record: node:disk:write_await
            expr: sum by (instance) (rate(node_disk_write_time_seconds_total{device=~"vd.*"}[1m]) / rate(node_disk_writes_completed_total{device=~"vd.*"}[1m]) * 1000)
          #################
          #  File Descriptor
          #################
          # 系统已用文件描述符百分比
          - record: node:proc:os_fd_usage
            expr: (node_filefd_allocated / node_filefd_maximum)
          # 进程已用文件描述符百分比
          - record: node:proc:proc_fd_usage
            expr: (process_open_fds{job="node"} / process_max_fds{job="node"})
          #################
          #  Network
          #################
          # 各实例、各网卡 1 分钟内平均每秒接收字节数
          - record: node:net:network_rx
            expr: sum by(instance, device) (irate(node_network_receive_bytes_total{device=~"eth.*"}[1m]))
          # 各实例、各网卡 1 分钟内平均每秒发送字节数
          - record: node:net:network_tx
            expr: sum by(instance, device) (irate(node_network_transmit_bytes_total{device=~"eth.*"}[1m]))
          #################
          #  TCP
          #################
          # 各实例、各网卡 5 分钟内入向报文错误包占比(平均每秒)
          - record: node:tcp:rx_error_rate5m
            expr: sum by(instance, device) (rate(node_network_receive_errs_total{device=~"eth.*"}[5m]) / rate(node_network_receive_packets_total{device=~"eth.*"}[5m]))
          # 各实例、各网卡 5 分钟内出向报文错误包占比(平均每秒)
          - record: node:tcp:tx_error_rate5m
            expr: sum by(instance, device) (rate(node_network_transmit_errs_total{device=~"eth.*"}[5m]) / rate(node_network_transmit_packets_total{device=~"eth.*"}[5m]))
          # 各实例、各网卡 5 分钟内入向报文丢弃包占比(平均每秒)
          - record: node:tcp:rx_drop_rate5m
            expr: sum by(instance, device) (rate(node_network_receive_drop_total{device=~"eth.*"}[5m]) / rate(node_network_receive_packets_total{device=~"eth.*"}[5m]))
          # 各实例、各网卡 5 分钟内出向报文丢弃包占比(平均每秒)
          - record: node:tcp:tx_drop_rate5m
            expr: sum by(instance, device) (rate(node_network_transmit_drop_total{device=~"eth.*"}[5m]) / rate(node_network_transmit_packets_total{device=~"eth.*"}[5m]))
          # 当前重传报文率 与 30 分钟前对比,涨幅百分比
          - record: node:tcp:retrans_rate5m
            expr: (irate(node_netstat_Tcp_RetransSegs[1m]) / irate(node_netstat_Tcp_OutSegs[1m])) - (irate(node_netstat_Tcp_RetransSegs[1m] offset 30m) / irate(node_netstat_Tcp_OutSegs[1m] offset 30m))
          # 当前重置报文率 与 30 分钟前对比,涨幅百分比
          - record: node:tcp:rst_rate5m
            expr: (irate(node_netstat_Tcp_OutRsts[1m]) / irate(node_netstat_Tcp_OutSegs[1m])) - (irate(node_netstat_Tcp_OutRsts[1m] offset 30m) / irate(node_netstat_Tcp_OutSegs[1m] offset 30m))
          #################
          #  TCP Socket
          #################
          # 半连接队列 syn_backlog 溢出情况
          - record: node:socket:listen_drop
            expr: irate(node_netstat_TcpExt_ListenDrops[1m])
          # 全连接队列 accept 溢出情况
          - record: node:socket:listen_overflow
            expr: irate(node_netstat_TcpExt_ListenOverflows[1m])
          #################
          #  conntrack table
          #################
          # 连接追踪表使用率
          - record: node:net:conntrack_tb_usage
            expr: (node_nf_conntrack_entries / node_nf_conntrack_entries_limit)
  node_alerts.yml: |+
    groups:
      - name: node_alerts
        rules:
        ###### CPU ######
        - alert: HostHighCpuLoad
          # 最近 1m CPU 使用率超过 80%
          expr: node:cpu:cpu_usage > 0.8
          for: 0m
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.instance }} 节点 CPU 使用率过高"
            description: "最近一分钟内 {{ $labels.instance }} 节点 CPU 使用率超过 80%!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"
            console: "URL: http://baidu.com"
        - alert: HostHighCpuCoreLoad
          # 最近 1m CPU 某个核心使用率超过 80%
          expr: node:cpu:per_cpu_usage > 0.8
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.instance }} 节点 CPU 核心使用率过高"
            description: "最近一分钟内 {{ $labels.instance }} 节点 CPU 核心 {{ $labels.cpu }} 使用率超过 80%!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"
            console: "URL: http://baidu.com"
        ###### Memory ######
        - alert: HostHighTmpfsUsed
          # tmpfs 内存使用超过 200 MiB
          expr: node:mem:tmpfs_used > 200
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.instance }} 节点 tmpfs 使用量过高!"
            description: "最近一分钟内 {{ $labels.instance }} 节点 tmpfs 使用量超过 200 MiB!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"
        - alert: HostHighMemorySlabUnreclaimUsed
          # slab 不可回收内存量过高
          expr: node:mem:slab_sunreclaim > 1024
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.instance }} slab 不可回收内存量过高"
            description: "最近一分钟内 {{ $labels.instance }} slab 不可回收内存量过高!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"
        - alert: HostHighMemoryLruUnreclaimUsed
          # LRU list 不可释放内存量过高
          expr: node:mem:lru_unevictable > 2048
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.instance }} LRU list 不可释放内存量过高"
            description: "最近一分钟内 {{ $labels.instance }} LRU list 不可释放内存量过高!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"
        ###### Disk ######
        - alert: HostOutOfDiskSpace
          # 磁盘空间使用率超过 90%
          expr: node:disk:disk_space_usage > 0.9
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.instance }} 节点磁盘空间使用率超过 90%"
            description: "最近一分钟内 {{ $labels.instance }} 节点磁盘空间使用率超过 90%!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"
        - alert: HostDiskWillFillIn24Hour
          # 通过 predict_linear 函数根据过去 1h 的数据,推测 24 小时后磁盘是否会满
          expr: predict_linear(node_filesystem_free_bytes[1h], 24*3600) < 0
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: "预计实例 {{ $labels.instance }} 挂载点将在一天后打满!"
        - alert: HostOutofDiskInodes
          expr: node:disk:inode_space_usage > 0.8
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "节点 {{ $labels.instance }} 磁盘 inode 使用率超过 80%"
            description: "节点 {{ $labels.instance }} 磁盘 inode 使用率超过 80%!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"
        - alert: HostInodesWillFillIn24Hour
          # 通过 predict_linear 函数根据过去 1h 的数据,推测 24 小时后磁盘 inode 是否会满
          expr: predict_linear(node_filesystem_files_free[1h], 24*3600) < 0
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: "预计实例 {{ $labels.instance }} 磁盘 inode 将在一天后打满!"
        ###### DiskIO ######
        - alert: HostUnusualDiskReadLatency
          expr: node:disk:read_await > 100
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "节点 {{ $labels.instance }} 磁盘 读请求耗时(r_await)异常"
            description: "节点 {{ $labels.instance }} 磁盘 读请求耗时(r_await)异常!\n当前值:{{ $value }}\n LABELS = {{ $labels }}"
        - alert: HostUnusualDiskWriteLatency
          expr: node:disk:write_await > 100
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "节点 {{ $labels.instance }} 磁盘 写请求耗时(w_await)异常"
            description: "节点 {{ $labels.instance }} 磁盘 写请求耗时(w_await)异常!\n当前值:{{ $value }}\n LABELS = {{ $labels }}"
        ###### File Descriptor ######
        - alert: HostHighSystemFdUsed
          expr: node:proc:os_fd_usage > 0.8
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "节点 {{ $labels.instance }} 系统文件描述符使用率超过 80%"
            description: "节点 {{ $labels.instance }} 系统文件描述符使用率超过 80%!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"
        - alert: HostHighProcessFdUsed
          expr: node:proc:proc_fd_usage > 0.8
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "节点 {{ $labels.instance }} 进程文件描述符使用率超过 80%"
            description: "节点 {{ $labels.instance }} 进程文件描述符使用率超过 80%!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"
        ###### TCP ######
        - alert: HostNetworkReceiveErrRate
          expr: node:tcp:rx_error_rate5m > 0.01
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "节点 {{ $labels.instance }} 接收报文错误占比异常"
            description: "节点 {{ $labels.instance }} 接收报文错误占比异常!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"
        - alert: HostNetworkTransmitErrRate
          expr: node:tcp:tx_error_rate5m > 0.01
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "节点 {{ $labels.instance }} 发送报文错误占比异常"
            description: "节点 {{ $labels.instance }} 发送报文错误占比异常!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"
        - alert: HostNetworkReceiveDropRate
          expr: node:tcp:rx_drop_rate5m > 0.01
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "节点 {{ $labels.instance }} 接收报文丢弃占比异常"
            description: "节点 {{ $labels.instance }} 接收报文丢弃占比异常!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"
        - alert: HostNetworkTransmitDropRate
          expr: node:tcp:tx_drop_rate5m > 0.01
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "节点 {{ $labels.instance }} 发送报文丢弃占比异常"
            description: "节点 {{ $labels.instance }} 发送报文丢弃占比异常!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"
        - alert: HostUnusualNetworkRetransRate
          expr: node:tcp:retrans_rate5m > 0.2
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "节点 {{ $labels.instance }} 报文重传率发生异常升高"
            description: "节点 {{ $labels.instance }} 报文重传率发生异常升高!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"
        - alert: HostUnusualNetworkResetRate
          expr: node:tcp:rst_rate5m > 0.2
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "节点 {{ $labels.instance }} 报文重置率发生异常升高"
            description: "节点 {{ $labels.instance }} 报文重置率发生异常升高!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"
        ###### TCP Socket ######
        - alert: HostSynBacklogOverflow
          expr: node:socket:listen_drop > 10
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "节点 {{ $labels.instance }} 半连接队列存在溢出现象"
            description: "节点 {{ $labels.instance }} 半连接队列存在溢出现象!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"
        - alert: HostAcceptBacklogOverflow
          expr: node:socket:listen_overflow > 10
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "节点 {{ $labels.instance }} 全连接队列存在溢出现象"
            description: "节点 {{ $labels.instance }} 全连接队列存在溢出现象!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"
        - alert: HostHighConntrackTableUsage
          expr: node:net:conntrack_tb_usage > 0.8
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "节点 {{ $labels.instance }} 连接追踪表使用率过高"
            description: "节点 {{ $labels.instance }} 连接追踪表使用率过高!\n 当前值:{{ $value }}\n LABELS = {{ $labels }}"

创建 ConfigMap 对象

$ kc apply -f configmap-prometheus-config-receiver.yml
$ kc apply -f configmap-prometheus-rules.yml

Prometheus 资源定义

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-local
  labels:
    app: prometheus
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 40Gi
  storageClassName: local-storage
  local:
    # 需保证亲和性节点存在该目录
    path: /data/k8s/prometheus
  persistentVolumeReclaimPolicy: Retain
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                # 按照主机名选择 pv 亲和性节点
                - k8s-worker02
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: kube-mon
spec:
  selector:
    matchLabels:
      app: prometheus
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
  storageClassName: local-storage
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: kube-mon
  labels:
    app: prometheus
spec:
  type: NodePort
  selector:
    app: prometheus
  ports:
    - name: http
      port: 9090
      targetPort: http
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: kube-mon
  labels:
    app: prometheus
spec:
  serviceName: prometheus
  replicas: 2
  selector:
    matchLabels:
      app: prometheus
      thanos-store-api: "true"
  template:
    metadata:
      labels:
        app: prometheus
        thanos-store-api: "true"
    spec:
      serviceAccountName: prometheus
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - prometheus
      volumes:
        - name: prom-config-volume
          configMap:
            # ConfigMap 资源对象的名称
            name: configmap-prom-config
        - name: prom-rules-volume # Prometheus Rules
          configMap:
            # ConfigMap 资源对象的名称
            name: configmap-prom-rules
            items:
              - key: node_records.yml
                path: node_records.yml
              - key: node_alerts.yml
                path: node_alerts.yml
        - name: prom-config-shared-volume
          emptyDir: { }
        - name: data-volume
          persistentVolumeClaim:
            claimName: prometheus-data
      initContainers:
        - name: fix-permissions
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          image: busybox:stable
          imagePullPolicy: IfNotPresent
          command: ['/bin/sh', '-c', "mkdir -p /prometheus/$(POD_NAME) && chown -R nobody:nobody /prometheus"]
          volumeMounts:
            - name: data-volume
              mountPath: /prometheus
      containers:
        - name: prometheus
          image: prom/prometheus:v2.35.0
          imagePullPolicy: IfNotPresent
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          args:
            - "--config.file=/etc/prometheus-shared/prometheus.yaml"
            - "--storage.tsdb.path=/prometheus/$(POD_NAME)"
            - "--storage.tsdb.retention.time=6h"
            - "--storage.tsdb.no-lockfile"
            - "--storage.tsdb.min-block-duration=2h" # Thanos 处理数据压缩
            - "--storage.tsdb.max-block-duration=2h"
            - "--web.enable-admin-api" # 通过一些命令去管理数据
            - "--web.enable-lifecycle" # 支持热更新  localhost:9090/-/reload 加载
            - "--web.listen-address=:9090"
            - "--web.external-url=http://0.0.0.0:9090"
          ports:
            - name: http
              containerPort: 9090
          resources:
            requests:
              cpu: 250m
              memory: 1Gi
            limits:
              cpu: 250m
              memory: 1Gi
          volumeMounts:
            - name: prom-config-shared-volume
              mountPath: /etc/prometheus-shared/
            - name: prom-rules-volume
              mountPath: /etc/prometheus/rules/
            - name: prom-config-volume
              mountPath: /etc/prometheus
            - name: data-volume
              mountPath: /prometheus
        # Thanos Sidecar:监听配置模板与规则变化,渲染 Prometheus 配置并触发热加载
        - name: thanos
          image: thanosio/thanos:v0.31.0
          imagePullPolicy: IfNotPresent
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          args:
            - sidecar
            - --log.level=debug
            - --reloader.config-file=/etc/prometheus/prometheus.yaml.tmpl
            - --reloader.config-envsubst-file=/etc/prometheus-shared/prometheus.yaml
            - --reloader.rule-dir=/etc/prometheus/rules/
          volumeMounts:
            - name: prom-config-shared-volume
              mountPath: /etc/prometheus-shared/
            - name: prom-rules-volume
              mountPath: /etc/prometheus/rules/
            - name: prom-config-volume
              mountPath: /etc/prometheus
            - name: data-volume
              mountPath: /prometheus

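上面 Sidecar 的 --reloader.config-file / --reloader.config-envsubst-file 参数,本质是把模板里 $(POD_NAME) 这类占位符用容器环境变量渲染成最终配置。下面用一小段 shell 粗略模拟这一渲染过程(仅为示意,文件路径与模板内容均为假设,实际工作由 Thanos 的 config reloader 完成):

```shell
# 模拟 config reloader 的 envsubst 渲染:
# 把模板中的 $(POD_NAME) 替换为 Pod 环境变量的实际值
export POD_NAME=prometheus-0
printf 'external_labels:\n  replica: $(POD_NAME)\n' > /tmp/prometheus.yaml.tmpl
sed "s/\$(POD_NAME)/${POD_NAME}/" /tmp/prometheus.yaml.tmpl > /tmp/prometheus.yaml
cat /tmp/prometheus.yaml
```

渲染后每个副本会得到不同的 replica 外部标签,后续 Querier 正是依赖这个标签做去重。
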
创建 prometheus 资源

$ kc apply -f prometheus-receiver.yml 
service/prometheus created
statefulset.apps/prometheus created

检查确认

$ kc get sts -l app=prometheus -n kube-mon
NAME    READY   AGE
prometheus   2/2     2m48s

$ kc -n kube-mon logs --tail 10 -l app=prometheus
ts=2023-04-08T03:49:53.287Z caller=kubernetes.go:313 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2023-04-08T03:49:53.296Z caller=main.go:1179 level=info msg="Completed loading of configuration file" filename=/etc/prometheus-shared/prometheus.yaml totalDuration=90.057454ms db_storage=874ns remote_storage=382.102µs web_handler=348ns query_engine=831ns scrape=236.853µs scrape_sd=79.543253ms notify=28.406µs notify_sd=21.174µs rules=7.966266ms tracing=7.848µs
ts=2023-04-08T03:49:53.296Z caller=main.go:910 level=info msg="Server is ready to receive web requests."
ts=2023-04-08T03:49:56.818Z caller=main.go:1142 level=info msg="Loading configuration file" filename=/etc/prometheus-shared/prometheus.yaml
ts=2023-04-08T03:49:56.819Z caller=kubernetes.go:313 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2023-04-08T03:49:56.821Z caller=kubernetes.go:313 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2023-04-08T03:49:56.834Z caller=main.go:1179 level=info msg="Completed loading of configuration file" filename=/etc/prometheus-shared/prometheus.yaml totalDuration=15.764754ms db_storage=1.574µs remote_storage=125.174µs web_handler=603ns query_engine=1.214µs scrape=82.578µs scrape_sd=2.370842ms notify=13.75µs notify_sd=11.465µs rules=12.105489ms tracing=8.153µs
ts=2023-04-08T03:50:01.736Z caller=dedupe.go:112 component=remote level=info remote_name=216c76 url=http://thanos-receiver:19291/api/v1/receive msg="Done replaying WAL" duration=8.450592918s

访问 Prometheus WebUI 检查配置情况

$ kc get svc prometheus -n kube-mon
NAME    TYPE    CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
prometheus   NodePort   10.108.27.23   <none>    9090:31994/TCP   4m41s

9. thanos query

Thanos Querier 组件支持对监控数据自动去重,提供一个全局的统一查询入口

Receiver 模式下,WebUI 中看不到 Alerts、Rules、Targets 相关信息,因为 Receiver 本地只有远程写过来的 TSDB 数据
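
Querier 的去重可以用一段极简的文本处理来示意:把 --query.replica-label 指定的 replica 标签从序列标签集中剥离后,同一序列的多个副本就合并成了一条(仅为概念演示,并非 Thanos 的实际实现):

```shell
# 概念演示:去掉 replica 标签后,instance="a" 的两个副本合并为一条序列
printf '%s\n' \
  'up{instance="a",replica="0"}' \
  'up{instance="a",replica="1"}' \
  'up{instance="b",replica="0"}' \
  | sed 's/,replica="[0-9]*"//' | sort -u
```
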

资源定义

---
# 创建 Serivce 对象为 thanos-querier 提供全局查询服务
apiVersion: v1
kind: Service
metadata:
  name: thanos-querier
  namespace: kube-mon
  labels:
    app: thanos-querier
spec:
  type: NodePort
  selector:
    app: thanos-querier
  ports:
    - port: 9090
      targetPort: http
      name: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-querier
  namespace: kube-mon
  labels:
    app: thanos-querier
spec:
  replicas: 2
  selector:
    matchLabels:
      app: thanos-querier
  template:
    metadata:
      labels:
        app: thanos-querier
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - thanos-querier
      containers:
        - name: thanos
          image: thanosio/thanos:v0.31.0
          imagePullPolicy: IfNotPresent
          args:
            - query
            - --log.level=debug
            # 用于过滤重复数据的标签
            - --query.replica-label=replica
            - --query.replica-label=receive_replica
            # Querier 通过 headless service(thanos-store-gateway)的 DNS SRV 记录发现 Store、Receiver 等实例,这样才能同时查询到近期与历史数据
            - --store=dnssrv+thanos-store-gateway:10901
          ports:
            - name: grpc
              containerPort: 10901
            - name: http
              containerPort: 10902
          resources:
            requests:
              memory: 512Mi
              cpu: 250m
            limits:
              memory: 512Mi
              cpu: 250m
          livenessProbe:
            initialDelaySeconds: 10
            httpGet:
              path: /-/healthy
              port: http
          readinessProbe:
            initialDelaySeconds: 15
            httpGet:
              path: /-/healthy
              port: http

创建资源

$ kc apply -f thanos-querier.yml     
service/thanos-querier         created
deployment.apps/thanos-querier created

检查确认

$ kc get deploy thanos-querier -n kube-mon
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
thanos-querier   2/2     2    2    21s

$ kc -n kube-mon logs -l app=thanos-querier
# ...
level=info ts=2023-04-08T03:56:03.094267518Z caller=intrumentation.go:75 msg="changing probe status" status=healthy
# 启动完成开始监听
level=info ts=2023-04-08T03:56:03.094289948Z caller=http.go:73 service=http/server component=query msg="listening for requests and metrics" address=0.0.0.0:10902
level=info ts=2023-04-08T03:56:03.094453505Z caller=tls_config.go:195 service=http/server component=query msg="TLS is disabled." http2=false
# 完成就绪检查
level=info ts=2023-04-08T03:56:03.094509886Z caller=intrumentation.go:56 msg="changing probe status" status=ready
level=info ts=2023-04-08T03:56:03.094540468Z caller=grpc.go:131 service=gRPC/server component=query msg="listening for serving gRPC" address=0.0.0.0:10901
level=debug ts=2023-04-08T03:56:08.096982106Z caller=endpointset.go:309 component=endpointset msg="starting to update API endpoints" cachedEndpoints=0
level=debug ts=2023-04-08T03:56:08.100735284Z caller=endpointset.go:312 component=endpointset msg="checked requested endpoints" activeEndpoints=5 cachedEndpoints=0
# ...
# 发现 store 实例   10.244.2.4:10901
level=info ts=2023-04-08T03:56:08.100807469Z caller=endpointset.go:349 component=endpointset msg="adding new store with [storeAPI]" address=10.244.2.4:10901 extLset=
# 发现 receive 实例 10.244.1.6:10901
level=info ts=2023-04-08T03:56:08.100829478Z caller=endpointset.go:349 component=endpointset msg="adding new receive with [storeAPI exemplarsAPI]" address=10.244.1.6:10901 extLset="{receive_replica=\"thanos-receiver-2\", tenant_id=\"default-tenant\"}"
# 发现 store 实例   10.244.2.5:10901
level=info ts=2023-04-08T03:56:08.100846078Z caller=endpointset.go:349 component=endpointset msg="adding new store with [storeAPI]" address=10.244.2.5:10901 extLset=
#  发现 receive 实例 10.244.1.4:10901
level=info ts=2023-04-08T03:56:08.100866424Z caller=endpointset.go:349 component=endpointset msg="adding new receive with [storeAPI exemplarsAPI]" address=10.244.1.4:10901 extLset="{receive_replica=\"thanos-receiver-0\", tenant_id=\"default-tenant\"}"
#  发现 receive 实例 10.244.1.5:10901
level=info ts=2023-04-08T03:56:08.100886006Z caller=endpointset.go:349 component=endpointset msg="adding new receive with [storeAPI exemplarsAPI]" address=10.244.1.5:10901 extLset="{receive_replica=\"thanos-receiver-1\", tenant_id=\"default-tenant\"}"
level=debug ts=2023-04-08T03:56:13.094131667Z caller=endpointset.go:309 component=endpointset msg="starting to update API endpoints" cachedEndpoints=5
# ...

访问 WebUI 试着执行查询

$ kc get svc thanos-querier -n kube-mon
NAME    TYPE    CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
thanos-querier   NodePort   10.97.136.11   <none>    9090:31974/TCP   6m59s

10. thanos query frontend

Thanos 提供 Query Frontend 组件(可选)来提升查询性能,它的工作内容主要是两个方面

  1. 将大型查询拆分为多个较小的查询
  2. 缓存查询结果以此提升性能
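
拆分的效果可以用一小段 shell 演示:按下文的 --query-range.split-interval=12h,一个跨度 36h 的 query_range 请求会被拆成 3 个子查询(仅为示意算法,并非组件源码):

```shell
# 按 12h 拆分一个跨度 36h 的查询区间(时间用相对秒数示意)
start=0
end=$((36 * 3600))
split=$((12 * 3600))
count=0
s=$start
while [ "$s" -lt "$end" ]; do
  e=$((s + split))
  if [ "$e" -gt "$end" ]; then e=$end; fi
  echo "sub-query: [$s, $e)"
  count=$((count + 1))
  s=$e
done
echo "total: $count"
```
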

资源定义

apiVersion: v1
kind: Service
metadata:
  name: thanos-query-frontend
  namespace: kube-mon
  labels:
    app: thanos-query-frontend
spec:
  type: NodePort
  selector:
    app: thanos-query-frontend
  ports:
    - port: 9090
      name: http
      targetPort: 9090
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query-frontend
  namespace: kube-mon
  labels:
    app: thanos-query-frontend
spec:
  selector:
    matchExpressions:
      - key: app
        operator: In
        values: ["thanos-query-frontend"]
  template:
    metadata:
      labels:
        app: thanos-query-frontend
    spec:
      containers:
        - name: thanos
          image: thanosio/thanos:v0.31.0
          imagePullPolicy: IfNotPresent
          env:
            - name: HOST_IP_ADDRESS
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
          ports:
            - containerPort: 9090
              name: http
          args:
            - query-frontend
            - --log.level=info
            - --log.format=logfmt
            - --query-frontend.compress-responses
            - --http-address=0.0.0.0:9090
            # 下游 querier
            - --query-frontend.downstream-url=http://thanos-querier.kube-mon.svc.cluster.local:9090
            - --query-range.split-interval=12h             # 以 12h 半天为拆分单位
            - --query-range.max-retries-per-request=10     # HTTP 请求失败时最大重试次数
            - --query-frontend.log-queries-longer-than=10s # 慢查询阈值
            - --labels.split-interval=12h                  # 将长查询拆分为多个短查询
            - --labels.max-retries-per-request=10
            - |-
              --query-range.response-cache-config="config":
                max_size: "200MB"
                max_size_items: 0
                validity: 0s
              type: IN-MEMORY
            - |-
              --labels.response-cache-config="config":
                max_size: "200MB"
                max_size_items: 0
                validity: 0s
              type: IN-MEMORY
          livenessProbe:
            failureThreshold: 4
            periodSeconds: 30
            httpGet:
              port: 9090
              scheme: HTTP
              path: /-/healthy
          readinessProbe:
            failureThreshold: 20
            periodSeconds: 5
            httpGet:
              port: 9090
              scheme: HTTP
              path: /-/ready
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 500m
              memory: 512Mi

创建资源

$ kc apply -f thanos-query-frontend.yml

检查确认

$ kc get all -n kube-mon -l app=thanos-query-frontend
NAME    READY   STATUS    RESTARTS   AGE
pod/thanos-query-frontend-7b56c5b69f-llm6x   1/1     Running   0    2m39s

NAME    TYPE    CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/thanos-query-frontend   NodePort   10.101.88.51   <none>    9090:31335/TCP   2m39s

NAME    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/thanos-query-frontend   1/1     1    1    2m39s

NAME    DESIRED   CURRENT   READY   AGE
replicaset.apps/thanos-query-frontend-7b56c5b69f   1    1    1    2m39s

$ kc -n kube-mon logs -l app=thanos-query-frontend

11. Grafana

资源定义

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-local
  labels:
    app: grafana
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  storageClassName: local-storage
  local:
    path: /data/k8s/grafana
  persistentVolumeReclaimPolicy: Retain
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - k8s-worker03
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-data
  namespace: kube-mon
spec:
  selector:
    matchLabels:
      app: grafana
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: local-storage
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: kube-mon
spec:
  type: NodePort
  ports:
    - port: 3000
  selector:
    app: grafana
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: kube-mon
spec:
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: grafana-data
      initContainers:
        - name: fix-permissions
          image: busybox
          command: [chown, -R, "472:472", "/var/lib/grafana"]
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: storage
      containers:
        - name: grafana
          image: grafana/grafana:9.4.7
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3000
              name: grafana
          env:
            # Grafana 登录用户名密码
            - name: GF_SECURITY_ADMIN_USER
              value: admin
            - name: GF_SECURITY_ADMIN_PASSWORD
              value: admin123
          readinessProbe:
            failureThreshold: 10
            httpGet:
              path: /api/health
              port: 3000
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 30
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /api/health
              port: 3000
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              cpu: 150m
              memory: 512Mi
            requests:
              cpu: 150m
              memory: 512Mi
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: storage

创建资源

$ kc apply -f grafana.yml

检查确认

$ kc get pods -n kube-mon -o wide -l app=grafana
$ kc -n kube-mon logs -f -l app=grafana

创建 2 个数据源

导入模版 18435,选择对应数据源

分别查看数据是否正确获取并渲染
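
除了在 WebUI 手动添加,也可以用 Grafana 的 provisioning 机制以文件方式声明数据源。下面是一个示例(数据源名称为自拟,Service 地址按本文的命名假设,文件需挂载到容器的 /etc/grafana/provisioning/datasources/ 目录):

```yaml
# Grafana 数据源 provisioning 示例
apiVersion: 1
datasources:
  - name: Thanos-Querier
    type: prometheus
    access: proxy
    url: http://thanos-querier.kube-mon.svc.cluster.local:9090
  - name: Thanos-Query-Frontend
    type: prometheus
    access: proxy
    url: http://thanos-query-frontend.kube-mon.svc.cluster.local:9090
```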

12. promoter

promoter 配置文件 promoter-config.yml

---
global:
  # 执行 PromQL 语句,用以渲染图片
  prometheus_url: http://prometheus:9090
  dingtalk_api_token: xxx
  dingtalk_api_secret: xxx
  wechat_api_secret: xxx-xxx
  wechat_api_corp_id: xxx
s3:
  # 阿里云 OSS,用以保存生成的图片
  access_key: "xxx"
  secret_key: "xxx"
  # endpoint: "oss-cn-beijing-internal.aliyuncs.com"
  endpoint: "oss-cn-beijing.aliyuncs.com"
  region: "cn-beijing"
  bucket: "xxx"

receivers:
  - name: dingtalk
    dingtalk_config:
      message_type: markdown
      markdown:
        title: '{{ template "dingtalk.default.title" . }}'
        text: '{{ template "dingtalk.default.content" . }}'
      at:
        atMobiles: [ "138xxxx" ]
        isAtAll: true
  - name: wechat
    wechat_config:
      message_type: markdown
      message: '{{ template "wechat.default.message" . }}'
      to_user: "@all"
      agent_id: 1000002

Generate the base64-encoded secret data

$ base64 -w 0 promoter-config.yml   # -w 0 keeps the output on a single line, as the Secret's data field requires
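
As a sanity check, the encoded value can be round-tripped locally before pasting it into the Secret. This is a sketch; the /tmp path and the one-line sample content are placeholders, not the real config:

```shell
# -w 0 keeps the base64 output on a single line, which is the form a
# Kubernetes Secret's data field expects (newlines break decoding).
printf 'prometheus_url: http://prometheus:9090\n' > /tmp/promoter-demo.yml
enc=$(base64 -w 0 /tmp/promoter-demo.yml)
# decoding must reproduce the original content exactly
echo "$enc" | base64 -d
```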

Define the promoter Secret object

apiVersion: v1
kind: Secret
metadata:
  name: secret-promoter-config
  namespace: kube-mon
data:
  # the value must be a single-line base64 string
  config.yml: <base64-encoded data>

Create the Secret object

$ kc apply -f secret-promoter-config.yml

Promoter workload definition

apiVersion: v1
kind: Service
metadata:
  name: promoter
  namespace: kube-mon
  labels:
    app: promoter
spec:
  type: ClusterIP
  selector:
    app: promoter
  ports:
    - port: 9194
      protocol: TCP
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: promoter
  namespace: kube-mon
  labels:
    app: promoter
spec:
  selector:
    matchLabels:
      app: promoter
  template:
    metadata:
      labels:
        app: promoter
    spec:
      volumes:
        - name: promoter-config
          secret:
            secretName: secret-promoter-config
      containers:
        - name: promoter
          image: lotusching/promoter:latest
          imagePullPolicy: IfNotPresent
          command:
            - "/promoter/bin/promoter"
            - "--config.file=/etc/secret/config.yml"
          volumeMounts:
            - mountPath: /etc/secret
              name: promoter-config
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP

Deploy the alertmanager webhook service

$ kc apply -f promoter.yml

Verify

$ kc get pods -n kube-mon -o wide -l app=promoter
$ kc -n kube-mon logs -l app=promoter

13. AlertManager

AlertManager ConfigMap object

apiVersion: v1
kind: ConfigMap
metadata:
  name: configmap-alertmanager-config
  namespace: kube-mon
data:
  alertmanager.yml: |
    global:
      # mark an alert as resolved after alertmanager has stopped receiving it for this long
      resolve_timeout: 5m
    # alert routing
    route:
      # labels used to regroup incoming alerts
      # e.g. many alerts carrying instance=A and alertname=xx are aggregated into a single group
      group_by: ['instance', 'alertname']
      group_wait: 1s
      group_interval: 10s
      # repeat interval: re-send a firing alert every 2 minutes
      repeat_interval: 2m
      # default receiver, pointing at the webhook defined below
      receiver: 'promoter-webhook-wechat'
      routes:
      - match_re:
          # severity: ^(error|critical)$
          severity: ^(critical)$
        receiver: promoter-webhook-dingtalk
        continue: true
    receivers:
      - name: 'promoter-webhook-dingtalk'
        webhook_configs:
        # promoter Service address and port
        - url: "http://promoter:9194/dingtalk/send"
          send_resolved: true
      - name: 'promoter-webhook-wechat'
        webhook_configs:
        - url: "http://promoter:9194/wechat/send"
          send_resolved: true

AlertManager workload

apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: kube-mon
  labels:
    app: alertmanager
spec:
  selector:
    app: alertmanager
  type: ClusterIP
  ports:
    - port: 9193
      targetPort: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
  namespace: kube-mon
  labels:
    app: alertmanager
spec:
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      labels:
        app: alertmanager
    spec:
      volumes:
        - name: alertmanager-config
          configMap:
            name: configmap-alertmanager-config
      containers:
        - name: alertmanager
          image: prom/alertmanager:v0.25.0
          imagePullPolicy: IfNotPresent
          args:
            - "--config.file=/etc/alertmanager/alertmanager.yml"
          ports:
            - containerPort: 9093
              name: http
          volumeMounts:
            - mountPath: "/etc/alertmanager"
              name: alertmanager-config
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 100m
              memory: 256Mi

Deploy alertmanager

$ kc apply -f configmap-alertmanager-config.yml
$ kc apply -f alertmanager.yml

Verify

$ kc get all -n kube-mon -l app=alertmanager
$ kc -n kube-mon logs -l app=alertmanager

14. Functional verification

Alert test

The following command triggers the HostHighTmpfsUsed alert rule (assuming testfile is written to a tmpfs mount such as /dev/shm)

$ dd if=/dev/urandom of=testfile count=300 bs=1M
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 1.13119 s, 278 MB/s

Wait a moment

OSS check

Historical data has also been uploaded to Minio as expected

Grafana dashboard

Renders correctly

Thanos Sidecar mode

The deployment largely mirrors the Receiver setup, so only the steps are listed here; first initialize and create the ECS instances

1. namespace

Create the data directories

# k8s-worker02
$ mkdir -p /data/k8s/{prometheus,thanos-store-gateway-cache}

# k8s-worker03
$ mkdir -p /data/k8s/{grafana,minio,thanos-receiver}

Create the kube-mon namespace

$ kc apply -f ns.yml

2. RBAC

Create the ServiceAccount, ClusterRole, and ClusterRoleBinding for prometheus

$ kc apply -f rbac.yml

3. StorageClass

Create the resource

$ kc apply -f storageclass.yml

4. Minio

Create the minio object storage

$ kc apply -f minio-deploy.yml
$ kc describe pod -l app=minio
$ kc logs -l app=minio

Create the credentials Secret

$ kc create secret generic thanos-objectstorage --from-file=thanos.yaml=thanos-minio.yml -n kube-mon

Access the WebUI and create the bucket thanos

Access the WebUI

# username  m1n10_AccessKey
# password  m1n10_SecretKey

5. node_exporter

Create the resource

$ kc apply -f node-exporter.yml
$ kc -n kube-mon describe pod -l app=node-exporter
$ kc -n kube-mon logs -l app=node-exporter

6. Thanos Store

Create the resource

$ kc apply -f thanos-store.yml
$ kc -n kube-mon describe pod -l app=thanos-store-gateway
$ kc -n kube-mon logs -l app=thanos-store-gateway

7. Prometheus + Thanos Sidecar

Configuration definition

apiVersion: v1
kind: ConfigMap
metadata:
  name: configmap-prom-config
  namespace: kube-mon
data:
  # the file is named .tmpl; the Thanos sidecar renders it later
  prometheus.yaml.tmpl: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
      # For Thanos
      external_labels:
        cluster: dayo-thanos-demo
        # each Prometheus replica carries a unique label
        replica: $(POD_NAME)
    # alerting/recording rule files
    rule_files:
      - /etc/prometheus/rules/*.yml
    alerting:
      # deduplicate alerts across replicas
      alert_relabel_configs:
        - regex: replica
          action: labeldrop
      alertmanagers:
        - scheme: http
          path_prefix: /
          static_configs:
            - targets: ['alertmanager:9193']
    # unchanged from a standalone setup
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
    - job_name: 'node'
      kubernetes_sd_configs:
        - role: node
      relabel_configs:
        # rewrite the address to the custom node_exporter port
        - source_labels: [__address__]
          action: replace
          regex: ([^:]+):.*
          replacement: $1:9110
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubelet'
      kubernetes_sd_configs:
        - role: node
      # scrape over https
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        # skip certificate verification
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'cadvisor'
      kubernetes_sd_configs:
        - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
          replacement: $1
        - replacement: /metrics/cadvisor
          target_label: __metrics_path__
    - job_name: 'apiserver'
      kubernetes_sd_configs:
        # endpoints
        - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
        - source_labels: [__meta_kubernetes_service_label_component]
          action: keep
          # keep only endpoints of the apiserver service component, matched by regex
          regex: apiserver
    - job_name: 'pod'
      kubernetes_sd_configs:
        - role: endpoints
      relabel_configs:
        # discover Endpoints (Pods) whose Service carries the annotation prometheus.io/scrape: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        # read http or https from the prometheus.io/scheme annotation
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          # generate the scrape-scheme label
          target_label: __scheme__
          regex: (https?)
        # read the metrics endpoint path from the prometheus.io/path annotation
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          # generate the metrics-path label
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          # ([^:]+) one or more non-colon characters, i.e. the IP address
          # (?::\d+)? non-capturing group matching an optional :port
          # (\d+) the port taken from the annotation
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        # map the Service's labels onto the metric
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        # map the namespace to a label
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        # map the Service name to a label
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_service_name
        # map the Pod name to a label
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
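
The address-rewrite rule above, regex ([^:]+)(?::\d+)?;(\d+) with replacement $1:$2, can be exercised outside Prometheus. This sketch emulates it with sed -E, using a POSIX character class instead of \d and a plain optional group instead of the non-capturing one, since sed supports neither:

```shell
# __address__ and the prometheus.io/port annotation are joined by ';'
# before the relabel regex runs; the rewrite keeps the host and swaps the port.
echo '10.244.1.7:8080;9121' | sed -E 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/'
echo '10.244.1.7;9121'      | sed -E 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/'
# both lines print: 10.244.1.7:9121
```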

Resource definitions

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-local
  labels:
    app: prometheus
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 40Gi
  storageClassName: local-storage
  local:
    # the directory must already exist on the affinity node
    path: /data/k8s/prometheus
  persistentVolumeReclaimPolicy: Retain
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                # pin the PV to a node by hostname
                - k8s-worker02
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: kube-mon
spec:
  selector:
    matchLabels:
      app: prometheus
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
  storageClassName: local-storage
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: kube-mon
  labels:
    app: prometheus
spec:
  type: NodePort
  selector:
    app: prometheus
  ports:
    - name: http
      port: 9090
      targetPort: http
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: kube-mon
  labels:
    app: prometheus
spec:
  serviceName: prometheus
  replicas: 2
  selector:
    matchLabels:
      app: prometheus
      thanos-store-api: "true"
  template:
    metadata:
      labels:
        app: prometheus
        thanos-store-api: "true"
    spec:
      serviceAccountName: prometheus
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - prometheus
      volumes:
        - name: object-storage-config
          secret:
            secretName: thanos-objectstorage
        - name: prom-config-volume
          configMap:
            # name of the ConfigMap object
            name: configmap-prom-config
        - name: prom-rules-volume # Prometheus Rules
          configMap:
            # name of the ConfigMap object
            name: configmap-prom-rules
            items:
              - key: node_records.yml
                path: node_records.yml
              - key: node_alerts.yml
                path: node_alerts.yml
        - name: prom-config-shared-volume
          emptyDir: { }
        - name: data-volume
          persistentVolumeClaim:
            claimName: prometheus-data
      initContainers:
        - name: fix-permissions
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          image: busybox:stable
          imagePullPolicy: IfNotPresent
          command: ['/bin/sh', '-c', "mkdir -p /prometheus/$(POD_NAME) && chown -R nobody:nobody /prometheus"]
          volumeMounts:
            - name: data-volume
              mountPath: /prometheus
      containers:
#        - name: debug
#          image: busybox
#          imagePullPolicy: IfNotPresent
#          command: ["/bin/sh", "-c", "sleep 3600"]
#          volumeMounts:
#            - name: prom-config-shared-volume
#              mountPath: /etc/prometheus-shared/
#            - name: prom-rules-volume
#              mountPath: /etc/prometheus/rules/
#            - name: prom-config-volume
#              mountPath: /etc/prometheus/
#            - name: data-volume
#              mountPath: /prometheus
        - name: prometheus
          image: prom/prometheus:v2.35.0
          imagePullPolicy: IfNotPresent
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          args:
            - "--config.file=/etc/prometheus-shared/prometheus.yaml"
            - "--storage.tsdb.path=/prometheus/$(POD_NAME)"
            - "--storage.tsdb.retention.time=6h"
            - "--storage.tsdb.no-lockfile"
            # !!! 10m here is only to demo Compact block merging; a sensible value is 2h !!!
            - "--storage.tsdb.min-block-duration=10m" # let Thanos own block compaction
            - "--storage.tsdb.max-block-duration=10m"
            - "--web.enable-admin-api" # admin API for managing TSDB data
            - "--web.enable-lifecycle" # hot reload via localhost:9090/-/reload
            - "--web.listen-address=:9090"
            - "--web.external-url=http://0.0.0.0:9090"
          ports:
            - name: http
              containerPort: 9090
          resources:
            requests:
              cpu: 250m
              memory: 1Gi
            limits:
              cpu: 250m
              memory: 1Gi
          volumeMounts:
            - name: prom-config-shared-volume
              mountPath: /etc/prometheus-shared/
            - name: prom-rules-volume
              mountPath: /etc/prometheus/rules/
            - name: prom-config-volume
              mountPath: /etc/prometheus
            - name: data-volume
              mountPath: /prometheus
        - name: thanos
          image: thanosio/thanos:v0.31.0
          imagePullPolicy: IfNotPresent
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          args:
            - sidecar
            - --log.level=debug
            - --tsdb.path=/prometheus/$(POD_NAME)
            - --prometheus.url=http://localhost:9090
            - --reloader.config-file=/etc/prometheus/prometheus.yaml.tmpl
            - --reloader.config-envsubst-file=/etc/prometheus-shared/prometheus.yaml
            - --reloader.rule-dir=/etc/prometheus/rules/
            - --objstore.config-file=/etc/secret/thanos.yaml
          ports:
            - containerPort: 10901
              name: grpc
            - containerPort: 10902
              name: http-sidecar
          resources:
            requests:
              cpu: 250m
              memory: 1Gi
            limits:
              cpu: 250m
              memory: 1Gi
          volumeMounts:
            - name: prom-config-shared-volume
              mountPath: /etc/prometheus-shared/
            - name: prom-rules-volume
              mountPath: /etc/prometheus/rules/
            - name: prom-config-volume
              mountPath: /etc/prometheus
            - name: data-volume
              mountPath: /prometheus
            - name: object-storage-config
              mountPath: /etc/secret
              readOnly: false

Create the resources

$ kc apply -f configmap-prometheus-config-sidecar.yml
$ kc apply -f configmap-prometheus-rules.yml
$ kc apply -f prometheus-sidecar-with-store.yml

Check the resources

$ kc -n kube-mon describe pod -l app=prometheus
$ kc -n kube-mon logs -l app=prometheus

8. Thanos Query

Unlike Receiver mode, in Sidecar mode the Query WebUI can show Alerts, Recording rules, and Targets

This is because the sidecar container runs alongside the Prometheus container: the --reloader.config* flags read and watch the configuration files, so this information is naturally available
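
The rendering step can be sketched locally, assuming the reloader performs $(VAR)-style environment substitution on the template (emulated here with sed; the /tmp path and one-line template are placeholders):

```shell
# render prometheus.yaml.tmpl -> prometheus.yaml by expanding $(POD_NAME)
POD_NAME=prometheus-0
printf 'external_labels:\n  replica: $(POD_NAME)\n' > /tmp/prom-demo.tmpl
sed "s/\$(POD_NAME)/${POD_NAME}/g" /tmp/prom-demo.tmpl
# prints:
# external_labels:
#   replica: prometheus-0
```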

Create the resources

$ kc apply -f thanos-querier.yml
$ kc -n kube-mon get pod -l app=thanos-querier
$ kc -n kube-mon describe pod -l app=thanos-querier
$ kc -n kube-mon logs -l app=thanos-querier

9. Thanos Query Frontend

$ kc apply -f thanos-query-frontend.yml
$ kc -n kube-mon get pod -l app=thanos-query-frontend
$ kc -n kube-mon describe pod -l app=thanos-query-frontend
$ kc -n kube-mon logs -l app=thanos-query-frontend

10. Thanos Compact

When the volume of monitoring data gets very large, consider installing the Thanos Compactor component, which compacts, cleans up, and downsamples blocks

  • Compaction: merge multiple small blocks into one
  • Cleanup: delete blocks past their retention period
  • Downsampling: reduce data resolution
    • --retention.resolution-raw: how long to keep raw-resolution blocks in object storage before deleting them (unit: d, default 0d = keep forever)
    • --retention.resolution-5m: how long to keep 5-minute-resolution blocks; blocks older than 40 hours are downsampled into new 5m-resolution blocks (unit: d, default 0d)
    • --retention.resolution-1h: how long to keep 1-hour-resolution blocks; blocks older than 10 days are downsampled into new 1h-resolution blocks (unit: d, default 0d)
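
Rough arithmetic shows why downsampling pays off for long-range queries. This sketch assumes a 15s scrape interval and 5 aggregate series (count/sum/min/max/counter) per downsampled window:

```shell
awk 'BEGIN {
  raw    = 86400 / 15            # raw samples per series per day at a 15s interval
  five_m = (86400 / 300) * 5     # 5m windows per day x 5 aggregate series
  one_h  = (86400 / 3600) * 5    # 1h windows per day x 5 aggregate series
  printf "raw=%d 5m=%d 1h=%d per series per day\n", raw, five_m, one_h
}'
# prints: raw=5760 5m=1440 1h=120 per series per day
```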

Official docs: https://thanos.io/tip/components/compact.md/

Resource definition

apiVersion: v1
kind: Service
metadata:
  name: thanos-compactor
  namespace: kube-mon
  labels:
    app: thanos-compactor
spec:
  ports:
    - port: 10902
      targetPort: http
      name: http
  selector:
    app: thanos-compactor
  type: NodePort
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-compactor
  namespace: kube-mon
  labels:
    app: thanos-compactor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thanos-compactor
  serviceName: thanos-compactor
  template:
    metadata:
      labels:
        app: thanos-compactor
    spec:
      volumes:
        - name: object-storage-config
          secret:
            secretName: thanos-objectstorage
      containers:
        - name: thanos
          image: thanosio/thanos:v0.31.0
          imagePullPolicy: IfNotPresent
          args:
            - "compact"
            - "--log.level=debug"
            - "--data-dir=/data"
            - "--objstore.config-file=/etc/secret/thanos.yaml"
            - "--retention.resolution-raw=60d" # how long to keep raw data in remote object storage
            # with ample storage, downsampling can be disabled
            # - "--debug.disable-downsampling"
            - "--wait"
          ports:
            - name: http
              containerPort: 10902
          livenessProbe:
            httpGet:
              port: 10902
              path: /-/healthy
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              port: 10902
              path: /-/ready
            initialDelaySeconds: 15
          volumeMounts:
            - name: object-storage-config
              mountPath: /etc/secret
              readOnly: false

Create the resource

$ kc apply -f thanos-compactor.yml

Verify

$ kc -n kube-mon describe pod -l app=thanos-compactor
$ kc -n kube-mon logs -l app=thanos-compactor

Access the WebUI

No blocks have been uploaded yet, so it shows No block found.

After waiting a while, check again; recall the Prometheus tsdb flags

# !!! 10m here is only to demo Compact block merging; a sensible value is 2h !!!
- "--storage.tsdb.min-block-duration=10m" # let Thanos own block compaction
- "--storage.tsdb.max-block-duration=10m"

OK: the individual 10m blocks have been merged together

11. Grafana

$ kc apply -f grafana.yml
$ kc -n kube-mon describe pod -l app=grafana
$ kc get pods -n kube-mon -o wide -l app=grafana
$ kc -n kube-mon logs -f -l app=grafana

12. Promoter

$ kc apply -f secret-promoter-config.yml
$ kc apply -f promoter.yml
$ kc get pods -n kube-mon -o wide -l app=promoter
$ kc -n kube-mon logs -l app=promoter

13. AlertManager

# deploy
$ kc apply -f configmap-alertmanager-config.yml
$ kc apply -f alertmanager.yml

# check
$ kc get all -n kube-mon -l app=alertmanager
$ kc -n kube-mon logs -l app=alertmanager

14. Functional verification

Alert test

The following command triggers the HostHighTmpfsUsed alert rule (assuming testfile is written to a tmpfs mount such as /dev/shm)

$ dd if=/dev/urandom of=testfile count=300 bs=1M
300+0 records in
300+0 records out
314572800 bytes (315 MB) copied, 1.13119 s, 278 MB/s

OK, the DingTalk notification went through

WeChat went through too

OSS storage

Watch the logs

$ kc -n kube-mon logs prometheus-0 -c thanos | grep upload

Key output

level=debug ts=2023-04-10T02:36:39.12636033Z caller=objstore.go:288 msg="uploaded file" from=/prometheus/prometheus-0/thanos/upload/01GXMGAZMBZT67FVVRV0HZCF82/chunks/000001 dst=01GXMGAZMBZT67FVVRV0HZCF82/chunks/000001 bucket="tracing: thanos"
level=debug ts=2023-04-10T02:36:39.208398455Z caller=objstore.go:288 msg="uploaded file" from=/prometheus/prometheus-0/thanos/upload/01GXMGAZMBZT67FVVRV0HZCF82/index dst=01GXMGAZMBZT67FVVRV0HZCF82/index bucket="tracing: thanos"
level=debug ts=2023-04-10T02:36:39.299048784Z caller=objstore.go:288 msg="uploaded file" from=/prometheus/prometheus-0/thanos/upload/01GXMGB2VABREC82XA7NXB360V/chunks/000001 dst=01GXMGB2VABREC82XA7NXB360V/chunks/000001 bucket="tracing: thanos"
level=debug ts=2023-04-10T02:36:39.333543236Z caller=objstore.go:288 msg="uploaded file" from=/prometheus/prometheus-0/thanos/upload/01GXMGB2VABREC82XA7NXB360V/index dst=01GXMGB2VABREC82XA7NXB360V/index bucket="tracing: thanos"
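
A quick filter pulls the uploaded block ULIDs out of log lines like these (a sketch over one sample line copied from above):

```shell
# extract the destination block ULID from a sidecar "uploaded file" log line
line='level=debug msg="uploaded file" dst=01GXMGAZMBZT67FVVRV0HZCF82/index bucket="tracing: thanos"'
echo "$line" | grep -o 'dst=[0-9A-Z]*' | cut -d= -f2
# prints: 01GXMGAZMBZT67FVVRV0HZCF82
```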

Check the Minio console

Compact

The Compact component operates on the blocks in OSS; if it finds no data it does nothing. The logs show that by default it checks once per minute

# ...
### 02:29:27 fetch metadata
level=debug ts=2023-04-10T02:29:27.561092785Z caller=fetcher.go:327 component=block.BaseFetcher msg="fetching meta data" concurrency=32
### nothing yet: cached=0 returned=0 partial=0
level=info ts=2023-04-10T02:29:27.562524099Z caller=fetcher.go:478 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=1.478105ms duration_ms=1 cached=0 returned=0 partial=0
### 02:30:27 fetch metadata
level=debug ts=2023-04-10T02:30:27.560498097Z caller=fetcher.go:327 component=block.BaseFetcher msg="fetching meta data" concurrency=32
level=info ts=2023-04-10T02:30:27.562412735Z caller=fetcher.go:478 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=2.012146ms duration_ms=2 cached=0 returned=0 partial=0
### 02:31:27 fetch metadata
level=debug ts=2023-04-10T02:31:27.560375625Z caller=fetcher.go:327 component=block.BaseFetcher msg="fetching meta data" concurrency=32
level=info ts=2023-04-10T02:31:27.561918533Z caller=fetcher.go:478 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=1.583797ms duration_ms=1 cached=0 returned=0 partial=0
### 02:32:27 fetch metadata
level=debug ts=2023-04-10T02:32:27.560803846Z caller=fetcher.go:327 component=block.BaseFetcher msg="fetching meta data" concurrency=32
level=info ts=2023-04-10T02:32:27.56304876Z caller=fetcher.go:478 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=2.387654ms duration_ms=2 cached=0 returned=0 partial=0

After the sidecar uploads TSDB blocks to minio, the Compact component discovers the new blocks automatically

level=debug ts=2023-04-10T02:37:27.561358271Z caller=fetcher.go:327 component=block.BaseFetcher msg="fetching meta data" concurrency=32
level=info ts=2023-04-10T02:37:27.599941009Z caller=fetcher.go:478 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=38.670452ms duration_ms=38 cached=28 returned=28 partial=0
level=info ts=2023-04-10T02:38:27.555831736Z caller=compact.go:1291 msg="start sync of metas"
level=debug ts=2023-04-10T02:38:27.555906939Z caller=fetcher.go:327 component=block.BaseFetcher msg="fetching meta data" concurrency=32
level=debug ts=2023-04-10T02:38:27.568020609Z caller=fetcher.go:777 msg="block is too fresh for now" block=01GXMGAG7A8SWWVPVQ7MPYSS5Q
# ...
level=debug ts=2023-04-10T02:38:27.568293779Z caller=fetcher.go:777 msg="block is too fresh for now" block=01GXMG9XYT0M89KVSCZ338Q3CP
level=info ts=2023-04-10T02:38:27.56833485Z caller=fetcher.go:478 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=6.589682ms duration_ms=6 cached=28 returned=28 partial=0
level=info ts=2023-04-10T02:38:27.56841679Z caller=fetcher.go:478 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=12.589463ms duration_ms=12 cached=28 returned=0 partial=0
### block housekeeping (aborted uploads, blocks marked for deletion)
level=info ts=2023-04-10T02:38:27.568431643Z caller=clean.go:34 msg="started cleaning of aborted partial uploads"
level=info ts=2023-04-10T02:38:27.568440234Z caller=clean.go:61 msg="cleaning of aborted partial uploads done"
level=info ts=2023-04-10T02:38:27.568448547Z caller=blocks_cleaner.go:44 msg="started cleaning of blocks marked for deletion"
level=info ts=2023-04-10T02:38:27.568457032Z caller=blocks_cleaner.go:58 msg="cleaning of blocks marked for deletion done"
#################################
level=debug ts=2023-04-10T02:38:27.568490022Z caller=fetcher.go:327 component=block.BaseFetcher msg="fetching meta data" concurrency=32
level=debug ts=2023-04-10T02:38:27.580477594Z caller=fetcher.go:777 msg="block is too fresh for now" block=01GXMGAWPMJ6WY1XVHPCQJW7WC
# ...
level=debug ts=2023-04-10T02:38:27.580742951Z caller=fetcher.go:777 msg="block is too fresh for now" block=01GXMGAM6H4Q5BTEWFFZNG6D8Z
level=info ts=2023-04-10T02:38:27.580853148Z caller=fetcher.go:478 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=12.385089ms duration_ms=12 cached=28 returned=0 partial=0
level=info ts=2023-04-10T02:38:27.580874618Z caller=compact.go:1296 msg="start of GC"
### compactions begin
level=info ts=2023-04-10T02:38:27.580894454Z caller=compact.go:1319 msg="start of compactions"
level=info ts=2023-04-10T02:38:27.580906902Z caller=compact.go:1355 msg="compaction iterations done"
### first pass of downsampling
level=info ts=2023-04-10T02:38:27.580926822Z caller=compact.go:430 msg="start first pass of downsampling"
level=debug ts=2023-04-10T02:38:27.580961433Z caller=fetcher.go:327 component=block.BaseFetcher msg="fetching meta data" concurrency=32
level=debug ts=2023-04-10T02:38:27.594650377Z caller=fetcher.go:777 msg="block is too fresh for now" block=01GXMGA9JGPB54GW8GQKTFF3MY
# ...
level=debug ts=2023-04-10T02:38:27.594880554Z caller=fetcher.go:777 msg="block is too fresh for now" block=01GXMG9VG9Y1QVACCAXG87F6DS
level=info ts=2023-04-10T02:38:27.594955239Z caller=fetcher.go:478 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=14.018095ms duration_ms=14 cached=28 returned=0 partial=0
level=debug ts=2023-04-10T02:38:27.595026715Z caller=fetcher.go:327 component=block.BaseFetcher msg="fetching meta data" concurrency=32
level=debug ts=2023-04-10T02:38:27.608109142Z caller=fetcher.go:777 msg="block is too fresh for now" block=01GXMGAQ4SRGFFHW1G8FYNKDGG
# ...
level=debug ts=2023-04-10T02:38:27.608356229Z caller=fetcher.go:777 msg="block is too fresh for now" block=01GXMGAMG1D8XSBPY11ESQAMHZ
level=info ts=2023-04-10T02:38:27.60877326Z caller=fetcher.go:478 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=13.79693ms duration_ms=13 cached=28 returned=0 partial=0
### second pass of downsampling
level=debug ts=2023-04-10T02:38:27.608918151Z caller=downsample.go:246 msg="downsampling bucket" concurrency=1
level=info ts=2023-04-10T02:38:27.608974613Z caller=compact.go:444 msg="start second pass of downsampling"
level=debug ts=2023-04-10T02:38:27.609017232Z caller=fetcher.go:327 component=block.BaseFetcher msg="fetching meta data" concurrency=32
level=debug ts=2023-04-10T02:38:27.622177502Z caller=fetcher.go:777 msg="block is too fresh for now" block=01GXMGASPAKPDMPZYRAX4WZJKW
# ...
level=info ts=2023-04-10T02:38:27.622523209Z caller=fetcher.go:478 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=13.534556ms duration_ms=13 cached=28 returned=0 partial=0
level=debug ts=2023-04-10T02:38:27.622644914Z caller=downsample.go:246 msg="downsampling bucket" concurrency=1
level=info ts=2023-04-10T02:38:27.622707242Z caller=compact.go:451 msg="downsampling iterations done"
level=debug ts=2023-04-10T02:38:27.622745364Z caller=fetcher.go:327 component=block.BaseFetcher msg="fetching meta data" concurrency=32
level=debug ts=2023-04-10T02:38:27.633623809Z caller=fetcher.go:777 msg="block is too fresh for now" block=01GXMGA9JGPB54GW8GQKTFF3MY
# ...
level=debug ts=2023-04-10T02:38:27.63391206Z caller=fetcher.go:777 msg="block is too fresh for now" block=01GXMGAWPDXHNCV31ZKMRT3G52
level=info ts=2023-04-10T02:38:27.633998101Z caller=fetcher.go:478 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=11.278018ms duration_ms=11 cached=28 returned=0 partial=0
level=info ts=2023-04-10T02:38:27.634018874Z caller=retention.go:32 msg="start optional retention"
level=info ts=2023-04-10T02:38:27.634027758Z caller=retention.go:47 msg="optional retention apply done"
level=debug ts=2023-04-10T02:38:27.634065079Z caller=fetcher.go:327 component=block.BaseFetcher msg="fetching meta data" concurrency=32
level=debug ts=2023-04-10T02:38:27.644224495Z caller=fetcher.go:777 msg="block is too fresh for now" block=01GXMGA21M8F0CRQ6XTMCJFCZD
# ...
level=debug ts=2023-04-10T02:38:27.644457389Z caller=fetcher.go:777 msg="block is too fresh for now" block=01GXMG9VG9Y1QVACCAXG87F6DS
level=info ts=2023-04-10T02:38:27.644539853Z caller=fetcher.go:478 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=10.502657ms duration_ms=10 cached=28 returned=0 partial=0
level=info ts=2023-04-10T02:38:27.644560797Z caller=clean.go:34 msg="started cleaning of aborted partial uploads"
level=info ts=2023-04-10T02:38:27.644569866Z caller=clean.go:61 msg="cleaning of aborted partial uploads done"
level=info ts=2023-04-10T02:38:27.644578172Z caller=blocks_cleaner.go:44 msg="started cleaning of blocks marked for deletion"
level=info ts=2023-04-10T02:38:27.644587935Z caller=blocks_cleaner.go:58 msg="cleaning of blocks marked for deletion done"

The logs show the Compact component is now working too: organizing blocks and deleting blocks marked for deletion (the 5m/1h retention flags are left at their default 0d here, so downsampled data is kept indefinitely)

Grafana dashboard

Create the data source (omitted)

Import dashboard 18435

Resource summary

Here is the final summary of the Kubernetes resources

ConfigMap

$ kc get cm -n kube-mon
NAME                            DATA   AGE
configmap-alertmanager-config   1      3h4m
configmap-prom-config           1      3h26m
configmap-prom-rules            2      3h25m
kube-root-ca.crt                1      3h30m

Secret

$ kc get secret -n kube-mon
NAME                     TYPE                                  DATA   AGE
default-token-lfxrn      kubernetes.io/service-account-token   3      3h31m
prometheus-token-6gxhq   kubernetes.io/service-account-token   3      3h31m
secret-promoter-config   Opaque                                1      3h5m
thanos-objectstorage     Opaque                                1      3h31m

StorageClass

$ kc get sc -n kube-mon    
NAME            PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-storage   kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  3h34m

PV

$ kc get pv -n kube-mon
NAME                         CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS    REASON   AGE
grafana-local                1Gi        RWO            Retain           Bound    kube-mon/grafana-data               local-storage            3h6m
monio-local                  10Gi       RWO            Retain           Bound    default/minio-pvc                   local-storage            3h15m
prometheus-local             40Gi       RWX            Retain           Bound    kube-mon/prometheus-data            local-storage            3h9m
thanos-store-gateway-local   2Gi        RWO            Retain           Bound    kube-mon/thanos-store-gateway-pvc   local-storage            3h13m

PVC

$ kc get pvc -n kube-mon
NAME                       STATUS   VOLUME                       CAPACITY   ACCESS MODES   STORAGECLASS    AGE
grafana-data               Bound    grafana-local                1Gi        RWO            local-storage   3h7m
prometheus-data            Bound    prometheus-local             40Gi       RWX            local-storage   3h10m
thanos-store-gateway-pvc   Bound    thanos-store-gateway-local   2Gi        RWO            local-storage   3h14m

DaemonSet

$ kc get ds -n kube-mon
NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
node-exporter   3         3         3       3            3           kubernetes.io/os=linux   3h34m

Deployment

$ kc get deploy -n kube-mon
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
alertmanager            1/1     1            1           3h9m
grafana                 1/1     1            1           3h9m
promoter                1/1     1            1           3h9m
thanos-querier          2/2     2            2           3h8m
thanos-query-frontend   1/1     1            1           174m

StatefulSet

$ kc get sts -n kube-mon                  
NAME                   READY   AGE
prometheus             2/2     52m
thanos-compactor       1/1     174m
thanos-store-gateway   2/2     3h16m

Service

$ kc get svc -n kube-mon -o wide 
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)           AGE     SELECTOR
alertmanager            ClusterIP   10.108.96.79     <none>        9193/TCP          3h10m   app=alertmanager
grafana                 NodePort    10.107.122.187   <none>        3000:32549/TCP    3h10m   app=grafana
prometheus              NodePort    10.98.89.117     <none>        9090:30549/TCP    3h13m   app=prometheus
promoter                ClusterIP   10.101.77.189    <none>        9194/TCP          3h10m   app=promoter
thanos-compactor        NodePort    10.106.233.137   <none>        10902:31336/TCP   175m    app=thanos-compactor
thanos-querier          NodePort    10.111.95.215    <none>        9090:31295/TCP    3h9m    app=thanos-querier
thanos-query-frontend   NodePort    10.99.87.116     <none>        9090:32734/TCP    175m    app=thanos-query-frontend
thanos-store-gateway    ClusterIP   None             <none>        10901/TCP         3h17m   thanos-store-api=true
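
Note that the `thanos-store-gateway` Service above is headless (CLUSTER-IP is `None`) and selects on the `thanos-store-api=true` label rather than a single app, so a DNS SRV lookup against it returns every Pod exposing the Store API (Sidecars and Store Gateways alike). A sketch of how the Querier can consume it — the `--store=dnssrv+…` syntax is standard Thanos Querier usage, while the exact container spec below is an assumption based on the listing above:

```yaml
# Hypothetical Thanos Querier container spec; dnssrv+ makes the Querier
# resolve the headless Service to every Pod carrying thanos-store-api=true
containers:
  - name: thanos-querier
    image: thanosio/thanos:v0.31.0
    args:
      - query
      - --http-address=0.0.0.0:9090
      # SRV record of the headless Service; resolves to all Store API endpoints
      - --store=dnssrv+_grpc._tcp.thanos-store-gateway.kube-mon.svc.cluster.local
```

This is why the Querier needs no per-endpoint configuration: adding another Sidecar or Store Gateway Pod with the `thanos-store-api=true` label makes it discoverable automatically.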

Images

$ ctr --namespace k8s.io images ls -q|grep -v 'sha256'
docker.io/grafana/grafana:9.4.7
docker.io/library/busybox:latest
docker.io/lotusching/promoter:latest
docker.io/minio/minio:latest
docker.io/prom/alertmanager:v0.25.0
docker.io/prom/node-exporter:v1.5.0
docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.0
docker.io/rancher/mirrored-flannelcni-flannel:v0.20.1
docker.io/thanosio/thanos:v0.31.0
registry.aliyuncs.com/google_containers/coredns:v1.8.4
registry.aliyuncs.com/google_containers/etcd:3.5.0-0
registry.aliyuncs.com/google_containers/kube-apiserver:v1.22.2
registry.aliyuncs.com/google_containers/kube-controller-manager:v1.22.2
registry.aliyuncs.com/google_containers/kube-proxy:v1.22.2
registry.aliyuncs.com/google_containers/kube-scheduler:v1.22.2
registry.aliyuncs.com/google_containers/pause:3.5
registry.aliyuncs.com/google_containers/pause:3.6

七、Troubleshooting

We ran into problems big and small during deployment; here is the rough troubleshooting approach

  1. Check Pod status; watch the READY, STATUS, and RESTARTS columns

    $ kc get pods -o wide -n kube-mon
  2. If the status is abnormal, inspect the Pod's detailed description

    $ kc -n kube-mon describe pod -l app=<name>
    • Check whether the Pod was scheduled normally
    • Check whether the PV and PVC are mounted correctly
  3. If the Pod's problem is PV/PVC related

    • Check whether the PV and PVC are correctly bound, paying attention to the NAME and CLAIM columns

      NAME                             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                   STORAGECLASS    REASON   AGE
      persistentvolume/grafana-local   1Gi        RWO            Retain           Bound    kube-mon/grafana-data   local-storage            3h20m
  4. The Systemd service-unit dashboard shows no data; this is because node_exporter runs in a container. It is not handled yet; a fix will be added later

  5. No data uploaded to MinIO
    • Check whether 01GX...-style block directories have appeared under the data directory. If not, that can be normal: not enough time has passed yet. To verify the upload path sooner, shorten the --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration flags and recreate the Prometheus workload
    • Check the upload logs: kc -n kube-mon logs prometheus-0 -c thanos | grep upload
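
The block-duration trick from the last item can be sketched as follows. The two flags must be set to the same value — Thanos requires min == max so that Prometheus never compacts blocks locally (the Compactor owns compaction in object storage) — and with both shortened, the TSDB cuts a block at that interval and the Sidecar uploads it as soon as it is closed. The container spec and image version below are assumptions; only the two flags matter:

```yaml
# Hypothetical Prometheus container args for quickly verifying Sidecar uploads;
# min == max disables local compaction, which the Thanos Sidecar requires
containers:
  - name: prometheus
    image: prom/prometheus:v2.43.0   # version assumed
    args:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      # cut (and therefore upload) a block every 30m instead of the 2h default
      - --storage.tsdb.min-block-duration=30m
      - --storage.tsdb.max-block-duration=30m
```

Remember to revert both flags to 2h once the upload path is confirmed; very small blocks inflate the Compactor's workload and object-storage request count.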

Author: Da
Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Da as the source when reposting!