
k8s cluster monitoring: deploying cadvisor + prometheus + grafana

luo_guibin

Contents

1. Create the monitor namespace

2. Deployment

2.1 Deploy cadvisor

2.2 Deploy node_exporter

2.3 Deploy prometheus

2.4 Deploy RBAC permissions

2.5 Deploy kube-state-metrics

2.6 Deploy grafana

3. Test the monitoring setup


Reference:

k8s集群部署cadvisor node-exporter prometheus grafana监控系统 - cyh00001 - 博客园

Preparation:

Cluster node overview:

master: 192.168.136.21 (all of the following steps are performed on this node)

worker: 192.168.136.22

worker: 192.168.136.23

## If vim mangles indentation when pasting, enter paste mode with :set paste (in command mode) before pasting, and leave it with :set nopaste (the default). ##

1. Create the monitor namespace

kubectl create ns monitor


Pull the cadvisor image. The official image is hosted on Google's registry, which is unreachable from mainland China, so a third-party mirror is used here and can be pulled directly; note that the image name is lagoudocker/cadvisor:v0.37.0.

docker pull lagoudocker/cadvisor:v0.37.0 


2. Deployment

Create a dedicated directory /opt/cadvisor_prome_gra; there are quite a few configuration files, so keeping them in one place helps. For example:
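mkdir -p /opt/cadvisor_prome_gra
cd /opt/cadvisor_prome_gra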

2.1 Deploy cadvisor

Deploy cadvisor as a DaemonSet. A DaemonSet guarantees that every node in the cluster runs one copy of the same pod, and nodes that join later automatically get one as well.

vim case1-daemonset-deploy-cadvisor.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: monitor
spec:
  selector:
    matchLabels:
      app: cAdvisor
  template:
    metadata:
      labels:
        app: cAdvisor
    spec:
      tolerations:              # tolerate the master taint (NoSchedule) so cadvisor runs there too
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      hostNetwork: true
      restartPolicy: Always     # restart policy
      containers:
      - name: cadvisor
        image: lagoudocker/cadvisor:v0.37.0
        imagePullPolicy: IfNotPresent   # image pull policy
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: root
          mountPath: /rootfs
        - name: run
          mountPath: /var/run
        - name: sys
          mountPath: /sys
        - name: docker
          mountPath: /var/lib/containerd
      volumes:
      - name: root
        hostPath:
          path: /
      - name: run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /var/lib/containerd

kubectl apply -f case1-daemonset-deploy-cadvisor.yaml

Check the result:

kubectl get pod -n monitor -owide

Since the cluster has three nodes, there will be three pods; if worker nodes are added later, the DaemonSet adds pods for them automatically.


Test cadvisor at <masterIP>:8080.
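A quick check from the shell (using the master IP from the preparation section):

curl -s http://192.168.136.21:8080/metrics | head

cadvisor also serves a web UI at http://192.168.136.21:8080/ .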


2.2 Deploy node_exporter

Deploy the node-exporter DaemonSet and its Service.

vim case2-daemonset-deploy-node-exporter.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      containers:
      - image: prom/node-exporter:v1.3.1
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          protocol: TCP
          name: metrics
        volumeMounts:
        - mountPath: /host/proc
          name: proc
        - mountPath: /host/sys
          name: sys
        - mountPath: /host
          name: rootfs
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
      hostNetwork: true
      hostPID: true
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: monitor
spec:
  type: NodePort
  ports:
  - name: http
    port: 9100
    nodePort: 39100
    protocol: TCP
  selector:
    k8s-app: node-exporter
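Apply the manifest:

kubectl apply -f case2-daemonset-deploy-node-exporter.yaml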

 kubectl get pod -n monitor


Verify the node-exporter data; note the port is 9100: <nodeIP>:9100
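For example, against one of the workers:

curl -s http://192.168.136.22:9100/metrics | head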


2.3 Deploy prometheus

The prometheus deployment consists of a ConfigMap, a Deployment, and a Service.

vim case3-1-prometheus-cfg.yaml

---
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 1m
    scrape_configs:
    - job_name: 'kubernetes-node'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-node-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-apiserver'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_service_name
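Apply the ConfigMap so the Deployment below can mount it:

kubectl apply -f case3-1-prometheus-cfg.yaml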

Note: in the case3-2 manifest below, remember to change k8s-master to your own master's node name; the node's IP address will not work, because nodeName is matched against the Node object's name as reported by kubectl get nodes.

On 192.168.136.21 (k8s-master), prepare the prometheus data directory; the Deployment below mounts the hostPath /data/prometheusdata with type: Directory, so it must exist before the pod starts.
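A minimal sketch, on the master:

mkdir -p /data/prometheusdata
# the prom/prometheus image runs as user nobody (65534); if the pod later fails
# with a permissions error on /prometheus, loosen ownership on the hostPath:
chown -R 65534:65534 /data/prometheusdata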


vim case3-2-prometheus-deployment.yaml

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitor
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: k8s-master
      serviceAccountName: monitor
      containers:
      - name: prometheus
        image: prom/prometheus:v2.31.2
        imagePullPolicy: IfNotPresent
        command:
        - prometheus
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/prometheus
        - --storage.tsdb.retention=720h
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus/prometheus.yml
          name: prometheus-config
          subPath: prometheus.yml
        - mountPath: /prometheus/
          name: prometheus-storage-volume
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
          items:
          - key: prometheus.yml
            path: prometheus.yml
            mode: 0644
      - name: prometheus-storage-volume
        hostPath:
          path: /data/prometheusdata
          type: Directory

Create the ServiceAccount and ClusterRoleBinding:

kubectl create serviceaccount monitor -n monitor

kubectl create clusterrolebinding monitor-clusterrolebinding -n monitor --clusterrole=cluster-admin --serviceaccount=monitor:monitor

kubectl apply -f case3-2-prometheus-deployment.yaml


There is a big pitfall in case3-2: "k8s-master" works, but "192.168.136.21" does not. With the IP, the Deployment's pod never starts and its log reports that the host "192.168.136.21" cannot be found; as noted above, nodeName must match the Node object's name, so an IP only works if the node was registered under its IP. In my case it even failed once with "k8s-master" and then suddenly started working a few days later, after the machine had been power-cycled in the meantime (root cause unknown).


vim case3-3-prometheus-svc.yaml

---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor
  labels:
    app: prometheus
spec:
  type: NodePort
  ports:
  - port: 9090
    targetPort: 9090
    nodePort: 30090
    protocol: TCP
  selector:
    app: prometheus
    component: server

kubectl apply -f case3-3-prometheus-svc.yaml

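Check that the Service is up and browse the Prometheus UI on the NodePort; the Status -> Targets page should show the scrape jobs defined in the ConfigMap:

kubectl get svc -n monitor
# then open http://192.168.136.21:30090/targets in a browser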

2.4 Deploy RBAC permissions

This includes a Secret, a ServiceAccount, a ClusterRole, and a ClusterRoleBinding. The ServiceAccount is the service account, the ClusterRole defines the permission rules, and the ClusterRoleBinding binds the ServiceAccount to the ClusterRole.

The credentials the pod uses to authenticate to the apiserver are defined in a Secret. Because this information is sensitive, it is stored in a Secret resource and mounted into the pod as a volume, so the application running in the pod can use it to connect to the apiserver and complete authentication.

vim case4-prom-rbac.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor

---
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: monitor-token
  namespace: monitor
  annotations:
    kubernetes.io/service-account.name: "prometheus"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
#apiVersion: rbac.authorization.k8s.io/v1beta1
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitor

kubectl apply -f case4-prom-rbac.yaml


2.5 Deploy kube-state-metrics

This includes a Deployment, a Service, a ServiceAccount, a ClusterRole, and a ClusterRoleBinding.

Note that it is deployed in the kube-system namespace!


vim case5-kube-state-metrics-deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: registry.cn-hangzhou.aliyuncs.com/zhangshijie/kube-state-metrics:v2.6.0
        ports:
        - containerPort: 8080

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources: ["daemonsets", "deployments", "replicasets"]
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources: ["cronjobs", "jobs"]
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system

---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app: kube-state-metrics
spec:
  type: NodePort
  ports:
  - name: kube-state-metrics
    port: 8080
    targetPort: 8080
    nodePort: 31666
    protocol: TCP
  selector:
    app: kube-state-metrics

 kubectl apply -f case5-kube-state-metrics-deploy.yaml
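To confirm kube-state-metrics is exporting (its Service exposes NodePort 31666):

kubectl get pods -n kube-system -l app=kube-state-metrics
curl -s http://192.168.136.21:31666/metrics | head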


2.6 Deploy grafana

Grafana provides the graphical front end on top of the prometheus data source; this includes a Deployment and a Service.

vim grafana-enterprise.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-enterprise
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana-enterprise
  template:
    metadata:
      labels:
        app: grafana-enterprise
    spec:
      containers:
      - image: grafana/grafana
        imagePullPolicy: Always
        #command:
        #  - "tail"
        #  - "-f"
        #  - "/dev/null"
        securityContext:
          allowPrivilegeEscalation: false
          runAsUser: 0
        name: grafana
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: "/var/lib/grafana"
          name: data
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 500m
            memory: 2500Mi
      volumes:
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitor
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 3000
    nodePort: 31000
  selector:
    app: grafana-enterprise

kubectl apply -f grafana-enterprise.yaml
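Check the pod and open the Grafana UI on the NodePort from the Service above:

kubectl get pods -n monitor
# then open http://192.168.136.21:31000 in a browser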


Log in with account admin, password admin.

Add a data source under "Data sources" and name it prometheus; note the port number 30090.
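With the NodePort Service from section 2.3, the data source URL would be, for example, http://192.168.136.21:30090; since Grafana itself runs inside the cluster, the in-cluster address http://prometheus.monitor.svc:9090 should work as well.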


Import dashboard template 13332; other templates such as 14981, 13824, and 14518 can be added as well.

Click the "+" icon on the left and choose "Import" to import a template.


The cadvisor dashboard template is 14282. There is an unresolved bug here: it can monitor the resource usage of every container in the cluster, but selecting a single container shows no data (this should be fixable).


By default the dashboard shows pod IDs, which is awkward for administrators to read. To show pod names instead, open the dashboard's settings icon on the upper right, choose "Variables", select the second variable, and change "name" to "pod".


Each panel on the dashboard needs the same change: click the panel title, choose "Edit", and change "name" to "pod".


3. Test the monitoring setup

Create a Deployment named nginx01 and check the monitoring results.

vim nginx01.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx01
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx01
  template:
    metadata:
      labels:
        app: nginx01
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9

 kubectl apply -f nginx01.yaml 


Two nginx01 pods show up in the dashboards, because the Deployment sets 2 replicas.
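For example:

kubectl get pods -l app=nginx01 -o wide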


At this point, the cadvisor + prometheus + grafana cluster monitoring deployment is complete.
