Advanced Kubernetes Tutorial | Exploring Prometheus Through PromQL

2023-12-17 11:03:09

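1. Prometheus Deployment

Prometheus is an open-source monitoring system that collects, stores, and analyzes time series data, and it ships with an expressive query language, PromQL. With Helm, deploying it to a Kubernetes cluster takes only a few commands.

First, create a new namespace to isolate the Prometheus components: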

kubectl create namespace prometheus

Then add the Prometheus Helm repository:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

Install Prometheus:

helm install prometheus prometheus-community/prometheus --namespace prometheus

Once the installation finishes, check the status of the Prometheus Pods:

kubectl get pods --namespace prometheus
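
The same check can be scripted. The following is a minimal sketch, assuming the output of `kubectl get pods --namespace prometheus -o json` has been captured as a string. The helper name `not_ready_pods` and the Pod names in the sample are hypothetical.

```python
import json

def not_ready_pods(pod_list_json: str) -> list[str]:
    """Return names of pods that are not Running with every container ready."""
    pods = json.loads(pod_list_json)["items"]
    failing = []
    for pod in pods:
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        running = status.get("phase") == "Running"
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        if not (running and all_ready):
            failing.append(pod["metadata"]["name"])
    return failing

# Hypothetical sample shaped like `kubectl get pods -o json` output.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "prometheus-server-0"},
         "status": {"phase": "Running", "containerStatuses": [{"ready": True}]}},
        {"metadata": {"name": "prometheus-alertmanager-0"},
         "status": {"phase": "Pending", "containerStatuses": []}},
    ]
})
print(not_ready_pods(sample))
```

An empty list means every Pod is up. Note that this sketch would also flag Pods that completed successfully, which is acceptable here because the chart's workloads are long-running.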

2. Grafana Deployment

Grafana is an open-source dashboard and data visualization platform, used here to display the data that Prometheus collects.

Create a new namespace for Grafana:

kubectl create namespace grafana

Add the Grafana Helm repository:

helm repo add grafana https://grafana.github.io/helm-charts

Install Grafana:

helm install grafana grafana/grafana --namespace grafana

Once the installation finishes, check the status of the Grafana Pods:

kubectl get pods --namespace grafana
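
To display Prometheus data, Grafana needs a Prometheus data source. Below is a minimal sketch of the JSON payload that Grafana's data source HTTP API (`POST /api/datasources`) accepts. It assumes the Prometheus server Service is named `prometheus-server`, lives in the `prometheus` namespace, listens on port 80, and that the cluster uses the default `cluster.local` domain. Verify the real Service name with `kubectl get svc --namespace prometheus`. The helper name is hypothetical.

```python
import json

def prometheus_datasource(service: str, namespace: str, port: int = 80) -> dict:
    """Build a Grafana data source definition pointing at an in-cluster Prometheus."""
    # Assumes the default cluster DNS domain `cluster.local`.
    url = f"http://{service}.{namespace}.svc.cluster.local:{port}"
    return {
        "name": "Prometheus",
        "type": "prometheus",
        "access": "proxy",  # Grafana's backend proxies the queries
        "url": url,
        "isDefault": True,
    }

# Service name and namespace are assumptions; adjust to the actual deployment.
payload = prometheus_datasource("prometheus-server", "prometheus")
print(json.dumps(payload, indent=2))
```

The same fields can be entered by hand in Grafana's data source settings page instead of being sent through the API.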

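3. Introduction to PromQL Syntax

PromQL is the query language used to select and analyze the time series data stored in Prometheus. Its operators and functions let you aggregate, filter, and transform that data.

Here are some common PromQL query examples: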

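Select every metric whose name starts with "node_" (PromQL has no wildcard syntax for metric names, so the name is matched with a regular expression on the __name__ label):

{__name__=~"node_.*"}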

Select every metric whose name starts with "node_" and ends with "_cpu":

{__name__=~"node_.*_cpu"}
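
Metric names are matched in PromQL with a regular expression on the `__name__` label, and Prometheus anchors such expressions at both ends, so the pattern has to cover the whole name. The sketch below mimics that behaviour locally with Python's `re.fullmatch`. The metric names are examples only, and Python's regex dialect differs from the RE2 syntax Prometheus uses for more advanced patterns.

```python
import re

def match_metric_names(pattern: str, names: list[str]) -> list[str]:
    """Emulate {__name__=~"pattern"}: the regex must match the whole name."""
    compiled = re.compile(pattern)
    return [n for n in names if compiled.fullmatch(n)]

# Example metric names, not taken from a real server.
names = [
    "node_cpu_seconds_total",
    "node_memory_MemTotal_bytes",
    "node_load1",
    "kube_node_info",
]

print(match_metric_names("node_.*", names))        # prefix match
print(match_metric_names(".*node.*", names))       # "contains" match
print(match_metric_names("node_.*_total", names))  # prefix and suffix
```

Because of the anchoring, `node_.*` does not select `kube_node_info`; a "contains" match needs `.*` on both sides.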

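Find nodes whose average CPU utilization over the past 5 minutes exceeds 80%, computed from node_exporter's node_cpu_seconds_total counter:

1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.8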
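
Queries like the ones above can also be evaluated outside the web UI through Prometheus's HTTP API, whose instant-query endpoint is `/api/v1/query`. The following is a minimal sketch, assuming the server is reachable at `http://localhost:9090` (for example through a port-forward). The base URL, the helper names, and the sample response are illustrative and were not captured from a real server.

```python
import json
from urllib.parse import urlencode

def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"

def parse_vector(body: str) -> dict[str, float]:
    """Map each series' `instance` label to its value in an instant-vector response."""
    doc = json.loads(body)
    if doc["status"] != "success" or doc["data"]["resultType"] != "vector":
        raise ValueError("expected a successful instant-vector response")
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in doc["data"]["result"]
    }

# Assumed base URL; the query itself is just an example.
url = instant_query_url("http://localhost:9090", 'up{job="node"}')

# Hand-written response in the documented shape: value is [timestamp, "string"].
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "10.0.0.1:9100"},
             "value": [1702810989.0, "1"]},
        ],
    },
})
print(url)
print(parse_vector(sample_body))
```

Against a live server, the body would come from fetching `url`, for instance with `urllib.request.urlopen(url).read()`.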

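Query the CPU usage, in cores, of every Pod outside the kube-system namespace (the container!="" matcher skips the pod-level aggregate series that cAdvisor also reports):

sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{namespace!="kube-system", container!=""}[5m]))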
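
`container_cpu_usage_seconds_total` is a cumulative counter of CPU seconds, so a usage figure comes from how fast it grows over a time window, which is what PromQL's `rate()` computes. Below is a simplified sketch of that idea. Unlike Prometheus's real `rate()`, it does not extrapolate to the window boundaries, and the sample values are made up.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter from (timestamp, value) samples.

    A drop in value is treated as a counter reset (e.g. a container restart):
    the counter is assumed to have restarted from zero.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

# CPU seconds consumed, sampled every 60 s (made-up values).
cpu_seconds = [(0, 100.0), (60, 130.0), (120, 160.0), (180, 190.0)]
print(simple_rate(cpu_seconds))

# Same workload with a counter reset between the second and third sample.
with_reset = [(0, 100.0), (60, 130.0), (120, 30.0), (180, 60.0)]
print(simple_rate(with_reset))
```

Both series describe a container that burns 30 CPU seconds per minute, so both yield half a core. Summing the raw counter, by contrast, only reports total CPU seconds since each container started.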

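4. Prometheus Alerting Configuration

Prometheus continuously evaluates predefined alerting rules against the collected metrics and fires an alert when a rule's condition is met. Notifications are then routed by Alertmanager.

Here are some common alerting rule examples: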

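Fire an alert when CPU utilization stays above 80% for 5 minutes (Prometheus 2.x and later define rules in a YAML rule file):

groups:
  - name: node-cpu
    rules:
      - alert: NodeCpuUsageHigh
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.8
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.instance }} CPU usage is high"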

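Fire an alert when memory utilization stays above 90% for 5 minutes:

groups:
  - name: node-memory
    rules:
      - alert: NodeMemoryUsageHigh
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.instance }} memory usage is high"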
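
The five-minute "for" duration in the rules above means the condition must hold continuously for five minutes before the alert fires; until then the alert is pending, and a single evaluation where the condition is false resets the timer. The following is a small sketch of that state logic over a series of rule evaluations. The function name and the timeline are hypothetical, and real Prometheus tracks this state separately for each label set.

```python
def alert_states(evaluations: list[tuple[int, bool]], for_seconds: int) -> list[str]:
    """Return 'inactive', 'pending' or 'firing' for each (timestamp, condition) pair."""
    states = []
    active_since = None  # timestamp at which the condition first became true
    for ts, condition in evaluations:
        if not condition:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        held = ts - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states

# One evaluation per minute; the condition dips at t=180 and resets the timer.
timeline = [(0, True), (60, True), (120, True), (180, False),
            (240, True), (300, True), (360, True), (420, True),
            (480, True), (540, True)]
print(alert_states(timeline, for_seconds=300))
```

In this timeline the alert only fires at t=540, five minutes after the condition became true again at t=240, which is why a short "for" duration makes alerts react faster but also more prone to flapping.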

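5. Conclusion

This tutorial covered how to deploy Prometheus and Grafana in a Kubernetes cluster and how to query and analyze monitoring data with PromQL. Combining the two lets you build rich monitoring dashboards that track the performance metrics of the cluster and its applications in real time.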