Datacenter: id=27, name=Dire Dawa
Asset: id=819, name=DIR-IGW-T9K001-NPB01
AssetInfo dashboard -> TSG-9140-NPB-TRAFFIC -> Throughput - DP1/L4

The cause of the anomaly in this monitoring chart is analyzed below.

Problem description:
  Anomalous time range: 2023-06-08 22:14:55 to 2023-06-08 22:16:55 (UTC+3)
  Prometheus scrape interval: 1 minute
  Prometheus scrape-status metrics:
    up: 1 means the scrape succeeded (the target is healthy and reachable); 0 means the scrape failed
    scrape_samples_scraped: the number of samples the target exposed in that scrape

1. up metric data: the value is 1 at 22:14:55, 22:15:55 and 22:16:55.
Analysis: the service at 10.219.11.1:9000 was healthy at all three points (Prometheus established the connection successfully).

    {
      "metric": {
        "__name__": "up",
        "asset": "DIR-IGW-T9K001-NPB01",
        "asset_id": "819",
        "datacenter": "Dire Dawa",
        "datacenter_id": "27",
        "device_group": "DIR-IGW",
        "endpoint": "MRAPM-STREAM-DIR-IGW-T9K001-NPB01",
        "endpoint_id": "4913",
        "instance": "10.219.11.1:9000",
        "job": "4913",
        "module": "MRAPM-STREAM",
        "module_id": "13",
        "nz_agent_id": "45",
        "project": "TSG-9140",
        "project_id": "3"
      },
      "values": [
        [1686251695.097, "1"],  // 2023-06-08 22:14:55
        [1686251755.097, "1"],  // 2023-06-08 22:15:55
        [1686251815.097, "1"]   // 2023-06-08 22:16:55
      ]
    }

2. scrape_samples_scraped metric data: the value is 36 at 22:14:55 and 22:16:55, but 0 at 22:15:55.
Analysis: combined with the up metric, at 22:15:55 up=1 and scrape_samples_scraped=0. This means that at that point Prometheus connected to the target successfully but did not receive any metric samples.

    {
      "metric": {
        "__name__": "scrape_samples_scraped",
        "asset": "DIR-IGW-T9K001-NPB01",
        "asset_id": "819",
        "datacenter": "Dire Dawa",
        "datacenter_id": "27",
        "device_group": "DIR-IGW",
        "endpoint": "MRAPM-STREAM-DIR-IGW-T9K001-NPB01",
        "endpoint_id": "4913",
        "instance": "10.219.11.1:9000",
        "job": "4913",
        "module": "MRAPM-STREAM",
        "module_id": "13",
        "nz_agent_id": "45",
        "project": "TSG-9140",
        "project_id": "3"
      },
      "values": [
        [1686251695.097, "36"],  // 2023-06-08 22:14:55
        [1686251755.097, "0"],   // 2023-06-08 22:15:55
        [1686251815.097, "36"]   // 2023-06-08 22:16:55
      ]
    }

3. rx_bits_total metric
The Throughput - DP1/L4 chart is built on the rx_bits_total metric. Its data has samples at 22:14:55 and 22:16:55, but the sample at 22:15:55 is missing.
Analysis: the chart's query expression is

    irate(rx_bits_total{asset="DIR-IGW-T9K001-NPB01", app="sapp4", device="eth_vf_raw"}[2m])

irate computes the per-second rate from the last two samples inside the [2m] range window and returns no value when the window holds fewer than two samples. With a 1-minute scrape interval the window normally holds two samples, so when one scrape cycle is missing, evaluations between the missing scrape and the next one see only the 22:14:55 sample, irate returns nothing, and the chart line breaks.

    {
      "metric": {
        "__name__": "rx_bits_total",
        "app": "sapp4",
        "asset": "DIR-IGW-T9K001-NPB01",
        "asset_id": "819",
        "datacenter": "Dire Dawa",
        "datacenter_id": "27",
        "device": "eth_vf_raw",
        "device_group": "DIR-IGW",
        "endpoint": "MRAPM-STREAM-DIR-IGW-T9K001-NPB01",
        "endpoint_id": "4913",
        "module": "MRAPM-STREAM",
        "module_id": "13",
        "nz_agent_id": "45",
        "project": "TSG-9140",
        "project_id": "3"
      },
      "values": [
        [1686251695.097, "34554128445999550"],  // 2023-06-08 22:14:55
        [1686251815.097, "34555126093601070"]   // 2023-06-08 22:16:55
      ]
    }

Summary:
1. At 22:14:55, 22:15:55 and 22:16:55, the service at 10.219.11.1:9000 was healthy.
2. At 22:15:55, Prometheus performed the scrape, but the service at 10.219.11.1:9000 returned no metric data.
3. As a result, one scrape cycle of rx_bits_total data is missing at 22:15:55, and because the chart uses the irate function, the chart line is broken.
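The `values` arrays in the query results above hold Unix timestamps in seconds, and the `//` comments give the matching local time in UTC+3. A minimal Python sketch for checking that conversion; the helper name `to_local` is hypothetical and not part of any tool used in this analysis:

```python
from datetime import datetime, timedelta, timezone

# Local time zone used throughout this report: UTC+3
UTC_PLUS_3 = timezone(timedelta(hours=3))


def to_local(ts: float) -> str:
    """Hypothetical helper: format a Prometheus sample timestamp (Unix seconds) in UTC+3."""
    return datetime.fromtimestamp(ts, tz=UTC_PLUS_3).strftime("%Y-%m-%d %H:%M:%S")


for ts in (1686251695.097, 1686251755.097, 1686251815.097):
    print(ts, to_local(ts))
```

The fractional milliseconds are dropped by the format string, which matches how the comments in the JSON are written.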
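The reasoning in steps 1 and 2 (a scrape with up=1 but scrape_samples_scraped=0 reached the target yet received no samples) can be written as a small check. This is a sketch assuming the input is the `values` arrays of a Prometheus range-query result, with values as strings as they appear above; `classify_scrapes` and its labels are hypothetical names chosen for illustration:

```python
from typing import Dict, List


def classify_scrapes(up_values: List[list], samples_values: List[list]) -> Dict[float, str]:
    """Hypothetical helper: label each scrape by combining up and scrape_samples_scraped.

    Both arguments are [timestamp, "value"] pairs, the shape of the
    "values" arrays in a Prometheus range-query (matrix) result.
    """
    scraped = {ts: float(v) for ts, v in samples_values}
    labels = {}
    for ts, v in up_values:
        if float(v) == 0:
            labels[ts] = "scrape failed"          # target unreachable or scrape error
        elif ts not in scraped:
            labels[ts] = "unknown"                # no sample count recorded for this scrape
        elif scraped[ts] == 0:
            labels[ts] = "reachable, no samples"  # connected, but nothing was returned
        else:
            labels[ts] = "ok"
    return labels


up = [[1686251695.097, "1"], [1686251755.097, "1"], [1686251815.097, "1"]]
samples = [[1686251695.097, "36"], [1686251755.097, "0"], [1686251815.097, "36"]]

for ts, label in classify_scrapes(up, samples).items():
    print(ts, label)
```

With the data from this incident, only the 22:15:55 scrape (1686251755.097) is labeled "reachable, no samples", which is the condition described in step 2.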
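Step 3 notes that the rx_bits_total series has no sample at 22:15:55. Given the 1-minute scrape interval, such gaps can be found by comparing consecutive timestamps. A sketch assuming a fixed 60-second interval and a tolerance of half an interval (both are assumptions for illustration, not values read from the Prometheus configuration); `find_missing` is a hypothetical name:

```python
from typing import List


def find_missing(values: List[list], interval: float = 60.0) -> List[float]:
    """Hypothetical helper: return timestamps where a sample was expected but none exists.

    values: [timestamp, "value"] pairs sorted by timestamp.
    interval: expected scrape interval in seconds (assumed fixed).
    """
    timestamps = [float(ts) for ts, _ in values]
    missing = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        expected = prev + interval
        # Any expected slot more than half an interval before the next real
        # sample is treated as a skipped scrape cycle.
        while cur - expected > interval / 2:
            missing.append(round(expected, 3))
            expected += interval
    return missing


rx_values = [[1686251695.097, "34554128445999550"], [1686251815.097, "34555126093601070"]]
print(find_missing(rx_values))  # [1686251755.097], i.e. 2023-06-08 22:15:55 (UTC+3)
```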
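The irate explanation in step 3 can be made concrete with a simplified model: take the samples inside the range window, and if there are at least two, return the per-second rate of the last two; otherwise return nothing, which is what leaves a gap in the chart. This is a sketch, not Prometheus' implementation. Assumptions: timestamps are integer milliseconds (as Prometheus stores them), the window includes both ends as in Prometheus 2.x (Prometheus 3.x excludes the left boundary), and the function name `irate` and the evaluation times are illustrative:

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (timestamp in ms, counter value)


def irate(samples: List[Sample], t_ms: int, window_ms: int = 120_000) -> Optional[float]:
    """Simplified model of PromQL irate(): per-second rate of the last two
    samples in [t - window, t]. Returns None when fewer than two samples fall
    in the window, i.e. the query produces no point at time t."""
    in_window = [(ts, v) for ts, v in samples if t_ms - window_ms <= ts <= t_ms]
    if len(in_window) < 2:
        return None
    (t1, v1), (t2, v2) = in_window[-2], in_window[-1]
    # A decreasing counter is treated as a reset: the increase is counted from 0.
    delta = v2 - v1 if v2 >= v1 else v2
    return delta / ((t2 - t1) / 1000.0)


# rx_bits_total samples from step 3 (22:14:55 and 22:16:55; 22:15:55 is missing)
rx = [(1686251695097, 34554128445999550.0), (1686251815097, 34555126093601070.0)]

print(irate(rx, 1686251790000))  # 22:16:30 -> None
print(irate(rx, 1686251815097))  # 22:16:55.097 -> a rate in bits/s
```

At 22:16:30 the [2m] window reaches back only to 22:14:30, so it holds just the 22:14:55 sample and no value is produced. At 22:16:55.097 both samples fall inside the window and the result is about 8.31e9 bits/s.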