agent.alive metric raises a false alarm #917

Open
CHAIIOUu opened this issue Dec 28, 2020 · 2 comments

Comments

@CHAIIOUu

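Agent version: Open-Falcon agent version 0.3.x, build fa81b78
Symptoms:
The agent is deployed on machine A and collects the host metrics. Its metric-reporting log looks like this:
image
transfer and hbs are deployed on machine B:
image
The log of received data there looks normal.
The Open-Falcon server side is deployed on machine C, and all connections are normal.
On the dashboard I configured nodata for machine A's agent.alive metric, so that a value of -1 is reported if reporting is interrupted.
However, machine A currently reports everything normally: its NIC, CPU, memory, and other metrics can be queried in the dashboard and are real-time data. Yet agent.alive shows -1 and triggers an alarm. It did not recover after several restarts, nor after redeploying.
image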

@CHAIIOUu
Author

Machine A log:
2020/12/28 15:01:48 var.go:102: => <Total=97> <Endpoint:172.18.151.202_smarthome-openfalcon-opentsdb, Metric:agent.alive, Type:GAUGE, Tags:, Step:60, Time:1609138908, Value:1>
2020/12/28 15:02:48 var.go:102: => <Total=97> <Endpoint:172.18.151.202_smarthome-openfalcon-opentsdb, Metric:agent.alive, Type:GAUGE, Tags:, Step:60, Time:1609138968, Value:1>
Connections on machine B:
tcp6 0 0 172.18.151.197:6030 172.18.151.202:44994 ESTABLISHED 21868/falcon-hbs
tcp6 0 0 172.18.151.197:8433 172.18.151.202:43795 ESTABLISHED 21822/falcon-transf
Connections on machine C:
tcp6 0 0 192.168.1.1:3306 172.18.151.197:56708 ESTABLISHED 68828/mysqld
tcp6 0 0 192.168.1.1:6070 172.18.151.197:33836 ESTABLISHED 160472/falcon-graph
tcp6 0 0 192.168.1.1:6080 172.18.151.197:49380 ESTABLISHED 160162/falcon-judge

The nodata log on the Open-Falcon server side:
2020/12/28 07:23:00 sender.go:93: send items: [<JsonMetaData Endpoint:172.18.151.202_smarthome-openfalcon-opentsdb, Metric:agent.alive, Tags:, DsType:GAUGE, Step:60, Value:-1, Timestamp:1609140060>]

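But the agent clearly reports agent.alive with a value of 1, so why does this alarm item still trigger nodata? We have hit this situation several times: the agent.alive alarm fires as if the agent were down, yet the process is healthy, other metrics are collected and reported normally, and the other alarm items work normally!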
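
To put the two log records side by side, here is a minimal Python sketch that extracts the `Key:Value` fields from the agent line and the nodata line quoted above. The helper name `parse_record` is hypothetical and not part of Open-Falcon; it assumes only the log layout visible in this thread.

```python
import re

# Matches "Key:Value" pairs inside the <...> record of a log line.
FIELD_RE = re.compile(r"(\w+):([^,>]*)")


def parse_record(line):
    """Return the fields of the last <...> record in a log line as a dict.

    Hypothetical helper: it relies only on the log format shown in this thread.
    """
    start = line.rfind("<")
    end = line.rfind(">")
    if start == -1 or end <= start:
        return {}
    body = line[start + 1:end]
    return {key: value.strip() for key, value in FIELD_RE.findall(body)}


agent_line = (
    "2020/12/28 15:01:48 var.go:102: => <Total=97> "
    "<Endpoint:172.18.151.202_smarthome-openfalcon-opentsdb, Metric:agent.alive, "
    "Type:GAUGE, Tags:, Step:60, Time:1609138908, Value:1>"
)
nodata_line = (
    "2020/12/28 07:23:00 sender.go:93: send items: "
    "[<JsonMetaData Endpoint:172.18.151.202_smarthome-openfalcon-opentsdb, "
    "Metric:agent.alive, Tags:, DsType:GAUGE, Step:60, Value:-1, Timestamp:1609140060>]"
)

agent = parse_record(agent_line)
nodata = parse_record(nodata_line)

# Same endpoint string and metric in both records, but different values.
print(agent["Endpoint"] == nodata["Endpoint"], agent["Value"], nodata["Value"])
```

For the two lines above, the endpoint strings are identical while the agent reports 1 and nodata fills in -1, which is exactly the contradiction described here.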

@laodaxyz

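Compare the reported endpoint with the hostname in the host table and check whether they match. We ran into a case where the two differed only in letter case: no new host record was written, but the nodata alarm was still triggered.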
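
A minimal sketch of that comparison in Python, assuming you have already exported the reported endpoints and the hostnames from the host table into two lists (how you fetch them is up to you and is not shown). The function name `find_case_mismatches` and the sample names are hypothetical.

```python
def find_case_mismatches(reported_endpoints, host_table_hostnames):
    """Pair each reported endpoint with host-table names that differ only by case."""
    by_folded = {}
    for name in host_table_hostnames:
        by_folded.setdefault(name.casefold(), []).append(name)

    mismatches = []
    for endpoint in reported_endpoints:
        for name in by_folded.get(endpoint.casefold(), []):
            if name != endpoint:
                mismatches.append((endpoint, name))
    return mismatches


# Hypothetical sample data, not taken from this issue.
reported = ["Web-01", "db-01"]
hosts = ["web-01", "db-01"]
print(find_case_mismatches(reported, hosts))  # [('Web-01', 'web-01')]
```

Any pair it returns is an endpoint whose host-table entry matches only when case is ignored, which is the situation described above.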
