Log in to the NZ system and silence the alerts "OLAP NationalCenter Flink Topology Down" and "OLAP Flink TaskManager Down".

Backup:
Log in to 10.224.11.24
    mkdir -p /data/tsg/olap/backup/22.07/flink/20221026
    cp -r /data/tsg/olap/flink-1.13.1/conf /data/tsg/olap/backup/22.07/flink/20221026/

Update:
Log in to 10.224.11.24
1: Update the configuration
    cd /data/tsg/olap/flink-1.13.1/conf
    vim flink-conf.yaml
Change the value of taskmanager.memory.flink.size on line 61 to: 153600m
Distribute this configuration to the same directory on 10.224.11.25-10.224.11.31.
2: Stop the jobs:
    cd /data/tsg/olap/topology/completion
    ./stop.sh config/
    cd /data/tsg/olap/topology/dos-detection
    ./stop.sh config/
    cd /data/tsg/olap/topology/flink-top
    ./stop.sh
    cd /data/tsg/olap/topology/account-framedip-Hbase
    ./stop.sh config/
    cd /data/tsg/olap/topology/livecharts
    ./stop.sh config/
3: Restart the TaskManager nodes
Log in to each node in turn, 10.224.11.31 → 10.224.11.26
    service keepflinktask stop && sleep 10
    jps -l | grep flink | grep TaskManagerRunner | wc -l
    # If the result is 1, run:
    ps -ef | grep TaskManagerRunner | grep -v grep | awk '{print $2}' | xargs kill -9
    # If the result is 0, run:
    service keepflinktask start
4: Restore the jobs
Log in to 10.224.11.24
    cd /data/tsg/olap/topology/completion
    ./start.sh config/
    cd /data/tsg/olap/topology/dos-detection
    ./start.sh config/
    cd /data/tsg/olap/topology/flink-top
    ./start.sh
    cd /data/tsg/olap/topology/account-framedip-Hbase
    ./start.sh config/
    cd /data/tsg/olap/topology/livecharts
    ./start.sh config/

Verification:
1: Job count
After everything has been restarted, wait 2 minutes.
    flink list
The output should show 6 jobs in the RUNNING state.
2: Log in to the NZ system
Run the following queries, unit percent (0.0-1.0), time range the last 6 hours.
    1 - (node_memory_MemAvailable_bytes{olap_node_exporter="flink_task"} / node_memory_MemTotal_bytes{olap_node_exporter="flink_task"})
    (sum by (asset) (flink_taskmanager_Status_JVM_Memory_Heap_Used{module="NC-Flink"})/sum by (asset) (flink_taskmanager_Status_JVM_Memory_Heap_Committed{module="NC-Flink"}))

After feedback confirms there are no problems:
1: Remove the silences on "OLAP NationalCenter Flink Topology Down" and "OLAP Flink TaskManager Down".
2: Remove the "OLAP High Memory Usage > 80%" silence for 10.224.11.26-31.
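
The runbook asks for the edited flink-conf.yaml to be distributed to 10.224.11.25-10.224.11.31 but does not say how. The following is a minimal sketch of one way to do it, run from 10.224.11.24. It assumes scp is available and that the current user has SSH access to every target host; neither is stated in the runbook. The function names target_hosts and distribute_conf and the DRY_RUN variable are hypothetical, introduced only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch only: copy the edited flink-conf.yaml to the other Flink nodes.
# Assumptions (not in the runbook): scp is the copy tool, and the current
# user can SSH to every target host.
set -eu

CONF=/data/tsg/olap/flink-1.13.1/conf/flink-conf.yaml

# Print the target hosts 10.224.11.25 through 10.224.11.31, one per line.
target_hosts() {
  local i
  for i in $(seq 25 31); do
    echo "10.224.11.${i}"
  done
}

# Print the scp command for each host; actually run it only when DRY_RUN=0.
distribute_conf() {
  local host
  for host in $(target_hosts); do
    echo "scp ${CONF} ${host}:${CONF}"
    if [ "${DRY_RUN:-1}" = "0" ]; then
      scp "${CONF}" "${host}:${CONF}"
    fi
  done
}

distribute_conf
```

By default the script only prints the commands it would run, so the host list can be reviewed first. Setting DRY_RUN=0 in the environment performs the copy.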