
[Hands-on] Fixing Kafka Crashing Repeatedly on CentOS 7

Posted on 2021-9-22 17:31:38
Related posts:

[Repost] Fixing Kafka Crashing After Running for a While on Windows
https://www.itsvse.com/thread-9984-1-1.html

Windows Kafka ERROR Failed to clean up log for __consumer_offsets
https://www.itsvse.com/thread-9980-1-1.html

Check the Kafka service status with `systemctl status kafka`. The output looks like this:

kafka.service
   Loaded: loaded (/usr/lib/systemd/system/kafka.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2021-09-22 14:43:11 CST; 1h 43min ago
  Process: 7363 ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties (code=exited, status=1/FAILURE)
Main PID: 7363 (code=exited, status=1/FAILURE)

Sep 22 14:43:11 devops02 kafka-server-start.sh[7363]: [2021-09-22 14:43:11,295] WARN [ReplicaManager broker=1] Stopping serving replicas in dir /tmp/kafka-logs (kafka.server.ReplicaManager)
Sep 22 14:43:11 devops02 kafka-server-start.sh[7363]: [2021-09-22 14:43:11,298] WARN [GroupCoordinator 1]: Failed to write empty metadata for group KqBatchAna: This is not the correct coordinator. (kafka.co...upCoordinator)
Sep 22 14:43:11 devops02 kafka-server-start.sh[7363]: [2021-09-22 14:43:11,303] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions HashSet(__consumer_offsets-22, __consumer_offsets-30, ...-8, __consumer
Sep 22 14:43:11 devops02 kafka-server-start.sh[7363]: [2021-09-22 14:43:11,304] INFO [ReplicaAlterLogDirsManager on broker 1] Removed fetcher for partitions HashSet(__consumer_offsets-22, __consumer_offsets...fsets-8, __con
Sep 22 14:43:11 devops02 kafka-server-start.sh[7363]: [2021-09-22 14:43:11,378] WARN [ReplicaManager broker=1] Broker 1 stopped fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,__consumer_...fsets-21,__con
Sep 22 14:43:11 devops02 kafka-server-start.sh[7363]: [2021-09-22 14:43:11,379] WARN Stopping serving logs in dir /tmp/kafka-logs (kafka.log.LogManager)
Sep 22 14:43:11 devops02 kafka-server-start.sh[7363]: [2021-09-22 14:43:11,386] ERROR Shutdown broker because all log dirs in /tmp/kafka-logs have failed (kafka.log.LogManager)
Sep 22 14:43:11 devops02 systemd[1]: kafka.service: main process exited, code=exited, status=1/FAILURE
Sep 22 14:43:11 devops02 systemd[1]: Unit kafka.service entered failed state.
Sep 22 14:43:11 devops02 systemd[1]: kafka.service failed.
Hint: Some lines were ellipsized, use -l to show in full.


Go to the Kafka log directory /usr/local/kafka/logs and inspect the server.log file:

[2021-09-22 14:43:11,286] ERROR Error while rolling log segment for __consumer_offsets-8 in dir /tmp/kafka-logs (kafka.server.LogDirFailureChannel)
java.io.FileNotFoundException: /tmp/kafka-logs/__consumer_offsets-8/00000000000000000000.index (No such file or directory)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:182)
        at kafka.log.AbstractIndex.resize(AbstractIndex.scala:175)
        at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:241)
        at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:241)
        at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:507)
        at kafka.log.Log.$anonfun$roll$8(Log.scala:2037)
        at kafka.log.Log.$anonfun$roll$8$adapted(Log.scala:2037)
        at scala.Option.foreach(Option.scala:437)
        at kafka.log.Log.$anonfun$roll$2(Log.scala:2037)
        at kafka.log.Log.roll(Log.scala:2453)
        at kafka.log.Log.maybeRoll(Log.scala:1988)
        at kafka.log.Log.append(Log.scala:1263)
        at kafka.log.Log.appendAsLeader(Log.scala:1112)
        at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1069)
        at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1057)
        at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:958)
        at scala.collection.Iterator$$anon$9.next(Iterator.scala:575)
        at scala.collection.mutable.Growable.addAll(Growable.scala:62)
        at scala.collection.mutable.Growable.addAll$(Growable.scala:57)
        at scala.collection.immutable.MapBuilderImpl.addAll(Map.scala:692)
        at scala.collection.immutable.Map$.from(Map.scala:643)
        at scala.collection.immutable.Map$.from(Map.scala:173)
        at scala.collection.MapOps.map(Map.scala:266)
        at scala.collection.MapOps.map$(Map.scala:266)
        at scala.collection.AbstractMap.map(Map.scala:372)
        at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:946)
        at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:616)
        at kafka.coordinator.group.GroupMetadataManager.storeGroup(GroupMetadataManager.scala:325)
        at kafka.coordinator.group.GroupCoordinator.$anonfun$onCompleteJoin$1(GroupCoordinator.scala:1206)
        at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:227)
        at kafka.coordinator.group.GroupCoordinator.onCompleteJoin(GroupCoordinator.scala:1178)
        at kafka.coordinator.group.DelayedJoin.onComplete(DelayedJoin.scala:43)
        at kafka.server.DelayedOperation.forceComplete(DelayedOperation.scala:72)
        at kafka.coordinator.group.DelayedJoin.$anonfun$tryComplete$1(DelayedJoin.scala:38)
        at kafka.coordinator.group.GroupCoordinator.$anonfun$tryCompleteJoin$1(GroupCoordinator.scala:1172)
        at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.scala:17)
        at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:227)
        at kafka.coordinator.group.GroupCoordinator.tryCompleteJoin(GroupCoordinator.scala:1171)
        at kafka.coordinator.group.DelayedJoin.tryComplete(DelayedJoin.scala:38)
        at kafka.server.DelayedOperation.safeTryCompleteOrElse(DelayedOperation.scala:110)
        at kafka.server.DelayedOperationPurgatory.tryCompleteElseWatch(DelayedOperation.scala:234)
        at kafka.coordinator.group.GroupCoordinator.prepareRebalance(GroupCoordinator.scala:1144)
        at kafka.coordinator.group.GroupCoordinator.$anonfun$maybePrepareRebalance$1(GroupCoordinator.scala:1118)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
        at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:227)
        at kafka.coordinator.group.GroupCoordinator.maybePrepareRebalance(GroupCoordinator.scala:1117)
        at kafka.coordinator.group.GroupCoordinator.removeMemberAndUpdateGroup(GroupCoordinator.scala:1156)
        at kafka.coordinator.group.GroupCoordinator.$anonfun$handleLeaveGroup$3(GroupCoordinator.scala:498)
        at scala.collection.immutable.List.map(List.scala:246)
        at kafka.coordinator.group.GroupCoordinator.$anonfun$handleLeaveGroup$2(GroupCoordinator.scala:470)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
        at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:227)
        at kafka.coordinator.group.GroupCoordinator.handleLeaveGroup(GroupCoordinator.scala:467)
        at kafka.server.KafkaApis.handleLeaveGroupRequest(KafkaApis.scala:1659)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:180)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:74)
        at java.lang.Thread.run(Thread.java:748)

Root cause: Linux periodically cleans up files under /tmp. Kafka's data directory defaults to /tmp/kafka-logs, so the scheduled cleanup deleted Kafka's log segments out from under the running broker, which then shut itself down.

On CentOS 7, three systemd services are involved in this cleanup:

systemd-tmpfiles-setup.service     : Create Volatile Files and Directories
systemd-tmpfiles-setup-dev.service : Create static device nodes in /dev
systemd-tmpfiles-clean.service     : Cleanup of Temporary Directories

There are also three corresponding configuration locations:

/etc/tmpfiles.d/*.conf
/run/tmpfiles.d/*.conf
/usr/lib/tmpfiles.d/*.conf

You can view the tmpfiles-related cleanup records in the system journal, e.g. with `journalctl -u systemd-tmpfiles-clean.service` (sample output is shown in the follow-up post below).

The cleanup policy for /tmp is configured in /usr/lib/tmpfiles.d/tmp.conf:

#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

# See tmpfiles.d(5) for details

# Clear tmp directories separately, to make them easier to override
v /tmp 1777 root root 10d
v /var/tmp 1777 root root 30d

# Exclude namespace mountpoints created with PrivateTmp=yes
x /tmp/systemd-private-%b-*
X /tmp/systemd-private-%b-*/tmp
x /var/tmp/systemd-private-%b-*
X /var/tmp/systemd-private-%b-*/tmp
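In the line `v /tmp 1777 root root 10d`, the age field `10d` means entries under /tmp that have gone unused for 10 days are removed on each cleanup run. The idea can be sketched with an age-based `find` sweep in a scratch directory (systemd-tmpfiles itself considers atime, mtime and ctime, so this is only an approximation, not what the service literally runs):

```shell
# Sketch of an age-based cleanup, similar in spirit to the 10d rule above.
# Works in a scratch directory so nothing real is touched.
dir=$(mktemp -d)
touch -d '15 days ago' "$dir/old.log"   # older than the 10-day threshold
touch "$dir/new.log"                    # fresh file, should survive
find "$dir" -type f -mtime +10 -delete  # remove files not modified in 10+ days
ls "$dir"                               # only new.log is left
```

This is why a broker that keeps its segment files in /tmp can run fine for days and then die: the data just has to sit untouched long enough to cross the threshold.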


Solution 1

Edit the Kafka configuration file /usr/local/kafka/config/server.properties and point the log.dirs setting at a directory outside /tmp.
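A sketch of the change — the /data/kafka-logs path here is only an illustration, any persistent directory outside /tmp works:

```properties
# server.properties — move Kafka data out of /tmp.
# /data/kafka-logs is a placeholder path: create it first and make sure
# the user running Kafka can write to it.
log.dirs=/data/kafka-logs
```

After changing it, restart the service (`systemctl restart kafka`). Note that existing data under /tmp/kafka-logs is not migrated automatically.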



Solution 2

Keep the default location but exclude it from cleanup, by adding an exclusion rule to /usr/lib/tmpfiles.d/tmp.conf.
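A sketch of the exclusion entry, assuming the data directory stays at the default /tmp/kafka-logs — the `x` line type tells systemd-tmpfiles to skip the path and everything under it during cleanup:

```
# /usr/lib/tmpfiles.d/tmp.conf — keep cleanup away from Kafka's data
x /tmp/kafka-logs
```

Since files under /usr/lib/tmpfiles.d/ can be replaced by package updates, a drop-in file under /etc/tmpfiles.d/ (one of the three locations listed above, which takes precedence by filename) is generally the safer place for this kind of local override.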


(End)




Posted on 2021-9-22 19:51:17

Here to learn again...
OP | Posted on 2022-2-7 14:31:46

To view the cleanup journal (e.g. `journalctl -u systemd-tmpfiles-clean.service`):


Feb 02 18:18:09 centos7-itsvse systemd[1]: Starting Cleanup of Temporary Directories...
Feb 02 18:18:09 centos7-itsvse systemd[1]: Started Cleanup of Temporary Directories.
Feb 03 18:19:09 centos7-itsvse systemd[1]: Starting Cleanup of Temporary Directories...
Feb 03 18:19:09 centos7-itsvse systemd[1]: Started Cleanup of Temporary Directories.
Feb 04 18:20:09 centos7-itsvse systemd[1]: Starting Cleanup of Temporary Directories...
Feb 04 18:20:09 centos7-itsvse systemd[1]: Started Cleanup of Temporary Directories.
Feb 05 18:21:09 centos7-itsvse systemd[1]: Starting Cleanup of Temporary Directories...
Feb 05 18:21:09 centos7-itsvse systemd[1]: Started Cleanup of Temporary Directories.
Feb 06 18:22:09 centos7-itsvse systemd[1]: Starting Cleanup of Temporary Directories...
Feb 06 18:22:09 centos7-itsvse systemd[1]: Started Cleanup of Temporary Directories.