[Ambari] Troubleshooting Guide: HDFS Errors

Big Data Cluster Error Summary

  • Big data cluster errors: link
  • ZooKeeper errors: link
  • HDFS errors: link
  • Yarn errors: link

HDFS Errors

NameNode fails to start


  • stderr: /var/lib/ambari-agent/data/errors-580.txt
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/namenode.py", line 408, in <module>
    NameNode().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/namenode.py", line 138, in start
    upgrade_suspended=params.upgrade_suspended, env=env)
  File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/hdfs_namenode.py", line 199, in namenode
    create_log_dir=True
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/utils.py", line 261, in service
    Execute(daemon_cmd, not_if=process_id_exists_command, environment=hadoop_env_exports)
  File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
    returns=self.resource.returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/3.1.0.0-78/hadoop/bin/hdfs --config /usr/hdp/3.1.0.0-78/hadoop/conf --daemon start namenode'' returned 1.
  • stdout: /var/lib/ambari-agent/data/output-580.txt
2020-04-11 10:50:35,438 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(210)) - Stopping NameNode metrics system...
2020-04-11 10:50:35,438 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(216)) - NameNode metrics system stopped.
2020-04-11 10:50:35,439 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(607)) - NameNode metrics system shutdown complete.
2020-04-11 10:50:35,439 ERROR namenode.NameNode (NameNode.java:main(1715)) - Failed to start namenode.
java.net.BindException: Port in use: master:50070
	at org.apache.hadoop.http.HttpServer2.constructBindException(HttpServer2.java:1197)
	at org.apache.hadoop.http.HttpServer2.bindForSinglePort(HttpServer2.java:1219)
	at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:1278)
	at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1133)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:177)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:869)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:691)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
Caused by: java.net.BindException: Cannot assign requested address
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:433)
	at sun.nio.ch.Net.bind(Net.java:425)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:351)
	at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:319)
	at org.apache.hadoop.http.HttpServer2.bindListener(HttpServer2.java:1184)
	at org.apache.hadoop.http.HttpServer2.bindForSinglePort(HttpServer2.java:1215)
	... 9 more
2020-04-11 10:50:35,440 INFO  util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: java.net.BindException: Port in use: master:50070
2020-04-11 10:50:35,443 INFO  namenode.NameNode (LogAdapter.java:info(51)) - SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/39.99.193.128
************************************************************/
  • Solution

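First I looked at the error log (stderr), which ends with: resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/3.1.0.0-78/hadoop/bin/hdfs --config /usr/hdp/3.1.0.0-78/hadoop/conf --daemon start namenode'' returned 1. This only says that the start command exited with status 1, and searching for it turned up no solution.
Next I read the output log (stdout) carefully (the copy above is truncated) and found java.net.BindException: Port in use: master:50070. The first thing to check is whether the port is actually occupied: netstat -lnp | grep 50070. If it is, killing the process that holds it is enough: kill -9 <PID>. However, no process was using the port. So I checked the host configuration: vim /etc/hosts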

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
39.**.***.128  master # public IP
123.**.***.54  slave1
123.**.***.90  slave2
59.**.***.72   slave3
39.***.***.21  slave4

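It turned out that master was mapped to its public IP. That explains the "Caused by: java.net.BindException: Cannot assign requested address" line in the output log. The NameNode web server tries to bind to master:50070, but on a cloud server the public IP is usually provided through NAT rather than configured on a local network interface. The bind therefore fails even though nothing is using the port. After replacing the public IP with the private (internal) IP: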

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
# 39.**.***.128  master # public IP
172.**.***.86 master # private IP
123.**.***.54  slave1
123.**.***.90  slave2
59.**.***.72   slave3
39.***.***.21  slave4

Start the NameNode again, and the problem is resolved.
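The "Port in use" wording in the log is misleading. The "Caused by" line reports the underlying bind error, "Cannot assign requested address" (EADDRNOTAVAIL), not a genuine port conflict (EADDRINUSE). The Python sketch below reproduces that distinction by attempting a plain bind directly, which can help check a host before restarting the NameNode from Ambari.

It is a hypothetical helper: diagnose_bind is not part of Ambari or Hadoop. It assumes IPv4 and only tests the OS-level bind, not Hadoop's own configuration.

```python
import errno
import socket


def diagnose_bind(host: str, port: int) -> str:
    """Try to bind a TCP socket to (host, port) and classify the outcome.

    Hypothetical diagnostic helper (not an Ambari/Hadoop API). Returns:
      "ok"        - the bind succeeded, so the address is local and the port free
      "in_use"    - another socket already holds the port (EADDRINUSE)
      "not_local" - the resolved IP is not assigned to any local interface
                    (EADDRNOTAVAIL, "Cannot assign requested address")
    Any other OSError is re-raised unchanged.
    """
    # IPv4 lookup through the system resolver, which normally consults /etc/hosts;
    # raises socket.gaierror if the name cannot be resolved at all.
    addr = socket.gethostbyname(host)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((addr, port))
        return "ok"
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return "in_use"
        if exc.errno == errno.EADDRNOTAVAIL:
            return "not_local"
        raise
    finally:
        sock.close()
```

Run on the NameNode host with the NameNode stopped, diagnose_bind("master", 50070) would be expected to return "not_local" with the original /etc/hosts. It should return "ok" once master points at the private IP. A result of "in_use" is the case where the netstat / kill route applies.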

NFSGateway fails to start


  • stderr
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/nfsgateway.py", line 92, in <module>
    NFSGateway().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/nfsgateway.py", line 54, in start
    nfsgateway(action="start")
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/hdfs_nfsgateway.py", line 64, in nfsgateway
    prepare_rpcbind()
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HDFS/package/scripts/hdfs_nfsgateway.py", line 55, in prepare_rpcbind
    raise Fail("Failed to start rpcbind or portmap")
resource_management.core.exceptions.Fail: Failed to start rpcbind or portmap
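  • Solution: Ambari's prepare_rpcbind step could not start rpcbind (or portmap), which the NFS Gateway depends on. Enable rpcbind at boot and start it manually as root: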
[root@master ~]# systemctl enable rpcbind
[root@master ~]# systemctl start rpcbind
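Before retrying NFSGateway in Ambari, you can confirm that rpcbind is actually up by checking that port 111, the well-known portmapper/rpcbind port, accepts TCP connections. The sketch below is a hypothetical helper (port_accepts_tcp is not an Ambari function). It assumes rpcbind listens on TCP on the local host, which is its usual default. From the shell, rpcinfo -p gives a more thorough answer.

```python
import socket


def port_accepts_tcp(host: str = "127.0.0.1", port: int = 111, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper. Port 111 is the standard portmapper/rpcbind port, so
    True for the defaults suggests rpcbind is running; False covers refused
    connections, timeouts and other socket errors.
    """
    try:
        # The connection is opened and immediately closed; only reachability matters.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

After systemctl start rpcbind, port_accepts_tcp() would be expected to return True. False suggests rpcbind is still not running or is listening somewhere other than the local host.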



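Copyright notice: This is an original article by Sisyphus_98, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.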