A Summary of Hadoop Configuration Parameters and Default Ports


core-site.xml

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
Parameters (meaning, official description, example, official default):

fs.defaultFS
    Meaning: URI for accessing the HDFS distributed file system.
    Official description: The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.
    Example:
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://192.168.100.200:9000</value>
        </property>
    Official default: file:///

hadoop.tmp.dir
    Meaning: Base directory for other temporary files.
    Official description: A base for other temporary directories.
    Example:
        <property>
            <name>hadoop.tmp.dir</name>
            <value>file:///uloc/hadoopdata/hadoop-${user.name}/tmp</value>
        </property>
    Official default: /tmp/hadoop-${user.name}

io.file.buffer.size
    Meaning: Buffer size for read and write operations, generally an integer multiple of the hardware page size.
    Official description: The size of buffer for use in sequence files. The size of this buffer should probably be a multiple of hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations.
    Example:
        <property>
            <name>io.file.buffer.size</name>
            <value>4096</value>
        </property>
    Official default: 4096

hadoop.http.staticuser.user
    Official description: The user name to filter as, on static web filters while rendering content. An example use is the HDFS web UI (user to be used for browsing files).
    Example:
        <property>
            <name>hadoop.http.staticuser.user</name>
            <value>bruce</value>
        </property>
    Official default: dr.who
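Putting the sample values above together, a minimal core-site.xml for this setup might look as follows. Note the host 192.168.100.200, the /uloc/hadoopdata path, and the user bruce are the article's example values, not defaults:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- HDFS access URI; the NameNode's ipc.Server will listen on port 9000 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.100.200:9000</value>
    </property>
    <!-- Base directory for other temporary files -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:///uloc/hadoopdata/hadoop-${user.name}/tmp</value>
    </property>
    <!-- Read/write buffer size; a multiple of the hardware page size -->
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
    <!-- User shown when browsing files through the HDFS web UI -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>bruce</value>
    </property>
</configuration>
```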
Default addresses and ports:

hadoop.registry.zk.quorum (default: localhost:2181)
    List of hostname:port pairs defining the zookeeper quorum binding for the registry.

fs.defaultFS (default: file:///)
    The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.
    Note: typically configured as something like hdfs://192.168.100.200:9000, in which case the NameNode starts an ipc.Server listening on port 9000.


hdfs-site.xml:

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
Parameters (meaning, official description, example, official default):

dfs.namenode.name.dir
    Meaning: Directory where the HDFS NameNode stores the name table. If multiple directories are configured, a copy of the name table is kept in each of them.
    Official description: Determines where on the local filesystem the DFS name node should store the name table(fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
    Example:
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:///uloc/hadoopdata/hadoop-${user.name}/dfs/name</value>
            <final>true</final>
        </property>
    Official default: file://${hadoop.tmp.dir}/dfs/name

dfs.datanode.data.dir
    Meaning: Directory where an HDFS DataNode stores data blocks. Multiple directories can be configured; blocks are then distributed across them (unlike the NameNode directories, the data is not duplicated into each one).
    Official description: Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. The directories should be tagged with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS storage policies. The default storage type will be DISK if the directory does not have a storage type tagged explicitly. Directories that do not exist will be created if local filesystem permission allows.
    Example:
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:///uloc/hadoopdata/hadoop-${user.name}/dfs/data</value>
            <final>true</final>
        </property>
    Official default: file://${hadoop.tmp.dir}/dfs/data

dfs.replication
    Meaning: Number of replicas kept for each data block.
    Official description: Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.
    Example:
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    Official default: 3

dfs.namenode.secondary.http-address
    Meaning: HTTP address and port of the secondary NameNode.
    Official description: The secondary namenode http server address and port.
    Example:
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>192.168.100.200:50090</value>
        </property>
    Official default: 0.0.0.0:50090

dfs.namenode.checkpoint.dir
    Meaning: Directory where the secondary NameNode stores the temporary images to merge. Multiple directories can be configured; each keeps a copy.
    Official description: Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.
    Example:
        <property>
            <name>dfs.namenode.checkpoint.dir</name>
            <value>file:///uloc/hadoopdata/hadoop-${user.name}/dfs/namesecondary</value>
            <final>true</final>
        </property>
    Official default: file://${hadoop.tmp.dir}/dfs/namesecondary

dfs.permissions.enabled
    Meaning: Switch for HDFS file permission checking.
    Official description: If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories.
    Example:
        <property>
            <name>dfs.permissions.enabled</name>
            <value>false</value>
        </property>
    Official default: true

dfs.datanode.address
    Meaning: Address and port the DataNode uses for data transfer.
    Official description: The datanode server address and port for data transfer.
    Example:
        <property>
            <name>dfs.datanode.address</name>
            <value>192.168.100.200:50010</value>
        </property>
    Official default: 0.0.0.0:50010

dfs.webhdfs.enabled
    Meaning: Switch for the WebHDFS feature.
    Official description: Enable WebHDFS (REST API) in Namenodes and Datanodes.
    Example:
        <property>
            <name>dfs.webhdfs.enabled</name>
            <value>true</value>
        </property>
    Official default: true

dfs.support.append
    Meaning: No such configuration item exists under the YARN framework.
    Example:
        <property>
            <name>dfs.support.append</name>
            <value>true</value>
        </property>

dfs.permissions.superusergroup
    Official description: The name of the group of super-users.
    Example:
        <property>
            <name>dfs.permissions.superusergroup</name>
            <value>oinstall</value>
        </property>
    Official default: supergroup

dfs.block.invalidate.limit
    Meaning: Number of blocks deleted per batch. The default setting of 1000 is recommended.
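Similarly, the hdfs-site.xml examples above combine into a sketch like the following (single-node style, with replication 1 and permission checking disabled, per the article's sample values):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///uloc/hadoopdata/hadoop-${user.name}/dfs/name</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///uloc/hadoopdata/hadoop-${user.name}/dfs/data</value>
        <final>true</final>
    </property>
    <!-- One replica per block: suitable only for a single-node setup -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>192.168.100.200:50090</value>
    </property>
    <!-- Permission checking off, as in the article's example -->
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
```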


Default addresses and ports:

dfs.namenode.rpc-address
    RPC address that handles all clients requests. In the case of HA/Federation where multiple namenodes exist, the name service id is added to the name, e.g. dfs.namenode.rpc-address.ns1, dfs.namenode.rpc-address.EXAMPLENAMESERVICE. The value of this property will take the form of nn-host1:rpc-port.

dfs.namenode.rpc-bind-host
    The actual address the RPC server will bind to. If this optional address is set, it overrides only the hostname portion of dfs.namenode.rpc-address. It can also be specified per name node or name service for HA/Federation. This is useful for making the name node listen on all interfaces by setting it to 0.0.0.0.

dfs.namenode.servicerpc-address
    RPC address for HDFS Services communication. BackupNode, Datanodes and all other services should be connecting to this address if it is configured. In the case of HA/Federation where multiple namenodes exist, the name service id is added to the name, e.g. dfs.namenode.servicerpc-address.ns1, dfs.namenode.rpc-address.EXAMPLENAMESERVICE. The value of this property will take the form of nn-host1:rpc-port. If the value of this property is unset the value of dfs.namenode.rpc-address will be used as the default.

dfs.namenode.servicerpc-bind-host
    The actual address the service RPC server will bind to. If this optional address is set, it overrides only the hostname portion of dfs.namenode.servicerpc-address. It can also be specified per name node or name service for HA/Federation. This is useful for making the name node listen on all interfaces by setting it to 0.0.0.0.

dfs.namenode.secondary.http-address (default: 0.0.0.0:50090)
    The secondary namenode http server address and port.

dfs.namenode.secondary.https-address (default: 0.0.0.0:50091)
    The secondary namenode HTTPS server address and port.

dfs.namenode.http-address (default: 0.0.0.0:50070)
    The address and the base port where the dfs namenode web ui will listen on.

dfs.namenode.https-address (default: 0.0.0.0:50470)
    The namenode secure http server address and port.

dfs.namenode.backup.address (default: 0.0.0.0:50100)
    The backup node server address and port. If the port is 0 then the server will start on a free port.

dfs.namenode.backup.http-address (default: 0.0.0.0:50105)
    The backup node http server address and port. If the port is 0 then the server will start on a free port.

dfs.datanode.address (default: 0.0.0.0:50010)
    The datanode server address and port for data transfer.

dfs.datanode.http.address (default: 0.0.0.0:50075)
    The datanode http server address and port.

dfs.datanode.ipc.address (default: 0.0.0.0:50020)
    The datanode ipc server address and port.

dfs.datanode.https.address (default: 0.0.0.0:50475)
    The datanode secure http server address and port.

dfs.journalnode.rpc-address (default: 0.0.0.0:8485)
    The JournalNode RPC server address and port.

dfs.journalnode.http-address (default: 0.0.0.0:8480)
    The address and port the JournalNode HTTP server listens on. If the port is 0 then the server will start on a free port.

dfs.journalnode.https-address (default: 0.0.0.0:8481)
    The address and port the JournalNode HTTPS server listens on. If the port is 0 then the server will start on a free port.



mapred-site.xml:

http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
Parameters (meaning, official description, example, official default):

mapreduce.framework.name
    Meaning: The runtime framework for MapReduce jobs.
    Official description: The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.
    Example:
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    Official default: local

mapreduce.shuffle.port
    Meaning: Port the ShuffleHandler runs on.
    Official description: Default port that the ShuffleHandler will run on. ShuffleHandler is a service run at the NodeManager to facilitate transfers of intermediate Map outputs to requesting Reducers.
    Example:
        <property>
            <name>mapreduce.shuffle.port</name>
            <value>13562</value>
        </property>
    Official default: 13562

mapred.system.dir
    Meaning: Not supported under YARN.
    Example:
        <property>
            <name>mapred.system.dir</name>
            <value>file:///uloc/hadoopdata/hadoop-${user.name}/mapred/system</value>
            <final>true</final>
        </property>

mapred.local.dir
    Meaning: Not supported under YARN.
    Example:
        <property>
            <name>mapred.local.dir</name>
            <value>file:///uloc/hadoopdata/hadoop-${user.name}/mapred/local</value>
            <final>true</final>
        </property>

mapred.child.java.opts
    Meaning: Java options for the task processes.
    Official description: Java opts for the task processes. The following symbol, if present, will be interpolated: @taskid@ is replaced by current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc Usage of -Djava.library.path can cause programs to no longer function if hadoop native libraries are used. These values should instead be set as part of LD_LIBRARY_PATH in the map / reduce JVM env using the mapreduce.map.env and mapreduce.reduce.env config settings.
    Example:
        <property>
            <name>mapred.child.java.opts</name>
            <value>-Xmx3072M</value>
        </property>
    Official default: -Xmx200m

mapreduce.reduce.java.opts
    Meaning: Not supported under YARN.
    Example:
        <property>
            <name>mapreduce.reduce.java.opts</name>
            <value>-Xmx1024M</value>
        </property>

mapreduce.map.memory.mb
    Official description: The amount of memory to request from the scheduler for each map task.
    Example:
        <property>
            <name>mapreduce.map.memory.mb</name>
            <value>1024</value>
        </property>
    Official default: 1024

mapreduce.reduce.memory.mb
    Official description: The amount of memory to request from the scheduler for each reduce task.
    Example:
        <property>
            <name>mapreduce.reduce.memory.mb</name>
            <value>1024</value>
        </property>
    Official default: 1024

mapreduce.task.io.sort.mb
    Official description: The total amount of buffer memory to use while sorting files, in megabytes. By default, gives each merge stream 1MB, which should minimize seeks.
    Example:
        <property>
            <name>mapreduce.task.io.sort.mb</name>
            <value>1024</value>
        </property>
    Official default: 100

mapreduce.task.io.sort.factor
    Official description: The number of streams to merge at once while sorting files. This determines the number of open file handles.
    Example:
        <property>
            <name>mapreduce.task.io.sort.factor</name>
            <value>100</value>
        </property>
    Official default: 10

mapreduce.reduce.shuffle.parallelcopies
    Official description: The default number of parallel transfers run by reduce during the copy(shuffle) phase.
    Example:
        <property>
            <name>mapreduce.reduce.shuffle.parallelcopies</name>
            <value>50</value>
        </property>
    Official default: 5

mapreduce.jobhistory.address
    Official description: MapReduce JobHistory Server IPC host:port.
    Example:
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>192.168.100.200:10020</value>
        </property>
    Official default: 0.0.0.0:10020
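Taken together, the mapred-site.xml samples above would give roughly the following. The memory figures and the JobHistory host are the article's tuned examples, not the official defaults:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- Run MapReduce jobs on YARN instead of the default "local" -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- Per-task memory requests, as in the article's examples -->
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>1024</value>
    </property>
    <!-- JobHistory Server IPC endpoint -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>192.168.100.200:10020</value>
    </property>
</configuration>
```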



Default addresses and ports:

mapreduce.jobtracker.http.address (default: 0.0.0.0:50030)
    The job tracker http server address and port the server will listen on. If the port is 0 then the server will start on a free port.

mapreduce.tasktracker.report.address (default: 127.0.0.1:0)
    The interface and port that task tracker server listens on. Since it is only connected to by the tasks, it uses the local interface. EXPERT ONLY. Should only be changed if your host does not have the loopback interface.

mapreduce.tasktracker.http.address (default: 0.0.0.0:50060)
    The task tracker http server address and port. If the port is 0 then the server will start on a free port.

mapreduce.jobhistory.address (default: 0.0.0.0:10020)
    MapReduce JobHistory Server IPC host:port.

mapreduce.jobhistory.webapp.address (default: 0.0.0.0:19888)
    MapReduce JobHistory Server Web UI host:port.

mapreduce.jobhistory.admin.address (default: 0.0.0.0:10033)
    The address of the History server admin interface.



yarn-site.xml:

http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
Parameters (meaning, official description, example, official default):

yarn.resourcemanager.address
    Official description: The address of the applications manager interface in the RM.
    Example:
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>192.168.100.200:8032</value>
        </property>
    Official default: ${yarn.resourcemanager.hostname}:8032

yarn.resourcemanager.scheduler.address
    Official description: The address of the scheduler interface.
    Example:
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>192.168.100.200:8030</value>
        </property>
    Official default: ${yarn.resourcemanager.hostname}:8030

yarn.resourcemanager.resource-tracker.address
    Example:
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>192.168.100.200:8031</value>
        </property>
    Official default: ${yarn.resourcemanager.hostname}:8031

yarn.resourcemanager.admin.address
    Official description: The address of the RM admin interface.
    Example:
        <property>
            <name>yarn.resourcemanager.admin.address</name>
            <value>192.168.100.200:8033</value>
        </property>
    Official default: ${yarn.resourcemanager.hostname}:8033

yarn.resourcemanager.webapp.address
    Official description: The http address of the RM web application.
    Example:
        <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>192.168.100.200:8088</value>
        </property>
    Official default: ${yarn.resourcemanager.hostname}:8088

yarn.nodemanager.aux-services
    Official description: A comma separated list of services where service name should only contain a-zA-Z0-9_ and can not start with numbers.
    Example:
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce.shuffle</value>
        </property>
    Note: since service names may not contain dots, in Hadoop 2.2 and later this value must be written mapreduce_shuffle.

yarn.nodemanager.aux-services.mapreduce.shuffle.class
    Meaning: Not supported under YARN.
    Example:
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>

yarn.scheduler.maximum-allocation-mb
    Official description: The maximum allocation for every container request at the RM, in MBs. Memory requests higher than this will throw a InvalidResourceRequestException.
    Example:
        <property>
            <name>yarn.scheduler.maximum-allocation-mb</name>
            <value>10000</value>
        </property>
    Official default: 8192

yarn.scheduler.minimum-allocation-mb
    Official description: The minimum allocation for every container request at the RM, in MBs. Memory requests lower than this will throw a InvalidResourceRequestException.
    Example:
        <property>
            <name>yarn.scheduler.minimum-allocation-mb</name>
            <value>1000</value>
        </property>
    Official default: 1024

mapreduce.reduce.memory.mb
    Meaning: Not supported here; this is a mapred-site.xml parameter, not a yarn-site.xml one.
    Example:
        <property>
            <name>mapreduce.reduce.memory.mb</name>
            <value>1000</value>
        </property>

yarn.nodemanager.local-dirs
    Official description: List of directories to store localized files in. An application's localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this.
    Example:
        <property>
            <name>yarn.nodemanager.local-dirs</name>
            <value>/uloc/hadoopdata/hadoop-${user.name}/yarn/nmlocal</value>
        </property>
    Official default: ${hadoop.tmp.dir}/nm-local-dir

yarn.nodemanager.resource.memory-mb
    Official description: Amount of physical memory, in MB, that can be allocated for containers.
    Example:
        <property>
            <name>yarn.nodemanager.resource.memory-mb</name>
            <value>4096</value>
        </property>
    Official default: 8192
    Note: do not set this too small (e.g. 1024); jobs can then be submitted but will never run. In the author's tests even 2048 was not enough to run jobs, while 3172 worked.

yarn.nodemanager.remote-app-log-dir
    Official description: Where to aggregate logs to.
    Example:
        <property>
            <name>yarn.nodemanager.remote-app-log-dir</name>
            <value>/uloc/hadoopdata/hadoop-${user.name}/yarn/logs</value>
        </property>
    Official default: /tmp/logs

yarn.nodemanager.log-dirs
    Official description: Where to store container logs. An application's localized log directory will be found in ${yarn.nodemanager.log-dirs}/application_${appid}. Individual containers' log directories will be below this, in directories named container_{$contid}. Each container directory will contain the files stderr, stdin, and syslog generated by that container.
    Example:
        <property>
            <name>yarn.nodemanager.log-dirs</name>
            <value>/uloc/hadoopdata/hadoop-${user.name}/yarn/userlogs</value>
        </property>
    Official default: ${yarn.log.dir}/userlogs

yarn.web-proxy.address
    Official description: The address for the web proxy as HOST:PORT, if this is not given then the proxy will run as part of the RM.
    Example:
        <property>
            <name>yarn.web-proxy.address</name>
            <value>192.168.100.200:54315</value>
        </property>

yarn.resourcemanager.hostname
    Official description: The hostname of the RM.
    Example:
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>robot123</value>
        </property>
    Official default: 0.0.0.0

yarn.nodemanager.address
    Official description: The address of the container manager in the NM.
    Example:
        <property>
            <name>yarn.nodemanager.address</name>
            <value>192.168.100.200:11000</value>
        </property>
    Official default: ${yarn.nodemanager.hostname}:0
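Finally, a yarn-site.xml along the lines of the examples above. The hostname robot123 and the memory figures are the article's sample values; the aux-service name is written mapreduce_shuffle here, as required from Hadoop 2.2 onward:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- With the hostname set, the ${yarn.resourcemanager.hostname}:port
         defaults (8032, 8030, 8031, 8033, 8088) resolve automatically -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>robot123</value>
    </property>
    <!-- Service names may only contain a-zA-Z0-9_, hence the underscore -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Physical memory for containers; too small and jobs never run -->
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1000</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>10000</value>
    </property>
</configuration>
```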



Default addresses and ports:

yarn.resourcemanager.hostname (default: 0.0.0.0)
    The hostname of the RM.

yarn.resourcemanager.address (default: ${yarn.resourcemanager.hostname}:8032)
    The address of the applications manager interface in the RM.

yarn.resourcemanager.scheduler.address (default: ${yarn.resourcemanager.hostname}:8030)
    The address of the scheduler interface.

yarn.resourcemanager.webapp.address (default: ${yarn.resourcemanager.hostname}:8088)
    The http address of the RM web application.

yarn.resourcemanager.webapp.https.address (default: ${yarn.resourcemanager.hostname}:8090)
    The https address of the RM web application.

yarn.resourcemanager.resource-tracker.address (default: ${yarn.resourcemanager.hostname}:8031)

yarn.resourcemanager.admin.address (default: ${yarn.resourcemanager.hostname}:8033)
    The address of the RM admin interface.

yarn.nodemanager.hostname (default: 0.0.0.0)
    The hostname of the NM.

yarn.nodemanager.address (default: ${yarn.nodemanager.hostname}:0)
    The address of the container manager in the NM.

yarn.nodemanager.localizer.address (default: ${yarn.nodemanager.hostname}:8040)
    Address where the localizer IPC is.

yarn.nodemanager.webapp.address (default: ${yarn.nodemanager.hostname}:8042)
    NM Webapp address.

yarn.timeline-service.hostname (default: 0.0.0.0)
    The hostname of the timeline service web application.

yarn.timeline-service.address (default: ${yarn.timeline-service.hostname}:10200)
    This is default address for the timeline server to start the RPC server.

yarn.timeline-service.webapp.address (default: ${yarn.timeline-service.hostname}:8188)
    The http address of the timeline service web application.

yarn.timeline-service.webapp.https.address (default: ${yarn.timeline-service.hostname}:8190)
    The https address of the timeline service web application.

yarn.sharedcache.admin.address (default: 0.0.0.0:8047)
    The address of the admin interface in the SCM (shared cache manager).

yarn.sharedcache.webapp.address (default: 0.0.0.0:8788)
    The address of the web application in the SCM (shared cache manager).

yarn.sharedcache.uploader.server.address (default: 0.0.0.0:8046)
    The address of the node manager interface in the SCM (shared cache manager).

yarn.sharedcache.client-server.address (default: 0.0.0.0:8045)
    The address of the client interface in the SCM (shared cache manager).


Copyright notice: this is an original article by jollypigclub, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source along with this notice.