Hadoop + HBase compression: Snappy and gzip

Contents

Snappy compression

gzip compression


Snappy compression:

  1. About Snappy

  2. Install tools

  3. Build Hadoop (generate the native libraries)

  4. Configure and test

Versions used:

hadoop-2.6.0-cdh5.7.0

CentOS 7

Java 1.7.0_80

Begin the installation.

2. Install tools:

Install Java and configure PATH.

Java download page: https://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html

vi /etc/profile (add the following):

export JAVA_HOME=/opt/jdk1.7.0_80

export PATH=$JAVA_HOME/bin:$PATH

(Use a JDK 1.7 build here; compiling this CDH release under JDK 1.8 fails, as shown in "Problems encountered" below.)

Load the environment variables: source /etc/profile

Install the base build tools:

yum -y install gcc gcc-c++ libtool cmake maven zlib-devel openssl-devel protobuf
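One pitfall worth checking before the build: this Hadoop release needs the protoc compiler at version 2.5.0 exactly, and yum's protobuf package ships the runtime library, not necessarily the compiler (the compiler typically comes from a protobuf-compiler package, or a build of the protobuf 2.5.0 source):

```shell
# Hadoop 2.6's maven build aborts unless protoc reports exactly 2.5.0
if command -v protoc >/dev/null 2>&1; then
    protoc --version    # should print: libprotoc 2.5.0
else
    echo "protoc not found: install protobuf-compiler or build protobuf 2.5.0 from source"
fi
```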

Download and extract the sources:

hadoop-2.6.0-cdh5.7.0-src.tar.gz (download: http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0-src.tar.gz; alternatively, the binary package hadoop-2.6.0-cdh5.7.0.tar.gz also contains the src sources)

snappy-1.1.1.tar.gz (download: http://pkgs.fedoraproject.org/repo/pkgs/snappy/snappy-1.1.1.tar.gz/8887e3b7253b22a31f5486bca3cbc1c2/snappy-1.1.1.tar.gz)

Install snappy:

# tar xf snappy-1.1.1.tar.gz
# cd snappy-1.1.1
# ./configure
# make && make install

Check that snappy installed successfully:

# ll /usr/local/lib/ | grep snappy

3. Build Hadoop

# cd hadoop-2.6.0-cdh5.7.0

# mvn package -Pdist,native -DskipTests -Dtar -Drequire.snappy

Replace the original Hadoop native libraries with the freshly built ones (the build output is under hadoop-dist/target in the source tree).

Hadoop 2.x already bundles Snappy support in its source, so building hadoop-snappy separately is unnecessary; installing the snappy native library and rebuilding Hadoop's native libraries is enough.
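With the rebuilt native libraries copied into place, Snappy support can be confirmed with Hadoop's built-in checknative command (this assumes $HADOOP_HOME/bin is on PATH and a working Hadoop install, so it is a transcript rather than a standalone script):

```shell
# List which native libraries Hadoop can load; the snappy line should read
# "true" followed by the path to libsnappy
hadoop checknative -a
```

If snappy reports false, the loader cannot find the library: re-check LD_LIBRARY_PATH and the contents of $HADOOP_HOME/lib/native.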

4. Edit the configuration files:

hadoop-env.sh

export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native

core-site.xml

        <property>
            <name>io.compression.codecs</name>
            <value>org.apache.hadoop.io.compress.SnappyCodec</value>
        </property>

mapred-site.xml (the mapred.* names used by many older guides are deprecated in Hadoop 2; these are the current equivalents)

<property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>true</value>
</property>
<property>
    <name>mapreduce.output.fileoutputformat.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
</property>
<property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

Hadoop test:

Once everything is configured (every node in the cluster needs the native libraries copied over and the same configuration changes), restart the Hadoop cluster and run the bundled wordcount example. If the MapReduce job finishes without errors, the Snappy installation and configuration are working.
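A minimal smoke test might look like the following; the /tmp/wc-in and /tmp/wc-out paths are arbitrary examples, and the examples jar path follows the CDH 5.7.0 tarball layout:

```shell
# Put a small text file into HDFS and run the bundled wordcount example
hadoop fs -mkdir -p /tmp/wc-in
hadoop fs -put /etc/hosts /tmp/wc-in/
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /tmp/wc-in /tmp/wc-out
# hadoop fs -text decompresses the (now Snappy-compressed) output for viewing
hadoop fs -text /tmp/wc-out/part-r-00000
```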

hbase-site.xml

  <property>
     <name>hbase.regionserver.codecs</name>
     <value>snappy</value>
  </property>
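Before restarting the cluster, the codec can be sanity-checked with HBase's CompressionTest utility; the target path here is just a scratch location:

```shell
# Write a test file through the snappy codec and read it back
hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/snappy-check snappy
```

The run should end with SUCCESS; an UnsatisfiedLinkError instead typically means the native library is not visible to HBase.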

HBase test:

bin/hbase shell
create 'test', {NAME => 'cf', COMPRESSION => 'SNAPPY'}
put 'test', '1', 'cf:f1', '1'
hbase(main):014:1> disable 'tsdb'                                          # disable the table
hbase(main):014:1> desc 'tsdb'                                             # inspect the table schema
hbase(main):014:1> alter 'tsdb', {NAME => 't', COMPRESSION => 'SNAPPY'}    # switch compression to snappy
hbase(main):014:1> enable 'tsdb'                                           # re-enable the table
hbase(main):014:1> major_compact 'tsdb'                                    # best to major-compact the table's regions once

Problems encountered:

[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (clean) @ hadoop-main ---
[WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireJavaVersion failed with message:
Detected JDK Version: 1.8.0-191 is not in the allowed range [1.7.0,1.7.1000}].
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................ FAILURE [0.188s]
[INFO] Apache Hadoop Project POM ......................... SKIPPED
[INFO] ... (every remaining module SKIPPED) ...
[INFO] Apache Hadoop Distribution ........................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.092s
[INFO] Finished at: Tue Nov 20 09:25:13 CST 2018
[INFO] Final Memory: 41M/1920M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:1.3.1:enforce (clean) on project hadoop-main: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:1.3.1:enforce (clean) on project hadoop-main: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed.
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:217)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    ... (intermediate frames omitted) ...
Caused by: org.apache.maven.plugin.MojoExecutionException: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed.
    at org.apache.maven.plugins.enforcer.EnforceMojo.execute(EnforceMojo.java:209)
    ... 19 more
[ERROR]
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

The Java version was wrong; the build was initially run with:

[root@slave1 bin]# java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

Solution: switch to a JDK 1.7 release (here 1.7.0_80), update JAVA_HOME, and rebuild.

gzip compression:

Using the gzip compression format (very simple):

core-site.xml

        <property>
            <name>io.compression.codecs</name>
            <value>org.apache.hadoop.io.compress.GzipCodec,
            org.apache.hadoop.io.compress.DefaultCodec,
            org.apache.hadoop.io.compress.SnappyCodec</value>
        </property>

mapred-site.xml

    <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
    </property>
    <property>
        <name>mapreduce.map.output.compress.codec</name>
        <value>org.apache.hadoop.io.compress.GzipCodec</value>
    </property>
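The two properties above only compress the intermediate map output. To also gzip the final job output, the output-format counterparts can be set as well; a sketch:

```xml
<property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>true</value>
</property>
<property>
    <name>mapreduce.output.fileoutputformat.compress.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```

Note that gzipped final output is not splittable, so a downstream job will process each .gz file with a single mapper.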

HBase:

hbase(main):014:1> disable 'tsdb'                                          # disable the table
hbase(main):014:1> desc 'tsdb'                                             # inspect the table schema
hbase(main):014:1> alter 'tsdb', {NAME => 't', COMPRESSION => 'GZ'}        # switch compression to GZ
hbase(main):014:1> enable 'tsdb'                                           # re-enable the table
hbase(main):014:1> major_compact 'tsdb'                                    # best to major-compact the table's regions once

Problems encountered:

hbase(main):001:0> list
TABLE                                                                                                                                                                                                                                                                       
ERROR: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
    at org.apache.hadoop.hbase.master.HMaster.checkServiceStarted(HMaster.java:2296)
    at org.apache.hadoop.hbase.master.MasterRpcServices.isMasterRunning(MasterRpcServices.java:936)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:55654)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
    at java.lang.Thread.run(Thread.java:748)


Here is some help for this command:
List all tables in hbase. Optional regular expression parameter could
be used to filter the output. Examples:

  hbase> list
  hbase> list 'abc.*'
  hbase> list 'ns:abc.*'
  hbase> list 'ns:.*'

Solution:

Checking the HBase logs showed that HDFS was in safe mode:

2015-09-11 15:28:09,429 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode...
2015-09-11 15:28:19,433 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode...
2015-09-11 15:28:29,458 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode...
2015-09-11 15:28:29,540 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: getMaster attempt 23 of 140 failed; retrying after sleep of 64134
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet

Check the safe-mode status (hadoop dfsadmin still works but is deprecated; hdfs dfsadmin is the Hadoop 2 form):

grid@master1:~$ hadoop dfsadmin -safemode get
Safe mode is ON

Leave safe mode:

grid@master1:~$ hadoop dfsadmin -safemode leave
Safe mode is OFF

After this, HBase restarted normally.

gzip test result:

The HDFS directory shrank from 8.4 GB to 1.3 GB.
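As a standalone illustration of the ratios gzip can reach on repetitive text (ordinary command-line gzip here, not the cluster job; file names are arbitrary):

```shell
# Generate ~2.5 MB of highly repetitive text, then gzip a copy of it
awk 'BEGIN { for (i = 0; i < 100000; i++) print "hello hadoop compression" }' > sample.txt
gzip -c sample.txt > sample.txt.gz
ls -l sample.txt sample.txt.gz   # the .gz file is a tiny fraction of the original
```

Real-world savings depend entirely on the data; the 8.4 GB figure above is from this cluster's HBase data.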

 



Copyright notice: this is an original article by u012447842, licensed under CC 4.0 BY-SA. Please include a link to the original and this notice when reposting.