Kylin 3.1.2 Cluster Setup
Preparation
① Environment (3 Tencent Cloud servers + 1 MySQL server as middleware)
| Node | node1 | node2 | node3 |
|---|---|---|---|
| Spec | 2 cores / 8 GB RAM / 50 GB disk | 2 cores / 4 GB RAM / 50 GB disk | 2 cores / 4 GB RAM / 50 GB disk |

| Service | node1 | node2 | node3 |
|---|---|---|---|
| ZooKeeper | √ | √ | √ |
| NameNode (Hadoop) | √ | | |
| DataNode (Hadoop) | √ | √ | √ |
| ResourceManager (YARN) | √ | | |
| NodeManager (YARN) | √ | √ | √ |
| HBase Master (HBase) | √ | | |
| HRegionServer (HBase) | √ | √ | √ |
| Hive Metastore Server (Hive metadata service) | √ | | |
| Kylin | √ | | |
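Note: the commands below refer to the nodes by hostname (node1/node2/node3). If name resolution is not already handled by DNS, a hosts-file mapping such as the following keeps those commands working; the IP addresses here are placeholders, substitute your actual private addresses.
# Append hostname mappings on every node (placeholder IPs)
cat >> /etc/hosts <<'EOF'
192.168.0.101 node1
192.168.0.102 node2
192.168.0.103 node3
EOF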
② Prerequisites
Kylin depends on Hadoop, Hive, ZooKeeper, and HBase, so those components must be installed and running before Kylin is deployed.
③ Required software versions
Hadoop 2.7.5 + Hive 2.1.0 + HBase 2.1.0
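Before installing Kylin it is worth confirming that the components already running actually match these versions. A quick check, assuming the binaries are on the PATH:
# Verify the versions of Kylin's dependencies
hadoop version | head -n 1
hive --version | head -n 1
hbase version 2>&1 | head -n 1
zkServer.sh status    # ZooKeeper should report a running mode on all three nodes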
④ Download Kylin
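The binary package can be fetched from the Apache archive, for example (choose the package that matches your Hadoop/HBase build):
# Example download from the Apache archive (adjust the mirror/package as needed)
wget https://archive.apache.org/dist/kylin/apache-kylin-3.1.2/apache-kylin-3.1.2-bin-hbase1x.tar.gz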
Installation and Configuration
① Extract the package
# Extract to the target directory
tar -zxvf apache-kylin-3.1.2-bin-hbase1x.tar.gz -C /opt/server
# Create a symlink
ln -s /opt/server/apache-kylin-3.1.2-bin-hbase1x /opt/server/kylin
② Link the configuration files of Kylin's dependency components
# Enter Kylin's conf directory
cd /opt/server/kylin/conf
# Create symlinks to the dependency configuration files
ln -s /opt/server/hadoop-2.7.5/etc/hadoop/hdfs-site.xml hdfs-site.xml
ln -s /opt/server/hadoop-2.7.5/etc/hadoop/core-site.xml core-site.xml
ln -s /opt/server/hbase-2.1.0/conf/hbase-site.xml hbase-site.xml
ln -s /opt/server/hive-2.1.0/conf/hive-site.xml hive-site.xml
ln -s /opt/server/spark/conf/spark-defaults.conf spark-defaults.conf
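A quick sanity check that every link resolves, since a broken link is a common cause of start-up failures:
# Each link should print its resolved target; a broken link is reported instead
cd /opt/server/kylin/conf
for f in hdfs-site.xml core-site.xml hbase-site.xml hive-site.xml spark-defaults.conf; do
  readlink -e "$f" || echo "BROKEN LINK: $f"
done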
③ Edit kylin.sh
# Enter Kylin's bin directory
cd /opt/server/kylin/bin
vim kylin.sh
# Add the following lines
export HADOOP_HOME=/opt/server/hadoop-2.7.5
export HIVE_HOME=/opt/server/hive-2.1.0
export HBASE_HOME=/opt/server/hbase-2.1.0
export SPARK_HOME=/opt/server/spark
④ Configure kylin.properties
# Enter Kylin's conf directory
cd /opt/server/kylin/conf
vim kylin.properties
# Modify the following settings
# Starting around line 36
kylin.env.hdfs-working-dir=/user/kylin
kylin.env.zookeeper-base-path=/kylin
# Starting around line 112: uncomment
kylin.source.hive.keep-flat-table=false
kylin.source.hive.database-for-flat-table=default
kylin.source.hive.redistribute-flat-table=true
# Starting around line 126: uncomment
kylin.storage.url=hbase
kylin.storage.hbase.table-name-prefix=KYLIN_
kylin.storage.hbase.namespace=default
kylin.storage.hbase.compression-codec=none
# Starting around line 322: uncomment and modify
kylin.env.hadoop-conf-dir=/opt/server/hadoop-2.7.5/etc/hadoop
kylin.engine.spark.rdd-partition-cut-mb=10
kylin.engine.spark.min-partition=1
kylin.engine.spark.max-partition=1000
# Still in the Spark engine section (around line 322): keep these values consistent with $SPARK_HOME/conf/spark-defaults.conf
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.submit.deployMode=cluster
kylin.engine.spark-conf.spark.yarn.queue=default
kylin.engine.spark-conf.spark.driver.memory=512M
kylin.engine.spark-conf.spark.executor.memory=1G
kylin.engine.spark-conf.spark.executor.instances=2
kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=512
kylin.engine.spark-conf.spark.shuffle.service.enabled=true
kylin.engine.spark-conf.spark.eventLog.enabled=true
kylin.engine.spark-conf.spark.eventLog.dir=hdfs://node1:8020/user/spark/log/
kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs://node1:8020/user/spark/log
kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
# Around line 352: keep this consistent with $SPARK_HOME/conf/spark-defaults.conf
kylin.engine.spark-conf.spark.yarn.archive=hdfs://node1:8020/user/spark/jars
kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
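The Spark settings above point at HDFS paths that must already exist. A sketch of preparing them, assuming the Spark jars have not been uploaded yet (strictly, spark.yarn.archive expects a single archive file, so some deployments pack the jars into one zip instead of uploading a directory):
# Event log directory used by spark.eventLog.dir / spark.history.fs.logDirectory
hdfs dfs -mkdir -p /user/spark/log
# Upload the local Spark jars so the configured archive/jars path resolves
hdfs dfs -put /opt/server/spark/jars /user/spark/
hdfs dfs -ls /user/spark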
⑤ Initialize Kylin's working directory on HDFS
hdfs dfs -mkdir /user/kylin
⑥ Configure environment variables
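A minimal sketch, assuming the installation paths used above; append to /etc/profile (or a file under /etc/profile.d/) and source it afterwards:
# Make kylin.sh and related scripts available on the PATH
export KYLIN_HOME=/opt/server/kylin
export PATH=$PATH:$KYLIN_HOME/bin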
Configure the spark_shuffle Service for YARN
① Edit yarn-site.xml (on all 3 nodes)
cd /opt/server/hadoop-2.7.5/etc/hadoop
vim yarn-site.xml
# Add or modify the following properties
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
② Copy spark-2.2.0-yarn-shuffle.jar into Hadoop's YARN directory (on all 3 nodes)
cd /opt/server/hadoop-2.7.5/share/hadoop/yarn
# Copy the jar to node2 and node3
scp /opt/server/hadoop-2.7.5/share/hadoop/yarn/spark-2.2.0-yarn-shuffle.jar node2:$PWD
scp /opt/server/hadoop-2.7.5/share/hadoop/yarn/spark-2.2.0-yarn-shuffle.jar node3:$PWD
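The listing above only distributes the jar from node1; it assumes the jar is already in Hadoop's yarn directory there. In a standard Spark 2.2.0 distribution the shuffle-service jar ships under $SPARK_HOME/yarn, so a sketch of that first copy, plus the NodeManager restart required for the yarn-site.xml change and the new aux-service to take effect:
# On node1: copy the shuffle service jar out of the Spark distribution (path assumes the standard Spark 2.2.0 layout)
cp /opt/server/spark/yarn/spark-2.2.0-yarn-shuffle.jar /opt/server/hadoop-2.7.5/share/hadoop/yarn/
# On every node: restart the NodeManager so the new aux-service is loaded
yarn-daemon.sh stop nodemanager
yarn-daemon.sh start nodemanager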
Startup
① Start Kylin
kylin.sh start
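Kylin also ships a check-env.sh script that validates the Hadoop/Hive/HBase environment and can be run before starting. Once kylin.sh start reports success, the web UI is served on port 7070 with the default credentials ADMIN / KYLIN:
# Optional pre-flight check (run before kylin.sh start)
/opt/server/kylin/bin/check-env.sh
# After start-up, open http://node1:7070/kylin (login ADMIN / KYLIN)
# If start-up fails, inspect the log
tail -f /opt/server/kylin/logs/kylin.log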