Kylin 3.1.2 Cluster Setup

Apache Kylin | Installation Guide

Preparation

① Environment (3 Tencent Cloud servers + 1 MySQL host)

| Node | node1 | node2 | node3 |
| --- | --- | --- | --- |
| Specs | 2C / 8G / 50G | 2C / 4G / 50G | 2C / 4G / 50G |

Services deployed across the three nodes:

- Zookeeper
- NameNode (Hadoop)
- DataNode (Hadoop)
- JobManager (Hadoop)
- TaskManager (Hadoop)
- HBase Master (HBase)
- HRegionServer (HBase)
- Hive Metastore Server (Hive metadata service)
- Kylin

② Prerequisites

Kylin depends on Hadoop, Hive, Zookeeper, and HBase, so those environments must be set up first:

Hadoop Introduction and Cluster Setup

Hive Environment Setup

ZooKeeper Cluster Setup

HBase Cluster Setup

Environment Setup: Spark on YARN

③ Software Version Requirements

Hadoop 2.7.5 + Hive 2.1.0 + HBase 2.1.0
④ Download Kylin

Apache Kylin 3.1.2 Download

Installation and Configuration

① Extract the archive

# Extract to the target directory
tar -zxvf apache-kylin-3.1.2-bin-hbase1x.tar.gz -C /opt/server

# Create a symlink
ln -s /opt/server/apache-kylin-3.1.2-bin-hbase1x /opt/server/kylin

② Link in configuration files from Kylin's dependencies

# Go to Kylin's conf directory
cd /opt/server/kylin/conf

# Create symlinks to the dependency configs
ln -s /opt/server/hadoop-2.7.5/etc/hadoop/hdfs-site.xml hdfs-site.xml
ln -s /opt/server/hadoop-2.7.5/etc/hadoop/core-site.xml core-site.xml
ln -s /opt/server/hbase-2.1.0/conf/hbase-site.xml hbase-site.xml
ln -s /opt/server/hive-2.1.0/conf/hive-site.xml hive-site.xml
ln -s /opt/server/spark/conf/spark-defaults.conf spark-defaults.conf

③ Edit kylin.sh

# Go to Kylin's bin directory
cd /opt/server/kylin/bin
vim kylin.sh
# Add the following lines
export HADOOP_HOME=/opt/server/hadoop-2.7.5
export HIVE_HOME=/opt/server/hive-2.1.0
export HBASE_HOME=/opt/server/hbase-2.1.0
export SPARK_HOME=/opt/server/spark

④ Configure kylin.properties

# Go to Kylin's conf directory
cd /opt/server/kylin/conf
vim kylin.properties
# Modify the following entries
# Starting around line 36
kylin.env.hdfs-working-dir=/user/kylin
kylin.env.zookeeper-base-path=/kylin

# Starting around line 112, uncomment:
kylin.source.hive.keep-flat-table=false
kylin.source.hive.database-for-flat-table=default
kylin.source.hive.redistribute-flat-table=true

# Starting around line 126, uncomment:
kylin.storage.url=hbase
kylin.storage.hbase.table-name-prefix=KYLIN_
kylin.storage.hbase.namespace=default
kylin.storage.hbase.compression-codec=none

# Starting around line 322, uncomment and modify:
kylin.env.hadoop-conf-dir=/opt/server/hadoop-2.7.5/etc/hadoop
kylin.engine.spark.rdd-partition-cut-mb=10
kylin.engine.spark.min-partition=1
kylin.engine.spark.max-partition=1000

# Note: keep these Spark settings consistent with $SPARK_HOME/conf/spark-defaults.conf
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.submit.deployMode=cluster
kylin.engine.spark-conf.spark.yarn.queue=default
kylin.engine.spark-conf.spark.driver.memory=512M
kylin.engine.spark-conf.spark.executor.memory=1G
kylin.engine.spark-conf.spark.executor.instances=2
kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=512
kylin.engine.spark-conf.spark.shuffle.service.enabled=true
kylin.engine.spark-conf.spark.eventLog.enabled=true
kylin.engine.spark-conf.spark.eventLog.dir=hdfs://node1:8020/user/spark/log/
kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs://node1:8020/user/spark/log
kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false

# Around line 352 — again, keep consistent with $SPARK_HOME/conf/spark-defaults.conf
kylin.engine.spark-conf.spark.yarn.archive=hdfs://node1:8020/user/spark/jars
kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec

⑤ Initialize Kylin's working directory on HDFS

# Create the directory set by kylin.env.hdfs-working-dir above
hdfs dfs -mkdir -p /user/kylin

⑥ Configure environment variables


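This step is left blank in the original; a minimal sketch of what the variables typically look like, assuming the install paths used above (add to /etc/profile or ~/.bashrc):

```shell
# Point KYLIN_HOME at the symlink created earlier and put kylin.sh on the PATH
export KYLIN_HOME=/opt/server/kylin
export PATH=$PATH:$KYLIN_HOME/bin
```

After editing, run `source /etc/profile` so that `kylin.sh` can be invoked from any directory.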
Configuring YARN's spark_shuffle Service

① Edit yarn-site.xml (on all 3 nodes)

cd /opt/server/hadoop-2.7.5/etc/hadoop
vim yarn-site.xml
# Modify / add the following properties
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
        <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

② Copy spark-2.2.0-yarn-shuffle.jar into Hadoop's yarn directory (on all 3 nodes)

# On node1, copy the jar from the Spark distribution into Hadoop's yarn directory
# (in Spark 2.x distributions the shuffle jar ships under $SPARK_HOME/yarn/)
cp /opt/server/spark/yarn/spark-2.2.0-yarn-shuffle.jar /opt/server/hadoop-2.7.5/share/hadoop/yarn/

# Distribute it to node2 and node3 (run from the target directory so $PWD resolves correctly)
cd /opt/server/hadoop-2.7.5/share/hadoop/yarn
scp spark-2.2.0-yarn-shuffle.jar node2:$PWD
scp spark-2.2.0-yarn-shuffle.jar node3:$PWD
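With the yarn-site.xml changes and the shuffle jar in place on all three nodes, YARN must be restarted so the NodeManagers load the new aux-service. A sketch, assuming YARN is managed from node1 with the stock Hadoop scripts:

```shell
# Restart YARN so yarn.nodemanager.aux-services takes effect
# (run on the node that hosts the ResourceManager)
cd /opt/server/hadoop-2.7.5
sbin/stop-yarn.sh
sbin/start-yarn.sh
```

Restart only after the jar has been copied: if a NodeManager starts with spark_shuffle configured but cannot find YarnShuffleService on its classpath, it will fail to come up.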

Startup

① Start Kylin

kylin.sh start
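Once started, Kylin serves its web UI on port 7070 (default login: ADMIN / KYLIN). A quick way to check it came up, assuming Kylin runs on node1 and the paths used in this guide:

```shell
# Watch the server log for startup errors
tail -n 50 /opt/server/kylin/logs/kylin.log

# Probe the web UI; an HTTP 200 means the server is answering
curl -s -o /dev/null -w "%{http_code}\n" http://node1:7070/kylin
```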

Copyright notice: this is an original article by hell_oword, released under the CC 4.0 BY-SA license. Please include a link to the original source and this notice when reposting.