1. Prerequisites:
1. Three Ubuntu machines: master, slave1, and slave2
2. Install the JDK and Spark on each machine (Scala is also required; see step 3 below)
3. Spark download location: /Seafile/works/大数据安装包 (big-data installation packages)
4. Download the package spark-3.0.1-bin-hadoop3.2.tgz
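If the Seafile share is not reachable, the same package can also be fetched from the Apache archive (a sketch; this is the official archive path for this release):
wget https://archive.apache.org/dist/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz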
2. Steps
1. Log in to the master machine
2. Verify that Java is installed: java -version
3. Verify that Scala is installed: scala -version
4. Create the directory /data and move spark-3.0.1-bin-hadoop3.2.tgz into it
5. Change into the /data directory
6.sudo tar -zxvf spark-3.0.1-bin-hadoop3.2.tgz
7.sudo mv spark-3.0.1-bin-hadoop3.2 spark
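For convenience, steps 4-7 can be run as one command sequence (a minimal sketch, assuming the downloaded spark-3.0.1-bin-hadoop3.2.tgz sits in the current user's home directory):
sudo mkdir -p /data                              # create the target directory
sudo mv ~/spark-3.0.1-bin-hadoop3.2.tgz /data/   # move the package into it
cd /data
sudo tar -zxvf spark-3.0.1-bin-hadoop3.2.tgz     # unpack the archive
sudo mv spark-3.0.1-bin-hadoop3.2 spark          # rename to /data/spark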
3. Edit the configuration files
1. cd /data/spark/conf
2. cp spark-env.sh.template spark-env.sh
3. Edit spark-env.sh and append the following at the end:
export JAVA_HOME=/opt/soft/jdk1.8.0_201
export SCALA_HOME=/opt/soft/scala-2.11.12
export SPARK_MASTER_IP=ip # the master's IP address or hostname
export SPARK_MASTER_HOST=ip # the master's IP address or hostname
export SPARK_WORKER_MEMORY=4g
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=1
4. cp slaves.template slaves
5. Edit slaves (vi slaves) and append the following at the end:
slave1
slave2
6. Sync the Spark directory to slave1 and slave2 (the /data directory must already exist and be writable on both slaves):
scp -r /data/spark slave1:/data/
scp -r /data/spark slave2:/data/
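For reference, the tail of spark-env.sh on this cluster might end up looking like the following (a sketch that assumes the master node's hostname is simply master; SPARK_MASTER_HOST is the variable recent Spark releases read, SPARK_MASTER_IP being the older, deprecated name):
export JAVA_HOME=/opt/soft/jdk1.8.0_201
export SCALA_HOME=/opt/soft/scala-2.11.12
export SPARK_MASTER_HOST=master     # hostname (or IP) of the master node
export SPARK_WORKER_MEMORY=4g       # memory each worker may use
export SPARK_WORKER_CORES=2         # cores each worker may use
export SPARK_WORKER_INSTANCES=1     # worker processes per machine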
4. Start the cluster (on master):
/data/spark/sbin/start-all.sh
Note that start-all.sh launches the remote workers over SSH, so passwordless SSH from master to slave1 and slave2 is normally set up first.
If the startup reports errors, check the logs, e.g.
/data/spark/logs/spark-gleam-org.apache.spark.deploy.worker.Worker-1-ubuntu.out
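A quick way to confirm the daemons are up before digging through the logs (a sketch; jps ships with the JDK, and 8080 is the default port of the standalone master's web UI):
jps                       # master should list a Master process, each slave a Worker process
curl http://master:8080   # or open the master web UI in a browser; both workers should appear as ALIVE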
5. Run the bundled examples:
Local (single machine):
spark-submit --class org.apache.spark.examples.JavaSparkPi /data/spark/examples/jars/spark-examples_2.12-3.0.1.jar 100
21/02/18 11:08:44 INFO DAGScheduler: Job 0 finished: reduce at JavaSparkPi.java:54, took 7.696739 s
Pi is roughly 3.1417356
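When no --master option is given (and none is set in spark-defaults.conf), spark-submit falls back to local mode; the same run can be written with the local master URL spelled out (a sketch):
spark-submit --master 'local[*]' --class org.apache.spark.examples.JavaSparkPi /data/spark/examples/jars/spark-examples_2.12-3.0.1.jar 100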
Cluster (the --master option must come before the application jar, and the standalone master URL has the form spark://<host>:7077):
spark-submit --master spark://master:7077 --class org.apache.spark.examples.JavaSparkPi /data/spark/examples/jars/spark-examples_2.12-3.0.1.jar 100
21/02/18 11:09:43 INFO DAGScheduler: Job 0 finished: reduce at JavaSparkPi.java:54, took 1.676412 s
Pi is roughly 3.1396
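Executor resources for the cluster run can also be set explicitly with standard spark-submit flags (a sketch; the values below are only examples chosen to fit within the 4g/2-core workers configured above):
spark-submit --master spark://master:7077 --executor-memory 1g --total-executor-cores 2 --class org.apache.spark.examples.JavaSparkPi /data/spark/examples/jars/spark-examples_2.12-3.0.1.jar 100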