Spark Streaming Project in Practice (Part 1)
1. Project Overview
This project integrates the major components of the Hadoop ecosystem.
2. Environment Preparation
2.1 Provisioning Machines
This setup uses three Alibaba Cloud ECS instances (2-4 GB of RAM recommended); local virtual machines work just as well:
Set the hostnames
- hadoop000
- hadoop001
- hadoop002
```bash
# vi /etc/hostname
```
Configure the IP mappings
Run the following on every machine:
```bash
# vi /etc/hosts
```
Guide to Alibaba Cloud private-network connectivity: https://blog.csdn.net/weixin_42167895/article/details/106394009
2.2 Creating a User
Create a hadoop user on Linux and grant it sudo privileges. Login password: hadoop
```bash
# adduser hadoop
```
Grant sudo privileges
Switch to the root user.
Make the sudoers file writable:
```bash
# chmod u+w /etc/sudoers
```
Edit the /etc/sudoers file:
```bash
# vi /etc/sudoers
```
Add the following under the existing root entry:
```
## Allow root to run any commands anywhere
root    ALL=(ALL)    ALL
hadoop  ALL=(ALL)    ALL
```
Revoke write permission on the sudoers file:
```bash
# chmod u-w /etc/sudoers
```
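To confirm the new account really has sudo rights, a quick check is:
```bash
# Log in as the hadoop user and verify sudo access;
# the command should print "root" after the password prompt.
su - hadoop
sudo whoami
```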
2.3 Creating Directories
Create the following directories in the hadoop user's home directory:
- app: installation directory for software
- data: test data
- lib: jars developed for the project
- software: software installation packages
- source: framework source code
```bash
mkdir app data lib software source
```
2.4 Software Versions
Component versions for each CDH release: https://blog.csdn.net/weixin_42286868/article/details/104817644
Versions used in this project:
apache-flume-1.9.0-bin.tar.gz
apache-maven-3.6.3-bin.tar.gz
hadoop-3.1.2-centos7.6-x64.tar.gz
hbase-2.2.4-bin.tar.gz
jdk-8u251-linux-x64.tar.gz
kafka_2.12-2.4.1.tgz
scala-2.11.12.tgz (the Spark 2.4.5 binaries used here are built against Scala 2.11; see sections 13 and 14.2)
spark-2.4.5-bin-hadoop2.7.tgz (note: building Spark from source is preferred, i.e. choose package type "Source Code"; the build failed in my environment, so this walkthrough uses the prebuilt binaries)
zookeeper-3.4.14.tar.gz
2.5 Cluster Layout
| Software  | hadoop000                    | hadoop001                    | hadoop002             |
|-----------|------------------------------|------------------------------|-----------------------|
| HDFS      | NameNode, DataNode           | NameNode, DataNode           | DataNode              |
| YARN      | ResourceManager, NodeManager | ResourceManager, NodeManager | NodeManager           |
| ZooKeeper | ZooKeeper                    | ZooKeeper                    | ZooKeeper             |
| Kafka     | Kafka                        | Kafka                        | Kafka                 |
| HBase     | RegionServer                 | RegionServer                 | RegionServer, Master  |
| Flume     | Flume                        | Flume                        | Flume                 |
| Spark     | Spark                        | Spark                        | Spark                 |
3. SSH Configuration
Generate a key pair on each of the three machines:
```bash
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
```
On each machine, append its own public key and set the permissions:
```bash
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 644 ~/.ssh/authorized_keys
```
Distribute the public keys to the other two machines:
```bash
# On hadoop000
scp ~/.ssh/id_rsa.pub hadoop@hadoop001:/home/hadoop/.ssh/id_rsa_hadoop000.pub
scp ~/.ssh/id_rsa.pub hadoop@hadoop002:/home/hadoop/.ssh/id_rsa_hadoop000.pub
# On hadoop001
scp ~/.ssh/id_rsa.pub hadoop@hadoop000:/home/hadoop/.ssh/id_rsa_hadoop001.pub
scp ~/.ssh/id_rsa.pub hadoop@hadoop002:/home/hadoop/.ssh/id_rsa_hadoop001.pub
# On hadoop002
scp ~/.ssh/id_rsa.pub hadoop@hadoop001:/home/hadoop/.ssh/id_rsa_hadoop002.pub
scp ~/.ssh/id_rsa.pub hadoop@hadoop000:/home/hadoop/.ssh/id_rsa_hadoop002.pub
```
Append the other machines' public keys:
```bash
# On hadoop000
cat ~/.ssh/id_rsa_hadoop001.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/id_rsa_hadoop002.pub >> ~/.ssh/authorized_keys
# On hadoop001
cat ~/.ssh/id_rsa_hadoop000.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/id_rsa_hadoop002.pub >> ~/.ssh/authorized_keys
# On hadoop002
cat ~/.ssh/id_rsa_hadoop001.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/id_rsa_hadoop000.pub >> ~/.ssh/authorized_keys
```
Finally, test that every machine can ssh to the others without a password; a quick check is sketched below.
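A minimal check, run on each machine in turn (hostnames as configured above):
```bash
# This loop should print every hostname without ever prompting for a password.
for h in hadoop000 hadoop001 hadoop002; do
    ssh "$h" hostname
done
```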
4. Convenience Configuration
4.1 Script to Run a Command on All Hosts
Create a script file on hadoop000:
```bash
sudo touch /usr/local/bin/xcall.sh
# Mark the script as executable
sudo chmod a+x /usr/local/bin/xcall.sh
# Edit the contents
sudo vi /usr/local/bin/xcall.sh
```
Add the following to xcall.sh (it runs the given command on every host over SSH):
```bash
#!/bin/bash
params=$@
for (( i = 0 ; i <= 2 ; i = $i + 1 )) ; do
    echo ============= hadoop00$i ==============
    ssh hadoop00$i "$params"
done
```
Run the script:
```bash
[hadoop@hadoop000 ~]$ xcall.sh hostname
============= hadoop000 ==============
hadoop000
============= hadoop001 ==============
hadoop001
============= hadoop002 ==============
hadoop002
```
4.2 Script to Copy Files to All Hosts
Create a script file on hadoop000:
```bash
sudo touch /usr/local/bin/xscp.sh
# Mark the script as executable
sudo chmod a+x /usr/local/bin/xscp.sh
# Edit the contents
sudo vi /usr/local/bin/xscp.sh
```
Add the following to xscp.sh (it copies the given file or directory to the other hosts with scp):
```bash
#!/bin/bash
if [[ $# -lt 1 ]] ; then echo no params ; exit ; fi
p=$1
dir=`dirname $p`
filename=`basename $p`
cd $dir
fullpath=`pwd -P .`
user=`whoami`
for (( i = 1 ; i <= 2 ; i = $i + 1 )) ; do
    echo ============= hadoop00$i ==============
    scp -r $p ${user}@hadoop00$i:$fullpath
done
```
Run the script:
```bash
[hadoop@hadoop000 ~]$ mkdir tmp
[hadoop@hadoop000 ~]$ echo test >> tmp/xscp.txt
[hadoop@hadoop000 ~]$ xscp.sh tmp
============= hadoop000 ==============
xscp.txt    100%    5    23.6KB/s    00:00
============= hadoop001 ==============
xscp.txt    100%    5     4.0KB/s    00:00
============= hadoop002 ==============
xscp.txt
```
4.3 Showing the Full Path in the Prompt
```bash
[hadoop@hadoop000 software]$ vi ~/.bash_profile
```
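The original post does not show which lines to add; a minimal sketch (an assumption on my part) that produces prompts like [hadoop@hadoop000 /home/hadoop/app]$, matching the prompts shown later in this guide, is:
```bash
# Assumed prompt setting: user@host plus the absolute working directory
export PS1='[\u@\h $PWD]\$ '
```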
5. JDK Installation
Download
Extract to ~/app:
```bash
tar -zxvf ~/software/jdk-8u251-linux-x64.tar.gz -C ~/app/
```
Add Java to the system environment variables in ~/.bash_profile:
```bash
# JDK
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_251
export PATH=$JAVA_HOME/bin:$PATH
```
Apply the changes:
```bash
source ~/.bash_profile
```
Distribute
```bash
[hadoop@hadoop000 /home/hadoop/app]$ xscp.sh jdk1.8.0_251
[hadoop@hadoop000 /home/hadoop/app]$ xscp.sh ~/.bash_profile
```
Verify the installation:
```bash
java -version
```
6. Fully Distributed Hadoop Setup
Perform the following on hadoop000:
Download
Extract:
```bash
[hadoop@hadoop000 /home/hadoop/software]$ tar -zxvf hadoop-3.1.2-centos7.6-x64.tar.gz -C ~/app/
```
Configure environment variables: vi ~/.bash_profile
```bash
# Hadoop
export HADOOP_HOME=/home/hadoop/app/hadoop-3.1.2
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
```
Apply the changes:
```bash
source ~/.bash_profile
```
Edit the configuration files
/etc/hadoop/hadoop-env.sh
```bash
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_251
```
/etc/hadoop/hdfs-site.xml
```xml
<configuration>
    <!-- HDFS replication factor; do not exceed the number of nodes. -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>
```
/etc/hadoop/core-site.xml
```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop000/</value>
    </property>
    <!-- Hadoop temporary working directory; defaults to /tmp/hadoop-${user.name} -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/app/tmp/hadoop</value>
    </property>
</configuration>
```
/etc/hadoop/mapred-site.xml
```xml
<configuration>
    <!-- Framework MapReduce runs on; yarn here (the default, local, runs jobs locally) -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
```
/etc/hadoop/yarn-site.xml
```xml
<configuration>
    <!-- Address of YARN (the ResourceManager) -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop000</value>
    </property>
    <!-- How reducers fetch data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
```
/etc/hadoop/workers
```
hadoop000
hadoop001
hadoop002
```
Distribute
```bash
[hadoop@hadoop000 /home/hadoop/app]$ xscp.sh hadoop-3.1.2
$ xscp.sh ~/.bash_profile
```
Initialize the file system (on the NameNode, i.e. hadoop000):
```bash
$ hadoop namenode -format
```
Start the Hadoop cluster
```bash
$ start-dfs.sh
$ start-yarn.sh
[hadoop@hadoop000 /home/hadoop]$ xcall.sh ~/app/jdk1.8.0_251/bin/jps
============= hadoop000 ==============
20068 Jps
19591 ResourceManager
19031 NameNode
19367 SecondaryNameNode
19721 NodeManager
19183 DataNode
============= hadoop001 ==============
9156 DataNode
9268 NodeManager
9388 Jps
============= hadoop002 ==============
8882 Jps
8649 DataNode
8761 NodeManager
```
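As a quick sanity check that HDFS accepts reads and writes, something like the following can be run on hadoop000 (the file and directory names are arbitrary examples):
```bash
# Write a local file into HDFS and read it back
echo "hello hadoop" > /tmp/hello.txt
hdfs dfs -mkdir -p /test
hdfs dfs -put /tmp/hello.txt /test/
hdfs dfs -cat /test/hello.txt   # should print: hello hadoop
```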
7. ZooKeeper Installation
Perform the following on hadoop000:
Download
Extract:
```bash
tar -zxvf ~/software/zookeeper-3.4.14.tar.gz -C ~/app/
```
Configure environment variables in ~/.bash_profile:
```bash
# ZK
export ZK_HOME=/home/hadoop/app/zookeeper-3.4.14
export PATH=$ZK_HOME/bin:$PATH
```
Apply the changes:
```bash
source ~/.bash_profile
```
Edit the configuration files
/conf/zoo.cfg
```bash
[hadoop@hadoop000 /home/hadoop/app/zookeeper-3.4.14/conf]$ cp zoo_sample.cfg zoo.cfg
[hadoop@hadoop000 /home/hadoop/app/zookeeper-3.4.14/conf]$ vi zoo.cfg
# Change this property:
dataDir=/home/hadoop/app/tmp/zookeeper
# Append the following
# server.n=host:port1:port2, where n must match the value in myid
# port1: leader port, used by followers to connect to the leader
# port2: election port, used by the other followers during leader election
server.1=hadoop000:2888:3888
server.2=hadoop001:2888:3888
server.3=hadoop002:2888:3888
# Then create the corresponding directory (on every machine):
$ mkdir -p /home/hadoop/app/tmp/zookeeper
```
/bin/zkEnv.sh
```bash
# Change the following two directories so log files go to the installation directory
if [ "x${ZOO_LOG_DIR}" = "x" ]
then
    ZOO_LOG_DIR="${ZOOKEEPER_PREFIX}/logs"
fi
if [ "x${ZOO_LOG4J_PROP}" = "x" ]
then
    ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
fi
```
/conf/log4j.properties
```properties
# Change the following entries
zookeeper.root.logger=INFO, ROLLINGFILE
zookeeper.log.dir=/home/hadoop/app/zookeeper-3.4.14/logs
zookeeper.tracelog.dir=/home/hadoop/app/zookeeper-3.4.14/logs
log4j.appender.ROLLINGFILE=org.apache.log4j.DailyRollingFileAppender
#log4j.appender.ROLLINGFILE.MaxFileSize=10MB   (comment this line out)
```
Distribute
```bash
[hadoop@hadoop000 /home/hadoop/app]$ xscp.sh zookeeper-3.4.14
[hadoop@hadoop000 /home/hadoop/app]$ xscp.sh ~/.bash_profile
```
Add a myid file to the ZK data directory (dataDir) on each host:
```bash
[hadoop@hadoop000 /home/hadoop]$ echo 1 > /home/hadoop/app/tmp/zookeeper/myid
[hadoop@hadoop001 /home/hadoop]$ echo 2 > /home/hadoop/app/tmp/zookeeper/myid
[hadoop@hadoop002 /home/hadoop]$ echo 3 > /home/hadoop/app/tmp/zookeeper/myid
```
Start the service on every machine:
```bash
$ zkServer.sh start
```
Check the processes:
```bash
$ xcall.sh ~/app/jdk1.8.0_251/bin/jps
============= hadoop000 ==============
5458 QuorumPeerMain
6405 Jps
============= hadoop001 ==============
5156 QuorumPeerMain
5944 Jps
============= hadoop002 ==============
5809 Jps
5012 QuorumPeerMain
```
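To confirm the ensemble actually formed, check each node's role with zkServer.sh:
```bash
# Run on every machine; expect "Mode: leader" on exactly one node
# and "Mode: follower" on the other two.
zkServer.sh status
```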
8. Hadoop HA Configuration
8.1 HDFS HA with Automatic Failover
Edit the configuration files
/etc/hadoop/hdfs-site.xml
```xml
<configuration>
    <!-- Add the following -->
    <!-- Name of the cluster service -->
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <!-- The two NameNode IDs under mycluster (exactly two); with HA there is no secondary namenode -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC address of each NameNode -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>hadoop000:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>hadoop001:8020</value>
    </property>
    <!-- Web UI port of each NameNode -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>hadoop000:9870</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>hadoop001:9870</value>
    </property>
    <!-- Shared edits directory of the NameNodes, i.e. the JournalNodes (colocated with the DataNodes) -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop000:8485;hadoop001:8485;hadoop002:8485/mycluster</value>
    </property>
    <!-- Local path where the JournalNodes store the edits -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/hadoop/app/tmp/hadoop/journal</value>
    </property>
    <!-- Java class used by clients to determine which NameNode is active -->
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Script list or Java class used to fence the active NameNode during failover -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <!-- Enable automatic failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
</configuration>
```
/etc/hadoop/core-site.xml
```xml
<configuration>
    <!-- * Change to the cluster name -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <!-- * ZooKeeper connection string -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop000:2181,hadoop001:2181,hadoop002:2181</value>
    </property>
</configuration>
```
Distribute to all machines
```bash
[hadoop@hadoop000 /home/hadoop/app/hadoop-3.1.2/etc]$ xscp.sh hadoop
```
Data migration
First stop all Hadoop processes, then start a JournalNode process on every machine:
```bash
$ hadoop-daemon.sh start journalnode
```
With the JournalNodes running, synchronize the on-disk metadata between the two NameNodes.
On hadoop000, copy the metadata to hadoop001:
```bash
scp -r /home/hadoop/app/tmp/hadoop/dfs hadoop@hadoop001:/home/hadoop/app/tmp/hadoop/
```
On the new (not yet formatted) NameNode, here hadoop001, run the following command to bootstrap it into standby state.
```bash
# The NameNode on hadoop000 must be running; when asked whether to format, answer N
[hadoop@hadoop000 /home/hadoop]$ hadoop-daemon.sh start namenode
[hadoop@hadoop001 /home/hadoop]$ hdfs namenode -bootstrapStandby
```
On one NameNode, run the following command to push the edit log to the JournalNodes:
```bash
[hadoop@hadoop001 /home/hadoop]$ hdfs namenode -initializeSharedEdits
```
Stop all Hadoop processes, then log in to one of the NameNodes and initialize the HA state in ZooKeeper:
```bash
$ hdfs zkfc -formatZK
```
Start the DFS processes
```bash
$ start-dfs.sh
[hadoop@hadoop000 /home/hadoop/app/tmp/hadoop/dfs]$ xcall.sh ~/app/jdk1.8.0_251/bin/jps
============= hadoop000 ==============
20128 QuorumPeerMain
31971 DFSZKFailoverController
31531 DataNode
31404 NameNode
31757 JournalNode
32029 Jps
============= hadoop001 ==============
14689 DataNode
14599 NameNode
14793 JournalNode
14924 DFSZKFailoverController
14959 Jps
9439 QuorumPeerMain
============= hadoop002 ==============
11890 DataNode
8935 QuorumPeerMain
11994 JournalNode
12047 Jps
```
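To verify the HA setup, hdfs haadmin can report the state of each NameNode (nn1 and nn2 as defined in hdfs-site.xml above):
```bash
# One NameNode should report "active" and the other "standby"
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
```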
8.2 ResourceManager Automatic Failover
Configuration file
/etc/hadoop/yarn-site.xml
```xml
<!-- Add the following -->
<!-- Enable YARN HA -->
<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<!-- Cluster ID -->
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>
<!-- ResourceManager node addresses -->
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop000</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop001</value>
</property>
<!-- ResourceManager web UI ports -->
<property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>hadoop000:8088</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop001:8088</value>
</property>
<!-- ZooKeeper cluster -->
<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop000:2181,hadoop001:2181,hadoop002:2181</value>
</property>
```
Distribute to all machines
```bash
$ xscp.sh yarn-site.xml
```
Start YARN; a sketch of the start and verification commands follows.
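The original post does not list the commands; a plausible sequence for this layout (rm1 on hadoop000, rm2 on hadoop001) is:
```bash
# On hadoop000: start YARN (NodeManagers on all workers)
start-yarn.sh
# If the standby ResourceManager on hadoop001 does not come up, start it there manually
yarn --daemon start resourcemanager
# Check the HA state of both ResourceManagers; expect one active and one standby
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```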
9. HBase Installation
Perform the following on hadoop000:
Download
Extract:
```bash
[hadoop@hadoop000 /home/hadoop/software]$ tar -zxvf hbase-2.2.4-bin.tar.gz -C ~/app/
```
Configure environment variables in ~/.bash_profile:
```bash
# HBase
export HBASE_HOME=/home/hadoop/app/hbase-2.2.4
export PATH=$HBASE_HOME/bin:$PATH
```
Apply the changes:
```bash
source ~/.bash_profile
```
Verify the installation:
```bash
hbase version
```
Edit the configuration files
/conf/hbase-env.sh
```bash
# Set the JDK path
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_251
# Use the external ZooKeeper cluster instead of the bundled one
export HBASE_MANAGES_ZK=false
```
/conf/hbase-site.xml
```xml
<configuration>
    <!-- Run in fully distributed mode -->
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- Where HBase stores its data on HDFS -->
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://mycluster/hbase</value>
    </property>
    <!-- ZooKeeper addresses -->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop000:2181,hadoop001:2181,hadoop002:2181</value>
    </property>
    <!-- Local ZooKeeper data directory -->
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/home/hadoop/app/tmp/zookeeper</value>
    </property>
</configuration>
```
/conf/regionservers
```
hadoop000
hadoop001
hadoop002
```
Copy Hadoop's HDFS configuration files (hdfs-site.xml and core-site.xml) into HBase's conf directory:
```bash
[hadoop@hadoop000 /home/hadoop/app/hadoop-3.1.2/etc/hadoop]$ cp hdfs-site.xml ~/app/hbase-2.2.4/conf/
[hadoop@hadoop000 /home/hadoop/app/hadoop-3.1.2/etc/hadoop]$ cp core-site.xml ~/app/hbase-2.2.4/conf
```
Distribute to the other machines:
```bash
[hadoop@hadoop000 /home/hadoop/app]$ xscp.sh hbase-2.2.4
[hadoop@hadoop000 /home/hadoop/app]$ xscp.sh ~/.bash_profile
```
Start HBase on hadoop002:
```bash
$ start-hbase.sh
[hadoop@hadoop000 /home/hadoop/app]$ xcall.sh ~/app/jdk1.8.0_251/bin/jps
============= hadoop000 ==============
20128 QuorumPeerMain
31971 DFSZKFailoverController
1141 HRegionServer
32394 ResourceManager
31404 NameNode
31757 JournalNode
1389 Jps
32527 NodeManager
============= hadoop001 ==============
15105 ResourceManager
15698 Jps
14599 NameNode
14793 JournalNode
15594 HRegionServer
14924 DFSZKFailoverController
15199 NodeManager
9439 QuorumPeerMain
============= hadoop002 ==============
12148 NodeManager
12389 HRegionServer
8935 QuorumPeerMain
18910 HMaster
11994 JournalNode
12493 Jps
```
High availability: simply start a Master on another machine:
```bash
hbase-daemon.sh start master
```
If the HMaster shuts itself down, check the logs. If the failure is related to the WAL, add the following to hbase-site.xml:
```xml
<!-- Work around the HMaster shutting down -->
<property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
</property>
```
If that still does not help, delete the /hbase node in ZooKeeper (rmr /hbase), then delete Hadoop's logs and data and reformat HDFS.
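A quick smoke test from the HBase shell (the table and column family names are arbitrary examples):
```bash
# Open the HBase shell and run a create/put/scan round trip
hbase shell
# Inside the shell:
#   create 'test_table', 'cf'
#   put 'test_table', 'row1', 'cf:msg', 'hello hbase'
#   scan 'test_table'
#   exit
```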
10. Flume Installation
Perform the following on hadoop000:
Download
Extract to ~/app:
```bash
tar -zxvf ~/software/apache-flume-1.9.0-bin.tar.gz -C ~/app/
```
Configure environment variables in ~/.bash_profile:
```bash
# Flume
export FLUME_HOME=/home/hadoop/app/apache-flume-1.9.0-bin
export PATH=$FLUME_HOME/bin:$PATH
```
Apply the changes:
```bash
source ~/.bash_profile
```
Configure flume-env.sh:
```bash
cd ~/app/apache-flume-1.9.0-bin/conf/
cp flume-env.sh.template flume-env.sh
# Add the following
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_251
```
Distribute
```bash
[hadoop@hadoop000 /home/hadoop/app]$ xscp.sh apache-flume-1.9.0-bin
[hadoop@hadoop000 /home/hadoop/app]$ xscp.sh ~/.bash_profile
```
Verify the installation:
```bash
flume-ng version
```
Run a test to confirm Flume can move data between two machines; see the Flume study notes for more usage. A minimal single-agent smoke test is sketched below.
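A minimal sketch of a single-agent test with a netcat source and a logger sink (the file name netcat-logger.conf, agent name a1, and port 44444 are arbitrary choices, not from the original post):
```bash
# Save the agent definition (path is an arbitrary choice)
cat > ~/data/netcat-logger.conf <<'EOF'
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop000
a1.sources.r1.port = 44444

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF

# Start the agent; events sent to port 44444 should appear on the console
flume-ng agent --conf $FLUME_HOME/conf --conf-file ~/data/netcat-logger.conf \
  --name a1 -Dflume.root.logger=INFO,console
# From another terminal: telnet hadoop000 44444, then type a line of text
```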
11. Kafka Installation
Perform the following on hadoop000:
Download
Extract:
```bash
[hadoop@hadoop000 /home/hadoop/software]$ tar -zxvf kafka_2.12-2.4.1.tgz -C ~/app/
```
Configure environment variables:
```bash
[hadoop@hadoop000 /home/hadoop/software]$ vi ~/.bash_profile
# KAFKA
export KAFKA_HOME=/home/hadoop/app/kafka_2.12-2.4.1
export PATH=$KAFKA_HOME/bin:$PATH
```
Apply the changes:
```bash
source ~/.bash_profile
```
Edit the configuration file
server.properties
```bash
# Back up the original configuration file
[hadoop@hadoop000 /home/hadoop/app/kafka_2.12-2.4.1/config]$ cp server.properties server.properties.bak
```
Change the following in server.properties:
```properties
# Broker ID; must be unique in the cluster (here it matches the ZooKeeper myid)
broker.id=1
# Uncomment this line (note: remember to change the host on every machine)
listeners=PLAINTEXT://hadoop000:9092
# Change the log directory
log.dirs=/home/hadoop/app/tmp/kafka-logs
# Point to the ZooKeeper cluster
zookeeper.connect=hadoop000:2181,hadoop001:2181,hadoop002:2181
```
Distribute
```bash
[hadoop@hadoop000 /home/hadoop/app]$ xscp.sh kafka_2.12-2.4.1
[hadoop@hadoop000 /home/hadoop/app]$ xscp.sh ~/.bash_profile
```
Note: adjust broker.id and listeners in server.properties on each machine.
Start the service on every machine (make sure the ZooKeeper cluster is running):
```bash
kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
```
Check the processes:
```bash
$ xcall.sh ~/app/jdk1.8.0_251/bin/jps
============= hadoop000 ==============
5458 QuorumPeerMain
6405 Jps
6331 Kafka
============= hadoop001 ==============
5156 QuorumPeerMain
5876 Kafka
5944 Jps
============= hadoop002 ==============
5809 Jps
5012 QuorumPeerMain
5741 Kafka
```
Run a test to confirm Kafka works across machines; see the Kafka study notes for more usage. A minimal produce/consume round trip is sketched below.
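A minimal sketch of such a round trip (the topic name test is an arbitrary example):
```bash
# Create a replicated topic
kafka-topics.sh --create --zookeeper hadoop000:2181 \
  --replication-factor 3 --partitions 3 --topic test

# Produce a few messages (type lines, then Ctrl+C)
kafka-console-producer.sh --broker-list hadoop000:9092 --topic test

# Consume them from another machine or terminal
kafka-console-consumer.sh --bootstrap-server hadoop001:9092 --topic test --from-beginning
```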
12. Maven Installation
Perform the following on hadoop000:
Download
Extract:
```bash
[hadoop@hadoop000 /home/hadoop/software]$ tar -zxvf apache-maven-3.6.3-bin.tar.gz -C ~/app/
```
Configure environment variables in ~/.bash_profile:
```bash
# MAVEN
export MAVEN_HOME=/home/hadoop/app/apache-maven-3.6.3
export PATH=$MAVEN_HOME/bin:$PATH
```
Apply the changes:
```bash
source ~/.bash_profile
```
Add the Aliyun repository
conf/settings.xml
```xml
<!-- Aliyun central repository mirror -->
<mirror>
    <id>alimaven</id>
    <name>aliyun maven</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    <mirrorOf>central</mirrorOf>
</mirror>
```
Check the version:
```bash
mvn -version
Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
Maven home: /home/hadoop/app/apache-maven-3.6.3
Java version: 1.8.0_251
```
13. Scala Installation
Download
Extract:
```bash
[hadoop@hadoop000 /home/hadoop/software]$ tar -zxvf scala-2.11.12.tgz -C ~/app/
```
Configure environment variables:
```bash
# SCALA
export SCALA_HOME=/home/hadoop/app/scala-2.11.12
export PATH=$SCALA_HOME/bin:$PATH
```
Apply the changes:
```bash
source ~/.bash_profile
```
14. Spark Installation
14.1 Installing from the Prebuilt Binaries
Perform the following on hadoop000:
Download
Extract:
```bash
[hadoop@hadoop000 /home/hadoop/software]$ tar -zxvf spark-2.4.5-bin-hadoop2.7.tgz -C ~/app/
```
Configure environment variables:
```bash
[hadoop@hadoop000 /home/hadoop/software]$ vi ~/.bash_profile
# SPARK
export SPARK_HOME=/home/hadoop/app/spark-2.4.5-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
```
Apply the changes:
```bash
source ~/.bash_profile
```
Distribute
```bash
[hadoop@hadoop000 /home/hadoop/app]$ xscp.sh spark-2.4.5-bin-hadoop2.7
[hadoop@hadoop000 /home/hadoop/app]$ xscp.sh ~/.bash_profile
```
Test:
```bash
spark-shell
```
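Beyond launching spark-shell, one quick end-to-end check is the bundled SparkPi example; the jar name below assumes the Scala 2.11 prebuilt package used above, so adjust it if your build differs:
```bash
# Compute an approximation of pi locally with 2 cores
spark-submit --class org.apache.spark.examples.SparkPi --master 'local[2]' \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.5.jar 10
# Look for a line like "Pi is roughly 3.14..." in the output
```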
14.2 Building from Source (Recommended)
Official guide: https://spark.apache.org/docs/latest/building-spark.html
My own build failed, so I ended up installing with the first approach.
Perform the following on hadoop000:
Download
Extract:
```bash
[hadoop@hadoop000 /home/hadoop/software]$ tar -zxvf spark-2.4.5.tgz -C ~/source/
```
Edit pom.xml and add the repository (note):
```xml
<repository>
    <id>cloudera</id>
    <name>cloudera repository</name>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
```
Build with Maven, targeting the chosen Hadoop version, with YARN enabled and Hive/JDBC support.
Option 1
```bash
./build/mvn -Pyarn -Phadoop-3.1 -Dhadoop.version=3.1.2 -Phive -Phive-thriftserver -DskipTests clean package
```
Set the Maven memory size; adjust it to your machine.
```bash
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
```
Option 2 (recommended). Building on the Alibaba Cloud instance was too slow, so the build was done on a local machine through a proxy.
Build a distributable package; naming it after the Hadoop version is recommended.
```bash
./dev/make-distribution.sh --name hadoop3.1.2 --pip --r --tgz -Pyarn -Phadoop-3.1 -Dhadoop.version=3.1.2 -Phive -Phive-thriftserver -DskipTests clean package
```
Edit ./dev/make-distribution.sh to skip the version checks.
Comment out the following block (lines 128-146 of the file):
```bash
128 #VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null\
129 #    | grep -v "INFO"\
130 #    | grep -v "WARNING"\
131 #    | tail -n 1)
132 #SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
133 #    | grep -v "INFO"\
134 #    | grep -v "WARNING"\
135 #    | tail -n 1)
136 #SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
137 #    | grep -v "INFO"\
138 #    | grep -v "WARNING"\
139 #    | tail -n 1)
140 #SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
141 #    | grep -v "INFO"\
142 #    | grep -v "WARNING"\
143 #    | fgrep --count "<id>hive</id>";\
144 #    # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
145 #    # because we use "set -o pipefail"
146 #    echo -n)
```
Then add the following:
```bash
VERSION=2.4.5
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=3.1.2
SPARK_HIVE=1
```
[Optional] You can also increase the build memory; here it is raised to an 8 GB heap and a 2 GB code cache:
```bash
export MAVEN_OPTS="${MAVEN_OPTS:--Xmx8g -XX:ReservedCodeCacheSize=2g}"
```
Tip: if an error during the build is hard to interpret, append -X to the build command to get much more detailed output.