Production environment versions
jdk-1.7.0_71
SCALA-2.11.8
ZOOKEEPER-3.4.6
HADOOP-2.6.0
SPARK-2.1.0
HIVE-1.2.1
HBASE-1.0.2
mysql-5.6.33
kafka_2.10-0.8.2.0
Cluster setup
I. Server preparation
1. Mount the data disk (as root)
Data disk device names are assigned by the system. On I/O-optimized instances they start at /dev/vdb and increase through /dev/vdz. If the device name looks like /dev/xvd* (where * is any letter from a to z), the instance is not I/O-optimized.
Check for the data disk
After running the command below, if /dev/vdb does not appear, the instance has no data disk; confirm that a data disk is actually attached.
fdisk -l
Partition the data disk (a single partition is usually enough)
fdisk -u /dev/vdb
  p — show the current partition table
  n — create a new partition
  p — choose "primary" as the partition type
  Enter the partition number: since only one partition is created, enter 1.
  First sector: press Enter to accept the default of 2048.
  Last sector: since only one partition is created, press Enter to accept the default.
  p — review the planned partition layout
  w — write the partition table and exit
Check the new partition
fdisk -lu /dev/vdb
-----------------------------------------------------------
Disk /dev/vdb: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x3e60020e
   Device Boot      Start         End      Blocks   Id  System
/dev/vdb1            2048    41943039    20970496   83  Linux
Create a file system on the new partition
If the disk needs to be shared between Linux, Windows, and macOS, a VFAT file system can be created with mkfs.vfat instead.
mkfs.ext4 /dev/vdb1
Back up the /etc/fstab file
cp /etc/fstab /etc/fstab.bak
Append the new partition to /etc/fstab (the mount point /app matches the mount step below)
echo "/dev/vdb1 /app ext4 defaults 0 0" >> /etc/fstab
Verify the new entry
cat /etc/fstab
Mount the file system
mkdir -p /app   # create the mount point if it does not exist yet
mount /dev/vdb1 /app
# To unmount the file system later: umount /app
Check disk usage
If the new file system appears in the output, the mount succeeded.
df -h
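Before relying on a reboot, the fstab entry can be validated directly; a quick check, assuming the entry added above:
umount /app          # unmount so that mount -a re-reads /etc/fstab
mount -a             # mounts everything listed in /etc/fstab; an error here means the entry is wrong
df -h | grep vdb1    # the partition should be mounted on /app again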
2. Create the user (run as root)
# Create the working user and its home directory; the later steps assume user hadoop with home /app/hadoop
useradd -d /app/hadoop -m hadoop
passwd hadoop
3. Change the hostname (run as root)
vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop03
4. Edit the hosts file (run as root)
vim /etc/hosts
10.0.0.99 hadoop01
10.0.0.100 hadoop02
10.0.0.101 hadoop03
10.0.0.102 hadoop04
After modifying the files above, reboot the server.
5. Configure passwordless SSH login
Generate an RSA key pair
ssh-keygen -t rsa
Copy the public key to the other servers
Collect every server's public key into authorized_keys on one machine (including the local machine's own key), then copy that file to the other servers.
scp .ssh/id_rsa.pub hadoop@hadoop02:/app/hadoop/id_rsa.pub
cat id_rsa.pub >> ~/.ssh/authorized_keys
Fix the permissions of the .ssh directory
chmod -R 700 ~/.ssh
Alternative method
ssh-copy-id -i ~/.ssh/id_rsa.pub app@192.168.1.233
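With four nodes the key exchange can also be scripted; a minimal sketch, assuming the hadoop user and the hostnames defined in /etc/hosts above:
# run once on every node: push this node's public key to all nodes (including itself)
for host in hadoop01 hadoop02 hadoop03 hadoop04; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@"$host"
done
# verify that login no longer prompts for a password
ssh hadoop@hadoop02 hostname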
II. JDK and Scala environment setup
1. Copy the JDK and Scala packages to the server and extract them
scp jdk-1.7.0_71.tar hadoop@10.0.0.99:/app/java
tar -xvf jdk-1.7.0_71.tar
2. Configure environment variables
vim /etc/profile
# add the following
export JAVA_HOME=/app/java/jdk1.7.0_71
export JRE_HOME=/app/java/jdk1.7.0_71/jre
export SCALA_HOME=/app/scala/scala-2.11.8
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$SCALA_HOME/bin:$PATH
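After saving /etc/profile, reload it and confirm that the tools resolve:
source /etc/profile
java -version      # should report 1.7.0_71
scala -version     # should report 2.11.8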
III. ZooKeeper cluster setup
1. Download and extract the ZooKeeper package
tar -xvf zookeeper-3.4.6.tar
2. Create the data and logs directories
mkdir data
mkdir logs
3. Configure zoo.cfg
cp zoo_sample.cfg zoo.cfg
# edit zoo.cfg as follows
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataLogDir=/app/hadoop/zookeeper3.4.6/logs
dataDir=/app/hadoop/zookeeper3.4.6/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
maxClientCnxns=500
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
autopurge.purgeInterval=24
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
# minimum session timeout (ms)
minSessionTimeout=4000
# maximum session timeout (ms)
maxSessionTimeout=100000
4. Create the data/myid file
echo 1 > data/myid   # run from the ZooKeeper root; 1 is the id for hadoop01 (server.1)
5. Copy the files to the other servers
scp -r zookeeper3.4.6/ hadoop@hadoop02:/app/hadoop/
6. Edit the myid file on each server
Set it to the n value of that server's server.n entry in zoo.cfg, e.g. as shown below.
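A sketch of setting myid on each node, assuming ZooKeeper is installed at /app/hadoop/zookeeper3.4.6 on every server (matching the dataDir above):
# hadoop01 already holds 1; hadoop02 and hadoop03 map to server.2 and server.3
ssh hadoop@hadoop02 'echo 2 > /app/hadoop/zookeeper3.4.6/data/myid'
ssh hadoop@hadoop03 'echo 3 > /app/hadoop/zookeeper3.4.6/data/myid'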
7. Start ZooKeeper
./zkServer.sh start
# The cluster status only reports healthy once all nodes in the ensemble have been started.
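Once all nodes are up, each node's role can be checked with:
./zkServer.sh status
# expected: one node reports "Mode: leader", the other two report "Mode: follower"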
8. Start on boot
# TODO
9. Why an odd number of servers
Fault tolerance
Create/update/delete operations require acknowledgement from more than half of the servers, so consider the following cases:
2 servers: at least 2 must be up (half of 2 is 1, so a majority is at least 2); no failure can be tolerated.
3 servers: at least 2 must be up (half of 3 is 1.5, so a majority is at least 2); 1 failure can be tolerated.
4 servers: at least 3 must be up (half of 4 is 2, so a majority is at least 3); 1 failure can be tolerated.
5 servers: at least 3 must be up (half of 5 is 2.5, so a majority is at least 3); 2 failures can be tolerated.
6 servers: at least 4 must be up (half of 6 is 3, so a majority is at least 4); 2 failures can be tolerated.
So 3 and 4 servers both tolerate at most 1 failure, and 5 and 6 servers both tolerate at most 2 failures,
yet a 4-server cluster clearly costs more than a 3-server one, and 6 more than 5. This follows directly from the majority-vote requirement.
Split-brain prevention
A ZooKeeper ensemble may contain multiple followers and observers, but must have exactly one leader.
If the leader fails, the remaining servers elect a new leader by majority vote.
Network partition scenarios:
A 3-server cluster, all healthy, with 1 server partitioned away from the other 2: the 2 connected servers form a majority and can elect a leader.
A 4-server cluster, all healthy, with 2 servers partitioned away from the other 2: neither side reaches the required majority of 3, so no leader can be elected and the cluster cannot serve.
A 5-server cluster, all healthy, with 2 servers partitioned away from the other 3: the 3 connected servers form a majority and can elect a leader.
A 6-server cluster, all healthy, with 3 servers partitioned away from the other 3: neither side reaches the required majority of 4, so no leader can be elected and the cluster cannot serve.
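The same rule as a quick calculation, where the quorum is the smallest strict majority, floor(n/2) + 1:
for n in 2 3 4 5 6; do
  quorum=$(( n / 2 + 1 ))    # smallest strict majority
  echo "$n servers: quorum=$quorum, tolerates $(( n - quorum )) failure(s)"
done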
IV. Hadoop cluster setup
1. Download the package and copy it to the servers
scp hadoop-2.6.0.tar.gz hadoop@10.0.0.99:/app/hadoop
2. Create the data directories
mkdir -p /app/data/hadoop/dfs/tmp
mkdir -p /app/data/hadoop/dfs/data
mkdir -p /app/data/hadoop/dfs/journal
mkdir -p /app/data/hadoop/dfs/name
3. Edit hadoop-env.sh
# set the Java installation path
export JAVA_HOME=/app/java/jdk1.7.0_71
4. Edit core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<!-- tmp directory -->
<name>hadoop.tmp.dir</name>
<value>/app/data/hadoop/dfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-cluster1</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec,
org.apache.hadoop.io.compress.SnappyCodec
</value>
</property>
<!-- ZooKeeper quorum -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
</configuration>
5. Edit hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop01:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/app/data/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/app/data/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>hadoop-cluster1</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/app/hadoop/hadoop-2.6.0/etc/hadoop/excludes</value>
</property>
<property>
<name>dfs.ha.namenodes.hadoop-cluster1</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-cluster1.nn1</name>
<value>hadoop01:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-cluster1.nn1</name>
<value>hadoop01:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-cluster1.nn2</name>
<value>hadoop02:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-cluster1.nn2</name>
<value>hadoop02:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop02:8485;hadoop03:8485;hadoop04:8485/hadoop-cluster1</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.hadoop-cluster1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/app/hadoop/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/app/data/hadoop/dfs/journal</value>
</property>
<!-- enable automatic failover when the active NameNode fails -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
6. Edit mapred-site.xml
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
</configuration>
7. Edit yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop01:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop01:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop01:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop01:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop01:8088</value>
</property>
</configuration>
8. Update environment variables
vim /etc/profile
export HADOOP_COMMON_HOME=/app/hadoop/hadoop-2.6.0
export HADOOP_HOME=/app/hadoop/hadoop-2.6.0
export HADOOP_CONF_DIR=/app/hadoop/hadoop-2.6.0/etc/hadoop
export YARN_CONF_DIR=/app/hadoop/hadoop-2.6.0/etc/hadoop
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export YARN_LOG_DIR=$HADOOP_LOG_DIR
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$SCALA_HOME/bin:$MAVEN_HOME/bin:$ZK_HOME/bin:$HADOOP_COMMON_HOME/bin:$HADOOP_COMMON_HOME/sbin:$PATH
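Reload the profile and make sure the Hadoop binaries resolve:
source /etc/profile
hadoop version        # should print Hadoop 2.6.0
which hdfs yarn       # both should resolve under /app/hadoop/hadoop-2.6.0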
9. Distribute the Hadoop package to the other nodes
scp -r hadoop-2.6.0 hadoop@hadoop02:/app/hadoop/
scp -r hadoop-2.6.0 hadoop@hadoop03:/app/hadoop/
scp -r hadoop-2.6.0 hadoop@hadoop04:/app/hadoop/
10. Start the JournalNodes (on hadoop02, hadoop03, and hadoop04)
./sbin/hadoop-daemon.sh start journalnode
11. Format the ZKFC state in ZooKeeper
./bin/hdfs zkfc -formatZK
12. Format the NameNode
./bin/hdfs namenode -format
13. Start the DataNodes
./sbin/hadoop-daemon.sh start datanode
14. Start the NameNodes
# on namenode1 (hadoop01)
./sbin/hadoop-daemon.sh start namenode
# on namenode2 (hadoop02): sync the metadata from the active NameNode, then start
./bin/hdfs namenode -bootstrapStandby
./sbin/hadoop-daemon.sh start namenode
15. Start YARN
./start-yarn.sh
16. Check the cluster status, for example with the commands below.
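A few quick checks, run from $HADOOP_HOME (nn1 and nn2 are the NameNode IDs defined in hdfs-site.xml):
jps                                        # list the Java daemons running on each node
./bin/hdfs haadmin -getServiceState nn1    # expect "active" or "standby"
./bin/hdfs haadmin -getServiceState nn2
./bin/hdfs dfsadmin -report                # DataNode and capacity summary
./bin/yarn node -list                      # NodeManagers registered with the ResourceManager
The web UIs configured above (hadoop01:50070 for HDFS, hadoop01:8088 for YARN) show the same information.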
17. Open issues
Snappy native library support.
# install the gcc toolchain
yum install -y gcc-c++
V. Kafka cluster setup
1. Download the Kafka package and upload it to the servers
http://kafka.apache.org/downloads.html
2. Extract the tarball
tar -xvf kafka_2.10-0.8.2.0.tgz
3. Edit the configuration file
server.properties
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1
############################# Socket Server Settings #############################
# The port the socket server listens on
port=9092
# Hostname the broker will bind to. If not set, the server will bind to all interfaces
#host.name=localhost
# Hostname the broker will advertise to producers and consumers. If not set, it uses the
# value for "host.name" if configured. Otherwise, it will use the value returned from
# java.net.InetAddress.getCanonicalHostName().
#advertised.host.name=<hostname routable by clients>
# The port to publish to ZooKeeper for clients to use. If this is not set,
# it will publish the same port that the broker binds to.
#advertised.port=<port accessible by clients>
# The number of threads handling network requests
num.network.threads=3
# The number of threads doing disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma seperated list of directories under which to store log files
log.dirs=/app/hadoop/kafka_2.10-0.8.2.0/logs
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=6
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to exceessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion
log.retention.hours=48
# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
# By default the log cleaner is disabled and the log retention policy will default to just delete segments after their retention expires.
# If log.cleaner.enable=true is set the cleaner will be enabled and individual logs can then be marked for log compaction.
log.cleaner.enable=false
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=hadoop01:2181,hadoop02:2181,hadoop03:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
4. Copy the package to the other nodes (remember to give each broker a unique broker.id in server.properties)
scp -r kafka_2.10-0.8.2.0 hadoop@hadoop03:/app/hadoop
5. Start Kafka on each server
./kafka-server-start.sh ../config/server.properties &
6. Verify the cluster
# create a test topic
./kafka-topics.sh --create --zookeeper hadoop01:2181 --replication-factor 3 --partitions 1 --topic wxtest
# on hadoop04, start a console producer that sends data via broker hadoop03
./kafka-console-producer.sh --broker-list hadoop03:9092 --topic wxtest
test for hadoop03
# stop the Kafka broker on hadoop03
./kafka-server-stop.sh
# start a consumer on hadoop02 and check that it receives the data produced from hadoop04
./kafka-console-consumer.sh --zookeeper hadoop01:2181 --topic wxtest --from-beginning
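To see where partition leadership moved after stopping the broker on hadoop03, the topic metadata can be inspected:
# the Leader and Isr columns show which brokers currently serve the partition
./kafka-topics.sh --describe --zookeeper hadoop01:2181 --topic wxtest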
7. Open issues
After Kafka is started, the ZooKeeper instance on hadoop02 goes down (still to be resolved).
VI. Install MySQL
./mysql_install_db --verbose --user=hadoop --defaults-file=/app/hadoop/mysql-5.6.33-linux-glibc2.5-x86_64/my.cnf --datadir=/app/data/mysql/data/ --basedir=/app/hadoop/mysql-5.6.33-linux-glibc2.5-x86_64 --pid-file=/app/data/mysql/data/mysql.pid --tmpdir=/app/data/mysql/tmp
cp support-files/mysql.server /etc/init.d/mysql
./mysqld_safe --defaults-file=/etc/my.cnf --socket=/app/data/mysql/tmp/mysql.sock --user=hadoop
./mysql -h localhost -S /app/data/mysql/tmp/mysql.sock -u root -p
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'rootbqs123' WITH GRANT OPTION;
create database hive;
alter database hive character set latin1;
VII. Hive setup
1. Download the Hive package, upload it to the server, and extract it
apache-hive-1.2.1-bin.tar.gz
http://hive.apache.org/downloads.html
tar -xvf apache-hive-1.2.1-bin.tar.gz
2. Edit hive-env.sh
Note: hive-env.sh does not exist initially; copy it from hive-env.sh.template.
cp hive-env.sh.template hive-env.sh
vim hive-env.sh
3. Edit hive-site.xml (a minimal sketch follows)
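hive-site.xml also has to be created by hand; a minimal sketch using the MySQL metastore prepared in the previous section (the MySQL host hadoop01, the hive/hive credentials, and the warehouse path are assumptions to adapt):
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop01:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <!-- placeholder credentials: replace with the real MySQL account -->
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop01:9083</value>
  </property>
</configuration>
The MySQL JDBC driver jar (mysql-connector-java) also needs to be copied into Hive's lib directory.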
4. Start the Hive metastore
# if -p is not specified, the default port 9083 is used
hive --service metastore -p <port_num>
# on the client, start the CLI with the hive command
hive
5. Notes
# Startup error: /tmp/hive on HDFS should be writable. Current permissions are: rwx--x--x
# The current user cannot write to that HDFS path; fix the permissions:
hadoop fs -chmod -R 777 /tmp
VIII. HBase setup
HBase is deployed as a distributed cluster with the following node layout:
| hadoop01 | hadoop02 | hadoop03 | hadoop04 |
|---|---|---|---|
| HMaster | HMaster (backup) | | |
| | regionserver | regionserver | regionserver |
1. Upload hbase-1.0.2-bin.tar.gz to the servers
Download: http://archive.apache.org/dist/hbase/hbase-1.0.2/
tar -xvf hbase-1.0.2-bin.tar.gz
2. Edit hbase-env.sh
#
#/**
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements. See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership. The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License. You may obtain a copy of the License at
# *
# * http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */
# Set environment variables here.
# This script sets variables multiple times over the course of starting an hbase process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase, etc.)
# The java implementation to use. Java 1.7+ required.
export JAVA_HOME=/app/java/jdk1.7.0_71/
# Extra Java CLASSPATH elements. Optional.
# export HBASE_CLASSPATH=
# The maximum amount of heap to use. Default is left to JVM default.
export HBASE_HEAPSIZE=1G
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/
export HBASE_LIBRARY_PATH=$HBASE_LIBRARY_PATH:$HBASE_HOME/lib/native/
# Uncomment below if you intend to use off heap cache. For example, to allocate 8G of
# offheap, set the value to "8G".
# export HBASE_OFFHEAPSIZE=1G
# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="$HBASE_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly"
# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.
# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"
# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
# Uncomment one of the below three options to enable java garbage collection logging for the client processes.
# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"
# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
# See the package documentation for org.apache.hadoop.hbase.io.hfile for other configurations
# needed setting up off-heap block caching.
# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
# NOTE: HBase provides an alternative JMX implementation to fix the random ports issue, please see JMX
# section in HBase Reference Guide for instructions.
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"
# File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers
# Uncomment and adjust to keep all the Region Server pages mapped to be memory resident
#HBASE_REGIONSERVER_MLOCK=true
#HBASE_REGIONSERVER_UID="hbase"
# File naming hosts on which backup HMaster will run. $HBASE_HOME/conf/backup-masters by default.
export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters
# Extra ssh options. Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"
# Where log files are stored. $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs
# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"
# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER
# The scheduling priority for daemon processes. See 'man nice'.
# export HBASE_NICENESS=10
# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR=/app/hadoop/hbase-1.0.2/pids
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
# The default log rolling policy is RFA, where the log file is rolled as per the size defined for the
# RFA appender. Please refer to the log4j.properties file to see more details on this appender.
# In case one needs to do log rolling on a date change, one should set the environment property
# HBASE_ROOT_LOGGER to "<DESIRED_LOG LEVEL>,DRFA".
# For example:
# HBASE_ROOT_LOGGER=INFO,DRFA
# The reason for changing default to RFA is to avoid the boundary case of filling out disk space as
# DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for more context.
3. Edit hbase-site.xml (a minimal sketch follows)
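A minimal hbase-site.xml sketch for this layout, assuming HBase stores its data on the HA nameservice hadoop-cluster1 and uses the external ZooKeeper ensemble configured earlier:
<configuration>
  <property>
    <!-- store HBase data on the HA HDFS nameservice -->
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop-cluster1/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop01,hadoop02,hadoop03</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
Since hbase-env.sh above sets HBASE_MANAGES_ZK=false, HBase relies entirely on this external quorum.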
4. Edit the regionservers file
List the servers that will run region servers:
hadoop02
hadoop03
hadoop04
5. Create the backup-masters file and add the standby master
hadoop02
6. Link the Hadoop configuration files
ln -s /app/hadoop/hadoop-2.6.0/etc/hadoop/hdfs-site.xml /app/hadoop/hbase-1.0.2/conf/
ln -s /app/hadoop/hadoop-2.6.0/etc/hadoop/core-site.xml /app/hadoop/hbase-1.0.2/conf/
7. Copy the package to the other servers
scp -r /app/hadoop/hbase-1.0.2/ hadoop@hadoop02:/app/hadoop/
scp -r /app/hadoop/hbase-1.0.2/ hadoop@hadoop03:/app/hadoop/
scp -r /app/hadoop/hbase-1.0.2/ hadoop@hadoop04:/app/hadoop/
8. Configure environment variables
vim /etc/profile
# add the following
export HBASE_HOME=/app/hadoop/hbase-1.0.2
export PATH=$HBASE_HOME/bin:$PATH
9. Start the services
Option 1:
start-hbase.sh
# startup log:
starting master, logging to /app/hadoop/hbase-1.0.2/logs/hbase-hadoop-master-hadoop01.out
hadoop04: starting regionserver, logging to /app/hadoop/hbase-1.0.2/bin/../logs/hbase-hadoop-regionserver-hadoop04.out
hadoop02: starting regionserver, logging to /app/hadoop/hbase-1.0.2/bin/../logs/hbase-hadoop-regionserver-hadoop02.out
hadoop03: starting regionserver, logging to /app/hadoop/hbase-1.0.2/bin/../logs/hbase-hadoop-regionserver-hadoop03.out
hadoop02: starting master, logging to /app/hadoop/hbase-1.0.2/bin/../logs/hbase-hadoop-master-hadoop02.out
Option 2:
# start the master
hbase-daemon.sh start master
# start a region server
hbase-daemon.sh start regionserver
10. Issues
Startup exception:
java.lang.RuntimeException: Failed construction of Regionserver: class org.apache.hadoop.hbase.regionserver.HRegionServer
    at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2523)
    at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:64)
    at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2538)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2521)
    ... 5 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1905)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:769)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.createRpcServices(HRegionServer.java:575)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:492)
    ... 10 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1811)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
    ... 13 more
Cause: hbase-site.xml contains Phoenix-related settings, but the corresponding Phoenix jars are missing from the lib directory.
Solutions:
1. Remove the Phoenix-related configuration, or
2. Copy the Phoenix jars into each node's lib directory.
Option 1 was used to get the cluster started; the Phoenix configuration will be added back later.
11. Integrate Hive with HBase
Hive/HBase integration is implemented through the two systems' public APIs, with the actual work done by the hive-hbase-handler-*.jar in Hive's lib directory, so it is enough to copy that hive-hbase-handler-*.jar into hbase/lib.
# copy into the local HBase lib directory
cp /app/hadoop/apache-hive-1.2.1-bin/lib/hive-hbase-handler-1.2.1.jar /app/hadoop/hbase-1.0.2/lib/
# copy to the HBase lib directory on the other hosts
cd /app/hadoop/apache-hive-1.2.1-bin/lib
scp hive-hbase-handler-1.2.1.jar hadoop@hadoop02:/app/hadoop/hbase-1.0.2/lib/
- Test the integration
Open the Hive CLI and the HBase shell on different hosts:
hive
hbase shell
# create the test tables
Create the table 'wx_test_hive_hbase' in HBase:
create 'wx_test_hive_hbase','INFO'
Create the external table 'hive_wx_test_hive_hbase' in Hive:
create external table hive_wx_test_hive_hbase(
id string,
area_code string,
area_desc string
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,INFO:areaCode,INFO:areaDesc")
TBLPROPERTIES("hbase.table.name" = "wx_test_hive_hbase");
create external table hive_wx_test(
id string,
area_code string
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,INFO:areaCode")
TBLPROPERTIES("hbase.table.name" = "wx_test");
Insert data from Hive and from HBase respectively:
#hive
#hbase
# one column per put; the column names must match the hbase.columns.mapping above
put 'wx_test_hive_hbase','00001','INFO:areaCode','0001'
put 'wx_test_hive_hbase','00001','INFO:areaDesc','深圳'
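To check that both sides see the same data (table and column names as defined above):
# in the hbase shell
scan 'wx_test_hive_hbase'
# in the hive CLI, the row written from HBase should come back
select * from hive_wx_test_hive_hbase;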
IX. Spark cluster setup
1. Download Spark and upload it to the servers
https://archive.apache.org/dist/spark/spark-2.1.0/
2. Edit the spark-env.sh configuration file
Note: the distribution only ships spark-env.sh.template; copy it to spark-env.sh.
cp spark-env.sh.template spark-env.sh
# modify the configuration; a minimal sketch follows
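A minimal spark-env.sh sketch, assuming a standalone Spark master on hadoop01 running on top of the JDK, Scala, and Hadoop installations above (the memory and core values are placeholders):
export JAVA_HOME=/app/java/jdk1.7.0_71
export SCALA_HOME=/app/scala/scala-2.11.8
export HADOOP_HOME=/app/hadoop/hadoop-2.6.0
export HADOOP_CONF_DIR=/app/hadoop/hadoop-2.6.0/etc/hadoop
export SPARK_MASTER_HOST=hadoop01      # standalone master
export SPARK_WORKER_MEMORY=4g          # placeholder, adjust to the hardware
export SPARK_WORKER_CORES=2            # placeholder, adjust to the hardware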