Hive Installation and HBase Integration

Published on 2022-10-02 17:52 in category: Software by 狂盗一枝梅

1. Prerequisites

  • Java environment
  • Hadoop
  • HBase
  • MySQL (a quick check for all four follows below)
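
A minimal sanity check, assuming all four dependencies are already installed and on the PATH:

java -version
hadoop version
hbase version
mysql --version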

2. Downloading the Installation Package

Official downloads page: https://hive.apache.org/downloads.html

Download directory: https://dlcdn.apache.org/hive/

Direct link for 3.1.3: https://dlcdn.apache.org/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
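
If hadoop01 has direct internet access, you can also fetch the archive straight into place (assuming wget is installed):

wget -P /usr/local https://dlcdn.apache.org/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz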

After downloading, upload the archive to the /usr/local/ directory on the hadoop01 machine, then run

tar -zxvf apache-hive-3.1.3-bin.tar.gz

to extract it into the current directory. Next, run

ln -s apache-hive-3.1.3-bin hive

to create a convenient symlink.

3. Configuring Environment Variables

vim /etc/profile

export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$PATH
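
After saving, reload the profile and confirm the hive binary resolves (the CLI needs a working Hadoop installation, so HADOOP_HOME should already be set):

source /etc/profile
hive --version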

4. MySQL Configuration

  • Create a database named hive and an account that can log in remotely; both are referenced by the configuration below (a sketch follows this list).
  • Copy the MySQL driver mysql-connector-java-8.0.25.jar into the $HIVE_HOME/lib directory.

MySQL driver download: https://mvnrepository.com/artifact/mysql/mysql-connector-java
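
A minimal sketch of the database setup, assuming you administer MySQL as root and reuse the root/123456 credentials that appear in hive-site.xml below (substitute your own user and password):

mysql -uroot -p -e "CREATE DATABASE hive DEFAULT CHARACTER SET utf8mb4;"
mysql -uroot -p -e "CREATE USER IF NOT EXISTS 'root'@'%' IDENTIFIED BY '123456';"
mysql -uroot -p -e "GRANT ALL PRIVILEGES ON hive.* TO 'root'@'%'; FLUSH PRIVILEGES;"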

5. Preparing the Configuration File

Run

cp /usr/local/hive/conf/hive-default.xml.template /usr/local/hive/conf/hive-site.xml 

to copy the default template out as hive-site.xml.

Then edit the following properties. The file is very long, so it is easier to download it and edit it locally in an editor such as Notepad++:

  <!-- JDBC connection URL for the metastore database -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://10.182.71.115:3306/hive?useSSL=false</value>
  </property>

  <!-- JDBC driver class -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>

  <!-- JDBC username -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <!-- JDBC password -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>

  <!-- Disable metastore schema version verification -->
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <!-- Disable auth for metastore event DB notifications -->
  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>

  <!-- Hive's default warehouse directory on HDFS -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>

  <!-- Define the system: placeholders referenced elsewhere in the template;
       without them Hive 3.x can fail to start with
       "java.net.URISyntaxException: Relative path in absolute URI" -->
  <property>
    <name>system:java.io.tmpdir</name>
    <value>/tmp/hive/java</value>
  </property>
  <property>
    <name>system:user.name</name>
    <value>${user.name}</value>
  </property>

6. Initializing the Metastore Database

Run the following command:

schematool -initSchema -dbType mysql -verbose

Once it completes, you will find that the hive database in MySQL now contains a large number of metastore tables.
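
You can confirm this from MySQL (assuming the same credentials as above); the schema tool creates several dozen tables, including DBS, TBLS and VERSION:

mysql -uroot -p -e "USE hive; SHOW TABLES;"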

7. Starting Hive

Simply run

hive

to start the Hive CLI, then at the prompt run

show databases;

If the command returns without errors, the installation is working.

8. Integrating with HBase

8.1 Add or modify the following in hive-site.xml

<!-- ZooKeeper quorum used by Hive -->
<property>
    <name>hive.zookeeper.quorum</name>
    <value>hadoop01,hadoop02,hadoop03</value>
</property>
<!-- ZooKeeper quorum the HBase client connects to -->
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop01,hadoop02,hadoop03</value>
</property>
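
Before restarting Hive, it can be worth confirming that ZooKeeper is reachable from this machine. A quick probe, assuming the default client port 2181 and that the srvr four-letter command is whitelisted (it is by default in ZooKeeper 3.5+):

echo srvr | nc hadoop01 2181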

8.2 Linking the HBase dependency jars

ln -s /usr/local/hbase/lib/hbase-client-2.5.0.jar $HIVE_HOME/lib/
ln -s /usr/local/hbase/lib/hbase-protocol-2.5.0.jar $HIVE_HOME/lib/
ln -s /usr/local/hbase/lib/hbase-it-2.5.0.jar $HIVE_HOME/lib/
ln -s /usr/local/hbase/lib/hbase-server-2.5.0.jar $HIVE_HOME/lib/
ln -s /usr/local/hbase/lib/hbase-common-2.5.0.jar $HIVE_HOME/lib/
ln -s /usr/local/hbase/lib/hbase-hadoop2-compat-2.5.0.jar $HIVE_HOME/lib/
ln -s /usr/local/hbase/lib/hbase-hadoop-compat-2.5.0.jar $HIVE_HOME/lib/
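
The same links can be created with a short loop, which is easier to adapt if your HBase version differs (a sketch, assuming HBase is installed under /usr/local/hbase):

for j in client protocol it server common hadoop2-compat hadoop-compat; do
    ln -s /usr/local/hbase/lib/hbase-${j}-2.5.0.jar $HIVE_HOME/lib/
done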

8.3 Starting Hive

After starting Hive, create a table mapped to HBase (see the note after the DDL):

CREATE TABLE sogoulogs(
id string,
datetime string,
userid string,
searchname string,
retorder string,
cliorder string,
cliurl string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES
("hbase.columns.mapping" = "
:key,
info:datetime,
info:userid,
info:searchname,
info:retorder,
info:cliorder,
info:cliurl
")
TBLPROPERTIES ("hbase.table.name" = "sogoulogs");
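
Note that this statement defines a managed table, so Hive also creates the sogoulogs table on the HBase side and will fail if it already exists there. To map a pre-existing HBase table, use CREATE EXTERNAL TABLE with the same column mapping. Either way, the table should now be visible from the HBase shell (a quick check):

echo "list 'sogoulogs'" | hbase shell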

9. Integration Testing

9.1 Selecting all rows

hive> select * from sogoulogs;
OK
0001    20221003        001     你好    1       1       https://www.baidu.com
1940021547934818200:00:011664791412121  00:00:01        19400215479348182       [天津工业大学\] 1       65      www.baidu.com/
Time taken: 0.338 seconds, Fetched: 2 row(s)

Note that a full-table SELECT does not trigger a MapReduce job: Hive answers simple scans directly through its fetch task (governed by hive.fetch.task.conversion).

9.2 Counting distinct users

hive> select count(distinct userid) from sogoulogs;
Query ID = root_20221003183038_e7f8d2b1-b050-4a88-b77a-16953b68fe45
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1664524473382_0005, Tracking URL = http://hadoop01:8089/proxy/application_1664524473382_0005/
Kill Command = /usr/local/hadoop/bin/mapred job  -kill job_1664524473382_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-10-03 18:30:53,824 Stage-1 map = 0%,  reduce = 0%
2022-10-03 18:31:13,438 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.21 sec
2022-10-03 18:31:19,666 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.9 sec
MapReduce Total cumulative CPU time: 9 seconds 900 msec
Ended Job = job_1664524473382_0005
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 9.9 sec   HDFS Read: 10026 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 900 msec
OK
2
Time taken: 43.774 seconds, Fetched: 1 row(s)

9.3 Ranking news topics by view count

hive> select searchname,count(*) as rank from sogoulogs group by searchname order by rank desc limit 10;
Query ID = root_20221003183406_43676434-3ca5-46cc-ad60-3210412dd148
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1664524473382_0006, Tracking URL = http://hadoop01:8089/proxy/application_1664524473382_0006/
Kill Command = /usr/local/hadoop/bin/mapred job  -kill job_1664524473382_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-10-03 18:34:18,129 Stage-1 map = 0%,  reduce = 0%
2022-10-03 18:34:29,557 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.31 sec
2022-10-03 18:34:34,699 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.93 sec
MapReduce Total cumulative CPU time: 7 seconds 930 msec
Ended Job = job_1664524473382_0006
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1664524473382_0007, Tracking URL = http://hadoop01:8089/proxy/application_1664524473382_0007/
Kill Command = /usr/local/hadoop/bin/mapred job  -kill job_1664524473382_0007
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2022-10-03 18:34:51,056 Stage-2 map = 0%,  reduce = 0%
2022-10-03 18:34:57,260 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 2.23 sec
2022-10-03 18:35:04,469 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 4.59 sec
MapReduce Total cumulative CPU time: 4 seconds 590 msec
Ended Job = job_1664524473382_0007
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 7.93 sec   HDFS Read: 13796 HDFS Write: 161 SUCCESS
Stage-Stage-2: Map: 1  Reduce: 1   Cumulative CPU: 4.59 sec   HDFS Read: 7743 HDFS Write: 169 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 520 msec
OK
你好    1
[天津工业大学\] 1
Time taken: 59.951 seconds, Fetched: 2 row(s)

As you can see, a slightly more complex query is split into multiple MapReduce jobs: one for the GROUP BY aggregation and a second for the global ORDER BY ... LIMIT.

References

Apache Hive 3.1.3 installation: https://blog.csdn.net/v15220/article/details/125131542


#hadoop #hbase #hive