Installing Hive and Integrating It with HBase
1. Prerequisites
- A Java environment
- Hadoop
- HBase
- MySQL
2. Downloading the package
Official site: https://hive.apache.org/downloads.html
Download directory: https://dlcdn.apache.org/hive/
Direct link for 3.1.3: https://dlcdn.apache.org/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
After downloading, upload the archive to the /usr/local/ directory on the hadoop01 machine, extract it in place with
tar -zxvf apache-hive-3.1.3-bin.tar.gz
and then create a symlink with
ln -s apache-hive-3.1.3-bin hive
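If hadoop01 has direct internet access, the same steps can also be run in one pass (a sketch, assuming the mirror URL above is still live):
# download, extract, and symlink in one sequence
cd /usr/local
wget https://dlcdn.apache.org/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
tar -zxvf apache-hive-3.1.3-bin.tar.gz
ln -s apache-hive-3.1.3-bin hive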
3. Configuring environment variables
vim /etc/profile
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$PATH
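For the new variables to take effect in the current shell, reload the profile and check that Hive is on the PATH:
source /etc/profile
hive --version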
4. MySQL configuration
- Create a database named hive and enable remote login for the account Hive will use; both are needed for the configuration below.
- Copy the MySQL driver
mysql-connector-java-8.0.25.jar
into the $HIVE_HOME/lib directory.
Driver download: https://mvnrepository.com/artifact/mysql/mysql-connector-java
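A minimal sketch of the MySQL side, assuming MySQL 8.0 and the root/123456 credentials used in hive-site.xml below (use a dedicated, tightly scoped account in production):
mysql -u root -p <<'SQL'
CREATE DATABASE IF NOT EXISTS hive;
-- allow remote logins from any host; '%' is deliberately permissive for this demo
CREATE USER IF NOT EXISTS 'root'@'%' IDENTIFIED BY '123456';
GRANT ALL PRIVILEGES ON hive.* TO 'root'@'%';
FLUSH PRIVILEGES;
SQL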
5. Preparing the configuration file
Copy the default template to hive-site.xml:
cp /usr/local/hive/conf/hive-default.xml.template /usr/local/hive/conf/hive-site.xml
Then modify the entries below. Since the file is very long, it is easier to download it and edit it locally with an editor such as Notepad++.
<!-- JDBC connection URL -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://10.182.71.115:3306/hive?useSSL=false</value>
</property>
<!-- JDBC driver class -->
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.cj.jdbc.Driver</value>
</property>
<!-- JDBC username -->
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<!-- JDBC password -->
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
</property>
<!-- skip metastore schema version verification -->
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>
<!-- disable metastore event notification API authorization -->
<property>
  <name>hive.metastore.event.db.notification.api.auth</name>
  <value>false</value>
</property>
<!-- Hive's default warehouse directory on HDFS -->
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>
<property>
  <name>system:java.io.tmpdir</name>
  <value>/tmp/hive/java</value>
</property>
<property>
  <name>system:user.name</name>
  <value>${user.name}</value>
</property>
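The last two properties resolve the ${system:java.io.tmpdir} and ${system:user.name} placeholders referenced elsewhere in the template; without them Hive can fail at startup with a "Relative path in absolute URI" error. Make sure the temp directory exists:
mkdir -p /tmp/hive/java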
6. Initializing the metastore database
Run:
schematool -initSchema -dbType mysql -verbose
Once it finishes, the hive database in MySQL will contain a large number of newly created metastore tables.
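You can confirm this from the MySQL side (assuming the credentials above):
mysql -u root -p -e 'USE hive; SHOW TABLES;'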
7. Starting Hive
Run
hive
to start the Hive CLI, then execute
show databases;
If it completes without errors, the installation is working.
8. Integrating with HBase
8.1 Add or modify the following hive-site.xml entries so Hive can locate the ZooKeeper ensemble used by HBase:
<property>
  <name>hive.zookeeper.quorum</name>
  <value>hadoop01,hadoop02,hadoop03</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hadoop01,hadoop02,hadoop03</value>
</property>
8.2 Linking the HBase dependencies
Symlink the required HBase jars into $HIVE_HOME/lib:
ln -s /usr/local/hbase/lib/hbase-client-2.5.0.jar $HIVE_HOME/lib/
ln -s /usr/local/hbase/lib/hbase-protocol-2.5.0.jar $HIVE_HOME/lib/
ln -s /usr/local/hbase/lib/hbase-it-2.5.0.jar $HIVE_HOME/lib/
ln -s /usr/local/hbase/lib/hbase-server-2.5.0.jar $HIVE_HOME/lib/
ln -s /usr/local/hbase/lib/hbase-common-2.5.0.jar $HIVE_HOME/lib/
ln -s /usr/local/hbase/lib/hbase-hadoop2-compat-2.5.0.jar $HIVE_HOME/lib/
ln -s /usr/local/hbase/lib/hbase-hadoop-compat-2.5.0.jar $HIVE_HOME/lib/
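Equivalently, assuming HBase 2.5.0 installed under /usr/local/hbase, the links can be created in a loop:
# symlink each required HBase jar into Hive's lib directory
for jar in client protocol it server common hadoop2-compat hadoop-compat; do
  ln -s /usr/local/hbase/lib/hbase-${jar}-2.5.0.jar $HIVE_HOME/lib/
done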
8.3 Starting Hive
After starting Hive, create an external table mapped onto the HBase table. Note that the hbase.columns.mapping value must not contain whitespace between entries, since spaces would be interpreted as part of the column names:
CREATE EXTERNAL TABLE sogoulogs(
  id string,
  datetime string,
  userid string,
  searchname string,
  retorder string,
  cliorder string,
  cliurl string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,info:datetime,info:userid,info:searchname,info:retorder,info:cliorder,info:cliurl"
)
TBLPROPERTIES ("hbase.table.name" = "sogoulogs");
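Because this is an external table, the underlying HBase table must already exist with an info column family. If it does not, it can be created first from the HBase shell (a sketch; hbase shell -n runs non-interactively):
echo "create 'sogoulogs', 'info'" | hbase shell -n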
9. Integration tests
9.1 Selecting all rows
hive> select * from sogoulogs;
OK
0001 20221003 001 你好 1 1 https://www.baidu.com
1940021547934818200:00:011664791412121 00:00:01 19400215479348182 [天津工业大学\] 1 65 www.baidu.com/
Time taken: 0.338 seconds, Fetched: 2 row(s)
Note that selecting all rows does not trigger a MapReduce job.
9.2 Counting distinct users
hive> select count(distinct userid) from sogoulogs;
Query ID = root_20221003183038_e7f8d2b1-b050-4a88-b77a-16953b68fe45
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1664524473382_0005, Tracking URL = http://hadoop01:8089/proxy/application_1664524473382_0005/
Kill Command = /usr/local/hadoop/bin/mapred job -kill job_1664524473382_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-10-03 18:30:53,824 Stage-1 map = 0%, reduce = 0%
2022-10-03 18:31:13,438 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.21 sec
2022-10-03 18:31:19,666 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 9.9 sec
MapReduce Total cumulative CPU time: 9 seconds 900 msec
Ended Job = job_1664524473382_0005
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 9.9 sec HDFS Read: 10026 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 900 msec
OK
2
Time taken: 43.774 seconds, Fetched: 1 row(s)
9.3 Ranking news topics by view count
hive> select searchname,count(*) as rank from sogoulogs group by searchname order by rank desc limit 10;
Query ID = root_20221003183406_43676434-3ca5-46cc-ad60-3210412dd148
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1664524473382_0006, Tracking URL = http://hadoop01:8089/proxy/application_1664524473382_0006/
Kill Command = /usr/local/hadoop/bin/mapred job -kill job_1664524473382_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-10-03 18:34:18,129 Stage-1 map = 0%, reduce = 0%
2022-10-03 18:34:29,557 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.31 sec
2022-10-03 18:34:34,699 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.93 sec
MapReduce Total cumulative CPU time: 7 seconds 930 msec
Ended Job = job_1664524473382_0006
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1664524473382_0007, Tracking URL = http://hadoop01:8089/proxy/application_1664524473382_0007/
Kill Command = /usr/local/hadoop/bin/mapred job -kill job_1664524473382_0007
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2022-10-03 18:34:51,056 Stage-2 map = 0%, reduce = 0%
2022-10-03 18:34:57,260 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.23 sec
2022-10-03 18:35:04,469 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.59 sec
MapReduce Total cumulative CPU time: 4 seconds 590 msec
Ended Job = job_1664524473382_0007
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 7.93 sec HDFS Read: 13796 HDFS Write: 161 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 4.59 sec HDFS Read: 7743 HDFS Write: 169 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 520 msec
OK
你好 1
[天津工业大学\] 1
Time taken: 59.951 seconds, Fetched: 2 row(s)
As the logs show, a slightly more complex SQL statement is split into multiple MapReduce jobs: the GROUP BY aggregation runs as the first job, and the global ORDER BY ... LIMIT as the second.