Big Data - NoSQL Databases - HBase Operation Framework: Phoenix [a JDBC-API-based SQL engine for HBase, written in Java; low latency, transactional, speaks SQL, exposes a JDBC interface]


I. Phoenix Overview

1. What Phoenix Is

  • Phoenix began as an open-source project at Salesforce and later became a top-level Apache project.
  • Phoenix is an open-source SQL layer built on top of HBase. It lets us create tables, insert data, and query HBase through the standard JDBC API, so we can avoid HBase's native client API.
  • Putting Phoenix between our application and HBase does not degrade performance, and it saves us a lot of code (see the minimal JDBC sketch below).
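
To make the JDBC point concrete, here is a minimal read-side sketch. It assumes the Phoenix client jar is on the classpath, the ZooKeeper quorum used throughout this article, and the US_POPULATION table created in section IV below:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PhoenixQueryDemo {
    public static void main(String[] args) throws Exception {
        // Thick-client URL format: jdbc:phoenix:<zookeeper quorum>:<port>
        String url = "jdbc:phoenix:hadoop101,hadoop102,hadoop103:2181";
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT state, city, population FROM us_population WHERE state = ?")) {
            ps.setString(1, "NY");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Unquoted identifiers are stored upper-cased by Phoenix
                    System.out.printf("%s %s %d%n", rs.getString("STATE"),
                            rs.getString("CITY"), rs.getLong("POPULATION"));
                }
            }
        }
    }
}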

2. Phoenix Features

  1. Compiles SQL queries into HBase scans (items 1, 2, and 4 are illustrated in the sketch after this list)
  2. Determines the optimal start and stop rowkeys for each scan
  3. Runs scans in parallel
  4. Pushes WHERE-clause predicates down as server-side filters
  5. Performs aggregations through coprocessors
  6. Full support for creating HBase secondary indexes
  7. DML commands, plus DDL commands to create tables and make versioned, incremental schema changes
  8. Easy integration with Spark, Hive, Pig, Flume, and MapReduce
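
For intuition about items 1, 2, and 4, the sketch below shows roughly what a Phoenix query compiles to against the raw HBase 1.x client API. This is a conceptual illustration, not code Phoenix actually generates; whx_table is the demo table created in section IV.6 of this article, and the query in the comment is illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ManualScanDemo {
    // Roughly what
    //   SELECT * FROM "whx_table"
    //   WHERE empid_pk >= '1001' AND empid_pk < '1002'
    //     AND "cf_user"."firstname" = 'Nick'
    // becomes: the rowkey predicate bounds the scan (features 1 and 2),
    // and the remaining predicate is pushed down as a server-side filter (feature 4).
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("whx_table"))) {
            Scan scan = new Scan();
            scan.setStartRow(Bytes.toBytes("1001")); // optimal start rowkey
            scan.setStopRow(Bytes.toBytes("1002"));  // optimal stop rowkey
            scan.setFilter(new SingleColumnValueFilter(
                    Bytes.toBytes("cf_user"), Bytes.toBytes("firstname"),
                    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("Nick")));
            try (ResultScanner results = table.getScanner(scan)) {
                for (Result r : results) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}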

3. Phoenix Architecture

(Figure: Phoenix architecture diagram)

4. Phoenix Data Storage

Phoenix maps the HBase data model onto the relational world.
(Figure: mapping of the HBase data model to relational tables)

II. Installing Phoenix

1. Download Phoenix

http://archive.apache.org/dist/phoenix/apache-phoenix-4.14.2-HBase-1.3/

2. Extract the tarball on any node (here, hadoop102)

Install Phoenix on whichever server you want to use it from.

[whx@hadoop102 soft]$ tar -zxvf apache-phoenix-4.14.2-HBase-1.3-bin.tar.gz  -C ../module/

3. Rename the directory

[whx@hadoop102 module]$ mv apache-phoenix-4.14.2-HBase-1.3-bin/  phoenix
[whx@hadoop102 module]$ ll
total 32
drwxrwxr-x.  9 whx whx 4096 Jan 31 14:45 flume
drwxr-xr-x. 11 whx whx 4096 Jan 31 10:43 hadoop-2.7.2
drwxrwxr-x.  8 whx whx 4096 Feb  2 10:48 hbase
drwxrwxr-x.  9 whx whx 4096 Jan 30 19:27 hive
drwxr-xr-x.  8 whx whx 4096 Dec 13  2016 jdk1.8.0_121
drwxr-xr-x.  8 whx whx 4096 Feb  1 16:32 kafka
drwxrwx---.  5 whx whx 4096 May 24  2019 phoenix
drwxr-xr-x. 11 whx whx 4096 Jan 29 22:01 zookeeper-3.4.10
[whx@hadoop102 module]$ 

4. Copy the Phoenix jars into the HBase directory

Copy the two jars HBase needs, server and client, into the /opt/module/hbase/lib directory:

[whx@hadoop102 phoenix]$ cp phoenix-4.14.2-HBase-1.3-server.jar /opt/module/hbase/lib
[whx@hadoop102 phoenix]$ cp phoenix-4.14.2-HBase-1.3-client.jar /opt/module/hbase/lib
[whx@hadoop102 phoenix]$ 

5. Distribute the Phoenix directory to hadoop101 and hadoop103

[whx@hadoop102 module]$ xsync.sh phoenix/

6. Distribute the Phoenix jars under HBase to hadoop101 and hadoop103

The two jars copied above must also be distributed to the other HBase nodes:

[whx@hadoop102 hbase]$ xsync.sh lib/

7. Configure environment variables on hadoop102

[whx@hadoop102 ~]$ vim /etc/profile
JAVA_HOME=/opt/module/jdk1.8.0_121
HADOOP_HOME=/opt/module/hadoop-2.7.2
ZOOKEEPER_HOME=/opt/module/zookeeper-3.4.10
HIVE_HOME=/opt/module/hive
FLUME_HOME=/opt/module/flume
HBASE_HOME=/opt/module/hbase
PHOENIX_HOME=/opt/module/phoenix
PHOENIX_CLASSPATH=$PHOENIX_HOME
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$HIVE_HOME/bin:$FLUME_HOME/bin:$HBASE_HOME/bin:$PHOENIX_HOME/bin
export PATH JAVA_HOME HADOOP_HOME ZOOKEEPER_HOME HIVE_HOME FLUME_HOME HBASE_HOME PHOENIX_HOME PHOENIX_CLASSPATH
[whx@hadoop102 ~]$ source /etc/profile

III. Starting and Stopping Phoenix

1. First start Hadoop, ZooKeeper, and HBase

[whx@hadoop102 ~]$ start-dfs.sh
[whx@hadoop102 ~]$ xcall.sh /opt/module/zookeeper-3.4.10/bin/zkServer.sh start
[whx@hadoop102 ~]$ /opt/module/hbase/bin/start-hbase.sh

2. Start Phoenix

[whx@hadoop102 ~]$ sqlline.py hadoop101,hadoop102,hadoop103:2181
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:hadoop101,hadoop102,hadoop103:2181 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:hadoop101,hadoop102,hadoop103:2181
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/phoenix/phoenix-4.14.2-HBase-1.3-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
21/02/04 08:52:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connected to: Phoenix (version 4.14)
Driver: PhoenixEmbeddedDriver (version 4.14)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
133/133 (100%) Done
Done
sqlline version 1.2.0
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> 

IV. Using Phoenix

1. List all tables

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> !tables
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+-+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  | |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+-+
|            | SYSTEM       | CATALOG     | SYSTEM TABLE  |          |            |                            |                 |              | |
|            | SYSTEM       | FUNCTION    | SYSTEM TABLE  |          |            |                            |                 |              | |
|            | SYSTEM       | LOG         | SYSTEM TABLE  |          |            |                            |                 |              | |
|            | SYSTEM       | SEQUENCE    | SYSTEM TABLE  |          |            |                            |                 |              | |
|            | SYSTEM       | STATS       | SYSTEM TABLE  |          |            |                            |                 |              | |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+--------------+-+
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> 
  • When Phoenix runs, it first creates a few tables of its own in HBase, such as CATALOG, FUNCTION, LOG, SEQUENCE, and STATS.
  • TABLE_SCHEM is the schema (database) name and TABLE_NAME is the table name.
  • The tables Phoenix created can also be seen from HBase:
    [whx@hadoop102 ~]$ hbase shell
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/opt/module/hbase/lib/phoenix-4.14.2-HBase-1.3-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/module/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/module/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    Version 1.3.1, r930b9a55528fe45d8edce7af42fef2d35e77677a, Thu Apr  6 19:36:54 PDT 2017

    hbase(main):001:0> list
    TABLE
    SYSTEM.CATALOG
    SYSTEM.FUNCTION
    SYSTEM.LOG
    SYSTEM.MUTEX
    SYSTEM.SEQUENCE
    SYSTEM.STATS
    6 row(s) in 0.1220 seconds
    => ["SYSTEM.CATALOG", "SYSTEM.FUNCTION", "SYSTEM.LOG", "SYSTEM.MUTEX", "SYSTEM.SEQUENCE", "SYSTEM.STATS"]
    hbase(main):002:0> 
    

2. Create a table

CREATE TABLE IF NOT EXISTS us_population (
    state CHAR(2) NOT NULL,
    city VARCHAR NOT NULL,
    population BIGINT
    CONSTRAINT whx_pk PRIMARY KEY (state, city))
column_encoded_bytes=0;
  1. A CHAR column must declare a length.
  2. VARCHAR may omit the length.
  3. The primary key maps to the HBase rowkey; with a composite primary key, the key values are concatenated to form the rowkey.
  4. Phoenix upper-cases table and column names by default; to keep lowercase names, wrap them in double quotes.
  5. column_encoded_bytes=0 disables column encoding.
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> CREATE TABLE IF NOT EXISTS us_population (
. . . . . . . . . . . . . . . . . . . . . . .>       state CHAR(2) NOT NULL,
. . . . . . . . . . . . . . . . . . . . . . .>       city VARCHAR NOT NULL,
. . . . . . . . . . . . . . . . . . . . . . .>       population BIGINT
. . . . . . . . . . . . . . . . . . . . . . .>       CONSTRAINT whx_pk PRIMARY KEY (state, city)) 
. . . . . . . . . . . . . . . . . . . . . . .>       column_encoded_bytes=0;
No rows affected (1.244 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> !tables
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
| TABLE_CAT  | TABLE_SCHEM  |   TABLE_NAME   |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
|            | SYSTEM       | CATALOG        | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | FUNCTION       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | LOG            | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | SEQUENCE       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | STATS          | SYSTEM TABLE  |          |            |                            |                 |             |
|            |              | US_POPULATION  | TABLE         |          |            |                            |                 |             |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> 

Viewing the us_population table created from Phoenix, inside HBase:

hbase(main):001:0> describe 'US_POPULATION'
Table US_POPULATION is ENABLED                                                                                                                     
US_POPULATION, {TABLE_ATTRIBUTES => {coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 => '|org.apach
e.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|', coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserve
r|805306366|', coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|', coprocessor$5 => '|org.apache.phoenix.hbase
.index.Indexer|805306366|org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.inde
x.PhoenixIndexBuilder'}                                                                                                                            
COLUMN FAMILIES DESCRIPTION                                                                                                                        
{NAME => '0', BLOOMFILTER => 'NONE', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL 
=> 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                    
1 row(s) in 0.1760 seconds
hbase(main):002:0> 

3. Drop a table

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> drop table us_population;

4. Insert/update records (upsert, not insert)

upsert into us_population values('NY','NewYork',8143197);
upsert into us_population values('CA','Los Angeles',3844829);
upsert into us_population values('IL','Chicago',2842518);

Insert the records in Phoenix:

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> upsert into us_population values('NY','NewYork',8143197);
1 row affected (0.027 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> upsert into us_population values('CA','Los Angeles',3844829);
1 row affected (0.011 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> upsert into us_population values('IL','Chicago',2842518);
1 row affected (0.006 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> 

Check from the Phoenix console:

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> select * from us_population;
+--------+--------------+-------------+
| STATE  |     CITY     | POPULATION  |
+--------+--------------+-------------+
| CA     | Los Angeles  | 3844829     |
| IL     | Chicago      | 2842518     |
| NY     | NewYork      | 8143197     |
+--------+--------------+-------------+
3 rows selected (0.035 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> 

Check from the HBase shell:

hbase(main):002:0> scan 'US_POPULATION'
ROW             COLUMN+CELL
 CALos Angeles  column=0:POPULATION, timestamp=1612401978719, value=\x80\x00\x00\x00\x00:\xAA\xDD
 CALos Angeles  column=0:_0, timestamp=1612401978719, value=x
 ILChicago      column=0:POPULATION, timestamp=1612401979579, value=\x80\x00\x00\x00\x00+_\x96
 ILChicago      column=0:_0, timestamp=1612401979579, value=x
 NYNewYork      column=0:POPULATION, timestamp=1612401978697, value=\x80\x00\x00\x00\x00|A]
 NYNewYork      column=0:_0, timestamp=1612401978697, value=x
3 row(s) in 0.0530 seconds
hbase(main):003:0> 
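
The same upserts can be issued from Java through the JDBC interface. A minimal sketch with illustrative values; note that, unlike sqlline (which turns autocommit on for you), a plain Phoenix JDBC connection generally does not autocommit, so commit() is called explicitly:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PhoenixUpsertDemo {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:phoenix:hadoop101,hadoop102,hadoop103:2181";
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement ps = conn.prepareStatement(
                     "UPSERT INTO us_population VALUES (?, ?, ?)")) {
            ps.setString(1, "TX");      // illustrative row, not part of
            ps.setString(2, "Houston"); // the article's data set
            ps.setLong(3, 2296224L);
            ps.executeUpdate();
            conn.commit(); // flush the mutation batch to HBase
        }
    }
}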

5. Delete records (delete)

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> delete from us_population where state='CA' and city='Los Angeles';
1 row affected (0.01 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> select * from us_population;
+--------+----------+-------------+
| STATE  |   CITY   | POPULATION  |
+--------+----------+-------------+
| IL     | Chicago  | 2842518     |
| NY     | NewYork  | 8143197     |
+--------+----------+-------------+
2 rows selected (0.023 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> 

6. Mapping HBase tables into Phoenix

By default, tables created directly in HBase are not visible to Phoenix.

To operate from Phoenix on a table created directly in HBase, the table must first be mapped in Phoenix.

There are two mapping approaches: 1. view mapping; 2. table mapping.

First, create a test table named whx_table in HBase:

hbase(main):019:0> create 'whx_table','cf_user','cf_company'
0 row(s) in 1.2170 seconds
=> Hbase::Table - whx_table
hbase(main):020:0> desc 'whx_table'
Table whx_table is ENABLED                                                                                                                         
whx_table                                                                                                                                          
COLUMN FAMILIES DESCRIPTION                                                                                                                        
{NAME => 'cf_company', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                 
{NAME => 'cf_user', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                    
2 row(s) in 0.0100 seconds
hbase(main):021:0> 

Insert some data into whx_table:

hbase(main):027:0> put 'whx_table','1001','cf_user:firstname','Nick'
0 row(s) in 0.0470 seconds
hbase(main):029:0> put 'whx_table','1001','cf_user:lastname','Lee'
0 row(s) in 0.0150 seconds
hbase(main):030:0> put 'whx_table','1001','cf_company:name','HUAWEI'
0 row(s) in 0.0140 seconds
hbase(main):031:0> put 'whx_table','1001','cf_company:address','changanjie10hao'
0 row(s) in 0.0080 seconds
hbase(main):033:0> get 'whx_table','1001'
COLUMN                CELL
 cf_company:address   timestamp=1612408142513, value=changanjie10hao
 cf_company:name      timestamp=1612408141461, value=HUAWEI
 cf_user:firstname    timestamp=1612408054676, value=Nick
 cf_user:lastname     timestamp=1612408141421, value=Lee
1 row(s) in 0.0200 seconds
hbase(main):034:0> 

6.1 View mapping

Views created in Phoenix are read-only: they can serve queries, but data cannot be modified through them.

Create a whx_table view in Phoenix that maps the whx_table table in HBase:

create view "whx_table"(
empid_pk varchar primary key,
"cf_user"."firstname" varchar,
"cf_user"."lastname" varchar,
"cf_company"."name" varchar,
"cf_company"."address" varchar);
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> !tables
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
| TABLE_CAT  | TABLE_SCHEM  |   TABLE_NAME   |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
|            | SYSTEM       | CATALOG        | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | FUNCTION       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | LOG            | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | SEQUENCE       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | STATS          | SYSTEM TABLE  |          |            |                            |                 |             |
|            |              | US_POPULATION  | TABLE         |          |            |                            |                 |             |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> create view "whx_table"(empid_pk varchar primary key,"cf_user"."firstname" varchar,"cf_user"."lastname" varchar,"cf_company"."name" varchar,"cf_company"."address" varchar) column_encoded_bytes=0;
1 row affected (5.913 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> !tables
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
| TABLE_CAT  | TABLE_SCHEM  |   TABLE_NAME   |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
|            | SYSTEM       | CATALOG        | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | FUNCTION       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | LOG            | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | SEQUENCE       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | STATS          | SYSTEM TABLE  |          |            |                            |                 |             |
|            |              | US_POPULATION  | TABLE         |          |            |                            |                 |             |
|            |              | whx_table      | VIEW          |          |            |                            |                 |             |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> select * from "whx_table";
+-----------+------------+-----------+---------+------------------+
| EMPID_PK  | firstname  | lastname  |  name   |     address      |
+-----------+------------+-----------+---------+------------------+
| 1001      | Nick       | Lee       | HUAWEI  | changanjie10hao  |
+-----------+------------+-----------+---------+------------------+
1 row selected (0.075 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> 

A view may not share its name with a table. To instead create a whx_table table in Phoenix that maps the HBase whx_table, the Phoenix whx_table view must first be dropped.

Dropping the whx_table view in Phoenix does not affect the whx_table table in HBase:

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> drop view "whx_table";
No rows affected (0.034 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> !tables
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
| TABLE_CAT  | TABLE_SCHEM  |   TABLE_NAME   |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
|            | SYSTEM       | CATALOG        | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | FUNCTION       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | LOG            | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | SEQUENCE       | SYSTEM TABLE  |          |            |                            |                 |             |
|            | SYSTEM       | STATS          | SYSTEM TABLE  |          |            |                            |                 |             |
|            |              | US_POPULATION  | TABLE         |          |            |                            |                 |             |
+------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> 

6.2 Table mapping

There are two ways to create a Phoenix table mapping onto HBase:

  1. When the table already exists in HBase, create the mapped table the same way as the view, simply replacing create view with create table. In Phoenix:
    create table "whx_table"(empid_pk varchar primary key,"cf_user"."firstname" varchar,"cf_user"."lastname" varchar,"cf_company"."name" varchar,"cf_company"."address" varchar) column_encoded_bytes=0;
    
    0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> !tables
    +------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
    | TABLE_CAT  | TABLE_SCHEM  |   TABLE_NAME   |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE |
    +------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
    |            | SYSTEM       | CATALOG        | SYSTEM TABLE  |          |            |                            |                 |             |
    |            | SYSTEM       | FUNCTION       | SYSTEM TABLE  |          |            |                            |                 |             |
    |            | SYSTEM       | LOG            | SYSTEM TABLE  |          |            |                            |                 |             |
    |            | SYSTEM       | SEQUENCE       | SYSTEM TABLE  |          |            |                            |                 |             |
    |            | SYSTEM       | STATS          | SYSTEM TABLE  |          |            |                            |                 |             |
    |            |              | US_POPULATION  | TABLE         |          |            |                            |                 |             |
    +------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
    0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> create table "whx_table"(empid_pk varchar primary key,"cf_user"."firstname" varchar,"cf_user"."lastname" varchar,"cf_company"."name" varchar,"cf_company"."address" varchar) column_encoded_bytes=0;
    1 row affected (5.913 seconds)
    0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> !tables
    +------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
    | TABLE_CAT  | TABLE_SCHEM  |   TABLE_NAME   |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE |
    +------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
    |            | SYSTEM       | CATALOG        | SYSTEM TABLE  |          |            |                            |                 |             |
    |            | SYSTEM       | FUNCTION       | SYSTEM TABLE  |          |            |                            |                 |             |
    |            | SYSTEM       | LOG            | SYSTEM TABLE  |          |            |                            |                 |             |
    |            | SYSTEM       | SEQUENCE       | SYSTEM TABLE  |          |            |                            |                 |             |
    |            | SYSTEM       | STATS          | SYSTEM TABLE  |          |            |                            |                 |             |
    |            |              | US_POPULATION  | TABLE         |          |            |                            |                 |             |
    |            |              | whx_table      | TABLE         |          |            |                            |                 |             |
    +------------+--------------+----------------+---------------+----------+------------+----------------------------+-----------------+-------------+
    0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> select * from "whx_table";
    +-----------+------------+-----------+---------+------------------+
    | EMPID_PK  | firstname  | lastname  |  name   |     address      |
    +-----------+------------+-----------+---------+------------------+
    | 1001      | Nick       | Lee       | HUAWEI  | changanjie10hao  |
    +-----------+------------+-----------+---------+------------------+
    1 row selected (0.061 seconds)
    0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> 
    
    Note: with the column_encoded_bytes=0 parameter, data added directly in HBase can also be queried from Phoenix; without it, data added in HBase will not show up in Phoenix queries.
  2. When the table does not exist in HBase, a plain create table statement creates whx_table in both Phoenix and HBase automatically, initializing the table structure from the statement's parameters.

Dropping the whx_table table in Phoenix also drops the whx_table table in HBase.

The mapped whx_table table supports inserts, deletes, updates, and queries from Phoenix:

  1. Insert: double-quote the table name to preserve its case; string values take single quotes
    0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> upsert into "whx_table" values ('1002','Tom','Lee','LIANXIANG','changanjie11hao');
    1 row affected (0.03 seconds)
    0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> select * from "whx_table";
    +-----------+------------+-----------+------------+------------------+
    | EMPID_PK  | firstname  | lastname  |    name    |     address      |
    +-----------+------------+-----------+------------+------------------+
    | 1001      | Nick       | Lee       | HUAWEI     | changanjie10hao  |
    | 1002      | Tom        | Lee       | LIANXIANG  | changanjie11hao  |
    +-----------+------------+-----------+------------+------------------+
    2 rows selected (0.031 seconds)
    0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> 
    
  2. Delete: double-quote the table name to preserve its case
    0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> delete from  "whx_table" where EMPID_PK='1002';
    1 row affected (0.009 seconds)
    0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> select * from "whx_table";
    +-----------+------------+-----------+---------+------------------+
    | EMPID_PK  | firstname  | lastname  |  name   |     address      |
    +-----------+------------+-----------+---------+------------------+
    | 1001      | Nick       | Lee       | HUAWEI  | changanjie10hao  |
    +-----------+------------+-----------+---------+------------------+
    1 row selected (0.029 seconds)
    0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> 
    

7. Creating HBase secondary indexes with Phoenix

7.1 Configure HBase to support Phoenix secondary indexes (on hadoop102)

7.1.1 Edit the HBase configuration file /opt/module/hbase/conf/hbase-site.xml
<configuration>
  <!-- Shared directory where each RegionServer persists HBase data (defaults to /tmp/hbase) -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop101:9000/HBase</value>
  </property>
  <!-- Cluster mode: false = standalone, true = distributed -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- External ZooKeeper quorum that HBase depends on -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop101:2181,hadoop102:2181,hadoop103:2181</value>
  </property>
  <!-- Data directory of the external ZooKeeper on each node -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/module/zookeeper-3.4.10/datas</value>
  </property>
</configuration>

Change it to:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Shared directory where each RegionServer persists HBase data (defaults to /tmp/hbase) -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop101:9000/HBase</value>
  </property>
  <!-- Cluster mode: false = standalone, true = distributed -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- External ZooKeeper quorum: to support Phoenix secondary indexes, do NOT append the 2181 port -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop101,hadoop102,hadoop103</value>
  </property>
  <!-- Data directory of the external ZooKeeper on each node -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/module/zookeeper-3.4.10/datas</value>
  </property>
  <!-- Phoenix secondary-index support: add these to hbase-site.xml on the HMaster node -->
  <property>
    <name>hbase.master.loadbalancer.class</name>
    <value>org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer</value>
  </property>
  <property>
    <name>hbase.coprocessor.master.classes</name>
    <value>org.apache.phoenix.hbase.index.master.IndexMasterObserver</value>
  </property>
  <!-- Phoenix secondary-index support: add these to hbase-site.xml on every RegionServer node -->
  <property>
    <name>hbase.regionserver.wal.codec</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
  </property>
  <property>
    <name>hbase.region.server.rpc.scheduler.factory.class</name>
    <value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
    <description>Factory to create the Phoenix RPC Scheduler that uses separate queues for index and metadata updates</description>
  </property>
  <property>
    <name>hbase.rpc.controllerfactory.class</name>
    <value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
    <description>Factory to create the Phoenix RPC Scheduler that uses separate queues for index and metadata updates</description>
  </property>
</configuration>
7.1.2 Distribute /opt/module/hbase/conf/hbase-site.xml from hadoop102
[whx@hadoop102 conf]$ xsync.sh hbase-site.xml 
7.1.3 Restart HBase and Phoenix

7.2 Test the index

7.2.1 Before an index is created:
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> explain select "firstname" from "whx_table" where "firstname"='Nick';
+---------------------------------------------------------------------+-----------------+----------------+--------------+
|                                PLAN                                 | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+---------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER whx_table  | null            | null           | null         |
|     SERVER FILTER BY cf_user."firstname" = 'Nick'                   | null            | null           | null         |
+---------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.032 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> 

If FULL SCAN appears in a Phoenix plan, no secondary index was used and the query fell back to a full table scan.

7.2.2 Create an index
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103>  create index idx_firstname on "whx_table"("cf_user"."firstname");
2 rows affected (6.383 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> 

Test it:

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> explain select "firstname" from "whx_table" where "firstname"='Nick';
+-----------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                       PLAN                                        | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+-----------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER IDX_FIRSTNAME ['Nick']  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY                                               | null            | null           | null         |
+-----------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.054 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> 

RANGE SCAN OVER IDX_FIRSTNAME in the plan means the IDX_FIRSTNAME index was used and the query ran as a range scan.

Note: a select * query cannot be served by this index, because the index table does not cover the non-indexed columns; either select only indexed columns, or cover the extra ones as sketched below.
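
The usual way around this restriction is a covered index, which stores additional columns in the index table via INCLUDE so that wider selects can still be answered from the index alone. The INCLUDE clause is Phoenix DDL grammar rather than anything shown above, and the index name idx_firstname_cov is hypothetical; issued over JDBC, it is just another statement:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CoveredIndexDemo {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:phoenix:hadoop101,hadoop102,hadoop103:2181";
        try (Connection conn = DriverManager.getConnection(url);
             Statement st = conn.createStatement()) {
            // INCLUDE copies lastname into the index table, so a query
            // selecting firstname and lastname never touches the data table.
            st.execute("CREATE INDEX idx_firstname_cov ON \"whx_table\" "
                    + "(\"cf_user\".\"firstname\") INCLUDE (\"cf_user\".\"lastname\")");
        }
    }
}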

7.2.3 Drop an index

drop index <index name> on <table name>

0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> drop index idx_firstname on "whx_table";
7.2.4 Composite index
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> create index idx_firstname_lastname on "whx_table"("cf_user"."firstname","cf_user"."lastname");
2 rows affected (6.26 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> 
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> select * from "whx_table";
+-----------+------------+-----------+------------+------------------+
| EMPID_PK  | firstname  | lastname  |    name    |     address      |
+-----------+------------+-----------+------------+------------------+
| 1001      | Nick       | Lee       | HUAWEI     | changanjie10hao  |
| 1002      | Tom        | Lee       | LIANXIANG  | changanjie11hao  |
+-----------+------------+-----------+------------+------------------+
2 rows selected (0.053 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> explain select "firstname" from "whx_table" where "firstname"='Nick' and "lastname"='Lee';
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                               PLAN                                               | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER IDX_FIRSTNAME_LASTNAME ['Nick','Lee']  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY                                                              | null            | null           | null         |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.049 seconds)
0: jdbc:phoenix:hadoop101,hadoop102,hadoop103> 

RANGE SCAN OVER IDX_FIRSTNAME_LASTNAME means the composite IDX_FIRSTNAME_LASTNAME index was used for a range scan. As with rowkeys, a composite index only applies when the query constrains its leading column (here firstname).

7.3 Global vs. local indexes

To create a global index:

CREATE INDEX my_index ON my_table (my_col)

To create a local index (identical except for the extra keyword local):

CREATE LOCAL INDEX my_index ON my_table (my_col)
7.3.1 Global index

A global index is distributed: the index alone can locate the right server and region, so reads are fast. But because it is distributed, every data change has to be synchronized to the index across servers, at a large communication cost, so global indexes are a poor fit for columns that are written frequently.
- After a global index is created, Phoenix materializes it as a dedicated table in HBase that stores the index data.
- Suited to read-heavy, write-light workloads.
- Every write must update the index as well as the data.
- For example: if the data table is on RegionServer01 and the index table on RegionServer02, each put request must go first to RegionServer01 and then to RegionServer02 before the update completes. The network overhead is significant and adds load to the RegionServer cluster.

7.3.2 Local index

With a local index, index and data live on the same server, so the server and region holding the target rows cannot be located through the index alone: the region is found first, and then the index is used within it.

  • A local index is stored in an extra column family created inside the data table itself, which holds the index data.
  • Suited to write-heavy, read-light workloads.
  • Because the index is stored as a column family of the data table, index and data sit on the same RegionServer, so frequent writes only need to hit the current RegionServer.


