本文主要是介绍StarRocks分区表历史数据删除与管理,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!
一、背景介绍
- 在使用 StarRocks 时,可能会遇到需要删除大批量数据的情况。然而,StarRocks 对 DELETE 操作的支持并不理想,主要存在以下问题:
- 不建议执行高频的 DELETE 操作:删除的数据会标记为“Deleted”,暂时保留在 Segment 中,不会立即进行物理删除。Compaction(数据版本合并)完成之后会被回收。
- 查询效率可能降低:执行 DELETE 语句后,可能会导致接下来一段时间内(Compaction 完成之前)的查询效率降低。
- DELETE 语句不支持函数传参:例如:
DELETE FROM xxx WHERE DATE(time) < DATE_SUB(CURDATE(), INTERVAL 2 DAY); 报错:ERROR 1064 (HY000): Getting analyzing error from line 1, column 48 to line 1, column 57. Detail message: Left expr of binary predicate should be column name.
为了更高效地管理分区表中的历史数据,我们可以使用 truncate table partition
操作。然而,StarRocks不支持在 truncate table partition
中使用 WHERE 条件。因此,我们需要编写一个 Shell 脚本来生成相应的 SQL 语句。
二、使用示例
1、创建分区测试表
CREATE TABLE example_db.test_tb
( `partition_date` date NULL COMMENT "分区日期(yyyy-mm-dd)",`id` int NULL COMMENT "主键id",`name` STRING NULL COMMENT "姓名",`age` STRING NULL COMMENT "年龄",`time` datetime NULL COMMENT "etl_时间"
) ENGINE = OLAP
DUPLICATE KEY(`partition_date`,`id`)
PARTITION BY RANGE(`partition_date`)
(START ("2024-01-01") END ("2024-06-01") EVERY (INTERVAL 1 MONTH))
DISTRIBUTED BY HASH(`partition_date`) BUCKETS 3
PROPERTIES (
"replication_num" = "3",
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "MONTH",
"dynamic_partition.prefix" = "p",
"dynamic_partition.end" = "2",
"dynamic_partition.buckets" = "1"
);dynamic_partition.enable:开启动态分区特性
dynamic_partition.time_unit:动态分区的时间粒度,取值为 HOUR、DAY、WEEK、MONTH 或 YEAR
dynamic_partition.prefix:动态分区的前缀名,默认值为 p
dynamic_partition.end:提前创建的分区数量,取值范围为正整数。根据 dynamic_partition.time_unit 属性的不同,以当天(周/月)为基准,提前创建对应范围的分区
dynamic_partition.start:TTL保留的动态分区的起始偏移,取值范围为负整数。根据 dynamic_partition.time_unit 属性的不同,以当天(周/月)为基准,分区范围在此偏移之前的分区将会被删除。(如果不填写,则默认为 Integer.MIN_VALUE,即 -2147483648,表示不删除历史分区。)
dynamic_partition.buckets:分区创建的分桶数
2、插入测试数据
mysql> INSERT INTO example_db.test_tb VALUES-> ('2024-01-01','1', '张三', '20','2024-01-01 10:00:00'),-> ('2024-02-01','2', '李四', '20','2024-02-01 10:00:00'),-> ('2024-03-01','3', '王五', '20','2024-03-01 10:00:00');
Query OK, 3 rows affected (0.41 sec)
{'label':'insert_ff644af6-2caa-11ef-80c1-1aa2745601c2', 'status':'VISIBLE', 'txnId':'4964505'}mysql> select * from example_db.test_tb ;
+----------------+------+--------+------+---------------------+
| partition_date | id | name | age | time |
+----------------+------+--------+------+---------------------+
| 2024-02-01 | 2 | 李四 | 20 | 2024-02-01 10:00:00 |
| 2024-01-01 | 1 | 张三 | 20 | 2024-01-01 10:00:00 |
| 2024-03-01 | 3 | 王五 | 20 | 2024-03-01 10:00:00 |
+----------------+------+--------+------+---------------------+
3 rows in set (0.02 sec)
3、查看分区数据
mysql> show partitions from example_db.test_tb;
+-------------+---------------+----------------+---------------------+--------------------+--------+----------------+----------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+--------------------------+----------+------------+----------+
| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | VisibleVersionHash | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | LastConsistencyCheckTime | DataSize | IsInMemory | RowCount |
+-------------+---------------+----------------+---------------------+--------------------+--------+----------------+----------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+--------------------------+----------+------------+----------+
| 16077719 | p202401 | 2 | 2024-06-17 21:10:41 | 0 | NORMAL | partition_date | [types: [DATE]; keys: [2024-01-01]; ..types: [DATE]; keys: [2024-02-01]; ) | partition_date | 3 | 3 | HDD | 9999-12-31 23:59:59 | NULL | 0B | false | 0 |
| 16077720 | p202402 | 2 | 2024-06-17 21:10:41 | 0 | NORMAL | partition_date | [types: [DATE]; keys: [2024-02-01]; ..types: [DATE]; keys: [2024-03-01]; ) | partition_date | 3 | 3 | HDD | 9999-12-31 23:59:59 | NULL | 0B | false | 0 |
| 16077721 | p202403 | 2 | 2024-06-17 21:10:41 | 0 | NORMAL | partition_date | [types: [DATE]; keys: [2024-03-01]; ..types: [DATE]; keys: [2024-04-01]; ) | partition_date | 3 | 3 | HDD | 9999-12-31 23:59:59 | NULL | 0B | false | 0 |
| 16077722 | p202404 | 1 | 2024-06-17 21:10:25 | 0 | NORMAL | partition_date | [types: [DATE]; keys: [2024-04-01]; ..types: [DATE]; keys: [2024-05-01]; ) | partition_date | 3 | 3 | HDD | 9999-12-31 23:59:59 | NULL | 0B | false | 0 |
| 16077723 | p202405 | 1 | 2024-06-17 21:10:25 | 0 | NORMAL | partition_date | [types: [DATE]; keys: [2024-05-01]; ..types: [DATE]; keys: [2024-06-01]; ) | partition_date | 3 | 3 | HDD | 9999-12-31 23:59:59 | NULL | 0B | false | 0 |
| 16077786 | p202406 | 1 | 2024-06-17 21:10:25 | 0 | NORMAL | partition_date | [types: [DATE]; keys: [2024-06-01]; ..types: [DATE]; keys: [2024-07-01]; ) | partition_date | 1 | 3 | HDD | 9999-12-31 23:59:59 | NULL | 0B | false | 0 |
| 16077791 | p202407 | 1 | 2024-06-17 21:10:25 | 0 | NORMAL | partition_date | [types: [DATE]; keys: [2024-07-01]; ..types: [DATE]; keys: [2024-08-01]; ) | partition_date | 1 | 3 | HDD | 9999-12-31 23:59:59 | NULL | 0B | false | 0 |
| 16077796 | p202408 | 1 | 2024-06-17 21:10:25 | 0 | NORMAL | partition_date | [types: [DATE]; keys: [2024-08-01]; ..types: [DATE]; keys: [2024-09-01]; ) | partition_date | 1 | 3 | HDD | 9999-12-31 23:59:59 | NULL | 0B | false | 0 |
+-------------+---------------+----------------+---------------------+--------------------+--------+----------------+----------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+--------------------------+----------+------------+----------+
8 rows in set (0.00 sec)
三、Shell脚本示例
- 以下是一个示例Shell脚本,用于根据当前时间删除指定时间段前的历史分区数据。该脚本支持按年、月、日分区进行删除。
#!/bin/bash# 配置数据库连接信息
ip="xxx"
port="xxx"
user="xxx"
passwd="xxx"
db="xxx"
tb="xxx"# 检查输入参数
if [ $# -ne 3 ]; thenecho "Usage: $0 {DAY|YEAR|MONTH} {start_date} {end_date}"echo "Example: $0 DAY 2024-06-08 2024-06-14"exit 1
fi# 获取参数
partition_type=$1
start_date=$2
end_date=$3# 转换日期格式
start_date=$(date -d "$start_date" +%Y%m%d)
end_date=$(date -d "$end_date" +%Y%m%d)# 生成分区列表
partitions=""case $partition_type inDAY)current_date=$start_datewhile [ "$current_date" -le "$end_date" ]; dopartitions+="p$current_date,"current_date=$(date -d "$current_date + 1 day" +%Y%m%d)done;;YEAR)start_year=$(date -d "$start_date" +%Y)end_year=$(date -d "$end_date" +%Y)for ((year=$start_year; year<=$end_year; year++)); dopartitions+="p$year,"done;;MONTH)start_month=$(date -d "$start_date" +%Y%m)end_month=$(date -d "$end_date" +%Y%m)current_month=$start_monthwhile [ "$(date -d "${current_month}01" +%Y%m)" -le "$end_month" ]; dopartitions+="p$current_month,"current_month=$(date -d "${current_month}01 + 1 month" +%Y%m)done;;*)echo "Invalid partition type: $partition_type"echo "Usage: $0 {DAY|YEAR|MONTH} {start_date} {end_date}"exit 1;;
esac# 去掉最后一个逗号
partitions=${partitions%,}# 生成truncate table语句
truncate_stmt="truncate table $db.$tb partition ($partitions);"# 输出truncate语句
echo $truncate_stmt# 执行truncate语句并检查结果
if echo $truncate_stmt | mysql -h $ip -u$user -p"$passwd" -P$port $db; thenecho "执行成功!"
elseecho "执行失败!"
fi
1、使用示例
假设我们需要删除以下分区:
- 按年分区:从2022-01-01到2024-06-01
- 按月分区:从2024-01-01到2024-06-01
- 按日分区:从2024-01-01到2024-06-01
我们可以通过以下命令运行脚本:
- 按年分区
./truncate_partitions.sh YEAR 2022-01-01 2024-06-01生成truncate table语句:
truncate table example_db.test_tb partition (p2022,p2023,p2024);
执行成功!
- 按月分区
./truncate_partitions.sh MONTH 2024-01-01 2024-06-01生成truncate table语句:
truncate table example_db.test_tb partition (p202401,p202402,p202403,p202404,p202405,p202406);
执行成功!
- 按日分区
./truncate_partitions.sh DAY 2024-01-01 2024-06-01生成truncate table语句:
truncate table example_db.test_tb partition (p20240101,p20240102,p20240103,p20240104,p20240105,p20240106,p20240107,p20240108,p20240109,p20240110,p20240111,p20240112,p20240113,p20240114,p20240115,p20240116,p20240117,p20240118,p20240119,p20240120,p20240121,p20240122,p20240123,p20240124,p20240125,p20240126,p20240127,p20240128,p20240129,p20240130,p20240131,p20240201,p20240202,p20240203,p20240204,p20240205,p20240206,p20240207,p20240208,p20240209,p20240210,p20240211,p20240212,p20240213,p20240214,p20240215,p20240216,p20240217,p20240218,p20240219,p20240220,p20240221,p20240222,p20240223,p20240224,p20240225,p20240226,p20240227,p20240228,p20240229,p20240301,p20240302,p20240303,p20240304,p20240305,p20240306,p20240307,p20240308,p20240309,p20240310,p20240311,p20240312,p20240313,p20240314,p20240315,p20240316,p20240317,p20240318,p20240319,p20240320,p20240321,p20240322,p20240323,p20240324,p20240325,p20240326,p20240327,p20240328,p20240329,p20240330,p20240331,p20240401,p20240402,p20240403,p20240404,p20240405,p20240406,p20240407,p20240408,p20240409,p20240410,p20240411,p20240412,p20240413,p20240414,p20240415,p20240416,p20240417,p20240418,p20240419,p20240420,p20240421,p20240422,p20240423,p20240424,p20240425,p20240426,p20240427,p20240428,p20240429,p20240430,p20240501,p20240502,p20240503,p20240504,p20240505,p20240506,p20240507,p20240508,p20240509,p20240510,p20240511,p20240512,p20240513,p20240514,p20240515,p20240516,p20240517,p20240518,p20240519,p20240520,p20240521,p20240522,p20240523,p20240524,p20240525,p20240526,p20240527,p20240528,p20240529,p20240530,p20240531,p20240601);
执行成功!
四、总结
- 通过使用 Shell 脚本,我们可以方便地生成 StarRocks 中
truncate table partition
的 SQL 语句,从而高效地管理分区表中的历史数据。
这篇关于StarRocks分区表历史数据删除与管理的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!