本文主要是介绍Clickhouse MergeTree异常数据处理,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!
作者:俊达
说明
clickhouse mergetree的数据文件如果遇到数据损坏,可能会导致clickhouse无法启动。
本文章说明如何处理这类问题。
测试
我们先人为模拟破坏mergetree数据文件:
detach table:
ck01 :) detach table metrics;DETACH TABLE metricsQuery id: bb7f334b-5203-4040-8282-eb45b01b1b72Ok.0 rows in set. Elapsed: 0.001 sec.
清空data.mrk文件
root@ck01:/data/clickhouse/clickhouse/data/local/metrics/20221129_12_12_0# ls -l
total 36
-r--r----- 1 root root 251 Dec 6 05:37 checksums.txt
-r--r----- 1 root root 129 Dec 6 05:37 columns.txt
-r--r----- 1 root root 1 Dec 6 05:37 count.txt
-r--r----- 1 root root 164 Dec 6 05:37 data.bin
-r--r----- 1 root root 176 Dec 6 05:37 data.mrk3
-r--r----- 1 root root 10 Dec 6 05:37 default_compression_codec.txt
-r--r----- 1 root root 8 Dec 6 05:37 minmax_tt.idx
-r--r----- 1 root root 4 Dec 6 05:37 partition.dat
-r--r----- 1 root root 24 Dec 6 05:37 primary.idxroot@ck01:/data/clickhouse/clickhouse/data/local/metrics/20221129_12_12_0# echo > data.mrk3
配置参数
max_suspicious_broken_parts默认为10,表示可以容忍10个part数据异常。这里我们将max_suspicious_broken_parts设置为0。
root@ck01:/data/log/clickhouse-server# cd /data/etc/clickhouse-server/config.d/root@ck01:/data/etc/clickhouse-server/config.d# cat max_suspicious_broken_parts.xml
<?xml version="1.0"?>
<yandex><merge_tree><max_suspicious_broken_parts>0</max_suspicious_broken_parts></merge_tree>
</yandex>
启动实例
2022.12.06 06:50:39.749105 [ 64066 ] {} <Error> Application: DB::Exception: Suspiciously many (1 parts, 592.00 B in total) broken parts to remove while maximum allowed broken parts count is 0. You can change the maximum value with merge tree setting 'max_suspicious_broken_parts' in <merge_tree> configuration section or in table settings in .sql file (don't forget to return setting back to default value): Cannot attach table `local`.`metrics` from metadata file /data/clickhouse/clickhouse/store/8a5/8a567911-82c1-402d-8e61-76dd938e89ef/metrics.sql from query ATTACH TABLE local.metrics UUID 'def88518-fd7b-418d-a7dd-6564e38bba39' (`tt` DateTime, `tags` Map(String, String), `metric` String, `value` Float64, `str_value` String) ENGINE = MergeTree PARTITION BY toYYYYMMDD(tt) ORDER BY (metric, tt) SETTINGS index_granularity = 8192, max_suspicious_broken_parts = 0
2022.12.06 06:50:39.749145 [ 64066 ] {} <Information> Application: shutting down
2022.12.06 06:50:39.749151 [ 64066 ] {} <Debug> Application: Uninitializing subsystem: Logging Subsystem
2022.12.06 06:50:39.749253 [ 64067 ] {} <Information> BaseDaemon: Stop SignalListener thread
这时候,我们发现,实例已经无法正常启动了。
解决方法
方法1:
如果可以接受数据丢失,可以将参数max_suspicious_broken_parts设置得大一些
<?xml version="1.0"?>
<yandex><merge_tree><max_suspicious_broken_parts>1000</max_suspicious_broken_parts></merge_tree>
</yandex>
方法2:
使用force_restore_data标记文件
touch /data/clickhouse/clickhouse/flags/force_restore_data
clickhouse-server --config-file /data/etc/clickhouse-server/config.xml --daemon
启动后,可以从detached_parts表中查看detach的part信息:
ck01 :) select * from detached_parts where table='metrics';SELECT *
FROM detached_parts
WHERE table = 'metrics'Query id: c6f61369-259d-4c07-b730-7572c3095bad┌─database─┬─table───┬─partition_id─┬─name─────────────────────────────┬─disk────┬─reason──────────┬─min_block_number─┬─max_block_number─┬─level─┐
│ local │ metrics │ 20221129 │ broken-on-start_20221129_12_12_0 │ default │ broken-on-start │ 12 │ 12 │ 0 │
└──────────┴─────────┴──────────────┴──────────────────────────────────┴─────────┴─────────────────┴──────────────────┴──────────────────┴───────┘1 row in set. Elapsed: 0.002 sec.
异常数据被移到了detached目录
ls detached/
broken-on-start_20221129_12_12_0
如果不允许丢失数据,需要从备份中恢复数据。
可以使用alter table attach part命令从备份中将part恢复出来。
更多技术信息请查看云掣官网https://yunche.pro/?t=yrgw
这篇关于Clickhouse MergeTree异常数据处理的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!