Compaction magic in Couchbase Server

2023-11-04 09:08



http://blog.couchbase.com/compaction-magic-couchbase-server-20

Compaction magic in Couchbase Server 2.0

With Couchbase’s append-only storage design, it’s impossible to corrupt data and index files, because updates go only to the end of the file. There are no in-place file updates, and the files are never in an inconsistent state. But writing to an ever-expanding file will eventually eat up all your disk space, so Couchbase Server has a process called compaction. Compaction reclaims disk space by removing stale data and index values so that the data and index files don’t grow without bound. If your app’s use case is mostly reads, this may be OK, but if you have write-heavy workloads, you will want to understand how auto-compaction works in Couchbase Server.
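To make the append-only model concrete, here is a minimal, hypothetical Python sketch. It is not Couchbase’s on-disk format or code, just a toy key-value log in which every mutation appends a record and compaction rewrites the file with only the latest version of each key:

    # Hypothetical append-only key-value log (a simple "key<TAB>value" text
    # format, not Couchbase's actual storage format). Every write, including
    # updates and deletes, appends a record; older records for the same key
    # become stale and occupy disk space until compaction rewrites the file.
    import os

    DATA_FILE = "store.log"        # hypothetical file name
    TOMBSTONE = "__deleted__"      # marker appended when a key is deleted

    def write(key: str, value: str) -> None:
        """Append a record; earlier bytes of the file are never modified."""
        with open(DATA_FILE, "a", encoding="utf-8") as f:
            f.write(f"{key}\t{value}\n")

    def delete(key: str) -> None:
        write(key, TOMBSTONE)

    def compact() -> None:
        """Rewrite the file, keeping only the newest record per key."""
        latest = {}
        with open(DATA_FILE, encoding="utf-8") as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t", 1)
                latest[key] = value
        with open(DATA_FILE + ".compact", "w", encoding="utf-8") as f:
            for key, value in latest.items():
                if value != TOMBSTONE:          # drop deleted keys entirely
                    f.write(f"{key}\t{value}\n")
        os.replace(DATA_FILE + ".compact", DATA_FILE)  # swap in the new file

    # Mirrors the mutation sequence described later in the article:
    write("A", "v1"); write("B", "v1"); write("D", "v1"); write("A", "v2")
    compact()   # only the latest A, plus B and D, survive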

By design, documents in Couchbase Server are partitioned into vBuckets (or partitions). Several kinds of files are used for storage: a data file per partition (the “data files”), multiple index files (active, replica and temp) per design document, and a master file that holds metadata related to the design documents and view definitions. For example, on Mac OS X (as shown below), the sample ‘gamesim’ bucket has 64 individual data files, one per partition (0.couch.1 to 63.couch.1), and a master file that holds the design documents and other view metadata (master.couch.1).
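The key-to-partition mapping is deterministic: Couchbase clients hash the document key with a CRC32-based hash and take the result modulo the number of vBuckets, so every client agrees on which data file a key lives in. A rough illustrative sketch; the exact bit manipulation used by the SDKs differs:

    # Illustrative key -> vBucket mapping. Couchbase clients use a CRC32-based
    # hash for this; the reduction below is an assumption, not the SDKs' code.
    import zlib

    NUM_VBUCKETS = 64   # 64 on Mac OS X, as in this example; 1024 on Linux and Windows

    def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
        crc = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
        return crc % num_vbuckets   # every client computes the same partition

    print(vbucket_for_key("player:1001"))   # this key always maps to the same N.couch file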

 

Couchbase Data and Master File

The index files are in the @indexes folder and consist of the active index file (starting with main_), the replica index file (starting with replica_, present if index replication is enabled), and a temporary file used while building and updating the index (starting with tmp_).

Index Files in Couchbase Server

 

Data and index files in Couchbase Server are organized as b-trees. The root nodes (shown in red) contain pointers to the intermediate nodes, which contain pointers to the leaf nodes (shown in blue). In the case of data files, the root and intermediate nodes track the sizes of the documents under their subtree. The leaf nodes store the document id, the document metadata and pointers to the document content. For index files, the root and intermediate nodes track the emitted map-function results and the reduce values under their subtree. A small sketch of this size bookkeeping follows the figure below.

B-tree storage for data files (shown on the left) and index files (shown on the right)
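Here is a hypothetical Python sketch of the size bookkeeping just described. It is not Couchbase’s b-tree code, only an illustration of how interior nodes can aggregate the document sizes below them so that the root alone reveals how much live data the file references:

    # Hypothetical sketch: interior nodes aggregate the sizes of the documents
    # in their subtree, so the root node reports the total live-data size.
    from dataclasses import dataclass, field
    from typing import List, Union

    @dataclass
    class Leaf:
        doc_id: str
        metadata: dict
        content_size: int           # size of the document body this leaf points to

        @property
        def subtree_size(self) -> int:
            return self.content_size

    @dataclass
    class Interior:
        children: List[Union["Interior", Leaf]] = field(default_factory=list)

        @property
        def subtree_size(self) -> int:
            # Each interior node tracks the total document size in its subtree.
            return sum(child.subtree_size for child in self.children)

    root = Interior([Interior([Leaf("A", {}, 2048), Leaf("B", {}, 512)]),
                     Interior([Leaf("C", {}, 4096)])])
    print(root.subtree_size)   # live-data size visible at the root: 6656 bytes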

All mutations, including inserts, updates and deletes, are written to the end of the file, leaving the old data in place. Add a new document? The b-tree grows at the end. Delete a document? That, too, gets recorded at the end of the b-tree. For example, as shown in the Figure below, document A is mutated, followed by a mutation to document B; then a new document D is added, followed by another mutation to document A. The stale data is shown by the red crossed-out nodes in the Figure below.

Logical data layout for data and index files

 

Couchbase examines the document sizes tracked by the root node of the b-tree and calculates the ratio of that live-data size to the current size of the file. If this ratio crosses a configurable threshold (shown in the Figure below), an online compaction process is triggered. Compaction scans through the current data and index files and creates new data and index files without the items marked for cleanup. During compaction, the b-trees are balanced and the reduce values are re-computed for the new tree. Additionally, data that does not belong to a particular node is also cleaned up.
Finally, to catch up with the active workload that may have changed the old partition data file during compaction, Couchbase copies over the data appended since the start of the compaction process to the new partition data file so that it is up to date. The new index file is updated in the same manner. The old partition data and index files are then deleted and the new data and index files are used.
Compaction is normally an all-or-nothing operation, but because compaction in Couchbase runs per partition (vBucket), the dataset is compacted incrementally; if compaction is aborted, the work already completed on other partitions is not lost.
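As a hypothetical illustration of the trigger just described (not Couchbase’s implementation), the fragmentation check boils down to comparing the live-data size reported by the b-tree root against the physical file size:

    # Hypothetical fragmentation check: how much of the file is stale data?
    import os

    FRAGMENTATION_THRESHOLD = 0.30   # default auto-compaction trigger: 30%

    def fragmentation(live_bytes: int, file_bytes: int) -> float:
        """Fraction of the file occupied by stale (reclaimable) data."""
        return 1.0 - (live_bytes / file_bytes) if file_bytes else 0.0

    def needs_compaction(data_file: str, live_bytes: int) -> bool:
        # live_bytes is the document size the b-tree root still references.
        file_bytes = os.path.getsize(data_file)
        return fragmentation(live_bytes, file_bytes) >= FRAGMENTATION_THRESHOLD

    # Example: a 100 MB partition file whose root reports only 60 MB of live
    # data is 40% fragmented, so it would be scheduled for online compaction.
    print(fragmentation(60 * 1024**2, 100 * 1024**2))   # 0.4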

Configuring the compaction thresholds in the settings UI tab for data and index files

 

Compaction in Couchbase Server is an online operation. By default, auto-compaction kicks in when the fragmentation threshold reaches 30%, but you should test which settings work well for your workload and tune them accordingly.

Because compaction is resource intensive, you can also schedule it during off-peak hours. To prevent auto-compaction from running while your database is in heavy use, you can configure an off-peak time period during which compaction is allowed, using the UI shown above. For example, here I’ve set compaction to run between 12am and 1am server time every day. If the compaction operation does not complete in this time period it will keep running, but you can check the box to have it aborted instead. The same settings can also be applied over the REST API, as sketched below.
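A hedged sketch of driving these settings with Python’s requests library. The endpoint and parameter names follow Couchbase Server’s cluster REST API documentation (verify them against your version), and the host and credentials are placeholders:

    # Sketch: set cluster-wide auto-compaction settings over REST.
    import requests

    CLUSTER = "http://127.0.0.1:8091"          # hypothetical cluster address
    AUTH = ("Administrator", "password")       # hypothetical credentials

    settings = {
        "databaseFragmentationThreshold[percentage]": 30,   # data-file threshold
        "viewFragmentationThreshold[percentage]": 30,       # index-file threshold
        "allowedTimePeriod[fromHour]": 0,                    # start: 12am server time
        "allowedTimePeriod[fromMinute]": 0,
        "allowedTimePeriod[toHour]": 1,                      # end: 1am server time
        "allowedTimePeriod[toMinute]": 0,
        "allowedTimePeriod[abortOutside]": "true",           # abort if it runs past 1am
        "parallelDBAndViewCompaction": "false",              # sequential by default
    }

    resp = requests.post(f"{CLUSTER}/controller/setAutoCompaction",
                         data=settings, auth=AUTH)
    resp.raise_for_status()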
Compaction can also be triggered manually, per bucket or per design document, as shown in the Figures below (an equivalent REST sketch follows the figures).

Manually compacting a data bucket in Couchbase

 

Manually compacting a design document in Couchbase
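The same manual compactions can be triggered over the REST API instead of the UI. The endpoint paths below follow Couchbase’s cluster REST API documentation (check your version); the bucket and design-document names are placeholders:

    # Sketch: trigger manual compaction of a bucket and of a design document.
    import requests

    CLUSTER = "http://127.0.0.1:8091"
    AUTH = ("Administrator", "password")
    BUCKET = "gamesim"                      # hypothetical bucket name
    DDOC = "_design/players"                # hypothetical design document

    # Compact a whole data bucket.
    requests.post(f"{CLUSTER}/pools/default/buckets/{BUCKET}/controller/compactBucket",
                  auth=AUTH).raise_for_status()

    # Compact the indexes of a single design document.
    requests.post(f"{CLUSTER}/pools/default/buckets/{BUCKET}/ddocs/{DDOC}/controller/compactView",
                  auth=AUTH).raise_for_status()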

Compaction performance in Couchbase Server depends on IO capacity and proper cluster sizing. Your cluster must be sized so that there is enough capacity in each area to support everything else the system is doing while maintaining the required level of performance. So, how do you tune compaction in Couchbase Server?
There is no magic bullet here ... Depending on your application’s IOPS requirements, you need to size your cluster properly and might want to test your workload across a variety of storage hardware. If your app is write heavy, SSDs might be the best option, but for heavy read ratios, EBS might be a good, low-cost solution.
By default, if both data and view indexes are configured for auto-compaction, compaction operates sequentially: first on the database, then on the views. By enabling parallel compaction, the databases and views can be compacted at the same time. This requires more CPU and disk I/O, but if the database and view indexes are stored on different physical disk devices (as is our best practice anyway), the two can complete in parallel so that neither the index nor the data files grow extremely large.
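Building on the earlier settings sketch, enabling parallel compaction is (assumed to be) a single parameter flip on the same endpoint:

    # Sketch: enable parallel compaction of data files and view indexes.
    import requests

    requests.post("http://127.0.0.1:8091/controller/setAutoCompaction",
                  data={"databaseFragmentationThreshold[percentage]": 30,
                        "viewFragmentationThreshold[percentage]": 30,
                        "parallelDBAndViewCompaction": "true"},   # compact data and views together
                  auth=("Administrator", "password")).raise_for_status()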
Conclusion
At the end of the day, every database needs regular maintenance. Online compaction is a huge plus, but you have to test your system and configure your compaction settings appropriately so that compaction does not impact your production workload.
