Compaction magic in Couchbase Server

2023-11-04 09:08



http://blog.couchbase.com/compaction-magic-couchbase-server-20

Compaction magic in Couchbase Server 2.0

With Couchbase’s append-only storage design, it’s impossible to corrupt data and index files, because updates go only to the end of the file. There are no in-place file updates, and the files are never in an inconsistent state. But writing to an ever-expanding file will eventually eat up all your disk space. Therefore, Couchbase Server has a process called compaction. Compaction cleans up disk space by removing stale data and index values, so that the data and index files don’t grow unnecessarily. If your app’s use case is mostly reads, the default behavior may be fine, but if you have write-heavy workloads, you may want to learn how auto-compaction works in Couchbase Server.

By design, documents in Couchbase Server are partitioned into vBuckets (or partitions). Multiple files are used for storage: a data file per partition (the “data files”), multiple index files (active, replica and temp) per design document, and a master file that holds metadata related to the design documents and view definitions. For example, on Mac OS X (as shown below), the sample ‘gamesim’ bucket has 64 individual data files, one per partition (0.couch.1 to 63.couch.1), and a master file that holds design documents and other view metadata (master.couch.1).

 

Couchbase Data and Master File
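
To make the file layout concrete, here is a minimal sketch of how a document id could be mapped to a partition and then to a data file name. The function names are hypothetical, and the plain CRC32-modulo hash is an assumption chosen for illustration; it is not claimed to be the exact hash Couchbase uses. The 64-partition count and the file names come from the Mac OS X example above.

```python
import zlib

NUM_PARTITIONS = 64  # the Mac OS X example above has 64 data files


def partition_for_key(doc_id, num_partitions=NUM_PARTITIONS):
    """Map a document id to a partition number.

    Assumption: a plain CRC32 modulo is used here for illustration only.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_partitions


def data_file_name(partition, suffix=1):
    """Build a name of the form '<partition>.couch.<suffix>'."""
    return f"{partition}.couch.{suffix}"


def bucket_files(num_partitions=NUM_PARTITIONS):
    """List the per-partition data files plus the master file."""
    files = [data_file_name(p) for p in range(num_partitions)]
    files.append("master.couch.1")
    return files


if __name__ == "__main__":
    p = partition_for_key("player::1")
    print("player::1 lives in", data_file_name(p))
    print(len(bucket_files()), "files in the bucket directory")
```

The point of the sketch is only that every document lands in exactly one partition file, which is what later lets compaction work on one partition at a time.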

The index files are in the @indexes folder. They consist of the active index file (starting with main_), the replica index file (starting with replica_, present if index replication is enabled) and a temporary file (starting with tmp_) that is used while building and updating the index.

Index Files in Couchbase Server

 

Data and index files in Couchbase Server are organized as b-trees. The root nodes (shown in red) contain pointers to the intermediate nodes, which contain pointers to the leaf nodes (shown in blue). In the case of data files, the root and intermediate nodes track the sizes of the documents under their sub-tree. The leaf nodes store the document id, document metadata and pointers to the document content. For index files, the root and intermediate nodes track the emitted map-function results and the reduce values under their sub-tree.

B-tree storage for data files (shown on the left) and index files (shown on the right)
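
The idea that interior nodes track the size of everything beneath them can be sketched with a tiny tree. This is a simplified illustration, not Couchbase's on-disk format: the class names `Leaf` and `Interior` and the byte counts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    doc_id: str
    size: int  # bytes of document content this leaf points to


@dataclass
class Interior:
    children: list
    size: int = field(init=False)  # total bytes under this sub-tree

    def __post_init__(self):
        # Each interior node stores the sum of its children's sizes,
        # so the root alone knows the total live document size.
        self.size = sum(child.size for child in self.children)


root = Interior([
    Interior([Leaf("A", 120), Leaf("B", 80)]),
    Interior([Leaf("C", 200)]),
])

if __name__ == "__main__":
    print("live bytes tracked by the root:", root.size)
```

Because the total is kept at the root, the live data size can be read without walking the whole tree.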

All mutations including inserts, updates and deletes are written to the end of the file leaving old data in place. Add a new document? The b-tree grows at the end. Delete a document? That gets recorded at the end of the b-tree. For example, as shown in the Figure below, document A is mutated followed by a mutation to document B and then a new document D is added followed by another mutation to document A. Old data is shown by the red crossed-out nodes in the Figure below.

Logical data layout for data and index files
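
The mutation sequence described above (A, then B, then a new document D, then A again) can be replayed against a toy append-only store. This is a minimal sketch that assumes a flat log and an in-memory index in place of the b-tree; `AppendOnlyStore` is a hypothetical name.

```python
class AppendOnlyStore:
    """Toy append-only store: every write goes to the end of the log."""

    def __init__(self):
        self.log = []    # list of (doc_id, value); value None marks a delete
        self.index = {}  # doc_id -> offset of the latest live record

    def set(self, doc_id, value):
        self.index[doc_id] = len(self.log)
        self.log.append((doc_id, value))

    def delete(self, doc_id):
        # The delete itself is recorded at the end; old data stays in place.
        self.log.append((doc_id, None))
        self.index.pop(doc_id, None)

    def get(self, doc_id):
        offset = self.index.get(doc_id)
        return None if offset is None else self.log[offset][1]

    def stale_records(self):
        """Records in the log that no longer hold the latest live value."""
        return len(self.log) - len(self.index)


store = AppendOnlyStore()
for doc_id in ("A", "B", "C"):      # initial documents
    store.set(doc_id, doc_id.lower() + "1")
store.set("A", "a2")                # document A is mutated
store.set("B", "b2")                # then document B
store.set("D", "d1")                # a new document D is added
store.set("A", "a3")                # another mutation to document A

if __name__ == "__main__":
    print(len(store.log), "records,", store.stale_records(), "stale")
```

Reads always see the newest value, but the older versions of A and B remain in the file until compaction removes them.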

 

By examining the size of the documents tracked by the root node in the b-tree file structure, the ratio of the actual data size to the current size of the file is calculated. If this ratio hits a certain threshold, which is configurable as shown in the figure below, an online compaction process is triggered. Compaction scans through the current data and index files and creates new data and index files without the items marked for cleanup. During compaction, the b-trees are balanced and the reduce values are re-computed for the new tree. Additionally, data that does not belong to a particular node is also cleaned up.

Finally, to catch up with the active workload that might have changed the old partition data file during compaction, Couchbase copies the data that was appended since the start of the compaction process over to the new partition data file, so that it is up to date. The new index file is updated in the same manner. The old partition data and index files are then deleted, and the new data and index files are used.

Normally, compaction is an all-or-nothing operation. But since compaction in Couchbase runs on a per-partition (vBucket) basis, the dataset can be compacted incrementally: if compaction is aborted, the partitions that have already been compacted keep their gains.
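
The compact-then-catch-up sequence described above can be sketched on a simple list-based log. This is a simplified model under stated assumptions: the log is a Python list of `(doc_id, value)` records where `None` marks a delete, and the `writes_during_compaction` parameter is a hypothetical stand-in for the live workload that keeps appending to the old file.

```python
def compact(old_log, writes_during_compaction=()):
    """Build a new log holding only the latest live record per document."""
    snapshot_len = len(old_log)  # compaction starts from this point

    # Pass 1: find the latest record for every document in the snapshot.
    latest = {}
    for offset in range(snapshot_len):
        doc_id, value = old_log[offset]
        latest[doc_id] = (offset, value)

    # Pass 2: write live records to the new file, skipping deleted documents.
    survivors = sorted(latest.items(), key=lambda item: item[1][0])
    new_log = [(doc_id, value) for doc_id, (_, value) in survivors
               if value is not None]

    # Meanwhile the workload kept appending to the old file (simulated here).
    old_log.extend(writes_during_compaction)

    # Catch-up: copy everything appended since compaction began.
    new_log.extend(old_log[snapshot_len:])
    return new_log


if __name__ == "__main__":
    old = [("A", "a1"), ("B", "b1"), ("A", "a2"), ("C", "c1"), ("B", None)]
    print(compact(old, writes_during_compaction=[("D", "d1")]))
```

In the real system the old file is deleted only after the new one has caught up; here that step corresponds to the caller replacing `old_log` with the returned list.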

Configuring the compaction thresholds in the settings UI tab for data and index files

 

Compaction in Couchbase Server is an online operation. By default, auto-compaction kicks in when fragmentation reaches the 30% threshold, but you should test which settings work well for your workload and tune this setting accordingly.
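
As a rough worked example of the threshold check, fragmentation can be expressed as the share of the file that is no longer live data. The function names are hypothetical, and the exact formula Couchbase applies internally is an assumption here; the 30% default comes from the text above.

```python
def fragmentation_percent(live_bytes, file_bytes):
    """Percentage of the file occupied by stale (non-live) data."""
    if file_bytes <= 0:
        return 0.0
    return 100.0 * (file_bytes - live_bytes) / file_bytes


def needs_compaction(live_bytes, file_bytes, threshold_percent=30.0):
    """True once fragmentation reaches the configured threshold."""
    return fragmentation_percent(live_bytes, file_bytes) >= threshold_percent


if __name__ == "__main__":
    # 700 bytes of live data in a 1000-byte file: 30% of the file is stale.
    print(fragmentation_percent(700, 1000), needs_compaction(700, 1000))
```

With a lower threshold compaction runs more often on smaller files; with a higher one it runs less often but has more stale data to skip each time.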

Because compaction is resource intensive, you can also schedule it during off-peak hours. To prevent auto-compaction from taking place when your database is in heavy use, you can configure an off-peak time period during which compaction is allowed, using the UI shown above. For example, here I’ve set compaction to run between 12am and 1am server time every day. If the compaction operation does not complete within this time period, it will continue, but you can check the box to have it aborted.
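
The off-peak window and the abort checkbox amount to a small decision rule, sketched below. The function names and the returned strings are hypothetical and only mirror the behavior described in the paragraph above; they are not taken from Couchbase's code.

```python
from datetime import time


def in_window(now, start, end):
    """True if 'now' falls inside [start, end), allowing wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def compaction_action(now, start, end, running, abort_outside):
    """Decide what auto-compaction should do at time 'now'."""
    if in_window(now, start, end):
        return "run"
    if running:
        # Outside the window: keep going unless the abort box is checked.
        return "abort" if abort_outside else "continue"
    return "wait"


if __name__ == "__main__":
    start, end = time(0, 0), time(1, 0)  # 12am to 1am, as in the example
    print(compaction_action(time(0, 30), start, end, False, False))
    print(compaction_action(time(1, 30), start, end, True, False))
    print(compaction_action(time(1, 30), start, end, True, True))
```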

Compaction can also be triggered manually, per bucket or per design document, as shown in the figures below.

Manually compacting a data bucket in Couchbase

 

Manually compacting a design document in Couchbase

Compaction performance in Couchbase Server depends on IO capacity and proper cluster sizing. Your cluster must be sized so that there is enough capacity to run compaction alongside everything else the system is doing while maintaining the required level of performance. So, how do you tune compaction in Couchbase Server?

There is no magic bullet here. Depending on your application’s IOPS requirement, you need to size your cluster properly, and you might want to test your workload across a variety of storage hardware. If your app is write heavy, SSDs might be the best option, but for read-heavy workloads, EBS might be a good low-cost solution.

By default, if both data and view indexes are configured for auto-compaction, compaction operates sequentially, first on the database and then on the views. By enabling parallel compaction, both the databases and the views can be compacted at the same time. This requires more CPU and disk I/O, but if the database and view indexes are stored on different physical disk devices (as is our best practice anyway), the two can complete in parallel so that the index and data files do not grow extremely large.
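
The difference between sequential and parallel compaction can be illustrated with two placeholder tasks. This is only a sketch of the scheduling idea: `compact_data` and `compact_views` are hypothetical stand-ins, and a thread pool is used here simply as one convenient way to run two I/O-bound jobs at once.

```python
from concurrent.futures import ThreadPoolExecutor


def compact_data():
    """Placeholder for compacting the data files."""
    return "data compacted"


def compact_views():
    """Placeholder for compacting the view index files."""
    return "views compacted"


def run_compaction(parallel):
    if not parallel:
        # Default behavior: database first, then views.
        return [compact_data(), compact_views()]
    # Parallel: both jobs start together and compete for CPU and disk I/O.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(compact_data), pool.submit(compact_views)]
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(run_compaction(parallel=False))
    print(run_compaction(parallel=True))
```

Both modes do the same work; parallel mode only pays off when the two jobs are not fighting over the same disk, which is why separate physical devices are recommended above.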

Conclusion

At the end of the day, every database needs regular maintenance. Online compaction is a huge plus, but you have to test your system and configure your compaction settings appropriately so that it does not affect your system load.
