golang中的内存缓存如何避免被GC扫描,BigCache实现原理

2024-06-21 18:52

本文主要是介绍golang中的内存缓存如何避免被GC扫描,BigCache实现原理,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

GC到底清理的是什么?

Golang是函数式编程语言,如果是函数内定义的临时变量,在函数退出时会被自动清理掉不需要GC参与;如果使用了指针,那么即使函数退出了也不会将其清理,这个时候就需要全局的GC来清扫。

对于cache组件来说,存储的对象比较多,本质上就是一个大的哈希表,如果GC要去扫描这些对象的话可能会造成大量的延迟,因此我们不需要GC来扫描它们。

利用 Go 1.5+ 的特性:当 map 中的 key 和 value 都是基础类型时,GC 就不会扫到 map 里的 key 和 value

对于key:golang中的字符串本质上也是指针,于是将它进行hash操作,将字符串转换成整型,信息经过hash操作后有可能会丢失部分信息,为了避免hash冲突时分不清KEY值,所以会将key放到value中一起存储。

对于value:构建一个超大的byte数组buf,将原来的key和value信息经过序列化变成二进制数据,将其存放在buf中,并记录下它的offset,然后将offset值存到map的value位置,即 map[key-hash]offset。为了节省内存消耗,会将buf设计为环形队列的结构。

BigCache:https://github.com/allegro/bigcache

https://syslog.ravelin.com/further-dangers-of-large-heaps-in-go-7a267b57d487

to keep the amount of GC work down you essentially have two choices as follows.1、Make sure the memory you allocate contains no pointers. That means no slices, no strings, no time.Time, and definitely no pointers to other allocations. If an allocation has no pointers it gets marked as such and the GC does not scan it.2、Allocate the memory off-heap by directly calling the mmap syscall yourself. Then the GC knows nothing about the memory. This has upsides and downsides. The downside is that this memory can’t really be used to reference objects allocated normally, as the GC may think they are no longer in-use and free them.

How it works

BigCache relies on optimization presented in 1.5 version of Go (issue-9477). This optimization states that if map without pointers in keys and values is used then GC will omit its content. Therefore BigCache uses map[uint64]uint32 where keys are hashed and values are offsets of entries.Entries are kept in byte slices, to omit GC again. Byte slices size can grow to gigabytes without impact on performance because GC will only see single pointer to it.

Bigcache vs Freecache

Both caches provide the same core features but they reduce GC overhead in different ways. Bigcache relies on map[uint64]uint32, freecache implements its own mapping built on slices to reduce number of pointers.

bigcache 开发团队的博客的测试数据:

With an empty cache, this endpoint had maximum responsiveness latency of 10ms for 10k rps. When the cache was filled, it had more than a second latency for 99th percentile. Metrics indicated that there were over 40 mln objects in the heap and GC mark and scan phase took over four seconds.

最终他们采用了 map[uint64]uint64 作为 cacheShard 中的关键存储。key 是 sharding 时得到的 uint64 hashed key,value 则只存 offset ,整体使用 FIFO 的 bytes queue,也符合按照时序淘汰的需求。

经过优化,bigcache 在 2000w 条记录下 GC 的表现:

go version 
go version go1.13 linux/arm64go run caches_gc_overhead_comparison.go Number of entries:  20000000
GC pause for bigcache:  22.382827ms
GC pause for freecache:  41.264651ms
GC pause for map:  72.236853ms

初始化操作

// BigCache is fast, concurrent, evicting cache created to keep big number of entries without impact on performance.
// It keeps entries on heap but omits GC for them. To achieve that, operations take place on byte arrays,
// therefore entries (de)serialization in front of the cache will be needed in most use cases.
type BigCache struct {shards     []*cacheShard // 缓存分片lifeWindow uint64 // 过期时间clock      clockhash       Hasherconfig     ConfigshardMask  uint64close      chan struct{}
}type Config struct {// Number of cache shards, value must be a power of twoShards int// Time after which entry can be evictedLifeWindow time.Duration// Interval between removing expired entries (clean up).// If set to <= 0 then no action is performed. Setting to < 1 second is counterproductive — bigcache has a one second resolution.CleanWindow time.Duration// Max number of entries in life window. Used only to calculate initial size for cache shards.// When proper value is set then additional memory allocation does not occur.MaxEntriesInWindow int// Max size of entry in bytes. Used only to calculate initial size for cache shards.MaxEntrySize int// StatsEnabled if true calculate the number of times a cached resource was requested.StatsEnabled bool// Verbose mode prints information about new memory allocationVerbose bool// Hasher used to map between string keys and unsigned 64bit integers, by default fnv64 hashing is used.Hasher Hasher// HardMaxCacheSize is a limit for BytesQueue size in MB.// It can protect application from consuming all available memory on machine, therefore from running OOM Killer.// Default value is 0 which means unlimited size. When the limit is higher than 0 and reached then// the oldest entries are overridden for the new ones. The max memory consumption will be bigger than// HardMaxCacheSize due to Shards' s additional memory. Every Shard consumes additional memory for map of keys// and statistics (map[uint64]uint32) the size of this map is equal to number of entries in// cache ~ 2×(64+32)×n bits + overhead or map itself.HardMaxCacheSize int// OnRemove is a callback fired when the oldest entry is removed because of its expiration time or no space left// for the new entry, or because delete was called.// Default value is nil which means no callback and it prevents from unwrapping the oldest entry.// ignored if OnRemoveWithMetadata is specified.OnRemove func(key string, entry []byte)// OnRemoveWithMetadata is a callback fired when the oldest entry is removed because of its expiration time or no space left// for the new entry, or because delete was called. A structure representing details about that specific entry.// Default value is nil which means no callback and it prevents from unwrapping the oldest entry.OnRemoveWithMetadata func(key string, entry []byte, keyMetadata Metadata)// OnRemoveWithReason is a callback fired when the oldest entry is removed because of its expiration time or no space left// for the new entry, or because delete was called. A constant representing the reason will be passed through.// Default value is nil which means no callback and it prevents from unwrapping the oldest entry.// Ignored if OnRemove is specified.OnRemoveWithReason func(key string, entry []byte, reason RemoveReason)onRemoveFilter int// Logger is a logging interface and used in combination with `Verbose`// Defaults to `DefaultLogger()`Logger Logger
}// DefaultConfig initializes config with default values.
// When load for BigCache can be predicted in advance then it is better to use custom config.
func DefaultConfig(eviction time.Duration) Config {return Config{Shards:             1024,LifeWindow:         eviction,CleanWindow:        1 * time.Second,MaxEntriesInWindow: 1000 * 10 * 60,MaxEntrySize:       500,StatsEnabled:       false,Verbose:            true,Hasher:             newDefaultHasher(),HardMaxCacheSize:   0,Logger:             DefaultLogger(),}
}

其中分片的个数 Shards必须为 2 的倍数。

func (c *BigCache) Set(key string, entry []byte) errorentry的类型只能是[]byte,如果你要存结构体,我们还需要使用 json.Marshal 这样的工具来序列化。

对key进行hash

func (f fnv64a) Sum64(key string) uint64 {var hash uint64 = offset64for i := 0; i < len(key); i++ {hash ^= uint64(key[i])hash *= prime64}return hash
}

根据hash找到shard

shardMask:  uint64(config.Shards - 1),func (c *BigCache) getShard(hashedKey uint64) (shard *cacheShard) {return c.shards[hashedKey&c.shardMask]
}默认值 shards = 1024 (10000000000),shardMask = 1023 (1111111111),然后进行按位与运算,它要比取模运算效率高。

然后向shard 中写入

type cacheShard struct {hashmap     map[uint64]uint64 // 索引列表,hashedkey-indexOfEntryentries     queue.BytesQueue  // 实际数据存储lock        sync.RWMutexentryBuffer []byteonRemove    onRemoveCallbackisVerbose    boolstatsEnabled boollogger       Loggerclock        clocklifeWindow   uint64hashmapStats map[uint64]uint32stats        StatscleanEnabled bool
}func (s *cacheShard) set(key string, hashedKey uint64, entry []byte) error {currentTimestamp := uint64(s.clock.Epoch())s.lock.Lock()// 冲突检查,将之前的key对应value置为空,粗暴解决哈希冲突问题if previousIndex := s.hashmap[hashedKey]; previousIndex != 0 {if previousEntry, err := s.entries.Get(int(previousIndex)); err == nil {resetHashFromEntry(previousEntry)//remove hashkeydelete(s.hashmap, hashedKey)}}if !s.cleanEnabled {// 每次插入新数据时,bigCache 都会获取 BytesQueue 头部数据if oldestEntry, err := s.entries.Peek(); err == nil {// 然后判断数据是否过期,如果过期则删除s.onEvict(oldestEntry, currentTimestamp, s.removeOldestEntry)}}// 包装数据,得到 []bytew := wrapEntry(currentTimestamp, hashedKey, key, entry, &s.entryBuffer)for {// 找到插入位置,记录在 hashmapif index, err := s.entries.Push(w); err == nil {s.hashmap[hashedKey] = uint64(index)s.lock.Unlock()return nil}// 没有找到插入位置时因为没有空间了,所以要删除。if s.removeOldestEntry(NoSpace) != nil {s.lock.Unlock()return errors.New("entry is bigger than max shard size")}}
}func wrapEntry(timestamp uint64, hash uint64, key string, entry []byte, buffer *[]byte) []byte {keyLength := len(key)blobLength := len(entry) + headersSizeInBytes + keyLengthif blobLength > len(*buffer) {*buffer = make([]byte, blobLength)}blob := *bufferbinary.LittleEndian.PutUint64(blob, timestamp)binary.LittleEndian.PutUint64(blob[timestampSizeInBytes:], hash)binary.LittleEndian.PutUint16(blob[timestampSizeInBytes+hashSizeInBytes:], uint16(keyLength))copy(blob[headersSizeInBytes:], key)copy(blob[headersSizeInBytes+keyLength:], entry)return blob[:blobLength]
}// BytesQueue is a non-thread safe queue type of fifo based on bytes array.
// For every push operation index of entry is returned. It can be used to read the entry later
type BytesQueue struct {full         boolarray        []byte // 实际存储数据的地方capacity     intmaxCapacity  inthead         inttail         intcount        intrightMargin  intheaderBuffer []byteverbose      bool
}
cacheShard.entryBuffer
它是一个容器,用来包装数据,可以重复利用,它的值为
entryBuffer:  make([]byte, config.MaxEntrySize+headersSizeInBytes)它表示的是一个entry占用的最大存储空间。如果某个entry占用空间比entryBuffer还大,那么entryBuffer就会被替换掉。

在bigCache中,bigCache将数据存储在BytesQueue中,BytesQueue的底层结构是[]byte ,这样只给GC增加了一个额外对象,由于字节切片除了自身对象并不包含其他指针数据,所以GC对于整个对象的标记时间是O(1)的。

关于哈希冲突

func (s *cacheShard) get(key string, hashedKey uint64) ([]byte, error) {s.lock.RLock()wrappedEntry, err := s.getWrappedEntry(hashedKey)if err != nil {s.lock.RUnlock()return nil, err}if entryKey := readKeyFromEntry(wrappedEntry); key != entryKey {s.lock.RUnlock()s.collision()if s.isVerbose {s.logger.Printf("Collision detected. Both %q and %q have the same hash %x", key, entryKey, hashedKey)}return nil, ErrEntryNotFound}entry := readEntry(wrappedEntry)s.lock.RUnlock()s.hit(hashedKey)return entry, nil
}

结合Set方法我们知道,在写入的时候是直接覆盖的,在读取的时候会直接报不存在。

BigCache does not handle collisions. When new item is inserted and it's hash collides with previously stored item, new item overwrites previously stored value.

删除
对应的Key的删除其实就是把 entries 中 arrary 对应的 itemIndex 上置为0。其实这种做法并没有正在删除数据,只是置为0,实际的内存并没有归还,但是把 s.hashmap 中的key对应的 index 删除了。这就做了假删除。用户已经查询不到这个数据了。

缓存过期

bigCache 可以为插入的数据设置过期时间,但是缺点是所有数据的过期时间都是一样的。bigCache 中自动删除数据有两种场景:

  • 在插入数据时删除过期数据
  • 通过设置 CleanWindow,启动 goroutine 后台定时批量删除过期数据

这篇关于golang中的内存缓存如何避免被GC扫描,BigCache实现原理的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/1082049

相关文章

python生成随机唯一id的几种实现方法

《python生成随机唯一id的几种实现方法》在Python中生成随机唯一ID有多种方法,根据不同的需求场景可以选择最适合的方案,文中通过示例代码介绍的非常详细,需要的朋友们下面随着小编来一起学习学习... 目录方法 1:使用 UUID 模块(推荐)方法 2:使用 Secrets 模块(安全敏感场景)方法

Spring StateMachine实现状态机使用示例详解

《SpringStateMachine实现状态机使用示例详解》本文介绍SpringStateMachine实现状态机的步骤,包括依赖导入、枚举定义、状态转移规则配置、上下文管理及服务调用示例,重点解... 目录什么是状态机使用示例什么是状态机状态机是计算机科学中的​​核心建模工具​​,用于描述对象在其生命

Spring Boot 结合 WxJava 实现文章上传微信公众号草稿箱与群发

《SpringBoot结合WxJava实现文章上传微信公众号草稿箱与群发》本文将详细介绍如何使用SpringBoot框架结合WxJava开发工具包,实现文章上传到微信公众号草稿箱以及群发功能,... 目录一、项目环境准备1.1 开发环境1.2 微信公众号准备二、Spring Boot 项目搭建2.1 创建

IntelliJ IDEA2025创建SpringBoot项目的实现步骤

《IntelliJIDEA2025创建SpringBoot项目的实现步骤》本文主要介绍了IntelliJIDEA2025创建SpringBoot项目的实现步骤,文中通过示例代码介绍的非常详细,对大家... 目录一、创建 Spring Boot 项目1. 新建项目2. 基础配置3. 选择依赖4. 生成项目5.

golang程序打包成脚本部署到Linux系统方式

《golang程序打包成脚本部署到Linux系统方式》Golang程序通过本地编译(设置GOOS为linux生成无后缀二进制文件),上传至Linux服务器后赋权执行,使用nohup命令实现后台运行,完... 目录本地编译golang程序上传Golang二进制文件到linux服务器总结本地编译Golang程序

Linux下删除乱码文件和目录的实现方式

《Linux下删除乱码文件和目录的实现方式》:本文主要介绍Linux下删除乱码文件和目录的实现方式,具有很好的参考价值,希望对大家有所帮助,如有错误或未考虑完全的地方,望不吝赐教... 目录linux下删除乱码文件和目录方法1方法2总结Linux下删除乱码文件和目录方法1使用ls -i命令找到文件或目录

SpringBoot+EasyExcel实现自定义复杂样式导入导出

《SpringBoot+EasyExcel实现自定义复杂样式导入导出》这篇文章主要为大家详细介绍了SpringBoot如何结果EasyExcel实现自定义复杂样式导入导出功能,文中的示例代码讲解详细,... 目录安装处理自定义导出复杂场景1、列不固定,动态列2、动态下拉3、自定义锁定行/列,添加密码4、合并

mybatis执行insert返回id实现详解

《mybatis执行insert返回id实现详解》MyBatis插入操作默认返回受影响行数,需通过useGeneratedKeys+keyProperty或selectKey获取主键ID,确保主键为自... 目录 两种方式获取自增 ID:1. ​​useGeneratedKeys+keyProperty(推

Spring Boot集成Druid实现数据源管理与监控的详细步骤

《SpringBoot集成Druid实现数据源管理与监控的详细步骤》本文介绍如何在SpringBoot项目中集成Druid数据库连接池,包括环境搭建、Maven依赖配置、SpringBoot配置文件... 目录1. 引言1.1 环境准备1.2 Druid介绍2. 配置Druid连接池3. 查看Druid监控

Linux在线解压jar包的实现方式

《Linux在线解压jar包的实现方式》:本文主要介绍Linux在线解压jar包的实现方式,具有很好的参考价值,希望对大家有所帮助,如有错误或未考虑完全的地方,望不吝赐教... 目录linux在线解压jar包解压 jar包的步骤总结Linux在线解压jar包在 Centos 中解压 jar 包可以使用 u