本文主要是介绍Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask 笔记,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!
Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask 论文阅读笔记
看了 Ref[1] 和 Ref[2],基本就差不多了
Vectorization:materialization 开销,可以利用 SIMD 并行数据操作,最好是 column store
Code gen:指令数少,利于计算密集型
- join (memory bound):向量化快
- memory load 消耗 CPU cycle,向量化减少 cache miss
- computation (CPU intensive task):code gen 快
- cache 压力小,code gen 指令数少,高效利用 register
- selection 使用 SIMD
- 越多 select,越稀疏,column 上 offset 越大,导致 cache miss
消除分支的操作:a>b?1:0
可以被写成没有分支的语句 setg
Reference
- 在 2019.4.20 杭州举办的 Infra Meetup No.98 上,我司 TiDB 研发工程师徐怀宇为大家带来了《Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask》论文分享。
- CMU 15-721 Advanced Database Systems (Spring 2018)
- A Deep Dive into Query Execution Engine of Spark SQL - Maryann Xue - You Tube
- Vectorization vs. compilation in query execution
这篇关于Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask 笔记的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!