CLR Threadpool 的内部机制

2024-06-12 18:32
文章标签 机制 内部 clr threadpool

本文主要是介绍CLR Threadpool 的内部机制,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

转自:New and Improved CLR 4 Thread Pool Engine

A way to think of the CLR thread pool and how it gets used, e.g. when you call ThreadPool.QueueUserWorkItem, is to picture a global queue where work items (essentially delegates) get queued on a global queue and multiple threads pick them out in a First In First Out order. The FIFO order is not something that is documented or guaranteed, but my personal guess is that too many applications rely on it, so I don’t see it changing any time soon.

The image on the left shows the main program thread as it is creating a work item; the second image shows the global thread pool queue after code has queued 3 work items; the third image shows 2 threads from the thread pool that have grabbed 2 work items and executing them. If in the context of those work items (i.e. from the executing code in the delegates) more work items get created for the CLR thread pool they end up on the global queue (see image on the right) and life goes on.

CLR Thread Pool v4 from a System.Threading.Tasks perspective
In CLR 4, the thread pool engine has had some improvements made to it (it has been having positive tweaks in every release of the CLR) and part of these improvements are some performance gains that are achievable when using the new System.Threading.Tasks.Task type. I'll show a code example in another post, but you can think of creating and starting a Task (passing it a delegate) as the equivalent of calling QueueUserWorkItem on the ThreadPool. A way to visualize the CLR thread pool when used via the Task-based API is that, in addition to the single global queue, each thread in the thread pool has its own local queue:

Just as with normal thread pool usage, the main program thread may create Tasks that will get queued on the global queue (e.g. Task1 and Task2) and threads will grab those Tasks typically in a FIFO manner. Where things diverge is that any new Tasks (e.g. Task3) created in the context of the executing Task (e.g. Task2) end up on a local queue for that thread pool thread.

Why Local Queues
With the era of manycore machines upon us and devs taking advantage of parallelism, the number of threads in the thread pool is likely to increase: at a minimum equal to the number of cores for compute bound operations, and likely more due to injection of additional threads as a result of IO bound operations or blocking calls stalling the CPU. Bottom line: more cores = more threads.

With more threads competing for work items, it is not optimal to put up with the contention issues of a single queue that all of them are trying to access safely. This would be amplified by the goal of fine grained parallelism where each work item finishes fairly quickly, so the trips to the global queue would be frequent.

It is for this reason we introduce a local queue per thread, where additional tasks get queued (with no contention) and will then be retrieved by that same thread (again with no contention).

LIFO
So, picking up from the picture further up, let's assume that Task2 additionally creates two more Tasks e.g. Task4 and Task5.

The tasks end up on the local queue as expected, but which Task does the thread pick to execute when it completes its current task (i.e. Task2)? The initially surprising answer is that it could be Task5, which is the last one that was queued – in other words a LIFO algorithm can be used for the local queues.

The reason that this is a good thing is locality. In most scenarios the data required by the last created Task in the queue is still hot in the cache, so it makes sense to pull that down and execute it. Obviously, this means there are no promises on ordering, but some level of fairness is relinquished in the sake of better performance.

Work Stealing
If the only enhancements were the introduction of LIFO local queues then performance is greatly increased, but you should be concerned. You should be concerned about what happens when another thread in the thread pool (likely executing on another core) finishes its work. Luckily, you don't have to be:

The other worker thread completes Task1 and then goes to its local queue and finds it empty; it then goes to the global queue and finds it empty. We don't want it sitting there idle so a beautiful thing happens: work stealing. The thread goes to a local queue of another thread and "steals" a Task and executes it! That way we keep all our cores busy and this contributes to our fine grained parallelism load balancing goal. In the image above notice that "stealing" happens in a FIFO manner, which again for locality reasons is good (its data would be cold in the cache). Furthermore, in many divide and conquer scenarios, tasks generated earlier on are likely to generate more work (e.g. Task6) themselves, which would end up now on this other thread's queue and hence reduce frequent stealing.
Nitpicking note: further up, I mentioned "no contention"; clearly work stealing is the exception to that rule.

What's Next
In addition to the links sprinkled above, you can find a simple implementation of a work stealing threadpool on Joe's blog post and some of the concepts above are touched on in this MSDN mag article by Eric and Erika. If you want to see the pictures above in a fully animated slide, get my PDC deck (slide 8) from this page. Do note that the implementation I am talking about above did not ship with the September CTP of Visual Studio 2010, but is slated for the next drop. Also note that all internal implementation details are subject to change and just shared here for the geek factor ;-)

So you can see how using the Task-based API will yield cool perf gains under the covers. An additional reason to use the new Task-based API is because of its richness. I'll touch on that in a next blog post.

这篇关于CLR Threadpool 的内部机制的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/1055003

相关文章

C++如何通过Qt反射机制实现数据类序列化

《C++如何通过Qt反射机制实现数据类序列化》在C++工程中经常需要使用数据类,并对数据类进行存储、打印、调试等操作,所以本文就来聊聊C++如何通过Qt反射机制实现数据类序列化吧... 目录设计预期设计思路代码实现使用方法在 C++ 工程中经常需要使用数据类,并对数据类进行存储、打印、调试等操作。由于数据类

SpringRetry重试机制之@Retryable注解与重试策略详解

《SpringRetry重试机制之@Retryable注解与重试策略详解》本文将详细介绍SpringRetry的重试机制,特别是@Retryable注解的使用及各种重试策略的配置,帮助开发者构建更加健... 目录引言一、SpringRetry基础知识二、启用SpringRetry三、@Retryable注解

SpringKafka错误处理(重试机制与死信队列)

《SpringKafka错误处理(重试机制与死信队列)》SpringKafka提供了全面的错误处理机制,通过灵活的重试策略和死信队列处理,下面就来介绍一下,具有一定的参考价值,感兴趣的可以了解一下... 目录引言一、Spring Kafka错误处理基础二、配置重试机制三、死信队列实现四、特定异常的处理策略五

java中反射(Reflection)机制举例详解

《java中反射(Reflection)机制举例详解》Java中的反射机制是指Java程序在运行期间可以获取到一个对象的全部信息,:本文主要介绍java中反射(Reflection)机制的相关资料... 目录一、什么是反射?二、反射的用途三、获取Class对象四、Class类型的对象使用场景1五、Class

Java捕获ThreadPoolExecutor内部线程异常的四种方法

《Java捕获ThreadPoolExecutor内部线程异常的四种方法》这篇文章主要为大家详细介绍了Java捕获ThreadPoolExecutor内部线程异常的四种方法,文中的示例代码讲解详细,感... 目录方案 1方案 2方案 3方案 4结论方案 1使用 execute + try-catch 记录

Nginx之upstream被动式重试机制的实现

《Nginx之upstream被动式重试机制的实现》本文主要介绍了Nginx之upstream被动式重试机制的实现,可以通过proxy_next_upstream来自定义配置,具有一定的参考价值,感兴... 目录默认错误选择定义错误指令配置proxy_next_upstreamproxy_next_upst

Spring排序机制之接口与注解的使用方法

《Spring排序机制之接口与注解的使用方法》本文介绍了Spring中多种排序机制,包括Ordered接口、PriorityOrdered接口、@Order注解和@Priority注解,提供了详细示例... 目录一、Spring 排序的需求场景二、Spring 中的排序机制1、Ordered 接口2、Pri

MySQL 缓存机制与架构解析(最新推荐)

《MySQL缓存机制与架构解析(最新推荐)》本文详细介绍了MySQL的缓存机制和整体架构,包括一级缓存(InnoDBBufferPool)和二级缓存(QueryCache),文章还探讨了SQL... 目录一、mysql缓存机制概述二、MySQL整体架构三、SQL查询执行全流程四、MySQL 8.0为何移除查

一文详解Java Condition的await和signal等待通知机制

《一文详解JavaCondition的await和signal等待通知机制》这篇文章主要为大家详细介绍了JavaCondition的await和signal等待通知机制的相关知识,文中的示例代码讲... 目录1. Condition的核心方法2. 使用场景与优势3. 使用流程与规范基本模板生产者-消费者示例

一文带你理解Python中import机制与importlib的妙用

《一文带你理解Python中import机制与importlib的妙用》在Python编程的世界里,import语句是开发者最常用的工具之一,它就像一把钥匙,打开了通往各种功能和库的大门,下面就跟随小... 目录一、python import机制概述1.1 import语句的基本用法1.2 模块缓存机制1.