@synchronized, NSLock, pthread, OSSpinLock showdown, done right

2024-01-31 06:58


http://perpendiculo.us/2009/09/synchronized-nslock-pthread-osspinlock-showdown-done-right/


Somewhere out there on the internet, there’s a “showdown” between @synchronized, NSLock, pthread mutexes, and OSSpinLock. It aims to measure their performance relative to each other, but uses sloppy code to perform the measuring. As a result, while the performance ordering is correct (@synchronized is the slowest, OSSpinLock is the fastest), the relative cost is severely misrepresented. Herein I attempt to rectify that benchmark.

Locking is absolutely required for critical sections. These arise in multithreaded code, and sometimes their performance can have severe consequences in applications. The problem with the aforementioned benchmark is that it did a bunch of extraneous work while it was locking/unlocking. It was doing the same amount of extraneous work, so the relative order was correct (the fastest was still the fastest, the slowest still the slowest, etc), but it didn’t properly show just how much faster the fastest was.

In the benchmark, the author used autorelease pools, allocated objects, and then released them all.  While locking.  This is a pretty reasonable use-case, but by no means the only one.  For most high-performance, multithreaded code, you’ll spend a _bunch_ of time trying to make the critical sections as small and fast as possible.  Large, slow critical sections effectively undo the multithreading speed up by causing threads to block each other out unnecessarily.  So when you’ve trimmed the critical sections down to the minimum, another sometimes-justified optimization is to optimize the amount of time spent locking/unlocking itself.

Just to make things exciting though, not all locking primitives are created equal.  Two of the four mentioned have special properties that affect how long they take, and how they operate under pressure.  I'll get to that towards the end.

First up, here’s my “no-nonsense” microbench code:

#import <Foundation/Foundation.h>
#import <objc/runtime.h>
#import <objc/message.h>
#import <libkern/OSAtomic.h>
#import <pthread.h>

#define ITERATIONS (1024*1024*32)

static unsigned long long disp=0, land=0;

int main()
{
    double then, now;
    unsigned int i, count;
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    OSSpinLock spinlock = OS_SPINLOCK_INIT;
    NSAutoreleasePool *pool = [NSAutoreleasePool new];
    NSLock *lock = [NSLock new];

    then = CFAbsoluteTimeGetCurrent();
    for(i=0;i<ITERATIONS;++i)
    {
        [lock lock];
        [lock unlock];
    }
    now = CFAbsoluteTimeGetCurrent();
    printf("NSLock: %f sec\n", now-then);

    then = CFAbsoluteTimeGetCurrent();
    IMP lockLock = [lock methodForSelector:@selector(lock)];
    IMP unlockLock = [lock methodForSelector:@selector(unlock)];
    for(i=0;i<ITERATIONS;++i)
    {
        lockLock(lock,@selector(lock));
        unlockLock(lock,@selector(unlock));
    }
    now = CFAbsoluteTimeGetCurrent();
    printf("NSLock+IMP Cache: %f sec\n", now-then);

    then = CFAbsoluteTimeGetCurrent();
    for(i=0;i<ITERATIONS;++i)
    {
        pthread_mutex_lock(&mutex);
        pthread_mutex_unlock(&mutex);
    }
    now = CFAbsoluteTimeGetCurrent();
    printf("pthread_mutex: %f sec\n", now-then);

    then = CFAbsoluteTimeGetCurrent();
    for(i=0;i<ITERATIONS;++i)
    {
        OSSpinLockLock(&spinlock);
        OSSpinLockUnlock(&spinlock);
    }
    now = CFAbsoluteTimeGetCurrent();
    printf("OSSpinlock: %f sec\n", now-then);

    id obj = [NSObject new];
    then = CFAbsoluteTimeGetCurrent();
    for(i=0;i<ITERATIONS;++i)
    {
        @synchronized(obj)
        {
        }
    }
    now = CFAbsoluteTimeGetCurrent();
    printf("@synchronized: %f sec\n", now-then);

    [pool release];
    return 0;
}

We run five tests: NSLock, NSLock with IMP caching, pthread mutexes, OSSpinLock, and finally @synchronized.  We simply lock and unlock 33,554,432 times (that's 1024*1024*32 for those keeping score at home ;), and see how long it takes.  No allocation, no releases, no autorelease pools, nothing.  Just pure lock/unlock goodness.  I ran the test a few times and averaged the results, so overall each figure comes from something like 100 million lock/unlock cycles.

  1. NSLock: 3.5175 sec
  2. NSLock+IMP Cache: 3.1165 sec
  3. Mutex: 1.5870 sec
  4. SpinLock: 1.0893 sec
  5. @synchronized: 9.9488 sec
[Figure: Lock Performance (bar chart of the five timings above)]

From the above graph, we can see a couple of things.  First, @synchronized is _really_ expensive — like, three times as expensive as anything else.  We'll get into why that is in a moment.  Otherwise, we see that NSLock and NSLock+IMP Cache are pretty close: these are built on top of pthread mutexes, but we have to pay for the extra Objective-C messaging overhead.  Then there are Mutex (pthread mutexes) and SpinLock.  These are pretty close too, but even so SpinLock is almost 30% faster than Mutex.  We'll get into that one as well.  So from top to bottom we have almost an order of magnitude between the worst and the best.

The nice part is that they all take about the same amount of code: using NSLock takes as many lines as a pthread mutex, and the same goes for a spinlock.  @synchronized saves a line or two, but at that cost it quickly looks unappealing in all but the most trivial of cases.

So, what makes @synchronized and SpinLock so different from the others?

@synchronized is very heavyweight because it has to set up an exception handler, and it actually ends up taking a few internal locks on its way there.  So instead of one simple, cheap lock, you're paying for several locks/unlocks just to acquire your measly lock.  Those take time.

OSSpinLock, on the other hand, doesn't even enter the kernel — it just keeps reloading the lock, hoping that it's been unlocked.  This is terribly inefficient if locks are held for more than a few nanoseconds, but it saves a costly system call and a couple of context switches.  Pthread mutexes actually spin like an OSSpinLock first, to keep things running smoothly when there's no contention.  When there is contention, they fall back to heavier, kernel-level locking and scheduling machinery.

So, if you’ve got hotly-contested locks, OSSpinLock probably isn’t for you (unless your critical sections are _Really_ _Fast_).  Pthread mutexes are a tiny bit more expensive, but they avoid the power-wasting effects of OSSpinLock.

NSLock is a pretty wrapper around pthread mutexes.  It doesn't provide much else, so there's not much point in using it over a raw pthread mutex.

Of course, standard optimization disclaimers apply:  don’t do it until you’re sure you’ve chosen the correct algorithms, have profiled to find hotspots, and have found locking to be one of those hot items.  Otherwise, you’re wasting your time on something that’s likely to provide minimal benefits.

4 Comments »

  1. This is very interesting and useful, thanks for the sample code too!

    Comment by Zachary Howe — 2012.08.10 @ 12:48 pm

  2. Thank you very much. With your article, I finally know the speed of the different sync mechanisms.

    Comment by maple — 2012.11.01 @ 8:05 am

  3. nice article 
