Internals of QMutex in Qt 5
You may want to read this blog if you want to understand the internals of QMutex or if you are interested in lock-free algorithms and want to discover one. You do not need to read or understand this article if you just want to use QMutex in your application. I found implementing QMutex interesting and therefore I want to share the knowledge.
I am going to show and explain a simplified version of the real code. If you want to see the real code, you can just browse the Qt source code. This article will help you to understand the concepts.
In order to better understand this article, you need to know some basics of lock-free programming. If you understand what the ABA problem is and know how to make a lock-free stack, you are probably fine. Otherwise I recommend reading my previous post as an introduction.
The contents of this blog entry are taken from the last part of my presentation at the Qt Developer days 2011.
QMutex
Before understanding how something works, one needs to understand what it does first. Here is the basic interface we are going to analyze.
class QMutex { public: void lock(); void unlock(); bool tryLock(); private: //... };
What it does is unsurprising. If you don't guess what those functions do, you can read it in the QMutex documentation.
Motivation
We want QMutex to be as efficient as possible. QMutex in Qt 4.8 is already quite fast because it is already using atomic operations. So don't expect differences in speed between Qt 4.8 and Qt 5. But what we improved is more from a memory perspective. We want QMutex to use as little memory as possible, so you can put more QMutex in your objects to have more lock granularity.
sizeof(QMutex) == sizeof(void*).
That condition was already true in Qt 4. But Qt 4 has lots of hidden costs because it is using pimpl, as almost every class in Qt does. Constructing a QMutex in Qt 4 allocates and initializes a QMutexPrivate which itself initializes some platform specific primitives. So we have a memory overhead of over 120 bytes per mutex and also the cost of initialization/destruction. In Qt 5, there are no more hidden costs per mutex.
Another good point is that it is now a POD.
This is good because QMutex is often used to lock global resources.
Then a global mutex seems the obvious thing to use.
But, in Qt, we want to avoid the use of global objects. Indeed,
global objects should
be avoided
especially in library code, because the order of initialization is unspecified and they slow down
the start-up even if that part of the library is not used.
But as PODs do not need to be initialized, they can be used as global object.
Note: QMutex is not a POD because it stil has a constructor. QMutex inherit from the QBasicMutex class which
is the same but without the constructor. QBasicMutex should be used for static global objects.
QBasicMutex singletonMutex; // static global object !!! (OK only for POD) MySingleton *MySingleton::self() { QMutexLocker locker(&singletonMutex) if(!m_self) m_self = new MySingleton; return m_self; }
That code is dangerous in Qt 4 because maybe another global object in another compilation unit wants to make use of the singleton and the mutex could be used before it is created.
Summary of the QMutex changes from Qt 4.8 to Qt 5
- QMutex uses much less memory.
- The construction and destruction is faster.
- It is now suitable as a global object. (Using QBasicMutex)
- The cost of locking or unlocking should not have changed.
Overview
class QMutex { public: // API ... private: QAtomicPointer<QMutexPrivate> d_ptr; };
d_ptr
is the only data member. The trick is that it is not always a pointer to an actual QMutexPrivate
It may have a magic value:
Value of d_ptr
|
Meaning |
---|---|
0x0
|
The mutex is unlocked |
0x1
|
The mutex is locked and no other threads are waiting |
Other address | An actual pointer to a QMutexPrivate |
You should be familiar with a pointer having 0 (or 0x0
) as a value
(also called NULL
).
0x1
is a special value that represents an uncontended locked mutex.
So the principle of a lock is to try to atomically change the value of d_ptr
from 0 to 1
If this succeeds, we have the lock, else, we need to wait.
bool QMutex::tryLock() { return d_ptr.testAndSetAcquire(0x0, 0x1); } void QMutex::lock() { while(!d_ptr.testAndSetAcquire(0x0, 0x1)) { // change the value of d_ptr to a QMutexPrivate and call wait on it. // ... see bellow } }
The function bool QAtomicPointer::testAndSetAcquire(expected, newValue)
will change the value
of the pointer to newValue only if the previous value is expected and returns true if it succeeds.
It does it atomically: If other threads change the value behind its back, the operation
would fail and returns false.
Unlock
Let us start with the code:
void QMutex::unlock() { Q_ASSERT(d_ptr); // cannot be 0x0 because the mutex is locked. QMutexPrivate *d = d_ptr.fetchAndStoreRelease(0x0); if (d == 0x1) return; // no threads are waiting for the lock d->wake(); // wake up all threads d->deref(); }
The function fetchAndStoreRelease
will atomically exchange the value of
d_ptr
to 0x0
and
return the previous value. This exchange is atomic: it is guaranteed that we have the old value
and that no threads had put a different value in between.
The returned value cannot be 0, because our mutex was locked
(we have the lock, since we are unlocking it).
If it was 0x1
, no threads were waiting: we are finished.
Otherwise, we have a pointer to QMutexPrivate and threads are waiting, so we need to wake all
those threads up. They will then shortly try to lock the mutex again.
Waking all the threads while only one can acquire the mutex is a waste of CPU.
The real code inside Qt only wakes up one thread.
You will also notice a deref()
. This will de-reference the QMutexPrivate
and release it if needed.
Memory Management
In lock free programming, memory management is always a difficult topic.
We use the technique of reference counting in order to make sure the
QMutexPrivate
is not released while one thread is trying to use it.
It is important to see that the QMutexPrivate
can never be deleted.
Consider this code:
d_ptr->ref().
This is not an atomic operation. The ref()
itself is atomic, but there
is a step before.
The code first reads the value of d_ptr
and puts the memory address in a register.
But another thread may unlock the mutex and release the QMutexPrivate
before the call to ref()
.
We would end up accessing free'ed memory.
QMutexPrivate
are never deleted. Instead, they are put in a pool to be reused
(a lock-free stack).
We expect the total number of QMutexPrivate
to be rather small.
A QMutexPrivate
is only needed if there is a thread waiting,
and there cannot be more threads waiting than the number of threads on the application.
Here is how the QMutexPrivate
looks
class QMutexPrivate { QAtomicInt refCount; public: bool ref() { do { int c = refCount; if (c == 0) // do not reference a QMutexPrivate that return false; // has been already released. } while (!refCount.testAndSetAcquire(c, c + 1)) return true; } void deref() { if (!refCount.deref()) release(); } /* Release this QMutexPrivate by pushing it on the global lock-free stack */ void release(); /* Pop a mutex private from the global lock-free stack (or allocate a new one if the stack was empty) And make sure it is properly initialized The refCount is initially 1 */ static QMutexPrivate *allocate(); /* Block until wake is called. If wake was already called, returns immediately */ void wait(); /* Wake all the other waiting threads. All further calls to wait() will return immediately */ void wake(); // ... platform specifics };
Lock
Now, we have got enough information to look at the implementation of the locking.
void QMutex::lock() { // Try to atomically change d_ptr from 0 to 1 while (!d_ptr.testAndSetAcquire(0x0, 0x1)) { // If it did not succeed, we did not acquire the lock // so we need to wait. // We make a local copy of d_ptr to operate on a // pointer that does not change behind our back. QMutexPrivate *copy = d_ptr; // It is possible that d_ptr was changed to 0 before we did the copy. if (!copy) // In that case, the mutex is unlocked, continue; // so we can try to lock it again if (copy == 0x1) { // There is no QMutexPrivate yet, we need to allocate one copy = QMutexPrivate::allocate(); if (!d_ptr.testAndSetOrdered(0x1, copy)) { // d_ptr was not 0x1 anymore, either the mutex was unlocked // or another thread has put a QMutexPrivate. // either way, release the now useless allocated QMutexPrivate, // and try again. copy->deref(); continue; } } // Try to reference the QMutexPrivate if (!copy->ref()) // but if it fails because it was already released, continue; // the mutex has been unlocked, so try again // Now, it is possible that the QMutexPrivate had been released, // but re-used again in another mutex. Hence, we need to check that // we are really holding a reference to the right mutex. if (d_ptr != copy) { copy->deref(); continue; } // From this point we know that we have the right QMutexPrivate and that // it won't be released or change behind our back. // So we can wait. copy->wait(); // The mutex has been unlocked, we can release the QMutexPrivate copy->deref(); // and retry to lock. } }
I hope that the comments were self explanatory.
To The Real Code.
The simplified algorithms code shown here has some limitations that have been solved in the real implementation inside Qt:
- QMutex has a recursive mode with the same API (the Qt 5 implementation supports them but they are more expensive than non-recursive mutex)
- This simplification wakes all the threads waiting on a single mutex instead of just one. That is not optimal since only one is going to acquire the lock.
- It is not fair. This means that a thread waiting for a long time might never acquire the mutex while two other threads always exchange the mutex between each other.
- There is the
QMutex::tryLock(int timeout)
that only blocks for a limited amount of time if the mutex cannot be acquired.
This code is also not used on Linux. On Linux, we use the futexes, using
the d_ptr
as the futex controller.
If you are interested, you can look at the actual code: the file qmutex.cpp, qmutex.h and qmutex_p.h available on our code browser
Woboq is a software company that specializes in development and consulting around Qt and C++. Hire us!
If you like this blog and want to read similar articles, consider subscribing via our RSS feed (Via Google Feedburner, Privacy Policy), by e-mail (Via Google Feedburner, Privacy Policy) or follow us on twitter or add us on G+.
Article posted by Olivier Goffart on 21 December 2011
Click to subscribe via RSS or e-mail on Google Feedburner. (external service).
Click for the privacy policy of Google Feedburner.
Google Analytics Tracking Opt-Out
Loading comments embeds an external widget from disqus.com.
Check disqus privacy policy for more information.