Internals of QMutex in Qt 5

You may want to read this blog if you want to understand the internals of QMutex or if you are interested in lock-free algorithms and want to discover one. You do not need to read or understand this article if you just want to use QMutex in your application. I found implementing QMutex interesting and therefore I want to share the knowledge.

I am going to show and explain a simplified version of the real code. If you want to see the real code, you can just browse the Qt source code. This article will help you to understand the concepts.

In order to better understand this article, you need to know some basics of lock-free programming. If you understand what the ABA problem is and know how to make a lock-free stack, you are probably fine. Otherwise I recommend reading my previous post as an introduction.

The contents of this blog entry are taken from the last part of my presentation at the Qt Developer days 2011.

QMutex

Before understanding how something works, one needs to understand what it does first. Here is the basic interface we are going to analyze.

class QMutex {
public:
    void lock();
    void unlock();
    bool tryLock();
private:
    //...
};

What it does is unsurprising. If you don't guess what those functions do, you can read it in the QMutex documentation.

Motivation

We want QMutex to be as efficient as possible. QMutex in Qt 4.8 is already quite fast because it is already using atomic operations. So don't expect differences in speed between Qt 4.8 and Qt 5. But what we improved is more from a memory perspective. We want QMutex to use as little memory as possible, so you can put more QMutex in your objects to have more lock granularity.

sizeof(QMutex) == sizeof(void*).

That condition was already true in Qt 4. But Qt 4 has lots of hidden costs because it is using pimpl, as almost every class in Qt does. Constructing a QMutex in Qt 4 allocates and initializes a QMutexPrivate which itself initializes some platform specific primitives. So we have a memory overhead of over 120 bytes per mutex and also the cost of initialization/destruction. In Qt 5, there are no more hidden costs per mutex.

Another good point is that it is now a POD. This is good because QMutex is often used to lock global resources. Then a global mutex seems the obvious thing to use. But, in Qt, we want to avoid the use of global objects. Indeed, global objects should be avoided especially in library code, because the order of initialization is unspecified and they slow down the start-up even if that part of the library is not used. But as PODs do not need to be initialized, they can be used as global object.
Note: QMutex is not a POD because it stil has a constructor. QMutex inherit from the QBasicMutex class which is the same but without the constructor. QBasicMutex should be used for static global objects.

QBasicMutex singletonMutex; // static global object !!!  (OK only for POD)
MySingleton *MySingleton::self() {
    QMutexLocker locker(&singletonMutex)
    if(!m_self)
        m_self = new MySingleton;
    return m_self;
}

That code is dangerous in Qt 4 because maybe another global object in another compilation unit wants to make use of the singleton and the mutex could be used before it is created.

Summary of the QMutex changes from Qt 4.8 to Qt 5

QMutex uses much less memory.
The construction and destruction is faster.
It is now suitable as a global object. (Using QBasicMutex)
The cost of locking or unlocking should not have changed.

Overview

class QMutex {
public:
    // API ...
private:
    QAtomicPointer<QMutexPrivate> d_ptr;
};

d_ptr is the only data member. The trick is that it is not always a pointer to an actual QMutexPrivate It may have a magic value:

Value of `d_ptr`	Meaning
`0x0`	The mutex is unlocked
`0x1`	The mutex is locked and no other threads are waiting
Other address	An actual pointer to a QMutexPrivate

You should be familiar with a pointer having 0 (or 0x0) as a value (also called NULL). 0x1 is a special value that represents an uncontended locked mutex.

So the principle of a lock is to try to atomically change the value of d_ptr from 0 to 1 If this succeeds, we have the lock, else, we need to wait.

bool QMutex::tryLock() {
    return d_ptr.testAndSetAcquire(0x0, 0x1);
}

void QMutex::lock() {
    while(!d_ptr.testAndSetAcquire(0x0, 0x1)) {
        // change the value of d_ptr to a QMutexPrivate and call wait on it.
        // ... see bellow
    }
}

The function bool QAtomicPointer::testAndSetAcquire(expected, newValue) will change the value of the pointer to newValue only if the previous value is expected and returns true if it succeeds. It does it atomically: If other threads change the value behind its back, the operation would fail and returns false.

Unlock

Let us start with the code:

void QMutex::unlock() {
    Q_ASSERT(d_ptr); // cannot be 0x0 because the mutex is locked.

    QMutexPrivate *d = d_ptr.fetchAndStoreRelease(0x0);
    if (d == 0x1)
        return; // no threads are waiting for the lock

    d->wake(); // wake up all threads
    d->deref();
}

The function fetchAndStoreRelease will atomically exchange the value of d_ptr to 0x0 and return the previous value. This exchange is atomic: it is guaranteed that we have the old value and that no threads had put a different value in between.

The returned value cannot be 0, because our mutex was locked (we have the lock, since we are unlocking it). If it was 0x1, no threads were waiting: we are finished. Otherwise, we have a pointer to QMutexPrivate and threads are waiting, so we need to wake all those threads up. They will then shortly try to lock the mutex again. Waking all the threads while only one can acquire the mutex is a waste of CPU. The real code inside Qt only wakes up one thread.

You will also notice a deref(). This will de-reference the QMutexPrivate and release it if needed.

Memory Management

In lock free programming, memory management is always a difficult topic. We use the technique of reference counting in order to make sure the QMutexPrivate is not released while one thread is trying to use it.

It is important to see that the QMutexPrivate can never be deleted. Consider this code:

d_ptr->ref().

This is not an atomic operation. The ref() itself is atomic, but there is a step before. The code first reads the value of d_ptr and puts the memory address in a register. But another thread may unlock the mutex and release the QMutexPrivate before the call to ref(). We would end up accessing free'ed memory.

QMutexPrivate are never deleted. Instead, they are put in a pool to be reused (a lock-free stack). We expect the total number of QMutexPrivate to be rather small. A QMutexPrivate is only needed if there is a thread waiting, and there cannot be more threads waiting than the number of threads on the application.

Here is how the QMutexPrivate looks

class QMutexPrivate {
    QAtomicInt refCount;
public:
    bool ref() {
        do {
            int c = refCount;
            if (c == 0)       // do not reference a QMutexPrivate that
                return false; //   has been already released.
        } while (!refCount.testAndSetAcquire(c, c + 1))
        return true;
    }
    void deref() {
        if (!refCount.deref())
            release();
    }

    /* Release this QMutexPrivate by pushing it on the
       global lock-free stack */
    void release();

    /* Pop a mutex private from the global lock-free stack
       (or allocate a new one if the stack was empty)
       And make sure it is properly initialized
       The refCount is initially 1
      */
    static QMutexPrivate *allocate();

    /* Block until wake is called.
       If wake was already called, returns immediately */
    void wait();

    /* Wake all the other waiting threads. All further calls to wait()
       will return immediately */
    void wake();

    // ... platform specifics
};

Lock

Now, we have got enough information to look at the implementation of the locking.

void QMutex::lock() {
    // Try to atomically change d_ptr from 0 to 1
    while (!d_ptr.testAndSetAcquire(0x0, 0x1)) {
        // If it did not succeed, we did not acquire the lock
        // so we need to wait.

        // We make a local copy of d_ptr to operate on a
        // pointer that does not change behind our back.
        QMutexPrivate *copy = d_ptr;

        // It is possible that d_ptr was changed to 0 before we did the copy.
        if (!copy)     // In that case, the mutex is unlocked,
            continue;  // so we can try to lock it again

        if (copy == 0x1) {
            // There is no QMutexPrivate yet, we need to allocate one
            copy = QMutexPrivate::allocate();
            if (!d_ptr.testAndSetOrdered(0x1, copy)) {
                // d_ptr was not 0x1 anymore, either the mutex was unlocked
                // or another thread has put a QMutexPrivate.
                // either way, release the now useless allocated QMutexPrivate,
                // and try again.
                copy->deref();
                continue;
            }
        }

        // Try to reference the QMutexPrivate
        if (!copy->ref())  // but if it fails because it was already released,
            continue;      // the mutex has been unlocked, so try again

        // Now, it is possible that the QMutexPrivate had been released,
        // but re-used again in another mutex. Hence, we need to check that
        // we are really holding a reference to the right mutex.
        if (d_ptr != copy) {
            copy->deref();
            continue;
        }

        // From this point we know that we have the right QMutexPrivate and that
        // it won't be released or change behind our back.
        // So we can wait.
        copy->wait();

        // The mutex has been unlocked, we can release the QMutexPrivate
        copy->deref();
        // and retry to lock.
    }
}

I hope that the comments were self explanatory.

To The Real Code.

The simplified algorithms code shown here has some limitations that have been solved in the real implementation inside Qt:

QMutex has a recursive mode with the same API (the Qt 5 implementation supports them but they are more expensive than non-recursive mutex)
This simplification wakes all the threads waiting on a single mutex instead of just one. That is not optimal since only one is going to acquire the lock.
It is not fair. This means that a thread waiting for a long time might never acquire the mutex while two other threads always exchange the mutex between each other.
There is the QMutex::tryLock(int timeout) that only blocks for a limited amount of time if the mutex cannot be acquired.

This code is also not used on Linux. On Linux, we use the futexes, using the d_ptr as the futex controller.

If you are interested, you can look at the actual code: the file qmutex.cpp, qmutex.h and qmutex_p.h available on our code browser

Article posted by Olivier Goffart on 21 December 2011

Load Comments...

Loading comments embeds an external widget from disqus.com.
Check disqus privacy policy for more information.

Get notified when we post a new interesting article!

Click to subscribe via RSS or e-mail on Google Feedburner. (external service).

Click for the privacy policy of Google Feedburner.

Internals of QMutex in Qt 5

QMutex

Motivation

Summary of the QMutex changes from Qt 4.8 to Qt 5

Overview

Unlock

Memory Management

Lock

To The Real Code.

Article posted by Olivier Goffart on 21 December 2011

Woboq

Services

Products

Knowledge

Misc