
QStringLiteral explained

QStringLiteral is a new macro introduced in Qt 5 that creates a QString from a string literal. (String literals are the strings written between double quotes in the source code.) In this blog post, I explain its inner workings and implementation.

Summary

Let me start with a guideline on when to use it. If you want to initialize a QString from a string literal in Qt 5, you should use:

  • In most cases: QStringLiteral("foo"), if it will actually be converted to a QString
  • QLatin1String("foo"), if it is used with a function that has an overload for QLatin1String (such as operator==, operator+, startsWith, replace, ...)

I have put this summary at the beginning for those who don't want to read the technical details that follow.

Read on to understand how QStringLiteral works.

Reminder on how QString works

QString, like many classes in Qt, is an implicitly shared class. Its only member is a pointer to the 'private' data. The QStringData is allocated with malloc, and enough room is allocated after it to put the actual string data in the same memory block.

// Simplified for the purpose of this blog
struct QStringData {
  QtPrivate::RefCount ref; // wrapper around a QAtomicInt
  int size; // size of the string
  uint alloc : 31; // amount of memory reserved after this string data
  uint capacityReserved : 1; // internal detail used for reserve()

  qptrdiff offset; // offset to the data (usually sizeof(QStringData))

  inline ushort *data()
  { return reinterpret_cast<ushort *>(reinterpret_cast<char *>(this) + offset); }
};

// ...

class QString {
  QStringData *d;
public:
  // ... public API ...
};

The offset is the distance in bytes from the QStringData to the character data. In Qt 4, it used to be an actual pointer. We'll see why it has been changed.

The actual data in the string is stored in UTF-16, which uses 2 bytes per character.
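
To make the layout concrete, here is a minimal sketch in standard C++. It is not Qt's code: ToyStringData, toyAllocate and toyFree are hypothetical names, the header is reduced to three fields, and the input is assumed to be plain ASCII. It only illustrates the idea of one malloc block holding the header followed by the UTF-16 characters, reached through the offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical, reduced stand-in for QStringData (not the real Qt struct).
struct ToyStringData {
    int ref;               // reference count
    int size;              // number of UTF-16 code units, without terminator
    std::ptrdiff_t offset; // distance in bytes from the header to the characters

    char16_t *data()
    { return reinterpret_cast<char16_t *>(reinterpret_cast<char *>(this) + offset); }
};

// One malloc for header + characters; the ASCII input is widened to UTF-16.
inline ToyStringData *toyAllocate(const char *ascii)
{
    const std::size_t n = std::strlen(ascii);
    void *block = std::malloc(sizeof(ToyStringData) + (n + 1) * sizeof(char16_t));
    if (!block)
        return nullptr;
    ToyStringData *d = new (block) ToyStringData{
        1, static_cast<int>(n), static_cast<std::ptrdiff_t>(sizeof(ToyStringData))};
    for (std::size_t i = 0; i <= n; ++i)   // <= n copies the terminating 0 too
        d->data()[i] = static_cast<unsigned char>(ascii[i]);
    return d;
}

inline void toyFree(ToyStringData *d) { std::free(d); }

// Usage: the characters start right behind the header.
inline void toyLayoutDemo()
{
    ToyStringData *d = toyAllocate("abc");
    assert(d && d->size == 3 && d->data()[0] == u'a');
    toyFree(d);
}
```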

Literals and Conversion

String literals are the strings that appear directly in the source code, between quotes.
Here are some examples (suppose action, string, and filename are QStrings):

    o->setObjectName("MyObject");
    if (action == "rename")
        string.replace("%FileName%", filename);

In the first line, we call the function QObject::setObjectName(const QString&). There is an implicit conversion from const char* to QString, via its constructor. A new QStringData is allocated with enough room to hold "MyObject", and then the string is copied and converted from UTF-8 to UTF-16.

The same happens in the last line where the function QString::replace(const QString &, const QString &) is called. A new QStringData is allocated for "%FileName%".

Is there a way to prevent the allocation of QStringData and copy of the string?

Yes, one solution to avoid the costly creation of a temporary QString object is to have overloads of common functions that take a const char* parameter.
So we have these overloads for operator==:

bool operator==(const QString &, const QString &);
bool operator==(const QString &, const char *);
bool operator==(const char *, const QString &);

The overloads do not need to create a new QString object for our literal and can operate directly on the raw char*.
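
The effect of such an overload can be sketched with a toy class. This is an illustration under stated assumptions, not Qt code: ToyString and equalsViaTemporary are hypothetical, and the counter counts constructions as a stand-in for the heap allocation a real QString would perform.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical toy string that counts how often a new object is built.
class ToyString {
public:
    static int allocations;

    // Implicit on purpose, like QString(const char *).
    ToyString(const char *s) : text_(s) { ++allocations; }

    const std::string &text() const { return text_; }

private:
    std::string text_;
};

int ToyString::allocations = 0;

// Generic overload: a literal argument first becomes a temporary ToyString.
inline bool operator==(const ToyString &a, const ToyString &b)
{ return a.text() == b.text(); }

// Dedicated overload: compares against the raw characters, no temporary.
inline bool operator==(const ToyString &a, const char *b)
{ return std::strcmp(a.text().c_str(), b) == 0; }

// Forces the generic path, as a function that only takes const ToyString & would.
inline bool equalsViaTemporary(const ToyString &a, const ToyString &b)
{ return a == b; }
```

With both overloads visible, action == "rename" picks the const char* version because it needs no user-defined conversion, so the counter does not move. Calling equalsViaTemporary(action, "rename") builds one temporary first.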

Encoding and QLatin1String

In Qt5, we changed the default decoding for the char* strings to UTF-8. But many algorithms are much slower with UTF-8 than with plain ASCII or latin1.

Hence you can use QLatin1String, which is just a thin wrapper around char * that specifies the encoding. There are overloads taking QLatin1String for functions that can operate on the raw latin1 data directly without conversion.
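
Why latin1 is cheap to handle can be shown with a small sketch. ToyLatin1View is a hypothetical stand-in for QLatin1String, and std::u16string stands in for the UTF-16 data of a QString. This is not how Qt implements the comparison; it only shows that no conversion buffer is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for QLatin1String: a pointer plus a length, no copy.
struct ToyLatin1View {
    const char *chars;
    std::size_t length;
    explicit ToyLatin1View(const char *s) : chars(s), length(std::strlen(s)) {}
};

// Latin-1 values are the first 256 Unicode code points, so each byte can be
// widened and compared directly against a UTF-16 code unit.
inline bool operator==(const std::u16string &utf16, ToyLatin1View latin1)
{
    if (utf16.size() != latin1.length)
        return false;
    for (std::size_t i = 0; i < latin1.length; ++i) {
        if (utf16[i] != static_cast<unsigned char>(latin1.chars[i]))
            return false;
    }
    return true;
}
```

A UTF-8 input would need real decoding here, because one character can span several bytes. With latin1 the loop is a plain element-by-element comparison.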

So our first example now looks like:

    o->setObjectName(QLatin1String("MyObject"));
    if (action == QLatin1String("rename"))
        string.replace(QLatin1String("%FileName%"), filename);

The good news is that QString::replace and operator== have overloads for QLatin1String. So that is much faster now.

In the call to setObjectName, we avoided the conversion from UTF-8, but we still have an (implicit) conversion from QLatin1String to QString which has to allocate the QStringData on the heap.

Introducing QStringLiteral

Is it possible to avoid the allocation and copy of the string literal even for the cases like setObjectName? Yes, that is what QStringLiteral is doing.

This macro will try to generate the QStringData at compile time with all the fields initialized. It will even be located in the .rodata section, so it can be shared between processes.

We need two language features to do that:

  1. The possibility to generate UTF-16 at compile time:
    On Windows we can use the wide char L"String". On Unix we are using the new C++11 Unicode literal: u"String". (Supported by GCC 4.4 and clang.)
  2. The ability to create static data from expressions.
    We want to be able to put QStringLiteral everywhere in the code. One way to do that is to put a static QStringData inside a C++11 lambda expression. (Supported by MSVC 2010 and GCC 4.5.) We also made use of the GCC __extension__ ({ }) statement expression. Update: support for the GCC extension was removed before the beta because it does not work in every context where lambdas work, such as in default function arguments.
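
Both features can be tried in isolation with standard C++11. The macros TOY_U16 and TOY_STATIC_U16 and the function toyName are hypothetical names made up for this sketch. It returns a raw pointer instead of a QString, so it shows the technique and not the actual Qt macro.

```cpp
#include <cassert>

// Adjacent string literals are concatenated and the u prefix applies to the
// result, so u"" "abc" is the UTF-16 literal u"abc".
#define TOY_U16(str) u"" str

// Hypothetical macro: returns a pointer to static UTF-16 data. The lambda
// gives the static variable a scope of its own, so the macro can be used
// wherever an expression is allowed.
#define TOY_STATIC_U16(str) \
    ([]() -> const char16_t * { \
        static const char16_t literal[] = TOY_U16(str); \
        return literal; \
    }())

// The length in code units is computed by the compiler (terminator excluded).
static_assert(sizeof(TOY_U16("MyObject")) / sizeof(char16_t) - 1 == 8,
              "the size is known at compile time");

// A lambda without captures is also allowed in a default argument.
inline const char16_t *toyName(const char16_t *name = TOY_STATIC_U16("default"))
{ return name; }
```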

Implementation

We will need a POD structure that contains both the QStringData and the actual string. Its structure will depend on the method we use to generate UTF-16.

The code below was extracted from qstring.h, with added comments and edited for readability.


/* We define QT_UNICODE_LITERAL_II and declare the qunicodechar
   depending on the compiler */
#if defined(Q_COMPILER_UNICODE_STRINGS)
   // C++11 unicode string
   #define QT_UNICODE_LITERAL_II(str) u"" str
   typedef char16_t qunicodechar;
#elif __SIZEOF_WCHAR_T__ == 2
   // wchar_t is 2 bytes  (condition a bit simplified)
   #define QT_UNICODE_LITERAL_II(str) L##str
   typedef wchar_t qunicodechar;
#else
   typedef ushort qunicodechar; // fallback
#endif

// The structure that will contain the string.
// N is the string size
template <int N>
struct QStaticStringData
{
    QStringData str;
    qunicodechar data[N + 1];
};

// Helper class wrapping a pointer that we can pass to the QString constructor
struct QStringDataPtr
{ QStringData *ptr; };


#if defined(QT_UNICODE_LITERAL_II)
// QT_UNICODE_LITERAL needed because of macro expansion rules
# define QT_UNICODE_LITERAL(str) QT_UNICODE_LITERAL_II(str)
# if defined(Q_COMPILER_LAMBDA)

#  define QStringLiteral(str) \
    ([]() -> QString { \
        enum { Size = sizeof(QT_UNICODE_LITERAL(str))/2 - 1 }; \
        static const QStaticStringData<Size> qstring_literal = { \
            Q_STATIC_STRING_DATA_HEADER_INITIALIZER(Size), \
            QT_UNICODE_LITERAL(str) }; \
        QStringDataPtr holder = { &qstring_literal.str }; \
        const QString s(holder); \
        return s; \
    }()) \

# elif defined(Q_CC_GNU)
// Use the GCC __extension__ ({ }) trick instead of a lambda
// ... <skipped> ...
# endif
#endif

#ifndef QStringLiteral
// no lambdas, not GCC, or GCC in C++98 mode with 4-byte wchar_t
// fallback, return a temporary QString
// source code is assumed to be encoded in UTF-8
# define QStringLiteral(str) QString::fromUtf8(str, sizeof(str) - 1)
#endif

Let us simplify this macro a bit and look at how it would expand:

o->setObjectName(QStringLiteral("MyObject"));
// would expand to: 
o->setObjectName(([]() {
        // We are in a lambda expression that returns a QString

        // Compute the size using sizeof, (minus the null terminator)
        enum { Size = sizeof(u"MyObject")/2 - 1 };

        // Initialize. (This is static data initialized at compile time.)
        static const QStaticStringData<Size> qstring_literal =
        { { /* ref = */ -1, 
            /* size = */ Size, 
            /* alloc = */ 0, 
            /* capacityReserved = */ 0, 
            /* offset = */ sizeof(QStringData) },
          u"MyObject" };

         QStringDataPtr holder = { &qstring_literal.str };
         QString s(holder); // call the QString(QStringDataPtr&) constructor
         return s;
    }()) // Call the lambda
  );

The reference count is initialized to -1. A negative value is never incremented or decremented because we are in read only data.
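
The rule "a negative count means static data" can be modelled in a few lines. ToyRefCount is a hypothetical class written for this post. It follows the behaviour described above, but it is not the actual QtPrivate::RefCount.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of a reference count where a negative value marks
// static (read-only) data.
class ToyRefCount {
public:
    constexpr explicit ToyRefCount(int initial) : count_(initial) {}

    bool isStatic() const { return count_.load(std::memory_order_relaxed) < 0; }

    // Take a reference. Static data is left untouched.
    void ref()
    {
        if (!isStatic())
            count_.fetch_add(1, std::memory_order_relaxed);
    }

    // Drop a reference. Returns false when the caller must free the data.
    bool deref()
    {
        if (isStatic())
            return true;   // never written, never freed
        return count_.fetch_sub(1, std::memory_order_acq_rel) != 1;
    }

    int value() const { return count_.load(std::memory_order_relaxed); }

private:
    std::atomic<int> count_;
};
```

The important detail is that ref() and deref() only read the counter when it is negative. A write to the counter of a literal would touch memory that is mapped read-only.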

One can see why it is so important to have an offset (qptrdiff) rather than a pointer to the string (ushort*) as it was in Qt 4. It is indeed impossible to put a pointer in the read-only section, because pointers might need to be relocated at load time. That means that each time an application or library is loaded, the OS needs to rewrite all the pointer addresses using the relocation table.
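
The difference can be sketched with a toy structure. ToyHeader, ToyStaticString, toyHello and survivesCopy are hypothetical names and the header is reduced to two fields. Because the block contains only an offset and no address, it can be placed (or, in this sketch, copied) anywhere and still finds its own characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical header that reaches its characters through an offset.
struct ToyHeader {
    int size;
    std::ptrdiff_t offset;   // bytes from the header to the characters

    const char16_t *data() const
    {
        return reinterpret_cast<const char16_t *>(
            reinterpret_cast<const char *>(this) + offset);
    }
};

struct ToyStaticString {
    ToyHeader header;
    char16_t chars[6];
};

// Every initializer is a compile-time constant. No address is stored in the
// data, so nothing has to be patched when the binary is loaded.
static const ToyStaticString toyHello = {
    { 5, static_cast<std::ptrdiff_t>(offsetof(ToyStaticString, chars)) },
    u"Hello"
};

// Copies the block to another address and reads it through the copy.
// A stored pointer would still refer to the original block at this point.
inline bool survivesCopy()
{
    ToyStaticString copy;
    std::memcpy(&copy, &toyHello, sizeof(copy));
    return copy.header.data() == copy.chars && copy.header.data()[4] == u'o';
}
```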

Results

For fun, we can look at the assembly generated for a very simple call to QStringLiteral. We can see that there is almost no code, and how the data is laid out in the .rodata section.

We notice the overhead in the binary. The string takes twice as much memory since it is encoded in UTF-16, and there is also a header of sizeof(QStringData) = 24. This memory overhead is the reason why it still makes sense to use QLatin1String when the function you are calling has an overload for it.

QString returnAString() {
    return QStringLiteral("Hello");
}

Compiled with g++ -O2 -S -std=c++0x (GCC 4.7) on x86_64

    .text
    .globl  _Z13returnAStringv
    .type   _Z13returnAStringv, @function
_Z13returnAStringv:
    ; load the address of the QStringData into %rdx
    leaq    _ZZZ13returnAStringvENKUlvE_clEvE15qstring_literal(%rip), %rdx
    movq    %rdi, %rax
    ; copy the QStringData from %rdx to the QString return object
    ; allocated by the caller.  (the QString constructor has been inlined)
    movq    %rdx, (%rdi)
    ret
    .size   _Z13returnAStringv, .-_Z13returnAStringv
    .section    .rodata
    .align 32
    .type   _ZZZ13returnAStringvENKUlvE_clEvE15qstring_literal, @object
    .size   _ZZZ13returnAStringvENKUlvE_clEvE15qstring_literal, 40
_ZZZ13returnAStringvENKUlvE_clEvE15qstring_literal:
    .long   -1   ; ref
    .long   5    ; size
    .long   0    ; alloc + capacityReserved 
    .zero   4    ; padding
    .quad   24   ; offset
    .string "H"  ; the data. Each .string adds a terminating '\0'
    .string "e"
    .string "l"
    .string "l"
    .string "o"
    .string ""
    .string ""
    .zero   4
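
The sizes in this listing can be checked with a little arithmetic. The helper names below are hypothetical. The header size of 24 bytes is the x86_64 value quoted above, and the rounding to 8 bytes is an assumption that matches the trailing .zero 4 in the listing.

```cpp
#include <cassert>
#include <cstddef>

// Header size taken from the article: sizeof(QStringData) = 24 on x86_64.
constexpr std::size_t toyHeaderSize = 24;

// Rounds n up to a multiple of alignment (alignment must be a power of two).
constexpr std::size_t roundUp(std::size_t n, std::size_t alignment)
{ return (n + alignment - 1) & ~(alignment - 1); }

// Bytes for a static UTF-16 string of `length` characters: the header, two
// bytes per character plus terminator, padded to 8-byte alignment.
constexpr std::size_t utf16LiteralBytes(std::size_t length)
{ return roundUp(toyHeaderSize + (length + 1) * 2, 8); }

// Bytes for the same text as a plain Latin-1 literal with its terminator.
constexpr std::size_t latin1LiteralBytes(std::size_t length)
{ return length + 1; }

static_assert(utf16LiteralBytes(5) == 40, "matches the .size of 40 in the listing");
static_assert(latin1LiteralBytes(5) == 6, "five characters plus terminator");
```

For "Hello" that is 40 bytes of read-only data against 6 bytes for the plain literal, which is the overhead discussed above.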

Conclusion

I hope that, now that you have read this, you have a better understanding of where to use QStringLiteral and where not to.
There is another macro, QByteArrayLiteral, which works on exactly the same principle but creates a QByteArray.

Update: See also the internals of QMutex and more C++11 features in Qt5.


Article posted by Olivier Goffart on 21 May 2012
