Proof Of Concept: Re-implementing Qt moc using libclang

I have been trying to re-write Qt's moc using libclang from the LLVM project.

The result is moc-ng. It is really two different things:

  1. A plugin for clang to be used when compiling your code with clang;
  2. and an executable that can be used as a drop in replacement for moc.

What is moc again?

moc is a developer tool that is part of the Qt library. Its role is to handle Qt's extensions to C++ code, in order to offer introspection and enable Qt's signals and slots.

What are clang and libclang?

clang is the C and C++ frontend of the LLVM compiler. It is not only a compiler, though: it also contains a library (libclang) that helps you write tools which need to parse C++.

Motivation

moc is implemented using a custom, naive C++ parser which does just enough to extract the right information from your source files. The limitation is that it can sometimes choke on more complex C++ code, and it is not compatible with some of the features provided by newer versions of the C++ standard (such as C++11 trailing return types or advanced template argument types).

Using clang as a frontend gives it a complete parser that can handle even the most complicated constructs allowed by C++.

Having it as a plugin for clang would also make it possible to pass meta-data directly to LLVM without going through generated code. That would allow things that are not possible with generated code, such as having Q_OBJECT in a function-local class. (That's not yet implemented.)

Expressive Diagnostics

Clang also has a very good diagnostics framework, which allows better error analysis.
Compare the error from moc:

With the one from moc-ng:

See how I used clang's look-up system to check that the identifiers exist and to suggest typo corrections, whereas moc ignores such errors and you get a weird error in the generated code.

Meet moc-ng

moc-ng is my proof-of-concept attempt at re-implementing moc using clang as a frontend. It is not officially supported by the Qt Project.

It is currently in an alpha state, but it already works very well. I was able to replace moc and compile many Qt 5 modules, including qtbase, qtdeclarative and qt-creator.

All the Qt tests that I ran passed or had an expected failure (for example, tst_moc parses moc's error output, which has now changed).

Compatibility with the official moc

I have tried to stay as compatible as possible with the real moc, but there are some differences to be aware of.

Q_MOC_RUN

There is a Q_MOC_RUN macro that is defined when the original moc is run. It is typically used to hide from moc some complicated C++ constructs it would otherwise choke on. Because moc-ng needs to see the full C++ like a normal compiler does, it does not define this macro. This may be a problem when signals, slots or other Qt meta things are declared inside a Q_MOC_RUN block.
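
As an illustration, the typical use of Q_MOC_RUN looks roughly like the sketch below. The names TypeList and typeCount are made up for this example. A tool that defines Q_MOC_RUN never sees the guarded template, while a regular compiler (and moc-ng) parses everything.

```cpp
#include <vector>

// Hypothetical header content. The original moc defines Q_MOC_RUN while it
// runs, so it skips the guarded part. A regular compiler (and moc-ng) does
// not define it and parses everything.
#ifndef Q_MOC_RUN
template <typename... Ts>
struct TypeList {   // stands for a construct the old moc parser might choke on
    static constexpr unsigned size = sizeof...(Ts);
};
#endif

inline unsigned typeCount()
{
#ifndef Q_MOC_RUN
    return TypeList<int, double, std::vector<int>>::size;
#else
    return 0; // the only branch visible to a tool that defines Q_MOC_RUN
#endif
}
```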

Missing or not Self-Contained Headers

The official moc ignores any headers that are not found, so if include paths are not passed to moc, it won't complain. The moc parser also does not care whether a type has been declared, and it won't report any of those errors.

moc-ng has a stricter C++ parser that requires self-contained headers. Fortunately, clang recovers gracefully when there are errors, and I managed to turn all the errors into warnings. So when parsing a header that is not self-contained, or when the include flags are wrong, one gets lots of warnings from moc-ng.

Implementation details and Challenges

I am now going to go over some implementation details and challenges I encountered.

I used the C++ clang tooling API directly instead of libclang's C wrapper, even though the C++ API does not maintain source compatibility. The reasons are that the C++ API is much more complete, and that I want to use C++. I did not want to write a C++ wrapper around a C wrapper around the C++ clang.
In my experience with the code browser (which also uses the C++ API directly), there are not that many API changes, and keeping up with them is not that hard.

Annotations

The clang libraries parse the C++ and give you the AST. From that AST, one can list all the classes and their methods in a translation unit. It has all the information you can find in the code, with the location of each declaration.

But the pre-processor has removed all the special macros like signals or slots. I needed a way to know which methods are tagged with special Qt keywords.
At first, I thought I would use pre-processor hooks to remember the locations where those special macros are expanded. That could have worked, but there is a better way. I got the idea from the qt-creator wip/clang branch, which tries to use clang as a code model. They use the annotate attribute extension to annotate the methods. Annotations are meant exactly for this use case: annotating the source code with non-standard extensions so that a plugin can act upon them. And the good news is that they can be placed exactly where the signals or slots keywords can be placed.

#define Q_SIGNAL  __attribute__((annotate("qt_signal")))
#define Q_SLOT    __attribute__((annotate("qt_slot")))
#define Q_INVOKABLE  __attribute__((annotate("qt_invokable")))

#define signals    public Q_SIGNAL
#define slots      Q_SLOT
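
To make the effect concrete, here is a small sketch of a class whose methods carry such annotations. The DEMO_ macros and the Counter class are hypothetical stand-ins, not the real Qt macros. The annotate attribute is a clang extension, so the sketch only enables it when compiling with clang. Unlike a real Qt signal, valueChanged has an empty body here so that the example links without any generated code.

```cpp
// Sketch only: annotate is a clang extension, so it is enabled only for clang.
#if defined(__clang__)
#  define DEMO_SIGNAL __attribute__((annotate("qt_signal")))
#  define DEMO_SLOT   __attribute__((annotate("qt_slot")))
#else
#  define DEMO_SIGNAL
#  define DEMO_SLOT
#endif

// Hypothetical class; DEMO_SIGNAL and DEMO_SLOT stand in for Q_SIGNAL and Q_SLOT.
class Counter {
public:
    // With clang, this declaration carries annotate("qt_signal") in the AST.
    DEMO_SIGNAL void valueChanged(int) {}
    // With clang, this declaration carries annotate("qt_slot") in the AST.
    DEMO_SLOT void setValue(int v) { m_value = v; valueChanged(v); }
    int value() const { return m_value; }
private:
    int m_value = 0;
};
```

The annotations do not change how the code behaves; they only leave a marker on the declaration that a tool walking the AST can read back.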

We do the same for all the other macros that annotate methods. But we still need to find something for the macros that annotate classes: Q_OBJECT, Q_PROPERTY, Q_ENUMS.
Those were a bit more tricky. The solution I found is to use a static_assert with a given pattern. However, static_assert is C++11 only, and I want it to work without C++11 enabled. Fortunately, clang accepts C11's _Static_assert as an extension in all modes. Using this trick, I can walk the AST to find the specific static_assert that matches the pattern and get the content from the string literal.

#define QT_ANNOTATE_CLASS(type, annotation)  \
    __extension__ _Static_assert(sizeof (#annotation), #type);

#define Q_ENUMS(x) QT_ANNOTATE_CLASS(qt_enums, x)
#define Q_FLAGS(x) QT_ANNOTATE_CLASS(qt_flags, x)

#define Q_OBJECT   QT_ANNOTATE_CLASS(qt_qobject, "") \
        /*... other Q_OBJECT declarations ... */

We just have to replace the Qt macros with our macros. I do that by injecting code right when we exit qobjectdefs.h, which defines all the Qt macros.
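
The following sketch shows the same pattern with the C++11 spelling static_assert, so that it compiles with any C++11 compiler. The DEMO_ macros, Widget and demoAnnotationText are made up for the example. It also shows what the stringified argument looks like: the pre-processor collapses whitespace between tokens into a single space.

```cpp
#include <cstring>

// Sketch using the C++11 spelling; the article's macro uses _Static_assert
// so that it also works when C++11 is not enabled.
#define DEMO_ANNOTATE_CLASS(type, annotation) \
    static_assert(sizeof(#annotation), #type);

// Hypothetical helper exposing the string that the annotation turns into.
#define DEMO_STRINGIFY(x) #x

struct Widget {
    enum Shape { Round, Square };
    // Always true, so it costs nothing, but it stays visible in the AST
    // as a static_assert with the message "qt_enums".
    DEMO_ANNOTATE_CLASS(qt_enums, Shape)
};

inline const char *demoAnnotationText()
{
    return DEMO_STRINGIFY(Shape    Color);
}
```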

Tags

QMetaMethod::tag allows the programmer to put an arbitrary tag in front of a method for use by some extension. It is not used much. To my knowledge, only QtDBus relies on this feature, for Q_NOREPLY.

The problem is that this relies on macros that are defined only if Q_MOC_RUN is not defined. So I had to hack the pre-processor hooks to see when macros are defined in places that are conditioned on Q_MOC_RUN. I can do that because the pre-processor callbacks have hooks for #if and #endif, so I can see whether we are currently handling a block of code that would be hidden from moc. When a macro is defined there, I register it as a possible tag. Later, when such a macro is expanded, I register its location. For each method, I can then query whether there was a tag on the same line. There are many cases where this would fail. But fortunately, tags are not a commonly used feature, and the simple cases are covered.
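
The bookkeeping described above can be sketched in isolation as follows. This is a simplified model, not the moc-ng code: TagTracker and all its member names are hypothetical, and the callbacks are driven by hand here instead of by clang's pre-processor.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the pre-processor callbacks. None of these names
// are clang API.
class TagTracker {
public:
    // Called when entering a conditional block guarded by "#ifndef NAME".
    void onIfndef(const std::string &name) { m_stack.push_back(name == "Q_MOC_RUN"); }
    // Called on the matching #endif.
    void onEndif() { if (!m_stack.empty()) m_stack.pop_back(); }

    // A macro defined inside a block hidden from moc is a possible tag.
    void onMacroDefined(const std::string &macro) {
        if (insideHiddenBlock()) m_possibleTags.insert(macro);
    }
    // Remember the line on which a possible tag was expanded.
    void onMacroExpanded(const std::string &macro, unsigned line) {
        if (m_possibleTags.count(macro)) m_tagAtLine[line] = macro;
    }
    // Tag found on the same line as a method declaration, or "" if none.
    std::string tagForLine(unsigned line) const {
        auto it = m_tagAtLine.find(line);
        return it == m_tagAtLine.end() ? std::string() : it->second;
    }

private:
    bool insideHiddenBlock() const {
        for (bool hidden : m_stack)
            if (hidden) return true;
        return false;
    }
    std::vector<bool> m_stack;                   // one flag per open conditional
    std::set<std::string> m_possibleTags;        // macros defined in hidden blocks
    std::map<unsigned, std::string> m_tagAtLine; // line number -> expanded tag
};
```

Matching by line number is what makes this fragile: a tag on a different line than the method name, or two tagged methods on one line, would not be handled by this sketch.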

Suppressing The Errors

As stated, the Qt moc ignores most errors. I already tell clang not to parse the bodies of the functions, but you may still get errors if types used in declarations are not found. When moc-ng is run as a binary, it is desirable not to abort on those errors, for compatibility with moc. I did not find an easy way to change errors into warnings. You can promote some warnings to errors or change fatal errors into normal errors, but you cannot easily suppress errors or change them into warnings.

What I did was create my own diagnostic consumer, which proxies the errors to the default one but turns some of them into warnings. The problem is that clang would still count them as errors. So the hack I did was to reset the error count. I wish there was a better way.

When used as a plugin, there is only one kind of error that should be ignored: an include of "foo.moc". That file will not exist because moc is not run. Fortunately, clang has a callback for when an include file has not been found. If it looks like a file that should have been generated by moc (starting with moc_ or ending with .moc), then that include can be ignored.
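
The proxy idea can be sketched without clang as shown below. All types here (Level, Diagnostic, Consumer and so on) are hypothetical miniatures, not clang's classes, and the error-count reset is not modelled; the proxy merely counts how many errors it downgraded.

```cpp
#include <string>
#include <vector>

// Hypothetical miniature of a diagnostics pipeline; these are NOT clang's classes.
enum class Level { Warning, Error, Fatal };

struct Diagnostic {
    Level level;
    std::string message;
};

struct Consumer {
    virtual ~Consumer() = default;
    virtual void handle(const Diagnostic &d) = 0;
};

// Stand-in for the default consumer: stores what it receives
// (a real one would print it).
struct CollectingConsumer : Consumer {
    std::vector<Diagnostic> seen;
    void handle(const Diagnostic &d) override { seen.push_back(d); }
};

// Proxy: forwards everything to the next consumer, but reports plain errors
// as warnings. Fatal errors are passed through unchanged.
class DowngradingConsumer : public Consumer {
public:
    explicit DowngradingConsumer(Consumer &next) : m_next(next) {}

    void handle(const Diagnostic &d) override {
        Diagnostic copy = d;
        if (copy.level == Level::Error) {
            copy.level = Level::Warning;
            ++m_downgraded;
        }
        m_next.handle(copy);
    }

    unsigned downgraded() const { return m_downgraded; }

private:
    Consumer &m_next;
    unsigned m_downgraded = 0;
};
```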
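
The file name check can be as simple as the following sketch. The function name looksLikeMocOutput is made up, and the check is applied to the base name so that includes with a directory part are handled too (an assumption of this sketch).

```cpp
#include <string>

// Returns true if a missing include looks like a file moc would have
// generated: a base name starting with "moc_" or ending with ".moc".
inline bool looksLikeMocOutput(const std::string &includePath)
{
    // Strip any directory part so "dir/moc_foo.cpp" is recognised as well.
    const std::string::size_type slash = includePath.find_last_of("/\\");
    const std::string base =
        slash == std::string::npos ? includePath : includePath.substr(slash + 1);

    const std::string prefix = "moc_";
    const std::string suffix = ".moc";
    const bool hasPrefix = base.compare(0, prefix.size(), prefix) == 0;
    const bool hasSuffix = base.size() >= suffix.size() &&
        base.compare(base.size() - suffix.size(), suffix.size(), suffix) == 0;
    return hasPrefix || hasSuffix;
}
```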

Qt's Binary JSON

Since Qt 5, there is a Q_PLUGIN_METADATA macro which you can use to load a JSON file; moc then embeds this JSON in a proprietary binary format which is used internally by QJsonDocument.

I did not want to depend on Qt (to avoid the bootstrap issue). Fortunately, LLVM already has a good YAML parser (YAML being a superset of JSON), so parsing was not a problem at all. The problem was generating Qt's binary format. I spent too much time trying to figure out why Qt would not accept my binary before noticing that QJsonDocument enforces alignment constraints on some items. Bummer.
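
To illustrate what such an alignment constraint means for a writer, here is a minimal sketch. It assumes, purely for illustration, that items must start on a 4-byte boundary; the helper names are hypothetical and this is not the actual layout code for Qt's format.

```cpp
#include <cstdint>
#include <vector>

// Assumption for this sketch: items must start on a 4-byte boundary.
// Rounds a size up to the next multiple of 4.
inline std::uint32_t alignedSize(std::uint32_t size)
{
    return (size + 3u) & ~3u;
}

// Appends a payload, then pads with zero bytes so that whatever is written
// next starts at an aligned offset.
inline void appendAligned(std::vector<unsigned char> &out,
                          const std::vector<unsigned char> &payload)
{
    out.insert(out.end(), payload.begin(), payload.end());
    out.resize(alignedSize(static_cast<std::uint32_t>(out.size())), 0);
}
```

Forgetting the padding produces output that looks plausible in a hex dump but is rejected by a reader that validates offsets, which matches the kind of failure described above.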

Error Reporting within String Literal

When parsing the contents of things like Q_PROPERTY, I wish to report errors at their location in the source code. With the macro described earlier, the content of Q_PROPERTY is turned into a string literal. Clang supports reporting errors within string literals in macros. As you can see in the screenshot, this works pretty well.

But there are still two levels of indirection I would like to hide. It would be nice to hide some built-in macros from the diagnostic (I've hidden one level in the screenshot).
Also, I want to be able to report the location in the Q_PROPERTY line and not in the scratch space. But when # is used in a macro, clang no longer tracks the exact spelling location.

Consider compiling this snippet with clang. It should warn you that the escape sequences \o, \p and \q are not valid. Look at where the caret is placed for each warning:

#define M(A, B)  A "\p" #B;
char foo[] = M("\o",   \q );

For \o and \p, clang puts the caret at the right place when the macro is expanded. But for \q, the caret is not put at its spelling location.

The way clang tracks the real origin of a source location is very clever and efficient. Each source location is represented by a clang::SourceLocation, which is basically a 32-bit integer. The source location space is divided into consecutive entries that represent files or macro expansions. Each time a macro is expanded, a new macro expansion entry is added, containing the source location of the expansion and the location of the #define. In principle, there could be a new entry for each expanded token, but consecutive entries are merged.
One cannot do the same for stringified tokens, because the string literal is a single token that may come from many tokens. There are also some escaping rules to take into account, which makes it harder.

The way to do it is probably to leave the source locations as they are, but to special-case the scratch space when working out the location of the caret.
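
The idea of a location space divided into consecutive entries can be modelled in a few lines. This is a deliberately simplified, hypothetical model (Entry, findEntry and spellingLocation are invented names, and only one level of macro expansion is handled); clang's real data structures are more involved.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model; not clang's actual data structures.
struct Entry {
    std::uint32_t start;        // first offset covered by this entry
    bool isMacroExpansion;      // false: a file, true: a macro expansion
    std::uint32_t expansionLoc; // where the macro was expanded (if any)
    std::uint32_t spellingLoc;  // where the expanded tokens were written
};

// Entries are sorted by start; each one extends up to the start of the next.
// A binary search finds the entry that contains a given location.
inline const Entry *findEntry(const std::vector<Entry> &entries, std::uint32_t loc)
{
    auto it = std::upper_bound(entries.begin(), entries.end(), loc,
        [](std::uint32_t value, const Entry &e) { return value < e.start; });
    if (it == entries.begin())
        return nullptr;
    return &*(it - 1);
}

// Maps a location inside a macro expansion back to where it was spelled.
// Locations inside a file entry are returned unchanged.
inline std::uint32_t spellingLocation(const std::vector<Entry> &entries,
                                      std::uint32_t loc)
{
    const Entry *e = findEntry(entries, loc);
    if (!e || !e->isMacroExpansion)
        return loc;
    return e->spellingLoc + (loc - e->start);
}
```

In this model a stringified argument is the awkward case: the whole string literal is a single token, so the offset arithmetic in spellingLocation has no per-character mapping to fall back on.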

Built-in includes

Some headers required by the standard library are not located in a standard location, but are shipped with clang and looked up in ../lib/clang/3.2/include relative to the binary.
I don't want to require external files. I would like to have just a simple, single static binary without dependencies.

The solution is to bundle those headers within the binary. I have nothing like qrc resources, but I can do the same in a few lines of CMake:

file(GLOB BUILTINS_HEADERS "${LLVM_BIN_DIR}/../lib/clang/${LLVM_VERSION}/include/*.h")
foreach(BUILTIN_HEADER ${BUILTINS_HEADERS})
    file(READ ${BUILTIN_HEADER} BINARY_DATA HEX)
    string(REGEX REPLACE "(..)" "\\\\x\\1" BINARY_DATA "${BINARY_DATA}")
    string(REPLACE "${LLVM_BIN_DIR}/../lib/clang/${LLVM_VERSION}/include/" 
                   "/builtins/" FN "${BUILTIN_HEADER}")
    set(EMBEDDED_DATA "${EMBEDDED_DATA} { \"${FN}\" , \"${BINARY_DATA}\" } , ")
endforeach()
configure_file(embedded_includes.h.in embedded_includes.h)

This goes over all the *.h files in the built-in include directory and reads each of them into a hex string. The regexp then transforms that into something suitable for a C++ string literal. Finally, configure_file replaces @EMBEDDED_DATA@ with its value.
Here is what embedded_includes.h.in looks like:

static struct { const char *filename; const char *data; } EmbeddedFiles[] = {
    @EMBEDDED_DATA@
    {0, 0}
};
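
Looking up a header in such a table is then a simple loop that stops at the {0, 0} sentinel. The sketch below uses its own small table (DemoEmbeddedFiles, with one made-up entry) so that it is self-contained; findEmbedded is a hypothetical helper name.

```cpp
#include <cstring>

// Sample table in the same shape as the generated one; the entry is made up.
// The data is hex-escaped the same way the CMake regexp does it.
static const struct { const char *filename; const char *data; } DemoEmbeddedFiles[] = {
    { "/builtins/stddef.h", "\x2f\x2f\x20\x68\x69\x0a" },  // the text "// hi\n"
    { 0, 0 }
};

// Returns the contents for a path, or a null pointer if it is not embedded.
static const char *findEmbedded(const char *path)
{
    for (auto *f = DemoEmbeddedFiles; f->filename; ++f) {
        if (std::strcmp(f->filename, path) == 0)
            return f->data;
    }
    return nullptr;
}
```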

Conclusion

moc-ng was a fun project to do, just like developing our C/C++ code browser. The clang/LLVM frameworks are really powerful and nice to work with.

Please have a look at the moc-ng project on GitHub or browse the source online.

Article posted by Olivier Goffart on 10 June 2013