Proof Of Concept: Re-implementing Qt moc using libclang
I have been trying to re-write Qt's moc using libclang from the LLVM project.
The result is moc-ng. It is really two different things:
- A plugin for clang to be used when compiling your code with clang;
- and an executable that can be used as a drop in replacement for moc.
What is moc again?
moc is a developer tool which is part of the Qt library.
Its role is to handle Qt's extensions within the C++ code to offer introspection and enable
the Qt signals and slots.
What are clang and libclang?
clang is the C and C++ frontend to the LLVM compiler. It is not only a compiler though, it also contains a library (libclang) which helps to write a C++ parser.
Motivation
moc is implemented using a custom, naive C++ parser which does just enough to extract the right information from your source files.
The limitation is that it can sometimes choke on more complex C++ code, and it is not compatible with some of the
features provided by the new versions of the C++ standard
(such as C++11 functions with trailing return types or advanced templated argument types).
Using clang as a frontend gives it a proper parser that can handle even the most complicated constructs allowed by C++.
Having it as a plugin for clang would also allow passing meta-data directly to LLVM without going through the
generated code. This would make possible things that cannot be done with generated code, such as having Q_OBJECT
in a function-local class.
(That's not yet implemented.)
Expressive Diagnostics
Clang also has a very good diagnostics framework, which allows better error analysis.
[Screenshots: the error from moc, and the same error with moc-ng]
See how I used clang's look-up system to check the existence of the identifiers and suggest typo corrections,
while moc ignores such errors and you get a weird error in the generated code.
Meet moc-ng
moc-ng is my proof of concept attempt at re-implementing moc using clang as a frontend. It is not officially supported by the Qt Project.
It is currently in alpha state, but is already working very well. I was able to replace moc and compile many modules of Qt 5, including qtbase, qtdeclarative and qt-creator.
All the Qt tests that I ran passed or had an expected failure (for example, tst_moc parses moc's error output, which has now changed).
Compatibility with the official moc
I have tried as much as possible to stay compatible with the real moc. But there are some differences to be aware of.
Q_MOC_RUN
There is a Q_MOC_RUN macro that is defined when the original moc is run.
It is typically used to hide from moc some complicated C++ constructs it would otherwise choke on.
Because we need to see the full C++ like a normal compiler, we don't define it.
This may be a problem when signals, slots or other Qt meta things are defined in a Q_MOC_RUN block.
Missing or not Self-Contained Headers
The official moc ignores any headers that are not found.
So if include paths are not passed to moc, it won't complain. Also, the moc parser does not care if a type
has not been declared, and it won't report any of those errors.
moc-ng has a stricter C++ parser that requires self-contained headers. Fortunately, clang falls back gracefully when there are errors, and I managed to turn all the errors into warnings. So when parsing a header that is not self-contained, or if the include flags are wrong, one gets lots of warnings from moc-ng.
Implementation details and Challenges
I am now going to go over some implementation details and challenges I encountered.
I used the C++ clang tooling API directly, instead of using libclang's C wrapper,
even though the C++ API does not maintain source compatibility.
The reasons are that the C++ API is much more complete, and that I want to use C++.
I did not want to write a C++ wrapper around a C wrapper around the C++ clang.
In my experience with the code browser (which also uses the C++ API directly),
there are not so many API changes, and keeping compatibility is not that hard.
Annotations
The clang libraries parse the C++ and give the AST. From that AST, one can list all the classes and their methods in a translation unit. It has all the information you can find in the code, with the location of each declaration.
But the pre-processor removes all the special macros like signals or slots. I needed a way to know which methods are tagged
with special Qt keywords.
At first, I thought I would use pre-processor hooks to remember the locations where those special macros are expanded.
That could have worked. But there is a better way. I got the idea from the qt-creator wip/clang branch, which tries to use
clang as a code model. They use the attribute extension to annotate the methods.
Annotations are meant exactly for this use case: annotate the source code with non-standard extensions so a plugin
can act upon them. And the good news is that they can be placed exactly where the signals or slots keywords can be placed.
    #define Q_SIGNAL __attribute__((annotate("qt_signal")))
    #define Q_SLOT __attribute__((annotate("qt_slot")))
    #define Q_INVOKABLE __attribute__((annotate("qt_invokable")))
    #define signals public Q_SIGNAL
    #define slots Q_SLOT
We do the same for all the other macros that annotate methods.
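To see how these macros read in a class declaration, here is a minimal sketch. The Counter class and its members are made up for illustration. The annotating definitions are only enabled when the compiler is clang; as an assumption of this sketch, other compilers get empty definitions so that the file still builds. The body of the signal, which moc would normally generate, is written by hand here.

```cpp
// Annotating macros from the article, enabled only under clang.
// Assumption of this sketch: other compilers get empty fallbacks.
#if defined(__clang__)
#  define Q_SIGNAL    __attribute__((annotate("qt_signal")))
#  define Q_SLOT      __attribute__((annotate("qt_slot")))
#  define Q_INVOKABLE __attribute__((annotate("qt_invokable")))
#else
#  define Q_SIGNAL
#  define Q_SLOT
#  define Q_INVOKABLE
#endif
#define signals public Q_SIGNAL
#define slots Q_SLOT

// Hypothetical class, not taken from Qt or moc-ng.
class Counter {
public:
    Counter() : m_value(0), m_notifications(0) {}
    // The annotation sits in front of the method, like the Qt keyword would.
    Q_INVOKABLE int value() const { return m_value; }
    int notifications() const { return m_notifications; }

public slots:
    void increment() { ++m_value; valueChanged(m_value); }

signals:
    // moc would normally generate this body; it is hand-written here
    // so that the sketch links without any generated code.
    void valueChanged(int) { ++m_notifications; }

private:
    int m_value;
    int m_notifications;
};
```

Under clang, a tool walking the AST would find an annotate attribute attached to value() and to the two access specifiers, while the behavior of the class itself is unchanged.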
But we still need to find something for the macros that annotate classes: Q_OBJECT, Q_PROPERTY, Q_ENUMS.
Those were a bit more tricky. The solution I found is to use a static_assert with a given pattern.
However, static_assert is C++11 only and I want it to work without C++11 enabled.
Fortunately, clang accepts C11's _Static_assert as an extension in all modes.
Using this trick, I can walk the AST to find the specific static_assert that matches the pattern and get the content within
a string literal.
    #define QT_ANNOTATE_CLASS(type, annotation) \
        __extension__ _Static_assert(sizeof (#annotation), #type);
    #define Q_ENUMS(x) QT_ANNOTATE_CLASS(qt_enums, x)
    #define Q_FLAGS(x) QT_ANNOTATE_CLASS(qt_flags, x)
    #define Q_OBJECT QT_ANNOTATE_CLASS(qt_qobject, "") \
        /*... other Q_OBJECT declarations ... */
We just have to replace the Qt macros by our macros.
I do that by injecting code right when we exit qobjectdefs.h, which defines all the Qt macros.
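As a rough illustration of what ends up in the AST, the sketch below applies the same pattern to a made-up class. It deviates from the macro above in two ways that are assumptions of this sketch: it uses the standard C++11 static_assert instead of _Static_assert so that it builds with any C++11 compiler, and it compares the size against zero explicitly. The ANNOTATION_TEXT macro and the enumsAnnotation function are hypothetical and only exist to expose the string that the # operator produces.

```cpp
#include <string>

// Same shape as the article's macro, but with the standard C++11 keyword
// (assumption made for portability; moc-ng itself uses _Static_assert).
#define QT_ANNOTATE_CLASS(type, annotation) \
    static_assert(sizeof(#annotation) > 0, #type);
#define Q_ENUMS(x) QT_ANNOTATE_CLASS(qt_enums, x)

// Hypothetical helper: yields the string literal that ends up inside sizeof.
#define ANNOTATION_TEXT(x) #x

// Made-up class carrying one class-level annotation.
struct Task {
    enum Priority { Low, High };
    enum Color { Red, Green };
    // Expands to: static_assert(sizeof("Priority Color") > 0, "qt_enums");
    Q_ENUMS(Priority Color)
};

// The text a tool would read back from the string literal in the AST.
// Stringification collapses runs of whitespace into a single space.
inline std::string enumsAnnotation() {
    return ANNOTATION_TEXT(Priority   Color);
}
```

The assertion always holds, so it costs nothing at compile time; its message names the kind of annotation and the string inside sizeof carries the content.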
Tags
QMetaMethod::tag allows the programmer to leave any tag for some extension in front of a method.
It is not used much. To my knowledge, only QtDBus relies on this feature, for Q_NOREPLY.
The problem is that this relies on macros that are defined only if Q_MOC_RUN is not defined.
So I had to hack the pre-processor hooks to see when macros are defined in places that are conditioned on Q_MOC_RUN.
I can do that because the pre-processor callback has hooks on #if and #endif, so I can see if we are currently handling a block of code
that would be hidden from moc. When a macro is defined there, I register it as a possible tag.
Later, when such a macro is expanded, I register its location. For each method, I can then query whether there was a tag on
the same line. There are many cases where this would fail. But fortunately, tags are not a commonly used feature, and the simple cases
are covered.
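The bookkeeping described above can be modelled in a few lines of plain C++. This is only a toy model: the TagTracker class and its method names are hypothetical, the calls stand in for clang's pre-processor callbacks, and nesting of unrelated #if blocks is ignored.

```cpp
#include <map>
#include <set>
#include <string>

// Toy model of the tag bookkeeping. All names are hypothetical; the real
// implementation hooks into clang's pre-processor callbacks instead.
class TagTracker {
public:
    TagTracker() : m_hiddenDepth(0) {}

    // To be called on "#ifndef Q_MOC_RUN": the block is hidden from moc.
    void enterHiddenBlock() { ++m_hiddenDepth; }
    // To be called on the matching "#endif".
    void leaveHiddenBlock() { if (m_hiddenDepth > 0) --m_hiddenDepth; }

    // To be called on every "#define": macros defined inside a hidden
    // block are remembered as possible tags.
    void macroDefined(const std::string &name) {
        if (m_hiddenDepth > 0)
            m_possibleTags.insert(name);
    }

    // To be called on every macro expansion: only possible tags are recorded.
    void macroExpanded(const std::string &name, int line) {
        if (m_possibleTags.count(name))
            m_tagAtLine[line] = name;
    }

    // Returns the tag seen on the line of a method, or "" if there is none.
    std::string tagForLine(int line) const {
        std::map<int, std::string>::const_iterator it = m_tagAtLine.find(line);
        return it == m_tagAtLine.end() ? std::string() : it->second;
    }

private:
    int m_hiddenDepth;
    std::set<std::string> m_possibleTags;
    std::map<int, std::string> m_tagAtLine;
};
```

Matching by line number is what makes the approach fragile: a tag placed on the line above the method declaration would be missed by this model.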
Suppressing The Errors
As stated, the Qt moc ignores most errors. I already tell clang not to parse the bodies of functions. But you may still get errors if types used in declarations are not found. When moc-ng is run as a binary, it is desirable not to abort on those errors, for compatibility with moc. I did not find an easy way to change errors into warnings. You can promote some warnings into errors or change fatal errors to normal errors, but you cannot easily suppress errors or change them into warnings.
What I did is create my own diagnostic consumer, which proxies the errors to the default one, but turns some of them into warnings. The problem is that clang would still count them as errors. So the hack I did was to reset the error count. I wish there was a better way.
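The proxy idea can be modelled without clang. In the sketch below, every type and function (Diagnostic, Engine, DefaultConsumer, DowngradingConsumer, report) is a hypothetical stand-in and not part of clang's API. It only shows the shape of the hack: the engine counts the error before the consumer runs, so the proxy has to undo the count after downgrading.

```cpp
#include <string>
#include <vector>

enum Level { Warning, Error, Fatal };

struct Diagnostic {
    Level level;
    std::string message;
};

// Stand-in for the default consumer: it just stores what it is given.
struct DefaultConsumer {
    std::vector<Diagnostic> shown;
    void handle(const Diagnostic &d) { shown.push_back(d); }
};

// Stand-in for the engine, which counts errors before any consumer runs.
struct Engine {
    int errorCount;
    Engine() : errorCount(0) {}
};

// Proxy consumer: forwards every diagnostic, downgrading plain errors.
class DowngradingConsumer {
public:
    DowngradingConsumer(Engine &engine, DefaultConsumer &next)
        : m_engine(engine), m_next(next) {}

    void handle(Diagnostic d) {
        if (d.level == Error) {
            d.level = Warning;
            // The engine already counted this as an error: reset the count.
            m_engine.errorCount = 0;
        }
        m_next.handle(d);
    }

private:
    Engine &m_engine;
    DefaultConsumer &m_next;
};

// Reports a diagnostic the way the toy engine would: count, then dispatch.
inline void report(Engine &engine, DowngradingConsumer &consumer,
                   const Diagnostic &d) {
    if (d.level != Warning)
        ++engine.errorCount;
    consumer.handle(d);
}
```

Resetting the counter to zero is as blunt in this model as it sounds in the text: an earlier fatal error would be forgotten too.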
When used as a plugin, there is only one kind of error that one should ignore: an include of "foo.moc".
That file will not exist because moc is not run. Fortunately, clang has a callback for when an include file has not been found.
If it looks like a file that should have been generated by moc (starting with moc_ or ending with .moc),
then that include can be ignored.
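The file name test itself is simple. Here is a minimal sketch, assuming the callback hands over the include path as a string; the function name looksLikeMocOutput is made up, and looking only at the last path component is a choice of this sketch.

```cpp
#include <string>

// Returns true if a missing include looks like moc output, that is, if its
// file name starts with "moc_" or ends with ".moc".
inline bool looksLikeMocOutput(const std::string &includePath) {
    // Only look at the file name, not at the directories leading to it.
    const std::string::size_type slash = includePath.find_last_of('/');
    const std::string name = (slash == std::string::npos)
        ? includePath : includePath.substr(slash + 1);

    const std::string prefix = "moc_";
    const std::string suffix = ".moc";
    const bool hasPrefix = name.compare(0, prefix.size(), prefix) == 0;
    const bool hasSuffix = name.size() >= suffix.size() &&
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
    return hasPrefix || hasSuffix;
}
```

For example, looksLikeMocOutput("foo.moc") and looksLikeMocOutput("src/moc_foo.cpp") both hold, while looksLikeMocOutput("foo.h") does not.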
Qt's Binary JSON
Since Qt5, there is a macro Q_PLUGIN_METADATA which you can use to load a JSON file, and moc embeds
this JSON in a proprietary binary format which is used internally by QJsonDocument.
I did not want to depend on Qt (to avoid the bootstrap issue).
Fortunately, LLVM already has a good YAML parser (YAML being a superset of JSON), so parsing was not a problem at all.
The problem was generating Qt's binary format.
I spent too much time trying to figure out why Qt would not accept my binary before noticing that
QJsonDocument enforces alignment constraints on some items. Bummer.
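Honoring an alignment constraint when writing a binary blob usually comes down to padding each item up to a boundary. The sketch below shows that technique in isolation. The 4-byte boundary and the helper names alignUp and appendAligned are assumptions made for illustration; this is not a description of Qt's actual format.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rounds size up to the next multiple of alignment.
// alignment must be a power of two.
inline std::size_t alignUp(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

// Appends bytes to a buffer, then pads with zeros up to a 4-byte boundary.
// The 4-byte figure is an assumption of this sketch.
inline void appendAligned(std::vector<unsigned char> &out,
                          const unsigned char *data, std::size_t size) {
    out.insert(out.end(), data, data + size);
    out.resize(alignUp(out.size(), 4), 0);
}
```

With these helpers, a 5-byte item occupies 8 bytes in the output, so whatever follows it starts on an aligned offset.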
Error Reporting within String Literal
When parsing the contents of things like Q_PROPERTY, I wish to report an error at the location it is in the source code.
Using the macro described earlier, the content of Q_PROPERTY is turned into a string literal.
Clang supports reporting errors within string literals in macros. As you can see on the screenshot, this works pretty well.
But there are still two levels of indirection I would like to hide.
It would be nice to hide some built-in macros from the diagnostic (I've hidden one level in the screenshot).
Also, I want to be able to report the location in the Q_PROPERTY line and not in the scratch space.
But when using the # operator in a macro, clang does not track the exact spelling location anymore.
Consider compiling this snippet with clang. It should warn you about the escape sequences \o, \p and \q
not being valid. Look where the caret is for each warning:
    #define M(A, B) A "\p" #B;
    char foo[] = M("\o", \q );
For \o and \p, clang puts the caret at the right place when the macro is expanded.
But for \q, the caret is not put at its spelling location.
The way clang tracks the real origin of a source location is very clever and efficient.
Each source location is represented by a clang::SourceLocation, which is basically a 32-bit integer.
The source location space is divided into consecutive entries that represent files or macro expansions.
Each time a macro is expanded, a new macro expansion entry is added, containing the source location of the
expansion and the location of the #define.
In principle, there could be a new entry for each expanded token, but consecutive entries are merged.
One could not do the same for stringified tokens because the string literal is only one token,
but possibly comes from many tokens. There are also some escaping rules to take into account that
make it harder.
The way to do it is probably to leave the source locations as they are, but to have a special case for the scratch space when trying to find out the location of the caret.
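The idea of one integer space cut into consecutive entries can be pictured with a toy model. The Entry struct and the entryFor function below are hypothetical and much simpler than clang's real data structures; they only show how a plain 32-bit offset is enough to find the file or macro expansion it belongs to. The sketch needs C++11 for the fixed-width integer type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy model: the 32-bit location space is cut into consecutive entries.
struct Entry {
    std::uint32_t start;      // first offset covered by this entry
    bool isMacroExpansion;    // false for a file, true for a macro expansion
    std::string description;  // file name or macro name
};

// Ordering used by the search: true if offset lies before the entry.
inline bool startsAfter(std::uint32_t offset, const Entry &e) {
    return offset < e.start;
}

// Finds the entry that contains offset. Entries must be sorted by start,
// and the first entry must start at 0.
inline const Entry &entryFor(const std::vector<Entry> &entries,
                             std::uint32_t offset) {
    assert(!entries.empty() && entries.front().start == 0);
    // upper_bound returns the first entry starting after offset;
    // the one just before it is the entry that contains offset.
    std::vector<Entry>::const_iterator it =
        std::upper_bound(entries.begin(), entries.end(), offset, startsAfter);
    return *(it - 1);
}
```

In this model, a location is cheap to copy and compare, and the cost of finding where it came from is a binary search over the entries.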
Built-in includes
Some headers required by the standard library are not located in a standard
location, but are shipped with clang and looked up in ../lib/clang/3.2/include relative to the binary.
I don't want to require external files. I would like to have a simple, single static binary without dependencies.
The solution is to bundle those headers within the binary.
I have nothing like qrc resources, but I can do the same in a few lines of cmake:
    file(GLOB BUILTINS_HEADERS "${LLVM_BIN_DIR}/../lib/clang/${LLVM_VERSION}/include/*.h")
    foreach(BUILTIN_HEADER ${BUILTINS_HEADERS})
        file(READ ${BUILTIN_HEADER} BINARY_DATA HEX)
        string(REGEX REPLACE "(..)" "\\\\x\\1" BINARY_DATA "${BINARY_DATA}")
        string(REPLACE "${LLVM_BIN_DIR}/../lib/clang/${LLVM_VERSION}/include/" "/builtins/" FN "${BUILTIN_HEADER}")
        set(EMBEDDED_DATA "${EMBEDDED_DATA} { \"${FN}\" , \"${BINARY_DATA}\" } , ")
    endforeach()
    configure_file(embedded_includes.h.in embedded_includes.h)
This just goes over all *.h files in the builtin include directory and reads each of them into a hex string. The regexp then transforms that
into something suitable for a C++ string literal. Then configure_file will replace @EMBEDDED_DATA@ by its value.
Here is what embedded_includes.h.in looks like:
    static struct { const char *filename; const char *data; } EmbeddedFiles[] = {
        @EMBEDDED_DATA@
        {0, 0}
    };
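Once the table is compiled in, looking up a header is a linear scan up to the {0, 0} sentinel. The sketch below uses a small hand-written table in place of the generated one. The two entries, the struct name EmbeddedFile and the function findEmbeddedFile are made up for illustration.

```cpp
#include <cstring>

// Sample table in the shape produced from embedded_includes.h.in.
// The two entries are made up; the real ones come from @EMBEDDED_DATA@.
static const struct EmbeddedFile {
    const char *filename;
    const char *data;
} EmbeddedFiles[] = {
    { "/builtins/stddef.h", "\x74\x79\x70\x65\x64\x65\x66" },  // "typedef"
    { "/builtins/stdarg.h", "\x23\x70\x72\x61\x67\x6d\x61" },  // "#pragma"
    { 0, 0 }
};

// Returns the contents of an embedded header, or a null pointer if absent.
inline const char *findEmbeddedFile(const char *filename) {
    // The table ends with an entry whose filename is null.
    for (const EmbeddedFile *f = EmbeddedFiles; f->filename; ++f) {
        if (std::strcmp(f->filename, filename) == 0)
            return f->data;
    }
    return 0;
}
```

Escaping every byte as \xNN, as the cmake regexp does, keeps the generated literal valid whatever the header contains, including quotes and backslashes.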
Conclusion
moc-ng was a fun project to do, just like developing our C/C++ code browser. The clang/llvm frameworks are really powerful and nice to work with.
Please have a look at the moc-ng project on GitHub or browse the source online.
Article posted by Olivier Goffart on 10 June 2013