Data initialization in C++
In this blog post, I am going to review the different kinds of data and how they are initialized in a program.
What I am going to explain here is valid for Linux and GCC.
Code Example
I'll just start by showing a small piece of code. What is going to interest us is where the data will end up in memory and how it is initialized.
    const char string_data[] = "hello world"; // .rodata
    const int even_numbers[] = { 0*2, 1*2, 2*2, 3*2, 4*2 }; // .rodata
    int all_numbers[] = { 0, 1, 2, 3, 4 }; // .data
    static inline int odd(int n) { return n*2 + 1; }
    const int odd_numbers[] = { odd(0), odd(1), odd(2), odd(3), odd(4) }; // initialized
    QString qstring_data("hello QString"); // object with constructor and destructor
I'll analyze the assembly. It has been generated with the following command, then re-formatted for better presentation in this blog post.
g++ -O2 -S data.cpp
(I also had to add a function that uses the data, to prevent the compiler from removing the arrays that were otherwise unused.)
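The post does not show that extra function, so the following is only a guess at what it could look like. The name use_data and its bounds check are assumptions made for illustration; the QString is left out so that the sketch compiles without Qt.

```cpp
#include <cassert>

const char string_data[] = "hello world";
const int even_numbers[] = { 0*2, 1*2, 2*2, 3*2, 4*2 };
int all_numbers[] = { 0, 1, 2, 3, 4 };
static inline int odd(int n) { return n*2 + 1; }
const int odd_numbers[] = { odd(0), odd(1), odd(2), odd(3), odd(4) };

// Hypothetical helper (not from the original post): it reads every array
// through an index that is only known at run-time, so the compiler has to
// keep all of them in the generated assembly.
int use_data(int i)
{
    assert(i >= 0 && i < 5); // all arrays have at least 5 elements
    return string_data[i] + even_numbers[i] + all_numbers[i] + odd_numbers[i];
}
```

Because the function has external linkage, the compiler has to emit it along with the arrays it reads, even if nothing in the file calls it.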
The sections
On Linux, the binaries (programs or libraries) are stored as files in the ELF format. Those files are composed of many sections. I'll just go over a few of them:
The code: .text
This section is the actual code of your library or program: it contains all the instructions for each function. That part of the binary is mapped into memory and shared between all the processes that use it (provided the library is compiled as position-independent code, which is usually the case).
I am not interested in the code in this blog post, so let us move on to the data sections.
The read-only data: .rodata
This section will be loaded the same way as the .text section is loaded. It will also be shared between processes.
It contains the arrays that are marked as const, such as string_data and even_numbers.
    .section .rodata
    _ZL11string_data:
        .string "hello world"
    _ZL12even_numbers:
        .long 0
        .long 2
        .long 4
        .long 6
        .long 8
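You can see that even though the even_numbers array was initialized with multiplications, the compiler was able to optimize them away and generate the array at compile time.
The _ZL11 that is part of the name is the mangling: because the variable is const, it has internal linkage, which GCC encodes as _ZL followed by the length of the identifier (11 characters for string_data).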
Writable data: .data
The .data section contains the pre-initialized data that is not read-only.
This section is not shared between processes but copied for each process that uses it.
(Actually, with the copy-on-write optimization in the kernel, a page might need to be copied only if its data changes.)
That is where our all_numbers array goes, since it has not been declared as const.
    .data
    all_numbers:
        .long 0
        .long 1
        .long 2
        .long 3
        .long 4
Initialized at run-time: .bss + .ctors
The calls to odd() are not constant expressions, so GCC does not emit the array as static data: it has to be initialized at run-time. (As the assembly below shows, the optimizer still folded each call down to a constant.)
Where will our odd_numbers array be stored?
It will not be stored in the binary; instead, some space will be reserved in the .bss section. That section is just memory that is allocated to each process and initialized to 0.
The binary also contains a section with code that is going to be executed before main() is called.
    .section .text.startup
    _GLOBAL__sub_I_odd_numbers:
        movl $1, _ZL11odd_numbers(%rip)
        movl $3, _ZL11odd_numbers+4(%rip)
        movl $5, _ZL11odd_numbers+8(%rip)
        movl $7, _ZL11odd_numbers+12(%rip)
        movl $9, _ZL11odd_numbers+16(%rip)
        ret

    .section .ctors,"aw",@progbits
        .quad _GLOBAL__sub_I_odd_numbers

    .local _ZL11odd_numbers
    ; reserve 20 bytes in the .bss section
    .comm _ZL11odd_numbers,20,16
The .ctors section contains a table of pointers to functions that are going to be called by the loader before it calls main(). In our case, there is only one: the code that initializes the odd_numbers array.
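To observe this from within a program, here is a minimal sketch in which odd() is given a side effect, so that its calls cannot be folded away. The counter odd_calls and the function check_initialized_before_main are hypothetical names introduced for this illustration.

```cpp
#include <cassert>

// Statically zero-initialized: this counter needs no start-up code.
static int odd_calls = 0;

// Same odd() as above, plus a side effect that forces a real run-time call.
static int odd(int n) { ++odd_calls; return n*2 + 1; }

// Dynamically initialized: the start-up code fills this array before main().
static const int odd_numbers[] = { odd(0), odd(1), odd(2), odd(3), odd(4) };

// Hypothetical check, meant to be called as the first statement of main():
// by then the initializer has already run, once per array element.
void check_initialized_before_main()
{
    assert(odd_calls == 5);
    assert(odd_numbers[0] == 1 && odd_numbers[4] == 9);
}
```

Calling check_initialized_before_main() at the very top of main() should pass, since the five calls to odd() happened before main() was entered.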
Global Object
How about our QString?
It is a global C++ object with a constructor and destructor.
It is simply initialized by running the constructor at start-up.
    .section .rodata.str1.1,"aMS",@progbits,1
    .LC0:
        .string "hello QString"

    .section .text.startup,"ax",@progbits
    _GLOBAL__sub_I_qstring_data:
        ; QString constructor (inlined)
        movl $-1, %esi
        movl $.LC0, %edi
        call _ZN7QString16fromAscii_helperEPKci
        movq %rax, _ZL12qstring_data(%rip)
        ; register the destructor
        movl $__dso_handle, %edx
        movl $_ZL12qstring_data, %esi
        movl $_ZN7QStringD1Ev, %edi
        jmp __cxa_atexit ; (tail call)
This is the code of the constructor, which has been inlined.
We can also see that the code calls the function __cxa_atexit with $_ZN7QStringD1Ev and $_ZL12qstring_data as its first two parameters, which are respectively a function pointer to the QString destructor and the address of the QString object.
In other words, this code registers the destructor of the QString to be run on exit.
The third parameter, $__dso_handle, is a handle to this dynamic shared object (used, for example, to run the destructor when a plugin is unloaded).
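The same behaviour can be observed without Qt. The sketch below uses a hypothetical Tracer class (not part of the original post) whose constructor and destructor print a message and maintain a counter of live objects.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for QString: any class with a non-trivial
// constructor and destructor is handled the same way.
struct Tracer {
    explicit Tracer(const char *n) : name(n)
    {
        ++live;
        std::printf("construct %s\n", name);
    }
    ~Tracer()
    {
        --live;
        std::printf("destroy %s\n", name);
    }
    const char *name;
    static int live; // number of Tracer objects currently alive
};
int Tracer::live = 0;

// Constructed by start-up code before main(); its destructor is
// registered at that point and runs when the program exits.
static Tracer global_tracer("global_tracer");

// Hypothetical check, meant to be called from main().
void check_global_constructed()
{
    assert(Tracer::live == 1);
}
```

When linked into a program, the "construct" line should be printed before anything in main() runs, and the "destroy" line after main() returns.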
What is the problem with global objects with constructor?
- The order in which the constructors of objects in different translation units are called is not specified by the C++ standard. If you have dependencies between your global objects, you will run into trouble.
- All the constructors of all the globals in all the libraries need to be run before main(), which slows down the startup of the application (even for objects that will never be used).
This is why it is not recommended to have global objects in libraries.
Instead, one can use function static objects, which are initialized on the first use.
(Qt provides a macro for that: Q_GLOBAL_STATIC, which is made public in Qt 5.1.)
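As an illustration of the function static approach in plain C++ (this is not how Q_GLOBAL_STATIC is implemented), here is a minimal sketch. The Settings class, the globalSettings() accessor and demo_lazy_init() are hypothetical names.

```cpp
#include <cassert>

// Hypothetical class standing in for an object that is costly to construct.
struct Settings {
    Settings() : verbosity(0) { ++constructed; }
    int verbosity;
    static int constructed; // how many times the constructor has run
};
int Settings::constructed = 0;

// The object lives inside the function: nothing is constructed at start-up.
// The constructor runs the first time the function is called, and only then.
Settings &globalSettings()
{
    static Settings instance;
    return instance;
}

// Hypothetical usage: both calls return the same, single instance.
void demo_lazy_init()
{
    globalSettings().verbosity = 2;
    assert(globalSettings().verbosity == 2);
    assert(Settings::constructed == 1);
}
```

If globalSettings() is never called, the constructor never runs, so an unused object costs nothing at start-up. Since C++11, the initialization of a function static is also guaranteed to be thread-safe.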
Here comes C++11
C++11 comes with a new feature: constexpr.
That keyword can be used in two ways. If you specify that a function is constexpr, it means that the function can be run at compile time.
If you specify that a variable is constexpr, it means that its value can be computed at compile time.
Let us slightly modify the example above and see what it does:
    static inline constexpr int odd(int n) { return n*2 + 1; }
    constexpr int odd_numbers[] = { odd(0), odd(1), odd(2), odd(3), odd(4) };
Two constexpr keywords were added.
    .section .rodata
    _ZL11odd_numbers:
        .long 1
        .long 3
        .long 5
        .long 7
        .long 9
Now they are generated at compile time.
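One way to verify this without reading the assembly is static_assert, which only accepts values known at compile time. The following is a small sketch along those lines; the function demo_constexpr_at_runtime is a hypothetical addition.

```cpp
#include <cassert>

constexpr int odd(int n) { return n*2 + 1; }
constexpr int odd_numbers[] = { odd(0), odd(1), odd(2), odd(3), odd(4) };

// These only compile if the values are available at compile time.
static_assert(odd(3) == 7, "odd() can be evaluated by the compiler");
static_assert(odd_numbers[4] == 9, "the array contents are compile-time constants");

// A constexpr function can still be called with run-time arguments;
// in that case it behaves like an ordinary inline function.
void demo_constexpr_at_runtime(int n)
{
    assert(odd(n) == 2*n + 1);
}
```

Removing constexpr from odd() should make both static_assert lines fail to compile, which is the compile-time counterpart of the .ctors entry shown earlier.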
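If a class has a constructor that is declared as constexpr and has a trivial destructor (no user-defined destructor), you can use it as a global object, and it will be initialized at compile time as long as the constructor arguments are themselves constant expressions.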
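A minimal sketch of such a class, written for C++11. The Point class, the start_point object and demo_constexpr_point() are hypothetical names chosen for this example.

```cpp
#include <cassert>

// Hypothetical class: constexpr constructor, no user-defined destructor.
struct Point {
    constexpr Point(int x_, int y_) : x(x_), y(y_) {}
    constexpr int sum() const { return x + y; }
    int x;
    int y;
};

// Global object initialized at compile time: no start-up code is needed.
constexpr Point start_point(3, 4);

// Proof that the members are known to the compiler.
static_assert(start_point.x == 3, "x is a compile-time constant");
static_assert(start_point.sum() == 7, "member functions can be constexpr too");

// Hypothetical usage at run-time: the object is used like any other.
void demo_constexpr_point()
{
    assert(start_point.sum() == 7);
}
```

Adding a user-defined destructor to Point would make it a non-literal type in C++11, and the constexpr declaration of start_point would then be rejected.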
Since Qt 4.8, there is a macro Q_DECL_CONSTEXPR which expands to constexpr if the compiler supports it, or to nothing otherwise.
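The idea behind such a macro can be sketched as follows. This is not Qt's actual definition: the macro name MY_DECL_CONSTEXPR and the use of the __cplusplus value to detect support are assumptions made for this example.

```cpp
#include <cassert>

// Hypothetical macro: expands to constexpr when compiling as C++11 or
// later, and to nothing otherwise.
#if __cplusplus >= 201103L
#  define MY_DECL_CONSTEXPR constexpr
#else
#  define MY_DECL_CONSTEXPR
#endif

// Compiles in both modes. Only with C++11 or later can the calls be used
// where a constant expression is required.
MY_DECL_CONSTEXPR int odd(int n) { return n*2 + 1; }

static const int odd_numbers[] = { odd(0), odd(1), odd(2), odd(3), odd(4) };

// Hypothetical usage: the values are the same in either mode; only the
// moment at which they are computed may differ.
void demo_decl_constexpr()
{
    assert(odd_numbers[4] == 9);
}
```

With this approach the same header can be used by old and new compilers, and the newer ones get the compile-time initialization for free.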
Article posted by Olivier Goffart on 16 May 2013