Coding Tricks 101: How to Save the Assembler Code Generated by GCC
Posted on 2013-01-24 18:07 by Timo Bingmann. Tags: #c++ #coding tricks #frontpage
This is the first issue of a series of blog posts about some Linux coding tricks I have collected in the last few years.
Folklore says that compilers are among the most complex computer programs written today. They incorporate many optimization algorithms, inline functions and fold constant expressions, all without changing the output, correctness or side effects of the code. If you think about it, the work that gcc, llvm and other compilers do is really amazing, and it mostly works just great.
Sometimes, however, you want to know exactly what a compiler does with your C/C++ code. Most straightforward questions can be answered using a debugger. However, if you want to verify whether the compiler really applies the optimizations that your intuition expects, a debugger is usually not useful, because an optimized program can look very different from the original source. Some example questions are:
- Is a local integer variable stored in a register and how long does it exist?
- Does the compiler use special instructions for a simple copy loop?
- Are special conditional instructions used for an if or switch statement?
- Is a specific function inlined or called each time?
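To try these questions on something small, a test file along the following lines can be compiled and inspected. This is only a sketch: the names add_one(), copy_ints() and clamp_and_bump() are made up for illustration, and whether the loop is vectorized, the branch replaced or the helper inlined depends on the gcc version, the target and the flags.

```c
#include <stddef.h>

/* Hypothetical helper: small enough that an optimizing compiler
 * may inline it into its callers instead of emitting a call. */
static int add_one(int x)
{
    return x + 1;
}

/* Hypothetical copy loop: a candidate for special (e.g. vector)
 * instructions when optimization is enabled. */
void copy_ints(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

/* Contains an if statement: check in the listing whether the
 * compiler emits a branch or a conditional instruction for it. */
int clamp_and_bump(int value, int limit)
{
    if (value > limit)
        value = limit;
    return add_one(value);
}
```

Each function corresponds to one of the questions above, so the generated assembler code for each can be looked at in isolation.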
These questions can be answered definitively by investigating the compiler's output. On the Net, there are multiple "online compilers" that visualize the assembler output of popular compilers for small pieces of code: see the "GCC Explorer" or "C/C++ to Assembly v2". However, for inspecting parts of a larger project these tools are impractical, because the interesting pieces are embedded in much larger source files.
Luckily, gcc does not output binary machine code directly. Instead, it internally writes assembler code, which is then translated by as into binary machine code (actually, gcc creates more intermediate structures along the way). This internal assembler code can be written to a file, with some annotation to make it easier to read.
Example with Generated Assembler Output
In this post, we will consider the following simple C program test.c:
#include <stdio.h>

int some_function(int a)
{
    a *= 5; // a is not really needed.
    return 42;
}

int main(int argc, char* argv[])
{
    int variableA = 10;

    printf("This is a test program.\n");

    variableA += 5;
    variableA += some_function(20);

    return variableA;
}
The standard method to generate assembler code from a C program is to run gcc -S test.c, which writes a file test.s. But with some additional command line tricks, much more detailed assembler output is available. Running the following compile line will not only output the binary, but also divert a copy of the intermediate assembler code into test.s. The parameters -Wa,-adhln=test.s -g -fverbose-asm -masm=intel can be added to any gcc call.
$ gcc test.c -o test -Wa,-adhln=test.s -g -fverbose-asm -masm=intel
The additional compiler option -Wa,-adhln=test.s instructs gcc to pass an option on to the internally called assembler: "-adhln=test.s". This tells as to write a listing to test.s according to the following sub-options (from as --help):
-a[sub-option...]   turn on listings
                    Sub-options [default hls]:
                    c      omit false conditionals
                    d      omit debugging directives
                    g      include general info
                    h      include high-level source
                    l      include assembly
                    m      include macro expansions
                    n      omit forms processing
                    s      include symbols
                    =FILE  list to FILE (must be last sub-option)
The non-assembler options -g -fverbose-asm -masm=intel yield a more detailed assembler listing. The debug option -g makes it possible to interleave the assembler listing with the original C code. With -fverbose-asm, gcc adds comments about which variable is manipulated in a register. And -masm=intel changes the assembler syntax to Intel's style instead of the AT&T style. Intel's style puts the destination operand first, which I prefer because it resembles the way C assignments are written.
For most practical analysis, you will want to look at the assembler code that gcc generates with additional optimization options:
$ gcc test.c -o test -Wa,-adhln=test-O3.s -g -fverbose-asm -masm=intel -O3 -march=native
The resulting listing test-O3.s is rather verbose, and some shortened excerpts are reproduced below. The listing is composed of parts of the original C code (prefixed with linenum:file ****) intermingled with the assembler code generated for those lines. Obviously, the generated assembler code is not meant to be read by human eyes, but one can nevertheless usually recognize a lot of details, even with only moderate knowledge of assembler code.
[... header ...]
  56                    .p2align 4,,15
  57                    .globl some_function
  59                some_function:
  60                .LFB22:
  61                    .file 1 "test.c"
   1:test.c        **** #include <stdio.h>
   2:test.c        ****
   3:test.c        **** int some_function(int a)
   4:test.c        **** {
  62                    .loc 1 4 0
  63                    .cfi_startproc
  64                .LVL0:
   5:test.c        ****     a *= 5; // a is not really needed.
   6:test.c        ****     return 42;
   7:test.c        **** }
  65                    .loc 1 7 0
  66 0000 B82A0000      mov eax, 42 #,
  66      00
  67 0005 C3            ret
  68                    .cfi_endproc
For example, in some_function(), the original return 42; generated lines 66 and 67: move the value 42 to eax and return. However, a *= 5; does not generate any code; it was optimized out. Nevertheless, the original comment from the C code is included in the listing.
  69                .LFE22:
  71                    .section .rodata.str1.1,"aMS",@progbits,1
  72                .LC0:
  73 0000 54686973      .string "This is a test program."
  73      20697320
  73      61207465
  73      73742070
  73      726F6772
  74                    .text
  75 0006 662E0F1F      .p2align 4,,15
  75      84000000
  75      0000
  76                    .globl main
  78                main:
  79                .LFB23:
   8:test.c        ****
   9:test.c        **** int main(int argc, char* argv[])
  10:test.c        **** {
  80                    .loc 1 10 0
  81                    .cfi_startproc
  82                .LVL1:
  83 0010 4883EC08      sub rsp, 8 #,
  84                .LCFI0:
  85                    .cfi_def_cfa_offset 16
  86                    .file 2 "/usr/include/bits/stdio2.h"
   1:/usr/include/bits/stdio2.h **** /* Checking macros for stdio functions. */
[... more lines from stdio2.h ...]
 102:/usr/include/bits/stdio2.h **** __extern_always_inline int
 103:/usr/include/bits/stdio2.h **** printf (__const char *__restrict __fmt, ...)
 104:/usr/include/bits/stdio2.h **** {
 105:/usr/include/bits/stdio2.h ****   return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
  87                    .loc 2 105 0
  88 0014 BF000000      mov edi, OFFSET FLAT:.LC0 #,
  88      00
  89                .LVL2:
  90 0019 E8000000      call puts #
  90      00
  91                .LVL3:
  11:test.c        ****     int variableA = 10;
  12:test.c        ****
  13:test.c        ****     printf("This is a test program.\n");
  14:test.c        ****
  15:test.c        ****     variableA += 5;
  16:test.c        ****     variableA += some_function(20);
  17:test.c        ****
  18:test.c        ****     return variableA;
  19:test.c        **** }
  92                    .loc 1 19 0
  93 001e B8390000      mov eax, 57 #,
  93      00
  94 0023 4883C408      add rsp, 8 #,
  95                .LCFI1:
  96                    .cfi_def_cfa_offset 8
  97 0027 C3            ret
  98                    .cfi_endproc
Of main() only the bare bones are left: calling puts() (line 90) and returning 57 (line 93). Apparently gcc was able to identify that the string "This is a test program.\n" contains no formatting instructions, like %d, and can therefore be printed using puts(). The string itself is put into the .rodata section as ASCII characters, without the trailing newline because puts() appends one itself, and labelled with .LC0 (lines 71-73).
The constant return value of some_function() was folded with "+= 5" and the initial 10 into simply moving 57 to eax in line 93 and returning. The binary code for some_function() still exists, because it must remain callable from an externally linked object, but within main() the call is inlined.
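Put differently, the optimized main() behaves as if it had been written by hand in folded form. The following sketch is a reconstruction from the listing above, not something gcc outputs; the names main_as_written() and main_as_optimized() are made up so that both variants can sit in one file.

```c
#include <stdio.h>

int some_function(int a)
{
    a *= 5; /* dead store: the result is never used */
    return 42;
}

/* The body of main() from test.c, as written in the source. */
int main_as_written(void)
{
    int variableA = 10;

    printf("This is a test program.\n");

    variableA += 5;
    variableA += some_function(20);

    return variableA;
}

/* Hand-folded equivalent of what the -O3 listing shows:
 * printf() became puts(), which appends the newline itself,
 * and 10 + 5 + 42 was folded into the constant 57. */
int main_as_optimized(void)
{
    puts("This is a test program.");
    return 57;
}
```

Both functions print the same line and return the same value; only the amount of work left for run time differs.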
The real challenge of inspecting larger projects is to find the assembler code corresponding to the parts you are interested in. The following methods have proven efficient:
- Marking the parts with a unique C/C++ comment and then searching for it in the interleaved debug lines.
- Noting the line number of interest, and searching for "<linenum>:<file>" in the assembler listing.
Some closing notes about my own observations:
- Many simple loops use modern vector instructions, while more complex parts yield slower basic instructions.
- When using optimization, integral variables exist only as long as they are used, and the register allocator does exactly what you think it should.
- However, not all template inlining works as one would expect: most importantly, functor objects are not inlined; e.g. for std::map the comparator call cannot be inlined. In general: only static const template attributes are inlined.
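The std::map observation is about C++, but a similar question can be asked in plain C, where a comparator is passed to qsort() as a function pointer. The sketch below is a C analogue for experimenting, not the case described above; compare_ints() and sort_ints() are illustrative names. Compiling it with the listing options from this post shows whether compare_ints() still exists as a separate function and how sort_ints() hands it to qsort().

```c
#include <stdlib.h>

/* Comparator passed by function pointer, loosely analogous to a
 * comparator functor in C++. */
static int compare_ints(const void *lhs, const void *rhs)
{
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;

    return (a > b) - (a < b); /* -1, 0 or 1 without overflow */
}

/* Sorts `count` integers in ascending order using the C library. */
void sort_ints(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
}
```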
Happy analysing the assembly output of gcc.