Coding Tricks 101: How to Save the Assembler Code Generated by GCC

Posted on 2013-01-24 18:07 by Timo Bingmann. Tags: #c++ #coding tricks #frontpage

This is the first issue of a series of blog posts about some Linux coding tricks I have collected in the last few years.

Folklore says that compilers are among the most complex computer programs written today. They incorporate many optimization algorithms, inline functions and fold constant expressions; all without changing output, correctness or side effects of the code. If you think about it, the work gcc, llvm and other compilers do is really amazing and mostly works just great.

Sometimes, however, you want to know exactly what a compiler does with your C/C++ code. Most straightforward questions can be answered using a debugger. However, if you want to verify whether the compiler really applies the optimizations that your intuition expects, then a debugger is usually not useful, because optimized programs can look very different from the original source. Some example questions are:

Is a local integer variable stored in a register and how long does it exist?
Does the compiler use special instructions for a simple copy loop?
Are special conditional instructions used for an if or switch statement?
Is a specific function inlined or called each time?
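
As a rough sketch, the four questions above could be probed with a small file like the following, one tiny function per question. The file name probe.c and all function names are made up for illustration and do not come from the original example.

```c
/* probe.c: one tiny function per question (all names are illustrative). */
#include <stddef.h>

/* Q1: does `sum` live in a register for the whole loop? */
int sum_to(int n)
{
    int sum = 0;
    for (int i = 1; i <= n; ++i)
        sum += i;
    return sum;
}

/* Q2: does the compiler use special instructions (or a library call)
 * for this simple copy loop? */
void copy_bytes(char *dst, const char *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

/* Q3: is this branch compiled to a conditional instruction or to a jump? */
int max_of(int a, int b)
{
    if (a > b)
        return a;
    return b;
}

/* Q4: is square() inlined into its caller or called each time? */
static int square(int x)
{
    return x * x;
}

int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}
```

No main() is needed for inspection: compiling with gcc -c probe.c together with the listing options described later in this post should be enough to read the code generated for each function. The answers depend on the compiler version, target and flags, so the comments only pose the questions.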

These questions can be answered definitively by inspecting the compiler's output. On the Net, there are multiple "online compilers" that visualize the assembler output of popular compilers for small pieces of code: see the "GCC Explorer" or "C/C++ to Assembly v2". However, for inspecting parts of a larger project, these tools are impractical, because the interesting pieces are embedded in much larger source files.

Luckily, gcc does not output binary machine code directly. Instead, it internally writes assembler code, which is then translated by as into binary machine code (actually, gcc creates more intermediate structures along the way). This internal assembler code can be written to a file, with some annotation to make it easier to read.

Example with Generated Assembler Output

In this post, we will consider the following simple C program test.c:

#include <stdio.h>

int some_function(int a)
{
    a *= 5;     // a is not really needed.
    return 42;
}

int main(int argc, char* argv[])
{
    int variableA = 10;

    printf("This is a test program.\n");

    variableA += 5;
    variableA += some_function(20);
    
    return variableA;
}

The standard method to generate assembler code from a C program is to run gcc -S test.c. This will output a file test.s. But with some additional command line tricks, much more detailed assembler output is available. Running the following compile line will not only produce the binary, but also write the intermediate assembler code to test.s. The parameters after "-o test" can be added to any gcc call.

$ gcc test.c -o test -Wa,-adhln=test.s -g -fverbose-asm -masm=intel

The additional compiler option -Wa,-adhln=test.s instructs gcc to pass additional options to the internally called assembler: "-adhln=test.s". These tell as to output a listing to test.s according to the following parameters (from as -help):

-a[sub-option...]    turn on listings
                     Sub-options [default hls]:
                     c      omit false conditionals
                     d      omit debugging directives
                     g      include general info
                     h      include high-level source
                     l      include assembly
                     m      include macro expansions
                     n      omit forms processing
                     s      include symbols
                     =FILE  list to FILE (must be last sub-option)

The non-assembler options -g -fverbose-asm -masm=intel yield a more detailed assembler listing. The debug option -g provides the line information needed to interleave the assembler listing with the original C code. With -fverbose-asm, gcc adds comments about which variable is manipulated in a register. And -masm=intel changes the assembler syntax to Intel's style, instead of the default AT&T style. Intel's style puts the destination operand first, so assignments read right-to-left, which I prefer because it resembles the way C assignments are written.

For most practical analysis, you will want to look at the assembler code generated by gcc with additional optimization options:

$ gcc test.c -o test -Wa,-adhln=test-O3.s -g -fverbose-asm -masm=intel -O3 -march=native

The resulting listing test-O3.s is rather verbose, and some shortened excerpts are reproduced below. The listing is composed of parts of the original C code (prefixed with linenum:file ****) intermingled with the assembler code generated for those lines. Obviously, the generated assembler code is not meant to be read by human eyes, but nevertheless one can usually recognize a lot of details, even with only moderate knowledge of assembler code.

[... header ...]
  56                            .p2align 4,,15
  57                    .globl some_function
  59                    some_function:
  60                    .LFB22:
  61                            .file 1 "test.c"
   1:test.c        **** #include <stdio.h>
   2:test.c        ****
   3:test.c        **** int some_function(int a)
   4:test.c        **** {
  62                            .loc 1 4 0
  63                            .cfi_startproc
  64                    .LVL0:
   5:test.c        ****     a *= 5;     // a is not really needed.
   6:test.c        ****     return 42;
   7:test.c        **** }
  65                            .loc 1 7 0
  66 0000 B82A0000              mov	eax, 42	#,
  66      00
  67 0005 C3                    ret
  68                            .cfi_endproc

For example, in some_function(), the original return 42; generated the lines 66 and 67: move the value to eax and return. However, a *= 5; does not generate any code; it was optimized out. Nevertheless, the original comment from the C code is included in the listing.

  69                    .LFE22:
  71                            .section	.rodata.str1.1,"aMS",@progbits,1
  72                    .LC0:
  73 0000 54686973              .string	"This is a test program."
  73      20697320
  73      61207465
  73      73742070
  73      726F6772
  74                            .text
  75 0006 662E0F1F              .p2align 4,,15
  75      84000000
  75      0000
  76                    .globl main
  78                    main:
  79                    .LFB23:
   8:test.c        ****
   9:test.c        **** int main(int argc, char* argv[])
  10:test.c        **** {
  80                            .loc 1 10 0
  81                            .cfi_startproc
  82                    .LVL1:
  83 0010 4883EC08              sub	rsp, 8	#,
  84                    .LCFI0:
  85                            .cfi_def_cfa_offset 16
  86                            .file 2 "/usr/include/bits/stdio2.h"
   1:/usr/include/bits/stdio2.h **** /* Checking macros for stdio functions. */
[... more lines from stdio2.h ...]
 102:/usr/include/bits/stdio2.h **** __extern_always_inline int
 103:/usr/include/bits/stdio2.h **** printf (__const char *__restrict __fmt, ...)
 104:/usr/include/bits/stdio2.h **** {
 105:/usr/include/bits/stdio2.h ****   return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
  87                            .loc 2 105 0
  88 0014 BF000000              mov	edi, OFFSET FLAT:.LC0	#,
  88      00
  89                    .LVL2:
  90 0019 E8000000              call	puts	#
  90      00
  91                    .LVL3:
  11:test.c        ****     int variableA = 10;
  12:test.c        ****
  13:test.c        ****     printf("This is a test program.\n");
  14:test.c        ****
  15:test.c        ****     variableA += 5;
  16:test.c        ****     variableA += some_function(20);
  17:test.c        ****
  18:test.c        ****     return variableA;
  19:test.c        **** }
  92                            .loc 1 19 0
  93 001e B8390000              mov	eax, 57	#,
  93      00
  94 0023 4883C408              add	rsp, 8	#,
  95                    .LCFI1:
  96                            .cfi_def_cfa_offset 8
  97 0027 C3                    ret
  98                            .cfi_endproc

Of main() only the bare bones are left: calling puts() (line 90) and returning 57 (line 93). Apparently gcc was able to identify that the string "This is a test program.\n" contains no formatting instructions, like %d, and can therefore be printed using puts(). Since puts() appends a newline itself, the trailing "\n" is dropped from the stored string. The string itself is put into the .rodata section as ASCII characters and labelled with .LC0 (lines 71-73).

The constant return value of some_function() was folded with "+= 5" and the initial 10, so that main() just moves 57 to eax in line 93 and returns. The binary code for some_function() still exists, because it must be callable from an externally linked object, but within main() it is inlined.
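
To observe the effect of linkage and inlining in the listing yourself, a variation like the following may help. It is only a sketch: the file name and the names helper_static, helper_noinline, caller_folded and caller_with_call are hypothetical, and __attribute__((noinline)) is a GCC extension rather than standard C.

```c
/* inline_probe.c: hypothetical variation of test.c for inspecting inlining. */

/* Internal linkage: no other object file can call this function, so an
 * optimizing gcc is free to inline it everywhere and omit a standalone copy. */
static int helper_static(int a)
{
    a *= 5;     /* result unused, expected to be optimized out */
    return 42;
}

/* GCC extension: exclude this function from inlining. The caller is then
 * expected to keep a call instruction, although GCC's documentation notes
 * that calls to side-effect-free functions can still be optimized away. */
__attribute__((noinline)) int helper_noinline(int a)
{
    a *= 5;
    return 42;
}

/* Same arithmetic as main() in test.c, using the static helper. */
int caller_folded(void)
{
    int variableA = 10;
    variableA += 5;
    variableA += helper_static(20);
    return variableA;
}

/* Same arithmetic again, using the helper that must not be inlined. */
int caller_with_call(void)
{
    int variableA = 10;
    variableA += 5;
    variableA += helper_noinline(20);
    return variableA;
}
```

Both callers compute the same value as main() in test.c. Comparing their listings at -O3 should show whether a label for helper_static is emitted at all and whether caller_with_call still contains a call; the exact result may differ between gcc versions.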

The real challenge of inspecting larger projects is to find the assembler code corresponding to the parts you are interested in. The following methods have proven efficient:

Marking the parts with a unique C/C++ comment and then searching for it in the interleaved debug lines.
Noting the line number of interest, and searching for "<linenum>:<file>" in the assembler listing.
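
The first method can be sketched as follows. The marker text ASMMARK-dot-loop is an arbitrary tag invented for this illustration, as are the file and function names; any string that is unique within the project would do.

```c
/* marker_probe.c: the marker text below is an arbitrary, hypothetical tag. */
#include <stddef.h>

long dot_product(const int *x, const int *y, size_t n)
{
    long acc = 0;
    /* ASMMARK-dot-loop: search the listing for this tag */
    for (size_t i = 0; i < n; ++i)
        acc += (long)x[i] * y[i];
    return acc;
}
```

After compiling with the listing options, searching the listing file for ASMMARK-dot-loop should land on the interleaved source line. With optimization enabled the instructions may be reordered, so the loop's code is likely nearby but not necessarily directly below the marker.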

Some closing notes about my own observations:

Many simple loops use modern vector instructions, while more complex parts yield slower basic instructions.
When using optimization, integral variables exist only as long as they are used, and the register allocator does exactly what you think it should.
However, not all template inlining works as one would expect: most importantly, functor objects are not inlined; e.g. for std::map the comparator call cannot be inlined. In general: only static const template attributes are inlined.

Happy analysing the assembly output of gcc.
