18.3. Compiling C Programs with GCC

When you run GCC, its default behavior is to produce an executable program file from source code. To start with a simple example, we'll run GCC to make a finished executable program from the C source code in Example 1-1 at the beginning of this book:

$ gcc -Wall circle.c

This command line contains only the compiler's name, the source file name, and one option: -Wall instructs GCC to print warnings if it finds certain problems in the program (see the section "Compiler Warnings," later in this chapter, for more information). If there are no errors in the source code, GCC runs and exits without writing to the screen. Its output is a program file in the current working directory with the default name a.out. (In Windows, the default name is a.exe.) We can run this new program file:

$ ./a.out

which produces the screen output shown in Example 1-1.

If you do not want the executable program file to be named a.out, you can specify an output filename on the command line using the -o option:

$ gcc -Wall -o circle circle.c

This command produces the same executable, but it is now named circle.

18.3.1. Step by Step

The following sections present GCC options to let you control each stage of the compiling process: preprocessing, compiling, assembling, and linking. You can also perform the individual steps by invoking separate tools, such as the C preprocessor cpp, the assembler as, and the linker ld. GCC can also be configured to use such external programs on a given host system. For the sake of a uniform overview, however, this chapter shows you how to perform all four steps by invoking GCC and letting it control the process.

18.3.1.1. Preprocessing

Before submitting the source code to the actual compiler, the preprocessor executes directives and expands macros in the source files (see steps 1 through 4 in the section "The C Compiler's Translation Phases" in Chapter 1). GCC ordinarily leaves no intermediate output file containing the results of this preprocessing stage. However, you can save the preprocessor output for diagnostic purposes by using the -E option, which directs GCC to stop after preprocessing. The preprocessor output is directed to the standard output stream, unless you indicate an output filename using the -o option:

$ gcc -E -o circle.i circle.c

Because header files can be large, the preprocessor output from source files that include several headers is often unwieldy.

You may find it helpful to use the -C option as well, which prevents the preprocessor from removing comments from source and header files:

$ gcc -E -C -o circle.i circle.c
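
To see what the preprocessing stage does, it can help to run it on a small file that defines a macro of its own. The following is only a sketch: the filename area_macro.c, the macro SQUARE, and the function square_area are hypothetical names chosen for illustration, and are not part of Example 1-1.

```c
/* area_macro.c: hypothetical file for inspecting preprocessor output.
 * Assumed commands:
 *   gcc -E -o area_macro.i area_macro.c      (comments are removed)
 *   gcc -E -C -o area_macro.i area_macro.c   (comments are kept)
 */

/* Function-like macro: the preprocessor replaces each use with its body. */
#define SQUARE(x) ((x) * (x))

/* In area_macro.i, the return statement no longer mentions SQUARE;
 * it contains the expanded expression ((side) * (side)) instead. */
double square_area(double side)
{
    return SQUARE(side);
}
```

Because this file includes no headers, the resulting area_macro.i stays short enough to read in full.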

The following commonly used options affect GCC's behavior in the preprocessor phase:


-D name[=definition]

Defines the macro name before preprocessing the source files. If no definition is given, name is defined as 1. The macro name must not be defined in the source and header files themselves. Use this option together with #ifdef name directives in the source code for conditional compiling.


-U name

"Undefines" the symbol name, if defined on the command line or in GCC's default settings. The -D and -U options are processed in the order in which they occur on the command line.


-I directory

When header files are required by #include directives in the source code, search for them in the specified directory, in addition to the system's standard include directories. To add several directories, repeat the option once for each directory.

The usual search order for include directories is:

  1. The directory containing the given source file (for filenames given in quotation marks in an #include directive).

  2. Directories specified by -I options, in command-line order.

  3. Directories specified in the environment variables C_INCLUDE_PATH and CPATH.

  4. The system's default include directories.


-I-

This option divides any -Idirectory options on the command line into two groups. All directories appended to an -I option to the left of -I- are not searched for header files named in angle brackets in an #include directive, such as this one:

#include <stdio.h>

Instead, they are searched only for header files named in quotation marks in the #include directive, thus:

#include "myheader.h"

The second group consists of any directories named in an -I option to the right of -I-. These directories are searched for header files named in any #include directive.

Furthermore, if -I- appears on the command line, then the directory containing the source file is no longer automatically searched first for header files.
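
The effect of the -D and -U options described above can be tried out with a small source file like the following. This is a sketch under stated assumptions: the filename buffer.c, the macro names BUFFER_SIZE and VERBOSE, and both functions are hypothetical. The #ifndef guard makes the source define BUFFER_SIZE only when the command line has not already done so.

```c
/* buffer.c: hypothetical file showing conditional compiling with -D and -U.
 * Assumed commands:
 *   gcc -Wall -c buffer.c                             (defaults apply)
 *   gcc -Wall -DBUFFER_SIZE=256 -DVERBOSE -c buffer.c
 *   gcc -Wall -DVERBOSE -UVERBOSE -c buffer.c         (VERBOSE ends up undefined)
 */
#include <stddef.h>   /* size_t */

#ifndef BUFFER_SIZE
#define BUFFER_SIZE 64   /* default when -DBUFFER_SIZE=... is not given */
#endif

/* Returns the buffer size selected at compile time. */
size_t buffer_size(void)
{
    return BUFFER_SIZE;
}

/* Returns 1 if the file was compiled with -DVERBOSE, otherwise 0. */
int verbose_enabled(void)
{
#ifdef VERBOSE
    return 1;
#else
    return 0;
#endif
}
```

Compiled without any -D options, buffer_size() returns the default of 64 and verbose_enabled() returns 0.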

18.3.1.2. Compiling

At the heart of the compiler's job is the translation of C programs into the machine's assembly language.[*] Assembly language is a human-readable programming language that correlates closely to the actual machine code. Consequently, there is a different assembly language for each CPU architecture.

[*] Actually, as a retargetable compiler, GCC doesn't translate C statements directly into the target machine's assembly language, but uses an intermediate language, called Register Transfer Language or RTL, between the input language and the assembly-language output. This abstraction layer allows the compiler to choose the most economical way of coding a given operation in any context. Furthermore, an abstract description of the target machine in an interchangeable file provides a structured way to retarget the compiler to new architectures. From the point of view of GCC users, though, we can ignore this intermediate step.

Assembly language is often referred to more simply as "assembler." Strictly speaking, however, the term "assembler" refers to the program that translates assembly language into machine code. In this chapter, we use "assembly language" to refer to the human-readable code and "assembler" to refer to the program that translates assembly language into a binary object file.


Ordinarily GCC stores its assembly-language output in temporary files, and deletes them immediately after the assembler has run. But you can use the -S option to stop the compiling process after the assembly-language output has been generated. If you do not specify an output filename, GCC with the -S option creates an assembly-language file with a name ending in .s for each input file compiled. An example:

$ gcc -S circle.c

The compiler preprocesses circle.c and translates it into assembly language, and saves the results in the file circle.s. To include the names of C variables as comments on the assembly language statements that access those variables, use the additional option -fverbose-asm:

$ gcc -S -fverbose-asm circle.c

18.3.1.3. Assembling

Because each machine architecture has its own assembly language, GCC invokes an assembler on the host system to translate the assembly-language program into executable binary code. The result is an object file, which contains the machine code to perform the functions defined in the corresponding source file, and also contains a symbol table describing all objects in the file that have external linkage.

If you invoke GCC to compile and link a program in one command, then its object files are only temporary, and are deleted after the linker has run. Most often, however, compiling and linking are done separately. The -c option instructs GCC not to link the program, but to produce an object file with the filename ending .o for each input file:

$ gcc -c circle.c

This command produces the object file circle.o.
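
To make the distinction between external and internal linkage more concrete, here is a minimal sketch of a source file that contains both kinds of identifiers. The filename linkage.c and all of the identifiers in it are hypothetical and chosen only for illustration.

```c
/* linkage.c: hypothetical file contrasting external and internal linkage.
 * Assumed command: gcc -Wall -c linkage.c   (produces linkage.o)
 */

/* External linkage: described in the object file's symbol table so that
 * other modules can refer to it. */
int call_count = 0;

/* Internal linkage (static): usable only inside this source file. */
static int clamp_to_zero(int value)
{
    return value < 0 ? 0 : value;
}

/* External linkage: callable from other object files after linking. */
int record_call(int weight)
{
    call_count += clamp_to_zero(weight);
    return call_count;
}
```

A symbol-listing tool such as nm (part of GNU binutils, and not covered in this section) can be used to inspect the symbols in the resulting object file.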

You can use GCC's option -Wa to pass command-line options to the assembler itself. For example, suppose we want the assembler to run with the following options:


-as=circle.sym

Print the module's symbol table in a separate listing, and save the specified listing output in a file named circle.sym.


-L

Include local symbols (that is, symbols representing C identifiers with internal linkage) in the symbol table. (Don't confuse this assembler option with the GCC option -L!)

We can have GCC add these options to its invocation of the assembler by appending them as a comma-separated list to GCC's own -Wa option:

$ gcc -v -o circle -Wa,-as=circle.sym,-L circle.c

The list must begin with a comma after -Wa, and must contain no spaces. You can also use additional -Wa options in the same command. The -v option, which makes GCC print the options applied at each step of compiling, allows you to see the resulting assembler command line (along with a great deal of other information).

You can append several switches to the assembler's -a option to control the listing output. For a full reference, see the assembler's manual. The default listing output, produced when you simply specify -a with no additional switches, contains the assembly language code followed by the symbol table.

GCC's -g option makes the compiler include debugging information in its output. If you specify the -g option in addition to the assembler's -a option, then the resulting assembly language listing is interspersed with the corresponding lines of C source code:

$ gcc -g -o circle -Wa,-a=circle.list,-L circle.c

The resulting listing file, circle.list, allows you to examine line by line how the compiler has translated the C statements in the program circle.

18.3.1.4. Linking

The linker joins a number of binary object files into a single executable file. In the process, it has to complete the external references among your program's various modules by substituting the final locations of the objects for the symbolic references. The linker does this using the same information that the assembler provides in the symbol table.

Furthermore, the linker must also add the code for any C standard library functions you have used in your program. In the context of linking, a library is simply a set of object files collected in a single archive file for easier handling.

When you link your program to a library, only its member object files containing the functions you use are actually linked into your program. To make libraries of your own out of object files that you have compiled, use the utility ar; see its manual page for information.


The bulk of the standard library functions are ordinarily in the file libc.a (the ending .a stands for "archive") or in a shareable version for dynamic linking in libc.so (the ending .so stands for "shared object"). These libraries are generally in /lib/ or /usr/lib/, or in another library directory that GCC searches by default.

Certain functions are contained in separate library files, such as the standard library's floating-point math functions. To demonstrate how to link such libraries, let us replace the definition of pi in circle.c with another one. In Example 1-1, the variable pi was initialized with a literal:

const double pi = 3.1415926536;    // Pi is a constant

We can initialize pi using the result of the arc tangent function by replacing that line with this one:

const double pi = 4.0 * atan(1.0);   // because tan(pi/4) = 1

Of course we will add the directive #include <math.h> at the beginning of the source file to declare the new external function. But the atan( ) function is not defined in the source code, nor in libc.a. To compile circle.c with this change, we have to use the -l option to link the math library as well:

$ gcc -o circle circle.c -lm

The filename of the math library is libm.a. (On systems that support dynamic linking, GCC automatically uses the shared library libm.so, if it is available. See "Dynamic Linking and Shared Object Files," later in this chapter, for more details.) The prefix lib and the suffix .a are standard, and GCC adds them automatically to whatever base name follows the -l on the command line: in this case, m.
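
As a self-contained illustration of code that depends on the math library, the following sketch wraps the atan( ) initialization in a function. The filename circle_atan.c and the function name circular_area are assumptions made for this example and are not taken from Example 1-1. The command in the comment names the library after the source file, which is the order in which the linker resolves references.

```c
/* circle_atan.c: hypothetical variant of the circle example using atan().
 * Assumed command: gcc -Wall -c circle_atan.c
 * When linked into a program, add -lm after the object files that use it.
 */
#include <math.h>   /* declares atan() */

/* Returns the area of a circle with the given radius. */
double circular_area(double radius)
{
    const double pi = 4.0 * atan(1.0);   /* because tan(pi/4) = 1 */
    return pi * radius * radius;
}
```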

Normally, GCC automatically searches for a file with the library's name in standard library directories, such as /usr/lib. There are three ways to link a library that is not in a path where GCC searches for it. One is to present GCC with the full path and filename of the library as if it were an object file. For example, if the library were named libmath.a and located in /usr/local/lib, the following command would make GCC compile circle.c, then link the resulting circle.o with libmath.a:

$ gcc -o circle circle.c /usr/local/lib/libmath.a

In this case the library filename must be placed after the name of the source or object files that use it. This is because the linker works through the files on its command line sequentially, and does not go back to an earlier library file to resolve a reference in a later object.

The second way to link a library not in GCC's search path is to use the -L option to add another directory for GCC to search for libraries:

$ gcc -o circle -L/usr/local/lib circle.c -lmath

You can add more than one library directory by using multiple -L options. The third way to make sure GCC finds the necessary libraries is to list the directories containing your libraries in the environment variable LIBRARY_PATH.

You can pass options directly to the linker stage using -Wl followed by a comma-separated list, as in this command:

$ gcc -Wl,-M circle.c circulararea.c -lm > circle.map

The option -Wl,-M on the GCC command line passes the option -M to the linker command line, instructing the linker to print a link script and a memory map of the linked executable on standard output. (By itself, -M would be a preprocessor option to make GCC produce a dependency rule for use in a makefile.)

The list must begin with a comma after -Wl, and must contain no spaces. In case of doubt, you can use several -Wl options in the same GCC command line. Use the -v option to see the resulting linker command.

18.3.1.5. All of the above

There is another GCC option that offers a convenient way to obtain all the intermediate output files at once, and that is -save-temps. When you use that option, GCC will compile and link normally, but will save all preprocessor output, assembly language, and object files in the current directory. The intermediate files produced with the -save-temps option have the same base filename as the corresponding source files, with the endings .i, .s, and .o for preprocessor output, assembly language, and object files, respectively.

18.3.1.6. None of the above

If you invoke GCC with the option -fsyntax-only, it does not compile, assemble, or link, and it produces no output files. It merely checks the input files for correct syntax. See also "Compiler Warnings," later in this chapter.

18.3.2. Multiple Input Files

In Chapter 1, we went on to divide circle.c into two separate source files (see Examples 1-2 and 1-3). Compiling multiple source files results in multiple object files, each containing the machine code and symbols corresponding to the objects in one source file. GCC uses temporary files for the object output, unless you use the option -c to instruct it to compile only, and not link:

$ gcc -c circle.c
$ gcc -c circulararea.c

These commands produce two object files in the current working directory named circle.o and circulararea.o. You can achieve the same result by putting both source filenames on one GCC command line:

$ gcc -c circle.c circulararea.c

In practice, however, the compiler is usually invoked for one small task at a time. Large programs consist of many source files, which have to be compiled, tested, edited, and compiled again many times during development, and very few of the changes made between builds affect all source files. To save time, a tool such as make (see Chapter 19) controls the build process, invoking the compiler to recompile only those object files that are older than the latest version of the corresponding source file.

Once all the object files have been compiled from current source files, you can use GCC to link them:

$ gcc -o circle circle.o circulararea.o -lm

GCC assumes that files with the filename extension .o are object files to be linked.

18.3.2.1. File types

The compiler recognizes a number of file extensions that pertain to C programs, interpreting them as follows:


.c

C source code, to be preprocessed before compiling.


.i

C preprocessor output, ready for compiling.


.h

C header file. (To save time compiling many source files that include the same headers, GCC allows you to create "precompiled header" files, which it then uses automatically as appropriate.)


.s

Assembly language.


.S

Assembly language with C preprocessor directives, to be preprocessed before assembling.

GCC also recognizes the file extensions .ii, .cc, .cp, .cxx, .cpp, .CPP, .c++, .C, .hh, .H, .m, .mi, .f, .for, .FOR, .F, .fpp, .FPP, .r, .ads, and .adb; these file types are involved in compiling C++, Objective-C, Fortran, or Ada programs. A file with any other filename extension is interpreted as an object file ready for linking.

If you use other naming conventions for your input files, you can use the option -x file_type to specify how GCC should treat them. file_type must be one of the following: c, c-header, cpp-output, assembler (meaning that the file contains assembly language), assembler-with-cpp, or none. All files that you list on the command line following an -x option will be treated as the type that you specify. To change types, use -x again. For example:

$ gcc -o bigprg mainpart.c -x assembler trickypart.asm -x c otherpart.c

As this example shows, each -x option applies to the files that follow it, up to the next -x option. The option -x none turns off the file type indication, so that subsequent filenames are interpreted according to their endings again.

18.3.2.2. Mixed input types

You can mix any combination of input file types on the GCC command line. The compiler ignores any files that cannot be processed as you request. An example:

$ gcc -c circle.c circulararea.s /usr/lib/libm.a

With this command line, assuming all the specified files are present, GCC compiles and assembles circle.c, assembles circulararea.s, and ignores the library file, because the -c option says not to do any linking. The results are two object files: circle.o and circulararea.o.

18.3.3. Dynamic Linking and Shared Object Files

Shared libraries are special object files that can be linked to a program at runtime. The use of shared libraries has a number of advantages: a program's executable file is smaller; and shared modules permit modular updating, as well as more efficient use of the available memory.

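To create a shared object file, use GCC's -shared option. The input file must be an existing object file, which on most platforms should be compiled as position-independent code using the -fPIC option. Here is a simple example using our circle program:

$ gcc -c -fPIC circulararea.c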
$ gcc -shared -o libcirculararea.so circulararea.o

The second of these two commands creates the shared object file libcirculararea.so. To link an executable to a shared object file, name the object file on the command line like any other object or library file:

$ gcc -c circle.c
$ gcc -o circle circle.o libcirculararea.so -lm

This command creates an executable that dynamically links to libcirculararea.so at runtime. Of course, you will also have to make sure that your program can find the shared library at runtime, either by installing your libraries in a standard directory, such as /usr/lib, or by setting an appropriate environment variable such as LD_LIBRARY_PATH. The mechanisms for configuring dynamic loading vary from one system to another.

If shared libraries are available on your system, but you want to avoid using them (to exclude a potential opening for rogue code, for example), you can invoke GCC with the -static option, thus:

$ gcc -static -o circle circle.o circulararea.o -lm

The resulting program file may be much larger than the dynamically linked one, however.

18.3.4. Freestanding Programs

In addition to the object and library files you specify on the GCC command line, the linker must also link in the system-specific startup code that the program needs in order to load and interact smoothly with the operating system. This code is already on hand in a standard object file named crt0.o, which contains the actual entry point of the executable program. (The crt stands for "C runtime.") On most systems, GCC also links programs by default with initialization routines in object files named crtbegin.o and crtend.o.

However, if you are writing a freestanding program, such as an operating system or an application for an embedded microcontroller, you can instruct GCC not to link this code by using the -ffreestanding and -nostartfiles options. The option -nostdlib allows you to disable automatic linking to the C standard library. If you use this option, you must provide your own versions of any standard functions used in your program. Finally, in a freestanding environment, a C program need not begin with main( ). You can use the linker option -ename on the GCC command line to specify an alternative entry point for your program.
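
Providing your own versions of standard functions might look like the following sketch. The filename fs_string.c and the function names fs_strlen and fs_memset are hypothetical. The names carry a prefix here so that the file can also be tried out in an ordinary hosted build without clashing with the C library.

```c
/* fs_string.c: hypothetical helpers for a freestanding build.
 * Assumed command: gcc -Wall -ffreestanding -nostdlib -c fs_string.c
 */
#include <stddef.h>   /* size_t; this header is available when freestanding */

/* Own replacement for strlen(), since no C library is linked. */
size_t fs_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Own replacement for memset(): fills count bytes at dest with value. */
void *fs_memset(void *dest, int value, size_t count)
{
    unsigned char *p = dest;
    while (count-- > 0)
        *p++ = (unsigned char)value;
    return dest;
}
```

Note that GCC's documentation states that even a freestanding environment is expected to provide memcpy, memmove, memset, and memcmp, because the compiler may generate calls to them. A real project would therefore normally define those functions under their standard names.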

