A Manual of C Style

Part of the 22C:50 System Software support pages
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Copyright © 2000 Douglas W. Jones, revised 2002; This work may be transmitted or stored in electronic form on any computer attached to the Internet or World Wide Web so long as this notice is included in the copy. Individuals may make single copies for their own use. All other rights are reserved.

Contents

Introduction
Indenting
Spaces
New Lines
Comments
Capitalization
Special Rules Regarding Defines
Integer and Boolean Types
Pointers
Files and Directories
Include Files
Local Functions and Variables
Public Variables and Definitions
Casting and Types
Functions and Function Parameters
The Main Program
Exception Handling
Makefiles and Inter-File Relationships
Parameters to the Compilation


Introduction

The definitive source for C stylists is the book that defined the language,

The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie. Prentice Hall, 1978 (First edition) and 1988 (Second Edition).

The first edition of this book both introduced the C language to the world and set a standard for clear language tutorials with a consistent style of presentation. The second edition accounted for changes in C introduced by the ANSI C standard. However, as with any short work, these books provide little advice about structuring large programs or other software engineering issues, and the C style used in these books is not explicitly described, but simply illustrated by numerous small examples.

There can be no such thing as a standard style for a language with as diverse a community of programmers as C has. Furthermore, a style well suited to feasibility studies, small algorithms and introductory courses may be quite inappropriate for large-scale system development.

The style presented here is aimed at large programs that can profit by strong information hiding tools, modular development, and a large dose of object-oriented development methodology, despite the lack of explicit support for this in C. This style might be considered an example of the kind of house style that tends to develop in programming teams engaged in large-scale long-term programming projects.

Afterthought: Some institutions view their house styles as guidelines; in this case, the style rules are frequently stated in subjective terms. Other organizations require rigid adherence to their house style, and state their style rules in objective form. Some elements of style are easily stated in objective terms, for example, the directive to use tabs for indenting; other style elements are extremely difficult to state objectively, for example, the subjective requirement for meaningful identifiers. Programmers are well advised to avoid employers that impose rigid objective style guidelines that make any attempt at being comprehensive, particularly if these guidelines are mechanically enforced.

There is a large collection of C style guides from a wide variety of sources cataloged on the web at:

http://www.cs.umd.edu/users/cml/cstyle/

Indenting

All indenting should be done with tabs, one tab per level of indenting. Anything between curly braces should be indented one level from the surrounding text.

|struct widget {
|        atype x;
|        btype y;
|};
|
|void afunction()
|{
|        int i;
|
|        for (i = 0; i < 10; i++) {
|                loop_body();
|        }
|
|        if (cond()) {
|                then_clause();
|        } else {
|                else_clause();
|        }
|}

We do not require, as some stylists do, that the opening and closing braces for a block be vertically aligned, as this leads to blocks that take up too much space in the text editor window. Instead, we require that the opening brace for a block occur at the end of the line that introduces that block, and that the closing brace occur at the start of the line that marks the end of that block. Only closing semicolons, else or while clauses should follow closing braces.

Spaces

In most cases, a minimum of spaces should be used, but spaces should be used around two-operand operators, including assignment, and spaces should follow all commas, semicolons, and similar punctuation marks, as in standard English usage. Spaces should also separate the parameters of a function from the parentheses surrounding parameter lists, and spaces should separate keywords from adjoining punctuation.

|void function( atype p, btype * q )
|{
|        int i;
|
|        for (;;) {
|                i = (i << 5) / (i * -2);
|                function( p, &(*q[5]) );
|
|        }
|}

In the above, note that the (;;) construct is something of a special case. This eccentric construct is used to build infinite loops and loops where the loop termination test is in mid loop. Also note that the construct &(*q[5]) contains no spaces because none of its operators are binary; all are unary.

New Lines

C allows a new-line between any two lexemes, but in general, use of new lines should follow the usage shown in Kernighan and Ritchie. The semicolon at the end of every declaration and statement should be followed with a new line, a new line should follow every opening brace, etc. In addition, we constrain if statements to the following forms:

|        if (a > b) statement();
|
|        if (a > b) {
|                one_option();
|        } else if (a < b) {
|                another_option();
|        } else {
|                yet_another_option();
|        }

The point of this is to avoid if statements where the different branches take different forms. Either all branches are blocks enclosed in curly braces or the entire if statement fits on one line and has no else clause.

No source line should be over 80 characters. When lines threaten to get long, they should be broken up. Three common cases occur: long if statements with complex Boolean expressions, long expressions in assignment statements, and long function calls.

|        if (one_test()
|        &&  another_test()
|        &&  yet_another() ) {
|
|                variable = a_term
|                        + another_term
|                        + even_more;
|
|                function( a_parameter(),
|                        another_parameter(),
|                        "And a very long string "
|                        "that went on too long." );
|        }

Note that the && and || operators in C are both Boolean operators and control structures. As such, it makes perfectly good sense to align them under the if keyword. Note also that, in C, two adjacent string constants without any intervening operator are simply concatenated, so any string constant can be broken across line boundaries between any two characters. If the string contains text, however, it should only be broken between words.

Comments

Comments in C begin with /* and end with */. As far as the compiler is concerned, this is the only issue. Note that, while many of the C compilers available today accept the C++ style of comments, starting with //, we do not allow this because this style guide is for the older C language. We encourage the following usage:

|/* filename */
|
|/******************************************
| * One line file description              *
| * Author:  Author's Name                 *
| * Revised by:  Author's Name             *
| *     Description of purpose of revision *
| ******************************************/
|
|/* comment describing a major declaration */
|sometype x;
|
|/******************************************
| * Big comment dividing major sections    *
| ******************************************/
|
|void function( atype p, btype q, ctype r )
|/* purpose of function
|   given:  p, purpose of parameter p
|           q, purpose of parameter q                       \* revised *\
|           r, purpose of parameter r
|   note:  special noteworthy considerations caller might need to know
|   warning:  special warnings caller may care about
|*/
|{
|        int i; /* comment describing a minor declaration */
|
|        /* one line comment before some bit of the body */
|        body();                                            /* revised */
|
|        if (test()) {
|                /* comment giving condition met */    
|        }
|}

In any given source file, the big bricks of commentary should all be the same width and, in most cases, they will follow a stereotypical pattern.

Sometimes, it is useful to flag lines that were added in a revision of the program. The comments saying revised in the above are examples of this. These revision notes may be shortened to version numbers; all such notes should be aligned in a column in the right margin, except when the code is indented too far to the right, in which case, these comments may be aligned in the left margin. When, on occasion, such a comment ends up inside a larger block comment, use of backslashes to make "backward comment delimiters", as illustrated above, makes it clear that these are comments within the larger comment.

Afterthought: Attempts to devise more comprehensive and objective requirements for commenting frequently lead to comment inflation, the inclusion of comments stating the obvious in programs that would have been clearer if many of the comments had been omitted. In fact, the most useful style guideline when it comes to commentary is that good comments rarely state the obvious; rather, they tell the programmer something that isn't immediately obvious when reading the code itself.

Capitalization

In general, as in Kernighan and Ritchie, identifiers defined by the C preprocessor should be capitalized. Thus, capitalization indicates that a constant or function is handled by the preprocessor and not the compiler.

|#define FIVE 5
|
|#define INCREMENT(x) ((x) + 1)
|
|int six = INCREMENT(FIVE);

Note that we do not encourage the use of the style known as StudlyCaps or BiCapitalization. Instead, we encourage the use of underscore as a spacer within identifiers, as in two_words, or the simple run-on concatenation of the words, as in runon.

When a module name is used as the prefix on the name of a symbol exported by that module, the underscore should be used, and underscore should not be used in contexts where a reader might be led to think that the prefix was a module name. Thus, we would expect a stack module to export functions named stack_push() and stack_pop(), for example, but we would discourage the use of the identifier stack_thing for any purpose other than some interface to the stack module.

Afterthought: Attempts to devise more comprehensive and objective capitalization rules for C programs are doomed by the inconsistent usage of capitalization in the C standard library. Some of the functions in the standard library are actually defines in the associated header files, while others are real functions. Many experienced programmers don't know which are real functions and which are defines.

Special Rules Regarding Defines

The C preprocessor's define mechanism is both a powerful tool and a source of many potential problems. Consider this innocent fragment:

|#define DOUBLE(x) = x * 2; /* bad usage!!! */

This defines DOUBLE in such a way that each use is replaced by a string that begins with an equals sign and contains a semicolon, and possibly even a comment. The programmer probably intended:

|/* still bad usage */
|#define DOUBLE(x) x * 2

Here, DOUBLE will usually produce the expected result, but DOUBLE(x+1) is x+1*2 or x+2, which is most likely not what the programmer hoped for. Similarly, ++DOUBLE(x) is the legal expression ++x*2, which means (x=x+1)*2, again not likely to be what the programmer hoped for. To avoid such trouble, use lots of parentheses.

|/* good usage */
|#define DOUBLE(x) ((x) * 2)

Afterthought: The problem with comments on the same line as the define is resolved by newer ANSI compliant C preprocessors, where the define mechanism operates at the lexical level; many non-compliant C preprocessors are still around, though, and some of these operate in terms of the actual text of the line, comments and all. This rarely causes problems, but in the context of nested defines, particularly large ones such as putchar() with defined functions passed as arguments, the expansion can become huge, and passing comments in the expansion can lead to problems.

Integer and Boolean Types

The types short int, int and long int have rather fuzzy definitions. As a result, programs that work under one compiler may not work under another because the range of values supported changes from one compiler to another. For a quarter of a century, C offered no standard solution to this problem. The C99 standard now solves this with the header file <stdint.h>. Use it!

<stdint.h> defines the following useful types for integers where you need to specify a known, fixed precision:

|int8_t   sbyte;  /*   -128 <= sbyte <=   +127   */
|uint8_t  ubyte;  /*      0 <= ubyte <=   +255   */
|int16_t  sword;  /* -32768 <= sword <= +32767   */
|uint16_t uword;  /*      0 <= uword <= +65535   */
|int32_t  slong;  /* -2**31 <= slong <= +2**31-1 */
|uint32_t ulong;  /*      0 <= ulong <= +2**32-1 */

Enforcing the range constraints on some fixed-range types can be expensive, requiring extra instructions to truncate the results of arithmetic. Therefore, <stdint.h> also defines "fast" versions of the above types, so the type int_fast8_t, for example, will hold at least the range of values allowed for int8_t and supports fast access to and arithmetic on those values. The unsigned types follow the same pattern, as in uint_fast16_t.

These "fast" identifiers are ugly. Fortunately, the type int is, for practical purposes, int_fast16_t because the C standard requires that int variables be at least 16 bits and it implicitly encourages compiler writers to use a single word for int variables.

The C language contains no built-in Boolean type. So, relational operators such as < return integer values, with the convention that zero is false and one is true. The if and while statements interpret zero as false and nonzero as true. Clever programmers can use this to good advantage, but most of us are not clever most of the time. As with the problems with integer ranges, this problem was solved in C99 with yet another header file, <stdbool.h>. Use it! This header file allows things like:

|bool flag = true; /* define flag to be boolean, initially true */

These different header files should always be included at the head of every C program. This adds clutter, so the natural thing to do is add yet another header file that includes all of the definitions needed to repair deficiencies in the C language.

|/* cfixup.h */
|
|/***************************************
| * Repairs to the C language           *
| ***************************************/
|
|#include <stdbool.h>
|#include <stdint.h>

Pointers

To dynamically allocate an instance of some type, standard C requires a call to the malloc() function. This returns a pointer of type void* and requires a parameter giving the size of the object being allocated. A conservative programmer should therefore write something like the following to dynamically allocate an object:

|pointer = malloc( sizeof( *pointer ) );

The above code uses implicit casting to convert the (void *) pointer returned by malloc() to the type of the pointer variable. The implicit casting of (void *) to other pointer types is potentially dangerous, but in this case, it is the best way to go. Programmers wishing to avoid this cumbersome notation might prefer to define a compact memory allocation operator:

|#define NEW(p) ((p) = malloc( sizeof( *(p) ) ))

Given this, the statement NEW( pointer ); replaces the clumsy call to malloc given at the start of this section.

Another minor nuisance of the C programming language is that the constant NULL is not predefined. Several different header files have historically defined it, notably stdio.h. The stdlib.h header file is now the preferred source for this definition. Every C program should always begin by including this file, but this adds clutter, so in large programming projects, it makes sense to add it to the fixup header file:

|/* cfixup.h */
|
|/***************************************
| * Repairs to the C language           *
| ***************************************/
|
|#include <stdlib.h>
|#include <stdbool.h>
|#include <stdint.h>
|
|#define NEW(p) ((p) = malloc( sizeof( *(p) ) ))

You might argue that all programs should include <stdio.h>, so it should never be necessary to define NULL. This is not true. In large programs, it is quite common to find that only a few source files do any input-output; these, of course, should include <stdio.h>, but to include this in the others would mislead readers to expect some I/O activity where there is none.

Files and Directories

If the program is named stackit, all source files involved in the program should be stored in a directory called stackit. These files should include one file called README that is intended entirely for human consumption and documents the contents of the directory. The README file for a demonstration stack program might be:

|README
|
|---------------------------------------------------------
|A demonstration program for the stack abstract data type
|Author:     Author's Name
|Copyright:  No rights reserved
|Warranty:   None - use at your own risk
|---------------------------------------------------------
|
|This directory contains all the tools needed to build
|stackit, the stack demonstration program.
|
|To compile stackit, simply run make.  Before running make,
|read the instructions at the head of the makefile.
|
|This directory contains the following components:
|
|      Makefile      -- the input to make, for making stackit
|
|      README        -- this file
|
|      cfixup.h      -- definitions we wish were standard
|      exception.h   -- a civilized exception model for C programs
|
|      main.c        -- the main program
|
|      stack.c       -- the stack module
|      stack.h

The list of files in the README file may be in alphabetical order, with capital letters first, so that the makefile for the program comes first, the README file comes second, and all other files (with lower-case names) follow. If there are many files, departures from alphabetical order to form files into logical groupings may be appropriate.

The main program must be in a file called main.c, and the instructions for compiling the program must be in a file called Makefile and given in a format acceptable to all versions of the make utility.

Each abstraction used in the program should be clearly described by a pair of files, one giving the interface specification for the module, the other giving the implementation. For example, the stack module is described by stack.h and stack.c in the above example, where the .h extension indicates that a file contains an interface specification and the .c extension indicates that the file contains an implementation.

Include Files

As a rule, header files should be used to give interface specifications. Therefore, each module in the program (other than the main program) will consist of a C source file with a name ending with .c and a header file to be included by users of that abstraction with the same name except that the ending should be .h. C++ or Java programmers might wish to consider each module to be a class, where the .h file gives the interface specification for the class and the .c file gives the implementation.

As a rule, all include directives should be grouped at the head of each file right after the block of comments giving the file's purpose and authorship. Inclusions from the standard C library should be listed first, while inclusions of header files for other components of the application should be listed second. The final include directive in each .c file but the main program should be the include directive for its own header file, set off from the others by a blank line; this is illustrated below:

|/* stack.c */
|
|/****************************************
| * A stack implementation               *
| * Author:  Someone                     *
| ****************************************/
|
|#include <stdio.h>
|#include <setjmp.h>
|#include "cfixup.h"
|
|#include "stack.h"

If the documentation for a standard library routine in Kernighan and Ritchie or on the UNIX man page for that routine lists an include directive, that include directive should be listed in the source program. This requirement is independent of the fact that many of the header files in the C library include other header files from the library. The standard C header files (stored in /usr/include on UNIX systems) are carefully constructed so that no damage occurs if one is included more than once.

If the code in the .c file of a module makes use of some other module, calling code given there or using types defined there, it must include the header file for that module.

The header file for a module should not include other header files! Instead, it should begin with a comment brick that clearly documents the header files that users of this module must include prior to including this header file. Thus, a typical header file might begin:

|/* stack.h */
|
|/****************************************
| * Stack interface specification        *
| * Author:  Somebody                    *
| ****************************************/
|
|/****************************************
| * Prerequisites for use                *
| *   The user must include <stdio.h>    *
| *                         <setjmp.h>   *
| *                         "cfixup.h"   *
| ****************************************/

All functions exported by a module should be documented identically in the .h file and the .c file; the commentary and formatting of the declarations should be identical, and the order of function definitions should be identical, except that the definition in the header file ends with a semicolon and the definition in the C source file is followed with the code of the function. Given this definition in the header file:

|void stack_push( STACK_TYPE i );
|/* push an item on the stack
|   given:  i, the item to be pushed
|*/

The corresponding implementation in the C source file might be:

|void stack_push( STACK_TYPE i )
|/* push an item on the stack
|   given:  i, the item to be pushed
|*/
|{
|        stack[sp] = i;
|        sp++;
|}

Local Functions and Variables

The only function definitions in the header file for a module should be definitions of functions that the module exports for use by other modules. The implementation of a module may require additional functions local to that module. These functions should be clearly documented as local by having their declarations prefixed with the keyword static!

|static bool stack_full()
|/* test for stack full condition
|*/
|{
|        return sp >= STACKSIZE;
|}

Similarly, variables private to a module but shared by multiple functions should be marked as static. For example, a stack of integers might have public access functions called stack_push() and stack_pop(), using variables declared as:

|static int sp = 0;                  /* the stack pointer */
|static STACK_TYPE stack[STACKSIZE]; /* the actual stack */

Public Variables and Definitions

Some variables, constants and types defined in a package may need to be exported to users of that package. The definitions for these should be in the header file for the package. Some object oriented programming languages enforce the strange restriction that the only components of objects that may be externally manipulated are the methods or access functions of that object. To users of such languages, the idea of making a variable component of an object public may seem strange, but any variable within an object for which both set and inspect methods are provided is effectively public and may as well have been exported.

Exported constants should be given names by define directives. Exported types may be named by C's typedef or struct mechanisms, or they may be given by define directives. These might appear in a header file as follows:

|/* the type of an item on the stack */
|#define STACK_TYPE long int
|
|EXTERN EXCEPTION stack_overflow;

Here, we assume that all identifiers in the public interface to the stack package begin with stack_ or STACK_. The declaration of the variable stack_overflow as an object of type EXCEPTION illustrates an important little problem: When the header file is included in the source of some other module, we want the keyword extern to be used as a prefix; extern declarations neither allocate storage nor provide an initial value, but merely allow the variable to be used while requiring some other module to define the variable. When the header file is included in the module that is supposed to define this variable, on the other hand, we need the keyword extern to be omitted so that storage is actually allocated and initial values are allowed!

To do this, we use the defined identifier EXTERN. This should expand to nothing when stack.h is included by stack.c, but it should expand to extern when stack.h is included elsewhere. We arrange this by defining EXTERN as follows in stack.c, right before the include directive:

|#define EXTERN
|#include "stack.h"

We begin the code for stack.h as follows:

|/* stack.h */
|
|/****************************************
| * Stack interface specification        *
| * Author:  Somebody                    *
| ****************************************/
|
|/****************************************
| * Prerequisites for use                *
| *   The user must include <stdio.h>    *
| *                         <setjmp.h>   *
| *                         "cfixup.h"   *
| *   In stack.c, but nowhere else       *
| *     EXTERN must be defined first     *
| ****************************************/
|
|#ifndef EXTERN
|        #define EXTERN extern
|#endif
|
|/****************************************
| * The Interface                        *
| ****************************************/

This code guarantees that EXTERN, if not defined by the user of the header file, will be given its default meaning, the keyword extern. Finally, the header file should end with the removal of EXTERN from the set of defined symbols:

|#undef EXTERN

This is something of a mess, but the net result seems reasonable, and if these examples are followed, verbatim, it will work every time. (Following such a pattern without understanding it is an example of what is known as cargo cult programming, after the cultic rituals of the cargo cults that emerged in remote areas of Melanesia after World War II.)

Casting and Types

In C, all scalar types are subject to the automatic conversions known as widening and narrowing. These are illustrated here:

|{
|        int i;  /* likely to be 16 or 32 bits */
|        char c; /* likely to be 8 bits */
|
|        /* discouraged programming style */
|        i = c;  /* requires widening */
|        c = i;  /* requires narrowing */
|}

We strongly discourage reliance on this feature of C. Instead, we encourage use of explicit typecasting whenever two operands are not of exactly the same type, as illustrated here:

|        /* preferred programming style */
|        i = (int) c;
|        c = (char) i;
|}

The C language has a moderately large set of integer subtypes:

long int
signed long int
unsigned long int
        Typically 32 bits.
int32_t
uint32_t
        Exactly 32 bits.

int
signed int
unsigned int
        Typically 16 or 32 bits, depending on the compiler.

short int
signed short int
unsigned short int
        Typically 16 bits.
int16_t
uint16_t
        Exactly 16 bits.

char
signed char
unsigned char
        Typically 8 bits.
int8_t
uint8_t
        Exactly 8 bits.

The declarations of these types constrain the representation and the range of values of items of these types, but they say nothing about the uses of the values! Therefore, we strongly encourage the use of defined scalar type names that explicitly indicate the use of the values. Previous sections of this manual have given several illustrations of such defined types:

|#define STACK_TYPE long int
|#define SECONDS uint8_t

The C compiler will not complain if a value of type bool is assigned to a variable of type STACK_TYPE, but if you develop the habit of explicitly casting the assignment instead of relying on widening or narrowing, the resulting nonsense is clearly apparent:

|{
|        STACK_TYPE s;
|        bool flag = true;
|
|        /* legal and not obviously right or wrong */
|        s = 0;
|
|        /* legal yet obviously wrong */
|        s = (STACK_TYPE) flag;
|}

Functions and Function Parameters

In C, if no type is given for the return value of a function, the type int is assumed. We forbid this usage! If a function is supposed to return an integer, it should explicitly be declared as such. If a function is not supposed to return a value, it should be declared as a function of type void. (In Pascal, void functions are called procedures, while in FORTRAN they are called subroutines.)

We forbid functions returning structures or other complex data types. Functions may only return pointers or scalar values (floating or integer or character). This rule maintains compatibility with early C compilers that only allowed return values that could fit in one register, and it also discourages the excessive copying of large chunks of data that results when functions return arrays or structures.

We forbid function parameters that are structures or other complex data types. Parameters may only be pointers or scalar values (floating or integer or character). Passing a structure requires copying that structure from the actual parameter to the local variable of the function that holds the parameter; this is expensive if the parameter is big, while passing a pointer to a structure or array is always affordable.

The Main Program

The C standard requires that the main program be packaged as a function called main(). Many C programming examples neither declare arguments for the main program nor specify its type. In fact, the type defaults to int, and implementations of the standard C environment are required to pass arguments to the main program, usually from the command line. We require that all C main programs be declared as the last function in a C source file called main.c, with no corresponding main.h, and with exactly the following form:

|int main(int argc, char * argv[])
|{

Any other functions declared in main.c should be declared to be static, directly called only from within main.c or passed as function parameters to functions called from within main.c.

There is no need to add any comments to the function header for the main function, since all aspects of its return value and parameter list are established by the standard.

We strongly recommend that the first productive computation within the main program be that involved in command line argument processing. Typically, this will involve opening various input and output files, setting various global flags, and so on. If the program has command line arguments of any complexity, we recommend that this computation be done in a function called getmyargs() that is called by the first productive statement of the main program, passing argc and argv without change from the main program. Because these are passed unchanged and named and typed exactly the same as in the main program, they need no further explanatory commentary.

Exception Handling

The standard C library includes a special set of tools defined in <setjmp.h>. This package exports the type jmp_buf and the functions setjmp() and longjmp(). Together, these can be used to build exception handlers and the control transfers required. Unfortunately, the use of these functions to handle exceptions is not well understood, even by many experienced C programmers, so we package them up behind a wall of defines. The stack package, for example, might raise the stack overflow exception if an attempt is made to push on a full stack, as shown below:

|EXCEPTION stack_overflow;
|
|void stack_push( STACK_TYPE i )
|/* push an item on the stack
|   given:  i, the item to be pushed
|*/
|{
|        if (sp >= STACKSIZE) EXCEPT_RAISE( stack_overflow );
|        stack[sp] = i;
|        sp++;
|}

Usually, as illustrated, the exception name will be defined as part of the visible interface of a package, and it will be raised within that package when the package encounters an exceptional condition. A user of the stack package may catch and handle this exception as follows:

|{
|        STACK_TYPE a, b;
|        a = compute_an_item();
|
|        EXCEPT_CATCH( stack_overflow ) {
|                stack_push( a );
|                function_that_calls_push();
|        } EXCEPT_HANDLER {
|                report_error();
|        } EXCEPT_END;
|}

Exception handlers may be nested, and exceptions may be raised anywhere that the exception name is visible. There is no limit on the number of function calls that separate the handler from the point where the exception is raised.
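The absence of any limit on call depth follows from the behavior of longjmp() itself. The following minimal sketch uses the raw library calls rather than the macros of this manual; the names handler, descend() and caught_from_depth() are hypothetical and exist only for this illustration.

```c
#include <setjmp.h>

static jmp_buf handler;         /* where control resumes after a raise */

/* recurse to the given depth, then raise from the bottom */
static void descend(int depth)
{
        if (depth == 0) longjmp( handler, 1 );
        descend( depth - 1 );
}

/* returns 1 if a raise made depth calls below reached the handler */
int caught_from_depth(int depth)
{
        if (setjmp( handler ) == 0) {
                descend( depth );
                return 0;       /* not reached; descend never returns */
        } else {
                return 1;       /* arrived here by way of longjmp */
        }
}
```

However many activations of descend() are on the stack, a single longjmp() discards them all and resumes at the setjmp() call, which then returns a nonzero value.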

The default handler for an exception should be established immediately at the beginning of the main program, as illustrated here:

|int main(int argc, char * argv[])
|{
|        EXCEPT_INIT( stack_overflow, "Unhandled stack overflow" );
|        getmyargs( argc, argv );

This initialization merely establishes an error message to be output before program termination if this exception is raised outside the control of an EXCEPT_CATCH block. Technically, if the exception is never raised outside the control of such a block, there is no need to establish such default behavior, and the default need not be established until just before such an unhandled exception is raised. Despite this, we require each exception to be initialized with an appropriate message as the very first thing the main program does.

The definitions of the exception handling macros are given in a special header file, exception.h, that is not easy to follow. Only if you want to test your understanding of the long-jump package in the C library should you read the following code:

|/* exception.h */
|
|/*******************************************
| * The exception model that we really want *
| * Author:    Douglas W. Jones             *
| * Date:      September 14, 2000           *
| * Copyright: No rights reserved           *
| * Warranty:  None - use at your own risk  *
| *******************************************/
|
|/*******************************************
| * Prerequisites for use                   *
| *   The user must include <stdio.h>       *
| *                         <stdlib.h>      *
| *                         <setjmp.h>      *
| *******************************************/
|
|/* a private type to this include file */
|struct exception_ {
|        jmp_buf jb;
|};
|
|#define EXCEPTION struct exception_ *
|
|#define EXCEPT_INIT( EXCEPT, MSG )                              \
|        EXCEPT = (EXCEPTION)malloc( sizeof(struct exception_) );\
|        if (setjmp( EXCEPT->jb ) != 0) {                        \
|                fputs( MSG, stderr );                           \
|                exit(-1);                                       \
|        }
|
|#define EXCEPT_RAISE( EXCEPT )                                  \
|        longjmp( EXCEPT->jb, 1 );
|
|#define EXCEPT_CATCH( EXCEPT )                                  \
|        {                                                       \
|                EXCEPTION except_save = EXCEPT;                 \
|                EXCEPTION *except_who = &EXCEPT;                \
|                struct exception_ except_new;                   \
|                if (setjmp( except_new.jb ) == 0) {             \
|                        EXCEPT = &except_new;
|
|#define EXCEPT_HANDLER                                          \
|                        *except_who = except_save;              \
|                } else {                                        \
|                        *except_who = except_save;
|
|#define EXCEPT_END                                              \
|                }                                               \
|        }
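To see how the save and restore steps in these macros support nesting, it may help to trace a small case. The sketch below repeats the macros from exception.h above, except for EXCEPT_INIT, so that it compiles on its own. The exception demo_error and the functions note(), fail() and nested_demo() are hypothetical names used only for this trace.

```c
#include <setjmp.h>

struct exception_ {
        jmp_buf jb;
};
#define EXCEPTION struct exception_ *
#define EXCEPT_RAISE( EXCEPT )                                  \
        longjmp( EXCEPT->jb, 1 );
#define EXCEPT_CATCH( EXCEPT )                                  \
        {                                                       \
                EXCEPTION except_save = EXCEPT;                 \
                EXCEPTION *except_who = &EXCEPT;                \
                struct exception_ except_new;                   \
                if (setjmp( except_new.jb ) == 0) {             \
                        EXCEPT = &except_new;
#define EXCEPT_HANDLER                                          \
                        *except_who = except_save;              \
                } else {                                        \
                        *except_who = except_save;
#define EXCEPT_END                                              \
                }                                               \
        }

EXCEPTION demo_error;           /* hypothetical exception */

static char trace[16];          /* record of the path taken */
static int tlen = 0;

static void note(char c)
{
        trace[tlen++] = c;
        trace[tlen] = '\0';
}

static void fail(void)
{
        EXCEPT_RAISE( demo_error );
}

const char * nested_demo(void)
{
        tlen = 0;
        trace[0] = '\0';
        EXCEPT_CATCH( demo_error ) {
                EXCEPT_CATCH( demo_error ) {
                        note( 'a' );
                        fail();         /* goes to the inner handler */
                        note( 'x' );    /* never reached */
                } EXCEPT_HANDLER {
                        note( 'i' );
                } EXCEPT_END;
                note( 'b' );
                fail();                 /* goes to the outer handler */
                note( 'y' );            /* never reached */
        } EXCEPT_HANDLER {
                note( 'o' );
        } EXCEPT_END;
        return trace;
}
```

Each EXCEPT_CATCH saves the previous value of the exception variable in except_save and installs its own jmp_buf. Both the normal exit and the handler entry restore the saved value, which is why the second call to fail() reaches the outer handler rather than the inner one.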

Afterthought: The exception mechanism given here is based on the exception model of the Ada programming language. Costello and Truta have developed a similar exception handling package for C that is more in the C and C++ style; an archived release is available at cexcept.sourceforge.net. The semantic difference leads to differences in runtime speed and flexibility. The mechanism given here finds the correct handler faster, but it makes it a bit difficult to write one handler that handles multiple logically distinct exceptions.

Makefiles and Inter-File Relationships

The relationships between the files of any multiple-file program should be clearly documented. For example, the running example, a program using stacks, might consist of the following pieces:

cfixup.h
Definitions we wish were standard in C

exception.h
The exception model that we really want

stack.h
The stack interface specification

stack.c
The stack implementation

main.c
The main program

To compile this program, we need to translate each C source program to object format, producing the following additional files:

stack.o
Produced by compiling stack.c

main.o
Produced by compiling main.c

Finally, we must link these object files together to produce an executable file. By default, C compilers produce their executable output on a.out (a name that originally meant assembler output), but usually, programs should have names. For example, suppose we want our stack application to be named stackit. This adds one more file:

stackit
Produced by linking main.o with stack.o and the C library.

All these files would typically be stored in the same directory, probably a directory named stackit. When you make some change, perhaps editing just stack.h, you naturally wonder which files must be recompiled, and which others must then be rebuilt from them, as a result of that change.

All versions of UNIX support a powerful tool for this purpose, always called make, but there are many versions of make that differ in numerous details. All versions take, as input, a file called Makefile that describes all the components of a large program, how they relate to each other, and what commands should be executed to produce each secondary component from the primary components on which it depends. After making any change to a program described by a makefile, the make command will read Makefile to determine what files are involved, check the dates of last modifications, and make exactly those updates required.
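The date check that make performs can be approximated in a few lines of C. This is only a sketch of the basic rule, assuming a POSIX system with stat(); the functions mtime_of() and out_of_date() are hypothetical, and real versions of make apply additional rules, for example when a dependency does not exist.

```c
#include <sys/stat.h>
#include <time.h>

/* modification time of a file, or (time_t)-1 if it cannot be examined */
time_t mtime_of(const char * path)
{
        struct stat st;
        if (stat( path, &st ) != 0) return (time_t)-1;
        return st.st_mtime;
}

/* nonzero if the target is missing or older than any dependency
   given:  target, the target's modification time
           deps, the modification times of its n dependencies
   note:   missing dependencies are not handled in this sketch
*/
int out_of_date(time_t target, const time_t deps[], int n)
{
        int i;
        if (target == (time_t)-1) return 1;
        for (i = 0; i < n; i++) {
                if (deps[i] > target) return 1;
        }
        return 0;
}
```

A target found to be out of date is rebuilt, which gives it a new modification time, and so the same test may then show that files depending on it are out of date as well.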

A makefile consists of a sequence of entries of the form:

|a: b c d
|        command

This entry says that file a depends on files b, c and d. That is, if any change is made to files b, c or d, then file a must be rebuilt from them. When make detects a need to rebuild a, it will do so by executing command. The makefile for our example might be:

|# Makefile
|
|###########################################################
|# File dependencies and instructions for building stackit #
|# Author:  Author's Name                                  #
|# Instructions:                                           #
|#          make       -- builds stackit                   #
|###########################################################
|
|stackit: main.o stack.o
|        cc -o stackit main.o stack.o
|
|main.o: main.c stack.h exception.h cfixup.h
|        cc -c main.c
|
|stack.o: stack.c stack.h exception.h cfixup.h
|        cc -c stack.c

As illustrated above, comments in a makefile are preceded by a pound sign (#). Aside from the change in comment symbol, we require that the same kind and style of commentary be provided in makefiles as is provided in any other part of the program.

The -c command-line option on UNIX-based compilers is used to direct the compiler to produce a .o file from each source file it is given as input, without making any effort to link them. The -o option causes the compiler to place its output in the named file instead of placing the output in a.out.

There are many different C compilers. On most UNIX systems, one of them is called cc, but it doesn't always generate the best error diagnostics, and if you have a particularly nasty syntax error, you may want to try other C compilers to see which gives the most useful diagnostics. Making such a change could require changing every use of cc in the makefile, but we can avoid this by rewriting the above using the standard parameter notation of make:

|# Makefile
|
|###########################################################
|# File dependencies and instructions for building stackit #
|# Author:  Author's Name                                  #
|# Instructions:                                           #
|#          make       -- builds stackit                   #
|###########################################################
|
|###########################################################
|# Configuration constants                                 #
|###########################################################
|
|# what C compiler to use (HP-UX choices are c89, gcc, CC, g++, cc -Aa)
|CC = cc -Aa
|
|###########################################################
|# File dependencies                                       #
|###########################################################
|
|stackit: main.o stack.o
|        $(CC) -o stackit main.o stack.o
|
|main.o: main.c stack.h exception.h cfixup.h
|        $(CC) -c main.c
|
|stack.o: stack.c stack.h exception.h cfixup.h
|        $(CC) -c stack.c

A single change can now be used to force the use of a different compiler. The symbol definition facility of make is very similar to the #define mechanism of the C preprocessor, except that a definition uses the equals sign and a reference to a defined symbol uses the awkward $() construct.

The -Aa option given in the above example applies only to the HP-UX cc compiler; it requires the compiler to accept ANSI C input without warnings. Without this option, the compiler flags every ANSI extension to the original definition of C as being a nonstandard feature.

Some files in large programs may have huge lists of dependencies. We can deal with these in two ways. First, we can use multiple lines to describe dependencies:

|stack.o: stack.c
|stack.o: stack.h exception.h cfixup.h
|        $(CC) -c stack.c

Second, we can use defined symbols as names for whole lists of related files:

|OBJECTS = main.o stack.o
|stackit: $(OBJECTS)
|        $(CC) -o stackit $(OBJECTS)

There are a large number of versions of make, each with unique, interesting, arcane and sometimes useful extensions. The style guidelines given here should work equally well under most of them, since these guidelines avoid all of the extensions and stick only to the core function of make.

Afterword: Makefiles are incredibly powerful, but they can also be a source of real trouble. The problem is that the command languages of the different available versions of the make utility differ in many details, and worse yet, much of what is known about constructing good makefiles has been passed down from programmer to programmer by word of mouth and not through well-written, widely available documentation.

Parameters to the Compilation

Some attributes of data structures should be set as constants, but these constants may need to be changed from one compilation to another. For example, in the stack module that has been used as a running example, the size of the array used to represent the stack might need to be changed from one compilation to the next. As with changes of the compiler being used, it is reasonable to consider such constants to be overall parameters to the source code and include them in the makefile instead of including them in obscure header files. The makefile for our example application would therefore begin:

|# Makefile
|
|###########################################################
|# File dependencies and instructions for building stackit #
|# Author:  Author's Name                                  #
|# Instructions:                                           #
|#          make       -- builds stackit                   #
|###########################################################
|
|###########################################################
|# Configuration constants                                 #
|###########################################################
|
|# what C compiler to use (HP-UX choices are c89, gcc, CC, g++, cc -Aa)
|CC = cc -Aa
|
|# the stack size to use
|STACKSIZE = 27

Now, when we need to recompile the application with a different limit on the stack, we merely change the makefile to reflect this, with no need to dip into the header files where more obscure constants are stored. We transmit this constant to the points where it is needed by adding it, as a parameter, to the relevant compilation commands. In this case, the stack implementation in stack.c needs to know STACKSIZE, and this constant is needed nowhere else. Therefore, we change the compile command for stack.c to transmit this:

|stack.o: stack.c Makefile
|stack.o: stack.h exception.h cfixup.h
|        $(CC) -c -DSTACKSIZE=$(STACKSIZE) stack.c

The -D option on most C and C++ compilers causes the compiler to define the following symbol as if it had been defined by a #define preprocessor directive.

Note that we have added the makefile itself to the list of file dependencies for the parameterized program. Since the makefile contains parameters to the compilation instructions for the program, if someone changes the makefile, we should recompile that program.

Of course, if the source file for some program depends on a symbol that must be provided by the makefile instead of coming from some header file, the comments in the source file should say this. The comment demanding such a definition should be given right at the head of the source file, as suggested here:

|/* stack.c */
|
|/****************************************
| * A stack implementation               *
| * Author:  Someone                     *
| ****************************************/
|
|/* stack size */
|/* STACKSIZE defined by Makefile */
|
|#include <stdio.h>
|#include <setjmp.h>
|#include "cfixup.h"
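Beyond the comment, a source file can ask the preprocessor to check a symbol that is expected from the makefile. The following sketch is one possible approach and is not part of the stack package described above. The fallback value is an assumption made only so that the fragment compiles on its own, and stack_capacity() and stack_try_push() are hypothetical names.

```c
/* STACKSIZE is normally supplied by the Makefile using -DSTACKSIZE=...;
   the fallback is an assumption so that this sketch compiles alone */
#ifndef STACKSIZE
#define STACKSIZE 27
#endif

/* reject values that cannot be used as an array bound */
#if STACKSIZE < 1
#error "STACKSIZE must be a positive integer"
#endif

static int stack[STACKSIZE];    /* the stack itself */
static int sp = 0;              /* index of the first free element */

int stack_capacity(void)
/* report the limit fixed at compile time */
{
        return STACKSIZE;
}

int stack_try_push(int i)
/* push an item on the stack
   given:  i, the item to be pushed
   returns 1 on success, 0 if the stack is full
*/
{
        if (sp >= STACKSIZE) return 0;
        stack[sp] = i;
        sp++;
        return 1;
}
```

A stricter variant would replace the fallback definition with an #error directive, so that a compilation that omits the -D option fails at once instead of silently using a default.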

Last Modified: Tuesday, 26-Aug-2008 11:14:53 CDT.