Previous Page
Next Page

12.6. Implementation

The example that follows is an implementation of the principal functions for a binary search tree, and uses dynamic memory management. This tree is intended to be usable for data of any kind. For this reason, the structure type of the nodes includes a flexible member to store the data, and a member indicating the size of the data:

    typedef struct Node { struct Node *left,    // Pointers to the left and
                                      *right;   // right child nodes.
                          size_t size;          // Size of the data payload.
                          char data[ ];          // The data itself.
                        } Node_t;

The pointers left and right are null pointers if the node has no left or right child.

As the user of our implementation, you must provide two auxiliary functions. The first of these is a function to obtain a key that corresponds to the data value passed to it, and the second compares two keys. The first function has the following type:

    typedef const void *GetKeyFunc_t( const void *dData );

The second function has a type like that of the comparison function used by the standard function bsearch( ):

    typedef int CmpFunc_t( const void *pKey1, const void *pKey2 );

The arguments passed on calling the comparison function are pointers to the two keys that you want to compare. The function's return value is less than zero, if the first key is less than the second; or equal to zero, if the two keys are equal; or greater than zero, if the first key is greater than the second. The key may be the same as the data itself. In this case, you need to provide only a comparison function.

Next, we define a structure type to represent a tree. This structure has three members: a pointer to the root of the tree; a pointer to the function to calculate a key, with the type GetKeyFunc_t; and a pointer to the comparison function, with the type CmpFunc_t.

    typedef struct { struct Node  *pRoot;     // Pointer to the root.
                     CmpFunc_t    *cmp;       // Compares two keys.
                     GetKeyFunc_t *getKey;    // Converts data into a key value.
                   } BST_t;

The pointer pRoot is a null pointer while the tree is empty.

The elementary operations for a binary search tree are performed by functions that insert, find, and delete nodes, and functions to traverse the tree in various ways, performing a programmer-specified operation on each element if desired.

The prototypes of these functions, and the typedef declarations of GetKeyFunc_t, CmpFunc_t, and BST_t, are placed in the header file BSTree.h. To use this binary tree implementation, you must include this header file in the program's source code.

The function prototypes in BSTree.h are:

BST_t *newBST( CmpFunc_t *cmp, GetKeyFunc_t *getKey );

This function dynamically generates a new object with the type BST_t, and returns a pointer to it.

_Bool BST_insert( BST_t *pBST, const void *pData, size_t size );

BST_insert( ) dynamically generates a new node, copies the data referenced by pData to the node, and inserts the node in the specified tree.

const void *BST_search( BST_t *pBST, const void *pKey );

The BST_search( ) function searches the tree and returns a pointer to the data item that matches the key referenced by the pKey argument.

_Bool BST_erase( BST_t *pBST, const void *pKey );

This function deletes the first node whose data contents match the key referenced by pKey.

void BST_clear( BST_t *pBST );

BST_clear( ) deletes all nodes in the tree, leaving the tree empty.

int BST_inorder( BST_t *pBST, _Bool (*action)( void *pData ));

int BST_rev_inorder( BST_t *pBST, _Bool (*action)( void *pData ));

int BST_preorder( BST_t *pBST, _Bool (*action)( void *pData ));

int BST_postorder( BST_t *pBST, _Bool (*action)( void *pData ));

Each of these functions traverses the tree in a certain order, and calls the function referenced by action to manipulate the data contents of each node. If the action modifies the node's data, then at least the key value must remain unchanged to preserve the tree's sorting order.

The function definitions, along with some recursive helper functions, are placed in the source file BSTree.c. The helper functions are declared with the static specifier, because they are for internal use only, and not part of the search tree's "public" interface. The file BSTree.c also contains the definition of the nodes' structure type. You as the programmer do not need to deal with the contents of this file, and may be content to use a binary object file compiled for the given system, adding it to the command line when linking the program.

12.6.1. Generating an Empty Tree

When you create a new binary search tree, you specify how a comparison between two data items is performed. For this purpose, the newBST( ) function takes as its arguments a pointer to a function that compares two keys, and a pointer to a function that calculates a key from an actual data item. The second argument can be a null pointer if the data itself serves as the key for comparison. The return value is a pointer to a new object with the type BST_t.

    const void *defaultGetKey( const void *pData ) { return pData; }

    BST_t *newBST( CmpFunc_t *cmp, GetKeyFunc_t *getKey )
      BST_t *pBST = NULL;
      if ( cmp != NULL )
        pBST = malloc( sizeof(BST_t) );
      if ( pBST != NULL )
        pBST->pRoot = NULL;
        pBST->cmp = cmp;
        pBST->getKey = (getKey != NULL) ? getKey : defaultGetKey;
      return pBST;

The pointer to BST_t returned by newBST( ) is the first argument to all the other binary-tree functions. This argument specifies the tree on which you want to perform a given operation.

12.6.2. Inserting New Data

To copy a data item to a new leaf node in the tree, pass the data to the BST_insert( ) function. The function inserts the new leaf at a position that is consistent with the binary tree's sorting condition. The recursive algorithm involved is simple: if the current subtree is emptythat is, if the pointer to its root node is a null pointerthen insert the new node as the root by making the parent point to it. If the subtree is not empty, continue with the left subtree if the new data is less than the current node's data; otherwise, continue with the right subtree. The recursive helper function insert( ) applies this algorithm.

The insert( ) function takes an additional argument, which is a pointer to a pointer to the root node of a subtree. Because this argument is a pointer to a pointer, the function can modify it in order to link a new node to its parent. BST_insert( ) returns TRue if it succeeds in inserting the new data; otherwise, false.

    static _Bool insert( BST_t *pBST, Node_t **ppNode, const void *pData,
                         size_t size );

    _Bool BST_insert( BST_t *pBST, const void *pData, size_t size )
      if ( pBST == NULL || pData == NULL || size == 0 )
        return false;
      return insert( pBST, &(pBST->pRoot), pData, size );

    static _Bool insert( BST_t *pBST, Node_t **ppNode, const void *pData,
                         size_t size )
      Node_t *pNode = *ppNode;        // Pointer to the root node of the subtree
                                      // to insert the new node in.
      if ( pNode == NULL )
      {                               // There's a place for a new leaf here.
        pNode = malloc( sizeof(Node_t) + size );
        if ( pNode != NULL )
          pNode->left = pNode->right = NULL;     // Initialize the new node's
                                                 // members.
          memcpy( pNode->data, pData, size );
          *ppNode = pNode;                       // Insert the new node.
          return true;
          return false;
      else                                   // Continue looking for a place ...
        const void *key1 = pBST->getKey( pData ),
                   *key2 = pBST->getKey( pNode->data );
        if ( pBST->cmp( key1, key2 ) < 0 )       // ... in the left subtree,
          return insert( pBST, &(pNode->left), pData, size );
        else                                     // or in the right subtree.
          return insert( pBST, &(pNode->right), pData, size );

12.6.3. Finding Data in the Tree

The function BST_search( ) uses the binary search algorithm to find a data item that matches a given key. If a given node's data does not match the key, the search continues in the node's left subtree if the key is less than that of the node's data, or in the right subtree if the key is greater. The return value is a pointer to the data item from the first node that matches the key, or a null pointer if no match was found.

The search operation uses the recursive helper function search( ). Like insert( ), search( ) takes as its second parameter a pointer to the root node of the subtree to be searched.

    static const void *search( BST_t *pBST, const Node_t *pNode,
                               const void *pKey );

    const void *BST_search( BST_t *pBST, const void *pKey )
      if  ( pBST == NULL || pKey == NULL ) return NULL;
      return search( pBST, pBST->pRoot, pKey );     // Start at the root of the
                                                    // tree.

    static const void *search( BST_t *pBST, const Node_t *pNode,
                               const void *pKey )
      if ( pNode == NULL )
        return NULL;                                   // No subtree to search;
                                                       // no match found.
      {                                                // Compare data:
        int cmp_res = pBST->cmp( pKey, pBST->getKey(pNode->data) );

        if ( cmp_res == 0 )                            // Found a match.
          return pNode->data;
        else if ( cmp_res < 0 )                        // Continue the search
          return search( pBST, pNode->left, pKey );    // in the left subtree,
          return search( pBST, pNode->right, pKey );   // or in the right
                                                       // subtree.

12.6.4. Removing Data from the Tree

The BST_erase( ) function searches for a node that matches the specified key, and deletes it if found. Deleting means removing the node from the tree structure and releasing the memory it occupies. The function returns false if it fails to find a matching node to delete, or true if successful.

The actual searching and deleting is performed by means of the recursive helper function erase( ). The node needs to be removed from the tree in such a way that the tree's sorting condition is not violated. A node that has no more than one child can be removed simply by linking its child, if any, to its parent. If the node to be removed has two children, though, the operation is more complicated: you have to replace the node you are removing with the node from the right subtree that has the smallest data value. This is never a node with two children. For example, to remove the root node from the tree in Figure 12-1, we would replace it with the node that has the value 11. This removal algorithm is not the only possible one, but it has the advantage of not increasing the tree's height.

The recursive helper function detachMin( ) plucks the minimum node from a specified subtree, and returns a pointer to the node:

    static Node_t *detachMin( Node_t **ppNode )
      Node_t *pNode = *ppNode;               // A pointer to the current node.
      if ( pNode == NULL )
        return NULL;                         // pNode is an empty subtree.
      else if ( pNode->left != NULL )
        return detachMin( &(pNode->left) );  // The minimum is in the left
                                             // subtree.
      {                                // pNode points to the minimum node.
        *ppNode = pNode->right;        // Attach the right child to the parent.
        return pNode;

Now we can use this function in the definition of erase( ) and BST_erase( ):

    static _Bool erase( BST_t *pBST, Node_t **ppNode, const void *pKey );

    _Bool BST_erase( BST_t *pBST, const void *pKey )
      if ( pBST == NULL || pKey == NULL ) return false;
      return erase( pBST, &(pBST->pRoot), pKey );     // Start at the root of
                                                      // the tree.

    static _Bool erase( BST_t *pBST, Node_t **ppNode, const void *pKey )
      Node_t *pNode = *ppNode;                 // Pointer to the current node.
      if ( pNode == NULL )
        return false;                          // No match found.

                                               // Compare data:
      int cmp_res = pBST->cmp( pKey, pBST->getKey(pNode->data) );

      if ( cmp_res < 0 )                              // Continue the search
        return  erase( pBST, &(pNode->left), pKey );  // in the left subtree,
      else if ( cmp_res > 0 )
        return erase( pBST, &(pNode->right), pKey );  // or in the right
                                                      // subtree.
      {                                       // Found the node to be deleted.
        if ( pNode->left == NULL )            // If no more than one child,
          *ppNode = pNode->right;             // attach the child to the parent.
        else if ( pNode->right == NULL )
          *ppNode = pNode->left;
        else                              // Two children: replace the node with
        {                                 // the minimum from the right subtree.
          Node_t *pMin = detachMin( &(pNode->right) );
          *ppNode = pMin;            // Graft it onto the deleted node's parent.
          pMin->left  = pNode->left;     // Graft the deleted node's children.
          pMin->right = pNode->right;
        free( pNode );                   // Release the deleted node's storage.
        return true;

A function in Example 12-2, BST_clear( ), deletes all the nodes of a tree. The recursive helper function clear( ) deletes first the descendants of the node referenced by its argument, then the node itself.

Example 12-2. The BST_clear( ) and clear( ) functions
static void clear( Node_t *pNode );

void BST_clear( BST_t *pBST )
  if ( pBST != NULL)
    clear( pBST->pRoot );
    pBST->pRoot = NULL;

static void clear( Node_t *pNode )
  if ( pNode != NULL )
    clear( pNode->left );
    clear( pNode->right );
    free( pNode );

12.6.5. Traversing a Tree

There are several recursive schemes for traversing a binary tree . They are often designated by abbreviations in which L stands for a given node's left subtree, R for its right subtree, and N for the node itself:

In-order or LNR traversal

First traverse the node's left subtree, then visit the node itself, then traverse the right subtree.

Pre-order or NLR traversal

First visit the node itself, then traverse its left subtree, then its right subtree.

Post-order or LRN traversal

First traverse the node's left subtree, then the right subtree, then visit the node itself.

An in-order traversal visits all the nodes in their sorting order, from least to greatest. If you print each node's data as you visit it, the output appears sorted.

It's not always advantageous to process the data items in their sorting order, though. For example, if you want to store the data items in a file and later insert them in a new tree as you read them from the file, you might prefer to traverse the tree in pre-order. Then reading each data item in the file and inserting it will reproduce the original tree structure. And the clear( ) function in Example 12-2 uses a post-order traversal to avoid destroying any node before its children.

Each of the traversal functions takes as its second argument a pointer to an "action" function that it calls for each node visited. The action function takes as its argument a pointer to the current node's data, and returns true to indicate success and false on failure. This functioning enables the tree-traversal functions to return the number of times the action was performed successfully.

The following example contains the definition of the BST_inorder( ) function, and its recursive helper function inorder( ). The other traversal functions are similar.

    static int inorder( Node_t *pNode, _Bool (*action)(void *pData) );

    int BST_inorder( BST_t *pBST, _Bool (*action)(void *pData) )
      if  ( pBST == NULL || action ==  NULL )
        return 0;
        return inorder( pBST->pRoot, action );

    static int inorder( Node_t *pNode, _Bool (*action)(void *pData) )
      int count = 0;
      if ( pNode == NULL )
        return 0;

      count = inorder( pNode->left, action );       // L: Traverse the left
                                                    // subtree.
      if ( action( pNode->data ))                   // N: Visit the current node
        ++count;                                    // itself.
      count += inorder( pNode-> right, action );    // R: Traverse the right
                                                    // subtree.
      return count;

12.6.6. A Sample Application

To illustrate one use of a binary search tree, the filter program in Example 12-3, sortlines, presents a simple variant of the Unix utility sort. It reads text line by line from the standard input stream, and prints the lines in sorted order to standard output. A typical command line to invoke the program might be:

    sortlines < demo.txt

This command prints the contents of the file demo.txt to the console.

Example 12-3. The sortlines program
// This program reads each line of text into a node of a binary tree,
// then prints the text in sorted order.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "BSTree.h"               // Prototypes of the BST functions.

#define LEN_MAX 1000              // Maximum length of a line.
char buffer[LEN_MAX];

// Action to perform for each line:
_Bool printStr( void *str ) { return printf( "%s", str ) >= 0; }

int main( )
  BST_t *pStrTree = newBST( (CmpFunc_t*)strcmp, NULL );
  int n;

  while ( fgets( buffer, LEN_MAX, stdin ) != NULL )   // Read each line.
    size_t len = strlen( buffer );                    // Length incl. newline
                                                      // character.
    if ( !BST_insert( pStrTree, buffer, len+1 ))      // Insert the line in the
       break;                                         // tree.
  if ( !feof(stdin) )
  {                                                   // If unable to read the
                                                      // entire text:
    fprintf( stderr, "sortlines: "
             "Error reading or storing text input.\n" );
    exit( EXIT_FAILURE );
  n = BST_inorder( pStrTree, printStr );              // Print each line, in
                                                      // sorted order.
  fprintf( stderr, "\nsortlines: Printed %d lines.\n", n );

  BST_clear( pStrTree );                              // Discard all nodes.
  return 0;

The loop that reads input lines breaks prematurely if a read error occurs, or if there is insufficient memory to insert a new node in the tree. In such cases, the program exits with an error message.

An in-order traversal visits every node of the tree in sorted order. The return value of BST_inorder( ) is the number of lines successfully printed. sortlines prints the error and success information to the standard error stream, so that it is separate from the actual data output. Redirecting standard output to a file or a pipe affects the sorted text, but not these messages.

The BST_clear( ) function call is technically superfluous, as all of the program's dynamically allocated memory is automatically released when the program exits.

The binary search tree presented in this chapter can be used for any kind of data. Most applications require the BST_search( ) and BST_erase( ) functions in addition to those used in Example 12-2. Furthermore, more complex programs will no doubt require functions not presented here, such as one to keep the tree's left and right branches balanced.

Previous Page
Next Page