
30.11 TCP Prethreaded Server, per-Thread accept

We found earlier in this chapter that it is faster to prefork a pool of children than to create one child for every client. On a system that supports threads, it is reasonable to expect a similar speedup by creating a pool of threads when the server starts, instead of creating a new thread for every client. The basic design of this server is to create a pool of threads and then let each thread call accept. Instead of having each thread block in the call to accept, we will use a mutex lock (similar to Section 30.8) that allows only one thread at a time to call accept. There is no reason to use file locking to protect the call to accept from all the threads, because with multiple threads in a single process, we know that a mutex lock can be used.

Figure 30.27 shows the pthread07.h header, which defines a Thread structure that maintains some information about each thread.

Figure 30.27 pthread07.h header.

server/pthread07.h

1 typedef struct {
2     pthread_t thread_tid;      /* thread ID */
3     long    thread_count;      /* # connections handled */
4 } Thread;
5 Thread *tptr;                  /* array of Thread structures; calloc'ed */

6 int     listenfd, nthreads;
7 socklen_t addrlen;
8 pthread_mutex_t mlock;

We also declare a few globals, such as the listening socket descriptor and a mutex variable that all the threads need to share.

Figure 30.28 shows the main function.

Figure 30.28 main function for prethreaded TCP server.

server/serv07.c

 1 #include    "unpthread.h"
 2 #include    "pthread07.h"

 3 pthread_mutex_t mlock = PTHREAD_MUTEX_INITIALIZER;

 4 int
 5 main(int argc, char **argv)
 6 {
 7     int     i;
 8     void    sig_int(int), thread_make(int); 

 9     if (argc == 3)
10         listenfd = Tcp_listen(NULL, argv[1], &addrlen);
11     else if (argc == 4)
12         listenfd = Tcp_listen(argv[1], argv[2], &addrlen);
13     else
14         err_quit("usage: serv07 [ <host> ] <port#> <#threads>");
15     nthreads = atoi(argv[argc - 1]);
16     tptr = Calloc(nthreads, sizeof(Thread));

17     for (i = 0;  i < nthreads; i++)
18         thread_make(i);          /* only main thread returns */

19     Signal(SIGINT, sig_int);

20     for ( ; ; )
21         pause();                 /* everything done by threads */
22 }

The thread_make and thread_main functions are shown in Figure 30.29.

Figure 30.29 thread_make and thread_main functions.

server/pthread07.c

 1 #include    "unpthread.h"
 2 #include    "pthread07.h"

 3 void
 4 thread_make(int i)
 5 {
 6     void     *thread_main(void *);

 7     Pthread_create(&tptr[i].thread_tid, NULL, &thread_main, (void *) i);
 8     return;                     /* main thread returns */
 9 }

10 void *
11 thread_main(void *arg)
12 {
13     int     connfd;
14     void    web_child(int);
15     socklen_t clilen;
16     struct sockaddr *cliaddr;

17     cliaddr = Malloc(addrlen);

18     printf("thread %d starting\n", (int) arg);
19     for ( ; ; ) {
20         clilen = addrlen;
21         Pthread_mutex_lock(&mlock);
22         connfd = Accept(listenfd, cliaddr, &clilen);
23         Pthread_mutex_unlock(&mlock);
24         tptr[(int) arg].thread_count++;

25         web_child(connfd);      /* process request */
26         Close(connfd);
27     }
28 }

Create thread

7 Each thread is created and executes the thread_main function. The only argument is the index number of the thread.

Lock around accept

21–23 The thread_main function calls the functions pthread_mutex_lock and pthread_mutex_unlock around the call to accept.
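
The thread index is passed to thread_main by casting an int to a void * on line 7 of Figure 30.29 and cast back on lines 18 and 24. This works on the systems discussed here, but the C standard does not guarantee that an integer survives a round trip through void *. One alternative, shown in the following sketch (this is not the book's code; the rest of the logic is unchanged), is to pass the address of the thread's own Thread structure instead:

#include    "unpthread.h"
#include    "pthread07.h"

void
thread_make(int i)
{
    void   *thread_main(void *);

        /* pass the address of this thread's Thread slot, not the index */
    Pthread_create(&tptr[i].thread_tid, NULL, &thread_main, &tptr[i]);
}

void *
thread_main(void *arg)
{
    int     connfd;
    void    web_child(int);
    socklen_t clilen;
    struct sockaddr *cliaddr;
    Thread  *tp = arg;          /* our own entry in the tptr array */

    cliaddr = Malloc(addrlen);

    printf("thread %ld starting\n", (long) (tp - tptr));
    for ( ; ; ) {
        clilen = addrlen;
        Pthread_mutex_lock(&mlock);
        connfd = Accept(listenfd, cliaddr, &clilen);
        Pthread_mutex_unlock(&mlock);
        tp->thread_count++;

        web_child(connfd);      /* process request */
        Close(connfd);
    }
}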

Comparing rows 6 and 7 in Figure 30.1, we see that this latest version of our server is faster than the create-one-thread-per-client version. We expect this, since we create the pool of threads only once, when the server starts, instead of creating one thread per client. Indeed, this version of our server is the fastest on these two hosts.

Figure 30.2 shows the distribution of the thread_count counters in the Thread structure, which we print in the SIGINT handler when the server is terminated. The uniformity of this distribution is caused by the thread scheduling algorithm that appears to cycle through all the threads in order when choosing which thread receives the mutex lock.
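
The sig_int handler itself is not shown above; a minimal sketch of what it looks like, assuming the pr_cpu_time function from Section 30.3 and the globals declared in pthread07.h:

#include    "unpthread.h"
#include    "pthread07.h"

void
sig_int(int signo)
{
    int     i;
    void    pr_cpu_time(void);

    pr_cpu_time();              /* print total user and system CPU times */

        /* per-thread connection counts, the data summarized in Figure 30.2 */
    for (i = 0; i < nthreads; i++)
        printf("thread %d, %ld connections\n", i, tptr[i].thread_count);

    exit(0);
}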

On a Berkeley-derived kernel, we do not need any locking around the call to accept, so we could build a version of Figure 30.29 without the mutex locking and unlocking. Doing so, however, increases the process control CPU time. If we look at the two components of the CPU time, the user time decreases without the locking (because the locking is done in the threads library, which executes entirely in user space), but the system time increases (the thundering herd problem: all the threads blocked in accept are awakened by the kernel when a connection arrives). Since some form of mutual exclusion is required to return each incoming connection to exactly one thread, it is faster for the threads to do this themselves than for the kernel to do it.
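
For reference, a sketch of the thread_main loop without the mutex, as described in the note above (this is not the book's code, and is only a variation for measurement; everything else in Figure 30.29 is unchanged):

void *
thread_main(void *arg)
{
    int     connfd;
    void    web_child(int);
    socklen_t clilen;
    struct sockaddr *cliaddr;

    cliaddr = Malloc(addrlen);

    printf("thread %d starting\n", (int) arg);
    for ( ; ; ) {
        clilen = addrlen;
            /* no mutex: every thread blocks in accept; the kernel wakes
               them all when a connection arrives, but only one returns
               with the new descriptor (the thundering herd) */
        connfd = Accept(listenfd, cliaddr, &clilen);
        tptr[(int) arg].thread_count++;

        web_child(connfd);      /* process request */
        Close(connfd);
    }
}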
