
9.5. Optimizing Process CPU Usage

When a particular process or application has been determined to be a CPU bottleneck, it is necessary to determine where (and why) it is spending its time.

Figure 9-3 shows the method for investigating a process's CPU usage.

Figure 9-3.


Go to Section 9.5.1 to begin the investigation.

9.5.1. Is the Process Spending Time in User or Kernel Space?

You can use the time command to determine whether an application is spending its time in kernel or user mode. oprofile can also be used to determine where time is spent. By profiling per process, it is possible to see whether a process is spending its time in the kernel or user space.

If the application is spending a significant amount of time in kernel space (greater than 25 percent), go to Section 9.5.2. Otherwise, go to Section 9.5.3.
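The decision above can be sketched in a few lines of Python. This is only an illustration, not a replacement for time or oprofile: it reads the user and system CPU times that the operating system accounts to a finished child process and applies the 25 percent cutoff. The names kernel_fraction, next_section, and measure_child, as well as the sample workload, are hypothetical and chosen just for this example.

```python
import os
import subprocess
import sys

KERNEL_THRESHOLD = 0.25  # the 25 percent cutoff used in this section


def kernel_fraction(user_seconds, system_seconds):
    """Return the share of CPU time spent in kernel mode (0.0 to 1.0)."""
    total = user_seconds + system_seconds
    return system_seconds / total if total > 0 else 0.0


def next_section(user_seconds, system_seconds):
    """Apply this section's decision to a pair of CPU times."""
    if kernel_fraction(user_seconds, system_seconds) > KERNEL_THRESHOLD:
        return "9.5.2"  # significant kernel time: look at system calls
    return "9.5.3"      # mostly user time: look at hot functions


def measure_child(argv):
    """Run argv to completion and return (user, system) CPU seconds it used."""
    before = os.times()
    subprocess.run(argv, check=False)
    after = os.times()
    return (after.children_user - before.children_user,
            after.children_system - before.children_system)


if __name__ == "__main__":
    # Hypothetical workload: a short CPU-bound Python one-liner.
    user, system = measure_child([sys.executable, "-c", "sum(range(10**6))"])
    print(f"user={user:.2f}s sys={system:.2f}s -> Section {next_section(user, system)}")
```

The child CPU times are only updated once the child has been waited for, which subprocess.run does before it returns.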

9.5.2. Which System Calls Is the Process Making, and How Long Do They Take to Complete?

Next, run strace to see which system calls are made and how long they take to complete. You can also run oprofile to see which kernel functions are being called.

It may be possible to increase performance by minimizing the number of system calls made or by changing which system calls are made on behalf of the program. Some of the system calls may be unexpected and a result of the application's calls to various libraries. You can run ltrace and strace to help determine why they are being made.
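As a rough illustration of minimizing the number of system calls, the following Python sketch writes the same data twice: once with one write() call per record, and once after joining the records in user space. The function names and the record contents are hypothetical; the point is only that the second form asks the kernel to do the same work with far fewer calls, which is the kind of difference strace would reveal.

```python
import os
import tempfile


def write_unbatched(fd, records):
    """Issue one write() system call per record; return the number of calls."""
    calls = 0
    for record in records:
        os.write(fd, record)
        calls += 1
    return calls


def write_batched(fd, records):
    """Join the records in user space, then write them with as few calls as possible."""
    data = b"".join(records)
    calls = 0
    while data:
        # os.write may write fewer bytes than requested, so loop until done.
        written = os.write(fd, data)
        data = data[written:]
        calls += 1
    return calls


# Hypothetical payload: 1000 short records.
records = [b"line %d\n" % i for i in range(1000)]

with tempfile.TemporaryFile() as f:
    unbatched_calls = write_unbatched(f.fileno(), records)
with tempfile.TemporaryFile() as f:
    batched_calls = write_batched(f.fileno(), records)

print("unbatched write() calls:", unbatched_calls)
print("batched write() calls:  ", batched_calls)
```

Buffered I/O libraries apply the same idea automatically, which is one reason the system calls seen in a trace may differ from the calls the application source appears to make.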

Now that the problem has been identified, it is up to you to fix it. Go to Section 9.9.

9.5.3. In Which Functions Does the Process Spend Time?

Next, run oprofile on the application using the cycle event to determine which functions are using all the CPU cycles (that is, which functions are spending all the application time).

Keep in mind that although oprofile shows you how much time was spent in a process, when profiling at the function level, it is not clear whether a particular function is hot because it is called very often or whether it just takes a long time to complete.

One way to determine which case is true is to acquire a source-level annotation from oprofile and look for instructions/source lines that should have little overhead (such as assignments). The number of samples that they have will approximate the number of times that the function was called relative to other high-cost source lines. Again, this is only approximate because oprofile samples only the CPU, and out-of-order processors can misattribute some cycles.
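The distinction between "called often" and "slow per call" is easier to see with a profiler that counts calls directly. The sketch below uses Python's cProfile, an instrumenting profiler, purely as an analogy; it is not how oprofile works, and the two workload functions are hypothetical. It reports both the call count and the time spent in each function, the two quantities that a sampling profile alone cannot separate.

```python
import cProfile
import pstats


def cheap_but_frequent(x):
    """Very little work per call, but called many times."""
    return x + 1


def expensive_but_rare(n):
    """A lot of work per call, but called only once."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    total = 0
    for i in range(20000):
        total += cheap_but_frequent(i)
    total += expensive_but_rare(200000)
    return total


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

stats = pstats.Stats(profiler)
calls_by_name = {}
# Each entry maps (file, line, name) to
# (primitive calls, total calls, own time, cumulative time, callers).
for (_file, _line, name), (_pc, ncalls, tottime, _ct, _callers) in stats.stats.items():
    calls_by_name[name] = ncalls
    if name in ("cheap_but_frequent", "expensive_but_rare"):
        print(f"{name}: calls={ncalls} own_time={tottime:.4f}s "
              f"per_call={tottime / ncalls:.8f}s")
```

Both functions may show up as hot in a time-based profile; only the call counts reveal that one should be called less often while the other should be made faster.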

It is also helpful to get a call graph of the functions to determine how the hot functions are being called. To do this, go to Section 9.5.4.

9.5.4. What Is the Call Tree to the Hot Functions?

Next, you can figure out how and why the time-consuming functions are being called. Running the application with gprof can show the call tree for each function. If the time-consuming functions are in a library, you can use ltrace to see which library functions are being called. You can also use newer versions of oprofile that support call-tree tracing. Alternatively, you can run the application in gdb and set a breakpoint at the hot function. You can then run the application, and it will break on every call to the hot function. At this point, you can generate a backtrace and see exactly which functions and source lines made the call.

Knowing which functions call the hot functions may enable you to eliminate or reduce the calls to these functions, and correspondingly speed up the application.
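The breakpoint-and-backtrace approach can be imitated inside a program. The following Python sketch is a loose analogy rather than a gdb session: a hypothetical decorator, record_callers, inspects the stack on every call to a hot function and tallies who the immediate caller was. All function names here are invented for the example.

```python
import collections
import functools
import traceback

caller_counts = collections.Counter()


def record_callers(func):
    """Tally the immediate caller of func on every call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # With limit=2 the result is [caller's frame, this wrapper's frame].
        stack = traceback.extract_stack(limit=2)
        caller_counts[stack[0].name] += 1
        return func(*args, **kwargs)
    return wrapper


@record_callers
def hot_function(x):
    return x * x


def render_frame():
    total = 0
    for i in range(50):
        total += hot_function(i)
    return total


def handle_event():
    return hot_function(7)


render_frame()
handle_event()

for caller, count in caller_counts.most_common():
    print(f"{caller} called hot_function {count} times")
```

A tally like this points at the caller worth changing: here, reducing the calls made from render_frame would matter far more than touching handle_event.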

If reducing the calls to the time-consuming functions did not speed up the application, or it is not possible to eliminate these functions, go to Section 9.5.5.

Otherwise, go to Section 9.9.

9.5.5. Do Cache Misses Correspond to the Hot Functions or Source Lines?

Next, run oprofile, cachegrind, and kcachegrind against your application to see whether the time-consuming functions or source lines are those with a high number of cache misses. If they are, try to rearrange or compress your data structures and accesses to make them more cache friendly. If the hot lines do not correspond to high cache misses, try to rearrange your algorithm to reduce the number of times that the particular line or function is executed.
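To make "rearranging accesses" concrete, the sketch below models a two-dimensional array stored row by row and counts how many distinct cache-line-sized blocks are touched by the first few accesses under two loop orders. It is a toy model under stated assumptions: the 64 by 64 array and the figure of 8 elements per cache line are hypothetical, and Python itself does not expose real cache behavior, so this only illustrates the access pattern that a tool such as cachegrind would measure.

```python
def row_major_offsets(rows, cols):
    """Flat offsets visited when the inner loop walks along a row."""
    return [r * cols + c for r in range(rows) for c in range(cols)]


def column_major_offsets(rows, cols):
    """Flat offsets visited when the inner loop walks down a column."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


def distinct_lines_touched(offsets, window, elements_per_line):
    """Count distinct cache-line-sized blocks touched by the first `window` accesses."""
    return len({offset // elements_per_line for offset in offsets[:window]})


ROWS, COLS = 64, 64       # hypothetical array shape, stored row by row
ELEMENTS_PER_LINE = 8     # assumption: for example, 64-byte lines and 8-byte elements
WINDOW = 64               # look at the first 64 accesses of each traversal

row_lines = distinct_lines_touched(
    row_major_offsets(ROWS, COLS), WINDOW, ELEMENTS_PER_LINE)
col_lines = distinct_lines_touched(
    column_major_offsets(ROWS, COLS), WINDOW, ELEMENTS_PER_LINE)

print("row-order traversal touches", row_lines, "lines")
print("column-order traversal touches", col_lines, "lines")
```

When the loop order matches the storage order, consecutive accesses share cache lines; when it does not, nearly every access lands on a different line, which is the pattern that tends to show up as a high miss count on a hot source line.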

In any event, the tools have told you as much as they can, so go to Section 9.9.
