BKL: The Big Kernel Lock

Welcome to the redheaded stepchild of the kernel. The Big Kernel Lock (BKL) is a global spin lock that was created to ease the transition from Linux's original SMP implementation to fine-grained locking. The BKL has some interesting properties:

You can sleep while holding the BKL. The lock is automatically dropped when the task is unscheduled and reacquired when the task is rescheduled. Of course, this does not mean it is always safe to sleep while holding the BKL, merely that you can and you will not deadlock.
The BKL is a recursive lock. A single process can acquire the lock multiple times and not deadlock, as it would with a spin lock.
You can use the BKL only in process context.
It is evil.

These features helped ease the transition from kernel version 2.0 to 2.2. When SMP support was introduced in kernel version 2.0, only one task could be in the kernel at a time. (Of course, now the kernel is quite finely threadedwe have come a long way.) A goal of 2.2 was to allow multiple processors to execute in the kernel concurrently. The BKL was introduced to help ease the transition to finer grained locking. It was a great aide then; now it is a scalability burden^[5].

^[5] It may not be as terrible as some make it out to besome people believe it to be the kernel incarnation of the devil.

Use of the BKL is discouraged. In fact, new code should never introduce locking that uses the BKL. The lock is still fairly well used in parts of the kernel, however. Therefore, understanding the BKL and its interfaces is important. The BKL behaves like a spin lock, with the additions previously discussed. The function lock_kernel() acquires the lock and the function unlock_kernel() releases the lock. A single thread of execution may acquire the lock recursively, but must then call unlock_kernel() an equal number of times to release the lock. On the last unlock call the lock will be released. The function kernel_locked() returns nonzero if the lock is currently held; otherwise, it returns zero. These interfaces are declared in <linux/smp_lock.h>. Here is sample usage:

lock_kernel();

/*
 * Critical section, synchronized against all other BKL users...
 * Note, you can safely sleep here and the lock will be transparently
 * released. When you reschedule, the lock will be transparently
 * reacquired. This implies you will not deadlock, but you still do
 * not want to sleep if you need the lock to protect data here!
 */

unlock_kernel();

The BKL also disables kernel preemption while it is held. On UP kernels, the BKL code does not actually perform any physical locking. Table 9.8 has a complete list of the BKL functions.

Table 9.8. List of BKL functions
Function
Description
lock_kernel()
Acquires the BKL
unlock_kernel()
Releases the BKL
kernel_locked()
Returns nonzero if the lock is held and zero otherwise (UP always returns nonzero)

One of the major issues concerning the BKL is determining what the lock is protecting. Too often, the BKL is seemingly associated with code (for example, "it synchronizes callers to foo()") instead of data ("it protects the foo structure"). This makes replacing BKL uses with a spin lock difficult because it is not easy to determine just what is being locked. The replacement is made even harder in that the relationship between all BKL users needs to be determined.

Seq Locks

The seq lock is a new type of lock introduced in the 2.6 kernel. It provides a very simple mechanism for reading and writing shared data. It works by maintaining a sequence counter. Whenever the data in question is written to, a lock is obtained and a sequence number is incremented. Prior to and after reading the data, the sequence number is read. If the values are the same, then a write did not begin in the middle of the read. Further, if the values are even then a write is not underway (grabbing the write lock makes the value odd, whereas releasing it makes it even because the lock starts at zero).

To define a seq lock:

seqlock_t mr_seq_lock = SEQLOCK_UNLOCKED;

The write path is then

write_seqlock(&mr_seq_lock);
/* write lock is obtained... */
write_sequnlock(&mr_seq_lock);

This looks like normal spin lock code. The oddness comes in with the read path, which is quite a bit different:

unsigned long seq;

do {
        seq = read_seqbegin(&mr_seq_lock);
        /* read data here ... */
} while (read_seqretry(&mr_seq_lock, seq));

Seq locks are useful to provide a very lightweight and scalable lock for use with many readers and a few writers. Seq locks, however, favor writers over readers. The write lock always succeeds in being obtained so long as there are no other writers. Readers do not affect the write lock, as is the case with reader-writer spin locks and semaphores. Furthermore, pending writers continually cause the read loop (the previous example) to repeat, until there are no longer any writers holding the lock.