Slab Allocator Interface

A new cache is created via

kmem_cache_t * kmem_cache_create(const char *name, size_t size,
                     size_t align, unsigned long flags,
                     void (*ctor)(void*, kmem_cache_t *, unsigned long),
                     void (*dtor)(void*, kmem_cache_t *, unsigned long))

The first parameter is a string storing the name of the cache. The second parameter is the size of each element in the cache. The third parameter is the offset of the first object within a slab. This is done to ensure a particular alignment within the page. Normally, zero is sufficient, which results in the standard alignment. The flags parameter specifies optional settings controlling the cache's behavior. It can be zero, specifying no special behavior, or one or more of the following flags OR'ed together:

SLAB_NO_REAP This flag instructs the slab layer not to automatically reap objects (that is, free the memory backing unused objects) when memory is low. Normally, you do not want to set this flag because your cache could then prevent the system from recovering enough memory to continue operation when memory is low.
SLAB_HWCACHE_ALIGN This flag instructs the slab layer to align each object within a slab to a cache line. This prevents "false sharing" (two or more objects mapping to the same cache line despite existing at different addresses in memory). This improves per formance, but comes at a cost of increased memory footprint because the stricter alignment results in more wasted slack space. How large the increase in memory consumption is depends on the size of the objects and how they naturally align with respect to the system's cache lines. For frequently used caches in performance-critical code, setting this option is a good idea; otherwise, think twice.
SLAB_MUST_HWCACHE_ALIGN If debugging is enabled, it might be infeasible to both perform debugging and cache-align the objects. This flag forces the slab layer to cache-align the objects. Normally, this flag is not needed and the previous is sufficient. Specifying this flag while slab debugging is enabled (it is disabled by default) might result in a large increase in memory consumption. Only objects where cache alignment is critical, such as the process descriptor, should set this flag.
SLAB_POISON This flag causes the slab layer to fill the slab with a known value (a5a5a5a5). This is called poisoning and is useful for catching access to uninitialized memory.
SLAB_RED_ZONE This flag causes the slab layer to insert "red zones" around the allocated memory to help detect buffer overruns.
SLAB_PANIC This flag causes the slab layer to panic if the allocation fails. This flag is useful when the allocation must not fail, as in, say, allocating the VMA structure cache (see Chapter 14, "The Process Address Space") during bootup.
SLAB_CACHE_DMA This flag instructs the slab layer to allocate each slab in DMA-able memory. This is needed if the allocated object is used for DMA and must reside in ZONE_DMA. Otherwise, you do not need this and you should not set it.

The final two parameters, ctor and dtor, are a constructor and destructor for the cache, respectively. The constructor is called whenever new pages are added to the cache. The destructor is called whenever pages are removed from the cache. Having a destructor requires a constructor. In practice, caches in the Linux kernel do not often utilize a constructor or destructor. You can pass NULL for both of these parameters.

On success, kmem_cache_create() returns a pointer to the created cache. Otherwise, it returns NULL. This function must not be called from interrupt context because it can sleep.

To destroy a cache, call

int kmem_cache_destroy(kmem_cache_t *cachep)

As the name implies, this function destroys the given cache. It is generally invoked from module shutdown code in modules that create their own caches. It must not be called from interrupt context because it may sleep. The caller of this function must ensure two conditions are true prior to invoking this function:

All slabs in the cache are empty. Indeed, if an object in one of the slabs were still allocated and in use, how could the cache be destroyed?
No one accesses the cache during (and obviously after) a call to kmem_cache_destroy(). The caller must ensure this synchronization.

On success, the function returns zero; it returns nonzero otherwise.

After a cache is created, an object is obtained from the cache via

void * kmem_cache_alloc(kmem_cache_t *cachep, int flags)

This function returns a pointer to an object from the given cache cachep. If no free objects are in any slabs in the cache, and the slab layer must obtain new pages via kmem_getpages(), the value of flags is passed to __get_free_pages(). These are the same flags we looked at earlier. You probably want GFP_KERNEL or GFP_ATOMIC.

To later free an object and return it to its originating slab, use the function

void kmem_cache_free(kmem_cache_t *cachep, void *objp)

This marks the object objp in cachep as free.

Example of Using the Slab Allocator

Let's look at a real-life example that uses the task_struct structure (the process descriptor). This code, in slightly more complicated form, is in kernel/fork.c.

First, the kernel has a global variable that stores a pointer to the task_struct cache:

kmem_cache_t *task_struct_cachep;

During kernel initialization, in fork_init(), the cache is created:

task_struct_cachep = kmem_cache_create("task_struct",
                                       sizeof(struct task_struct),
                                       ARCH_MIN_TASKALIGN,
                                       SLAB_PANIC,
                                       NULL,
                                       NULL);

This creates a cache named task_struct, which stores objects of type struct task_struct. The objects are created with an offset of ARCH_MIN_TASKALIGN bytes within the slab. This preprocessor define is an architecture-specific value. It is usually defined as L1_CACHE_BYTESthe size in bytes of the L1 cache. There is no constructor or destructor. Note that the return value is not checked for NULL, which denotes failure, because the SLAB_PANIC flag was given. If the allocation fails, the slab allocator calls panic(). If you do not provide this flag, you must check the return! The SLAB_PANIC flag is used here because this is a requisite cache for system operation (the machine is not much good without process descriptors).

Each time a process calls fork(), a new process descriptor must be created (recall Chapter 3, "Process Management"). This is done in dup_task_struct(), which is called from do_fork():

struct task_struct *tsk;

tsk = kmem_cache_alloc(task_struct_cachep, GFP_KERNEL);
if (!tsk)
        return NULL;

After a task dies, if it has no children waiting on it, its process descriptor is freed and returned to the task_struct_cachep slab cache. This is done in free_task_struct() (where tsk is the exiting task):

kmem_cache_free(task_struct_cachep, tsk);

Because process descriptors are part of the core kernel and always needed, the task_struct_cachep cache is never destroyed. If it were, however, you would destroy the cache via

int err;

err = kmem_cache_destroy(task_struct_cachep);
if (err)
        /* error destroying cache */

Easy enough? The slab layer handles all the low-level alignment, coloring, allocations, freeing, and reaping during low-memory conditions. If you are frequently creating many objects of the same type, consider using the slab cache. Definitely do not implement your own free list!