mmap() and do_mmap(): Creating an Address Interval

The do_mmap() function is used by the kernel to create a new linear address interval. Saying that this function creates a new VMA is not technically correct, because if the created address interval is adjacent to an existing address interval, and if they share the same permissions, the two intervals are merged into one. If this is not possible, a new VMA is created. In any case, do_mmap() is the function used to add an address interval to a process's address space, whether that means expanding an existing memory area or creating a new one.

The do_mmap() function is declared in <linux/mm.h>:

unsigned long do_mmap(struct file *file, unsigned long addr,
                      unsigned long len, unsigned long prot,
                      unsigned long flag, unsigned long offset)

This function maps the file specified by file at offset offset for length len. The file parameter can be NULL and offset can be zero, in which case the mapping will not be backed by a file. In that case, this is called an anonymous mapping. If a file and offset are provided, the mapping is called a file-backed mapping.

The addr parameter optionally specifies the initial address from which to start the search for a free interval.
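Seen from user space, addr is the first argument of the mmap() library call, where it acts as a hint unless MAP_FIXED is given. The following is a minimal sketch, assuming a Linux/glibc system; map_with_hint() is a hypothetical helper name, not part of any kernel or library API.

```c
#define _DEFAULT_SOURCE /* assumption: glibc, needed for MAP_ANONYMOUS */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one anonymous page, passing `hint` as the
 * suggested start address. Without MAP_FIXED the kernel is free to
 * place the mapping elsewhere, so the caller must use the returned
 * address, not the hint. Returns NULL on failure.
 */
void *map_with_hint(void *hint)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;

    void *p = mmap(hint, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Passing NULL lets the kernel pick the address entirely; in both cases the returned address is page aligned.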

The prot parameter specifies the access permissions for pages in the memory area. The possible permission flags are defined in <asm/mman.h> and are unique to each supported architecture, although in practice each architecture defines the flags listed in Table 14.2.

Table 14.2. Page Protection Flags

Flag          Effect on the Pages in the New Interval
PROT_READ     Corresponds to VM_READ
PROT_WRITE    Corresponds to VM_WRITE
PROT_EXEC     Corresponds to VM_EXEC
PROT_NONE     Page cannot be accessed
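The protection flags can be observed from user space. The sketch below assumes a Linux/glibc system and uses the POSIX mprotect() call, which this section does not cover, to drop write permission after the page has been filled; prot_demo() is a hypothetical name chosen for illustration.

```c
#define _DEFAULT_SOURCE /* assumption: glibc, needed for MAP_ANONYMOUS */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns the byte read back through a mapping
 * that has been made read-only, or -1 on error.
 */
int prot_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    /* Readable and writable: corresponds to VM_READ | VM_WRITE. */
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 42;

    /* Drop write permission; reads still work, a write would now fault. */
    if (mprotect(p, len, PROT_READ) != 0) {
        munmap(p, len);
        return -1;
    }

    int value = p[0];
    munmap(p, len);
    return value;
}
```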

The flag parameter specifies flags that set the type of the mapping and correspond to the remaining VMA flags. These flags are also defined in <asm/mman.h>. See Table 14.3.

Table 14.3. Map Type Flags

Flag              Effect on the New Interval
MAP_SHARED        The mapping can be shared
MAP_PRIVATE       The mapping cannot be shared
MAP_FIXED         The new interval must start at the given address addr
MAP_ANONYMOUS     The mapping is not file-backed, but is anonymous
MAP_GROWSDOWN     Corresponds to VM_GROWSDOWN
MAP_DENYWRITE     Corresponds to VM_DENYWRITE
MAP_EXECUTABLE    Corresponds to VM_EXECUTABLE
MAP_LOCKED        Corresponds to VM_LOCKED
MAP_NORESERVE     No need to reserve space for the mapping
MAP_POPULATE      Populate (prefault) page tables
MAP_NONBLOCK      Do not block on I/O
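The difference between a shared and a private file-backed mapping can be sketched from user space. This example assumes a Linux/glibc system with a writable /tmp; make_temp_file() and store_via_mapping() are hypothetical helpers, and msync() is a POSIX call not discussed in this section.

```c
#define _DEFAULT_SOURCE /* assumption: glibc feature macros */
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: one-page temporary file whose first byte is 'A'. */
int make_temp_file(void)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path); /* the file disappears once fd is closed */
    if (page <= 0 || ftruncate(fd, page) != 0 ||
        pwrite(fd, "A", 1, 0) != 1) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: map the first page of fd with map_flags, store c
 * at offset 0 through the mapping, unmap, and return the byte the file
 * itself now holds at offset 0 (or -1 on error).
 */
int store_via_mapping(int fd, int map_flags, char c)
{
    long page = sysconf(_SC_PAGESIZE);
    char on_disk;

    if (page <= 0)
        return -1;
    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   map_flags, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = c;
    msync(p, (size_t)page, MS_SYNC); /* flush shared changes to the file */
    if (munmap(p, (size_t)page) != 0)
        return -1;

    if (pread(fd, &on_disk, 1, 0) != 1)
        return -1;
    return on_disk;
}
```

With MAP_PRIVATE the store lands in a private copy of the page and the file keeps its 'A'; with MAP_SHARED the store reaches the file.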

If any of the parameters are invalid, do_mmap() returns a negative error code (carried in its unsigned long return value). Otherwise, a suitable interval in virtual memory is located. If possible, the interval is merged with an adjacent memory area. Otherwise, a new vm_area_struct structure is allocated from the vm_area_cachep slab cache, and the new memory area is added to the address space's linked list and red-black tree of memory areas via the vma_link() function. Next, the total_vm field in the memory descriptor is updated. Finally, the function returns the initial address of the newly created address interval.
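The merge-or-create decision described above can be modeled with a toy, user-space sketch. Everything here (struct toy_vma, struct toy_mm, toy_add_interval(), the fixed-size sorted array) is hypothetical and greatly simplified: the kernel keeps its areas in a linked list and red-black tree, checks more than the flags before merging, and counts total_vm in pages rather than bytes.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_VMAS 16

/* Hypothetical stand-in for vm_area_struct: [start, end) plus flags. */
struct toy_vma {
    unsigned long start;
    unsigned long end;   /* first address after the interval */
    unsigned long flags;
};

/* Hypothetical stand-in for the memory descriptor. */
struct toy_mm {
    struct toy_vma vmas[TOY_MAX_VMAS]; /* sorted by start, no overlaps */
    size_t count;
    unsigned long total_vm;            /* bytes here; the kernel counts pages */
};

/*
 * Add the free interval [addr, addr + len). Returns 0 if it was merged
 * into a neighbor, 1 if a new area was created, -1 if the table is full.
 */
int toy_add_interval(struct toy_mm *mm, unsigned long addr,
                     unsigned long len, unsigned long flags)
{
    unsigned long end = addr + len;
    size_t i = 0;

    /* Find the first area that starts after the new interval. */
    while (i < mm->count && mm->vmas[i].start < addr)
        i++;

    struct toy_vma *prev = i > 0 ? &mm->vmas[i - 1] : NULL;
    struct toy_vma *next = i < mm->count ? &mm->vmas[i] : NULL;
    bool merge_prev = prev && prev->end == addr && prev->flags == flags;
    bool merge_next = next && next->start == end && next->flags == flags;

    if (merge_prev && merge_next) {
        /* The new interval bridges both neighbors: fold all three. */
        prev->end = next->end;
        for (size_t j = i; j + 1 < mm->count; j++)
            mm->vmas[j] = mm->vmas[j + 1];
        mm->count--;
    } else if (merge_prev) {
        prev->end = end;
    } else if (merge_next) {
        next->start = addr;
    } else {
        if (mm->count == TOY_MAX_VMAS)
            return -1;
        for (size_t j = mm->count; j > i; j--)
            mm->vmas[j] = mm->vmas[j - 1];
        mm->vmas[i] = (struct toy_vma){ addr, end, flags };
        mm->count++;
    }

    mm->total_vm += len;
    return (merge_prev || merge_next) ? 0 : 1;
}
```

Adding 0x1000 bytes at 0x1000 and then 0x1000 bytes at 0x2000 with the same flags leaves a single area covering [0x1000, 0x3000); a third interval at 0x3000 with different flags becomes a second area.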

The mmap() System Call

The do_mmap() functionality is exported to user-space via the mmap() system call. The mmap() system call is defined as

void *mmap2(void *start,
            size_t length,
            int prot,
            int flags,
            int fd,
            off_t pgoff)

This system call is named mmap2() because it is the second variant of mmap(). The original mmap() took an offset in bytes as the last parameter; the current mmap2() receives the offset in pages. This enables larger files with larger offsets to be mapped. The original mmap(), as specified by POSIX, is available from the C library as mmap(), but is no longer implemented in the kernel proper, whereas the new version is available as mmap2(). Both library calls use the mmap2() system call, with the original mmap() converting the offset from bytes to pages.
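The byte-to-page conversion performed on behalf of the original mmap() amounts to a shift. The sketch below is an illustration only, not the C library's actual code; it assumes 4 KB offset units (a shift of 12), and bytes_to_pgoff(), max_byte_offset(), and SKETCH_PAGE_SHIFT are hypothetical names.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KB units */

/*
 * Convert a byte offset to a page offset. Returns 0 on success and
 * stores the result in *pgoff, or -1 if byte_off is not page aligned.
 */
int bytes_to_pgoff(uint64_t byte_off, uint64_t *pgoff)
{
    uint64_t mask = ((uint64_t)1 << SKETCH_PAGE_SHIFT) - 1;

    if (byte_off & mask)
        return -1;
    *pgoff = byte_off >> SKETCH_PAGE_SHIFT;
    return 0;
}

/* Largest byte offset reachable with a page offset of `bits` bits. */
uint64_t max_byte_offset(unsigned bits)
{
    return (((uint64_t)1 << bits) - 1) << SKETCH_PAGE_SHIFT;
}
```

Under these assumptions a 32-bit page offset reaches byte offsets just short of 16 TB, whereas a 32-bit byte offset stops at 4 GB, which is why passing pages allows larger files to be mapped.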
