
The File Object

The final primary VFS object that we shall look at is the file object. The file object is used to represent a file opened by a process. When we think of the VFS from the perspective of user-space, the file object is what readily comes to mind. Processes deal directly with files, not superblocks, inodes, or dentries. It is not surprising that the information in the file object is the most familiar (data such as access mode and current offset) or that the file operations are familiar system calls such as read() and write().

The file object is the in-memory representation of an open file. The object (but not the physical file) is created in response to the open() system call and destroyed in response to the close() system call. All these file-related calls are actually methods defined in the file operations table. Because multiple processes can open and manipulate a file at the same time, there can be multiple file objects in existence for the same file. The file object merely represents a process's view of an open file. The object points back to the dentry (which in turn points back to the inode) that actually represents the open file. The inode and dentry objects, of course, are unique.
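
The one-file-object-per-open() behavior can be observed from user-space. The following is a minimal sketch, not kernel code: it assumes a Linux system with a writable /tmp, and the function name compare_offsets() and the scratch-file name are hypothetical. It also relies on a fact not stated above, namely that dup() returns a second descriptor for the same file object rather than a new one.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read 4 bytes through one descriptor, then report the offsets seen
 * through (a) a descriptor from a second open() of the same path and
 * (b) a dup() of the reading descriptor.
 * Returns 0 on success, -1 on any error.
 */
int compare_offsets(off_t *second_open_pos, off_t *dup_pos)
{
    char path[] = "/tmp/fileobj_demo_XXXXXX";  /* hypothetical scratch file */
    char buf[4];
    int rc = -1;
    int fd, other = -1, copy = -1;

    assert(second_open_pos != NULL && dup_pos != NULL);

    fd = mkstemp(path);              /* first file object, opened read/write */
    if (fd < 0)
        return -1;
    if (write(fd, "0123456789", 10) != 10 || lseek(fd, 0, SEEK_SET) != 0)
        goto out;

    other = open(path, O_RDONLY);    /* second file object, same inode */
    copy = dup(fd);                  /* second descriptor, same file object */
    if (other < 0 || copy < 0)
        goto out;

    if (read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
        goto out;

    *second_open_pos = lseek(other, 0, SEEK_CUR);  /* untouched by the read */
    *dup_pos = lseek(copy, 0, SEEK_CUR);           /* advanced along with fd */
    rc = 0;
out:
    if (copy >= 0)
        close(copy);
    if (other >= 0)
        close(other);
    close(fd);
    unlink(path);
    return rc;
}
```

After a successful call, the offset reported for the second open() should still be 0, while the offset reported for the dup() copy should be 4, because only the latter shares the file object that performed the read.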

The file object is represented by struct file and is defined in <linux/fs.h>. Let's look at the structure, again with comments added to describe each entry:

struct file {
        struct list_head       f_list;        /* list of file objects */
        struct dentry          *f_dentry;     /* associated dentry object */
        struct vfsmount        *f_vfsmnt;     /* associated mounted fs */
        struct file_operations *f_op;         /* file operations table */
        atomic_t               f_count;       /* file object's usage count */
        unsigned int           f_flags;       /* flags specified on open */
        mode_t                 f_mode;        /* file access mode */
        loff_t                 f_pos;         /* file offset (file pointer) */
        struct fown_struct     f_owner;       /* owner data for signals */
        unsigned int           f_uid;         /* user's UID */
        unsigned int           f_gid;         /* user's GID */
        int                    f_error;       /* error code */
        struct file_ra_state   f_ra;          /* read-ahead state */
        unsigned long          f_version;     /* version number */
        void                   *f_security;   /* security module */
        void                   *private_data; /* tty driver hook */
        struct list_head       f_ep_links;    /* list of eventpoll links */
        spinlock_t             f_ep_lock;     /* eventpoll lock */
        struct address_space   *f_mapping;    /* page cache mapping */
};

Similar to the dentry object, the file object does not actually correspond to any on-disk data. Therefore, no flag is in the object to represent whether the object is dirty and needs to be written back to disk. The file object does point to its associated dentry object via the f_dentry pointer. The dentry in turn points to the associated inode, which reflects whether the file is dirty.
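
Some of the per-open state held in the file object is visible to a process. The sketch below is a user-space illustration only; struct open_state and query_open_state() are hypothetical names. It assumes that fcntl() with F_GETFL reports the flags given at open time (conceptually, f_flags) and that lseek() with SEEK_CUR reports the current offset (conceptually, f_pos).

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Snapshot of the per-open state a process can query for a descriptor. */
struct open_state {
    int   access_mode;  /* O_RDONLY, O_WRONLY, or O_RDWR */
    int   append;       /* 1 if O_APPEND is set, otherwise 0 */
    off_t offset;       /* current file offset */
};

/* Fill *st for the open descriptor fd. Returns 0 on success, -1 on error. */
int query_open_state(int fd, struct open_state *st)
{
    int flags;

    assert(st != NULL);

    flags = fcntl(fd, F_GETFL);           /* access mode plus status flags */
    if (flags < 0)
        return -1;

    st->access_mode = flags & O_ACCMODE;  /* mask out everything else */
    st->append = (flags & O_APPEND) != 0;
    st->offset = lseek(fd, 0, SEEK_CUR);  /* query only, does not move */
    return st->offset < 0 ? -1 : 0;
}
```

None of these values describes the file on disk; each belongs to one particular open of the file, which is why no dirty flag is needed to track them.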

File Operations

As with all the other VFS objects, the file operations table is quite important. The operations associated with struct file correspond to the familiar system calls that form the basis of the standard Unix file interface.

The file object methods are specified in file_operations and defined in <linux/fs.h>:

struct file_operations {
        struct module *owner;
        loff_t (*llseek) (struct file *, loff_t, int);
        ssize_t (*read) (struct file *, char *, size_t, loff_t *);
        ssize_t (*aio_read) (struct kiocb *, char *, size_t, loff_t);
        ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
        ssize_t (*aio_write) (struct kiocb *, const char *, size_t, loff_t);
        int (*readdir) (struct file *, void *, filldir_t);
        unsigned int (*poll) (struct file *, struct poll_table_struct *);
        int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
        int (*mmap) (struct file *, struct vm_area_struct *);
        int (*open) (struct inode *, struct file *);
        int (*flush) (struct file *);
        int (*release) (struct inode *, struct file *);
        int (*fsync) (struct file *, struct dentry *, int);
        int (*aio_fsync) (struct kiocb *, int);
        int (*fasync) (int, struct file *, int);
        int (*lock) (struct file *, int, struct file_lock *);
        ssize_t (*readv) (struct file *, const struct iovec *,
                          unsigned long, loff_t *);
        ssize_t (*writev) (struct file *, const struct iovec *,
                           unsigned long, loff_t *);
        ssize_t (*sendfile) (struct file *, loff_t *, size_t,
                             read_actor_t, void *);
        ssize_t (*sendpage) (struct file *, struct page *, int,
                             size_t, loff_t *, int);
        unsigned long (*get_unmapped_area) (struct file *, unsigned long,
                                            unsigned long, unsigned long,
                                            unsigned long);
        int (*check_flags) (int flags);
        int (*dir_notify) (struct file *filp, unsigned long arg);
        int (*flock) (struct file *filp, int cmd, struct file_lock *fl);
};

Filesystems can implement unique functions for each of these operations, or they can use a generic method if one exists. The generic methods tend to work fine on normal Unix-based filesystems. A filesystem is under no obligation to implement all these methods (although not implementing the basics is silly) and can simply set the method to NULL if not interested.
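
The idea of a method table with optional entries can be modeled in ordinary user-space C. This is a sketch of the pattern only, not the kernel's code: struct demo_file, struct demo_file_ops, mem_ops, and the demo_read() and demo_write() wrappers are all hypothetical, and returning -EINVAL for a missing method is this sketch's own choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct demo_file;

/* Hypothetical, cut-down operations table with two methods. */
struct demo_file_ops {
    ssize_t (*read)(struct demo_file *, char *, size_t);
    ssize_t (*write)(struct demo_file *, const char *, size_t);
};

/* Hypothetical open-file object backed by a constant memory buffer. */
struct demo_file {
    const struct demo_file_ops *f_op;  /* operations table */
    size_t f_pos;                      /* current offset, never past size */
    const char *data;                  /* backing bytes */
    size_t size;                       /* number of backing bytes */
};

/* Copy up to count bytes from the current offset, then advance it. */
static ssize_t mem_read(struct demo_file *file, char *buf, size_t count)
{
    size_t left = file->size - file->f_pos;

    if (count > left)
        count = left;
    memcpy(buf, file->data + file->f_pos, count);
    file->f_pos += count;
    return (ssize_t)count;
}

/* A read-only "filesystem": write is deliberately left NULL. */
static const struct demo_file_ops mem_ops = {
    .read  = mem_read,
    .write = NULL,
};

/* Generic entry points: check for the method before calling through it. */
ssize_t demo_read(struct demo_file *file, char *buf, size_t count)
{
    assert(file != NULL && file->f_op != NULL);
    if (!file->f_op->read)
        return -EINVAL;
    return file->f_op->read(file, buf, count);
}

ssize_t demo_write(struct demo_file *file, const char *buf, size_t count)
{
    assert(file != NULL && file->f_op != NULL);
    if (!file->f_op->write)
        return -EINVAL;
    return file->f_op->write(file, buf, count);
}
```

A caller only ever goes through demo_read() and demo_write(); whether a given backend supplies its own method, shares a generic one, or leaves the slot NULL is invisible at the call site.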

Here are the individual operations:

  • loff_t llseek(struct file *file,
                   loff_t offset, int origin) 
    

    This function updates the file pointer to the given offset. It is called via the llseek() system call.

  • ssize_t read(struct file *file,
                  char *buf, size_t count,
                  loff_t *offset)
    

    This function reads count bytes from the given file at position offset into buf. The file pointer is then updated. This function is called by the read() system call.

  • ssize_t aio_read(struct kiocb *iocb,
                      char *buf, size_t count,
                      loff_t offset)
    

    This function begins an asynchronous read of count bytes into buf of the file described in iocb. This function is called by the aio_read() system call.

  • ssize_t write(struct file *file,
                   const char *buf, size_t count,
                   loff_t *offset)
    

    This function writes count bytes from buf into the given file at position offset. The file pointer is then updated. This function is called by the write() system call.

  • ssize_t aio_write(struct kiocb *iocb,
                       const char *buf,
                       size_t count, loff_t offset)
    

    This function begins an asynchronous write of count bytes from buf into the file described in iocb. This function is called by the aio_write() system call.

  • int readdir(struct file *file, void *dirent,
                 filldir_t filldir)
    

    This function returns the next directory entry in a directory listing. This function is called by the readdir() system call.

  • unsigned int poll(struct file *file,
         struct poll_table_struct *poll_table)
    

    This function sleeps, waiting for activity on the given file. It is called by the poll() system call.

  • int ioctl(struct inode *inode,
               struct file *file,
               unsigned int cmd,
               unsigned long arg)
    

    This function is used to send a command and argument pair to a device. It is used when the file is an open device node. This function is called from the ioctl() system call.

  • int mmap(struct file *file,
              struct vm_area_struct *vma)
    

    This function memory maps the given file onto the given address space and is called by the mmap() system call.

  • int open(struct inode *inode,
              struct file *file)
    

    This function creates a new file object and links it to the corresponding inode object. It is called by the open() system call.

  • int flush(struct file *file)

    This function is called by the VFS whenever the reference count of an open file decreases. Its purpose is filesystem dependent.

  • int release(struct inode *inode,
                 struct file *file)
    

    This function is called by the VFS when the last remaining reference to the file is destroyed (for example, when the last process sharing a file descriptor calls close() or exits). Its purpose is filesystem dependent.

  • int fsync(struct file *file,
               struct dentry *dentry,
               int datasync)
    

    This function is called by the fsync() system call to write all cached data for the file to disk.

  • int aio_fsync(struct kiocb *iocb,
                   int datasync)
    

    This function is called by the aio_fsync() system call to write all cached data for the file associated with iocb to disk.

  • int fasync(int fd, struct file *file, int on)

    This function enables or disables signal notification of asynchronous I/O.

  • int lock(struct file *file, int cmd,
              struct file_lock *lock)
    

    This function manipulates a file lock on the given file.

  • ssize_t readv(struct file *file,
                   const struct iovec *vector,
                   unsigned long count,
                   loff_t *offset)
    

    This function is called by the readv() system call to read from the given file and put the results into the count buffers described by vector. The file offset is then incremented.

  • ssize_t writev(struct file *file,
                    const struct iovec *vector,
                    unsigned long count,
                    loff_t *offset)
    

    This function is called by the writev() system call to write from the count buffers described by vector into the file specified by file. The file offset is then incremented.

  • ssize_t sendfile(struct file *file,
                      loff_t *offset,
                      size_t size,
                      read_actor_t actor,
                      void *target)
    

    This function is called by the sendfile() system call to copy data from one file to another. It performs the copy entirely in the kernel and avoids an extraneous copy to user-space.

  • ssize_t sendpage(struct file *file,
                      struct page *page,
                      int offset, size_t size,
                      loff_t *pos, int more)
    

    This function is used to send data from one file to another.

  • unsigned long get_unmapped_area(struct file
                     *file,
                     unsigned long addr,
                     unsigned long len,
                     unsigned long offset,
                     unsigned long flags)
    

    This function gets unused address space to map the given file.

  • int check_flags(int flags)

    This function is used to check the validity of the flags passed to the fcntl() system call when the SETFL command is given. As with many VFS operations, filesystems need not implement check_flags(); currently, only NFS does so. This function enables filesystems to restrict invalid SETFL flags that are otherwise allowed by the generic fcntl() function. In the case of NFS, combining O_APPEND and O_DIRECT is not allowed.

  • int flock(struct file *filp,
               int cmd,
               struct file_lock *fl)
    

    This function is used to implement the flock() system call, which provides advisory locking.
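
Several of the methods listed above are reached through system calls that a process can exercise directly. The following user-space sketch drives writev(), lseek(), and readv() against one open file. It assumes a Linux system with a writable /tmp; the function name scatter_gather_roundtrip() and the scratch-file name are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Write two buffers with one writev(), rewind with lseek(), and read the
 * data back into two differently sized buffers with one readv().
 * On success, stores the NUL-terminated data in out and returns the final
 * file offset; returns -1 on error. out must hold at least 11 bytes.
 */
off_t scatter_gather_roundtrip(char *out, size_t outlen)
{
    char path[] = "/tmp/fops_demo_XXXXXX";  /* hypothetical scratch file */
    char head[3], tail[8];
    struct iovec wv[2] = {
        { .iov_base = "hello, ", .iov_len = 7 },
        { .iov_base = "vfs",     .iov_len = 3 },
    };
    struct iovec rv[2] = {
        { .iov_base = head, .iov_len = sizeof(head) },
        { .iov_base = tail, .iov_len = sizeof(tail) },
    };
    off_t pos = -1;
    int fd;

    assert(out != NULL && outlen >= 11);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);  /* the file lives on until the descriptor is closed */

    if (writev(fd, wv, 2) != 10)        /* gathers 7 + 3 bytes */
        goto out;
    if (lseek(fd, 0, SEEK_SET) != 0)    /* move the file pointer back */
        goto out;
    if (readv(fd, rv, 2) != 10)         /* scatters 3 bytes, then 7 */
        goto out;

    memcpy(out, head, 3);
    memcpy(out + 3, tail, 7);
    out[10] = '\0';
    pos = lseek(fd, 0, SEEK_CUR);       /* offset after the vectored read */
out:
    close(fd);
    return pos;
}
```

The write and the read each advance the same file offset, which is why the explicit lseek() back to the start is needed between them.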
