Linux Device Drivers


Process Address Space


Memory areas can contain:


- A memory map of the executable file’s code (the text section)
- A memory map of the executable file’s initialized global variables (the data section)
- A memory map of the zero page, containing uninitialized global variables (the bss section)
- A memory map of the zero page, used for the process’s user-space stack
- An additional text, data, and bss section for each shared library
- Any memory-mapped files
- Any shared memory segments
- Any anonymous memory mappings, such as those associated with malloc()
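As a rough illustration of the list above, the following user-space sketch prints one sample address from several of these areas. The names initialized_global, uninitialized_global, and show_layout() are invented for this example, and the addresses printed will differ between runs and systems.

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* placed in the data section */
int uninitialized_global;      /* placed in the bss section, zero-filled */

/* Print one sample address from several memory areas of this process. */
void show_layout(void)
{
    int on_stack = 0;             /* lives on the user-space stack */
    void *on_heap = malloc(16);   /* usually the heap or an anonymous mapping */

    printf("text  : %p\n", (void *)show_layout);
    printf("data  : %p\n", (void *)&initialized_global);
    printf("bss   : %p\n", (void *)&uninitialized_global);
    printf("heap  : %p\n", on_heap);
    printf("stack : %p\n", (void *)&on_stack);

    free(on_heap);
}
```

Calling show_layout() from main() and comparing the printed addresses with /proc/<pid>/maps shows which memory area each object falls into.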


[Figure: Virtual address space. Regions labeled: user context, kernel context, stack, heap, uninitialized data, initialized read-write data, initialized read-only data, text, kernel data.]


Memory Descriptor

The kernel represents a process’s address space with a data structure called the memory descriptor. The structure contains all the information related to the process address space.

The memory descriptor is represented by struct mm_struct and defined in <linux/sched.h> (newer kernels define it in <linux/mm_types.h>).


struct mm_struct {
        struct vm_area_struct *mmap;    /* list of memory areas */
        struct rb_root mm_rb;           /* red-black tree of VMAs */
        pgd_t *pgd;                     /* page global directory */
        atomic_t mm_users;              /* address space users */
        atomic_t mm_count;              /* primary usage counter */
        int map_count;                  /* number of memory areas */
        unsigned long start_code;       /* start address of code */
        unsigned long end_code;         /* final address of code */
        unsigned long start_data;       /* start address of data */
        unsigned long end_data;         /* final address of data */
        /* ... remaining fields omitted ... */
};



Allocating a Memory Descriptor


The memory descriptor associated with a given task is stored in the mm field of the task’s process descriptor. Thus current->mm is the current process’s memory descriptor.

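The mm_struct structure is allocated from the mm_cachep slab cache via the allocate_mm() macro in kernel/fork.c.

Normally, each process receives a unique mm_struct and thus a unique process address space. Threads are the exception: tasks created with the CLONE_VM flag share the mm_struct of their parent.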


Destroying a Memory Descriptor


When the process associated with a specific address space exits, the exit_mm() function is invoked. It then calls mmput(), which decrements the memory descriptor’s mm_users counter. If the user count reaches zero, mmdrop() is called to decrement the mm_count usage counter. If that counter reaches zero, the free_mm() macro is invoked to return the mm_struct to the mm_cachep slab cache.
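The two-counter scheme can be modeled in user space. The sketch below is a toy model only: struct toy_mm and the toy_* functions are invented names, the counters are plain ints rather than atomic_t, and the freed flag stands in for the point where the kernel would return the structure to the slab cache. It assumes that all users together hold a single reference on mm_count.

```c
#include <stdbool.h>

/* Toy stand-in for the two counters in struct mm_struct. */
struct toy_mm {
    int mm_users;   /* tasks using this address space */
    int mm_count;   /* primary count; all users together hold one reference */
    bool freed;     /* set where the kernel would invoke free_mm() */
};

void toy_mm_init(struct toy_mm *mm)
{
    mm->mm_users = 1;
    mm->mm_count = 1;
    mm->freed = false;
}

/* Models mmdrop(): release one primary reference. */
void toy_mmdrop(struct toy_mm *mm)
{
    if (--mm->mm_count == 0)
        mm->freed = true;
}

/* Models mmput(): release one user; the last user drops the shared reference. */
void toy_mmput(struct toy_mm *mm)
{
    if (--mm->mm_users == 0)
        toy_mmdrop(mm);
}
```

With two users, the first toy_mmput() leaves the descriptor alive and the second frees it. If something else still holds a reference on mm_count, the descriptor survives until the matching toy_mmdrop().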


The mm_struct and Kernel Threads


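Kernel threads do not have a process address space and therefore do not have an associated memory descriptor. Thus the mm field of a kernel thread’s process descriptor is NULL.

Kernel threads never access user-space memory, so they need only the kernel portion of the page tables, which is the same in every address space. Whenever a kernel thread begins running, it therefore uses the memory descriptor of whatever task ran previously.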
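The borrowing can be sketched as a toy model. In the kernel, the descriptor actually in use is recorded in the active_mm field of the process descriptor; everything else here (struct demo_mm, struct demo_task, demo_switch_mm()) is a hypothetical simplification that ignores reference counting and the real page-table switch.

```c
#include <stddef.h>

struct demo_mm { int id; };

struct demo_task {
    struct demo_mm *mm;          /* NULL for a kernel thread */
    struct demo_mm *active_mm;   /* address space actually in use */
};

/* Models the address-space part of a context switch from prev to next. */
void demo_switch_mm(const struct demo_task *prev, struct demo_task *next)
{
    if (next->mm == NULL)
        next->active_mm = prev->active_mm;   /* kernel thread borrows */
    else
        next->active_mm = next->mm;          /* normal task uses its own */
}
```

After a switch from a user task to a kernel thread, the kernel thread’s mm is still NULL while its active_mm points at the previous task’s descriptor.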


Memory Areas


Memory areas are represented by a memory area object, which is stored in the vm_area_struct structure and defined in <linux/mm.h>.

In the kernel, memory areas are often called virtual memory areas, or VMAs.

The vm_area_struct structure describes a single memory area over a contiguous interval in a given address space.


struct vm_area_struct {
        struct mm_struct *vm_mm;                /* associated mm_struct */
        unsigned long vm_start;                 /* VMA start, inclusive */
        unsigned long vm_end;                   /* VMA end, exclusive */
        struct vm_area_struct *vm_next;         /* list of VMAs */
        pgprot_t vm_page_prot;                  /* access permissions */
        unsigned long vm_flags;                 /* flags */
        struct vm_operations_struct *vm_ops;    /* associated ops */
        unsigned long vm_pgoff;                 /* offset within file */
        struct file *vm_file;                   /* mapped file, if any */
        /* ... remaining fields omitted ... */
};



VMA flags

The vm_flags field contains bit flags, defined in <linux/mm.h>, that specify the behavior of, and provide information about, the pages contained in the memory area.

VM_READ       Pages can be read from
VM_WRITE      Pages can be written to
VM_EXEC       Pages can be executed
VM_SHARED     Pages are shared
VM_SHM        The area is used for shared memory
VM_IO         The area maps a device’s I/O space
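Because vm_flags is a bit mask, individual properties are tested with a bitwise AND. The sketch below turns a flag word into the four-character permission string shown in /proc/<pid>/maps. The VM_* values are redefined locally so the code builds in user space; they are believed to match the kernel’s definitions but should be treated as assumptions, and vm_flags_to_perms() is an invented helper.

```c
/* Local copies of the low VM_* bits, redefined for a user-space build. */
#define VM_READ   0x00000001UL
#define VM_WRITE  0x00000002UL
#define VM_EXEC   0x00000004UL
#define VM_SHARED 0x00000008UL

/* Fill out[5] with a /proc/<pid>/maps style string such as "r-xp". */
void vm_flags_to_perms(unsigned long vm_flags, char out[5])
{
    out[0] = (vm_flags & VM_READ)   ? 'r' : '-';
    out[1] = (vm_flags & VM_WRITE)  ? 'w' : '-';
    out[2] = (vm_flags & VM_EXEC)   ? 'x' : '-';
    out[3] = (vm_flags & VM_SHARED) ? 's' : 'p';   /* shared or private */
    out[4] = '\0';
}
```

A text section would typically carry VM_READ | VM_EXEC, which this helper renders as r-xp.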


VMA operations

The vm_ops field in the vm_area_struct structure points to the table of operations associated with a given memory area, which the kernel can invoke to manipulate the VMA. The operations table is represented by struct vm_operations_struct and is defined in <linux/mm.h>.

struct vm_operations_struct {
        void (*open) (struct vm_area_struct *);
        void (*close) (struct vm_area_struct *);
        struct page * (*nopage) (struct vm_area_struct *,
                                 unsigned long, int);
        int (*populate) (struct vm_area_struct *, unsigned long,
                         unsigned long, pgprot_t, unsigned long, int);
};





open      Invoked when the given memory area is added to an address space.
close     Invoked when the given memory area is removed from an address space.
nopage    Invoked by the page fault handler when a page that is not present in physical memory is accessed.
populate  Invoked by the remap_file_pages() system call to prefault a new mapping.
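The operations table is an ordinary table of function pointers. The following user-space sketch, reduced to open and close, shows how generic code can dispatch through such a table. All names (struct toy_vma, struct toy_vm_ops, counting_ops, and the toy_vma_* functions) are hypothetical, and the users counter is invented bookkeeping for the example.

```c
#include <stddef.h>

struct toy_vma;

/* Toy operations table, reduced to the open and close callbacks. */
struct toy_vm_ops {
    void (*open)(struct toy_vma *);
    void (*close)(struct toy_vma *);
};

struct toy_vma {
    const struct toy_vm_ops *vm_ops;   /* may be NULL */
    int users;                         /* hypothetical per-area bookkeeping */
};

static void count_open(struct toy_vma *vma)  { vma->users++; }
static void count_close(struct toy_vma *vma) { vma->users--; }

const struct toy_vm_ops counting_ops = {
    .open  = count_open,
    .close = count_close,
};

/* Generic code calls a callback only if the area supplies one. */
void toy_vma_added(struct toy_vma *vma)
{
    if (vma->vm_ops && vma->vm_ops->open)
        vma->vm_ops->open(vma);
}

void toy_vma_removed(struct toy_vma *vma)
{
    if (vma->vm_ops && vma->vm_ops->close)
        vma->vm_ops->close(vma);
}
```

An area with no table, or with a table that leaves a callback NULL, is simply skipped, which lets each kind of mapping implement only the hooks it needs.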


Lists and Trees of Memory Area


Memory areas are accessed via both the mmap and the mm_rb fields of the memory descriptor. These two data structures independently point to all the memory area objects associated with the memory descriptor.

The mmap field links together all the memory area objects in a singly linked list, sorted by ascending address.

The mm_rb field links together all the memory area objects in a red-black tree.

A red-black tree is a type of balanced binary tree. Each element in a red-black tree is called a node.

The linked list is used when every node needs to be traversed.

The red-black tree is used when locating a specific memory area in the address space.
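To make the lookup concrete, here is a sketch that searches an address-sorted list of areas. It deliberately uses the simple linear walk; the kernel uses the red-black tree for this so that the search is logarithmic rather than linear in the number of areas. struct list_vma, list_find_vma(), and addr_is_mapped() are hypothetical names.

```c
#include <stddef.h>

struct list_vma {
    unsigned long vm_start;     /* inclusive */
    unsigned long vm_end;       /* exclusive */
    struct list_vma *vm_next;   /* next area, sorted by ascending address */
};

/* Return the first area whose vm_end lies above addr, or NULL if none does.
 * The returned area may start above addr, so callers must check vm_start. */
struct list_vma *list_find_vma(struct list_vma *head, unsigned long addr)
{
    for (struct list_vma *vma = head; vma != NULL; vma = vma->vm_next)
        if (addr < vma->vm_end)
            return vma;
    return NULL;
}

/* Return 1 if addr lies inside some area of the list, 0 otherwise. */
int addr_is_mapped(struct list_vma *head, unsigned long addr)
{
    struct list_vma *vma = list_find_vma(head, addr);
    return vma != NULL && vma->vm_start <= addr;
}
```

Note the half-open interval: an address equal to vm_end is not part of the area.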


Memory Areas in Real Life


Let’s look at a particular process’s address space and the memory areas inside it, using the /proc filesystem and the pmap(1) utility. Consider a minimal program:

int main(int argc, char *argv[])
{
        return 0;
}



The output from /proc/<pid>/maps lists the memory areas in the process’s address space:

# cat /proc/1426/maps

start-end          permission  offset    major:minor  inode   file
00e80000-00faf000  r-xp        00000000  03:01        208530  /lib/
00fb2000-00fb4000  rw-p        00000000  00:00        0
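A process can read its own listing through /proc/self/maps. The sketch below parses the first three fields of each line to find the area containing a given address. lookup_perms() is an invented helper, the code assumes a Linux system with /proc mounted, and it assumes that each line fits in the 1024-byte buffer.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Find the mapping of the calling process that contains addr.
 * On success, copy its permission string (e.g. "r-xp") into perms and
 * return 0. Return -1 if no mapping matches or the file cannot be read. */
int lookup_perms(const void *addr, char perms[5])
{
    FILE *fp = fopen("/proc/self/maps", "r");
    char line[1024];
    char field[5];
    unsigned long start, end;
    uintptr_t target = (uintptr_t)addr;
    int result = -1;

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* Each line begins "start-end perms ..." with hexadecimal addresses. */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, field) != 3)
            continue;
        if (target >= start && target < end) {   /* end is exclusive */
            memcpy(perms, field, sizeof field);
            result = 0;
            break;
        }
    }
    fclose(fp);
    return result;
}
```

Passing the address of a local variable should report a readable and writable area (the stack), while the address of a function should report an executable one (the text section).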




The pmap utility formats this information in a more readable manner:

00e80000 (1212 KB)  r-xp  (03:01 208530)  /lib/
00fb2000 (8 KB)     rw-p  (00:00 0)
bfffe000 (8 KB)     rwxp  (00:00 0)       [ stack ]
mapped: 1340 KB     writable/private: 40 KB     shared: 0 KB


Manipulating Memory Areas


The kernel often has to determine whether any memory area in a process’s address space matches a given criterion, such as whether a given address lies within a memory area.

The helper functions for these lookups, such as find_vma(), are all declared in <linux/mm.h>.





mmap() and do_mmap(): Creating an Address Interval


The do_mmap() function is used by the kernel to create a new linear address interval. It is declared in <linux/mm.h>:

unsigned long do_mmap(struct file *file, unsigned long addr,
                      unsigned long len, unsigned long prot,
                      unsigned long flag, unsigned long offset)

If the file parameter is NULL and offset is 0, the mapping is not backed by a file. Such a mapping is called an anonymous mapping.

The prot parameter specifies the access permissions for pages in the memory area:

PROT_READ     corresponds to     VM_READ
PROT_WRITE    corresponds to     VM_WRITE
PROT_EXEC     corresponds to     VM_EXEC
PROT_NONE     corresponds to     VM_NONE



The flag parameter specifies flags that correspond to the remaining VMA flags:

MAP_SHARED    The mapping can be shared
MAP_PRIVATE   The mapping cannot be shared
MAP_FIXED     The new interval must start at the given address addr


The mmap() system call


The mmap() system call is defined as

void *mmap2(void *start, size_t length, int prot, int flags, int fd, off_t pgoff);

The call is named mmap2() because it is the second variant of mmap(). Its final parameter, pgoff, is an offset in pages, whereas the original mmap() took an offset in bytes. This enables larger files with larger offsets to be mapped.
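The gain from a page-based offset is simple arithmetic, sketched below. The 4096-byte unit is an assumption for illustration (the unit can differ by architecture), and bytes_to_pgoff() and max_byte_offset_32() are invented helpers rather than kernel or libc functions.

```c
#include <stdint.h>

#define MMAP2_UNIT 4096ULL   /* assumed size of one offset unit, in bytes */

/* Convert a byte offset to a page offset; return -1 if it is not aligned. */
int64_t bytes_to_pgoff(uint64_t byte_offset)
{
    if (byte_offset % MMAP2_UNIT != 0)
        return -1;
    return (int64_t)(byte_offset / MMAP2_UNIT);
}

/* Largest byte offset that a 32-bit unsigned page offset can express. */
uint64_t max_byte_offset_32(void)
{
    return (uint64_t)UINT32_MAX * MMAP2_UNIT;
}
```

Under these assumptions a 32-bit byte offset tops out just below 4 GB, while a 32-bit page offset reaches just below 16 TB.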


munmap() and do_munmap(): Removing an Address Interval


The do_munmap() function removes an address interval from a specified process address space. The function is declared in <linux/mm.h>:

int do_munmap(struct mm_struct *mm, unsigned long start, size_t len);

On success, 0 is returned; otherwise, a negative error code is returned.


The munmap() system call

The munmap() system call is exported to user space as a means to allow processes to remove address intervals from their address space.

int munmap(void *start, size_t length);

The system call is defined in mm/mmap.c and acts as a very simple wrapper around do_munmap().
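From user space, creating and removing an interval looks like the following sketch. It uses the standard mmap() and munmap() library calls with MAP_ANONYMOUS, so no file is involved; anon_map_roundtrip() is an invented name, and a Linux system with glibc is assumed.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a one-page anonymous private mapping, use it, then remove it.
 * Returns 0 on success and -1 on any failure. */
int anon_map_roundtrip(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    /* fd = -1 and offset = 0: the mapping is not backed by a file. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int ok = (p[0] == 0);     /* anonymous pages start out zero-filled */
    memset(p, 0xAB, len);     /* the interval is readable and writable */
    ok = ok && ((unsigned char)p[len - 1] == 0xAB);

    if (munmap(p, len) != 0)  /* remove the interval again */
        return -1;
    return ok ? 0 : -1;
}
```

While the mapping exists it appears as an extra rw-p line in /proc/<pid>/maps; after munmap() returns, touching the old address would fault.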


Page Tables

Applications operate on virtual memory that is mapped to physical addresses, whereas processors operate directly on those physical addresses. When an application accesses a virtual memory address, the address must first be converted to a physical address before the processor can resolve the request. This lookup is performed via page tables. Page tables work by splitting the virtual address into chunks. Each chunk is used as an index into a table, and the table entry points to either another table or the associated physical page.

In the kernels described here, the page tables consist of three levels: the page global directory (PGD), the page middle directory (PMD), and the page table itself. Later kernel versions added further levels. The multiple levels allow a sparsely populated address space to be described without allocating tables for unused regions.

Page table data structures are architecture dependent and are defined in <asm/page.h>.
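The splitting step can be shown with plain bit arithmetic. The sketch below assumes a hypothetical 32-bit layout of 2 PGD bits, 9 PMD bits, 9 PTE bits, and a 12-bit page offset; this split is chosen for illustration and is not what every architecture uses. The TOY_* macros, struct toy_va_parts, and toy_split_va() are invented names.

```c
#include <stdint.h>

/* Assumed layout: 2-bit PGD | 9-bit PMD | 9-bit PTE | 12-bit offset. */
#define TOY_PAGE_SHIFT 12
#define TOY_PMD_SHIFT  21
#define TOY_PGD_SHIFT  30
#define TOY_INDEX_MASK 0x1FFu    /* 9 bits */
#define TOY_OFFSET_MASK 0xFFFu   /* 12 bits */

struct toy_va_parts {
    unsigned pgd;      /* index into the page global directory */
    unsigned pmd;      /* index into the page middle directory */
    unsigned pte;      /* index into the page table */
    unsigned offset;   /* byte offset within the physical page */
};

/* Split a virtual address into its three table indices and page offset. */
struct toy_va_parts toy_split_va(uint32_t va)
{
    struct toy_va_parts p;
    p.pgd    = va >> TOY_PGD_SHIFT;
    p.pmd    = (va >> TOY_PMD_SHIFT) & TOY_INDEX_MASK;
    p.pte    = (va >> TOY_PAGE_SHIFT) & TOY_INDEX_MASK;
    p.offset = va & TOY_OFFSET_MASK;
    return p;
}
```

Under this layout, address 0x00401234 splits into PGD index 0, PMD index 2, PTE index 1, and offset 0x234. A lookup would follow those three indices in turn and then add the offset to the physical page’s base address.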




Page Cache


The Linux kernel implements an in-memory disk cache called the page cache. The goal of this cache is to minimize disk I/O by keeping in physical memory data that would otherwise require a disk access.

The page cache consists of physical pages in RAM. Each page in the cache corresponds to multiple blocks on the disk. Whenever the kernel begins a page I/O operation, it first checks whether the requisite data is in the page cache. If it is, the kernel does not access the disk and instead takes the data from the page cache.
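The check-the-cache-first logic can be sketched with a toy model. This is not how the kernel organizes its page cache: the fixed-size, direct-mapped table, struct toy_page_cache, and toy_read_page() are all simplifications invented for the example, and the disk read is only simulated by a counter.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 8

struct toy_page_cache {
    bool valid[TOY_CACHE_SLOTS];
    unsigned long index[TOY_CACHE_SLOTS];   /* which page each slot holds */
    unsigned long disk_reads;               /* number of simulated disk reads */
};

/* Return true on a cache hit. On a miss, simulate a disk read, store the
 * page in its slot (replacing any previous occupant), and return false. */
bool toy_read_page(struct toy_page_cache *pc, unsigned long page_index)
{
    size_t slot = page_index % TOY_CACHE_SLOTS;

    if (pc->valid[slot] && pc->index[slot] == page_index)
        return true;               /* hit: no disk I/O needed */

    pc->disk_reads++;              /* miss: the data comes from disk */
    pc->valid[slot] = true;
    pc->index[slot] = page_index;
    return false;
}
```

Reading the same page twice costs only one simulated disk read; the second request is served from memory.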