Srinivas B T

Linux Device Drivers

Linux
Modules
Character drivers
IO & Memory
Linux Kernel
Process Management
Process Address space

Linux Scheduler
Memory Management
Interrupts
Signals
System Calls
Kernel Synchronization
Linux Inter Process Communications

Serial Ports
Parallel Ports
Introduction to Hardware
Linux Timers
DMA in Linux
Linux Threads
Linux Thread Synchronization

Linux Multi Threading
Debugging in Linux
GDB GNU Debugger
KDB Kernel Debugger
KGDB Kernel GNU Debugger
Example Ethernet Driver

Process Address Space

Memory areas can contain:

- A memory map of the executable file’s code (the text section)
- A memory map of the executable file’s initialized global variables (the data section)
- A memory map of the zero page containing uninitialized global variables (the bss section)
- A memory map of the zero page used for the process’s user-space stack
- An additional text, data, and bss section for each shared library
- Any memory-mapped files
- Any shared memory segments
- Any anonymous memory mappings, such as those associated with malloc()
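As a rough illustration of the areas listed above, the following user-space sketch prints one sample address from several of them. The function and variable names are made up for this example, and the addresses printed (and their relative order) depend on the platform and on address space randomization.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;  /* lives in the data section */
int uninitialized_global;     /* lives in the bss section */

/* Print one sample address per area; returns 0 on success, -1 on failure. */
int print_sample_addresses(void)
{
    int on_stack = 0;            /* lives on the user-space stack */
    void *on_heap = malloc(16);  /* heap or anonymous mapping */

    if (on_heap == NULL)
        return -1;

    /* Going through uintptr_t avoids a direct function-to-object pointer cast. */
    printf("text:  %p\n", (void *)(uintptr_t)print_sample_addresses);
    printf("data:  %p\n", (void *)&initialized_global);
    printf("bss:   %p\n", (void *)&uninitialized_global);
    printf("heap:  %p\n", on_heap);
    printf("stack: %p\n", (void *)&on_stack);

    free(on_heap);
    return 0;
}
```

Calling print_sample_addresses() from main() and comparing the output with /proc/<pid>/maps for the same process shows which memory area each address falls into.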

[Figure: Virtual address space. Labels: user context, kernel context, stack, heap, uninitialized data, initialized read-write data, initialized read-only data, text, kernel data]

Memory Descriptor

The kernel represents a process’s address space with a data structure called the memory descriptor. The structure contains all the information related to the process address space.

The memory descriptor is represented by struct mm_struct and is defined in <linux/sched.h>.

    struct mm_struct {
            struct vm_area_struct *mmap;    /* list of memory areas */
            struct rb_root mm_rb;           /* red-black tree of VMAs */
            pgd_t *pgd;                     /* page global directory */
            atomic_t mm_users;              /* address space users */
            atomic_t mm_count;              /* primary usage counter */
            int map_count;                  /* number of memory areas */
            unsigned long start_code;       /* start address of code */
            unsigned long end_code;         /* final address of code */
            unsigned long start_data;       /* start address of data */
            unsigned long end_data;         /* final address of data */
    };

Allocating a Memory Descriptor

The memory descriptor associated with a given task is stored in the mm field of the task’s process descriptor. Thus current->mm is the current process’s memory descriptor.

The mm_struct structure is allocated from the mm_cachep slab cache via the allocate_mm() macro in kernel/fork.c.

Normally, each process receives a unique mm_struct and thus a unique process address space. Threads are the exception: they share the mm_struct of the process that created them.

Destroying a Memory Descriptor

When the process associated with a specific address space exits, the exit_mm() function is invoked. It then calls mmput(), which decrements the memory descriptor’s mm_users counter. If the user count reaches zero, mmdrop() is called to decrement the mm_count usage counter. If that counter reaches zero, the free_mm() macro is invoked to return the mm_struct to the slab cache.
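The two-counter scheme described above can be modeled in user space. This is a minimal sketch, not kernel code: the toy_ names are hypothetical, plain ints stand in for atomic_t, and a boolean marks the point where the kernel would return the structure to the slab cache.

```c
#include <stdbool.h>

/* Toy stand-in for the two counters in mm_struct (plain ints, not atomic_t). */
struct toy_mm {
    int mm_users;  /* tasks using this address space */
    int mm_count;  /* primary reference count; all users together hold one */
    bool freed;    /* set where the kernel would call free_mm() */
};

void toy_mm_init(struct toy_mm *mm)
{
    mm->mm_users = 1;
    mm->mm_count = 1;
    mm->freed = false;
}

/* Drop one primary reference; the last one releases the descriptor. */
void toy_mmdrop(struct toy_mm *mm)
{
    if (--mm->mm_count == 0)
        mm->freed = true;
}

/* Drop one user; the last user gives up the users' shared primary reference. */
void toy_mmput(struct toy_mm *mm)
{
    if (--mm->mm_users == 0)
        toy_mmdrop(mm);
}
```

With two threads sharing one descriptor (mm_users == 2), the first toy_mmput() leaves the descriptor alive and only the second one frees it.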

The mm_struct and Kernel Threads

Kernel threads do not have a process address space and therefore have no associated memory descriptor. Thus the mm field of a kernel thread’s process descriptor is NULL.

Whenever a kernel thread begins running, it borrows the memory descriptor of whatever task ran previously. The kernel records the borrowed descriptor in the active_mm field of the kernel thread’s process descriptor.
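The borrowing rule can be sketched as follows. The kernel keeps the borrowed descriptor in the active_mm field of the process descriptor; everything else here (the toy_ types and the function name) is hypothetical and models only the address-space decision made during a context switch.

```c
#include <stddef.h>

struct toy_addr_space { int id; };  /* placeholder for a memory descriptor */

struct toy_task {
    struct toy_addr_space *mm;         /* NULL for a kernel thread */
    struct toy_addr_space *active_mm;  /* address space actually in use */
};

/* Model of the address-space part of a context switch from prev to next. */
void toy_switch_mm(const struct toy_task *prev, struct toy_task *next)
{
    if (next->mm == NULL)
        next->active_mm = prev->active_mm;  /* kernel thread borrows */
    else
        next->active_mm = next->mm;         /* ordinary process uses its own */
}
```

After switching from a user process to a kernel thread, the kernel thread's mm is still NULL, but its active_mm points at the user process's descriptor.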

Memory Areas

Memory areas are represented by a memory area object, which is stored in the vm_area_struct structure and defined in <linux/mm.h>.

Memory areas are called virtual memory areas, or VMAs, in the kernel.

The vm_area_struct structure describes a single memory area over a contiguous interval in a given address space.

    struct vm_area_struct {
            struct mm_struct *vm_mm;                /* associated mm_struct */
            unsigned long vm_start;                 /* VMA start, inclusive */
            unsigned long vm_end;                   /* VMA end, exclusive */
            struct vm_area_struct *vm_next;         /* list of VMAs */
            pgprot_t vm_page_prot;                  /* access permissions */
            unsigned long vm_flags;                 /* flags */
            struct vm_operations_struct *vm_ops;    /* associated ops */
            unsigned long vm_pgoff;                 /* offset within file */
            struct file *vm_file;                   /* mapped file, if any */
    };

VMA flags

The vm_flags field contains bit flags, defined in <linux/mm.h>, that specify the behavior of and provide information about the pages contained in the memory area.

    VM_READ     Pages can be read from
    VM_WRITE    Pages can be written to
    VM_EXEC     Pages can be executed
    VM_SHARED   Pages are shared
    VM_SHM      The area is used for shared memory
    VM_IO       The area maps a device’s I/O space
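To make the bit-flag idea concrete, here is a small sketch that turns a flags word into the four-character permission string used by /proc/<pid>/maps. The TOY_VM_ constants and the function name are defined only for this example; kernel code would use the VM_ definitions from <linux/mm.h>.

```c
/* Toy flag bits for this sketch; the kernel's definitions live in <linux/mm.h>. */
#define TOY_VM_READ   0x1UL
#define TOY_VM_WRITE  0x2UL
#define TOY_VM_EXEC   0x4UL
#define TOY_VM_SHARED 0x8UL

/* Fill out[] with a four-character permission string such as "r-xp". */
void toy_vm_flags_to_perms(unsigned long flags, char out[5])
{
    out[0] = (flags & TOY_VM_READ)   ? 'r' : '-';
    out[1] = (flags & TOY_VM_WRITE)  ? 'w' : '-';
    out[2] = (flags & TOY_VM_EXEC)   ? 'x' : '-';
    out[3] = (flags & TOY_VM_SHARED) ? 's' : 'p';  /* shared or private */
    out[4] = '\0';
}
```

A private, read-only, executable area (typical for a library's text section) is passed as TOY_VM_READ | TOY_VM_EXEC.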

VMA operations

The vm_ops field in the vm_area_struct structure points to the table of operations associated with a given memory area, which the kernel can invoke to manipulate the VMA. The operations table is represented by struct vm_operations_struct and is defined in <linux/mm.h>.

    struct vm_operations_struct {
            void (*open) (struct vm_area_struct *);
            void (*close) (struct vm_area_struct *);
            struct page * (*nopage) (struct vm_area_struct *,
                                     unsigned long, int);
            int (*populate) (struct vm_area_struct *, unsigned long,
                             unsigned long, pgprot_t, unsigned long, int);
    };

open – is invoked when the given memory area is added to an address space.

close – is invoked when the given memory area is removed from an address space.

nopage – is invoked by the page fault handler when a page that is not present in physical memory is accessed.

populate – is invoked by the remap_file_pages() system call to prefault a new mapping.
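The operations table is an ordinary table of function pointers. The following user-space sketch shows the dispatch pattern for the open and close hooks only; the toy_ types, the counting hooks, and the helper names are hypothetical and are not the kernel's definitions.

```c
#include <stddef.h>

struct toy_vma;

/* Toy operations table with only the open and close hooks. */
struct toy_vm_ops {
    void (*open)(struct toy_vma *);
    void (*close)(struct toy_vma *);
};

struct toy_vma {
    const struct toy_vm_ops *ops;  /* may be NULL, or have NULL hooks */
    int open_count;                /* bookkeeping for the example hooks */
};

/* Example hooks: track how many address spaces reference the area. */
void counting_open(struct toy_vma *vma)  { vma->open_count++; }
void counting_close(struct toy_vma *vma) { vma->open_count--; }

/* Core-code side: call a hook only if the area supplies one. */
void toy_vma_added(struct toy_vma *vma)
{
    if (vma->ops != NULL && vma->ops->open != NULL)
        vma->ops->open(vma);
}

void toy_vma_removed(struct toy_vma *vma)
{
    if (vma->ops != NULL && vma->ops->close != NULL)
        vma->ops->close(vma);
}
```

An area without an operations table is handled by the same helpers; the NULL checks simply skip the call.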

Lists and Trees of Memory Area

Memory areas are accessed via both the mmap and the mm_rb fields of the memory descriptor. These two data structures independently point to all the memory area objects associated with the memory descriptor.

The mmap links together all the memory area objects in a singly linked list.

mm_rb links together all the memory area objects in a red-black tree.

A red-black tree is a type of balanced binary tree. Each element in a red-black tree is called a node.

The linked list is used when every node needs to be traversed.

The red-black tree is used when locating a specific memory area in the address space.
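A full traversal over the linked list might look like the sketch below, which adds up the bytes covered by every area. The struct is a cut-down, hypothetical stand-in for vm_area_struct that keeps only the three fields the loop needs.

```c
#include <stddef.h>

/* Toy memory area: only the fields needed to walk the list. */
struct toy_area {
    unsigned long vm_start;    /* inclusive */
    unsigned long vm_end;      /* exclusive */
    struct toy_area *vm_next;  /* next area in the list */
};

/* Visit every area once and add up the bytes they cover. */
unsigned long toy_total_mapped(const struct toy_area *head)
{
    unsigned long total = 0;
    const struct toy_area *area;

    for (area = head; area != NULL; area = area->vm_next)
        total += area->vm_end - area->vm_start;
    return total;
}
```

Because vm_end is exclusive, the size of each area is simply vm_end minus vm_start.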

Memory Areas in Real Life

Let’s look at a particular process’s address space and the memory areas inside. We can use the /proc filesystem and the pmap(1) utility. Example:

    int main(int argc, char *argv[])
    {
            return 0;
    }

The output from /proc/<pid>/maps lists the memory areas in the process’s address space. Each line has the form start-end permission offset major:minor inode file:

    # cat /proc/1426/maps
    00e80000-00faf000 r-xp 00000000 03:01 208530 /lib/libc-2.3.2.so
    00fb2000-00fb4000 rw-p 00000000 00:00 0

The pmap utility formats the information in a more readable manner:

    # pmap 1426
    00e80000 (1212 KB)   r-xp (03:01 208530)   /lib/libc-2.3.2.so
    00fb2000 (8 KB)      rw-p (00:00 0)
    bfffe000 (8 KB)      rwxp (00:00 0)        [ stack ]
    mapped: 1340 KB   writable/private: 40 KB   shared: 0 KB
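The size column that pmap prints follows directly from a line of /proc/<pid>/maps: it is the end address minus the start address. The sketch below parses one such line with sscanf(); the struct and function names are made up for this example, and the field layout is assumed to be the one shown in the listings above.

```c
#include <stdio.h>

/* Fields of one /proc/<pid>/maps line; the struct is made up for this sketch. */
struct maps_entry {
    unsigned long start, end, offset, inode;
    unsigned int major, minor;
    char perms[5];
    char path[256];  /* empty for anonymous areas */
};

/* Parse one line; returns 0 on success, -1 if the line does not match. */
int parse_maps_line(const char *line, struct maps_entry *e)
{
    int n;

    e->path[0] = '\0';
    n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255s",
               &e->start, &e->end, e->perms, &e->offset,
               &e->major, &e->minor, &e->inode, e->path);
    return n >= 7 ? 0 : -1;  /* the path is optional */
}

/* Size of the area in kilobytes, as pmap reports it. */
unsigned long maps_entry_kb(const struct maps_entry *e)
{
    return (e->end - e->start) / 1024;
}
```

For the libc area, 0x00faf000 minus 0x00e80000 is 0x12f000 bytes, which is the 1212 KB shown by pmap.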

Manipulating Memory Areas

The kernel often has to find out whether any memory area in a process address space matches a given criterion, such as whether a given address lies within a memory area.

The following helper functions are all declared in <linux/mm.h>:

find_vma()

find_vma_prev()

find_vma_intersection()
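The lookup semantics of find_vma() and find_vma_intersection() can be sketched in user space. find_vma() returns the first area whose vm_end is greater than the given address, so the address may still lie below the area's start. In this sketch a binary search over a sorted array stands in for the kernel's red-black tree, and the toy_ names and signatures are hypothetical; find_vma_prev() is not modeled.

```c
#include <stddef.h>

struct toy_range {
    unsigned long vm_start;  /* inclusive */
    unsigned long vm_end;    /* exclusive */
};

/*
 * Return the first range whose vm_end is greater than addr, or NULL.
 * The ranges must be sorted by address and must not overlap.
 * Note that addr may still lie below the returned range's vm_start.
 */
const struct toy_range *toy_find_vma(const struct toy_range *r, size_t n,
                                     unsigned long addr)
{
    size_t lo = 0, hi = n;  /* search window is [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (r[mid].vm_end > addr)
            hi = mid;       /* candidate; look for an earlier one */
        else
            lo = mid + 1;   /* ends at or below addr; skip it */
    }
    return lo < n ? &r[lo] : NULL;
}

/* Return the first range that overlaps [start, end), or NULL. */
const struct toy_range *toy_find_vma_intersection(const struct toy_range *r,
                                                  size_t n,
                                                  unsigned long start,
                                                  unsigned long end)
{
    const struct toy_range *hit = toy_find_vma(r, n, start);

    if (hit != NULL && end <= hit->vm_start)
        return NULL;        /* the interval ends before the range begins */
    return hit;
}
```

A caller that needs to know whether an address is actually mapped must additionally check that the returned range's vm_start is less than or equal to the address.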

mmap() and do_mmap(): Creating an Address Interval

The do_mmap() function is used by the kernel to create a new linear address interval. The do_mmap() function is declared in <linux/mm.h>:

    unsigned long do_mmap(struct file *file, unsigned long addr,
                          unsigned long len, unsigned long prot,
                          unsigned long flag, unsigned long offset)

If the file parameter is NULL and offset is zero, the mapping is not backed by a file; this is called an anonymous mapping.

The prot parameter specifies the access permissions for pages in the memory area:

    PROT_READ    corresponds to VM_READ
    PROT_WRITE   corresponds to VM_WRITE
    PROT_EXEC    corresponds to VM_EXEC
    PROT_NONE    pages cannot be accessed (none of the three flags above is set)

The flag parameter specifies flags that correspond to the remaining VMA flags:

    MAP_SHARED   the mapping can be shared
    MAP_PRIVATE  the mapping cannot be shared
    MAP_FIXED    the new interval must start at the given address addr

The mmap() system call

The mmap() system call is defined as

    void *mmap2(void *start, size_t length, int prot, int flags, int fd, off_t pgoff);

It is named mmap2() because it is the second variant of mmap(). The offset pgoff is given in pages, whereas the original mmap() took an offset in bytes. This enables larger files with larger offsets to be mapped.
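A short worked example shows why counting in pages helps. Assume, for illustration only, a 32-bit offset argument and 4096-byte pages: a byte offset can then reach at most 4 GiB, while a page offset can reach 2^32 x 4096 bytes, or 16 TiB. The helper below is a hypothetical sketch of the conversion, not a kernel or libc function.

```c
#include <stdint.h>

/*
 * Convert a byte offset into a 32-bit page offset.
 * Returns 0 on success, -1 if the offset is not page aligned or does not fit.
 * page_size must be nonzero.
 */
int toy_bytes_to_pgoff(uint64_t byte_off, uint64_t page_size, uint32_t *pgoff)
{
    uint64_t pages;

    if (byte_off % page_size != 0)
        return -1;                 /* file offsets must be page aligned */
    pages = byte_off / page_size;
    if (pages > UINT32_MAX)
        return -1;                 /* too large even when counted in pages */
    *pgoff = (uint32_t)pages;
    return 0;
}
```

An offset of 8 GiB does not fit into 32 bits as a byte count, but as a page count it is only 2097152.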

munmap() and do_munmap(): Removing an Address Interval

The do_munmap() function removes an address interval from a specified address space. The function is declared in <linux/mm.h>:

    int do_munmap(struct mm_struct *mm, unsigned long start, size_t len);

On success, 0 is returned; otherwise, a negative error code is returned.

The munmap() system call

The munmap() system call is exported to user space as a means to allow processes to remove address intervals from their address space:

    int munmap(void *start, size_t length);

The system call is defined in mm/mmap.c and acts as a very simple wrapper to do_munmap().
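From user space, the pair of calls can be exercised as in the sketch below, which creates an anonymous private mapping, touches it, and removes it again. The function name is made up for this example; mmap(), munmap(), and the PROT_ and MAP_ constants come from <sys/mman.h>, and MAP_ANONYMOUS is assumed to be available, as it is on Linux.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map len bytes of anonymous memory, use them, and unmap them again.
 * Returns 0 on success, -1 on failure.
 */
int demo_anonymous_mapping(size_t len)
{
    int ok;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, len);  /* first touch of each page brings it into memory */
    ok = (p[0] == 0xAB && p[len - 1] == 0xAB);

    if (munmap(p, len) != 0)  /* remove the interval again */
        return -1;
    return ok ? 0 : -1;
}
```

While the mapping exists, it shows up as an additional line without a file name in /proc/<pid>/maps.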

Page Tables

Applications operate on virtual memory that is mapped to physical addresses, whereas processors operate directly on those physical addresses. When an application accesses a virtual memory address, the address must first be converted to a physical address before the processor can resolve the request. This lookup is performed via page tables. Page tables work by splitting the virtual address into chunks. Each chunk is used as an index into a table. The table points to either another table or the associated physical page.

In the kernel described here, the page tables consist of three levels: the page global directory (PGD), the page middle directory (PMD), and the page table entries (PTE). Later kernel versions add further levels. The multiple levels allow a sparsely populated address space to be described without allocating tables for unused regions.

Page table data structures are architecture dependent and are defined in <asm/page.h>.
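The splitting of an address into table indexes can be sketched with a few shifts and masks. The 2/9/9/12-bit split of a 32-bit address used here is chosen for illustration (it resembles 32-bit x86 with PAE); the real widths are architecture dependent, and the TOY_ and toy_ names are hypothetical.

```c
#include <stdint.h>

/* Index widths for this sketch: 2 + 9 + 9 bits of indexes, 12-bit page offset. */
#define TOY_PAGE_SHIFT 12
#define TOY_PMD_SHIFT  21
#define TOY_PGD_SHIFT  30
#define TOY_INDEX_MASK 0x1FFu  /* 9 bits */

struct toy_va_parts {
    uint32_t pgd_index;  /* index into the top-level table */
    uint32_t pmd_index;  /* index into the middle-level table */
    uint32_t pte_index;  /* index into the lowest-level table */
    uint32_t offset;     /* byte offset inside the physical page */
};

/* Split a 32-bit virtual address into its three indexes and the page offset. */
struct toy_va_parts toy_split_va(uint32_t va)
{
    struct toy_va_parts p;

    p.pgd_index = va >> TOY_PGD_SHIFT;                      /* top 2 bits */
    p.pmd_index = (va >> TOY_PMD_SHIFT) & TOY_INDEX_MASK;   /* next 9 bits */
    p.pte_index = (va >> TOY_PAGE_SHIFT) & TOY_INDEX_MASK;  /* next 9 bits */
    p.offset    = va & ((1u << TOY_PAGE_SHIFT) - 1);        /* low 12 bits */
    return p;
}
```

A real page table walk would use each index to pick an entry in the corresponding table and follow it to the next level, ending at the physical page to which the offset is added.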

Page Cache

The Linux kernel implements a disk cache in memory called the page cache. The goal of this cache is to minimize disk I/O by keeping in physical memory data that would otherwise have to be read from disk.

The page cache consists of physical pages in RAM. Each page in the cache corresponds to multiple blocks on the disk. Whenever the kernel begins a page I/O operation, it first checks whether the requisite data is in the page cache. If it is, the kernel does not access the disk and instead gets the data from the page cache.
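The check-the-cache-first behavior can be modeled with a very small sketch. The fixed number of slots and the modulo slot selection are simplifications made for this example and do not reflect how the kernel indexes cached pages; all names are hypothetical, and a counter stands in for real disk reads.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 4

/* Toy page cache: remembers which file page indexes are already in memory. */
struct toy_page_cache {
    bool valid[TOY_CACHE_SLOTS];
    unsigned long index[TOY_CACHE_SLOTS];
    unsigned long disk_reads;  /* how often the "disk" had to be accessed */
};

void toy_cache_init(struct toy_page_cache *c)
{
    size_t i;

    for (i = 0; i < TOY_CACHE_SLOTS; i++)
        c->valid[i] = false;
    c->disk_reads = 0;
}

/* Returns true on a cache hit; on a miss, "reads the disk" and caches the page. */
bool toy_read_page(struct toy_page_cache *c, unsigned long page_index)
{
    size_t slot = page_index % TOY_CACHE_SLOTS;

    if (c->valid[slot] && c->index[slot] == page_index)
        return true;            /* served from memory, no disk I/O */

    c->disk_reads++;            /* miss: fetch from disk ... */
    c->valid[slot] = true;      /* ... and keep the page for next time */
    c->index[slot] = page_index;
    return false;
}
```

Reading the same page twice costs one disk read: the first call misses and fills the slot, and the second call is a hit.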