Srinivas B T

Linux Device Drivers

Memory Management:

• Large address space
• Protection
• Memory mapping
• Fair physical memory allocation
• Shared virtual memory

Virtual Memory:

Virtual memory acts as a logical layer between the application memory requests and the hardware MMU.

• Several processes can be executed concurrently.
• Applications can run even when they need more memory than is physically available.
• Processes can execute a program whose code is only partially loaded into memory.
• Processes can share a single memory image of a library or program.
• Programs are relocatable: they can be placed anywhere in physical memory.
• Programmers can write machine-independent code, because they need not be concerned with physical memory organization; they see only the virtual memory organization.

Memory Protection:

Memory protection means protecting the operating system from user processes, and protecting user processes from one another. Implementing a protected system involves:

1. Defining how programs can access different types of segments
2. Ensuring that accesses stay within the limits of the segment
3. Maintaining privilege levels, that is, who has access to a segment
4. Controlling access to privileged instructions

These are enforced through:

• Type checking
• Limit checking
• Privilege levels: control of execution transfer, protected instructions, and the I/O privilege level (IOPL)

Abstract model of Virtual to Physical address mapping:

Demand Paging:

The OS adopts a memory allocation strategy called demand paging. With demand paging, a process can start program execution with none of its pages in physical memory. When it accesses a non-present page, the MMU generates an exception; the exception handler finds the affected memory region, allocates a free page, and initializes it with the appropriate data.
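The fault-then-allocate sequence described above can be illustrated with a small userspace simulation. This is only a sketch, not kernel code: the names `toy_mm`, `pte`, `toy_mm_init` and `toy_mm_access` are hypothetical, the "page table" is a plain array, and frames are simply handed out in order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_PAGES  8      /* toy virtual address space, in pages (assumed) */
#define NUM_FRAMES 4      /* toy physical memory, in frames (assumed) */

struct pte {              /* hypothetical page table entry */
    bool present;         /* is the page currently backed by a frame? */
    int  frame;           /* frame number, valid only when present */
};

struct toy_mm {
    struct pte table[NUM_PAGES];
    int next_free_frame;  /* frames are handed out in increasing order */
    int faults;           /* number of page faults taken so far */
};

/* Start with every page non-present, as demand paging allows. */
void toy_mm_init(struct toy_mm *mm)
{
    for (size_t i = 0; i < NUM_PAGES; i++) {
        mm->table[i].present = false;
        mm->table[i].frame = -1;
    }
    mm->next_free_frame = 0;
    mm->faults = 0;
}

/*
 * Touch a page and return the frame backing it, or -1 when no frame is
 * free (a real kernel would start page replacement at that point).
 */
int toy_mm_access(struct toy_mm *mm, int page)
{
    assert(page >= 0 && page < NUM_PAGES);
    struct pte *e = &mm->table[page];

    if (!e->present) {                    /* the "page fault" path */
        mm->faults++;
        if (mm->next_free_frame == NUM_FRAMES)
            return -1;
        e->frame = mm->next_free_frame++; /* allocate a free frame */
        e->present = true;                /* later accesses will not fault */
    }
    return e->frame;
}
```

The first access to a page takes the fault path and assigns a frame; a second access to the same page finds it present and returns the same frame without another fault.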

Swapping:

In order to extend the size of the virtual address space usable by processes, the OS makes use of swap areas on disk. The virtual memory system regards the contents of a page frame as the basic unit for swapping.

Whenever a process refers to a swapped-out page, the MMU raises an exception. The exception handler then allocates a new page frame and initializes it with the page's old contents saved on disk.

Page Replacement:

If no frame is free, we find one that is not currently being used and free it. We can free a frame by writing its contents to swap space and changing the page table to indicate that the page is no longer in memory. We can now use the freed frame to hold the page for which the process faulted.

Two page replacement algorithms:

• FIFO page replacement
• LRU page replacement

Linux uses a Least Recently Used (LRU) page aging technique to fairly choose pages which might be removed from the system.
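The difference between the two policies can be seen by counting faults on the same reference string. The following is a minimal userspace sketch under stated assumptions: `count_faults`, `enum policy` and `MAX_FRAMES` are hypothetical names, and the code implements exact textbook LRU rather than the approximate page aging that Linux uses.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 16     /* upper bound on simulated frames (assumed) */

enum policy { POLICY_FIFO, POLICY_LRU };

/*
 * Count page faults for a reference string under FIFO or LRU replacement.
 * stamp[i] holds the time frame i was loaded (FIFO) or last used (LRU);
 * the victim is always the frame with the smallest stamp.
 */
int count_faults(const int *refs, size_t n, int nframes, enum policy p)
{
    int page[MAX_FRAMES];
    long stamp[MAX_FRAMES];
    int used = 0, faults = 0;

    assert(nframes > 0 && nframes <= MAX_FRAMES);

    for (size_t t = 0; t < n; t++) {
        int hit = -1;
        for (int i = 0; i < used; i++)
            if (page[i] == refs[t])
                hit = i;

        if (hit >= 0) {
            if (p == POLICY_LRU)
                stamp[hit] = (long)t;   /* LRU refreshes on every use */
            continue;
        }

        faults++;
        int slot;
        if (used < nframes) {
            slot = used++;              /* a frame is still free */
        } else {
            slot = 0;                   /* evict the smallest stamp */
            for (int i = 1; i < nframes; i++)
                if (stamp[i] < stamp[slot])
                    slot = i;
        }
        page[slot] = refs[t];
        stamp[slot] = (long)t;
    }
    return faults;
}
```

With the reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 and three frames, this sketch counts 15 faults for FIFO and 12 for LRU.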

Linux Memory Management:

Some portion of RAM is permanently assigned to the kernel and used to store both the kernel code and the static kernel data structures. The remaining part of memory is dynamic memory, which is needed by both processes and the kernel. Linux uses two different techniques for handling physically contiguous memory areas:

• Page Frame Management (buddy system algorithm)
• Memory Area Management (slab allocator)

Page Frame Management:

Linux adopts a 4 KB page frame size as the standard memory allocation unit. This choice has two advantages:

• The Page Fault exceptions issued by the paging circuitry are easily interpreted. Either the page requested exists but the process is not allowed to address it, or the page does not exist. In the second case, the memory allocator must find a free 4 KB page frame and assign it to the process.
• The 4 KB size is a multiple of most disk block sizes, so transfers of data between main memory and disks are more efficient.

Page Descriptors

State information of a page frame is kept in a page descriptor of type struct page. All page descriptors are stored in the mem_map array. Since each descriptor is less than 64 bytes long, mem_map requires about four page frames for each megabyte of RAM.
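The "four page frames per megabyte" figure follows from simple arithmetic, sketched below. The function name `mem_map_frames` and the two constants are assumptions made for illustration; 64 bytes is used as the upper bound on the descriptor size given in the text.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE        4096u   /* 4 KB page frames, as in the text */
#define DESCRIPTOR_BYTES 64u     /* upper bound on one page descriptor */

/* Page frames needed to hold the descriptors for ram_bytes of RAM. */
size_t mem_map_frames(size_t ram_bytes)
{
    assert(ram_bytes % PAGE_SIZE == 0);

    size_t frames = ram_bytes / PAGE_SIZE;          /* one descriptor per frame */
    size_t map_bytes = frames * DESCRIPTOR_BYTES;   /* total size of mem_map */
    return (map_bytes + PAGE_SIZE - 1) / PAGE_SIZE; /* round up to whole frames */
}
```

One megabyte of RAM holds 256 page frames, whose descriptors occupy at most 256 × 64 = 16384 bytes, that is, four page frames. For 128 MB the same calculation gives 512 page frames (2 MB) of descriptors.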

Pages

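The kernel represents every physical page on the system with a struct page structure. This structure is defined in <linux/mm.h>.

struct page {
    page_flags_t          flags;      /* status bits (dirty, locked, ...) */
    atomic_t              _count;     /* usage (reference) count */
    atomic_t              _mapcount;  /* number of page table mappings */
    unsigned long         private;
    struct address_space  *mapping;   /* address space owning this page */
    pgoff_t               index;      /* offset within the mapping */
    struct list_head      lru;        /* linkage on the LRU lists */
    void                  *virtual;   /* kernel virtual address, if mapped */
};

The flags field stores the status of the page. Such flags include whether the page is dirty or locked in memory. The flag values are defined in <linux/page-flags.h>.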

Memory Zones

On 32-bit x86, Linux partitions physical memory into three zones:

ZONE_DMA: Contains pages of memory below 16 MB.
ZONE_NORMAL: Contains pages of memory at and above 16 MB and below 896 MB.
ZONE_HIGHMEM: Contains pages of memory at and above 896 MB.
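The zone boundaries listed above amount to a simple range check on the physical address. The sketch below is illustrative only: `toy_zone`, `zone_of` and `zone_name` are hypothetical names, not kernel interfaces, and the 16 MB and 896 MB limits are the ones given in the text.

```c
#include <assert.h>
#include <string.h>

enum toy_zone { TOY_ZONE_DMA, TOY_ZONE_NORMAL, TOY_ZONE_HIGHMEM };

#define MB (1024ull * 1024ull)

/* Map a physical address to the zone that contains it. */
enum toy_zone zone_of(unsigned long long phys_addr)
{
    if (phys_addr < 16 * MB)
        return TOY_ZONE_DMA;        /* below 16 MB */
    if (phys_addr < 896 * MB)
        return TOY_ZONE_NORMAL;     /* 16 MB up to, not including, 896 MB */
    return TOY_ZONE_HIGHMEM;        /* 896 MB and above */
}

/* Printable name of a zone, for diagnostics. */
const char *zone_name(enum toy_zone z)
{
    switch (z) {
    case TOY_ZONE_DMA:     return "ZONE_DMA";
    case TOY_ZONE_NORMAL:  return "ZONE_NORMAL";
    case TOY_ZONE_HIGHMEM: return "ZONE_HIGHMEM";
    }
    assert(!"unknown zone");
    return "?";
}
```

The boundaries are half-open: an address of exactly 16 MB falls in ZONE_NORMAL, and an address of exactly 896 MB falls in ZONE_HIGHMEM.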

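Each zone is represented by struct zone, which is defined in <linux/mmzone.h>:

struct zone {
    spinlock_t      lock;        /* protects the structure */
    unsigned long   free_pages;  /* number of free pages in the zone */
    unsigned long   pages_min;   /* low-memory watermarks */
    unsigned long   pages_low;
    /* ... remaining fields omitted ... */
};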

Buddy system algorithm

All free page frames are grouped into 10 lists of blocks that contain groups of 1, 2, 4, 8, 16, 32, 64, 128, 256, and 512 contiguous page frames, respectively.

The physical address of the first page frame of a block is a multiple of the group size; for example, the initial address of a 16-page-frame block is a multiple of 16 × 2^12 (2^12 = 4096 bytes is the page size).

To allocate, for example, a block of 128 contiguous page frames, the algorithm:

• First checks for a free block in the 128 list.
• If there is no free block, looks in the 256 list for a free block.
• If it finds one, allocates 128 of the 256 page frames and puts the remaining 128 into the 128 list.
• If the 256 list is empty too, looks in the next larger list (512); if a block is found there, it allocates 128 of the 512 page frames and splits the remaining 384 into one 256-frame block and one 128-frame block, which go onto their respective lists.
• If no block can be allocated, reports an error.

When a block is released, the kernel attempts to merge pairs of free buddy blocks of size b into a single block of size 2b.

• Two blocks are considered buddies if:
    • Both have the same size, b
    • They are located at contiguous physical addresses
    • The physical address of the first page frame of the first block is a multiple of 2 × b × 2^12
• The merging is iterative: if the kernel succeeds in merging two blocks, it doubles b and tries again to create an even bigger block.
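The splitting and merging rules can be tried out in a small userspace model. This is a sketch under simplifying assumptions, not the kernel's implementation: memory is a single 512-frame block, the free lists are modeled as boolean arrays indexed by starting frame, and `buddy_init`, `buddy_alloc` and `buddy_free` are hypothetical names. The buddy of a block is found by flipping one bit of its starting frame number, which is equivalent to the alignment condition in the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ORDER 9                     /* largest block: 2^9 = 512 frames */
#define NFRAMES   (1 << MAX_ORDER)      /* toy memory: one maximal block */

/* free_blk[k][pfn] is true when a free block of 2^k frames starts at pfn. */
static bool free_blk[MAX_ORDER + 1][NFRAMES];

void buddy_init(void)
{
    memset(free_blk, 0, sizeof free_blk);
    free_blk[MAX_ORDER][0] = true;      /* everything starts as one free block */
}

/* Return the first frame of a free 2^order block, or -1 if none exists. */
int buddy_alloc(int order)
{
    assert(order >= 0 && order <= MAX_ORDER);

    for (int k = order; k <= MAX_ORDER; k++) {
        for (int pfn = 0; pfn < NFRAMES; pfn += 1 << k) {
            if (!free_blk[k][pfn])
                continue;
            free_blk[k][pfn] = false;
            /* Split: keep the lower half, free each upper half. */
            while (k > order) {
                k--;
                free_blk[k][pfn + (1 << k)] = true;
            }
            return pfn;
        }
    }
    return -1;
}

/* Release a block and merge it with its buddy for as long as possible. */
void buddy_free(int pfn, int order)
{
    assert(order >= 0 && order <= MAX_ORDER && pfn % (1 << order) == 0);

    while (order < MAX_ORDER) {
        int buddy = pfn ^ (1 << order); /* differs only in bit 'order' */
        if (!free_blk[order][buddy])
            break;                      /* buddy is busy: stop merging */
        free_blk[order][buddy] = false;
        if (buddy < pfn)
            pfn = buddy;                /* merged block starts at the lower one */
        order++;
    }
    free_blk[order][pfn] = true;
}
```

Starting from one free 512-frame block, two requests of order 7 (128 frames) return frames 0 and 128 and leave a 256-frame block on its list. Freeing both lets the merges cascade back up to a single 512-frame block.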

Getting Pages

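To allocate large chunks of memory (up to approximately 2 MB) as whole pages:

unsigned long __get_free_pages(int flags, unsigned long order);

allocates 2^order physically contiguous pages and returns the address of the first one, or 0 on failure.

unsigned long get_zeroed_page(int flags);

returns a pointer to a new page and fills the page with zeros.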

Freeing Pages

A family of functions allows you to free allocated pages:

void __free_pages(struct page *page, unsigned int order);
void free_pages(unsigned long addr, unsigned int order);
void free_page(unsigned long addr);

Let us look at an example that allocates 8 pages:

unsigned long page;

page = __get_free_pages(GFP_KERNEL, 3);    /* order 3: 2^3 = 8 pages */
if (!page)
    return -ENOMEM;                        /* allocation failed */
/* page is the address of the first of eight contiguous pages */

free_pages(page, 3);                       /* order must match the allocation */
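Because the order argument is a power-of-two exponent rather than a page count, it helps to see the conversion spelled out. The kernel provides get_order() for this purpose; the functions below are a userspace stand-in written for illustration, with the hypothetical names `order_for_size` and `bytes_for_order` and an assumed 4 KB page size.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size */

/* Smallest order such that 2^order pages hold at least 'bytes' bytes. */
unsigned int order_for_size(size_t bytes)
{
    assert(bytes > 0);

    size_t pages = (bytes + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE; /* round up */
    unsigned int order = 0;
    while (((size_t)1 << order) < pages)
        order++;
    return order;
}

/* Bytes actually reserved by an allocation of the given order. */
size_t bytes_for_order(unsigned int order)
{
    return (size_t)TOY_PAGE_SIZE << order;
}
```

Eight pages correspond to order 3, as in the example above. A request for five pages also needs order 3, so three pages go unused. Order 9 reserves 512 pages, which is the 2 MB upper bound mentioned earlier.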

FLAGS

Some of the important flags are:

GFP_BUFFER: Used in managing the buffer cache; allows the allocator to sleep.
GFP_ATOMIC: Used to allocate memory from interrupt handlers and other code outside of a process context. Never sleeps.
GFP_USER: Used to allocate memory on behalf of the user. It may sleep.
GFP_KERNEL: Normal allocation of kernel memory. May sleep.
GFP_DMA: Requests allocation from ZONE_DMA, for device drivers that need DMA-able memory.

kmalloc()

In the kernel, it is often necessary to allocate dynamic memory, for example for temporary buffers. The functions used for this are kmalloc() and kfree(). kmalloc() returns physically contiguous memory and can serve requests of up to about 128 KB; it returns NULL on failure.

The function is declared in <linux/slab.h>:

void *kmalloc(size_t size, int flags);

Which flag to use when:

Situation                           Flag to use
Process context, can sleep          GFP_KERNEL
Process context, cannot sleep       GFP_ATOMIC
Interrupt handler                   GFP_ATOMIC
Softirq                             GFP_ATOMIC
Tasklet                             GFP_ATOMIC
Need DMA-able memory                (GFP_DMA | GFP_KERNEL)

kfree()

The kfree() function frees a block of memory previously allocated with kmalloc(). It is declared in <linux/slab.h>:

void kfree(const void *ptr);

Let us look at an example of allocating memory in an interrupt handler:

char *buf;

buf = kmalloc(BUF_SIZE, GFP_ATOMIC);   /* GFP_ATOMIC: must not sleep here */
if (!buf) {
    /* allocation failed: handle the error here */
}

When the memory is no longer needed, it can be freed:

kfree(buf);

Non-contiguous memory area management

Linux uses non-contiguous memory areas in several ways: for instance, to allocate data structures for active swap areas, to allocate space for a module, or to allocate buffers for some I/O drivers.

The vmalloc() function allocates a non-contiguous memory area to the kernel. The parameter size denotes the size of the requested area. If the function is able to satisfy the request, it returns the initial linear address of the new area; otherwise, it returns a NULL pointer.

The vfree() function releases non-contiguous memory areas.

vmalloc()

Using the functions vmalloc() and vfree(), memory can be allocated in multiples of a page. The pages are contiguous in the kernel's virtual address space but need not be physically contiguous, so the size of an allocation is limited by the amount of free physical memory (and by the size of the kernel's vmalloc address range), not by the availability of physically contiguous pages.

The vmalloc() function is declared in <linux/vmalloc.h> and defined in mm/vmalloc.c. Usage is similar to user-space malloc():

void *vmalloc(unsigned long size);

The function returns a pointer to at least size bytes of virtually contiguous memory.

vfree()

To free an allocation obtained via vmalloc(), use:

void vfree(void *addr);

Usage of these functions is simple:

char *buf;

buf = vmalloc(16 * PAGE_SIZE);   /* get 16 pages */
if (!buf) {
    /* allocation failed: handle the error here */
}

When the memory is no longer needed, it can be freed:

vfree(buf);

Slab layer

Allocating and freeing data structures is one of the most common operations inside any kernel. To facilitate frequent allocations and deallocations of data, programmers often introduce free lists. A free list contains a block of available, already allocated data structures.

When code requires a new instance of a data structure, it grabs one from the free list rather than allocating memory; when the instance is no longer needed, it is returned to the free list instead of being deallocated.

Thus a free list acts as an object cache, caching a frequently used type of object.

The concept of a slab allocator was first implemented in Sun Microsystems' SunOS 5.4.

Slab layer goals

• Frequently used data structures tend to be allocated and freed often, so cache them.
• Frequent allocation and deallocation can result in memory fragmentation. To prevent this, the cached free lists are arranged contiguously.
• The free list provides improved performance during frequent allocation and deallocation, because a freed object can be immediately reused by the next allocation.
• If the allocator is aware of concepts such as object size, page size, and total cache size, it can make more intelligent decisions.
• If part of the cache is made per-processor, allocations and frees can be performed without an SMP lock.
• Stored objects can be colored to prevent multiple objects from mapping to the same cache lines.

Design of Slab Layer

The slab layer divides different objects into groups called caches, each of which stores a different type of object. There is one cache per object type. For example, one cache is for process descriptors (a free list of task_struct structures), whereas another cache is for inode objects (struct inode).

The caches are divided into slabs. The slabs are composed of one or more physically contiguous pages. Each slab contains some number of objects, which are the data structures being cached. Each slab is in one of three states: full, partial, or empty. When the kernel requests a new object, the request is satisfied from a partial slab if one exists. Otherwise, the request is satisfied from an empty slab. If there is no empty slab, a new one is created.
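The "partial first, then empty, then grow" policy can be modeled in userspace. The sketch below is an illustration under simplifying assumptions, not the kernel's slab allocator: `toy_cache`, `toy_slab` and the functions are hypothetical names, each slab holds a fixed four objects, storage comes from malloc() rather than from page frames, and coloring and per-processor caches are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define OBJS_PER_SLAB 4               /* assumed; real slabs size this per cache */

struct toy_slab {
    struct toy_slab *next;
    int in_use;                       /* 0 = empty, OBJS_PER_SLAB = full */
    bool used[OBJS_PER_SLAB];         /* which object slots are taken */
    char *objs;                       /* storage for OBJS_PER_SLAB objects */
};

struct toy_cache {
    size_t obj_size;                  /* one cache per object type */
    struct toy_slab *slabs;
};

/* Grow the cache by one empty slab; returns NULL if memory runs out. */
static struct toy_slab *slab_new(struct toy_cache *c)
{
    struct toy_slab *s = calloc(1, sizeof *s);
    if (!s)
        return NULL;
    s->objs = malloc(c->obj_size * OBJS_PER_SLAB);
    if (!s->objs) {
        free(s);
        return NULL;
    }
    s->next = c->slabs;
    c->slabs = s;
    return s;
}

/* Prefer a partial slab, then an empty one, and only then create a slab. */
void *toy_cache_alloc(struct toy_cache *c)
{
    struct toy_slab *pick = NULL;

    for (struct toy_slab *s = c->slabs; s; s = s->next) {
        if (s->in_use > 0 && s->in_use < OBJS_PER_SLAB) {
            pick = s;                 /* partial slab: best choice */
            break;
        }
        if (s->in_use == 0 && !pick)
            pick = s;                 /* empty slab: fallback */
    }
    if (!pick)
        pick = slab_new(c);
    if (!pick)
        return NULL;

    for (int i = 0; i < OBJS_PER_SLAB; i++) {
        if (!pick->used[i]) {
            pick->used[i] = true;
            pick->in_use++;
            return pick->objs + (size_t)i * c->obj_size;
        }
    }
    return NULL;                      /* not reached: pick is never full */
}

/* Return an object to the slab it came from. */
void toy_cache_free(struct toy_cache *c, void *obj)
{
    uintptr_t p = (uintptr_t)obj;

    for (struct toy_slab *s = c->slabs; s; s = s->next) {
        uintptr_t lo = (uintptr_t)s->objs;
        if (p >= lo && p < lo + c->obj_size * OBJS_PER_SLAB) {
            size_t i = (p - lo) / c->obj_size;
            assert(s->used[i]);
            s->used[i] = false;
            s->in_use--;
            return;
        }
    }
    assert(!"object does not belong to this cache");
}

/* Number of slabs currently owned by the cache. */
int toy_cache_slab_count(const struct toy_cache *c)
{
    int n = 0;
    for (const struct toy_slab *s = c->slabs; s; s = s->next)
        n++;
    return n;
}
```

Allocating five objects from a fresh cache fills one slab and starts a second. If one object from each slab is then freed, the next allocation is served from the partial slab and the empty slab is left untouched, which keeps objects packed into as few slabs as possible.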

Example of using the slab allocator

Let us look at an example that uses the task_struct structure (process descriptor). This code is in kernel/fork.c.

The kernel has a global variable that stores a pointer to the task_struct cache:

kmem_cache_t *task_struct_cachep;

During kernel initialization, in fork_init(), the cache is created:

task_struct_cachep = kmem_cache_create("task_struct",
                                       sizeof(struct task_struct),
                                       ARCH_MIN_TASKALIGN,
                                       SLAB_PANIC,
                                       NULL,
                                       NULL);

This creates a cache named task_struct, which stores objects of type struct task_struct. The SLAB_PANIC flag makes the kernel panic if the cache cannot be created, so the return value need not be checked.

Each time a process calls fork(), a new process descriptor must be created. This is done in dup_task_struct(), which is called from do_fork():

struct task_struct *tsk;

tsk = kmem_cache_alloc(task_struct_cachep, GFP_KERNEL);
if (!tsk)
    return NULL;    /* failed to allocate a process descriptor */

After a task dies, if it has no children waiting on it, its process descriptor is freed and returned to the task_struct_cachep slab cache. This is done in free_task_struct():

kmem_cache_free(task_struct_cachep, tsk);

Because process descriptors are part of the core kernel and always needed, the task_struct_cachep cache is never destroyed. If it were, the cache would be destroyed via:

int err;

err = kmem_cache_destroy(task_struct_cachep);

The kmalloc() interface is built on top of the slab layer using a family of general-purpose caches.

A group of general caches exists whose objects have geometrically distributed sizes ranging from 32 to 131072 bytes.

• To obtain objects from these general caches, use kmalloc(size, flags).
• To release objects from these general caches, use kfree(objp).
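To see how a request is matched to a general cache, the sketch below rounds a request up to the next cache size. It assumes the geometric progression is the powers of two from 32 to 131072; real kernels may provide additional intermediate sizes. `general_cache_size` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_CACHE_SIZE 32u        /* smallest general cache named in the text */
#define MAX_CACHE_SIZE 131072u    /* largest general cache named in the text */

/*
 * Object size of the general cache that would serve a request of the
 * given size, or 0 if the request is too large for any general cache.
 */
size_t general_cache_size(size_t request)
{
    assert(request > 0);

    if (request > MAX_CACHE_SIZE)
        return 0;

    size_t size = MIN_CACHE_SIZE;
    while (size < request)
        size *= 2;                /* assumed progression: 32, 64, 128, ... */
    return size;
}
```

Under this assumption a 100-byte request is served from the 128-byte cache, so 28 bytes of the object go unused. This rounding is the price of using a small, fixed set of caches.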

Permanent Mappings

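To map a given page structure into the kernel address space, use:

void *kmap(struct page *page);

This function works on either high or low memory. If the page belongs to low memory, its virtual address is simply returned; if it resides in high memory, a permanent mapping is created and its address is returned. Because kmap() may sleep, it works only in process context.

Because the number of permanent mappings is limited, high memory should be unmapped when no longer needed. This is done via:

void kunmap(struct page *page);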