Introducing IO Monitoring
Disk IO subsystems are the slowest part of any Linux system. This is mainly because they are far from the CPU and because disks depend on physical motion (platter rotation and head seeks). If the time taken to access disk rather than memory were converted into everyday units, the difference would be like 7 days versus 7 minutes. As a result, it is essential that the Linux kernel minimize the amount of IO it generates on a disk. The following subsections describe the different ways the kernel moves data from disk to memory and back.
Reading and Writing Data Memory Pages
The Linux kernel breaks disk IO into pages. The default page size on most Linux systems is 4 KB, so the kernel reads and writes disk blocks into and out of memory in 4 KB pages. You can check the page size of your system by running the time command in verbose mode and looking for the page size line:
# /usr/bin/time -v date
<snip>
Page size (bytes): 4096
<snip>
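The same value can also be read programmatically. The following is a minimal sketch, assuming Linux and Python 3.9+, that asks the kernel for its page size through `os.sysconf("SC_PAGE_SIZE")` (the shell equivalent is `getconf PAGESIZE`). The helper `pages_spanned` is hypothetical and only illustrates that IO is rounded up to whole pages.

```python
import os


def page_size() -> int:
    """Return the kernel page size in bytes (4096 on most Linux systems)."""
    return os.sysconf("SC_PAGE_SIZE")


def pages_spanned(num_bytes: int, size: int) -> int:
    """Hypothetical helper: number of whole pages needed to hold num_bytes.

    Uses ceiling division, because the kernel moves data in full pages.
    """
    return -(-num_bytes // size)


if __name__ == "__main__":
    size = page_size()
    print(f"Page size (bytes): {size}")
    print(f"A 10,000-byte file occupies {pages_spanned(10_000, size)} pages")
```

With 4 KB pages, a 10,000-byte file needs three pages, even though the third is only partly filled.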
Major and Minor Page Faults
The term "fault" confuses many beginners. A page fault is not an error: it is a common kernel operation used to manage memory and perform IO. Here is how it works:
Linux, like most UNIX systems, uses a virtual memory layer that maps into physical address space. This mapping is "on demand" in the sense that when a process starts, the kernel maps only what is required. When the application touches a page that is not yet mapped, the kernel looks for the data in physical memory. If the data is not there, the kernel issues a major page fault (MPF). An MPF is a request to the disk subsystem to retrieve pages from disk and buffer them in RAM. Once pages are in the buffer cache, later requests for them are satisfied from memory, resulting in a minor page fault (MnPF). An MnPF saves the kernel time by reusing a page already in memory instead of reading it from disk again.
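The on-demand mapping can be observed from inside a process. This is a minimal sketch, assuming Linux and Python 3.9+: it creates a fresh anonymous mapping with `mmap`, writes one byte to each page, and compares the `ru_majflt`/`ru_minflt` counters from `resource.getrusage` before and after. Anonymous memory has no file behind it, so nothing needs to be read from disk and the first touch of each page should show up as a minor fault rather than a major one. The function names are illustrative, not part of any existing tool.

```python
import mmap
import resource


def fault_counts() -> tuple[int, int]:
    """Return (major, minor) page faults incurred so far by this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_majflt, usage.ru_minflt


def touch_anonymous_pages(num_pages: int) -> tuple[int, int]:
    """Map fresh anonymous memory, write one byte per page, and return the
    (major, minor) fault deltas observed while doing so."""
    size = num_pages * mmap.PAGESIZE
    region = mmap.mmap(-1, size)  # anonymous mapping: no pages resident yet
    before = fault_counts()
    for offset in range(0, size, mmap.PAGESIZE):
        region[offset] = 1  # first write to each page triggers a page fault
    after = fault_counts()
    region.close()
    return after[0] - before[0], after[1] - before[1]


if __name__ == "__main__":
    major, minor = touch_anonymous_pages(1000)
    print(f"major faults: {major}, minor faults: {minor}")
```

Expect the minor count to be in the neighborhood of the number of pages touched; the exact figure varies with kernel settings and with whatever else the interpreter does between the two readings.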
In the following example, the time command shows how many MPFs and MnPFs occurred when an application (Evolution) started. The first time the application starts, there are many MPFs:
# /usr/bin/time -v evolution
<snip>
Major (requiring I/O) page faults: 163
Minor (reclaiming a frame) page faults: 5918
<snip>
The second time Evolution starts, the kernel does not issue any MPFs because the application's pages are already in memory:
# /usr/bin/time -v evolution
<snip>
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 5581
<snip>
The File Buffer Cache
The kernel uses the file buffer cache to minimize MPFs and maximize MnPFs. As a system generates IO over time, this cache keeps growing, because the kernel leaves these pages in memory until memory runs low and it needs to "free" some of them for other uses. As a result, many system administrators see low amounts of free memory and become concerned when, in reality, the system is just making good use of its caches.
The following output is taken from the /proc/meminfo file:
# cat /proc/meminfo
MemTotal: 2075672 kB
MemFree: 52528 kB
Buffers: 24596 kB
Cached: 1766844 kB
<snip>
The system has a total of 2 GB (MemTotal) of RAM. Currently 52 MB of RAM is "free" (MemFree), 24 MB holds raw disk block buffers (Buffers), and 1.7 GB holds pages read from disk (Cached). The kernel serves these cached pages through the MnPF mechanism instead of pulling them in from disk again. It is impossible to tell from these statistics alone whether the system is under distress, as we only have part of the picture.
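The arithmetic above can be reproduced with a small parser. This is a minimal sketch, assuming Python 3.9+; `parse_meminfo` and `summarize` are hypothetical helpers, and the embedded `SAMPLE` is simply the output quoted above. Treating Buffers + Cached as "cache" is only a rough ceiling on what the kernel could reclaim, since dirty or mapped cache pages cannot always be dropped immediately. Newer kernels (3.14 and later) also export a `MemAvailable` field with the kernel's own estimate.

```python
SAMPLE = """MemTotal: 2075672 kB
MemFree: 52528 kB
Buffers: 24596 kB
Cached: 1766844 kB
<snip>
"""


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or elided lines such as <snip>
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def summarize(info: dict[str, int]) -> dict[str, float]:
    """Express free memory and page/buffer cache in MiB and as a share of RAM."""
    cache_kb = info.get("Buffers", 0) + info.get("Cached", 0)
    return {
        "free_mib": info["MemFree"] / 1024,
        "cache_mib": cache_kb / 1024,
        "cache_pct": 100 * cache_kb / info["MemTotal"],
    }


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the figures quoted above
    print(summarize(parse_meminfo(text)))
```

For the sample figures, about 86% of RAM is buffers and cache while only about 51 MiB is reported as free, which is the "low free memory" pattern described above.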
Types of Memory Pages
There are three types of memory pages in the Linux kernel:
- Read Pages: Pages of data read in from disk (via an MPF) that are read-only and backed on disk. These pages live in the buffer cache and include static files, binaries, and libraries that do not change. The kernel continues to page these into memory as it needs them. If memory becomes short, the kernel "steals" these pages and puts them back on the free list, so an application must incur an MPF to bring them back in.
- Dirty Pages: Pages of data that have been modified while in memory. These pages must be synced back to disk at some point, which is the job of the pdflush daemon. In the event of a memory shortage, kswapd (along with pdflush) writes these pages to disk to make more room in memory.
- Anonymous Pages: Pages of data that belong to a process but have no file or backing store associated with them, so they cannot be synchronized back to disk. In the event of a memory shortage, kswapd writes them to the swap device as temporary storage until more RAM is free ("swapping" pages).
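The three cases can be summarized as a small decision table. The sketch below, assuming Python 3.9+, is a hypothetical toy model (`PageKind`, `classify`, and `reclaim_action` are illustrative names, not kernel code). It only captures which of the three categories a page falls into and what must happen before its memory can be reused. The real kernel tracks pages on LRU lists and applies far more policy than this.

```python
from enum import Enum


class PageKind(Enum):
    READ = "read"            # clean and file-backed: binaries, libraries, static files
    DIRTY = "dirty"          # file-backed but modified in memory
    ANONYMOUS = "anonymous"  # no backing file (e.g. heap and stack)


def classify(file_backed: bool, modified: bool) -> PageKind:
    """Toy classification of a page into the three categories above."""
    if not file_backed:
        return PageKind.ANONYMOUS
    return PageKind.DIRTY if modified else PageKind.READ


def reclaim_action(kind: PageKind) -> str:
    """What must happen before a page of this kind can be reused (simplified)."""
    if kind is PageKind.READ:
        return "drop it; a later access re-reads it from disk (major fault)"
    if kind is PageKind.DIRTY:
        return "write it back to its file first, then free it"
    return "write it to the swap device, then free it"


if __name__ == "__main__":
    for backed, modified in [(True, False), (True, True), (False, True)]:
        kind = classify(backed, modified)
        print(f"{kind.value:>9}: {reclaim_action(kind)}")
```

The asymmetry is the point: clean read pages are the cheapest to reclaim because a copy already exists on disk, while dirty and anonymous pages cost a disk write before their memory can be reused.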
Writing Data Pages Back to Disk
Applications can choose to write dirty pages back to disk immediately by using the fsync() or sync() system calls. These system calls issue a direct request to the IO scheduler. If an application does not invoke them, the pdflush kernel daemon runs at periodic intervals and writes the pages back to disk. (Since Linux 2.6.32, pdflush has been replaced by per-device flusher threads, so on newer kernels the command below may show no pdflush thread.)
# ps -ef | grep pdflush
root 186 6 0 18:04 ? 00:00:00 [pdflush]
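Both paths can be sketched from user space, assuming Linux and Python 3.9+. The hypothetical `durable_write` helper flushes Python's own buffer into the kernel page cache and then calls `os.fsync()` so the file's dirty pages are written immediately instead of waiting for background writeback. The hypothetical `writeback_intervals` helper reads two real tunables under `/proc/sys/vm`: `dirty_writeback_centisecs`, the interval at which the kernel's writeback thread wakes up, and `dirty_expire_centisecs`, how old dirty data must be before it is written out. Both values are in hundredths of a second.

```python
import os
import tempfile
from pathlib import Path


def durable_write(path: str, data: bytes) -> None:
    """Write data and force the file's dirty pages to disk before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's user-space buffer into the kernel page cache
        os.fsync(f.fileno())  # ask the kernel to write this file's dirty pages now


def writeback_intervals(vm_dir: str = "/proc/sys/vm") -> dict[str, int]:
    """Read the periodic-writeback tunables (values in centiseconds)."""
    return {
        name: int(Path(vm_dir, name).read_text().strip())
        for name in ("dirty_writeback_centisecs", "dirty_expire_centisecs")
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.dat")
        durable_write(target, b"hello, disk\n")
        print("fsync'd", os.path.getsize(target), "bytes to", target)
    try:
        print(writeback_intervals())
    except OSError:
        print("/proc/sys/vm is not readable here")
```

Calling `os.fsync()` on every write trades throughput for durability, which is why most applications leave routine writeback to the kernel and reserve explicit syncs for data that must survive a crash.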