Certain conditions on a system create IO bottlenecks. You can identify them with a standard set of system monitoring tools: top, vmstat, iostat, and sar. The output of these commands overlaps in places, but for the most part each one offers a different view of performance. The following subsections describe conditions that cause IO bottlenecks.
Condition 1: CPU Wait on IO - Too Much IO at Once
In an ideal environment, a CPU splits its time between user (65%), kernel (30%), and idle (5%). If IO starts to cause a bottleneck on the system, a new condition, "Wait on IO" (WIO), appears in the CPU performance statistics. A WIO condition occurs when a CPU is completely idle because all the runnable processes are waiting on IO. This means that all the applications are in a sleep state because they are waiting for requests to complete in the IO subsystem.
The vmstat command provides WIO statistics in the last 4 fields of output under the “cpu” header.
# vmstat 1
procs ----memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
3 2 0 55452 9236 1739020 0 0 9352 0 2580 8771 20 24 0 57
2 3 0 53888 9232 1740836 0 0 14860 0 2642 8954 23 25 0 52
2 2 0 51856 9212 1742928 0 0 12688 0 2636 8487 23 25 0 52
These last 4 columns provide percentages of CPU utilization for user (us), kernel (sy), idle (id), and WIO (wa). In the previous output, the wa column reads 57, 52, and 52, so the CPU spends a little over 50% of its time idle, waiting on IO requests to complete. That time would be usable for executing applications, but no application can run because each one is waiting on IO. You can observe this in the blocked processes column (b), which shows 2 to 3 processes blocked in every sample.
It is also worth noting that the major cause of the IO bottleneck is disk reads: a large number of disk blocks are being read into memory (bi). No data is being written out to disk, as the blocks out (bo) column is zero. From this output alone, it appears that the system is processing a large read request.
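The reading of the sample above can be reproduced with a short script. This is only a sketch: it assumes the 16-column vmstat layout shown in this post, and the helper names (parse_vmstat, summarize) and the 20% wa cutoff are hypothetical choices for illustration, not a standard threshold.

```python
# Column names as printed in the vmstat header shown above.
# Assumption: newer vmstat builds may print extra trailing columns; they are ignored here.
VMSTAT_COLUMNS = ["r", "b", "swpd", "free", "buff", "cache",
                  "si", "so", "bi", "bo", "in", "cs",
                  "us", "sy", "id", "wa"]

# The three sample rows from the vmstat output above.
SAMPLE = """\
3 2 0 55452 9236 1739020 0 0 9352 0 2580 8771 20 24 0 57
2 3 0 53888 9232 1740836 0 0 14860 0 2642 8954 23 25 0 52
2 2 0 51856 9212 1742928 0 0 12688 0 2636 8487 23 25 0 52
"""


def parse_vmstat(text):
    """Turn vmstat data rows into a list of {column: int} dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and header lines (their first field is not a number).
        if not fields or not fields[0].isdigit():
            continue
        rows.append(dict(zip(VMSTAT_COLUMNS, map(int, fields))))
    return rows


def summarize(rows, wa_threshold=20):
    """Average the columns discussed in the text and flag high IO wait.

    wa_threshold is a hypothetical cutoff chosen for illustration only.
    """
    count = len(rows)
    summary = {col: sum(row[col] for row in rows) / count
               for col in ("wa", "b", "bi", "bo")}
    summary["io_bound"] = summary["wa"] >= wa_threshold
    return summary


if __name__ == "__main__":
    print(summarize(parse_vmstat(SAMPLE)))
```

On a live system, the same functions could be fed the captured output of a bounded run such as "vmstat 1 5". Note that the first row vmstat prints reports averages since boot, so it is usually worth discarding before averaging.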
FIELD DESCRIPTIONS
Procs
r: The number of processes waiting for run time.
b: The number of processes in uninterruptible sleep.
w: The number of processes swapped out but otherwise runnable.
This field is calculated, but Linux never desperation swaps.
Memory
swpd: the amount of virtual memory used (kB).
free: the amount of idle memory (kB).
buff: the amount of memory used as buffers (kB).
cache: the amount of memory used as cache (kB).
Swap
si: Amount of memory swapped in from disk (kB/s).
so: Amount of memory swapped to disk (kB/s).
IO
bi: Blocks received from a block device (blocks/s).
bo: Blocks sent to a block device (blocks/s).
System
in: The number of interrupts per second, including the clock.
cs: The number of context switches per second.
CPU
These are percentages of total CPU time.
us: user time
sy: system time
id: idle time
wa: time spent waiting on IO
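The CPU percentages are computed from cumulative tick counters that the Linux kernel exposes on the aggregate "cpu" line of /proc/stat. The sketch below shows one way to derive them from two snapshots. It makes some assumptions: only the first seven counters are used (steal and guest time are ignored), nice time is grouped with user time, and irq and softirq time are grouped with system time. The function names are hypothetical.

```python
import time

# Order of the first seven counters on the "cpu" line of /proc/stat.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line (Linux only)."""
    with open(path) as stat_file:
        for line in stat_file:
            if line.startswith("cpu "):
                return line
    raise RuntimeError("no aggregate cpu line found")


def parse_cpu_line(line):
    """Map the leading counters of a 'cpu' line to their names."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Convert two counter snapshots into us/sy/id/wa percentages."""
    delta = {name: after[name] - before[name] for name in FIELDS}
    total = sum(delta.values())
    if total == 0:
        # No ticks elapsed between the snapshots.
        return {"us": 0.0, "sy": 0.0, "id": 0.0, "wa": 0.0}
    return {
        # Assumption: nice time is counted as user time.
        "us": 100 * (delta["user"] + delta["nice"]) / total,
        # Assumption: interrupt handling is counted as system time.
        "sy": 100 * (delta["system"] + delta["irq"] + delta["softirq"]) / total,
        "id": 100 * delta["idle"] / total,
        "wa": 100 * delta["iowait"] / total,
    }


def sample(interval=1.0):
    """Take two snapshots 'interval' seconds apart and return percentages."""
    before = parse_cpu_line(read_cpu_line())
    time.sleep(interval)
    after = parse_cpu_line(read_cpu_line())
    return cpu_percentages(before, after)
```

Calling sample() on a Linux host returns a dictionary with the same four keys as the last 4 vmstat columns. The counters are cumulative since boot, which is why two snapshots and a subtraction are needed to get a rate for the interval.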
The top tool can provide enough insight to make an educated guess. Start the top command with a delay of 1 second:
# top -d 1
Once top is running, sort the output by faults (MPF and MnPF) by typing "F" to bring up the sort menu and "u" to sort by faults.