Let's take a look at a simple scenario: a single dd write to a device with bs=16k
# dd if=/dev/zero of=dev_name bs=16k count=655360
The difference between block I/O and raw I/O, and the difference between Solaris and Linux
The device name can be a block device or a raw device. For I/O on block devices, there is a cache layer in the OS kernel; for I/O on raw devices, the I/O bypasses the kernel's buffer cache. The device names are different between Linux and Solaris:
- Solaris block device: /dev/dsk/...
- Solaris raw device: /dev/rdsk/...
- Linux block device: /dev/sdxxx, /dev/mapper/xxxx, etc.
- Linux raw device: /dev/raw/xxxx (raw binding on block device)
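For reference, on Linux a raw device is just a binding created on top of a block device with the raw(8) utility. A quick sketch, assuming /dev/sdb is the disk you want to bind and that the raw driver is available (the binding does not persist across reboots):
# raw /dev/raw/raw1 /dev/sdb
# raw -qa
# dd if=/dev/zero of=/dev/raw/raw1 bs=16k count=655360
On Solaris no binding step is needed: /dev/rdsk/... is simply the character (raw) node for the same disk as /dev/dsk/...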
What does this look like in iostat? (Ignore the absolute performance numbers; the tests ran on different disks.) The outputs below illustrate the behavior of a single dd write on block devices and on raw devices.
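(Output of this form typically comes from something like "iostat -xnM 5" on Solaris and "iostat -x 5" from the sysstat package on Linux; the exact columns vary with the iostat version, but the ones discussed here appear in both.)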
Solaris block device (bs=16k):
extended device statistics
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 13006.1 0.0 101.6 133.5 256.4 10.3 19.7 100 100 c5t20360080E536D50Ed0
Linux block device (bs=16k):
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 14098.00 0.00 115.00 0.00 57868.00 1006.40 140.86 1325.40 0.00 1325.40 8.70 100.00
Solaris raw device (bs=16k):
extended device statistics
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 2805.0 0.0 43.8 0.0 0.9 0.0 0.3 3 87 c5t20360080E536D50Ed0
Linux raw device (bs=8k):
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 3645.00 0.00 29160.00 16.00 0.63 0.18 0.00 0.18 0.17 62.90
For "single dd write" on block devices, look at "actv" column (Solaris) and "avgqu-sz" column (Linux), which means that average number of transactions actively being serviced (removed from the wait queue but not yet completed) are much larger than 1. On Solaris, the "actv" limit is controlled by kernel parameter ssd_max_throttle or sd_max_throttle, the default is 256. If the limit is hit, the I/O request will be queued ("wait" in iostat). The upper layer (e.g. ZFS filesystem) may also limit the I/Os sent for each device . For "single dd write" on raw devices, the "actv" or "avgqu-sz" is never larger than 1 (unless the backend of dev is not a real device, e.g. a regular file), which means that only when previous data transfer is completed, the next data can be send to the device. In this regard, Solaris and linux behave similarly. In addition, since this is I/O on raw devices, the I/O size in iostat is equal to application write size. While modern CPU and memory subsystem is very fast, "single dd write" on raw devices becomes more like a disk subsystem latency testing.
On the other hand, Solaris and Linux have different caching implementations for dd writes on block devices. From the iostat output above you can see that Solaris write() splits the data into 8k chunks before sending it to the I/O driver, while Linux can merge small I/Os into larger requests (the "wrqm/s" column in iostat). This means that for dd writes on block devices Linux usually performs better than Solaris. However, this scenario is not very common on Solaris: in most cases application writes go either to a file system, which can consolidate writes, or to raw devices, which are typically used and optimized by databases. Below is the iostat output during a write test on a file in a ZFS filesystem.
$ dd if=/dev/zero of=TEST bs=16k count=655360
extended device statistics
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 154.0 0.0 154.0 0.0 10.0 0.1 64.7 2 100 c7t0d0
We can see that the average I/O size above is 154.0 MB/s / 154.0 writes/s = 1MB per write, i.e. ZFS has consolidated the 16k application writes into much larger I/Os.
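If you want to watch this consolidation while the dd runs, zpool iostat shows the aggregated I/O per vdev (assuming the filesystem lives in a pool called tank):
# zpool iostat -v tank 5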
What you can expect for single dd write performance
Below are my test results on a modern server; yours may be different.
2.5" 10K RPM internal disk, without using internal hardware raid:
single dd writes on block device bs=16k: 160MB/s.
single dd writes on raw device bs=16k: 160MB/s (drive write-cache enabled)
1.3MB/s (drive write-cache disabled; see the note after these results on checking the drive write cache)
A low-end SAN storage array:
single dd writes on block device bs=16k: depends on the OS, LUN disk layout, raid level, etc.
single dd writes on raw device bs=16k: a little more than 40MB/s (with some tweaks to the storage settings you can get 75MB/s)
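For the write-cache comparison in the internal-disk results, the drive write cache can be checked and toggled like this on Linux (a sketch, assuming a SATA disk at /dev/sdb; for SAS disks sdparm can read the WCE bit, and on Solaris the same setting is under the cache menu of format -e):
# hdparm -W /dev/sdb
# hdparm -W0 /dev/sdb
# hdparm -W1 /dev/sdb
hdparm -W with no argument reports the current setting; -W0 disables the drive write cache and -W1 re-enables it.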
Additional notes:
-----------------------
- On Linux, raw I/O is similar to O_DIRECT; GNU dd has an "oflag=direct" option for block devices (see the example after these notes).
- I also tested on an internal hardware raid0 volume built from internal disks; the performance is similar to the internal disk results above. One benefit of internal hardware raid shows up with "oflag=dsync" writes: if you configure the raid controller cache as write-back, you get much better performance.
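For the oflag notes above, a direct-I/O version of the same test looks like this (assuming /dev/sdb is a scratch disk you can safely overwrite; replace oflag=direct with oflag=dsync to also wait for each block to reach stable storage):
# dd if=/dev/zero of=/dev/sdb bs=16k count=655360 oflag=direct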