HowTo: Linux: Double your disk read performance in a single command

Under the right conditions (that is, with certain hardware configurations which I'll identify later) it is possible to literally double your sequential read performance from disk. That's right, I said double. All with a single command. What is this magic you ask? Read on.

Requirements: What you'll need for it to work


Let me immediately say that this will absolutely not work on all systems. It (probably) won't improve read performance on your desktop or workstation (unless it's a high-end workstation), and probably won't improve your performance unless you're running large, high-speed hardware RAID arrays: for example, systems using Dell PERC 5/i SAS/SATA arrays or various 3ware cards (9500s for sure, newer probably, and older 8000 series possibly). If you have a hardware RAID array, try it. It certainly won't hurt.

Testbench: Dell PowerEdge 2950 w/ PERC 5/i SAS (6x300 GB SAS 15K RPM disks, RAID5)


A beefy system indeed. I'll hold out on the rest of the specs as their impact on performance should be marginal at best. For my own sanity, I carved up this fine array into two virtual disks, one (/dev/sda) for the operating system and the other (/dev/sdb) for our website and database stuff. The latter was used for benchmarking.

To simulate slightly more real-world testing, I decided to use a machine that's been in production, has been updated, and has a formatted (ext3) and mounted partition with a really large MySQL db on it (~350 GB). The file I'll be reading is a database file, ~120 GB in size. The database was shut down and I sync'ed the disks before running my commands.

Before Me


[root@cyberman mydb]# time dd if=files.ibd of=/dev/null bs=256k
492000+0 records in
492000+0 records out
128974848000 bytes (129 GB) copied, 831.429 seconds, 155 MB/s

real    13m51.457s
user    0m0.155s
sys     1m31.558s


After Me


[root@cyberman mydb]# time dd if=files.ibd of=/dev/null bs=256k
492000+0 records in
492000+0 records out
128974848000 bytes (129 GB) copied, 409.294 seconds, 315 MB/s

real    6m49.304s
user    0m0.063s
sys     2m6.730s

Yes that's right, it went from 155 MB/s to 315 MB/s! Yum!
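If you want to sanity-check the numbers dd prints, the arithmetic is just bytes divided by seconds. Here is a minimal sketch using awk; the function name throughput is my own, and it assumes dd's decimal megabytes (1 MB = 1,000,000 bytes):

```shell
#!/bin/sh
# throughput BYTES SECONDS -> MB/s, in decimal megabytes as dd reports them.
# "throughput" is an illustrative helper name, not part of dd.
throughput() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

# Byte count and elapsed times taken from the two dd runs above.
before=$(throughput 128974848000 831.429)
after=$(throughput 128974848000 409.294)

echo "before: $before MB/s"
echo "after:  $after MB/s"
awk -v a="$after" -v b="$before" 'BEGIN { printf "speedup: %.2fx\n", a / b }'
```

Both runs read the same 492000 blocks of 256 KB, so the speedup is simply the ratio of the two elapsed times.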

Ok, so the magic is less than impressive. In fact, it's a well known trick, at least if you noticed the terrible performance of the 3Ware 9500S RAID controller and cared enough to investigate. It all has to do with a sneaky little block device parameter known as readahead.

Without going into too much gory detail, readahead controls how far in advance the operating system reads when, well, reading, as its name implies. By default, some operating systems (in particular, RHEL5 Server) set this to 256 (512-byte sectors), or 128 KB. When dealing with large filesystems spanning many disks, this paltry figure can actually nuke your performance, as was the case with the 3Ware 9500S card. This parameter is tunable at runtime using the blockdev program.
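Since blockdev counts readahead in 512-byte sectors, it helps to convert values before you start tuning. A small sketch (the helper name sectors_to_kib is mine, not part of blockdev):

```shell
#!/bin/sh
# blockdev --getra and --setra count in 512-byte sectors.
# sectors_to_kib is an illustrative helper: sectors -> KiB.
sectors_to_kib() {
    echo $(( $1 * 512 / 1024 ))
}

for ra in 256 4096 262144; do
    echo "$ra sectors = $(sectors_to_kib "$ra") KiB"
done
```

On kernels that expose it, /sys/block/<device>/queue/read_ahead_kb shows the same setting already expressed in KB.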

The Magic


Here's the magic I used to double sequential read performance:

[root@cyberman mydb]# blockdev --getra /dev/sdb
256
[root@cyberman mydb]# blockdev --setra 262144 /dev/sdb

In retrospect, 262144 (about 128 MB!) was a TAD overkill. More realistic values are 1024, 2048, 4096, 8192, and maybe 16384...it really depends on the number of hard disks, their speed, your RAID controller, etc. With the 3Ware 9500S, I experienced significantly diminishing returns after 4096; your mileage may vary.
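To find where the returns diminish on your own hardware, you can sweep a few values and let dd report the throughput for each. The following is only a sketch under some assumptions: the names run, sweep_readahead and DRY_RUN are my own, the 4 GB read size is arbitrary, and /proc/sys/vm/drop_caches needs a 2.6.16 or newer kernel. By default it only prints the commands; set DRY_RUN=0 and run it as root to actually execute them.

```shell
#!/bin/sh
# Sketch: try several readahead values and time a sequential read for each.
# DRY_RUN=1 (the default here) only prints the commands; DRY_RUN=0 runs them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# sweep_readahead DEVICE VALUE [VALUE...]
sweep_readahead() {
    dev="$1"; shift
    for ra in "$@"; do
        run blockdev --setra "$ra" "$dev"
        # Flush dirty pages, then drop the page cache so each pass hits the disks.
        run sync
        run sh -c 'echo 3 > /proc/sys/vm/drop_caches'
        # Read 4 GiB (16384 blocks of 256 KiB); dd prints the rate when done.
        run dd if="$dev" of=/dev/null bs=256k count=16384
    done
}

sweep_readahead /dev/sdb 256 1024 2048 4096 8192
```

Remember to set the value you settle on once the sweep finishes, since the last value tried stays in effect.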

Caveats, notes, buts, etc...


Like I said early on, this won't work on all systems and may not benefit you all that much (certainly not on many small files, where a great many disk seeks are required, and possibly not on large, sparse files). In addition, parallel accesses to disk nuke read performance due to the additional seek times (although these are mitigated on large arrays with many high-RPM drives; NCQ may help here too). This is GREAT when you want to read a single, large file, and performance really matters (want to saturate a gigabit network? This will help). Fragmentation (non-contiguousness) will hurt performance and reduce the benefits of readahead.

Some recommendations for those who wish to have extremely high data read rates (sequentially especially):

Use XFS. Learn it and love it, as it's a great filesystem for large files. SGI even provides a tool called xfs_fsr, part of the xfsdump package of tools, to perform online defragmentation of files, great if you have moderately full filesystems with lots of large files that tend to get moved around, deleted, copied, edited, etc etc (can you say video server? I can!).

Test various readahead values to see what kind of performance you get. Try it on an unmounted filesystem (i.e., if=/dev/sdb) to see what kind of "raw" read performance you can attain. Good starting values are 1024 or 2048 (as an engineer I have an affinity for numbers that are powers of 2) but you may find that performance is great at slightly higher values, depending on your hardware.

To address some performance penalties when dealing with small or fragmented files, use lots of high RPM disks. Eight 10K or better still 15K RPM drives will REALLY fly.

Remember that your performance with this many disks might be limited by the HBA or bus (typically 1-4 Gbps for HBAs; it varies greatly for the bus: PCI, PCI-X, PCIe...). At 4 Gbps, your maximum achievable throughput assuming zero overhead is about 500 MB/s (512 MB/s if you count a gigabit as 1024 megabits), which is totally doable with 8x15K SAS disks.

Minimize the number of threads concurrently accessing a filesystem. Once you start sharing your filesystem, your read performance starts to go to hell. Fast disks will help, but this is a real performance killer (and sometimes unavoidable).
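For the HBA limit mentioned above, the ceiling is easy to work out: divide the link speed in bits by 8. A quick sketch, assuming decimal units (1 Gbps = 1000 Mbit/s) and zero protocol overhead; gbps_to_mbs is a made-up helper name:

```shell
#!/bin/sh
# gbps_to_mbs GBPS -> MB/s, decimal units, no protocol overhead assumed.
gbps_to_mbs() {
    echo $(( $1 * 1000 / 8 ))
}

for g in 1 2 4; do
    echo "$g Gbps = $(gbps_to_mbs "$g") MB/s"
done
```

With decimal units a 4 Gbps link tops out at 500 MB/s; counting a gigabit as 1024 megabits gives the 512 figure instead. Either way, real links lose some of that to encoding and protocol overhead, so treat it as an upper bound.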
