There are different kinds of performance indicators for disks. The best known is probably throughput: when you copy a few GiB to a USB stick, you immediately notice whether it is USB 2.0 or 3.x. Another important performance indicator for storage is IOPS, the number of I/O operations a disk can handle per second. For example, a typical 7200 RPM SATA HDD manages about 75 IOPS, while a Samsung SSD 850 PRO reaches about 100k IOPS. You’ve probably noticed a significant performance boost after replacing the HDD holding your operating system with an SSD.
To reach the maximum IOPS and throughput, applications have to issue I/O requests with enough parallelism. If they don’t, disk latency becomes the bottleneck. Disk latency is the time it takes to complete a single I/O request. It can be measured with a tool called ioping, which works very much like the ping tool for network hosts.
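Why parallelism matters can be made concrete with Little's law, which links the number of requests in flight, the IOPS and the latency. The following is only a back-of-the-envelope sketch: the 0.8 ms latency and the 100k IOPS target are illustrative values, the function names are made up for this example, and it assumes latency does not grow with load, which real devices do not guarantee.

```python
# Little's law: requests_in_flight = IOPS * latency,
# so IOPS = requests_in_flight / latency.
# Assumption: latency stays the same no matter how many requests are in flight.


def max_iops(requests_in_flight: int, latency_s: float) -> float:
    """IOPS reachable with a fixed number of outstanding requests."""
    return requests_in_flight / latency_s


def requests_needed(target_iops: float, latency_s: float) -> float:
    """Outstanding requests needed to sustain target_iops at the given latency."""
    return target_iops * latency_s


if __name__ == "__main__":
    latency = 0.8e-3  # illustrative latency of 0.8 ms per request
    print(f"1 request in flight: {max_iops(1, latency):.0f} IOPS")
    print(f"for 100k IOPS: {requests_needed(100_000, latency):.0f} requests in flight")
```

With one request at a time, 0.8 ms per request caps the device at 1250 IOPS; sustaining 100k IOPS at the same latency would need about 80 requests in flight.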
ioping is packaged for common Linux distributions, macOS and FreeBSD:
```sh
# Ubuntu/Debian
apt-get install ioping
# Arch
pacman -S ioping
# Fedora
dnf install ioping
# macOS
brew install ioping
# FreeBSD
pkg install ioping
```
If you are running Windows, you can download ioping-1.2-win32.zip, unzip it and run the ioping executable. If your OS is not listed, you can try to build ioping from source; see https://github.com/koct9i/ioping
If you know how ping works, you already know how to use ioping: run the command with a directory path as its argument. Here is an example run for the current directory:
```
$ ioping .
4 KiB <<< . (ext4 /dev/dm-0 107.8 GiB): request=1 time=282.6 us (warmup)
4 KiB <<< . (ext4 /dev/dm-0 107.8 GiB): request=2 time=855.6 us
4 KiB <<< . (ext4 /dev/dm-0 107.8 GiB): request=3 time=871.1 us
4 KiB <<< . (ext4 /dev/dm-0 107.8 GiB): request=4 time=748.5 us
4 KiB <<< . (ext4 /dev/dm-0 107.8 GiB): request=5 time=755.4 us
4 KiB <<< . (ext4 /dev/dm-0 107.8 GiB): request=6 time=747.8 us
^C
--- . (ext4 /dev/dm-0 107.8 GiB) ioping statistics ---
5 requests completed in 3.98 ms, 20 KiB read, 1.26 k iops, 4.91 MiB/s
generated 6 requests in 5.20 s, 24 KiB, 1 iops, 4.62 KiB/s
min/avg/max/mdev = 747.8 us / 795.7 us / 871.1 us / 55.5 us
```
As we can see, the average latency of my SSD is about 800 us (0.8 ms), which results in about 1260 IOPS for sequential requests, i.e. when each request is issued only after the previous one has completed. Even though such an SSD can handle something like 100k IOPS with many requests in parallel, it manages only a little more than 1k IOPS with sequential requests.
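The summary lines of the ioping run above can be reproduced from the five non-warmup request times, which shows how the numbers relate: IOPS is the request count divided by the time spent in those requests, throughput is IOPS times the 4 KiB request size, and for this run ioping's mdev equals the population standard deviation. The second function, `probe_latency` (a made-up name), is a rough Unix-only stand-in for what ioping measures, not a reimplementation of it: it times single 4 KiB reads at random offsets in a scratch file and, on Linux, asks the kernel to drop the file's cached pages before each read so the request should reach the device.

```python
import os
import random
import statistics
import tempfile
import time
from typing import Dict, List


def summarize(times_us: List[float], request_size: int = 4096) -> Dict[str, float]:
    """Recompute an ioping-style summary from per-request latencies in microseconds."""
    total_s = sum(times_us) / 1e6          # time spent in completed requests
    iops = len(times_us) / total_s         # one request at a time, so 1 / avg latency
    return {
        "min": min(times_us),
        "avg": statistics.mean(times_us),
        "max": max(times_us),
        "mdev": statistics.pstdev(times_us),  # population standard deviation
        "iops": iops,
        "mib_per_s": iops * request_size / 2**20,
    }


def probe_latency(directory: str, count: int = 6, size: int = 4096,
                  file_size: int = 1 << 20) -> List[float]:
    """Time `count` reads of `size` bytes at random aligned offsets in a scratch
    file inside `directory`; returns latencies in microseconds (Unix only)."""
    fd, path = tempfile.mkstemp(dir=directory, prefix="latency-probe-")
    try:
        with os.fdopen(fd, "wb") as f:     # closes the mkstemp descriptor
            f.write(os.urandom(file_size))
            f.flush()
            os.fsync(f.fileno())           # make sure the data reached the device
        fd = os.open(path, os.O_RDONLY)
        try:
            times = []
            for _ in range(count):
                offset = random.randrange(file_size // size) * size
                if hasattr(os, "posix_fadvise"):
                    # Linux: drop cached pages so the read is not served from RAM
                    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
                start = time.perf_counter_ns()
                os.pread(fd, size, offset)
                times.append((time.perf_counter_ns() - start) / 1000)
            return times
        finally:
            os.close(fd)
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Requests 2-6 from the ioping run above (request 1 was the warmup).
    s = summarize([855.6, 871.1, 748.5, 755.4, 747.8])
    print(f"min/avg/max/mdev = {s['min']:.1f} us / {s['avg']:.1f} us / "
          f"{s['max']:.1f} us / {s['mdev']:.1f} us")
    print(f"{s['iops']:.0f} iops, {s['mib_per_s']:.2f} MiB/s")

    # Own measurement in the current directory, dropping the first request as warmup.
    mine = summarize(probe_latency(".")[1:])
    print(f"avg = {mine['avg']:.1f} us, {mine['iops']:.0f} iops")
```

The first part prints `min/avg/max/mdev = 747.8 us / 795.7 us / 871.1 us / 55.5 us` and `1257 iops, 4.91 MiB/s`, matching ioping's own summary. The values from `probe_latency` will differ from ioping's, since the two tools do not necessarily issue I/O the same way.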
It’s also possible to ping RAM if /tmp is mounted as a RAM-backed tmpfs, which is the default on many Linux distributions. To ping the memory 10 times, run `ioping -c 10 /tmp`. I did that for RAM and some other devices and collected the results in the following table:
| Storage | Avg. latency | IOPS | Device |
|---|---|---|---|
| RAM | 22 us | 48000 | DDR3 1600MHz (PC3L 12800S) |
| SSD | 796 us | 1240 | TOSHIBA THNSNJ12 |
| iSCSI | 1.5 ms | 649 | Hetzner Cloud Storage (Ceph block device) |
| HDD | 14 ms | 73 | HGST HTS725050A7 |
| SSHFS | 26 ms | 40 | Hetzner VPS (20 ms network ping) |
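The RAM row relies on /tmp being a RAM-backed tmpfs, which is not the case on every system. A minimal sketch for Linux that looks up the filesystem type of the mount containing a path in /proc/mounts; the function name `fs_type_for` is made up for this example, and it only handles the plain mount-point prefix match (it ignores mount options and does not try to resolve bind mounts).

```python
import os
from typing import Optional


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the mount containing `path`.

    `mounts_text` uses the /proc/mounts format, one mount per line:
    "<device> <mount point> <fs type> <options> <dump> <pass>".
    """
    best_point, best_type = None, None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # The kernel escapes spaces in mount points as \040.
        point, fstype = fields[1].replace("\\040", " "), fields[2]
        prefix = point.rstrip("/") + "/"
        if path == point or path.startswith(prefix):
            # Longest mount point wins; on ties the later entry wins,
            # because it was mounted on top of the earlier one.
            if best_point is None or len(point) >= len(best_point):
                best_point, best_type = point, fstype
    return best_type


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):  # Linux only
        with open("/proc/mounts") as f:
            fstype = fs_type_for(os.path.realpath("/tmp"), f.read())
        print(f"/tmp is on {fstype}; RAM-backed: {fstype == 'tmpfs'}")
```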
As we can see, a fast SSD mounted over the network can easily beat a local HDD. The I/O latency of Hetzner Cloud Storage is about 1.5 ms, which suggests that their SSD-based Ceph cluster is in the same data center as the VPS; that makes sense. A mount over SSH with sshfs shows that sshfs itself adds about 6 ms of latency on top of the 20 ms network latency. Although the remote drive is an SSD, the network latency turns it into a very slow filesystem mount with about half the performance of a local HDD. Sequential IOPS follow directly from the latency: $\mathrm{IOPS} = 1/t$, where $t$ is the latency in seconds.
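Applied to the average latencies in the table, IOPS = 1/t lands within a few percent of the measured IOPS column, for example about 1256 for the SSD versus 1240 measured and about 71 for the HDD versus 73. A short sketch of that arithmetic; the dictionary simply copies the table's latency column.

```python
# Sequential IOPS from average latency: IOPS = 1 / t, with t in seconds.
# Latencies copied from the table above.
LATENCY_S = {
    "RAM": 22e-6,
    "SSD": 796e-6,
    "iSCSI": 1.5e-3,
    "HDD": 14e-3,
    "SSHFS": 26e-3,
}


def sequential_iops(latency_s: float) -> float:
    """Requests per second when each request waits for the previous one."""
    return 1.0 / latency_s


if __name__ == "__main__":
    for device, t in LATENCY_S.items():
        print(f"{device:6} {sequential_iops(t):8.0f} IOPS")
    # SSHFS relative to the local HDD ("about half the performance")
    ratio = sequential_iops(LATENCY_S["SSHFS"]) / sequential_iops(LATENCY_S["HDD"])
    print(f"SSHFS / HDD = {ratio:.2f}")
```

The last line prints a ratio of 0.54, which is where "about half the performance of a local HDD" comes from.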
We can conclude that I/O latency plays an important role for many applications, because their I/O operations are often sequential. This is similar to the importance of single-thread performance on a multi-core CPU: while it is good to have many cores available, the single-thread performance of each core still matters a lot for overall speed in practice. The reason is that many applications run single-threaded, like all coreutils except sort, and can only use one core at a time.