NVME Performance/speed

Hello all,

I am currently running a WD Blue SN570 in the PCIe 2.0 NVME slot.
Using the 202306 image with no “tweaks”.
Using the simple hdparm utility, I noticed that I am consistently getting the following results:

/dev/nvme0n1:
Timing cached reads: 1478 MB in 2.00 seconds = 738.96 MB/sec
Timing buffered disk reads: 500 MB in 3.00 seconds = 166.48 MB/sec
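
(For reference, figures in this form come from hdparm's cached and buffered read timings; a minimal invocation would be something like:)

# cached (-T) and buffered (-t) read timings, run against the raw device
sudo hdparm -Tt /dev/nvme0n1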

Seeing the rather “modest” performance results, I did not dig deeper using fio etc.
I then swapped the NVME SSD for a Samsung 990 Pro; the results (see above) were more or less the same.
Given that a PCIe 2.0 slot is not the fastest today, I am still surprised at the relatively poor I/O performance.
Where is the bottleneck?

Does anyone else have a similar situation?

Aubrey

Took the plunge and ran some further tests on the SN570 -

Category   Test               Result
HDParm     Disk Read          192.01 MB/s
HDParm     Cached Disk Read   190.16 MB/s
DD         Disk Write         82.1 MB/s
FIO        4k random read     46439 IOPS (185759 KB/s)
FIO        4k random write    7882 IOPS (31531 KB/s)
IOZone     4k read            42733 KB/s
IOZone     4k write           33075 KB/s
IOZone     4k random read     35236 KB/s
IOZone     4k random write    56797 KB/s
Test(s) courtesy of StarFive VisionFive 2 Official Debian SSD Boot Guide
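
(For anyone who wants to run something comparable, the commands below are representative of these tests rather than copied from the guide; device paths, file locations and sizes are examples.)

# sequential read timing on the raw device
sudo hdparm -t /dev/nvme0n1

# sequential write with dd, bypassing the page cache (test file path is an example)
sudo dd if=/dev/zero of=/mnt/nvme/test.bin bs=1M count=1024 oflag=direct status=progress

# 4k random read with fio (parameters are illustrative)
sudo fio --name=randread --filename=/mnt/nvme/fio.test --rw=randread --bs=4k --size=256M --direct=1 --ioengine=libaio --iodepth=32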

Aubrey

Samsung 970 Pro 1TB.
sudo hdparm --direct -t /dev/nvme0n1
Timing O_DIRECT disk reads: 980 MB in 3.00 seconds = 326.19 MB/sec
That is without changing the CPU governor.
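
(For anyone comparing, the governor can be checked and forced to performance via the usual cpufreq interface; the available governors depend on the kernel build, and the cpupower package name varies by distro.)

# show the current governor for each core
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# force the performance governor on all cores
sudo cpupower frequency-set -g performance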

I also ran a 980 Pro, a 990 Pro and a WD SN850X, but for whatever reason the old 970 Pro has by far the best results.

I could imagine that the newer models are so strongly optimised for PCIe 4.0 that they no longer perform as well under PCIe 3.0 and PCIe 2.0, or that the optimisations are detrimental for connections over only one lane. Benchmarks are usually performed on the latest mainboards with the latest CPUs, so it is presumably more important to the manufacturers that the hardware is optimised for those benchmarks, and a slowdown on weaker systems is accepted.

Given that the NVME PCIe 2.0 slot is not the fastest today, I have to pose the question again: where is the bottleneck - in the current hardware, software, or firmware?

Aubrey

You need to consider the number of PCIe lanes that are available. I thought the NVME performance was poor too, but once I found out that the vf2 has only one lane, while the benchmarks you see for PCs use four lanes of PCIe Gen 3 or Gen 4, you can begin to understand why the performance is what it is over a single PCIe Gen 2 lane. That being said, I have 1 TB of storage at a reasonable speed, and that's a good thing. I do my development in a browser on a PC, with Jupyter Lab running on the vf2. The drive does what I need it to do quite well.
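
(For what it's worth, one PCIe 2.0 lane runs at 5 GT/s, which after 8b/10b encoding is roughly 500 MB/s before protocol overhead. The negotiated link speed and width can be read from the LnkCap/LnkSta lines in lspci; the bus address below is only an example, find yours with a plain lspci.)

# list PCIe devices, then inspect the NVMe controller's link capability and status
lspci
sudo lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'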

Your argument is unconvincing.
Using a CM4 on a Waveshare Baseboard (A), which also has a single-lane PCIe 2.0 interface, and using the same NVME device as mentioned earlier, I am getting more than 50% higher throughput.

Aubrey

I am not arguing for anything. The performance is not what it should be. It is limited by the nature of the design but it could be better. I suspect that there are many improvements that could be made to the driver to improve the speed.

A perfect sentiment. I agree with you. A mixture of driver and firmware perhaps.

Aubrey

Contention and overhead play a big role too. Anything that is a "little" thing on a normal system can turn out to be a big deal on this board.

I'm trying out a bunch of different drives now that NVME drives have gotten so cheap, and I popped a brand-new drive onto the board today. The first thing I did was run hdparm on the raw space. When I tried the Pi Benchmarks script on it, it required a partition, so I created one, made an ext4 filesystem, mounted it, and ran the script. Immediately I noticed the drive had gotten about 10% slower. Without thinking, I had mounted the drive on a directory that is shared by samba. Note that it was still a 100% empty filesystem with nothing being written to or read from; samba shouldn't have been doing anything.

So I ran a bunch of hdparm passes and took an average, then unmounted it, mounted it on a new directory, and ran another batch of hdparm passes and averaged those. The speed returned to the same as the raw drive. Just sharing the drive through a single idle smbd share (one that doesn't even show up in top as doing anything) dropped the read speeds from 300 MB/s to 276.5 MB/s, about 8% slower.
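
(A rough sketch of that test sequence, for anyone wanting to reproduce it; the device, partition and mount points are examples, and averaging the runs is left out.)

# make a filesystem and mount it under a directory exported by samba (example paths)
sudo mkfs.ext4 /dev/nvme0n1p1
sudo mount /dev/nvme0n1p1 /srv/share/nvme

# several buffered read passes while mounted on the shared path
for i in 1 2 3 4 5; do sudo hdparm -t /dev/nvme0n1; done

# remount outside the exported tree and repeat
sudo umount /srv/share/nvme
sudo mount /dev/nvme0n1p1 /mnt/nvme
for i in 1 2 3 4 5; do sudo hdparm -t /dev/nvme0n1; done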