You are testing 3 things at once with your benchmark:
- Single core performance only (-T1 instead of -T$(nproc); see the sketch after this list)
- Memory access speed (read and write)
- Storage access speed (read and write)
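
To make the first point concrete, here is a minimal sketch of that flag difference. $BENCH is just a placeholder, since the actual benchmark binary isn't named here:

```
# $BENCH stands in for whatever benchmark binary is being run.
$BENCH -T1            # single worker thread: measures one core only
$BENCH -T"$(nproc)"   # $(nproc) expands to the number of online CPU cores
```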
If there were enough RAM, I would probably try to eliminate the storage from the benchmark entirely, by using something as simple as a basic RAM disk.
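
A minimal sketch of that, assuming roughly 1 GiB of free RAM and a hypothetical testfile; tmpfs lives entirely in memory, so the benchmark's reads and writes never touch the physical drive:

```
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=1G tmpfs /mnt/ramdisk   # RAM-backed filesystem
cp testfile /mnt/ramdisk/    # stage the benchmark data in RAM
# ... run the benchmark against /mnt/ramdisk/testfile ...
sudo umount /mnt/ramdisk     # releases the memory when done
```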
I was always told to benchmark each component in isolation, as much as possible, to get a better understanding of where the true bottlenecks in any system are located. I'm not saying that overall performance benchmarks are unimportant for real-world applications. But comparing a system with an older MicroSD card against one with a brand-new M.2 NVMe SSD, using a benchmark dominated by storage access time, will mostly measure the choice and age of the storage (all SSDs slow down as block erase/program cycles accumulate, and older, well-used drives are slower when nearly full than when empty).

Components benchmarked in isolation show the very best that is possible, which will typically never be achieved under real-world usage, but that does give an upper limit that a well-written application could peak at under ideal conditions.
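
A sketch of what per-component isolation can look like, assuming sysbench and fio are available (my choice of tools here, not anything from the original benchmark):

```
# CPU only: no storage involved, working set fits in cache
sysbench cpu --threads=1 run
sysbench cpu --threads="$(nproc)" run

# Memory only: read and write bandwidth
sysbench memory --memory-oper=read run
sysbench memory --memory-oper=write run

# Storage only: --direct=1 bypasses the page cache so RAM speed doesn't leak in
fio --name=seqread --rw=read --direct=1 --bs=1M --size=1G --filename=fio.test
fio --name=seqwrite --rw=write --direct=1 --bs=1M --size=1G --filename=fio.test
```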
Another method for accelerating reads is to pre-cache a file or directory into the page cache (as long as there is enough free memory for all the files to fit) using the "vmtouch" command.
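
For example (the dataset path is just a placeholder):

```
vmtouch -t /path/to/dataset   # "touch": read everything into the page cache
vmtouch /path/to/dataset      # report what fraction is currently resident
vmtouch -e /path/to/dataset   # evict it again, for a cold-cache comparison run
```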