Does the JH7110 Support RISC-V Extension D?

Right now, it looks as if any “optimized” compilation of zstd leads to longer runtime; this is true for both gcc and clang (see the table of results in my earlier post). If I recompile the binary with gcc or clang without any extra options (as a dynamic or static binary), it runs faster. There is a Debian bug report about the slow performance of the Debian build. I was able to reproduce the slow Debian build on my amd64 workstation with a self-compiled binary, but unlike on RISC-V, clang did not create a faster binary than gcc on amd64 (which suggests both compilers perform similarly on amd64). As on RISC-V, however, the optimized builds were slower, so I suspect something is fundamentally wrong with the way I compile zstd.
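Whether a given compiler/flag combination actually targets RISC-V extensions can be read off the compiler's predefined macros. A small sketch (the `${CC:-cc}` invocation is an assumption about your environment; on a RISC-V toolchain with D enabled you would see `__riscv_d` and `__riscv_flen 64` in the output, on other targets it falls through to the fallback message):

```shell
# Dump the compiler's predefined macros and look for RISC-V ones.
# On a RISC-V gcc/clang whose -march includes "d", this lists e.g.
# __riscv_d and __riscv_flen 64; elsewhere the fallback message prints.
${CC:-cc} -E -dM -xc /dev/null 2>/dev/null | grep '__riscv' \
  || echo "no __riscv macros: this compiler does not target RISC-V"
```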

On my JH7110, clang creates faster zstd binaries than gcc (contrary to older research papers, which saw gcc at an advantage across a wide range of software).
zstd has a built-in benchmark (zstd -b), which I used to verify my results:

root@visionfive2:/tmp/benchmark# zstd -b
 3#Lorem ipsum       :  10000000 ->   2981954 (x3.354),   12.0 MB/s,   30.1 MB/s
root@visionfive2:/tmp/benchmark# ./zstd-local-optimized-static_nooptions -b
 3#Lorem ipsum       :  10000000 ->   2981954 (x3.354),   13.0 MB/s,   31.0 MB/s
root@visionfive2:/tmp/benchmark# ./zstd-local-optimized-static-clang-19.1.7_nooptions -b
 3#Lorem ipsum       :  10000000 ->   2981954 (x3.354),   14.0 MB/s,   42.8 MB/s
root@visionfive2:/tmp/benchmark# ./zstd-local-optimized_march-rv64gc_zba_zbb -b
 3#Lorem ipsum       :  10000000 ->   2981954 (x3.354),   4.35 MB/s,   10.0 MB/s
root@visionfive2:/tmp/benchmark# ./zstd-local-optimized-static_march-rv64g_-misa-spec_20191213_-march_rv64imafd_zicsr_zifence -b
 3#Lorem ipsum       :  10000000 ->   2981954 (x3.354),   4.15 MB/s,    9.8 MB/s
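To answer the question in the title directly: the JH7110's SiFive U74 cores implement RV64GC, and the G there expands to IMAFD_Zicsr_Zifencei, so the D (double-precision floating point) extension is supported. On the board itself this can be confirmed from the ISA string the kernel reports; a sketch (on non-RISC-V machines it prints a fallback line):

```shell
# Read the ISA string the kernel reports for the first hart; a "d" in
# the single-letter part (e.g. rv64imafdc) means the D extension.
isa=$(awk -F': *' '/^isa/ {print $2; exit}' /proc/cpuinfo 2>/dev/null)
base=${isa%%_*}   # keep only the single-letter extensions before any _z...
case "$base" in
  rv*d*) echo "D extension present: $isa" ;;
  rv*)   echo "no D extension in ISA string: $isa" ;;
  *)     echo "not a RISC-V Linux system (no ISA string in /proc/cpuinfo)" ;;
esac
```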

Due to the performance regression with zstd, I turned to bzip3 as a second benchmark to test the effect of optimized builds and verify my results.
The results were similar: minor (about 2%) speed improvements for a locally compiled version over the stock Debian build when compiling without any CPU-specific options, with gcc and clang performing the same. With CPU-specific optimization, I again saw massive performance degradation.
Also interesting: the optimized binaries were larger. I would have expected a more optimized binary to be smaller, since it should make use of more specialized CPU instructions.