While googling for an easy-to-run, non-x86-centric MFLOPS benchmark, I found this on GitHub: AMDmi3/flops, the flops.c benchmark by Al Aburto with some improvements.
It is a portable, C89-only tool for estimating the performance of your compiler and FPU, although it is quite old (1992) and single-threaded. It benchmarks only the FPU: it declares that no memory transfers take place during the tests (it claims to use registers where possible). It tests fadd, fsub, fmul and fdiv and their variations in double precision.
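To illustrate what a register-only MFLOPS measurement looks like, here is a minimal sketch in C. It is not taken from flops.c: the names `to_mflops`, `kernel` and `measure_mflops` are hypothetical, and the kernel is a deliberately simple multiply-add chain chosen for this example only. Because each pass depends on the previous one, it measures latency rather than peak throughput, so its numbers are not comparable to the tables in this post.

```c
#include <time.h>

/* Hypothetical helper (not part of flops.c): convert a count of
   floating-point operations and an elapsed time into MFLOPS. */
static double to_mflops(double flop_count, double seconds)
{
    return flop_count / seconds / 1.0e6;
}

/* Hypothetical kernel: one multiply and one add per pass, working
   only on local variables so the compiler can keep them in registers.
   The value converges towards 1.0, so it never overflows. */
static double kernel(long passes)
{
    double x = 0.5;
    const double a = 0.999999, b = 0.000001;
    long i;

    for (i = 0; i < passes; i++)
        x = x * a + b;
    return x;
}

/* Time the kernel with clock() (CPU time) and report MFLOPS,
   counting 2 floating-point operations per pass. */
static double measure_mflops(long passes)
{
    clock_t start, stop;
    volatile double sink;   /* keeps the result live so the loop is not removed */
    double seconds;

    start = clock();
    sink = kernel(passes);
    stop = clock();
    (void)sink;

    seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;         /* run was too short for clock() to resolve */
    return to_mflops(2.0 * (double)passes, seconds);
}
```

Calling `measure_mflops(100000000L)` from `main` and printing the result with `printf("%.4f\n", ...)` is enough to try it. The pass count should be large enough that the run takes well over the resolution of `clock()`.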
The tests ran on an idle board with cpuidle disabled (running constantly at 1500 MHz), pinned to cpu0 with taskset -c 0.
Compiler versions used:
gcc: 12.2.0
clang: 16.0.4
gcc (dosbox): 4.2.1 (quite ancient but okay for any x86-32)
Here are my results:
gcc -fPIC -mabi=lp64d -march=rv64imafdczbb_zba -mcpu=sifive-u74 -mtune=sifive-7-series -pipe --param l1-cache-size=32 --param l2-cache-size=2048 -O3 -ffast-math -funroll-loops -funroll-all-loops -DUNIX -Wall -Wextra -pedantic flops.c -s -o flops.gcc.rv64imafdczbb_zba_sifive-7-series-unroll
Module Error RunTime(usec) MFLOPS
1 4.0146e-13 0.0450 310.8877
2 -1.4166e-13 0.0360 194.5428
3 4.7184e-14 0.0148 1145.2378
4 -1.2546e-13 0.0135 1110.3240
5 -1.3800e-13 0.0501 578.2750
6 3.2385e-13 0.0152 1910.6304
7 -6.5654e-11 0.1207 99.4402
8 3.4855e-13 0.0158 1893.3609
Iterations = 512000000
NullTime (usec) = 0.0000
MFLOPS(1) = 267.0047
MFLOPS(2) = 244.6028
MFLOPS(3) = 530.4581
MFLOPS(4) = 1532.5871
clang --driver-mode=gcc -menable-experimental-extensions -Wno-unused-command-line-argument -fPIC -mabi=lp64d -march=rv64imafdczbb_zba -mcpu=sifive-u74 -mtune=sifive-7-series -pipe -fomit-frame-pointer --param l1-cache-size=32 --param l2-cache-size=2048 -O3 -ffast-math -funroll-loops -DUNIX -Wall -Wextra -pedantic flops.c -s -o flops.clang.rv64imafdczbb_zba_sifive-7-series-unroll
Module Error RunTime(usec) MFLOPS
1 4.0146e-13 0.0417 335.7616
2 -1.3323e-13 0.0394 177.4636
3 1.9429e-14 0.0113 1510.0237
4 1.2157e-13 0.0131 1145.6756
5 6.1129e-13 0.0557 521.0849
6 3.3162e-13 0.0182 1595.1813
7 -2.4497e-11 0.1248 96.1899
8 3.4855e-13 0.0172 1746.3337
Iterations = 512000000
NullTime (usec) = 0.0000
MFLOPS(1) = 249.4224
MFLOPS(2) = 237.9115
MFLOPS(3) = 518.0758
MFLOPS(4) = 1524.0478
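The two per-module columns in the tables above can be cross-checked against each other. MFLOPS is 10^6 operations per second, which is the same as operations per microsecond, so multiplying RunTime (usec) by MFLOPS should give the number of floating-point operations behind each RunTime unit. The sketch below assumes that RunTime is the time of one pass through a module's loop body, which the post does not state; `ops_per_pass` is a hypothetical helper name.

```c
/* Hypothetical helper: estimate floating-point operations per pass
   from one row of the results table.
   MFLOPS = 1e6 flop/s = flop per microsecond, so
   RunTime (usec) * MFLOPS = flop per pass (rounded to nearest). */
static long ops_per_pass(double runtime_usec, double mflops)
{
    return (long)(runtime_usec * mflops + 0.5);
}
```

For the gcc run, `ops_per_pass(0.0450, 310.8877)` gives 14 for module 1 and `ops_per_pass(0.1207, 99.4402)` gives 12 for module 7. The clang rows yield the same counts to within the rounding of the printed RunTime, which is a quick sanity check that both compilers are timing the same amount of work.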
For fun, I also built it for dosbox: it gave about 4.3 MFLOPS in the VF2's dosbox running at max 100%, and from 9.9 to 12.9 MFLOPS on my i7-8750H.
You can look at the binaries and the logs from various systems:
flops.dist.tar.xz (113.4 KB)
I made several versions, including Win32 builds made on the VF2 itself with mingw-w64, which I built using the universal mingw suite from https://mxe.cc/ (quite useful! No ads, just a self-contained "no fuss" build system for building mingw on any Linux).