Flops: ancient (quite) but portable quick & dirty FPU benchmark

strlcat · June 22, 2023, 5:37pm

While googling for an easy-to-run, non-x86-centric megaflops benchmark, I found this: GitHub - AMDmi3/flops: flops.c benchmark by Al Aburto with some improvements

This is a portable, C89-only tool for estimating the performance of your compiler and FPU, although it is quite old (1992) and single-threaded. It benchmarks only the FPU: it declares that no memory transfers are done during the run (it claims to use registers where possible). It tests fadd, fsub, fmul and fdiv and their variations, in double precision.
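To illustrate the general idea of such a benchmark, here is a minimal C89 sketch of a timed floating-point loop. It is not taken from flops.c: the names `mflops` and `time_fmul_fadd`, the constants, and the use of `clock()` are all my own assumptions for illustration.

```c
#include <time.h>

/* Hypothetical helper (not from flops.c): converts a number of
   floating-point operations and an elapsed time into MFLOPS. */
double mflops(double flop_count, double seconds)
{
    return flop_count / (seconds * 1.0e6);
}

/* Times a loop of dependent double-precision multiply-adds that works
   only on local variables, so the compiler can keep them in registers.
   Each iteration performs 2 floating-point operations.
   Returns the elapsed CPU time in seconds. */
double time_fmul_fadd(long iterations, double *result_out)
{
    double x = 1.0;
    const double a = 0.999999999;
    const double b = 1.0e-9;
    clock_t start, end;
    long i;

    start = clock();
    for (i = 0; i < iterations; i++)
        x = x * a + b;              /* 1 fmul + 1 fadd */
    end = clock();

    *result_out = x;                /* keep the result live so the loop is not optimized away */
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A caller would then compute something like `mflops(2.0 * iterations, secs)`. The real benchmark runs eight different modules with different instruction mixes, so a single loop like this will not reproduce its numbers.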

The tests were run on an idle board with cpuidle disabled (running constantly at 1500 MHz), pinned to cpu0 with taskset -c 0.
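If you would rather pin from inside the program than wrap it in taskset, something along these lines should work on Linux with glibc. This is only a sketch: `pin_to_cpu` is a hypothetical helper of mine, it is not part of flops.c, and it is not portable C89.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                 /* needed for cpu_set_t and sched_setaffinity */
#endif
#include <sched.h>

/* Pins the calling process to a single CPU, similar in effect to
   starting it with `taskset -c <cpu>`.
   Returns 0 on success, -1 on error (errno is set). */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);   /* pid 0 = calling process */
}
```

Calling `pin_to_cpu(0)` at the top of `main` would correspond to the `taskset -c 0` used above. I kept taskset for the actual runs so the benchmark source stays untouched.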

Compiler versions used:
gcc: 12.2.0
clang: 16.0.4
gcc (dosbox): 4.2.1 (quite ancient but okay for any x86-32)

Here are my results:
gcc -fPIC -mabi=lp64d -march=rv64imafdczbb_zba -mcpu=sifive-u74 -mtune=sifive-7-series -pipe --param l1-cache-size=32 --param l2-cache-size=2048 -O3 -ffast-math -funroll-loops -funroll-all-loops -DUNIX -Wall -Wextra -pedantic flops.c -s -o flops.gcc.rv64imafdczbb_zba_sifive-7-series-unroll:

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0450    310.8877
     2     -1.4166e-13      0.0360    194.5428
     3      4.7184e-14      0.0148   1145.2378
     4     -1.2546e-13      0.0135   1110.3240
     5     -1.3800e-13      0.0501    578.2750
     6      3.2385e-13      0.0152   1910.6304
     7     -6.5654e-11      0.1207     99.4402
     8      3.4855e-13      0.0158   1893.3609

   Iterations      =  512000000
   NullTime (usec) =     0.0000
   MFLOPS(1)       =   267.0047
   MFLOPS(2)       =   244.6028
   MFLOPS(3)       =   530.4581
   MFLOPS(4)       =  1532.5871

clang --driver-mode=gcc -menable-experimental-extensions -Wno-unused-command-line-argument -fPIC -mabi=lp64d -march=rv64imafdczbb_zba -mcpu=sifive-u74 -mtune=sifive-7-series -pipe -fomit-frame-pointer --param l1-cache-size=32 --param l2-cache-size=2048 -O3 -ffast-math -funroll-loops -DUNIX -Wall -Wextra -pedantic flops.c -s -o flops.clang.rv64imafdczbb_zba_sifive-7-series-unroll:

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0417    335.7616
     2     -1.3323e-13      0.0394    177.4636
     3      1.9429e-14      0.0113   1510.0237
     4      1.2157e-13      0.0131   1145.6756
     5      6.1129e-13      0.0557    521.0849
     6      3.3162e-13      0.0182   1595.1813
     7     -2.4497e-11      0.1248     96.1899
     8      3.4855e-13      0.0172   1746.3337

   Iterations      =  512000000
   NullTime (usec) =     0.0000
   MFLOPS(1)       =   249.4224
   MFLOPS(2)       =   237.9115
   MFLOPS(3)       =   518.0758
   MFLOPS(4)       =  1524.0478

For fun, I also built it for DOSBox: it gave me about 4.3 MFLOPS in the VF2's DOSBox running at max 100%, and from 9.9 to 12.9 MFLOPS on my i7-8750H.

You can take a look at the binaries:
flops.dist.tar.xz (113.4 KB)
and at the logs from various systems. I made several versions, including Win32 ones built on the VF2 itself with mingw-w64, which I built using the universal MinGW suite from https://mxe.cc/ (quite useful! This is not an ad, but they provide a self-contained “no fuss” build system for building MinGW on any Linux).

strlcat · June 22, 2023, 9:50pm

Edit:

Added the compiler versions used
Added the binaries and source I built, and compared the VF2 against other systems

strlcat · June 23, 2023, 10:33am

In comparison:
gcc -O2 -DUNIX -Wall -Wextra -pedantic flops.c -s -o flops.gcc.rv64gc (most software is probably built this way, including GPU binaries):

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1     -7.6739e-13      0.0627    223.2070
     2     -5.7021e-13      0.0399    175.6143
     3     -2.4314e-14      0.0434    391.9548
     4      6.8612e-14      0.0400    374.6673
     5     -1.6209e-14      0.0832    348.5732
     6      1.3961e-13      0.0447    648.6781
     7     -3.6152e-11      0.1311     91.5045
     8      8.9373e-15      0.0494    607.5640

   Iterations      =  256000000
   NullTime (usec) =     0.0000
   MFLOPS(1)       =   214.2803
   MFLOPS(2)       =   190.3339
   MFLOPS(3)       =   321.1960
   MFLOPS(4)       =   512.7001

clang --driver-mode=gcc -O2 -DUNIX -Wall -Wextra -pedantic flops.c -s -o flops.clang.rv64gc:

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1     -7.6739e-13      0.0681    205.7057
     2     -5.7021e-13      0.0412    169.9322
     3     -2.4314e-14      0.0434    391.9756
     4      6.8612e-14      0.0400    374.6767
     5     -1.6209e-14      0.1072    270.4843
     6      1.3961e-13      0.0687    421.9714
     7     -3.6152e-11      0.1388     86.4404
     8      8.9373e-15      0.0734    408.7417

   Iterations      =  256000000
   NullTime (usec) =     0.0000
   MFLOPS(1)       =   208.5551
   MFLOPS(2)       =   172.1992
   MFLOPS(3)       =   270.5593
   MFLOPS(4)       =   403.5019

See the difference.
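To put numbers on that difference, here is a small sketch that divides the composite MFLOPS(1) to MFLOPS(4) figures of the tuned builds by those of the plain -O2 builds. The values are copied from the outputs above; the array and function names are only for illustration.

```c
#include <stdio.h>

/* Composite MFLOPS(1)..MFLOPS(4) figures copied from the runs above. */
static const double gcc_tuned[4]   = { 267.0047, 244.6028, 530.4581, 1532.5871 };
static const double gcc_o2[4]      = { 214.2803, 190.3339, 321.1960,  512.7001 };
static const double clang_tuned[4] = { 249.4224, 237.9115, 518.0758, 1524.0478 };
static const double clang_o2[4]    = { 208.5551, 172.1992, 270.5593,  403.5019 };

/* How many times faster the tuned build is than the baseline build. */
double speedup(double tuned, double baseline)
{
    return tuned / baseline;
}

/* Prints one speedup line per composite MFLOPS figure. */
void print_speedups(const char *label, const double tuned[4],
                    const double baseline[4])
{
    int i;

    for (i = 0; i < 4; i++)
        printf("%s MFLOPS(%d): %.2fx\n", label, i + 1,
               speedup(tuned[i], baseline[i]));
}
```

Calling `print_speedups("gcc", gcc_tuned, gcc_o2)` and `print_speedups("clang", clang_tuned, clang_o2)` prints the ratios. The gap is largest for MFLOPS(4), where the tuned gcc build comes out roughly 3x faster than the -O2 one. The tuned and -O2 runs report different iteration counts, but MFLOPS is a rate, so I treat the figures as comparable.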