For comparison, a plain -O2 build with GCC (most software is probably built this way, including GPU binaries):
gcc -O2 -DUNIX -Wall -Wextra -pedantic flops.c -s -o flops.gcc.rv64gc
FLOPS C Program (Double Precision), V2.0 18 Dec 1992
Module     Error        RunTime      MFLOPS
                         (usec)
  1     -7.6739e-13      0.0627    223.2070
  2     -5.7021e-13      0.0399    175.6143
  3     -2.4314e-14      0.0434    391.9548
  4      6.8612e-14      0.0400    374.6673
  5     -1.6209e-14      0.0832    348.5732
  6      1.3961e-13      0.0447    648.6781
  7     -3.6152e-11      0.1311     91.5045
  8      8.9373e-15      0.0494    607.5640
Iterations = 256000000
NullTime (usec) = 0.0000
MFLOPS(1) = 214.2803
MFLOPS(2) = 190.3339
MFLOPS(3) = 321.1960
MFLOPS(4) = 512.7001
And with clang, using the same flags:
clang --driver-mode=gcc -O2 -DUNIX -Wall -Wextra -pedantic flops.c -s -o flops.clang.rv64gc
FLOPS C Program (Double Precision), V2.0 18 Dec 1992
Module     Error        RunTime      MFLOPS
                         (usec)
  1     -7.6739e-13      0.0681    205.7057
  2     -5.7021e-13      0.0412    169.9322
  3     -2.4314e-14      0.0434    391.9756
  4      6.8612e-14      0.0400    374.6767
  5     -1.6209e-14      0.1072    270.4843
  6      1.3961e-13      0.0687    421.9714
  7     -3.6152e-11      0.1388     86.4404
  8      8.9373e-15      0.0734    408.7417
Iterations = 256000000
NullTime (usec) = 0.0000
MFLOPS(1) = 208.5551
MFLOPS(2) = 172.1992
MFLOPS(3) = 270.5593
MFLOPS(4) = 403.5019
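The difference is easy to see: modules 3 and 4 are essentially tied, but the GCC build is about 29% faster on module 5, 54% on module 6, and 49% on module 8, and its MFLOPS(4) summary is about 27% higher (512.7 vs. 403.5).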
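To put numbers on the gap between the two runs, here is a small sketch that divides each GCC MFLOPS figure by the corresponding clang figure. The values are copied by hand from the output above; the dictionary layout and the `speedup` helper are just illustrative names, not part of flops.c.

```python
# MFLOPS figures copied from the two runs above (hand-transcribed).
GCC = {
    "Module 1": 223.2070, "Module 2": 175.6143,
    "Module 3": 391.9548, "Module 4": 374.6673,
    "Module 5": 348.5732, "Module 6": 648.6781,
    "Module 7": 91.5045,  "Module 8": 607.5640,
    "MFLOPS(1)": 214.2803, "MFLOPS(2)": 190.3339,
    "MFLOPS(3)": 321.1960, "MFLOPS(4)": 512.7001,
}
CLANG = {
    "Module 1": 205.7057, "Module 2": 169.9322,
    "Module 3": 391.9756, "Module 4": 374.6767,
    "Module 5": 270.4843, "Module 6": 421.9714,
    "Module 7": 86.4404,  "Module 8": 408.7417,
    "MFLOPS(1)": 208.5551, "MFLOPS(2)": 172.1992,
    "MFLOPS(3)": 270.5593, "MFLOPS(4)": 403.5019,
}


def speedup(gcc_mflops, clang_mflops):
    """Return the gcc/clang MFLOPS ratio per entry (> 1.0 means gcc was faster)."""
    return {name: gcc_mflops[name] / clang_mflops[name] for name in gcc_mflops}


ratios = speedup(GCC, CLANG)
for name, ratio in ratios.items():
    print(f"{name:10s} gcc/clang = {ratio:.3f}")
```

A ratio close to 1.0 (modules 3 and 4) means both compilers produced equally fast code for that module; the larger ratios show where the GCC build pulls ahead.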