Experimental Gentoo Image

If you want to install Xorg and configure X, your config file
/etc/X11/xorg.conf should contain the following.

# X.Org X server configuration file.

Section "Device"
        Identifier      "Video Device"
        Driver          "modesetting"
        # Option                "Atomic"                "true"
        Option          "NoCursor"              "true"
EndSection

Section "OutputClass"
        Identifier      "starfive display"
        MatchDriver     "starfive"
        Option          "PrimaryGPU"    "true"
EndSection

Section "Monitor"
        Identifier      "Monitor"
        # Option                "DPMS"                  "false"
EndSection

Section "Screen"
        Identifier      "Screen"
        Monitor         "Monitor"
        Device          "Video Device"
EndSection

Section "ServerLayout"
        Identifier      "Server Layout"
        Screen          "Screen"
EndSection

Section "ServerFlags"
        Option          "DefaultServerLayout"   "Server Layout"

        # Enable support for DRM format modifiers
        # Option                "Debug"                 "dmabuf_capable"

        # Disable screen blanking. Disable DPMS in the Monitor section as well.
        # Option                "BlankTime"             "35790"
        # Option                "StandbyTime"           "35790"
        # Option                "SuspendTime"           "35790"
        # Option                "OffTime"               "35790"
EndSection

rv64imafdczbb0p93_zba0p93

Is that an idiosyncrasy of Gentoo? Why not just rv64gc_zba_zbb? In particular, the 0p93 part looks obsolete.

I got that suggestion from the safe gcc flags thread. There are probably better settings to use but this is just a possible starting point. I suggest testing other settings as well and checking various benchmarks to see if the performance is better.

Well, I think it’s wrong. Also, I’d be extremely interested in learning what, if anything, --param l1-cache-size=32 --param l2-cache-size=2048 would change. Nothing, I suspect.

There is another setting to try for RISC-V mentioned at this link: https://wiki.gentoo.org/wiki/Safe_CFLAGS

I do not know. I think of it a bit like putting items in three boxes: two that are right beside you (32 KiB L1 I-cache and 32 KiB L1 D-cache) and one that is further away (2 MiB L2 Loosely Integrated Memory), before having to deal with the higher-latency main LPDDR4 RAM (which is very far away by comparison). If the compiler does not know the exact size of all the boxes, it has no way of knowing at compile time which optimisations might help and which might hurt performance.
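
As a rough illustration of where the numbers in those flags come from: GCC documents `--param l1-cache-size` and `--param l2-cache-size` in kilobytes, so the box sizes quoted above translate as sketched below. The helper name `cache_params` is made up for this example, and the byte counts are taken from the post rather than probed from the hardware.

```shell
#!/bin/sh
# cache_params L1_BYTES L2_BYTES
# Hypothetical helper: prints the GCC --param flags for the given cache
# sizes. GCC expects both values in KiB, so bytes are divided by 1024.
cache_params() {
    l1_kib=$(( $1 / 1024 ))
    l2_kib=$(( $2 / 1024 ))
    echo "--param l1-cache-size=${l1_kib} --param l2-cache-size=${l2_kib}"
}

# 32 KiB L1 D-cache and 2 MiB L2, as quoted for the VisionFive 2 above.
cache_params $(( 32 * 1024 )) $(( 2 * 1024 * 1024 ))
```

Whether these parameters change the generated code at all for a given program is a separate question that only a before/after build can answer.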

Looking at the Dhrystone benchmark above, where the VF2 scores higher (5.797 DMIPS/MHz) than an RPi4 (5.21 DMIPS/MHz), I’d say that it helps.

Looking at the Dhrystone benchmark above, where the VF2 scores higher than an RPi4, I’d say that it helps.

No, that’s not how you compare. You compile two versions of Dhrystone, one with those flags and one without. I’d be very surprised if the two executables aren’t bit-identical. If they aren’t, then I would peruse the objdump -d of both to see what changed. If they are the same, then those flags are mere cargo cult.
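
The comparison described above could be sketched roughly as follows. The helper `report_diff` is a made-up name, and the source file names in the commented usage are assumptions about how the benchmark is laid out; adjust them to whatever your checkout contains.

```shell
#!/bin/sh
# report_diff A B
# Hypothetical helper: says whether two files are byte-for-byte identical.
report_diff() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Assumed usage (file names dhry_1.c and dhry_2.c are an assumption):
#   gcc -O2 -o dhry_plain dhry_1.c dhry_2.c
#   gcc -O2 --param l1-cache-size=32 --param l2-cache-size=2048 \
#       -o dhry_param dhry_1.c dhry_2.c
#   report_diff dhry_plain dhry_param
# If they differ, compare the disassembly:
#   objdump -d dhry_plain > plain.s
#   objdump -d dhry_param > param.s
#   diff plain.s param.s
```

Note that two executables can differ in metadata even when the generated instructions are the same, so a "different" result is only a prompt to look at the objdump output, not a conclusion in itself.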

I’d see it as an indication, and what you suggest as the second step, to validate or disprove.

Cache size only drives heuristics in gcc. Modeling the memory subsystem with main memory, L2 and L1 isn’t really state of the art any more - but it’s not wrong.

It’s also used if you have an x86 CPU and use -march=native.

gcc -v -E -x c /dev/null -o /dev/null -march=native 2>&1 | grep /cc1 | grep param

/usr/libexec/gcc/x86_64-pc-linux-gnu/12/cc1 -E -quiet -v /dev/null -o /dev/null -march=haswell -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -mavx2 -mno-sse4a -mno-fma4 -mno-xop -mfma -mno-avx512f -mbmi -mbmi2 -maes -mpclmul -mno-avx512vl -mno-avx512bw -mno-avx512dq -mno-avx512cd -mno-avx512er -mno-avx512pf -mno-avx512vbmi -mno-avx512ifma -mno-avx5124vnniw -mno-avx5124fmaps -mno-avx512vpopcntdq -mno-avx512vbmi2 -mno-gfni -mno-vpclmulqdq -mno-avx512vnni -mno-avx512bitalg -mno-avx512bf16 -mno-avx512vp2intersect -mno-3dnow -mno-adx -mabm -mno-cldemote -mno-clflushopt -mno-clwb -mno-clzero -mcx16 -mno-enqcmd -mf16c -mfsgsbase -mfxsr -mno-hle -msahf -mno-lwp -mlzcnt -mmovbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mno-pku -mno-prefetchwt1 -mno-prfchw -mno-ptwrite -mno-rdpid -mrdrnd -mno-rdseed -mno-rtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-tbm -mno-tsxldtrk -mno-vaes -mno-waitpkg -mno-wbnoinvd -mxsave -mno-xsavec -mxsaveopt -mno-xsaves -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -mno-uintr -mno-hreset -mno-kl -mno-widekl -mno-avxvnni -mno-avx512fp16 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=8192 -mtune=haswell -dumpbase null

I’m confused that you didn’t know, @tommythorn. You always know everything, right?

Thank you for your work, @andrew. Later in the gcc thread I changed my recommendations to:

COMMON_FLAGS="-O2 -pipe -fomit-frame-pointer"
OPT_FLAGS="--param l1-cache-size=32 --param l2-cache-size=2048"
CFLAGS="-mabi=lp64d -march=rv64imafdc_zicsr_zba_zbb -mcpu=sifive-u74 -mtune=sifive-7-series ${COMMON_FLAGS} ${OPT_FLAGS}"
CXXFLAGS="-mabi=lp64d -march=rv64imafdc_zicsr_zba_zbb -mcpu=sifive-u74 -mtune=sifive-7-series ${COMMON_FLAGS} ${OPT_FLAGS}"

And tommythorn is right: rv64imafdc_zicsr_zba_zbb and rv64gc_zba_zbb point to the same gcc optimizations, but I think rv64imafdc_zicsr_zba_zbb is more informative because all extensions are readable.

In RV64GC, the letter G means IMAFD and C means compressed instructions (found on: https://gist.github.com/dominiksalvet/2a982235957012c51453139668e21fce).
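
To make the shorthand concrete, here is a small sketch that spells out the "g" in a -march string. The helper name `expand_g` is made up, and it only performs the G to IMAFD substitution described above; it does not add zicsr or zifencei, which newer ISA spec versions also treat as part of G.

```shell
#!/bin/sh
# expand_g MARCH
# Hypothetical helper: rewrites the "g" shorthand that directly follows
# rv32/rv64 as "imafd". Strings without the shorthand pass through as-is.
expand_g() {
    printf '%s\n' "$1" | sed -E 's/^(rv(32|64))g/\1imafd/'
}

expand_g rv64gc_zba_zbb
```

This is text substitution only; the compiler's own view of an -march string is what appears in the cc1 command line printed by gcc -v.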

I have updated my Post to refer people to the Gentoo Safe CFLAGS.
Thanks for all the input on this.

I’m not on Gentoo. I’m booting from a Debian image 69 SD card (/boot) with a Fedora 38 rootfs on my NVMe as /.
I haven’t upgraded to the newer March Debian image/firmware.

I did a dnf upgrade this morning.

As mentioned here: Safe CFLAGS - Gentoo Wiki

davidm@fc38-rv64-vf2-YOW 2023-03-16_06:15:21_EDT : ~
 $ gcc -v -E -x c /dev/null -o /dev/null -march=native 2>&1 | grep /cc1 | grep mtune
davidm@fc38-rv64-vf2-YOW 2023-03-16_06:17:27_EDT : ~

That gave me nothing.

Yeah so I imagine the rv64 stuff for fedora 38 gcc 13 is tuned to hifive unmatched.

davidm@fc38-rv64-vf2-YOW 2023-03-16_06:19:03_EDT : ~
 $ gcc -v -E -x c /dev/null -o /dev/null
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
Target: riscv64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,m2,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-libstdcxx-backtrace --with-libstdcxx-zoneinfo=/usr/share/zoneinfo --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-13.0.1-20230215/obj-riscv64-redhat-linux/isl-install --with-arch=rv64gc --with-abi=lp64d --with-multilib-list=lp64d --build=riscv64-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.0.1 20230215 (Red Hat 13.0.1-0) (GCC) 
COLLECT_GCC_OPTIONS='-v' '-E' '-o' '/dev/null' '-march=rv64imafdc_zicsr_zifencei' '-mabi=lp64d' '-misa-spec=20191213' '-march=rv64imafdc_zicsr_zifencei'
 /usr/libexec/gcc/riscv64-redhat-linux/13/cc1 -E -quiet -v /dev/null -o /dev/null -march=rv64imafdc_zicsr_zifencei -mabi=lp64d -misa-spec=20191213 -march=rv64imafdc_zicsr_zifencei -dumpbase null
ignoring nonexistent directory "/usr/lib/gcc/riscv64-redhat-linux/13/include-fixed"
ignoring nonexistent directory "/usr/lib/gcc/riscv64-redhat-linux/13/../../../../riscv64-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/riscv64-redhat-linux/13/include
 /usr/local/include
 /usr/include
End of search list.
COMPILER_PATH=/usr/libexec/gcc/riscv64-redhat-linux/13/:/usr/libexec/gcc/riscv64-redhat-linux/13/:/usr/libexec/gcc/riscv64-redhat-linux/:/usr/lib/gcc/riscv64-redhat-linux/13/:/usr/lib/gcc/riscv64-redhat-linux/
LIBRARY_PATH=/usr/lib/gcc/riscv64-redhat-linux/13/:/lib64/lp64d/../lib64/lp64d/:/usr/lib64/lp64d/../lib64/lp64d/:/lib/../lib64/lp64d/:/usr/lib/../lib64/lp64d/:/lib64/lp64d/:/usr/lib64/lp64d/:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-E' '-o' '/dev/null' '-march=rv64imafdc_zicsr_zifencei' '-mabi=lp64d' '-misa-spec=20191213' '-march=rv64imafdc_zicsr_zifencei'
davidm@fc38-rv64-vf2-YOW 2023-03-16_06:19:17_EDT : ~
 $

I added this to my .bashrc this morning:

COMMON_FLAGS="-O3 -pipe -fomit-frame-pointer"
OPT_FLAGS="--param l1-cache-size=32 --param l2-cache-size=2048"
CFLAGS="-mabi=lp64d -march=rv64imafdc_zicsr_zba_zbb -mcpu=sifive-u74 -mtune=sifive-7-series ${COMMON_FLAGS} ${OPT_FLAGS}"
CXXFLAGS="-mabi=lp64d -march=rv64imafdc_zicsr_zba_zbb -mcpu=sifive-u74 -mtune=sifive-7-series ${COMMON_FLAGS} ${OPT_FLAGS}"

export PS1="\u@\h \D{%F_%T_%Z} : \w\n $ "
git clone https://github.com/sifive/benchmark-dhrystone.git
cd benchmark-dhrystone/
emacs -nw --color=no Makefile

I uncommented this line: benchmark-dhrystone/Makefile at master · sifive/benchmark-dhrystone · GitHub

Here are my results:

$ time ./dhrystone 

Dhrystone Benchmark, Version 2.1 (Language: C)

Program compiled without 'register' attribute

Execution starts, 20000000 runs through Dhrystone
Execution ends

Final values of the variables used in the benchmark:

Int_Glob:            5
        should be:   5
Bool_Glob:           1
        should be:   1
Ch_1_Glob:           A
        should be:   A
Ch_2_Glob:           B
        should be:   B
Arr_1_Glob[8]:       7
        should be:   7
Arr_2_Glob[8][7]:    20000010
        should be:   Number_Of_Runs + 10
Ptr_Glob->
  Ptr_Comp:          94880
        should be:   (implementation-dependent)
  Discr:             0
        should be:   0
  Enum_Comp:         2
        should be:   2
  Int_Comp:          17
        should be:   17
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:   DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
  Ptr_Comp:          94880
        should be:   (implementation-dependent), same as above
  Discr:             0
        should be:   0
  Enum_Comp:         1
        should be:   1
  Int_Comp:          18
        should be:   18
  Str_Comp:          DHRYSTONE PROGRAM, SOME STRING
        should be:   DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc:           5
        should be:   5
Int_2_Loc:           13
        should be:   13
Int_3_Loc:           7
        should be:   7
Enum_Loc:            1
        should be:   1
Str_1_Loc:           DHRYSTONE PROGRAM, 1'ST STRING
        should be:   DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc:           DHRYSTONE PROGRAM, 2'ND STRING
        should be:   DHRYSTONE PROGRAM, 2'ND STRING

Microseconds for one run through Dhrystone: 0 
Dhrystones per Second:                      10000000 


real	0m2.451s
user	0m2.439s
sys	0m0.011s
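
The benchmark's own figures here ("Microseconds for one run: 0" and a round 10000000 per second) look too coarse to be useful, so a DMIPS/MHz estimate is probably better derived from the `time` output. The sketch below uses the conventional divisor of 1757 Dhrystones per second (the VAX 11/780 reference). The 1500 MHz clock is an assumption about the board, not something shown in the output above, and `dmips_per_mhz` is a made-up helper name.

```shell
#!/bin/sh
# dmips_per_mhz RUNS SECONDS MHZ
# Dhrystones/s = RUNS / SECONDS
# DMIPS        = Dhrystones/s / 1757   (VAX 11/780 reference score)
# DMIPS/MHz    = DMIPS / clock in MHz
dmips_per_mhz() {
    awk -v runs="$1" -v secs="$2" -v mhz="$3" \
        'BEGIN { printf "%.2f\n", runs / secs / 1757 / mhz }'
}

# 20000000 runs in 2.439 s of user time; 1500 MHz is an assumed clock.
dmips_per_mhz 20000000 2.439 1500
```

With those inputs the estimate comes out at roughly 3.1 DMIPS/MHz; a different clock assumption scales the result proportionally.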

Yes, it would be preferable to see these kinds of benchmark tools packaged in the default VF2 Debian repo, and possibly in the Fedora 38 VF2 repos if such a thing ever exists.

Dhrystones and CoreMark just aren’t very useful; they are very synthetic toy benchmarks that prove almost nothing. You really need to use something more realistic. SPEC is good but not free. Geekbench is good, but also not free. Phoronix has a great suite, but it is not easy to set up, in my opinion. Personally, besides SPEC, I use real workloads I care about, like “cargo install http” or some of the private tools I use for work.

Add: some tools, like zstd and xz, have built-in benchmarks, which are excellent, but you need to be super careful not to draw too broad conclusions from them (e.g. they are 100% I$ resident, use no FP, and have a very particular memory access pattern).

Ironically, microarchitects like Dhrystones (but not as a benchmark) because it’s a loop of less than 200 instructions and we study Every Single One and know exactly how the branches behave (e.g. one of them is taken exactly every fifth time), etc. However, you can cover those with a tiny BP like GShare or YAGS, but that BP will look much worse when you are running, say, Firefox or Postgres.

The dirty secret about Dhrystones that people don’t tell you: it’s actually just testing your strcmp/strcpy performance, so you can totally game it by providing better versions of that.

The Phoronix CPU test suite only partially installed: of its 5 tests, only two succeeded and gave results, pts/mafft and pts/himeno.

wget https://phoronix-test-suite.com/releases/phoronix-test-suite-10.8.4.tar.gz
tar zxvf phoronix-test-suite-10.8.4.tar.gz
cd phoronix-test-suite
sudo ./install-sh
sudo dnf install php-cli php-xml php-json
phoronix-test-suite interactive
Select Task: 2
Select Suite: 3

Installed:
  Lmod-8.7.19-1.fc38.riscv64                                                    
  blas-3.11.0-2.fc38.riscv64                                                    
  blas-devel-3.11.0-2.fc38.riscv64                                              
  blas64-3.11.0-2.fc38.riscv64                                                  
  blas64_-3.11.0-2.fc38.riscv64                                                 
  fftw-3.3.10-3.0.riscv64.fc37.riscv64                                          
  fftw-devel-3.3.10-3.0.riscv64.fc37.riscv64                                    
  fftw-libs-3.3.10-3.0.riscv64.fc37.riscv64                                     
  fftw-libs-long-3.3.10-3.0.riscv64.fc37.riscv64                                
  fftw-libs-single-3.3.10-3.0.riscv64.fc37.riscv64                              
  hdf5-1.12.1-10.0.riscv64.fc37.riscv64                                         
  hdf5-devel-1.12.1-10.0.riscv64.fc37.riscv64                                   
  hwloc-libs-2.5.0-5.fc38.riscv64                                               
  lapack-3.11.0-2.fc38.riscv64                                                  
  lapack-devel-3.11.0-2.fc38.riscv64                                            
  lapack64-3.11.0-2.fc38.riscv64                                                
  lapack64_-3.11.0-2.fc38.riscv64                                               
  libaec-1.0.6-3.fc37.riscv64                                                   
  libaec-devel-1.0.6-3.fc37.riscv64                                             
  libfabric-1.15.1-2.fc37.riscv64                                               
  libibumad-41.0-1.0.riscv64.fc37.riscv64                                       
  lua-filesystem-1.8.0-8.fc38.riscv64                                           
  lua-json-1.3.4-3.fc38.noarch                                                  
  lua-lpeg-1.0.2-10.fc38.riscv64                                                
  lua-posix-35.1-5.fc38.riscv64                                                 
  lua-term-0.07-17.fc38.riscv64                                                 
  munge-libs-0.5.15-2.fc37.riscv64                                              
  openblas-devel-0.3.21-3.0.riscv64.fc37.riscv64                                
  openblas-openmp64-0.3.21-3.0.riscv64.fc37.riscv64                             
  openblas-openmp64_-0.3.21-3.0.riscv64.fc37.riscv64                            
  openblas-serial-0.3.21-3.0.riscv64.fc37.riscv64                               
  openblas-serial64-0.3.21-3.0.riscv64.fc37.riscv64                             
  openblas-serial64_-0.3.21-3.0.riscv64.fc37.riscv64                            
  openblas-threads-0.3.21-3.0.riscv64.fc37.riscv64                              
  openblas-threads64-0.3.21-3.0.riscv64.fc37.riscv64                            
  openblas-threads64_-0.3.21-3.0.riscv64.fc37.riscv64                           
  openmpi-4.1.4-4.1.riscv64.fc37.riscv64                                        
  openmpi-devel-4.1.4-4.1.riscv64.fc37.riscv64                                  
  opensm-libs-3.3.24-5.fc38.riscv64                                             
  orangefs-2.9.8-8.fc38.riscv64                                                 
  pmix-4.1.2-3.fc37.riscv64                                                     
  rpm-mpi-hooks-8-5.fc38.noarch

    To Install:    pts/himeno-1.3.0
    To Install:    pts/mrbayes-1.5.0
    To Install:    pts/hmmer-1.3.0
    To Install:    pts/mafft-1.6.2
    To Install:    pts/qmcpack-1.5.0

    Determining File Requirements ........................................................................................................................................................................................................
    Searching Download Caches ............................................................................................................................................................................................................

    5 Tests To Install
        7 Files To Download [210MB]
        1960MB Of Disk Space Is Needed
        7 Minutes, 12 Seconds Estimated Install Time

    pts/himeno-1.3.0:
        Test Installation 1 of 5
        1 File Needed
        Downloading: himenobmtxpa-2.tar.xz                                                                                                                                                                                        [0.00MB]
        Downloading ......................................................................................................................................................................................................................
        Approximate Install Size: 1 MB
        Estimated Install Time: 2 Seconds
        Installing Test @ 22:28:58

    pts/mrbayes-1.5.0:
        Test Installation 2 of 5
        1 File Needed [0.52 MB]
        Downloading: MrBayes-3.2.7a.tar.gz                                                                                                                                                                                        [0.52MB]
        Downloading ......................................................................................................................................................................................................................
        Approximate Install Size: 22 MB
        Estimated Install Time: 40 Seconds
        Installing Test @ 22:29:02
            The installer exited with a non-zero exit status.
            ERROR: cannot guess build type; you must specify one
            LOG: ~/.phoronix-test-suite/installed-tests/pts/mrbayes-1.5.0/install-failed.log

    pts/hmmer-1.3.0:
        Test Installation 3 of 5
        2 Files Needed [97.28 MB / 4 Minutes]
        Downloading: hmmer-3.3.2.tar.gz                                                                                                                                                                                          [17.37MB]
        Estimated Download Time: 1m ......................................................................................................................................................................................................
        Downloading: Pfam_ls.gz                                                                                                                                                                                                  [79.92MB]
        Estimated Download Time: 1m ......................................................................................................................................................................................................
        Approximate Install Size: 719 MB
        Estimated Install Time: 32 Seconds
        Installing Test @ 22:29:21
            The installer exited with a non-zero exit status.
            ERROR: No supported vectorization found for your machine.
            LOG: ~/.phoronix-test-suite/installed-tests/pts/hmmer-1.3.0/install-failed.log

    pts/mafft-1.6.2:
        Test Installation 4 of 5
        2 Files Needed [0.78 MB / 1 Minute]
        Downloading: mafft-7.471-without-extensions-src.tgz                                                                                                                                                                       [0.59MB]
        Estimated Download Time: 1m ......................................................................................................................................................................................................
        Downloading: mafft-ex1-lsu-rna.txt                                                                                                                                                                                        [0.19MB]
        Estimated Download Time: 1m ......................................................................................................................................................................................................
        Approximate Install Size: 18 MB
        Estimated Install Time: 13 Seconds
        Installing Test @ 22:29:40

    pts/qmcpack-1.5.0:
        Test Installation 5 of 5
        1 File Needed [111 MB / 1 Minute]
        Downloading: qmcpack-3.13.0.tar.gz                                                                                                                                                                                         [111MB]
        Estimated Download Time: 1m ......................................................................................................................................................................................................
        Approximate Install Size: 1200 MB
        Estimated Install Time: 5 Minutes, 45 Seconds
        Installing Test @ 22:31:34
            The installer exited with a non-zero exit status.
            ERROR: qmcpack-3.13.0/external_codes/boost_multi/multi/include/multi/./detail/../detail/serialization.hpp:77:16: error: template argument 2 is invalid
            LOG: ~/.phoronix-test-suite/installed-tests/pts/qmcpack-1.5.0/install-failed.log


The following tests failed to install:

  - pts/mrbayes-1.5.0
  - pts/hmmer-1.3.0
  - pts/qmcpack-1.5.0

Timed MrBayes Analysis 3.2.7:
    pts/mrbayes-1.5.0
    Test 1 of 5
    Estimated Trial Run Count:    3                      
    Estimated Test Run-Time:      16 Minutes             
    Estimated Time To Completion: 34 Minutes [23:11 UTC] 
        Started Run 1 @ 22:38:16
        The test quit with a non-zero exit status.
        Started Run 2 @ 22:38:20
        The test quit with a non-zero exit status.
        Started Run 3 @ 22:38:25
        The test quit with a non-zero exit status.
        E: There are not enough slots available in the system to satisfy the 4
QMCPACK 3.13:
    pts/qmcpack-1.5.0 [Input: simple-H2O]
    Test 2 of 5
    Estimated Trial Run Count:    3                      
    Estimated Test Run-Time:      3 Minutes              
    Estimated Time To Completion: 18 Minutes [22:56 UTC] 
        Started Run 1 @ 22:38:35
        The test quit with a non-zero exit status.
        Started Run 2 @ 22:38:39
        The test quit with a non-zero exit status.
        Started Run 3 @ 22:38:43
        The test quit with a non-zero exit status.
        E: There are not enough slots available in the system to satisfy the 4
Timed HMMer Search 3.3.2:
    pts/hmmer-1.3.0
    Test 3 of 5
    Estimated Trial Run Count:    3                      
    Estimated Test Run-Time:      9 Minutes              
    Estimated Time To Completion: 16 Minutes [22:54 UTC] 
        Started Run 1 @ 22:38:53
        The test quit with a non-zero exit status.
        Started Run 2 @ 22:38:57
        The test quit with a non-zero exit status.
        Started Run 3 @ 22:39:01
        The test quit with a non-zero exit status.
        E: There are not enough slots available in the system to satisfy the 4
Timed MAFFT Alignment 7.471:
    pts/mafft-1.6.2
    Test 4 of 5
    Estimated Trial Run Count:    3                     
    Estimated Test Run-Time:      3 Minutes             
    Estimated Time To Completion: 8 Minutes [22:46 UTC] 
        Started Run 1 @ 22:39:11
        Started Run 2 @ 22:41:42
        Started Run 3 @ 22:44:14

    Multiple Sequence Alignment - LSU RNA:
        147.107
        147.222
        147.161

    Average: 147.163 Seconds
    Deviation: 0.04%

Himeno Benchmark 3.0:
    pts/himeno-1.3.0
    Test 5 of 5
    Estimated Trial Run Count:    3                     
    Estimated Time To Completion: 6 Minutes [22:51 UTC] 
        Started Run 1 @ 22:46:51
        Started Run 2 @ 22:47:56
        Started Run 3 @ 22:49:00

    Poisson Pressure Solver:
        124.544577
        124.502994
        124.494601

    Average: 124.514057 MFLOPS
    Deviation: 0.02%


The following tests failed to properly run:

    - pts/mrbayes-1.5.0
    - pts/qmcpack-1.5.0: Input: simple-H2O
    - pts/hmmer-1.3.0
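
The Average and Deviation lines in the log above can be reproduced by hand. The sketch below assumes the deviation is the sample standard deviation expressed as a percentage of the mean; that assumption matches the figures printed for both MAFFT and Himeno, but it is not taken from the Phoronix documentation. The helper name `mean_dev` is made up, and it needs at least two values.

```shell
#!/bin/sh
# mean_dev: reads one number per line on stdin and prints the mean and
# the sample standard deviation as a percentage of the mean.
mean_dev() {
    awk '{ x[NR] = $1; sum += $1 }
         END {
             mean = sum / NR
             for (i = 1; i <= NR; i++) ss += (x[i] - mean) ^ 2
             printf "%.3f %.2f%%\n", mean, 100 * sqrt(ss / (NR - 1)) / mean
         }'
}

# The three MAFFT run times from the log above.
printf '%s\n' 147.107 147.222 147.161 | mean_dev
```

A deviation this small across three runs mainly says the measurement is repeatable; it says nothing about whether the workload is representative.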

I started another test on suite 57 (pts/system). I’ll come back much later this evening. Because it’s such a long-running test, I started “screen”, started suite 57, pressed Ctrl-a and then “d” to detach, and exited from my SSH shell to the VF2.
UPDATE: yeah it failed. Lots of packages are still missing in fc38. It’s still being built for rv64.

You can review the results of my test using the following command, where I compared it to other RISC-V systems:

phoronix-test-suite benchmark 2202191-IB-2112148AS16

Here is the link to the results

Oh, that’s pretty nice actually, but I miss the simplicity of SPEC (each score is a ratio against the reference machine, and the final score is a geometric mean).
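
For anyone unfamiliar with that style of scoring, here is a minimal sketch of it: each test's ratio is the reference machine's time divided by the measured time, and the overall score is the geometric mean of those ratios. The helper name `geomean` and the input numbers are invented for illustration; they are not SPEC results.

```shell
#!/bin/sh
# geomean: reads "measured reference" time pairs from stdin, forms the
# per-test ratio reference/measured, and prints their geometric mean.
geomean() {
    awk '{ logsum += log($2 / $1) }
         END { printf "%.2f\n", exp(logsum / NR) }'
}

# Made-up numbers: test 1 runs 2x faster than the reference, test 2 8x.
printf '%s\n' '50 100' '25 200' | geomean
```

The geometric mean is used so that no single test with a large ratio dominates the overall score the way it would with an arithmetic mean.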

Is there a convenient way to compare the results against, say, a Raspberry Pi 4 (Cortex-A72)?

All very much true, Tommythorn.

A related dirty secret is that a lot of the world’s CRUD systems, accounting, and other super-boring stuff that pays the bills (not all of it…) is dominated by strcmp/strcpy/memcmp/memcpy (or things that look like them to a uarch), so measuring those actually IS useful. Even in data centers where code is written by highly trained professionals, saving 1% on memmove can save $$$ in hardware and electricity.

It’s important that it not be the ONLY measure, which I think was your point.

No, it was my point that it doesn’t even test that well (it’s always the same short string). It’s minimally useful really.

I have run the same tests on my Raspberry Pi 4 and my Raspberry Pi 3.
Note that in some of the compression tests my Raspberry Pi 3 was swapping because it only has 1GB of memory. It was running 64-bit Ubuntu 22.04. The Raspberry Pi 4 has 8GB of memory and was also running Gentoo, just like my VisionFive 2.

Here are the results of the tests for you to compare

Thanks, Andrew. I didn’t go through every one of these, but these seem to approximately match what architecture wonks expected. My broad takeaways, which could be wrong, are:

  1. Tuned implementations matter. It’s likely the ARM OpenSSL has an assembly version and the RISC-V ones just don’t…yet. Certainly having a many-year head start helps.
  2. There’s a BIG gap between Unmatched and BeagleV that’s not strictly explained by the additional RAM and (I think) cache. Unmatched was an even older U74 than in BeagleV, so I’m surprised it has such a strong showing.
  3. SiFive’s additional work in the core fixing the bandwidth issues between the U74 release used in 7100 BeagleV and the 7110 in V5 was a big help. A couple of years of real-world tweaking helped.
  4. Beagle-Fedora and Beagle-Debian are just in different places on their tuning journey. They’re more different than I’d have guessed. Since we are unlikely to see a lot of releases or tuning for BeagleV (or V5R1), they’re likely eternally a snapshot of builds from ~2 years ago. JH-7100’s network performance, for example, is always going to be terrible.
  5. Gentoo on Pi4 consistently smokes!
  6. For the most part, the JH-7100 overtakes the Pi3 (400MHz) on clock speed (1GHz), but the multi-core Pi4’s performance at the same clock speed is consistently better because of the A72’s out-of-order execution engine on Pi4. Register renaming, OOO, op fusion, and being able to knock out up to 5 insns per cycle are all hard, but they pay off. This (and years of tuning) handily gives wins to Pi4 over JH-7110 running at the same speed. A C910 like in TH-1520 should help close that gap. Maybe someone will ship a SiFive U8 or P550 core in a chip some day.

If you want to try running the Meta LLaMA-type AI chatbot on your VisionFive 2 under Debian, head over to GitHub - antimatter15/alpaca.cpp: Locally run an Instruction-Tuned Chat-Style LLM
Clone the files.
Edit ggml.c and comment out line 155: /**** #include <immintrin.h> ****/
To compile using clang:
export CC=/usr/lib/llvm/15/bin/clang
export CXX=/usr/lib/llvm/15/bin/clang++
export CPP=/usr/lib/llvm/15/bin/clang-cpp
make
Follow the rest of the instructions listed in the README.md.
NOTE: It is slow, but it works if you run it and then come back later to see the results.
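
The edit to ggml.c described in the steps above can also be scripted instead of done by hand. This is only a sketch: it matches the include line by its text rather than by line number (line 155 may move between revisions), it uses a // comment instead of the /**** ****/ style, and the helper name `disable_immintrin` is made up.

```shell
#!/bin/sh
# disable_immintrin FILE
# Hypothetical helper: comments out any "#include <immintrin.h>" line in
# FILE, writing through a temporary file so it works without sed -i.
disable_immintrin() {
    sed 's|^\([[:space:]]*\)\(#include <immintrin\.h>\)|\1// \2|' "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Usage, from inside the cloned alpaca.cpp checkout:
#   disable_immintrin ggml.c
```

immintrin.h is the x86 intrinsics header, which is why the include has to go before the file will build on RISC-V.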

Here is the result of a Sample run

./chat 
main: seed = 1679325368
llama_model_load: loading model from 'ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.34 MB
llama_model_load: memory_size =  2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from 'ggml-alpaca-7b-q4.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

system_info: n_threads = 4 / 4 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 | 
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000


== Running in chat mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMA.
 - If you want to submit another line, end your input in '\'.

> A RISC V CPU is
The RiscV (RISC-V) architecture was developed by MIT, Berkeley and Stanford in collaboration with Google Inc., IBM Corporation, Red Hat Software Foundation, Samsung Electronics Co. Ltd., Synopsys Inc.. It provides a free open source implementation of the instruction set architecture for microprocessors that is designed to be compatible across multiple hardware platforms.
> 
