Geekbench 5 results for D1 and JH7110

Please keep in mind that if you build software with default-ish CFLAGS (or don’t know what that means), on a distro where every piece of software was built that way, the test results are pretty much gibberish. This is not x86, where the CPU babysits badly compiled code. I can provide examples of how badly GCC treats this little chip when run with a plain -O2 against hot-path cryptography code.

I just perused the VF2 vs. a00 Geekbench results. They showed a difference of 205.8%. I would have preferred that Geekbench explain exactly what we’re looking at, e.g. “the higher the number, the better”. Nothing like that is stated for any of the benchmark tests. Does that mean the VF2 is 205.8% faster than the a00? It seems so, but I hate guessing or assuming. Geekbench should spell out explicitly what users are looking at.
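
Whether “205.8%” means 2.058× or 3.058× depends on how the percentage is defined. The sketch below shows the two common definitions using made-up scores; which one the Geekbench Browser comparison page actually uses is an assumption here, not something stated in this thread.

```python
def ratio_percent(baseline: float, other: float) -> float:
    """Score of `other` as a percentage of `baseline` (100% means equal)."""
    return other / baseline * 100.0


def percent_faster(baseline: float, other: float) -> float:
    """How much higher `other` scores than `baseline` (0% means equal).

    Reads as "faster" only for higher-is-better scores such as Geekbench's.
    """
    return (other / baseline - 1.0) * 100.0


# Hypothetical scores, NOT real Geekbench results.
a00_score = 100.0
vf2_score = 305.8

print(f"{ratio_percent(a00_score, vf2_score):.1f}%")   # 305.8%
print(f"{percent_faster(a00_score, vf2_score):.1f}%")  # 205.8%
```

Geekbench scores are higher-is-better, so under the second definition the faster board would score about 3.06× the slower one, not 2.06×.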

You didn’t try very hard if at all. Geekbench is well understood and is better documented than most: https://www.geekbench.com/doc/geekbench5-cpu-workloads.pdf

OT: I forgot I did this, but I had actually cooked up a pretty interesting benchmark (as always, this is but one point in the performance picture): GitHub - tommythorn/verilog-sim-bench: Verilog simulation workload extracted from Reduceron

Some results:

  • 3.6 GHz MacMini (M1): 47 s
  • 4.0 GHz EPYC 7443P (Zen 3): 91 s
  • 1.5 GHz Raspberry Pi 4B (Cortex-A72): 595 s
  • 1.5 GHz VisionFive 2 (JH7110/U74): 1062 s
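
To separate clock speed from per-clock efficiency, one rough approach is to multiply each run time by the listed clock frequency, giving the number of clock cycles spent (in billions). This is only a sketch: it assumes run time scales perfectly with frequency, which ignores memory latency and boost clocks, and it uses only the numbers from the list above.

```python
# (name, clock in GHz, run time in seconds), copied from the results list above
RESULTS = [
    ("MacMini (M1)", 3.6, 47),
    ("EPYC 7443P (Zen 3)", 4.0, 91),
    ("Raspberry Pi 4B (Cortex-A72)", 1.5, 595),
    ("VisionFive 2 (JH7110/U74)", 1.5, 1062),
]


def gigacycles(ghz: float, seconds: float) -> float:
    """Approximate cycles spent, in units of 1e9, assuming a constant clock."""
    return ghz * seconds


def relative_to(baseline: str) -> dict[str, tuple[float, float]]:
    """Return (wall-clock ratio, per-clock ratio) of each result vs. the baseline."""
    by_name = {name: (ghz, secs) for name, ghz, secs in RESULTS}
    base_ghz, base_secs = by_name[baseline]
    out = {}
    for name, ghz, secs in RESULTS:
        wall = secs / base_secs
        per_clock = gigacycles(ghz, secs) / gigacycles(base_ghz, base_secs)
        out[name] = (wall, per_clock)
    return out


for name, (wall, per_clock) in relative_to("Raspberry Pi 4B (Cortex-A72)").items():
    print(f"{name:30s} wall {wall:5.2f}x  per-clock {per_clock:5.2f}x")
```

Since the Pi 4B and the VF2 both ran at 1.5 GHz, their ratio is the same either way: the VF2 needs about 1.78× the time (and cycles) of the Pi 4B on this workload.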

CPU benchmark results can be manipulated.
Different programs require different features.

E.g. Intel’s CPUs have 2×512-bit FPUs (AVX-512), but SSE4 is 128-bit. Some AMD FPUs are 2+2 (2 add, 2 mul) and can work at the same time.
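
The width argument can be made concrete with a little arithmetic: a 512-bit register holds 8 double-precision (64-bit) values, a 128-bit SSE register only 2. The sketch below computes theoretical peak double-precision operations per cycle from unit count, vector width and whether a fused multiply-add (counted as 2 operations) is available. The configurations, including the 256-bit width for the 2+2 case, are simplified assumptions for illustration, not exact specs of any particular CPU.

```python
def lanes(vector_bits: int, element_bits: int = 64) -> int:
    """Number of `element_bits`-wide elements that fit in one vector register."""
    return vector_bits // element_bits


def peak_flops_per_cycle(units: int, vector_bits: int, fma: bool, element_bits: int = 64) -> int:
    """Theoretical peak floating-point ops per cycle per core.

    Assumes every unit issues one full-width instruction per cycle; an FMA
    counts as two operations (a multiply and an add).
    """
    ops_per_lane = 2 if fma else 1
    return units * lanes(vector_bits, element_bits) * ops_per_lane


# Simplified, hypothetical configurations in the spirit of the post above.
print(peak_flops_per_cycle(units=2, vector_bits=512, fma=True))    # 2x AVX-512 FMA -> 32
print(peak_flops_per_cycle(units=2, vector_bits=128, fma=False))   # 2x 128-bit SSE -> 4
# "2+2": two add pipes plus two multiply pipes, assumed 256-bit, no FMA -> 16
print(peak_flops_per_cycle(2, 256, False) + peak_flops_per_cycle(2, 256, False))
```

Sustained throughput is usually well below these peaks, and code that doesn’t vectorize gets none of the extra width, which is the point being made: the score depends on whether the workload uses these features.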

We run real applications that people use (like my example above), not synthetic toys like Dhrystone. If SSE or RISC-V vector would accelerate the app, that would be wonderful, as real apps would also run faster. It is true that SPEC has been hyper-optimized beyond what helps outside SPEC; that’s why you need a large suite and need to keep evolving it.

ADD: to be absolutely clear: my bench is a real workload and if you found a magic trick to accelerate it, a lot of people would be very interested.

“Real applications” is the problem.

For games, 8 cores and SSE4.2 are enough.
For scientific computing, many cores and AVX-512 are a basic requirement.
For IoT, a single core is enough and an FPU is an optional module.

Your apps already have feature requirements. Finding out which CPUs have those features is a good approach.

Does a “real workload” include “1000 times a+b in 1 second”?

Jingtao, I don’t know what your issue is. I’ve already explained that this is a typical, non-synthetic example of a Real Workload. In my day job, I run RTL simulations that take far longer than that. And while in this case the Verilog input looks unusual, that doesn’t matter for the actual C code that Verilator produces.

RTL simulation, by its very essence, tries to execute the logic expressed by the RTL for millions of cycles. This is a single-threaded workload, as the state is the input to the logic cloud and it is different every cycle. (There is work on multi-threading for very large and highly partitionable RTL, but that’s problem specific; my test is single-threaded.)
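
Why this resists parallelizing across cycles can be seen in a toy cycle-based simulator: each cycle, the combinational “logic cloud” reads the current register state and produces the next state, so cycle N+1 cannot start until cycle N is done. The sketch below uses a hypothetical 16-bit LFSR as the logic cloud; it is not what Verilator generates, only an illustration of the serial dependency.

```python
def logic_cloud(state: int) -> int:
    """Hypothetical next-state logic: one step of a 16-bit Fibonacci LFSR
    (taps 16, 14, 13, 11), standing in for a real design's combinational logic."""
    bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def simulate(initial_state: int, cycles: int) -> int:
    """Run the toy design for `cycles` clock edges and return the final state."""
    state = initial_state
    for _ in range(cycles):
        # Each cycle consumes the previous cycle's result, so this loop is
        # inherently sequential no matter how many cores are available.
        state = logic_cloud(state)
    return state


print(hex(simulate(0xACE1, 1_000)))
```

Multi-threaded simulators instead split the work inside each cycle across threads, which only pays off when the design is large and partitionable, as noted above.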

I don’t know what you hope to achieve with the last insult. There’s no “1000 times a+b in 1 second” in this workload. In fact, what is run is a simulation of the Reduceron functional machine, itself running a non-trivial Knuth-Bendix logic solver (inception all the way down).

FYI, Verilator simulation is a Known Hard problem, and I have inside knowledge that it is one that the key processor companies actually study and measure.

This is not my issue, but yours.
You said:

This is a single-thread workload

but the VF2 is a 4-core SBC running at only 1.5 GHz.
You need a fast single-core CPU.
Why run it on the VF2?

I don’t know which features your app requires.

But the M1 is faster than Zen 3, so it seems to be a “short-width SIMD” or “non-SIMD” app.
And the M1 has 4×128-bit FPUs, while Zen 3 has 2×256-bit FPUs (or 2+2? I forget), so maybe your app needs more FPU throughput.
Of course, it is also possible there is some optimization for the M1 (e.g. AMX).
Or the M1 has high-performance, low-latency memory, and your app needs that.

And about “1000 times a+b in 1 second”:
that was a joke.

But a “Smart Screen” is almost that.
It waits a few seconds, puts a picture on the screen with some 2D/3D effects, waits a few seconds, puts another picture on the screen…
The VF2 will work fine in that kind of application.

My guess, guys: the “real workload” is how it feels to you. Give it time, get used to it and compare :smiley:
You’ll quickly notice differences, at least in compile times of typical Linux kernel configs. For example, it takes literally ages to build a Slackware-15.x huge.s kernel on my i7-8750H (about six to ten hours). I have to say that it is a much better CPU than what I had before (an AMD Athlon64 X2 4800+ from 2009), but it is already worse than what I have in my Fairphone 4 (Snapdragon 750G, A77), by a factor of about 1 to 2. The JH7110 will do this task, sure, but my guess is it’ll take four days of uptime (yes, I tried).

I just don’t get the purpose of these benchmarks, especially alien/generic ones. In the end you still get your RPi3 performance, or A77 performance, or M2 performance, or whatever; you just didn’t read or understand the specs and datasheets before you bought. I already suspect this one is unfair because, right at the beginning, the software AES-XTS benchmark (no crypto acceleration; also, what’s the key size? It matters a lot!) just doesn’t match my results (mine are +6M/s for a 256-bit key, +15M/s for 128-bit, 37M/s total per core with a 128-bit key, OpenSSL 1.1.1).
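
Why the key size matters so much for a software AES benchmark: AES-128 runs 10 rounds per block and AES-256 runs 14, so with round-dominated software code you would expect AES-256 to reach roughly 10/14 ≈ 71% of the AES-128 throughput. XTS mode also uses two AES keys, so a quoted “256-bit” XTS key may mean two AES-128 keys (OpenSSL’s aes-128-xts, for example, takes a 32-byte key), which is one more reason to ask which key size a benchmark used. The sketch assumes throughput is inversely proportional to round count, ignoring the key schedule and the XTS tweak computation; the input figure is hypothetical.

```python
# Rounds per block defined by FIPS 197 for each AES key size.
AES_ROUNDS = {128: 10, 192: 12, 256: 14}


def expected_throughput(measured_mb_s: float, measured_key_bits: int, target_key_bits: int) -> float:
    """Estimate software AES throughput at another key size.

    Assumes cost per block is proportional to the number of rounds (key
    schedule and XTS tweak overhead are ignored).
    """
    return measured_mb_s * AES_ROUNDS[measured_key_bits] / AES_ROUNDS[target_key_bits]


# Hypothetical measurement, NOT one of the numbers quoted above.
print(f"{expected_throughput(100.0, 128, 256):.1f} MB/s")  # 71.4 MB/s
```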

My big guess is that the typical p7zip “b” (benchmark) command plus the throughput of some ARX cipher working on native XLEN words, like Threefish-512, is good enough to estimate a platform’s performance. Of course, that assumes your whole hot-path software is properly built. It hasn’t given me false positives so far.
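
For reference, the core of an ARX (add-rotate-xor) cipher such as Threefish is a MIX step on native 64-bit words: add two words, rotate one, and XOR it with the sum. The sketch below implements that step and its inverse in Python. There is no key schedule, word permutation or round structure, so it is not Threefish and not for real use; for an actual throughput test you would time a properly compiled C implementation, since Python’s interpreter overhead would dominate.

```python
MASK64 = (1 << 64) - 1


def rotl64(x: int, r: int) -> int:
    """Rotate a 64-bit word left by r bits (0 < r < 64)."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def rotr64(x: int, r: int) -> int:
    """Rotate a 64-bit word right by r bits (0 < r < 64)."""
    return ((x >> r) | (x << (64 - r))) & MASK64


def mix(x0: int, x1: int, r: int) -> tuple[int, int]:
    """Threefish-style MIX: y0 = (x0 + x1) mod 2^64, y1 = rotl(x1, r) XOR y0."""
    y0 = (x0 + x1) & MASK64
    y1 = rotl64(x1, r) ^ y0
    return y0, y1


def unmix(y0: int, y1: int, r: int) -> tuple[int, int]:
    """Inverse of mix(): recover x1 from y1 and y0 first, then x0."""
    x1 = rotr64(y1 ^ y0, r)
    x0 = (y0 - x1) & MASK64
    return x0, x1


# Arbitrary example words and rotation amount (hypothetical values).
y = mix(0x0123456789ABCDEF, 0xFEDCBA9876543210, 29)
print([hex(w) for w in y])
```

Every operation here maps to plain 64-bit integer instructions (add, shift/rotate, xor), which is why this kind of cipher exercises a core’s integer pipeline without depending on SIMD or crypto extensions.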

GPUs are completely different story though.

Firstly, I’m not criticizing all the effort you put into presenting the data.

I’m sorry I wasn’t clear. I didn’t mean that each benchmark should be explained at length. I meant: please add either “More is better.” or “Less is better.”, as presented in this example:
https://www.phoronix.net/image.php?id=2020&image=easier_ob_compare_med

Thank you for listening.

Just for comparison: my GB5 result for the VisionFive 2, but on openEuler RISC-V 23.03 preview (images here), at the stock 1.5 GHz frequency.

Seems overclocking does help a little bit.

@Kevin.MX - how did you do the overclocking? i.e. which changes to the kernel sources and dts?

I didn’t; I’m not sure about that. The VF2 in OP’s post is running at 1.75 GHz, which is obviously overclocked.

Not sure. Maybe only older versions of the VF2 supported 1.75 GHz, but at some point last year that code was dropped and the current 1.5 GHz was introduced instead. See also: Older 1.75ghz opp points

Yes, my fault. I mixed up my links and the one posted isn’t mine. My result at 1.5 GHz was identical to yours.

I unfortunately also don’t know how to overclock the JH7110.

To me those Geekbench results are about what you’d expect: the VF2 is somewhere around a Raspberry Pi 3, i.e. much better than an RPi 2 and worse than an RPi 4.

That order is fully expected, because the RPi 4 has a 3-pipeline out-of-order CPU while the VF2 has a more classic 2-pipeline in-order CPU. RPis also have a SIMD FPU while the VF2 does not, which likely makes the RPi results look better. It would be hugely surprising if the RPi 4 were not ahead of the VF2.

And all these above are vastly behind a PC/Mac whose CPUs alone cost hundreds of $$$, have billions of transistors, are manufactured in single-digit nanometres and draw tens of watts of power.

By the way, I acquired a VF2 because I’m working on a new embedded system based on a smaller RISC-V CPU and was interested in having a commercial board that can run Linux. I connected a USB security camera to my VF2, and it now records video clips when there’s motion within the camera view; it seems to do fine for that purpose. There’s a huge domain of applications that can use this kind of capability. Sure, all of these would also run fine on a Raspberry Pi, a Beagle and similar boards, but competition is good, and an open architecture is good for competition, so I wish all the best to the RISC-V ecosystem and its alternatives.

Only in terms of CPU number crunching grunt … in a limited number of areas it should be better than the RPi4.

eagerly waits for the complete GPU source code so I can build it with clang and all known-good compiler flags AND TORTURE THIS LITTLE THING :smiling_imp: :smiling_imp: :smiling_imp:
