It is a powerful board. No prices are listed yet, but I suspect it will not be cheap at all. It is still in pre-launch, but some details can be found on Crowd Supply.
The one major letdown is the SoC used, which only supports the pre-ratification 0.7.1 draft of the Vector extension. So that part of the chip will be unsupported by most public compilation tools.
The SoC used is still a total beast, even with the flaw:
Sophon SG2042: 64 XuanTie C920 (T-Head Semiconductor) RISC-V cores
- 16 clusters of 4 cores each (RV64GCV = RV64IMAFDC + V0.7.1)
- 12-stage pipeline (integer)
- Out of order, 3 decode, 4 rename/dispatch, 8 issue/execute, dual load/store
- 2.0 GHz CPU frequency
- L1 64KiB I-Cache and 64KiB D-Cache per core (8MiB in total)
- L2 1MiB unified cache per cluster (16MiB in total)
- L3 64MiB system level cache (64MiB in total)
- 512G ops/s for 8bit integer and 256G ops/s for 16bit floating point
(translation: 4x INT8 and 2x FP16 arithmetic functional units per core per cycle)
- TDP 120 Watt
- 4 channel DRAM controller, supports DDR4 UDIMM/SODIMM/RDIMM up to 3200MT/s with ECC byte
- 2 PCIe controllers, each supporting Gen 4.0 x16 (16GT/s/lane) (~63.016GB/sec total throughput)
(Marketing on the website describes this as 32 PCIe 4.0 channels, which is technically right if you count x1 lanes)
- 1x 1Gbps ethernet RGMII
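The totals in the list above can be re-derived from the per-core and per-cluster figures. This is only a rough sketch: the variable names are mine, and the PCIe number assumes 128b/130b line encoding with no other protocol overhead.

```python
# Re-derive the headline SG2042 figures from the per-core numbers in the list.
CORES = 64
CORES_PER_CLUSTER = 4
CLOCK_HZ = 2.0e9  # 2.0 GHz

clusters = CORES // CORES_PER_CLUSTER

# L1: 64 KiB I-cache + 64 KiB D-cache per core; L2: 1 MiB per cluster.
l1_total_mib = CORES * (64 + 64) / 1024
l2_total_mib = clusters * 1

# Quoted peak rates divided by the total core cycles per second.
core_cycles_per_second = CORES * CLOCK_HZ
int8_ops_per_core_cycle = 512e9 / core_cycles_per_second
fp16_ops_per_core_cycle = 256e9 / core_cycles_per_second

# PCIe 4.0: 16 GT/s per lane, 128b/130b encoding (assumed), 8 bits per byte.
pcie4_lane_gbs = 16 * 128 / 130 / 8
pcie_total_gbs = 2 * 16 * pcie4_lane_gbs  # two x16 controllers

print(f"clusters: {clusters}")
print(f"L1 total: {l1_total_mib} MiB, L2 total: {l2_total_mib} MiB")
print(f"INT8 ops/core/cycle: {int8_ops_per_core_cycle}")
print(f"FP16 ops/core/cycle: {fp16_ops_per_core_cycle}")
print(f"PCIe total: {pcie_total_gbs:.2f} GB/s")
```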
The vast majority of the SoC's 120 watts of power, and of its die area, will be used by the L1 and L2 SRAM caches.
The board listing on Crowd Supply is a bit confusing: it claims to have “PCI Express® controllers (x32 PCI Express Gen 3)” and “PCI Express: 3x PCIe x16 Slot (PCIe 3.0 x8)”.
Reading further, it looks like they have allocated the PCIe 4.0 lanes as follows:
- PCIe 4.0 x8 to PCIe slot 1 (15.754 GB/s), which is the equivalent of a PCIe 3.0 x16
- PCIe 4.0 x8 to PCIe slot 2 (15.754 GB/s), which is the equivalent of a PCIe 3.0 x16
- PCIe 4.0 x8 to PCIe slot 3 (15.754 GB/s), which is the equivalent of a PCIe 3.0 x16
- PCIe 4.0 x8 to a PCIe packet switch (ASM2824), which provides 24 lanes across 12 downstream ports, with upstream PCIe Gen3 x8 bandwidth
Which means that the remaining 8 PCIe 4.0 lanes (15.754 GB/s total throughput) have to serve the two 2.5 Gbit/s Ethernet ports (0.3125 GB/s each), the 5 SATA 3.0 ports (6 Gbit/s or 0.600 GB/s each), the 8 USB 3.2 ports (10 Gbit/s or 1.25 GB/s each), the two M.2 PCIe 3.0 x4 slots (3.938 GB/s each), the one M.2 PCIe 3.0 x1 slot (0.985 GB/s) and all other peripherals. They have allocated at least 22.486 GB/s of devices to an upstream data path of 15.754 GB/s.
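To make the oversubscription concrete, here is a small sketch that sums the peak bandwidth of the devices listed above and compares it with 8 lanes of PCIe 4.0. The dictionary layout and names are my own, and the upstream figure assumes 16 GT/s per lane with 128b/130b encoding, which works out to roughly 15.75 GB/s.

```python
# Peak bandwidth demand behind the shared x8 uplink, in GB/s.
# Each entry is (port count, GB/s per port), taken from the figures above.
downstream = {
    "2.5 Gbit Ethernet": (2, 2.5 / 8),
    "SATA 3.0": (5, 0.600),
    "USB 3.2": (8, 1.25),
    "M.2 PCIe 3.0 x4": (2, 3.938),
    "M.2 PCIe 3.0 x1": (1, 0.985),
}

demand_gbs = sum(count * per_port for count, per_port in downstream.values())

# 8 lanes of PCIe 4.0: 16 GT/s per lane, 128b/130b encoding (assumed).
upstream_gbs = 8 * 16 * 128 / 130 / 8

oversubscription = demand_gbs / upstream_gbs

print(f"peak demand:      {demand_gbs:.3f} GB/s")
print(f"upstream:         {upstream_gbs:.3f} GB/s")
print(f"oversubscription: {oversubscription:.2f}x")
```

This only matters if everything is busy at once, which is rare in practice. If the switch uplink really runs at Gen3 x8, the upstream figure halves and the ratio doubles.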
EDIT: Personally, I’m not buying another RISC-V board until I see one, with features I like, that uses the new ratified RISC-V profile naming convention.
For me it will probably be an RVA22S64 SoC with the Vector and Hypervisor extensions (I’m betting on it being a StarFive Dubhe based chip).
This would translate to:
64-bit RISC-V with supervisor mode
With Mandatory Extensions:
- Ss1p12 Privileged Architecture version 1.12
- Sv39 Page-Based 39-bit Virtual-Memory System
Which includes all Mandatory Extensions for RVA22U64:
- M Integer multiplication and division.
- A Atomic instructions.
- F Single-precision floating-point instructions.
- D Double-precision floating-point instructions.
- C Compressed Instructions.
- Zicsr CSR instructions. These are implied by presence of F.
- Zicntr Base counters and timers.
- Zihpm Hardware performance counters.
- Zihintpause Pause instruction.
- Zba Address computation.
- Zbb Basic bit manipulation.
- Zbs Single-bit instructions.
- Zic64b Cache blocks must be 64 bytes in size, naturally aligned in the address space
- Zicbom Cache-Block Management Operations.
- Zicbop Cache-Block Prefetch Operations.
- Zicboz Cache-Block Zero Operations.
- Zfhmin Half-Precision Floating-point transfer and convert.
- Zkt Data-independent execution time.
Plus the two optional extensions I want:
- V Vector Extension
- H Hypervisor Extension
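A list like this can be checked against the ISA string a core reports. The sketch below is a simplified, hypothetical helper, not a real tool: it assumes a lowercase string of the form `rv64` + single-letter extensions + underscore-separated multi-letter extensions, and it ignores version suffixes, so it cannot tell V 0.7.1 from ratified V.

```python
# Hypothetical checker: which extensions from the list above are missing
# from a given RISC-V ISA string? Assumes a simplified string format.
BASE_LETTERS = set("imafdc")
RVA22U64_MULTI = {
    "zicsr", "zicntr", "zihpm", "zihintpause", "zba", "zbb", "zbs",
    "zic64b", "zicbom", "zicbop", "zicboz", "zfhmin", "zkt",
}
WANTED_OPTIONAL = {"v", "h"}


def parse_isa(isa: str) -> set[str]:
    """Split e.g. 'rv64imafdc_zba_zbb' into a set of extension names."""
    isa = isa.strip().lower()
    if not isa.startswith("rv64"):
        raise ValueError(f"not an RV64 ISA string: {isa!r}")
    head, *rest = isa[4:].split("_")
    exts = set(head)  # single-letter extensions
    if "g" in exts:  # G is shorthand for IMAFD + Zicsr + Zifencei
        exts.discard("g")
        exts |= set("imafd") | {"zicsr", "zifencei"}
    if "f" in exts:  # Zicsr is implied by F
        exts.add("zicsr")
    exts |= {name for name in rest if name}
    return exts


def missing_extensions(isa: str) -> list[str]:
    """Return the wanted extensions that the ISA string does not list."""
    wanted = BASE_LETTERS | RVA22U64_MULTI | WANTED_OPTIONAL
    return sorted(wanted - parse_isa(isa))


print(missing_extensions("rv64gc"))
```

Some of these names (Zic64b, for example) may not be reported by a given kernel or firmware, so a "missing" result from a real string would need checking by hand.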
You may wonder why I’ll wait. Given the sheer number of extensions, I foresee that the mainline toolchains and operating systems will rapidly drop, or at least reduce, support for RISC-V chips that do not fit into a nice, clean, easy-to-support bundle.
EDIT2: I found a price for the computer, no idea if it is right or wrong:
“It has been on preorder in China for a month.
$1600 with 32 GB RAM
$1900 with 128 GB RAM” - https://news.ycombinator.com/item?id=36017287
For comparison, just to highlight how powerful the CPU is:
RPi4 BCM2711 - 4 Cortex-A72 cores at 1800MHz
- 7.3 DMIPS/MHz => 4*1800*7.3 ~= 52560 DMIPS
- 5.4 CoreMark/MHz => 4*1800*5.4 ~= 38880 CoreMark
Sipeed Lichee Pi 4A TH1520 - 4 C910 cores at 2000 MHz
- 5.6 DMIPS/MHz => 4*2000*5.6 ~= 44800 DMIPS
- 6.5 CoreMark/MHz => 4*2000*6.5 ~= 52000 CoreMark
Apple M1 Firestorm - 8 ARMv8.5-A Cores at 3200 MHz
- ?16? DMIPS/MHz => 8*3200*16 ~= 409600 DMIPS
- 11.5 CoreMark/MHz => 8*3200*11.5 ~= 294400 CoreMark
Sophon SG2042 - 64 XuanTie C920 cores at 2000MHz
- 5.8 DMIPS/MHz => 64*2000*5.8 ~= 742400 DMIPS
- 7.0 CoreMark/MHz => 64*2000*7 ~= 896000 CoreMark
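The estimates above are all cores × MHz × score-per-MHz. A short sketch of that arithmetic, using the same figures, is below. The table layout and function name are mine, and it keeps the same optimistic assumption that throughput scales linearly with core count.

```python
# Aggregate DMIPS / CoreMark estimates: cores * MHz * (score per MHz).
# Figures are the ones quoted above; the Apple M1 DMIPS/MHz is a guess.
SYSTEMS = [
    # (name, cores, MHz, DMIPS/MHz, CoreMark/MHz)
    ("RPi4 BCM2711", 4, 1800, 7.3, 5.4),
    ("Sipeed Lichee Pi 4A TH1520", 4, 2000, 5.6, 6.5),
    ("Apple M1", 8, 3200, 16.0, 11.5),
    ("Sophon SG2042", 64, 2000, 5.8, 7.0),
]


def aggregate(cores: int, mhz: int, per_mhz: float) -> int:
    """Scale a per-MHz, per-core score linearly to the whole chip."""
    return round(cores * mhz * per_mhz)


estimates = {
    name: (aggregate(cores, mhz, dmips), aggregate(cores, mhz, coremark))
    for name, cores, mhz, dmips, coremark in SYSTEMS
}

for name, (dmips_total, coremark_total) in estimates.items():
    print(f"{name:28s} {dmips_total:>8d} DMIPS {coremark_total:>8d} CoreMark")
```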
I could not find the DMIPS/MHz for the Apple M1 anywhere, so I guessed it by looking at earlier chips from ARM. It is probably higher.
My issue is the pre-ratification 0.7.1 Vector extension, but if you were using it to do builds, it would be a powerful machine. You would probably need the 128GB of RAM, though, to keep all of the 64 cores/harts fed with data to process (2 GB per core, or 8GB per 4-core cluster). And with that amount of RAM you could turbocharge your builds by running them on a simple RAM disk.
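For reference, the per-core RAM share for the two configurations mentioned in the price quote can be worked out like this (a trivial sketch, and the function name is made up):

```python
# RAM available per core and per 4-core cluster on a 64-core SG2042.
CORES = 64
CORES_PER_CLUSTER = 4


def ram_share(ram_gb: float) -> tuple[float, float]:
    """Return (GB per core, GB per cluster) for a given amount of RAM."""
    per_core = ram_gb / CORES
    return per_core, per_core * CORES_PER_CLUSTER


for ram_gb in (32, 128):
    per_core, per_cluster = ram_share(ram_gb)
    print(f"{ram_gb:>3d} GB: {per_core} GB/core, {per_cluster} GB/cluster")
```

With the 32 GB option that is only 0.5 GB per core, which is tight for a parallel build with one job per hart.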