Run AMP System (RT-Thread + Linux) on VisionFive 2

This article provides steps to run a heterogeneous asymmetric multiprocessing (AMP) system (Linux + RT-Thread) on JH-7110, StarFive's new-generation SoC platform.

The industrialization of RISC-V cannot be separated from industrial scenarios. Asymmetric multiprocessing (AMP) isolates the cores of a multi-core processor from one another so that each can independently run a different operating system or bare-metal application, such as Linux + RT-Thread in this case. This mode can improve real-time performance and stability while reducing hardware cost, and it is commonly used in industrial fields that require heavy customization, real-time performance, and reliability.

  1. Reduce the Hardware Cost

In traditional applications, the insufficient real-time performance of Linux-based controllers is usually worked around by adding an external microcontroller dedicated to real-time programs. An AMP system removes that need.

JH-7110 is equipped with a quad-core RISC-V CPU. The heterogeneous AMP implemented in this case runs Linux on 3 CPUs and the RT-Thread RTOS on 1 CPU, so no additional hardware is needed during development. A single set of hardware circuits can deliver complex functions, which greatly reduces hardware cost.

  2. Improve System Real-Time Performance and Stability

Real-time processes run on the CPU assigned to the RTOS, where real-time drivers can collect data and send it back to Linux through shared memory. Linux runs the non-real-time applications. This approach preserves real-time performance without affecting the application processes on Linux.

With the high demand for real-time performance in fields such as industrial automation, the demand for RTOS is constantly increasing. Recently, RT-Linux Kernel 6.6 has officially supported the RISC-V architecture, and the kernel now includes the driver code for StarFive JH-7110.

JH-7110 includes 4 U74 CPUs. The heterogeneous AMP implemented in this application note runs the RT-Thread RTOS on 1 U74 CPU and Linux on the other 3, forming a dual-system AMP architecture. Real-time processes and real-time drivers for data collection run on the RTOS CPU, data is sent back to Linux through shared memory, and non-real-time applications run on the Linux side. The system thus guarantees real-time performance while running powerful applications on a general-purpose Linux OS, an architecture that is important in industrial systems.

To ensure that you can fully utilize the performance of JH-7110, you need to prepare the relevant hardware before executing the demo:

VisionFive 2 is the world’s first high-performance mass-produced RISC-V single board computer with an integrated 3D GPU. It is built around the JH-7110, a quad-core 64-bit RISC-V multimedia SoC running at up to 1.5 GHz. VisionFive 2 offers strong performance, rich interfaces, good scalability, and abundant software resources, making it an excellent choice for open-source enthusiasts to explore the RISC-V world.

Performance

VisionFive 2 boasts a quad-core 64-bit JH-7110 SoC with RV64GC ISA, running up to 1.5 GHz, and integrated with IMG BXE-4-32 MC1, supporting OpenCL 3.0, OpenGL ES 3.2, and Vulkan 1.2.

Interface

VisionFive 2 provides rich I/O peripherals such as M.2 connector, eMMC socket, USB 3.0 ports, a 40-pin GPIO header, gigabit Ethernet ports, a TF card slot, and many more.

Image & Video Processing

VisionFive 2 has onboard audio and video processing capabilities and provides MIPI-CSI and MIPI-DSI connectors as multimedia peripherals. It integrates the StarFive ISP and is compatible with mainstream camera sensors. Its built-in image/video processing subsystem supports H.264/H.265/JPEG encoding and decoding.

SBC Link: Buy VisionFive 2 | RVspace

1. RT-Thread Debug Port

Linux uses UART0 as its system serial port, while RT-Thread uses UART1. In this application note, pin11 and pin13 on the 40-pin GPIO header of VisionFive 2 are used as the UART1 RX/TX pins. The following shows the circuit diagram of the VisionFive 2 40-pin GPIO header.

Pin9, pin11 and pin13 form a complete serial port:

Pin9 (GND)

Pin11 (GPIO42): UART1 RX

Pin13 (GPIO43): UART1 TX

2. Inter-Processor Communication

Inter-processor communication uses the standard virtio-based RPMsg protocol. RPMsg (Remote Processor Messaging) defines the standard binary interface used for communication between cores in a heterogeneous AMP system.

• Linux: In the Linux kernel, the RPMsg code is under the drivers/rpmsg/ directory. The related files are:

drivers/rpmsg/virtio_rpmsg_bus.c
drivers/rpmsg/virtio_rpmsg_starfive.c

• RT-Thread: RT-Thread uses the open-source rpmsg-lite code, which is also a virtio-based implementation of RPMsg, and can exchange data with Linux according to the protocol. Combining inter-core IPI interrupts with shared memory enables data transmission between the heterogeneous cores. The RT-Thread code is located at:

bsp/starfive/jh7110/driver/rpmsg_lite

2.1. Code Branch

The AMP port modifies the U-Boot, OpenSBI, and kernel repositories. The addresses and branches of the 5 repositories are as follows:

Code Branch

3. RT-Thread Startup and Memory Allocation

In the AMP startup process, Linux and the RT-Thread RTOS start up independently. Their entry configuration is set in the U-Boot DTS, which separates the Linux domain from the RTOS domain. In OpenSBI, each core jumps to a different address according to its configuration. RT-Thread does not go through the second stage of U-Boot; it is entered directly from OpenSBI.

RT-Thread Side

The rtthread.bin and u-boot.bin files are used together to generate visionfive2_fw_payload.img, and SPL loads this image to the starting physical address of DDR, which is 0x40000000. The components of this image are as follows:

RT-Thread Startup and Memory Allocation

Linux Side

The Linux side reserves 28M for AMP, of which 4M is shared memory. The memory distribution is as follows:

Region                         Address Range              Size
Shared memory                  0x6e400000 - 0x6e7fffff    4M
RT-Thread code, stack space    0x6e800000 - 0x6effffff    8M
RT-Thread code, stack space    0x6f000000 - 0x6fffffff    16M

Memory Address Range
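The ranges in the table can be cross-checked with a little shell arithmetic. The following is only a sanity-check sketch: the helper name region_mib is made up for illustration, and the addresses are copied from the table above.

```shell
#!/bin/sh
# Size in MiB of an inclusive address range [start, end].
# region_mib is a hypothetical helper, not part of any StarFive tool.
region_mib() {
    start=$1
    end=$2
    echo $(( ($end - $start + 1) / 1048576 ))
}

shared=$(region_mib 0x6e400000 0x6e7fffff)   # shared memory
rtt_a=$(region_mib 0x6e800000 0x6effffff)    # first RT-Thread region
rtt_b=$(region_mib 0x6f000000 0x6fffffff)    # second RT-Thread region
total=$(( shared + rtt_a + rtt_b ))

echo "shared=${shared}M rtthread=${rtt_a}M+${rtt_b}M total=${total}M"
```

The three regions add up to the 28M that the Linux side reserves for AMP.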

4. Compilation

Follow the steps below to compile:

  1. RT-Thread is built with SCons, so install SCons before compiling:

$ sudo apt-get install scons

  2. Execute the following commands to download the VisionFive 2 source code and its submodules:
$ git clone https://github.com/starfive-tech/VisionFive2.git
$ cd VisionFive2
$ git checkout --track origin/rtthread_AMP
$ git submodule update --init --recursive
  3. Execute the following commands to switch each submodule to the required branch:
$ cd buildroot && git checkout --track origin/JH7110_VisionFive2_devel && cd ..
$ cd u-boot && git checkout --track origin/rtthread_AMP && cd ..
$ cd linux && git checkout --track origin/rtthread_AMP && cd ..
$ cd opensbi && git checkout rtthread_AMP && cd ..
$ cd soft_3rdpart && git checkout JH7110_VisionFive2_devel && cd ..
$ cd rtthread && git checkout rtthread_AMP && cd ..
  4. RT-Thread is compiled with the embedded riscv64-unknown-elf toolchain, which has been uploaded to the toolchain folder (toolchain/tool-root1.tar.gz) in the RT-Thread repository (https://github.com/starfive-tech/rt-thread/tree/rtthread_AMP/toolchain). Execute the following command to extract it into the /opt folder:

$ sudo tar xf rtthread/toolchain/tool-root1.tar.gz -C /opt/

  5. Run the make command in the VisionFive2 folder. The final compiled visionfive2_fw_payload.img exceeds 4M, so pay attention to the image size when flashing it into SPI NOR flash.

  6. If only RT-Thread has been modified, it can be compiled separately. Enter the jh7110 folder and run scons to generate the rtthread.bin file:

$ cd rtthread/bsp/starfive/jh7110
$ scons
  7. To configure and trim RT-Thread, run the following command in the jh7110 directory:

$ scons --menuconfig
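Because the payload image is larger than 4M, it can be worth checking its size against the flash space you intend to use before flashing. The sketch below is a generic check, not part of the StarFive build system: the function name fits_in_flash is hypothetical, and both the image path and the size limit are values you must supply for your own setup.

```shell
#!/bin/sh
# Usage: fits_in_flash <image> <limit_bytes>
# Returns 0 if the image fits, 1 if it is too large, 2 if it cannot be read.
# fits_in_flash is a hypothetical helper; the limit is your own assumption.
fits_in_flash() {
    image=$1
    limit=$2
    size=$(wc -c < "$image") || return 2
    size=$(( size ))   # normalise padding that some wc implementations add
    echo "$image: $size bytes (limit $limit bytes)"
    [ "$size" -le "$limit" ]
}
```

For example, `fits_in_flash <path to visionfive2_fw_payload.img> $((8 * 1024 * 1024))` would test against an assumed 8M limit; substitute the actual size of the region you flash to.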

5. Run RT-Thread

Perform the following steps to boot RT-Thread:

  1. Connect the Linux serial port and the RT-Thread debug serial port (see RT-Thread Debug Port), and set the baud rate to 115200.

  2. Flash the compiled files u-boot-spl.bin.normal.out and visionfive2_fw_payload.img into SPI NOR flash.

  3. Power on: RT-Thread starts quickly after power-on and runs the rpmsg Linux test program. RT-Thread then waits for the Linux side to send an IPI interrupt, because Linux is the RPMsg master and must first configure the control memory and shared memory of the virtual queues.

  4. Boot Linux: During the process of booting Linux, the virtio_rpmsg_bus driver and the virtio_rpmsg_starfive driver will be registered. After registration is completed, an IPI interrupt will be sent to RT-Thread.


    After receiving the IPI interrupt, rpmsg_linux_test continues to execute, and from this point the FinSH shell of RT-Thread can also be used normally.

  5. Run the following command on the Linux side to see the IPI interrupts sent by RT-Thread to Linux:
    cat /proc/interrupts

  6. Run the test program below:
    rpmsg_echo


    IPI interrupt:

cat /proc/interrupts
IPI5: 12 0 0 AMP rpmsg interrupts
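To watch only the AMP counter rather than the whole interrupt table, the line can be filtered and its per-CPU columns summed. This is a small sketch that assumes the line is labelled "AMP rpmsg interrupts" exactly as in the output above; the function name amp_ipi_total is made up for illustration.

```shell
#!/bin/sh
# Sum the per-CPU counters on the "AMP rpmsg interrupts" line.
# Reads /proc/interrupts-style text on stdin; prints nothing if the
# line is absent. amp_ipi_total is a hypothetical helper name.
amp_ipi_total() {
    awk '/AMP rpmsg interrupts/ {
        total = 0
        # Fields after the "IPIn:" label are counters until the first non-number.
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        print total
    }'
}
```

On the board this would be run as `amp_ipi_total < /proc/interrupts`; calling it before and after rpmsg_echo shows how many IPIs the test generated.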

6. RT-Thread Performance

This section introduces the performance of RT-Thread from the following two aspects:

6.1 Schedule Delay

A schedule delay test similar to cyclictest was run under RT-Thread with the following test conditions:

U74 main frequency: 1.5GHz
Running time in idle state: 12 hours

Result: The average delay is 1us, and the maximum delay is 2us.

6.2 Interrupt Delay

Interrupt delay can be divided into IPI interrupt delay and peripheral interrupt delay.

IPI Interrupt Delay

Because an IPI must be sent as an M-mode inter-core interrupt, the sender has to switch to M-mode to transmit it. This adds some delay, so the IPI delay is larger than the peripheral interrupt delay.

Performance is measured in rpmsg_echo.c, which times one RPMsg echo, that is, the round trip of a string of more than 10 characters, in Linux user mode:

• Test duration: Several hours
• Frequency: 1.5GHz
• Test times: More than 20000
• One IPI round trip time: About 25us
• Maximum delay: About 70us

Sending message #21998: hello there 21998!
Receiving message #21998: test this time 24000 ns, avg time 24785 ns,maxtime 69500 ns
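When the test runs for hours, it can be convenient to reduce the log to the two figures of interest. The sketch below assumes every "Receiving message" line follows the format shown above ("avg time N ns,maxtime N ns"); the function name latency_summary is hypothetical.

```shell
#!/bin/sh
# Reads rpmsg_echo output on stdin and prints the last reported average
# and the largest reported maximum, converted from ns to us.
# latency_summary is a hypothetical helper name.
latency_summary() {
    sed -n 's/.*avg time \([0-9][0-9]*\) ns, *maxtime \([0-9][0-9]*\) ns.*/\1 \2/p' |
    awk 'BEGIN { max = 0 }
         { avg = $1; if ($2 + 0 > max) max = $2 + 0 }
         END { if (NR) printf "avg %.1f us, max %.1f us\n", avg / 1000, max / 1000 }'
}
```

Feeding it the sample line above yields an average of about 24.8 us and a maximum of 69.5 us, consistent with the "about 25us" and "about 70us" figures in the list.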

UART Interrupt Delay

The UART RX delay was tested at 1.5GHz. The whole path, from receiving a character through the interrupt to the FinSH shell process receiving the character, takes about 6us.

The guide shared in this article is now available in the StarFive RVspace community, where you can download the complete document for practice. In addition, StarFive provides sales and technical support channels to ensure that you receive timely assistance.

Contact Sales: sales@starfivetech.com
Tech Support: support@starfivetech.com
