GPU firmware fails to load with recompiled kernel

I have a fresh install on a VF2 with the 2.11.5 firmware, U-Boot and OpenSBI. I am using the SD image from debian.starfivetech.com (Google Drive) and booting with /boot on the SD card, but I have moved / (root) to my NVMe.

I followed the Release Notes for this install, and everything works as expected; GNOME runs snappily with the compositor. I see the following PVR messages via dmesg:

```
user@rose:~$ head -1 dmesg.starfive.vanilla && echo ... && grep -i PVR dmesg.starfive.vanilla
[    0.000000] Linux version 5.15.0-starfive (sw_buildbot@mdcsw02) (riscv64-unknown-linux-gnu-gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.35) #1 SMP Sun Mar 26 12:29:48 EDT 2023
...
[    0.868914] PVR_K:  1: Read BVNC 36.50.54.182 from HW device registers
[    0.876214] PVR_K:  1: RGX Device registered BVNC 36.50.54.182 with 1 core in the system
[    1.534093] PVR_K:  1: RGX Firmware image 'rgx.fw.36.50.54.182' loaded
[    1.545044] PVR_K:  1: Shader binary image 'rgx.sh.36.50.54.182' loaded
[    1.554679] [drm] Initialized pvr 1.17.6210866 20170530 for 18000000.gpu on minor 0
```

I then closely followed the instructions for building a new kernel in the Release Notes (section: ‘Updating Linux Kernel in Image’) and installed the resulting .deb packages. I also copied the dtbs as instructed.

I did not change any of the kernel config for this. My goal is to rebuild with USB serial and USB Bluetooth HCI enabled once I have verified that I can build the StarFive vanilla kernel.

When I boot this new kernel, the GPU firmware fails to load:

```
user@rose:~$ head -1 dmesg.starfive.rebuilt && echo ... && grep -i PVR dmesg.starfive.rebuilt
[    0.000000] Linux version 5.15.0 (user@rose.easytarget.org) (gcc (Debian 12.2.0-10) 12.2.0, GNU ld (GNU Binutils for Debian) 2.39.50.20221224) #1 SMP Tue Apr 4 17:27:05 CEST 2023
...
[    0.878310] PVR_K:  1: Read BVNC 36.50.54.182 from HW device registers
[    0.885608] PVR_K:  1: RGX Device registered BVNC 36.50.54.182 with 1 core in the system
[    1.234234] pvrsrvkm 18000000.gpu: Direct firmware load for rgx.fw.36.50.54.182 failed with error -2
[    1.244492] pvrsrvkm 18000000.gpu: Direct firmware load for rgx.fw.36.50p.54.182 failed with error -2
[    1.254780] pvrsrvkm 18000000.gpu: Direct firmware load for rgx.fw failed with error -2
[    1.263642] PVR_K:(Fatal):     1: All RGX Firmware image loads failed for 'rgx.fw.36.50.54.182' (PVRSRV_ERROR_NOT_FOUND) [1599]
[    1.276349] PVR_K:(Error):     1: RGXInit: InitFirmware failed (275) [1556]
[    1.284054] PVR_K:(Error):     1: RGXInit() failed (PVRSRV_ERROR_NOT_FOUND) in PVRSRVCommonDeviceInitialise() [2156]
[    1.295687] PVR_K:(Error):     1: PVRSRVDeviceFinalise() failed (PVRSRV_ERROR_NOT_INITIALISED) in PVRSRVCommonDeviceInitialise() [2170]
[    1.309155] [drm:pvr_drm_load] *ERROR* device (____ptrval____) initialisation failed (err=-19)
```

I’m a bit stuck at this point; the files are there:

```
user@rose:~$ ls -l /lib/firmware/rgx*
-rw-r--r-- 1 7271 500 122880 Mar 24 05:06 /lib/firmware/rgx.fw.36.50.54.182
-rw-r--r-- 1 7271 500 383576 Mar 24 05:06 /lib/firmware/rgx.sh.36.50.54.182
```
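To compare the vanilla and rebuilt boots side by side, a small helper can pull the PVR firmware status out of a saved dmesg log such as dmesg.starfive.vanilla or dmesg.starfive.rebuilt. This is only a sketch: the function name pvr_fw_status is hypothetical, and it matches nothing but the message strings that appear in the logs in this thread. Note that error -2 in the "Direct firmware load ... failed" lines is -ENOENT, i.e. the file was not visible to the kernel at the moment the driver asked for it, even though it exists on the installed root filesystem.

```sh
#!/bin/sh
# Sketch: classify the PVR firmware status in a saved dmesg log.
# pvr_fw_status is a hypothetical helper name; it only matches the
# PVR_K / pvrsrvkm message strings seen in this thread.
# Usage: pvr_fw_status < dmesg.starfive.rebuilt
pvr_fw_status() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q "All RGX Firmware image loads failed"; then
        echo "firmware: NOT loaded"
        # List each filename the driver tried and the error it got
        # (-2 is -ENOENT: file not found when the driver probed).
        printf '%s\n' "$log" | grep "Direct firmware load for" |
            sed 's/.*Direct firmware load for \([^ ]*\) failed with error \(-[0-9]*\).*/  tried \1 (error \2)/'
    elif printf '%s\n' "$log" | grep -q "RGX Firmware image '.*' loaded"; then
        echo "firmware: loaded"
    else
        echo "firmware: no PVR messages found"
    fi
}
```

Running it on both saved logs makes the difference between the two kernels obvious at a glance, without scrolling through the full dmesg.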

Any suggestions for what to do next?

  • Google is no help; this looks like a pretty unique error.
  • I can rebuild, debug, whatever is needed (Linux buildmaster and release engineer are part of my CV), but I need to know where to start.
  • This is why I know xfce4 installs, works, and lets me switch to LightDM: I did that because, with no compositor, GDM and GNOME run slooooooowly.

I have the same problem when I rebuild the kernel on the device itself.


Thank you! That got me wondering whether the issue might be building on the VF2 itself.

So I rebuilt the kernel on my laptop (*), copied the .debs over and installed them.

Unfortunately this doesn’t work either; still the same errors.

(*) Fedora, but I used Docker with the official Debian image. I had to add the cross-compile option (CROSS_COMPILE=riscv64-linux-gnu-) and also apt install gcc-riscv64-linux-gnu rsync unzip.
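For anyone repeating the cross-build, here is a minimal sketch of the steps described above, meant to be sourced inside a Debian container from the top of the kernel source tree. Assumptions, not taken from the Release Notes: the tree is already configured exactly as for a native build, and the .deb packages come from the kernel's standard bindeb-pkg target. The function names install_cross_tools and build_riscv_debs are hypothetical.

```sh
#!/bin/sh
# Sketch of a cross-build inside a Debian container, run from the top of
# the kernel tree. Assumes the tree is already configured as for a native
# build and that the .debs come from the standard bindeb-pkg target.
# Function names are hypothetical.

# Extra packages mentioned in the post (run once, as root, in the container).
install_cross_tools() {
    apt-get update && apt-get install -y gcc-riscv64-linux-gnu rsync unzip
}

build_riscv_debs() {
    # CROSS_COMPILE can be overridden from the environment if needed.
    make ARCH=riscv CROSS_COMPILE="${CROSS_COMPILE:-riscv64-linux-gnu-}" \
        -j"$(nproc)" bindeb-pkg
}
```

One way to get such a container is docker run --rm -it -v "$PWD":/src -w /src debian bash from the source directory on the host; the resulting .debs land in the parent of the source tree, so mount accordingly if you want them on the host.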

What about the initrd? Is the driver built as a module, or built into the kernel?
If it is a module, the firmware must be in /lib/firmware when the module is loaded.
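To answer the module-or-built-in question for a given kernel, it is enough to look at CONFIG_DRM_IMG_ROGUE in its config. A minimal sketch, assuming you pass it either the .config from your build tree or /boot/config-<version> (Debian kernel packages normally install one there); rogue_config_mode is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: report how CONFIG_DRM_IMG_ROGUE is set in a kernel config file.
# rogue_config_mode is a hypothetical helper; pass it the build tree's
# .config or /boot/config-<version>.
rogue_config_mode() {
    cfg=$1
    if grep -q '^CONFIG_DRM_IMG_ROGUE=y$' "$cfg"; then
        # Built in: the driver probes during early boot, so the firmware
        # typically has to be available from the initrd.
        echo "built-in"
    elif grep -q '^CONFIG_DRM_IMG_ROGUE=m$' "$cfg"; then
        # Module: firmware is read from /lib/firmware when the module loads.
        echo "module"
    else
        echo "not set"
    fi
}
```

For the running kernel that would be rogue_config_mode "/boot/config-$(uname -r)".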


I think you need to get the firmware into the initrd, similar to here: imagebuilder/firmware at main · hexdump0815/imagebuilder · GitHub


Hi,
You should rebuild with CONFIG_DRM_IMG_ROGUE=m. The default config sets this to y, and if you boot with an initramfs that does not contain the firmware, loading will fail. In my experience, though, even including the firmware in the initramfs does not help, for reasons unknown to me; only rebuilding with pvrsrvkm as a module resolved the problem.

Ah, and a small note I forgot: StarFive or the kernel maintainers, for some reason, gave no option to build this driver as a module. You need to edit drivers/gpu/drm/img/img-rogue/Kconfig and replace bool "DRM support for PowerVR GPU" with tristate "DRM support for PowerVR GPU". The driver builds just fine as a module.
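The Kconfig edit described above can be scripted so it is easy to reapply after pulling a fresh tree. A minimal sketch, run from the top of the kernel tree; the helper name make_rogue_tristate is hypothetical, while the file path and prompt string are the ones given in the post.

```sh
#!/bin/sh
# Sketch: turn the img-rogue "bool" prompt into a "tristate" so that
# CONFIG_DRM_IMG_ROGUE can be set to m. make_rogue_tristate is a
# hypothetical helper; path and prompt are the ones from the post.
make_rogue_tristate() {
    kconfig=${1:-drivers/gpu/drm/img/img-rogue/Kconfig}
    tmp="$kconfig.tmp"
    # Write to a temp file and move it back (portable alternative to sed -i).
    sed 's/bool "DRM support for PowerVR GPU"/tristate "DRM support for PowerVR GPU"/' \
        "$kconfig" > "$tmp" && mv "$tmp" "$kconfig"
    # Succeed only if the tristate prompt is now present.
    grep -q 'tristate "DRM support for PowerVR GPU"' "$kconfig"
}
```

After that, the option can be switched to a module with menuconfig, or with the kernel's own helper ./scripts/config --module DRM_IMG_ROGUE followed by make olddefconfig (add ARCH=riscv when cross-compiling).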

HTH


I managed to get it working earlier with the help of the link from @hexdump0815.

I’ll write it up, but my laptop died and I’ve got to fix that first…

However, this is not a proper fix: it adds the files when the image is installed. Really, the firmware needs to be in the image built by debbuild, as it is in the StarFive .deb. It should be simple to solve as part of the kernel build and packaging.

While we are at the GPU firmware: this seems to be a total mess. Multiple different versions of the rgx.fw.36.50.54.182 firmware file seem to be in use, all with the same filename and version number. I built a 2.11.5 kernel, and it no longer played well with the firmware file that worked fine with the 2.10.4 kernel. Using the (different) rgx.fw.36.50.54.182 file from the 202303 image instead of the one from the 202302 image makes the kernel happy again (i.e. it loads and starts the firmware properly), but then the PVR binary blobs (taken from the 202302 Debian image) are unhappy and will not work with the new firmware due to a software version mismatch (kernel 812, blobs 810). Of course, the different versions of the binary blobs also have the same filenames and version numbers… welcome to the wonderful world of binary blobs 🙂

So the kernel driver has to match the loaded firmware version, which in turn has to match the version of the binary blobs, and all that without a usable versioning scheme… The kernel driver changed only slightly between 2.10.4 and 2.11.5 (some call reordering and the disabling of pdump), and that already seemed to require updated firmware and, with it, updated blobs.

From the exception I got when the wrong firmware version was loaded, it looks like the GPU is running MIPS, not RISC-V as someone else was assuming… In the end, I think the GPU is not really special and is very similar to other IMG Rogue GPUs, so the open source driver will most probably just work once it works for one of the Rogue GPUs. I guess all the differences between the Rogue GPU types can be abstracted away by slightly different firmware running on them.


I have created a separate thread to collect all the internal GPU-related information in one place, so let's follow up there for everything not related to the StarFive Debian images: Getting the builtin img gpu working from scratch


For the sake of completeness, here is what I am currently doing; thank you everyone for your help.

I created the initramfs-tools firmware hook file and made it executable (important):

```
user@rose:~$ ll /etc/initramfs-tools/hooks/firmware
-rwxr-xr-x 1 root root 360 Apr  6 14:01 /etc/initramfs-tools/hooks/firmware
```

It has the same contents as the file linked by @hexdump0815, modified for our firmware. I also added an echo statement to let me know it ran and to remind me that I have made this customisation.

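/etc/initramfs-tools/hooks/firmware
```sh
#!/bin/sh
# initramfs-tools hook: copy the PowerVR (RGX) firmware into the initrd so
# the built-in pvrsrvkm driver can find it before / is mounted.

set -e

PREREQ=""

prereqs()
{
	echo "${PREREQ}"
}

case "${1}" in
	prereqs)
		prereqs
		exit 0
		;;
esac

. /usr/share/initramfs-tools/hook-functions

# Reminder that this local customisation ran (Debian's /bin/sh is dash,
# whose echo expands \n).
echo "******\nRGX firmware\n******"
mkdir -p "${DESTDIR}/lib/firmware"
cp -a /lib/firmware/rgx.sh.36.50.54.182 "${DESTDIR}/lib/firmware"
cp -a /lib/firmware/rgx.fw.36.50.54.182 "${DESTDIR}/lib/firmware"
```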

This is a workaround for now; the better solution is to get the files into the .deb package itself. The hook inserts the files after the package has been unpacked, because it runs while dpkg is installing the image.

FWIW:
You can examine the contents of an initrd image in /boot with lsinitramfs:

```
user@rose:~$ lsinitramfs -l /boot/initrd.img-5.15.0-rose | grep firmware
drwxr-xr-x   2 root     root            0 Apr  6 17:39 usr/lib/firmware
-rw-r--r--   1 7271     500        122880 Mar 24 05:06 usr/lib/firmware/rgx.fw.36.50.54.182
-rw-r--r--   1 7271     500        383576 Mar 24 05:06 usr/lib/firmware/rgx.sh.36.50.54.182
-rw-r--r--   1 root     root          210 Dec 22 12:21 usr/lib/udev/rules.d/50-firmware.rules
```

… this is what I see for my self-compiled images installed with this hook.
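Rather than eyeballing the listing after every kernel install, the same check can be scripted. A minimal sketch: initrd_has_rgx is a hypothetical helper that reads lsinitramfs output on stdin and accepts both lib/firmware and usr/lib/firmware paths, since the listing above shows the merged-/usr form.

```sh
#!/bin/sh
# Sketch: check that both RGX files made it into an initrd listing.
# initrd_has_rgx is a hypothetical helper; it reads `lsinitramfs` or
# `lsinitramfs -l` output on stdin. Returns non-zero if a file is missing.
initrd_has_rgx() {
    listing=$(cat)
    rc=0
    for f in rgx.fw.36.50.54.182 rgx.sh.36.50.54.182; do
        # Matches both lib/firmware/... and usr/lib/firmware/... entries.
        if printf '%s\n' "$listing" | grep -q "lib/firmware/$f\$"; then
            echo "ok      $f"
        else
            echo "MISSING $f"
            rc=1
        fi
    done
    return $rc
}
```

Typical use: lsinitramfs -l /boot/initrd.img-5.15.0-rose | initrd_has_rgx. The non-zero exit status makes it usable in a post-install check.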


I can confirm that the solution by @strlcat works for me on my Gentoo installation when I recompile the kernel directly on the device.
Now the GPU firmware loads correctly at boot.

NOTE: There is no need to add the firmware files to the initramfs; it works fine without them.


Yes, building as a module works too, since by the time the module loads the full root (/) filesystem is available. No need to pre-populate the image.

The GPU uses two files:

rgx.fw.36.50.54.182
and
rgx.sh.36.50.54.182

One (rgx.fw.*) is the firmware that runs on the microcontroller inside the GPU; the other (rgx.sh.*) is something related to shaders (not sure what exactly).

If you look at StarFive’s GitHub (soft_3rdpart/IMG_GPU/out at JH7110_VisionFive2_devel · starfive-tech/soft_3rdpart · GitHub), they have two releases of the GPU driver, 1.15 and 1.17, and I would expect the firmware to differ between the two releases.

It is annoying that the filenames do not include a version, but then the kernel driver may be somewhat agnostic of the version, and adding the version to the filename may be complicated. In any case, as far as I know the latest Debian release from StarFive uses the 1.17 version. If you want to check which version you have, I recommend comparing the hashes of the files from StarFive’s archive with the ones you have.
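The hash comparison suggested above is easy to script. A minimal sketch, assuming you have extracted a reference copy of the rgx.* files (e.g. from StarFive’s soft_3rdpart archive) into a directory of your choice; compare_rgx is a hypothetical helper name and defaults to /lib/firmware for the installed side.

```sh
#!/bin/sh
# Sketch: compare installed RGX files with a reference copy by SHA-256.
# compare_rgx is a hypothetical helper.
# Usage: compare_rgx REF_DIR [INSTALLED_DIR]   (default: /lib/firmware)
compare_rgx() {
    ref_dir=$1
    sys_dir=${2:-/lib/firmware}
    for ref in "$ref_dir"/rgx.*; do
        [ -e "$ref" ] || continue          # glob matched nothing
        name=$(basename "$ref")
        if [ ! -e "$sys_dir/$name" ]; then
            echo "missing  $name"
        # Hashing via stdin makes both outputs "<hash>  -", so they are
        # equal exactly when the file contents are identical.
        elif [ "$(sha256sum < "$ref")" = "$(sha256sum < "$sys_dir/$name")" ]; then
            echo "same     $name"
        else
            echo "differs  $name"
        fi
    done
}
```

Since identically named files can have different contents (as described earlier in the thread for the 202302 and 202303 images), a "differs" line here is a strong hint that kernel driver, firmware and userspace blobs may not be from the same release.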


With the driver set up as a module, the firmware is loaded, but there is no output on the monitor, not even the console:

```
[   38.232439] mailbox_test mailbox_client: Successfully registered
[   41.669558] @@ dev ptr:ffffffe0bfedac00/1500/1
[   41.669717] PVR_K:  272: Read BVNC 36.50.54.182 from HW device registers
[   41.669766] PVR_K:  272: RGX Device registered BVNC 36.50.54.182 with 1 core in the system
[   41.705177] PVR_K:  272: RGX Firmware image 'rgx.fw.36.50.54.182' loaded
[   41.728503] PVR_K:  272: Shader binary image 'rgx.sh.36.50.54.182' loaded
[   41.730429] [drm] Initialized pvr 1.19.6345021 20170530 for 18000000.gpu on minor 1
[   42.423200] device-mapper: ioctl: 4.45.0-ioctl (2021-03-22) initialised: dm-devel@redhat.
```

Fixed.


I am facing the same problem, but I cannot build CONFIG_DRM_IMG_ROGUE as a module because “make menuconfig” only shows it as a bool!?

I had another look at the StarFive Debian image, and even there the DRM driver is built into the kernel, not loaded as a module.
So I’ll try baking the firmware into the initrd.

It fails exactly the same way 🤨