Using uart0 can crash the kernel

valium · April 18, 2023, 12:54pm

While trying to use uart0 (aka ttyS0) for NMEA communication with a GPS module, I discovered that it is possible to crash the kernel using ttyS0…

setup: remove console=ttyS0,115200n8 from the kernel command line and disable login on ttyS0 with ln -s /dev/null /etc/systemd/system/serial-getty@ttyS0.service
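To double-check the first half of that setup after a reboot, something like the following could be used. It is only a sketch: console_uses_device is a made-up helper name, and it assumes a POSIX shell and a command line where arguments are separated by whitespace.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the kernel command line in $1 still
# names device $2 (e.g. ttyS0) in a console= argument, 1 otherwise.
console_uses_device() (
    set -f   # no globbing while the command line is word-split
    for arg in $1; do
        case "$arg" in
            "console=$2"|"console=$2,"*) exit 0 ;;
        esac
    done
    exit 1
)

# Example (prints a warning only if ttyS0 is still a kernel console):
#   console_uses_device "$(cat /proc/cmdline)" ttyS0 &&
#       echo "ttyS0 is still on the kernel command line"
```

The match is on the exact device name, so console=ttyS0,115200n8 counts but a different port such as console=ttyS1 does not.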

Now opening the port with a different baud rate, e.g. screen /dev/ttyS0 9600, leads to:

[   84.808204] rcu: INFO: rcu_sched self-detected stall on CPU
[   84.808222] rcu:     1-....: (2099 ticks this GP) idle=e99/1/0x4000000000000002 softirq=4772/4772 fqs=1009 
[   84.808241]  (t=2100 jiffies g=1817 q=39)
[   84.808249] Task dump for CPU 1:
[   84.808254] task:kworker/u8:2    state:R  running task     stack:    0 pid:  102 ppid:     2 flags:0x00000008
[   84.808275] Workqueue: events_unbound flush_to_ldisc
[   84.808293] Call Trace:
[   84.808297] [<ffffffff800049a6>] dump_backtrace+0x1c/0x24
[   84.808312] [<ffffffff8002dc1c>] sched_show_task+0x156/0x176
[   84.808324] [<ffffffff80a9cfc2>] dump_cpu_task+0x42/0x4c
[   84.808338] [<ffffffff80a9dc14>] rcu_dump_cpu_stacks+0xd2/0x10e
[   84.808350] [<ffffffff8006813e>] rcu_sched_clock_irq+0x4d4/0x6c2
[   84.808365] [<ffffffff8006e716>] update_process_times+0xa0/0xc8
[   84.808378] [<ffffffff8007c53a>] tick_sched_timer+0x78/0x130
[   84.808389] [<ffffffff8006edae>] __hrtimer_run_queues+0x122/0x186
[   84.808401] [<ffffffff8006fa5a>] hrtimer_interrupt+0xcc/0x1d8
[   84.808412] [<ffffffff807911a4>] riscv_timer_interrupt+0x32/0x3c
[   84.808426] [<ffffffff8005c118>] handle_percpu_devid_irq+0x80/0x110
[   84.808441] [<ffffffff80057232>] handle_domain_irq+0x58/0x88
[   84.808451] [<ffffffff803c4136>] riscv_intc_irq+0x36/0x5e
[   84.808464] [<ffffffff800030f4>] ret_from_exception+0x0/0xc
[   84.808474] [<ffffffff80ab06c6>] _raw_spin_unlock_irqrestore+0x16/0x2e

It seems this happens because the RISC-V hvc SBI driver waits for sbi_console_putchar to return, but it never does. Since hvc0 (via SBI) and ttyS0 use the same underlying hardware, I suspect the call hangs because the UART was reconfigured to a different baud rate.

https://raw.githubusercontent.com/riscv-non-isa/riscv-sbi-doc/master/riscv-sbi.pdf page 12 describes the functions involved.
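One way to see whether both devices are still registered as kernel consoles is to look at /proc/consoles, where the first column is the device name. The helpers below are a rough sketch with made-up names (list_kernel_consoles, both_consoles_registered); whether hvc0 shows up there on a given image is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the device names the kernel has registered
# as consoles. $1 is a file in /proc/consoles format (default: /proc/consoles).
list_kernel_consoles() {
    awk '{ print $1 }' "${1:-/proc/consoles}"
}

# Hypothetical helper: return 0 if devices $1 and $2 both appear in the list.
# $3 is an optional file path, passed through to list_kernel_consoles.
both_consoles_registered() {
    consoles=$(list_kernel_consoles "$3")
    printf '%s\n' "$consoles" | grep -qxF "$1" &&
        printf '%s\n' "$consoles" | grep -qxF "$2"
}

# Example:
#   both_consoles_registered ttyS0 hvc0 && echo "both consoles are registered"
```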

possible solutions:
A fix at the SBI or kernel level would be appreciated. I’m not sure how the setup is done at the SBI level, but according to the documentation, returning an error when the console has been reconfigured, instead of blocking forever, would be an option.

workaround:
For now my workaround is to make sure hvc0 isn’t used by also disabling it with

ln -s /dev/null /etc/systemd/system/serial-getty@hvc0.service
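Both symlinks can be created in one go with a small helper like this. It is a sketch only: mask_serial_gettys is a made-up name, and it assumes the getty instance names match the device names (ttyS0, hvc0). As far as I know, systemctl mask creates the same kind of /dev/null symlink, so it should be an equivalent alternative.

```shell
#!/bin/sh
# Hypothetical helper: mask serial-getty instances by symlinking their
# unit names to /dev/null, the same mechanism as the ln -s commands above.
# $1: systemd unit directory (normally /etc/systemd/system)
# remaining arguments: tty names such as ttyS0 or hvc0
mask_serial_gettys() {
    unit_dir=$1
    shift
    for tty in "$@"; do
        ln -sf /dev/null "$unit_dir/serial-getty@$tty.service"
    done
}

# Example (as root):
#   mask_serial_gettys /etc/systemd/system ttyS0 hvc0
```

A getty that is already running is not stopped by the symlink alone, so a reboot (or stopping the unit by hand) is still needed.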

Stat_headcrabed · April 18, 2023, 1:57pm

I think this bug may also exist upstream. Have you asked upstream?

valium · April 18, 2023, 2:21pm

I have not, because I’m not sure where and what to report.

At the SBI/U-Boot level I’ve got no idea how this is set up… but I think it’s hardware specific.

on the kernel level:
One might argue that the kernel driver should not wait forever for the SBI function call to return. On the other hand, that call is specified to block until the console is ready or to return an error… so I’d assume it’s safe to wait for a return.

All in all this seems pretty hardware specific, because the UART is used by both U-Boot/SBI and Linux (configured via StarFive’s device tree).

But if you’ve got hints on where and how to report the issue, I’d be happy to do so.

SunWukong · April 18, 2023, 10:04pm

Isn’t it easier to read NMEA via USB or I2C? And wouldn’t it be better to use a GPIO for PPS? I would rather not tamper with the uart0.

rvalles · April 19, 2023, 2:22am

I understand sbi_console_putchar is deprecated and shouldn’t be used in the first place.

Doesn’t the kernel hit the serial port directly? Maybe the problem is that the sbi-based console remains in use after the kernel has already taken over.

This should get some attention, as a reliable serial port is very important.

philrandal · April 19, 2023, 6:22am

If you are trying to use one of the Raspberry Pi GPS HATs (Adafruit or Uputronics, for example), NMEA is via UART0 and PPS is a GPIO pin.

SunWukong · April 19, 2023, 7:44am

You can also connect a HAT as you wish.

Or you can give up using a HAT, which is sometimes completely overpriced, and go for a cheap module that is sufficient for a Stratum 1 server.

In these two photos from 2016 and 2019, I used the usual pins but could have used others.

philrandal · April 19, 2023, 8:01am

@SunWukong this is true, but I already have a spare Uputronics HAT and would love to get an NTP server using it up and running on the VisionFive 2. And I really don’t like the system monopolising UART0, anyhow.

SunWukong · April 19, 2023, 8:23am

I can understand that, I use two HATs from Uputronics myself.

mike@mpc4:~$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+rpi8.ln.berline 192.168.178.16   2 u  957 1024  377    0.521   +0.281   0.247
*rpi6.ln.berline .PPS.            1 u  617 1024  377    0.567   +0.068   0.129
+rpi2.ln.berline .GPS.            1 u  343 1024  377    0.353   +0.045   0.169
+rpi1.ln.berline 192.168.178.16   2 u  601 1024  377    0.527   +0.179   0.219

rpi1 is only a backup NTP server without GPS. rpi8 is actually also a Stratum 1 server with PPS and uses a HAT from Uputronics like rpi6, but something has been broken since the last apt upgrade. Only rpi8 has Ubuntu installed; the others all run Debian from Raspberry. On rpi2 I use an Adafruit Ultimate GPS.

valium · April 19, 2023, 9:08am

I’m not sure, the document I linked is still a draft.

Yes, that’s my understanding. I’m pretty sure the kernel doesn’t “know” that serial0 and the SBI-based console use the same underlying hardware.

And yes, instead of directly attaching the HAT I can connect it via jumper cables. Nevertheless, serial0 is configured for use with Linux via the device tree, so I’m trying to use it for my needs.

rvalles · April 19, 2023, 9:12am

It was already deprecated in the 1.0.x document, which I had read before. What’s new is that they have a new API for it.

That’s my suspicion too. So Linux needs to somehow be told not to touch the SBI serial console at all. Perhaps this is doable via the device tree.