Looks like I’m one of the lucky ones with my few-years-old, low-end 128GB Toshiba NVMe in my VF2, running off a normal RPi 5.1V/3A power brick … no issues yet, but maybe I haven’t put enough load on it beyond -j4 kernel builds etc. …
I also get it with my Patriot P300 NVMe
[134995.705555] nvme nvme0: I/O 181 QID 2 timeout, completion polled
[135489.136238] nvme nvme0: I/O 63 QID 4 timeout, completion polled
Same here, with another Patriot P300.
“find /usr -type f | xargs md5sum” was sufficient to reproduce it - this wasn’t doing any explicit writing but I have a feeling that the access times in the file inodes were being touched.
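For anyone who wants to generate the same kind of load without a shell pipeline, here is a rough Python equivalent of “find /usr -type f | xargs md5sum”. It is a sketch, not what the poster ran, and md5_tree is a hypothetical helper name. It only reads files; whether that also causes atime writes depends on the filesystem's mount options (relatime, noatime and so on).

```python
import hashlib
import os


def md5_tree(root: str, chunk_size: int = 1 << 20) -> int:
    """Read and MD5-hash every regular file under root; return how many were hashed.

    Roughly mirrors `find ROOT -type f | xargs md5sum`: a read-heavy load
    over many files. Digests are computed but discarded; only the I/O matters here.
    """
    hashed = 0
    # os.walk does not descend into symlinked directories by default, like find.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files (sockets, FIFOs), as `find -type f` does.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.md5()
            try:
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(chunk_size), b""):
                        digest.update(chunk)
            except OSError:
                # Unreadable files (permissions, files vanishing mid-walk) are skipped.
                continue
            hashed += 1
    return hashed
```

Calling md5_tree("/usr") while watching dmesg in a second SSH session should be roughly comparable to the pipeline above.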
The system seems to recover OK each time (after a delay of about 30 seconds). Maybe it is a race condition involving a lost interrupt from the device, e.g. two queues signalling completion at once, with only one being noticed/serviced?
[10307.600100] nvme nvme0: I/O 8 QID 4 timeout, completion polled
[10338.399635] nvme nvme0: I/O 6 QID 4 timeout, completion polled
[10369.039234] nvme nvme0: I/O 5 QID 4 timeout, completion polled
[10404.058725] nvme nvme0: I/O 9 QID 4 timeout, completion polled
[10454.158009] nvme nvme0: I/O 11 QID 4 timeout, completion polled
[10497.677395] nvme nvme0: I/O 9 QID 1 timeout, completion polled
[10531.596896] nvme nvme0: I/O 4 QID 4 timeout, completion polled
[10564.876405] nvme nvme0: I/O 5 QID 1 timeout, completion polled
[10598.795926] nvme nvme0: I/O 5 QID 4 timeout, completion polled
[10638.955338] nvme nvme0: I/O 13 QID 2 timeout, completion polled
[10684.554657] nvme nvme0: I/O 13 QID 1 timeout, completion polled
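Some context on the message itself, as far as I understand the Linux NVMe PCI driver: “completion polled” is logged when the I/O timeout handler polls the completion queue and finds that the command had in fact already completed, i.e. the completion interrupt apparently was not acted on. That fits the lost-interrupt theory. The driver's default I/O timeout (the nvme_core.io_timeout parameter) is 30 seconds, which would also match the roughly 30 second stalls. The hypothetical helpers below pull these lines out of dmesg text, count them per queue ID and measure the gaps between them, as a sketch for comparing logs across boards and drives.

```python
import re
from collections import Counter

# Matches lines such as:
# [10307.600100] nvme nvme0: I/O 8 QID 4 timeout, completion polled
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nvme\s+(?P<dev>nvme\d+):\s+"
    r"I/O\s+(?P<tag>\d+)\s+QID\s+(?P<qid>\d+)\s+timeout, completion polled"
)


def parse_timeouts(log: str):
    """Return (timestamp, device, tag, qid) tuples for each polled-completion timeout."""
    events = []
    for line in log.splitlines():
        m = TIMEOUT_RE.search(line)
        if m:
            events.append((float(m["ts"]), m["dev"], int(m["tag"]), int(m["qid"])))
    return events


def summarize(events):
    """Count timeouts per queue ID and compute gaps (seconds) between consecutive timeouts."""
    per_qid = Counter(qid for _ts, _dev, _tag, qid in events)
    times = sorted(ts for ts, _dev, _tag, _qid in events)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return per_qid, gaps
```

For the eleven lines above this gives 7 timeouts on QID 4, 3 on QID 1 and 1 on QID 2, with every gap between about 30.6 and 50.1 seconds. That is consistent with (though it does not prove) each stalled request sitting out the full 30 second timeout.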
Does adding norelatime,noatime to the mount options of the root partition help (there is no need for nodiratime, because noatime disables directory atime updates as well)? Or does it at least reduce the number of timeouts?
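If you try it, one way to confirm the option is actually in effect (after, for example, sudo mount -o remount,noatime /, or after editing /etc/fstab and rebooting) is to look at the options column of /proc/mounts. A minimal sketch; mount_options is a made-up helper name:

```python
from typing import Optional, Set


def mount_options(mounts_text: str, mountpoint: str = "/") -> Optional[Set[str]]:
    """Return the mount options for mountpoint from /proc/mounts-style text.

    Each line is: device mountpoint fstype options dump pass.
    If the same mount point appears more than once (stacked mounts), the last
    entry is the one currently visible, so it wins. Paths containing spaces
    are octal-escaped (\\040) in /proc/mounts; this sketch does not decode them.
    Returns None if the mount point is not found.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            options = set(fields[3].split(","))
    return options
```

Usage would be along the lines of reading /proc/mounts into a string and checking whether "noatime" is in mount_options(text, "/").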
Patriot P310, Arch 5.15.2-cwt12 default kernel options, btrfs default.
No timeouts running the above over c. 20500 files.
Could it be a firmware bug?
With nvme-cli installed ($ sudo apt install nvme-cli), does a command like one of the following list the firmware version?
nvme id-ctrl /dev/nvme0
nvme id-ctrl /dev/nvme0 --vendor-specific
Here is the output for my device
StarFive ~ # nvme id-ctrl /dev/nvme0
NVME Identify Controller:
vid : 0x126f
ssvid : 0x126f
sn : *****************************
mn : Patriot M.2 P300 256GB
fr : V0513A0
rab : 6
ieee : 000001
cmic : 0
mdts : 6
cntlid : 0x1
ver : 0x10300
rtd3r : 0x249f0
rtd3e : 0x13880
oaes : 0x200
ctratt : 0
rrls : 0
cntrltype : 0
fguid : 00000000-0000-0000-0000-000000000000
crdt1 : 0
crdt2 : 0
crdt3 : 0
nvmsr : 0
vwci : 0
mec : 0
oacs : 0x7
acl : 4
aerl : 7
frmw : 0x12
lpa : 0x3
elpe : 63
npss : 0
avscc : 0
apsta : 0
wctemp : 356
cctemp : 358
mtfa : 100
hmpre : 16384
hmmin : 8192
tnvmcap : 0
unvmcap : 0
rpmbs : 0
edstt : 0
dsto : 0
fwug : 4
kas : 0
hctma : 0x1
mntmt : 273
mxtmt : 358
sanicap : 0
hmminds : 0
hmmaxd : 0
nsetidmax : 0
endgidmax : 0
anatt : 0
anacap : 0
anagrpmax : 0
nanagrpid : 0
pels : 0
domainid : 0
megcap : 0
sqes : 0x66
cqes : 0x44
maxcmd : 0
nn : 1
oncs : 0x15
fuses : 0
fna : 0x1
vwc : 0x1
awun : 0
awupf : 0
icsvscc : 0
nwpc : 0
acwu : 0
ocfs : 0
sgls : 0
mnan : 0
maxdna : 0
maxcna : 0
subnqn :
ioccsz : 0
iorcsz : 0
icdoff : 0
fcatt : 0
msdbd : 0
ofcs : 0
ps 0 : mp:6.00W operational enlat:0 exlat:0 rrt:0 rrl:0
       rwt:0 rwl:0 idle_power:- active_power:-
       active_power_workload:-
StarFive ~ # nvme id-ctrl /dev/nvme0 --vendor-specific
(The Identify Controller fields are identical to the output above, serial number again removed, followed by the vendor-specific area.)
vs[]:
      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
(rows 0010 to 00b0: all 00)
00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 "................"
(rows 00d0 to 03f0: all 00)
StarFive ~ #
If I were you I would remove the serial number but leave the rest; with enough data points, something in there may help track down the cause of the timeouts. It looks like your firmware revision is “V0513A0”.
Serial Number has been removed
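For anyone else posting output, a small sketch that masks serial numbers first. The two patterns (nvme-cli's sn field and hwinfo's Serial ID line) are assumptions based on the output formats seen in this thread, and redact_serials is a hypothetical helper, so check the result by eye before posting.

```python
import re

# Patterns for the two serial-number formats seen in this thread (assumed layouts):
#   nvme-cli:  "sn        : XXXXXXXX"
#   hwinfo:    '  Serial ID: "XXXXXXXX"'
_SN_PATTERNS = [
    (re.compile(r"^(sn\s*:\s*)\S.*$", re.MULTILINE), r"\1<redacted>"),
    (re.compile(r'^(\s*Serial ID:\s*)".*"$', re.MULTILINE), r'\1"<redacted>"'),
]


def redact_serials(text: str) -> str:
    """Replace NVMe serial numbers with a placeholder, leaving all other fields intact."""
    for pattern, replacement in _SN_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```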
I found a second NVMe brand (SSD M.2 NVMe Aoluska Gen 3.0 x4 2400Mb/s read 256GB) that shares the same firmware revision, and it uses a Silicon Motion SM2263 NVMe SSD controller, which I am guessing is where the firmware runs.
So it is one of these two:
SM2263EN: host PCIe Gen3 x4, NVMe 1.3, 4-channel flash interface, configurable LDPC ECC, flash VCCQ 1.8V/1.2V, DRAM support: yes, TCG/AES: yes, package TFBGA288 (12 x 12mm)
SM2263XT: host PCIe Gen3 x4, NVMe 1.3, 4-channel flash interface, configurable LDPC ECC, flash VCCQ 1.8V/1.2V, DRAM support: no, TCG/AES: yes, package TFBGA288 (12 x 12mm)
My guess would be the SM2263XT, since Patriot Memory does not mention DRAM anywhere in its marketing.
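The id-ctrl output posted above may support that guess: it reports hmpre : 16384 and hmmin : 8192, meaning the controller asks the host for a Host Memory Buffer (HMB), the mechanism DRAM-less controllers such as the SM2263XT generally rely on. That is consistent with, not proof of, the XT variant. In the NVMe Identify Controller data both fields are in 4 KiB units; a quick conversion as a worked example:

```python
# In the NVMe Identify Controller data, HMPRE (preferred) and HMMIN (minimum)
# Host Memory Buffer sizes are expressed in 4 KiB units.
HMB_UNIT_BYTES = 4 * 1024


def hmb_mib(units: int) -> float:
    """Convert an HMPRE/HMMIN value from 4 KiB units to MiB."""
    return units * HMB_UNIT_BYTES / (1024 * 1024)


# Values reported by the Patriot P300 earlier in this thread.
hmpre, hmmin = 16384, 8192
print(f"preferred HMB: {hmb_mib(hmpre):.0f} MiB, minimum HMB: {hmb_mib(hmmin):.0f} MiB")
```

So this drive would like 64 MiB of host RAM and needs at least 32 MiB.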
I checked the Silicon Motion website and there is no sign of a later firmware (searched “site:siliconmotion.com firmware”).
I also checked the Patriot Memory website and they have none either (“site:patriotmemory.com firmware”).
Aoluska does not appear to have a website at all.
The PCIe VendorID 0x126f is allocated to Silicon Motion, Inc., which would corroborate that this is the manufacturer of the controller chip used.
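The same IDs can be read from sysfs without extra tools (lspci -nn shows the same vendor:device pair). A sketch, assuming the usual Linux layout where /sys/class/nvme/nvme0/device points at the PCI device, whose vendor and device files hold hex strings such as 0x126f; pci_ids is a hypothetical helper:

```python
from pathlib import Path

SILICON_MOTION_VID = 0x126F


def pci_ids(ctrl: str = "nvme0", sysfs_root: str = "/sys"):
    """Return (vendor_id, device_id) of the PCI function behind an NVMe controller.

    Assumes /sys/class/nvme/<ctrl>/device links to the PCI device directory,
    which contains 'vendor' and 'device' files with hex strings like '0x126f'.
    """
    dev_dir = Path(sysfs_root) / "class" / "nvme" / ctrl / "device"
    vendor = int((dev_dir / "vendor").read_text().strip(), 16)
    device = int((dev_dir / "device").read_text().strip(), 16)
    return vendor, device
```

On the P300 this should come back as (0x126f, 0x2263), matching what hwinfo reports.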
Yes, according to hwinfo it has a Silicon Motion SSD controller:
NVME 00.0: 10600 Disk
  [Created at block.255]
  Unique ID: GP4z.dfVB1eXouQ4
  Parent ID: xKWB._aNoHWEPua6
  SysFS ID: /class/block/nvme0n1
  SysFS BusID: nvme0
  SysFS Device Link: /devices/platform/soc/2c000000.pcie/pci0001:00/0001:00:00.0/0001:01:00.0/nvme/nvme0
  Hardware Class: disk
  Model: "Silicon Motion SM2263EN/SM2263XT SSD Controller"
  Vendor: pci 0x126f "Silicon Motion, Inc."
  Device: pci 0x2263 "SM2263EN/SM2263XT SSD Controller"
  SubVendor: pci 0x126f "Silicon Motion, Inc."
  SubDevice: pci 0x2263
  Serial ID: "P300ABBB22111823091"
  Driver: "nvme"
  Driver Modules: "nvme"
  Device File: /dev/nvme0n1
  Device Number: block 259:0
  Geometry (Logical): CHS 244198/64/32
  Size: 500118192 sectors a 512 bytes
  Capacity: 238 GB (256060514304 bytes)
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #10 (Non-Volatile memory controller)
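In case the “238 GB” for a drive sold as 256GB looks odd: a quick worked example from the sector count shows that hwinfo's figure appears to be in binary gigabytes (GiB), while the label uses decimal GB.

```python
SECTORS = 500_118_192   # "Size: 500118192 sectors a 512 bytes" from hwinfo
SECTOR_BYTES = 512

total_bytes = SECTORS * SECTOR_BYTES
gib = total_bytes / 2**30      # binary gigabytes (GiB)
gb = total_bytes / 10**9       # decimal gigabytes (GB), as used on the label

# 256060514304 bytes, about 238.5 GiB or 256.1 GB
print(total_bytes, f"{gib:.1f} GiB", f"{gb:.1f} GB")
```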
I have not had a chance to try it but this patch may help.
I tried this patch but I still get timeouts
This is a stab in the dark, but since the USB 3.0 chipset also sits on PCIe, I wonder whether the timeouts become less frequent or disappear if you remove all USB devices and access the machine only over SSH.
I’m only thinking about this because a later firmware than the one originally available for the VF2 fixed some PCIe issues. It is probably a red herring (idioms do not translate into all languages, hence the link). But my thinking is that if you unplug all USB devices, that lane should generate no PCIe traffic. At the very least it would cross one item off the list of possible causes, or push it much further down.
I have no USB devices plugged in and I only access it via ssh.
I have the same issue with both a VF2 v1.3B and a v1.2A board, but only with the upstream kernel branch.
With the 5.15 kernel included in the wayland debian image I do not get any timeouts.
Just tried the current 6.4-rc1 (JH7110_VisionFive2_upstream branch as of just now), and I am still getting timeouts.
@Wrybane, thanks for reporting that; it's an interesting data point.
Just for completeness, what brand/model of NVMe are you using? There is some suggestion that this affects some NVMe drives more than others…
This occurs with the current kernel as well.
If you search the forum you will find discussions on this.
WD Red SN700 500GB, firmware version 111150WD.
Another interesting thing: I built U-Boot and OpenSBI from upstream sources, as they seem to have enough support to boot from an SD card, and with those I got significantly more NVMe timeouts. So many that, with root on the NVMe, booting took over 5 minutes before I could log in. So I’m wondering whether the StarFive versions of OpenSBI/U-Boot perform some extra power-management steps, which perhaps upstream Linux also doesn’t do yet, that affect this?