cwt
January 17, 2023, 3:31am
1
I saw someone on Twitter who had the same NVMe I/O timeout problem as me. I added these parameters to the kernel command line and my problem was fixed.
pcie_aspm.policy=performance pcie_aspm=off pcie_port_pm=off nvme_core.default_ps_max_latency_us=0 nvme_core.io_timeout=255 nvme_core.max_retries=10 nvme_core.shutdown_timeout=10
Please try them if you have the same problem.
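To see which of these parameters a given kernel command line still lacks, a small check like the one below may help. It is only a sketch: the helper name `missing_params` is made up for illustration, and the sample command line is the one from the dmesg output posted later in this thread. On a running system the string would come from `/proc/cmdline`.

```python
"""Report which of the suggested kernel parameters are absent from a command line."""

# The parameters exactly as posted above.
SUGGESTED = (
    "pcie_aspm.policy=performance pcie_aspm=off pcie_port_pm=off "
    "nvme_core.default_ps_max_latency_us=0 nvme_core.io_timeout=255 "
    "nvme_core.max_retries=10 nvme_core.shutdown_timeout=10"
).split()


def missing_params(cmdline, wanted=SUGGESTED):
    """Return the entries of `wanted` that do not appear verbatim in `cmdline`."""
    present = set(cmdline.split())
    return [p for p in wanted if p not in present]


if __name__ == "__main__":
    # Sample command line taken from the dmesg output quoted in this thread.
    sample = (
        "root=/dev/nvme0n1p4 rw console=tty0 console=ttyS0,115200 "
        "earlycon rootwait stmmaceth=chain_mode:1 selinux=0"
    )
    for param in missing_params(sample):
        print("missing:", param)
```

The comparison is by exact `name=value` token, so a parameter that is present with a different value is also reported as missing.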
7 Likes
zu2
January 17, 2023, 6:14am
2
I’m having the same NVMe issue with my VisionFive2. I’ve been struggling with this for a week.
After some research I added nvme_core.default_ps_max_latency_us=0 to /boot/boot/extlinux/extlinux.conf.
This got rid of the I/O errors, but now the board reboots at random instead.
I’ll try adding your other parameters.
Thank you.
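For anyone editing extlinux.conf by hand, the change amounts to extending the `append` line of each boot entry. The sketch below shows that transformation on a string. It assumes the usual syslinux-style layout in which kernel arguments sit on lines starting with `append`; the helper name `add_kernel_params` and the sample entry are hypothetical, so check your own file before applying anything like this.

```python
def add_kernel_params(conf_text, params):
    """Append each missing parameter to every `append` line of an extlinux.conf text."""
    out = []
    for line in conf_text.splitlines():
        tokens = line.split()
        # Assumption: kernel arguments live on lines whose first word is "append".
        if tokens and tokens[0].lower() == "append":
            extra = [p for p in params if p not in tokens[1:]]
            if extra:
                line = line.rstrip() + " " + " ".join(extra)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical boot entry, for illustration only.
    sample = (
        "label l0\n"
        "    linux /vmlinuz\n"
        "    append root=/dev/nvme0n1p4 rw rootwait\n"
    )
    print(add_kernel_params(sample, ["nvme_core.default_ps_max_latency_us=0"]), end="")
```

Running it twice on the same text changes nothing the second time, because parameters that are already present are skipped.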
2 Likes
cobalt
February 6, 2023, 8:37pm
3
Hmm, I tried setting the params that @cwt suggests above. The system coped with apt-get, but it repeatedly failed to survive a `git clone` of the Linux tree. That’s very unfortunate, as it appears that the NVMe cannot be used under load right now. Has anyone had more success?
1 Like
cobalt
February 10, 2023, 12:58pm
4
I believe the “fix” above with the kernel parameters is misleading.
At least for me, the real cause was a weak power supply. After attaching an 18 W USB-C charger, the NVMe works as it should.
@cwt and @zu2, please confirm, so we can put this rumor to rest.
3 Likes
cwt
February 10, 2023, 1:56pm
5
I always use a powerful 65W PD supply. Without those params my NVMe always times out under high I/O load. So the params may have fixed my problem on my specific hardware.
3 Likes
zu2
February 10, 2023, 2:26pm
6
My problem was that I was using an “Anker Nano II 30W”.
(I haven’t checked yet whether the cause is the power supply or the cable.)
It seems to have been solved by switching to Sanwa Supply’s PD65W. It has been running stably for over 3 days.
1 Like
Hi,
I’ve also got a few NVMe timeouts in my dmesg, with or without a 12V PD supply. I am currently running off a 9V 15W-capable charger, with a measured stable 5V supply at the board. The timeouts are quite problematic when they occur, but they happen very rarely:
nvme nvme0: I/O 805 QID 3 timeout, completion polled
nvme nvme0: I/O 277 QID 4 timeout, completion polled
nvme nvme0: I/O 283 QID 4 timeout, completion polled
nvme nvme0: I/O 319 QID 4 timeout, completion polled
nvme nvme0: I/O 774 QID 3 timeout, completion polled
nvme nvme0: I/O 296 QID 4 timeout, completion polled
nvme nvme0: I/O 794 QID 3 timeout, completion polled
nvme nvme0: I/O 801 QID 3 timeout, completion polled
Usually this happens under heavy load, when I do lots of I/O, and I mean lots, such as running a few `tar`s in parallel with building gcc. Then they start to pop up. A linear load like `cat /dev/nvme0n1 >/dev/null` triggers nothing, though. I tested with both a 5V 15W USB-C PD supply and a 12V 30W USB-C PD supply, and the result is the same.
The problem might be related to the power supply, since I never got the NVMe to work off USB-A 5V supplies, even powerful ones. Boot just hangs while trying to mount the rootfs from an unresponsive NVMe. With a USB monitor, I see the NVMe become unresponsive when the voltage is around 4.9V or lower. Funny how PD fixes this. Does it have a remote feedback mechanism?
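When comparing reports like the one above, it can help to tally the timeouts per queue. Below is a minimal sketch that does this with a regular expression over dmesg text; the function name `timeouts_per_queue` is invented for illustration, and the sample lines are the ones quoted in this post.

```python
import re
from collections import Counter

# The eight lines quoted in the post above.
SAMPLE = """\
nvme nvme0: I/O 805 QID 3 timeout, completion polled
nvme nvme0: I/O 277 QID 4 timeout, completion polled
nvme nvme0: I/O 283 QID 4 timeout, completion polled
nvme nvme0: I/O 319 QID 4 timeout, completion polled
nvme nvme0: I/O 774 QID 3 timeout, completion polled
nvme nvme0: I/O 296 QID 4 timeout, completion polled
nvme nvme0: I/O 794 QID 3 timeout, completion polled
nvme nvme0: I/O 801 QID 3 timeout, completion polled
"""

# Matches lines with or without a leading dmesg timestamp.
TIMEOUT = re.compile(r"nvme (nvme\d+): I/O (\d+) QID (\d+) timeout")


def timeouts_per_queue(log_text):
    """Return a dict mapping queue ID to the number of timeout lines seen for it."""
    counts = Counter()
    for match in TIMEOUT.finditer(log_text):
        counts[int(match.group(3))] += 1
    return dict(counts)


if __name__ == "__main__":
    for qid, count in sorted(timeouts_per_queue(SAMPLE).items()):
        print(f"QID {qid}: {count} timeouts")
```

Piping `dmesg` output into the same function on an affected board would show whether the timeouts cluster on particular queues.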
2 Likes
I have something similar, with a Patriot P300 NVMe.
I already reported this here:
opened 02:14PM - 28 Mar 23 UTC
I'm seeing occasional errors like:
```
[60386.019927] nvme nvme0: I/O 210 QID … 4 timeout, completion polled
[60416.259353] nvme nvme0: I/O 195 QID 4 timeout, completion polled
[60479.458282] nvme nvme0: I/O 220 QID 2 timeout, completion polled
[60509.487744] nvme nvme0: I/O 222 QID 2 timeout, completion polled
```
When doing large RPM package installs, and once while rsyncing a big file tree to the VF2.
When it happens the operation hangs for some time, then you get this in the log and the operation proceeds and eventually completes successfully.
This has occurred with the stock Debian-69 and v2.10.4 releases, and I just saw it with the latest v2.11.5 release. I do not see any additional info in the error logs although there is a `missing or invalid SUBNQN field` message at startup, otherwise the NVMe (Patriot P300) works well.
I have a 45W Samsung wall-wart USB PD supply, and have not seen any other issues with this card (indeed, it works well apart from this).
I saw these errors while re-installing the latest [Debian packages](https://github.com/starfive-tech/Debian/releases/tag/v0.7.1-engineering-release-wayland) using the provided [install](https://github.com/starfive-tech/Debian/releases/download/v0.7.1-engineering-release-wayland/install_package_and_dependencies.sh) script. They all occurred during the final `dpkg -i` phase.
``` console
root@rose:~# dmesg | grep nvme
[ 0.000000] Kernel command line: root=/dev/nvme0n1p4 rw console=tty0 console=ttyS0,115200 earlycon rootwait stmmaceth=chain_mode:1 selinux=0
[ 4.440152] nvme nvme0: pci function 0001:01:00.0
[ 4.451660] nvme 0001:01:00.0: enabling device (0000 -> 0002)
[ 4.469585] nvme nvme0: missing or invalid SUBNQN field.
[ 4.585406] nvme nvme0: allocated 64 MiB host memory buffer.
[ 4.816532] nvme nvme0: 4/0/0 default/read/poll queues
[ 4.854331] nvme0n1: p1 p2 p3 p4
[ 9.004199] EXT4-fs (nvme0n1p4): mounted filesystem with ordered data mode. Opts: (null). Quota mode: disabled.
[ 10.105598] EXT4-fs (nvme0n1p4): re-mounted. Opts: (null). Quota mode: disabled.
[60386.019927] nvme nvme0: I/O 210 QID 4 timeout, completion polled
[60416.259353] nvme nvme0: I/O 195 QID 4 timeout, completion polled
[60479.458282] nvme nvme0: I/O 220 QID 2 timeout, completion polled
[60509.487744] nvme nvme0: I/O 222 QID 2 timeout, completion polled
root@rose:~# nvme list
Node Generic SN Model Namespace Usage Format FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 /dev/ng0n1 P300ABBB22111821071 Patriot M.2 P300 256GB 1 256.06 GB / 256.06 GB 512 B + 0 B V0513A0
root@rose:~# uname -a
Linux rose.easytarget.org 5.15.310 #1 SMP Mon Mar 27 19:55:33 CEST 2023 riscv64 GNU/Linux
```
I build my own kernel to add USB serial and additional Bluetooth adaptors:
* The `.310` kernel was built yesterday (I bump minor version to avoid confusing myself) from the `v2.11.5` source on the VF2 itself.
* I made the following config differences:
``` console
user@rose:~/kernel/linux$ diff .config .config.defconfig
938c938
< CONFIG_BT_HIDP=y
---
> # CONFIG_BT_HIDP is not set
951,958c951
< CONFIG_BT_INTEL=y
< CONFIG_BT_BCM=y
< CONFIG_BT_RTL=y
< CONFIG_BT_HCIBTUSB=y
< # CONFIG_BT_HCIBTUSB_AUTOSUSPEND is not set
< CONFIG_BT_HCIBTUSB_BCM=y
< CONFIG_BT_HCIBTUSB_MTK=y
< CONFIG_BT_HCIBTUSB_RTL=y
---
> # CONFIG_BT_HCIBTUSB is not set
971d963
< # CONFIG_BT_ATH3K is not set
4163,4216c4155
< CONFIG_USB_SERIAL=y
< # CONFIG_USB_SERIAL_CONSOLE is not set
< CONFIG_USB_SERIAL_GENERIC=y
skip lots of < # CONFIG_USB_SERIAL_XXX is not set messages
---
> # CONFIG_USB_SERIAL is not set
```
It only seems to happen during ‘heavy’ operations: installing packages, cloning repos, etc.
I’m using a 45W Samsung charger and cable. I’d be surprised if it were a power problem, though maybe the VF2 can generate peak loads that drop enough voltage over the cable or board copper to cause this.
Edit: I’ll try the solution @cwt posted at the top of this, and report results.
1 Like
Looks like I’m one of the lucky ones: my few-years-old, low-end 128 GB Toshiba NVMe in my VF2 runs off a normal RPi 5.1V/3A power brick with no issues yet. But maybe I have not yet put enough load on it beyond -j4 kernel builds and the like.
3 Likes
andrew
April 13, 2023, 8:35pm
10
I also get it with my Patriot P300 NVMe
[134995.705555] nvme nvme0: I/O 181 QID 2 timeout, completion polled
[135489.136238] nvme nvme0: I/O 63 QID 4 timeout, completion polled
2 Likes
Same here, with another Patriot P300.
`find /usr -type f | xargs md5sum` was sufficient to reproduce it. This wasn’t doing any explicit writing, but I have a feeling that the access times in the file inodes were being touched.
The system seems to recover OK each time (after a delay of about 30 seconds). Maybe a race condition involving a lost interrupt from the device e.g. two queues signalling completion at once, and only one being noticed/serviced?
[10307.600100] nvme nvme0: I/O 8 QID 4 timeout, completion polled
[10338.399635] nvme nvme0: I/O 6 QID 4 timeout, completion polled
[10369.039234] nvme nvme0: I/O 5 QID 4 timeout, completion polled
[10404.058725] nvme nvme0: I/O 9 QID 4 timeout, completion polled
[10454.158009] nvme nvme0: I/O 11 QID 4 timeout, completion polled
[10497.677395] nvme nvme0: I/O 9 QID 1 timeout, completion polled
[10531.596896] nvme nvme0: I/O 4 QID 4 timeout, completion polled
[10564.876405] nvme nvme0: I/O 5 QID 1 timeout, completion polled
[10598.795926] nvme nvme0: I/O 5 QID 4 timeout, completion polled
[10638.955338] nvme nvme0: I/O 13 QID 2 timeout, completion polled
[10684.554657] nvme nvme0: I/O 13 QID 1 timeout, completion polled
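The spacing of these messages can be checked directly from the timestamps. The sketch below computes the gap between consecutive timeout lines from the log above; `timeout_gaps` is a made-up helper name. As far as I know, the kernel logs `completion polled` when the timeout handler polls the queue and finds the command already completed, and the default `nvme_core.io_timeout` is 30 seconds. Both are consistent with the lost-interrupt guess, but treat that reading as an assumption rather than a diagnosis.

```python
import re

# The log lines quoted in the post above.
LOG = """\
[10307.600100] nvme nvme0: I/O 8 QID 4 timeout, completion polled
[10338.399635] nvme nvme0: I/O 6 QID 4 timeout, completion polled
[10369.039234] nvme nvme0: I/O 5 QID 4 timeout, completion polled
[10404.058725] nvme nvme0: I/O 9 QID 4 timeout, completion polled
[10454.158009] nvme nvme0: I/O 11 QID 4 timeout, completion polled
[10497.677395] nvme nvme0: I/O 9 QID 1 timeout, completion polled
[10531.596896] nvme nvme0: I/O 4 QID 4 timeout, completion polled
[10564.876405] nvme nvme0: I/O 5 QID 1 timeout, completion polled
[10598.795926] nvme nvme0: I/O 5 QID 4 timeout, completion polled
[10638.955338] nvme nvme0: I/O 13 QID 2 timeout, completion polled
[10684.554657] nvme nvme0: I/O 13 QID 1 timeout, completion polled
"""

STAMP = re.compile(r"^\[\s*(\d+\.\d+)\] nvme \S+: I/O \d+ QID \d+ timeout")


def timeout_gaps(log_text):
    """Return the seconds elapsed between consecutive NVMe timeout lines."""
    times = []
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if match:
            times.append(float(match.group(1)))
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    gaps = timeout_gaps(LOG)
    print("gaps (s):", ", ".join(f"{g:.1f}" for g in gaps))
    print(f"shortest gap: {min(gaps):.2f} s")
```

In this log no gap is shorter than 30 seconds, which is what one would expect if each stalled command waits out the full timeout before the driver notices that it had in fact completed.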
1 Like
mzs
April 15, 2023, 2:26am
12
Does adding `norelatime,noatime` to the mount options of the root partition help, or at least reduce the number of timeouts? (There is no need for `nodiratime`, because `noatime` disables directory access-time updates as well.)
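To confirm which access-time options are actually in effect, the fourth field of the matching line in `/proc/mounts` can be inspected. Here is a minimal sketch of that lookup; the helper name `mount_options` and the sample line are hypothetical (the device and filesystem are borrowed from the dmesg output earlier in the thread).

```python
def mount_options(mounts_text, mountpoint):
    """Return the option list for `mountpoint` from /proc/mounts-style text, or None."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: device mountpoint fstype options dump pass
        if len(fields) >= 4 and fields[1] == mountpoint:
            found = fields[3].split(",")  # keep the last match, which is the active mount
    return found


if __name__ == "__main__":
    # Hypothetical sample line, for illustration only.
    sample = "/dev/nvme0n1p4 / ext4 rw,relatime 0 0\n"
    options = mount_options(sample, "/")
    print("options:", options)
    print("noatime active:", "noatime" in options)
```

On a live system the text would be read from `/proc/mounts` after remounting or rebooting with the new options.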
1 Like
Patriot P310, Arch 5.15.2-cwt12 default kernel options, btrfs default.
No timeouts running the above over c. 20500 files.
2 Likes
mzs
April 15, 2023, 11:01am
14
Could it be a firmware bug?
With nvme-cli installed (`sudo apt install nvme-cli`), does one of the following commands list the firmware version?
nvme id-ctrl /dev/nvme0
nvme id-ctrl /dev/nvme0 --vendor-specific
andrew
April 15, 2023, 8:42pm
15
Here is the output for my device
StarFive ~ # nvme id-ctrl /dev/nvme0
NVME Identify Controller:
vid : 0x126f
ssvid : 0x126f
sn : *****************************
mn : Patriot M.2 P300 256GB
fr : V0513A0
rab : 6
ieee : 000001
cmic : 0
mdts : 6
cntlid : 0x1
ver : 0x10300
rtd3r : 0x249f0
rtd3e : 0x13880
oaes : 0x200
ctratt : 0
rrls : 0
cntrltype : 0
fguid : 00000000-0000-0000-0000-000000000000
crdt1 : 0
crdt2 : 0
crdt3 : 0
nvmsr : 0
vwci : 0
mec : 0
oacs : 0x7
acl : 4
aerl : 7
frmw : 0x12
lpa : 0x3
elpe : 63
npss : 0
avscc : 0
apsta : 0
wctemp : 356
cctemp : 358
mtfa : 100
hmpre : 16384
hmmin : 8192
tnvmcap : 0
unvmcap : 0
rpmbs : 0
edstt : 0
dsto : 0
fwug : 4
kas : 0
hctma : 0x1
mntmt : 273
mxtmt : 358
sanicap : 0
hmminds : 0
hmmaxd : 0
nsetidmax : 0
endgidmax : 0
anatt : 0
anacap : 0
anagrpmax : 0
nanagrpid : 0
pels : 0
domainid : 0
megcap : 0
sqes : 0x66
cqes : 0x44
maxcmd : 0
nn : 1
oncs : 0x15
fuses : 0
fna : 0x1
vwc : 0x1
awun : 0
awupf : 0
icsvscc : 0
nwpc : 0
acwu : 0
ocfs : 0
sgls : 0
mnan : 0
maxdna : 0
maxcna : 0
subnqn :
ioccsz : 0
iorcsz : 0
icdoff : 0
fcatt : 0
msdbd : 0
ofcs : 0
ps 0 : mp:6.00W operational enlat:0 exlat:0 rrt:0 rrl:0
rwt:0 rwl:0 idle_power:- active_power:-
active_power_workload:-
StarFive ~ # nvme id-ctrl /dev/nvme0 --vendor-specific
NVME Identify Controller:
vid : 0x126f
ssvid : 0x126f
sn : *************************************
mn : Patriot M.2 P300 256GB
fr : V0513A0
rab : 6
ieee : 000001
cmic : 0
mdts : 6
cntlid : 0x1
ver : 0x10300
rtd3r : 0x249f0
rtd3e : 0x13880
oaes : 0x200
ctratt : 0
rrls : 0
cntrltype : 0
fguid : 00000000-0000-0000-0000-000000000000
crdt1 : 0
crdt2 : 0
crdt3 : 0
nvmsr : 0
vwci : 0
mec : 0
oacs : 0x7
acl : 4
aerl : 7
frmw : 0x12
lpa : 0x3
elpe : 63
npss : 0
avscc : 0
apsta : 0
wctemp : 356
cctemp : 358
mtfa : 100
hmpre : 16384
hmmin : 8192
tnvmcap : 0
unvmcap : 0
rpmbs : 0
edstt : 0
dsto : 0
fwug : 4
kas : 0
hctma : 0x1
mntmt : 273
mxtmt : 358
sanicap : 0
hmminds : 0
hmmaxd : 0
nsetidmax : 0
endgidmax : 0
anatt : 0
anacap : 0
anagrpmax : 0
nanagrpid : 0
pels : 0
domainid : 0
megcap : 0
sqes : 0x66
cqes : 0x44
maxcmd : 0
nn : 1
oncs : 0x15
fuses : 0
fna : 0x1
vwc : 0x1
awun : 0
awupf : 0
icsvscc : 0
nwpc : 0
acwu : 0
ocfs : 0
sgls : 0
mnan : 0
maxdna : 0
maxcna : 0
subnqn :
ioccsz : 0
iorcsz : 0
icdoff : 0
fcatt : 0
msdbd : 0
ofcs : 0
ps 0 : mp:6.00W operational enlat:0 exlat:0 rrt:0 rrl:0
rwt:0 rwl:0 idle_power:- active_power:-
active_power_workload:-
vs[]:
0 1 2 3 4 5 6 7 8 9 a b c d e f
0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
00a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
00b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 "................"
00d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
00e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
00f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
01a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
01b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
01c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
01d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
01e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
01f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0260: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0270: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
02a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
02b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
02c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
02d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
02e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
02f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0390: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
03a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
03b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
03c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
03d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
03e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
03f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
StarFive ~ #
mzs
April 15, 2023, 8:54pm
16
If I were you I would remove the serial number but leave the rest. With enough data points, something in there might help track down and identify the cause of the timeouts. It looks like your firmware revision is “V0513A0”.
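For collecting data points from several boards, the interesting fields can be pulled out of the `nvme id-ctrl` text output with a few lines of parsing. This is a sketch only: `parse_id_ctrl` is an invented name, and the sample holds four lines copied from the output above. If I read the NVMe specification correctly, `npss` is zero-based, so `npss : 0` means a single power state, and `apsta : 0` means autonomous power state transitions are not supported. That would suggest `nvme_core.default_ps_max_latency_us` has little to act on with this drive, but please treat that as an assumption.

```python
import re

# Four lines copied from the id-ctrl output posted above.
SAMPLE = """\
mn : Patriot M.2 P300 256GB
fr : V0513A0
npss : 0
apsta : 0
"""

FIELD = re.compile(r"\s*([^:]+?)\s*:\s*(.*)$")


def parse_id_ctrl(text):
    """Parse `key : value` lines into a dict, keeping the first occurrence of each key."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields.setdefault(match.group(1), match.group(2).strip())
    return fields


if __name__ == "__main__":
    info = parse_id_ctrl(SAMPLE)
    print("model:", info["mn"])
    print("firmware:", info["fr"])
    print("power states:", int(info["npss"]) + 1)
    print("APST supported:", info["apsta"] != "0")
```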
1 Like
andrew
April 15, 2023, 9:10pm
18
Serial Number has been removed
2 Likes
mzs
April 15, 2023, 9:12pm
19
I found a second brand of NVMe (SSD M.2 NVMe Aoluska Gen 3.0 x4 2400Mb/s Leitura 256GB) that shares the same firmware revision. It uses a Silicon Motion SM2263 NVMe SSD controller, which I am guessing is where the firmware runs.
So either:
| Product | Host | Standards | Flash Interface | ECC Support | Flash VCCQ Support | DRAM | TCG/AES | Package |
|---|---|---|---|---|---|---|---|---|
| SM2263EN | PCIe Gen3 x4 | NVMe 1.3 | 4-CH | Configurable LDPC ECC | 1.8V/1.2V | Yes | Yes | TFBGA288 (12 x 12mm) |
| SM2263XT | PCIe Gen3 x4 | NVMe 1.3 | 4-CH | Configurable LDPC ECC | 1.8V/1.2V | -- | Yes | TFBGA288 (12 x 12mm) |
My guess would be the SM2263XT, since Patriot Memory does not mention any DRAM in its marketing.
I checked the Silicon Motion website and there is no sign of a later firmware (“site:siliconmotion.com firmware”).
I also checked the Patriot Memory website and they have none either (“site:patriotmemory.com firmware”).
Aoluska does not appear to have a website.
The PCIe vendor ID 0x126f is allocated to Silicon Motion, Inc., which corroborates that they are the manufacturer of the controller chip.
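The same IDs can be read without nvme-cli, since sysfs exposes `vendor` and `device` files for every PCI device. A small sketch follows; `read_pci_ids` is a hypothetical helper, and the device path uses the PCI address `0001:01:00.0` from the dmesg output earlier in this thread, which will differ on other systems.

```python
from pathlib import Path


def read_pci_ids(device_dir):
    """Return (vendor, device) as integers from a sysfs PCI device directory."""
    base = Path(device_dir)
    vendor = int((base / "vendor").read_text().strip(), 16)
    device = int((base / "device").read_text().strip(), 16)
    return vendor, device


if __name__ == "__main__":
    # PCI address taken from the dmesg output in this thread; adjust for your board.
    path = Path("/sys/bus/pci/devices/0001:01:00.0")
    if path.exists():
        vendor, device = read_pci_ids(path)
        print(f"vendor 0x{vendor:04x}, device 0x{device:04x}")
    else:
        print(f"{path} not found on this system")
```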
1 Like
andrew
April 16, 2023, 7:47am
20
Yes it has a Silicon Motion SSD Controller according to hwinfo
NVME 00.0: 10600 Disk
[Created at block.255]
Unique ID: GP4z.dfVB1eXouQ4
Parent ID: xKWB._aNoHWEPua6
SysFS ID: /class/block/nvme0n1
SysFS BusID: nvme0
SysFS Device Link: /devices/platform/soc/2c000000.pcie/pci0001:00/0001:00:00.0/0001:01:00.0/nvme/nvme0
Hardware Class: disk
Model: "Silicon Motion SM2263EN/SM2263XT SSD Controller"
Vendor: pci 0x126f "Silicon Motion, Inc."
Device: pci 0x2263 "SM2263EN/SM2263XT SSD Controller"
SubVendor: pci 0x126f "Silicon Motion, Inc."
SubDevice: pci 0x2263
Serial ID: "P300ABBB22111823091"
Driver: "nvme"
Driver Modules: "nvme"
Device File: /dev/nvme0n1
Device Number: block 259:0
Geometry (Logical): CHS 244198/64/32
Size: 500118192 sectors a 512 bytes
Capacity: 238 GB (256060514304 bytes)
Config Status: cfg=new, avail=yes, need=no, active=unknown
Attached to: #10 (Non-Volatile memory controller)
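The “238 GB” reported by hwinfo and the “256.06 GB” shown by `nvme list` earlier in the thread describe the same drive. The difference looks like binary versus decimal units, which the arithmetic below illustrates using the sector count and sector size from the hwinfo output. That hwinfo rounds down to whole GiB is my assumption.

```python
# Values copied from the hwinfo output above.
SECTORS = 500_118_192
SECTOR_BYTES = 512

size_bytes = SECTORS * SECTOR_BYTES
decimal_gb = size_bytes / 10**9   # gigabytes, powers of ten
binary_gib = size_bytes / 2**30   # gibibytes, powers of two

if __name__ == "__main__":
    print(f"{size_bytes} bytes")
    print(f"{decimal_gb:.2f} GB (decimal)")
    print(f"{binary_gib:.2f} GiB (binary)")
```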
2 Likes
andrew
April 23, 2023, 2:55pm
21
I have not had a chance to try it but this patch may help.
4 Likes