powerpc/pseries/msi: Avoid reading PCI device registers in reduced power states

[PATCH] powerpc/pseries/msi: Avoid reading PCI device registers in reduced power states

Posted by Gautam Menghani 11 months, 1 week ago

When a system is being suspended to RAM, the PCI devices are also
suspended and the PPC code ends up calling pseries_msi_compose_msg() and
this triggers the BUG_ON() in __pci_read_msi_msg() because the device at
this point is in reduced power state. In reduced power state, the memory
mapped registers of the PCI device are not accessible.

To replicate the bug:
1. Make sure deep sleep is selected
	# cat /sys/power/mem_sleep
	s2idle [deep]

2. Make sure console is not suspended (so that dmesg logs are visible)
	echo N > /sys/module/printk/parameters/console_suspend

3. Suspend the system
	echo mem > /sys/power/state

To fix this behaviour, read the cached msi message of the device when the
device is not in PCI_D0 power state instead of touching the hardware.

Fixes: a5f3d2c17b07 ("powerpc/pseries/pci: Add MSI domains")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/msi.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index fdc2f7f38dc9..458d95c8c755 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -525,7 +525,12 @@ static struct msi_domain_info pseries_msi_domain_info = {
 
 static void pseries_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
-	__pci_read_msi_msg(irq_data_get_msi_desc(data), msg);
+	struct pci_dev *dev = msi_desc_to_pci_dev(irq_data_get_msi_desc(data));
+
+	if (dev->current_state == PCI_D0)
+		__pci_read_msi_msg(irq_data_get_msi_desc(data), msg);
+	else
+		get_cached_msi_msg(data->irq, msg);
 }
 
 static struct irq_chip pseries_msi_irq_chip = {
-- 
2.47.0
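
For readers following the thread, the decision made by the patch can be
modelled in user space. The sketch below is not kernel code: every type and
function name in it (model_dev, model_read_hw_msg, model_compose_msg and so
on) is a hypothetical stand-in, and the assert() only imitates the
BUG_ON() that __pci_read_msi_msg() hits when the device is not in PCI_D0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel types used by the patch. */
enum model_power_state { MODEL_D0 = 0, MODEL_D3HOT = 3 };

struct model_msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

struct model_dev {
	enum model_power_state current_state;
	struct model_msi_msg hw_regs;	/* what a register read returns in D0 */
	struct model_msi_msg cached;	/* last message saved by software */
};

/*
 * Imitates the constraint behind the BUG_ON(): the device registers may
 * only be read while the device is in D0.
 */
static void model_read_hw_msg(const struct model_dev *dev,
			      struct model_msi_msg *msg)
{
	assert(dev->current_state == MODEL_D0);
	*msg = dev->hw_regs;
}

/* Same branch as the patched compose callback: hardware in D0, cache otherwise. */
static void model_compose_msg(const struct model_dev *dev,
			      struct model_msi_msg *msg)
{
	if (dev->current_state == MODEL_D0)
		model_read_hw_msg(dev, msg);
	else
		*msg = dev->cached;
}
```

With a device modelled in MODEL_D3HOT, model_compose_msg() returns the cached
copy and never reaches the assert(); in MODEL_D0 it reads the modelled
registers, as the unpatched code always did.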

Re: [PATCH] powerpc/pseries/msi: Avoid reading PCI device registers in reduced power states

Posted by Madhavan Srinivasan 9 months, 1 week ago

On Wed, 05 Mar 2025 14:32:36 +0530, Gautam Menghani wrote:
> When a system is being suspended to RAM, the PCI devices are also
> suspended and the PPC code ends up calling pseries_msi_compose_msg() and
> this triggers the BUG_ON() in __pci_read_msi_msg() because the device at
> this point is in reduced power state. In reduced power state, the memory
> mapped registers of the PCI device are not accessible.
> 
> To replicate the bug:
> 1. Make sure deep sleep is selected
> 	# cat /sys/power/mem_sleep
> 	s2idle [deep]
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/pseries/msi: Avoid reading PCI device registers in reduced power states
      https://git.kernel.org/powerpc/c/9cc0eafd28c7faef300822992bb08d79cab2a36c

Thanks

Re: [PATCH] powerpc/pseries/msi: Avoid reading PCI device registers in reduced power states

Posted by Vaibhav Jain 11 months ago

Gautam Menghani <gautam@linux.ibm.com> writes:

> When a system is being suspended to RAM, the PCI devices are also
> suspended and the PPC code ends up calling pseries_msi_compose_msg() and
> this triggers the BUG_ON() in __pci_read_msi_msg() because the device at
> this point is in reduced power state. In reduced power state, the memory
> mapped registers of the PCI device are not accessible.
>
> To replicate the bug:
> 1. Make sure deep sleep is selected
> 	# cat /sys/power/mem_sleep
> 	s2idle [deep]
>
> 2. Make sure console is not suspended (so that dmesg logs are visible)
> 	echo N > /sys/module/printk/parameters/console_suspend
>
> 3. Suspend the system
> 	echo mem > /sys/power/state
>
> To fix this behaviour, read the cached msi message of the device when the
> device is not in PCI_D0 power state instead of touching the hardware.
>
> Fixes: a5f3d2c17b07 ("powerpc/pseries/pci: Add MSI domains")
> Cc: stable@vger.kernel.org # v5.15+
> Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
LGTM. Hence
Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com>

-- 
Cheers
~ Vaibhav

Re: [PATCH] powerpc/pseries/msi: Avoid reading PCI device registers in reduced power states

Posted by Venkat Rao Bagalkote 10 months, 2 weeks ago

On 10/03/25 10:30 am, Vaibhav Jain wrote:
> Gautam Menghani <gautam@linux.ibm.com> writes:
>
>> When a system is being suspended to RAM, the PCI devices are also
>> suspended and the PPC code ends up calling pseries_msi_compose_msg() and
>> this triggers the BUG_ON() in __pci_read_msi_msg() because the device at
>> this point is in reduced power state. In reduced power state, the memory
>> mapped registers of the PCI device are not accessible.
>>
>> To replicate the bug:
>> 1. Make sure deep sleep is selected
>> 	# cat /sys/power/mem_sleep
>> 	s2idle [deep]
>>
>> 2. Make sure console is not suspended (so that dmesg logs are visible)
>> 	echo N > /sys/module/printk/parameters/console_suspend
>>
>> 3. Suspend the system
>> 	echo mem > /sys/power/state
>>
>> To fix this behaviour, read the cached msi message of the device when the
>> device is not in PCI_D0 power state instead of touching the hardware.
>>
>> Fixes: a5f3d2c17b07 ("powerpc/pseries/pci: Add MSI domains")
>> Cc: stable@vger.kernel.org # v5.15+
>> Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
I am able to reproduce this issue without this patch. With this patch,
there is no BUG_ON() in __pci_read_msi_msg(), but I did see kernel
warnings. I am not sure whether they are a side effect of this patch or
a separate issue.

Without this patch:

[ 96.888399] ------------[ cut here ]------------
[ 96.888402] kernel BUG at drivers/pci/msi/msi.c:158!
[ 96.888407] Oops: Exception in kernel mode, sig: 5 [#1]
[ 96.888410] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=8192 NUMA pSeries
[ 96.888414] Modules linked in: nft_compat nf_tables nfnetlink bonding tls rfkill binfmt_misc kmem device_dax pseries_rng vmx_crypto dax_pmem drm drm_panel_orientation_quirks xfs dm_service_time sd_mod sg nd_pmem ibmvfc nd_btt ibmvscsi scsi_transport_fc ibmveth scsi_transport_srp papr_scm libnvdimm tg3 dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
[ 96.888473] CPU: 14 UID: 0 PID: 89 Comm: migration/14 Kdump: loaded Not tainted 6.14.0-auto #3
[ 96.888479] Hardware name: IBM,9009-42A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW950.A0 (VL950_141) hv:phyp pSeries
[ 96.888481] Stopper: multi_cpu_stop+0x0/0x22c <- __stop_cpus.constprop.0+0x68/0xc0
[ 96.888494] NIP: c000000000995aec LR: c00000000010ec20 CTR: c00000000010ebf8
[ 96.888498] REGS: c00000000680f830 TRAP: 0700 Not tainted (6.14.0-auto)
[ 96.888501] MSR: 8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 44004208 XER: 00000000
[ 96.888520] CFAR: c00000000010ec1c IRQMASK: 3
GPR00: c00000000010ec20 c00000000680fad0 c000000001668100 c000000006c537e0
GPR04: c00000000680fb80 0000000000000000 0000000000000000 c009ffffff8325f0
GPR08: 0000000000000001 0000000000000001 0000000000000003 0000000000001003
GPR12: c00000000010ebf8 c00000000f7beb00 c0000000001acbe0 c000000004056d40
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000003 000000000000001d c000000002cfaa88
GPR24: c00000006545b800 c000000002cfc080 c000000001126bb0 0000000000000000
GPR28: 0000000000000010 c00000000ce790c8 c00000000680fb80 c000000006c537e0
[ 96.888586] NIP [c000000000995aec] __pci_read_msi_msg+0x48/0x278
[ 96.888592] LR [c00000000010ec20] pseries_msi_compose_msg+0x28/0x3c
[ 96.888599] Call Trace:
[ 96.888600] [c00000000680fad0] [000000000000001d] 0x1d (unreliable)
[ 96.888608] [c00000000680fb20] [c00000006545b820] 0xc00000006545b820
[ 96.888613] [c00000000680fb40] [c00000000023b41c] irq_chip_compose_msi_msg+0x5c/0x90
[ 96.888620] [c00000000680fb60] [c000000000242aec] msi_domain_set_affinity+0xb8/0xf4
[ 96.888627] [c00000000680fbb0] [c000000000234634] irq_do_set_affinity+0x14c/0x25c
[ 96.888633] [c00000000680fc10] [c000000000234870] irq_set_affinity_locked+0x12c/0x1c4
[ 96.888639] [c00000000680fc60] [c000000000234a84] irq_set_affinity+0x64/0xa0
[ 96.888644] [c00000000680fca0] [c0000000000c9d40] xics_migrate_irqs_away+0x27c/0x30c
[ 96.888650] [c00000000680fd60] [c000000000111834] pseries_cpu_disable+0xc8/0xf0
[ 96.888657] [c00000000680fd90] [c0000000000611e0] __cpu_disable+0x54/0xb0
[ 96.888662] [c00000000680fdc0] [c0000000001715e8] take_cpu_down+0x4c/0xcc
[ 96.888669] [c00000000680fe10] [c0000000002ebbc4] multi_cpu_stop+0xd8/0x22c
[ 96.888676] [c00000000680fe80] [c0000000002eb898] cpu_stopper_thread+0x158/0x24c
[ 96.888683] [c00000000680ff30] [c0000000001b7a0c] smpboot_thread_fn+0x1ec/0x25c
[ 96.888691] [c00000000680ff90] [c0000000001acd04] kthread+0x12c/0x14c
[ 96.888697] [c00000000680ffe0] [c00000000000df98] start_kernel_thread+0x14/0x18
[ 96.888703] Code: fba1ffe8 39200001 f821ffb1 7c7f1b78 7c9e2378 e94d0c78 f9410028 39400000 eba30008 815dffd8 2c0a0000 7d20489e <0b090000> a123004c 712a0001 41820168
[ 96.888730] ---[ end trace 0000000000000000 ]---

With this patch: no system crash was observed, but the warnings below were seen.

[ 99.450644] ------------[ cut here ]------------
[ 99.450648] WARNING: CPU: 0 PID: 17 at arch/powerpc/sysdev/xics/icp-hv.c:55 icp_hv_eoi+0xc4/0x120
[ 99.450659] Modules linked in: nft_compat nf_tables nfnetlink bonding tls rfkill binfmt_misc kmem device_dax pseries_rng vmx_crypto dax_pmem drm drm_panel_orientation_quirks xfs dm_service_time sd_mod sg nd_pmem ibmvfc nd_btt ibmvscsi scsi_transport_fc ibmveth scsi_transport_srp papr_scm libnvdimm tg3 dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
[ 99.450704] CPU: 0 UID: 0 PID: 17 Comm: ksoftirqd/0 Kdump: loaded Not tainted 6.14.0-auto-00001-g03419579f433 #4
[ 99.450712] Hardware name: IBM,9009-42A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW950.A0 (VL950_141) hv:phyp pSeries
[ 99.450717] NIP: c0000000000cadd4 LR: c0000000000cadd0 CTR: 00000000007088ec
[ 99.450722] REGS: c000000004a2fa20 TRAP: 0700 Not tainted (6.14.0-auto-00001-g03419579f433)
[ 99.450727] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 2804424f XER: 00000010
[ 99.450743] CFAR: c000000000224da8 IRQMASK: 1
GPR00: c0000000000cadd0 c000000004a2fcc0 c000000001668100 000000000000003f
GPR04: c0000007fd907c88 c0000007fd916000 c000000004a2fb08 00000007fb6a0000
GPR08: 0000000000000027 0000000000000000 0000000000000000 0000000000000001
GPR12: c000000002a37d48 c000000003000000 c0000000001acc60 c000000004052080
GPR16: 0000000000000006 0000000000000040 0000000000000006 0000000000000100
GPR20: 0000000004208040 0000000000000000 0000000000000001 c0000000002382c0
GPR24: 0000000000000001 0000000000000000 0000000000000006 0000000000000002
GPR28: c0000007fd9078b8 0000000000000000 c0000000010e69e8 00000000050a0002
[ 99.450802] NIP [c0000000000cadd4] icp_hv_eoi+0xc4/0x120
[ 99.450808] LR [c0000000000cadd0] icp_hv_eoi+0xc0/0x120
[ 99.450814] Call Trace:
[ 99.450816] [c000000004a2fcc0] [c0000000000cadd0] icp_hv_eoi+0xc0/0x120 (unreliable)
[ 99.450824] [c000000004a2fd30] [c000000000239eac] handle_fasteoi_irq+0x16c/0x344
[ 99.450832] [c000000004a2fd70] [c000000000238380] resend_irqs+0xc0/0x188
[ 99.450838] [c000000004a2fdb0] [c00000000017b054] tasklet_action_common+0x154/0x418
[ 99.450845] [c000000004a2fe20] [c00000000017a788] handle_softirqs+0x148/0x3b4
[ 99.450852] [c000000004a2ff10] [c00000000017aa58] run_ksoftirqd+0x64/0xa0
[ 99.450858] [c000000004a2ff30] [c0000000001b7a8c] smpboot_thread_fn+0x1ec/0x25c
[ 99.450866] [c000000004a2ff90] [c0000000001acd84] kthread+0x12c/0x14c
[ 99.450873] [c000000004a2ffe0] [c00000000000df98] start_kernel_thread+0x14/0x18
[ 99.450879] Code: ebe1fff8 7c0803a6 4e800020 3c82ffa8 3c62ffdc 7fc5f378 3fc2ffa8 3884e908 38632268 3bdee8e8 48159f95 60000000 <0fe00000> 7bff4622 38600068 7fe4fb78
[ 99.450900] ---[ end trace 0000000000000000 ]---

Please add below tag:

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>

Regards,

Venkat.

> LGTM. Hence
> Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com>
>

Re: [PATCH] powerpc/pseries/msi: Avoid reading PCI device registers in reduced power states

Posted by Gautam Menghani 9 months, 3 weeks ago

Hi Venkat,

Thanks for the report. I looked into this and found that the new warning
you reported can be observed even on current distro kernels, and is not
caused by the patch I've posted.

I was able to observe the same warning with fedora distro kernel 6.13.7-200.fc41

[   70.294478] icp_hv_set_xirr: bad return code eoi xirr=0x50a0002 returned -4
[   70.294521] ------------[ cut here ]------------
[   70.294546] WARNING: CPU: 7 PID: 54 at arch/powerpc/sysdev/xics/icp-hv.c:55 icp_hv_eoi+0xf8/0x120
[   70.294599] Modules linked in: xt_conntrack xt_MASQUERADE bridge stp llc ip6table_nat ip6table_filter ip6_tables xt_set ip_set iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables kvm rpcrdma rdma_cm iw_cm ib_cm ib_core bonding overlay rfkill binfmt_misc vmx_crypto pseries_rng nfsd auth_rpcgss nfs_acl loop dm_multipath lockd grace nfs_localio nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vsock xfs nvme_tcp nvme_fabrics nvme_keyring nvme_core nvme_auth ibmvscsi ibmveth scsi_transport_srp crct10dif_vpmsum crc32c_vpmsum pseries_wdt sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_dh_rdac scsi_dh_emc scsi_dh_alua fuse aes_gcm_p10_crypto crypto_simd cryptd
[   70.295015] CPU: 7 UID: 0 PID: 54 Comm: ksoftirqd/7 Kdump: loaded Not tainted 6.13.7-200.fc41.ppc64le #1
[   70.295064] Hardware name: IBM,9080-HEX POWER8 (architected) 0x800200 0xf000004 of:IBM,FW1060.00 (NH1060_022) hv:phyp pSeries
[   70.295120] NIP:  c000000000197c98 LR: c000000000197c94 CTR: 0000000000000000
[   70.295157] REGS: c000000007dd3a20 TRAP: 0700   Not tainted  (6.13.7-200.fc41.ppc64le)
[   70.295197] MSR:  8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 24004202  XER: 00000001
[   70.295247] CFAR: c00000000032731c IRQMASK: 1
[   70.295247] GPR00: c000000000197c94 c000000007dd3cc0 c0000000024daa00 000000000000003f
[   70.295247] GPR04: 00000000ffff7fff 00000000ffff7fff c000000007dd3ae8 00000007ec8e0000
[   70.295247] GPR08: 0000000000000027 0000000000000000 0000000000000000 0000000000004000
[   70.295247] GPR12: 0000000000000000 c00000000ffc6f00 c000000000287ef8 c000000004a51080
[   70.295247] GPR16: 0000000000000000 0000000004208040 c000000003d62c80 c0000000031faf80
[   70.295247] GPR20: 00000000ffffa63b 000000000000000a c0000000031e6990 c000000000335f10
[   70.295247] GPR24: 0000000000000001 0000000000000000 0000000000000006 0000000000000002
[   70.295247] GPR28: c0000007efac68b8 0000000000000000 00000000050a0002 00000000050a0002
[   70.295603] NIP [c000000000197c98] icp_hv_eoi+0xf8/0x120
[   70.295633] LR [c000000000197c94] icp_hv_eoi+0xf4/0x120
[   70.295661] Call Trace:
[   70.295675] [c000000007dd3cc0] [c000000000197c94] icp_hv_eoi+0xf4/0x120 (unreliable)
[   70.295717] [c000000007dd3d40] [c000000000337a5c] handle_fasteoi_irq+0x16c/0x350
[   70.295757] [c000000007dd3d70] [c000000000335fd0] resend_irqs+0xc0/0x190
[   70.295793] [c000000007dd3db0] [c000000000254064] tasklet_action_common+0x154/0x440
[   70.295833] [c000000007dd3e20] [c000000000253458] handle_softirqs+0x168/0x4f0
[   70.295871] [c000000007dd3f10] [c000000000253848] run_ksoftirqd+0x68/0xb0
[   70.295912] [c000000007dd3f30] [c000000000292f20] smpboot_thread_fn+0x1d0/0x240
[   70.295951] [c000000007dd3f90] [c000000000288020] kthread+0x130/0x140
[   70.295984] [c000000007dd3fe0] [c00000000000ded8] start_kernel_thread+0x14/0x18
[   70.296022] Code: 48c84251 60000000 e9210068 4bffff98 7c661b78 3c82ff31 3c62ff7d 7fc5f378 38842b40 38639bf8 4818f649 60000000 <0fe00000> 38210080 7be34622 e8010010
[   70.296104] ---[ end trace 0000000000000000 ]---
[   70.297273] PM: resume devices took 0.000 seconds
[   70.297415] OOM killer enabled.
[   70.297433] Restarting tasks ... done.
[   70.298959] random: crng reseeded on system resumption
[   70.299106] PM: suspend exit


This can be tracked as a separate bug, as it is unrelated to the patch.

Thanks,
Gautam