Commit a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all
non-x86") was bisected to break various non-x86 RISC Unix systems,
e.g. sparc64, see two example oopses below. Fix by only allowing D3Hot
on modern ARM64, PPC64 and RISCV ISAs besides new enough x86.
Sun Blade 1000:
ERROR(0): Cheetah error trap taken afsr[0010080005000000] afar[000007f900800000] TL1(0)
ERROR(0): TPC[100a05a4] TNPC[100a05a8] O7[42acc8] TSTATE[4411001603]
ERROR(0):
TPC<MakeIocReady+0xc/0x278 [mptbase]>
ERROR(0): M_SYND(0), E_SYND(0), Privileged
ERROR(0): Highest priority error (0000080000000000) "Bus error response from system bus"
ERROR(0): D-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000]
ERROR(0): D-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[0000000000000000]
ERROR(0): I-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000] u[0000000000000000] l[0000000000000000]
ERROR(0): I-cache INSN0[0000000000000000] INSN1[0000000000000000] INSN2[0000000000000000] INSN3[0000000000000000]
ERROR(0): I-cache INSN4[0000000000000000] INSN5[0000000000000000] INSN6[0000000000000000] INSN7[0000000000000000]
ERROR(0): E-cache idx[b08040] tag[000000001e008fa0]
ERROR(0): E-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[ffffffffffffffff]
Kernel panic - not syncing: Irrecoverable deferred error trap.
CPU: 0 UID: 0 PID: 46 Comm: (udev-worker) Not tainted 6.14.0-rc1-00001-ga5fb3ff63287 #18
Call Trace:
[<00000000004294b0>] panic+0xf0/0x370
[<0000000000435bc4>] cheetah_deferred_handler+0x2c8/0x2d8
[<0000000000405e88>] c_deferred+0x18/0x24
[<00000000100a05a4>] MakeIocReady+0xc/0x278 [mptbase]
[<00000000100a089c>] mpt_do_ioc_recovery+0x8c/0x1054 [mptbase]
[<000000001009f2d4>] mpt_attach+0x920/0xa68 [mptbase]
[<000000001012424c>] mptsas_probe+0x8/0x3e8 [mptsas]
[<0000000000788308>] local_pci_probe+0x24/0x70
[<0000000000788dac>] pci_device_probe+0x1c0/0x1d0
[<000000000082633c>] really_probe+0x13c/0x29c
[<0000000000826590>] __driver_probe_device+0xf4/0x104
[<0000000000826614>] driver_probe_device+0x24/0xa0
[<000000000082683c>] __driver_attach+0xe8/0x104
[<0000000000824da0>] bus_for_each_dev+0x58/0x84
[<0000000000825508>] bus_add_driver+0xdc/0x1f8
[<0000000000827110>] driver_register+0x70/0x120
Niagara T1:
mptsas 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
NON-RESUMABLE ERROR: Reporting on cpu 31
NON-RESUMABLE ERROR: TPC [0x0000000010184034] <MakeIocReady+0x10/0x298 [mptbase]>
NON-RESUMABLE ERROR: RAW [1f10000000000007:0000000e3179235c:0000000202000004:000000ea00300000
NON-RESUMABLE ERROR: 00000000001f0000:0000000000000000:0000000000000000:0000000000000000]
NON-RESUMABLE ERROR: handle [0x1f10000000000007] stick [0x0000000e3179235c]
NON-RESUMABLE ERROR: type [precise nonresumable]
NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted priv >
NON-RESUMABLE ERROR: raddr [0x000000ea00300000]
Kernel panic - not syncing: Non-resumable error.
CPU: 31 UID: 0 PID: 367 Comm: (udev-worker) Not tainted 6.16.12+3-sparc64-smp #1 NONE Debian 6.16.12-2+sparc64.1
Call Trace:
[<00000000004373c4>] dump_stack+0x8/0x18
[<0000000000429540>] panic+0xf4/0x398
[<000000000043afcc>] sun4v_nonresum_error+0x16c/0x240
[<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8
[<0000000010184034>] MakeIocReady+0x10/0x298 [mptbase]
[<00000000101844b4>] mpt_do_ioc_recovery+0x9c/0x1110 [mptbase]
[<00000000101836f8>] mpt_attach+0xb58/0xd20 [mptbase]
[<0000000010287f30>] mptsas_probe+0x10/0x440 [mptsas]
[<0000000000b3fab0>] local_pci_probe+0x30/0x80
[<0000000000b405d4>] pci_device_probe+0xb4/0x240
[<0000000000bfd348>] really_probe+0xc8/0x400
[<0000000000bfd70c>] __driver_probe_device+0x8c/0x160
[<0000000000bfd8c8>] driver_probe_device+0x28/0x100
[<0000000000bfdb7c>] __driver_attach+0xbc/0x1e0
[<0000000000bfacfc>] bus_for_each_dev+0x5c/0xc0
[<0000000000bfcafc>] driver_attach+0x1c/0x40
Press Stop-A (L1-A) from sun keyboard or send break
twice on console to return to the boot prom
Fixes: a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all non-x86")
Signed-off-by: René Rebe <rene@exactco.de>
---
Tested on Sun Blade 1000, and shipping in all T2/Linux builds since 2025-08-01
---
drivers/pci/pci.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b14dd064006c..7619d2cfa66d 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3033,9 +3033,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
/*
* Out of caution, we only allow PCIe ports from 2015 or newer
- * into D3 on x86.
+ * into D3 or other modern ISAs only.
*/
- if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015)
+ if (IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_PPC64) || IS_ENABLED(CONFIG_RISCV) || dmi_get_bios_year() >= 2015)
return true;
break;
}
--
2.52.0
--
René Rebe, ExactCODE GmbH, Berlin, Germany
https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
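
[Editorial sketch, not part of the patch.] To make the effect of the one-line change easier to see, the two conditions can be modelled in user space. The enum and function names below are hypothetical, and the model assumes that a machine without DMI reports no usable BIOS year (represented here as 0), which is an assumption about the affected sparc64 systems rather than something stated in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's compile-time architecture choice. */
enum arch { ARCH_X86, ARCH_ARM64, ARCH_PPC64, ARCH_RISCV, ARCH_SPARC64 };

/* Condition before this patch:
 * !IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015
 */
static bool d3_allowed_before(enum arch a, int bios_year)
{
	return a != ARCH_X86 || bios_year >= 2015;
}

/* Condition with this patch applied: an explicit list of architectures,
 * or a BIOS year of 2015 or newer.
 */
static bool d3_allowed_after(enum arch a, int bios_year)
{
	return a == ARCH_ARM64 || a == ARCH_PPC64 || a == ARCH_RISCV ||
	       bios_year >= 2015;
}
```

Under these assumptions, d3_allowed_before(ARCH_SPARC64, 0) is true while d3_allowed_after(ARCH_SPARC64, 0) is false, and the x86 behaviour is unchanged.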
On Tue, Dec 02, 2025 at 05:40:07PM +0100, René Rebe wrote:
> Commit a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all
> non-x86") was bisected to break various non-x86 RISC Unix systems,
> e.g. sparc64, see two example oopses below.
[...]
> Sun Blade 1000:
> ERROR(0): Cheetah error trap taken afsr[0010080005000000] afar[000007f900800000] TL1(0)
> ERROR(0): TPC[100a05a4] TNPC[100a05a8] O7[42acc8] TSTATE[4411001603]
> ERROR(0):
> TPC<MakeIocReady+0xc/0x278 [mptbase]>
> ERROR(0): M_SYND(0), E_SYND(0), Privileged
> ERROR(0): Highest priority error (0000080000000000) "Bus error response from system bus"
> ERROR(0): D-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000]
> ERROR(0): D-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[0000000000000000]
> ERROR(0): I-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000] u[0000000000000000] l[0000000000000000]
> ERROR(0): I-cache INSN0[0000000000000000] INSN1[0000000000000000] INSN2[0000000000000000] INSN3[0000000000000000]
> ERROR(0): I-cache INSN4[0000000000000000] INSN5[0000000000000000] INSN6[0000000000000000] INSN7[0000000000000000]
> ERROR(0): E-cache idx[b08040] tag[000000001e008fa0]
> ERROR(0): E-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[ffffffffffffffff]
> Kernel panic - not syncing: Irrecoverable deferred error trap.
Some ARM PCIe host controllers are known to raise a Data Abort exception
upon a Completion Timeout (pcie-brcmstb.c is a case in point). It looks
like these SPARC CPUs behave similarly.
> CPU: 0 UID: 0 PID: 46 Comm: (udev-worker) Not tainted 6.14.0-rc1-00001-ga5fb3ff63287 #18
> Call Trace:
> [<00000000004294b0>] panic+0xf0/0x370
> [<0000000000435bc4>] cheetah_deferred_handler+0x2c8/0x2d8
> [<0000000000405e88>] c_deferred+0x18/0x24
> [<00000000100a05a4>] MakeIocReady+0xc/0x278 [mptbase]
> [<00000000100a089c>] mpt_do_ioc_recovery+0x8c/0x1054 [mptbase]
> [<000000001009f2d4>] mpt_attach+0x920/0xa68 [mptbase]
> [<000000001012424c>] mptsas_probe+0x8/0x3e8 [mptsas]
> [<0000000000788308>] local_pci_probe+0x24/0x70
> [<0000000000788dac>] pci_device_probe+0x1c0/0x1d0
> [<000000000082633c>] really_probe+0x13c/0x29c
> [<0000000000826590>] __driver_probe_device+0xf4/0x104
> [<0000000000826614>] driver_probe_device+0x24/0xa0
> [<000000000082683c>] __driver_attach+0xe8/0x104
> [<0000000000824da0>] bus_for_each_dev+0x58/0x84
> [<0000000000825508>] bus_add_driver+0xdc/0x1f8
> [<0000000000827110>] driver_register+0x70/0x120
I suspect this is a bug in the mpt3sas driver and/or scsi layer.
A runtime PM ref is held on the PCI Endpoint device when the
driver probes, so that ref must have been dropped. The Endpoint
(SCSI host controller) went into runtime suspend, which allowed the
Root Port to go to D3hot. When the Root Port is in D3hot,
MMIO to the attached Endpoint will cause Completion Timeouts.
(Config Space accesses will still work.)
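
[Editorial sketch.] The sequence described above can be illustrated with a toy user-space model. None of these names exist in the kernel; the real runtime PM core has idle callbacks, autosuspend and per-child accounting that are deliberately left out here. The only point is that once the last reference on the Endpoint is dropped, the port above it may leave D0 and MMIO reads stop completing.

```c
#include <stdbool.h>
#include <stdint.h>

enum toy_pstate { D0, D3HOT };

/* Toy Root Port: only tracks its power state. */
struct toy_port {
	enum toy_pstate state;
};

/* Toy Endpoint: usage_count stands in for the runtime PM usage counter. */
struct toy_endpoint {
	struct toy_port *parent;
	int usage_count;
	bool suspended;
	uint32_t doorbell;	/* pretend MMIO register */
};

/* Take a reference: resume the Endpoint and bring the port back to D0. */
static void toy_get(struct toy_endpoint *ep)
{
	ep->usage_count++;
	ep->suspended = false;
	ep->parent->state = D0;
}

/* Drop a reference: on the last one, suspend and let the port go to D3hot. */
static void toy_put(struct toy_endpoint *ep)
{
	if (ep->usage_count > 0 && --ep->usage_count == 0) {
		ep->suspended = true;
		ep->parent->state = D3HOT;
	}
}

/* MMIO read: returns false (modelling a Completion Timeout) unless the
 * port above the Endpoint is in D0.
 */
static bool toy_mmio_read(const struct toy_endpoint *ep, uint32_t *val)
{
	if (ep->parent->state != D0)
		return false;
	*val = ep->doorbell;
	return true;
}
```

In this model a read after an unbalanced toy_put() fails, which is the situation suspected during probe; on the SPARC machines in question the failure surfaces as a fatal trap rather than an error return.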
I'm not seeing any "pm_runtime" or "autopm" occurrences in
drivers/scsi/mpt3sas/, so perhaps the issue is in the scsi layer?
To track this down, you'd have to instrument calls to pm_runtime_put()
and friends with a printk to see where runtime PM refs are acquired
and dropped. Alternatively, enabling tracing may help; there are a few
tracepoints in the runtime PM code.
> mptsas 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
Maybe the Root Port or Endpoint need extra delays to resume to D0?
> CPU: 31 UID: 0 PID: 367 Comm: (udev-worker) Not tainted 6.16.12+3-sparc64-smp #1 NONE Debian 6.16.12-2+sparc64.1
> Call Trace:
> [<00000000004373c4>] dump_stack+0x8/0x18
> [<0000000000429540>] panic+0xf4/0x398
> [<000000000043afcc>] sun4v_nonresum_error+0x16c/0x240
> [<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8
> [<0000000010184034>] MakeIocReady+0x10/0x298 [mptbase]
> [<00000000101844b4>] mpt_do_ioc_recovery+0x9c/0x1110 [mptbase]
> [<00000000101836f8>] mpt_attach+0xb58/0xd20 [mptbase]
> [<0000000010287f30>] mptsas_probe+0x10/0x440 [mptsas]
> [<0000000000b3fab0>] local_pci_probe+0x30/0x80
> [<0000000000b405d4>] pci_device_probe+0xb4/0x240
> [<0000000000bfd348>] really_probe+0xc8/0x400
> [<0000000000bfd70c>] __driver_probe_device+0x8c/0x160
> [<0000000000bfd8c8>] driver_probe_device+0x28/0x100
> [<0000000000bfdb7c>] __driver_attach+0xbc/0x1e0
> [<0000000000bfacfc>] bus_for_each_dev+0x5c/0xc0
> [<0000000000bfcafc>] driver_attach+0x1c/0x40
Same stack trace, same bug I guess.
Thanks,
Lukas
[+cc Mani, Brian (a5fb3ff63287 authors), Rafael, Lukas, Mario]
On Tue, Dec 02, 2025 at 05:40:07PM +0100, René Rebe wrote:
> Commit a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all
> non-x86") was bisected to break various non-x86 RISC Unix systems,
> e.g. sparc64, see two example oopses below. Fix by only allowing D3Hot
> on modern ARM64, PPC64 and RISCV ISAs besides new enough x86.
I think we need some kind of analysis of what is happening to the PCI
devices here. I don't know why the CPU architecture per se would be
related to PCI power management.
pci_bridge_d3_possible() is already a barely maintainable hodge podge
of random things that work and don't work. Generally speaking most of
those cases relate to firmware.
> Sun Blade 1000:
> ERROR(0): Cheetah error trap taken afsr[0010080005000000] afar[000007f900800000] TL1(0)
> ERROR(0): TPC[100a05a4] TNPC[100a05a8] O7[42acc8] TSTATE[4411001603]
> ERROR(0):
> TPC<MakeIocReady+0xc/0x278 [mptbase]>
> ERROR(0): M_SYND(0), E_SYND(0), Privileged
> ERROR(0): Highest priority error (0000080000000000) "Bus error response from system bus"
> ERROR(0): D-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000]
> ERROR(0): D-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[0000000000000000]
> ERROR(0): I-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000] u[0000000000000000] l[0000000000000000]
> ERROR(0): I-cache INSN0[0000000000000000] INSN1[0000000000000000] INSN2[0000000000000000] INSN3[0000000000000000]
> ERROR(0): I-cache INSN4[0000000000000000] INSN5[0000000000000000] INSN6[0000000000000000] INSN7[0000000000000000]
> ERROR(0): E-cache idx[b08040] tag[000000001e008fa0]
> ERROR(0): E-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[ffffffffffffffff]
> Kernel panic - not syncing: Irrecoverable deferred error trap.
> CPU: 0 UID: 0 PID: 46 Comm: (udev-worker) Not tainted 6.14.0-rc1-00001-ga5fb3ff63287 #18
> Call Trace:
> [<00000000004294b0>] panic+0xf0/0x370
> [<0000000000435bc4>] cheetah_deferred_handler+0x2c8/0x2d8
> [<0000000000405e88>] c_deferred+0x18/0x24
> [<00000000100a05a4>] MakeIocReady+0xc/0x278 [mptbase]
I assume both of these crashes are related to the
CHIPREG_READ32(&ioc->chip->Doorbell) in mpt_GetIocState(), e.g., maybe
that PCI read failed because an upstream bridge was not in D0 and
therefore treated the read as an unsupported request.
> [<00000000100a089c>] mpt_do_ioc_recovery+0x8c/0x1054 [mptbase]
> [<000000001009f2d4>] mpt_attach+0x920/0xa68 [mptbase]
> [<000000001012424c>] mptsas_probe+0x8/0x3e8 [mptsas]
> [<0000000000788308>] local_pci_probe+0x24/0x70
> [<0000000000788dac>] pci_device_probe+0x1c0/0x1d0
> [<000000000082633c>] really_probe+0x13c/0x29c
> [<0000000000826590>] __driver_probe_device+0xf4/0x104
> [<0000000000826614>] driver_probe_device+0x24/0xa0
> [<000000000082683c>] __driver_attach+0xe8/0x104
> [<0000000000824da0>] bus_for_each_dev+0x58/0x84
> [<0000000000825508>] bus_add_driver+0xdc/0x1f8
> [<0000000000827110>] driver_register+0x70/0x120
>
> Niagara T1:
> mptsas 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
> NON-RESUMABLE ERROR: Reporting on cpu 31
> NON-RESUMABLE ERROR: TPC [0x0000000010184034] <MakeIocReady+0x10/0x298 [mptbase]>
> NON-RESUMABLE ERROR: RAW [1f10000000000007:0000000e3179235c:0000000202000004:000000ea00300000
> NON-RESUMABLE ERROR: 00000000001f0000:0000000000000000:0000000000000000:0000000000000000]
> NON-RESUMABLE ERROR: handle [0x1f10000000000007] stick [0x0000000e3179235c]
> NON-RESUMABLE ERROR: type [precise nonresumable]
> NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted priv >
> NON-RESUMABLE ERROR: raddr [0x000000ea00300000]
> Kernel panic - not syncing: Non-resumable error.
> CPU: 31 UID: 0 PID: 367 Comm: (udev-worker) Not tainted 6.16.12+3-sparc64-smp #1 NONE Debian 6.16.12-2+sparc64.1
> Call Trace:
> [<00000000004373c4>] dump_stack+0x8/0x18
> [<0000000000429540>] panic+0xf4/0x398
> [<000000000043afcc>] sun4v_nonresum_error+0x16c/0x240
> [<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8
> [<0000000010184034>] MakeIocReady+0x10/0x298 [mptbase]
> [<00000000101844b4>] mpt_do_ioc_recovery+0x9c/0x1110 [mptbase]
> [<00000000101836f8>] mpt_attach+0xb58/0xd20 [mptbase]
> [<0000000010287f30>] mptsas_probe+0x10/0x440 [mptsas]
> [<0000000000b3fab0>] local_pci_probe+0x30/0x80
> [<0000000000b405d4>] pci_device_probe+0xb4/0x240
> [<0000000000bfd348>] really_probe+0xc8/0x400
> [<0000000000bfd70c>] __driver_probe_device+0x8c/0x160
> [<0000000000bfd8c8>] driver_probe_device+0x28/0x100
> [<0000000000bfdb7c>] __driver_attach+0xbc/0x1e0
> [<0000000000bfacfc>] bus_for_each_dev+0x5c/0xc0
> [<0000000000bfcafc>] driver_attach+0x1c/0x40
> Press Stop-A (L1-A) from sun keyboard or send break
> twice on console to return to the boot prom
>
> Fixes: a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all non-x86")
> Signed-off-by: René Rebe <rene@exactco.de>
> ---
> Tested on Sun Blade 1000, and shipping in all T2/Linux builds since 2025-08-01
> ---
> drivers/pci/pci.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b14dd064006c..7619d2cfa66d 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3033,9 +3033,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
>
> /*
> * Out of caution, we only allow PCIe ports from 2015 or newer
> - * into D3 on x86.
> + * into D3 or other modern ISAs only.
> */
> - if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015)
> + if (IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_PPC64) || IS_ENABLED(CONFIG_RISCV) || dmi_get_bios_year() >= 2015)
> return true;
> break;
> }
> --
> 2.52.0
>
> --
> René Rebe, ExactCODE GmbH, Berlin, Germany
> https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
On Tue, Dec 02, 2025 at 11:28:37AM -0600, Bjorn Helgaas wrote:
> I think we need some kind of analysis of what is happening to the PCI
> devices here. I don't know why the CPU architecture per se would be
> related to PCI power management.

Agreed, and I think it will be very hard to ever gain any traction on
modernizing the PM stack here if we can't get any sort of "why?" answer
out of the systems that don't work.

The last time this came up, the answer was essentially:

https://lore.kernel.org/all/CAJZ5v0j_6jeMAQ7eFkZBe5Yi+USGzysxAgfemYh=-zq4h5W+Qg@mail.gmail.com/

  The DMI check at the end of pci_bridge_d3_possible() is really
  something to the effect of "there is no particular reason to prevent
  this bridge from going into D3, but try to avoid platforms where it
  may not work".

i.e., no specific reason, but a vague understanding that there is some
old HW that doesn't work. That's not very helpful for supporting non-DMI
systems that don't have a programmatic notion of "old."

OTOH, I sympathize with René, that it's hard to dig into what amounts to
new development on old platforms, and yet, they do remain broken.

> pci_bridge_d3_possible() is already a barely maintainable hodge podge
> of random things that work and don't work. Generally speaking most of
> those cases relate to firmware.

I wonder if we could take a different approach that helps straddle the
uncertain boundary here a bit:

1) be more aggressive at *permitting* runtime PM / D3 for bridges (i.e.,
if we think a bridge might be OK to go to D3, then manage its
get()/put() properly); and

2) be less aggressive about default-enabling runtime suspend / D3
(i.e., only call pm_runtime_allow() in drivers/pci/pcie/portdrv.c in
limited circumstances).

For #2, that would actually match the documentation:

Documentation/power/pci.rst

  The driver itself should not call pm_runtime_allow(), though. Instead,
  it should let user space or some platform-specific code do that (user
  space can do it via sysfs as stated above), but it must be prepared to
  handle the runtime PM of the device correctly as soon as
  pm_runtime_allow() is called (which may happen at any time, even
  before the driver is loaded).

So instead of portdrv.c calling pm_runtime_allow(), we'd leave that
decision to user space (i.e., udev or similar). That will help limit the
impact of getting #1 "wrong." And it's possible the bad systems didn't
really want aggressive PM anyway, so it's not worth much trouble.

For #1, that means pci_bridge_d3_possible() would become more like
pci_bridge_d3_impossible(). We could leave it as-is, or at least ensure
it fails toward the "possible" side.

IOW, user space can choose to opt in by way of:

  echo auto > /sys/bus/pci/devices/[port device]/power/control

That might require some new udev rules if existing x86 systems are
supposed to retain their old behavior.

Personally, I care more about #1 (that the kernel manages pm_runtime_*()
refcounts properly, so that my systems *can* opt into aggressive PM),
and less about #2 (it's a fact of life that PM policy often requires
careful udev / sysfs management, and that the defaults will not
necessarily give the best power savings).

This might leave some old unmaintained systems as "D3 possible", but we
don't actually exercise it if user space doesn't poke
/sys/bus/pci/devices/[port device]/power/control.

Brian
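
[Editorial sketch.] The opt-in described above is a plain write to the port's power/control attribute, so a small helper is enough to do it from a program instead of a shell. The function name is hypothetical; the only values it accepts are "auto" (allow runtime PM) and "on" (keep the device active), which are the values the runtime PM sysfs interface documents. The caller supplies the full path, since the "[port device]" part is system specific.

```c
#include <stdio.h>
#include <string.h>

/* Write "auto" or "on" to a power/control file.
 * Returns 0 on success and -1 on any failure (bad mode, open or write error).
 */
static int set_runtime_pm_control(const char *control_path, const char *mode)
{
	FILE *f;
	int ok;

	/* Reject anything other than the two documented values. */
	if (strcmp(mode, "auto") != 0 && strcmp(mode, "on") != 0)
		return -1;

	f = fopen(control_path, "w");
	if (!f)
		return -1;

	ok = fputs(mode, f) >= 0;
	/* sysfs reports write errors at flush time, so check fclose() too. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A udev rule or init script would be the more usual place for this policy; the helper only shows how little is involved on the user-space side.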
[cc += Mika]

On Tue, Dec 02, 2025 at 01:54:00PM -0800, Brian Norris wrote:
> I wonder if we could take a different approach that helps straddle the
> uncertain boundary here a bit:
[...]
> 2) be less aggressive about default-enabling runtime suspend / D3
> (i.e., only call pm_runtime_allow() in drivers/pci/pcie/portdrv.c in
> limited circumstances).
[...]
> So instead of portdrv.c calling pm_runtime_allow(), we'd leave that
> decision to user space (i.e., udev or similar). That will help limit the
> impact of getting #1 "wrong." And it's possible the bad systems didn't
> really want aggressive PM anyway, so it's not worth much trouble.

I think runtime PM support in the PCIe port driver was primarily
motivated by the need to power down Thunderbolt controllers when
they're not in use.

A Thunderbolt controller exposes a PCIe switch. Daisy-chained
Thunderbolt devices are thus visible to the OS as nested switches.
If we followed the approach you're suggesting, users would have to
manually allow runtime PM on every Switch Upstream and Downstream Port
as well as the Root Port and they'd have to do that upon hotplugging
a device. Yes, yes, users could add a udev rule to allow runtime PM
automatically by default, but that's exactly the policy we have hardcoded
in the kernel right now, so why the change?

I expect massive power regressions for users (not least Chromebook
users) if we made that change.

The discrete Thunderbolt controller in my machine consumes 1.5W
when nothing is attached. Some laptops have multiple of these.
Recent CPUs with integrated Thunderbolt/USB4 may fail to transition
the package to a low power state unless the Thunderbolt ports go
to D3hot.

So I don't think this approach is a viable option.

Thanks,

Lukas
Hi,

On Wed, Dec 03, 2025 at 05:49:37AM +0100, Lukas Wunner wrote:
> [cc += Mika]
>
> On Tue, Dec 02, 2025 at 01:54:00PM -0800, Brian Norris wrote:
> > I wonder if we could take a different approach that helps straddle the
> > uncertain boundary here a bit:
> [...]
> > 2) be less aggressive about default-enabling runtime suspend / D3
> > (i.e., only call pm_runtime_allow() in drivers/pci/pcie/portdrv.c in
> > limited circumstances).
> [...]
> > So instead of portdrv.c calling pm_runtime_allow(), we'd leave that
> > decision to user space (i.e., udev or similar). That will help limit the
> > impact of getting #1 "wrong." And it's possible the bad systems didn't
> > really want aggressive PM anyway, so it's not worth much trouble.
>
> I think runtime PM support in the PCIe port driver was primarily
> motivated by the need to power down Thunderbolt controllers when
> they're not in use.

That and also there are discrete GPUs that can runtime suspend when not in
use.

> A Thunderbolt controller exposes a PCIe switch. Daisy-chained
> Thunderbolt devices are thus visible to the OS as nested switches.
> If we followed the approach you're suggesting, users would have to
> manually allow runtime PM on every Switch Upstream and Downstream Port
> as well as the Root Port and they'd have to do that upon hotplugging
> a device. Yes, yes, users could add a udev rule to allow runtime PM
> automatically by default, but that's exactly the policy we have hardcoded
> in the kernel right now, so why the change?
>
> I expect massive power regressions for users (not least Chromebook
> users) if we made that change.
>
> The discrete Thunderbolt controller in my machine consumes 1.5W
> when nothing is attached. Some laptops have multiple of these.
> Recent CPUs with integrated Thunderbolt/USB4 may fail to transition
> the package to a low power state unless the Thunderbolt ports go
> to D3hot.
>
> So I don't think this approach is a viable option.

I agree. If this is limited to some older RISC machines (based on the
$subject) perhaps this could be solved by adding udev rules to block
runtime PM on those certain ports?
Hi,

> On 3. Dec 2025, at 15:27, Mika Westerberg <mika.westerberg@linux.intel.com> wrote:

…

>> A Thunderbolt controller exposes a PCIe switch. Daisy-chained
>> Thunderbolt devices are thus visible to the OS as nested switches.
>> If we followed the approach you're suggesting, users would have to
>> manually allow runtime PM on every Switch Upstream and Downstream Port
>> as well as the Root Port and they'd have to do that upon hotplugging
>> a device. Yes, yes, users could add a udev rule to allow runtime PM
>> automatically by default, but that's exactly the policy we have hardcoded
>> in the kernel right now, so why the change?
>>
>> I expect massive power regressions for users (not least Chromebook
>> users) if we made that change.
>>
>> The discrete Thunderbolt controller in my machine consumes 1.5W
>> when nothing is attached. Some laptops have multiple of these.
>> Recent CPUs with integrated Thunderbolt/USB4 may fail to transition
>> the package to a low power state unless the Thunderbolt ports go
>> to D3hot.
>>
>> So I don't think this approach is a viable option.
>
> I agree. If this is limited to some older RISC machines (based on the
> $subject) perhaps this could be solved by adding udev rules to block
> runtime PM on those certain ports?

Let’s not overcomplicate it for now. All we have are a couple of old Unix
RISC workstations. Let’s see if we can somehow fix them for real first.

Given the feedback that D3Hot “should” more often work I went ahead
and changed the patch in T2/Linux removing the 2015 check and all arch
except SPARC and let our prosumer enthusiast users find out if something
else breaks first to gather more data points.

I’ll try to find time to debug the SPARC64 Sun Blade 1K issue, but I have
some other TODO first, so it might be January for more work on that.

Maybe we should push a patch to only disable this for SPARC64 to stable
in the meantime?

https://svn.exactcode.de/t2/trunk/package/kernel/linux/hotfix-legacy-pci-bridge-d3.patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2b53219fda3b..869d204a70a3 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3067,10 +3067,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
 return false;

 /*
- * Out of caution, we only allow PCIe ports from 2015 or newer
- * into D3 on x86.
+ * It should be safe to put PCIe ports to D3.
 */
- if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015)
+ if (!IS_ENABLED(CONFIG_SPARC64))
 return true;
 break;
 }

René

--
https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
On Wed, Dec 3, 2025 at 3:48 PM René Rebe <rene@exactco.de> wrote:
>
> Hi,
>
> > On 3. Dec 2025, at 15:27, Mika Westerberg <mika.westerberg@linux.intel.com> wrote:
>
> …
>
> >> A Thunderbolt controller exposes a PCIe switch. Daisy-chained
> >> Thunderbolt devices are thus visible to the OS as nested switches.
> >> If we followed the approach you're suggesting, users would have to
> >> manually allow runtime PM on every Switch Upstream and Downstream Port
> >> as well as the Root Port and they'd have to do that upon hotplugging
> >> a device. Yes, yes, users could add a udev rule to allow runtime PM
> >> automatically by default, but that's exactly the policy we have hardcoded
> >> in the kernel right now, so why the change?
> >>
> >> I expect massive power regressions for users (not least Chromebook
> >> users) if we made that change.
> >>
> >> The discrete Thunderbolt controller in my machine consumes 1.5W
> >> when nothing is attached. Some laptops have multiple of these.
> >> Recent CPUs with integrated Thunderbolt/USB4 may fail to transition
> >> the package to a low power state unless the Thunderbolt ports go
> >> to D3hot.
> >>
> >> So I don't think this approach is a viable option.
> >
> > I agree. If this is limited to some older RISC machines (based on the
> > $subject) perhaps this could be solved by adding udev rules to block
> > runtime PM on those certain ports?
>
> Let’s not overcomplicate it for now. All we have are a couple of old Unix
> RISC workstations. Let’s see if we can somehow fix them for real first.
>
> Given the feedback that D3Hot “should” more often work I went ahead
> and changed the patch in T2/Linux removing the 2015 check and all arch
> except SPARC and let our prosumer enthusiast users find out if something
> else breaks first to gather more data points.
>
> I’ll try to find time to debug the SPARC64 Sun Blade 1K issue, but I have
> some other TODO first, so it might be January for more work on that.
>
> Maybe we should push a patch to only disable this for SPARC64 to stable
> In the meantime?
>
> https://svn.exactcode.de/t2/trunk/package/kernel/linux/hotfix-legacy-pci-bridge-d3.patch
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 2b53219fda3b..869d204a70a3 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3067,10 +3067,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
> return false;
>
> /*
> - * Out of caution, we only allow PCIe ports from 2015 or newer
> - * into D3 on x86.
> + * It should be safe to put PCIe ports to D3.
> */
> - if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015)
> + if (!IS_ENABLED(CONFIG_SPARC64))
> return true;
> break;
> }
I would prefer
if ((IS_ENABLED(CONFIG_X86) && dmi_get_bios_year() >= 2015) ||
!IS_ENABLED(CONFIG_SPARC64))
return true;
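
[Editorial sketch.] For comparison, the downstream T2 condition and the form suggested above can be written as two small user-space predicates. The function and parameter names are hypothetical stand-ins for the IS_ENABLED() checks and dmi_get_bios_year(). In this model the two agree for every input, because an x86 build is never a SPARC64 build, so the second term is already true whenever the first term could matter.

```c
#include <stdbool.h>

/* T2/Linux downstream hotfix: !IS_ENABLED(CONFIG_SPARC64) */
static bool d3_allowed_t2(bool is_x86, bool is_sparc64, int bios_year)
{
	(void)is_x86;		/* unused in this variant */
	(void)bios_year;	/* unused in this variant */
	return !is_sparc64;
}

/* Suggested form:
 * (IS_ENABLED(CONFIG_X86) && dmi_get_bios_year() >= 2015) ||
 * !IS_ENABLED(CONFIG_SPARC64)
 */
static bool d3_allowed_suggested(bool is_x86, bool is_sparc64, int bios_year)
{
	return (is_x86 && bios_year >= 2015) || !is_sparc64;
}
```

If the intent is to keep the 2015 cut-off on x86 while excluding SPARC64, the condition would need a different shape; that is a design question for the thread and not something this sketch decides.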
Hi,

> On 3. Dec 2025, at 16:22, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Dec 3, 2025 at 3:48 PM René Rebe <rene@exactco.de> wrote:

[...]

>> Maybe we should push a patch to only disable this for SPARC64 to stable
>> In the meantime?
>>
>> https://svn.exactcode.de/t2/trunk/package/kernel/linux/hotfix-legacy-pci-bridge-d3.patch
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 2b53219fda3b..869d204a70a3 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -3067,10 +3067,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
>> return false;
>>
>> /*
>> - * Out of caution, we only allow PCIe ports from 2015 or newer
>> - * into D3 on x86.
>> + * It should be safe to put PCIe ports to D3.
>> */
>> - if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015)
>> + if (!IS_ENABLED(CONFIG_SPARC64))
>> return true;
>> break;
>> }
>
> I would prefer
>
> if ((IS_ENABLED(CONFIG_X86) && dmi_get_bios_year() >= 2015) ||
> !IS_ENABLED(CONFIG_SPARC64))
> return true;

Sorry for any confusion, I did not mean the above for upstream, but as I
tried to express for us downstream in T2 to gather more data (if any) from
our users for pre 2015 x86 machines.

Should I send your proposal which matches mine for stable in the
meantime?

René

--
https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
On Wed, Dec 3, 2025 at 4:26 PM René Rebe <rene@exactco.de> wrote:
>
> Hi,
>
> > On 3. Dec 2025, at 16:22, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Wed, Dec 3, 2025 at 3:48 PM René Rebe <rene@exactco.de> wrote:
> >>
> >> Hi,
> >>
> >>> On 3. Dec 2025, at 15:27, Mika Westerberg <mika.westerberg@linux.intel.com> wrote:
> >>
> >> …
> >>
> >>>> A Thunderbolt controller exposes a PCIe switch. Daisy-chained
> >>>> Thunderbolt devices are thus visible to the OS as nested switches.
> >>>> If we followed the approach you're suggesting, users would have to
> >>>> manually allow runtime PM on every Switch Upstream and Downstream Port
> >>>> as well as the Root Port and they'd have to do that upon hotplugging
> >>>> a device. Yes, yes, users could add a udev rule to allow runtime PM
> >>>> automatically by default, but that's exactly the policy we have hardcoded
> >>>> in the kernel right now, so why the change?
> >>>>
> >>>> I expect massive power regressions for users (not least Chromebook
> >>>> users) if we made that change.
> >>>>
> >>>> The discrete Thunderbolt controller in my machine consumes 1.5W
> >>>> when nothing is attached. Some laptops have multiple of these.
> >>>> Recent CPUs with integrated Thunderbolt/USB4 may fail to transition
> >>>> the package to a low power state unless the Thunderbolt ports go
> >>>> to D3hot.
> >>>>
> >>>> So I don't think this approach is a viable option.
> >>>
> >>> I agree. If this is limited to some older RISC machines (based on the
> >>> $subject) perhaps this could be solved by adding udev rules to block
> >>> runtime PM on those certain ports?
> >>
> >> Let’s not overcomplicate it for now. All we have are a couple of old Unix
> >> RISC workstations. Let’s see if we can somehow fix them for real first.
> >>
> >> Given the feedback that D3Hot “should” more often work I went ahead
> >> and changed the patch in T2/Linux removing the 2015 check and all arch
> >> except SPARC and let our prosumer enthusiast users find out if something
> >> else breaks first to gather more data points.
> >>
> >> I’ll try to find time to debug the SPARC64 Sun Blade 1K issue, but I have
> >> some other TODO first, so it might be January for more work on that.
> >>
> >> Maybe we should push a patch to only disable this for SPARC64 to stable
> >> In the meantime?
> >>
> >> https://svn.exactcode.de/t2/trunk/package/kernel/linux/hotfix-legacy-pci-bridge-d3.patch
> >>
> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >> index 2b53219fda3b..869d204a70a3 100644
> >> --- a/drivers/pci/pci.c
> >> +++ b/drivers/pci/pci.c
> >> @@ -3067,10 +3067,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
> >> return false;
> >>
> >> /*
> >> - * Out of caution, we only allow PCIe ports from 2015 or newer
> >> - * into D3 on x86.
> >> + * It should be safe to put PCIe ports to D3.
> >> */
> >> - if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015)
> >> + if (!IS_ENABLED(CONFIG_SPARC64))
> >> return true;
> >> break;
> >> }
> >
> > I would prefer
> >
> > if ((IS_ENABLED(CONFIG_X86) && dmi_get_bios_year() >= 2015) ||
> > !IS_ENABLED(CONFIG_SPARC64))
> > return true;
>
> Sorry for any confusion, I did not mean the above for upstream, but as I
> tried to express for us downstream in T2 to gather more data (if any) from
> our users for for pre 2015 x86 machines.
>
> Should I send your proposal which matches mine for stable in the
> meantime?

Yes, you can do this, as far as I'm concerned.
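
As a side note on why the two conditions quoted above are described as matching: a minimal userspace sketch, not kernel code, that enumerates the possible builds and compares both expressions. The struct `build` and the helper names are hypothetical stand-ins for `IS_ENABLED(CONFIG_X86)`, `IS_ENABLED(CONFIG_SPARC64)` and the value returned by `dmi_get_bios_year()`; the sketch assumes a kernel is never built for x86 and sparc64 at the same time.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the build-time configuration and BIOS year. */
struct build {
	bool x86;      /* IS_ENABLED(CONFIG_X86) */
	bool sparc64;  /* IS_ENABLED(CONFIG_SPARC64) */
	int bios_year; /* dmi_get_bios_year(), -ENXIO without DMI */
};

/* Downstream T2 condition from the quoted hotfix. */
static bool d3_allowed_downstream(struct build b)
{
	return !b.sparc64;
}

/* Condition proposed in the "I would prefer" reply. */
static bool d3_allowed_proposed(struct build b)
{
	return (b.x86 && b.bios_year >= 2015) || !b.sparc64;
}

/* True if both conditions agree for every build considered here. */
static bool variants_agree(void)
{
	const int years[] = { -ENXIO, 2010, 2014, 2015, 2025 };
	const bool flags[] = { false, true };

	for (size_t x = 0; x < 2; x++) {
		for (size_t s = 0; s < 2; s++) {
			if (flags[x] && flags[s])
				continue; /* assumed impossible: one arch per build */
			for (size_t y = 0; y < sizeof(years) / sizeof(years[0]); y++) {
				struct build b = { flags[x], flags[s], years[y] };

				if (d3_allowed_downstream(b) != d3_allowed_proposed(b))
					return false;
			}
		}
	}
	return true;
}
```

Under these assumptions the x86 year test never changes the result, because an x86 build is never a sparc64 build and so the second operand is already true.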
Hi,
thank you for your review.
On Tue, 2 Dec 2025 11:28:37 -0600, Bjorn Helgaas <helgaas@kernel.org> wrote:
> [+cc Mani, Brian (a5fb3ff63287 authors), Rafael, Lukas, Mario]
>
> On Tue, Dec 02, 2025 at 05:40:07PM +0100, René Rebe wrote:
> > Commit a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all
> > non-x86") was bisected to break various non-x86 RISC Unix systems,
> > e.g. sparc64, see two example oopses below. Fix by only allowing D3Hot
> > on modern ARM64, PPC64 and RISCV ISAs besides new enough x86.
>
> I think we need some kind of analysis of what is happening to the PCI
> devices here. I don't know why the CPU architecture per se would be
> related to PCI power management.
That surely would be the best, but given few maintainers work on older
architectures it might take a while. This is also old hw from before
2015, like the x86 DMI test. Given that the commit enabled it for all
architectures that previously failed the DMI year check due to:
static inline int dmi_get_bios_year(void) { return -ENXIO; }
is it not sensible to first reinstate this for such $arch, also in
stable trees, while we work on this further?
> pci_bridge_d3_possible() is already a barely maintainable hodge podge
> of random things that work and don't work. Generally speaking most of
> those cases relate to firmware.
Fair, but this is a rather simple hotfix, for a simple year check, for
a commit that just recently broke these systems. I would also expect
that these high-performance Unix systems might not have been designed or
tested with dynamic PCI power management in mind, ...
René
> > Sun Blade 1000:
> > ERROR(0): Cheetah error trap taken afsr[0010080005000000] afar[000007f900800000] TL1(0)
> > ERROR(0): TPC[100a05a4] TNPC[100a05a8] O7[42acc8] TSTATE[4411001603]
> > ERROR(0):
> > TPC<MakeIocReady+0xc/0x278 [mptbase]>
> > ERROR(0): M_SYND(0), E_SYND(0), Privileged
> > ERROR(0): Highest priority error (0000080000000000) "Bus error response from system bus"
> > ERROR(0): D-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000]
> > ERROR(0): D-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[0000000000000000]
> > ERROR(0): I-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000] u[0000000000000000] l[0000000000000000]
> > ERROR(0): I-cache INSN0[0000000000000000] INSN1[0000000000000000] INSN2[0000000000000000] INSN3[0000000000000000]
> > ERROR(0): I-cache INSN4[0000000000000000] INSN5[0000000000000000] INSN6[0000000000000000] INSN7[0000000000000000]
> > ERROR(0): E-cache idx[b08040] tag[000000001e008fa0]
> > ERROR(0): E-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[ffffffffffffffff]
> > Kernel panic - not syncing: Irrecoverable deferred error trap.
> > CPU: 0 UID: 0 PID: 46 Comm: (udev-worker) Not tainted 6.14.0-rc1-00001-ga5fb3ff63287 #18
> > Call Trace:
> > [<00000000004294b0>] panic+0xf0/0x370
> > [<0000000000435bc4>] cheetah_deferred_handler+0x2c8/0x2d8
> > [<0000000000405e88>] c_deferred+0x18/0x24
> > [<00000000100a05a4>] MakeIocReady+0xc/0x278 [mptbase]
>
> I assume both of these crashes are related to the
> CHIPREG_READ32(&ioc->chip->Doorbell) in mpt_GetIocState(), e.g., maybe
> that PCI read failed because an upstream bridge was not in D0 and
> therefore treated the read as an unsupported request.
>
> > [<00000000100a089c>] mpt_do_ioc_recovery+0x8c/0x1054 [mptbase]
> > [<000000001009f2d4>] mpt_attach+0x920/0xa68 [mptbase]
> > [<000000001012424c>] mptsas_probe+0x8/0x3e8 [mptsas]
> > [<0000000000788308>] local_pci_probe+0x24/0x70
> > [<0000000000788dac>] pci_device_probe+0x1c0/0x1d0
> > [<000000000082633c>] really_probe+0x13c/0x29c
> > [<0000000000826590>] __driver_probe_device+0xf4/0x104
> > [<0000000000826614>] driver_probe_device+0x24/0xa0
> > [<000000000082683c>] __driver_attach+0xe8/0x104
> > [<0000000000824da0>] bus_for_each_dev+0x58/0x84
> > [<0000000000825508>] bus_add_driver+0xdc/0x1f8
> > [<0000000000827110>] driver_register+0x70/0x120
> >
> > Niagara T1:
> > mptsas 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
> > NON-RESUMABLE ERROR: Reporting on cpu 31
> > NON-RESUMABLE ERROR: TPC [0x0000000010184034] <MakeIocReady+0x10/0x298 [mptbase]>
> > NON-RESUMABLE ERROR: RAW [1f10000000000007:0000000e3179235c:0000000202000004:000000ea00300000
> > NON-RESUMABLE ERROR: 00000000001f0000:0000000000000000:0000000000000000:0000000000000000]
> > NON-RESUMABLE ERROR: handle [0x1f10000000000007] stick [0x0000000e3179235c]
> > NON-RESUMABLE ERROR: type [precise nonresumable]
> > NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted priv >
> > NON-RESUMABLE ERROR: raddr [0x000000ea00300000]
> > Kernel panic - not syncing: Non-resumable error.
> > CPU: 31 UID: 0 PID: 367 Comm: (udev-worker) Not tainted 6.16.12+3-sparc64-smp #1 NONE Debian 6.16.12-2+sparc64.1
> > Call Trace:
> > [<00000000004373c4>] dump_stack+0x8/0x18
> > [<0000000000429540>] panic+0xf4/0x398
> > [<000000000043afcc>] sun4v_nonresum_error+0x16c/0x240
> > [<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8
> > [<0000000010184034>] MakeIocReady+0x10/0x298 [mptbase]
> > [<00000000101844b4>] mpt_do_ioc_recovery+0x9c/0x1110 [mptbase]
> > [<00000000101836f8>] mpt_attach+0xb58/0xd20 [mptbase]
> > [<0000000010287f30>] mptsas_probe+0x10/0x440 [mptsas]
> > [<0000000000b3fab0>] local_pci_probe+0x30/0x80
> > [<0000000000b405d4>] pci_device_probe+0xb4/0x240
> > [<0000000000bfd348>] really_probe+0xc8/0x400
> > [<0000000000bfd70c>] __driver_probe_device+0x8c/0x160
> > [<0000000000bfd8c8>] driver_probe_device+0x28/0x100
> > [<0000000000bfdb7c>] __driver_attach+0xbc/0x1e0
> > [<0000000000bfacfc>] bus_for_each_dev+0x5c/0xc0
> > [<0000000000bfcafc>] driver_attach+0x1c/0x40
> > Press Stop-A (L1-A) from sun keyboard or send break
> > twice on console to return to the boot prom
> >
> > Fixes: a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all non-x86")
> > Signed-off-by: René Rebe <rene@exactco.de>
> > ---
> > Tested on Sun Blade 1000, and shipping in all T2/Linux builds since 2025-08-01
> > ---
> > drivers/pci/pci.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index b14dd064006c..7619d2cfa66d 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3033,9 +3033,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
> >
> > /*
> > * Out of caution, we only allow PCIe ports from 2015 or newer
> > - * into D3 on x86.
> > + * into D3 or other modern ISAs only.
> > */
> > - if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015)
> > + if (IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_PPC64) || IS_ENABLED(CONFIG_RISCV) || dmi_get_bios_year() >= 2015)
> > return true;
> > break;
> > }
> > --
> > 2.52.0
> >
> > --
> > René Rebe, ExactCODE GmbH, Berlin, Germany
> > https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
--
René Rebe, ExactCODE GmbH, Berlin, Germany
https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
Hi Rene,
On Tue, 2025-12-02 at 17:40 +0100, René Rebe wrote:
> Commit a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all
> non-x86") was bisected to break various non-x86 RISC Unix systems,
> e.g. sparc64, see two example oopses below. Fix by only allowing D3Hot
> on modern ARM64, PPC64 and RISCV ISAs besides new enough x86.
I think "ISA" is a misnomer here as this issue is not a matter of the
instruction set architecture in use but the PCI bus. So, I suggest to
use the term "systems" here as well.
Plus, I suggest the following message for the summary:
"pci: Further restrict the use of D3 power state"
> Sun Blade 1000:
> ERROR(0): Cheetah error trap taken afsr[0010080005000000] afar[000007f900800000] TL1(0)
> ERROR(0): TPC[100a05a4] TNPC[100a05a8] O7[42acc8] TSTATE[4411001603]
> ERROR(0):
> TPC<MakeIocReady+0xc/0x278 [mptbase]>
> ERROR(0): M_SYND(0), E_SYND(0), Privileged
> ERROR(0): Highest priority error (0000080000000000) "Bus error response from system bus"
> ERROR(0): D-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000]
> ERROR(0): D-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[0000000000000000]
> ERROR(0): I-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000] u[0000000000000000] l[0000000000000000]
> ERROR(0): I-cache INSN0[0000000000000000] INSN1[0000000000000000] INSN2[0000000000000000] INSN3[0000000000000000]
> ERROR(0): I-cache INSN4[0000000000000000] INSN5[0000000000000000] INSN6[0000000000000000] INSN7[0000000000000000]
> ERROR(0): E-cache idx[b08040] tag[000000001e008fa0]
> ERROR(0): E-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[ffffffffffffffff]
> Kernel panic - not syncing: Irrecoverable deferred error trap.
> CPU: 0 UID: 0 PID: 46 Comm: (udev-worker) Not tainted 6.14.0-rc1-00001-ga5fb3ff63287 #18
> Call Trace:
> [<00000000004294b0>] panic+0xf0/0x370
> [<0000000000435bc4>] cheetah_deferred_handler+0x2c8/0x2d8
> [<0000000000405e88>] c_deferred+0x18/0x24
> [<00000000100a05a4>] MakeIocReady+0xc/0x278 [mptbase]
> [<00000000100a089c>] mpt_do_ioc_recovery+0x8c/0x1054 [mptbase]
> [<000000001009f2d4>] mpt_attach+0x920/0xa68 [mptbase]
> [<000000001012424c>] mptsas_probe+0x8/0x3e8 [mptsas]
> [<0000000000788308>] local_pci_probe+0x24/0x70
> [<0000000000788dac>] pci_device_probe+0x1c0/0x1d0
> [<000000000082633c>] really_probe+0x13c/0x29c
> [<0000000000826590>] __driver_probe_device+0xf4/0x104
> [<0000000000826614>] driver_probe_device+0x24/0xa0
> [<000000000082683c>] __driver_attach+0xe8/0x104
> [<0000000000824da0>] bus_for_each_dev+0x58/0x84
> [<0000000000825508>] bus_add_driver+0xdc/0x1f8
> [<0000000000827110>] driver_register+0x70/0x120
>
> Niagara T1:
> mptsas 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
> NON-RESUMABLE ERROR: Reporting on cpu 31
> NON-RESUMABLE ERROR: TPC [0x0000000010184034] <MakeIocReady+0x10/0x298 [mptbase]>
> NON-RESUMABLE ERROR: RAW [1f10000000000007:0000000e3179235c:0000000202000004:000000ea00300000
> NON-RESUMABLE ERROR: 00000000001f0000:0000000000000000:0000000000000000:0000000000000000]
> NON-RESUMABLE ERROR: handle [0x1f10000000000007] stick [0x0000000e3179235c]
> NON-RESUMABLE ERROR: type [precise nonresumable]
> NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted priv >
> NON-RESUMABLE ERROR: raddr [0x000000ea00300000]
> Kernel panic - not syncing: Non-resumable error.
> CPU: 31 UID: 0 PID: 367 Comm: (udev-worker) Not tainted 6.16.12+3-sparc64-smp #1 NONE Debian 6.16.12-2+sparc64.1
> Call Trace:
> [<00000000004373c4>] dump_stack+0x8/0x18
> [<0000000000429540>] panic+0xf4/0x398
> [<000000000043afcc>] sun4v_nonresum_error+0x16c/0x240
> [<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8
> [<0000000010184034>] MakeIocReady+0x10/0x298 [mptbase]
> [<00000000101844b4>] mpt_do_ioc_recovery+0x9c/0x1110 [mptbase]
> [<00000000101836f8>] mpt_attach+0xb58/0xd20 [mptbase]
> [<0000000010287f30>] mptsas_probe+0x10/0x440 [mptsas]
> [<0000000000b3fab0>] local_pci_probe+0x30/0x80
> [<0000000000b405d4>] pci_device_probe+0xb4/0x240
> [<0000000000bfd348>] really_probe+0xc8/0x400
> [<0000000000bfd70c>] __driver_probe_device+0x8c/0x160
> [<0000000000bfd8c8>] driver_probe_device+0x28/0x100
> [<0000000000bfdb7c>] __driver_attach+0xbc/0x1e0
> [<0000000000bfacfc>] bus_for_each_dev+0x5c/0xc0
> [<0000000000bfcafc>] driver_attach+0x1c/0x40
> Press Stop-A (L1-A) from sun keyboard or send break
> twice on console to return to the boot prom
>
> Fixes: a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all non-x86")
> Signed-off-by: René Rebe <rene@exactco.de>
> ---
> Tested on Sun Blade 1000, and shipping in all T2/Linux builds since 2025-08-01
> ---
> drivers/pci/pci.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b14dd064006c..7619d2cfa66d 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3033,9 +3033,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
>
> /*
> * Out of caution, we only allow PCIe ports from 2015 or newer
> - * into D3 on x86.
> + * into D3 or other modern ISAs only.
Same here, I suggest "systems" instead of "ISAs".
> */
> - if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015)
> + if (IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_PPC64) || IS_ENABLED(CONFIG_RISCV) || dmi_get_bios_year() >= 2015)
Is there actually a justification to restrict the use of D3 to ARM64,
PPC64 and RISCV? What about MIPS, LoongArch or s390x?
Thanks,
Adrian
> return true;
> break;
> }
> --
> 2.52.0
>
> --
> René Rebe, ExactCODE GmbH, Berlin, Germany
> https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Hi,
On Tue, 02 Dec 2025 17:54:33 +0100, John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> wrote:
> Hi Rene,
>
> On Tue, 2025-12-02 at 17:40 +0100, René Rebe wrote:
> > Commit a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all
> > non-x86") was bisected to break various non-x86 RISC Unix systems,
> > e.g. sparc64, see two example oopses below. Fix by only allowing D3Hot
> > on modern ARM64, PPC64 and RISCV ISAs besides new enough x86.
>
> I think "ISA" is a misnomer here as this issue is not a matter of the
> instruction set architecture in use but the PCI bus. So, I suggest to
> use the term "systems" here as well.
>
> Plus, I suggest the following message for the summary:
>
> "pci: Further restrict the use of D3 power state"
I thought ISA is the correct term and few still remember an "ISA" bus,
but happy to rephrase to whatever is preferred.
> > Sun Blade 1000:
> > ERROR(0): Cheetah error trap taken afsr[0010080005000000] afar[000007f900800000] TL1(0)
> > ERROR(0): TPC[100a05a4] TNPC[100a05a8] O7[42acc8] TSTATE[4411001603]
> > ERROR(0):
> > TPC<MakeIocReady+0xc/0x278 [mptbase]>
> > ERROR(0): M_SYND(0), E_SYND(0), Privileged
> > ERROR(0): Highest priority error (0000080000000000) "Bus error response from system bus"
> > ERROR(0): D-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000]
> > ERROR(0): D-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[0000000000000000]
> > ERROR(0): I-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000] u[0000000000000000] l[0000000000000000]
> > ERROR(0): I-cache INSN0[0000000000000000] INSN1[0000000000000000] INSN2[0000000000000000] INSN3[0000000000000000]
> > ERROR(0): I-cache INSN4[0000000000000000] INSN5[0000000000000000] INSN6[0000000000000000] INSN7[0000000000000000]
> > ERROR(0): E-cache idx[b08040] tag[000000001e008fa0]
> > ERROR(0): E-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[ffffffffffffffff]
> > Kernel panic - not syncing: Irrecoverable deferred error trap.
> > CPU: 0 UID: 0 PID: 46 Comm: (udev-worker) Not tainted 6.14.0-rc1-00001-ga5fb3ff63287 #18
> > Call Trace:
> > [<00000000004294b0>] panic+0xf0/0x370
> > [<0000000000435bc4>] cheetah_deferred_handler+0x2c8/0x2d8
> > [<0000000000405e88>] c_deferred+0x18/0x24
> > [<00000000100a05a4>] MakeIocReady+0xc/0x278 [mptbase]
> > [<00000000100a089c>] mpt_do_ioc_recovery+0x8c/0x1054 [mptbase]
> > [<000000001009f2d4>] mpt_attach+0x920/0xa68 [mptbase]
> > [<000000001012424c>] mptsas_probe+0x8/0x3e8 [mptsas]
> > [<0000000000788308>] local_pci_probe+0x24/0x70
> > [<0000000000788dac>] pci_device_probe+0x1c0/0x1d0
> > [<000000000082633c>] really_probe+0x13c/0x29c
> > [<0000000000826590>] __driver_probe_device+0xf4/0x104
> > [<0000000000826614>] driver_probe_device+0x24/0xa0
> > [<000000000082683c>] __driver_attach+0xe8/0x104
> > [<0000000000824da0>] bus_for_each_dev+0x58/0x84
> > [<0000000000825508>] bus_add_driver+0xdc/0x1f8
> > [<0000000000827110>] driver_register+0x70/0x120
> >
> > Niagara T1:
> > mptsas 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
> > NON-RESUMABLE ERROR: Reporting on cpu 31
> > NON-RESUMABLE ERROR: TPC [0x0000000010184034] <MakeIocReady+0x10/0x298 [mptbase]>
> > NON-RESUMABLE ERROR: RAW [1f10000000000007:0000000e3179235c:0000000202000004:000000ea00300000
> > NON-RESUMABLE ERROR: 00000000001f0000:0000000000000000:0000000000000000:0000000000000000]
> > NON-RESUMABLE ERROR: handle [0x1f10000000000007] stick [0x0000000e3179235c]
> > NON-RESUMABLE ERROR: type [precise nonresumable]
> > NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted priv >
> > NON-RESUMABLE ERROR: raddr [0x000000ea00300000]
> > Kernel panic - not syncing: Non-resumable error.
> > CPU: 31 UID: 0 PID: 367 Comm: (udev-worker) Not tainted 6.16.12+3-sparc64-smp #1 NONE Debian 6.16.12-2+sparc64.1
> > Call Trace:
> > [<00000000004373c4>] dump_stack+0x8/0x18
> > [<0000000000429540>] panic+0xf4/0x398
> > [<000000000043afcc>] sun4v_nonresum_error+0x16c/0x240
> > [<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8
> > [<0000000010184034>] MakeIocReady+0x10/0x298 [mptbase]
> > [<00000000101844b4>] mpt_do_ioc_recovery+0x9c/0x1110 [mptbase]
> > [<00000000101836f8>] mpt_attach+0xb58/0xd20 [mptbase]
> > [<0000000010287f30>] mptsas_probe+0x10/0x440 [mptsas]
> > [<0000000000b3fab0>] local_pci_probe+0x30/0x80
> > [<0000000000b405d4>] pci_device_probe+0xb4/0x240
> > [<0000000000bfd348>] really_probe+0xc8/0x400
> > [<0000000000bfd70c>] __driver_probe_device+0x8c/0x160
> > [<0000000000bfd8c8>] driver_probe_device+0x28/0x100
> > [<0000000000bfdb7c>] __driver_attach+0xbc/0x1e0
> > [<0000000000bfacfc>] bus_for_each_dev+0x5c/0xc0
> > [<0000000000bfcafc>] driver_attach+0x1c/0x40
> > Press Stop-A (L1-A) from sun keyboard or send break
> > twice on console to return to the boot prom
> >
> > Fixes: a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all non-x86")
> > Signed-off-by: René Rebe <rene@exactco.de>
> > ---
> > Tested on Sun Blade 1000, and shipping in all T2/Linux builds since 2025-08-01
> > ---
> > drivers/pci/pci.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index b14dd064006c..7619d2cfa66d 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3033,9 +3033,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
> >
> > /*
> > * Out of caution, we only allow PCIe ports from 2015 or newer
> > - * into D3 on x86.
> > + * into D3 or other modern ISAs only.
>
> Same here, I suggest "systems" instead of "ISAs".
>
> > */
> > - if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015)
> > + if (IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_PPC64) || IS_ENABLED(CONFIG_RISCV) || dmi_get_bios_year() >= 2015)
>
> Is there actually a justification to restrict the use of D3 to ARM64,
> PPC64 and RISCV? What about MIPS, LoongArch or s390x?
Because the ones I picked are more modern, and thus more likely to
work. MIPS is very old, and I have no LoongArch nor regular access to
s390x. Maybe users of those want to allow-list them after testing? Now that
I think about it, I was wondering why ALSA RAD1 audio is no longer
working in my Sgi Octane, with the PCI window not being enabled. It would
not surprise me if it was some change like this, too. Should bisect next
;-)
Before the breaking change it was disabled for all these other arches
anyway with:
static inline int dmi_get_bios_year(void) { return -ENXIO; }
and comparing whether the negative error code is greater than 2014,
...
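
To spell out the point about the negative error code: a minimal userspace sketch, not kernel code. The function names are hypothetical; `d3_allowed_before()` models the year-only test as described in this mail, and `d3_allowed_after()` models the condition on the removed `-` line of the diff, with `is_x86` standing in for `IS_ENABLED(CONFIG_X86)`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Same body as the non-DMI stub quoted above, renamed for this sketch. */
static int dmi_get_bios_year_stub(void)
{
	return -ENXIO;
}

/* Year-only test, as described for kernels before a5fb3ff63287. */
static bool d3_allowed_before(int bios_year)
{
	return bios_year >= 2015;
}

/* Condition after a5fb3ff63287 (the '-' line of the diff). */
static bool d3_allowed_after(bool is_x86, int bios_year)
{
	return !is_x86 || bios_year >= 2015;
}
```

With the stub value, `d3_allowed_before()` is false because a negative number is never `>= 2015`, while `d3_allowed_after(false, ...)` is true regardless of the year. That is the behaviour change for non-x86 builds discussed here.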
René
> Thanks,
> Adrian
>
> > return true;
> > break;
> > }
> > --
> > 2.52.0
> >
> > --
> > René Rebe, ExactCODE GmbH, Berlin, Germany
> > https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
>
> --
> .''`. John Paul Adrian Glaubitz
> : :' : Debian Developer
> `. `' Physicist
> `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
--
René Rebe, ExactCODE GmbH, Berlin, Germany
https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
On Tue, 2 Dec 2025, René Rebe wrote:

> > Is there actually a justification to restrict the use of D3 to ARM64,
> > PPC64 and RISCV? What about MIPS, LoongArch or s390x?
>
> Because the ones I picked are more modern, and thus more likely to
> work. MIPS is very old. [...]

How old is "very old?"

Granted, the newest MIPS CPU/system controller (aka host bridge) I own is
from 2013 and conventional PCI only, but that is just because the core was
synthesised for interfacing a conventional PCI base board I have the core
card plugged into. Is it very old already or just somewhat old?

Chips continue being manufactured to date and I'm not sure as to new core
designs, but those went through to at least 2018 and I'd expect some were
combined with PCIe system controller IP.

So this seems like something that needs to be keyed off perhaps the
capabilities of the system controller/host bridge? If you give me a shell
recipe to trigger the issue you came across, then I can see what happens
with some of my MIPS systems. I've got a bunch of options with PCI-PCIe
reverse bridges and PCIe switches I could try.

Maciej
Hi,
> On 6. Dec 2025, at 02:07, Maciej W. Rozycki <macro@orcam.me.uk> wrote:
>
> On Tue, 2 Dec 2025, René Rebe wrote:
>
>>> Is there actually a justification to restrict the use of D3 to ARM64,
>>> PPC64 and RISCV? What about MIPS, LoongArch or s390x?
>>
>> Because the ones I picked are more modern, and thus more likely to
>> work. MIPS is very old. [...]
>
> How old is "very old?"
>
> Granted, the newest MIPS CPU/system controller (aka host bridge) I own is
> from 2013 and conventional PCI only, but that is just because the core was
> synthesised for interfacing a conventional PCI base board I have the core
> card plugged into. Is it very old already or just somewhat old?
>
> Chips continue being manufactured to date and I'm not sure as to new core
> designs, but those went through to at least 2018 and I'd expect some were
> combined with PCIe system controller IP.
>
> So this seems like something that needs to be keyed off perhaps the
> capabilities of the system controller/host bridge? If you give me a shell
> recipe to trigger the issue you came across, then I can see what happens
> with some of my MIPS systems. I've got a bunch of options with PCI-PCIe
> reverse bridges and PCIe switches I could try.
Just booting a kernel at or after a5fb3ff63287 ("PCI: Allow PCI bridges to go
to D3Hot on all non-x86") should be enough. The systems that fail for me do
so immediately while booting, usually sooner rather than later, e.g. when a
storage, network or system controller driver initializes.
Best,
René
--
https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
On Sat, 6 Dec 2025, René Rebe wrote:
> > So this seems like something that needs to be keyed off perhaps the
> > capabilities of the system controller/host bridge? If you give me a shell
> > recipe to trigger the issue you came across, then I can see what happens
> > with some of my MIPS systems. I've got a bunch of options with PCI-PCIe
> > reverse bridges and PCIe switches I could try.
>
> Just booting a kernel with or since a5fb3ff63287 ("PCI: Allow PCI bridges to go
> to D3Hot on all non-x86”) should be enough. The systems that fail for me do
> so instantly booting, usually earlier than later. e.g. when a storage, network or
> system controller driver initializes.
I booted 6.18 as released last week on my Malta and saw no issues in this
area.
Maciej
On Sat, 2025-12-06 at 01:07 +0000, Maciej W. Rozycki wrote:
> On Tue, 2 Dec 2025, René Rebe wrote:
>
> > > Is there actually a justification to restrict the use of D3 to ARM64,
> > > PPC64 and RISCV? What about MIPS, LoongArch or s390x?
> >
> > Because the ones I picked are more modern, and thus more likely to
> > work. MIPS is very old. [...]
>
> How old is "very old?"

I've got two desktop and one embedded Loongson MIPS systems at home (not LoongArch)
and these are very recent (made in the 2020s). The desktop systems already come with
PCI Express slots.

Adrian

--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
(Resent, was accidentally HTML before :-/)
Hey,
On 6. Dec 2025, at 09:31, John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> wrote:
>
> On Sat, 2025-12-06 at 01:07 +0000, Maciej W. Rozycki wrote:
>> On Tue, 2 Dec 2025, René Rebe wrote:
>>
>>>> Is there actually a justification to restrict the use of D3 to ARM64,
>>>> PPC64 and RISCV? What about MIPS, LoongArch or s390x?
>>>
>>> Because the ones I picked are more modern, and thus more likely to
>>> work. MIPS is very old. [...]
>>
>> How old is "very old?"
>
> I've got two desktop and one embedded Loongson MIPS systems at home (not LoongArch)
> and these are very recent (made in the 2020s). The desktop systems already come with
> PCI Express slots.
That’s great and all, but did you test a recent kernel since this PCI change I bisected
for sparc64?
I also love my quirky Sgi MIPS64 Octane and O2 very much, but the fact is: those
systems had not only special proprietary high-speed xbow interconnects, but also
very glitchy PCI bridges that barely work to start with.
Also, that one modern Loongson system might work does not mean the entire
history of MIPS(64) systems will be okay.
Just yesterday I found this change also breaking my HP PA-RISC C8000 [1] with:
BT Port failed to come ready!
BT_TRANSFER_INIT: B_BUSY failed to clear!
There was a reason, given my experience keeping all CPU ISAs supported, that
I had initially chosen to allow only modern ones. And again, none of them were
allowed into D3hot before; they were only indiscriminately allow-listed since a5fb3ff63287
("PCI: Allow PCI bridges to go to D3Hot on all non-x86"), Mar 20 11:06:04 2025.
So we probably should update this to at least include HPPA until someone
finds time to further debug and patch this better.
That being said, I have not yet found an issue on old x86 systems with the 2015
year check removed, which puts more of them into D3hot than mainline currently does.
Kind regards,
René
[1] https://t2linux.com/hardware/desktop/HP/c8000/
--
https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
On Tue, 2 Dec 2025, René Rebe wrote:

> s390x. Maybe users of those want to allow list after testing? Now that
> I think about it I was wondering why ALSA RAD1 audio is not longer
> working in my Sgi Octane with the PCI window not being enabled. Would
> not suprise me it was some change like this, too. Should bisect next

Hi René,

Could you please send me a dmesg and the contents of /proc/iomem (taken
with root rights so it shows the real addresses) so I can look at this PCI
bridge window issue. If you know a working kernel, having logs from the
working and broken cases would be very helpful to easily locate the
differences.

At this point, no need to bisect as I might be able to figure it out even
without pinpointing the commit. To avoid spending time on issues that are
already known and have a fix, please check you're not running a somewhat old
kernel, as I've already fixed a few things that have gotten broken due to
recently made PCI bridge window fitting and assignment algorithm changes.

--
i.
Hi Ilpo,

On Tue, 2 Dec 2025 20:20:09 +0200 (EET), Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote:

> On Tue, 2 Dec 2025, René Rebe wrote:
>
> > s390x. Maybe users of those want to allow list after testing? Now that
> > I think about it I was wondering why ALSA RAD1 audio is not longer
> > working in my Sgi Octane with the PCI window not being enabled. Would
> > not suprise me it was some change like this, too. Should bisect next
>
> Hi René,
>
> Could you please send me a dmesg and contents of the /proc/iomem (taken
> with root right so it shows the real addresses) so I can look at this PCI
> bridge window issue. If you know a working kernel, having logs from
> working and broken case would be very helpful to easily locate the
> differences.

Thank you so much for offering help with that different
issue. Sgi/Octane IP30 only went upstream some years ago. I only have
the likewise not upstream snd-rad1 working with much older out-of-tree
kernels. Thank you for the hints, I'll try to find some time to
further debug this soon to bring the snd-rad1 ALSA driver upstream,
too.

> At this point, no need to bisect as I might be able to figure it out even
> without pinpointing the commit. To avoid spending on issues that are
> already know and have a fix, please check you're not running somewhat old
> kernel as I've already fixed a few things that have gotten broken due to
> recent made PCI bridge window fitting and assignment algorithm changes.

I cannot easily bisect mips64 sgi-ip30 anyway, as it was out of tree
for 20y and the upstreamed code changed a lot during cleanup for
merging.

Good to have a contact to look into this next.

Thanks!
René

--
René Rebe, ExactCODE GmbH, Berlin, Germany
https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
On Tue, 2 Dec 2025, René Rebe wrote:

> Hi Ilpo,
>
> On Tue, 2 Dec 2025 20:20:09 +0200 (EET), Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote:
>
> > On Tue, 2 Dec 2025, René Rebe wrote:
> >
> > > s390x. Maybe users of those want to allow list after testing? Now that
> > > I think about it I was wondering why ALSA RAD1 audio is not longer
> > > working in my Sgi Octane with the PCI window not being enabled. Would
> > > not suprise me it was some change like this, too. Should bisect next
> >
> > Hi René,
> >
> > Could you please send me a dmesg and contents of the /proc/iomem (taken
> > with root right so it shows the real addresses) so I can look at this PCI
> > bridge window issue. If you know a working kernel, having logs from
> > working and broken case would be very helpful to easily locate the
> > differences.
>
> Thank you so much for offering help with that different
> issue. Sgi/Octane IP30 only went upstream some years ago. I only have
> the likewise not upstream snd-rad1 working with much older out of tree
> kernels. Thanks you for the hints, I'll try to find some time to to
> further debug this soon to bring the snd-rad1 ALSA driver upstream,
> too.

Okay, if it's an old issue, it's likely not because of the recent PCI core
changes.

If there are "can't assign" or "no compatible bridge window" lines for PCI
resources in the log, those happen before any endpoint driver even comes
into the picture, so it could be a PCI core issue. In that sense it might not
matter whether the endpoint driver is in-tree or out-of-tree, as long as the
kernel you're testing with is otherwise "new enough" to contain the recent
changes and fixes to the PCI subsystem.

--
i.

> > At this point, no need to bisect as I might be able to figure it out even
> > without pinpointing the commit. To avoid spending on issues that are
> > already know and have a fix, please check you're not running somewhat old
> > kernel as I've already fixed a few things that have gotten broken due to
> > recent made PCI bridge window fitting and assignment algorithm changes.
>
> I can not easily bisect mips64 sgi-ip30 anyway. As it was out of tree
> for 20y and the uptreamed code changed a lot during cleanup for
> merging.
>
> Good to have a contact to look into this next.
>
> Thanks!
> René
>