Hi,
This series fixes the long-standing issue with ACS on OF platforms. There are
two fixes in this series. Each fixes an independent issue, but both are needed
to properly enable ACS on OF platforms.
Issue(s) background
===================
Back in 2021, Xingang Wang first noted a failure in attaching the HiSilicon SEC
device to QEMU ARM64 pci-root-port device [1]. He then tracked down the issue to
ACS not being enabled for the QEMU Root Port device and he proposed a patch to
fix it [2].
Once the patch got applied, people reported PCIe issues with linux-next on the
ARM Juno Development boards, where endpoint devices failed to enumerate [3][4].
The patch was soon dropped, but the actual issue with the ARM Juno boards was
left unresolved.
Fast forward to 2024, Pavan resubmitted the same fix [5] for his own use case,
hoping that someone in the community would fix the issue with ARM Juno boards.
The patch was rightly rejected, as a patch known to cause issues should not be
merged into the kernel. Once again, no one investigated the Juno issue and it
remained unresolved.
Now it ended up on my plate, and I managed to track down the issue with the help
of Naresh, who got access to the Juno boards in LKFT. The Juno issue was with the
PCIe switch from Microsemi/IDT, which raises an ACS Source Validation error on
Completions received for a Configuration Read Request from a device connected
to the downstream port that has not yet captured the PCIe bus number. As per the
PCIe spec r6.0 sec 2.2.6.2, "Functions must capture the Bus and Device Numbers
supplied with all Type 0 Configuration Write Requests completed by the Function
and supply these numbers in the Bus and Device Number fields of the Requester ID
for all Requests". So during the first Configuration Read Request issued by the
switch downstream port during enumeration (for reading the Vendor ID), the Bus
and Device numbers are still unknown to the device, and it responds to the Read
Request with a Completion carrying Bus and Device number 0. The switch treats
the Completion as an ACS Source Validation error and drops it, so the endpoint
device is never detected. However, the PCIe spec r6.0, sec 6.12.1.1, states that
"Completions are never affected by ACS Source Validation", so this behavior
violates the spec.
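
As a rough illustration of the failure mode described above, here is a toy C
model. It is not kernel code: the types, the checks_completions flag, and
port_forwards_upstream() are all hypothetical names made up for this sketch. It
only assumes that Source Validation compares the bus number in the ID against
the port's secondary..subordinate range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these types or helpers exist in the kernel. */
enum tlp_kind { TLP_REQUEST, TLP_COMPLETION };

struct tlp {
	enum tlp_kind kind;
	uint8_t id_bus;			/* bus number carried in the ID field */
};

struct downstream_port {
	uint8_t secondary;		/* first bus number below the port */
	uint8_t subordinate;		/* last bus number below the port */
	bool acs_sv_enabled;		/* ACS Source Validation turned on */
	bool checks_completions;	/* true models the out-of-spec switch */
};

/* Returns true if the port forwards the TLP upstream, false if it drops it. */
static bool port_forwards_upstream(const struct downstream_port *p,
				   const struct tlp *t)
{
	bool in_range = t->id_bus >= p->secondary &&
			t->id_bus <= p->subordinate;

	if (!p->acs_sv_enabled)
		return true;
	/* Per the spec quote above, Completions are never affected by SV. */
	if (t->kind == TLP_COMPLETION && !p->checks_completions)
		return true;
	return in_range;
}
```

With a port whose range is bus 3..3 and Source Validation enabled, a Completion
carrying bus number 0 passes when checks_completions is false and is dropped
when it is true, which corresponds to the endpoint never being detected.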
Solution
========
In September, I submitted a series [6] to fix both issues. For the IDT issue,
I reused the existing quirk in the PCI core, which does a dummy config write
before issuing the first config read to the device. For the ACS enablement
issue, I simply resubmitted the original patch from Xingang, which called
pci_request_acs() from devm_of_pci_bridge_init().
During the review of that series, however, several comments were received that
required the series to be reworked completely. Hence, in this version, I've
incorporated the comments as below:
1. For the ACS enablement issue, I've moved the pci_enable_acs() call from
pci_acs_init() to pci_dma_configure().
2. For the IDT issue, I've cached the ACS capabilities (RO) in 'pci_dev',
collected the broken capability for the IDT switches in the quirk and used it to
disable the capability in the cache. This also allowed me to get rid of the
earlier workaround for the switch.
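
The caching approach in item 2 can be sketched roughly as follows. This is a
standalone toy, not the code from the patches: struct toy_pci_dev, its field
names, and the toy_* helpers are hypothetical, and only the bit values are meant
to mirror the PCI_ACS_* definitions in include/uapi/linux/pci_regs.h.

```c
#include <stdint.h>

/* Bit values intended to mirror PCI_ACS_* in pci_regs.h. */
#define ACS_SV	0x0001	/* Source Validation */
#define ACS_RR	0x0004	/* P2P Request Redirect */
#define ACS_CR	0x0008	/* P2P Completion Redirect */
#define ACS_UF	0x0010	/* Upstream Forwarding */

/* Hypothetical stand-in for the relevant part of struct pci_dev. */
struct toy_pci_dev {
	uint16_t acs_caps;	/* cached copy of the read-only ACS capabilities */
	uint16_t acs_ctrl;	/* value that would be written to ACS Control */
};

/* Cache the capabilities once, as read from the hardware. */
static void toy_cache_acs_caps(struct toy_pci_dev *dev, uint16_t hw_caps)
{
	dev->acs_caps = hw_caps;
}

/* Quirk: pretend the broken switch never advertised Source Validation. */
static void toy_quirk_disable_acs_sv(struct toy_pci_dev *dev)
{
	dev->acs_caps &= (uint16_t)~ACS_SV;
}

/* Enable only the requested controls that the cached capabilities allow. */
static void toy_enable_acs(struct toy_pci_dev *dev, uint16_t wanted)
{
	dev->acs_ctrl = wanted & dev->acs_caps;
}
```

Because the enable step consults only the cached copy, clearing a bit in the
quirk is enough to keep that control from ever being set, without a separate
workaround in the enumeration path.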
[1] https://lore.kernel.org/all/038397a6-57e2-b6fc-6e1c-7c03b7be9d96@huawei.com
[2] https://lore.kernel.org/all/1621566204-37456-1-git-send-email-wangxingang5@huawei.com
[3] https://lore.kernel.org/all/01314d70-41e6-70f9-e496-84091948701a@samsung.com
[4] https://lore.kernel.org/all/CADYN=9JWU3CMLzMEcD5MSQGnaLyDRSKc5SofBFHUax6YuTRaJA@mail.gmail.com
[5] https://lore.kernel.org/linux-pci/20241107-pci_acs_fix-v1-1-185a2462a571@quicinc.com
[6] https://lore.kernel.org/linux-pci/20250910-pci-acs-v1-0-fe9adb65ad7d@oss.qualcomm.com
Changes in v2:
* Reworked the patches completely as mentioned above.
* Rebased on top of v6.18-rc7
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
Manivannan Sadhasivam (4):
PCI: Enable ACS only after configuring IOMMU for OF platforms
PCI: Cache ACS capabilities
PCI: Disable ACS SV capability for the broken IDT switches
PCI: Extend the pci_disable_acs_sv quirk for one more IDT switch
drivers/pci/pci-driver.c | 8 +++++++
drivers/pci/pci.c | 33 ++++++++++++--------------
drivers/pci/pci.h | 2 +-
drivers/pci/probe.c | 12 ----------
drivers/pci/quirks.c | 62 ++++++++++++------------------------------------
include/linux/pci.h | 2 ++
6 files changed, 41 insertions(+), 78 deletions(-)
---
base-commit: ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d
change-id: 20251201-pci_acs-b15aa3947289
Best regards,
--
Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
On 02.12.2025 15:22, Manivannan Sadhasivam wrote:
> This series fixes the long standing issue with ACS in OF platforms. There are
> two fixes in this series, both fixing independent issues on their own, but both
> are needed to properly enable ACS on OF platforms.
Thanks for this patchset! I've tested it on my ARM Juno R1 and it looks like it
almost works fine. This patchset even fixed some issues with PCI device probing,
as I again see the SATA and GBit ethernet devices, which were missing since
Linux v6.14 (it looks like I've also missed this in my tests).
# lspci
00:00.0 PCI bridge: PLDA PCI Express Core Reference Design (rev 01)
01:00.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
02:01.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
02:02.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
02:03.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
02:0c.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
02:10.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
02:1f.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
03:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
08:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8057 PCI-E Gigabit Ethernet Controller
However there is also a regression. After applying this patchset system
suspend/resume stopped working. This is probably related to this message:
pcieport 0000:02:1f.0: Unable to change power state from D0 to D3hot, device inaccessible
which appears after calling 'rtcwake -s10 -mmem'. This might not be related to
this patchset, so I probably need to apply it on older kernel releases and check.
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
On 03.12.2025 13:04, Marek Szyprowski wrote:
> However there is also a regression. After applying this patchset
> system suspend/resume stopped working. This is probably related to
> this message:
>
> pcieport 0000:02:1f.0: Unable to change power state from D0 to D3hot,
> device inaccessible
>
> which appears after calling 'rtcwake -s10 -mmem'. This might not be
> related to this patchset, so I probably need to apply it on older
> kernel releases and check.
One more piece of information: I've applied this patchset on top of v6.16 and
it works perfectly on ARM Juno R1. SATA and GBit ethernet are visible again and
system suspend/resume works too, so the issue with the latter on top of v6.18
seems not to be directly related to $subject patchset. I will try to bisect this
issue when I have some spare time.
Feel free to add:
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
On 04.12.2025 14:13, Marek Szyprowski wrote:
> Just one more information - I've applied this patchset on top of v6.16
> and it works perfectly on ARM Juno R1. SATA and GBit ethernet are
> visible again and system suspend/resume works too, so the issue with
> the latter on top of v6.18 seems not to be directly related to
> $subject patchset. I will try to bisect this issue when I have some
> spare time.
>
> Feel free to add:
>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
I spent some time analyzing this regression on Juno R1 and found that:
1. SATA and GBit Ethernet stopped working after commit bcb81ac6ae3c
("iommu: Get DT/ACPI parsing into the proper probe path") merged to
v6.15-rc1.
2. With $subject patch applied to enable SATA & GBit ethernet again,
system suspend/resume stopped working after commit f3ac2ff14834
("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree
platforms") merged to v6.18-rc1.
If I got it right, according to the latter commit message, some quirks
have to be added to fix the suspend/resume issue. Unfortunately, I have
no idea whether this issue is specific to the Juno R1 or to the given PCI
devices.
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
On 09.12.2025 08:31, Marek Szyprowski wrote:
> I spent some time analyzing this regression on Juno R1 and found that:
>
> 1. SATA and GBit Ethernet stopped working after commit bcb81ac6ae3c
> ("iommu: Get DT/ACPI parsing into the proper probe path") merged to
> v6.15-rc1.
>
> 2. With $subject patch applied to enable SATA & GBit ethernet again,
> system suspend/resume stopped working after commit f3ac2ff14834
> ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree
> platforms") merged to v6.18-rc1.
>
> If I got it right, according to the latter commit message, some quirks
> have to be added to fix the suspend/resume issue. Unfortunately I have
> no idea if this is the Juno R1 or the given PCI devices specific issue.

And one more note, commit df5192d9bb0e ("PCI/ASPM: Enable only L0s and
L1 for devicetree platforms") doesn't fix the suspend/resume issue
either (with $subject patchset applied on top of it).

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
On Tue, Dec 09, 2025 at 09:28:38AM +0100, Marek Szyprowski wrote:
> On 09.12.2025 08:31, Marek Szyprowski wrote:
> > I spent some time analyzing this regression on Juno R1 and found that:
> >
> > 1. SATA and GBit Ethernet stopped working after commit bcb81ac6ae3c
> > ("iommu: Get DT/ACPI parsing into the proper probe path") merged to
> > v6.15-rc1.
> >
> > 2. With $subject patch applied to enable SATA & GBit ethernet again,
> > system suspend/resume stopped working after commit f3ac2ff14834
> > ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree
> > platforms") merged to v6.18-rc1.
> >
Yes, this was expected: if you don't disable ACS, it will cause issues in
detecting the devices.

> > If I got it right, according to the latter commit message, some quirks
> > have to be added to fix the suspend/resume issue. Unfortunately I have
> > no idea if this is the Juno R1 or the given PCI devices specific issue.
>
>
> And one more note, commit df5192d9bb0e ("PCI/ASPM: Enable only L0s and
> L1 for devicetree platforms") doesn't fix the suspend/resume issue
> either (with $subject patchset applied on top of it).
>
Interesting. Can you do:

echo performance > /sys/module/pcie_aspm/parameters/policy

and then suspend?

- Mani
--
மணிவண்ணன் சதாசிவம்
On 09.12.2025 12:15, Manivannan Sadhasivam wrote:
> On Tue, Dec 09, 2025 at 09:28:38AM +0100, Marek Szyprowski wrote:
>> On 09.12.2025 08:31, Marek Szyprowski wrote:
>>> I spent some time analyzing this regression on Juno R1 and found that:
>>>
>>> 1. SATA and GBit Ethernet stopped working after commit bcb81ac6ae3c
>>> ("iommu: Get DT/ACPI parsing into the proper probe path") merged to
>>> v6.15-rc1.
>>>
>>> 2. With $subject patch applied to enable SATA & GBit ethernet again,
>>> system suspend/resume stopped working after commit f3ac2ff14834
>>> ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree
>>> platforms") merged to v6.18-rc1.
>>>
> Yes, this was expected as if you don't disable ACS, it will cause issues in
> detecting the devices.
>
>>> If I got it right, according to the latter commit message, some quirks
>>> have to be added to fix the suspend/resume issue. Unfortunately I have
>>> no idea if this is the Juno R1 or the given PCI devices specific issue.
>>
>> And one more note, commit df5192d9bb0e ("PCI/ASPM: Enable only L0s and
>> L1 for devicetree platforms") doesn't fix the suspend/resume issue
>> either (with $subject patchset applied on top of it).
>>
> Interesting. Can you do:
>
> echo performance > /sys/module/pcie_aspm/parameters/policy
>
> and then suspend?
After the above command, system suspend/resume works again.
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
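
The check discussed above (switch the global ASPM policy to "performance", then suspend) could be scripted roughly as follows. This is only a sketch, assuming a root shell and a kernel that exposes /sys/module/pcie_aspm/parameters/policy with the active policy shown in square brackets. The function names and the overridable ASPM_POLICY_FILE variable are made up for illustration and are not part of the patchset.

```shell
#!/bin/sh
# Global ASPM policy knob; the variable is hypothetical and only exists so the
# helpers can be pointed at a plain file for a dry run.
ASPM_POLICY_FILE="${ASPM_POLICY_FILE:-/sys/module/pcie_aspm/parameters/policy}"

# Print the active policy, i.e. the word enclosed in [brackets].
aspm_active_policy() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$ASPM_POLICY_FILE"
}

# Select a policy and confirm that the kernel reports it as active.
aspm_set_policy() {
    echo "$1" > "$ASPM_POLICY_FILE" || return 1
    [ "$(aspm_active_policy)" = "$1" ]
}

# Switch to "performance" and run the same suspend cycle as in the report.
aspm_suspend_test() {
    aspm_set_policy performance || { echo "cannot set ASPM policy" >&2; return 1; }
    rtcwake -s10 -mmem
}
```

Calling aspm_active_policy before and after aspm_suspend_test shows whether the policy survived the suspend cycle.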
On Tue, Dec 09, 2025 at 01:00:55PM +0100, Marek Szyprowski wrote:
> On 09.12.2025 12:15, Manivannan Sadhasivam wrote:
> > On Tue, Dec 09, 2025 at 09:28:38AM +0100, Marek Szyprowski wrote:
> >> On 09.12.2025 08:31, Marek Szyprowski wrote:
> >>> I spent some time analyzing this regression on Juno R1 and found that:
> >>>
> >>> 1. SATA and GBit Ethernet stopped working after commit bcb81ac6ae3c
> >>> ("iommu: Get DT/ACPI parsing into the proper probe path") merged to
> >>> v6.15-rc1.
> >>>
> >>> 2. With $subject patch applied to enable SATA & GBit ethernet again,
> >>> system suspend/resume stopped working after commit f3ac2ff14834
> >>> ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree
> >>> platforms") merged to v6.18-rc1.
> >>>
> > Yes, this was expected as if you don't disable ACS, it will cause issues in
> > detecting the devices.
> >
> >>> If I got it right, according to the latter commit message, some quirks
> >>> have to be added to fix the suspend/resume issue. Unfortunately I have
> >>> no idea if this is the Juno R1 or the given PCI devices specific issue.
> >>
> >> And one more note, commit df5192d9bb0e ("PCI/ASPM: Enable only L0s and
> >> L1 for devicetree platforms") doesn't fix the suspend/resume issue
> >> either (with $subject patchset applied on top of it).
> >>
> > Interesting. Can you do:
> >
> > echo performance > /sys/module/pcie_aspm/parameters/policy
> >
> > and then suspend?
>
> After the above command, system suspend/resume works again.
>
Ok, so ASPM L0s/L1 seems to be the issue. But I'm not quite sure why it causes
an issue during suspend/resume. If the device/controller doesn't play well with
ASPM L0s/L1, it should at least cause the issue before entering suspend.

I'm clueless here atm...

- Mani
--
மணிவண்ணன் சதாசிவம்
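
One way to narrow this down might be to check which link power management states are actually enabled per device before suspending, and then turn them off one at a time instead of all at once through the global policy. A sketch, assuming the kernel exposes the per-device link/ sysfs attributes (clkpm, l0s_aspm, l1_aspm), which are only present for devices where ASPM is supported. The function name and the PCI_SYSFS_ROOT variable are hypothetical.

```shell
#!/bin/sh
# Root of the PCI device tree in sysfs; the variable is hypothetical and only
# exists so the helper can be pointed at a test directory.
PCI_SYSFS_ROOT="${PCI_SYSFS_ROOT:-/sys/bus/pci/devices}"

# Print "<device> <attribute>=<value>" for every link PM attribute present.
aspm_link_states() {
    for dev in "$PCI_SYSFS_ROOT"/*; do
        # Devices without a link/ directory expose no per-link controls.
        [ -d "$dev/link" ] || continue
        for attr in clkpm l0s_aspm l1_aspm; do
            [ -r "$dev/link/$attr" ] || continue
            echo "$(basename "$dev") $attr=$(cat "$dev/link/$attr")"
        done
    done
}
```

For example, after running aspm_link_states, writing 0 to /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm would disable only L1 for the SATA controller from the lspci listing, which may help tell whether L0s or L1 is the state that breaks suspend.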
On 09.12.2025 16:04, Manivannan Sadhasivam wrote:
> On Tue, Dec 09, 2025 at 01:00:55PM +0100, Marek Szyprowski wrote:
>> On 09.12.2025 12:15, Manivannan Sadhasivam wrote:
>>> On Tue, Dec 09, 2025 at 09:28:38AM +0100, Marek Szyprowski wrote:
>>>> On 09.12.2025 08:31, Marek Szyprowski wrote:
>>>>> On 04.12.2025 14:13, Marek Szyprowski wrote:
>>>>>> On 03.12.2025 13:04, Marek Szyprowski wrote:
>>>>>>> On 02.12.2025 15:22, Manivannan Sadhasivam wrote:
>>>>>>>> This series fixes the long standing issue with ACS in OF platforms.
>>>>>>>> There are
>>>>>>>> two fixes in this series, both fixing independent issues on their
>>>>>>>> own, but both
>>>>>>>> are needed to properly enable ACS on OF platforms.
>>>>>>>>
>>>>>>>> Issue(s) background
>>>>>>>> ===================
>>>>>>>>
>>>>>>>> Back in 2021, Xingang Wang first noted a failure in attaching the
>>>>>>>> HiSilicon SEC
>>>>>>>> device to QEMU ARM64 pci-root-port device [1]. He then tracked down
>>>>>>>> the issue to
>>>>>>>> ACS not being enabled for the QEMU Root Port device and he proposed
>>>>>>>> a patch to
>>>>>>>> fix it [2].
>>>>>>>>
>>>>>>>> Once the patch got applied, people reported PCIe issues with
>>>>>>>> linux-next on the
>>>>>>>> ARM Juno Development boards, where they saw failure in enumerating
>>>>>>>> the endpoint
>>>>>>>> devices [3][4]. So soon, the patch got dropped, but the actual
>>>>>>>> issue with the
>>>>>>>> ARM Juno boards was left behind.
>>>>>>>>
>>>>>>>> Fast forward to 2024, Pavan resubmitted the same fix [5] for his
>>>>>>>> own usecase,
>>>>>>>> hoping that someone in the community would fix the issue with ARM
>>>>>>>> Juno boards.
>>>>>>>> But the patch was rightly rejected, as a patch that was known to
>>>>>>>> cause issues
>>>>>>>> should not be merged to the kernel. But again, no one investigated
>>>>>>>> the Juno
>>>>>>>> issue and it was left behind again.
>>>>>>>>
>>>>>>>> Now it ended up in my plate and I managed to track down the issue
>>>>>>>> with the help
>>>>>>>> of Naresh who got access to the Juno boards in LKFT. The Juno issue
>>>>>>>> was with the
>>>>>>>> PCIe switch from Microsemi/IDT, which triggers ACS Source
>>>>>>>> Validation error on
>>>>>>>> Completions received for the Configuration Read Request from a
>>>>>>>> device connected
>>>>>>>> to the downstream port that has not yet captured the PCIe bus
>>>>>>>> number. As per the
>>>>>>>> PCIe spec r6.0 sec 2.2.6.2, "Functions must capture the Bus and
>>>>>>>> Device Numbers
>>>>>>>> supplied with all Type 0 Configuration Write Requests completed by
>>>>>>>> the Function
>>>>>>>> and supply these numbers in the Bus and Device Number fields of the
>>>>>>>> Requester ID
>>>>>>>> for all Requests". So during the first Configuration Read Request
>>>>>>>> issued by the
>>>>>>>> switch downstream port during enumeration (for reading Vendor ID),
>>>>>>>> Bus and
>>>>>>>> Device numbers will be unknown to the device. So it responds to the
>>>>>>>> Read Request
>>>>>>>> with Completion having Bus and Device number as 0. The switch
>>>>>>>> interprets the
>>>>>>>> Completion as an ACS Source Validation error and drops the
>>>>>>>> completion, leading
>>>>>>>> to the failure in detecting the endpoint device. Though the PCIe
>>>>>>>> spec r6.0, sec
>>>>>>>> 6.12.1.1, states that "Completions are never affected by ACS Source
>>>>>>>> Validation".
>>>>>>>> This behavior is in violation of the spec.
>>>>>>>>
>>>>>>>> Solution
>>>>>>>> ========
>>>>>>>>
>>>>>>>> In September, I submitted a series [6] to fix both issues. For the
>>>>>>>> IDT issue,
>>>>>>>> I reused the existing quirk in the PCI core which does a dummy
>>>>>>>> config write
>>>>>>>> before issuing the first config read to the device. And for the ACS
>>>>>>>> enablement
>>>>>>>> issue, I just resubmitted the original patch from Xingang which called
>>>>>>>> pci_request_acs() from devm_of_pci_bridge_init().
>>>>>>>>
>>>>>>>> But during the review of the series, several comments were received
>>>>>>>> and they
>>>>>>>> required the series to be reworked completely. Hence, in this
>>>>>>>> version, I've
>>>>>>>> incorported the comments as below:
>>>>>>>>
>>>>>>>> 1. For the ACS enablement issue, I've moved the pci_enable_acs()
>>>>>>>> call from
>>>>>>>> pci_acs_init() to pci_dma_configure().
>>>>>>>>
>>>>>>>> 2. For the IDT issue, I've cached the ACS capabilities (RO) in
>>>>>>>> 'pci_dev',
>>>>>>>> collected the broken capability for the IDT switches in the quirk
>>>>>>>> and used it to
>>>>>>>> disable the capability in the cache. This also allowed me to get
>>>>>>>> rid of the
>>>>>>>> earlier workaround for the switch.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://lore.kernel.org/all/038397a6-57e2-b6fc-6e1c-7c03b7be9d96@huawei.com
>>>>>>>> [2]
>>>>>>>> https://lore.kernel.org/all/1621566204-37456-1-git-send-email-wangxingang5@huawei.com
>>>>>>>> [3]
>>>>>>>> https://lore.kernel.org/all/01314d70-41e6-70f9-e496-84091948701a@samsung.com
>>>>>>>> [4]
>>>>>>>> https://lore.kernel.org/all/CADYN=9JWU3CMLzMEcD5MSQGnaLyDRSKc5SofBFHUax6YuTRaJA@mail.gmail.com
>>>>>>>> [5]
>>>>>>>> https://lore.kernel.org/linux-pci/20241107-pci_acs_fix-v1-1-185a2462a571@quicinc.com
>>>>>>>> [6]
>>>>>>>> https://lore.kernel.org/linux-pci/20250910-pci-acs-v1-0-fe9adb65ad7d@oss.qualcomm.com
>>>>>>>>
>>>>>>> Thanks for this patchset! I've tested it on my ARM Juno R1 and it
>>>>>>> looks that it almost works fine. This patchset even fixed some
>>>>>>> issues with PCI devices probe, as I again see SATA and GBit ethernet
>>>>>>> devices, which were missing since Linux v6.14 (it looks that
>>>>>>> I've also missed this in my tests).
>>>>>>>
>>>>>>> # lspci
>>>>>>> 00:00.0 PCI bridge: PLDA PCI Express Core Reference Design (rev 01)
>>>>>>> 01:00.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>>>>>> 02:01.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>>>>>> 02:02.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>>>>>> 02:03.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>>>>>> 02:0c.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>>>>>> 02:10.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>>>>>> 02:1f.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>>>>>> 03:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
>>>>>>> 08:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8057 PCI-E Gigabit Ethernet Controller
>>>>>>>
>>>>>>> However there is also a regression. After applying this patchset
>>>>>>> system suspend/resume stopped working. This is probably related to
>>>>>>> this message:
>>>>>>>
>>>>>>> pcieport 0000:02:1f.0: Unable to change power state from D0 to D3hot, device inaccessible
>>>>>>>
>>>>>>> which appears after calling 'rtcwake -s10 -mmem'. This might not be
>>>>>>> related to this patchset, so I probably need to apply it on older
>>>>>>> kernel releases and check.
>>>>>> Just one more information - I've applied this patchset on top of
>>>>>> v6.16 and it works perfectly on ARM Juno R1. SATA and GBit ethernet
>>>>>> are visible again and system suspend/resume works too, so the issue
>>>>>> with the latter on top of v6.18 seems not to be directly related to
>>>>>> $subject patchset. I will try to bisect this issue when I have some
>>>>>> spare time.
>>>>>>
>>>>>> Feel free to add:
>>>>>>
>>>>>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>> I spent some time analyzing this regression on Juno R1 and found that:
>>>>>
>>>>> 1. SATA and GBit Ethernet stopped working after commit bcb81ac6ae3c
>>>>> ("iommu: Get DT/ACPI parsing into the proper probe path") merged to
>>>>> v6.15-rc1.
>>>>>
>>>>> 2. With $subject patch applied to enable SATA & GBit ethernet again,
>>>>> system suspend/resume stopped working after commit f3ac2ff14834
>>>>> ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree
>>>>> platforms") merged to v6.18-rc1.
>>>>>
>>> Yes, this was expected as if you don't disable ACS, it will cause issues in
>>> detecting the devices.
>>>
>>>>> If I got it right, according to the latter commit message, some quirks
>>>>> have to be added to fix the suspend/resume issue. Unfortunately I have
>>>>> no idea if this is the Juno R1 or the given PCI devices specific issue.
>>>> And one more note, commit df5192d9bb0e ("PCI/ASPM: Enable only L0s and
>>>> L1 for devicetree platforms") doesn't fix the suspend/resume issue
>>>> either (with $subject patchset applied on top of it).
>>>>
>>> Interesting. Can you do:
>>>
>>> echo performance > /sys/module/pcie_aspm/parameters/policy
>>>
>>> and then suspend?
>> After the above command, system suspend/resume works again.
>>
> Ok, so ASPM L0s/L1 seems to be the issue. But I'm not quite sure why it causes
> an issue during suspend/resume. If the device/controller doesn't play well with
> ASPM L0s/L1, it should at least cause the issue before entering suspend.
>
> I'm clueless here atm...
Definitely something gets broken during suspend. After adding
'no_console_suspend' to the kernel command line, I see the following messages:
# time rtcwake -s10 -mmem
rtcwake: wakeup from "mem" using /dev/rtc0 at Wed Dec 10 17:04:12 2025
PM: suspend entry (deep)
Filesystems sync: 0.001 seconds
Freezing user space processes
Freezing user space processes completed (elapsed 0.005 seconds)
OOM killer disabled.
Freezing remaining freezable tasks
Freezing remaining freezable tasks completed (elapsed 0.003 seconds)
psmouse serio1: Failed to disable mouse on 1c070000.kmi
psmouse serio0: Failed to disable mouse on 1c060000.kmi
pcieport 0000:02:1f.0: Unable to change power state from D0 to D3hot, device inaccessible
Disabling non-boot CPUs ...
psci: CPU5 killed (polled 0 ms)
psci: CPU4 killed (polled 0 ms)
psci: CPU3 killed (polled 0 ms)
psci: CPU2 killed (polled 0 ms)
psci: CPU1 killed (polled 4 ms)
and system never wakes up.
I assume that this 'pcieport 0000:02:1f.0: Unable to change power state
from D0 to D3hot, device inaccessible' message is crucial here. It
doesn't appear when I change the pcie_aspm policy to performance (as you
suggested in the previous mail).
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
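
When repeating the experiment above, it helps to record which ASPM policy was active before each suspend attempt. The following is a minimal sketch, not part of the patchset. It assumes that /sys/module/pcie_aspm/parameters/policy lists the available policies on one line with the active one in square brackets (for example "default [performance] powersave powersupersave"). The helper name aspm_active_policy() is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the bracketed (active) entry of an ASPM policy
 * line such as "default [performance] powersave powersupersave" into 'out'.
 * The bracketed format is an assumption about the sysfs file contents.
 * Returns 0 on success, -1 if there is no bracketed entry or it does not fit.
 */
int aspm_active_policy(const char *line, char *out, size_t outlen)
{
	const char *lb = strchr(line, '[');
	const char *rb = lb ? strchr(lb, ']') : NULL;
	size_t len;

	if (!lb || !rb)
		return -1;

	len = (size_t)(rb - lb - 1);
	if (len == 0 || len >= outlen)
		return -1;

	memcpy(out, lb + 1, len);
	out[len] = '\0';
	return 0;
}
```

A caller would read one line from the policy file with fgets() and pass it to this helper. Logging the result next to each rtcwake run makes it clear whether a given suspend attempt ran with ASPM left at the default policy or forced to performance.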
On Wed, Dec 10, 2025 at 06:26:27PM +0100, Marek Szyprowski wrote:
>
> Definitely something gets broken during suspend, after adding
> 'no_console_suspend' to kernel command line I see the following messages:
>
> # time rtcwake -s10 -mmem
> rtcwake: wakeup from "mem" using /dev/rtc0 at Wed Dec 10 17:04:12 2025
> PM: suspend entry (deep)
> Filesystems sync: 0.001 seconds
> Freezing user space processes
> Freezing user space processes completed (elapsed 0.005 seconds)
> OOM killer disabled.
> Freezing remaining freezable tasks
> Freezing remaining freezable tasks completed (elapsed 0.003 seconds)
> psmouse serio1: Failed to disable mouse on 1c070000.kmi
> psmouse serio0: Failed to disable mouse on 1c060000.kmi
> pcieport 0000:02:1f.0: Unable to change power state from D0 to D3hot,
> device inaccessible
The device just got blown off the bus at this point. But it is unclear to me why
this happens when we enable ASPM L0s/L1. I don't think the firmware has had a
chance to turn off the power to the devices.

So maybe some action that we take in the PCI core during system suspend is
affecting the device state. But can you try to access the device by doing:

lspci -vvv -s 0000:02:1f.0

before initiating system suspend? Just to check whether the issue happens during
suspend or well before that.

- Mani
--
மணிவண்ணன் சதாசிவம்
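
The accessibility check requested above can also be scripted. The following is a minimal sketch under stated assumptions: a config read from a device that has dropped off the bus comes back as all ones, and the sysfs file /sys/bus/pci/devices/<BDF>/config exposes the config space with the 16-bit Vendor ID at offset 0 in little-endian order. Both function names are hypothetical, and this is a userspace illustration rather than the kernel's own check.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: an all-ones value means the config read did not reach the device. */
static bool cfg_word_is_error(uint16_t val)
{
	return val == 0xffff;
}

/*
 * Hypothetical helper: read the Vendor ID from a sysfs config file, e.g.
 * /sys/bus/pci/devices/0000:02:1f.0/config.
 * Returns 1 if the device looks accessible, 0 if the read came back as all
 * ones, and -1 if the file could not be opened or read.
 */
int pci_dev_looks_accessible(const char *cfg_path)
{
	unsigned char raw[2];
	uint16_t vendor;
	FILE *f = fopen(cfg_path, "rb");

	if (!f)
		return -1;

	if (fread(raw, 1, sizeof(raw), f) != sizeof(raw)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Config space is little endian: the low byte comes first. */
	vendor = (uint16_t)(raw[0] | (raw[1] << 8));
	return cfg_word_is_error(vendor) ? 0 : 1;
}
```

Calling pci_dev_looks_accessible() on the 0000:02:1f.0 config file right before 'rtcwake' would show whether the port is already unreachable before suspend starts, which is the question the lspci run is meant to answer.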
On 12.12.2025 05:02, Manivannan Sadhasivam wrote:
> On Wed, Dec 10, 2025 at 06:26:27PM +0100, Marek Szyprowski wrote:
>> Definitely something gets broken during suspend, after adding
>> 'no_console_suspend' to kernel command line I see the following messages:
>>
>> # time rtcwake -s10 -mmem
>> rtcwake: wakeup from "mem" using /dev/rtc0 at Wed Dec 10 17:04:12 2025
>> PM: suspend entry (deep)
>> Filesystems sync: 0.001 seconds
>> Freezing user space processes
>> Freezing user space processes completed (elapsed 0.005 seconds)
>> OOM killer disabled.
>> Freezing remaining freezable tasks
>> Freezing remaining freezable tasks completed (elapsed 0.003 seconds)
>> psmouse serio1: Failed to disable mouse on 1c070000.kmi
>> psmouse serio0: Failed to disable mouse on 1c060000.kmi
>> pcieport 0000:02:1f.0: Unable to change power state from D0 to D3hot,
>> device inaccessible
> The device just got blown off the bus at this point. But it is unclear to me why
> it happens though if we enable ASPM L0s/L1. I don't think the firmware has
> gotten the chance to turn off the power to devices.
>
> So maybe some actions that we do in the PCI core during system suspend is
> affecting the device state. But can you try to access the device by doing:
>
> lspci -vvv -s 0000:02:1f.0
>
> before initiating system suspend. Just to make sure if the issue happens during
> suspend or way before that.
Same as before:
root@target:~# lspci -vvv -s 0000:02:1f.0
02:1f.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin ? routed to IRQ 50
        Bus: primary=02, secondary=08, subordinate=08, sec-latency=0
        I/O behind bridge: 00002000-00002fff
        Memory behind bridge: 50100000-501fffff
        Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Express (v2) Downstream Port (Slot-), MSI 00
                DevCap: MaxPayload 2048 bytes, PhantFunc 0
                        ExtTag+ RBE+
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 <4us
                        ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp-
                LnkCtl: ASPM L0s L1 Enabled; Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt+
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported ARIFwd+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [c0] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fffbb040 Data: 00e7
        Capabilities: [100 v2] Advanced Error Reporting
                UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [200 v1] Virtual Channel
                Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb: Fixed- WRR32- WRR64- WRR128-
                Ctrl: ArbSelect=Fixed
                Status: InProgress-
                VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                     Arb: Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                     Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                     Status: NegoPending- InProgress-
        Capabilities: [320 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
                ACSCtl: SrcValid- TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
        Capabilities: [330 v1] #12
        Kernel driver in use: pcieport
root@target:~# time rtcwake -s10 -mmem
rtcwake: wakeup from "mem" using /dev/rtc0 at Fri Dec 12 07:23:50 2025
[ 110.529810] PM: suspend entry (deep)
[ 110.532688] Filesystems sync: 0.001 seconds
[ 110.549590] Freezing user space processes
[ 110.557833] Freezing user space processes completed (elapsed 0.008 seconds)
[ 110.558282] OOM killer disabled.
[ 110.558296] Freezing remaining freezable tasks
[ 110.561602] Freezing remaining freezable tasks completed (elapsed 0.003 seconds)
[ 110.736524] psmouse serio1: Failed to disable mouse on 1c070000.kmi
[ 111.071329] psmouse serio0: Failed to disable mouse on 1c060000.kmi
[ 111.700685] pcieport 0000:02:1f.0: Unable to change power state from D0 to D3hot, device inaccessible
[ 111.737951] Disabling non-boot CPUs ...
[ 111.757973] psci: CPU5 killed (polled 0 ms)
[ 111.775215] psci: CPU4 killed (polled 0 ms)
[ 111.789725] psci: CPU3 killed (polled 4 ms)
[ 111.800778] psci: CPU2 killed (polled 0 ms)
[ 111.816363] psci: CPU1 killed (polled 0 ms)
(machine never wakes up)
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
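
The ACS lines in the lspci output above (ACSCap with SrcValid+, ACSCtl with SrcValid- and ReqRedir+ CmpltRedir+ UpstreamFwd+) are consistent with the approach described in the cover letter: cache the read-only ACS capabilities, let the quirk clear the broken bit in the cache, and enable only what the cache still advertises. The following is a minimal userspace model of that idea, not the actual patch. The bit values follow the ACS register layout in the kernel's pci_regs.h, but the struct, the field name and the function names are hypothetical, and the set of controls requested by default is an assumption.

```c
#include <stdint.h>

/* ACS capability/control bits (same layout for ACSCap and ACSCtl). */
#define ACS_SV 0x0001 /* Source Validation */
#define ACS_TB 0x0002 /* Translation Blocking */
#define ACS_RR 0x0004 /* P2P Request Redirect */
#define ACS_CR 0x0008 /* P2P Completion Redirect */
#define ACS_UF 0x0010 /* Upstream Forwarding */
#define ACS_EC 0x0020 /* P2P Egress Control */
#define ACS_DT 0x0040 /* Direct Translated P2P */

/* Hypothetical stand-in for the capability cache kept in 'pci_dev'. */
struct acs_model {
	uint16_t acs_capabilities;
};

/* Model of the quirk: hide the broken Source Validation capability. */
static void quirk_disable_acs_sv(struct acs_model *dev)
{
	dev->acs_capabilities &= (uint16_t)~ACS_SV;
}

/* Controls to set: the wanted set (an assumption) limited to the cached capabilities. */
static uint16_t acs_ctrl_to_enable(const struct acs_model *dev)
{
	uint16_t wanted = ACS_SV | ACS_RR | ACS_CR | ACS_UF;

	return wanted & dev->acs_capabilities;
}
```

With all seven capability bits set (0x7f, as in the ACSCap line) and the quirk applied, acs_ctrl_to_enable() returns ACS_RR | ACS_CR | ACS_UF, which matches the ACSCtl line reported for 02:1f.0. Masking the cached capability means the enable path needs no switch-specific special case.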
On Tue, 2 Dec 2025 at 19:53, Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com> wrote:
>
> Hi,
>
> This series fixes the long standing issue with ACS in OF platforms. There are
> two fixes in this series, both fixing independent issues on their own, but both
> are needed to properly enable ACS on OF platforms.
>
> Issue(s) background
> ===================
>
> Back in 2021, Xingang Wang first noted a failure in attaching the HiSilicon SEC
> device to QEMU ARM64 pci-root-port device [1]. He then tracked down the issue to
> ACS not being enabled for the QEMU Root Port device and he proposed a patch to
> fix it [2].
>
> Once the patch got applied, people reported PCIe issues with linux-next on the
> ARM Juno Development boards, where they saw failure in enumerating the endpoint
> devices [3][4]. So soon, the patch got dropped, but the actual issue with the
> ARM Juno boards was left behind.
>
> Fast forward to 2024, Pavan resubmitted the same fix [5] for his own usecase,
> hoping that someone in the community would fix the issue with ARM Juno boards.
> But the patch was rightly rejected, as a patch that was known to cause issues
> should not be merged to the kernel. But again, no one investigated the Juno
> issue and it was left behind again.
>
> Now it ended up in my plate and I managed to track down the issue with the help
> of Naresh who got access to the Juno boards in LKFT. The Juno issue was with the
> PCIe switch from Microsemi/IDT, which triggers ACS Source Validation error on
> Completions received for the Configuration Read Request from a device connected
> to the downstream port that has not yet captured the PCIe bus number. As per the
> PCIe spec r6.0 sec 2.2.6.2, "Functions must capture the Bus and Device Numbers
> supplied with all Type 0 Configuration Write Requests completed by the Function
> and supply these numbers in the Bus and Device Number fields of the Requester ID
> for all Requests". So during the first Configuration Read Request issued by the
> switch downstream port during enumeration (for reading Vendor ID), Bus and
> Device numbers will be unknown to the device. So it responds to the Read Request
> with Completion having Bus and Device number as 0. The switch interprets the
> Completion as an ACS Source Validation error and drops the completion, leading
> to the failure in detecting the endpoint device. Though the PCIe spec r6.0, sec
> 6.12.1.1, states that "Completions are never affected by ACS Source Validation".
> This behavior is in violation of the spec.
>
> Solution
> ========
>
> In September, I submitted a series [6] to fix both issues. For the IDT issue,
> I reused the existing quirk in the PCI core which does a dummy config write
> before issuing the first config read to the device. And for the ACS enablement
> issue, I just resubmitted the original patch from Xingang which called
> pci_request_acs() from devm_of_pci_bridge_init().
>
> But during the review of the series, several comments were received and they
> required the series to be reworked completely. Hence, in this version, I've
> incorported the comments as below:
>
> 1. For the ACS enablement issue, I've moved the pci_enable_acs() call from
> pci_acs_init() to pci_dma_configure().
>
> 2. For the IDT issue, I've cached the ACS capabilities (RO) in 'pci_dev',
> collected the broken capability for the IDT switches in the quirk and used it to
> disable the capability in the cache. This also allowed me to get rid of the
> earlier workaround for the switch.
>
> [1] https://lore.kernel.org/all/038397a6-57e2-b6fc-6e1c-7c03b7be9d96@huawei.com
> [2] https://lore.kernel.org/all/1621566204-37456-1-git-send-email-wangxingang5@huawei.com
> [3] https://lore.kernel.org/all/01314d70-41e6-70f9-e496-84091948701a@samsung.com
> [4] https://lore.kernel.org/all/CADYN=9JWU3CMLzMEcD5MSQGnaLyDRSKc5SofBFHUax6YuTRaJA@mail.gmail.com
> [5] https://lore.kernel.org/linux-pci/20241107-pci_acs_fix-v1-1-185a2462a571@quicinc.com
> [6] https://lore.kernel.org/linux-pci/20250910-pci-acs-v1-0-fe9adb65ad7d@oss.qualcomm.com
>
> Changes in v2:
>
> * Reworked the patches completely as mentioned above.
> * Rebased on top of v6.18-rc7
>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>

Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>

> ---
> Manivannan Sadhasivam (4):
>       PCI: Enable ACS only after configuring IOMMU for OF platforms
>       PCI: Cache ACS capabilities
>       PCI: Disable ACS SV capability for the broken IDT switches
>       PCI: Extend the pci_disable_acs_sv quirk for one more IDT switch
>
>  drivers/pci/pci-driver.c | 8 +++++++
>  drivers/pci/pci.c | 33 ++++++++++++--------------
>  drivers/pci/pci.h | 2 +-
>  drivers/pci/probe.c | 12 ----------
>  drivers/pci/quirks.c | 62 ++++++++++++------------------------------------
>  include/linux/pci.h | 2 ++
>  6 files changed, 41 insertions(+), 78 deletions(-)
> ---
> base-commit: ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d
> change-id: 20251201-pci_acs-b15aa3947289
>
> Best regards,
> --
> Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
>