[PATCH v2 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms

Manivannan Sadhasivam posted 4 patches 2 months, 1 week ago
There is a newer version of this series
[PATCH v2 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms
Posted by Manivannan Sadhasivam 2 months, 1 week ago
Hi,

This series fixes the long-standing issue with ACS on OF platforms. It contains
two fixes. Each addresses an independent issue, but both are needed to properly
enable ACS on OF platforms.

Issue(s) background
===================

Back in 2021, Xingang Wang first noted a failure when attaching the HiSilicon SEC
device to a QEMU ARM64 pci-root-port device [1]. He traced the issue to ACS not
being enabled on the QEMU Root Port and proposed a patch to fix it [2].

Once the patch was applied, people reported PCIe issues with linux-next on the
ARM Juno development boards, where endpoint devices failed to enumerate [3][4].
The patch was soon dropped, but the underlying issue with the ARM Juno boards
was never addressed.

Fast forward to 2024: Pavan resubmitted the same fix [5] for his own use case,
hoping that someone in the community would fix the issue with the ARM Juno
boards. The patch was rightly rejected, since a patch known to cause issues
should not be merged into the kernel. But again, no one investigated the Juno
issue, and it was left unresolved.

Now it ended up on my plate, and I managed to track down the issue with the help
of Naresh, who had access to the Juno boards in LKFT. The Juno issue lies in the
Microsemi/IDT PCIe switch. It flags an ACS Source Validation error on the
Completion for a Configuration Read Request when that Completion comes from a
device, connected to a downstream port, that has not yet captured its Bus
number.

As per PCIe r6.0, sec 2.2.6.2, "Functions must capture the Bus and Device Numbers
supplied with all Type 0 Configuration Write Requests completed by the Function
and supply these numbers in the Bus and Device Number fields of the Requester ID
for all Requests". So when the switch downstream port issues the first
Configuration Read Request during enumeration (to read the Vendor ID), the device
does not yet know its Bus and Device Numbers. It therefore responds with a
Completion carrying Bus and Device Number 0. The switch treats this Completion as
an ACS Source Validation violation and drops it, so the endpoint device is never
detected. This behavior violates the spec: PCIe r6.0, sec 6.12.1.1, states that
"Completions are never affected by ACS Source Validation".
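The following is a rough illustration of the failure above as a small user-space C model (not kernel code). Everything in it is hypothetical: the struct and function names are invented, and the model makes two assumptions. First, it assumes the switch compares the Bus Number from the Completion's Completer ID against the downstream port's secondary..subordinate bus range. Second, it reduces Source Validation to that single range check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical endpoint model: what it has captured so far. */
struct fake_device {
	bool captured;  /* seen a Type 0 Configuration Write yet? */
	uint8_t bus;    /* Bus Number assigned by enumeration */
	uint8_t dev;    /* Device Number assigned by enumeration */
};

/* Hypothetical downstream port bus window: secondary..subordinate. */
struct port_window {
	uint8_t secondary;
	uint8_t subordinate;
};

enum tlp_type { TLP_REQUEST, TLP_COMPLETION };

/* Bus Number the device reports (as Completer) in its Completions. */
uint8_t completer_bus(const struct fake_device *d)
{
	return d->captured ? d->bus : 0; /* 0 until captured (sec 2.2.6.2) */
}

/* Simplified SV check: is the source bus inside the port's window? */
static bool in_window(uint8_t bus, const struct port_window *w)
{
	assert(w->secondary <= w->subordinate);
	return bus >= w->secondary && bus <= w->subordinate;
}

/* Spec behavior: Completions are never affected by SV (sec 6.12.1.1). */
bool sv_passes_spec(enum tlp_type t, uint8_t src_bus,
		    const struct port_window *w)
{
	return t == TLP_COMPLETION || in_window(src_bus, w);
}

/* Modeled IDT behavior: the SV check is applied to Completions too. */
bool sv_passes_idt(enum tlp_type t, uint8_t src_bus,
		   const struct port_window *w)
{
	(void)t; /* the TLP type is ignored, which is the bug being modeled */
	return in_window(src_bus, w);
}
```

Consider a downstream port whose secondary and subordinate bus are both 3. A freshly reset endpoint reports bus 0, so `sv_passes_spec()` forwards its first Completion while `sv_passes_idt()` drops it. Once the endpoint has captured bus 3, both checks pass.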

Solution
========

In September, I submitted a series [6] to fix both issues. For the IDT issue,
I reused the existing quirk in the PCI core that performs a dummy config write
before issuing the first config read to the device. For the ACS enablement
issue, I simply resubmitted Xingang's original patch, which called
pci_request_acs() from devm_of_pci_bridge_init().
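The idea behind that earlier workaround (a dummy Type 0 config write, then the Vendor ID read) can be modeled in the same way. This is a hypothetical user-space sketch of that sequence only. It is not the actual quirk code in drivers/pci/quirks.c. The names and the single-bus window passed to the read are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical endpoint state; a freshly reset device has captured nothing. */
struct fake_ep {
	bool captured;
	uint8_t bus;
	uint8_t dev;
};

/*
 * Type 0 Configuration Write: the endpoint latches the Bus and Device
 * Numbers it was addressed with (PCIe r6.0, sec 2.2.6.2). The register
 * and value written do not matter for this model.
 */
void cfg_write_type0(struct fake_ep *ep, uint8_t bus, uint8_t dev)
{
	ep->captured = true;
	ep->bus = bus;
	ep->dev = dev;
}

/*
 * Config read through the modeled IDT switch: the Completion is delivered
 * only if its Bus Number lies inside the port's sec..sub window.
 */
bool cfg_read_through_idt(const struct fake_ep *ep, uint8_t sec, uint8_t sub)
{
	assert(sec <= sub);
	uint8_t cpl_bus = ep->captured ? ep->bus : 0;
	return cpl_bus >= sec && cpl_bus <= sub;
}

/* Workaround order: dummy write first, then the first (Vendor ID) read. */
bool probe_with_dummy_write(struct fake_ep *ep, uint8_t bus, uint8_t dev)
{
	cfg_write_type0(ep, bus, dev);
	return cfg_read_through_idt(ep, bus, bus);
}
```

Calling `cfg_read_through_idt()` on a fresh endpoint fails. By contrast, `probe_with_dummy_write()` succeeds, because the write lets the endpoint latch its numbers before the read.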

However, the review comments on that series required it to be reworked
completely. This version incorporates those comments as follows:

1. For the ACS enablement issue, I've moved the pci_enable_acs() call from
pci_acs_init() to pci_dma_configure(), so that ACS is enabled only after the
IOMMU has been configured.

2. For the IDT issue, I've cached the (read-only) ACS capabilities in 'pci_dev'.
The IDT quirk records the broken capability (Source Validation) and uses it to
clear that capability in the cache. This also allowed me to get rid of the
earlier workaround for the switch.
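Both v2 changes can be sketched together in a simplified user-space model. The names `model_dev`, `model_acs_init()`, `model_enable_acs()` and `model_dma_configure()` are hypothetical stand-ins for struct pci_dev, pci_acs_init(), pci_enable_acs() and pci_dma_configure(). The global `acs_requested` flag rests on an assumption: that ACS is requested (via pci_request_acs()) while the IOMMU is being configured for a device, which happens after enumeration has already run pci_acs_init(). The bit values mirror the PCI_ACS_* definitions in pci_regs.h. The set of control bits enabled here is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values mirror PCI_ACS_* in include/uapi/linux/pci_regs.h. */
#define ACS_SV 0x0001 /* Source Validation */
#define ACS_RR 0x0004 /* P2P Request Redirect */
#define ACS_CR 0x0008 /* P2P Completion Redirect */
#define ACS_UF 0x0010 /* Upstream Forwarding */

/* Hypothetical stand-in for the relevant parts of struct pci_dev. */
struct model_dev {
	uint16_t acs_cap_hw;     /* ACS Capability register (read-only in HW) */
	uint16_t acs_cap_cached; /* cached copy, possibly trimmed by a quirk */
	uint16_t acs_ctrl;       /* value that would be written to ACS Control */
	bool is_broken_idt;      /* would be matched by vendor/device ID */
};

/* Hypothetical model of the "ACS requested" state set by IOMMU setup. */
static bool acs_requested;

/* Enumeration time: cache the capabilities once, then apply the quirk. */
void model_acs_init(struct model_dev *d)
{
	d->acs_cap_cached = d->acs_cap_hw;
	if (d->is_broken_idt)
		d->acs_cap_cached &= (uint16_t)~ACS_SV; /* SV is unusable */
	/* The cache may only ever advertise a subset of the HW bits. */
	assert((d->acs_cap_cached & ~d->acs_cap_hw) == 0);
}

/* Enable from the cached capabilities; a no-op until ACS is requested. */
void model_enable_acs(struct model_dev *d)
{
	if (!acs_requested)
		return;
	d->acs_ctrl = d->acs_cap_cached & (ACS_SV | ACS_RR | ACS_CR | ACS_UF);
}

/* DMA configuration: IOMMU setup requests ACS, then ACS is enabled. */
void model_dma_configure(struct model_dev *d, bool has_iommu)
{
	if (has_iommu)
		acs_requested = true;
	model_enable_acs(d);
}
```

In this model, calling `model_enable_acs()` at enumeration time (the old placement) leaves `acs_ctrl` at 0, because ACS has not been requested yet. `model_dma_configure()` does enable it. For a device with `is_broken_idt` set, SV is never enabled, because it was cleared from the cached capabilities before any enable attempt.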

[1] https://lore.kernel.org/all/038397a6-57e2-b6fc-6e1c-7c03b7be9d96@huawei.com
[2] https://lore.kernel.org/all/1621566204-37456-1-git-send-email-wangxingang5@huawei.com
[3] https://lore.kernel.org/all/01314d70-41e6-70f9-e496-84091948701a@samsung.com
[4] https://lore.kernel.org/all/CADYN=9JWU3CMLzMEcD5MSQGnaLyDRSKc5SofBFHUax6YuTRaJA@mail.gmail.com
[5] https://lore.kernel.org/linux-pci/20241107-pci_acs_fix-v1-1-185a2462a571@quicinc.com
[6] https://lore.kernel.org/linux-pci/20250910-pci-acs-v1-0-fe9adb65ad7d@oss.qualcomm.com

Changes in v2:

* Reworked the patches completely as mentioned above.
* Rebased on top of v6.18-rc7

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
Manivannan Sadhasivam (4):
      PCI: Enable ACS only after configuring IOMMU for OF platforms
      PCI: Cache ACS capabilities
      PCI: Disable ACS SV capability for the broken IDT switches
      PCI: Extend the pci_disable_acs_sv quirk for one more IDT switch

 drivers/pci/pci-driver.c |  8 +++++++
 drivers/pci/pci.c        | 33 ++++++++++++--------------
 drivers/pci/pci.h        |  2 +-
 drivers/pci/probe.c      | 12 ----------
 drivers/pci/quirks.c     | 62 ++++++++++++------------------------------------
 include/linux/pci.h      |  2 ++
 6 files changed, 41 insertions(+), 78 deletions(-)
---
base-commit: ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d
change-id: 20251201-pci_acs-b15aa3947289

Best regards,
-- 
Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Re: [PATCH v2 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms
Posted by Marek Szyprowski 2 months, 1 week ago
On 02.12.2025 15:22, Manivannan Sadhasivam wrote:
> [...]
>
Thanks for this patchset! I've tested it on my ARM Juno R1 and it almost
works fine. It even fixed some PCI device probe issues: I can see the SATA
and Gigabit Ethernet devices again, which had been missing since Linux v6.14
(it seems I had also missed this in my tests).

# lspci
00:00.0 PCI bridge: PLDA PCI Express Core Reference Design (rev 01)
01:00.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
02:01.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
02:02.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
02:03.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
02:0c.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
02:10.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
02:1f.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
03:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
08:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8057 PCI-E Gigabit Ethernet Controller

However, there is also a regression: after applying this patchset, system
suspend/resume stopped working. This is probably related to this message:

pcieport 0000:02:1f.0: Unable to change power state from D0 to D3hot, device inaccessible

which appears after running 'rtcwake -s10 -mmem'. This might not be
related to this patchset, so I will probably need to apply it on older
kernel releases and check.


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH v2 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms
Posted by Marek Szyprowski 2 months ago
On 03.12.2025 13:04, Marek Szyprowski wrote:
> [...]


One more piece of information: I've applied this patchset on top of v6.16
and it works perfectly on ARM Juno R1. SATA and Gigabit Ethernet are
visible again and system suspend/resume works too. So the suspend/resume
issue on top of v6.18 does not seem to be directly related to the $subject
patchset. I will try to bisect it when I have some spare time.

Feel free to add:

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH v2 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms
Posted by Marek Szyprowski 2 months ago
On 04.12.2025 14:13, Marek Szyprowski wrote:
> [...]


I spent some time analyzing this regression on Juno R1 and found that:

1. SATA and Gigabit Ethernet stopped working after commit bcb81ac6ae3c
("iommu: Get DT/ACPI parsing into the proper probe path"), merged in
v6.15-rc1.

2. With the $subject patchset applied to restore SATA and Gigabit Ethernet,
system suspend/resume stopped working after commit f3ac2ff14834
("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree
platforms"), merged in v6.18-rc1.

If I understand the latter commit message correctly, some quirks need to be
added to fix the suspend/resume issue. Unfortunately, I have no idea whether
this issue is specific to the Juno R1 or to the particular PCI devices.


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH v2 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms
Posted by Marek Szyprowski 2 months ago
On 09.12.2025 08:31, Marek Szyprowski wrote:
> On 04.12.2025 14:13, Marek Szyprowski wrote:
>> On 03.12.2025 13:04, Marek Szyprowski wrote:
>>> On 02.12.2025 15:22, Manivannan Sadhasivam wrote:
>>>> This series fixes the long standing issue with ACS in OF platforms. 
>>>> There are
>>>> two fixes in this series, both fixing independent issues on their 
>>>> own, but both
>>>> are needed to properly enable ACS on OF platforms.
>>>>
>>>> Issue(s) background
>>>> ===================
>>>>
>>>> Back in 2021, Xingang Wang first noted a failure in attaching the 
>>>> HiSilicon SEC
>>>> device to QEMU ARM64 pci-root-port device [1]. He then tracked down 
>>>> the issue to
>>>> ACS not being enabled for the QEMU Root Port device and he proposed 
>>>> a patch to
>>>> fix it [2].
>>>>
>>>> Once the patch got applied, people reported PCIe issues with 
>>>> linux-next on the
>>>> ARM Juno Development boards, where they saw failure in enumerating 
>>>> the endpoint
>>>> devices [3][4]. So soon, the patch got dropped, but the actual 
>>>> issue with the
>>>> ARM Juno boards was left behind.
>>>>
>>>> Fast forward to 2024, Pavan resubmitted the same fix [5] for his 
>>>> own usecase,
>>>> hoping that someone in the community would fix the issue with ARM 
>>>> Juno boards.
>>>> But the patch was rightly rejected, as a patch that was known to 
>>>> cause issues
>>>> should not be merged to the kernel. But again, no one investigated 
>>>> the Juno
>>>> issue and it was left behind again.
>>>>
>>>> Now it ended up in my plate and I managed to track down the issue 
>>>> with the help
>>>> of Naresh who got access to the Juno boards in LKFT. The Juno issue 
>>>> was with the
>>>> PCIe switch from Microsemi/IDT, which triggers ACS Source 
>>>> Validation error on
>>>> Completions received for the Configuration Read Request from a 
>>>> device connected
>>>> to the downstream port that has not yet captured the PCIe bus 
>>>> number. As per the
>>>> PCIe spec r6.0 sec 2.2.6.2, "Functions must capture the Bus and 
>>>> Device Numbers
>>>> supplied with all Type 0 Configuration Write Requests completed by 
>>>> the Function
>>>> and supply these numbers in the Bus and Device Number fields of the 
>>>> Requester ID
>>>> for all Requests". So during the first Configuration Read Request 
>>>> issued by the
>>>> switch downstream port during enumeration (for reading Vendor ID), 
>>>> Bus and
>>>> Device numbers will be unknown to the device. So it responds to the 
>>>> Read Request
>>>> with Completion having Bus and Device number as 0. The switch 
>>>> interprets the
>>>> Completion as an ACS Source Validation error and drops the 
>>>> completion, leading
>>>> to the failure in detecting the endpoint device. Though the PCIe 
>>>> spec r6.0, sec
>>>> 6.12.1.1, states that "Completions are never affected by ACS Source 
>>>> Validation".
>>>> This behavior is in violation of the spec.
>>>>
>>>> Solution
>>>> ========
>>>>
>>>> In September, I submitted a series [6] to fix both issues. For the 
>>>> IDT issue,
>>>> I reused the existing quirk in the PCI core which does a dummy 
>>>> config write
>>>> before issuing the first config read to the device. And for the ACS 
>>>> enablement
>>>> issue, I just resubmitted the original patch from Xingang which called
>>>> pci_request_acs() from devm_of_pci_bridge_init().
>>>>
>>>> But during the review of the series, several comments were received 
>>>> and they
>>>> required the series to be reworked completely. Hence, in this 
>>>> version, I've
>>>> incorported the comments as below:
>>>>
>>>> 1. For the ACS enablement issue, I've moved the pci_enable_acs() 
>>>> call from
>>>> pci_acs_init() to pci_dma_configure().
>>>>
>>>> 2. For the IDT issue, I've cached the ACS capabilities (RO) in 
>>>> 'pci_dev',
>>>> collected the broken capability for the IDT switches in the quirk 
>>>> and used it to
>>>> disable the capability in the cache. This also allowed me to get 
>>>> rid of the
>>>> earlier workaround for the switch.
>>>>
>>>> [1] 
>>>> https://lore.kernel.org/all/038397a6-57e2-b6fc-6e1c-7c03b7be9d96@huawei.com
>>>> [2] 
>>>> https://lore.kernel.org/all/1621566204-37456-1-git-send-email-wangxingang5@huawei.com
>>>> [3] 
>>>> https://lore.kernel.org/all/01314d70-41e6-70f9-e496-84091948701a@samsung.com
>>>> [4] 
>>>> https://lore.kernel.org/all/CADYN=9JWU3CMLzMEcD5MSQGnaLyDRSKc5SofBFHUax6YuTRaJA@mail.gmail.com
>>>> [5] 
>>>> https://lore.kernel.org/linux-pci/20241107-pci_acs_fix-v1-1-185a2462a571@quicinc.com
>>>> [6] 
>>>> https://lore.kernel.org/linux-pci/20250910-pci-acs-v1-0-fe9adb65ad7d@oss.qualcomm.com
>>>>
>>> Thanks for this patchset! I've tested it on my ARM Juno R1 and it 
>>> looks that it almost works fine. This patchset even fixed some 
>>> issues with PCI devices probe, as I again see SATA and GBit ethernet 
>>> devices, which were missing since Linux v6.14 (it looks that 
>>> I've also missed this in my tests).
>>>
>>> # lspci
>>> 00:00.0 PCI bridge: PLDA PCI Express Core Reference Design (rev 01)
>>> 01:00.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>> 02:01.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>> 02:02.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>> 02:03.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>> 02:0c.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>> 02:10.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>> 02:1f.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>> 03:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
>>> 08:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8057 PCI-E Gigabit Ethernet Controller
>>>
>>> However, there is also a regression. After applying this patchset,
>>> system suspend/resume stopped working. This is probably related to
>>> this message:
>>>
>>> pcieport 0000:02:1f.0: Unable to change power state from D0 to D3hot, device inaccessible
>>>
>>> which appears after calling 'rtcwake -s10 -mmem'. This might not be
>>> related to this patchset, so I probably need to apply it on older
>>> kernel releases and check.
>>
>>
>> One more piece of information: I've applied this patchset on top of
>> v6.16 and it works perfectly on ARM Juno R1. SATA and GBit ethernet
>> are visible again and system suspend/resume works too, so the issue
>> with the latter on top of v6.18 does not seem to be directly related
>> to the $subject patchset. I will try to bisect this issue when I have
>> some spare time.
>>
>> Feel free to add:
>>
>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>
>
> I spent some time analyzing this regression on Juno R1 and found that:
>
> 1. SATA and GBit Ethernet stopped working after commit bcb81ac6ae3c 
> ("iommu: Get DT/ACPI parsing into the proper probe path") merged to 
> v6.15-rc1.
>
> 2. With $subject patch applied to enable SATA & GBit ethernet again, 
> system suspend/resume stopped working after commit f3ac2ff14834 
> ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree 
> platforms") merged to v6.18-rc1.
>
> If I understand it correctly, according to the latter commit message, some
> quirks have to be added to fix the suspend/resume issue. Unfortunately, I
> have no idea whether this issue is specific to the Juno R1 or to the given
> PCI devices.


And one more note, commit df5192d9bb0e ("PCI/ASPM: Enable only L0s and 
L1 for devicetree platforms") doesn't fix the suspend/resume issue 
either (with $subject patchset applied on top of it).


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH v2 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms
Posted by Manivannan Sadhasivam 2 months ago
On Tue, Dec 09, 2025 at 09:28:38AM +0100, Marek Szyprowski wrote:
> On 09.12.2025 08:31, Marek Szyprowski wrote:
> > On 04.12.2025 14:13, Marek Szyprowski wrote:
> >> On 03.12.2025 13:04, Marek Szyprowski wrote:
> >>> On 02.12.2025 15:22, Manivannan Sadhasivam wrote:
> >>>> [...]
> >>> [...]
> >> [...]
> >
> > I spent some time analyzing this regression on Juno R1 and found that:
> >
> > 1. SATA and GBit Ethernet stopped working after commit bcb81ac6ae3c 
> > ("iommu: Get DT/ACPI parsing into the proper probe path") merged to 
> > v6.15-rc1.
> >
> > 2. With $subject patch applied to enable SATA & GBit ethernet again, 
> > system suspend/resume stopped working after commit f3ac2ff14834 
> > ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree 
> > platforms") merged to v6.18-rc1.
> >

Yes, this was expected: if you don't disable ACS, it will cause issues in
detecting the devices.

> > If I understand it correctly, according to the latter commit message, some
> > quirks have to be added to fix the suspend/resume issue. Unfortunately, I
> > have no idea whether this issue is specific to the Juno R1 or to the given
> > PCI devices.
> 
> 
> And one more note, commit df5192d9bb0e ("PCI/ASPM: Enable only L0s and 
> L1 for devicetree platforms") doesn't fix the suspend/resume issue 
> either (with $subject patchset applied on top of it).
> 

Interesting. Can you do:

echo performance > /sys/module/pcie_aspm/parameters/policy

and then suspend?

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH v2 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms
Posted by Marek Szyprowski 2 months ago
On 09.12.2025 12:15, Manivannan Sadhasivam wrote:
> On Tue, Dec 09, 2025 at 09:28:38AM +0100, Marek Szyprowski wrote:
>> On 09.12.2025 08:31, Marek Szyprowski wrote:
>>> On 04.12.2025 14:13, Marek Szyprowski wrote:
>>>> On 03.12.2025 13:04, Marek Szyprowski wrote:
>>>>> On 02.12.2025 15:22, Manivannan Sadhasivam wrote:
>>>>>> [...]
>>>>> [...]
>>>> [...]
>>>
>>> I spent some time analyzing this regression on Juno R1 and found that:
>>>
>>> 1. SATA and GBit Ethernet stopped working after commit bcb81ac6ae3c
>>> ("iommu: Get DT/ACPI parsing into the proper probe path") merged to
>>> v6.15-rc1.
>>>
>>> 2. With $subject patch applied to enable SATA & GBit ethernet again,
>>> system suspend/resume stopped working after commit f3ac2ff14834
>>> ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree
>>> platforms") merged to v6.18-rc1.
>>>
> Yes, this was expected: if you don't disable ACS, it will cause issues in
> detecting the devices.
>
>>> If I understand it correctly, according to the latter commit message, some
>>> quirks have to be added to fix the suspend/resume issue. Unfortunately, I
>>> have no idea whether this issue is specific to the Juno R1 or to the given
>>> PCI devices.
>>
>> And one more note, commit df5192d9bb0e ("PCI/ASPM: Enable only L0s and
>> L1 for devicetree platforms") doesn't fix the suspend/resume issue
>> either (with $subject patchset applied on top of it).
>>
> Interesting. Can you do:
>
> echo performance > /sys/module/pcie_aspm/parameters/policy
>
> and then suspend?

After the above command, system suspend/resume works again.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH v2 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms
Posted by Manivannan Sadhasivam 2 months ago
On Tue, Dec 09, 2025 at 01:00:55PM +0100, Marek Szyprowski wrote:
> On 09.12.2025 12:15, Manivannan Sadhasivam wrote:
> > On Tue, Dec 09, 2025 at 09:28:38AM +0100, Marek Szyprowski wrote:
> >> On 09.12.2025 08:31, Marek Szyprowski wrote:
> >>> On 04.12.2025 14:13, Marek Szyprowski wrote:
> >>>> On 03.12.2025 13:04, Marek Szyprowski wrote:
> >>>>> On 02.12.2025 15:22, Manivannan Sadhasivam wrote:
> >>>>>> [...]
> >>>>> [...]
> >>>> [...]
> >>>
> >>> I spent some time analyzing this regression on Juno R1 and found that:
> >>>
> >>> 1. SATA and GBit Ethernet stopped working after commit bcb81ac6ae3c
> >>> ("iommu: Get DT/ACPI parsing into the proper probe path") merged to
> >>> v6.15-rc1.
> >>>
> >>> 2. With $subject patch applied to enable SATA & GBit ethernet again,
> >>> system suspend/resume stopped working after commit f3ac2ff14834
> >>> ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree
> >>> platforms") merged to v6.18-rc1.
> >>>
> > Yes, this was expected: if you don't disable ACS, it will cause issues in
> > detecting the devices.
> >
> >>> If I understand it correctly, according to the latter commit message, some
> >>> quirks have to be added to fix the suspend/resume issue. Unfortunately, I
> >>> have no idea whether this issue is specific to the Juno R1 or to the given
> >>> PCI devices.
> >>
> >> And one more note, commit df5192d9bb0e ("PCI/ASPM: Enable only L0s and
> >> L1 for devicetree platforms") doesn't fix the suspend/resume issue
> >> either (with $subject patchset applied on top of it).
> >>
> > Interesting. Can you do:
> >
> > echo performance > /sys/module/pcie_aspm/parameters/policy
> >
> > and then suspend?
> 
> After the above command, system suspend/resume works again.
> 

Ok, so ASPM L0s/L1 seems to be the issue. But I'm not quite sure why it causes
issues during suspend/resume. If the device/controller doesn't play well with
ASPM L0s/L1, it should cause issues even before entering suspend.

I'm clueless here atm...

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH v2 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms
Posted by Marek Szyprowski 2 months ago
On 09.12.2025 16:04, Manivannan Sadhasivam wrote:
> On Tue, Dec 09, 2025 at 01:00:55PM +0100, Marek Szyprowski wrote:
>> On 09.12.2025 12:15, Manivannan Sadhasivam wrote:
>>> On Tue, Dec 09, 2025 at 09:28:38AM +0100, Marek Szyprowski wrote:
>>>> On 09.12.2025 08:31, Marek Szyprowski wrote:
>>>>> On 04.12.2025 14:13, Marek Szyprowski wrote:
>>>>>> On 03.12.2025 13:04, Marek Szyprowski wrote:
>>>>>>> On 02.12.2025 15:22, Manivannan Sadhasivam wrote:
>>>>>>>> This series fixes the long standing issue with ACS in OF platforms.
>>>>>>>> There are
>>>>>>>> two fixes in this series, both fixing independent issues on their
>>>>>>>> own, but both
>>>>>>>> are needed to properly enable ACS on OF platforms.
>>>>>>>>
>>>>>>>> Issue(s) background
>>>>>>>> ===================
>>>>>>>>
>>>>>>>> Back in 2021, Xingang Wang first noted a failure in attaching the
>>>>>>>> HiSilicon SEC
>>>>>>>> device to QEMU ARM64 pci-root-port device [1]. He then tracked down
>>>>>>>> the issue to
>>>>>>>> ACS not being enabled for the QEMU Root Port device and he proposed
>>>>>>>> a patch to
>>>>>>>> fix it [2].
>>>>>>>>
>>>>>>>> Once the patch got applied, people reported PCIe issues with
>>>>>>>> linux-next on the
>>>>>>>> ARM Juno Development boards, where they saw failure in enumerating
>>>>>>>> the endpoint
>>>>>>>> devices [3][4]. So soon, the patch got dropped, but the actual
>>>>>>>> issue with the
>>>>>>>> ARM Juno boards was left behind.
>>>>>>>>
>>>>>>>> Fast forward to 2024, Pavan resubmitted the same fix [5] for his
>>>>>>>> own usecase,
>>>>>>>> hoping that someone in the community would fix the issue with ARM
>>>>>>>> Juno boards.
>>>>>>>> But the patch was rightly rejected, as a patch that was known to
>>>>>>>> cause issues
>>>>>>>> should not be merged to the kernel. But again, no one investigated
>>>>>>>> the Juno
>>>>>>>> issue and it was left behind again.
>>>>>>>>
>>>>>>>> Now it ended up in my plate and I managed to track down the issue
>>>>>>>> with the help
>>>>>>>> of Naresh who got access to the Juno boards in LKFT. The Juno issue
>>>>>>>> was with the
>>>>>>>> PCIe switch from Microsemi/IDT, which triggers ACS Source
>>>>>>>> Validation error on
>>>>>>>> Completions received for the Configuration Read Request from a
>>>>>>>> device connected
>>>>>>>> to the downstream port that has not yet captured the PCIe bus
>>>>>>>> number. As per the
>>>>>>>> PCIe spec r6.0 sec 2.2.6.2, "Functions must capture the Bus and
>>>>>>>> Device Numbers
>>>>>>>> supplied with all Type 0 Configuration Write Requests completed by
>>>>>>>> the Function
>>>>>>>> and supply these numbers in the Bus and Device Number fields of the
>>>>>>>> Requester ID
>>>>>>>> for all Requests". So during the first Configuration Read Request
>>>>>>>> issued by the
>>>>>>>> switch downstream port during enumeration (for reading Vendor ID),
>>>>>>>> Bus and
>>>>>>>> Device numbers will be unknown to the device. So it responds to the
>>>>>>>> Read Request
>>>>>>>> with Completion having Bus and Device number as 0. The switch
>>>>>>>> interprets the
>>>>>>>> Completion as an ACS Source Validation error and drops the
>>>>>>>> completion, leading
>>>>>>>> to the failure in detecting the endpoint device. Though the PCIe
>>>>>>>> spec r6.0, sec
>>>>>>>> 6.12.1.1, states that "Completions are never affected by ACS Source
>>>>>>>> Validation".
>>>>>>>> This behavior is in violation of the spec.
>>>>>>>>
>>>>>>>> Solution
>>>>>>>> ========
>>>>>>>>
>>>>>>>> In September, I submitted a series [6] to fix both issues. For the IDT
>>>>>>>> issue, I reused the existing quirk in the PCI core which does a dummy
>>>>>>>> config write before issuing the first config read to the device. And for
>>>>>>>> the ACS enablement issue, I just resubmitted the original patch from
>>>>>>>> Xingang which called pci_request_acs() from devm_of_pci_bridge_init().
>>>>>>>>
>>>>>>>> But during the review of the series, several comments were received, and
>>>>>>>> they required the series to be reworked completely. Hence, in this
>>>>>>>> version, I've incorporated the comments as below:
>>>>>>>>
>>>>>>>> 1. For the ACS enablement issue, I've moved the pci_enable_acs() call
>>>>>>>>    from pci_acs_init() to pci_dma_configure().
>>>>>>>>
>>>>>>>> 2. For the IDT issue, I've cached the ACS capabilities (RO) in
>>>>>>>>    'pci_dev', collected the broken capability for the IDT switches in
>>>>>>>>    the quirk, and used it to disable the capability in the cache. This
>>>>>>>>    also allowed me to get rid of the earlier workaround for the switch.
>>>>>>>>
>>>>>>>> [1] https://lore.kernel.org/all/038397a6-57e2-b6fc-6e1c-7c03b7be9d96@huawei.com
>>>>>>>> [2] https://lore.kernel.org/all/1621566204-37456-1-git-send-email-wangxingang5@huawei.com
>>>>>>>> [3] https://lore.kernel.org/all/01314d70-41e6-70f9-e496-84091948701a@samsung.com
>>>>>>>> [4] https://lore.kernel.org/all/CADYN=9JWU3CMLzMEcD5MSQGnaLyDRSKc5SofBFHUax6YuTRaJA@mail.gmail.com
>>>>>>>> [5] https://lore.kernel.org/linux-pci/20241107-pci_acs_fix-v1-1-185a2462a571@quicinc.com
>>>>>>>> [6] https://lore.kernel.org/linux-pci/20250910-pci-acs-v1-0-fe9adb65ad7d@oss.qualcomm.com
>>>>>>>>
>>>>>>> Thanks for this patchset! I've tested it on my ARM Juno R1 and it
>>>>>>> looks like it almost works fine. This patchset even fixed some
>>>>>>> issues with PCI device probing, as I again see the SATA and GBit
>>>>>>> Ethernet devices, which have been missing since Linux v6.14 (it looks
>>>>>>> like I also missed this in my tests).
>>>>>>>
>>>>>>> # lspci
>>>>>>> 00:00.0 PCI bridge: PLDA PCI Express Core Reference Design (rev 01)
>>>>>>> 01:00.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>>>>>> 02:01.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>>>>>> 02:02.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>>>>>> 02:03.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>>>>>> 02:0c.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>>>>>> 02:10.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>>>>>> 02:1f.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 (rev 02)
>>>>>>> 03:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
>>>>>>> 08:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8057 PCI-E Gigabit Ethernet Controller
>>>>>>>
>>>>>>> However, there is also a regression. After applying this patchset,
>>>>>>> system suspend/resume stopped working. This is probably related to
>>>>>>> this message:
>>>>>>>
>>>>>>> pcieport 0000:02:1f.0: Unable to change power state from D0 to D3hot, device inaccessible
>>>>>>>
>>>>>>> which appears after calling 'rtcwake -s10 -mmem'. This might not be
>>>>>>> related to this patchset, so I probably need to apply it on older
>>>>>>> kernel releases and check.
>>>>>> One more piece of information: I've applied this patchset on top of
>>>>>> v6.16 and it works perfectly on ARM Juno R1. SATA and GBit Ethernet
>>>>>> are visible again and system suspend/resume works too, so the
>>>>>> suspend/resume issue on top of v6.18 seems not to be directly related
>>>>>> to the $subject patchset. I will try to bisect this issue when I have
>>>>>> some spare time.
>>>>>>
>>>>>> Feel free to add:
>>>>>>
>>>>>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>> I spent some time analyzing this regression on Juno R1 and found that:
>>>>>
>>>>> 1. SATA and GBit Ethernet stopped working after commit bcb81ac6ae3c
>>>>> ("iommu: Get DT/ACPI parsing into the proper probe path"), merged in
>>>>> v6.15-rc1.
>>>>>
>>>>> 2. With the $subject patchset applied to enable SATA & GBit Ethernet
>>>>> again, system suspend/resume stopped working after commit f3ac2ff14834
>>>>> ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree
>>>>> platforms"), merged in v6.18-rc1.
>>>>>
>>> Yes, this was expected, as not disabling ACS causes issues in detecting
>>> the devices.
>>>
>>>>> If I got it right, according to the latter commit message, some quirks
>>>>> have to be added to fix the suspend/resume issue. Unfortunately, I have
>>>>> no idea whether this issue is specific to the Juno R1 or to the given
>>>>> PCI devices.
>>>> And one more note, commit df5192d9bb0e ("PCI/ASPM: Enable only L0s and
>>>> L1 for devicetree platforms") doesn't fix the suspend/resume issue
>>>> either (with $subject patchset applied on top of it).
>>>>
>>> Interesting. Can you do:
>>>
>>> echo performance > /sys/module/pcie_aspm/parameters/policy
>>>
>>> and then suspend?
>> After the above command, system suspend/resume works again.
>>
> Ok, so ASPM L0s/L1 seems to be the issue. But I'm not quite sure why it causes
> issues during suspend/resume. If the device/controller doesn't play well with
> ASPM L0s/L1, it should at least cause issues before entering suspend.
>
> I'm clueless here atm...
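For readers following the ASPM angle, here is a simplified, hypothetical model of how the pcie_aspm policy selects which ASPM Control bits (Link Control bits 1:0) end up set. Only the bit values (PCI_EXP_LNKCTL_ASPM_L0S = 0x1, PCI_EXP_LNKCTL_ASPM_L1 = 0x2) come from pci_regs.h; the enum and function are made up, and the real logic in drivers/pci/pcie/aspm.c also handles L1 substates, ClockPM and per-link checks. It illustrates why writing 'performance' to the policy parameter clears both bits, which fits the workaround reported above.

```c
#include <assert.h>
#include <stdint.h>

/* ASPM Control field of the Link Control register (bits 1:0). */
#define LNKCTL_ASPM_L0S 0x0001
#define LNKCTL_ASPM_L1  0x0002
#define LNKCTL_ASPMC    0x0003

/* Subset of the values accepted by /sys/module/pcie_aspm/parameters/policy. */
enum sketch_aspm_policy {
	SKETCH_POLICY_DEFAULT,     /* use the platform default */
	SKETCH_POLICY_PERFORMANCE, /* ASPM off */
	SKETCH_POLICY_POWERSAVE,   /* L0s and L1 where supported */
};

/*
 * 'supported' = ASPM states both link partners advertise in LnkCap.
 * 'platform_default' = hypothetical stand-in for the default the kernel
 * picks for this link (per the commit titles quoted in this thread,
 * L0s|L1 on devicetree platforms).
 */
static uint16_t sketch_aspm_lnkctl(enum sketch_aspm_policy policy,
				   uint16_t supported, uint16_t platform_default)
{
	uint16_t want;

	assert((supported & ~LNKCTL_ASPMC) == 0);
	switch (policy) {
	case SKETCH_POLICY_PERFORMANCE:
		want = 0;
		break;
	case SKETCH_POLICY_POWERSAVE:
		want = LNKCTL_ASPM_L0S | LNKCTL_ASPM_L1;
		break;
	default:
		want = platform_default;
		break;
	}
	/* Never enable a state the link does not support. */
	return (uint16_t)(want & supported);
}
```

On the 02:1f.0 link (LnkCap advertises L0s and L1), the default policy in this model gives 0x3, matching "ASPM L0s L1 Enabled", while 'performance' gives 0.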

Something definitely gets broken during suspend. After adding
'no_console_suspend' to the kernel command line, I see the following messages:

# time rtcwake -s10 -mmem
rtcwake: wakeup from "mem" using /dev/rtc0 at Wed Dec 10 17:04:12 2025
PM: suspend entry (deep)
Filesystems sync: 0.001 seconds
Freezing user space processes
Freezing user space processes completed (elapsed 0.005 seconds)
OOM killer disabled.
Freezing remaining freezable tasks
Freezing remaining freezable tasks completed (elapsed 0.003 seconds)
psmouse serio1: Failed to disable mouse on 1c070000.kmi
psmouse serio0: Failed to disable mouse on 1c060000.kmi
pcieport 0000:02:1f.0: Unable to change power state from D0 to D3hot, device inaccessible
Disabling non-boot CPUs ...
psci: CPU5 killed (polled 0 ms)
psci: CPU4 killed (polled 0 ms)
psci: CPU3 killed (polled 0 ms)
psci: CPU2 killed (polled 0 ms)
psci: CPU1 killed (polled 4 ms)

and the system never wakes up.

I assume that this 'pcieport 0000:02:1f.0: Unable to change power state
from D0 to D3hot, device inaccessible' message is crucial here. It
doesn't appear when I change the pcie_aspm policy to performance (as you
suggested in your previous mail).
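As far as I can tell, the "device inaccessible" variant of this message is printed when the Power Management Control/Status register (PMCSR) reads back as all ones before the D0 to D3hot transition, which is what a config read returns when nothing answers. A minimal sketch of that check against a hypothetical 256-byte config-space snapshot: the helper names are made up, the PM capability offset 0xc0 comes from the lspci dump of 02:1f.0 in this thread, and the PMCSR offset (4) and PowerState field (bits 1:0, D3hot = 3) follow the PCI PM definitions. It only shows what the message implies, not why the port stops responding.

```c
#include <assert.h>
#include <stdint.h>

#define PM_CTRL          4      /* PMCSR offset within the PM capability */
#define PM_STATE_MASK    0x0003 /* PowerState field, bits 1:0 */
#define PM_STATE_D3HOT   0x0003

/* Little-endian 16-bit read from a config-space snapshot. */
static uint16_t sketch_cfg_read16(const uint8_t *cfg, unsigned int where)
{
	return (uint16_t)(cfg[where] | (cfg[where + 1] << 8));
}

/*
 * Returns 0 and stores the PMCSR value that would be written for D3hot,
 * or -1 if PMCSR reads back as all ones ("device inaccessible").
 */
static int sketch_prepare_d3hot(const uint8_t cfg[256], unsigned int pm_cap,
				uint16_t *new_pmcsr)
{
	uint16_t pmcsr;

	assert(pm_cap + PM_CTRL + 1 < 256);
	pmcsr = sketch_cfg_read16(cfg, pm_cap + PM_CTRL);
	if (pmcsr == 0xffff)
		return -1;
	*new_pmcsr = (uint16_t)((pmcsr & ~PM_STATE_MASK) | PM_STATE_D3HOT);
	return 0;
}
```

With PMCSR = 0x0008 (D0, NoSoftRst+, as in the lspci dump) the sketch produces 0x000b; with a snapshot of all 0xff bytes it returns -1.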

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH v2 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms
Posted by Manivannan Sadhasivam 1 month, 4 weeks ago
On Wed, Dec 10, 2025 at 06:26:27PM +0100, Marek Szyprowski wrote:
> On 09.12.2025 16:04, Manivannan Sadhasivam wrote:
> > On Tue, Dec 09, 2025 at 01:00:55PM +0100, Marek Szyprowski wrote:
> >> On 09.12.2025 12:15, Manivannan Sadhasivam wrote:
> >>> On Tue, Dec 09, 2025 at 09:28:38AM +0100, Marek Szyprowski wrote:
> >>>> On 09.12.2025 08:31, Marek Szyprowski wrote:
> >>>>> On 04.12.2025 14:13, Marek Szyprowski wrote:
> >>>>>> On 03.12.2025 13:04, Marek Szyprowski wrote:
> >>>>>>> On 02.12.2025 15:22, Manivannan Sadhasivam wrote:
> > [...]
> 
> Definitely something gets broken during suspend, after adding 
> 'no_console_suspend' to kernel command line I see the following messages:
> 
> # time rtcwake -s10 -mmem
> rtcwake: wakeup from "mem" using /dev/rtc0 at Wed Dec 10 17:04:12 2025
> PM: suspend entry (deep)
> Filesystems sync: 0.001 seconds
> Freezing user space processes
> Freezing user space processes completed (elapsed 0.005 seconds)
> OOM killer disabled.
> Freezing remaining freezable tasks
> Freezing remaining freezable tasks completed (elapsed 0.003 seconds)
> psmouse serio1: Failed to disable mouse on 1c070000.kmi
> psmouse serio0: Failed to disable mouse on 1c060000.kmi
> pcieport 0000:02:1f.0: Unable to change power state from D0 to D3hot, device inaccessible

The device just got blown off the bus at this point. But it is unclear to me why
it happens just because ASPM L0s/L1 is enabled. I don't think the firmware has
gotten the chance to turn off power to the devices yet.

So maybe some actions that we do in the PCI core during system suspend are
affecting the device state. But can you try to access the device by doing:

lspci -vvv -s 0000:02:1f.0

before initiating system suspend? This is just to check whether the issue
happens during suspend or well before that.
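If it helps to script the check suggested above (for example, right before running rtcwake), here is a small sketch. It reads the first two bytes of the device's config space through sysfs (/sys/bus/pci/devices/<domain:bus:dev.fn>/config) and treats a Vendor ID of 0xffff as "not responding". The function names are hypothetical, and it assumes that config reads which get no completion come back as all ones, which is the usual behavior but worth confirming on this platform. lspci -vvv remains the more complete check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* 1 = plausible Vendor ID, 0 = all ones (device not responding). */
static int sketch_vendor_valid(const unsigned char raw[2], uint16_t *vendor)
{
	*vendor = (uint16_t)(raw[0] | (raw[1] << 8));
	return *vendor != 0xffff;
}

/*
 * Reads the Vendor ID of e.g. "0000:02:1f.0" via sysfs.
 * Returns 1/0 as above, or -1 if the config file cannot be opened or
 * read (for example, the device is not present in sysfs).
 */
static int sketch_check_device(const char *bdf, uint16_t *vendor)
{
	char path[128];
	unsigned char raw[2];
	FILE *f;
	size_t n;

	assert(bdf != NULL && vendor != NULL);
	snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/config", bdf);
	f = fopen(path, "rb");
	if (f == NULL)
		return -1;
	n = fread(raw, 1, sizeof(raw), f);
	fclose(f);
	if (n != sizeof(raw))
		return -1;
	return sketch_vendor_valid(raw, vendor);
}
```

If the port is still reachable, sketch_check_device("0000:02:1f.0", &v) would be expected to return 1 with v == 0x111d (IDT).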

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH v2 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms
Posted by Marek Szyprowski 1 month, 4 weeks ago
On 12.12.2025 05:02, Manivannan Sadhasivam wrote:
> On Wed, Dec 10, 2025 at 06:26:27PM +0100, Marek Szyprowski wrote:
>> On 09.12.2025 16:04, Manivannan Sadhasivam wrote:
>>> On Tue, Dec 09, 2025 at 01:00:55PM +0100, Marek Szyprowski wrote:
>>>> On 09.12.2025 12:15, Manivannan Sadhasivam wrote:
>>>>> On Tue, Dec 09, 2025 at 09:28:38AM +0100, Marek Szyprowski wrote:
>>>>>> On 09.12.2025 08:31, Marek Szyprowski wrote:
>>>>>>> On 04.12.2025 14:13, Marek Szyprowski wrote:
>>>>>>>> On 03.12.2025 13:04, Marek Szyprowski wrote:
>>>>>>>>> On 02.12.2025 15:22, Manivannan Sadhasivam wrote:
>>> [...]
>> Definitely something gets broken during suspend, after adding
>> 'no_console_suspend' to kernel command line I see the following messages:
>>
>> # time rtcwake -s10 -mmem
>> rtcwake: wakeup from "mem" using /dev/rtc0 at Wed Dec 10 17:04:12 2025
>> PM: suspend entry (deep)
>> Filesystems sync: 0.001 seconds
>> Freezing user space processes
>> Freezing user space processes completed (elapsed 0.005 seconds)
>> OOM killer disabled.
>> Freezing remaining freezable tasks
>> Freezing remaining freezable tasks completed (elapsed 0.003 seconds)
>> psmouse serio1: Failed to disable mouse on 1c070000.kmi
>> psmouse serio0: Failed to disable mouse on 1c060000.kmi
>> pcieport 0000:02:1f.0: Unable to change power state from D0 to D3hot, device inaccessible
> The device just got blown off the bus at this point. But it is unclear to me why
> it happens though if we enable ASPM L0s/L1. I don't think the firmware has
> gotten the chance to turn off the power to devices.
>
> So maybe some actions that we do in the PCI core during system suspend is
> affecting the device state. But can you try to access the device by doing:
>
> lspci -vvv -s 0000:02:1f.0
>
> before initiating system suspend. Just to make sure if the issue happens during
> suspend or way before that.

Same as before:

root@target:~# lspci -vvv -s 0000:02:1f.0
02:1f.0 PCI bridge: Integrated Device Technology, Inc. [IDT] Device 8090 
(rev 02) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin ? routed to IRQ 50
        Bus: primary=02, secondary=08, subordinate=08, sec-latency=0
        I/O behind bridge: 00002000-00002fff
        Memory behind bridge: 50100000-501fffff
        Prefetchable memory behind bridge: 
00000000fff00000-00000000000fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- 
<TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Express (v2) Downstream Port (Slot-), MSI 00
                DevCap: MaxPayload 2048 bytes, PhantFunc 0
                        ExtTag+ RBE+
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ 
Unsupported+
                        RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- 
TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, 
Exit Latency L0s <4us, L1 <4us
                        ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp-
                LnkCtl: ASPM L0s L1 Enabled; Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt+
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ 
DLActive+ BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, 
LTR-, OBFF Not Supported ARIFwd+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, 
LTR-, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- 
SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, 
EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, 
EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, 
LinkEqualizationRequest-
        Capabilities: [c0] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fffbb040  Data: 00e7
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [200 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [320 v1] Access Control Services
                ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
                ACSCtl: SrcValid- TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
        Capabilities: [330 v1] #12
        Kernel driver in use: pcieport

root@target:~# time rtcwake -s10 -mmem
rtcwake: wakeup from "mem" using /dev/rtc0 at Fri Dec 12 07:23:50 2025
[  110.529810] PM: suspend entry (deep)
[  110.532688] Filesystems sync: 0.001 seconds
[  110.549590] Freezing user space processes
[  110.557833] Freezing user space processes completed (elapsed 0.008 seconds)
[  110.558282] OOM killer disabled.
[  110.558296] Freezing remaining freezable tasks
[  110.561602] Freezing remaining freezable tasks completed (elapsed 0.003 seconds)
[  110.736524] psmouse serio1: Failed to disable mouse on 1c070000.kmi
[  111.071329] psmouse serio0: Failed to disable mouse on 1c060000.kmi
[  111.700685] pcieport 0000:02:1f.0: Unable to change power state from D0 to D3hot, device inaccessible
[  111.737951] Disabling non-boot CPUs ...
[  111.757973] psci: CPU5 killed (polled 0 ms)
[  111.775215] psci: CPU4 killed (polled 0 ms)
[  111.789725] psci: CPU3 killed (polled 4 ms)
[  111.800778] psci: CPU2 killed (polled 0 ms)
[  111.816363] psci: CPU1 killed (polled 0 ms)

(machine never wakes up)

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH v2 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms
Posted by Naresh Kamboju 2 months, 1 week ago
On Tue, 2 Dec 2025 at 19:53, Manivannan Sadhasivam
<manivannan.sadhasivam@oss.qualcomm.com> wrote:
>
> Hi,
>
> This series fixes the long standing issue with ACS in OF platforms. There are
> two fixes in this series, both fixing independent issues on their own, but both
> are needed to properly enable ACS on OF platforms.
>
> Issue(s) background
> ===================
>
> Back in 2021, Xingang Wang first noted a failure in attaching the HiSilicon SEC
> device to QEMU ARM64 pci-root-port device [1]. He then tracked down the issue to
> ACS not being enabled for the QEMU Root Port device and he proposed a patch to
> fix it [2].
>
> Once the patch was applied, people reported PCIe issues with linux-next on the
> ARM Juno Development boards, where endpoint devices failed to enumerate
> [3][4]. The patch was soon dropped, but the underlying issue with the ARM Juno
> boards was left unresolved.
>
> Fast forward to 2024, Pavan resubmitted the same fix [5] for his own use case,
> hoping that someone in the community would fix the issue with ARM Juno boards.
> But the patch was rightly rejected, as a patch that was known to cause issues
> should not be merged to the kernel. But again, no one investigated the Juno
> issue and it was left behind again.
>
> Now it ended up on my plate, and I managed to track down the issue with the help
> of Naresh, who got access to the Juno boards in LKFT. The Juno issue was with the
> PCIe switch from Microsemi/IDT, which triggers ACS Source Validation error on
> Completions received for the Configuration Read Request from a device connected
> to the downstream port that has not yet captured the PCIe bus number. As per the
> PCIe spec r6.0 sec 2.2.6.2, "Functions must capture the Bus and Device Numbers
> supplied with all Type 0 Configuration Write Requests completed by the Function
> and supply these numbers in the Bus and Device Number fields of the Requester ID
> for all Requests". So during the first Configuration Read Request issued by the
> switch downstream port during enumeration (for reading Vendor ID), Bus and
> Device Numbers are still unknown to the device, so it responds to the Read Request
> with a Completion whose Bus and Device Numbers are 0. The switch treats this
> Completion as an ACS Source Validation error and drops it, so the endpoint
> device is never detected. This behavior violates the spec: PCIe r6.0, sec
> 6.12.1.1, states that "Completions are never affected by ACS Source Validation".
>
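To make the failure mode concrete, here is a minimal user-space C model of the ACS Source Validation decision at a switch Downstream Port. It is a sketch, not IDT firmware or kernel code: struct tlp, struct dsp, sv_accepts() and sv_accepts_buggy() are hypothetical names, and a TLP is reduced to its type plus the Bus Number carried in its ID field. The spec-compliant check exempts Completions; the buggy variant applies the bus-range check to them as well, so a Completion carrying Bus Number 0 from a not-yet-configured device is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified TLP: only what the SV decision looks at (hypothetical model). */
enum tlp_kind { TLP_REQUEST, TLP_COMPLETION };

struct tlp {
    enum tlp_kind kind;
    uint8_t src_bus; /* Bus Number from the TLP's ID field */
};

/* Bus range below a Downstream Port: secondary..subordinate. */
struct dsp {
    uint8_t secondary;
    uint8_t subordinate;
};

static bool bus_in_range(const struct dsp *p, uint8_t bus)
{
    assert(p->secondary <= p->subordinate);
    return bus >= p->secondary && bus <= p->subordinate;
}

/* Spec-compliant: PCIe r6.0 sec 6.12.1.1, Completions are never affected by SV. */
static bool sv_accepts(const struct dsp *p, const struct tlp *t)
{
    if (t->kind == TLP_COMPLETION)
        return true;
    return bus_in_range(p, t->src_bus);
}

/* Model of the broken switch: the bus-range check is applied to Completions too. */
static bool sv_accepts_buggy(const struct dsp *p, const struct tlp *t)
{
    return bus_in_range(p, t->src_bus);
}
```

With a port whose secondary bus is 3, a Completion carrying Bus Number 0 passes sv_accepts() but is rejected by sv_accepts_buggy(), which is the dropped Vendor ID read described above.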
> Solution
> ========
>
> In September, I submitted a series [6] to fix both issues. For the IDT issue,
> I reused the existing quirk in the PCI core which does a dummy config write
> before issuing the first config read to the device. And for the ACS enablement
> issue, I just resubmitted the original patch from Xingang which called
> pci_request_acs() from devm_of_pci_bridge_init().
>
> But the review comments on that series required it to be reworked completely.
> Hence, in this version, I've incorporated the comments as follows:
>
> 1. For the ACS enablement issue, I've moved the pci_enable_acs() call from
> pci_acs_init() to pci_dma_configure().
>
> 2. For the IDT issue, I've cached the ACS capabilities (RO) in 'pci_dev',
> collected the broken capability for the IDT switches in the quirk and used it to
> disable the capability in the cache. This also allowed me to get rid of the
> earlier workaround for the switch.
>
> [1] https://lore.kernel.org/all/038397a6-57e2-b6fc-6e1c-7c03b7be9d96@huawei.com
> [2] https://lore.kernel.org/all/1621566204-37456-1-git-send-email-wangxingang5@huawei.com
> [3] https://lore.kernel.org/all/01314d70-41e6-70f9-e496-84091948701a@samsung.com
> [4] https://lore.kernel.org/all/CADYN=9JWU3CMLzMEcD5MSQGnaLyDRSKc5SofBFHUax6YuTRaJA@mail.gmail.com
> [5] https://lore.kernel.org/linux-pci/20241107-pci_acs_fix-v1-1-185a2462a571@quicinc.com
> [6] https://lore.kernel.org/linux-pci/20250910-pci-acs-v1-0-fe9adb65ad7d@oss.qualcomm.com
>
> Changes in v2:
>
> * Reworked the patches completely as mentioned above.
> * Rebased on top of v6.18-rc7
>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>

Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>


> ---
> Manivannan Sadhasivam (4):
>       PCI: Enable ACS only after configuring IOMMU for OF platforms
>       PCI: Cache ACS capabilities
>       PCI: Disable ACS SV capability for the broken IDT switches
>       PCI: Extend the pci_disable_acs_sv quirk for one more IDT switch
>
>  drivers/pci/pci-driver.c |  8 +++++++
>  drivers/pci/pci.c        | 33 ++++++++++++--------------
>  drivers/pci/pci.h        |  2 +-
>  drivers/pci/probe.c      | 12 ----------
>  drivers/pci/quirks.c     | 62 ++++++++++++------------------------------------
>  include/linux/pci.h      |  2 ++
>  6 files changed, 41 insertions(+), 78 deletions(-)
> ---
> base-commit: ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d
> change-id: 20251201-pci_acs-b15aa3947289
>
> Best regards,
> --
> Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
>