[PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events

Feng Tang posted 4 patches 9 months, 3 weeks ago
[PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
Posted by Feng Tang 9 months, 3 weeks ago
There was problem reported by firmware developers that they received
two PCIe hotplug commands in very short intervals on an ARM server,
which doesn't comply with PCIe spec, and broke their state machine and
work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
to wait at least 1 second for the command-complete event, before
resending the command or sending a new command.

In the failure case, the first PCIe hotplug command firmware received
is from get_port_device_capability(), which sends command to disable
PCIe hotplug interrupts without waiting for its completion, and the
second command comes from pcie_enable_notification() of pciehp driver,
which enables hotplug interrupts again.

Fix it by adding the necessary wait to comply with PCIe spec.

Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
Originally-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
---
 drivers/pci/pci.h          |  2 ++
 drivers/pci/pcie/portdrv.c | 19 +++++++++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4c94a589de4a..a1138ebc2689 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -760,6 +760,7 @@ static inline void pcie_ecrc_get_policy(char *str) { }
 void pcie_reset_lbms_count(struct pci_dev *port);
 int pcie_lbms_count(struct pci_dev *port, unsigned long *val);
 int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms);
+void pcie_disable_hp_interrupts_early(struct pci_dev *dev);
 #else
 static inline void pcie_reset_lbms_count(struct pci_dev *port) {}
 static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val)
@@ -770,6 +771,7 @@ static inline int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms)
 {
 	return 0;
 }
+static inline void pcie_disable_hp_interrupts_early(struct pci_dev *dev) {}
 #endif
 
 struct pci_dev_reset_methods {
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index bb00ba45ee51..ca4f21dff486 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -230,6 +230,22 @@ int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms)
 	return  ret;
 }
 
+void pcie_disable_hp_interrupts_early(struct pci_dev *dev)
+{
+	u16 slot_ctrl = 0;
+
+	pcie_capability_read_word(dev, PCI_EXP_SLTCTL, &slot_ctrl);
+	/* Bail out early if it is already disabled */
+	if (!(slot_ctrl & (PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE)))
+		return;
+
+	pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
+		  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
+
+	if (pcie_poll_sltctl_cmd(dev, 1000))
+		pci_info(dev, "Timeout on disabling PCIe hot-plug interrupt\n");
+}
+
 /**
  * get_port_device_capability - discover capabilities of a PCI Express port
  * @dev: PCI Express port to examine
@@ -255,8 +271,7 @@ static int get_port_device_capability(struct pci_dev *dev)
 		 * Disable hot-plug interrupts in case they have been enabled
 		 * by the BIOS and the hot-plug service driver is not loaded.
 		 */
-		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
-			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
+		pcie_disable_hp_interrupts_early(dev);
 	}
 
 #ifdef CONFIG_PCIEAER
-- 
2.43.5
Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
Posted by Lukas Wunner 9 months, 3 weeks ago
On Mon, Feb 24, 2025 at 11:44:58AM +0800, Feng Tang wrote:
> @@ -255,8 +271,7 @@ static int get_port_device_capability(struct pci_dev *dev)
>  		 * Disable hot-plug interrupts in case they have been enabled
>  		 * by the BIOS and the hot-plug service driver is not loaded.
>  		 */
> -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> +		pcie_disable_hp_interrupts_early(dev);
>  	}

Moving the Slot Control code from pciehp to portdrv (as is done in
patch 1 of this series) is hackish.  It should be avoided if at all
possible.

As I've already said before...

https://lore.kernel.org/all/Z6HYuBDP6uvE1Sf4@wunner.de/

...it seems clearing those interrupts is only necessary
in the CONFIG_HOTPLUG_PCI_PCIE=n case.  And in that case,
there is no second Slot Control write, so there is no delay
to be observed.

Hence the proper solution is to make the clearing of the
interrupts conditional on: !IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE)

You keep sending new versions of these patches that do not
incorporate the feedback I provided in the above-linked e-mail.

Please re-read that e-mail and verify if the solution that
I proposed solves the problem.  That solution does not
require moving the Slot Control code from pciehp to portdrv.

Thanks,

Lukas
Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
Posted by Feng Tang 9 months, 3 weeks ago
Hi Lukas,

On Mon, Feb 24, 2025 at 07:12:11PM +0100, Lukas Wunner wrote:
> On Mon, Feb 24, 2025 at 11:44:58AM +0800, Feng Tang wrote:
> > @@ -255,8 +271,7 @@ static int get_port_device_capability(struct pci_dev *dev)
> >  		 * Disable hot-plug interrupts in case they have been enabled
> >  		 * by the BIOS and the hot-plug service driver is not loaded.
> >  		 */
> > -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> > -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> > +		pcie_disable_hp_interrupts_early(dev);
> >  	}
> 
> Moving the Slot Control code from pciehp to portdrv (as is done in
> patch 1 of this series) is hackish.  It should be avoided if at all
> possible.

I tried to remove the code duplication of 2 waiting function, according
to Bjorn's comment in https://lore.kernel.org/lkml/20250218223354.GA196886@bhelgaas/.
Maybe I didn't git it right. Any suggestion?

> 
> As I've already said before...
> 
> https://lore.kernel.org/all/Z6HYuBDP6uvE1Sf4@wunner.de/
> 
> ...it seems clearing those interrupts is only necessary
> in the CONFIG_HOTPLUG_PCI_PCIE=n case.  And in that case,
> there is no second Slot Control write, so there is no delay
> to be observed.
> 
> Hence the proper solution is to make the clearing of the
> interrupts conditional on: !IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE)
> 
> You keep sending new versions of these patches that do not
> incorporate the feedback I provided in the above-linked e-mail.
> 
> Please re-read that e-mail and verify if the solution that
> I proposed solves the problem.  That solution does not
> require moving the Slot Control code from pciehp to portdrv.

There might be some misunderstaning here :), I responded in
https://lore.kernel.org/lkml/Z6LRAozZm1UfgjqT@U-2FWC9VHC-2323.local/
that your suggestion could solve our issue.

And the reason I didn't take it is I was afraid that it might hurt
the problem what commit 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port
services during port initialization") tried to solve.

As you mentioned, the comment for 2bd50dd800b5 is "a bit confusing",
and I tried to guess in my previous reply: 
"
I'm not sure what problem this piece of code try to avoid, maybe
something simliar to the irq storm isseu as mentioned in the 2/2 patch?
The code comments could be about the small time window between this
point and the loading of pciehp driver, which will request_irq and
enable hotplug interrupt again.
"

The code comment from 2bd50dd800b5 is:

	/*
	 * Disable hot-plug interrupts in case they have been
	 * enabled by the BIOS and the hot-plug service driver
	 * is not loaded.
	 */

The "is not loaded" has 2 possible meanings:
1. the pciehp driver is not loaded yet at this point inside
   get_port_device_capability(), and will be loaded later
2. the pciehp will never be loaded, i.e. CONFIG_HOTPLUG_PCI_PCIE=n 

If it's case 2, your suggestion can solve it nicely, but for case 1,
we may have to keep the interrupt disabling.

Thanks,
Feng
Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
Posted by Lukas Wunner 9 months, 3 weeks ago
On Tue, Feb 25, 2025 at 11:06:50AM +0800, Feng Tang wrote:
> On Mon, Feb 24, 2025 at 07:12:11PM +0100, Lukas Wunner wrote:
> > On Mon, Feb 24, 2025 at 11:44:58AM +0800, Feng Tang wrote:
> > > @@ -255,8 +271,7 @@ static int get_port_device_capability(struct pci_dev *dev)
> > >  		 * Disable hot-plug interrupts in case they have been enabled
> > >  		 * by the BIOS and the hot-plug service driver is not loaded.
> > >  		 */
> > > -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> > > -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> > > +		pcie_disable_hp_interrupts_early(dev);
> > >  	}
> > 
> > Moving the Slot Control code from pciehp to portdrv (as is done in
> > patch 1 of this series) is hackish.  It should be avoided if at all
> > possible.
> 
> I tried to remove the code duplication of 2 waiting function, according
> to Bjorn's comment in https://lore.kernel.org/lkml/20250218223354.GA196886@bhelgaas/.
> Maybe I didn't git it right. Any suggestion?

My point is just that you may not need to move the code from pciehp to
portdrv at all if you follow my suggestion.


> There might be some misunderstaning here :), I responded in
> https://lore.kernel.org/lkml/Z6LRAozZm1UfgjqT@U-2FWC9VHC-2323.local/
> that your suggestion could solve our issue.

Well, could you test it please?

A small change like the one I proposed is definitely preferable to
moving dozens of lines of code around.


> And the reason I didn't take it is I was afraid that it might hurt
> the problem what commit 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port
> services during port initialization") tried to solve.
> 
> As you mentioned, the comment for 2bd50dd800b5 is "a bit confusing",
> and I tried to guess in my previous reply: 
> "
> I'm not sure what problem this piece of code try to avoid, maybe
> something simliar to the irq storm isseu as mentioned in the 2/2 patch?
> The code comments could be about the small time window between this
> point and the loading of pciehp driver, which will request_irq and
> enable hotplug interrupt again.
> "
> 
> The code comment from 2bd50dd800b5 is:
> 
> 	/*
> 	 * Disable hot-plug interrupts in case they have been
> 	 * enabled by the BIOS and the hot-plug service driver
> 	 * is not loaded.
> 	 */
> 
> The "is not loaded" has 2 possible meanings:
> 1. the pciehp driver is not loaded yet at this point inside
>    get_port_device_capability(), and will be loaded later
> 2. the pciehp will never be loaded, i.e. CONFIG_HOTPLUG_PCI_PCIE=n 
> 
> If it's case 2, your suggestion can solve it nicely, but for case 1,
> we may have to keep the interrupt disabling.

The pciehp driver cannot be bound to the PCIe port when
get_port_device_capability() is running.  Because at that point,
portdrv is still figuring out which capabilities the port has and
it will subsequently instantiate the port service devices to which
the drivers (such as pciehp) will bind.

So in that sense, case 1 cannot be what the code comment is
referring to.

My point is that if CONFIG_HOTPLUG_PCI_PCIE=y, there may indeed be
another write to the Slot Control register before the command written
by portdrv has been completed.  Because pciehp will write to the
register on probe.  But in this case, there shouldn't be a need for
portdrv to quiesce the interrupt because pciehp will do that anyway
shortly afterwards.

And in the CONFIG_HOTPLUG_PCI_PCIE=n case, pciehp will not quiesce
the interrupt, so portdrv has to do that.  I believe that's what
the code comment is referring to.  It should be safe to just write
to the Slot Control register without waiting for the command to
complete because there shouldn't be another Slot Control write
afterwards (not by pciehp anyway).

If making the Slot Control write in portdrv conditional on
CONFIG_HOTPLUG_PCI_PCIE=n does unexpectedly *not* solve the issue,
please try to find out where the second Slot Control write is
coming from.  E.g. you could amend pcie_capability_write_word()
with something like:

	if (pos == PCI_EXP_SLTCTL) {
		pci_info(dev, "Writing %04hx SltCtl\n", val);
		dump_stack();
	}

Thanks,

Lukas
Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
Posted by Feng Tang 9 months, 3 weeks ago
Hi Lukas,

On Tue, Feb 25, 2025 at 05:01:43AM +0100, Lukas Wunner wrote:
> On Tue, Feb 25, 2025 at 11:06:50AM +0800, Feng Tang wrote:
> > On Mon, Feb 24, 2025 at 07:12:11PM +0100, Lukas Wunner wrote:
> > > On Mon, Feb 24, 2025 at 11:44:58AM +0800, Feng Tang wrote:
> > > > @@ -255,8 +271,7 @@ static int get_port_device_capability(struct pci_dev *dev)
> > > >  		 * Disable hot-plug interrupts in case they have been enabled
> > > >  		 * by the BIOS and the hot-plug service driver is not loaded.
> > > >  		 */
> > > > -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> > > > -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> > > > +		pcie_disable_hp_interrupts_early(dev);
> > > >  	}
> > > 
> > > Moving the Slot Control code from pciehp to portdrv (as is done in
> > > patch 1 of this series) is hackish.  It should be avoided if at all
> > > possible.
> > 
> > I tried to remove the code duplication of 2 waiting function, according
> > to Bjorn's comment in https://lore.kernel.org/lkml/20250218223354.GA196886@bhelgaas/.
> > Maybe I didn't git it right. Any suggestion?
> 
> My point is just that you may not need to move the code from pciehp to
> portdrv at all if you follow my suggestion.
 
I see.

> 
> > There might be some misunderstaning here :), I responded in
> > https://lore.kernel.org/lkml/Z6LRAozZm1UfgjqT@U-2FWC9VHC-2323.local/
> > that your suggestion could solve our issue.
> 
> Well, could you test it please?

I don't have the hardware right now, will try to get from firmware
developers. But from code logic, your suggestion can surely solve the
issue unless I still miss something. From bug report (also commit log),
the first PCIe hotplug command issued is here, and the second command
comes from pciehp driver. In our kernel config, CONFIG_HOTPLUG_PCI_PCIE=y,
so the first command won't happen, and all following commands come
from pciehp driver, which setup its own waiting for command logic.

> A small change like the one I proposed is definitely preferable to
> moving dozens of lines of code around.

Agree.

> 
> > And the reason I didn't take it is I was afraid that it might hurt
> > the problem what commit 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port
> > services during port initialization") tried to solve.
> > 
> > As you mentioned, the comment for 2bd50dd800b5 is "a bit confusing",
> > and I tried to guess in my previous reply: 
> > "
> > I'm not sure what problem this piece of code try to avoid, maybe
> > something simliar to the irq storm isseu as mentioned in the 2/2 patch?
> > The code comments could be about the small time window between this
> > point and the loading of pciehp driver, which will request_irq and
> > enable hotplug interrupt again.
> > "
> > 
> > The code comment from 2bd50dd800b5 is:
> > 
> > 	/*
> > 	 * Disable hot-plug interrupts in case they have been
> > 	 * enabled by the BIOS and the hot-plug service driver
> > 	 * is not loaded.
> > 	 */
> > 
> > The "is not loaded" has 2 possible meanings:
> > 1. the pciehp driver is not loaded yet at this point inside
> >    get_port_device_capability(), and will be loaded later
> > 2. the pciehp will never be loaded, i.e. CONFIG_HOTPLUG_PCI_PCIE=n 
> > 
> > If it's case 2, your suggestion can solve it nicely, but for case 1,
> > we may have to keep the interrupt disabling.
> 
> The pciehp driver cannot be bound to the PCIe port when
> get_port_device_capability() is running.  Because at that point,
> portdrv is still figuring out which capabilities the port has and
> it will subsequently instantiate the port service devices to which
> the drivers (such as pciehp) will bind.
 
Yes, the time window between here and the following initialization of
pciehp service driver is very small, and your suggestion sounds quite
safe to me.

> So in that sense, case 1 cannot be what the code comment is
> referring to.

Hi Rafel,

Could you help to confirm this?

> 
> My point is that if CONFIG_HOTPLUG_PCI_PCIE=y, there may indeed be
> another write to the Slot Control register before the command written
> by portdrv has been completed.  Because pciehp will write to the
> register on probe.  But in this case, there shouldn't be a need for
> portdrv to quiesce the interrupt because pciehp will do that anyway
> shortly afterwards.
> 
> And in the CONFIG_HOTPLUG_PCI_PCIE=n case, pciehp will not quiesce
> the interrupt, so portdrv has to do that.  I believe that's what
> the code comment is referring to.  It should be safe to just write
> to the Slot Control register without waiting for the command to
> complete because there shouldn't be another Slot Control write
> afterwards (not by pciehp anyway).
> 
> If making the Slot Control write in portdrv conditional on
> CONFIG_HOTPLUG_PCI_PCIE=n does unexpectedly *not* solve the issue,
> please try to find out where the second Slot Control write is
> coming from.  E.g. you could amend pcie_capability_write_word()
> with something like:
> 
> 	if (pos == PCI_EXP_SLTCTL) {
> 		pci_info(dev, "Writing %04hx SltCtl\n", val);
> 		dump_stack();
> 	}

Thanks for the suggestion, and we added similar debug to figure
out them.

Thanks,
Feng
Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
Posted by Feng Tang 9 months, 3 weeks ago
On Tue, Feb 25, 2025 at 12:42:04PM +0800, Feng Tang wrote:
 
> > 
> > > There might be some misunderstaning here :), I responded in
> > > https://lore.kernel.org/lkml/Z6LRAozZm1UfgjqT@U-2FWC9VHC-2323.local/
> > > that your suggestion could solve our issue.
> > 
> > Well, could you test it please?
> 
> I don't have the hardware right now, will try to get from firmware
> developers. But from code logic, your suggestion can surely solve the
> issue unless I still miss something. From bug report (also commit log),
> the first PCIe hotplug command issued is here, and the second command
> comes from pciehp driver. In our kernel config, CONFIG_HOTPLUG_PCI_PCIE=y,
> so the first command won't happen, and all following commands come
> from pciehp driver, which setup its own waiting for command logic.
> 

Hi Lucas,

We just tried the patch on the hardware and initial 5.10 kernel, and
the problem cannot be reproduced, as the first PCIe hotplug command
of disabling CCIE and HPIE was not issued. 

Should I post a new version patch with your suggestion? You analysis
in previous sounded sane to me. Also for the original context, if the
BIOS has enabled the hotplug interrupt, it has been there since OS
boot for quite some time, this solution just affects a very small
time window from here to the loading of pciehp driver.  

Also I would like to separate this patch from the patch dealing the
nomsi irq storm issue. How do you think?

Thanks,
Feng

> > > 
> > > The code comment from 2bd50dd800b5 is:
> > > 
> > > 	/*
> > > 	 * Disable hot-plug interrupts in case they have been
> > > 	 * enabled by the BIOS and the hot-plug service driver
> > > 	 * is not loaded.
> > > 	 */
> > > 
> > > The "is not loaded" has 2 possible meanings:
> > > 1. the pciehp driver is not loaded yet at this point inside
> > >    get_port_device_capability(), and will be loaded later
> > > 2. the pciehp will never be loaded, i.e. CONFIG_HOTPLUG_PCI_PCIE=n 
> > > 
> > > If it's case 2, your suggestion can solve it nicely, but for case 1,
> > > we may have to keep the interrupt disabling.
> > 
> > The pciehp driver cannot be bound to the PCIe port when
> > get_port_device_capability() is running.  Because at that point,
> > portdrv is still figuring out which capabilities the port has and
> > it will subsequently instantiate the port service devices to which
> > the drivers (such as pciehp) will bind.
>  
> Yes, the time window between here and the following initialization of
> pciehp service driver is very small, and your suggestion sounds quite
> safe to me.
> 
> > So in that sense, case 1 cannot be what the code comment is
> > referring to.
> 
> Hi Rafel,
> 
> Could you help to confirm this?
> 
> > 
> > My point is that if CONFIG_HOTPLUG_PCI_PCIE=y, there may indeed be
> > another write to the Slot Control register before the command written
> > by portdrv has been completed.  Because pciehp will write to the
> > register on probe.  But in this case, there shouldn't be a need for
> > portdrv to quiesce the interrupt because pciehp will do that anyway
> > shortly afterwards.
> > 
> > And in the CONFIG_HOTPLUG_PCI_PCIE=n case, pciehp will not quiesce
> > the interrupt, so portdrv has to do that.  I believe that's what
> > the code comment is referring to.  It should be safe to just write
> > to the Slot Control register without waiting for the command to
> > complete because there shouldn't be another Slot Control write
> > afterwards (not by pciehp anyway).
> > 
> > If making the Slot Control write in portdrv conditional on
> > CONFIG_HOTPLUG_PCI_PCIE=n does unexpectedly *not* solve the issue,
> > please try to find out where the second Slot Control write is
> > coming from.  E.g. you could amend pcie_capability_write_word()
> > with something like:
> > 
> > 	if (pos == PCI_EXP_SLTCTL) {
> > 		pci_info(dev, "Writing %04hx SltCtl\n", val);
> > 		dump_stack();
> > 	}
> 
> Thanks for the suggestion, and we added similar debug to figure
> out them.
> 
> Thanks,
> Feng
Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
Posted by Lukas Wunner 9 months, 3 weeks ago
On Fri, Feb 28, 2025 at 02:29:29PM +0800, Feng Tang wrote:
> On Tue, Feb 25, 2025 at 12:42:04PM +0800, Feng Tang wrote:
> > > > There might be some misunderstaning here :), I responded in
> > > > https://lore.kernel.org/lkml/Z6LRAozZm1UfgjqT@U-2FWC9VHC-2323.local/
> > > > that your suggestion could solve our issue.
> > > 
> > > Well, could you test it please?
> > 
> We just tried the patch on the hardware and initial 5.10 kernel, and
> the problem cannot be reproduced, as the first PCIe hotplug command
> of disabling CCIE and HPIE was not issued.

Good!

> Should I post a new version patch with your suggestion?

Yes, please.

> Also I would like to separate this patch from the patch dealing the
> nomsi irq storm issue. How do you think?

Makes sense to me.

The problem with the nomsi irq storm is really that if the platform
(i.e. BIOS) doesn't grant OSPM control of hotplug, OSPM (i.e. the kernel)
cannot modify hotplug registers because the assumption is that the
platform controls them.  If the platform doesn't actually handle
hotplug, but keeps the interrupts enabled, that's basically a bug
of the specific platform.

I think the kernel community's stance in such situations is that the
BIOS vendor should provide an update with a fix.  In some cases
that's not posible because the product is no longer supported,
or the vendor doesn't care about Linux issues because it only
supports Windows or macOS.  In those cases, we deal with these
problems with a quirk.  E.g. on x86 we often use a DMI quirk to
recognize affected hardware and the quirk would then disable the
hotplug interrupts.

Thanks,

Lukas
Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
Posted by Feng Tang 9 months, 3 weeks ago
Hi Lukas,

On Fri, Feb 28, 2025 at 08:14:04AM +0100, Lukas Wunner wrote:
> On Fri, Feb 28, 2025 at 02:29:29PM +0800, Feng Tang wrote:
> > On Tue, Feb 25, 2025 at 12:42:04PM +0800, Feng Tang wrote:
> > > > > There might be some misunderstaning here :), I responded in
> > > > > https://lore.kernel.org/lkml/Z6LRAozZm1UfgjqT@U-2FWC9VHC-2323.local/
> > > > > that your suggestion could solve our issue.
> > > > 
> > > > Well, could you test it please?
> > > 
> > We just tried the patch on the hardware and initial 5.10 kernel, and
> > the problem cannot be reproduced, as the first PCIe hotplug command
> > of disabling CCIE and HPIE was not issued.
> 
> Good!
> 
> > Should I post a new version patch with your suggestion?
> 
> Yes, please.

Will do, thanks

> 
> > Also I would like to separate this patch from the patch dealing the
> > nomsi irq storm issue. How do you think?
> 
> Makes sense to me.
> 
> The problem with the nomsi irq storm is really that if the platform
> (i.e. BIOS) doesn't grant OSPM control of hotplug, OSPM (i.e. the kernel)
> cannot modify hotplug registers because the assumption is that the
> platform controls them. 

Yes, very reasonable. I also talked with some firmware engineer, who
shared there is working sample that on some old x86 platform, the
firmware itself is really capable of handling the hotplug stuff when
MSI is disabled.

> If the platform doesn't actually handle
> hotplug, but keeps the interrupts enabled, that's basically a bug
> of the specific platform.

That's what happened in our case :)

> I think the kernel community's stance in such situations is that the
> BIOS vendor should provide an update with a fix.  In some cases
> that's not posible because the product is no longer supported,
> or the vendor doesn't care about Linux issues because it only
> supports Windows or macOS.  In those cases, we deal with these
> problems with a quirk.  E.g. on x86 we often use a DMI quirk to
> recognize affected hardware and the quirk would then disable the
> hotplug interrupts.

I see.

As you dug out the history in https://lore.kernel.org/lkml/Z6RU-681eXl7hcp6@wunner.de/

Our previous debug could go through the OSC check in nomsi case,
only after below patch:

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 84030804a763..e7d9328cba45 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -38,8 +38,7 @@ static int acpi_pci_root_scan_dependent(struct acpi_device *adev)
 
 #define ACPI_PCIE_REQ_SUPPORT (OSC_PCI_EXT_CONFIG_SUPPORT \
 				| OSC_PCI_ASPM_SUPPORT \
-				| OSC_PCI_CLOCK_PM_SUPPORT \
-				| OSC_PCI_MSI_SUPPORT)
+				| OSC_PCI_CLOCK_PM_SUPPORT)

Otherwise, the OSC function won't be executed, but kernel will simply
disable PCIe hotplug, which breaks the working sample I mentioned above.
We'd better also include take this change?

Thanks,
Feng

> 
> Thanks,
> 
> Lukas
Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
Posted by Feng Tang 9 months, 3 weeks ago
On Fri, Feb 28, 2025 at 05:33:43PM +0800, Feng Tang wrote:
> > The problem with the nomsi irq storm is really that if the platform
> > (i.e. BIOS) doesn't grant OSPM control of hotplug, OSPM (i.e. the kernel)
> > cannot modify hotplug registers because the assumption is that the
> > platform controls them. 
> 
> Yes, very reasonable. I also talked with some firmware engineer, who
> shared there is working sample that on some old x86 platform, the
> firmware itself is really capable of handling the hotplug stuff when
> MSI is disabled.
> 
> > If the platform doesn't actually handle
> > hotplug, but keeps the interrupts enabled, that's basically a bug
> > of the specific platform.
> 
> That's what happened in our case :)
> 
> > I think the kernel community's stance in such situations is that the
> > BIOS vendor should provide an update with a fix.  In some cases
> > that's not posible because the product is no longer supported,
> > or the vendor doesn't care about Linux issues because it only
> > supports Windows or macOS.  In those cases, we deal with these
> > problems with a quirk.  E.g. on x86 we often use a DMI quirk to
> > recognize affected hardware and the quirk would then disable the
> > hotplug interrupts.
> 
> I see.
> 
> As you dug out the history in https://lore.kernel.org/lkml/Z6RU-681eXl7hcp6@wunner.de/
> 
> Our previous debug could go through the OSC check in nomsi case,
> only after below patch:
> 
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index 84030804a763..e7d9328cba45 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -38,8 +38,7 @@ static int acpi_pci_root_scan_dependent(struct acpi_device *adev)
>  
>  #define ACPI_PCIE_REQ_SUPPORT (OSC_PCI_EXT_CONFIG_SUPPORT \
>  				| OSC_PCI_ASPM_SUPPORT \
> -				| OSC_PCI_CLOCK_PM_SUPPORT \
> -				| OSC_PCI_MSI_SUPPORT)
> +				| OSC_PCI_CLOCK_PM_SUPPORT)
> 
> Otherwise, the OSC function won't be executed, but kernel will simply
> disable PCIe hotplug,

> which breaks the working sample I mentioned above.
Sorry, this saying is wrong, the patch doesn't change anything for cases
that firwmare would handle the PCIe hotplug.

But still, it is reasonable to let firmware decide if it would grant
the control to OSPM.

Thanks,
Feng

> We'd better also include take this change?
> 
> Thanks,
> Feng
> 
> > 
> > Thanks,
> > 
> > Lukas
Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
Posted by Ilpo Järvinen 9 months, 3 weeks ago
On Mon, 24 Feb 2025, Feng Tang wrote:

> There was problem reported by firmware developers that they received
> two PCIe hotplug commands in very short intervals on an ARM server,
> which doesn't comply with PCIe spec, and broke their state machine and
> work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
> to wait at least 1 second for the command-complete event, before
> resending the command or sending a new command.
> 
> In the failure case, the first PCIe hotplug command firmware received
> is from get_port_device_capability(), which sends command to disable
> PCIe hotplug interrupts without waiting for its completion, and the
> second command comes from pcie_enable_notification() of pciehp driver,
> which enables hotplug interrupts again.
> 
> Fix it by adding the necessary wait to comply with PCIe spec.
> 
> Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
> Originally-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> ---
>  drivers/pci/pci.h          |  2 ++
>  drivers/pci/pcie/portdrv.c | 19 +++++++++++++++++--
>  2 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 4c94a589de4a..a1138ebc2689 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -760,6 +760,7 @@ static inline void pcie_ecrc_get_policy(char *str) { }
>  void pcie_reset_lbms_count(struct pci_dev *port);
>  int pcie_lbms_count(struct pci_dev *port, unsigned long *val);
>  int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms);
> +void pcie_disable_hp_interrupts_early(struct pci_dev *dev);
>  #else
>  static inline void pcie_reset_lbms_count(struct pci_dev *port) {}
>  static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val)
> @@ -770,6 +771,7 @@ static inline int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms)
>  {
>  	return 0;
>  }
> +static inline void pcie_disable_hp_interrupts_early(struct pci_dev *dev) {}
>  #endif
>  
>  struct pci_dev_reset_methods {
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index bb00ba45ee51..ca4f21dff486 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -230,6 +230,22 @@ int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms)
>  	return  ret;
>  }
>  
> +void pcie_disable_hp_interrupts_early(struct pci_dev *dev)
> +{
> +	u16 slot_ctrl = 0;

Unnecessary initialization

> +
> +	pcie_capability_read_word(dev, PCI_EXP_SLTCTL, &slot_ctrl);
> +	/* Bail out early if it is already disabled */
> +	if (!(slot_ctrl & (PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE)))
> +		return;
> +
> +	pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> +		  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);

Align to (. You might need to put the bits to own lines.

> +
> +	if (pcie_poll_sltctl_cmd(dev, 1000))
> +		pci_info(dev, "Timeout on disabling PCIe hot-plug interrupt\n");
> +}
> +
>  /**
>   * get_port_device_capability - discover capabilities of a PCI Express port
>   * @dev: PCI Express port to examine
> @@ -255,8 +271,7 @@ static int get_port_device_capability(struct pci_dev *dev)
>  		 * Disable hot-plug interrupts in case they have been enabled
>  		 * by the BIOS and the hot-plug service driver is not loaded.
>  		 */
> -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> +		pcie_disable_hp_interrupts_early(dev);

Doesn't calling this here delay setup for all portdrv services, not just 
hotplug? And the delay can be relatively long.

>  	}
>  
>  #ifdef CONFIG_PCIEAER
> 

-- 
 i.
Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
Posted by Feng Tang 9 months, 3 weeks ago
Hi Ilpo, 

On Mon, Feb 24, 2025 at 05:00:03PM +0200, Ilpo Järvinen wrote:
> On Mon, 24 Feb 2025, Feng Tang wrote:
> 
> > There was problem reported by firmware developers that they received
> > two PCIe hotplug commands in very short intervals on an ARM server,
> > which doesn't comply with PCIe spec, and broke their state machine and
> > work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
> > to wait at least 1 second for the command-complete event, before
> > resending the command or sending a new command.
> > 
> > In the failure case, the first PCIe hotplug command firmware received
> > is from get_port_device_capability(), which sends command to disable
> > PCIe hotplug interrupts without waiting for its completion, and the
> > second command comes from pcie_enable_notification() of pciehp driver,
> > which enables hotplug interrupts again.
> > 
> > Fix it by adding the necessary wait to comply with PCIe spec.
> > 
> > Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
> > Originally-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
> > Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
> > ---
> >  drivers/pci/pci.h          |  2 ++
> >  drivers/pci/pcie/portdrv.c | 19 +++++++++++++++++--
> >  2 files changed, 19 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index 4c94a589de4a..a1138ebc2689 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -760,6 +760,7 @@ static inline void pcie_ecrc_get_policy(char *str) { }
> >  void pcie_reset_lbms_count(struct pci_dev *port);
> >  int pcie_lbms_count(struct pci_dev *port, unsigned long *val);
> >  int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms);
> > +void pcie_disable_hp_interrupts_early(struct pci_dev *dev);
> >  #else
> >  static inline void pcie_reset_lbms_count(struct pci_dev *port) {}
> >  static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val)
> > @@ -770,6 +771,7 @@ static inline int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms)
> >  {
> >  	return 0;
> >  }
> > +static inline void pcie_disable_hp_interrupts_early(struct pci_dev *dev) {}
> >  #endif
> >  
> >  struct pci_dev_reset_methods {
> > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> > index bb00ba45ee51..ca4f21dff486 100644
> > --- a/drivers/pci/pcie/portdrv.c
> > +++ b/drivers/pci/pcie/portdrv.c
> > @@ -230,6 +230,22 @@ int pcie_poll_sltctl_cmd(struct pci_dev *dev, int timeout_ms)
> >  	return  ret;
> >  }
> >  
> > +void pcie_disable_hp_interrupts_early(struct pci_dev *dev)
> > +{
> > +	u16 slot_ctrl = 0;
> 
> Unnecessary initialization
 
The reason I put it to 0 is, very unlikely, the pcie_capability_read_word()
might fail, and 0 can avoid the early return.

> > +
> > +	pcie_capability_read_word(dev, PCI_EXP_SLTCTL, &slot_ctrl);
> > +	/* Bail out early if it is already disabled */
> > +	if (!(slot_ctrl & (PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE)))
> > +		return;
> > +
> > +	pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> > +		  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> 
> Align to (. You might need to put the bits to own lines.

OK.

> > +
> > +	if (pcie_poll_sltctl_cmd(dev, 1000))
> > +		pci_info(dev, "Timeout on disabling PCIe hot-plug interrupt\n");
> > +}
> > +
> >  /**
> >   * get_port_device_capability - discover capabilities of a PCI Express port
> >   * @dev: PCI Express port to examine
> > @@ -255,8 +271,7 @@ static int get_port_device_capability(struct pci_dev *dev)
> >  		 * Disable hot-plug interrupts in case they have been enabled
> >  		 * by the BIOS and the hot-plug service driver is not loaded.
> >  		 */
> > -		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> > -			  PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
> > +		pcie_disable_hp_interrupts_early(dev);
> 
> Doesn't calling this here delay setup for all portdrv services, not just 
> hotplug? And the delay can be relatively long.

I don't quite follow, physically there is only one root port, the code
from commit 2bd50dd800b5 just does it once. Also the 1 second is just the
maxim waiting time, the wait will end once the command completed event
bit is set. The exact time depends on platform (x86 and ARM) and the
firmeware implementation, and it is much smaller than 1 second AFAIK.

Thanks,
Feng

> >  	}
> >  
> >  #ifdef CONFIG_PCIEAER
> > 
> 
> -- 
>  i.
Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling hotplug events
Posted by Markus Elfring 9 months, 3 weeks ago
…
> work flow. According to PCIe 6.1 spec, section 6.7.3.2, software needs
…

                                   specification?

Regards,
Markus