[RFC PATCH] PCI: Support Immediate Readiness on devices without PM capabilities

Sean Christopherson posted 1 patch 3 months, 2 weeks ago
There is a newer version of this series
drivers/pci/pci.c | 40 +++++++++++++++++++++++++---------------
1 file changed, 25 insertions(+), 15 deletions(-)
[RFC PATCH] PCI: Support Immediate Readiness on devices without PM capabilities
Posted by Sean Christopherson 3 months, 2 weeks ago
Query support for Immediate Readiness irrespective of whether or not the
device supports PM capabilities, as nothing in the PCIe spec suggests that
Immediate Readiness is in any way dependent on PM functionality.

Opportunistically add a comment to explain why "errors" during PM setup
are effectively ignored.

Fixes: d6112f8def51 ("PCI: Add support for Immediate Readiness")
Cc: David Matlack <dmatlack@google.com>
Cc: Vipin Sharma <vipinsh@google.com>
Cc: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---

RFC as I'm not entirely sure this is useful/correct.

Found by inspection when debugging a VFIO VF passthrough issue that turned out
to be 907a7a2e5bf4 ("PCI/PM: Set up runtime PM even for devices without PCI PM").

The folks on the Cc list are looking at parallelizing VF assignment to avoid
serializing the 100ms wait on FLR.  I'm hoping we'll get lucky and the VFs in
question do (or can) support PCI_STATUS_IMM_READY.

 drivers/pci/pci.c | 40 +++++++++++++++++++++++++---------------
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 9e42090fb108..cd91adbf0269 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3198,33 +3198,22 @@ void pci_pm_power_up_and_verify_state(struct pci_dev *pci_dev)
 	pci_update_current_state(pci_dev, PCI_D0);
 }
 
-/**
- * pci_pm_init - Initialize PM functions of given PCI device
- * @dev: PCI device to handle.
- */
-void pci_pm_init(struct pci_dev *dev)
+static void __pci_pm_init(struct pci_dev *dev)
 {
 	int pm;
-	u16 status;
 	u16 pmc;
 
-	device_enable_async_suspend(&dev->dev);
-	dev->wakeup_prepared = false;
-
-	dev->pm_cap = 0;
-	dev->pme_support = 0;
-
 	/* find PCI PM capability in list */
 	pm = pci_find_capability(dev, PCI_CAP_ID_PM);
 	if (!pm)
-		goto poweron;
+		return;
 	/* Check device's ability to generate PME# */
 	pci_read_config_word(dev, pm + PCI_PM_PMC, &pmc);
 
 	if ((pmc & PCI_PM_CAP_VER_MASK) > 3) {
 		pci_err(dev, "unsupported PM cap regs version (%u)\n",
 			pmc & PCI_PM_CAP_VER_MASK);
-		goto poweron;
+		return;
 	}
 
 	dev->pm_cap = pm;
@@ -3265,11 +3254,32 @@ void pci_pm_init(struct pci_dev *dev)
 		/* Disable the PME# generation functionality */
 		pci_pme_active(dev, false);
 	}
+}
+
+/**
+ * pci_pm_init - Initialize PM functions of given PCI device
+ * @dev: PCI device to handle.
+ */
+void pci_pm_init(struct pci_dev *dev)
+{
+	u16 status;
+
+	device_enable_async_suspend(&dev->dev);
+	dev->wakeup_prepared = false;
+
+	dev->pm_cap = 0;
+	dev->pme_support = 0;
+
+	/*
+	 * Note, support for the PCI PM spec is optional for legacy PCI devices
+	 * and for VFs.  Continue on even if no PM capabilities are supported.
+	 */
+	__pci_pm_init(dev);
 
 	pci_read_config_word(dev, PCI_STATUS, &status);
 	if (status & PCI_STATUS_IMM_READY)
 		dev->imm_ready = 1;
-poweron:
+
 	pci_pm_power_up_and_verify_state(dev);
 	pm_runtime_forbid(&dev->dev);
 	pm_runtime_set_active(&dev->dev);

base-commit: 86731a2a651e58953fc949573895f2fa6d456841
-- 
2.50.0.714.g196bf9f422-goog
Re: [RFC PATCH] PCI: Support Immediate Readiness on devices without PM capabilities
Posted by Bjorn Helgaas 3 months, 2 weeks ago
On Tue, Jun 24, 2025 at 10:16:37AM -0700, Sean Christopherson wrote:
> Query support for Immediate Readiness irrespective of whether or not the
> device supports PM capabilities, as nothing in the PCIe spec suggests that
> Immediate Readiness is in any way dependent on PM functionality.

Huh, I forgot that we had support for Immediate Readiness at all.

I agree, Immediate Readiness has nothing to do with PM except that we
take advantage of it in a PM path, and I think pci_pm_init() was
probably not the best place to put this.

> Opportunistically add a comment to explain why "errors" during PM setup
> are effectively ignored.
> 
> Fixes: d6112f8def51 ("PCI: Add support for Immediate Readiness")
> Cc: David Matlack <dmatlack@google.com>
> Cc: Vipin Sharma <vipinsh@google.com>
> Cc: Aaron Lewis <aaronlewis@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> 
> RFC as I'm not entirely sure this is useful/correct.
> 
> Found by inspection when debugging a VFIO VF passthrough issue that turned out
> to be 907a7a2e5bf4 ("PCI/PM: Set up runtime PM even for devices without PCI PM").
> 
> The folks on the Cc list are looking at parallelizing VF assignment to avoid
> serializing the 100ms wait on FLR.  I'm hoping we'll get lucky and the VFs in
> question do (or can) support PCI_STATUS_IMM_READY.
> 
>  drivers/pci/pci.c | 40 +++++++++++++++++++++++++---------------
>  1 file changed, 25 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 9e42090fb108..cd91adbf0269 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3198,33 +3198,22 @@ void pci_pm_power_up_and_verify_state(struct pci_dev *pci_dev)
>  	pci_update_current_state(pci_dev, PCI_D0);
>  }
>  
> -/**
> - * pci_pm_init - Initialize PM functions of given PCI device
> - * @dev: PCI device to handle.
> - */
> -void pci_pm_init(struct pci_dev *dev)
> +static void __pci_pm_init(struct pci_dev *dev)
>  {
>  	int pm;
> -	u16 status;
>  	u16 pmc;
>  
> -	device_enable_async_suspend(&dev->dev);
> -	dev->wakeup_prepared = false;
> -
> -	dev->pm_cap = 0;
> -	dev->pme_support = 0;
> -
>  	/* find PCI PM capability in list */
>  	pm = pci_find_capability(dev, PCI_CAP_ID_PM);
>  	if (!pm)
> -		goto poweron;
> +		return;
>  	/* Check device's ability to generate PME# */
>  	pci_read_config_word(dev, pm + PCI_PM_PMC, &pmc);
>  
>  	if ((pmc & PCI_PM_CAP_VER_MASK) > 3) {
>  		pci_err(dev, "unsupported PM cap regs version (%u)\n",
>  			pmc & PCI_PM_CAP_VER_MASK);
> -		goto poweron;
> +		return;
>  	}
>  
>  	dev->pm_cap = pm;
> @@ -3265,11 +3254,32 @@ void pci_pm_init(struct pci_dev *dev)
>  		/* Disable the PME# generation functionality */
>  		pci_pme_active(dev, false);
>  	}
> +}
> +
> +/**
> + * pci_pm_init - Initialize PM functions of given PCI device
> + * @dev: PCI device to handle.
> + */
> +void pci_pm_init(struct pci_dev *dev)
> +{
> +	u16 status;
> +
> +	device_enable_async_suspend(&dev->dev);
> +	dev->wakeup_prepared = false;
> +
> +	dev->pm_cap = 0;
> +	dev->pme_support = 0;
> +
> +	/*
> +	 * Note, support for the PCI PM spec is optional for legacy PCI devices
> +	 * and for VFs.  Continue on even if no PM capabilities are supported.
> +	 */
> +	__pci_pm_init(dev);
>  
>  	pci_read_config_word(dev, PCI_STATUS, &status);
>  	if (status & PCI_STATUS_IMM_READY)
>  		dev->imm_ready = 1;

I would rather just move this PCI_STATUS read to somewhere else.  I
don't think there's a great place to put it.  We could put it in
set_pcie_port_type(), which is sort of a grab bag of PCIe-related
things.

I don't know if it's necessarily even a PCIe-specific thing, but it
would be unexpected if somebody made a conventional PCI device that
set it, since the bit was reserved (and should be zero) in PCI r3.0
and defined in PCIe r4.0.

Maybe we should put it in pci_setup_device() close to where we call
pci_intx_mask_broken()?

Both PCI_STATUS_IMM_READY and PCI_STATUS_CAP_LIST are read-only, and
we currently read PCI_STATUS for every single pci_find_capability()
call, which is kind of stupid.  Maybe someday we can optimize that and
read PCI_STATUS once for both PCI_STATUS_CAP_LIST and
PCI_STATUS_IMM_READY.

> -poweron:
> +
>  	pci_pm_power_up_and_verify_state(dev);
>  	pm_runtime_forbid(&dev->dev);
>  	pm_runtime_set_active(&dev->dev);
> 
> base-commit: 86731a2a651e58953fc949573895f2fa6d456841
> -- 
> 2.50.0.714.g196bf9f422-goog
>
Re: [RFC PATCH] PCI: Support Immediate Readiness on devices without PM capabilities
Posted by Sean Christopherson 3 months, 1 week ago
On Wed, Jun 25, 2025, Bjorn Helgaas wrote:
> On Tue, Jun 24, 2025 at 10:16:37AM -0700, Sean Christopherson wrote:
> > +void pci_pm_init(struct pci_dev *dev)
> > +{
> > +	u16 status;
> > +
> > +	device_enable_async_suspend(&dev->dev);
> > +	dev->wakeup_prepared = false;
> > +
> > +	dev->pm_cap = 0;
> > +	dev->pme_support = 0;
> > +
> > +	/*
> > +	 * Note, support for the PCI PM spec is optional for legacy PCI devices
> > +	 * and for VFs.  Continue on even if no PM capabilities are supported.
> > +	 */
> > +	__pci_pm_init(dev);
> >  
> >  	pci_read_config_word(dev, PCI_STATUS, &status);
> >  	if (status & PCI_STATUS_IMM_READY)
> >  		dev->imm_ready = 1;
> 
> I would rather just move this PCI_STATUS read to somewhere else.  I
> don't think there's a great place to put it.  We could put it in
> set_pcie_port_type(), which is sort of a grab bag of PCIe-related
> things.
> 
> I don't know if it's necessarily even a PCIe-specific thing, but it
> would be unexpected if somebody made a conventional PCI device that
> set it, since the bit was reserved (and should be zero) in PCI r3.0
> and defined in PCIe r4.0.
> 
> Maybe we should put it in pci_setup_device() close to where we call
> pci_intx_mask_broken()?

Any reason not to throw it in pci_init_capabilities()?  That has the advantage
of minimizing the travel distance, e.g. to avoid introducing a goof similar to
what happened with 4d4c10f763d7 ("PCI: Explicitly put devices into D0 when initializing").

E.g. something silly like this?  Or maybe pci_misc_init() or so?

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 9e42090fb108..4a1ba5c017cd 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3205,7 +3205,6 @@ void pci_pm_power_up_and_verify_state(struct pci_dev *pci_dev)
 void pci_pm_init(struct pci_dev *dev)
 {
        int pm;
-       u16 status;
        u16 pmc;
 
        device_enable_async_suspend(&dev->dev);
@@ -3266,9 +3265,6 @@ void pci_pm_init(struct pci_dev *dev)
                pci_pme_active(dev, false);
        }
 
-       pci_read_config_word(dev, PCI_STATUS, &status);
-       if (status & PCI_STATUS_IMM_READY)
-               dev->imm_ready = 1;
 poweron:
        pci_pm_power_up_and_verify_state(dev);
        pm_runtime_forbid(&dev->dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 4b8693ec9e4c..d33b8af37247 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2595,6 +2595,15 @@ void pcie_report_downtraining(struct pci_dev *dev)
        __pcie_print_link_status(dev, false);
 }
 
+static void pci_imm_ready_init(struct pci_dev *dev)
+{
+       u16 status;
+
+       pci_read_config_word(dev, PCI_STATUS, &status);
+       if (status & PCI_STATUS_IMM_READY)
+               dev->imm_ready = 1;
+}
+
 static void pci_init_capabilities(struct pci_dev *dev)
 {
        pci_ea_init(dev);               /* Enhanced Allocation */
@@ -2604,6 +2613,7 @@ static void pci_init_capabilities(struct pci_dev *dev)
        /* Buffers for saving PCIe and PCI-X capabilities */
        pci_allocate_cap_save_buffers(dev);
 
+       pci_imm_ready_init(dev);        /* Immediate Ready */
        pci_pm_init(dev);               /* Power Management */
        pci_vpd_init(dev);              /* Vital Product Data */
        pci_configure_ari(dev);         /* Alternative Routing-ID Forwarding */
Re: [RFC PATCH] PCI: Support Immediate Readiness on devices without PM capabilities
Posted by Bjorn Helgaas 3 months, 1 week ago
On Thu, Jun 26, 2025 at 03:17:48PM -0700, Sean Christopherson wrote:
> On Wed, Jun 25, 2025, Bjorn Helgaas wrote:
> > On Tue, Jun 24, 2025 at 10:16:37AM -0700, Sean Christopherson wrote:
> > > +void pci_pm_init(struct pci_dev *dev)
> > > +{
> > > +	u16 status;
> > > +
> > > +	device_enable_async_suspend(&dev->dev);
> > > +	dev->wakeup_prepared = false;
> > > +
> > > +	dev->pm_cap = 0;
> > > +	dev->pme_support = 0;
> > > +
> > > +	/*
> > > +	 * Note, support for the PCI PM spec is optional for legacy PCI devices
> > > +	 * and for VFs.  Continue on even if no PM capabilities are supported.
> > > +	 */
> > > +	__pci_pm_init(dev);
> > >  
> > >  	pci_read_config_word(dev, PCI_STATUS, &status);
> > >  	if (status & PCI_STATUS_IMM_READY)
> > >  		dev->imm_ready = 1;
> > 
> > I would rather just move this PCI_STATUS read to somewhere else.  I
> > don't think there's a great place to put it.  We could put it in
> > set_pcie_port_type(), which is sort of a grab bag of PCIe-related
> > things.
> > 
> > I don't know if it's necessarily even a PCIe-specific thing, but it
> > would be unexpected if somebody made a conventional PCI device that
> > set it, since the bit was reserved (and should be zero) in PCI r3.0
> > and defined in PCIe r4.0.
> > 
> > Maybe we should put it in pci_setup_device() close to where we call
> > pci_intx_mask_broken()?
> 
> Any reason not to throw it in pci_init_capabilities()?  That has the
> advantage of minimizing the travel distance, e.g. to avoid
> introducing a goof similar to what happened with 4d4c10f763d7 ("PCI:
> Explicitly put devices into D0 when initializing").

The only reason I suggested doing it earlier was to enable a potential
pci_find_capability() optimization.  But this could easily be moved
if/when that happens, so I think the patch below would be fine.

> E.g. something silly like this?  Or maybe pci_misc_init() or so?
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 9e42090fb108..4a1ba5c017cd 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3205,7 +3205,6 @@ void pci_pm_power_up_and_verify_state(struct pci_dev *pci_dev)
>  void pci_pm_init(struct pci_dev *dev)
>  {
>         int pm;
> -       u16 status;
>         u16 pmc;
>  
>         device_enable_async_suspend(&dev->dev);
> @@ -3266,9 +3265,6 @@ void pci_pm_init(struct pci_dev *dev)
>                 pci_pme_active(dev, false);
>         }
>  
> -       pci_read_config_word(dev, PCI_STATUS, &status);
> -       if (status & PCI_STATUS_IMM_READY)
> -               dev->imm_ready = 1;
>  poweron:
>         pci_pm_power_up_and_verify_state(dev);
>         pm_runtime_forbid(&dev->dev);
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 4b8693ec9e4c..d33b8af37247 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2595,6 +2595,15 @@ void pcie_report_downtraining(struct pci_dev *dev)
>         __pcie_print_link_status(dev, false);
>  }
>  
> +static void pci_imm_ready_init(struct pci_dev *dev)
> +{
> +       u16 status;
> +
> +       pci_read_config_word(dev, PCI_STATUS, &status);
> +       if (status & PCI_STATUS_IMM_READY)
> +               dev->imm_ready = 1;
> +}
> +
>  static void pci_init_capabilities(struct pci_dev *dev)
>  {
>         pci_ea_init(dev);               /* Enhanced Allocation */
> @@ -2604,6 +2613,7 @@ static void pci_init_capabilities(struct pci_dev *dev)
>         /* Buffers for saving PCIe and PCI-X capabilities */
>         pci_allocate_cap_save_buffers(dev);
>  
> +       pci_imm_ready_init(dev);        /* Immediate Ready */
>         pci_pm_init(dev);               /* Power Management */
>         pci_vpd_init(dev);              /* Vital Product Data */
>         pci_configure_ari(dev);         /* Alternative Routing-ID Forwarding */
Re: [RFC PATCH] PCI: Support Immediate Readiness on devices without PM capabilities
Posted by Vipin Sharma 3 months, 2 weeks ago
On 2025-06-24 10:16:37, Sean Christopherson wrote:
> Query support for Immediate Readiness irrespective of whether or not the
> device supports PM capabilities, as nothing in the PCIe spec suggests that
> Immediate Readiness is in any way dependent on PM functionality.

After reading spec I also arrived at the same conclusion, this can be
done irrespective of PM functionality. For example, wait after FLR
behavior which are done using PCI Express Capability is also covered by
this Immediate Readiness bit.

> 
> Opportunistically add a comment to explain why "errors" during PM setup
> are effectively ignored.
> 
> Fixes: d6112f8def51 ("PCI: Add support for Immediate Readiness")
> Cc: David Matlack <dmatlack@google.com>
> Cc: Vipin Sharma <vipinsh@google.com>
> Cc: Aaron Lewis <aaronlewis@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> 
> RFC as I'm not entirely sure this is useful/correct.
> 
> Found by inspection when debugging a VFIO VF passthrough issue that turned out
> to be 907a7a2e5bf4 ("PCI/PM: Set up runtime PM even for devices without PCI PM").
> 
> The folks on the Cc list are looking at parallelizing VF assignment to avoid
> serializing the 100ms wait on FLR.  I'm hoping we'll get lucky and the VFs in
> question do (or can) support PCI_STATUS_IMM_READY.
> 
>  drivers/pci/pci.c | 40 +++++++++++++++++++++++++---------------
>  1 file changed, 25 insertions(+), 15 deletions(-)
> 
> +void pci_pm_init(struct pci_dev *dev)
> +{
> +	u16 status;
> +
> +	device_enable_async_suspend(&dev->dev);
> +	dev->wakeup_prepared = false;
> +
> +	dev->pm_cap = 0;
> +	dev->pme_support = 0;
> +
> +	/*
> +	 * Note, support for the PCI PM spec is optional for legacy PCI devices
> +	 * and for VFs.  Continue on even if no PM capabilities are supported.
> +	 */
> +	__pci_pm_init(dev);
>  
>  	pci_read_config_word(dev, PCI_STATUS, &status);
>  	if (status & PCI_STATUS_IMM_READY)
>  		dev->imm_ready = 1;

I don't see a reason to keep it in pci_pm_init then. This should be
moved out of this function, maybe in the caller or its own function.
> -poweron:
> +
>  	pci_pm_power_up_and_verify_state(dev);
>  	pm_runtime_forbid(&dev->dev);
>  	pm_runtime_set_active(&dev->dev);
> 
> base-commit: 86731a2a651e58953fc949573895f2fa6d456841
> -- 
> 2.50.0.714.g196bf9f422-goog
>