[PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator

Timothy Pearson posted 6 patches 3 months, 3 weeks ago
There is a newer version of this series
[PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
Posted by Timothy Pearson 3 months, 3 weeks ago
 state

The PCIe specification allows three attention indicator states,
on, off, and blink.  Enable all three states instead of basic
on / off control.

Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
---
 drivers/pci/hotplug/pnv_php.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index 0ceb4a2c3c79..c3005324be3d 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -440,10 +440,23 @@ static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state)
 	return ret;
 }
 
+static int pnv_php_get_raw_indicator_status(struct hotplug_slot *slot, u8 *state)
+{
+	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
+	struct pci_dev *bridge = php_slot->pdev;
+	u16 status;
+
+	pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &status);
+	*state = (status & (PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC)) >> 6;
+	return 0;
+}
+
+
 static int pnv_php_get_attention_state(struct hotplug_slot *slot, u8 *state)
 {
 	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
 
+	pnv_php_get_raw_indicator_status(slot, &php_slot->attention_state);
 	*state = php_slot->attention_state;
 	return 0;
 }
@@ -461,7 +474,7 @@ static int pnv_php_set_attention_state(struct hotplug_slot *slot, u8 state)
 	mask = PCI_EXP_SLTCTL_AIC;
 
 	if (state)
-		new = PCI_EXP_SLTCTL_ATTN_IND_ON;
+		new = FIELD_PREP(PCI_EXP_SLTCTL_AIC, state);
 	else
 		new = PCI_EXP_SLTCTL_ATTN_IND_OFF;
 
-- 
2.39.5
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
Posted by Bjorn Helgaas 3 months, 3 weeks ago
On Wed, Jun 18, 2025 at 11:58:59AM -0500, Timothy Pearson wrote:
>  state

Weird wrapping of last word of subject to here.

> The PCIe specification allows three attention indicator states,
> on, off, and blink.  Enable all three states instead of basic
> on / off control.
> 
> Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
> ---
>  drivers/pci/hotplug/pnv_php.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
> index 0ceb4a2c3c79..c3005324be3d 100644
> --- a/drivers/pci/hotplug/pnv_php.c
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -440,10 +440,23 @@ static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state)
>  	return ret;
>  }
>  
> +static int pnv_php_get_raw_indicator_status(struct hotplug_slot *slot, u8 *state)
> +{
> +	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
> +	struct pci_dev *bridge = php_slot->pdev;
> +	u16 status;
> +
> +	pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &status);
> +	*state = (status & (PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC)) >> 6;

Should be able to do this with FIELD_GET().

Is the PCI_EXP_SLTCTL_PIC part needed?  It wasn't there before, commit
log doesn't mention it, and as far as I can tell, this would be the
only driver to do that.  Most expose only the attention status (0=off,
1=on, 2=identify/blink).

> +	return 0;
> +}
> +
> +
>  static int pnv_php_get_attention_state(struct hotplug_slot *slot, u8 *state)
>  {
>  	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
>  
> +	pnv_php_get_raw_indicator_status(slot, &php_slot->attention_state);

This is a change worth noting.  Previously we didn't read the AIC
state from PCI_EXP_SLTCTL at all; we used php_slot->attention_state to
keep track of whatever had been previously set via
pnv_php_set_attention_state().

Now we read the current state from PCI_EXP_SLTCTL.  It's not clear
that php_slot->attention_state is still needed at all.

Previously, the user could write any value at all to the sysfs
"attention" file and then read that same value back.  After this
patch, the user can still write anything, but reads will only return
values with PCI_EXP_SLTCTL_AIC and PCI_EXP_SLTCTL_PIC.

>  	*state = php_slot->attention_state;
>  	return 0;
>  }
> @@ -461,7 +474,7 @@ static int pnv_php_set_attention_state(struct hotplug_slot *slot, u8 state)
>  	mask = PCI_EXP_SLTCTL_AIC;
>  
>  	if (state)
> -		new = PCI_EXP_SLTCTL_ATTN_IND_ON;
> +		new = FIELD_PREP(PCI_EXP_SLTCTL_AIC, state);

This changes the behavior in some cases:

  write 0: previously turned indicator off, now writes reserved value
  write 2: previously turned indicator on, now sets to blink
  write 3: previously turned indicator on, now turns it off

>  	else
>  		new = PCI_EXP_SLTCTL_ATTN_IND_OFF;
>  
> -- 
> 2.39.5
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
Posted by Timothy Pearson 3 months, 3 weeks ago

----- Original Message -----
> From: "Bjorn Helgaas" <helgaas@kernel.org>
> To: "Timothy Pearson" <tpearson@raptorengineering.com>
> Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-pci"
> <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>, "Michael Ellerman" <mpe@ellerman.id.au>,
> "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao" <naveen@kernel.org>, "Bjorn Helgaas"
> <bhelgaas@google.com>, "Shawn Anastasio" <sanastasio@raptorengineering.com>
> Sent: Wednesday, June 18, 2025 2:01:46 PM
> Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator

> On Wed, Jun 18, 2025 at 11:58:59AM -0500, Timothy Pearson wrote:
>>  state
> 
> Weird wrapping of last word of subject to here.

I'll need to see what's up with my git format-patch setup. Apologies for that across the multiple series.

>> The PCIe specification allows three attention indicator states,
>> on, off, and blink.  Enable all three states instead of basic
>> on / off control.
>> 
>> Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
>> ---
>>  drivers/pci/hotplug/pnv_php.c | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
>> index 0ceb4a2c3c79..c3005324be3d 100644
>> --- a/drivers/pci/hotplug/pnv_php.c
>> +++ b/drivers/pci/hotplug/pnv_php.c
>> @@ -440,10 +440,23 @@ static int pnv_php_get_adapter_state(struct hotplug_slot
>> *slot, u8 *state)
>>  	return ret;
>>  }
>>  
>> +static int pnv_php_get_raw_indicator_status(struct hotplug_slot *slot, u8
>> *state)
>> +{
>> +	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
>> +	struct pci_dev *bridge = php_slot->pdev;
>> +	u16 status;
>> +
>> +	pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &status);
>> +	*state = (status & (PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC)) >> 6;
> 
> Should be able to do this with FIELD_GET().

I used the same overall structure as the pciehp_hpc driver here.  Do you want me to also fix up that driver with FIELD_GET()?

> Is the PCI_EXP_SLTCTL_PIC part needed?  It wasn't there before, commit
> log doesn't mention it, and as far as I can tell, this would be the
> only driver to do that.  Most expose only the attention status (0=off,
> 1=on, 2=identify/blink).
> 
>> +	return 0;
>> +}
>> +
>> +
>>  static int pnv_php_get_attention_state(struct hotplug_slot *slot, u8 *state)
>>  {
>>  	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
>>  
>> +	pnv_php_get_raw_indicator_status(slot, &php_slot->attention_state);
> 
> This is a change worth noting.  Previously we didn't read the AIC
> state from PCI_EXP_SLTCTL at all; we used php_slot->attention_state to
> keep track of whatever had been previously set via
> pnv_php_set_attention_state().
> 
> Now we read the current state from PCI_EXP_SLTCTL.  It's not clear
> that php_slot->attention_state is still needed at all.

It probably isn't.  It's unclear why IBM took this path at all, given pciehp's attention handlers predate pnv-php's by many years.

> Previously, the user could write any value at all to the sysfs
> "attention" file and then read that same value back.  After this
> patch, the user can still write anything, but reads will only return
> values with PCI_EXP_SLTCTL_AIC and PCI_EXP_SLTCTL_PIC.
> 
>>  	*state = php_slot->attention_state;
>>  	return 0;
>>  }
>> @@ -461,7 +474,7 @@ static int pnv_php_set_attention_state(struct hotplug_slot
>> *slot, u8 state)
>>  	mask = PCI_EXP_SLTCTL_AIC;
>>  
>>  	if (state)
>> -		new = PCI_EXP_SLTCTL_ATTN_IND_ON;
>> +		new = FIELD_PREP(PCI_EXP_SLTCTL_AIC, state);
> 
> This changes the behavior in some cases:
> 
>  write 0: previously turned indicator off, now writes reserved value
>  write 2: previously turned indicator on, now sets to blink
>  write 3: previously turned indicator on, now turns it off

If we're looking at normalizing with pciehp with an eye toward eventually deprecating / removing pnv-php, I can't think of a better time to change this behavior.  I suspect we're the only major user of this code path at the moment, with most software expecting to see pciehp-style handling.  Thoughts?
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
Posted by Bjorn Helgaas 3 months, 2 weeks ago
On Wed, Jun 18, 2025 at 07:37:54PM -0500, Timothy Pearson wrote:
> ----- Original Message -----
> > From: "Bjorn Helgaas" <helgaas@kernel.org>
> > To: "Timothy Pearson" <tpearson@raptorengineering.com>
> > Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-pci"
> > <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>, "Michael Ellerman" <mpe@ellerman.id.au>,
> > "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao" <naveen@kernel.org>, "Bjorn Helgaas"
> > <bhelgaas@google.com>, "Shawn Anastasio" <sanastasio@raptorengineering.com>
> > Sent: Wednesday, June 18, 2025 2:01:46 PM
> > Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
> 
> > On Wed, Jun 18, 2025 at 11:58:59AM -0500, Timothy Pearson wrote:
> >>  state
> > 
> > Weird wrapping of last word of subject to here.
> 
> I'll need to see what's up with my git format-patch setup. Apologies
> for that across the multiple series.

No worries.  If you can figure out how to make your mailer use the
normal "On xxx, somebody wrote:" attribution instead of duplicating
all those headers, that would be far more useful :)

> >> +static int pnv_php_get_raw_indicator_status(struct hotplug_slot *slot, u8
> >> *state)
> >> +{
> >> +	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
> >> +	struct pci_dev *bridge = php_slot->pdev;
> >> +	u16 status;
> >> +
> >> +	pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &status);
> >> +	*state = (status & (PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC)) >> 6;
> > 
> > Should be able to do this with FIELD_GET().
> 
> I used the same overall structure as the pciehp_hpc driver here.  Do
> you want me to also fix up that driver with FIELD_GET()?

Nope, I think it's fine to keep this looking like pciehp for now.
If somebody wants to use FIELD_GET() in pciehp, I'd probably be OK
with that, but no need for you to open that can of worms.

> > Is the PCI_EXP_SLTCTL_PIC part needed?  It wasn't there before, commit
> > log doesn't mention it, and as far as I can tell, this would be the
> > only driver to do that.  Most expose only the attention status (0=off,
> > 1=on, 2=identify/blink).
> > 
> >> +	return 0;
> >> +}
> >> +
> >> +
> >>  static int pnv_php_get_attention_state(struct hotplug_slot *slot, u8 *state)
> >>  {
> >>  	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
> >>  
> >> +	pnv_php_get_raw_indicator_status(slot, &php_slot->attention_state);
> > 
> > This is a change worth noting.  Previously we didn't read the AIC
> > state from PCI_EXP_SLTCTL at all; we used php_slot->attention_state to
> > keep track of whatever had been previously set via
> > pnv_php_set_attention_state().
> > 
> > Now we read the current state from PCI_EXP_SLTCTL.  It's not clear
> > that php_slot->attention_state is still needed at all.
> 
> It probably isn't.  It's unclear why IBM took this path at all,
> given pciehp's attention handlers predate pnv-php's by many years.
> 
> > Previously, the user could write any value at all to the sysfs
> > "attention" file and then read that same value back.  After this
> > patch, the user can still write anything, but reads will only return
> > values with PCI_EXP_SLTCTL_AIC and PCI_EXP_SLTCTL_PIC.
> > 
> >>  	*state = php_slot->attention_state;
> >>  	return 0;
> >>  }
> >> @@ -461,7 +474,7 @@ static int pnv_php_set_attention_state(struct hotplug_slot
> >> *slot, u8 state)
> >>  	mask = PCI_EXP_SLTCTL_AIC;
> >>  
> >>  	if (state)
> >> -		new = PCI_EXP_SLTCTL_ATTN_IND_ON;
> >> +		new = FIELD_PREP(PCI_EXP_SLTCTL_AIC, state);
> > 
> > This changes the behavior in some cases:
> > 
> >  write 0: previously turned indicator off, now writes reserved value
> >  write 2: previously turned indicator on, now sets to blink
> >  write 3: previously turned indicator on, now turns it off
> 
> If we're looking at normalizing with pciehp with an eye toward
> eventually deprecating / removing pnv-php, I can't think of a better
> time to change this behavior.  I suspect we're the only major user
> of this code path at the moment, with most software expecting to see
> pciehp-style handling.  Thoughts?

I'm OK with changing this, but I do think it would be worth calling
out the different behavior in the commit log.

Bjorn
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
Posted by Krishna Kumar 3 months ago
Thanks all for the review and Thanks a bunch to Timothy for fixing the PE Freeze issue. The hotplug issues are like you fix N issue and N+1 th issue will come with new HW.

We had a meeting of around 1.5 -2.0 hr with demo, code review and log review and we decided to let these fixes go ahead. I am consolidating what we discussed -


1. Let these fixes go ahead as it solves wider and much needed customer issue and its urgently needed for time being. We have verified in live demo that nothing is broken from legacy flow and 

PE/PHB freeze, race condition, hung, oops etc has been solved correctly. Basically it fixes below issues -

root-port(phb) -> switch(having4 port)--> 2 nvme devices

1st case - only removal of EP-nvme device (surprise hotplug disables msi at root port), soft removal may work
2nd case  - If we remove switch itself (surprise hotplug disable msi at root port) .


Some explanation of problem -

PHB Freeze (Interrupt is not reaching there) - Fence - Need to reset ||
EP device on removal generated msi - goes to cpu (via root port and then apic mmio region to cpu ),
PHB/root port itself got freeze and cpu never get interrupt - No wq/event going to work - driver is noit working


One area what I thought to fix it with OPAL call is below piece of code-

ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
+    if (ret) {
+        SLOT_WARN(php_slot, "PCI slot activation failed with error code %d, possible frozen PHB", ret);
+        SLOT_WARN(php_slot, "Attempting complete PHB reset before retrying slot activation\n");
+        for (i = 0; i < 3; i++) {
+            /* Slot activation failed, PHB may be fenced from a prior device failure
+             * Use the OPAL fundamental reset call to both try a device reset and clear
+             * any potentially active PHB fence / freeze
+             */
+            SLOT_WARN(php_slot, "Try %d...\n", i + 1);
+            pci_set_pcie_reset_state(php_slot->pdev, pcie_warm_reset);
+            msleep(250);
+            pci_set_pcie_reset_state(php_slot->pdev, pcie_deassert_reset);
+
+            ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
+            if (!ret)
+                break;
+        }


I would like to see this fix in skiboot, the opal call should reset it and it should work. Normally opal call is responsible for  link training and reset, so ideally it should  happens from there. As if now, Timothy has some explanation for it, so its fine for now. He can add his points for record.


2. In future we have decided to work on items like - removal of Work-queue with threaded irq, addition of dpc support in this driver.

3. We have also discussed if we want to move pciehpc.c driver in future, we have to keep below things in mind, Timothy can add some more points for record.

Device Node (DTB) & its relationship with slot, Can we decouple it and will pciehpc.c going to work correctly for this ?
Driver binding and unbinding based on hotplug event and its relationship with slot. Role of DTB in this. Our driver  also depends on OPAL call for link reset etc, can we decouple from it ? If we add some PPC specific code in pciehpc.c, how will it gets handled (by VFT/Function-Pointer or #ifdef or by seperate files ?)


Lukas has some points for above and I am somewhat aligned with below points, but it needs to be tested to see conceptually it fixes above issues, I am consolidating his points and he can add more  if needed-

Only the host bridge
has to be enumerated in the devicetree or DSDT.

pnv_php.c seems to search the devicetree for hotplug slots and
instantiates them.

We've traditionally dealt with such issues by inserting pcibios_*()
hooks in generic code, with a __weak implementation (which is usually
an empty stub) and a custom implementation in arch/powerpc/.

The linker then chooses the custom implementation over the __weak one.

You can find the existing hooks with:

git grep "__weak .*pcibios" -- drivers/pci
git grep pcibios -- arch/powerpc

An alternative method is to add a callback to struct pci_host_bridge.

pciehp is used on all kinds of arches, it's just an implementation of the PCIe Base Spec.



Best Regards,

Krishna




On 6/25/25 4:04 AM, Bjorn Helgaas wrote:
> On Wed, Jun 18, 2025 at 07:37:54PM -0500, Timothy Pearson wrote:
>> ----- Original Message -----
>>> From: "Bjorn Helgaas" <helgaas@kernel.org>
>>> To: "Timothy Pearson" <tpearson@raptorengineering.com>
>>> Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-pci"
>>> <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>, "Michael Ellerman" <mpe@ellerman.id.au>,
>>> "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao" <naveen@kernel.org>, "Bjorn Helgaas"
>>> <bhelgaas@google.com>, "Shawn Anastasio" <sanastasio@raptorengineering.com>
>>> Sent: Wednesday, June 18, 2025 2:01:46 PM
>>> Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
>>> On Wed, Jun 18, 2025 at 11:58:59AM -0500, Timothy Pearson wrote:
>>>>  state
>>> Weird wrapping of last word of subject to here.
>> I'll need to see what's up with my git format-patch setup. Apologies
>> for that across the multiple series.
> No worries.  If you can figure out how to make your mailer use the
> normal "On xxx, somebody wrote:" attribution instead of duplicating
> all those headers, that would be far more useful :)
>
>>>> +static int pnv_php_get_raw_indicator_status(struct hotplug_slot *slot, u8
>>>> *state)
>>>> +{
>>>> +	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
>>>> +	struct pci_dev *bridge = php_slot->pdev;
>>>> +	u16 status;
>>>> +
>>>> +	pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &status);
>>>> +	*state = (status & (PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC)) >> 6;
>>> Should be able to do this with FIELD_GET().
>> I used the same overall structure as the pciehp_hpc driver here.  Do
>> you want me to also fix up that driver with FIELD_GET()?
> Nope, I think it's fine to keep this looking like pciehp for now.
> If somebody wants to use FIELD_GET() in pciehp, I'd probably be OK
> with that, but no need for you to open that can of worms.
>
>>> Is the PCI_EXP_SLTCTL_PIC part needed?  It wasn't there before, commit
>>> log doesn't mention it, and as far as I can tell, this would be the
>>> only driver to do that.  Most expose only the attention status (0=off,
>>> 1=on, 2=identify/blink).
>>>
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +
>>>>  static int pnv_php_get_attention_state(struct hotplug_slot *slot, u8 *state)
>>>>  {
>>>>  	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
>>>>  
>>>> +	pnv_php_get_raw_indicator_status(slot, &php_slot->attention_state);
>>> This is a change worth noting.  Previously we didn't read the AIC
>>> state from PCI_EXP_SLTCTL at all; we used php_slot->attention_state to
>>> keep track of whatever had been previously set via
>>> pnv_php_set_attention_state().
>>>
>>> Now we read the current state from PCI_EXP_SLTCTL.  It's not clear
>>> that php_slot->attention_state is still needed at all.
>> It probably isn't.  It's unclear why IBM took this path at all,
>> given pciehp's attention handlers predate pnv-php's by many years.
>>
>>> Previously, the user could write any value at all to the sysfs
>>> "attention" file and then read that same value back.  After this
>>> patch, the user can still write anything, but reads will only return
>>> values with PCI_EXP_SLTCTL_AIC and PCI_EXP_SLTCTL_PIC.
>>>
>>>>  	*state = php_slot->attention_state;
>>>>  	return 0;
>>>>  }
>>>> @@ -461,7 +474,7 @@ static int pnv_php_set_attention_state(struct hotplug_slot
>>>> *slot, u8 state)
>>>>  	mask = PCI_EXP_SLTCTL_AIC;
>>>>  
>>>>  	if (state)
>>>> -		new = PCI_EXP_SLTCTL_ATTN_IND_ON;
>>>> +		new = FIELD_PREP(PCI_EXP_SLTCTL_AIC, state);
>>> This changes the behavior in some cases:
>>>
>>>  write 0: previously turned indicator off, now writes reserved value
>>>  write 2: previously turned indicator on, now sets to blink
>>>  write 3: previously turned indicator on, now turns it off
>> If we're looking at normalizing with pciehp with an eye toward
>> eventually deprecating / removing pnv-php, I can't think of a better
>> time to change this behavior.  I suspect we're the only major user
>> of this code path at the moment, with most software expecting to see
>> pciehp-style handling.  Thoughts?
> I'm OK with changing this, but I do think it would be worth calling
> out the different behavior in the commit log.
>
> Bjorn
>
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
Posted by Timothy Pearson 2 months, 4 weeks ago

----- Original Message -----
> From: "Krishna Kumar" <krishnak@linux.ibm.com>
> To: "Bjorn Helgaas" <helgaas@kernel.org>, "Timothy Pearson" <tpearson@raptorengineering.com>
> Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-pci"
> <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>, "Michael Ellerman" <mpe@ellerman.id.au>,
> "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao" <naveen@kernel.org>, "Bjorn Helgaas"
> <bhelgaas@google.com>, "Shawn Anastasio" <sanastasio@raptorengineering.com>, "Lukas Wunner" <lukas@wunner.de>
> Sent: Monday, July 7, 2025 3:01:00 AM
> Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator

> Thanks all for the review and Thanks a bunch to Timothy for fixing the PE Freeze
> issue. The hotplug issues are like you fix N issue and N+1 th issue will come
> with new HW.
> 
> We had a meeting of around 1.5 -2.0 hr with demo, code review and log review and
> we decided to let these fixes go ahead. I am consolidating what we discussed -
> 
> 
> 1. Let these fixes go ahead as it solves wider and much needed customer issue
> and its urgently needed for time being. We have verified in live demo that
> nothing is broken from legacy flow and
> 
> PE/PHB freeze, race condition, hung, oops etc has been solved correctly.
> Basically it fixes below issues -
> 
> root-port(phb) -> switch(having4 port)--> 2 nvme devices
> 
> 1st case - only removal of EP-nvme device (surprise hotplug disables msi at root
> port), soft removal may work
> 2nd case  - If we remove switch itself (surprise hotplug disable msi at root
> port) .

Just a quick follow up to see if anything else is needed from my end before this merge can occur?

Thanks!
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
Posted by Bjorn Helgaas 2 months, 4 weeks ago
On Fri, Jul 11, 2025 at 01:18:07PM -0500, Timothy Pearson wrote:
> ----- Original Message -----
> > From: "Krishna Kumar" <krishnak@linux.ibm.com>
> > To: "Bjorn Helgaas" <helgaas@kernel.org>, "Timothy Pearson" <tpearson@raptorengineering.com>
> > Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-pci"
> > <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>, "Michael Ellerman" <mpe@ellerman.id.au>,
> > "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao" <naveen@kernel.org>, "Bjorn Helgaas"
> > <bhelgaas@google.com>, "Shawn Anastasio" <sanastasio@raptorengineering.com>, "Lukas Wunner" <lukas@wunner.de>
> > Sent: Monday, July 7, 2025 3:01:00 AM
> > Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
> 
> > Thanks all for the review and Thanks a bunch to Timothy for fixing the PE Freeze
> > issue. The hotplug issues are like you fix N issue and N+1 th issue will come
> > with new HW.
> > 
> > We had a meeting of around 1.5 -2.0 hr with demo, code review and log review and
> > we decided to let these fixes go ahead. I am consolidating what we discussed -
> > 
> > 
> > 1. Let these fixes go ahead as it solves wider and much needed customer issue
> > and its urgently needed for time being. We have verified in live demo that
> > nothing is broken from legacy flow and
> > 
> > PE/PHB freeze, race condition, hung, oops etc has been solved correctly.
> > Basically it fixes below issues -
> > 
> > root-port(phb) -> switch(having4 port)--> 2 nvme devices
> > 
> > 1st case - only removal of EP-nvme device (surprise hotplug disables msi at root
> > port), soft removal may work
> > 2nd case  - If we remove switch itself (surprise hotplug disable msi at root
> > port) .
> 
> Just a quick follow up to see if anything else is needed from my end
> before this merge can occur?

I was waiting for a v3 to fix at least these:

  - Subject line style matching history in drivers/pci/hotplug/ (use
    "git log --oneline" to see it)

  - Broken subject line wrapping into commit log

  - Spelling fixes

  - Comment reformat to match prevailing style

  - Note attention indicator behavior changes in commit log

  - Some minor refactoring

Basically everything mentioned in the responses to the original
posting.  Or was there a v3 that I missed?

Bjorn
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
Posted by Timothy Pearson 2 months, 3 weeks ago
Apologies for that, I had meant to send v3 in and apparently it got dropped.  I believe I have addressed the comments in the v3 I just sent in.

Thank you!

----- Original Message -----
> From: "Bjorn Helgaas" <helgaas@kernel.org>
> To: "Timothy Pearson" <tpearson@raptorengineering.com>
> Cc: "Krishna Kumar" <krishnak@linux.ibm.com>, "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel"
> <linux-kernel@vger.kernel.org>, "linux-pci" <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>,
> "Michael Ellerman" <mpe@ellerman.id.au>, "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao"
> <naveen@kernel.org>, "Bjorn Helgaas" <bhelgaas@google.com>, "Shawn Anastasio" <sanastasio@raptorengineering.com>,
> "Lukas Wunner" <lukas@wunner.de>
> Sent: Friday, July 11, 2025 4:05:10 PM
> Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator

> On Fri, Jul 11, 2025 at 01:18:07PM -0500, Timothy Pearson wrote:
>> ----- Original Message -----
>> > From: "Krishna Kumar" <krishnak@linux.ibm.com>
>> > To: "Bjorn Helgaas" <helgaas@kernel.org>, "Timothy Pearson"
>> > <tpearson@raptorengineering.com>
>> > Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel"
>> > <linux-kernel@vger.kernel.org>, "linux-pci"
>> > <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>,
>> > "Michael Ellerman" <mpe@ellerman.id.au>,
>> > "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao"
>> > <naveen@kernel.org>, "Bjorn Helgaas"
>> > <bhelgaas@google.com>, "Shawn Anastasio" <sanastasio@raptorengineering.com>,
>> > "Lukas Wunner" <lukas@wunner.de>
>> > Sent: Monday, July 7, 2025 3:01:00 AM
>> > Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention
>> > indicator
>> 
>> > Thanks all for the review and Thanks a bunch to Timothy for fixing the PE Freeze
>> > issue. The hotplug issues are like you fix N issue and N+1 th issue will come
>> > with new HW.
>> > 
>> > We had a meeting of around 1.5 -2.0 hr with demo, code review and log review and
>> > we decided to let these fixes go ahead. I am consolidating what we discussed -
>> > 
>> > 
>> > 1. Let these fixes go ahead as it solves wider and much needed customer issue
>> > and its urgently needed for time being. We have verified in live demo that
>> > nothing is broken from legacy flow and
>> > 
>> > PE/PHB freeze, race condition, hung, oops etc has been solved correctly.
>> > Basically it fixes below issues -
>> > 
>> > root-port(phb) -> switch(having4 port)--> 2 nvme devices
>> > 
>> > 1st case - only removal of EP-nvme device (surprise hotplug disables msi at root
>> > port), soft removal may work
>> > 2nd case  - If we remove switch itself (surprise hotplug disable msi at root
>> > port) .
>> 
>> Just a quick follow up to see if anything else is needed from my end
>> before this merge can occur?
> 
> I was waiting for a v3 to fix at least these:
> 
>  - Subject line style matching history in drivers/pci/hotplug/ (use
>    "git log --oneline" to see it)
> 
>  - Broken subject line wrapping into commit log
> 
>  - Spelling fixes
> 
>  - Comment reformat to match prevailing style
> 
>  - Note attention indicator behavior changes in commit log
> 
>  - Some minor refactoring
> 
> Basically everything mentioned in the responses to the original
> posting.  Or was there a v3 that I missed?
> 
> Bjorn
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
Posted by Krishna Kumar 3 months, 2 weeks ago
Shawn, Timothy:

Thanks for posting the series of patch. Few things I wanted to do better understand Raptor problem -


1. Last time my two patches solved all the hotunplug operation and Kernel crash issue except nvme case. It did not work with

    NVME since dpc support was not there. I was not able to do that due to being  occupied in some other work.

2. I want to understand the delta from last yr problem to this problem. Is the PHB freeze or hotunplug failure happens

    only for particular Microsemi switch or it happens with all the switches. When did this problem started coming ? Till last yr 

    it was not there. Is it specific to particular Hardware ? Can I get your setup to test this problem and your patch ?

3. To me, hot unplug opertaion  --> AER triggering --> DPC support, this flow should mask the error to reach root port/cpu and it 

    should solve the PHB freeze/ hot unplug failure operation. If there are AER/EEH related synchronization issue we need to solve them. 

    Can you pls list the issue, I will pass it to EEH/AER team. But yeah, to me if AER implementation is correct and we add DPC support,

    all the error will be contained by switch itself. The PHB/root port/cpu will not be impacted by this and there should not be any freeze.

4. Ofcourse we can pick some of the fixes from pciehp driver if its missing in pnv_php.c. Also at the same time you have done

    some cleanup in hot unplug path and fixed the attenuation button related code. If these works fine, we can pick it. But I want to test it.

     Pls provide me setup.

5. If point 3 and 4 does not solve the problem, then only we should move to pciehp.c. But AFAIK, PPC/Powernv is DT based while pciehp.c

     may be only supporting acpi (I have to check it on this).  We need to provide PHB related information via DTB and maintain the related 

     topology information via dtb and then it can be doable. Also , we need to do thorough planning/testing if we think to choose pciehp.c. 

     But yeah, lets not jump here and lets try to fix the current issues via point 3 & 4. Point 5 will be our last option.


Best Regards,

Krishna




On 6/19/25 6:07 AM, Timothy Pearson wrote:
>
> ----- Original Message -----
>> From: "Bjorn Helgaas" <helgaas@kernel.org>
>> To: "Timothy Pearson" <tpearson@raptorengineering.com>
>> Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-pci"
>> <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>, "Michael Ellerman" <mpe@ellerman.id.au>,
>> "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao" <naveen@kernel.org>, "Bjorn Helgaas"
>> <bhelgaas@google.com>, "Shawn Anastasio" <sanastasio@raptorengineering.com>
>> Sent: Wednesday, June 18, 2025 2:01:46 PM
>> Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
>> On Wed, Jun 18, 2025 at 11:58:59AM -0500, Timothy Pearson wrote:
>>>  state
>> Weird wrapping of last word of subject to here.
> I'll need to see what's up with my git format-patch setup. Apologies for that across the multiple series.
>
>>> The PCIe specification allows three attention indicator states,
>>> on, off, and blink.  Enable all three states instead of basic
>>> on / off control.
>>>
>>> Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
>>> ---
>>>  drivers/pci/hotplug/pnv_php.c | 15 ++++++++++++++-
>>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
>>> index 0ceb4a2c3c79..c3005324be3d 100644
>>> --- a/drivers/pci/hotplug/pnv_php.c
>>> +++ b/drivers/pci/hotplug/pnv_php.c
>>> @@ -440,10 +440,23 @@ static int pnv_php_get_adapter_state(struct hotplug_slot
>>> *slot, u8 *state)
>>>  	return ret;
>>>  }
>>>  
>>> +static int pnv_php_get_raw_indicator_status(struct hotplug_slot *slot, u8
>>> *state)
>>> +{
>>> +	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
>>> +	struct pci_dev *bridge = php_slot->pdev;
>>> +	u16 status;
>>> +
>>> +	pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &status);
>>> +	*state = (status & (PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC)) >> 6;
>> Should be able to do this with FIELD_GET().
> I used the same overall structure as the pciehp_hpc driver here.  Do you want me to also fix up that driver with FIELD_GET()?
>
>> Is the PCI_EXP_SLTCTL_PIC part needed?  It wasn't there before, commit
>> log doesn't mention it, and as far as I can tell, this would be the
>> only driver to do that.  Most expose only the attention status (0=off,
>> 1=on, 2=identify/blink).
>>
>>> +	return 0;
>>> +}
>>> +
>>> +
>>>  static int pnv_php_get_attention_state(struct hotplug_slot *slot, u8 *state)
>>>  {
>>>  	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
>>>  
>>> +	pnv_php_get_raw_indicator_status(slot, &php_slot->attention_state);
>> This is a change worth noting.  Previously we didn't read the AIC
>> state from PCI_EXP_SLTCTL at all; we used php_slot->attention_state to
>> keep track of whatever had been previously set via
>> pnv_php_set_attention_state().
>>
>> Now we read the current state from PCI_EXP_SLTCTL.  It's not clear
>> that php_slot->attention_state is still needed at all.
> It probably isn't.  It's unclear why IBM took this path at all, given pciehp's attention handlers predate pnv-php's by many years.
>
>> Previously, the user could write any value at all to the sysfs
>> "attention" file and then read that same value back.  After this
>> patch, the user can still write anything, but reads will only return
>> values with PCI_EXP_SLTCTL_AIC and PCI_EXP_SLTCTL_PIC.
>>
>>>  	*state = php_slot->attention_state;
>>>  	return 0;
>>>  }
>>> @@ -461,7 +474,7 @@ static int pnv_php_set_attention_state(struct hotplug_slot
>>> *slot, u8 state)
>>>  	mask = PCI_EXP_SLTCTL_AIC;
>>>  
>>>  	if (state)
>>> -		new = PCI_EXP_SLTCTL_ATTN_IND_ON;
>>> +		new = FIELD_PREP(PCI_EXP_SLTCTL_AIC, state);
>> This changes the behavior in some cases:
>>
>>  write 0: previously turned indicator off, now writes reserved value
>>  write 2: previously turned indicator on, now sets to blink
>>  write 3: previously turned indicator on, now turns it off
> If we're looking at normalizing with pciehp with an eye toward eventually deprecating / removing pnv-php, I can't think of a better time to change this behavior.  I suspect we're the only major user of this code path at the moment, with most software expecting to see pciehp-style handling.  Thoughts?
>
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
Posted by Timothy Pearson 3 months, 2 weeks ago

----- Original Message -----
> From: "Krishna Kumar" <krishnak@linux.ibm.com>
> To: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "Timothy Pearson" <tpearson@raptorengineering.com>, "Shawn
> Anastasio" <sanastasio@raptorengineering.com>
> Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-pci"
> <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>, "Michael Ellerman" <mpe@ellerman.id.au>,
> "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao" <naveen@kernel.org>, "Bjorn Helgaas"
> <bhelgaas@google.com>, "Shawn Anastasio" <sanastasio@raptorengineering.com>
> Sent: Friday, June 20, 2025 4:26:51 AM
> Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator

> Shawn, Timothy:
> 
> Thanks for posting the series of patch. Few things I wanted to do better
> understand Raptor problem -
> 
> 
> 1. Last time my two patches solved all the hotunplug operation and Kernel crash
> issue except nvme case. It did not work with
> 
>    NVME since dpc support was not there. I was not able to do that due to being
>      occupied in some other work.

With the current series all hotplug is working correctly, including not only NVMe on root port and bridge ports, but also suprise plug of the entire PCIe switch at the root port.  The lack of DPC support *might* be related to the PE freeze, but in any case we prefer the hotplug driver to be able to recover from a PE freeze (e.g. if a bridge card is faulty and needs to be replaced) without also requiring a reboot, so I would consider DPC implementation orthogonal to this patch set.

> 2. I want to understand the delta from last yr problem to this problem. Is the
> PHB freeze or hotunplug failure happens
> 
>    only for particular Microsemi switch or it happens with all the switches. When
>    did this problem started coming ? Till last yr

Hotplug has never worked reliably for us, if it worked at all it was always rolling the dice on whether the kernel would oops and take down the host.  Even if the kernel didn't oops, suprise plug and auto-add / auto-remove never worked beyond one remove operation.

>    it was not there. Is it specific to particular Hardware ? Can I get your setup
>    to test this problem and your patch ?

Because you will need to be able to physically plug and unplug cards and drives this may be a bit tricky.  Do you have access to a POWER9 host system with a x16 PCIe slot?  If so, all you need is a Supermicro SLC-AO3G-8E2P card and some random U.2 NVMe drives -- these cards are readily available and provide relatively standardized OCuLink access to a Switchtec bridge.

If you don't have access to a POWER9 host, we can set you up with remote access, but it won't show all of the crashing and problems that occur with surprise plug unless we set up a live debug session (video call or similar).

> 3. To me, hot unplug opertaion  --> AER triggering --> DPC support, this flow
> should mask the error to reach root port/cpu and it
> 
>    should solve the PHB freeze/ hot unplug failure operation. If there are AER/EEH
>    related synchronization issue we need to solve them.
> 
>    Can you pls list the issue, I will pass it to EEH/AER team. But yeah, to me if
>    AER implementation is correct and we add DPC support,
> 
>    all the error will be contained by switch itself. The PHB/root port/cpu will not
>    be impacted by this and there should not be any freeze.

While this is a good goal to work toward, it only solves one possible fault mode.  The patch series posted here will handle the general case of a PE freeze without requiring a host reboot, which is great for high-reliability systems where there might be a desire to replace the entire switch card (this has been tested with the patch series and works perfectly).

> 4. Ofcourse we can pick some of the fixes from pciehp driver if its missing in
> pnv_php.c. Also at the same time you have done
> 
>    some cleanup in hot unplug path and fixed the attenuation button related code.
>    If these works fine, we can pick it. But I want to test it.
> 
>     Pls provide me setup.
> 
> 5. If point 3 and 4 does not solve the problem, then only we should move to
> pciehp.c. But AFAIK, PPC/Powernv is DT based while pciehp.c
> 
>     may be only supporting acpi (I have to check it on this).  We need to provide
>     PHB related information via DTB and maintain the related
> 
>     topology information via dtb and then it can be doable. Also , we need to do
>     thorough planning/testing if we think to choose pciehp.c.
> 
>     But yeah, lets not jump here and lets try to fix the current issues via point 3
>     & 4. Point 5 will be our last option.

If possible I would like to see this series merged vs. being blocked on DPC.  Again, from where I sit DPC is orthogonal; many events can cause a PE freeze and implementing DPC only solves one.  We do *not* want to require a host reboot in any situation whatsoever short of a complete failure of a critical element (e.g. the PHB itself or a CPU package); our use case as deployed is five nines critical infrastructure, and the broken hotplug has already been the sole reason we have not maintained 100% uptime on a key system.

Thanks!
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
Posted by Krishna Kumar 3 months, 2 weeks ago
On 6/21/25 8:35 PM, Timothy Pearson wrote:
>
> ----- Original Message -----
>> From: "Krishna Kumar" <krishnak@linux.ibm.com>
>> To: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "Timothy Pearson" <tpearson@raptorengineering.com>, "Shawn
>> Anastasio" <sanastasio@raptorengineering.com>
>> Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-pci"
>> <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>, "Michael Ellerman" <mpe@ellerman.id.au>,
>> "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao" <naveen@kernel.org>, "Bjorn Helgaas"
>> <bhelgaas@google.com>, "Shawn Anastasio" <sanastasio@raptorengineering.com>
>> Sent: Friday, June 20, 2025 4:26:51 AM
>> Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
>> Shawn, Timothy:
>>
>> Thanks for posting the series of patch. Few things I wanted to do better
>> understand Raptor problem -
>>
>>
>> 1. Last time my two patches solved all the hotunplug operation and Kernel crash
>> issue except nvme case. It did not work with
>>
>>     NVME since dpc support was not there. I was not able to do that due to being
>>       occupied in some other work.
> With the current series all hotplug is working correctly, including not only NVMe on root port and bridge ports, but also suprise plug of the entire PCIe switch at the root port.  The lack of DPC support *might* be related to the PE freeze, but in any case we prefer the hotplug driver to be able to recover from a PE freeze (e.g. if a bridge card is faulty and needs to be replaced) without also requiring a reboot, so I would consider DPC implementation orthogonal to this patch set.
Sounds Good !!
>
>> 2. I want to understand the delta from last yr problem to this problem. Is the
>> PHB freeze or hotunplug failure happens
>>
>>     only for particular Microsemi switch or it happens with all the switches. When
>>     did this problem started coming ? Till last yr
> Hotplug has never worked reliably for us, if it worked at all it was always rolling the dice on whether the kernel would oops and take down the host.  Even if the kernel didn't oops, suprise plug and auto-add / auto-remove never worked beyond one remove operation.
I would like to see this problem may be during our zoom/teams meeting. Though I have not tested surprise plug/unplug and only tested via sysfs, you may be correct but I want to have a look of this problem.
>
>>     it was not there. Is it specific to particular Hardware ? Can I get your setup
>>     to test this problem and your patch ?
> Because you will need to be able to physically plug and unplug cards and drives this may be a bit tricky.  Do you have access to a POWER9 host system with a x16 PCIe slot?  If so, all you need is a Supermicro SLC-AO3G-8E2P card and some random U.2 NVMe drives -- these cards are readily available and provide relatively standardized OCuLink access to a Switchtec bridge.
>
> If you don't have access to a POWER9 host, we can set you up with remote access, but it won't show all of the crashing and problems that occur with surprise plug unless we set up a live debug session (video call or similar).


Video Call should be fine. During the call I will have a look of existing problem and fix along with driver/kernel logs.

>
>> 3. To me, hot unplug opertaion  --> AER triggering --> DPC support, this flow
>> should mask the error to reach root port/cpu and it
>>
>>     should solve the PHB freeze/ hot unplug failure operation. If there are AER/EEH
>>     related synchronization issue we need to solve them.
>>
>>     Can you pls list the issue, I will pass it to EEH/AER team. But yeah, to me if
>>     AER implementation is correct and we add DPC support,
>>
>>     all the error will be contained by switch itself. The PHB/root port/cpu will not
>>     be impacted by this and there should not be any freeze.
> While this is a good goal to work toward, it only solves one possible fault mode.  The patch series posted here will handle the general case of a PE freeze without requiring a host reboot, which is great for high-reliability systems where there might be a desire to replace the entire switch card (this has been tested with the patch series and works perfectly).


You may be correct on this and this is possible. If the driver and AER/EEH errors/events are properly 

handled then we may not need DPC in all cases. The point of DPC was to absorb the error at switch port 

itself so that it will not reach to PHB/Root-port/Cpu and will avoid further corruption. I was hoping that if 

DPC gets enabled, we may not need explicit reboot for drives to come up in case of surprise hot unplug.

But yeah, we can compare this with current result when this support will be enabled.

>
>> 4. Ofcourse we can pick some of the fixes from pciehp driver if its missing in
>> pnv_php.c. Also at the same time you have done
>>
>>     some cleanup in hot unplug path and fixed the attenuation button related code.
>>     If these works fine, we can pick it. But I want to test it.
>>
>>      Pls provide me setup.
>>
>> 5. If point 3 and 4 does not solve the problem, then only we should move to
>> pciehp.c. But AFAIK, PPC/Powernv is DT based while pciehp.c
>>
>>      may be only supporting acpi (I have to check it on this).  We need to provide
>>      PHB related information via DTB and maintain the related
>>
>>      topology information via dtb and then it can be doable. Also , we need to do
>>      thorough planning/testing if we think to choose pciehp.c.
>>
>>      But yeah, lets not jump here and lets try to fix the current issues via point 3
>>      & 4. Point 5 will be our last option.
> If possible I would like to see this series merged vs. being blocked on DPC.  Again, from where I sit DPC is orthogonal; many events can cause a PE freeze and implementing DPC only solves one.  We do *not* want to require a host reboot in any situation whatsoever short of a complete failure of a critical element (e.g. the PHB itself or a CPU package); our use case as deployed is five nines critical infrastructure, and the broken hotplug has already been the sole reason we have not maintained 100% uptime on a key system.

If you are in hurry and want to defer DPC for some time, I am fine with it since it serves larger purpose like PE freeze and NVME drives working 

along with surprise hotplug fixes.  I have gone through your pnv_php.c changes and I am mostly fine with it. But, I would like to review it again 

from larger prespective w.r.t to EEH & pciehp.c, so give me some time.  Also, if possible you can show me 

the problem/fix along with log  during video call. it would be great if we can meet sometimes next month in early first week may be on 5th of July.

I will request few of the EEH/AER developer to have a look into the patch and to join the meeting if they have bandwidth. Please shoot the 

mail/invite on krishna.kumar11@ibm.com along with this email id. I am based in Bangalore but can be available till night 10:00 pm.

>
> Thanks!


Thanks & Best Regards,

Krishna

>
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
Posted by Timothy Pearson 3 months, 2 weeks ago

----- Original Message -----
> From: "Krishna Kumar" <krishnak@linux.ibm.com>
> To: "Timothy Pearson" <tpearson@raptorengineering.com>
> Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "Shawn Anastasio" <sanastasio@raptorengineering.com>, "linux-kernel"
> <linux-kernel@vger.kernel.org>, "linux-pci" <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>,
> "Michael Ellerman" <mpe@ellerman.id.au>, "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao"
> <naveen@kernel.org>, "Bjorn Helgaas" <bhelgaas@google.com>
> Sent: Tuesday, June 24, 2025 2:07:30 AM
> Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator

> On 6/21/25 8:35 PM, Timothy Pearson wrote:
>>
>> ----- Original Message -----
>>> From: "Krishna Kumar" <krishnak@linux.ibm.com>
>>> To: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "Timothy Pearson"
>>> <tpearson@raptorengineering.com>, "Shawn
>>> Anastasio" <sanastasio@raptorengineering.com>
>>> Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel"
>>> <linux-kernel@vger.kernel.org>, "linux-pci"
>>> <linux-pci@vger.kernel.org>, "Madhavan Srinivasan" <maddy@linux.ibm.com>,
>>> "Michael Ellerman" <mpe@ellerman.id.au>,
>>> "christophe leroy" <christophe.leroy@csgroup.eu>, "Naveen N Rao"
>>> <naveen@kernel.org>, "Bjorn Helgaas"
>>> <bhelgaas@google.com>, "Shawn Anastasio" <sanastasio@raptorengineering.com>
>>> Sent: Friday, June 20, 2025 4:26:51 AM
>>> Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention
>>> indicator
>>> Shawn, Timothy:
>>>
>>> Thanks for posting the series of patch. Few things I wanted to do better
>>> understand Raptor problem -
>>>
>>>
>>> 1. Last time my two patches solved all the hotunplug operation and Kernel crash
>>> issue except nvme case. It did not work with
>>>
>>>     NVME since dpc support was not there. I was not able to do that due to being
>>>       occupied in some other work.
>> With the current series all hotplug is working correctly, including not only
>> NVMe on root port and bridge ports, but also suprise plug of the entire PCIe
>> switch at the root port.  The lack of DPC support *might* be related to the PE
>> freeze, but in any case we prefer the hotplug driver to be able to recover from
>> a PE freeze (e.g. if a bridge card is faulty and needs to be replaced) without
>> also requiring a reboot, so I would consider DPC implementation orthogonal to
>> this patch set.
> Sounds Good !!
>>
>>> 2. I want to understand the delta from last yr problem to this problem. Is the
>>> PHB freeze or hotunplug failure happens
>>>
>>>     only for particular Microsemi switch or it happens with all the switches. When
>>>     did this problem started coming ? Till last yr
>> Hotplug has never worked reliably for us, if it worked at all it was always
>> rolling the dice on whether the kernel would oops and take down the host.  Even
>> if the kernel didn't oops, suprise plug and auto-add / auto-remove never worked
>> beyond one remove operation.
> I would like to see this problem may be during our zoom/teams meeting. Though I
> have not tested surprise plug/unplug and only tested via sysfs, you may be
> correct but I want to have a look of this problem.
>>
>>>     it was not there. Is it specific to particular Hardware ? Can I get your setup
>>>     to test this problem and your patch ?
>> Because you will need to be able to physically plug and unplug cards and drives
>> this may be a bit tricky.  Do you have access to a POWER9 host system with a
>> x16 PCIe slot?  If so, all you need is a Supermicro SLC-AO3G-8E2P card and some
>> random U.2 NVMe drives -- these cards are readily available and provide
>> relatively standardized OCuLink access to a Switchtec bridge.
>>
>> If you don't have access to a POWER9 host, we can set you up with remote access,
>> but it won't show all of the crashing and problems that occur with surprise
>> plug unless we set up a live debug session (video call or similar).
> 
> 
> Video Call should be fine. During the call I will have a look of existing
> problem and fix along with driver/kernel logs.

Sounds good.  We'll set up a machine in the DMZ for this session so you can also have access.  For anyone interested in logging on to the box for logs, can you send over an SSH public key to my Email address directly?  Will get everyone added with root access to the test box prior to the call start.

>>
>>> 3. To me, hot unplug opertaion  --> AER triggering --> DPC support, this flow
>>> should mask the error to reach root port/cpu and it
>>>
>>>     should solve the PHB freeze/ hot unplug failure operation. If there are AER/EEH
>>>     related synchronization issue we need to solve them.
>>>
>>>     Can you pls list the issue, I will pass it to EEH/AER team. But yeah, to me if
>>>     AER implementation is correct and we add DPC support,
>>>
>>>     all the error will be contained by switch itself. The PHB/root port/cpu will not
>>>     be impacted by this and there should not be any freeze.
>> While this is a good goal to work toward, it only solves one possible fault
>> mode.  The patch series posted here will handle the general case of a PE freeze
>> without requiring a host reboot, which is great for high-reliability systems
>> where there might be a desire to replace the entire switch card (this has been
>> tested with the patch series and works perfectly).
> 
> 
> You may be correct on this and this is possible. If the driver and AER/EEH
> errors/events are properly
> 
> handled then we may not need DPC in all cases. The point of DPC was to absorb
> the error at switch port
> 
> itself so that it will not reach to PHB/Root-port/Cpu and will avoid further
> corruption. I was hoping that if
> 
> DPC gets enabled, we may not need explicit reboot for drives to come up in case
> of surprise hot unplug.

I do understand the logic here, and it would theoretically work, but again it's a bit more fragile than the solution we're presenting here in that it relies on another chunk of device logic to work correctly in all cases, with the consequence of a failure being a forced reboot.

With our patch series here we can hot plug and hot unplug NVMe drives all day without requiring any reboots, including surprise plug.  DPC would simply make this process a little bit faster, in that we don't have to wait a few hundred milliseconds for the PE to unfreeze and the EEH driver to give up.

> But yeah, we can compare this with current result when this support will be
> enabled.
> 
>>
>>> 4. Ofcourse we can pick some of the fixes from pciehp driver if its missing in
>>> pnv_php.c. Also at the same time you have done
>>>
>>>     some cleanup in hot unplug path and fixed the attenuation button related code.
>>>     If these works fine, we can pick it. But I want to test it.
>>>
>>>      Pls provide me setup.
>>>
>>> 5. If point 3 and 4 does not solve the problem, then only we should move to
>>> pciehp.c. But AFAIK, PPC/Powernv is DT based while pciehp.c
>>>
>>>      may be only supporting acpi (I have to check it on this).  We need to provide
>>>      PHB related information via DTB and maintain the related
>>>
>>>      topology information via dtb and then it can be doable. Also , we need to do
>>>      thorough planning/testing if we think to choose pciehp.c.
>>>
>>>      But yeah, lets not jump here and lets try to fix the current issues via point 3
>>>      & 4. Point 5 will be our last option.
>> If possible I would like to see this series merged vs. being blocked on DPC.
>> Again, from where I sit DPC is orthogonal; many events can cause a PE freeze
>> and implementing DPC only solves one.  We do *not* want to require a host
>> reboot in any situation whatsoever short of a complete failure of a critical
>> element (e.g. the PHB itself or a CPU package); our use case as deployed is
>> five nines critical infrastructure, and the broken hotplug has already been the
>> sole reason we have not maintained 100% uptime on a key system.
> 
> If you are in hurry and want to defer DPC for some time, I am fine with it since
> it serves larger purpose like PE freeze and NVME drives working
> 
> along with surprise hotplug fixes.  I have gone through your pnv_php.c changes
> and I am mostly fine with it. But, I would like to review it again
> 
> from larger prespective w.r.t to EEH & pciehp.c, so give me some time.

Sure, please let me know if anything is concerning.  My goal would be to have this reviewed from a code quality perspective and a v2 posted at least a couple of days before the video call.

Also, if
> possible you can show me
> 
> the problem/fix along with log  during video call. it would be great if we can
> meet sometimes next month in early first week may be on 5th of July.

Let's plan for that -- this is somewhat urgent with Debian Trixie being released soon, and I want to see this repair backported.  We already ship Bookworm Debian kernels with completely broken VFIO and hotplug, our goal is to get that all fixed for Trixie and enable the functionality our customers are paying for.

> I will request few of the EEH/AER developer to have a look into the patch and to
> join the meeting if they have bandwidth. Please shoot the
> 
> mail/invite on krishna.kumar11@ibm.com along with this email id. I am based in
> Bangalore but can be available till night 10:00 pm.

Will do, thanks!
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
Posted by Lukas Wunner 3 months, 2 weeks ago
On Fri, Jun 20, 2025 at 02:56:51PM +0530, Krishna Kumar wrote:
> 5. If point 3 and 4 does not solve the problem, then only we should
> move to pciehp.c. But AFAIK, PPC/Powernv is DT based while pciehp.c
> may be only supporting acpi (I have to check it on this). We need to
> provide PHB related information via DTB and maintain the related
> topology information via dtb and then it can be doable.

pciehp is not ACPI-specific.  The PCIe port service driver in
drivers/pci/pcie/portdrv.c binds to any PCIe port, examines the
port's capabilities (e.g. hotplug, AER, DPC, ...) and instantiates
sub-devices to which pciehp and the other drivers such as aer bind.

How do you prevent pciehp from driving hotplug on PowerNV anyway?
Do you disable CONFIG_HOTPLUG_PCI_PCIE?

Thanks,

Lukas
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
Posted by Krishna Kumar 3 months, 2 weeks ago
On 6/21/25 3:29 PM, Lukas Wunner wrote:
> On Fri, Jun 20, 2025 at 02:56:51PM +0530, Krishna Kumar wrote:
>> 5. If point 3 and 4 does not solve the problem, then only we should
>> move to pciehp.c. But AFAIK, PPC/Powernv is DT based while pciehp.c
>> may be only supporting acpi (I have to check it on this). We need to
>> provide PHB related information via DTB and maintain the related
>> topology information via dtb and then it can be doable.
> pciehp is not ACPI-specific.  The PCIe port service driver in
> drivers/pci/pcie/portdrv.c binds to any PCIe port, examines the
> port's capabilities (e.g. hotplug, AER, DPC, ...) and instantiates
> sub-devices to which pciehp and the other drivers such as aer bind.

Good to know and thanks for the info. As I already told I did not go through pciehp.c,

and its internal working (since I did not needed it) I can only comment in concrete manner 

once I am done with its code review and internal logic.

What my concern was, you can comment on below point if I am wrong (I will learn something) -

1. If we get PHB info from mmcfg via acpi table in x86 and create a root port from there with 

some address/entity and if this Acpi and associated entity is not present for PPC, then it can be a problem.

2. PPC is normally based on DTB entity and it identifies PHB and pcie devices from there. If this all the information 

is correctly map via portdrv.c then there is no problem and whatever you are telling is correct and it will work.

3. But if point 2 is not handled correctly we need to just aligned with port related data structure to make it work. 

Not a big deal but need to put thorough logic and testing effort. If its already in place, we are good.


>
> How do you prevent pciehp from driving hotplug on PowerNV anyway?
> Do you disable CONFIG_HOTPLUG_PCI_PCIE?

I am not sure if at present pciehp is used for Powenv, and PPC uses this config or it has disabled it (since we rely on pnvphp.c for our hotplug operation, I did not checked it). 

If its going to work on PPC by enabling config and its giving correct output, I dont see any reason to not using it as an 

alternate driver where pnvphp.c fails. But yeah, as I told I can commnet on this once I do some experiment (going through 

the code and some testing). But yeah, its always good to hear from you. Thanks a bunch !!

>
> Thanks,
>
> Lukas

Best regards,

Krishna


>
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
Posted by Lukas Wunner 3 months, 2 weeks ago
On Wed, Jun 25, 2025 at 09:38:19AM +0530, Krishna Kumar wrote:
> On 6/21/25 3:29 PM, Lukas Wunner wrote:
> > On Fri, Jun 20, 2025 at 02:56:51PM +0530, Krishna Kumar wrote:
> > > 5. If point 3 and 4 does not solve the problem, then only we should
> > > move to pciehp.c. But AFAIK, PPC/Powernv is DT based while pciehp.c
> > > may be only supporting acpi (I have to check it on this). We need to
> > > provide PHB related information via DTB and maintain the related
> > > topology information via dtb and then it can be doable.
> > 
> > pciehp is not ACPI-specific.  The PCIe port service driver in
> > drivers/pci/pcie/portdrv.c binds to any PCIe port, examines the
> > port's capabilities (e.g. hotplug, AER, DPC, ...) and instantiates
> > sub-devices to which pciehp and the other drivers such as aer bind.
> 
> 1. If we get PHB info from mmcfg via acpi table in x86 and create a
>    root port from there with some address/entity and if this Acpi and
>    associated entity is not present for PPC, then it can be a problem.
> 
> 2. PPC is normally based on DTB entity and it identifies PHB and pcie
>    devices from there. If this all the information is correctly map
>    via portdrv.c then there is no problem and whatever you are telling
>    is correct and it will work.
> 
> 3. But if point 2 is not handled correctly we need to just aligned with
>    port related data structure to make it work.

PCI devices do not have to be enumerated in the devicetree (or in ACPI
DSDT) because PCI is an enumerable bus (like USB).  Only the host bridge
has to be enumerated in the devicetree or DSDT.  The kernel can find the
PCI devices below the host bridge itself.  Hot-plugged devices are
usually not described in the devicetree or DSDT because one doesn't
know their properties in advance.

pnv_php.c seems to search the devicetree for hotplug slots and
instantiates them.  My expectation would be that any hotplug-capable
PCIe Root Port or Downstream Port, which is *not* described in the
devicetree such that pnv_php.c creates a slot for it, is handled by
pciehp.

Timothy was talking about a Microsemi PCIe switch below the Root Port.
My understanding is that the Downstream Ports of that switch are
hotplug-capable.  So unless you've disabled CONFIG_HOTPLUG_PCI_PCIE,
I'd expect those ports to be handled by pciehp.  Assuming they're not
described as a "ibm,ioda2-phb" compatible device in the devicetree,
but why would they?

Thanks,

Lukas
Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
Posted by Krishna Kumar 3 months, 2 weeks ago
On 6/25/25 1:38 PM, Lukas Wunner wrote:
> On Wed, Jun 25, 2025 at 09:38:19AM +0530, Krishna Kumar wrote:
>> On 6/21/25 3:29 PM, Lukas Wunner wrote:
>>> On Fri, Jun 20, 2025 at 02:56:51PM +0530, Krishna Kumar wrote:
>>>> 5. If point 3 and 4 does not solve the problem, then only we should
>>>> move to pciehp.c. But AFAIK, PPC/Powernv is DT based while pciehp.c
>>>> may be only supporting acpi (I have to check it on this). We need to
>>>> provide PHB related information via DTB and maintain the related
>>>> topology information via dtb and then it can be doable.
>>> pciehp is not ACPI-specific.  The PCIe port service driver in
>>> drivers/pci/pcie/portdrv.c binds to any PCIe port, examines the
>>> port's capabilities (e.g. hotplug, AER, DPC, ...) and instantiates
>>> sub-devices to which pciehp and the other drivers such as aer bind.
>> 1. If we get PHB info from mmcfg via acpi table in x86 and create a
>>    root port from there with some address/entity and if this Acpi and
>>    associated entity is not present for PPC, then it can be a problem.
>>
>> 2. PPC is normally based on DTB entity and it identifies PHB and pcie
>>    devices from there. If this all the information is correctly map
>>    via portdrv.c then there is no problem and whatever you are telling
>>    is correct and it will work.
>>
>> 3. But if point 2 is not handled correctly we need to just aligned with
>>    port related data structure to make it work.
> PCI devices do not have to be enumerated in the devicetree (or in ACPI
> DSDT) because PCI is an enumerable bus (like USB).  Only the host bridge
> has to be enumerated in the devicetree or DSDT.  The kernel can find the
> PCI devices below the host bridge itself.
Yes in DFS manner (once it gets to know the PHB address via -acpi in X86 and via DTB in PPC)
>   Hot-plugged devices are
> usually not described in the devicetree or DSDT because one doesn't
> know their properties in advance.
>
> pnv_php.c seems to search the devicetree for hotplug slots and
> instantiates them.  My expectation would be that any hotplug-capable
> PCIe Root Port or Downstream Port, which is *not* described in the
> devicetree such that pnv_php.c creates a slot for it, is handled by
> pciehp.
Your are correct, pnv_php.c heavily depends on slot id and DTB nodes. Thats how its designed. Can we decouple it via DTB nodes, I will come back on this.
>
> Timothy was talking about a Microsemi PCIe switch below the Root Port.
> My understanding is that the Downstream Ports of that switch are
> hotplug-capable. 
I understand it.
>  So unless you've disabled CONFIG_HOTPLUG_PCI_PCIE,
> I'd expect those ports to be handled by pciehp. 
I need to check it and test it, but yeah- it may or maynot work and I will confirm it only after some study and testing (maybe in 1-2 week)
>  Assuming they're not
> described as a "ibm,ioda2-phb" compatible device in the devicetree,
> but why would they?
HW topology and DTB is based on IODA and it will keep changing, not frequently but eventually.
>
> Thanks,
>
> Lukas


Best Regards,

Krishna