[PATCH v11 4/9] PCI: Add additional checks for flr reset

Farhan Ali posted 9 patches 3 weeks ago
There is a newer version of this series
[PATCH v11 4/9] PCI: Add additional checks for flr reset
Posted by Farhan Ali 3 weeks ago
If a device is in an error state, then any reads of device registers can
return error value. Add additional checks to validate if a device is in an
error state before doing an flr reset.

Cc: stable@vger.kernel.org
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 drivers/pci/pci.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 373421f4b9d8..8e4d924f4e88 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4371,12 +4371,19 @@ EXPORT_SYMBOL_GPL(pcie_flr);
  */
 int pcie_reset_flr(struct pci_dev *dev, bool probe)
 {
+	u32 reg;
+
 	if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
 		return -ENOTTY;
 
 	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
 		return -ENOTTY;
 
+	if (pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &reg)) {
+		pci_warn(dev, "Device unable to do an FLR\n");
+		return -ENOTTY;
+	}
+
 	if (probe)
 		return 0;
 
-- 
2.43.0
Re: [PATCH v11 4/9] PCI: Add additional checks for flr reset
Posted by Bjorn Helgaas 1 week, 6 days ago
On Mon, Mar 16, 2026 at 12:15:39PM -0700, Farhan Ali wrote:
> If a device is in an error state, then any reads of device registers can
> return error value. Add additional checks to validate if a device is in an
> error state before doing an flr reset.

s/flr/FLR/ (also in subject)

Also please extend the subject to say something specific about the
"additional checks".  E.g.,

  PCI: Fail FLR when config space inaccessible

> Cc: stable@vger.kernel.org
> Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
>  drivers/pci/pci.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 373421f4b9d8..8e4d924f4e88 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4371,12 +4371,19 @@ EXPORT_SYMBOL_GPL(pcie_flr);
>   */
>  int pcie_reset_flr(struct pci_dev *dev, bool probe)
>  {
> +	u32 reg;
> +
>  	if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
>  		return -ENOTTY;
>  
>  	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
>  		return -ENOTTY;
>  
> +	if (pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &reg)) {
> +		pci_warn(dev, "Device unable to do an FLR\n");
> +		return -ENOTTY;
> +	}

I guess the point of this is to detect devices that are inaccessible?
The same sort of thing as in pci_dev_save_and_disable() from patch
3/9?  But we use "dev->error_state == pci_channel_io_perm_failure"
instead?

No matter what we do, this has the same race as in patch 3/9.  And I
think using dev->error_state also depends on AER being enabled, which
cuts out many PCIe devices.

I think using the same exact code as in pci_dev_save_and_disable()
would be more straightforward.  And also the same sort of wording in
the message, e.g., "Device config space inaccessible; unable to FLR"
or similar.  I foresee many of these messages in my future, and it
will be helpful to have a specific clue about why FLR failed :)

>  	if (probe)
>  		return 0;
>  
> -- 
> 2.43.0
>
Re: [PATCH v11 4/9] PCI: Add additional checks for flr reset
Posted by Alex Williamson 1 week, 5 days ago
On Tue, 24 Mar 2026 17:49:36 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> On Mon, Mar 16, 2026 at 12:15:39PM -0700, Farhan Ali wrote:
> > If a device is in an error state, then any reads of device registers can
> > return error value. Add additional checks to validate if a device is in an
> > error state before doing an flr reset.  
> 
> s/flr/FLR/ (also in subject)
> 
> Also please extend the subject to say something specific about the
> "additional checks".  E.g.,
> 
>   PCI: Fail FLR when config space inaccessible
> 
> > Cc: stable@vger.kernel.org
> > Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
> > Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> > ---
> >  drivers/pci/pci.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 373421f4b9d8..8e4d924f4e88 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -4371,12 +4371,19 @@ EXPORT_SYMBOL_GPL(pcie_flr);
> >   */
> >  int pcie_reset_flr(struct pci_dev *dev, bool probe)
> >  {
> > +	u32 reg;
> > +
> >  	if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
> >  		return -ENOTTY;
> >  
> >  	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
> >  		return -ENOTTY;
> >  
> > +	if (pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &reg)) {
> > +		pci_warn(dev, "Device unable to do an FLR\n");
> > +		return -ENOTTY;
> > +	}  
> 
> I guess the point of this is to detect devices that are inaccessible?
> The same sort of thing as in pci_dev_save_and_disable() from patch
> 3/9?  But we use "dev->error_state == pci_channel_io_perm_failure"
> instead?
> 
> No matter what we do, this has the same race as in patch 3/9.  And I
> think using dev->error_state also depends on AER being enabled, which
> cuts out many PCIe devices.
> 
> I think using the same exact code as in pci_dev_save_and_disable()
> would be more straightforward.  And also the same sort of wording in
> the message, e.g., "Device config space inaccessible; unable to FLR"
> or similar.  I foresee many of these messages in my future, and it
> will be helpful to have a specific clue about why FLR failed :)

I think pci_af_flr() and pci_pm_reset() are all subject to this as
well, some of the device specific resets too.  Maybe we need a common
helper with a string provided so we can differentiate the callers, even
if we don't make all the conversions in this series.  Thanks,

Alex
Re: [PATCH v11 4/9] PCI: Add additional checks for flr reset
Posted by Farhan Ali 1 week, 5 days ago
On 3/25/2026 9:25 AM, Alex Williamson wrote:
> On Tue, 24 Mar 2026 17:49:36 -0500
> Bjorn Helgaas <helgaas@kernel.org> wrote:
>
>> On Mon, Mar 16, 2026 at 12:15:39PM -0700, Farhan Ali wrote:
>>> If a device is in an error state, then any reads of device registers can
>>> return error value. Add additional checks to validate if a device is in an
>>> error state before doing an flr reset.
>> s/flr/FLR/ (also in subject)
>>
>> Also please extend the subject to say something specific about the
>> "additional checks".  E.g.,
>>
>>    PCI: Fail FLR when config space inaccessible
>>
>>> Cc: stable@vger.kernel.org
>>> Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
>>> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>> ---
>>>   drivers/pci/pci.c | 7 +++++++
>>>   1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>> index 373421f4b9d8..8e4d924f4e88 100644
>>> --- a/drivers/pci/pci.c
>>> +++ b/drivers/pci/pci.c
>>> @@ -4371,12 +4371,19 @@ EXPORT_SYMBOL_GPL(pcie_flr);
>>>    */
>>>   int pcie_reset_flr(struct pci_dev *dev, bool probe)
>>>   {
>>> +	u32 reg;
>>> +
>>>   	if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
>>>   		return -ENOTTY;
>>>   
>>>   	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
>>>   		return -ENOTTY;
>>>   
>>> +	if (pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &reg)) {
>>> +		pci_warn(dev, "Device unable to do an FLR\n");
>>> +		return -ENOTTY;
>>> +	}
>> I guess the point of this is to detect devices that are inaccessible?
>> The same sort of thing as in pci_dev_save_and_disable() from patch
>> 3/9?  But we use "dev->error_state == pci_channel_io_perm_failure"
>> instead?
>>
>> No matter what we do, this has the same race as in patch 3/9.  And I
>> think using dev->error_state also depends on AER being enabled, which
>> cuts out many PCIe devices.
>>
>> I think using the same exact code as in pci_dev_save_and_disable()
>> would be more straightforward.  And also the same sort of wording in
>> the message, e.g., "Device config space inaccessible; unable to FLR"
>> or similar.  I foresee many of these messages in my future, and it
>> will be helpful to have a specific clue about why FLR failed :)
> I think pci_af_flr() and pci_pm_reset() are all subject to this as
> well, some of the device specific resets too.  Maybe we need a common
> helper with a string provided so we can differentiate the callers, even
> if we don't make all the conversions in this series.  Thanks,
>
> Alex

Yeah I was planning on implementing a common function in the next 
revision. Will include a string parameter to differentiate the callers. 
And yeah I prefer not to update pci_af_flr() or pci_pm_reset() as part 
of this series since I don't have the devices to test those paths.

Thanks

Farhan
Re: [PATCH v11 4/9] PCI: Add additional checks for flr reset
Posted by Farhan Ali 1 week, 6 days ago
On 3/24/2026 3:49 PM, Bjorn Helgaas wrote:
> On Mon, Mar 16, 2026 at 12:15:39PM -0700, Farhan Ali wrote:
>> If a device is in an error state, then any reads of device registers can
>> return error value. Add additional checks to validate if a device is in an
>> error state before doing an flr reset.
> s/flr/FLR/ (also in subject)
>
> Also please extend the subject to say something specific about the
> "additional checks".  E.g.,
>
>    PCI: Fail FLR when config space inaccessible

This makes sense, will update the subject.


>
>> Cc: stable@vger.kernel.org
>> Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
>> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>>   drivers/pci/pci.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 373421f4b9d8..8e4d924f4e88 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -4371,12 +4371,19 @@ EXPORT_SYMBOL_GPL(pcie_flr);
>>    */
>>   int pcie_reset_flr(struct pci_dev *dev, bool probe)
>>   {
>> +	u32 reg;
>> +
>>   	if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
>>   		return -ENOTTY;
>>   
>>   	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
>>   		return -ENOTTY;
>>   
>> +	if (pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &reg)) {
>> +		pci_warn(dev, "Device unable to do an FLR\n");
>> +		return -ENOTTY;
>> +	}
> I guess the point of this is to detect devices that are inaccessible?
> The same sort of thing as in pci_dev_save_and_disable() from patch
> 3/9?  But we use "dev->error_state == pci_channel_io_perm_failure"
> instead?
>
> No matter what we do, this has the same race as in patch 3/9.  And I
> think using dev->error_state also depends on AER being enabled, which
> cuts out many PCIe devices.
>
> I think using the same exact code as in pci_dev_save_and_disable()
> would be more straightforward.  And also the same sort of wording in
> the message, e.g., "Device config space inaccessible; unable to FLR"
> or similar.  I foresee many of these messages in my future, and it
> will be helpful to have a specific clue about why FLR failed :)

I will update the message :) and update the code to use the same check 
at patch 3 here (will move the check to a common function).

Thanks

Farhan


>
>>   	if (probe)
>>   		return 0;
>>   
>> -- 
>> 2.43.0
>>