If a device is in an error state, then any reads of device registers can
return error value. Add additional checks to validate if a device is in an
error state before doing an flr reset.
Cc: stable@vger.kernel.org
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
drivers/pci/pci.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 373421f4b9d8..8e4d924f4e88 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4371,12 +4371,19 @@ EXPORT_SYMBOL_GPL(pcie_flr);
*/
int pcie_reset_flr(struct pci_dev *dev, bool probe)
{
+ u32 reg;
+
if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
return -ENOTTY;
if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
return -ENOTTY;
+ if (pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, ®)) {
+ pci_warn(dev, "Device unable to do an FLR\n");
+ return -ENOTTY;
+ }
+
if (probe)
return 0;
--
2.43.0
On Mon, Mar 16, 2026 at 12:15:39PM -0700, Farhan Ali wrote:
> If a device is in an error state, then any reads of device registers can
> return error value. Add additional checks to validate if a device is in an
> error state before doing an flr reset.
s/flr/FLR/ (also in subject)
Also please extend the subject to say something specific about the
"additional checks". E.g.,
PCI: Fail FLR when config space inaccessible
> Cc: stable@vger.kernel.org
> Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
> drivers/pci/pci.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 373421f4b9d8..8e4d924f4e88 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4371,12 +4371,19 @@ EXPORT_SYMBOL_GPL(pcie_flr);
> */
> int pcie_reset_flr(struct pci_dev *dev, bool probe)
> {
> + u32 reg;
> +
> if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
> return -ENOTTY;
>
> if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
> return -ENOTTY;
>
> + if (pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, ®)) {
> + pci_warn(dev, "Device unable to do an FLR\n");
> + return -ENOTTY;
> + }
I guess the point of this is to detect devices that are inaccessible?
The same sort of thing as in pci_dev_save_and_disable() from patch
3/9? But we use "dev->error_state == pci_channel_io_perm_failure"
instead?
No matter what we do, this has the same race as in patch 3/9. And I
think using dev->error_state also depends on AER being enabled, which
cuts out many PCIe devices.
I think using the same exact code as in pci_dev_save_and_disable()
would be more straightforward. And also the same sort of wording in
the message, e.g., "Device config space inaccessible; unable to FLR"
or similar. I foresee many of these messages in my future, and it
will be helpful to have a specific clue about why FLR failed :)
> if (probe)
> return 0;
>
> --
> 2.43.0
>
On Tue, 24 Mar 2026 17:49:36 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Mon, Mar 16, 2026 at 12:15:39PM -0700, Farhan Ali wrote:
> > If a device is in an error state, then any reads of device registers can
> > return error value. Add additional checks to validate if a device is in an
> > error state before doing an flr reset.
>
> s/flr/FLR/ (also in subject)
>
> Also please extend the subject to say something specific about the
> "additional checks". E.g.,
>
> PCI: Fail FLR when config space inaccessible
>
> > Cc: stable@vger.kernel.org
> > Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
> > Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> > ---
> > drivers/pci/pci.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 373421f4b9d8..8e4d924f4e88 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -4371,12 +4371,19 @@ EXPORT_SYMBOL_GPL(pcie_flr);
> > */
> > int pcie_reset_flr(struct pci_dev *dev, bool probe)
> > {
> > + u32 reg;
> > +
> > if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
> > return -ENOTTY;
> >
> > if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
> > return -ENOTTY;
> >
> > + if (pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, ®)) {
> > + pci_warn(dev, "Device unable to do an FLR\n");
> > + return -ENOTTY;
> > + }
>
> I guess the point of this is to detect devices that are inaccessible?
> The same sort of thing as in pci_dev_save_and_disable() from patch
> 3/9? But we use "dev->error_state == pci_channel_io_perm_failure"
> instead?
>
> No matter what we do, this has the same race as in patch 3/9. And I
> think using dev->error_state also depends on AER being enabled, which
> cuts out many PCIe devices.
>
> I think using the same exact code as in pci_dev_save_and_disable()
> would be more straightforward. And also the same sort of wording in
> the message, e.g., "Device config space inaccessible; unable to FLR"
> or similar. I foresee many of these messages in my future, and it
> will be helpful to have a specific clue about why FLR failed :)
I think pci_af_flr() and pci_pm_reset() are all subject to this as
well, some of the device specific resets too. Maybe we need a common
helper with a string provided so we can differentiate the callers, even
if we don't make all the conversions in this series. Thanks,
Alex
On 3/25/2026 9:25 AM, Alex Williamson wrote:
> On Tue, 24 Mar 2026 17:49:36 -0500
> Bjorn Helgaas <helgaas@kernel.org> wrote:
>
>> On Mon, Mar 16, 2026 at 12:15:39PM -0700, Farhan Ali wrote:
>>> If a device is in an error state, then any reads of device registers can
>>> return error value. Add additional checks to validate if a device is in an
>>> error state before doing an flr reset.
>> s/flr/FLR/ (also in subject)
>>
>> Also please extend the subject to say something specific about the
>> "additional checks". E.g.,
>>
>> PCI: Fail FLR when config space inaccessible
>>
>>> Cc: stable@vger.kernel.org
>>> Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
>>> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>> ---
>>> drivers/pci/pci.c | 7 +++++++
>>> 1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>> index 373421f4b9d8..8e4d924f4e88 100644
>>> --- a/drivers/pci/pci.c
>>> +++ b/drivers/pci/pci.c
>>> @@ -4371,12 +4371,19 @@ EXPORT_SYMBOL_GPL(pcie_flr);
>>> */
>>> int pcie_reset_flr(struct pci_dev *dev, bool probe)
>>> {
>>> + u32 reg;
>>> +
>>> if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
>>> return -ENOTTY;
>>>
>>> if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
>>> return -ENOTTY;
>>>
>>> + if (pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, ®)) {
>>> + pci_warn(dev, "Device unable to do an FLR\n");
>>> + return -ENOTTY;
>>> + }
>> I guess the point of this is to detect devices that are inaccessible?
>> The same sort of thing as in pci_dev_save_and_disable() from patch
>> 3/9? But we use "dev->error_state == pci_channel_io_perm_failure"
>> instead?
>>
>> No matter what we do, this has the same race as in patch 3/9. And I
>> think using dev->error_state also depends on AER being enabled, which
>> cuts out many PCIe devices.
>>
>> I think using the same exact code as in pci_dev_save_and_disable()
>> would be more straightforward. And also the same sort of wording in
>> the message, e.g., "Device config space inaccessible; unable to FLR"
>> or similar. I foresee many of these messages in my future, and it
>> will be helpful to have a specific clue about why FLR failed :)
> I think pci_af_flr() and pci_pm_reset() are all subject to this as
> well, some of the device specific resets too. Maybe we need a common
> helper with a string provided so we can differentiate the callers, even
> if we don't make all the conversions in this series. Thanks,
>
> Alex
Yeah I was planning on implementing a common function in the next
revision. Will include a string parameter to differentiate the callers.
And yeah I prefer not to update pci_af_flr() or pci_pm_reset() as part
of this series since I don't have the devices to test those paths.
Thanks
Farhan
On 3/24/2026 3:49 PM, Bjorn Helgaas wrote:
> On Mon, Mar 16, 2026 at 12:15:39PM -0700, Farhan Ali wrote:
>> If a device is in an error state, then any reads of device registers can
>> return error value. Add additional checks to validate if a device is in an
>> error state before doing an flr reset.
> s/flr/FLR/ (also in subject)
>
> Also please extend the subject to say something specific about the
> "additional checks". E.g.,
>
> PCI: Fail FLR when config space inaccessible
This makes sense, will update the subject.
>
>> Cc: stable@vger.kernel.org
>> Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
>> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>> drivers/pci/pci.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 373421f4b9d8..8e4d924f4e88 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -4371,12 +4371,19 @@ EXPORT_SYMBOL_GPL(pcie_flr);
>> */
>> int pcie_reset_flr(struct pci_dev *dev, bool probe)
>> {
>> + u32 reg;
>> +
>> if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
>> return -ENOTTY;
>>
>> if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
>> return -ENOTTY;
>>
>> + if (pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, ®)) {
>> + pci_warn(dev, "Device unable to do an FLR\n");
>> + return -ENOTTY;
>> + }
> I guess the point of this is to detect devices that are inaccessible?
> The same sort of thing as in pci_dev_save_and_disable() from patch
> 3/9? But we use "dev->error_state == pci_channel_io_perm_failure"
> instead?
>
> No matter what we do, this has the same race as in patch 3/9. And I
> think using dev->error_state also depends on AER being enabled, which
> cuts out many PCIe devices.
>
> I think using the same exact code as in pci_dev_save_and_disable()
> would be more straightforward. And also the same sort of wording in
> the message, e.g., "Device config space inaccessible; unable to FLR"
> or similar. I foresee many of these messages in my future, and it
> will be helpful to have a specific clue about why FLR failed :)
I will update the message :) and update the code to use the same check
at patch 3 here (will move the check to a common function).
Thanks
Farhan
>
>> if (probe)
>> return 0;
>>
>> --
>> 2.43.0
>>
© 2016 - 2026 Red Hat, Inc.