[v4] Error recovery for vfio-pci devices on s390x

[PATCH v4 02/10] PCI: Add additional checks for flr reset

Posted by Farhan Ali 4 months, 2 weeks ago

If a device is in an error state, then any reads of device registers can
return error value. Add addtional checks to validate if a device is in an
error state before doing an flr reset.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 drivers/pci/pci.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index a3d93d1baee7..327fefc6a1eb 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4576,12 +4576,19 @@ EXPORT_SYMBOL_GPL(pcie_flr);
  */
 int pcie_reset_flr(struct pci_dev *dev, bool probe)
 {
+	u32 reg;
+
 	if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
 		return -ENOTTY;
 
 	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
 		return -ENOTTY;
 
+	if (pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &reg)) {
+		pci_warn(dev, "Device unable to do an FLR\n");
+		return -ENOTTY;
+	}
+
 	if (probe)
 		return 0;
 
-- 
2.43.0

Re: [PATCH v4 02/10] PCI: Add additional checks for flr reset

Posted by Benjamin Block 4 months, 1 week ago

On Wed, Sep 24, 2025 at 10:16:20AM -0700, Farhan Ali wrote:
> If a device is in an error state, then any reads of device registers can
> return error value. Add addtional checks to validate if a device is in an
> error state before doing an flr reset.
> 
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
>  drivers/pci/pci.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 

Apart from the discussion about a stable tag this looks good to me.


Reviewed-by: Benjamin Block <bblock@linux.ibm.com>

-- 
Best Regards, Benjamin Block        /        Linux on IBM Z Kernel Development
IBM Deutschland Research & Development GmbH    /   https://www.ibm.com/privacy
Vors. Aufs.-R.: Wolfgang Wendt         /        Geschäftsführung: David Faller
Sitz der Ges.: Böblingen     /    Registergericht: AmtsG Stuttgart, HRB 243294

Re: [PATCH v4 02/10] PCI: Add additional checks for flr reset

Posted by Benjamin Block 4 months, 1 week ago

On Wed, Sep 24, 2025 at 10:16:20AM -0700, Farhan Ali wrote:
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index a3d93d1baee7..327fefc6a1eb 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4576,12 +4576,19 @@ EXPORT_SYMBOL_GPL(pcie_flr);
>   */
>  int pcie_reset_flr(struct pci_dev *dev, bool probe)
>  {
> +	u32 reg;
> +
>  	if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
>  		return -ENOTTY;
>  
>  	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
>  		return -ENOTTY;
>  
> +	if (pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &reg)) {
> +		pci_warn(dev, "Device unable to do an FLR\n");
> +		return -ENOTTY;
> +	}

Just thinking out loud, not sure whether it make sense, but since you already
read an up-to-date value from the config space, would it make sense to
pull the check above `dev->devcap & PCI_EXP_DEVCAP_FLR` below this read, and
check on the just read `reg`?

Also wondering whether it makes sense to stable-tag this? We've recently seen
"unpleasant" recovery attempts that look like this in the kernel logs:

    [  663.330053] vfio-pci 0007:00:00.1: timed out waiting for pending transaction; performing function level reset anyway
    [  664.730051] vfio-pci 0007:00:00.1: not ready 1023ms after FLR; waiting
    [  665.830023] vfio-pci 0007:00:00.1: not ready 2047ms after FLR; waiting
    [  667.910023] vfio-pci 0007:00:00.1: not ready 4095ms after FLR; waiting
    [  672.070022] vfio-pci 0007:00:00.1: not ready 8191ms after FLR; waiting
    [  680.550025] vfio-pci 0007:00:00.1: not ready 16383ms after FLR; waiting
    [  697.190023] vfio-pci 0007:00:00.1: not ready 32767ms after FLR; waiting
    [  730.470021] vfio-pci 0007:00:00.1: not ready 65535ms after FLR; giving up

The VF here was already dead in the water at that point, so I suspect
`pci_read_config_dword()` might have failed, and so this check would have
failed, and we wouldn't have "wasted" the minute waiting for a FLR that was
never going to happen anyway.


-- 
Best Regards, Benjamin Block        /        Linux on IBM Z Kernel Development
IBM Deutschland Research & Development GmbH    /   https://www.ibm.com/privacy
Vors. Aufs.-R.: Wolfgang Wendt         /        Geschäftsführung: David Faller
Sitz der Ges.: Böblingen     /    Registergericht: AmtsG Stuttgart, HRB 243294

Re: [PATCH v4 02/10] PCI: Add additional checks for flr reset

Posted by Farhan Ali 4 months, 1 week ago

On 9/30/2025 3:03 AM, Benjamin Block wrote:
> On Wed, Sep 24, 2025 at 10:16:20AM -0700, Farhan Ali wrote:
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index a3d93d1baee7..327fefc6a1eb 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -4576,12 +4576,19 @@ EXPORT_SYMBOL_GPL(pcie_flr);
>>    */
>>   int pcie_reset_flr(struct pci_dev *dev, bool probe)
>>   {
>> +	u32 reg;
>> +
>>   	if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
>>   		return -ENOTTY;
>>   
>>   	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
>>   		return -ENOTTY;
>>   
>> +	if (pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &reg)) {
>> +		pci_warn(dev, "Device unable to do an FLR\n");
>> +		return -ENOTTY;
>> +	}
> Just thinking out loud, not sure whether it make sense, but since you already
> read an up-to-date value from the config space, would it make sense to
> pull the check above `dev->devcap & PCI_EXP_DEVCAP_FLR` below this read, and
> check on the just read `reg`?

My thinking was we could exit early if the device never had FLR 
capability (and so was not cached in devcap). This way we avoid an extra 
PCI read.


>
> Also wondering whether it makes sense to stable-tag this? We've recently seen
> "unpleasant" recovery attempts that look like this in the kernel logs:
>
>      [  663.330053] vfio-pci 0007:00:00.1: timed out waiting for pending transaction; performing function level reset anyway
>      [  664.730051] vfio-pci 0007:00:00.1: not ready 1023ms after FLR; waiting
>      [  665.830023] vfio-pci 0007:00:00.1: not ready 2047ms after FLR; waiting
>      [  667.910023] vfio-pci 0007:00:00.1: not ready 4095ms after FLR; waiting
>      [  672.070022] vfio-pci 0007:00:00.1: not ready 8191ms after FLR; waiting
>      [  680.550025] vfio-pci 0007:00:00.1: not ready 16383ms after FLR; waiting
>      [  697.190023] vfio-pci 0007:00:00.1: not ready 32767ms after FLR; waiting
>      [  730.470021] vfio-pci 0007:00:00.1: not ready 65535ms after FLR; giving up
>
> The VF here was already dead in the water at that point, so I suspect
> `pci_read_config_dword()` might have failed, and so this check would have
> failed, and we wouldn't have "wasted" the minute waiting for a FLR that was
> never going to happen anyway.
I think maybe we could? I don't think this patch fixes anything that's 
"broken" but rather improves the behavior to escalate to other reset 
method if the device is already in a bad state. I will cc stable.

Thanks

Farhan

Re: [PATCH v4 02/10] PCI: Add additional checks for flr reset

Posted by Benjamin Block 4 months, 1 week ago

On Tue, Sep 30, 2025 at 10:04:25AM -0700, Farhan Ali wrote:
> 
> On 9/30/2025 3:03 AM, Benjamin Block wrote:
> > On Wed, Sep 24, 2025 at 10:16:20AM -0700, Farhan Ali wrote:
> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >> index a3d93d1baee7..327fefc6a1eb 100644
> >> --- a/drivers/pci/pci.c
> >> +++ b/drivers/pci/pci.c
--8<--
> >> +	if (pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &reg)) {
> >> +		pci_warn(dev, "Device unable to do an FLR\n");
> >> +		return -ENOTTY;
> >> +	}
> > Just thinking out loud, not sure whether it make sense, but since you already
> > read an up-to-date value from the config space, would it make sense to
> > pull the check above `dev->devcap & PCI_EXP_DEVCAP_FLR` below this read, and
> > check on the just read `reg`?
> 
> My thinking was we could exit early if the device never had FLR 
> capability (and so was not cached in devcap). This way we avoid an extra 
> PCI read.

That makes sense.

> > Also wondering whether it makes sense to stable-tag this? We've recently seen
> > "unpleasant" recovery attempts that look like this in the kernel logs:
> >
> >      [  663.330053] vfio-pci 0007:00:00.1: timed out waiting for pending transaction; performing function level reset anyway
> >      [  664.730051] vfio-pci 0007:00:00.1: not ready 1023ms after FLR; waiting
> >      [  665.830023] vfio-pci 0007:00:00.1: not ready 2047ms after FLR; waiting
> >      [  667.910023] vfio-pci 0007:00:00.1: not ready 4095ms after FLR; waiting
> >      [  672.070022] vfio-pci 0007:00:00.1: not ready 8191ms after FLR; waiting
> >      [  680.550025] vfio-pci 0007:00:00.1: not ready 16383ms after FLR; waiting
> >      [  697.190023] vfio-pci 0007:00:00.1: not ready 32767ms after FLR; waiting
> >      [  730.470021] vfio-pci 0007:00:00.1: not ready 65535ms after FLR; giving up
> >
> > The VF here was already dead in the water at that point, so I suspect
> > `pci_read_config_dword()` might have failed, and so this check would have
> > failed, and we wouldn't have "wasted" the minute waiting for a FLR that was
> > never going to happen anyway.
>
> I think maybe we could? I don't think this patch fixes anything that's 
> "broken" but rather improves the behavior to escalate to other reset 
> method if the device is already in a bad state. I will cc stable.

Right, adding a Fixes tag doesn't really make sense. But it does help with
accelerating recoveries, so maybe a stable tag will work :)


-- 
Best Regards, Benjamin Block        /        Linux on IBM Z Kernel Development
IBM Deutschland Research & Development GmbH    /   https://www.ibm.com/privacy
Vors. Aufs.-R.: Wolfgang Wendt         /        Geschäftsführung: David Faller
Sitz der Ges.: Böblingen     /    Registergericht: AmtsG Stuttgart, HRB 243294