[PATCH v3 0/2] PCI: Reduce FLR delay to 10ms for MSFT devices

grwhyte@linux.microsoft.com posted 2 patches 4 months ago
drivers/pci/pci.c    |  8 ++++++--
drivers/pci/pci.h    |  2 ++
drivers/pci/quirks.c | 20 ++++++++++++++++++++
include/linux/pci.h  |  1 +
4 files changed, 29 insertions(+), 2 deletions(-)
[PATCH v3 0/2] PCI: Reduce FLR delay to 10ms for MSFT devices
Posted by grwhyte@linux.microsoft.com 4 months ago
From: Graham Whyte <grwhyte@linux.microsoft.com>

Add a new flr_delay member of the pci_dev struct to allow customization of
the delay after FLR for devices that do not support immediate readiness
or readiness time reporting. The main scenario this addresses is VF
removal and rescan during runtime repairs and driver updates, which,
if fixed to 100ms, introduces significant delays across multiple VFs.
These delays are unnecessary for devices that complete the FLR well
within this timeframe.

Patch 1 adds the flr_delay member to the pci_dev struct
Patch 2 adds the msft device specific quirk to utilize the flr_delay

---
v2->v3:
- Removed Microsoft specific pcie reset reset, replaced with customizable flr_delay parameter
- Changed msleep in pcie_flr to usleep_range to support flr delays of under 20ms 
v1->v2:
- Removed unnecessary EXPORT_SYMBOL_GPL for function pci_dev_wait
- Link to thread:https://lore.kernel.org/linux-pci/?q=f%3Agrwhyte&x=t#m7453647902a1b22840f5e39434a631fd7b2515ce'

Link to V1: https://lore.kernel.org/linux-pci/20250522085253.GN7435@unreal/T/#m7453647902a1b22840f5e39434a631fd7b2515ce  

Graham Whyte (2):
  PCI: Add flr_delay parameter to pci_dev struct
  PCI: Reduce FLR delay to 10ms for MSFT devices

 drivers/pci/pci.c    |  8 ++++++--
 drivers/pci/pci.h    |  2 ++
 drivers/pci/quirks.c | 20 ++++++++++++++++++++
 include/linux/pci.h  |  1 +
 4 files changed, 29 insertions(+), 2 deletions(-)

-- 
2.25.1
Re: [PATCH v3 0/2] PCI: Reduce FLR delay to 10ms for MSFT devices
Posted by Manivannan Sadhasivam 3 months, 4 weeks ago
On Wed, Jun 11, 2025 at 12:05:50AM +0000, grwhyte@linux.microsoft.com wrote:
> From: Graham Whyte <grwhyte@linux.microsoft.com>
> 
> Add a new flr_delay member of the pci_dev struct to allow customization of
> the delay after FLR for devices that do not support immediate readiness
> or readiness time reporting. The main scenario this addresses is VF
> removal and rescan during runtime repairs and driver updates, which,
> if fixed to 100ms, introduces significant delays across multiple VFs.
> These delays are unnecessary for devices that complete the FLR well
> within this timeframe.
> 

I don't think it is acceptable to *reduce* the standard delay just because your
device completes it more quickly. Proper way to reduce the timing would be to
support FRS as you said, but we cannot have arbitrary delays for random devices.

- Mani

> Patch 1 adds the flr_delay member to the pci_dev struct
> Patch 2 adds the msft device specific quirk to utilize the flr_delay
> 
> ---
> v2->v3:
> - Removed Microsoft specific pcie reset reset, replaced with customizable flr_delay parameter
> - Changed msleep in pcie_flr to usleep_range to support flr delays of under 20ms 
> v1->v2:
> - Removed unnecessary EXPORT_SYMBOL_GPL for function pci_dev_wait
> - Link to thread:https://lore.kernel.org/linux-pci/?q=f%3Agrwhyte&x=t#m7453647902a1b22840f5e39434a631fd7b2515ce'
> 
> Link to V1: https://lore.kernel.org/linux-pci/20250522085253.GN7435@unreal/T/#m7453647902a1b22840f5e39434a631fd7b2515ce  
> 
> Graham Whyte (2):
>   PCI: Add flr_delay parameter to pci_dev struct
>   PCI: Reduce FLR delay to 10ms for MSFT devices
> 
>  drivers/pci/pci.c    |  8 ++++++--
>  drivers/pci/pci.h    |  2 ++
>  drivers/pci/quirks.c | 20 ++++++++++++++++++++
>  include/linux/pci.h  |  1 +
>  4 files changed, 29 insertions(+), 2 deletions(-)
> 
> -- 
> 2.25.1
> 
> 

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH v3 0/2] PCI: Reduce FLR delay to 10ms for MSFT devices
Posted by Lukas Wunner 3 months, 4 weeks ago
On Fri, Jun 13, 2025 at 05:12:48PM +0530, Manivannan Sadhasivam wrote:
> On Wed, Jun 11, 2025 at 12:05:50AM +0000, grwhyte@linux.microsoft.com wrote:
> > Add a new flr_delay member of the pci_dev struct to allow customization of
> > the delay after FLR for devices that do not support immediate readiness
> > or readiness time reporting. The main scenario this addresses is VF
> > removal and rescan during runtime repairs and driver updates, which,
> > if fixed to 100ms, introduces significant delays across multiple VFs.
> > These delays are unnecessary for devices that complete the FLR well
> > within this timeframe.
> > 
> 
> I don't think it is acceptable to *reduce* the standard delay just
> because your device completes it more quickly. Proper way to reduce
> the timing would be to support FRS as you said, but we cannot have
> arbitrary delays for random devices.

To be fair, we already have that for certain devices:

The quirk delay_250ms_after_flr() is referenced by three different
Vendor ID / Device ID combos and *lengthens* the delay after FLR.

It's probably difficult to justify rejecting custom delays for
certain MANA devices, even though we allowed them for three other
devices.

The proposed patch introduces a generic solution which avoids
further cluttering up pci_dev_reset_methods[] with extra entries,
so I think it's an approach worth considering.

There are a bunch of nits in the proposed patches, such as "pci"
not being capitalized, but the general approach seems fine to me.

Thanks,

Lukas
Re: [PATCH v3 0/2] PCI: Reduce FLR delay to 10ms for MSFT devices
Posted by Manivannan Sadhasivam 3 months, 4 weeks ago
On Fri, Jun 13, 2025 at 03:45:35PM +0200, Lukas Wunner wrote:
> On Fri, Jun 13, 2025 at 05:12:48PM +0530, Manivannan Sadhasivam wrote:
> > On Wed, Jun 11, 2025 at 12:05:50AM +0000, grwhyte@linux.microsoft.com wrote:
> > > Add a new flr_delay member of the pci_dev struct to allow customization of
> > > the delay after FLR for devices that do not support immediate readiness
> > > or readiness time reporting. The main scenario this addresses is VF
> > > removal and rescan during runtime repairs and driver updates, which,
> > > if fixed to 100ms, introduces significant delays across multiple VFs.
> > > These delays are unnecessary for devices that complete the FLR well
> > > within this timeframe.
> > > 
> > 
> > I don't think it is acceptable to *reduce* the standard delay just
> > because your device completes it more quickly. Proper way to reduce
> > the timing would be to support FRS as you said, but we cannot have
> > arbitrary delays for random devices.
> 
> To be fair, we already have that for certain devices:
> 
> The quirk delay_250ms_after_flr() is referenced by three different
> Vendor ID / Device ID combos and *lengthens* the delay after FLR.
> 

This quirk is fine as it works around an issue in the device. But this patch is
not fixing/working around an issue in the device, but rather optimizing the
delay for performance, which is not what quirks are used for AFAIK.

> It's probably difficult to justify rejecting custom delays for
> certain MANA devices, even though we allowed them for three other
> devices.
> 

If the MANA devices require extended delay, then a quirk indeed makes sense.
But it is the other way around.

> The proposed patch introduces a generic solution which avoids
> further cluttering up pci_dev_reset_methods[] with extra entries,
> so I think it's an approach worth considering.
> 
> There are a bunch of nits in the proposed patches, such as "pci"
> not being capitalized, but the general approach seems fine to me.
> 

I honestly don't know if there is any other way to handle this. So I think it is
upto Bjorn to take a call.

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH v3 0/2] PCI: Reduce FLR delay to 10ms for MSFT devices
Posted by Christoph Hellwig 4 months ago
On Wed, Jun 11, 2025 at 12:05:50AM +0000, grwhyte@linux.microsoft.com wrote:
> From: Graham Whyte <grwhyte@linux.microsoft.com>
> 
> Add a new flr_delay member of the pci_dev struct to allow customization of
> the delay after FLR for devices that do not support immediate readiness
> or readiness time reporting. The main scenario this addresses is VF
> removal and rescan during runtime repairs and driver updates, which,
> if fixed to 100ms, introduces significant delays across multiple VFs.
> These delays are unnecessary for devices that complete the FLR well
> within this timeframe.

Please work with the PCIe SIG to have a standard capability for this
instead of piling up hacks like this quirk.
Re: [PATCH v3 0/2] PCI: Reduce FLR delay to 10ms for MSFT devices
Posted by Niklas Cassel 4 months ago
On Tue, Jun 10, 2025 at 08:27:12PM -0700, Christoph Hellwig wrote:
> On Wed, Jun 11, 2025 at 12:05:50AM +0000, grwhyte@linux.microsoft.com wrote:
> > From: Graham Whyte <grwhyte@linux.microsoft.com>
> > 
> > Add a new flr_delay member of the pci_dev struct to allow customization of
> > the delay after FLR for devices that do not support immediate readiness
> > or readiness time reporting. The main scenario this addresses is VF
> > removal and rescan during runtime repairs and driver updates, which,
> > if fixed to 100ms, introduces significant delays across multiple VFs.
> > These delays are unnecessary for devices that complete the FLR well
> > within this timeframe.
> 
> Please work with the PCIe SIG to have a standard capability for this
> instead of piling up hacks like this quirk.

There is already some support in PCIe for this.

For Conventional Reset, see PCIe 6.0, section 6.6.1, there is the
"Readiness Time Reporting Extended Capability":
"For a Device that implements the Readiness Time Reporting Extended Capability,
and has reported a Reset Time shorter than 100ms, software is permitted to send
a Configuration Request to the Device after waiting the reported Reset Time
from Conventional Reset."


For FLR, see PCIe 6.0, section 6.6.2, there is the "Function Readiness Status":
"After an FLR has been initiated by writing a 1b to the Initiate Function Level
Reset bit, the Function must complete the FLR within 100 ms. [...] If Function
Readiness Status (FRS - see § Section 6.22.2 ) is implemented, then system
software is permitted to issue Configuration Requests to the Function
immediately following receipt of an FRS Message indicating Configuration-Ready,
however, this does not necessarily indicate that outstanding Requests initiated
by the Function have completed."


Kind regards,
Niklas
Re: [PATCH v3 0/2] PCI: Reduce FLR delay to 10ms for MSFT devices
Posted by Graham Whyte 4 months ago

On 6/11/2025 12:23 AM, Niklas Cassel wrote:
> On Tue, Jun 10, 2025 at 08:27:12PM -0700, Christoph Hellwig wrote:
>> On Wed, Jun 11, 2025 at 12:05:50AM +0000, grwhyte@linux.microsoft.com wrote:
>>> From: Graham Whyte <grwhyte@linux.microsoft.com>
>>>
>>> Add a new flr_delay member of the pci_dev struct to allow customization of
>>> the delay after FLR for devices that do not support immediate readiness
>>> or readiness time reporting. The main scenario this addresses is VF
>>> removal and rescan during runtime repairs and driver updates, which,
>>> if fixed to 100ms, introduces significant delays across multiple VFs.
>>> These delays are unnecessary for devices that complete the FLR well
>>> within this timeframe.
>>
>> Please work with the PCIe SIG to have a standard capability for this
>> instead of piling up hacks like this quirk.
> 
> There is already some support in PCIe for this.
> 
> For Conventional Reset, see PCIe 6.0, section 6.6.1, there is the
> "Readiness Time Reporting Extended Capability":
> "For a Device that implements the Readiness Time Reporting Extended Capability,
> and has reported a Reset Time shorter than 100ms, software is permitted to send
> a Configuration Request to the Device after waiting the reported Reset Time
> from Conventional Reset."
> 
> 
> For FLR, see PCIe 6.0, section 6.6.2, there is the "Function Readiness Status":
> "After an FLR has been initiated by writing a 1b to the Initiate Function Level
> Reset bit, the Function must complete the FLR within 100 ms. [...] If Function
> Readiness Status (FRS - see § Section 6.22.2 ) is implemented, then system
> software is permitted to issue Configuration Requests to the Function
> immediately following receipt of an FRS Message indicating Configuration-Ready,
> however, this does not necessarily indicate that outstanding Requests initiated
> by the Function have completed."
> 
> 
> Kind regards,
> Niklas

We can ask our HW engineers to implement function readiness but we need to be
able to support exiting products, hence why posting it as a quirk. 
Re: [PATCH v3 0/2] PCI: Reduce FLR delay to 10ms for MSFT devices
Posted by Christoph Hellwig 4 months ago
On Wed, Jun 11, 2025 at 01:08:21PM -0700, Graham Whyte wrote:
> We can ask our HW engineers to implement function readiness but we need
> to be able to support exiting products, hence why posting it as a quirk.

Your report sounds like it works perfectly fine, it's just that you
want to reduce the delay.  For that you'll need to stick to the standard
methods instead of adding quirks, which are for buggy hardware that does
not otherwise work.
Re: [PATCH v3 0/2] PCI: Reduce FLR delay to 10ms for MSFT devices
Posted by Graham Whyte 4 months ago

On 6/11/2025 11:31 PM, Christoph Hellwig wrote:
> On Wed, Jun 11, 2025 at 01:08:21PM -0700, Graham Whyte wrote:
>> We can ask our HW engineers to implement function readiness but we need
>> to be able to support exiting products, hence why posting it as a quirk.
> 
> Your report sounds like it works perfectly fine, it's just that you
> want to reduce the delay.  For that you'll need to stick to the standard
> methods instead of adding quirks, which are for buggy hardware that does
> not otherwise work.

Bjorn, what would you recommend as next steps here?
Re: [PATCH v3 0/2] PCI: Reduce FLR delay to 10ms for MSFT devices
Posted by Bjorn Helgaas 3 months, 4 weeks ago
On Thu, Jun 12, 2025 at 09:41:45AM -0700, Graham Whyte wrote:
> On 6/11/2025 11:31 PM, Christoph Hellwig wrote:
> > On Wed, Jun 11, 2025 at 01:08:21PM -0700, Graham Whyte wrote:
> >> We can ask our HW engineers to implement function readiness but we need
> >> to be able to support exiting products, hence why posting it as a quirk.
> > 
> > Your report sounds like it works perfectly fine, it's just that you
> > want to reduce the delay.  For that you'll need to stick to the standard
> > methods instead of adding quirks, which are for buggy hardware that does
> > not otherwise work.
> 
> Bjorn, what would you recommend as next steps here?

This is a tough call and I don't pretend to have an obvious answer.  I
understand the desire to improve performance.  On the other hand, PCI
has been successful over the long term because devices adhere to
standardized ways of doing things, which makes generic software
possible.  Quirks degrade that story, of course, especially when there
is an existing standardized solution that isn't being used.  I'm not
at all happy about vendors that decide against the standard solution
and then ask OS folks to do extra work to compensate.

Bjorn
Re: [PATCH v3 0/2] PCI: Reduce FLR delay to 10ms for MSFT devices
Posted by Graham Whyte 3 months, 3 weeks ago

On 6/13/2025 8:33 AM, Bjorn Helgaas wrote:
> On Thu, Jun 12, 2025 at 09:41:45AM -0700, Graham Whyte wrote:
>> On 6/11/2025 11:31 PM, Christoph Hellwig wrote:
>>> On Wed, Jun 11, 2025 at 01:08:21PM -0700, Graham Whyte wrote:
>>>> We can ask our HW engineers to implement function readiness but we need
>>>> to be able to support exiting products, hence why posting it as a quirk.
>>>
>>> Your report sounds like it works perfectly fine, it's just that you
>>> want to reduce the delay.  For that you'll need to stick to the standard
>>> methods instead of adding quirks, which are for buggy hardware that does
>>> not otherwise work.
>>
>> Bjorn, what would you recommend as next steps here?
> 
> This is a tough call and I don't pretend to have an obvious answer.  I
> understand the desire to improve performance.  On the other hand, PCI
> has been successful over the long term because devices adhere to
> standardized ways of doing things, which makes generic software
> possible.  Quirks degrade that story, of course, especially when there
> is an existing standardized solution that isn't being used.  I'm not
> at all happy about vendors that decide against the standard solution
> and then ask OS folks to do extra work to compensate.
> 
> Bjorn

Hi Bjorn,

Should someone want to implement readiness time reporting down the road,
they'll need to do the same work as patch 1 in this series (making the
flr delay a configurable parameter). This change lays the groundwork for
that work, while also supporting devices that can't use readiness time
reporting.

Alternatively could we use a sysfs file to make this parameter
configurable via the user space application? Similar to switching between
d3hot and d3cold by writing to /sys/bus/pci/slots/$DEVICE/power.

Thanks,
Graham
Re: [PATCH v3 0/2] PCI: Reduce FLR delay to 10ms for MSFT devices
Posted by Bjorn Helgaas 3 months, 3 weeks ago
On Mon, Jun 16, 2025 at 12:02:41PM -0700, Graham Whyte wrote:
> On 6/13/2025 8:33 AM, Bjorn Helgaas wrote:
> > On Thu, Jun 12, 2025 at 09:41:45AM -0700, Graham Whyte wrote:
> >> On 6/11/2025 11:31 PM, Christoph Hellwig wrote:
> >>> On Wed, Jun 11, 2025 at 01:08:21PM -0700, Graham Whyte wrote:
> >>>> We can ask our HW engineers to implement function readiness but we need
> >>>> to be able to support exiting products, hence why posting it as a quirk.
> >>>
> >>> Your report sounds like it works perfectly fine, it's just that you
> >>> want to reduce the delay.  For that you'll need to stick to the standard
> >>> methods instead of adding quirks, which are for buggy hardware that does
> >>> not otherwise work.
> >>
> >> Bjorn, what would you recommend as next steps here?
> > 
> > This is a tough call and I don't pretend to have an obvious answer.  I
> > understand the desire to improve performance.  On the other hand, PCI
> > has been successful over the long term because devices adhere to
> > standardized ways of doing things, which makes generic software
> > possible.  Quirks degrade that story, of course, especially when there
> > is an existing standardized solution that isn't being used.  I'm not
> > at all happy about vendors that decide against the standard solution
> > and then ask OS folks to do extra work to compensate.
> 
> Should someone want to implement readiness time reporting down the road,
> they'll need to do the same work as patch 1 in this series (making the
> flr delay a configurable parameter).

Sure.  That's a trivial change.  The problem is the quirk itself.

The Readiness Time Reporting Extended Capability is read-only with no
control bits in it so it requires no actual logic in the device.
Maybe you can just implement that capability with a firmware change on
the device and add the corresponding Linux support for it.
Re: [PATCH v3 0/2] PCI: Reduce FLR delay to 10ms for MSFT devices
Posted by Graham Whyte 3 months, 3 weeks ago

On 6/16/2025 2:05 PM, Bjorn Helgaas wrote:
> On Mon, Jun 16, 2025 at 12:02:41PM -0700, Graham Whyte wrote:
>> On 6/13/2025 8:33 AM, Bjorn Helgaas wrote:
>>> On Thu, Jun 12, 2025 at 09:41:45AM -0700, Graham Whyte wrote:
>>>> On 6/11/2025 11:31 PM, Christoph Hellwig wrote:
>>>>> On Wed, Jun 11, 2025 at 01:08:21PM -0700, Graham Whyte wrote:
>>>>>> We can ask our HW engineers to implement function readiness but we need
>>>>>> to be able to support exiting products, hence why posting it as a quirk.
>>>>>
>>>>> Your report sounds like it works perfectly fine, it's just that you
>>>>> want to reduce the delay.  For that you'll need to stick to the standard
>>>>> methods instead of adding quirks, which are for buggy hardware that does
>>>>> not otherwise work.
>>>>
>>>> Bjorn, what would you recommend as next steps here?
>>>
>>> This is a tough call and I don't pretend to have an obvious answer.  I
>>> understand the desire to improve performance.  On the other hand, PCI
>>> has been successful over the long term because devices adhere to
>>> standardized ways of doing things, which makes generic software
>>> possible.  Quirks degrade that story, of course, especially when there
>>> is an existing standardized solution that isn't being used.  I'm not
>>> at all happy about vendors that decide against the standard solution
>>> and then ask OS folks to do extra work to compensate.
>>
>> Should someone want to implement readiness time reporting down the road,
>> they'll need to do the same work as patch 1 in this series (making the
>> flr delay a configurable parameter).
> 
> Sure.  That's a trivial change.  The problem is the quirk itself.
> 
> The Readiness Time Reporting Extended Capability is read-only with no
> control bits in it so it requires no actual logic in the device.
> Maybe you can just implement that capability with a firmware change on
> the device and add the corresponding Linux support for it.

Hi Bjorn,

We checked with our HW folks, it's not possible for us to update the pci
register components with this particular card, they are read only. What
are your thoughts on the sysfs approach mentioned in the previous email?

Thanks,
Graham
Re: [PATCH v3 0/2] PCI: Reduce FLR delay to 10ms for MSFT devices
Posted by Graham Whyte 3 months, 1 week ago

On 6/18/2025 9:42 AM, Graham Whyte wrote:
> 
> 
> On 6/16/2025 2:05 PM, Bjorn Helgaas wrote:
>> On Mon, Jun 16, 2025 at 12:02:41PM -0700, Graham Whyte wrote:
>>> On 6/13/2025 8:33 AM, Bjorn Helgaas wrote:
>>>> On Thu, Jun 12, 2025 at 09:41:45AM -0700, Graham Whyte wrote:
>>>>> On 6/11/2025 11:31 PM, Christoph Hellwig wrote:
>>>>>> On Wed, Jun 11, 2025 at 01:08:21PM -0700, Graham Whyte wrote:
>>>>>>> We can ask our HW engineers to implement function readiness but we need
>>>>>>> to be able to support exiting products, hence why posting it as a quirk.
>>>>>>
>>>>>> Your report sounds like it works perfectly fine, it's just that you
>>>>>> want to reduce the delay.  For that you'll need to stick to the standard
>>>>>> methods instead of adding quirks, which are for buggy hardware that does
>>>>>> not otherwise work.
>>>>>
>>>>> Bjorn, what would you recommend as next steps here?
>>>>
>>>> This is a tough call and I don't pretend to have an obvious answer.  I
>>>> understand the desire to improve performance.  On the other hand, PCI
>>>> has been successful over the long term because devices adhere to
>>>> standardized ways of doing things, which makes generic software
>>>> possible.  Quirks degrade that story, of course, especially when there
>>>> is an existing standardized solution that isn't being used.  I'm not
>>>> at all happy about vendors that decide against the standard solution
>>>> and then ask OS folks to do extra work to compensate.
>>>
>>> Should someone want to implement readiness time reporting down the road,
>>> they'll need to do the same work as patch 1 in this series (making the
>>> flr delay a configurable parameter).
>>
>> Sure.  That's a trivial change.  The problem is the quirk itself.
>>
>> The Readiness Time Reporting Extended Capability is read-only with no
>> control bits in it so it requires no actual logic in the device.
>> Maybe you can just implement that capability with a firmware change on
>> the device and add the corresponding Linux support for it.
> 
> Hi Bjorn,
> 
> We checked with our HW folks, it's not possible for us to update the pci
> register components with this particular card, they are read only. What
> are your thoughts on the sysfs approach mentioned in the previous email?
> 
> Thanks,
> Graham

Hi Bjorn, just wanted to follow up on this here.