[PATCH v3 2/8] PCI: Replace msleep with fsleep for precise secondary bus reset

Hans Zhang posted 8 patches 1 month, 1 week ago
[PATCH v3 2/8] PCI: Replace msleep with fsleep for precise secondary bus reset
Posted by Hans Zhang 1 month, 1 week ago
The msleep() function with small values (less than 20ms) may not sleep
for the exact duration due to the kernel's timer wheel design. According
to the comment in kernel/time/sleep_timeout.c:

  "The slack of timers which will end up in level 0 depends on sleep
  duration (msecs) and HZ configuration. For example, with HZ=1000 and
  a requested sleep of 2ms, the slack can be as high as 50% (1ms) because
  the minimum slack is 12.5% but the actual calculation for level 0 timers
  is slack = MSECS_PER_TICK / msecs. This means that msleep(2) can
  actually take up to 3ms (2ms + 1ms) on a system with HZ=1000."

This unnecessary delay can impact system responsiveness during PCI
operations, especially since the PCIe r7.0 specification, section
7.5.1.3.13, requires only a minimum Trst of 1ms. We double this to 2ms
to ensure we meet the minimum requirement, but using msleep(2) may
actually wait longer than needed.

Using fsleep() provides a more precise delay that matches the stated
intent of the code. The fsleep() function uses high-resolution timers
where available to achieve microsecond precision.

Replace msleep(2 * PCI_T_RST_SEC_BUS_DELAY_MS) with
fsleep(2 * PCI_T_RST_SEC_BUS_DELAY_US) to ensure the actual delay is
closer to the intended 2ms delay.

Signed-off-by: Hans Zhang <18255117159@163.com>
---
 drivers/pci/pci.c | 2 +-
 drivers/pci/pci.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index c05a4c2fa643..81105dfc2f62 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4964,7 +4964,7 @@ void pci_reset_secondary_bus(struct pci_dev *dev)
 	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
 
 	/* Double this to 2ms to ensure that we meet the minimum requirement */
-	msleep(2 * PCI_T_RST_SEC_BUS_DELAY_MS);
+	fsleep(2 * PCI_T_RST_SEC_BUS_DELAY_US);
 
 	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
 	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4d7e9c3f3453..9d38ef26c6a9 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -61,7 +61,7 @@ struct pcie_tlp_log;
 #define PCIE_LINK_WAIT_SLEEP_MS		90
 
 /* PCIe r7.0, sec 7.5.1.3.13, requires minimum Trst of 1ms */
-#define PCI_T_RST_SEC_BUS_DELAY_MS	1
+#define PCI_T_RST_SEC_BUS_DELAY_US	1000
 
 /* Message Routing (r[2:0]); PCIe r6.0, sec 2.2.8 */
 #define PCIE_MSG_TYPE_R_RC	0
-- 
2.25.1
Re: [PATCH v3 2/8] PCI: Replace msleep with fsleep for precise secondary bus reset
Posted by Bjorn Helgaas 1 month, 1 week ago
On Wed, Aug 27, 2025 at 01:03:09AM +0800, Hans Zhang wrote:
> The msleep() function with small values (less than 20ms) may not sleep
> for the exact duration due to the kernel's timer wheel design. According
> to the comment in kernel/time/sleep_timeout.c:
> 
>   "The slack of timers which will end up in level 0 depends on sleep
>   duration (msecs) and HZ configuration. For example, with HZ=1000 and
>   a requested sleep of 2ms, the slack can be as high as 50% (1ms) because
>   the minimum slack is 12.5% but the actual calculation for level 0 timers
>   is slack = MSECS_PER_TICK / msecs. This means that msleep(2) can
>   actually take up to 3ms (2ms + 1ms) on a system with HZ=1000."

I thought I heard something about 20ms being the minimum actual delay
for small msleeps.  I suppose the error is larger for HZ=100 systems.

The fsleep() would turn into something between 2ms and 2.5ms, so if
we're talking about reducing 3ms to 2.5ms, I have a hard time getting
worried about that.

And we're going to wait at least 100ms before touching the device
below the bridge anyway.

> This unnecessary delay can impact system responsiveness during PCI
> operations, especially since the PCIe r7.0 specification, section
> 7.5.1.3.13, requires only a minimum Trst of 1ms. We double this to 2ms
> to ensure we meet the minimum requirement, but using msleep(2) may
> actually wait longer than needed.
> 
> Using fsleep() provides a more precise delay that matches the stated
> intent of the code. The fsleep() function uses high-resolution timers
> where available to achieve microsecond precision.
> 
> Replace msleep(2 * PCI_T_RST_SEC_BUS_DELAY_MS) with
> fsleep(2 * PCI_T_RST_SEC_BUS_DELAY_US) to ensure the actual delay is
> closer to the intended 2ms delay.
> 
> Signed-off-by: Hans Zhang <18255117159@163.com>
> ---
>  drivers/pci/pci.c | 2 +-
>  drivers/pci/pci.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index c05a4c2fa643..81105dfc2f62 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4964,7 +4964,7 @@ void pci_reset_secondary_bus(struct pci_dev *dev)
>  	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
>  
>  	/* Double this to 2ms to ensure that we meet the minimum requirement */
> -	msleep(2 * PCI_T_RST_SEC_BUS_DELAY_MS);
> +	fsleep(2 * PCI_T_RST_SEC_BUS_DELAY_US);
>  
>  	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
>  	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 4d7e9c3f3453..9d38ef26c6a9 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -61,7 +61,7 @@ struct pcie_tlp_log;
>  #define PCIE_LINK_WAIT_SLEEP_MS		90
>  
>  /* PCIe r7.0, sec 7.5.1.3.13, requires minimum Trst of 1ms */
> -#define PCI_T_RST_SEC_BUS_DELAY_MS	1
> +#define PCI_T_RST_SEC_BUS_DELAY_US	1000
>  
>  /* Message Routing (r[2:0]); PCIe r6.0, sec 2.2.8 */
>  #define PCIE_MSG_TYPE_R_RC	0
> -- 
> 2.25.1
>
Re: [PATCH v3 2/8] PCI: Replace msleep with fsleep for precise secondary bus reset
Posted by Hans Zhang 1 month ago

On 2025/8/27 07:05, Bjorn Helgaas wrote:
> On Wed, Aug 27, 2025 at 01:03:09AM +0800, Hans Zhang wrote:
>> The msleep() function with small values (less than 20ms) may not sleep
>> for the exact duration due to the kernel's timer wheel design. According
>> to the comment in kernel/time/sleep_timeout.c:
>>
>>    "The slack of timers which will end up in level 0 depends on sleep
>>    duration (msecs) and HZ configuration. For example, with HZ=1000 and
>>    a requested sleep of 2ms, the slack can be as high as 50% (1ms) because
>>    the minimum slack is 12.5% but the actual calculation for level 0 timers
>>    is slack = MSECS_PER_TICK / msecs. This means that msleep(2) can
>>    actually take up to 3ms (2ms + 1ms) on a system with HZ=1000."
> 
> I thought I heard something about 20ms being the minimum actual delay
> for small msleeps.  I suppose the error is larger for HZ=100 systems.
> 

Yes.

> The fsleep() would turn into something between 2ms and 2.5ms, so if
> we're talking about reducing 3ms to 2.5ms, I have a hard time getting
> worried about that.
> 
> And we're going to wait at least 100ms before touching the device
> below the bridge anyway.

int pci_bridge_secondary_bus_reset(struct pci_dev *dev)
	pcibios_reset_secondary_bus(dev);
                 pci_reset_secondary_bus(dev);
                         pci_read_config_word(dev, PCI_BRIDGE_CONTROL, 
&ctrl);
                         ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
                         pci_write_config_word(dev, PCI_BRIDGE_CONTROL, 
ctrl);

                         /*
                         * PCI spec v3.0 7.6.4.2 requires minimum Trst 
of 1ms.  Double
                         * this to 2ms to ensure that we meet the 
minimum requirement.
                         */
                         msleep(2);      // As you mentioned, is it 
necessary to be very precise here? Please decide whether you want to 
make the modification.

                         ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
                         pci_write_config_word(dev, PCI_BRIDGE_CONTROL, 
ctrl);
	pci_bridge_wait_for_secondary_bus(dev, "bus reset");
                 // There are also some delays in between.

                 // The delay here is also long enough.
                 pci_dev_wait(child, reset_type, 
PCIE_RESET_READY_POLL_MS - delay);




If it's not necessary to do so, do I still need to resubmit the version? 
Or do you choose a few acceptable ones or the first patch?



Best regards,
Hans

> 
>> This unnecessary delay can impact system responsiveness during PCI
>> operations, especially since the PCIe r7.0 specification, section
>> 7.5.1.3.13, requires only a minimum Trst of 1ms. We double this to 2ms
>> to ensure we meet the minimum requirement, but using msleep(2) may
>> actually wait longer than needed.
>>
>> Using fsleep() provides a more precise delay that matches the stated
>> intent of the code. The fsleep() function uses high-resolution timers
>> where available to achieve microsecond precision.
>>
>> Replace msleep(2 * PCI_T_RST_SEC_BUS_DELAY_MS) with
>> fsleep(2 * PCI_T_RST_SEC_BUS_DELAY_US) to ensure the actual delay is
>> closer to the intended 2ms delay.
>>
>> Signed-off-by: Hans Zhang <18255117159@163.com>
>> ---
>>   drivers/pci/pci.c | 2 +-
>>   drivers/pci/pci.h | 2 +-
>>   2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index c05a4c2fa643..81105dfc2f62 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -4964,7 +4964,7 @@ void pci_reset_secondary_bus(struct pci_dev *dev)
>>   	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
>>   
>>   	/* Double this to 2ms to ensure that we meet the minimum requirement */
>> -	msleep(2 * PCI_T_RST_SEC_BUS_DELAY_MS);
>> +	fsleep(2 * PCI_T_RST_SEC_BUS_DELAY_US);
>>   
>>   	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
>>   	pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> index 4d7e9c3f3453..9d38ef26c6a9 100644
>> --- a/drivers/pci/pci.h
>> +++ b/drivers/pci/pci.h
>> @@ -61,7 +61,7 @@ struct pcie_tlp_log;
>>   #define PCIE_LINK_WAIT_SLEEP_MS		90
>>   
>>   /* PCIe r7.0, sec 7.5.1.3.13, requires minimum Trst of 1ms */
>> -#define PCI_T_RST_SEC_BUS_DELAY_MS	1
>> +#define PCI_T_RST_SEC_BUS_DELAY_US	1000
>>   
>>   /* Message Routing (r[2:0]); PCIe r6.0, sec 2.2.8 */
>>   #define PCIE_MSG_TYPE_R_RC	0
>> -- 
>> 2.25.1
>>