[PATCH v2 13/15] powerpc/rtas: Use fsleep() to minimize additional sleep duration

Anna-Maria Behnsen posted 15 patches 2 months, 2 weeks ago
There is a newer version of this series
[PATCH v2 13/15] powerpc/rtas: Use fsleep() to minimize additional sleep duration
Posted by Anna-Maria Behnsen 2 months, 2 weeks ago
When commit 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements")
was introduced, documentation about proper usage of sleep realted functions
was outdated.

The commit message references the usage of a HZ=100 system. When using a
20ms sleep duration on such a system and therefore using msleep(), the
possible additional slack will be +10ms.

When the system is configured with HZ=100 the granularity of a jiffy and of
a bucket of the lowest timer wheel level is 10ms. To make sure a timer will
not expire early (when queueing of the timer races with an concurrent
update of jiffies), timers are always queued into the next bucket. This is
the reason for the maximal possible slack of 10ms.

fsleep() limits the maximal possible slack to 25% by making threshold
between usleep_range() and msleep() HZ dependent. As soon as the accuracy
of msleep() is sufficient, the less expensive timer list timer based
sleeping function is used instead of the more expensive hrtimer based
usleep_range() function. The udelay() will not be used in this specific
usecase as the lowest sleep length is larger than 1 microsecond.

Use fsleep() directly instead of using an own heuristic for the best
sleeping mechanism to use..

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
---
v2: fix typos
---
 arch/powerpc/kernel/rtas.c | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index f7e86e09c49f..d31c9799cab2 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1390,21 +1390,14 @@ bool __ref rtas_busy_delay(int status)
 		 */
 		ms = clamp(ms, 1U, 1000U);
 		/*
-		 * The delay hint is an order-of-magnitude suggestion, not
-		 * a minimum. It is fine, possibly even advantageous, for
-		 * us to pause for less time than hinted. For small values,
-		 * use usleep_range() to ensure we don't sleep much longer
-		 * than actually needed.
-		 *
-		 * See Documentation/timers/timers-howto.rst for
-		 * explanation of the threshold used here. In effect we use
-		 * usleep_range() for 9900 and 9901, msleep() for
-		 * 9902-9905.
+		 * The delay hint is an order-of-magnitude suggestion, not a
+		 * minimum. It is fine, possibly even advantageous, for us to
+		 * pause for less time than hinted. To make sure pause time will
+		 * not be way longer than requested independent of HZ
+		 * configuration, use fsleep(). See fsleep() for details of
+		 * used sleeping functions.
 		 */
-		if (ms <= 20)
-			usleep_range(ms * 100, ms * 1000);
-		else
-			msleep(ms);
+		fsleep(ms * 1000);
 		break;
 	case RTAS_BUSY:
 		ret = true;

-- 
2.39.2
Re: [PATCH v2 13/15] powerpc/rtas: Use fsleep() to minimize additional sleep duration
Posted by Frederic Weisbecker 1 month, 3 weeks ago
Le Wed, Sep 11, 2024 at 07:13:39AM +0200, Anna-Maria Behnsen a écrit :
> When commit 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements")
> was introduced, documentation about proper usage of sleep realted functions

related*

> was outdated.
> 
> The commit message references the usage of a HZ=100 system. When using a
> 20ms sleep duration on such a system and therefore using msleep(), the
> possible additional slack will be +10ms.
> 
> When the system is configured with HZ=100 the granularity of a jiffy and of
> a bucket of the lowest timer wheel level is 10ms. To make sure a timer will
> not expire early (when queueing of the timer races with an concurrent
> update of jiffies), timers are always queued into the next bucket. This is
> the reason for the maximal possible slack of 10ms.
> 
> fsleep() limits the maximal possible slack to 25% by making threshold
> between usleep_range() and msleep() HZ dependent. As soon as the accuracy
> of msleep() is sufficient, the less expensive timer list timer based
> sleeping function is used instead of the more expensive hrtimer based
> usleep_range() function. The udelay() will not be used in this specific
> usecase as the lowest sleep length is larger than 1 microsecond.

Isn't udelay() for everything below 10us ?

> 
> Use fsleep() directly instead of using an own heuristic for the best
> sleeping mechanism to use..
> 
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Re: [PATCH v2 13/15] powerpc/rtas: Use fsleep() to minimize additional sleep duration
Posted by Anna-Maria Behnsen 1 month, 2 weeks ago
Frederic Weisbecker <frederic@kernel.org> writes:

> Le Wed, Sep 11, 2024 at 07:13:39AM +0200, Anna-Maria Behnsen a écrit :
>> When commit 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements")
>> was introduced, documentation about proper usage of sleep realted functions
>
> related*
>
>> was outdated.
>> 
>> The commit message references the usage of a HZ=100 system. When using a
>> 20ms sleep duration on such a system and therefore using msleep(), the
>> possible additional slack will be +10ms.
>> 
>> When the system is configured with HZ=100 the granularity of a jiffy and of
>> a bucket of the lowest timer wheel level is 10ms. To make sure a timer will
>> not expire early (when queueing of the timer races with an concurrent
>> update of jiffies), timers are always queued into the next bucket. This is
>> the reason for the maximal possible slack of 10ms.
>> 
>> fsleep() limits the maximal possible slack to 25% by making threshold
>> between usleep_range() and msleep() HZ dependent. As soon as the accuracy
>> of msleep() is sufficient, the less expensive timer list timer based
>> sleeping function is used instead of the more expensive hrtimer based
>> usleep_range() function. The udelay() will not be used in this specific
>> usecase as the lowest sleep length is larger than 1 microsecond.
>
> Isn't udelay() for everything below 10us ?

It's larger than 1 millisecond...

>
>> 
>> Use fsleep() directly instead of using an own heuristic for the best
>> sleeping mechanism to use..
>> 
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Nathan Lynch <nathanl@linux.ibm.com>
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
>> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>