[v2] timers: Cleanup delay/sleep related mess

[PATCH v2 05/15] timers: Update function descriptions of sleep/delay related functions

Posted by Anna-Maria Behnsen 1 year, 5 months ago

A lot of commonly used functions for inserting a sleep or delay lack a
proper function description. Add function descriptions to all of them to
have important information in a central place close to the code.

No functional change.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
---
v2:
 - Fix typos
 - Fix proper usage of kernel-doc return formatting
---
 include/asm-generic/delay.h | 41 +++++++++++++++++++++++++++++++----
 include/linux/delay.h       | 48 ++++++++++++++++++++++++++++++----------
 kernel/time/sleep_timeout.c | 53 ++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 120 insertions(+), 22 deletions(-)

diff --git a/include/asm-generic/delay.h b/include/asm-generic/delay.h
index e448ac61430c..70a1b20f3e1a 100644
--- a/include/asm-generic/delay.h
+++ b/include/asm-generic/delay.h
@@ -12,11 +12,39 @@ extern void __const_udelay(unsigned long xloops);
 extern void __delay(unsigned long loops);
 
 /*
- * The weird n/20000 thing suppresses a "comparison is always false due to
- * limited range of data type" warning with non-const 8-bit arguments.
+ * Implementation details:
+ *
+ * * The weird n/20000 thing suppresses a "comparison is always false due to
+ *   limited range of data type" warning with non-const 8-bit arguments.
+ * * 0x10c7 is 2**32 / 1000000 (rounded up) -> udelay
+ * * 0x5 is 2**32 / 1000000000 (rounded up) -> ndelay
  */
 
-/* 0x10c7 is 2**32 / 1000000 (rounded up) */
+/**
+ * udelay - Inserting a delay based on microseconds with busy waiting
+ * @usec:	requested delay in microseconds
+ *
+ * When delaying in an atomic context ndelay(), udelay() and mdelay() are the
+ * only valid variants of delaying/sleeping to go with.
+ *
+ * When inserting delays in non atomic context which are shorter than the time
+ * which is required to queue e.g. an hrtimer and to enter then the scheduler,
+ * it is also valuable to use udelay(). But is not simple to specify a generic
+ * threshold for this which will fit for all systems, but an approximation would
+ * be a threshold for all delays up to 10 microseconds.
+ *
+ * When having a delay which is larger than the architecture specific
+ * %MAX_UDELAY_MS value, please make sure mdelay() is used. Otherwise a overflow
+ * risk is given.
+ *
+ * Please note that ndelay(), udelay() and mdelay() may return early for several
+ * reasons (https://lists.openwall.net/linux-kernel/2011/01/09/56):
+ *
+ * #. computed loops_per_jiffy too low (due to the time taken to execute the
+ *    timer interrupt.)
+ * #. cache behaviour affecting the time it takes to execute the loop function.
+ * #. CPU clock rate changes.
+ */
 #define udelay(n)							\
 	({								\
 		if (__builtin_constant_p(n)) {				\
@@ -29,7 +57,12 @@ extern void __delay(unsigned long loops);
 		}							\
 	})
 
-/* 0x5 is 2**32 / 1000000000 (rounded up) */
+/**
+ * ndelay - Inserting a delay based on nanoseconds with busy waiting
+ * @nsec:	requested delay in nanoseconds
+ *
+ * See udelay() for basic information about ndelay() and it's variants.
+ */
 #define ndelay(n)							\
 	({								\
 		if (__builtin_constant_p(n)) {				\
diff --git a/include/linux/delay.h b/include/linux/delay.h
index 2bc586aa2068..23623fa79768 100644
--- a/include/linux/delay.h
+++ b/include/linux/delay.h
@@ -6,17 +6,7 @@
  * Copyright (C) 1993 Linus Torvalds
  *
  * Delay routines, using a pre-computed "loops_per_jiffy" value.
- *
- * Please note that ndelay(), udelay() and mdelay() may return early for
- * several reasons:
- *  1. computed loops_per_jiffy too low (due to the time taken to
- *     execute the timer interrupt.)
- *  2. cache behaviour affecting the time it takes to execute the
- *     loop function.
- *  3. CPU clock rate changes.
- *
- * Please see this thread:
- *   https://lists.openwall.net/linux-kernel/2011/01/09/56
+ * Sleep routines using timer list timers or hrtimers.
  */
 
 #include <linux/math.h>
@@ -35,12 +25,21 @@ extern unsigned long loops_per_jiffy;
  * The 2nd mdelay() definition ensures GCC will optimize away the 
  * while loop for the common cases where n <= MAX_UDELAY_MS  --  Paul G.
  */
-
 #ifndef MAX_UDELAY_MS
 #define MAX_UDELAY_MS	5
 #endif
 
 #ifndef mdelay
+/**
+ * mdelay - Inserting a delay based on microseconds with busy waiting
+ * @n:	requested delay in microseconds
+ *
+ * See udelay() for basic information about mdelay() and it's variants.
+ *
+ * Please double check, whether mdelay() is the right way to go or whether a
+ * refactoring of the code is the better variant to be able to use msleep()
+ * instead.
+ */
 #define mdelay(n) (\
 	(__builtin_constant_p(n) && (n)<=MAX_UDELAY_MS) ? udelay((n)*1000) : \
 	({unsigned long __ms=(n); while (__ms--) udelay(1000);}))
@@ -63,16 +62,41 @@ unsigned long msleep_interruptible(unsigned int msecs);
 void usleep_range_state(unsigned long min, unsigned long max,
 			unsigned int state);
 
+/**
+ * usleep_range - Sleep for an approximate time
+ * @min:	Minimum time in microseconds to sleep
+ * @max:	Maximum time in microseconds to sleep
+ *
+ * For basic information please refere to usleep_range_state().
+ *
+ * The task will be in the state TASK_UNINTERRUPTIBLE during the sleep.
+ */
 static inline void usleep_range(unsigned long min, unsigned long max)
 {
 	usleep_range_state(min, max, TASK_UNINTERRUPTIBLE);
 }
 
+/**
+ * usleep_range_idle - Sleep for an approximate time with idle time accounting
+ * @min:	Minimum time in microseconds to sleep
+ * @max:	Maximum time in microseconds to sleep
+ *
+ * For basic information please refere to usleep_range_state().
+ *
+ * The sleeping task has the state TASK_IDLE during the sleep to prevent
+ * contribution to the load avarage.
+ */
 static inline void usleep_range_idle(unsigned long min, unsigned long max)
 {
 	usleep_range_state(min, max, TASK_IDLE);
 }
 
+/**
+ * ssleep - wrapper for seconds arount msleep
+ * @seconds:	Requested sleep duration in seconds
+ *
+ * Please refere to msleep() for detailed information.
+ */
 static inline void ssleep(unsigned int seconds)
 {
 	msleep(seconds * 1000);
diff --git a/kernel/time/sleep_timeout.c b/kernel/time/sleep_timeout.c
index 560d17c30aa5..21f412350b15 100644
--- a/kernel/time/sleep_timeout.c
+++ b/kernel/time/sleep_timeout.c
@@ -281,7 +281,34 @@ EXPORT_SYMBOL_GPL(schedule_hrtimeout);
 
 /**
  * msleep - sleep safely even with waitqueue interruptions
- * @msecs: Time in milliseconds to sleep for
+ * @msecs:	Requested sleep duration in milliseconds
+ *
+ * msleep() uses jiffy based timeouts for the sleep duration. The accuracy of
+ * the resulting sleep duration depends on:
+ *
+ * * HZ configuration
+ * * sleep duration (as granularity of a bucket which collects timers increases
+ *   with the timer wheel levels)
+ *
+ * When the timer is queued into the second level of the timer wheel the maximum
+ * additional delay will be 12.5%. For explanation please check the detailed
+ * description about the basics of the timer wheel. In case this is accurate
+ * enough check which sleep length is selected to make sure required accuracy is
+ * given. Please use therefore the following simple steps:
+ *
+ * #. Decide which slack is fine for the requested sleep duration - but do not
+ *    use values shorter than 1/8
+ * #. Check whether your sleep duration is equal or greater than the following
+ *    result: ``TICK_NSEC / slack / NSEC_PER_MSEC``
+ *
+ * Examples:
+ *
+ * * ``HZ=1000`` with `slack=1/4``: all sleep durations greater or equal 4ms will meet
+ *   the constrains.
+ * * ``HZ=250`` with ``slack=1/4``: all sleep durations greater or equal 16ms will meet
+ *   the constrains.
+ *
+ * See also the signal aware variant msleep_interruptible().
  */
 void msleep(unsigned int msecs)
 {
@@ -294,7 +321,15 @@ EXPORT_SYMBOL(msleep);
 
 /**
  * msleep_interruptible - sleep waiting for signals
- * @msecs: Time in milliseconds to sleep for
+ * @msecs:	Requested sleep duration in milliseconds
+ *
+ * See msleep() for some basic information.
+ *
+ * The difference between msleep() and msleep_interruptible() is that the sleep
+ * could be interrupted by a signal delivery and then returns early.
+ *
+ * Returns: The remaining time of the sleep duration transformed to msecs (see
+ * schedule_timeout() for details).
  */
 unsigned long msleep_interruptible(unsigned int msecs)
 {
@@ -312,11 +347,17 @@ EXPORT_SYMBOL(msleep_interruptible);
  * @max:	Maximum time in usecs to sleep
  * @state:	State of the current task that will be while sleeping
  *
+ * usleep_range_state() sleeps at least for the minimum specified time but not
+ * longer than the maximum specified amount of time. The range might reduce
+ * power usage by allowing hrtimers to coalesce an already scheduled interrupt
+ * with this hrtimer. In the worst case, an interrupt is scheduled for the upper
+ * bound.
+ *
+ * The sleeping task is set to the specified state before starting the sleep.
+ *
  * In non-atomic context where the exact wakeup time is flexible, use
- * usleep_range_state() instead of udelay().  The sleep improves responsiveness
- * by avoiding the CPU-hogging busy-wait of udelay(), and the range reduces
- * power usage by allowing hrtimers to take advantage of an already-
- * scheduled interrupt instead of scheduling a new one just for this sleep.
+ * usleep_range() or its variants instead of udelay(). The sleep improves
+ * responsiveness by avoiding the CPU-hogging busy-wait of udelay().
  */
 void __sched usleep_range_state(unsigned long min, unsigned long max, unsigned int state)
 {

-- 
2.39.2

Re: [PATCH v2 05/15] timers: Update function descriptions of sleep/delay related functions

Posted by Frederic Weisbecker 1 year, 4 months ago

Le Wed, Sep 11, 2024 at 07:13:31AM +0200, Anna-Maria Behnsen a écrit :
> A lot of commonly used functions for inserting a sleep or delay lack a
> proper function description. Add function descriptions to all of them to
> have important information in a central place close to the code.
> 
> No functional change.
> 
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: linux-arch@vger.kernel.org
> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
> ---
> v2:
>  - Fix typos
>  - Fix proper usage of kernel-doc return formatting
> ---
>  include/asm-generic/delay.h | 41 +++++++++++++++++++++++++++++++----
>  include/linux/delay.h       | 48 ++++++++++++++++++++++++++++++----------
>  kernel/time/sleep_timeout.c | 53 ++++++++++++++++++++++++++++++++++++++++-----
>  3 files changed, 120 insertions(+), 22 deletions(-)
> 
> diff --git a/include/asm-generic/delay.h b/include/asm-generic/delay.h
> index e448ac61430c..70a1b20f3e1a 100644
> --- a/include/asm-generic/delay.h
> +++ b/include/asm-generic/delay.h
> @@ -12,11 +12,39 @@ extern void __const_udelay(unsigned long xloops);
>  extern void __delay(unsigned long loops);
>  
>  /*
> - * The weird n/20000 thing suppresses a "comparison is always false due to
> - * limited range of data type" warning with non-const 8-bit arguments.
> + * Implementation details:
> + *
> + * * The weird n/20000 thing suppresses a "comparison is always false due to
> + *   limited range of data type" warning with non-const 8-bit arguments.
> + * * 0x10c7 is 2**32 / 1000000 (rounded up) -> udelay
> + * * 0x5 is 2**32 / 1000000000 (rounded up) -> ndelay

I can't say I'm less confused about these values but at least it
brings a bit of light in the horizon...

>   */
>  
> -/* 0x10c7 is 2**32 / 1000000 (rounded up) */
> +/**
> + * udelay - Inserting a delay based on microseconds with busy waiting
> + * @usec:	requested delay in microseconds
> + *
> + * When delaying in an atomic context ndelay(), udelay() and mdelay() are the
> + * only valid variants of delaying/sleeping to go with.
> + *
> + * When inserting delays in non atomic context which are shorter than the time
> + * which is required to queue e.g. an hrtimer and to enter then the scheduler,
> + * it is also valuable to use udelay(). But is not simple to specify a generic

But it is*

> + * threshold for this which will fit for all systems, but an approximation would

But but?

> + * be a threshold for all delays up to 10 microseconds.
> + *
> + * When having a delay which is larger than the architecture specific
> + * %MAX_UDELAY_MS value, please make sure mdelay() is used. Otherwise a overflow
> + * risk is given.
> + *
> + * Please note that ndelay(), udelay() and mdelay() may return early for several
> + * reasons (https://lists.openwall.net/linux-kernel/2011/01/09/56):
> + *
> + * #. computed loops_per_jiffy too low (due to the time taken to execute the
> + *    timer interrupt.)
> + * #. cache behaviour affecting the time it takes to execute the loop function.
> + * #. CPU clock rate changes.
> + */
>  #define udelay(n)							\
>  	({								\
>  		if (__builtin_constant_p(n)) {				\
> @@ -35,12 +25,21 @@ extern unsigned long loops_per_jiffy;
>   * The 2nd mdelay() definition ensures GCC will optimize away the 
>   * while loop for the common cases where n <= MAX_UDELAY_MS  --  Paul G.
>   */
> -
>  #ifndef MAX_UDELAY_MS
>  #define MAX_UDELAY_MS	5
>  #endif
>  
>  #ifndef mdelay
> +/**
> + * mdelay - Inserting a delay based on microseconds with busy waiting

Milliseconds?

> + * @n:	requested delay in microseconds

Ditto

> + *
> + * See udelay() for basic information about mdelay() and it's variants.
> + *
> + * Please double check, whether mdelay() is the right way to go or whether a
> + * refactoring of the code is the better variant to be able to use msleep()
> + * instead.
> + */
>  #define mdelay(n) (\
>  	(__builtin_constant_p(n) && (n)<=MAX_UDELAY_MS) ? udelay((n)*1000) : \
>  	({unsigned long __ms=(n); while (__ms--) udelay(1000);}))
> @@ -63,16 +62,41 @@ unsigned long msleep_interruptible(unsigned int msecs);
>  void usleep_range_state(unsigned long min, unsigned long max,
>  			unsigned int state);
>  
> +/**
> + * usleep_range - Sleep for an approximate time
> + * @min:	Minimum time in microseconds to sleep
> + * @max:	Maximum time in microseconds to sleep
> + *
> + * For basic information please refere to usleep_range_state().
> + *
> + * The task will be in the state TASK_UNINTERRUPTIBLE during the sleep.
> + */
>  static inline void usleep_range(unsigned long min, unsigned long max)
>  {
>  	usleep_range_state(min, max, TASK_UNINTERRUPTIBLE);
>  }
>  
> +/**
> + * usleep_range_idle - Sleep for an approximate time with idle time accounting
> + * @min:	Minimum time in microseconds to sleep
> + * @max:	Maximum time in microseconds to sleep
> + *
> + * For basic information please refere to usleep_range_state().
> + *
> + * The sleeping task has the state TASK_IDLE during the sleep to prevent
> + * contribution to the load avarage.
> + */
>  static inline void usleep_range_idle(unsigned long min, unsigned long max)
>  {
>  	usleep_range_state(min, max, TASK_IDLE);
>  }
>  
> +/**
> + * ssleep - wrapper for seconds arount msleep

around

> + * @seconds:	Requested sleep duration in seconds
> + *
> + * Please refere to msleep() for detailed information.
> + */
>  static inline void ssleep(unsigned int seconds)
>  {
>  	msleep(seconds * 1000);
> diff --git a/kernel/time/sleep_timeout.c b/kernel/time/sleep_timeout.c
> index 560d17c30aa5..21f412350b15 100644
> --- a/kernel/time/sleep_timeout.c
> +++ b/kernel/time/sleep_timeout.c
> @@ -281,7 +281,34 @@ EXPORT_SYMBOL_GPL(schedule_hrtimeout);
>  
>  /**
>   * msleep - sleep safely even with waitqueue interruptions
> - * @msecs: Time in milliseconds to sleep for
> + * @msecs:	Requested sleep duration in milliseconds
> + *
> + * msleep() uses jiffy based timeouts for the sleep duration. The accuracy of
> + * the resulting sleep duration depends on:
> + *
> + * * HZ configuration
> + * * sleep duration (as granularity of a bucket which collects timers increases
> + *   with the timer wheel levels)
> + *
> + * When the timer is queued into the second level of the timer wheel the maximum
> + * additional delay will be 12.5%. For explanation please check the detailed
> + * description about the basics of the timer wheel. In case this is accurate
> + * enough check which sleep length is selected to make sure required accuracy is
> + * given. Please use therefore the following simple steps:
> + *
> + * #. Decide which slack is fine for the requested sleep duration - but do not
> + *    use values shorter than 1/8

I'm confused, what means 1/x for a slack value? 1/8 means 125 msecs? I'm not
even I understand what you mean by slack. Is it the bucket_expiry - expiry?

> + * #. Check whether your sleep duration is equal or greater than the following
> + *    result: ``TICK_NSEC / slack / NSEC_PER_MSEC``
> + *
> + * Examples:
> + *
> + * * ``HZ=1000`` with `slack=1/4``: all sleep durations greater or equal 4ms will meet
> + *   the constrains.
> + * * ``HZ=250`` with ``slack=1/4``: all sleep durations greater or equal 16ms will meet
> + *   the constrains.

constraints.

But I'm still lost...

Thanks.

Re: [PATCH v2 05/15] timers: Update function descriptions of sleep/delay related functions

Posted by Anna-Maria Behnsen 1 year, 4 months ago

Frederic Weisbecker <frederic@kernel.org> writes:

> Le Wed, Sep 11, 2024 at 07:13:31AM +0200, Anna-Maria Behnsen a écrit :
>> A lot of commonly used functions for inserting a sleep or delay lack a
>> proper function description. Add function descriptions to all of them to
>> have important information in a central place close to the code.
>> 
>> No functional change.
>> 
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: linux-arch@vger.kernel.org
>> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
>> ---
>> v2:
>>  - Fix typos
>>  - Fix proper usage of kernel-doc return formatting
>> ---
>>  include/asm-generic/delay.h | 41 +++++++++++++++++++++++++++++++----
>>  include/linux/delay.h       | 48 ++++++++++++++++++++++++++++++----------
>>  kernel/time/sleep_timeout.c | 53 ++++++++++++++++++++++++++++++++++++++++-----
>>  3 files changed, 120 insertions(+), 22 deletions(-)
>> 
>> diff --git a/include/asm-generic/delay.h b/include/asm-generic/delay.h
>> index e448ac61430c..70a1b20f3e1a 100644
>> --- a/include/asm-generic/delay.h
>> +++ b/include/asm-generic/delay.h
>> @@ -12,11 +12,39 @@ extern void __const_udelay(unsigned long xloops);
>>  extern void __delay(unsigned long loops);
>>  
>>  /*
>> - * The weird n/20000 thing suppresses a "comparison is always false due to
>> - * limited range of data type" warning with non-const 8-bit arguments.
>> + * Implementation details:
>> + *
>> + * * The weird n/20000 thing suppresses a "comparison is always false due to
>> + *   limited range of data type" warning with non-const 8-bit arguments.
>> + * * 0x10c7 is 2**32 / 1000000 (rounded up) -> udelay
>> + * * 0x5 is 2**32 / 1000000000 (rounded up) -> ndelay
>
> I can't say I'm less confused about these values but at least it
> brings a bit of light in the horizon...

:) This will be cleaned up in a second step all over the place as
suggested by Thomas already in v1. But for now, the aim is only to fix
fsleep and especially the outdated documentation of delay and sleep
related functions.

>>   */
>>  
>> -/* 0x10c7 is 2**32 / 1000000 (rounded up) */
>> +/**
>> + * udelay - Inserting a delay based on microseconds with busy waiting
>> + * @usec:	requested delay in microseconds
>> + *
>> + * When delaying in an atomic context ndelay(), udelay() and mdelay() are the
>> + * only valid variants of delaying/sleeping to go with.
>> + *
>> + * When inserting delays in non atomic context which are shorter than the time
>> + * which is required to queue e.g. an hrtimer and to enter then the scheduler,
>> + * it is also valuable to use udelay(). But is not simple to specify a generic
>
> But it is*
>
>> + * threshold for this which will fit for all systems, but an approximation would
>
> But but?

change those two sentences into: But it is not simple to specify a
generic threshold for this which will fit for all systems. An
approximation is a threshold for all delays up to 10 microseconds.

>> + * be a threshold for all delays up to 10 microseconds.
>> + *
>> + * When having a delay which is larger than the architecture specific
>> + * %MAX_UDELAY_MS value, please make sure mdelay() is used. Otherwise a overflow
>> + * risk is given.
>> + *
>> + * Please note that ndelay(), udelay() and mdelay() may return early for several
>> + * reasons (https://lists.openwall.net/linux-kernel/2011/01/09/56):
>> + *
>> + * #. computed loops_per_jiffy too low (due to the time taken to execute the
>> + *    timer interrupt.)
>> + * #. cache behaviour affecting the time it takes to execute the loop function.
>> + * #. CPU clock rate changes.
>> + */
>>  #define udelay(n)							\
>>  	({								\
>>  		if (__builtin_constant_p(n)) {				\
>> diff --git a/kernel/time/sleep_timeout.c b/kernel/time/sleep_timeout.c
>> index 560d17c30aa5..21f412350b15 100644
>> --- a/kernel/time/sleep_timeout.c
>> +++ b/kernel/time/sleep_timeout.c
>> @@ -281,7 +281,34 @@ EXPORT_SYMBOL_GPL(schedule_hrtimeout);
>>  
>>  /**
>>   * msleep - sleep safely even with waitqueue interruptions
>> - * @msecs: Time in milliseconds to sleep for
>> + * @msecs:	Requested sleep duration in milliseconds
>> + *
>> + * msleep() uses jiffy based timeouts for the sleep duration. The accuracy of
>> + * the resulting sleep duration depends on:
>> + *
>> + * * HZ configuration
>> + * * sleep duration (as granularity of a bucket which collects timers increases
>> + *   with the timer wheel levels)
>> + *
>> + * When the timer is queued into the second level of the timer wheel the maximum
>> + * additional delay will be 12.5%. For explanation please check the detailed
>> + * description about the basics of the timer wheel. In case this is accurate
>> + * enough check which sleep length is selected to make sure required accuracy is
>> + * given. Please use therefore the following simple steps:
>> + *
>> + * #. Decide which slack is fine for the requested sleep duration - but do not
>> + *    use values shorter than 1/8
>
> I'm confused, what means 1/x for a slack value? 1/8 means 125 msecs? I'm not
> even I understand what you mean by slack. Is it the bucket_expiry - expiry?

I was confused as well and had to read it twice... I would propose to
rephrase the whole function description:

/**
 * msleep - sleep safely even with waitqueue interruptions
 * @msecs:	Requested sleep duration in milliseconds
 *
 * msleep() uses jiffy based timeouts for the sleep duration. Because of the
 * design of the timer wheel, the maximum additional percentage delay (slack) is
 * 12.5%. This is only valid for timers which will end up in the second or a
 * higher level of the timer wheel. For explanation of those 12.5% please check
 * the detailed description about the basics of the timer wheel.
 *
 * The slack of timers which will end up in the first level depends on:
 *
 * * sleep duration (msecs)
 * * HZ configuration
 *
 * To make sure the sleep duration with the slack is accurate enough, a slack
 * value is required (because of the design of the timer wheel it is not
 * possible to define a value smaller than 12.5%). The following check makes
 * clear, whether the sleep duration with the defined slack and with the HZ
 * configuration will meet the constraints:
 *
 *  ``msecs >= (MSECS_PER_TICK / slack)``
 *
 * Examples:
 *
 * * ``HZ=1000`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 1 / (1/4) = 4``:
 *   all sleep durations greater or equal 4ms will meet the constraints.
 * * ``HZ=1000`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 1 / (1/8) = 8``:
 *   all sleep durations greater or equal 8ms will meet the constraints.
 * * ``HZ=250`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 4 / (1/4) = 16``:
 *   all sleep durations greater or equal 16ms will meet the constraints.
 * * ``HZ=250`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 4 / (1/8) = 32``:
 *   all sleep durations greater or equal 32ms will meet the constraints.
 *
 * See also the signal aware variant msleep_interruptible().
 */

>
> But I'm still lost...
>

Hopefully no longer :)

Thanks,

	Anna-Maria

Re: [PATCH v2 05/15] timers: Update function descriptions of sleep/delay related functions

Posted by Frederic Weisbecker 1 year, 4 months ago

Le Thu, Oct 10, 2024 at 10:45:03AM +0200, Anna-Maria Behnsen a écrit :
> Frederic Weisbecker <frederic@kernel.org> writes:
> > I can't say I'm less confused about these values but at least it
> > brings a bit of light in the horizon...
> 
> :) This will be cleaned up in a second step all over the place as
> suggested by Thomas already in v1. But for now, the aim is only to fix
> fsleep and especially the outdated documentation of delay and sleep
> related functions.

Sure.

> 
> >>   */
> >>  
> >> -/* 0x10c7 is 2**32 / 1000000 (rounded up) */
> >> +/**
> >> + * udelay - Inserting a delay based on microseconds with busy waiting
> >> + * @usec:	requested delay in microseconds
> >> + *
> >> + * When delaying in an atomic context ndelay(), udelay() and mdelay() are the
> >> + * only valid variants of delaying/sleeping to go with.
> >> + *
> >> + * When inserting delays in non atomic context which are shorter than the time
> >> + * which is required to queue e.g. an hrtimer and to enter then the scheduler,
> >> + * it is also valuable to use udelay(). But is not simple to specify a generic
> >
> > But it is*
> >
> >> + * threshold for this which will fit for all systems, but an approximation would
> >
> > But but?
> 
> change those two sentences into: But it is not simple to specify a
> generic threshold for this which will fit for all systems. An
> approximation is a threshold for all delays up to 10 microseconds.

Very good!

> >> @@ -281,7 +281,34 @@ EXPORT_SYMBOL_GPL(schedule_hrtimeout);
> >>  
> >>  /**
> >>   * msleep - sleep safely even with waitqueue interruptions
> >> - * @msecs: Time in milliseconds to sleep for
> >> + * @msecs:	Requested sleep duration in milliseconds
> >> + *
> >> + * msleep() uses jiffy based timeouts for the sleep duration. The accuracy of
> >> + * the resulting sleep duration depends on:
> >> + *
> >> + * * HZ configuration
> >> + * * sleep duration (as granularity of a bucket which collects timers increases
> >> + *   with the timer wheel levels)
> >> + *
> >> + * When the timer is queued into the second level of the timer wheel the maximum
> >> + * additional delay will be 12.5%. For explanation please check the detailed
> >> + * description about the basics of the timer wheel. In case this is accurate
> >> + * enough check which sleep length is selected to make sure required accuracy is
> >> + * given. Please use therefore the following simple steps:
> >> + *
> >> + * #. Decide which slack is fine for the requested sleep duration - but do not
> >> + *    use values shorter than 1/8
> >
> > I'm confused, what means 1/x for a slack value? 1/8 means 125 msecs? I'm not
> > even I understand what you mean by slack. Is it the bucket_expiry - expiry?
> 
> I was confused as well and had to read it twice... I would propose to
> rephrase the whole function description:
> 
> 
> /**
>  * msleep - sleep safely even with waitqueue interruptions
>  * @msecs:	Requested sleep duration in milliseconds
>  *
>  * msleep() uses jiffy based timeouts for the sleep duration. Because of the
>  * design of the timer wheel, the maximum additional percentage delay (slack) is
>  * 12.5%. This is only valid for timers which will end up in the second or a
>  * higher level of the timer wheel. For explanation of those 12.5% please check
>  * the detailed description about the basics of the timer wheel.

I've never realized this constant worst percentage of slack. Would be nice to mention
that somewhere in kernel/time/timer.c

However this doesn't need a second to apply. It only takes crossing levels above
0. Or am I missing something?

>  *
>  * The slack of timers which will end up in the first level depends on:
>  *
>  * * sleep duration (msecs)
>  * * HZ configuration
>  *
>  * To make sure the sleep duration with the slack is accurate enough, a slack
>  * value is required (because of the design of the timer wheel it is not

But where is it required?

>  * possible to define a value smaller than 12.5%). The following check makes
>  * clear, whether the sleep duration with the defined slack and with the HZ
>  * configuration will meet the constraints:
>  *
>  *  ``msecs >= (MSECS_PER_TICK / slack)``
>  *
>  * Examples:
>  *
>  * * ``HZ=1000`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 1 / (1/4) = 4``:
>  *   all sleep durations greater or equal 4ms will meet the constraints.
>  * * ``HZ=1000`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 1 / (1/8) = 8``:
>  *   all sleep durations greater or equal 8ms will meet the constraints.
>  * * ``HZ=250`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 4 / (1/4) = 16``:
>  *   all sleep durations greater or equal 16ms will meet the constraints.
>  * * ``HZ=250`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 4 / (1/8) = 32``:
>  *   all sleep durations greater or equal 32ms will meet the constraints.

But who defines those slacks and where? I'm even more confused now...

>  *
>  * See also the signal aware variant msleep_interruptible().
>  */
> 
> >
> > But I'm still lost...
> >
> 
> Hopefully no longer :)

Well...

> Thanks,
> 
> 	Anna-Maria
> 
>

Re: [PATCH v2 05/15] timers: Update function descriptions of sleep/delay related functions

Posted by Anna-Maria Behnsen 1 year, 4 months ago

Frederic Weisbecker <frederic@kernel.org> writes:

> Le Thu, Oct 10, 2024 at 10:45:03AM +0200, Anna-Maria Behnsen a écrit :
>> Frederic Weisbecker <frederic@kernel.org> writes:
>> >> @@ -281,7 +281,34 @@ EXPORT_SYMBOL_GPL(schedule_hrtimeout);
>> >>  
>> >>  /**
>> >>   * msleep - sleep safely even with waitqueue interruptions
>> >> - * @msecs: Time in milliseconds to sleep for
>> >> + * @msecs:	Requested sleep duration in milliseconds
>> >> + *
>> >> + * msleep() uses jiffy based timeouts for the sleep duration. The accuracy of
>> >> + * the resulting sleep duration depends on:
>> >> + *
>> >> + * * HZ configuration
>> >> + * * sleep duration (as granularity of a bucket which collects timers increases
>> >> + *   with the timer wheel levels)
>> >> + *
>> >> + * When the timer is queued into the second level of the timer wheel the maximum
>> >> + * additional delay will be 12.5%. For explanation please check the detailed
>> >> + * description about the basics of the timer wheel. In case this is accurate
>> >> + * enough check which sleep length is selected to make sure required accuracy is
>> >> + * given. Please use therefore the following simple steps:
>> >> + *
>> >> + * #. Decide which slack is fine for the requested sleep duration - but do not
>> >> + *    use values shorter than 1/8
>> >
>> > I'm confused, what means 1/x for a slack value? 1/8 means 125 msecs? I'm not
>> > even I understand what you mean by slack. Is it the bucket_expiry - expiry?
>> 
>> I was confused as well and had to read it twice... I would propose to
>> rephrase the whole function description:
>> 
>> 
>> /**
>>  * msleep - sleep safely even with waitqueue interruptions
>>  * @msecs:	Requested sleep duration in milliseconds
>>  *
>>  * msleep() uses jiffy based timeouts for the sleep duration. Because of the
>>  * design of the timer wheel, the maximum additional percentage delay (slack) is
>>  * 12.5%. This is only valid for timers which will end up in the second or a
>>  * higher level of the timer wheel. For explanation of those 12.5% please check
>>  * the detailed description about the basics of the timer wheel.
>
> I've never realized this constant worst percentage of slack. Would be nice to mention
> that somewhere in kernel/time/timer.c

Yes, we can explicitly add it (I will put it on the TODO list). It's
possible to calculate it on your own with the overview of levels and
granularity,...

> However this doesn't need a second to apply. It only takes crossing levels above
> 0. Or am I missing something?

s/the second/level 1/

more clear? Then it's the same number as used in the timer wheel
documentation.

>>  *
>>  * The slack of timers which will end up in the first level depends on:
>>  *

Same here: s/the first level/level 0/

>>  * * sleep duration (msecs)
>>  * * HZ configuration
>>  *
>>  * To make sure the sleep duration with the slack is accurate enough, a slack
>>  * value is required (because of the design of the timer wheel it is not
>
> But where is it required?

The callsite has to decide which accuracy/slack is required for their
use case (this was also part of the discussion which leads to this
queue).

>>  * possible to define a value smaller than 12.5%). The following check makes
>>  * clear, whether the sleep duration with the defined slack and with the HZ
>>  * configuration will meet the constraints:
>>  *
>>  *  ``msecs >= (MSECS_PER_TICK / slack)``
>>  *
>>  * Examples:
>>  *
>>  * * ``HZ=1000`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 1 / (1/4) = 4``:
>>  *   all sleep durations greater or equal 4ms will meet the constraints.
>>  * * ``HZ=1000`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 1 / (1/8) = 8``:
>>  *   all sleep durations greater or equal 8ms will meet the constraints.
>>  * * ``HZ=250`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 4 / (1/4) = 16``:
>>  *   all sleep durations greater or equal 16ms will meet the constraints.
>>  * * ``HZ=250`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 4 / (1/8) = 32``:
>>  *   all sleep durations greater or equal 32ms will meet the constraints.
>
> But who defines those slacks and where? I'm even more confused now...

I think I know where the confusion comes from. I rephrase it once more
and turned around the calculation:

/**
 * msleep - sleep safely even with waitqueue interruptions
 * @msecs:	Requested sleep duration in milliseconds
 *
 * msleep() uses jiffy based timeouts for the sleep duration. Because of the
 * design of the timer wheel, the maximum additional percentage delay (slack) is
 * 12.5%. This is only valid for timers which will end up in level 1 or a
 * higher level of the timer wheel. For explanation of those 12.5% please check
 * the detailed description about the basics of the timer wheel.
 *
 * The slack of timers which will end up in level 0 depends on sleep
 * duration (msecs) and HZ configuration and can be calculated in the
 * following way (with the timer wheel design restriction that the slack
 * is not less than 12.5%):
 *
 *   ``slack = MSECS_PER_TICK / msecs``
 *
 * When the allowed slack of the callsite is known, the calculation
 * could be turned around to find the minimal allowed sleep duration to meet
 * the constraints. For example:
 *
 * * ``HZ=1000`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 1 / (1/4) = 4``:
 *   all sleep durations greater or equal 4ms will meet the constraints.
 * * ``HZ=1000`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 1 / (1/8) = 8``:
 *   all sleep durations greater or equal 8ms will meet the constraints.
 * * ``HZ=250`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 4 / (1/4) = 16``:
 *   all sleep durations greater or equal 16ms will meet the constraints.
 * * ``HZ=250`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 4 / (1/8) = 32``:
 *   all sleep durations greater or equal 32ms will meet the constraints.
 *
 * See also the signal aware variant msleep_interruptible().
 */

Hopefully this attempt clarifies the confusion?

Re: [PATCH v2 05/15] timers: Update function descriptions of sleep/delay related functions

Posted by Frederic Weisbecker 1 year, 4 months ago

Le Fri, Oct 11, 2024 at 12:20:20PM +0200, Anna-Maria Behnsen a écrit :
> Frederic Weisbecker <frederic@kernel.org> writes:
> > However this doesn't need a second to apply. It only takes crossing levels above
> > 0. Or am I missing something?
> 
> s/the second/level 1/
> 
> more clear? Then it's the same number as used in the timer wheel
> documentation.

Oh right! I was confused with second (s) and 2nd. Yes much better.

> 
> >>  *
> >>  * The slack of timers which will end up in the first level depends on:
> >>  *
> 
> Same here: s/the first level/level 0/

Much better !

> 
> >>  * * sleep duration (msecs)
> >>  * * HZ configuration
> >>  *
> >>  * To make sure the sleep duration with the slack is accurate enough, a slack
> >>  * value is required (because of the design of the timer wheel it is not
> >
> > But where is it required?
> 
> The callsite has to decide which accuracy/slack is required for their
> use case (this was also part of the discussion which leads to this
> queue).
> 
> >>  * possible to define a value smaller than 12.5%). The following check makes
> >>  * clear, whether the sleep duration with the defined slack and with the HZ
> >>  * configuration will meet the constraints:
> >>  *
> >>  *  ``msecs >= (MSECS_PER_TICK / slack)``
> >>  *
> >>  * Examples:
> >>  *
> >>  * * ``HZ=1000`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 1 / (1/4) = 4``:
> >>  *   all sleep durations greater or equal 4ms will meet the constraints.
> >>  * * ``HZ=1000`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 1 / (1/8) = 8``:
> >>  *   all sleep durations greater or equal 8ms will meet the constraints.
> >>  * * ``HZ=250`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 4 / (1/4) = 16``:
> >>  *   all sleep durations greater or equal 16ms will meet the constraints.
> >>  * * ``HZ=250`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 4 / (1/8) = 32``:
> >>  *   all sleep durations greater or equal 32ms will meet the constraints.
> >
> > But who defines those slacks and where? I'm even more confused now...
> 
> I think I know where the confusion comes from. I rephrase it once more
> and turned around the calculation:
> 
> /**
>  * msleep - sleep safely even with waitqueue interruptions
>  * @msecs:	Requested sleep duration in milliseconds
>  *
>  * msleep() uses jiffy based timeouts for the sleep duration. Because of the
>  * design of the timer wheel, the maximum additional percentage delay (slack) is
>  * 12.5%. This is only valid for timers which will end up in level 1 or a
>  * higher level of the timer wheel. For explanation of those 12.5% please check
>  * the detailed description about the basics of the timer wheel.
>  *
>  * The slack of timers which will end up in level 0 depends on sleep
>  * duration (msecs) and HZ configuration and can be calculated in the
>  * following way (with the timer wheel design restriction that the slack
>  * is not less than 12.5%):
>  *
>  *   ``slack = MSECS_PER_TICK / msecs``
>  *
>  * When the allowed slack of the callsite is known, the calculation
>  * could be turned around to find the minimal allowed sleep duration to meet
>  * the constraints. For example:
>  *
>  * * ``HZ=1000`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 1 / (1/4) = 4``:
>  *   all sleep durations greater or equal 4ms will meet the constraints.
>  * * ``HZ=1000`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 1 / (1/8) = 8``:
>  *   all sleep durations greater or equal 8ms will meet the constraints.
>  * * ``HZ=250`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 4 / (1/4) = 16``:
>  *   all sleep durations greater or equal 16ms will meet the constraints.
>  * * ``HZ=250`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 4 / (1/8) = 32``:
>  *   all sleep durations greater or equal 32ms will meet the constraints.
>  *
>  * See also the signal aware variant msleep_interruptible().
>  */
> 
> Hopefully this attempt clarifies the confusion?
> 

Yes now it's perfectly clear!

Please add: Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Thanks.