From: Thomas Gleixner <tglx@linutronix.de>
The tick period is aligned very early while the first clock_event_device
is registered. The system runs in periodic mode and switches to one-shot
mode later, if possible.

The next wake-up event is programmed based on the aligned value
(tick_next_period), but the delta value that is used to program the
clock_event_device is computed based on ktime_get().

With the subtracted offset, the device fires earlier than expected. With
a large enough offset, the system programs the timer for the next
wake-up and the time left is too short to make any boot progress. The
system hangs.

Move the alignment later, to the setup of the tick_sched timer. At this
point the system switches to one-shot mode and a high-resolution
clocksource is available. It is safe to update tick_next_period here
because ktime_get() now returns accurate (not jiffies-based) time.

[bigeasy: Patch description + testing].
Reported-by: Mathias Krause <minipli@grsecurity.net>
Reported-by: "Bhatnagar, Rishabh" <risbhat@amazon.com>
Fixes: e9523a0d81899 ("tick/common: Align tick period with the HZ tick.")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/5a56290d-806e-b9a5-f37c-f21958b5a8c0@grsecurity.net
Link: https://lore.kernel.org/12c6f9a3-d087-b824-0d05-0d18c9bc1bf3@amazon.com
---
kernel/time/tick-common.c | 11 +----------
kernel/time/tick-sched.c | 13 ++++++++++++-
2 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 65b8658da829e..b85f2f9c32426 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -218,19 +218,10 @@ static void tick_setup_device(struct tick_device *td,
 		 * this cpu:
 		 */
 		if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
-			ktime_t next_p;
-			u32 rem;
 
 			tick_do_timer_cpu = cpu;
 
-			next_p = ktime_get();
-			div_u64_rem(next_p, TICK_NSEC, &rem);
-			if (rem) {
-				next_p -= rem;
-				next_p += TICK_NSEC;
-			}
-
-			tick_next_period = next_p;
+			tick_next_period = ktime_get();
 #ifdef CONFIG_NO_HZ_FULL
 			/*
 			 * The boot CPU may be nohz_full, in which case set
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 52254679ec489..42c0be3080bde 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -161,8 +161,19 @@ static ktime_t tick_init_jiffy_update(void)
 	raw_spin_lock(&jiffies_lock);
 	write_seqcount_begin(&jiffies_seq);
 	/* Did we start the jiffies update yet ? */
-	if (last_jiffies_update == 0)
+	if (last_jiffies_update == 0) {
+		u32 rem;
+
+		/*
+		 * Ensure that the tick is aligned to a multiple of
+		 * TICK_NSEC.
+		 */
+		div_u64_rem(tick_next_period, TICK_NSEC, &rem);
+		if (rem)
+			tick_next_period += TICK_NSEC - rem;
+
 		last_jiffies_update = tick_next_period;
+	}
 	period = last_jiffies_update;
 	write_seqcount_end(&jiffies_seq);
 	raw_spin_unlock(&jiffies_lock);
--
2.40.1
On Thu, Jun 15, 2023 at 11:18:30AM +0200, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> [...]
What's the status of this fix, I didn't see it in -rc7, am I looking in
the wrong place?
thanks,
greg k-h
On 19.06.23 08:18, Greg KH wrote:
> On Thu, Jun 15, 2023 at 11:18:30AM +0200, Sebastian Andrzej Siewior wrote:
>> From: Thomas Gleixner <tglx@linutronix.de>
>> [...]
>
> What's the status of this fix, I didn't see it in -rc7, am I looking in
> the wrong place?
It has been in the tip tree since Friday, but, yes, there is no pull request yet:
https://git.kernel.org/tip/13bb06f8dd42
Thanks,
Mathias
On Thu, 15 Jun 2023 11:18:30 +0200 Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> [...]
> Reported-by: Mathias Krause <minipli@grsecurity.net>
> Reported-by: "Bhatnagar, Rishabh" <risbhat@amazon.com>
> Fixes: e9523a0d81899 ("tick/common: Align tick period with the HZ tick.")
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Link: https://lore.kernel.org/5a56290d-806e-b9a5-f37c-f21958b5a8c0@grsecurity.net
> Link: https://lore.kernel.org/12c6f9a3-d087-b824-0d05-0d18c9bc1bf3@amazon.com
I guess adding 'Cc: stable@' might further help stable maintainers?
I also left one very trivial cosmetic comment below, but I don't think it
should be a blocker.
Acked-by: SeongJae Park <sj@kernel.org>
Thanks,
SJ
> [...]
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index 65b8658da829e..b85f2f9c32426 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -218,19 +218,10 @@ static void tick_setup_device(struct tick_device *td,
> * this cpu:
> */
> if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
> - ktime_t next_p;
> - u32 rem;
>
Nit: I guess we'd like to remove the blank line above as well?
> tick_do_timer_cpu = cpu;
>
> [...]
On 2023-06-15 05:18, Sebastian Andrzej Siewior wrote:
> [...]
I've tested this against 5.10.184 (which is where it reproduces quickly
for me):
Tested-by: Luiz Capitulino <luizcap@amazon.com>
On Thu, Jun 15, 2023 at 11:18:30AM +0200, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> [...]
Tested-by: Richard W.M. Jones <rjones@redhat.com>
... fixing this bug, which we originally thought was in qemu, and then in
an unrelated kernel commit:
https://gitlab.com/qemu-project/qemu/-/issues/1696
https://lore.kernel.org/all/20230613134105.GA10301@redhat.com/
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
On 15.06.23 11:18, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> [...]
> Fixes: e9523a0d81899 ("tick/common: Align tick period with the HZ tick.")
Cc: stable, maybe? This commit already ended up in quite a few "stable"
kernels (v6.3.2, v6.2.15, v6.1.28, v5.15.111, v5.10.180 and v5.4.243)
and it might be better to list them explicitly to avoid one of them
getting missed.
> [...]
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 52254679ec489..42c0be3080bde 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -161,8 +161,19 @@ static ktime_t tick_init_jiffy_update(void)
> raw_spin_lock(&jiffies_lock);
> write_seqcount_begin(&jiffies_seq);
> /* Did we start the jiffies update yet ? */
> - if (last_jiffies_update == 0)
> + if (last_jiffies_update == 0) {
> + u32 rem;
> +
> + /*
> + * Ensure that the tick is aligned to a multiple of
> + * TICK_NSEC.
> + */
> + div_u64_rem(tick_next_period, TICK_NSEC, &rem);
> + if (rem)
> + tick_next_period += TICK_NSEC - rem;
> +
> last_jiffies_update = tick_next_period;
> + }
> period = last_jiffies_update;
> write_seqcount_end(&jiffies_seq);
> raw_spin_unlock(&jiffies_lock);
Hah, nice spot. So you implemented what I suggested and it, indeed,
works as expected, thereby:
Tested-by: Mathias Krause <minipli@grsecurity.net>
Thanks,
Mathias