From nobody Fri Sep 12 10:30:10 2025
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7E730C636CD for ; Fri, 10 Feb 2023 14:09:36 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232430AbjBJOJe (ORCPT ); Fri, 10 Feb 2023 09:09:34 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49894 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232429AbjBJOJ3 (ORCPT ); Fri, 10 Feb 2023 09:09:29 -0500
Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E73E72195A for ; Fri, 10 Feb 2023 06:09:27 -0800 (PST)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 5991A61DCC for ; Fri, 10 Feb 2023 14:09:27 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9B65CC433D2; Fri, 10 Feb 2023 14:09:24 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1676038166; bh=FaVCj67P0RIyRZ9SOpAzVW9SW/d7Prf5e3bxlBOh2/o=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=UELP+A9oXEuDHgvQT4Xnlimx8oOHWewth4dIVn8V4qqePtaa/ZXu7V3shqHKtChaS RzaZ/LWoUWrmNvH9XjuWU3BFP2HVX6vy5Do9A3v1HMm9H4E92OnlPKxM1Jj3i31MTb G82c6Kbqsmnei+EGq7X+vI54fzZIh6jrl1VhmQbVdF8eZajz8ohT3SSa99OAEmbB88 gJcolUJZfUK9wK41WSZAHigUU7EduSYF/haH/a5/xH/o717V6rkk0RHGcOdmxvzXTF ZjlVx60HL42E0JxO5SXuYWjdAnRqHAvRzKIAk1zccJOYNCd82gQUTpn2aNdg/WFoSQ 5US2UPnpqtYdg==
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Alexey Dobriyan, Peter Zijlstra, Wei Li, Mirsad Goran Todorovac, Thomas Gleixner, Yu Liao, Hillf Danton, Ingo Molnar
Subject: [PATCH 1/6]
	timers/nohz: Restructure and reshuffle struct tick_sched
Date: Fri, 10 Feb 2023 15:09:12 +0100
Message-Id: <20230210140917.279062-2-frederic@kernel.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230210140917.279062-1-frederic@kernel.org>
References: <20230210140917.279062-1-frederic@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Restructure and group fields by access in order to optimize cache
layout. While at it, also add missing kernel doc for two fields:
@last_jiffies and @idle_expires.

Reported-by: Thomas Gleixner
Cc: Hillf Danton
Cc: Yu Liao
Cc: Ingo Molnar
Cc: Wei Li
Cc: Alexey Dobriyan
Cc: Mirsad Goran Todorovac
Cc: Peter Zijlstra
Signed-off-by: Frederic Weisbecker
Acked-by: Peter Zijlstra (Intel)
---
 kernel/time/tick-sched.h | 66 +++++++++++++++++++++++++---------------
 1 file changed, 41 insertions(+), 25 deletions(-)

diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index 504649513399..c6663254d17d 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -22,65 +22,81 @@ enum tick_nohz_mode {
 
 /**
  * struct tick_sched - sched tick emulation and no idle tick control/stats
- * @sched_timer: hrtimer to schedule the periodic tick in high
- * resolution mode
- * @check_clocks: Notification mechanism about clocksource changes
- * @nohz_mode: Mode - one state of tick_nohz_mode
+ *
  * @inidle: Indicator that the CPU is in the tick idle mode
  * @tick_stopped: Indicator that the idle tick has been stopped
  * @idle_active: Indicator that the CPU is actively in the tick idle mode;
  * it is reset during irq handling phases.
- * @do_timer_lst: CPU was the last one doing do_timer before going idle
+ * @do_timer_last: CPU was the last one doing do_timer before going idle
  * @got_idle_tick: Tick timer function has run with @inidle set
+ * @stalled_jiffies: Number of stalled jiffies detected across ticks
+ * @last_tick_jiffies: Value of jiffies seen on last tick
+ * @sched_timer: hrtimer to schedule the periodic tick in high
+ * resolution mode
  * @last_tick: Store the last tick expiry time when the tick
  * timer is modified for nohz sleeps. This is necessary
  * to resume the tick timer operation in the timeline
  * when the CPU returns from nohz sleep.
  * @next_tick: Next tick to be fired when in dynticks mode.
  * @idle_jiffies: jiffies at the entry to idle for idle time accounting
+ * @idle_waketime: Time when the idle was interrupted
+ * @idle_entrytime: Time when the idle call was entered
+ * @nohz_mode: Mode - one state of tick_nohz_mode
+ * @last_jiffies: Base jiffies snapshot when next event was last computed
+ * @timer_expires_base: Base time clock monotonic for @timer_expires
+ * @timer_expires: Anticipated timer expiration time (in case sched tick is stopped)
+ * @next_timer: Expiry time of next expiring timer for debugging purpose only
+ * @idle_expires: Next tick in idle, for debugging purpose only
  * @idle_calls: Total number of idle calls
  * @idle_sleeps: Number of idle calls, where the sched tick was stopped
- * @idle_entrytime: Time when the idle call was entered
- * @idle_waketime: Time when the idle was interrupted
  * @idle_exittime: Time when the idle state was left
  * @idle_sleeptime: Sum of the time slept in idle with sched tick stopped
  * @iowait_sleeptime: Sum of the time slept in idle with sched tick stopped, with IO outstanding
- * @timer_expires: Anticipated timer expiration time (in case sched tick is stopped)
- * @timer_expires_base: Base time clock monotonic for @timer_expires
- * @next_timer: Expiry time of next expiring timer for debugging purpose only
  * @tick_dep_mask: Tick dependency mask - is set, if someone needs the tick
- * @last_tick_jiffies: Value of jiffies seen on last tick
- * @stalled_jiffies: Number of stalled jiffies detected across ticks
+ * @check_clocks: Notification mechanism about clocksource changes
  */
 struct tick_sched {
-	struct hrtimer sched_timer;
-	unsigned long check_clocks;
-	enum tick_nohz_mode nohz_mode;
-
+	/* Common flags */
 	unsigned int inidle : 1;
 	unsigned int tick_stopped : 1;
 	unsigned int idle_active : 1;
 	unsigned int do_timer_last : 1;
 	unsigned int got_idle_tick : 1;
 
+	/* Tick handling: jiffies stall check */
+	unsigned int stalled_jiffies;
+	unsigned long last_tick_jiffies;
+
+	/* Tick handling */
+	struct hrtimer sched_timer;
 	ktime_t last_tick;
 	ktime_t next_tick;
 	unsigned long idle_jiffies;
-	unsigned long idle_calls;
-	unsigned long idle_sleeps;
-	ktime_t idle_entrytime;
 	ktime_t idle_waketime;
-	ktime_t idle_exittime;
-	ktime_t idle_sleeptime;
-	ktime_t iowait_sleeptime;
+
+	/* Idle entry */
+	ktime_t idle_entrytime;
+
+	/* Tick stop */
+	enum tick_nohz_mode nohz_mode;
 	unsigned long last_jiffies;
-	u64 timer_expires;
 	u64 timer_expires_base;
+	u64 timer_expires;
 	u64 next_timer;
 	ktime_t idle_expires;
+	unsigned long idle_calls;
+	unsigned long idle_sleeps;
+
+	/* Idle exit */
+	ktime_t idle_exittime;
+	ktime_t idle_sleeptime;
+	ktime_t iowait_sleeptime;
+
+	/* Full dynticks handling */
 	atomic_t tick_dep_mask;
-	unsigned long last_tick_jiffies;
-	unsigned int stalled_jiffies;
+
+	/* Clocksource changes */
+	unsigned long check_clocks;
 };
 
 extern struct tick_sched *tick_get_tick_sched(int cpu);
-- 
2.34.1

From nobody Fri Sep 12 10:30:10 2025
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 90A7AC05027 for ; Fri, 10 Feb 2023 14:09:43 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by
	vger.kernel.org via listexpand id S232446AbjBJOJl (ORCPT ); Fri, 10 Feb 2023 09:09:41 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50182 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232443AbjBJOJh (ORCPT ); Fri, 10 Feb 2023 09:09:37 -0500
Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 38C911B570 for ; Fri, 10 Feb 2023 06:09:32 -0800 (PST)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id A6A74B82280 for ; Fri, 10 Feb 2023 14:09:30 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 42292C433EF; Fri, 10 Feb 2023 14:09:27 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1676038169; bh=DTwnO69RuMGiGrcCioikHuzFr2i7tzmHe97x2bj6jfw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=lvIYljSOpFSc1gXt4msCSasQgiigspNSl7oK8mjEVdmfTZGTqzy55GSupermnme7B 3ULYZB4r2PrYFQy85Mx92pTSkyQ8SaZGCOhYPFsFq9h+Q3zjBFVcP1jo2w10TMUINb NJKo90JkRRLiOLP/VgJss8s9MczqNVnNOl5QspxINpyNIj8d456HJ0nGpXJzoD87fg nylbp1qcUdhReF/ZU0g4jiGSEIPyi7dNno9iHtDWBccv1YB2YSegzXgyQpIXsKzr5g WjBQd8Y/M+lMBNpNY4Nmj9osrsNUTpOD0i0Lh3CnVl+CdsbaahcqRWX8gZCmSLaJ7u RS/26oZNYJIZg==
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Alexey Dobriyan, Peter Zijlstra, Wei Li, Mirsad Goran Todorovac, Thomas Gleixner, Yu Liao, Hillf Danton, Ingo Molnar
Subject: [PATCH 2/6] timers/nohz: Only ever update sleeptime from idle exit
Date: Fri, 10 Feb 2023 15:09:13 +0100
Message-Id: <20230210140917.279062-3-frederic@kernel.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230210140917.279062-1-frederic@kernel.org>
References: <20230210140917.279062-1-frederic@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding:
	quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

The idle and io sleeptime statistics appearing in /proc/stat can
currently be updated from two sites: locally on idle exit and remotely
by cpufreq. However there is no synchronization mechanism protecting
concurrent updates. It is therefore possible to account the sleeptime
twice, among all the possible broken scenarios.

To avoid breaking the sleeptime accounting source, restrict the
sleeptime updates to the local idle exit site. If there is a delta to
add since the last update, IO/Idle sleep time readers will now only
compute the delta without actually writing it back to the internal idle
statistic fields.

This fixes a writer VS writer race. Note there are still two known
reader VS writer races to handle. A subsequent patch will fix one.

Reported-by: Yu Liao
Cc: Hillf Danton
Cc: Yu Liao
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Wei Li
Cc: Alexey Dobriyan
Cc: Mirsad Goran Todorovac
Cc: Peter Zijlstra
Signed-off-by: Frederic Weisbecker
Acked-by: Peter Zijlstra (Intel)
---
 kernel/time/tick-sched.c | 104 ++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 62 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index b0e3c9205946..db22342f8948 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -637,31 +637,21 @@ static void tick_nohz_update_jiffies(ktime_t now)
 	touch_softlockup_watchdog_sched();
 }
 
-/*
- * Updates the per-CPU time idle statistics counters
- */
-static void
-update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time)
-{
-	ktime_t delta;
-
-	if (ts->idle_active) {
-		delta = ktime_sub(now, ts->idle_entrytime);
-		if (nr_iowait_cpu(cpu) > 0)
-			ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
-		else
-			ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
-		ts->idle_entrytime = now;
-	}
-
-	if (last_update_time)
-		*last_update_time = ktime_to_us(now);
-
-}
-
 static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 {
-	update_ts_time_stats(smp_processor_id(), ts, now, NULL);
+	ktime_t delta;
+
+	if (WARN_ON_ONCE(!ts->idle_active))
+		return;
+
+	delta = ktime_sub(now, ts->idle_entrytime);
+
+	if (nr_iowait_cpu(smp_processor_id()) > 0)
+		ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
+	else
+		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
+
+	ts->idle_entrytime = now;
 	ts->idle_active = 0;
 
 	sched_clock_idle_wakeup_event();
@@ -674,6 +664,31 @@ static void tick_nohz_start_idle(struct tick_sched *ts)
 	sched_clock_idle_sleep_event();
 }
 
+static u64 get_cpu_sleep_time_us(int cpu, ktime_t *sleeptime,
+				 bool compute_delta, u64 *last_update_time)
+{
+	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+	ktime_t now, idle;
+
+	if (!tick_nohz_active)
+		return -1;
+
+	now = ktime_get();
+	if (last_update_time)
+		*last_update_time = ktime_to_us(now);
+
+	if (ts->idle_active && !nr_iowait_cpu(cpu)) {
+		ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+
+		idle = ktime_add(*sleeptime, delta);
+	} else {
+		idle = *sleeptime;
+	}
+
+	return ktime_to_us(idle);
+
+}
+
 /**
  * get_cpu_idle_time_us - get the total idle time of a CPU
  * @cpu: CPU number to query
@@ -691,27 +706,9 @@ static void tick_nohz_start_idle(struct tick_sched *ts)
 u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-	ktime_t now, idle;
-
-	if (!tick_nohz_active)
-		return -1;
-
-	now = ktime_get();
-	if (last_update_time) {
-		update_ts_time_stats(cpu, ts, now, last_update_time);
-		idle = ts->idle_sleeptime;
-	} else {
-		if (ts->idle_active && !nr_iowait_cpu(cpu)) {
-			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
-
-			idle = ktime_add(ts->idle_sleeptime, delta);
-		} else {
-			idle = ts->idle_sleeptime;
-		}
-	}
-
-	return ktime_to_us(idle);
 
+	return get_cpu_sleep_time_us(cpu, &ts->idle_sleeptime,
+				     !nr_iowait_cpu(cpu), last_update_time);
 }
 EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
 
@@ -732,26 +729,9 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
 u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-	ktime_t now, iowait;
 
-	if (!tick_nohz_active)
-		return -1;
-
-	now = ktime_get();
-	if (last_update_time) {
-		update_ts_time_stats(cpu, ts, now, last_update_time);
-		iowait = ts->iowait_sleeptime;
-	} else {
-		if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
-			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
-
-			iowait = ktime_add(ts->iowait_sleeptime, delta);
-		} else {
-			iowait = ts->iowait_sleeptime;
-		}
-	}
-
-	return ktime_to_us(iowait);
+	return get_cpu_sleep_time_us(cpu, &ts->iowait_sleeptime,
+				     nr_iowait_cpu(cpu), last_update_time);
 }
 EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
 
-- 
2.34.1

From nobody Fri Sep 12 10:30:10 2025
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 186BEC636D7 for ; Fri, 10 Feb 2023 14:10:01 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232476AbjBJOJs (ORCPT ); Fri, 10 Feb 2023 09:09:48 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50186 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232460AbjBJOJn (ORCPT ); Fri, 10 Feb 2023 09:09:43 -0500
Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 08B4A7407D for ; Fri, 10 Feb 2023 06:09:34 -0800 (PST)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org
	(Postfix) with ESMTPS id 436CFB8251D for ; Fri, 10 Feb 2023 14:09:33 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id DA9E2C4339B; Fri, 10 Feb 2023 14:09:29 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1676038172; bh=uGFctYYXWtKjvXXubFVYxBpGfYhc7n33Hu43EUXs6NM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=G10o5gZnxPa9cdACbV9apoHHnkc6ZHxJPdHxwQyzqZzTKAf+d+shez4QNAGy6KrIa CABaCi4TORQERtSoPhOwdxg0h+WjfP8bcnrRe/XTGogYvOKeIkUphZRK42U0XpUrS6 ARilPOTRz1SGmDemN2WkSiLZ35EOwDfJ8EuGkf++OIg8WITUhoR+GEQfxPaitd2OoE nkPvWt7Cn/7dXQBs7GFd6SWMaCCuk0IikYF8/YrcHr1WraLSkn75m1+ftkhdEN5Cvi o5vkTR/Z4ZFv7ndvVTLxIX1YTjTSVFmTq/EqNK9FdXZXeoztCcLFBaCW9YneBZ+z+9 N726M+2CSpEaA==
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Alexey Dobriyan, Peter Zijlstra, Wei Li, Mirsad Goran Todorovac, Thomas Gleixner, Yu Liao, Hillf Danton, Ingo Molnar
Subject: [PATCH 3/6] timers/nohz: Protect idle/iowait sleep time under seqcount
Date: Fri, 10 Feb 2023 15:09:14 +0100
Message-Id: <20230210140917.279062-4-frederic@kernel.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230210140917.279062-1-frederic@kernel.org>
References: <20230210140917.279062-1-frederic@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Reading idle/io sleep time (e.g. from /proc/stat) can race with idle
exit updates because the state machine handling the stats is not atomic
and requires a coherent read batch.

As a result reading the sleep time may report irrelevant or backward
values.

Fix this by protecting the simple state machine within a seqcount.
This is expected to be cheap enough not to add measurable performance
impact on the idle path.

Note this only fixes the reader VS writer condition partially. A race
remains that involves remote updates of the CPU iowait task counter. It
can hardly be fixed.

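As a rough illustration of the scheme described above (this is not code
from the series): a minimal userspace sketch of a read protected by a
sequence counter, assuming C11 sequentially consistent atomics in place
of the kernel's seqcount_t and its barriers. The names struct
idle_stats, start_idle(), stop_idle() and read_idle_time() are
hypothetical stand-ins for the tick_sched fields and helpers touched by
the patch, and the idle/iowait split is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's seqcount_t. */
struct seqcount {
	atomic_uint sequence;
};

/* Hypothetical, reduced model of the idle accounting fields. */
struct idle_stats {
	struct seqcount seq;
	_Atomic int64_t idle_entrytime;
	_Atomic int64_t idle_sleeptime;
	atomic_bool idle_active;
};

static void idle_stats_init(struct idle_stats *st)
{
	atomic_init(&st->seq.sequence, 0u);
	atomic_init(&st->idle_entrytime, 0);
	atomic_init(&st->idle_sleeptime, 0);
	atomic_init(&st->idle_active, false);
}

/* Writer side: an odd sequence value means an update is in progress. */
static void write_begin(struct seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1u);
}

static void write_end(struct seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1u);
}

/* Reader side: wait for an even sequence value and remember it. */
static unsigned int read_begin(struct seqcount *s)
{
	unsigned int v;

	while ((v = atomic_load(&s->sequence)) & 1u)
		; /* a writer is in the middle of an update */
	return v;
}

/* True if a writer ran since read_begin(); the read must be redone. */
static bool read_retry(struct seqcount *s, unsigned int start)
{
	return atomic_load(&s->sequence) != start;
}

/* Idle entry: record the entry time and mark the CPU idle. */
static void start_idle(struct idle_stats *st, int64_t now)
{
	write_begin(&st->seq);
	atomic_store(&st->idle_entrytime, now);
	atomic_store(&st->idle_active, true);
	write_end(&st->seq);
}

/* Idle exit: the only place where the accumulated sleep time is written. */
static void stop_idle(struct idle_stats *st, int64_t now)
{
	int64_t delta = now - atomic_load(&st->idle_entrytime);

	write_begin(&st->seq);
	atomic_store(&st->idle_sleeptime,
		     atomic_load(&st->idle_sleeptime) + delta);
	atomic_store(&st->idle_entrytime, now);
	atomic_store(&st->idle_active, false);
	write_end(&st->seq);
}

/* Reader: compute the pending delta without writing anything back. */
static int64_t read_idle_time(struct idle_stats *st, int64_t now)
{
	unsigned int seq;
	int64_t idle;

	do {
		seq = read_begin(&st->seq);
		idle = atomic_load(&st->idle_sleeptime);
		if (atomic_load(&st->idle_active))
			idle += now - atomic_load(&st->idle_entrytime);
	} while (read_retry(&st->seq, seq));

	return idle;
}
```

A reader that overlaps a writer sees the sequence change and simply
retries, so the writer never waits for readers; that is the property
that keeps the cost on the idle path small.
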
Reported-by: Yu Liao
Cc: Hillf Danton
Cc: Yu Liao
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Wei Li
Cc: Alexey Dobriyan
Cc: Mirsad Goran Todorovac
Cc: Peter Zijlstra
Signed-off-by: Frederic Weisbecker
Acked-by: Peter Zijlstra (Intel)
---
 kernel/time/tick-sched.c | 22 ++++++++++++++++------
 kernel/time/tick-sched.h | 1 +
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index db22342f8948..757e03ef4409 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -646,6 +646,7 @@ static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 
 	delta = ktime_sub(now, ts->idle_entrytime);
 
+	write_seqcount_begin(&ts->idle_sleeptime_seq);
 	if (nr_iowait_cpu(smp_processor_id()) > 0)
 		ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
 	else
@@ -653,14 +654,18 @@ static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 
 	ts->idle_entrytime = now;
 	ts->idle_active = 0;
+	write_seqcount_end(&ts->idle_sleeptime_seq);
 
 	sched_clock_idle_wakeup_event();
 }
 
 static void tick_nohz_start_idle(struct tick_sched *ts)
 {
+	write_seqcount_begin(&ts->idle_sleeptime_seq);
 	ts->idle_entrytime = ktime_get();
 	ts->idle_active = 1;
+	write_seqcount_end(&ts->idle_sleeptime_seq);
+
 	sched_clock_idle_sleep_event();
 }
 
@@ -669,6 +674,7 @@ static u64 get_cpu_sleep_time_us(int cpu, ktime_t *sleeptime,
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
 	ktime_t now, idle;
+	unsigned int seq;
 
 	if (!tick_nohz_active)
 		return -1;
@@ -677,13 +683,17 @@ static u64 get_cpu_sleep_time_us(int cpu, ktime_t *sleeptime,
 	if (last_update_time)
 		*last_update_time = ktime_to_us(now);
 
-	if (ts->idle_active && !nr_iowait_cpu(cpu)) {
-		ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+	do {
+		seq = read_seqcount_begin(&ts->idle_sleeptime_seq);
 
-		idle = ktime_add(*sleeptime, delta);
-	} else {
-		idle = *sleeptime;
-	}
+		if (ts->idle_active && !nr_iowait_cpu(cpu)) {
+			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+
+			idle = ktime_add(*sleeptime, delta);
+		} else {
+			idle = *sleeptime;
+		}
+	} while (read_seqcount_retry(&ts->idle_sleeptime_seq, seq));
 
 	return ktime_to_us(idle);
 
diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index c6663254d17d..5ed5a9d41d5a 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -75,6 +75,7 @@ struct tick_sched {
 	ktime_t idle_waketime;
 
 	/* Idle entry */
+	seqcount_t idle_sleeptime_seq;
 	ktime_t idle_entrytime;
 
 	/* Tick stop */
-- 
2.34.1

From nobody Fri Sep 12 10:30:10 2025
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 22366C636D4 for ; Fri, 10 Feb 2023 14:10:13 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232464AbjBJOKJ (ORCPT ); Fri, 10 Feb 2023 09:10:09 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50566 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232465AbjBJOJr (ORCPT ); Fri, 10 Feb 2023 09:09:47 -0500
Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8E64674998 for ; Fri, 10 Feb 2023 06:09:37 -0800 (PST)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 1C466B824CF for ; Fri, 10 Feb 2023 14:09:36 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7CE88C433EF; Fri, 10 Feb 2023 14:09:32 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1676038174; bh=H1/1wQpakm7Rwe+HqpYOF44S3z7tTyE+Intx0NLRpx8=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=DDFnJHQOZ66Z8Tl3ybE0nX4k4L/0RQNDRp2D04Wg023oD/b/WgkR/ryauQ70mOTJ6 45qevKmTRI7WuMliaLFNQVIEmLo7ozbu8cUMYP4mNGgIIOdokUmWR5vrzZNqngvaTk lcEtMzqME/TABgcQjOSAMWr1oM2XPW1NoNH+0xlpa8VR6jZHsquv4s0Mtg4SyvVWSn VRuKc9pscKhM40MDPYz+ErJ+ujq1YgKXCgvCuUf0dNVc8Vo3gA/oQs8PWHcnRhAnO/ 9qslSokLvZDWXpIa+c7auLCdX0bJn3g1vPocg1jQPBDFOam9Ex1sEF9i5OINpplvne x6BmcZmSCQumQ==
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Alexey Dobriyan, Peter Zijlstra, Wei Li, Mirsad Goran Todorovac, Thomas Gleixner, Yu Liao, Hillf Danton, Ingo Molnar
Subject: [PATCH 4/6] timers/nohz: Add a comment about broken iowait counter update race
Date: Fri, 10 Feb 2023 15:09:15 +0100
Message-Id: <20230210140917.279062-5-frederic@kernel.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230210140917.279062-1-frederic@kernel.org>
References: <20230210140917.279062-1-frederic@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

The per-cpu iowait task counter is incremented locally upon sleeping.
But since the task can be woken up on (and by) another CPU, the counter
may then be decremented remotely. This is the source of a race involving
readers VS writer of idle/iowait sleeptime.

The following scenario shows an example where a /proc/stat reader
observes a pending sleep time as IO whereas that pending sleep time
eventually gets accounted as non-IO.

    CPU 0                     CPU 1                           CPU 2
    -----                     -----                           ------
    //io_schedule() TASK A
    current->in_iowait = 1
    rq(0)->nr_iowait++
    //switch to idle
                              // READ /proc/stat
                              // See nr_iowait_cpu(0) == 1
                              return ts->iowait_sleeptime +
                                     ktime_sub(ktime_get(), ts->idle_entrytime)

                                                              //try_to_wake_up(TASK A)
                                                              rq(0)->nr_iowait--
    //idle exit
    // See nr_iowait_cpu(0) == 0
    ts->idle_sleeptime += ktime_sub(ktime_get(), ts->idle_entrytime)

As a result subsequent reads on /proc/stat may expose backward progress.

This is unfortunately hardly fixable. Just add a comment about that
condition.

Cc: Hillf Danton
Cc: Yu Liao
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Wei Li
Cc: Alexey Dobriyan
Cc: Mirsad Goran Todorovac
Cc: Peter Zijlstra
Signed-off-by: Frederic Weisbecker
Acked-by: Peter Zijlstra (Intel)
---
 kernel/time/tick-sched.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 757e03ef4409..bd9499fb15b3 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -706,7 +706,10 @@ static u64 get_cpu_sleep_time_us(int cpu, ktime_t *sleeptime,
  * counters if NULL.
  *
  * Return the cumulative idle time (since boot) for a given
- * CPU, in microseconds.
+ * CPU, in microseconds. Note this is partially broken due to
+ * the counter of iowait tasks that can be remotely updated without
+ * any synchronization. Therefore it is possible to observe backward
+ * values within two consecutive reads.
  *
  * This time is measured via accounting rather than sampling,
  * and is as accurate as ktime_get() is.
@@ -729,7 +732,10 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
  * counters if NULL.
  *
  * Return the cumulative iowait time (since boot) for a given
- * CPU, in microseconds.
+ * CPU, in microseconds. Note this is partially broken due to
+ * the counter of iowait tasks that can be remotely updated without
+ * any synchronization. Therefore it is possible to observe backward
+ * values within two consecutive reads.
  *
  * This time is measured via accounting rather than sampling,
  * and is as accurate as ktime_get() is.
-- 
2.34.1

From nobody Fri Sep 12 10:30:10 2025
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2F105C636CD for ; Fri, 10 Feb 2023 14:10:23 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232495AbjBJOKU (ORCPT ); Fri, 10 Feb 2023 09:10:20 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50578 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232489AbjBJOJx (ORCPT ); Fri, 10 Feb 2023 09:09:53 -0500
Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6F01B74077 for ; Fri, 10 Feb 2023 06:09:40 -0800 (PST)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 9E4E3B8252A for ; Fri, 10 Feb 2023 14:09:38 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 21966C4339B; Fri, 10 Feb 2023 14:09:34 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1676038177; bh=/jXn/vnDl0cDz/KxBmavjhz+/0Xao8NYmx2pabP5Zx4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=K/0jBDC/fMsnlIotZfR27XyUnoZAGtly6nQErhHRgQGnqqJKeNqcwk4AifODzILik kAKP+wCFxOBpg8K7GGp48xIXbzDFrmmS6IbNgnYFH0F5iB3DUg02eTjc0xi/YLHbt7 Fp5IDGzrrXA6YVu2fiqpcTa/JSZkR8I467GGH59b1zgQSHmKQrY0ZMGTCw1QK+zu6p bT4dkVZGhq7YrXFv4dxexV95Rylakm7gf3RqcBFSLBz9kvtlPffJ1pnKh7pXozchBd RUX8WwYq4wE8/s6gYsYcntQXf/vuqwmUvLGETk+SYEMrS5EdFw6I2S026Oa23G8pBc TMzu2JVON+NvQ==
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Alexey Dobriyan, Peter Zijlstra,
	Wei Li, Mirsad Goran Todorovac, Thomas Gleixner, Yu Liao, Hillf Danton, Ingo Molnar
Subject: [PATCH 5/6] timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick()
Date: Fri, 10 Feb 2023 15:09:16 +0100
Message-Id: <20230210140917.279062-6-frederic@kernel.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230210140917.279062-1-frederic@kernel.org>
References: <20230210140917.279062-1-frederic@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

There is no need for the __tick_nohz_idle_stop_tick() function between
tick_nohz_idle_stop_tick() and its implementation. Remove that
unnecessary step.

Cc: Hillf Danton
Cc: Yu Liao
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Wei Li
Cc: Alexey Dobriyan
Cc: Mirsad Goran Todorovac
Cc: Peter Zijlstra
Signed-off-by: Frederic Weisbecker
Acked-by: Peter Zijlstra (Intel)
---
 kernel/time/tick-sched.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index bd9499fb15b3..7065c6e373e9 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1080,10 +1080,16 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 	return true;
 }
 
-static void __tick_nohz_idle_stop_tick(struct tick_sched *ts)
+/**
+ * tick_nohz_idle_stop_tick - stop the idle tick from the idle task
+ *
+ * When the next event is more than a tick into the future, stop the idle tick
+ */
+void tick_nohz_idle_stop_tick(void)
 {
+	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
+	int cpu = smp_processor_id();
 	ktime_t expires;
-	int cpu = smp_processor_id();
 
 	/*
 	 * If tick_nohz_get_sleep_length() ran tick_nohz_next_event(), the
@@ -1115,16 +1121,6 @@ static void __tick_nohz_idle_stop_tick(struct tick_sched *ts)
 	}
 }
 
-/**
- * tick_nohz_idle_stop_tick - stop the idle tick from the idle task
- *
- * When the next event is more than a tick into the future, stop the idle tick
- */
-void tick_nohz_idle_stop_tick(void)
-{
-	__tick_nohz_idle_stop_tick(this_cpu_ptr(&tick_cpu_sched));
-}
-
 void tick_nohz_idle_retain_tick(void)
 {
 	tick_nohz_retain_tick(this_cpu_ptr(&tick_cpu_sched));
-- 
2.34.1

From nobody Fri Sep 12 10:30:10 2025
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C9B07C636CD for ; Fri, 10 Feb 2023 14:10:29 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232508AbjBJOK2 (ORCPT ); Fri, 10 Feb 2023 09:10:28 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50716 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232500AbjBJOJz (ORCPT ); Fri, 10 Feb 2023 09:09:55 -0500
Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 31A0E74064 for ; Fri, 10 Feb 2023 06:09:41 -0800 (PST)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 7F7B361DD2 for ; Fri, 10 Feb 2023 14:09:40 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id BB131C43445; Fri, 10 Feb 2023 14:09:37 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1676038179; bh=gtKeflSOf/kRVKW9lvZG7UlWcNwX9gYbrJOAj1/IZRk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=SQXYyBgD/vKrwdGKQ0x0+iUltbLAg5bsSTaW72Oos3ne4vHjakkeJNoKHMoLrsMR3 JO9WGdZmWyo3n8tGhl8kSk3vEDGgaRrKzhWpde8OfJohw1Aoi0VXqQ5I+wng3XwWDS pG5AkMRXEoq2JiK9/GO1bRrXGazyvq00egHl69bOPtoeLuYtliEPEaJWGqDN0SBs6X 23uZRwjunYcxr8Ekf0BVaTjGrdX8kXsza51ASUc0//fOGkULnR5nz/6URabnxDwmRo
	ORj0p4IT/A3Ah95EB8iMKgA9wwHy6wA7YssnD/TeqXGDtMdEGK9sLu8UY3IFANdA0F OWSNLhZO7yqVA==
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Alexey Dobriyan, Peter Zijlstra, Wei Li, Mirsad Goran Todorovac, Thomas Gleixner, Yu Liao, Hillf Danton, Ingo Molnar
Subject: [PATCH 6/6] MAINTAINERS: Remove stale email address
Date: Fri, 10 Feb 2023 15:09:17 +0100
Message-Id: <20230210140917.279062-7-frederic@kernel.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230210140917.279062-1-frederic@kernel.org>
References: <20230210140917.279062-1-frederic@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Cc: Hillf Danton
Cc: Yu Liao
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Wei Li
Cc: Alexey Dobriyan
Cc: Mirsad Goran Todorovac
Cc: Peter Zijlstra
Signed-off-by: Frederic Weisbecker
Acked-by: Peter Zijlstra (Intel)
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index fb1471cb5ed3..300ca61fa0bc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14794,7 +14794,7 @@ F: include/uapi/linux/nitro_enclaves.h
 F: samples/nitro_enclaves/
 
 NOHZ, DYNTICKS SUPPORT
-M: Frederic Weisbecker
+M: Frederic Weisbecker
 M: Thomas Gleixner
 M: Ingo Molnar
 L: linux-kernel@vger.kernel.org
-- 
2.34.1