From: Thomas Gleixner
To: Linus Torvalds
Cc: linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [GIT pull] timers/core for v6.19-rc1
References: <176457119565.1888260.10012195384143368631.tglx@xen13>
Message-ID: <176457122251.1888260.91531689314335034.tglx@xen13>
Date: Mon, 1 Dec 2025 07:51:38 +0100 (CET)

Linus,

please pull the latest timers/core branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-core-2025-11-30

up to:  7dec062cfcf2: timers/migration: Exclude isolated cpus from hierarchy


Update to the time/timers core:

  - Prevent a thundering herd problem when the timekeeper CPU is delayed
    and a large number of CPUs compete to acquire jiffies_lock to do the
    update. Limit it to one CPU with a separate "uncontended" atomic
    variable.

  - A set of improvements for the timer migration mechanism:

    - Support imbalanced NUMA trees correctly

    - Support dynamic exclusion of CPUs from the migrator duty to allow
      the cpuset/isolation mechanism to exclude them from handling
      timers of remote idle CPUs.

  - The usual small updates, cleanups and enhancements

Thanks,

	tglx

------------------>
Frederic Weisbecker (6):
      timers/migration: Convert "while" loops to use "for"
      timers/migration: Remove locking on group connection
      timers/migration: Fix imbalanced NUMA trees
      timers/migration: Assert that hotplug preparing CPU is part of stable active hierarchy
      timers/migration: Remove unused "cpu" parameter from tmigr_get_group()
      timers/migration: Remove dead code handling idle CPU checking for remote timers

Gabriele Monaco (6):
      timers/migration: Rename 'online' bit to 'available'
      timers/migration: Add mask for CPUs available in the hierarchy
      timers/migration: Use scoped_guard on available flag set/clear
      cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
      sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any
      timers/migration: Exclude isolated cpus from hierarchy

Jianyun Gao (1):
      time: Fix a few typos in time[r] related code comments

Steve Wahl (1):
      tick/sched: Limit non-timekeeper CPUs calling jiffies update

Sunday Adelodun (1):
      time: tick-oneshot: Add missing Return and parameter descriptions to kernel-doc

Thomas Weißschuh (2):
      hrtimer: Store time as ktime_t in restart block
      selftests/timers/nanosleep: Add tests for return of remaining time

Wake Liu (1):
      selftests/timers: Clean up kernel version check in posix_timers

Yury Norov (1):
      cpumask: Add initialiser to use cleanup helpers


 include/linux/cpumask.h                       |   2 +
 include/linux/delay.h                         |   8 +-
 include/linux/restart_block.h                 |   2 +-
 include/linux/timer.h                         |   9 +
 include/trace/events/timer_migration.h        |   4 +-
 kernel/cgroup/cpuset.c                        |  15 +-
 kernel/sched/isolation.c                      |  23 ++
 kernel/time/hrtimer.c                         |   4 +-
 kernel/time/posix-cpu-timers.c                |   4 +-
 kernel/time/posix-timers.c                    |   2 +-
 kernel/time/tick-oneshot.c                    |  20 +-
 kernel/time/tick-sched.c                      |  30 +-
 kernel/time/timer_migration.c                 | 487 +++++++++++++++++---------
 kernel/time/timer_migration.h                 |   2 +-
 tools/testing/selftests/timers/nanosleep.c    |  55 +++
 tools/testing/selftests/timers/posix_timers.c |  32 +-
 16 files changed, 503 insertions(+), 196 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index ff8f41ab7ce6..68be522449ec 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -1005,6 +1005,7 @@ static __always_inline unsigned int cpumask_size(void)
 
 #define this_cpu_cpumask_var_ptr(x) this_cpu_read(x)
 #define __cpumask_var_read_mostly  __read_mostly
+#define CPUMASK_VAR_NULL NULL
 
 bool alloc_cpumask_var_node(cpumask_var_t *mask, gfp_t flags, int node);
 
@@ -1051,6 +1052,7 @@ static __always_inline bool cpumask_available(cpumask_var_t mask)
 
 #define this_cpu_cpumask_var_ptr(x) this_cpu_ptr(x)
 #define __cpumask_var_read_mostly
+#define CPUMASK_VAR_NULL {}
 
 static __always_inline bool alloc_cpumask_var(cpumask_var_t *mask, gfp_t flags)
 {
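The CPUMASK_VAR_NULL initialiser above exists because cpumask_var_t is a
pointer with CONFIG_CPUMASK_OFFSTACK=y and an embedded bitmap otherwise, so
scope-based cleanup via __free(free_cpumask_var) needs an initial value that
is valid for both representations before the first possible early return.
The kernel macros are not compilable standalone; the following is only a
minimal userspace sketch of the same scope-cleanup idea, built on the
GCC/Clang cleanup attribute, with all names hypothetical:

	#include <stdio.h>
	#include <stdlib.h>

	/* Userspace stand-in for free_cpumask_var(): freeing NULL is a
	 * no-op, which is exactly why the variable must be initialised
	 * before the first possible scope exit. */
	static void free_mask(unsigned long **mask)
	{
		free(*mask);
	}

	#define AUTO_FREE __attribute__((cleanup(free_mask)))

	static int use_mask(int fail_early)
	{
		/* Without the "= NULL" initialiser, an early return would
		 * hand an uninitialised pointer to the cleanup handler -
		 * the hole CPUMASK_VAR_NULL closes for the kernel variant. */
		AUTO_FREE unsigned long *mask = NULL;

		if (fail_early)
			return -1;	/* cleanup runs: free(NULL), harmless */

		mask = calloc(4, sizeof(*mask));
		if (!mask)
			return -1;

		mask[0] = 1;
		printf("mask[0] = %lu\n", mask[0]);
		return 0;		/* cleanup runs: free(mask) */
	}

	int main(void)
	{
		use_mask(1);
		return use_mask(0);
	}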
diff --git a/include/linux/delay.h b/include/linux/delay.h
index 89866bab100d..46412c00033a 100644
--- a/include/linux/delay.h
+++ b/include/linux/delay.h
@@ -68,7 +68,7 @@ void usleep_range_state(unsigned long min, unsigned long max,
  * @min: Minimum time in microseconds to sleep
  * @max: Maximum time in microseconds to sleep
  *
- * For basic information please refere to usleep_range_state().
+ * For basic information please refer to usleep_range_state().
  *
  * The task will be in the state TASK_UNINTERRUPTIBLE during the sleep.
  */
@@ -82,10 +82,10 @@ static inline void usleep_range(unsigned long min, unsigned long max)
  * @min: Minimum time in microseconds to sleep
  * @max: Maximum time in microseconds to sleep
  *
- * For basic information please refere to usleep_range_state().
+ * For basic information please refer to usleep_range_state().
  *
  * The sleeping task has the state TASK_IDLE during the sleep to prevent
- * contribution to the load avarage.
+ * contribution to the load average.
  */
 static inline void usleep_range_idle(unsigned long min, unsigned long max)
 {
@@ -96,7 +96,7 @@ static inline void usleep_range_idle(unsigned long min, unsigned long max)
  * ssleep - wrapper for seconds around msleep
  * @seconds: Requested sleep duration in seconds
  *
- * Please refere to msleep() for detailed information.
+ * Please refer to msleep() for detailed information.
  */
 static inline void ssleep(unsigned int seconds)
 {

diff --git a/include/linux/restart_block.h b/include/linux/restart_block.h
index 7e50bbc94e47..36ddfa1ec301 100644
--- a/include/linux/restart_block.h
+++ b/include/linux/restart_block.h
@@ -43,7 +43,7 @@ struct restart_block {
 			struct __kernel_timespec __user *rmtp;
 			struct old_timespec32 __user *compat_rmtp;
 		};
-		u64 expires;
+		ktime_t expires;
 	} nanosleep;
 	/* For poll */
 	struct {

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 0414d9e6b4fc..62e1cea71125 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -188,4 +188,13 @@ int timers_dead_cpu(unsigned int cpu);
 #define timers_dead_cpu NULL
 #endif
 
+#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
+extern int tmigr_isolated_exclude_cpumask(struct cpumask *exclude_cpumask);
+#else
+static inline int tmigr_isolated_exclude_cpumask(struct cpumask *exclude_cpumask)
+{
+	return 0;
+}
+#endif
+
 #endif

diff --git a/include/trace/events/timer_migration.h b/include/trace/events/timer_migration.h
index 47db5eaf2f9a..61171b13c687 100644
--- a/include/trace/events/timer_migration.h
+++ b/include/trace/events/timer_migration.h
@@ -173,14 +173,14 @@ DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_active,
 	TP_ARGS(tmc)
 );
 
-DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_online,
+DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_available,
 
 	TP_PROTO(struct tmigr_cpu *tmc),
 
 	TP_ARGS(tmc)
 );
 
-DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_offline,
+DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_unavailable,
 
 	TP_PROTO(struct tmigr_cpu *tmc),
 

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 27adb04df675..bfc3b319e1c0 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1339,7 +1339,7 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
 	return isolcpus_updated;
 }
 
-static void update_unbound_workqueue_cpumask(bool isolcpus_updated)
+static void update_isolation_cpumasks(bool isolcpus_updated)
 {
 	int ret;
 
@@ -1350,6 +1350,9 @@ static void update_unbound_workqueue_cpumask(bool isolcpus_updated)
 
 	ret = workqueue_unbound_exclude_cpumask(isolated_cpus);
 	WARN_ON_ONCE(ret < 0);
+
+	ret = tmigr_isolated_exclude_cpumask(isolated_cpus);
+	WARN_ON_ONCE(ret < 0);
 }
 
 /**
@@ -1470,7 +1473,7 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
 	list_add(&cs->remote_sibling, &remote_children);
 	cpumask_copy(cs->effective_xcpus, tmp->new_cpus);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_isolation_cpumasks(isolcpus_updated);
 	cpuset_force_rebuild();
 	cs->prs_err = 0;
 
@@ -1511,7 +1514,7 @@ static void remote_partition_disable(struct cpuset *cs, struct tmpmasks *tmp)
 	compute_effective_exclusive_cpumask(cs, NULL, NULL);
 	reset_partition_data(cs);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_isolation_cpumasks(isolcpus_updated);
 	cpuset_force_rebuild();
 
 	/*
@@ -1580,7 +1583,7 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
 	if (xcpus)
 		cpumask_copy(cs->exclusive_cpus, xcpus);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_isolation_cpumasks(isolcpus_updated);
 	if (adding || deleting)
 		cpuset_force_rebuild();
 
@@ -1943,7 +1946,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 		WARN_ON_ONCE(parent->nr_subparts < 0);
 	}
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_isolation_cpumasks(isolcpus_updated);
 
 	if ((old_prs != new_prs) && (cmd == partcmd_update))
 		update_partition_exclusive_flag(cs, new_prs);
@@ -2968,7 +2971,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 	else if (isolcpus_updated)
 		isolated_cpus_update(old_prs, new_prs, cs->effective_xcpus);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_isolation_cpumasks(isolcpus_updated);
 
 	/* Force update if switching back to member & update effective_xcpus */
 	update_cpumasks_hier(cs, &tmpmask, !new_prs);
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index a4cf17b1fab0..3ad0d6df6a0a 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -167,6 +167,29 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
 		}
 	}
 
+	/*
+	 * Check the combination of nohz_full and isolcpus=domain,
+	 * necessary to avoid problems with the timer migration
+	 * hierarchy. managed_irq is ignored by this check since it
+	 * isn't considered in the timer migration logic.
+	 */
+	iter_flags = housekeeping.flags & (HK_FLAG_KERNEL_NOISE | HK_FLAG_DOMAIN);
+	type = find_first_bit(&iter_flags, HK_TYPE_MAX);
+	/*
+	 * Pass the check if none of these flags were previously set or
+	 * are not in the current selection.
+	 */
+	iter_flags = flags & (HK_FLAG_KERNEL_NOISE | HK_FLAG_DOMAIN);
+	first_cpu = (type == HK_TYPE_MAX || !iter_flags) ? 0 :
+		cpumask_first_and_and(cpu_present_mask,
+				      housekeeping_staging, housekeeping.cpumasks[type]);
+	if (first_cpu >= min(nr_cpu_ids, setup_max_cpus)) {
+		pr_warn("Housekeeping: must include one present CPU "
+			"neither in nohz_full= nor in isolcpus=domain, "
+			"ignoring setting %s\n", str);
+		goto free_housekeeping_staging;
+	}
+
 	iter_flags = flags & ~housekeeping.flags;
 
 	for_each_set_bit(type, &iter_flags, HK_TYPE_MAX)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 7e7b2b471bae..9c77e5c72556 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -2145,7 +2145,7 @@ static long __sched hrtimer_nanosleep_restart(struct restart_block *restart)
 	int ret;
 
 	hrtimer_setup_sleeper_on_stack(&t, restart->nanosleep.clockid, HRTIMER_MODE_ABS);
-	hrtimer_set_expires_tv64(&t.timer, restart->nanosleep.expires);
+	hrtimer_set_expires(&t.timer, restart->nanosleep.expires);
 	ret = do_nanosleep(&t, HRTIMER_MODE_ABS);
 	destroy_hrtimer_on_stack(&t.timer);
 	return ret;
@@ -2172,7 +2172,7 @@ long hrtimer_nanosleep(ktime_t rqtp, const enum hrtimer_mode mode,
 
 	restart = &current->restart_block;
 	restart->nanosleep.clockid = t.timer.base->clockid;
-	restart->nanosleep.expires = hrtimer_get_expires_tv64(&t.timer);
+	restart->nanosleep.expires = hrtimer_get_expires(&t.timer);
 	set_restart_fn(restart, hrtimer_nanosleep_restart);
 out:
 	destroy_hrtimer_on_stack(&t.timer);

diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 2e5b89d7d866..0de2bb7cbec0 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1557,7 +1557,7 @@ static int do_cpu_nanosleep(const clockid_t which_clock, int flags,
 			 * Report back to the user the time still remaining.
 			 */
 			restart = &current->restart_block;
-			restart->nanosleep.expires = expires;
+			restart->nanosleep.expires = ns_to_ktime(expires);
 			if (restart->nanosleep.type != TT_NONE)
 				error = nanosleep_copyout(restart, &it.it_value);
 		}
@@ -1599,7 +1599,7 @@ static long posix_cpu_nsleep_restart(struct restart_block *restart_block)
 	clockid_t which_clock = restart_block->nanosleep.clockid;
 	struct timespec64 t;
 
-	t = ns_to_timespec64(restart_block->nanosleep.expires);
+	t = ktime_to_timespec64(restart_block->nanosleep.expires);
 
 	return do_cpu_nanosleep(which_clock, TIMER_ABSTIME, &t);
 }

diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index aa3120104a51..36dbb8146517 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -1242,7 +1242,7 @@ SYSCALL_DEFINE2(clock_adjtime, const clockid_t, which_clock,
  * sys_clock_settime(). The kernel internal timekeeping is always using
  * nanoseconds precision independent of the clocksource device which is
  * used to read the time from. The resolution of that device only
- * affects the presicion of the time returned by sys_clock_gettime().
+ * affects the precision of the time returned by sys_clock_gettime().
  *
  * Returns:
  *  0		Success. @tp contains the resolution
diff --git a/kernel/time/tick-oneshot.c b/kernel/time/tick-oneshot.c
index 5e2c2c26b3cc..ffee943d796d 100644
--- a/kernel/time/tick-oneshot.c
+++ b/kernel/time/tick-oneshot.c
@@ -19,6 +19,10 @@
 
 /**
  * tick_program_event - program the CPU local timer device for the next event
+ * @expires: the time at which the next timer event should occur
+ * @force: flag to force reprogramming even if the event time hasn't changed
+ *
+ * Return: 0 on success, negative error code on failure
  */
 int tick_program_event(ktime_t expires, int force)
 {
@@ -57,6 +61,13 @@ void tick_resume_oneshot(void)
 
 /**
  * tick_setup_oneshot - setup the event device for oneshot mode (hres or nohz)
+ * @newdev: Pointer to the clock event device to configure
+ * @handler: Function to be called when the event device triggers an interrupt
+ * @next_event: Initial expiry time for the next event (in ktime)
+ *
+ * Configures the specified clock event device for oneshot mode,
+ * assigns the given handler as its event callback, and programs
+ * the device to trigger at the specified next event time.
  */
 void tick_setup_oneshot(struct clock_event_device *newdev,
 			void (*handler)(struct clock_event_device *),
@@ -69,6 +80,10 @@ void tick_setup_oneshot(struct clock_event_device *newdev,
 
 /**
  * tick_switch_to_oneshot - switch to oneshot mode
+ * @handler: function to call when an event occurs on the tick device
+ *
+ * Return: 0 on success, -EINVAL if the tick device is not present,
+ * not functional, or does not support oneshot mode.
  */
 int tick_switch_to_oneshot(void (*handler)(struct clock_event_device *))
 {
@@ -101,7 +116,7 @@ int tick_switch_to_oneshot(void (*handler)(struct clock_event_device *))
 /**
  * tick_oneshot_mode_active - check whether the system is in oneshot mode
  *
- * returns 1 when either nohz or highres are enabled. otherwise 0.
+ * Return: 1 when either nohz or highres are enabled, otherwise 0.
  */
 int tick_oneshot_mode_active(void)
 {
@@ -120,6 +135,9 @@ int tick_oneshot_mode_active(void)
  * tick_init_highres - switch to high resolution mode
  *
  * Called with interrupts disabled.
+ *
+ * Return: 0 on success, -EINVAL if the tick device cannot switch
+ * to oneshot/high-resolution mode.
  */
 int tick_init_highres(void)
 {
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index c527b421c865..3ff3eb1f90d0 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -201,6 +201,27 @@ static inline void tick_sched_flag_clear(struct tick_sched *ts,
 	ts->flags &= ~flag;
 }
 
+/*
+ * Allow only one non-timekeeper CPU at a time to update jiffies from
+ * the timer tick.
+ *
+ * Returns true if update was run.
+ */
+static bool tick_limited_update_jiffies64(struct tick_sched *ts, ktime_t now)
+{
+	static atomic_t in_progress;
+	int inp;
+
+	inp = atomic_read(&in_progress);
+	if (inp || !atomic_try_cmpxchg(&in_progress, &inp, 1))
+		return false;
+
+	if (ts->last_tick_jiffies == jiffies)
+		tick_do_update_jiffies64(now);
+	atomic_set(&in_progress, 0);
+	return true;
+}
+
 #define MAX_STALLED_JIFFIES 5
 
 static void tick_sched_do_timer(struct tick_sched *ts, ktime_t now)
@@ -239,10 +260,11 @@ static void tick_sched_do_timer(struct tick_sched *ts, ktime_t now)
 		ts->stalled_jiffies = 0;
 		ts->last_tick_jiffies = READ_ONCE(jiffies);
 	} else {
-		if (++ts->stalled_jiffies == MAX_STALLED_JIFFIES) {
-			tick_do_update_jiffies64(now);
-			ts->stalled_jiffies = 0;
-			ts->last_tick_jiffies = READ_ONCE(jiffies);
+		if (++ts->stalled_jiffies >= MAX_STALLED_JIFFIES) {
+			if (tick_limited_update_jiffies64(ts, now)) {
+				ts->stalled_jiffies = 0;
+				ts->last_tick_jiffies = READ_ONCE(jiffies);
+			}
 		}
 	}
 
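The hunk above is the thundering herd fix from the summary: instead of every
stalled CPU piling onto jiffies_lock, a single atomic flag elects one updater
and the rest back off after a plain read. A minimal, runnable userspace sketch
of the same election pattern using C11 atomics follows; thread count and all
names are illustrative, not taken from the kernel:

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>
	#include <pthread.h>

	static atomic_int in_progress;
	static _Atomic long fake_jiffies;

	/* Mirror of the kernel pattern: read first, so contended threads
	 * bail out with a plain load instead of hammering the cache line
	 * with read-modify-write operations. */
	static bool limited_update(void)
	{
		int inp = atomic_load(&in_progress);

		if (inp || !atomic_compare_exchange_strong(&in_progress, &inp, 1))
			return false;		/* someone else is updating */

		atomic_fetch_add(&fake_jiffies, 1);	/* stand-in for the jiffies update */
		atomic_store(&in_progress, 0);
		return true;
	}

	static void *cpu_thread(void *arg)
	{
		long wins = 0;

		for (int i = 0; i < 100000; i++)
			wins += limited_update();

		printf("thread %ld won %ld elections\n", (long)arg, wins);
		return NULL;
	}

	int main(void)
	{
		pthread_t t[4];

		for (long i = 0; i < 4; i++)
			pthread_create(&t[i], NULL, cpu_thread, (void *)i);
		for (int i = 0; i < 4; i++)
			pthread_join(t[i], NULL);
		printf("fake_jiffies = %ld\n", atomic_load(&fake_jiffies));
		return 0;
	}

Build with "cc -pthread". Each update still happens, but at most one thread
at a time pays for it; the others detect the in-flight update and retry on
their next tick, which is exactly the back-off the kernel change relies on.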
diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index c0c54dc5314c..18dda1aa782d 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 
 #include "timer_migration.h"
 #include "tick-internal.h"
@@ -420,14 +421,54 @@ static struct list_head *tmigr_level_list __read_mostly;
 static unsigned int tmigr_hierarchy_levels __read_mostly;
 static unsigned int tmigr_crossnode_level __read_mostly;
 
+static struct tmigr_group *tmigr_root;
+
 static DEFINE_PER_CPU(struct tmigr_cpu, tmigr_cpu);
 
+/*
+ * CPUs available for timer migration.
+ * Protected by cpuset_mutex (with cpus_read_lock held) or cpus_write_lock.
+ * Additionally tmigr_available_mutex serializes set/clear operations with each other.
+ */
+static cpumask_var_t tmigr_available_cpumask;
+static DEFINE_MUTEX(tmigr_available_mutex);
+
+/* Enabled during late initcall */
+static DEFINE_STATIC_KEY_FALSE(tmigr_exclude_isolated);
+
 #define TMIGR_NONE	0xFF
 #define BIT_CNT		8
 
 static inline bool tmigr_is_not_available(struct tmigr_cpu *tmc)
 {
-	return !(tmc->tmgroup && tmc->online);
+	return !(tmc->tmgroup && tmc->available);
+}
+
+/*
+ * Returns true if @cpu should be excluded from the hierarchy as isolated.
+ * Domain isolated CPUs don't participate in timer migration, nohz_full CPUs
+ * are still part of the hierarchy but become idle (from a tick and timer
+ * migration perspective) when they stop their tick. This lets the timekeeping
+ * CPU handle their global timers. Marking also isolated CPUs as idle would be
+ * too costly, hence they are completely excluded from the hierarchy.
+ * This check is necessary, for instance, to prevent offline isolated CPUs from
+ * being incorrectly marked as available once getting back online.
+ *
+ * This function returns false during early boot and the isolation logic is
+ * enabled only after isolated CPUs are marked as unavailable at late boot.
+ * The tick CPU can be isolated at boot, however we cannot mark it as
+ * unavailable to avoid having no global migrator for the nohz_full CPUs. This
+ * should be ensured by the callers of this function: implicitly from hotplug
+ * callbacks and explicitly in tmigr_init_isolation() and
+ * tmigr_isolated_exclude_cpumask().
+ */
+static inline bool tmigr_is_isolated(int cpu)
+{
+	if (!static_branch_unlikely(&tmigr_exclude_isolated))
+		return false;
+	return (!housekeeping_cpu(cpu, HK_TYPE_DOMAIN) ||
+		cpuset_cpu_is_isolated(cpu)) &&
+	       housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE);
 }
 
 /*
@@ -502,11 +543,6 @@ static bool tmigr_check_lonely(struct tmigr_group *group)
  * @now:	timer base monotonic
  * @check:	is set if there is the need to handle remote timers;
  *		required in tmigr_requires_handle_remote() only
- * @tmc_active:	this flag indicates, whether the CPU which triggers
- *		the hierarchy walk is !idle in the timer migration
- *		hierarchy. When the CPU is idle and the whole hierarchy is
- *		idle, only the first event of the top level has to be
- *		considered.
  */
 struct tmigr_walk {
 	u64			nextexp;
@@ -517,16 +553,13 @@ struct tmigr_walk {
 	unsigned long		basej;
 	u64			now;
 	bool			check;
-	bool			tmc_active;
 };
 
 typedef bool (*up_f)(struct tmigr_group *, struct tmigr_group *, struct tmigr_walk *);
 
-static void __walk_groups(up_f up, struct tmigr_walk *data,
-			  struct tmigr_cpu *tmc)
+static void __walk_groups_from(up_f up, struct tmigr_walk *data,
+			       struct tmigr_group *child, struct tmigr_group *group)
 {
-	struct tmigr_group *child = NULL, *group = tmc->tmgroup;
-
 	do {
 		WARN_ON_ONCE(group->level >= tmigr_hierarchy_levels);
 
@@ -544,6 +577,12 @@ static void __walk_groups(up_f up, struct tmigr_walk *data,
 	} while (group);
 }
 
+static void __walk_groups(up_f up, struct tmigr_walk *data,
+			  struct tmigr_cpu *tmc)
+{
+	__walk_groups_from(up, data, NULL, tmc->tmgroup);
+}
+
 static void walk_groups(up_f up, struct tmigr_walk *data, struct tmigr_cpu *tmc)
 {
 	lockdep_assert_held(&tmc->lock);
@@ -708,7 +747,7 @@ void tmigr_cpu_activate(void)
 /*
  * Returns true, if there is nothing to be propagated to the next level
  *
- * @data->firstexp is set to expiry of first gobal event of the (top level of
+ * @data->firstexp is set to expiry of first global event of the (top level of
  * the) hierarchy, but only when hierarchy is completely idle.
 *
 * The child and group states need to be read under the lock, to prevent a race
@@ -926,7 +965,7 @@ static void tmigr_handle_remote_cpu(unsigned int cpu, u64 now,
 	 * updated the event takes care when hierarchy is completely
 	 * idle. Otherwise the migrator does it as the event is enqueued.
 	 */
-	if (!tmc->online || tmc->remote || tmc->cpuevt.ignore ||
+	if (!tmc->available || tmc->remote || tmc->cpuevt.ignore ||
 	    now < tmc->cpuevt.nextevt.expires) {
 		raw_spin_unlock_irq(&tmc->lock);
 		return;
@@ -973,7 +1012,7 @@ static void tmigr_handle_remote_cpu(unsigned int cpu, u64 now,
 	 * (See also section "Required event and timerqueue update after a
 	 * remote expiry" in the documentation at the top)
 	 */
-	if (!tmc->online || !tmc->idle) {
+	if (!tmc->available || !tmc->idle) {
 		timer_unlock_remote_bases(cpu);
 		goto unlock;
 	}
@@ -1113,15 +1152,6 @@ static bool tmigr_requires_handle_remote_up(struct tmigr_group *group,
 	 */
 	if (!tmigr_check_migrator(group, childmask))
 		return true;
-
-	/*
-	 * When there is a parent group and the CPU which triggered the
-	 * hierarchy walk is not active, proceed the walk to reach the top level
-	 * group before reading the next_expiry value.
-	 */
-	if (group->parent && !data->tmc_active)
-		return false;
-
 	/*
 	 * The lock is required on 32bit architectures to read the variable
 	 * consistently with a concurrent writer. On 64bit the lock is not
@@ -1166,7 +1196,6 @@ bool tmigr_requires_handle_remote(void)
 	data.now = get_jiffies_update(&jif);
 	data.childmask = tmc->groupmask;
 	data.firstexp = KTIME_MAX;
-	data.tmc_active = !tmc->idle;
 	data.check = false;
 
 	/*
@@ -1432,38 +1461,43 @@ static long tmigr_trigger_active(void *unused)
 {
 	struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
 
-	WARN_ON_ONCE(!tmc->online || tmc->idle);
+	WARN_ON_ONCE(!tmc->available || tmc->idle);
 
 	return 0;
 }
 
-static int tmigr_cpu_offline(unsigned int cpu)
+static int tmigr_clear_cpu_available(unsigned int cpu)
 {
 	struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
 	int migrator;
 	u64 firstexp;
 
-	raw_spin_lock_irq(&tmc->lock);
-	tmc->online = false;
-	WRITE_ONCE(tmc->wakeup, KTIME_MAX);
+	guard(mutex)(&tmigr_available_mutex);
 
-	/*
-	 * CPU has to handle the local events on his own, when on the way to
-	 * offline; Therefore nextevt value is set to KTIME_MAX
-	 */
-	firstexp = __tmigr_cpu_deactivate(tmc, KTIME_MAX);
-	trace_tmigr_cpu_offline(tmc);
-	raw_spin_unlock_irq(&tmc->lock);
+	cpumask_clear_cpu(cpu, tmigr_available_cpumask);
+	scoped_guard(raw_spinlock_irq, &tmc->lock) {
+		if (!tmc->available)
+			return 0;
+		tmc->available = false;
+		WRITE_ONCE(tmc->wakeup, KTIME_MAX);
+
+		/*
+		 * CPU has to handle the local events on his own, when on the way to
+		 * offline; Therefore nextevt value is set to KTIME_MAX
+		 */
+		firstexp = __tmigr_cpu_deactivate(tmc, KTIME_MAX);
+		trace_tmigr_cpu_unavailable(tmc);
+	}
 
 	if (firstexp != KTIME_MAX) {
-		migrator = cpumask_any_but(cpu_online_mask, cpu);
+		migrator = cpumask_any(tmigr_available_cpumask);
 		work_on_cpu(migrator, tmigr_trigger_active, NULL);
 	}
 
 	return 0;
 }
 
-static int tmigr_cpu_online(unsigned int cpu)
+static int tmigr_set_cpu_available(unsigned int cpu)
 {
 	struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
 
@@ -1471,16 +1505,123 @@ static int tmigr_cpu_online(unsigned int cpu)
 	/* Check whether CPU data was successfully initialized */
 	if (WARN_ON_ONCE(!tmc->tmgroup))
 		return -EINVAL;
 
-	raw_spin_lock_irq(&tmc->lock);
-	trace_tmigr_cpu_online(tmc);
-	tmc->idle = timer_base_is_idle();
-	if (!tmc->idle)
-		__tmigr_cpu_activate(tmc);
-	tmc->online = true;
-	raw_spin_unlock_irq(&tmc->lock);
+	if (tmigr_is_isolated(cpu))
+		return 0;
+
+	guard(mutex)(&tmigr_available_mutex);
+
+	cpumask_set_cpu(cpu, tmigr_available_cpumask);
+	scoped_guard(raw_spinlock_irq, &tmc->lock) {
+		if (tmc->available)
+			return 0;
+		trace_tmigr_cpu_available(tmc);
+		tmc->idle = timer_base_is_idle();
+		if (!tmc->idle)
+			__tmigr_cpu_activate(tmc);
+		tmc->available = true;
+	}
 	return 0;
 }
 
+static void tmigr_cpu_isolate(struct work_struct *ignored)
+{
+	tmigr_clear_cpu_available(smp_processor_id());
+}
+
+static void tmigr_cpu_unisolate(struct work_struct *ignored)
+{
+	tmigr_set_cpu_available(smp_processor_id());
+}
+
+/**
+ * tmigr_isolated_exclude_cpumask - Exclude given CPUs from hierarchy
+ * @exclude_cpumask: the cpumask to be excluded from timer migration hierarchy
+ *
+ * This function can be called from cpuset code to provide the new set of
+ * isolated CPUs that should be excluded from the hierarchy.
+ * Online CPUs not present in exclude_cpumask but already excluded are brought
+ * back to the hierarchy.
+ * Functions to isolate/unisolate need to be called locally and can sleep.
+ */
+int tmigr_isolated_exclude_cpumask(struct cpumask *exclude_cpumask)
+{
+	struct work_struct __percpu *works __free(free_percpu) =
+			alloc_percpu(struct work_struct);
+	cpumask_var_t cpumask __free(free_cpumask_var) = CPUMASK_VAR_NULL;
+	int cpu;
+
+	lockdep_assert_cpus_held();
+
+	if (!works)
+		return -ENOMEM;
+	if (!alloc_cpumask_var(&cpumask, GFP_KERNEL))
+		return -ENOMEM;
+
+	/*
+	 * First set previously isolated CPUs as available (unisolate).
+	 * This cpumask contains only CPUs that switched to available now.
+	 */
+	cpumask_andnot(cpumask, cpu_online_mask, exclude_cpumask);
+	cpumask_andnot(cpumask, cpumask, tmigr_available_cpumask);
+
+	for_each_cpu(cpu, cpumask) {
+		struct work_struct *work = per_cpu_ptr(works, cpu);
+
+		INIT_WORK(work, tmigr_cpu_unisolate);
+		schedule_work_on(cpu, work);
+	}
+	for_each_cpu(cpu, cpumask)
+		flush_work(per_cpu_ptr(works, cpu));
+
+	/*
+	 * Then clear previously available CPUs (isolate).
+	 * This cpumask contains only CPUs that switched to not available now.
+	 * There cannot be overlap with the newly available ones.
+	 */
+	cpumask_and(cpumask, exclude_cpumask, tmigr_available_cpumask);
+	cpumask_and(cpumask, cpumask, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE));
+	/*
+	 * Handle this here and not in the cpuset code because exclude_cpumask
+	 * might include also the tick CPU if included in isolcpus.
+	 */
+	for_each_cpu(cpu, cpumask) {
+		if (!tick_nohz_cpu_hotpluggable(cpu)) {
+			cpumask_clear_cpu(cpu, cpumask);
+			break;
+		}
+	}
+
+	for_each_cpu(cpu, cpumask) {
+		struct work_struct *work = per_cpu_ptr(works, cpu);
+
+		INIT_WORK(work, tmigr_cpu_isolate);
+		schedule_work_on(cpu, work);
+	}
+	for_each_cpu(cpu, cpumask)
+		flush_work(per_cpu_ptr(works, cpu));
+
+	return 0;
+}
+
+static int __init tmigr_init_isolation(void)
+{
+	cpumask_var_t cpumask __free(free_cpumask_var) = CPUMASK_VAR_NULL;
+
+	static_branch_enable(&tmigr_exclude_isolated);
+
+	if (!housekeeping_enabled(HK_TYPE_DOMAIN))
+		return 0;
+	if (!alloc_cpumask_var(&cpumask, GFP_KERNEL))
+		return -ENOMEM;
+
+	cpumask_andnot(cpumask, cpu_possible_mask, housekeeping_cpumask(HK_TYPE_DOMAIN));
+
+	/* Protect against RCU torture hotplug testing */
+	guard(cpus_read_lock)();
+	return tmigr_isolated_exclude_cpumask(cpumask);
+}
+late_initcall(tmigr_init_isolation);
+
 static void tmigr_init_group(struct tmigr_group *group, unsigned int lvl,
 			     int node)
 {
@@ -1498,21 +1639,6 @@ static void tmigr_init_group(struct tmigr_group *group, unsigned int lvl,
 	s.seq = 0;
 	atomic_set(&group->migr_state, s.state);
 
-	/*
-	 * If this is a new top-level, prepare its groupmask in advance.
-	 * This avoids accidents where yet another new top-level is
-	 * created in the future and made visible before the current groupmask.
-	 */
-	if (list_empty(&tmigr_level_list[lvl])) {
-		group->groupmask = BIT(0);
-		/*
-		 * The previous top level has prepared its groupmask already,
-		 * simply account it as the first child.
-		 */
-		if (lvl > 0)
-			group->num_children = 1;
-	}
-
 	timerqueue_init_head(&group->events);
 	timerqueue_init(&group->groupevt.nextevt);
 	group->groupevt.nextevt.expires = KTIME_MAX;
@@ -1520,8 +1646,7 @@ static void tmigr_init_group(struct tmigr_group *group, unsigned int lvl,
 	group->groupevt.ignore = true;
 }
 
-static struct tmigr_group *tmigr_get_group(unsigned int cpu, int node,
-					   unsigned int lvl)
+static struct tmigr_group *tmigr_get_group(int node, unsigned int lvl)
 {
 	struct tmigr_group *tmp, *group = NULL;
 
@@ -1567,25 +1692,51 @@ static struct tmigr_group *tmigr_get_group(unsigned int cpu, int node,
 	return group;
 }
 
+static bool tmigr_init_root(struct tmigr_group *group, bool activate)
+{
+	if (!group->parent && group != tmigr_root) {
+		/*
+		 * This is the new top-level, prepare its groupmask in advance
+		 * to avoid accidents where yet another new top-level is
+		 * created in the future and made visible before this groupmask.
+		 */
+		group->groupmask = BIT(0);
+		WARN_ON_ONCE(activate);
+
+		return true;
+	}
+
+	return false;
+}
+
 static void tmigr_connect_child_parent(struct tmigr_group *child,
 				       struct tmigr_group *parent,
 				       bool activate)
 {
-	struct tmigr_walk data;
-
-	raw_spin_lock_irq(&child->lock);
-	raw_spin_lock_nested(&parent->lock, SINGLE_DEPTH_NESTING);
+	if (tmigr_init_root(parent, activate)) {
+		/*
+		 * The previous top level had prepared its groupmask already,
+		 * simply account it in advance as the first child. If some groups
+		 * have been created between the old and new root due to node
+		 * mismatch, the new root's child will be initialized accordingly.
+		 */
+		parent->num_children = 1;
+	}
 
-	if (activate) {
+	/* Connecting old root to new root ? */
+	if (!parent->parent && activate) {
 		/*
-		 * @child is the old top and @parent the new one. In this
-		 * case groupmask is pre-initialized and @child already
-		 * accounted, along with its new sibling corresponding to the
-		 * CPU going up.
+		 * @child is the old top, or in case of node mismatch, some
+		 * intermediate group between the old top and the new one in
+		 * @parent. In this case the @child must be pre-accounted above
+		 * as the first child. Its new inactive sibling corresponding
+		 * to the CPU going up has been accounted as the second child.
 		 */
-		WARN_ON_ONCE(child->groupmask != BIT(0) || parent->num_children != 2);
+		WARN_ON_ONCE(parent->num_children != 2);
+		child->groupmask = BIT(0);
 	} else {
-		/* Adding @child for the CPU going up to @parent. */
+		/* Common case adding @child for the CPU going up to @parent. */
 		child->groupmask = BIT(parent->num_children++);
 	}
 
@@ -1596,87 +1747,61 @@ static void tmigr_connect_child_parent(struct tmigr_group *child,
 	 */
 	smp_store_release(&child->parent, parent);
 
-	raw_spin_unlock(&parent->lock);
-	raw_spin_unlock_irq(&child->lock);
-
 	trace_tmigr_connect_child_parent(child);
-
-	if (!activate)
-		return;
-
-	/*
-	 * To prevent inconsistent states, active children need to be active in
-	 * the new parent as well. Inactive children are already marked inactive
-	 * in the parent group:
-	 *
-	 * * When new groups were created by tmigr_setup_groups() starting from
-	 *   the lowest level (and not higher then one level below the current
-	 *   top level), then they are not active. They will be set active when
-	 *   the new online CPU comes active.
-	 *
-	 * * But if a new group above the current top level is required, it is
-	 *   mandatory to propagate the active state of the already existing
-	 *   child to the new parent.
-	 *   So tmigr_connect_child_parent() is
-	 *   executed with the formerly top level group (child) and the newly
-	 *   created group (parent).
-	 *
-	 * * It is ensured that the child is active, as this setup path is
-	 *   executed in hotplug prepare callback. This is exectued by an
-	 *   already connected and !idle CPU. Even if all other CPUs go idle,
-	 *   the CPU executing the setup will be responsible up to current top
-	 *   level group. And the next time it goes inactive, it will release
-	 *   the new childmask and parent to subsequent walkers through this
-	 *   @child. Therefore propagate active state unconditionally.
-	 */
-	data.childmask = child->groupmask;
-
-	/*
-	 * There is only one new level per time (which is protected by
-	 * tmigr_mutex). When connecting the child and the parent and set the
-	 * child active when the parent is inactive, the parent needs to be the
-	 * uppermost level. Otherwise there went something wrong!
-	 */
-	WARN_ON(!tmigr_active_up(parent, child, &data) && parent->parent);
 }
 
-static int tmigr_setup_groups(unsigned int cpu, unsigned int node)
+static int tmigr_setup_groups(unsigned int cpu, unsigned int node,
+			      struct tmigr_group *start, bool activate)
 {
 	struct tmigr_group *group, *child, **stack;
-	int top = 0, err = 0, i = 0;
-	struct list_head *lvllist;
+	int i, top = 0, err = 0, start_lvl = 0;
+	bool root_mismatch = false;
 
 	stack = kcalloc(tmigr_hierarchy_levels, sizeof(*stack), GFP_KERNEL);
 	if (!stack)
 		return -ENOMEM;
 
-	do {
-		group = tmigr_get_group(cpu, node, i);
+	if (start) {
+		stack[start->level] = start;
+		start_lvl = start->level + 1;
+	}
+
+	if (tmigr_root)
+		root_mismatch = tmigr_root->numa_node != node;
+
+	for (i = start_lvl; i < tmigr_hierarchy_levels; i++) {
+		group = tmigr_get_group(node, i);
 		if (IS_ERR(group)) {
 			err = PTR_ERR(group);
+			i--;
 			break;
 		}
 
 		top = i;
-		stack[i++] = group;
+		stack[i] = group;
 
 		/*
 		 * When booting only less CPUs of a system than CPUs are
-		 * available, not all calculated hierarchy levels are required.
+		 * available, not all calculated hierarchy levels are required,
+		 * unless a node mismatch is detected.
 		 *
 		 * The loop is aborted as soon as the highest level, which might
 		 * be different from tmigr_hierarchy_levels, contains only a
-		 * single group.
+		 * single group, unless the nodes mismatch below tmigr_crossnode_level
 		 */
-		if (group->parent || list_is_singular(&tmigr_level_list[i - 1]))
-			break;
+		if (group->parent)
+			break;
+		if ((!root_mismatch || i >= tmigr_crossnode_level) &&
+		    list_is_singular(&tmigr_level_list[i]))
+			break;
+	}
 
-	} while (i < tmigr_hierarchy_levels);
-
-	/* Assert single root */
-	WARN_ON_ONCE(!err && !group->parent && !list_is_singular(&tmigr_level_list[top]));
-
-	while (i > 0) {
-		group = stack[--i];
+	/* Assert single root without parent */
+	if (WARN_ON_ONCE(i >= tmigr_hierarchy_levels))
+		return -EINVAL;
+
+	for (; i >= start_lvl; i--) {
+		group = stack[i];
 
 		if (err < 0) {
 			list_del(&group->list);
@@ -1692,12 +1817,10 @@ static int tmigr_setup_groups(unsigned int cpu, unsigned int node)
 		if (i == 0) {
 			struct tmigr_cpu *tmc = per_cpu_ptr(&tmigr_cpu, cpu);
 
-			raw_spin_lock_irq(&group->lock);
-
 			tmc->tmgroup = group;
 			tmc->groupmask = BIT(group->num_children++);
 
-			raw_spin_unlock_irq(&group->lock);
+			tmigr_init_root(group, activate);
 
 			trace_tmigr_connect_cpu_parent(tmc);
 
@@ -1705,42 +1828,58 @@ static int tmigr_setup_groups(unsigned int cpu, unsigned int node)
 			continue;
 		} else {
 			child = stack[i - 1];
-			/* Will be activated at online time */
-			tmigr_connect_child_parent(child, group, false);
+			tmigr_connect_child_parent(child, group, activate);
 		}
+	}
 
-		/* check if uppermost level was newly created */
-		if (top != i)
-			continue;
-
-		WARN_ON_ONCE(top == 0);
+	if (err < 0)
+		goto out;
 
-		lvllist = &tmigr_level_list[top];
+	if (activate) {
+		struct tmigr_walk data;
+		union tmigr_state state;
 
 		/*
-		 * Newly created root level should have accounted the upcoming
-		 * CPU's child group and pre-accounted the old root.
+		 * To prevent inconsistent states, active children need to be active in
+		 * the new parent as well. Inactive children are already marked inactive
+		 * in the parent group:
+		 *
+		 * * When new groups were created by tmigr_setup_groups() starting from
+		 *   the lowest level, then they are not active. They will be set active
+		 *   when the new online CPU comes active.
+		 *
+		 * * But if new groups above the current top level are required, it is
+		 *   mandatory to propagate the active state of the already existing
+		 *   child to the new parents. So tmigr_active_up() activates the
+		 *   new parents while walking up from the old root to the new.
+		 *
+		 * * It is ensured that @start is active, as this setup path is
+		 *   executed in hotplug prepare callback. This is executed by an
+		 *   already connected and !idle CPU. Even if all other CPUs go idle,
+		 *   the CPU executing the setup will be responsible up to current top
+		 *   level group. And the next time it goes inactive, it will release
+		 *   the new childmask and parent to subsequent walkers through this
+		 *   @child. Therefore propagate active state unconditionally.
 		 */
-		if (group->num_children == 2 && list_is_singular(lvllist)) {
-			/*
-			 * The target CPU must never do the prepare work, except
-			 * on early boot when the boot CPU is the target. Otherwise
-			 * it may spuriously activate the old top level group inside
-			 * the new one (nevertheless whether old top level group is
-			 * active or not) and/or release an uninitialized childmask.
-			 */
-			WARN_ON_ONCE(cpu == raw_smp_processor_id());
-
-			lvllist = &tmigr_level_list[top - 1];
-			list_for_each_entry(child, lvllist, list) {
-				if (child->parent)
-					continue;
+		state.state = atomic_read(&start->migr_state);
+		WARN_ON_ONCE(!state.active);
+		WARN_ON_ONCE(!start->parent);
+		data.childmask = start->groupmask;
+		__walk_groups_from(tmigr_active_up, &data, start, start->parent);
+	}
 
-				tmigr_connect_child_parent(child, group, true);
-			}
+	/* Root update */
+	if (list_is_singular(&tmigr_level_list[top])) {
+		group = list_first_entry(&tmigr_level_list[top],
+					 typeof(*group), list);
+		WARN_ON_ONCE(group->parent);
+		if (tmigr_root) {
+			/* Old root should be the same or below */
+			WARN_ON_ONCE(tmigr_root->level > top);
 		}
+		tmigr_root = group;
 	}
-
+out:
 	kfree(stack);
 
 	return err;
@@ -1748,12 +1887,31 @@ static int tmigr_setup_groups(unsigned int cpu, unsigned int node)
 
 static int tmigr_add_cpu(unsigned int cpu)
 {
+	struct tmigr_group *old_root = tmigr_root;
 	int node = cpu_to_node(cpu);
 	int ret;
 
-	mutex_lock(&tmigr_mutex);
-	ret = tmigr_setup_groups(cpu, node);
-	mutex_unlock(&tmigr_mutex);
+	guard(mutex)(&tmigr_mutex);
+
+	ret = tmigr_setup_groups(cpu, node, NULL, false);
+
+	/* Root has changed? Connect the old one to the new */
+	if (ret >= 0 && old_root && old_root != tmigr_root) {
+		/*
+		 * The target CPU must never do the prepare work, except
+		 * on early boot when the boot CPU is the target. Otherwise
+		 * it may spuriously activate the old top level group inside
+		 * the new one (nevertheless whether old top level group is
+		 * active or not) and/or release an uninitialized childmask.
+		 */
+		WARN_ON_ONCE(cpu == raw_smp_processor_id());
+		/*
+		 * The (likely) current CPU is expected to be online in the hierarchy,
+		 * otherwise the old root may not be active as expected.
+		 */
+		WARN_ON_ONCE(!per_cpu_ptr(&tmigr_cpu, raw_smp_processor_id())->available);
+		ret = tmigr_setup_groups(-1, old_root->numa_node, old_root, true);
+	}
 
 	return ret;
 }
@@ -1798,6 +1956,11 @@ static int __init tmigr_init(void)
 	if (ncpus == 1)
 		return 0;
 
+	if (!zalloc_cpumask_var(&tmigr_available_cpumask, GFP_KERNEL)) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
 	/*
 	 * Calculate the required hierarchy levels. Unfortunately there is no
Unfortunately there is no * reliable information available, unless all possible CPUs have been @@ -1847,7 +2010,7 @@ static int __init tmigr_init(void) goto err; =20 ret =3D cpuhp_setup_state(CPUHP_AP_TMIGR_ONLINE, "tmigr:online", - tmigr_cpu_online, tmigr_cpu_offline); + tmigr_set_cpu_available, tmigr_clear_cpu_available); if (ret) goto err; =20 diff --git a/kernel/time/timer_migration.h b/kernel/time/timer_migration.h index ae19f70f8170..70879cde6fdd 100644 --- a/kernel/time/timer_migration.h +++ b/kernel/time/timer_migration.h @@ -97,7 +97,7 @@ struct tmigr_group { */ struct tmigr_cpu { raw_spinlock_t lock; - bool online; + bool available; bool idle; bool remote; struct tmigr_group *tmgroup; diff --git a/tools/testing/selftests/timers/nanosleep.c b/tools/testing/sel= ftests/timers/nanosleep.c index 252c6308c569..10badae13ebe 100644 --- a/tools/testing/selftests/timers/nanosleep.c +++ b/tools/testing/selftests/timers/nanosleep.c @@ -116,6 +116,56 @@ int nanosleep_test(int clockid, long long ns) return 0; } =20 +static void dummy_event_handler(int val) +{ + /* No action needed */ +} + +static int nanosleep_test_remaining(int clockid) +{ + struct timespec rqtp =3D {}, rmtp =3D {}; + struct itimerspec itimer =3D {}; + struct sigaction sa =3D {}; + timer_t timer; + int ret; + + sa.sa_handler =3D dummy_event_handler; + ret =3D sigaction(SIGALRM, &sa, NULL); + if (ret) + return -1; + + ret =3D timer_create(clockid, NULL, &timer); + if (ret) + return -1; + + itimer.it_value.tv_nsec =3D NSEC_PER_SEC / 4; + ret =3D timer_settime(timer, 0, &itimer, NULL); + if (ret) + return -1; + + rqtp.tv_nsec =3D NSEC_PER_SEC / 2; + ret =3D clock_nanosleep(clockid, 0, &rqtp, &rmtp); + if (ret !=3D EINTR) + return -1; + + ret =3D timer_delete(timer); + if (ret) + return -1; + + sa.sa_handler =3D SIG_DFL; + ret =3D sigaction(SIGALRM, &sa, NULL); + if (ret) + return -1; + + if (!in_order((struct timespec) {}, rmtp)) + return -1; + + if (!in_order(rmtp, rqtp)) + return -1; + + return 0; +} + int main(int argc, char **argv) { long long length; @@ -150,6 +200,11 @@ int main(int argc, char **argv) } length *=3D 100; } + ret =3D nanosleep_test_remaining(clockid); + if (ret < 0) { + ksft_test_result_fail("%-31s\n", clockstring(clockid)); + ksft_exit_fail(); + } ksft_test_result_pass("%-31s\n", clockstring(clockid)); next: ret =3D 0; diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/= selftests/timers/posix_timers.c index f0eceb0faf34..a563c438ac79 100644 --- a/tools/testing/selftests/timers/posix_timers.c +++ b/tools/testing/selftests/timers/posix_timers.c @@ -18,6 +18,7 @@ #include #include #include +#include =20 #include "../kselftest.h" =20 @@ -670,8 +671,14 @@ static void check_timer_create_exact(void) =20 int main(int argc, char **argv) { + bool run_sig_ign_tests =3D ksft_min_kernel_version(6, 13); + ksft_print_header(); - ksft_set_plan(19); + if (run_sig_ign_tests) { + ksft_set_plan(19); + } else { + ksft_set_plan(10); + } =20 ksft_print_msg("Testing posix timers. 
diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index f0eceb0faf34..a563c438ac79 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 
 #include "../kselftest.h"
 
@@ -670,8 +671,14 @@ static void check_timer_create_exact(void)
 
 int main(int argc, char **argv)
 {
+	bool run_sig_ign_tests = ksft_min_kernel_version(6, 13);
+
 	ksft_print_header();
-	ksft_set_plan(19);
+	if (run_sig_ign_tests) {
+		ksft_set_plan(19);
+	} else {
+		ksft_set_plan(10);
+	}
 
 	ksft_print_msg("Testing posix timers. False negative may happen on CPU execution \n");
 	ksft_print_msg("based timers if other threads run on the CPU...\n");
@@ -695,15 +702,20 @@ int main(int argc, char **argv)
 	check_timer_create(CLOCK_PROCESS_CPUTIME_ID, "CLOCK_PROCESS_CPUTIME_ID");
 	check_timer_distribution();
 
-	check_sig_ign(0);
-	check_sig_ign(1);
-	check_rearm();
-	check_delete();
-	check_sigev_none(CLOCK_MONOTONIC, "CLOCK_MONOTONIC");
-	check_sigev_none(CLOCK_PROCESS_CPUTIME_ID, "CLOCK_PROCESS_CPUTIME_ID");
-	check_gettime(CLOCK_MONOTONIC, "CLOCK_MONOTONIC");
-	check_gettime(CLOCK_PROCESS_CPUTIME_ID, "CLOCK_PROCESS_CPUTIME_ID");
-	check_gettime(CLOCK_THREAD_CPUTIME_ID, "CLOCK_THREAD_CPUTIME_ID");
+	if (run_sig_ign_tests) {
+		check_sig_ign(0);
+		check_sig_ign(1);
+		check_rearm();
+		check_delete();
+		check_sigev_none(CLOCK_MONOTONIC, "CLOCK_MONOTONIC");
+		check_sigev_none(CLOCK_PROCESS_CPUTIME_ID, "CLOCK_PROCESS_CPUTIME_ID");
+		check_gettime(CLOCK_MONOTONIC, "CLOCK_MONOTONIC");
+		check_gettime(CLOCK_PROCESS_CPUTIME_ID, "CLOCK_PROCESS_CPUTIME_ID");
+		check_gettime(CLOCK_THREAD_CPUTIME_ID, "CLOCK_THREAD_CPUTIME_ID");
+	} else {
+		ksft_print_msg("Skipping SIG_IGN tests on kernel < 6.13\n");
+	}
+
 	check_overrun(CLOCK_MONOTONIC, "CLOCK_MONOTONIC");
 	check_overrun(CLOCK_PROCESS_CPUTIME_ID, "CLOCK_PROCESS_CPUTIME_ID");
 	check_overrun(CLOCK_THREAD_CPUTIME_ID, "CLOCK_THREAD_CPUTIME_ID");