[PATCH] timers: Exclude isolated cpus from timer migration

Gabriele Monaco posted 1 patch 8 months, 1 week ago
There is a newer version of this series
[PATCH] timers: Exclude isolated cpus from timer migration
Posted by Gabriele Monaco 8 months, 1 week ago
The timer migration mechanism allows timers to move from idle CPUs to
active ones to improve the overall idle time. This is, however, undesired
when CPU-intensive workloads run on isolated cores, as the algorithm
would move the timers from housekeeping to isolated cores, negatively
affecting the isolation.

Exclude isolated cores from the timer migration algorithm by extending
the concept of unavailable cores, currently used for offline ones, to
isolated ones:
 * A core is unavailable if isolated or offline;
 * A core is available if neither isolated nor offline;

Keep a cpumask to easily track unavailable cores and change the concept
of online/offline tmigr to available/unavailable in code and
tracepoints.

A core is considered unavailable if:
 * it is in the isolcpus list
 * it is in the nohz_full list
 * it is in an isolated cpuset
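
For reference, these are the cases covered by the cpu_is_isolated()
helper used in the patch below. A simplified sketch of that helper, as
defined in include/linux/sched/isolation.h at the time of writing
(paraphrased, treat as illustrative):

	static inline bool cpu_is_isolated(int cpu)
	{
		/* isolcpus= boot parameter (domain isolation) */
		return !housekeeping_test_cpu(cpu, HK_TYPE_DOMAIN) ||
		       /* nohz_full= boot parameter (tick isolation) */
		       !housekeeping_test_cpu(cpu, HK_TYPE_TICK) ||
		       /* cpuset v2 isolated partitions */
		       cpuset_cpu_is_isolated(cpu);
	}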

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 include/linux/timer.h                  |  6 +++
 include/trace/events/timer_migration.h |  4 +-
 kernel/cgroup/cpuset.c                 | 15 +++---
 kernel/time/timer_migration.c          | 72 +++++++++++++++++++++-----
 kernel/time/timer_migration.h          |  2 +-
 5 files changed, 76 insertions(+), 23 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 10596d7c3a346..27fb02aa3d780 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -190,4 +190,10 @@ int timers_dead_cpu(unsigned int cpu);
 #define timers_dead_cpu		NULL
 #endif
 
+#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
+extern int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask);
+#else
+static inline int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask) { }
+#endif
+
 #endif
diff --git a/include/trace/events/timer_migration.h b/include/trace/events/timer_migration.h
index 47db5eaf2f9ab..61171b13c687c 100644
--- a/include/trace/events/timer_migration.h
+++ b/include/trace/events/timer_migration.h
@@ -173,14 +173,14 @@ DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_active,
 	TP_ARGS(tmc)
 );
 
-DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_online,
+DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_available,
 
 	TP_PROTO(struct tmigr_cpu *tmc),
 
 	TP_ARGS(tmc)
 );
 
-DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_offline,
+DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_unavailable,
 
 	TP_PROTO(struct tmigr_cpu *tmc),
 
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 306b604300914..47495ba4012b5 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1323,7 +1323,7 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
 	return isolcpus_updated;
 }
 
-static void update_unbound_workqueue_cpumask(bool isolcpus_updated)
+static void update_unbound_interference_cpumask(bool isolcpus_updated)
 {
 	int ret;
 
@@ -1334,6 +1334,9 @@ static void update_unbound_workqueue_cpumask(bool isolcpus_updated)
 
 	ret = workqueue_unbound_exclude_cpumask(isolated_cpus);
 	WARN_ON_ONCE(ret < 0);
+
+	ret = tmigr_isolated_exclude_cpumask(isolated_cpus);
+	WARN_ON_ONCE(ret < 0);
 }
 
 /**
@@ -1454,7 +1457,7 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
 	list_add(&cs->remote_sibling, &remote_children);
 	cpumask_copy(cs->effective_xcpus, tmp->new_cpus);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_unbound_interference_cpumask(isolcpus_updated);
 	cpuset_force_rebuild();
 	cs->prs_err = 0;
 
@@ -1495,7 +1498,7 @@ static void remote_partition_disable(struct cpuset *cs, struct tmpmasks *tmp)
 	compute_effective_exclusive_cpumask(cs, NULL, NULL);
 	reset_partition_data(cs);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_unbound_interference_cpumask(isolcpus_updated);
 	cpuset_force_rebuild();
 
 	/*
@@ -1563,7 +1566,7 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
 	if (xcpus)
 		cpumask_copy(cs->exclusive_cpus, xcpus);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_unbound_interference_cpumask(isolcpus_updated);
 	if (adding || deleting)
 		cpuset_force_rebuild();
 
@@ -1906,7 +1909,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 		WARN_ON_ONCE(parent->nr_subparts < 0);
 	}
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_unbound_interference_cpumask(isolcpus_updated);
 
 	if ((old_prs != new_prs) && (cmd == partcmd_update))
 		update_partition_exclusive_flag(cs, new_prs);
@@ -2931,7 +2934,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 	else if (isolcpus_updated)
 		isolated_cpus_update(old_prs, new_prs, cs->effective_xcpus);
 	spin_unlock_irq(&callback_lock);
-	update_unbound_workqueue_cpumask(isolcpus_updated);
+	update_unbound_interference_cpumask(isolcpus_updated);
 
 	/* Force update if switching back to member & update effective_xcpus */
 	update_cpumasks_hier(cs, &tmpmask, !new_prs);
diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index 2f6330831f084..c1f7de1d28ab8 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -10,6 +10,7 @@
 #include <linux/spinlock.h>
 #include <linux/timerqueue.h>
 #include <trace/events/ipi.h>
+#include <linux/sched/isolation.h>
 
 #include "timer_migration.h"
 #include "tick-internal.h"
@@ -422,12 +423,15 @@ static unsigned int tmigr_crossnode_level __read_mostly;
 
 static DEFINE_PER_CPU(struct tmigr_cpu, tmigr_cpu);
 
+/*  cpumask excluded from migration */
+static cpumask_var_t tmigr_unavailable_cpumask;
+
 #define TMIGR_NONE	0xFF
 #define BIT_CNT		8
 
 static inline bool tmigr_is_not_available(struct tmigr_cpu *tmc)
 {
-	return !(tmc->tmgroup && tmc->online);
+	return !(tmc->tmgroup && tmc->available);
 }
 
 /*
@@ -926,7 +930,7 @@ static void tmigr_handle_remote_cpu(unsigned int cpu, u64 now,
 	 * updated the event takes care when hierarchy is completely
 	 * idle. Otherwise the migrator does it as the event is enqueued.
 	 */
-	if (!tmc->online || tmc->remote || tmc->cpuevt.ignore ||
+	if (!tmc->available || tmc->remote || tmc->cpuevt.ignore ||
 	    now < tmc->cpuevt.nextevt.expires) {
 		raw_spin_unlock_irq(&tmc->lock);
 		return;
@@ -973,7 +977,7 @@ static void tmigr_handle_remote_cpu(unsigned int cpu, u64 now,
 	 * (See also section "Required event and timerqueue update after a
 	 * remote expiry" in the documentation at the top)
 	 */
-	if (!tmc->online || !tmc->idle) {
+	if (!tmc->available || !tmc->idle) {
 		timer_unlock_remote_bases(cpu);
 		goto unlock;
 	}
@@ -1435,55 +1439,88 @@ static long tmigr_trigger_active(void *unused)
 {
 	struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
 
-	WARN_ON_ONCE(!tmc->online || tmc->idle);
+	WARN_ON_ONCE(!tmc->available || tmc->idle);
 
 	return 0;
 }
 
-static int tmigr_cpu_offline(unsigned int cpu)
+static int tmigr_cpu_unavailable(unsigned int cpu)
 {
-	struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
+	struct tmigr_cpu *tmc = per_cpu_ptr(&tmigr_cpu, cpu);
 	int migrator;
 	u64 firstexp;
 
 	raw_spin_lock_irq(&tmc->lock);
-	tmc->online = false;
+	tmc->available = false;
 	WRITE_ONCE(tmc->wakeup, KTIME_MAX);
+	cpumask_set_cpu(cpu, tmigr_unavailable_cpumask);
 
 	/*
 	 * CPU has to handle the local events on his own, when on the way to
 	 * offline; Therefore nextevt value is set to KTIME_MAX
 	 */
 	firstexp = __tmigr_cpu_deactivate(tmc, KTIME_MAX);
-	trace_tmigr_cpu_offline(tmc);
+	trace_tmigr_cpu_unavailable(tmc);
 	raw_spin_unlock_irq(&tmc->lock);
 
 	if (firstexp != KTIME_MAX) {
-		migrator = cpumask_any_but(cpu_online_mask, cpu);
+		migrator = cpumask_nth_andnot(0, cpu_possible_mask,
+					      tmigr_unavailable_cpumask);
+		/* Fall back to any online in case all are isolated. */
+		if (migrator >= nr_cpu_ids)
+			migrator = cpumask_any_but(cpu_online_mask, cpu);
 		work_on_cpu(migrator, tmigr_trigger_active, NULL);
 	}
 
 	return 0;
 }
 
-static int tmigr_cpu_online(unsigned int cpu)
+static int tmigr_cpu_available(unsigned int cpu)
 {
-	struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
+	struct tmigr_cpu *tmc = per_cpu_ptr(&tmigr_cpu, cpu);
 
 	/* Check whether CPU data was successfully initialized */
 	if (WARN_ON_ONCE(!tmc->tmgroup))
 		return -EINVAL;
 
+	/* Silently ignore online requests if isolated */
+	if (cpu_is_isolated(cpu))
+		return 0;
 	raw_spin_lock_irq(&tmc->lock);
-	trace_tmigr_cpu_online(tmc);
+	trace_tmigr_cpu_available(tmc);
 	tmc->idle = timer_base_is_idle();
 	if (!tmc->idle)
 		__tmigr_cpu_activate(tmc);
-	tmc->online = true;
+	tmc->available = true;
+	tmc->idle = true;
+	cpumask_clear_cpu(cpu, tmigr_unavailable_cpumask);
 	raw_spin_unlock_irq(&tmc->lock);
 	return 0;
 }
 
+int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask)
+{
+	int cpu;
+	cpumask_var_t cpumask;
+
+	if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL))
+		return -ENOMEM;
+
+	cpumask_copy(cpumask, tmigr_unavailable_cpumask);
+
+	/* Was not excluded but should be excluded now. */
+	for_each_cpu_andnot(cpu, exclude_cpumask, cpumask)
+		tmigr_cpu_unavailable(cpu);
+
+	/* Was excluded but should be included now */
+	for_each_cpu_andnot(cpu, cpumask, exclude_cpumask)
+		if (cpu_online(cpu))
+			tmigr_cpu_available(cpu);
+
+	free_cpumask_var(cpumask);
+	return 0;
+}
+
 static void tmigr_init_group(struct tmigr_group *group, unsigned int lvl,
 			     int node)
 {
@@ -1801,6 +1838,13 @@ static int __init tmigr_init(void)
 	if (ncpus == 1)
 		return 0;
 
+	if (!zalloc_cpumask_var(&tmigr_unavailable_cpumask, GFP_KERNEL)) {
+		ret = -ENOMEM;
+		goto err;
+	}
+	/* Will be set by tmigr_cpu_available */
+	cpumask_setall(tmigr_unavailable_cpumask);
+
 	/*
 	 * Calculate the required hierarchy levels. Unfortunately there is no
 	 * reliable information available, unless all possible CPUs have been
@@ -1850,7 +1894,7 @@ static int __init tmigr_init(void)
 		goto err;
 
 	ret = cpuhp_setup_state(CPUHP_AP_TMIGR_ONLINE, "tmigr:online",
-				tmigr_cpu_online, tmigr_cpu_offline);
+				tmigr_cpu_available, tmigr_cpu_unavailable);
 	if (ret)
 		goto err;
 
diff --git a/kernel/time/timer_migration.h b/kernel/time/timer_migration.h
index ae19f70f8170f..70879cde6fdd0 100644
--- a/kernel/time/timer_migration.h
+++ b/kernel/time/timer_migration.h
@@ -97,7 +97,7 @@ struct tmigr_group {
  */
 struct tmigr_cpu {
 	raw_spinlock_t		lock;
-	bool			online;
+	bool			available;
 	bool			idle;
 	bool			remote;
 	struct tmigr_group	*tmgroup;

base-commit: 3b07108ada81a8ebcebf1fe61367b4e436c895bd
-- 
2.49.0
Re: [PATCH] timers: Exclude isolated cpus from timer migration
Posted by kernel test robot 8 months, 1 week ago
Hi Gabriele,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 3b07108ada81a8ebcebf1fe61367b4e436c895bd]

url:    https://github.com/intel-lab-lkp/linux/commits/Gabriele-Monaco/timers-Exclude-isolated-cpus-from-timer-migation/20250410-145629
base:   3b07108ada81a8ebcebf1fe61367b4e436c895bd
patch link:    https://lore.kernel.org/r/20250410065446.57304-2-gmonaco%40redhat.com
patch subject: [PATCH] timers: Exclude isolated cpus from timer migration
config: hexagon-randconfig-002-20250411 (https://download.01.org/0day-ci/archive/20250411/202504111714.VH8TJ92G-lkp@intel.com/config)
compiler: clang version 21.0.0git (https://github.com/llvm/llvm-project f819f46284f2a79790038e1f6649172789734ae8)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250411/202504111714.VH8TJ92G-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504111714.VH8TJ92G-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from arch/hexagon/kernel/asm-offsets.c:12:
   In file included from include/linux/compat.h:15:
   In file included from include/linux/socket.h:8:
   In file included from include/linux/uio.h:9:
   In file included from include/linux/mm_types.h:16:
   In file included from include/linux/uprobes.h:18:
>> include/linux/timer.h:196:83: warning: non-void function does not return a value [-Wreturn-type]
     196 | static inline int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask) { }
         |                                                                                   ^
   1 warning generated.
--
   In file included from drivers/spi/spi-pic32-sqi.c:9:
   In file included from include/linux/clk.h:14:
   In file included from include/linux/notifier.h:16:
   In file included from include/linux/srcu.h:21:
   In file included from include/linux/workqueue.h:9:
>> include/linux/timer.h:196:83: warning: non-void function does not return a value [-Wreturn-type]
     196 | static inline int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask) { }
         |                                                                                   ^
   drivers/spi/spi-pic32-sqi.c:673:34: warning: unused variable 'pic32_sqi_of_ids' [-Wunused-const-variable]
     673 | static const struct of_device_id pic32_sqi_of_ids[] = {
         |                                  ^~~~~~~~~~~~~~~~
   2 warnings generated.
--
   In file included from drivers/spi/spi-orion.c:9:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:19:
   In file included from include/linux/topology.h:33:
   In file included from include/linux/mmzone.h:22:
   In file included from include/linux/mm_types.h:16:
   In file included from include/linux/uprobes.h:18:
>> include/linux/timer.h:196:83: warning: non-void function does not return a value [-Wreturn-type]
     196 | static inline int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask) { }
         |                                                                                   ^
   drivers/spi/spi-orion.c:614:34: warning: unused variable 'orion_spi_of_match_table' [-Wunused-const-variable]
     614 | static const struct of_device_id orion_spi_of_match_table[] = {
         |                                  ^~~~~~~~~~~~~~~~~~~~~~~~
   2 warnings generated.
--
   In file included from drivers/spi/spi-pic32.c:9:
   In file included from include/linux/clk.h:14:
   In file included from include/linux/notifier.h:16:
   In file included from include/linux/srcu.h:21:
   In file included from include/linux/workqueue.h:9:
>> include/linux/timer.h:196:83: warning: non-void function does not return a value [-Wreturn-type]
     196 | static inline int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask) { }
         |                                                                                   ^
   drivers/spi/spi-pic32.c:850:34: warning: unused variable 'pic32_spi_of_match' [-Wunused-const-variable]
     850 | static const struct of_device_id pic32_spi_of_match[] = {
         |                                  ^~~~~~~~~~~~~~~~~~
   2 warnings generated.
--
   In file included from drivers/spi/spi-armada-3700.c:11:
   In file included from include/linux/clk.h:14:
   In file included from include/linux/notifier.h:16:
   In file included from include/linux/srcu.h:21:
   In file included from include/linux/workqueue.h:9:
>> include/linux/timer.h:196:83: warning: non-void function does not return a value [-Wreturn-type]
     196 | static inline int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask) { }
         |                                                                                   ^
   drivers/spi/spi-armada-3700.c:806:34: warning: unused variable 'a3700_spi_dt_ids' [-Wunused-const-variable]
     806 | static const struct of_device_id a3700_spi_dt_ids[] = {
         |                                  ^~~~~~~~~~~~~~~~
   2 warnings generated.
--
   In file included from drivers/spi/spi-rockchip.c:7:
   In file included from include/linux/clk.h:14:
   In file included from include/linux/notifier.h:16:
   In file included from include/linux/srcu.h:21:
   In file included from include/linux/workqueue.h:9:
>> include/linux/timer.h:196:83: warning: non-void function does not return a value [-Wreturn-type]
     196 | static inline int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask) { }
         |                                                                                   ^
   drivers/spi/spi-rockchip.c:1024:34: warning: unused variable 'rockchip_spi_dt_match' [-Wunused-const-variable]
    1024 | static const struct of_device_id rockchip_spi_dt_match[] = {
         |                                  ^~~~~~~~~~~~~~~~~~~~~
   2 warnings generated.
--
   In file included from drivers/spi/spi-s3c64xx.c:8:
   In file included from include/linux/clk.h:14:
   In file included from include/linux/notifier.h:16:
   In file included from include/linux/srcu.h:21:
   In file included from include/linux/workqueue.h:9:
>> include/linux/timer.h:196:83: warning: non-void function does not return a value [-Wreturn-type]
     196 | static inline int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask) { }
         |                                                                                   ^
   drivers/spi/spi-s3c64xx.c:1644:34: warning: unused variable 's3c64xx_spi_dt_match' [-Wunused-const-variable]
    1644 | static const struct of_device_id s3c64xx_spi_dt_match[] = {
         |                                  ^~~~~~~~~~~~~~~~~~~~
   2 warnings generated.
--
   In file included from drivers/spi/spi-meson-spifc.c:8:
   In file included from include/linux/clk.h:14:
   In file included from include/linux/notifier.h:16:
   In file included from include/linux/srcu.h:21:
   In file included from include/linux/workqueue.h:9:
>> include/linux/timer.h:196:83: warning: non-void function does not return a value [-Wreturn-type]
     196 | static inline int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask) { }
         |                                                                                   ^
   drivers/spi/spi-meson-spifc.c:423:34: warning: unused variable 'meson_spifc_dt_match' [-Wunused-const-variable]
     423 | static const struct of_device_id meson_spifc_dt_match[] = {
         |                                  ^~~~~~~~~~~~~~~~~~~~
   2 warnings generated.
--
   In file included from arch/hexagon/kernel/asm-offsets.c:12:
   In file included from include/linux/compat.h:15:
   In file included from include/linux/socket.h:8:
   In file included from include/linux/uio.h:9:
   In file included from include/linux/mm_types.h:16:
   In file included from include/linux/uprobes.h:18:
>> include/linux/timer.h:196:83: warning: non-void function does not return a value [-Wreturn-type]
     196 | static inline int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask) { }
         |                                                                                   ^
   1 warning generated.


vim +196 include/linux/timer.h

   192	
   193	#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
   194	extern int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask);
   195	#else
 > 196	static inline int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask) { }
   197	#endif
   198	
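
The warning points at the !CONFIG_SMP || !CONFIG_NO_HZ_COMMON stub
above, which is declared to return int but has an empty body. A minimal
fix, assuming the int return type is kept to mirror the SMP prototype,
would be to return success from the no-op stub (sketch):

	static inline int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask)
	{
		/* Nothing to exclude without the tmigr hierarchy; report success. */
		return 0;
	}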

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH] timers: Exclude isolated cpus from timer migration
Posted by kernel test robot 8 months, 1 week ago
Hi Gabriele,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 3b07108ada81a8ebcebf1fe61367b4e436c895bd]

url:    https://github.com/intel-lab-lkp/linux/commits/Gabriele-Monaco/timers-Exclude-isolated-cpus-from-timer-migation/20250410-145629
base:   3b07108ada81a8ebcebf1fe61367b4e436c895bd
patch link:    https://lore.kernel.org/r/20250410065446.57304-2-gmonaco%40redhat.com
patch subject: [PATCH] timers: Exclude isolated cpus from timer migration
config: csky-randconfig-001-20250411 (https://download.01.org/0day-ci/archive/20250411/202504111433.IwFLpkrz-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250411/202504111433.IwFLpkrz-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504111433.IwFLpkrz-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from include/linux/uprobes.h:18,
                    from include/linux/mm_types.h:16,
                    from include/linux/mmzone.h:22,
                    from include/linux/topology.h:33,
                    from include/linux/irq.h:19,
                    from include/asm-generic/hardirq.h:17,
                    from ./arch/csky/include/generated/asm/hardirq.h:1,
                    from include/linux/hardirq.h:11,
                    from include/linux/interrupt.h:11,
                    from include/linux/kernel_stat.h:8,
                    from arch/csky/kernel/asm-offsets.c:5:
   include/linux/timer.h: In function 'tmigr_isolated_exclude_cpumask':
>> include/linux/timer.h:196:1: warning: no return statement in function returning non-void [-Wreturn-type]
     196 | static inline int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask) { }
         | ^~~~~~
--
   In file included from include/linux/uprobes.h:18,
                    from include/linux/mm_types.h:16,
                    from include/linux/mmzone.h:22,
                    from include/linux/topology.h:33,
                    from include/linux/irq.h:19,
                    from include/asm-generic/hardirq.h:17,
                    from ./arch/csky/include/generated/asm/hardirq.h:1,
                    from include/linux/hardirq.h:11,
                    from include/linux/interrupt.h:11,
                    from include/linux/kernel_stat.h:8,
                    from arch/csky/kernel/asm-offsets.c:5:
   include/linux/timer.h: In function 'tmigr_isolated_exclude_cpumask':
>> include/linux/timer.h:196:1: warning: no return statement in function returning non-void [-Wreturn-type]
     196 | static inline int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask) { }
         | ^~~~~~


vim +196 include/linux/timer.h

   192	
   193	#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
   194	extern int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask);
   195	#else
 > 196	static inline int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask) { }
   197	#endif
   198	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH] timers: Exclude isolated cpus from timer migration
Posted by Waiman Long 8 months, 1 week ago
On 4/10/25 2:54 AM, Gabriele Monaco wrote:
> The timer migration mechanism allows timers to move from idle CPUs to
> active ones to improve the overall idle time. This is however undesired
> when CPU intensive workloads run on isolated cores, as the algorithm
> would move the timers from housekeeping to isolated cores, negatively
> affecting the isolation.
>
> Exclude isolated cores from the timer migration algorithm, extend the
> concept of unavailable cores, currently used for offline ones, to
> isolated ones:
>   * A core is unavailable if isolated or offline;
>   * A core is available if neither isolated nor offline;
>
> Keep a cpumask to easily track unavailable cores and change the concept
> of online/offline tmigr to available/unavailable in code and
> tracepoints.
>
> A core is considered unavailable as idle if:
>   * is in the isolcpus list
>   * is in the nohz_full list
>   * is in an isolated cpuset
>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
>   include/linux/timer.h                  |  6 +++
>   include/trace/events/timer_migration.h |  4 +-
>   kernel/cgroup/cpuset.c                 | 15 +++---
>   kernel/time/timer_migration.c          | 72 +++++++++++++++++++++-----
>   kernel/time/timer_migration.h          |  2 +-
>   5 files changed, 76 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/timer.h b/include/linux/timer.h
> index 10596d7c3a346..27fb02aa3d780 100644
> --- a/include/linux/timer.h
> +++ b/include/linux/timer.h
> @@ -190,4 +190,10 @@ int timers_dead_cpu(unsigned int cpu);
>   #define timers_dead_cpu		NULL
>   #endif
>   
> +#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
> +extern int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask);
> +#else
> +static inline int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask) { }
> +#endif
> +
>   #endif
> diff --git a/include/trace/events/timer_migration.h b/include/trace/events/timer_migration.h
> index 47db5eaf2f9ab..61171b13c687c 100644
> --- a/include/trace/events/timer_migration.h
> +++ b/include/trace/events/timer_migration.h
> @@ -173,14 +173,14 @@ DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_active,
>   	TP_ARGS(tmc)
>   );
>   
> -DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_online,
> +DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_available,
>   
>   	TP_PROTO(struct tmigr_cpu *tmc),
>   
>   	TP_ARGS(tmc)
>   );
>   
> -DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_offline,
> +DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_unavailable,
>   
>   	TP_PROTO(struct tmigr_cpu *tmc),
>   
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 306b604300914..47495ba4012b5 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1323,7 +1323,7 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
>   	return isolcpus_updated;
>   }
>   
> -static void update_unbound_workqueue_cpumask(bool isolcpus_updated)
> +static void update_unbound_interference_cpumask(bool isolcpus_updated)

Unbound workqueue is a special workqueue. The name 
"update_unbound_interference_cpumask" doesn't make sense. I would prefer 
you to use names like "update_exclusion_cpumasks" or 
"update_isolated_cpumasks".

Cheers,
Longman
Re: [PATCH] timers: Exclude isolated cpus from timer migration
Posted by Thomas Gleixner 8 months, 1 week ago
On Thu, Apr 10 2025 at 08:54, Gabriele Monaco wrote:
>  
> +/*  cpumask excluded from migration */
> +static cpumask_var_t tmigr_unavailable_cpumask;

Why is this a negated mask instead of being the obvious and intuitive
available mask?

>  	if (firstexp != KTIME_MAX) {
> -		migrator = cpumask_any_but(cpu_online_mask, cpu);
> +		migrator = cpumask_nth_andnot(0, cpu_possible_mask,
> +					      tmigr_unavailable_cpumask);

That's exactly what this negated mask causes: incomprehensible code.

	cpumask_clear_cpu(cpu, available_mask);
        ...               
		migrator = cpumask_first(available_mask);

is too simple and obvious, right?

> +		/* Fall back to any online in case all are isolated. */

How can that happen? There is always at least _ONE_ housekeeping,
non-isolated, CPU online, no?

> +		if (migrator >= nr_cpu_ids)
> +			migrator = cpumask_any_but(cpu_online_mask, cpu);
>  		work_on_cpu(migrator, tmigr_trigger_active, NULL);
>  	}
>  
>  	return 0;
>  }
>  
> -static int tmigr_cpu_online(unsigned int cpu)
> +static int tmigr_cpu_available(unsigned int cpu)
>  {
> -	struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
> +	struct tmigr_cpu *tmc = per_cpu_ptr(&tmigr_cpu, cpu);
>  
>  	/* Check whether CPU data was successfully initialized */
>  	if (WARN_ON_ONCE(!tmc->tmgroup))
>  		return -EINVAL;
>  
> +	/* Silently ignore online requests if isolated */

This comment makes no sense.

     /* Isolated CPUs are not participating in timer migration */

makes it entirely clear what this is about, no?

That brings me to the general design decision here. Your changelog
explains at great length WHAT the change is doing, but completely fails
to explain the consequences and the rationale why this is the right
thing to do.

By excluding the isolated CPUs from migration completely, any 'global'
timer, which is armed on such a CPU, has to be expired on that isolated
CPU. That's fundamentally different from e.g. RCU isolation.

It might be the right thing to do and harmless, but without a proper
explanation it's a silent side effect of your changes, which leaves
people scratching their heads.

> +	if (cpu_is_isolated(cpu))
> +		return 0;
>  	raw_spin_lock_irq(&tmc->lock);
> -	trace_tmigr_cpu_online(tmc);
> +	trace_tmigr_cpu_available(tmc);
>  	tmc->idle = timer_base_is_idle();
>  	if (!tmc->idle)
>  		__tmigr_cpu_activate(tmc);
> -	tmc->online = true;
> +	tmc->available = true;
> +	tmc->idle = true;

How so?

> +	cpumask_clear_cpu(cpu, tmigr_unavailable_cpumask);
>  	raw_spin_unlock_irq(&tmc->lock);
>  	return 0;
>  }
>  
> +int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask)

cpumask_var_t is wrong here. 'const struct cpumask *' is what you want.

> +{
> +	int cpu;
> +	cpumask_var_t cpumask;

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#coding-style-notes

> +	if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	cpumask_copy(cpumask, tmigr_unavailable_cpumask);

What serializes this against concurrent CPU hotplug? I assume it's done
by the caller, but then this code should have a lockdep assert to
validate it. If it's not, then this is broken.

As it is serialized it does not need a copy either, right?

> +	/* Was not excluded but should be excluded now. */
> +	for_each_cpu_andnot(cpu, exclude_cpumask, cpumask)
> +		tmigr_cpu_unavailable(cpu);
> +
> +	/* Was excluded but should be included now */
> +	for_each_cpu_andnot(cpu, cpumask, exclude_cpumask)
> +		if (cpu_online(cpu))
> +			tmigr_cpu_available(cpu);

My brain hurts by now.

         for_each_cpu_and(cpu, available_mask, exclude_mask)
         	tmigr_cpu_unavailable(cpu);

         for_each_cpu_andnot(cpu, cpu_online_mask, exclude_mask) {
         	if (!cpumask_test_cpu(cpu, available_mask))
                	tmigr_cpu_available(cpu);
         }

No?

Also this patch is doing too many things at once. It wants to be split
into:

    Patch 1: Rename 'online' to 'available' (bit and function names)
    Patch 2: Add the available mask logic
    Patch 3: Add the isolation functionality

Thanks,

        tglx
Re: [PATCH] timers: Exclude isolated cpus from timer migration
Posted by Gabriele Monaco 8 months, 1 week ago
On Thu, 2025-04-10 at 10:26 +0200, Thomas Gleixner wrote:
> On Thu, Apr 10 2025 at 08:54, Gabriele Monaco wrote:
> >  
> > +/*  cpumask excluded from migration */
> > +static cpumask_var_t tmigr_unavailable_cpumask;
> 
> Why is this a negated mask instead of being the obvious and intuitive
> available mask?

Well, the way I started writing the patch I found it easier to do the
double andnot in tmigr_isolated_exclude_cpumask to see what changed.
I see the way it evolved is just messier..
I'll apply your solution which seems much neater!

> 
> >   if (firstexp != KTIME_MAX) {
> > - migrator = cpumask_any_but(cpu_online_mask, cpu);
> > + migrator = cpumask_nth_andnot(0, cpu_possible_mask,
> > +       tmigr_unavailable_cpumask);
> 
> That's exactly what this negated mask causes: incomprehensible code.
> 
>  cpumask_clear_cpu(cpu, available_mask);
>         ...               
>  migrator = cpumask_first(available_mask);
> 
> is too simple and obvious, right?
> 
> > + /* Fall back to any online in case all are isolated. */
> 
> How can that happen? There is always at least _ONE_ housekeeping,
> non-isolated, CPU online, no?
> 

In my understanding it shouldn't, but I'm not sure there's anything
preventing the user from isolating everything via cpuset.
Anyway that's something no one in their right mind should do, so I guess I'd
just opt for the cpumask_first (or actually cpumask_any, like before
the change).

> > + if (migrator >= nr_cpu_ids)
> > + migrator = cpumask_any_but(cpu_online_mask, cpu);
> >   work_on_cpu(migrator, tmigr_trigger_active, NULL);
> >   }
> >  
> >   return 0;
> >  }
> >  
> > -static int tmigr_cpu_online(unsigned int cpu)
> > +static int tmigr_cpu_available(unsigned int cpu)
> >  {
> > - struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
> > + struct tmigr_cpu *tmc = per_cpu_ptr(&tmigr_cpu, cpu);
> >  
> >   /* Check whether CPU data was successfully initialized */
> >   if (WARN_ON_ONCE(!tmc->tmgroup))
> >   return -EINVAL;
> >  
> > + /* Silently ignore online requests if isolated */
> 
> This comment makes no sense.
> 
>      /* Isolated CPUs are not participating in timer migration */
> 
> makes it entirely clear what this is about, no?
> 
> That brings me to the general design decision here. Your changelog
> explains at great length WHAT the change is doing, but completely
> fails
> to explain the consequences and the rationale why this is the right
> thing to do.
> 
> By excluding the isolated CPUs from migration completely, any
> 'global'
> timer, which is armed on such a CPU, has to be expired on that
> isolated
> CPU. That's fundamentally different from e.g. RCU isolation.
> 
> It might be the right thing to do and harmless, but without a proper
> explanation it's a silent side effect of your changes, which leaves
> people scratching their heads.

Mmh, essentially the idea is that global timers should not migrate from
housekeeping to isolated cores. I assumed the opposite never occurs (as
global timers /should/ not even start on isolated cores on a properly
isolated system), but you're right, that's not quite true.

Thinking about it now, since global timers /can/ start on isolated
cores, that makes them quite different from offline ones and probably
considering them the same is just not the right thing to do..

I'm going to have a deeper thought about this whole approach, perhaps
something simpler just preventing migration in that one direction would
suffice.

> 
> > + if (cpu_is_isolated(cpu))
> > + return 0;
> >   raw_spin_lock_irq(&tmc->lock);
> > - trace_tmigr_cpu_online(tmc);
> > + trace_tmigr_cpu_available(tmc);
> >   tmc->idle = timer_base_is_idle();
> >   if (!tmc->idle)
> >   __tmigr_cpu_activate(tmc);
> > - tmc->online = true;
> > + tmc->available = true;
> > + tmc->idle = true;
> 
> How so?
> 
> > + cpumask_clear_cpu(cpu, tmigr_unavailable_cpumask);
> >   raw_spin_unlock_irq(&tmc->lock);
> >   return 0;
> >  }
> >  
> > +int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask)
> 
> cpumask_var_t is wrong here. 'const struct cpumask *' is what you
> want.

You mean in the function argument? That's exactly how it is handled in
workqueue_unbound_exclude_cpumask. I get cpumask_var_t is not
necessarily a pointer, perhaps it's worth changing it there too..
Or am I missing your point?

> 
> > +{
> > + int cpu;
> > + cpumask_var_t cpumask;
> 
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#coding-style-notes
> 
> > + if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL))
> > + return -ENOMEM;
> > +
> > + cpumask_copy(cpumask, tmigr_unavailable_cpumask);
> 
> What serializes this against concurrent CPU hotplug? I assume it's
> done
> by the caller, but then this code should have a lockdep assert to
> validate it. If it's not, then this is broken.
> 
> As it is serialized it does not need a copy either, right?
> 
> > + /* Was not excluded but should be excluded now. */
> > + for_each_cpu_andnot(cpu, exclude_cpumask, cpumask)
> > + tmigr_cpu_unavailable(cpu);
> > +
> > + /* Was excluded but should be included now */
> > + for_each_cpu_andnot(cpu, cpumask, exclude_cpumask)
> > + if (cpu_online(cpu))
> > + tmigr_cpu_available(cpu);
> 
> My brain hurts by now.
> 
>          for_each_cpu_and(cpu, available_mask, exclude_mask)
>           tmigr_cpu_unavailable(cpu);
> 
>          for_each_cpu_andnot(cpu, cpu_online_mask, exclude_mask) {
>           if (!cpumask_test_cpu(cpu, available_mask))
>                  tmigr_cpu_available(cpu);
>          }
> 
> No?
> 
> Also this patch is doing too many things at once. It wants to be
> split
> into:
> 
>     Patch 1: Rename 'online' to 'available' (bit and function names)
>     Patch 2: Add the available mask logic
>     Patch 3: Add the isolation functionality
> 

Good point, I'll keep it in mind if the patch keeps this shape.

Thanks for the feedback,
Gabriele
Re: [PATCH] timers: Exclude isolated cpus from timer migration
Posted by Waiman Long 8 months, 1 week ago
On 4/10/25 6:38 AM, Gabriele Monaco wrote:
> On Thu, 2025-04-10 at 10:26 +0200, Thomas Gleixner wrote:
>> On Thu, Apr 10 2025 at 08:54, Gabriele Monaco wrote:
>>>   
>>> +/*  cpumask excluded from migration */
>>> +static cpumask_var_t tmigr_unavailable_cpumask;
>> Why is this a negated mask instead of being the obvious and intuitive
>> available mask?
> Well, the way I started writing the patch I found it easier to do the
> double andnot in tmigr_isolated_exclude_cpumask to see what changed.
> I see the way it evolved is just messier..
> I'll apply your solution which seems much neater!
>
>>>    if (firstexp != KTIME_MAX) {
>>> - migrator = cpumask_any_but(cpu_online_mask, cpu);
>>> + migrator = cpumask_nth_andnot(0, cpu_possible_mask,
>>> +       tmigr_unavailable_cpumask);
>> That's exactly what this negated mask causes: incomprehensible code.
>>
>>   cpumask_clear_cpu(cpu, available_mask);
>>          ...
>>   migrator = cpumask_first(available_mask);
>>
>> is too simple and obvious, right?
>>
>>> + /* Fall back to any online in case all are isolated. */
>> How can that happen? There is always at least _ONE_ housekeeping,
>> non-isolated, CPU online, no?
>>
> In my understanding it shouldn't, but I'm not sure there's anything
> preventing the user from isolating everything via cpuset.
> Anyway that's something no one in their right mind should do, so I guess I'd
> just opt for the cpumask_first (or actually cpumask_any, like before
> the change).

No, cpuset is not allowed to take all the CPUs from the cgroup root 
(housekeeping CPUs). So it can't happen.


>>> + if (migrator >= nr_cpu_ids)
>>> + migrator = cpumask_any_but(cpu_online_mask, cpu);
>>>    work_on_cpu(migrator, tmigr_trigger_active, NULL);
>>>    }
>>>   
>>>    return 0;
>>>   }
>>>   
>>> -static int tmigr_cpu_online(unsigned int cpu)
>>> +static int tmigr_cpu_available(unsigned int cpu)
>>>   {
>>> - struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
>>> + struct tmigr_cpu *tmc = per_cpu_ptr(&tmigr_cpu, cpu);
>>>   
>>>    /* Check whether CPU data was successfully initialized */
>>>    if (WARN_ON_ONCE(!tmc->tmgroup))
>>>    return -EINVAL;
>>>   
>>> + /* Silently ignore online requests if isolated */
>> This comment makes no sense.
>>
>>       /* Isolated CPUs are not participating in timer migration */
>>
>> makes it entirely clear what this is about, no?
>>
>> That brings me to the general design decision here. Your changelog
>> explains at great length WHAT the change is doing, but completely
>> fails
>> to explain the consequences and the rationale why this is the right
>> thing to do.
>>
>> By excluding the isolated CPUs from migration completely, any
>> 'global'
>> timer, which is armed on such a CPU, has to be expired on that
>> isolated
>> CPU. That's fundamentally different from e.g. RCU isolation.
>>
>> It might be the right thing to do and harmless, but without a proper
>> explanation it's a silent side effect of your changes, which leaves
>> people scratching their heads.
> Mmh, essentially the idea is that global timers should not migrate from
> housekeeping to isolated cores. I assumed the opposite never occurs (as
> global timers /should/ not even start on isolated cores on a properly
> isolated system), but you're right, that's not quite true.
>
> Thinking about it now, since global timers /can/ start on isolated
> cores, that makes them quite different from offline ones and probably
> considering them the same is just not the right thing to do..
>
> I'm going to have a deeper thought about this whole approach, perhaps
> something simpler just preventing migration in that one direction would
> suffice.
>
>>> + if (cpu_is_isolated(cpu))
>>> + return 0;
>>>    raw_spin_lock_irq(&tmc->lock);
>>> - trace_tmigr_cpu_online(tmc);
>>> + trace_tmigr_cpu_available(tmc);
>>>    tmc->idle = timer_base_is_idle();
>>>    if (!tmc->idle)
>>>    __tmigr_cpu_activate(tmc);
>>> - tmc->online = true;
>>> + tmc->available = true;
>>> + tmc->idle = true;
>> How so?
>>
>>> + cpumask_clear_cpu(cpu, tmigr_unavailable_cpumask);
>>>    raw_spin_unlock_irq(&tmc->lock);
>>>    return 0;
>>>   }
>>>   
>>> +int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask)
>> cpumask_var_t is wrong here. 'const struct cpumask *' is what you
>> want.
> You mean in the function argument? That's exactly how it is handled in
> workqueue_unbound_exclude_cpumask. I get cpumask_var_t is not
> necessarily a pointer, perhaps it's worth changing it there too..
> Or am I missing your point?

cpumask_var_t can be interchangeable with "struct cpumask *" as long as 
the passed-in cpumask is not being modified.

Cheers,
Longman

Re: [PATCH] timers: Exclude isolated cpus from timer migration
Posted by Thomas Gleixner 8 months, 1 week ago
On Thu, Apr 10 2025 at 12:38, Gabriele Monaco wrote:
> On Thu, 2025-04-10 at 10:26 +0200, Thomas Gleixner wrote:
>> > + /* Fall back to any online in case all are isolated. */
>> 
>> How can that happen? There is always at least _ONE_ housekeeping,
>> non-isolated, CPU online, no?
>> 
>
> In my understanding it shouldn't, but I'm not sure there's anything
> preventing the user from isolating everything via cpuset.
> Anyway that's something no one in their right mind should do, so I guess I'd
> just opt for the cpumask_first (or actually cpumask_any, like before
> the change).

This should be prevented by the core infrastructure. Isolating _ALL_
CPUs is broken by definition.

>> That brings me to the general design decision here. Your changelog
>> explains at great length WHAT the change is doing, but completely
>> fails
>> to explain the consequences and the rationale why this is the right
>> thing to do.
>> 
>> By excluding the isolated CPUs from migration completely, any
>> 'global'
>> timer, which is armed on such a CPU, has to be expired on that
>> isolated
>> CPU. That's fundamentally different from e.g. RCU isolation.
>> 
>> It might be the right thing to do and harmless, but without a proper
>> explanation it's a silent side effect of your changes, which leaves
>> people scratching their heads.
>
> Mmh, essentially the idea is that global timers should not migrate from
> housekeeping to isolated cores. I assumed the opposite never occurs (as
> global timers /should/ not even start on isolated cores on a properly
> isolated system), but you're right, that's not quite true.
>
> Thinking about it now, since global timers /can/ start on isolated
> cores, that makes them quite different from offline ones and probably
> considering them the same is just not the right thing to do..
>
> I'm going to have a deeper thought about this whole approach, perhaps
> something simpler just preventing migration in that one direction would
> suffice.

Indeed.
>> > +int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask)
>> 
>> cpumask_var_t is wrong here. 'const struct cpumask *' is what you
>> want.
>
> You mean in the function argument? That's exactly how it is handled in
> workqueue_unbound_exclude_cpumask. I get cpumask_var_t is not
> necessarily a pointer, perhaps it's worth changing it there too..

Correct. cpumask_var_t is the magic macro construct which allows switching
from an on-stack cpumask to an allocated one at compile time, but what's
handed around is a pointer to struct cpumask.
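
For reference, a sketch of that construct; the real definition lives in
include/linux/cpumask.h (reproduced from memory, so treat it as
illustrative):

	#ifdef CONFIG_CPUMASK_OFFSTACK
	/* Allocated with alloc_cpumask_var(); plainly a pointer. */
	typedef struct cpumask *cpumask_var_t;
	#else
	/* One-element array on the stack; decays to a pointer when passed. */
	typedef struct cpumask cpumask_var_t[1];
	#endif

In both configurations a cpumask_var_t argument decays to a
'struct cpumask *' at the call site, which is why that is the honest
parameter type.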

Thanks,

        tglx
Re: [PATCH] timers: Exclude isolated cpus from timer migration
Posted by Frederic Weisbecker 8 months, 1 week ago
On Thu, Apr 10, 2025 at 12:38:25PM +0200, Gabriele Monaco wrote:
> On Thu, 2025-04-10 at 10:26 +0200, Thomas Gleixner wrote:
> > How can that happen? There is always at least _ONE_ housekeeping,
> > non-isolated, CPU online, no?
> > 
> 
> In my understanding it shouldn't, but I'm not sure there's anything
> preventing the user from isolating everything via cpuset.
> Anyway that's something no one in their right mind should do, so I guess I'd
> just opt for the cpumask_first (or actually cpumask_any, like before
> the change).

With "nohz_full=..." or "isolcpus=nohz,..." there is always at least one
housekeeping CPU. But with isolcpus=[domain] or the cpuset equivalents
(v1 cpuset.sched_load_balance, v2 isolated partitions) there is nothing that
prevents all CPUs from being isolated.

Speaking of, those are two different issues here:

* nohz_full CPUs are handled just like idle CPUs. Once the tick is stopped,
  the global timers are handled by other CPUs (housekeeping). There is always
  one housekeeping CPU that never goes idle.
  One subtle thing though: if the nohz_full CPU fires a tick, because there
  is a local timer to be handled for example, it will also possibly handle
  some global timers along the way. If it happens to be a problem, it should
  be easy to resolve.

* Domain isolated CPUs are treated just like other CPUs. But there is not
  always a housekeeping CPU around. And no guarantee that there is always
  a non-idle CPU to take care of global timers.
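
For reference, a sketch of how the two flavors can be told apart with
the housekeeping API (illustrative only; the variable names are made
up):

	/* nohz_full= or isolcpus=nohz,...: tick housekeeping excluded */
	bool tick_isolated = !housekeeping_cpu(cpu, HK_TYPE_TICK);

	/* isolcpus=[domain] or cpuset isolated partitions: domain isolation */
	bool domain_isolated = !housekeeping_cpu(cpu, HK_TYPE_DOMAIN) ||
			       cpuset_cpu_is_isolated(cpu);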

> > That brings me to the general design decision here. Your changelog
> > explains at great length WHAT the change is doing, but completely
> > fails
> > to explain the consequences and the rationale why this is the right
> > thing to do.
> > 
> > By excluding the isolated CPUs from migration completely, any
> > 'global'
> > timer, which is armed on such a CPU, has to be expired on that
> > isolated
> > CPU. That's fundamentally different from e.g. RCU isolation.
> > 
> > It might be the right thing to do and harmless, but without a proper
> > explanation it's a silent side effect of your changes, which leaves
> > people scratching their heads.
> 
> Mmh, essentially the idea is that global timers should not migrate from
> housekeeping to isolated cores. I assumed the opposite never occurs (as
> global timers /should/ not even start on isolated cores on a properly
> isolated system), but you're right, that's not quite true.

Indeed, they can definitely start there.
I'm tempted to propose to offline/reonline isolated CPUs in order to migrate
away those timers. But that only works for timers that are currently queued.

> 
> Thinking about it now, since global timers /can/ start on isolated
> cores, that makes them quite different from offline ones and probably
> considering them the same is just not the right thing to do..
> 
> I'm going to have a deeper thought about this whole approach, perhaps
> something simpler just preventing migration in that one direction would
> suffice.

I think we can use your solution, which involves isolating the CPU from tmigr
hierarchy. And also always queue global timers to non-isolated targets.
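
A rough sketch of what "queue to non-isolated targets" could look like,
in the spirit of the existing get_nohz_timer_target() heuristic (the
function name and shape here are made up for illustration, not the
actual implementation):

	static int global_timer_target(void)
	{
		int cpu = smp_processor_id();

		/* Keep the timer local if this CPU does timer housekeeping */
		if (housekeeping_cpu(cpu, HK_TYPE_TIMER))
			return cpu;

		/* Otherwise push it to any timer-housekeeping CPU */
		return housekeeping_any_cpu(HK_TYPE_TIMER);
	}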

-- 
Frederic Weisbecker
SUSE Labs
Re: [PATCH] timers: Exclude isolated cpus from timer migration
Posted by Waiman Long 8 months, 1 week ago
On 4/10/25 9:03 AM, Frederic Weisbecker wrote:
> On Thu, Apr 10, 2025 at 12:38:25PM +0200, Gabriele Monaco wrote:
>> On Thu, 2025-04-10 at 10:26 +0200, Thomas Gleixner wrote:
>>> How can that happen? There is always at least _ONE_ housekeeping,
>>> non-isolated, CPU online, no?
>>>
>> In my understanding it shouldn't, but I'm not sure there's anything
>> preventing the user from isolating everything via cpuset.
>> Anyway that's something no one in their right mind should do, so I guess I'd
>> just opt for the cpumask_first (or actually cpumask_any, like before
>> the change).
> With "nohz_full=..." or "isolcpus=nohz,..." there is always at least one
> housekeeping CPU. But with isolcpus=[domain] or cpusets equivalents
> (v1 cpuset.sched_load_balance, v2 isolated partitions) there is nothing that
> prevents all CPUs from being isolated.

Actually v2 won't allow users to isolate all the CPUs. Users can 
probably do that with v1's cpuset.sched_load_balance.

Cheers,
Longman

Re: [PATCH] timers: Exclude isolated cpus from timer migration
Posted by Frederic Weisbecker 8 months, 1 week ago
On Thu, Apr 10, 2025 at 10:35:55AM -0400, Waiman Long wrote:
> On 4/10/25 9:03 AM, Frederic Weisbecker wrote:
> On Thu, Apr 10, 2025 at 12:38:25PM +0200, Gabriele Monaco wrote:
> > > On Thu, 2025-04-10 at 10:26 +0200, Thomas Gleixner wrote:
> > > > How can that happen? There is always at least _ONE_ housekeeping,
> > > > non-isolated, CPU online, no?
> > > > 
> > > In my understanding it shouldn't, but I'm not sure there's anything
> > > preventing the user from isolating everything via cpuset.
> > > Anyway that's something no one in their right mind should do, so I guess I'd
> > > just opt for the cpumask_first (or actually cpumask_any, like before
> > > the change).
> > With "nohz_full=..." or "isolcpus=nohz,..." there is always at least one
> > housekeeping CPU. But with isolcpus=[domain] or cpusets equivalents
> > (v1 cpuset.sched_load_balance, v2 isolated partitions) there is nothing that
> > prevents all CPUs from being isolated.
> 
> Actually v2 won't allow users to isolate all the CPUs. Users can probably do
> that with v1's cpuset.sched_load_balance.

Perhaps, and I think isolcpus= can too.

-- 
Frederic Weisbecker
SUSE Labs
Re: [PATCH] timers: Exclude isolated cpus from timer migration
Posted by Waiman Long 8 months, 1 week ago
On 4/10/25 10:43 AM, Frederic Weisbecker wrote:
> Le Thu, Apr 10, 2025 at 10:35:55AM -0400, Waiman Long a écrit :
>> On 4/10/25 9:03 AM, Frederic Weisbecker wrote:
> > On Thu, Apr 10, 2025 at 12:38:25PM +0200, Gabriele Monaco wrote:
>>>> On Thu, 2025-04-10 at 10:26 +0200, Thomas Gleixner wrote:
>>>>> How can that happen? There is always at least _ONE_ housekeeping,
>>>>> non-isolated, CPU online, no?
>>>>>
>>>> In my understanding it shouldn't, but I'm not sure there's anything
>>>> preventing the user from isolating everything via cpuset.
>>>> Anyway that's something no one in their right mind should do, so I guess I'd
>>>> just opt for the cpumask_first (or actually cpumask_any, like before
>>>> the change).
>>> With "nohz_full=..." or "isolcpus=nohz,..." there is always at least one
>>> housekeeping CPU. But with isolcpus=[domain] or cpusets equivalents
>>> (v1 cpuset.sched_load_balance, v2 isolated partitions) there is nothing that
>>> prevents all CPUs from being isolated.
>> Actually v2 won't allow users to isolate all the CPUs. Users can probably do
>> that with v1's cpuset.sched_load_balance.
> Perhaps, and I think isolcpus= can too.

No, I don't think so. The following code is in kernel/sched/isolation.c:

	first_cpu = cpumask_first_and(cpu_present_mask, housekeeping_staging);
	if (first_cpu >= nr_cpu_ids || first_cpu >= setup_max_cpus) {
		__cpumask_set_cpu(smp_processor_id(), housekeeping_staging);
		__cpumask_clear_cpu(smp_processor_id(), non_housekeeping_mask);
		if (!housekeeping.flags) {
			pr_warn("Housekeeping: must include one present CPU, "
				"using boot CPU:%d\n", smp_processor_id());
		}
	}

Cheers,
Longman

Re: [PATCH] timers: Exclude isolated cpus from timer migration
Posted by Frederic Weisbecker 8 months, 1 week ago
On Thu, Apr 10, 2025 at 10:50:39AM -0400, Waiman Long wrote:
> 
> On 4/10/25 10:43 AM, Frederic Weisbecker wrote:
> > On Thu, Apr 10, 2025 at 10:35:55AM -0400, Waiman Long wrote:
> > > On 4/10/25 9:03 AM, Frederic Weisbecker wrote:
> > > > On Thu, Apr 10, 2025 at 12:38:25PM +0200, Gabriele Monaco wrote:
> > > > > On Thu, 2025-04-10 at 10:26 +0200, Thomas Gleixner wrote:
> > > > > > How can that happen? There is always at least _ONE_ housekeeping,
> > > > > > non-isolated, CPU online, no?
> > > > > > 
> > > > > In my understanding it shouldn't, but I'm not sure there's anything
> > > > > preventing the user from isolating everything via cpuset.
> > > > > Anyway that's something no one in their right mind should do, so I guess I'd
> > > > > just opt for the cpumask_first (or actually cpumask_any, like before
> > > > > the change).
> > > > With "nohz_full=..." or "isolcpus=nohz,..." there is always at least one
> > > > housekeeping CPU. But with isolcpus=[domain] or cpusets equivalents
> > > > (v1 cpuset.sched_load_balance, v2 isolated partitions) there is nothing that
> > > > prevents all CPUs from being isolated.
> > > Actually v2 won't allow users to isolate all the CPUs. Users can probably do
> > > that with v1's cpuset.sched_load_balance.
> > Perhaps, and I think isolcpus= can too.
> 
> No, I don't think so. The following code is in kernel/sched/isolation.c:
> 
> 	first_cpu = cpumask_first_and(cpu_present_mask, housekeeping_staging);
> 	if (first_cpu >= nr_cpu_ids || first_cpu >= setup_max_cpus) {
> 		__cpumask_set_cpu(smp_processor_id(), housekeeping_staging);
> 		__cpumask_clear_cpu(smp_processor_id(), non_housekeeping_mask);
> 		if (!housekeeping.flags) {
> 			pr_warn("Housekeeping: must include one present CPU, "
> 				"using boot CPU:%d\n", smp_processor_id());
> 		}
> 	}

Ok, good then!

-- 
Frederic Weisbecker
SUSE Labs
Re: [PATCH] timers: Exclude isolated cpus from timer migration
Posted by Gabriele Monaco 8 months, 1 week ago

On Thu, 2025-04-10 at 16:43 +0200, Frederic Weisbecker wrote:
> On Thu, Apr 10, 2025 at 10:35:55AM -0400, Waiman Long wrote:
> > On 4/10/25 9:03 AM, Frederic Weisbecker wrote:
> > > On Thu, Apr 10, 2025 at 12:38:25PM +0200, Gabriele Monaco wrote:
> > > > On Thu, 2025-04-10 at 10:26 +0200, Thomas Gleixner wrote:
> > > > > How can that happen? There is always at least _ONE_
> > > > > housekeeping,
> > > > > non-isolated, CPU online, no?
> > > > > 
> > > > In my understanding it shouldn't, but I'm not sure there's
> > > > anything
> > > > preventing the user from isolating everything via cpuset.
> > > > Anyway that's something no one in their right mind should do, so I
> > > > guess I'd
> > > > just opt for the cpumask_first (or actually cpumask_any, like
> > > > before
> > > > the change).
> > > With "nohz_full=..." or "isolcpus=nohz,..." there is always at
> > > least one
> > > housekeeping CPU. But with isolcpus=[domain] or cpusets
> > > equivalents
> > > (v1 cpuset.sched_load_balance, v2 isolated partitions) there is
> > > nothing that
> > > prevents all CPUs from being isolated.
> > 
> > Actually v2 won't allow users to isolate all the CPUs. Users can
> > probably do
> > that with v1's cpuset.sched_load_balance.
> 
> Perhaps, and I think isolcpus= can too.
> 
  # vng -a isolcpus=0-15 cat /sys/devices/system/cpu/isolated
  1-15

Seems not..
Re: [PATCH] timers: Exclude isolated cpus from timer migration
Posted by Thomas Gleixner 8 months, 1 week ago
On Thu, Apr 10 2025 at 15:03, Frederic Weisbecker wrote:
> On Thu, Apr 10, 2025 at 12:38:25PM +0200, Gabriele Monaco wrote:
> Speaking of, those are two different issues here:
>
> * nohz_full CPUs are handled just like idle CPUs. Once the tick is stopped,
>   the global timers are handled by other CPUs (housekeeping). There is always
>   one housekeeping CPU that never goes idle.
>   One subtle thing though: if the nohz_full CPU fires a tick, because there
>   is a local timer to be handled for example, it will also possibly handle
>   some global timers along the way. If it happens to be a problem, it should
>   be easy to resolve.
>
> * Domain isolated CPUs are treated just like other CPUs. But there is not
>   always a housekeeping CPU around. And no guarantee that there is always
>   a non-idle CPU to take care of global timers.

That's an insanity.

>> Thinking about it now, since global timers /can/ start on isolated
>> cores, that makes them quite different from offline ones and probably
>> considering them the same is just not the right thing to do..
>> 
>> I'm going to have a deeper thought about this whole approach, perhaps
>> something simpler just preventing migration in that one direction would
>> suffice.
>
> I think we can use your solution, which involves isolating the CPU from tmigr
> hierarchy. And also always queue global timers to non-isolated targets.

Why do we have to inflict extra complexity into the timer enqueue path
instead of preventing the migration to, but not the migration from
isolated CPUs?

Thanks,

        tglx
Re: [PATCH] timers: Exclude isolated cpus from timer migration
Posted by Frederic Weisbecker 8 months, 1 week ago
On Thu, Apr 10, 2025 at 03:15:49PM +0200, Thomas Gleixner wrote:
> On Thu, Apr 10 2025 at 15:03, Frederic Weisbecker wrote:
> > On Thu, Apr 10, 2025 at 12:38:25PM +0200, Gabriele Monaco wrote:
> > Speaking of, those are two different issues here:
> >
> > * nohz_full CPUs are handled just like idle CPUs. Once the tick is stopped,
> >   the global timers are handled by other CPUs (housekeeping). There is always
> >   one housekeeping CPU that never goes idle.
> >   One subtle thing though: if the nohz_full CPU fires a tick, because there
> >   is a local timer to be handled for example, it will also possibly handle
> >   some global timers along the way. If it happens to be a problem, it should
> >   be easy to resolve.
> >
> > * Domain isolated CPUs are treated just like other CPUs. But there is not
> >   always a housekeeping CPU around. And no guarantee that there is always
> >   a non-idle CPU to take care of global timers.
> 
> That's an insanity.

It works, but it doesn't make much sense arguably.

> 
> >> Thinking about it now, since global timers /can/ start on isolated
> >> cores, that makes them quite different from offline ones and probably
> >> considering them the same is just not the right thing to do..
> >> 
> >> I'm going to have a deeper thought about this whole approach, perhaps
> >> something simpler just preventing migration in that one direction would
> >> suffice.
> >
> > I think we can use your solution, which involves isolating the CPU from tmigr
> > hierarchy. And also always queue global timers to non-isolated targets.
> 
> Why do we have to inflict extra complexity into the timer enqueue path
> instead of preventing the migration to, but not the migration from
> isolated CPUs?

But how do we handle global timers that have been initialized and queued from
isolated CPUs?

Thanks.

-- 
Frederic Weisbecker
SUSE Labs
Re: [PATCH] timers: Exclude isolated cpus from timer migation
Posted by Gabriele Monaco 8 months, 1 week ago

On Thu, 2025-04-10 at 15:27 +0200, Frederic Weisbecker wrote:
> On Thu, Apr 10, 2025 at 03:15:49PM +0200, Thomas Gleixner wrote:
> > On Thu, Apr 10 2025 at 15:03, Frederic Weisbecker wrote:
> > > On Thu, Apr 10, 2025 at 12:38:25PM +0200, Gabriele Monaco wrote:
> > > Speaking of, those are two different issues here:
> > > 
> > > * nohz_full CPUs are handled just like idle CPUs. Once the tick
> > > is stopped,
> > >   the global timers are handled by other CPUs (housekeeping).
> > > There is always
> > >   one housekeeping CPU that never goes idle.
> > >   One subtle thing though: if the nohz_full CPU fires a tick,
> > > because there
> > >   is a local timer to be handled for example, it will also
> > > possibly handle
> > >   some global timers along the way. If it happens to be a
> > > problem, it should
> > >   be easy to resolve.
> > > 
> > > * Domain isolated CPUs are treated just like other CPUs. But
> > > there is not
> > >   always a housekeeping CPU around. And no guarantee that there
> > > is always
> > >   a non-idle CPU to take care of global timers.
> > 
> > That's an insanity.
> 
> It works, but it doesn't make much sense arguably.

I wonder if we should really worry about this scenario though.

> 
> > 
> > > > Thinking about it now, since global timers /can/ start on
> > > > isolated
> > > > cores, that makes them quite different from offline ones and
> > > > probably
> > > > considering them the same is just not the right thing to do..
> > > > 
> > > > I'm going to have a deeper thought about this whole approach,
> > > > perhaps
> > > > something simpler just preventing migration in that one
> > > > direction would
> > > > suffice.
> > > 
> > > I think we can use your solution, which involves isolating the
> > > CPU from tmigr
> > > hierarchy. And also always queue global timers to non-isolated
> > > targets.
> > 
> > Why do we have to inflict extra complexity into the timer enqueue
> > path
> > instead of preventing the migration to, but not the migration from
> > isolated CPUs?
> 
> But how do we handle global timers that have been initialized and
> queued from
> isolated CPUs?

I need to sketch a bit more the solution but the rough idea is:
1. isolated CPUs don't pull remote timers
2. isolated CPUs ignore their global timers and let others pull them
  perhaps with some more logic to avoid them expiring
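
For 1., a minimal sketch of what I have in mind, using cpu_is_isolated()
as the gate (names and hook placement are illustrative, not a final
patch):

	/*
	 * Sketch only: bail out of remote expiry early on isolated CPUs,
	 * so they never pull global timers queued by other CPUs.
	 */
	void tmigr_handle_remote(void)
	{
		if (cpu_is_isolated(smp_processor_id()))
			return;
		/* ... existing remote timer handling ... */
	}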


Wouldn't that be sufficient?

Also, I would definitely do 1. for any kind of isolation, but I'm not
sure about 2.
Strictly speaking domain isolated cores don't claim to be free of
kernel noise, even if they initiate it (but nohz_full ones do).
What would be the expectation there?

Thanks,
Gabriele
Re: [PATCH] timers: Exclude isolated cpus from timer migation
Posted by Frederic Weisbecker 8 months, 1 week ago
On Thu, Apr 10, 2025 at 03:56:02PM +0200, Gabriele Monaco wrote:
> On Thu, 2025-04-10 at 15:27 +0200, Frederic Weisbecker wrote:
> > But how do we handle global timers that have been initialized and
> > queued from
> > isolated CPUs?
> 
> I need to sketch a bit more the solution but the rough idea is:
> 1. isolated CPUs don't pull remote timers

That's the "easy" part.

> 2. isolated CPUs ignore their global timers and let others pull them
>   perhaps with some more logic to avoid them expiring

This will always involve added overhead because you may need to wake up
a CPU upon enqueueing a global timer to make sure it will be handled.
At least when all other CPUs are idle.

> Wouldn't that be sufficient?
> 
> Also, I would definitely do 1. for any kind of isolation, but I'm not
> sure about 2.
> Strictly speaking domain isolated cores don't claim to be free of
> kernel noise, even if they initiate it (but nohz_full ones do).
> What would be the expectation there?

I don't know, I haven't heard complaints about isolcpus handling global
timers so far...

I wouldn't pay much attention to 2) until anybody complains. Does 1) even
matter to anybody outside nohz_full ?

Thanks.

-- 
Frederic Weisbecker
SUSE Labs
Re: [PATCH] timers: Exclude isolated cpus from timer migation
Posted by Thomas Gleixner 8 months, 1 week ago
On Thu, Apr 10 2025 at 16:20, Frederic Weisbecker wrote:
> On Thu, Apr 10, 2025 at 03:56:02PM +0200, Gabriele Monaco wrote:
>> On Thu, 2025-04-10 at 15:27 +0200, Frederic Weisbecker wrote:
>> > But how do we handle global timers that have been initialized and
>> > queued from
>> > isolated CPUs?
>> 
>> I need to sketch a bit more the solution but the rough idea is:
>> 1. isolated CPUs don't pull remote timers
>
> That's the "easy" part.
>
>> 2. isolated CPUs ignore their global timers and let others pull them
>>   perhaps with some more logic to avoid them expiring
>
> This will always involve added overhead because you may need to wake up
> a CPU upon enqueueing a global timer to make sure it will be handled.
> At least when all other CPUs are idle.

Which is true for the remote enqueue path too. But you inflict the
handling of this muck into the generic enqueue path as you have to turn
a 'global' timer into a remote timer right in the hot path.

When you enqueue it in the regular way on the 'global' list, then you
can delegate the expiry to a remote CPU on return to user, no?
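
Roughly along these lines (sketch only; tmigr_kick_remote_expiry() is a
made-up helper and the exact hook point is up for discussion):

	#include <linux/percpu-defs.h>
	#include <linux/task_work.h>

	static DEFINE_PER_CPU(struct callback_head, tmigr_twork);

	/* Runs on return to user: hand this CPU's global timers off */
	static void tmigr_delegate_expiry(struct callback_head *head)
	{
		tmigr_kick_remote_expiry(smp_processor_id());
	}

	static void tmigr_defer_expiry_to_user_return(void)
	{
		struct callback_head *twork = this_cpu_ptr(&tmigr_twork);

		init_task_work(twork, tmigr_delegate_expiry);
		task_work_add(current, twork, TWA_RESUME);
	}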

Thanks,

        tglx
Re: [PATCH] timers: Exclude isolated cpus from timer migation
Posted by Frederic Weisbecker 8 months, 1 week ago
On Thu, Apr 10, 2025 at 04:46:06PM +0200, Thomas Gleixner wrote:
> On Thu, Apr 10 2025 at 16:20, Frederic Weisbecker wrote:
> > On Thu, Apr 10, 2025 at 03:56:02PM +0200, Gabriele Monaco wrote:
> >> On Thu, 2025-04-10 at 15:27 +0200, Frederic Weisbecker wrote:
> >> > But how do we handle global timers that have been initialized and
> >> > queued from
> >> > isolated CPUs?
> >> 
> >> I need to sketch a bit more the solution but the rough idea is:
> >> 1. isolated CPUs don't pull remote timers
> >
> > That's the "easy" part.
> >
> >> 2. isolated CPUs ignore their global timers and let others pull them
> >>   perhaps with some more logic to avoid them expiring
> >
> > This will always involve added overhead because you may need to wake up
> > a CPU upon enqueueing a global timer to make sure it will be handled.
> > At least when all other CPUs are idle.
> 
> Which is true for the remote enqueue path too. But you inflict the
> handling of this muck into the generic enqueue path as you have to turn
> a 'global' timer into a remote timer right in the hot path.

Fair point.

> 
> When you enqueue it in the regular way on the 'global' list, then you
> can delegate the expiry to a remote CPU on return to user, no?

If you're referring to nohz_full, it's not a problem there because
it's already considered as an idle CPU.

But for isolcpus alone that notification is necessary. I'm not sure
if return to user is the best place. I hear that some kernel threads
can spend a lot of time doing things...

But to begin with, is this all really necessary for isolcpus users?

-- 
Frederic Weisbecker
SUSE Labs
Re: [PATCH] timers: Exclude isolated cpus from timer migation
Posted by Waiman Long 8 months, 1 week ago
On 4/10/25 10:54 AM, Frederic Weisbecker wrote:
> On Thu, Apr 10, 2025 at 04:46:06PM +0200, Thomas Gleixner wrote:
>> On Thu, Apr 10 2025 at 16:20, Frederic Weisbecker wrote:
>>> On Thu, Apr 10, 2025 at 03:56:02PM +0200, Gabriele Monaco wrote:
>>>> On Thu, 2025-04-10 at 15:27 +0200, Frederic Weisbecker wrote:
>>>>> But how do we handle global timers that have been initialized and
>>>>> queued from
>>>>> isolated CPUs?
>>>> I need to sketch a bit more the solution but the rough idea is:
>>>> 1. isolated CPUs don't pull remote timers
>>> That's the "easy" part.
>>>
>>>> 2. isolated CPUs ignore their global timers and let others pull them
>>>>    perhaps with some more logic to avoid them expiring
>>> This will always involve added overhead because you may need to wake up
>>> a CPU upon enqueueing a global timer to make sure it will be handled.
>>> At least when all other CPUs are idle.
>> Which is true for the remote enqueue path too. But you inflict the
>> handling of this muck into the generic enqueue path as you have to turn
>> a 'global' timer into a remote timer right in the hot path.
> Fair point.
>
>> When you enqueue it in the regular way on the 'global' list, then you
>> can delegate the expiry to a remote CPU on return to user, no?
> If you're referring to nohz_full, it's not a problem there because
> it's already considered as an idle CPU.
>
> But for isolcpus alone that notification is necessary. I'm not sure
> if return to user is the best place. I hear that some kernel threads
> can spend a lot of time doing things...
>
> But to begin with, is this all really necessary for isolcpus users?
>
In some of the actual customer use cases that I have seen, both 
"nohz_full" and "isolcpus" are set to the same set of CPUs to achieve 
maximum isolation. isolcpus has domain and managed_irq flags that are not 
covered by nohz_full. As isolcpus=nohz has been extended to be 
equivalent to nohz_full, users can now just use "isolcpus" to achieve 
that maximum CPU isolation.
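
For example, a setup like (CPU range arbitrary):

	nohz_full=2-15 isolcpus=domain,managed_irq,2-15

can nowadays be expressed as just:

	isolcpus=nohz,domain,managed_irq,2-15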

Cheers,
Longman

Re: [PATCH] timers: Exclude isolated cpus from timer migation
Posted by Gabriele Monaco 8 months, 1 week ago

On Thu, 2025-04-10 at 16:20 +0200, Frederic Weisbecker wrote:
> On Thu, Apr 10, 2025 at 03:56:02PM +0200, Gabriele Monaco wrote:
> > On Thu, 2025-04-10 at 15:27 +0200, Frederic Weisbecker wrote:
> > > But how do we handle global timers that have been initialized and
> > > queued from
> > > isolated CPUs?
> > 
> > I need to sketch a bit more the solution but the rough idea is:
> > 1. isolated CPUs don't pull remote timers
> 
> That's the "easy" part.
> 
> > 2. isolated CPUs ignore their global timers and let others pull
> > them
> >   perhaps with some more logic to avoid them expiring
> 
> This will always involve added overhead because you may need to wake
> up
> a CPU upon enqueueing a global timer to make sure it will be handled.
> At least when all other CPUs are idle.
> 
> > Wouldn't that be sufficient?
> > 
> > Also, I would definitely do 1. for any kind of isolation, but I'm
> > not
> > sure about 2.
> > Strictly speaking domain isolated cores don't claim to be free of
> > kernel noise, even if they initiate it (but nohz_full ones do).
> > What would be the expectation there?
> 
> I don't know, I haven't heard complaints about isolcpus handling
> global
> timers so far...
> 
> I wouldn't pay much attention to 2) until anybody complains. Does 1)
> even
> matter to anybody outside nohz_full ?
> 

Makes sense..
In our case, 2. is not a big issue because it can usually be solved by
other configurations, but 1. is an issue.
Most people indeed use nohz_full in that scenario, but some users may
not want its overhead.

I find it misleading at best for global timers to migrate from
housekeeping to isolcpus cores and since it's "easy", I'd definitely
change that.

Does it make sense?

Thanks,
Gabriele
Re: [PATCH] timers: Exclude isolated cpus from timer migation
Posted by Frederic Weisbecker 8 months, 1 week ago
On Thu, Apr 10, 2025 at 04:46:10PM +0200, Gabriele Monaco wrote:
> 
> 
> On Thu, 2025-04-10 at 16:20 +0200, Frederic Weisbecker wrote:
> > On Thu, Apr 10, 2025 at 03:56:02PM +0200, Gabriele Monaco wrote:
> > > On Thu, 2025-04-10 at 15:27 +0200, Frederic Weisbecker wrote:
> > > > But how do we handle global timers that have been initialized and
> > > > queued from
> > > > isolated CPUs?
> > > 
> > > I need to sketch a bit more the solution but the rough idea is:
> > > 1. isolated CPUs don't pull remote timers
> > 
> > That's the "easy" part.
> > 
> > > 2. isolated CPUs ignore their global timers and let others pull
> > > them
> > >   perhaps with some more logic to avoid them expiring
> > 
> > This will always involve added overhead because you may need to wake
> > up
> > a CPU upon enqueueing a global timer to make sure it will be handled.
> > At least when all other CPUs are idle.
> > 
> > > Wouldn't that be sufficient?
> > > 
> > > Also, I would definitely do 1. for any kind of isolation, but I'm
> > > not
> > > sure about 2.
> > > Strictly speaking domain isolated cores don't claim to be free of
> > > kernel noise, even if they initiate it (but nohz_full ones do).
> > > What would be the expectation there?
> > 
> > I don't know, I haven't heard complaints about isolcpus handling
> > global
> > timers so far...
> > 
> > I wouldn't pay much attention to 2) until anybody complains. Does 1)
> > even
> > matter to anybody outside nohz_full ?
> > 
> 
> Makes sense..
> In our case, 2. is not a big issue because it can usually be solved by
> other configurations, but 1. is an issue.
> Most people indeed use nohz_full in that scenario, but some users may
> not want its overhead.
> 
> I find it misleading at best for global timers to migrate from
> housekeeping to isolcpus cores and since it's "easy", I'd definitely
> change that.

Easy but still a bit invasive so:

> Does it make sense?

It makes sense but is there a real need for that? Have people
complained about that?

Thanks.

> 
> Thanks,
> Gabriele
> 

-- 
Frederic Weisbecker
SUSE Labs
Re: [PATCH] timers: Exclude isolated cpus from timer migation
Posted by Gabriele Monaco 8 months, 1 week ago

On Thu, 2025-04-10 at 16:59 +0200, Frederic Weisbecker wrote:
> On Thu, Apr 10, 2025 at 04:46:10PM +0200, Gabriele Monaco wrote:
> > 
> > 
> > On Thu, 2025-04-10 at 16:20 +0200, Frederic Weisbecker wrote:
> > > On Thu, Apr 10, 2025 at 03:56:02PM +0200, Gabriele Monaco wrote:
> > > > On Thu, 2025-04-10 at 15:27 +0200, Frederic Weisbecker wrote:
> > > > > But how do we handle global timers that have been initialized
> > > > > and
> > > > > queued from
> > > > > isolated CPUs?
> > > > 
> > > > I need to sketch a bit more the solution but the rough idea is:
> > > > 1. isolated CPUs don't pull remote timers
> > > 
> > > That's the "easy" part.
> > > 
> > > > 2. isolated CPUs ignore their global timers and let others pull
> > > > them
> > > >   perhaps with some more logic to avoid them expiring
> > > 
> > > This will always involve added overhead because you may need to
> > > wake
> > > up
> > > a CPU upon enqueueing a global timer to make sure it will be
> > > handled.
> > > At least when all other CPUs are idle.
> > > 
> > > > Wouldn't that be sufficient?
> > > > 
> > > > Also, I would definitely do 1. for any kind of isolation, but
> > > > I'm
> > > > not
> > > > sure about 2.
> > > > Strictly speaking domain isolated cores don't claim to be free
> > > > of
> > > > kernel noise, even if they initiate it (but nohz_full ones do).
> > > > What would be the expectation there?
> > > 
> > > I don't know, I haven't heard complaints about isolcpus handling
> > > global
> > > timers so far...
> > > 
> > > I wouldn't pay much attention to 2) until anybody complains. Does
> > > 1)
> > > even
> > > matter to anybody outside nohz_full ?
> > > 
> > 
> > Makes sense..
> > In our case, 2. is not a big issue because it can usually be solved
> > by
> > other configurations, but 1. is an issue.
> > Most people indeed use nohz_full in that scenario, but some users
> > may
> > not want its overhead.
> > 
> > I find it misleading at best for global timers to migrate from
> > housekeeping to isolcpus cores and since it's "easy", I'd
> > definitely
> > change that.
> 
> Easy but still a bit invasive so:
> 

I'm not understanding what is going to be invasive in this case.
Aren't we just talking about not allowing isolcpus to pull timers from
other cpus?
Let's ignore for now the global timers started on those CPUs, I'm not
aware of complaints regarding that.

As far as I understand, the change would allow timer migration to work
normally out of isolcpus and among housekeeping ones, we are not
forcing any migration that would potentially introduce overhead or
missed timers.
Or am I oversimplifying it?

Thanks,
Gabriele
Re: [PATCH] timers: Exclude isolated cpus from timer migation
Posted by Frederic Weisbecker 8 months, 1 week ago
On Thu, Apr 10, 2025 at 05:05:52PM +0200, Gabriele Monaco wrote:
> I'm not understanding what is going to be invasive in this case.
> Aren't we just talking about not allowing isolcpus to pull timers from
> other cpus?

Right, but in order to do that you need to avoid remotely executing
timers. And for that we need your initial patch (after reviews have
been addressed). Right?

> Let's ignore for now the global timers started on those CPUs, I'm not
> aware of complaints regarding that.
> 
> As far as I understand, the change would allow timer migration to work
> normally out of isolcpus and among housekeeping ones, we are not
> forcing any migration that would potentially introduce overhead or
> missed timers.
> Or am I oversimplifying it?

Global timers only migrate:

a) When a CPU goes offline (but they should be migrated to
  a housekeeping CPU)

b) When a timer executes remotely and re-enqueues itself.

In order to handle b), you must make sure an isolated CPU
can't execute timers remotely. Hence why it must be set
as not available like you did.
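
That is, something shaped like the offline path (sketch with illustrative
names, following your available/unavailable wording, not final code):

	/*
	 * Sketch only: mark an isolated CPU unavailable so the hierarchy
	 * never picks it as migrator, mirroring the current offline path.
	 */
	static int tmigr_cpu_unavailable(struct tmigr_cpu *tmc)
	{
		raw_spin_lock_irq(&tmc->lock);
		tmc->available = false;
		WRITE_ONCE(tmc->wakeup, KTIME_MAX);
		__tmigr_cpu_deactivate(tmc, KTIME_MAX);
		raw_spin_unlock_irq(&tmc->lock);

		return 0;
	}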

Or am I missing something else?

Thanks.

-- 
Frederic Weisbecker
SUSE Labs
Re: [PATCH] timers: Exclude isolated cpus from timer migation
Posted by Gabriele Monaco 8 months, 1 week ago

On Thu, 2025-04-10 at 17:32 +0200, Frederic Weisbecker wrote:
> On Thu, Apr 10, 2025 at 05:05:52PM +0200, Gabriele Monaco wrote:
> > I'm not understanding what is going to be invasive in this case.
> > Aren't we just talking about not allowing isolcpus to pull timers
> > from
> > other cpus?
> 
> Right, but in order to do that you need to avoid remotely executing
> timers. And for that we need your initial patch (after reviews have
> been addressed). Right?
> 
> > Let's ignore for now the global timers started on those CPUs, I'm
> > not
> > aware of complaints regarding that.
> > 
> > As far as I understand, the change would allow timer migration to
> > work
> > normally out of isolcpus and among housekeeping ones, we are not
> > forcing any migration that would potentially introduce overhead or
> > missed timers.
> > Or am I oversimplifying it?
> 
> Global timers only migrate:
> 
> a) When a CPU goes offline (but they should be migrated to
>   a housekeeping CPU)
> 
> b) When a timer executes remotely and re-enqueues itself.
> 
> In order to handle b), you must make sure an isolated CPU
> can't execute timers remotely. Hence why it must be set
> as not available like you did.
> 
> Or am I missing something else?
> 

Mmh, my patch is in fact allowing isolated cores to still migrate
everything if they go offline.

However I don't think housekeeping CPUs can execute remote timers on
isolated ones. That is not a problem for offline CPUs (they won't start
anything and do the migration while offlining is enough), but we should
allow it here.
I may be missing something, but isn't it what [1] is doing?

Thanks,
Gabriele

[1] -
https://elixir.bootlin.com/linux/v6.13.7/source/kernel/time/timer_migration.c#L976
Re: [PATCH] timers: Exclude isolated cpus from timer migation
Posted by Frederic Weisbecker 8 months, 1 week ago
On Fri, Apr 11, 2025 at 09:08:35AM +0200, Gabriele Monaco wrote:
> Mmh, my patch is in fact allowing isolated cores to still migrate
> everything if they go offline.

Sure that doesn't change.

> 
> However I don't think housekeeping CPUs can execute remote timers on
> isolated ones.

I'm confused, a CPU can't execute something on another CPU (except with
an IPI). But:

Before your patch, a housekeeping or isolated CPU can pull timers from
any other CPU and execute them on its behalf.

After your patch, a housekeeping CPU can only pull timers from other
housekeeping CPUs. And isolated CPUs each execute their own global timers.


> That is not a problem for offline CPUs (they won't start
> anything and do the migration while offlining is enough), but we should
> allow it here.
> I may be missing something, but isn't it what [1] is doing?

That's only something that avoids pulling timers from offlining CPUs.

The real migration happens at timers_dead_cpu(), which is called
directly from the cpu_down() / cpu_up() caller's context. And timers are
migrated to the current CPU, which is fine if cpu_down()/cpu_up()
are being called from a task that is affine to housekeeping.
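
(For reference, the trimmed-down shape of timers_dead_cpu(): it splices
every timer of the dead CPU's bases onto the bases of the CPU it runs on,
hence "migrated to the current CPU". Locking and base forwarding elided
here.)

	int timers_dead_cpu(unsigned int cpu)
	{
		struct timer_base *old_base, *new_base;
		int b, i;

		for (b = 0; b < NR_BASES; b++) {
			old_base = per_cpu_ptr(&timer_bases[b], cpu);
			new_base = get_cpu_ptr(&timer_bases[b]);
			/* lock both bases, forward new_base's clock ... */
			for (i = 0; i < WHEEL_SIZE; i++)
				migrate_timer_list(new_base, old_base->vectors + i);
			/* ... unlock both bases */
			put_cpu_ptr(&timer_bases);
		}
		return 0;
	}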

Thanks.

> 
> Thanks,
> Gabriele
> 
> [1] -
> https://elixir.bootlin.com/linux/v6.13.7/source/kernel/time/timer_migration.c#L976
> 

-- 
Frederic Weisbecker
SUSE Labs
Re: [PATCH] timers: Exclude isolated cpus from timer migation
Posted by Gabriele Monaco 8 months, 1 week ago

On Fri, 2025-04-11 at 13:31 +0200, Frederic Weisbecker wrote:
> On Fri, Apr 11, 2025 at 09:08:35AM +0200, Gabriele Monaco wrote:
> > Mmh, my patch is in fact allowing isolated cores to still migrate
> > everything if they go offline.
> 
> Sure that doesn't change.
> 
> > 
> > However I don't think housekeeping CPUs can execute remote timers
> > on
> > isolated ones.
> 
> I'm confused, a CPU can't execute something on another CPU (except
> with
> an IPI). But:
> 
> Before your patch, a housekeeping or isolated CPU can pull timers
> from
> any other CPU and execute them on its behalf.
> 
> After your patch, a housekeeping CPU can only pull timers from other
> housekeeping CPUs. And isolated CPUs each execute their own global
> timers.
> 

Right, the way I said it doesn't really make sense.

What I mean is: why wouldn't a housekeeping CPU pull global timers from
an isolated one?

We want to prevent the other way around, but I think housekeeping
should be encouraged to pull timers from isolated CPUs, even if those
are not idle.

I see that only preventing isolated CPUs from pulling remote timers may
play badly with the algorithm, since they'd register in the hierarchy but
just not pull timers.
(This simpler approach works in our scenario though)

The idea in my patch could mostly work, but I'd explicitly let
housekeeping CPUs pull timers from isolated (while of course not doing
it for offline ones).

Does it make sense to you?

Thanks,
Gabriele
Re: [PATCH] timers: Exclude isolated cpus from timer migation
Posted by Frederic Weisbecker 8 months, 1 week ago
On Fri, Apr 11, 2025 at 03:02:11PM +0200, Gabriele Monaco wrote:
> 
> 
> On Fri, 2025-04-11 at 13:31 +0200, Frederic Weisbecker wrote:
> > On Fri, Apr 11, 2025 at 09:08:35AM +0200, Gabriele Monaco wrote:
> > > Mmh, my patch is in fact allowing isolated cores to still migrate
> > > everything if they go offline.
> > 
> > Sure that doesn't change.
> > 
> > > 
> > > However I don't think housekeeping CPUs can execute remote timers
> > > on
> > > isolated ones.
> > 
> > I'm confused, a CPU can't execute something on another CPU (except
> > with
> > an IPI). But:
> > 
> > Before your patch, a housekeeping or isolated CPU can pull timers
> > from
> > any other CPU and execute them on its behalf.
> > 
> > After your patch, a housekeeping CPU can only pull timers from other
> > housekeeping CPUs. And isolated CPUs each execute their own global
> > timers.
> > 
> 
> Right, the way I said it doesn't really make sense.
> 
> What I mean is: why wouldn't a housekeeping CPU pull global timers from
> an isolated one?
> 
> We want to prevent the other way around, but I think housekeeping
> should be encouraged to pull timers from isolated CPUs, even if those
> are not idle.
> 
> I see that only preventing isolated CPUs from pulling remote timers may
> play badly with the algorithm, since they'd register in the hierarchy but
> just not pull timers.
> (This simpler approach works in our scenario though)
> 
> The idea in my patch could mostly work, but I'd explicitly let
> housekeeping CPUs pull timers from isolated (while of course not doing
> it for offline ones).
> 
> Does it make sense to you?

If you want housekeeping CPUs to pull timers from isolated ones, then you
need isolated CPUs to eventually be part of the hierarchy (be "available"),
because they would need to propagate their global timers up to the top.

If they propagate their global timers then they must play the whole hierarchy
game and be ready to pull the timers from any other CPUs.

Because if they propagate their timers, they must also propagate their
idle state to synchronize with the whole hierarchy and know if there is
a non-idle housekeeping CPU to take care of their timers. And if they propagate
their (non) idle state, the isolated CPUs also declare themselves as available
to handle others' timers.

And working around that would be very nasty.

-- 
Frederic Weisbecker
SUSE Labs
Re: [PATCH] timers: Exclude isolated cpus from timer migation
Posted by Gabriele Monaco 8 months, 1 week ago
On Sat, 2025-04-12 at 00:57 +0200, Frederic Weisbecker wrote:
> 
> If you want housekeeping CPUs to pull timers from isolated ones, then
> you
> need isolated CPUs to eventually be part of the hierarchy (be
> "available"),
> because they would need to propagate their global timers up to the
> top.
> 
> If they propagate their global timers then they must play the whole
> hierarchy
> game and be ready to pull the timers from any other CPUs.
> 
> Because if they propagate their timers, they must also propagate
> their
> idle state to synchronize with the whole hierarchy and know if there
> is
> a non-idle housekeeping CPU to take care of their timers. And if they
> propagate
> their (non) idle state, the isolated CPUs also declare themselves as
> available
> to handle others' timers.
> 
> And working around that would be very nasty.
> 

Well, that's exactly what I was trying to do, thanks for pointing this
out. I guess I won't go down that rabbit hole then.

Thanks,
Gabriele