[RFC PATCH v1] Add kthreads_update_affinity()
Posted by Costa Shulyupin 11 months, 1 week ago
Changing the housekeeping and isolated CPU masks currently requires a reboot.

The goal is to change CPU isolation dynamically without reboot.

This patch is based on the parent patch
cgroup/cpuset: Exclude isolated CPUs from housekeeping CPU masks
https://lore.kernel.org/lkml/20240821142312.236970-3-longman@redhat.com/
Its purpose is to update isolation cpumasks.

However, some subsystems may use outdated housekeeping CPU masks. To
prevent the use of these isolated CPUs, it is essential to explicitly
propagate updates to the housekeeping masks across all subsystems that
depend on them.

This patch is not intended to be merged, as it would disrupt the kernel.
It is a proof of concept for research purposes.

The questions are:
- Is this the right direction, or should I explore an alternative approach?
- What factors need to be considered?
- Any suggestions or advice?
- Have similar attempts been made before?

Update the affinity of kthreadd and trigger the recalculation of kthread
affinities using kthreads_online_cpu().

The argument passed to kthreads_online_cpu() is irrelevant, as the
function reassigns affinities of kthreads based on their
preferred_affinity and the updated housekeeping_cpumask(HK_TYPE_KTHREAD).

Currently only RCU uses kthread_affine_preferred().

As an experiment, I call kthread_affine_preferred() from kthread_run() to
set preferred_affinity to cpu_possible_mask for kthreads without a
specific affinity, enabling their management through
kthreads_online_cpu().

Any objections?

For details about kthread affinity patterns, please see:
https://lore.kernel.org/lkml/20241211154035.75565-16-frederic@kernel.org/

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
---
 include/linux/kthread.h | 5 ++++-
 kernel/cgroup/cpuset.c  | 1 +
 kernel/kthread.c        | 6 ++++++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 8d27403888ce9..b43c5aeb2cfd7 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -52,8 +52,10 @@ bool kthread_is_per_cpu(struct task_struct *k);
 ({									   \
 	struct task_struct *__k						   \
 		= kthread_create(threadfn, data, namefmt, ## __VA_ARGS__); \
-	if (!IS_ERR(__k))						   \
+	if (!IS_ERR(__k)) {						   \
+		kthread_affine_preferred(__k, cpu_possible_mask);	   \
 		wake_up_process(__k);					   \
+	}								   \
 	__k;								   \
 })
 
@@ -270,4 +272,5 @@ struct cgroup_subsys_state *kthread_blkcg(void);
 #else
 static inline void kthread_associate_blkcg(struct cgroup_subsys_state *css) { }
 #endif
+void kthreads_update_affinity(void);
 #endif /* _LINUX_KTHREAD_H */
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 65658a5c2ac81..7d71acc7f46b6 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1355,6 +1355,7 @@ static void update_isolation_cpumasks(bool isolcpus_updated)
 	trl();
 	ret = housekeeping_exlude_isolcpus(isolated_cpus, HOUSEKEEPING_FLAGS);
 	WARN_ON_ONCE((ret < 0) && (ret != -EOPNOTSUPP));
+	kthreads_update_affinity();
 }
 
 /**
diff --git a/kernel/kthread.c b/kernel/kthread.c
index c4574c2d37e0d..2488cdf8aec17 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1763,3 +1763,9 @@ struct cgroup_subsys_state *kthread_blkcg(void)
 	return NULL;
 }
 #endif
+
+void kthreads_update_affinity(void)
+{
+	set_cpus_allowed_ptr(kthreadd_task, housekeeping_cpumask(HK_TYPE_KTHREAD));
+	kthreads_online_cpu(-1);
+}
-- 
2.47.0
Re: [RFC PATCH v1] Add kthreads_update_affinity()
Posted by Frederic Weisbecker 10 months, 4 weeks ago
Hi Costa,

On Mon, Jan 13, 2025 at 09:08:54PM +0200, Costa Shulyupin wrote:
> Changing the housekeeping and isolated CPU masks currently requires a reboot.
> 
> The goal is to change CPU isolation dynamically without reboot.
> 
> This patch is based on the parent patch
> cgroup/cpuset: Exclude isolated CPUs from housekeeping CPU masks
> https://lore.kernel.org/lkml/20240821142312.236970-3-longman@redhat.com/
> Its purpose is to update isolation cpumasks.
> 
> However, some subsystems may use outdated housekeeping CPU masks. To
> prevent the use of these isolated CPUs, it is essential to explicitly
> propagate updates to the housekeeping masks across all subsystems that
> depend on them.
> 
> This patch is not intended to be merged, as it would disrupt the kernel.
> It is a proof of concept for research purposes.
> 
> The questions are:
> - Is this the right direction, or should I explore an alternative approach?
> - What factors need to be considered?
> - Any suggestions or advice?
> - Have similar attempts been made before?

Since the kthreads preferred affinity patchset just got merged,
I don't think anything needs to be done anymore on the kthreads front to toggle
nohz_full through cpusets safely. Let's see the details below:

> 
> Update the affinity of kthreadd and trigger the recalculation of kthread
> affinities using kthreads_online_cpu().
> 
> The argument passed to kthreads_online_cpu() is irrelevant, as the
> function reassigns affinities of kthreads based on their
> preferred_affinity and the updated housekeeping_cpumask(HK_TYPE_KTHREAD).
> 
> Currently only RCU uses kthread_affine_preferred().
> 
> As an experiment, I call kthread_affine_preferred() from kthread_run() to
> set preferred_affinity to cpu_possible_mask for kthreads without a
> specific affinity, enabling their management through
> kthreads_online_cpu().
> 
> Any objections?
> 
> For details about kthreads affinity patterns please see:
> https://lore.kernel.org/lkml/20241211154035.75565-16-frederic@kernel.org/
> 
> Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
> ---
>  include/linux/kthread.h | 5 ++++-
>  kernel/cgroup/cpuset.c  | 1 +
>  kernel/kthread.c        | 6 ++++++
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/kthread.h b/include/linux/kthread.h
> index 8d27403888ce9..b43c5aeb2cfd7 100644
> --- a/include/linux/kthread.h
> +++ b/include/linux/kthread.h
> @@ -52,8 +52,10 @@ bool kthread_is_per_cpu(struct task_struct *k);
>  ({									   \
>  	struct task_struct *__k						   \
>  		= kthread_create(threadfn, data, namefmt, ## __VA_ARGS__); \
> -	if (!IS_ERR(__k))						   \
> +	if (!IS_ERR(__k)) {						   \
> +		kthread_affine_preferred(__k, cpu_possible_mask);	   \
>  		wake_up_process(__k);					   \
> +	}								   \
>  	__k;								   \
>  })
>  
> @@ -270,4 +272,5 @@ struct cgroup_subsys_state *kthread_blkcg(void);
>  #else
>  static inline void kthread_associate_blkcg(struct cgroup_subsys_state *css) { }
>  #endif
> +void kthreads_update_affinity(void);
>  #endif /* _LINUX_KTHREAD_H */
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 65658a5c2ac81..7d71acc7f46b6 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1355,6 +1355,7 @@ static void update_isolation_cpumasks(bool isolcpus_updated)
>  	trl();
>  	ret = housekeeping_exlude_isolcpus(isolated_cpus, HOUSEKEEPING_FLAGS);
>  	WARN_ON_ONCE((ret < 0) && (ret != -EOPNOTSUPP));
> +	kthreads_update_affinity();

A few things to consider:

1) update_isolation_cpumasks() will be called with cpus_read_lock()
  (cf: sched_partition_write() and cpuset_write_resmask()), therefore
  kthreads_online_cpu() can't run concurrently.

2) The constraint to turn on/off a CPU as nohz_full will be that the
   target CPU is offline.

So when the target CPU later comes back online, kthreads_online_cpu()
will take care of the newly modified housekeeping_mask() (which will be visible
because cpus_write_lock() is held) and apply the appropriate
effective affinity accordingly.

The only thing that might need to be done by update_isolation_cpumasks() is
to hold kthreads_hotplug_lock so that a subsequent call to
kthread_affine_preferred() sees the "freshest" update to housekeeping_mask().
But even that may not be mandatory because the concerned CPUs are offline
and kthreads_online_cpu() will fix the race when they boot.

Oh, and the special case of kthreadd might need a direct affinity update.

So the good news is that we have a lot of things sorted out to prepare
for that cpuset interface:

_ RCU NOCB is ready
_ kthreads are ready
_ timers are fine since we toggle only offline CPUs and timer_migration.c
  should work without changes.

Still some details need to be taken care of:

* scheduler (see the housekeeping_mask() references, especially the ilb, which
  is my biggest worry; get_nohz_timer_target() shouldn't be an issue)
  
* posix cpu timers (make tick_dep unconditional ?)

* kernel/watchdog.c (make proc_watchdog_cpumask() and
  lockup_detector_online_cpu() safe against update_isolation_cpumasks())
  
* workqueue unbound mask (just apply the new one?)

* some RCU tick_dep to handle (make them unconditional since they
  apply on slow path anyway?)
  
* other things? (grep for tick_nohz_full_cpu(), housekeeping_* and tick_dep_* )

But we are getting closer!

Thanks!
Re: [RFC PATCH v1] Add kthreads_update_affinity()
Posted by Costa Shulyupin 10 months ago
Hi Frederic,

On Wed, 22 Jan 2025 at 19:11, Frederic Weisbecker <frederic@kernel.org> wrote:
> > @@ -1355,6 +1355,7 @@ static void update_isolation_cpumasks(bool isolcpus_updated)
...
> > +     kthreads_update_affinity();
>
> A few things to consider:
>
> 1) update_isolation_cpumasks() will be called with cpus_read_lock()
>   (cf: sched_partition_write() and cpuset_write_resmask()), therefore
>   kthreads_online_cpu() can't run concurrently.

Sorry, but I don't understand what you mean by “kthreads_online_cpu()
can't run concurrently”. Could you please clarify?

> 2) The constraint to turn on/off a CPU as nohz_full will be that the
>    target CPU is offline.

The final goal of CPU isolation is to isolate real-time applications from
disturbances and ensure low latency. However, CPU hotplug disrupts
real-time tasks, including the oslat test, which measures latency using RDTSC.
While performing a full CPU offline-online cycle at runtime can help avoid
long reboots and reduce downtime, it does not achieve the goal of
maintaining consistently low latency for real-time applications.

> * scheduler (see the housekeeping_mask() references, especially the ilb which is
>   my biggest worry, get_nohz_timer_target() shouldn't be an issue)

Are you referring to find_new_ilb()? What are your concerns?

> * posix cpu timers (make tick_dep unconditional ?)

Are you referring to arm_timer()?
Could you please clarify which condition you are referring to?

> But we are getting closer!

Thank you very much for the detailed review!

Thanks,
Costa