Chen Ridong reported that cpuset could report a kernel warning for a task
due to set_cpus_allowed_ptr() returning failure in the corner case that:
1) the task used sched_setaffinity(2) to set its CPU affinity mask to
be the same as the cpuset.cpus of its cpuset,
2) all the CPUs assigned to that cpuset were taken offline, and
3) cpuset v1 is in use and the task had to be migrated to the top cpuset.
Because the CPU affinity of tasks in the top cpuset is not updated
when a CPU hotplug online/offline event happens, offline CPUs are
included in the CPU affinity of those tasks. Further masking with the
user_cpus_ptr set by sched_setaffinity(2) in __set_cpus_allowed_ptr()
can then leave only offline CPUs in the new mask, causing the
subsequent call to __set_cpus_allowed_ptr_locked() to return failure
with an empty CPU affinity.
Fix this failure by skipping user_cpus_ptr masking if there is no online
CPU left.
Reported-by: Chen Ridong <chenridong@huaweicloud.com>
Closes: https://lore.kernel.org/lkml/20250714032311.3570157-1-chenridong@huaweicloud.com/
Fixes: da019032819a ("sched: Enforce user requested affinity")
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/sched/core.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 81c6df746df1..208f8af73134 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3170,12 +3170,13 @@ int __set_cpus_allowed_ptr(struct task_struct *p, struct affinity_context *ctx)
rq = task_rq_lock(p, &rf);
/*
- * Masking should be skipped if SCA_USER or any of the SCA_MIGRATE_*
- * flags are set.
+ * Masking should be skipped if SCA_USER, any of the SCA_MIGRATE_*
+ * flags are set or no online CPU left.
*/
if (p->user_cpus_ptr &&
!(ctx->flags & (SCA_USER | SCA_MIGRATE_ENABLE | SCA_MIGRATE_DISABLE)) &&
- cpumask_and(rq->scratch_mask, ctx->new_mask, p->user_cpus_ptr))
+ cpumask_and(rq->scratch_mask, ctx->new_mask, p->user_cpus_ptr) &&
+ cpumask_intersects(rq->scratch_mask, cpu_active_mask))
ctx->new_mask = rq->scratch_mask;
return __set_cpus_allowed_ptr_locked(p, ctx, rq, &rf);
--
2.50.0
On 7/18/25 12:41 PM, Waiman Long wrote:
> [patch quoted in full above]

Sorry, I forgot to change the patch title. Will send out a v3.

Cheers,
Longman
On 2025/7/19 0:48, Waiman Long wrote:
> [patch quoted in full above]

Tested-by: Chen Ridong <chenridong@huawei.com>
On 2025/7/23 9:58, Chen Ridong wrote:
> On 2025/7/19 0:48, Waiman Long wrote:
>> [patch quoted in full above]
>
> Tested-by: Chen Ridong <chenridong@huawei.com>

Friendly ping.

Best regards,
Ridong
On 2025/7/31 20:03, Chen Ridong wrote:
> [patch and Tested-by quoted above]
>
> Friendly ping.

Could someone please review this patch?

--
Best regards,
Ridong
On Fri, Jul 18, 2025 at 12:48:56PM -0400, Waiman Long <longman@redhat.com> wrote:
> Chen Ridong reported that cpuset could report a kernel warning for a task
> due to set_cpus_allowed_ptr() returning failure in the corner case that:
>
> 1) the task used sched_setaffinity(2) to set its CPU affinity mask to
>    be the same as the cpuset.cpus of its cpuset,
> 2) all the CPUs assigned to that cpuset were taken offline, and
> 3) cpuset v1 is in use and the task had to be migrated to the top cpuset.

Does this make sense for cpuset v2 (or no cpuset at all, for that matter)?
I'm asking whether this mask modification could be extracted into
cpuset-v1.c only (like cgroup_transfer_tasks() or a new function).

Thanks,
Michal
On 7/21/25 11:13 AM, Michal Koutný wrote:
> Does this make sense for cpuset v2 (or no cpuset at all, for that matter)?
> I'm asking whether this mask modification could be extracted into
> cpuset-v1.c only (like cgroup_transfer_tasks() or a new function).

This corner case as specified in Chen Ridong's patch only happens with a
cpuset v1 environment, but it is still the case that the default CPU
affinity of the root cgroup (with or without CONFIG_CGROUPS) will include
offline CPUs, if present. So it still makes sense to skip the
sched_setaffinity() setting if there is no online CPU left, though it
will be much harder to hit such a condition without using cpuset v1.

Cheers,
Longman
Hi.

I had a look after a while (thanks for the reminders, Ridong).

On Mon, Jul 21, 2025 at 11:28:15AM -0400, Waiman Long <llong@redhat.com> wrote:
> This corner case as specified in Chen Ridong's patch only happens with a
> cpuset v1 environment, but it is still the case that the default CPU
> affinity of the root cgroup (with or without CONFIG_CGROUPS) will include
> offline CPUs, if present.

IIUC, the generic sched_setaffinity(2) is ready for that, simply
returning an EINVAL.

> So it still makes sense to skip the sched_setaffinity() setting if
> there is no online CPU left, though it will be much harder to hit
> such a condition without using cpuset v1.

That sounds like there'd be no issue without cpuset v1, and the source of
the warning has quite a telling comment:

	 * fail. TODO: have a better way to handle failure here
	 */
	WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpus_attach));

The trouble is that this is from cpuset_attach() (cgroup_subsys.attach),
where no errors are expected. So I'd say the place for the check should
be earlier, in cpuset_can_attach() [1]. I'm not sure if that's universally
immune against CPU offlining, but it'd be sufficient for the reported
sequential offlining.

HTH,
Michal

[1] Although the error propagates, it ends up without recovery in
remove_tasks_in_empty_cpuset() "only" as an error message. But that's
likely all that can be done in this workfn context -- it's better than
silently skipping the migration as a consequence of this patch.
On 8/26/25 10:25 AM, Michal Koutný wrote:
> IIUC, the generic sched_setaffinity(2) is ready for that, simply
> returning an EINVAL.

The modified code will not be executed when called from
sched_setaffinity(), as the SCA_USER flag will be set. In the described
scenario, sched_setaffinity() was called without failure because the
request was valid at the time.

> That sounds like there'd be no issue without cpuset v1 [...]
> So I'd say the place for the check should be earlier, in
> cpuset_can_attach() [1]. I'm not sure if that's universally immune
> against CPU offlining, but it'd be sufficient for the reported
> sequential offlining.

Cpuset v1 has no concept of an effective cpumask that excludes offline
CPUs unless the "cpuset_v2_mode" mount option is used. So when a cpuset
has no CPU left, it will force-migrate its tasks to the parent, and
__set_cpus_allowed_ptr() will be invoked. The parent will likely have
those offline CPUs in its cpus_allowed list, and
__set_cpus_allowed_ptr_locked() will be called with only the offline
CPUs, causing the warning.

Migrating to the top_cpuset is probably not needed to illustrate the
problem.

Cheers,
Longman