When a CPU hot removal causes a v1 cpuset to lose all its CPUs, the
cpuset hotplug handler will schedule a work function to migrate tasks
in that cpuset with no CPU to its ancestor to enable those tasks to
continue running.

If a strict security policy is in place, however, the task migration
may fail when the security_task_setscheduler() call in cpuset_can_attach()
returns an -EACCES error. That will mean that those tasks will have
no CPU to run on. The system administrators will have to explicitly
intervene to either add CPUs to that cpuset or move the tasks elsewhere
if they are aware of it.

This problem was found by a reported test failure in the LTP's
cpuset_hotplug_test.sh. Fix this problem by treating this special case
as an exception and skipping the setsched security check, as the migration
is initiated internally within the kernel itself instead of from user input.
Do that by having the hotplug handler set a new one-off CS_TASKS_OUT flag in
the affected cpuset to allow cpuset_can_attach() to skip the security check.

With this patch applied, the cpuset_hotplug_test.sh test runs
successfully without failure.

Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/cgroup/cpuset-internal.h | 1 +
kernel/cgroup/cpuset-v1.c | 3 +++
kernel/cgroup/cpuset.c | 14 ++++++++++++++
3 files changed, 18 insertions(+)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index fd7d19842ded..75e2c20249ad 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -46,6 +46,7 @@ typedef enum {
CS_SCHED_LOAD_BALANCE,
CS_SPREAD_PAGE,
CS_SPREAD_SLAB,
+ CS_TASKS_OUT,
} cpuset_flagbits_t;
/* The various types of files and directories in a cpuset file system */
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index 7308e9b02495..0c818edd0a1d 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -322,6 +322,9 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
return;
}
+ /* Enable task removal without security check */
+ set_bit(CS_TASKS_OUT, &cs->flags);
+
s->cs = cs;
INIT_WORK(&s->work, cpuset_migrate_tasks_workfn);
schedule_work(&s->work);
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 58c5b7b72cca..24d3ceef7991 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3011,6 +3011,20 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
setsched_check = !cpuset_v2() ||
!cpumask_equal(cs->effective_cpus, oldcs->effective_cpus) ||
!nodes_equal(cs->effective_mems, oldcs->effective_mems);
+ /*
+ * Also check if task migration away from the old cpuset is allowed
+ * without security check. This bit should only be set by the hotplug
+ * handler when task migration from a child v1 cpuset to its ancestor
+ * is needed because there is no CPU left for the tasks to run on after
+ * a hot CPU removal. Clear the bit if set as it is one-off. Also
+ * double-check the CPU emptiness of oldcs to be sure before clearing
+ * setsched_check.
+ */
+ if (test_bit(CS_TASKS_OUT, &oldcs->flags)) {
+ if (cpumask_empty(oldcs->effective_cpus))
+ setsched_check = false;
+ clear_bit(CS_TASKS_OUT, &oldcs->flags);
+ }
cgroup_taskset_for_each(task, css, tset) {
ret = task_can_attach(task);
--
2.53.0
On 2026/3/30 1:39, Waiman Long wrote:
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 58c5b7b72cca..24d3ceef7991 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -3011,6 +3011,20 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
> setsched_check = !cpuset_v2() ||
> !cpumask_equal(cs->effective_cpus, oldcs->effective_cpus) ||
> !nodes_equal(cs->effective_mems, oldcs->effective_mems);
> + /*
> + * Also check if task migration away from the old cpuset is allowed
> + * without security check. This bit should only be set by the hotplug
> + * handler when task migration from a child v1 cpuset to its ancestor
> + * is needed because there is no CPU left for the tasks to run on after
> + * a hot CPU removal. Clear the bit if set as it is one-off. Also
> + * double-check the CPU emptiness of oldcs to be sure before clearing
> + * setsched_check.
> + */
> + if (test_bit(CS_TASKS_OUT, &oldcs->flags)) {
> + if (cpumask_empty(oldcs->effective_cpus))
> + setsched_check = false;
> + clear_bit(CS_TASKS_OUT, &oldcs->flags);
> + }
>
If there are many tasks in the cpuset that has no CPUs, they will be migrated
one by one. I'm afraid that only the first task will succeed, and the rest will
fail because the flag is cleared after processing the first one.
--
Best regards,
Ridong
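
The concern raised above can be illustrated with a small userspace model. This is only a sketch, not kernel code: it assumes (as confirmed later in this thread) that the can_attach callback runs once per process during the transfer, and every name in it (`struct model_cpuset`, `MODEL_TASKS_OUT`, `model_can_attach_skips_check()`, `model_transfer_count_skips()`) is hypothetical. It mirrors only the test-and-clear logic of the proposed hunk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CS_TASKS_OUT bit; not the kernel's flag word. */
#define MODEL_TASKS_OUT (1UL << 0)

/* Hypothetical, heavily reduced model of a cpuset. */
struct model_cpuset {
	unsigned long flags;
	bool has_cpus;		/* models !cpumask_empty(effective_cpus) */
};

/*
 * Models the proposed hunk in cpuset_can_attach(): returns true when the
 * security check would be skipped for this call. The flag is cleared the
 * first time it is seen, as in the patch.
 */
static bool model_can_attach_skips_check(struct model_cpuset *oldcs)
{
	bool skip = false;

	if (oldcs->flags & MODEL_TASKS_OUT) {
		if (!oldcs->has_cpus)
			skip = true;
		oldcs->flags &= ~MODEL_TASKS_OUT;
	}
	return skip;
}

/*
 * Models a transfer that invokes can_attach once per process and counts
 * how many of those calls skipped the security check.
 */
static size_t model_transfer_count_skips(struct model_cpuset *cs,
					 size_t nr_procs)
{
	size_t skipped = 0;

	for (size_t i = 0; i < nr_procs; i++)
		if (model_can_attach_skips_check(cs))
			skipped++;
	return skipped;
}
```

Under that assumption, calling `model_transfer_count_skips()` for three processes on a CPU-less cpuset with the flag set lets only the first call skip the check; the remaining two see a cleared flag and would go through the security check again.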
On 3/29/26 9:48 PM, Chen Ridong wrote:
>
> On 2026/3/30 1:39, Waiman Long wrote:
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 58c5b7b72cca..24d3ceef7991 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -3011,6 +3011,20 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>> setsched_check = !cpuset_v2() ||
>> !cpumask_equal(cs->effective_cpus, oldcs->effective_cpus) ||
>> !nodes_equal(cs->effective_mems, oldcs->effective_mems);
>> + /*
>> + * Also check if task migration away from the old cpuset is allowed
>> + * without security check. This bit should only be set by the hotplug
>> + * handler when task migration from a child v1 cpuset to its ancestor
>> + * is needed because there is no CPU left for the tasks to run on after
>> + * a hot CPU removal. Clear the bit if set as it is one-off. Also
>> + * double-check the CPU emptiness of oldcs to be sure before clearing
>> + * setsched_check.
>> + */
>> + if (test_bit(CS_TASKS_OUT, &oldcs->flags)) {
>> + if (cpumask_empty(oldcs->effective_cpus))
>> + setsched_check = false;
>> + clear_bit(CS_TASKS_OUT, &oldcs->flags);
>> + }
>>
> If there are many tasks in the cpuset that has no CPUs, they will be migrated
> one by one. I'm afraid that only the first task will succeed, and the rest will
> fail because the flag is cleared after processing the first one.
The setsched_check flag is used in the cgroup_taskset_for_each() loop
below. That loop is going to iterate all the tasks to be migrated and so
the flag will apply to all of them. So it is not just the first one.
Cheers,
Longman
Hello,

On Mon, Mar 30, 2026 at 12:15:01PM -0400, Waiman Long wrote:
...
> > If there are many tasks in the cpuset that has no CPUs, they will be migrated
> > one by one. I'm afraid that only the first task will succeed, and the rest will
> > fail because the flag is cleared after processing the first one.
>
> The setsched_check flag is used in the cgroup_taskset_for_each() loop below.
> That loop is going to iterate all the tasks to be migrated and so the flag
> will apply to all of them. So it is not just the first one.

During migration, a taskset is used to group tasks in a thread group if
cgroup_migrate() is called with %true @threadgroup. That doesn't really apply
here. cgroup_transfer_tasks() doesn't set @threadgroup and even if it were
to set it, there can just be multiple processes. Besides, it's rather odd
for it to be a one-shot param that gets cleared deep in the stack. Wouldn't it
make more sense to make whoever sets it responsible for clearing it?

Thanks.

--
tejun
On 3/30/26 2:21 PM, Tejun Heo wrote:
> Hello,
>
> On Mon, Mar 30, 2026 at 12:15:01PM -0400, Waiman Long wrote:
> ...
>>> If there are many tasks in the cpuset that has no CPUs, they will be migrated
>>> one by one. I'm afraid that only the first task will succeed, and the rest will
>>> fail because the flag is cleared after processing the first one.
>> The setsched_check flag is used in the cgroup_taskset_for_each() loop below.
>> That loop is going to iterate all the tasks to be migrated and so the flag
>> will apply to all of them. So it is not just the first one.
> During migration, a taskset is used to group tasks in a thread group if
> cgroup_migrate() is called with %true @threadgroup. That doesn't really apply
> here. cgroup_transfer_tasks() doesn't set @threadgroup and even if it were
> to set it, there can just be multiple processes. Besides, it's rather odd
> for it to be a one-shot param that gets cleared deep in the stack. Wouldn't it
> make more sense to make whoever sets it responsible for clearing it?

Apparently, I have misunderstood how cgroup_transfer_tasks() works. Right,
it calls cgroup_migrate_execute() on a process-by-process basis. So I
shouldn't clear the flag in the first call.

As for clearing the flag, I think we can do it in the CPU hot-add situation
or when cpuset.cpus is modified.

Thanks,
Longman
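
One possible reading of the "whoever sets it clears it" suggestion can be sketched in userspace C. This is an illustration only, not the follow-up patch: all names (`struct owner_cpuset`, `OWNER_TASKS_OUT`, `owner_can_attach_skips_check()`, `owner_transfer_all()`) are hypothetical, and the model sets and clears the flag around a synchronous loop, whereas in the kernel the migration runs from a scheduled work function and the clearing point proposed above is CPU hot-add or a cpuset.cpus update.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CS_TASKS_OUT bit; not the kernel's flag word. */
#define OWNER_TASKS_OUT (1UL << 0)

/* Hypothetical, heavily reduced model of a cpuset. */
struct owner_cpuset {
	unsigned long flags;
	bool has_cpus;		/* models !cpumask_empty(effective_cpus) */
};

/*
 * The attach-side check only reads the flag and never clears it, so every
 * per-process call observes the same state.
 */
static bool owner_can_attach_skips_check(const struct owner_cpuset *oldcs)
{
	return (oldcs->flags & OWNER_TASKS_OUT) && !oldcs->has_cpus;
}

/*
 * The code that sets the flag also clears it once all processes have been
 * handled. Returns how many per-process calls skipped the security check.
 */
static size_t owner_transfer_all(struct owner_cpuset *cs, size_t nr_procs)
{
	size_t skipped = 0;

	cs->flags |= OWNER_TASKS_OUT;
	for (size_t i = 0; i < nr_procs; i++)
		if (owner_can_attach_skips_check(cs))
			skipped++;
	cs->flags &= ~OWNER_TASKS_OUT;

	return skipped;
}
```

In this model every process leaving a CPU-less cpuset skips the check, and the flag does not outlive the transfer. Keeping the CPU-emptiness test on the read side means a cpuset that regained CPUs in the meantime still goes through the normal security check.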