Creation of a cpuset partition or adding more CPUs to an existing
partition will take CPUs away from other cpusets outside of the
partition, leaving fewer CPUs for the others. So it is a privileged
operation that non-privileged users shouldn't be allowed to do.
Currently, the remote partition code has a check for the CAP_SYS_ADMIN
capability before allowing such operations, but the local partition code
does not. This leaves a security hole in case cpuset.cpus.partition of a
cpuset is chown'ed to a non-root user and its parent cpuset happens to be
a partition root.
Add the same privilege check for local partitions to close this hole.
Also update Documentation/admin-guide/cgroup-v2.rst to clarify the
intention.
With this patch applied, any attempt by an unprivileged user to enable a
partition or add CPUs to an existing local or remote partition will
invalidate the partition even if writing to the cpuset control files is
allowed.
Fixes: ee8dde0cd2ce ("cpuset: Add new v2 cpuset.sched.partition flag")
Reported-by: Xie Maoyi <maoyi.xie@ntu.edu.sg>
Signed-off-by: Waiman Long <longman@redhat.com>
---
Documentation/admin-guide/cgroup-v2.rst | 6 ++++--
kernel/cgroup/cpuset.c | 16 +++++++++++++---
2 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 6efd0095ed99..df58557902db 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2599,8 +2599,10 @@ Cpuset Interface Files
cpuset.cpus.partition
A read-write single value file which exists on non-root
- cpuset-enabled cgroups. This flag is owned by the parent cgroup
- and is not delegatable.
+ cpuset-enabled cgroups. This file is owned by the parent cgroup
+ and is not delegatable. Any partition operations that take CPUs
+ away from other cpusets outside of a partition is not allowed
+ without privilege.
It accepts only the following input values when written to.
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index e3a081a07c6d..5fc8555f2046 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -57,7 +57,7 @@ static const char * const perr_strings[] = {
[PERR_HOTPLUG] = "No cpu available due to hotplug",
[PERR_CPUSEMPTY] = "cpuset.cpus and cpuset.cpus.exclusive are empty",
[PERR_HKEEPING] = "partition config conflicts with housekeeping setup",
- [PERR_ACCESS] = "Enable partition not permitted",
+ [PERR_ACCESS] = "Partition operation not permitted",
[PERR_REMOTE] = "Have remote partition underneath",
};
@@ -1740,6 +1740,8 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
nocpu = tasks_nocpu_error(parent, cs, xcpus);
if ((cmd == partcmd_enable) || (cmd == partcmd_enablei)) {
+ if (!capable(CAP_SYS_ADMIN))
+ return PERR_ACCESS;
/*
* Need to call compute_excpus() in case
* exclusive_cpus not set. Sibling conflict should only happen
@@ -1833,12 +1835,18 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
parent->effective_xcpus);
}
+ /*
+ * Taking CPUs away from parent is not allowed without privilege
+ */
+ if (deleting && !capable(CAP_SYS_ADMIN))
+ part_error = PERR_ACCESS;
+
/*
* TBD: Invalidate a currently valid child root partition may
* still break isolated_cpus_can_update() rule if parent is an
* isolated partition.
*/
- if (is_partition_valid(cs) && (old_prs != parent_prs)) {
+ else if (is_partition_valid(cs) && (old_prs != parent_prs)) {
if ((parent_prs == PRS_ROOT) &&
/* Adding to parent means removing isolated CPUs */
!isolated_cpus_can_update(tmp->delmask, tmp->addmask))
@@ -1919,8 +1927,10 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
}
write_error:
- if (part_error)
+ if (part_error) {
WRITE_ONCE(cs->prs_err, part_error);
+ adding = deleting = false;
+ }
if (cmd == partcmd_update) {
/*
--
2.53.0
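The control flow this patch adds can be modeled in userspace as follows. This is only a minimal sketch, not the kernel code: `model_update()`, `struct update_result`, and the `privileged` flag (standing in for `capable(CAP_SYS_ADMIN)`) are hypothetical names for illustration, the enums are local stand-ins for the kernel's definitions, and every other check in update_parent_effective_cpumask() is omitted.

```c
#include <stdbool.h>

/* Local stand-ins for the kernel's partition error codes. */
enum perr { PERR_NONE = 0, PERR_ACCESS };

/* Local stand-ins for the partition commands named in the diff. */
enum partcmd { partcmd_enable, partcmd_enablei, partcmd_update };

struct update_result {
	enum perr err;   /* models the returned or recorded partition error */
	bool adding;     /* CPUs would be given back to the parent */
	bool deleting;   /* CPUs would be taken away from the parent */
};

/*
 * Hypothetical model of the decision the patch adds. `privileged`
 * stands in for capable(CAP_SYS_ADMIN).
 */
static struct update_result model_update(enum partcmd cmd, bool privileged,
					 bool adding, bool deleting)
{
	struct update_result res = { PERR_NONE, adding, deleting };

	/* Enabling a partition is refused outright without privilege. */
	if ((cmd == partcmd_enable || cmd == partcmd_enablei) && !privileged) {
		res.err = PERR_ACCESS;
		/* Modeling choice: an early return applies no mask change. */
		res.adding = res.deleting = false;
		return res;
	}

	/* Taking CPUs away from the parent is refused without privilege. */
	if (deleting && !privileged)
		res.err = PERR_ACCESS;

	/* On error, neither mask is applied (adding = deleting = false). */
	if (res.err != PERR_NONE)
		res.adding = res.deleting = false;

	return res;
}
```

Under this model, an unprivileged update that only returns CPUs to the parent (`adding` set, `deleting` clear) still goes through, while any unprivileged request that would remove CPUs from the parent ends with PERR_ACCESS and no mask change.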
Hi Waiman.
On Mon, Apr 27, 2026 at 11:34:39PM -0400, Waiman Long <longman@redhat.com> wrote:
> Creation of a cpuset partition or adding more CPUs to an existing
> partition will take CPUs away from other cpusets outside of the
> partition leaving less CPUs for the others. So it is a privileged
> operation that non-privileged users shouldn't be allowed to do.
>
> Currently, remote partition code has check for CAP_SYS_ADMIN capability
> before allowing such operations, but not for local partition.
Remote partitions need such a check because their CPUs are sourced from
the global supply (top level) without
> This leaves a security hole in case cpuset.cpus.partition of a cpuset
> is chown'ed to a non-root user and its parent cpuset happens to be a
> partition root.
I wouldn't say this difference between remote and local partitions is a
security hole [1].
Consider this -- cgroup a is created by root (admin) and its resources
are constrained by root's policy. However, what happens in a subtree is
irrelevant from that top level view.
# setup // owner
a/cpuset.partition=root // root
a/cpuset.cpus=0-3 // root
a/cgroup.procs // user, they can organize subtree as needed
For example the user may want to create a (sub)partition with some of
the CPUs they got:
user$ mkdir a/b
a/b/cpuset.partition=root // user
a/b/cpuset.cpus=0-1 // user
This should be a valid configuration and behavior, no?
Thanks,
Michal
[1] And thanks to the need of cpuset.cpus.exclusive chain down the tree,
the capability check for remote partitions may be too restrictive
too. But I don't plead for its removal now.
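The sub-partition setup in the example above comes down to plain writes into cgroup control files. The helper below is a minimal sketch of one such write; `write_knob()` is a hypothetical name, and it assumes the control file already exists, as it would in a real cgroup directory. It also assumes that `cpuset.partition` in the example is shorthand for the `cpuset.cpus.partition` file named in the patch.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `value` to the control file `knob` inside
 * `cgroup_dir`. Returns 0 on success or a negative errno value.
 * The file is opened without O_CREAT because cgroup control files are
 * created by the kernel, not by the writer.
 */
static int write_knob(const char *cgroup_dir, const char *knob,
		      const char *value)
{
	char path[4096];
	int n = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, knob);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	size_t len = strlen(value);
	ssize_t written = write(fd, value, len);
	int err = 0;

	if (written < 0)
		err = -errno;          /* e.g. -EACCES for a denied write */
	else if ((size_t)written != len)
		err = -EIO;            /* short write, treated as failure */

	close(fd);
	return err;
}
```

Assuming cgroup v2 is mounted at /sys/fs/cgroup (an assumption, not stated in the thread), the user's steps would look like `write_knob("/sys/fs/cgroup/a/b", "cpuset.cpus", "0-1")` followed by `write_knob("/sys/fs/cgroup/a/b", "cpuset.cpus.partition", "root")`. Whether the second write yields a valid partition is exactly the question under discussion.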
On 4/28/26 3:58 AM, Michal Koutný wrote:
> Hi Waiman.
>
> On Mon, Apr 27, 2026 at 11:34:39PM -0400, Waiman Long <longman@redhat.com> wrote:
>> Creation of a cpuset partition or adding more CPUs to an existing
>> partition will take CPUs away from other cpusets outside of the
>> partition leaving less CPUs for the others. So it is a privileged
>> operation that non-privileged users shouldn't be allowed to do.
>>
>> Currently, remote partition code has check for CAP_SYS_ADMIN capability
>> before allowing such operations, but not for local partition.
> Remote partitions need such a check because their CPUs are sourced from
> the global supply (top level) without
>
>> This leaves a security hole in case cpuset.cpus.partition of a cpuset
>> is chown'ed to a non-root user and its parent cpuset happens to be a
>> partition root.
> I wouldn't say this difference between remote and local partitions is a
> security hole [1].

OK, I will tone down the description.

>
> Consider this -- cgroup a is created by root (admin) and its resources
> are constrained by root's policy. However, what happens in a subtree is
> irrelevant from that top level view.
>
> # setup // owner
> a/cpuset.partition=root // root
> a/cpuset.cpus=0-3 // root
> a/cgroup.procs // user, they can organize subtree as needed
>
> For example the user may want to create a (sub)partition with some of
> the CPUs they got:
>
> user$ mkdir a/b
>
> a/b/cpuset.partition=root // user
> a/b/cpuset.cpus=0-1 // user
>
> This should be a valid configuration and behavior, no?

Thanks for the comment. Yes, that can be a valid configuration.

One possible workaround may be to see if the current user has write
access to its parent partition root. If so, we can allow it to create
a sub-partition, if not, we will forbid it.

Cheers,
Longman
On 4/28/26 11:19 AM, Waiman Long wrote:
> On 4/28/26 3:58 AM, Michal Koutný wrote:
>> Hi Waiman.
>>
>> On Mon, Apr 27, 2026 at 11:34:39PM -0400, Waiman Long
>> <longman@redhat.com> wrote:
>>> Creation of a cpuset partition or adding more CPUs to an existing
>>> partition will take CPUs away from other cpusets outside of the
>>> partition leaving less CPUs for the others. So it is a privileged
>>> operation that non-privileged users shouldn't be allowed to do.
>>>
>>> Currently, remote partition code has check for CAP_SYS_ADMIN capability
>>> before allowing such operations, but not for local partition.
>> Remote partitions need such a check because their CPUs are sourced from
>> the global supply (top level) without
>>
>>> This leaves a security hole in case cpuset.cpus.partition of a cpuset
>>> is chown'ed to a non-root user and its parent cpuset happens to be a
>>> partition root.
>> I wouldn't say this difference between remote and local partitions is a
>> security hole [1].
> OK, I will tone down the description.
>>
>> Consider this -- cgroup a is created by root (admin) and its resources
>> are constrained by root's policy. However, what happens in a subtree is
>> irrelevant from that top level view.
>>
>> # setup // owner
>> a/cpuset.partition=root // root
>> a/cpuset.cpus=0-3 // root
>> a/cgroup.procs // user, they can organize subtree as needed
>>
>> For example the user may want to create a (sub)partition with some of
>> the CPUs they got:
>>
>> user$ mkdir a/b
>>
>> a/b/cpuset.partition=root // user
>> a/b/cpuset.cpus=0-1 // user
>>
>> This should be a valid configuration and behavior, no?
>
> Thank for the comment. Yes, that can be a valid configuration.
>
> One possible workaround may be to see if the current user has write
> access to its parent partition root. If so, we can allow it to create
> a sub-partition, if not, we will forbid it.

It is not that simple to check if the user can write to its parent.
So I will put it down as a TODO item, but will still forbid such a
configuration for now.

Cheers,
Longman
Hello,

On Tue, Apr 28, 2026 at 11:19:16AM -0400, Waiman Long wrote:
...
> Thank for the comment. Yes, that can be a valid configuration.
>
> One possible workaround may be to see if the current user has write access
> to its parent partition root. If so, we can allow it to create a
> sub-partition, if not, we will forbid it.

I think this whole thing is a confusion. First of all, resource knobs in any
given cgroup are owned by the parent. Delegations where the perm to a
resource knob is given to delegatee is not supported and expected to affect
resource distribution w.r.t. its siblings. Partition isn't special in this
regard. memory.low or min can create similar effects. Maybe I'm missing
something but I don't see anything happening that's not supposed to happen.

Thanks.

--
tejun
On 4/28/26 11:44 AM, Tejun Heo wrote:
> Hello,
>
> On Tue, Apr 28, 2026 at 11:19:16AM -0400, Waiman Long wrote:
> ...
>> Thank for the comment. Yes, that can be a valid configuration.
>>
>> One possible workaround may be to see if the current user has write access
>> to its parent partition root. If so, we can allow it to create a
>> sub-partition, if not, we will forbid it.
> I think this whole thing is a confusion. First of all, resource knobs in any
> given cgroup is owned by the parent. Delegations where the perm to a
> resource knob is given to delegatee is not supported and expected to affect
> resource distribution w.r.t. its siblings. Partition isn't special in this
> regard. memory.low or min can create similar effects. Maybe I'm missing
> something but I don't see anything happening that's not supposed to happen.

You are right. I am a bit confused about the exact delegation rules.
After reading the delegation section of the cgroup-v2.rst file, I realize
that the current behavior should be OK. For clarity, I am planning to send
a documentation patch to clarify the current partition delegation behavior.

Thanks,
Longman