In cgroup v2, the non-hardwall fallthrough path in
cpuset_current_node_allowed() always ends up allowing the allocation:

  - CS_MEM_EXCLUSIVE and CS_MEM_HARDWALL are v1-only flags, toggled
    only via the cpuset.mem_exclusive / cpuset.mem_hardwall files
    which do not exist in v2. Neither flag is ever set on any cpuset
    (including top_cpuset) in pure v2 mode.

  - As a result, nearest_hardwall_ancestor() always walks up to
    top_cpuset.

  - top_cpuset.mems_allowed is set to node_possible_map in v2 mode,
    so node_isset() on it is always true for any valid node.

The whole scan therefore boils down to taking callback_lock, walking
to the root and returning true. Short-circuit it by returning true
directly when is_in_v2_mode() holds, sparing the callback_lock
acquisition and the pointless walk.

Place the short-circuit after the __GFP_HARDWALL check so that the
generic hardwall enforcement for GFP_USER allocations remains in
effect: __GFP_HARDWALL requests still return false when the node is
outside mems_allowed, preserving cpuset.mems constraints for
__alloc_pages() callers (which prepare_alloc_pages() marks
__GFP_HARDWALL unconditionally when cpusets are enabled).

Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Chen Wandun <chenwandun@lixiang.com>
---
kernel/cgroup/cpuset.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index a48901a0416a..b539f5b4d21e 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -4231,6 +4231,9 @@ bool cpuset_current_node_allowed(int node, gfp_t gfp_mask)
 	if (gfp_mask & __GFP_HARDWALL) /* If hardwall request, stop here */
 		return false;
+	if (is_in_v2_mode())
+		return true;
+
 	/* Not hardwall and node outside mems_allowed: scan up cpusets */
 	spin_lock_irqsave(&callback_lock, flags);
--
2.43.0
Hello,

is_in_v2_mode() is also true for v1 mounted with cpuset_v2_mode, where
cpuset.mem_exclusive / cpuset.mem_hardwall are still settable. Would
that be a problem here? cpuset_v2() looks like a tighter fit.

Thanks.

--
tejun
On 5/8/26 23:51, Tejun Heo wrote:
> Hello,
>
> is_in_v2_mode() is also true for v1 mounted with cpuset_v2_mode, where
> cpuset.mem_exclusive / cpuset.mem_hardwall are still settable. Would
> that be a problem here? cpuset_v2() looks like a tighter fit.

You're right, it is a problem.

Under v1 + cpuset_v2_mode, CS_MEM_HARDWALL/CS_MEM_EXCLUSIVE can be set
on a non-root cpuset cgroup, so the function can't directly return true;
I will fix it in v2.

Best regards,
Wandun

>
> Thanks.
>
> --
> tejun
On Sat, May 09, 2026 at 05:36:39PM +0800, Wandun <chenwandun1@gmail.com> wrote:
> > is_in_v2_mode() is also true for v1 mounted with cpuset_v2_mode, where
> > cpuset.mem_exclusive / cpuset.mem_hardwall are still settable. Would
> > that be a problem here? cpuset_v2() looks like a tighter fit.
> You're right, it is a problem.
>
> Under v1 + cpuset_v2_mode, CS_MEM_HARDWALL/CS_MEM_EXCLUSIVE can be set
> on non-root cpuset cgroup, so can't directly return true;

Ah, sorry, missed that.

> I will fix it in v2.

Thx.
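
The concern raised in this thread can be illustrated with a small
userspace model. This is only a sketch, not kernel code and not the
planned v2 of the patch: every model_* name, the mount-mode enum, and
the bitmask representation of mems_allowed are assumptions made for
illustration. It mimics the ancestor walk described in the commit
message and compares a short-circuit keyed on an is_in_v2_mode()-like
predicate with one keyed on a cpuset_v2()-like predicate.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for how the cpuset controller is mounted. */
enum model_mount {
	MODEL_V1,		/* legacy hierarchy */
	MODEL_V1_V2MODE,	/* legacy hierarchy with cpuset_v2_mode */
	MODEL_V2,		/* default (unified) hierarchy */
};

/* Hypothetical, simplified cpuset: only what the walk needs. */
struct model_cpuset {
	const struct model_cpuset *parent;	/* NULL for the root */
	unsigned long mems_allowed;		/* bit n set: node n allowed */
	bool mem_exclusive;			/* models CS_MEM_EXCLUSIVE */
	bool mem_hardwall;			/* models CS_MEM_HARDWALL */
};

/* Models cpuset_v2(): true only on the default hierarchy. */
static bool model_cpuset_v2(enum model_mount m)
{
	return m == MODEL_V2;
}

/* Models is_in_v2_mode(): also true for v1 + cpuset_v2_mode. */
static bool model_is_in_v2_mode(enum model_mount m)
{
	return model_cpuset_v2(m) || m == MODEL_V1_V2MODE;
}

/* Walk up until a hardwall/exclusive cpuset or the root is reached. */
static const struct model_cpuset *
model_nearest_hardwall_ancestor(const struct model_cpuset *cs)
{
	while (!(cs->mem_exclusive || cs->mem_hardwall) && cs->parent)
		cs = cs->parent;
	return cs;
}

/* The full scan: is the node allowed by the nearest hardwall ancestor? */
static bool model_scan_allowed(const struct model_cpuset *cs, int node)
{
	const struct model_cpuset *hw;

	assert(node >= 0 && node < (int)(sizeof(unsigned long) * CHAR_BIT));
	hw = model_nearest_hardwall_ancestor(cs);
	return (hw->mems_allowed >> node) & 1UL;
}

/* The scan preceded by a short-circuit on the chosen predicate. */
static bool model_allowed_with_shortcut(enum model_mount m,
					bool (*pred)(enum model_mount),
					const struct model_cpuset *cs,
					int node)
{
	if (pred(m))
		return true;
	return model_scan_allowed(cs, node);
}
```

With a hardwall cpuset restricted to node 0 sitting between the root and
the current cpuset, the full scan rejects node 1. Under the modeled
v1 + cpuset_v2_mode mount, the is_in_v2_mode()-style short-circuit would
allow it anyway, while the cpuset_v2()-style one falls through to the
scan and keeps the rejection. On the modeled pure v2 mount, where no
flag is set, both predicates agree with the scan.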