From: Chen Wandun
To: longman@redhat.com, chenridong@huaweicloud.com, tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] cgroup/cpuset: move PF_EXITING check before __GFP_HARDWALL in cpuset_current_node_allowed()
Date: Thu, 7 May 2026 18:54:34 +0800
Message-ID: <20260507105434.3266234-1-chenwandun@lixiang.com>
X-Mailer: git-send-email 2.43.0
X-Mailing-List: linux-kernel@vger.kernel.org

Since prepare_alloc_pages() unconditionally adds __GFP_HARDWALL for the
fast path when cpusets are enabled, the __GFP_HARDWALL check in
cpuset_current_node_allowed() causes the PF_EXITING escape path to be
skipped on the first allocation attempt. This makes it unreachable in
the common case, so dying tasks can get stuck in direct reclaim or even
trigger OOM while trying to exit, despite being allowed to allocate from
any node.

Move the PF_EXITING check before __GFP_HARDWALL so that dying tasks can
allocate memory from any node to exit quickly, even when cpusets are
enabled.

Also update the function comment to reflect the actual behavior of
prepare_alloc_pages() and the corrected check ordering.

Signed-off-by: Chen Wandun
Acked-by: Michal Koutný
Acked-by: Waiman Long
Reviewed-by: Chen Ridong
---
 kernel/cgroup/cpuset.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index e3a081a07c6d..a48901a0416a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -4176,11 +4176,11 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs)
  * current's mems_allowed, yes. If it's not a __GFP_HARDWALL request and this
  * node is set in the nearest hardwalled cpuset ancestor to current's cpuset,
  * yes. If current has access to memory reserves as an oom victim, yes.
- * Otherwise, no.
+ * If the current task is PF_EXITING, yes. Otherwise, no.
  *
  * GFP_USER allocations are marked with the __GFP_HARDWALL bit,
  * and do not allow allocations outside the current tasks cpuset
- * unless the task has been OOM killed.
+ * unless the task has been OOM killed or is exiting.
  * GFP_KERNEL allocations are not so marked, so can escape to the
  * nearest enclosing hardwalled ancestor cpuset.
  *
@@ -4194,7 +4194,9 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs)
  * The first call here from mm/page_alloc:get_page_from_freelist()
  * has __GFP_HARDWALL set in gfp_mask, enforcing hardwall cpusets,
  * so no allocation on a node outside the cpuset is allowed (unless
- * in interrupt, of course).
+ * in interrupt, of course). The PF_EXITING check must therefore
+ * come before the __GFP_HARDWALL check, otherwise a dying task
+ * would be blocked on the fast path.
  *
  * The second pass through get_page_from_freelist() doesn't even call
  * here for GFP_ATOMIC calls. For those calls, the __alloc_pages()
@@ -4204,6 +4206,7 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs)
  * in_interrupt - any node ok (current task context irrelevant)
  * GFP_ATOMIC - any node ok
  * tsk_is_oom_victim - any node ok
+ * PF_EXITING - any node ok (let dying task exit quickly)
  * GFP_KERNEL - any node in enclosing hardwalled cpuset ok
  * GFP_USER - only nodes in current tasks mems allowed ok.
  */
@@ -4223,11 +4226,10 @@ bool cpuset_current_node_allowed(int node, gfp_t gfp_mask)
 	 */
 	if (unlikely(tsk_is_oom_victim(current)))
 		return true;
-	if (gfp_mask & __GFP_HARDWALL)	/* If hardwall request, stop here */
-		return false;
-
 	if (current->flags & PF_EXITING) /* Let dying task have memory */
 		return true;
+	if (gfp_mask & __GFP_HARDWALL)	/* If hardwall request, stop here */
+		return false;
 
 	/* Not hardwall and node outside mems_allowed: scan up cpusets */
 	spin_lock_irqsave(&callback_lock, flags);
-- 
2.43.0
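
For readers following the ordering argument, here is a minimal user-space
sketch (not kernel code) of the tail of cpuset_current_node_allowed() with
the old and the new check order. The MODEL_* constants, struct model_task
and all model_* function names are hypothetical stand-ins introduced only
for illustration; the in_interrupt() case is left out and the scan up the
cpuset hierarchy is reduced to a single boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for __GFP_HARDWALL and PF_EXITING; values are arbitrary. */
#define MODEL_GFP_HARDWALL 0x1u
#define MODEL_PF_EXITING   0x2u

/* Hypothetical, simplified view of the state the real function inspects. */
struct model_task {
	unsigned int flags;        /* MODEL_PF_EXITING or 0 */
	bool oom_victim;           /* models tsk_is_oom_victim(current) */
	bool node_in_mems_allowed; /* node is set in current's mems_allowed */
	bool node_in_hardwall_anc; /* models the result of the ancestor scan */
};

/* Old order: the hardwall test returns before PF_EXITING is looked at. */
bool model_node_allowed_old(const struct model_task *t, unsigned int gfp_mask)
{
	if (t->node_in_mems_allowed)
		return true;
	if (t->oom_victim)
		return true;
	if (gfp_mask & MODEL_GFP_HARDWALL)
		return false;
	if (t->flags & MODEL_PF_EXITING)
		return true;
	return t->node_in_hardwall_anc;
}

/* New order: PF_EXITING is tested first, so a dying task always passes. */
bool model_node_allowed_new(const struct model_task *t, unsigned int gfp_mask)
{
	if (t->node_in_mems_allowed)
		return true;
	if (t->oom_victim)
		return true;
	if (t->flags & MODEL_PF_EXITING)
		return true;
	if (gfp_mask & MODEL_GFP_HARDWALL)
		return false;
	return t->node_in_hardwall_anc;
}

/* Returns 0 if the model behaves as the commit message describes. */
int model_self_check(void)
{
	const struct model_task dying = { .flags = MODEL_PF_EXITING };
	const struct model_task normal = { .flags = 0 };

	/* Dying task, hardwall request: refused before, allowed after. */
	assert(!model_node_allowed_old(&dying, MODEL_GFP_HARDWALL));
	assert(model_node_allowed_new(&dying, MODEL_GFP_HARDWALL));

	/* A task that is not exiting is still refused on a hardwall request. */
	assert(!model_node_allowed_new(&normal, MODEL_GFP_HARDWALL));

	/* Without the hardwall bit both orderings agree. */
	assert(model_node_allowed_old(&dying, 0) ==
	       model_node_allowed_new(&dying, 0));
	return 0;
}
```

Calling model_self_check() from a small test main() exercises these cases.
In this model the two orderings differ for exactly one input: an exiting
task whose request carries the hardwall bit and whose target node is outside
mems_allowed, which is the fast-path case the commit message describes.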