Date: Tue, 17 Mar 2026 23:07:01 +0000
In-Reply-To: <20260317230720.990329-1-bingjiao@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260317230720.990329-1-bingjiao@google.com>
X-Mailer: git-send-email 2.53.0.851.ga537e3e6e9-goog
Message-ID: <20260317230720.990329-3-bingjiao@google.com>
Subject: [PATCH 2/3] mm/memcontrol: disable demotion in memcg direct reclaim
From: Bing Jiao
To: linux-mm@kvack.org
Cc: Bing Jiao, Johannes Weiner, Michal Hocko, Roman Gushchin,
 Shakeel Butt, Muchun Song, Andrew Morton, David Rientjes, Yosry Ahmed,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Chris Li,
 Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
 Youngjun Park, David Hildenbrand, Qi Zheng, Lorenzo Stoakes,
 Axel Rasmussen, Yuanchu Xie, Wei Xu, Joshua Hahn
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

NUMA demotion counts towards reclaim targets in shrink_folio_list(), but
it does not reduce the total memory usage of a memcg. In memcg direct
reclaim paths (e.g., charge-triggered or manual limit writes), where
demotion is allowed, this leads to "fake progress" where the reclaim
loop concludes it has satisfied the memory request without actually
reducing the cgroup's charge.

This could result in inefficient reclaim loops, CPU waste, moving all
pages to far-tier nodes, and potentially premature OOM kills when the
cgroup is under memory pressure but demotion is still possible.

Introduce the MEMCG_RECLAIM_NO_DEMOTION flag to disable demotion in
these memcg-specific reclaim paths. This ensures that reclaim progress
is only counted when memory is actually freed or swapped out.

Signed-off-by: Bing Jiao
---
 include/linux/swap.h |  1 +
 mm/memcontrol-v1.c   | 10 ++++++++--
 mm/memcontrol.c      | 16 +++++++++++-----
 mm/vmscan.c          |  1 +
 4 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 7a09df6977a5..e83897a6dc72 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -356,6 +356,7 @@ unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone
 
 #define MEMCG_RECLAIM_MAY_SWAP (1 << 1)
 #define MEMCG_RECLAIM_PROACTIVE (1 << 2)
+#define MEMCG_RECLAIM_NO_DEMOTION (1 << 3)
 #define MIN_SWAPPINESS 0
 #define MAX_SWAPPINESS 200
 
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 433bba9dfe71..3cb600e28e5b 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -1466,6 +1466,10 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg,
 	int ret;
 	bool limits_invariant;
 	struct page_counter *counter = memsw ? &memcg->memsw : &memcg->memory;
+	unsigned int reclaim_options = MEMCG_RECLAIM_NO_DEMOTION;
+
+	if (!memsw)
+		reclaim_options |= MEMCG_RECLAIM_MAY_SWAP;
 
 	do {
 		if (signal_pending(current)) {
@@ -1500,7 +1504,7 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg,
 		}
 
 		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
-				memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP, NULL)) {
+				reclaim_options, NULL)) {
 			ret = -EBUSY;
 			break;
 		}
@@ -1520,6 +1524,8 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg,
 static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
 {
 	int nr_retries = MAX_RECLAIM_RETRIES;
+	unsigned int reclaim_options = MEMCG_RECLAIM_MAY_SWAP |
+				       MEMCG_RECLAIM_NO_DEMOTION;
 
 	/* we call try-to-free pages for make this cgroup empty */
 	lru_add_drain_all();
@@ -1532,7 +1538,7 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
 			return -EINTR;
 
 		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
-						  MEMCG_RECLAIM_MAY_SWAP, NULL))
+						  reclaim_options, NULL))
 			nr_retries--;
 	}
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 303ac622d22d..fcf1cd0da643 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2287,6 +2287,8 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
 				  gfp_t gfp_mask)
 {
 	unsigned long nr_reclaimed = 0;
+	unsigned int reclaim_options = MEMCG_RECLAIM_MAY_SWAP |
+				       MEMCG_RECLAIM_NO_DEMOTION;
 
 	do {
 		unsigned long pflags;
@@ -2300,7 +2302,7 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
 		psi_memstall_enter(&pflags);
 		nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages,
 							gfp_mask,
-							MEMCG_RECLAIM_MAY_SWAP,
+							reclaim_options,
 							NULL);
 		psi_memstall_leave(&pflags);
 	} while ((memcg = parent_mem_cgroup(memcg)) &&
@@ -2572,7 +2574,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		/* Avoid the refill and flush of the older stock */
 		batch = nr_pages;
 
-	reclaim_options = MEMCG_RECLAIM_MAY_SWAP;
+	reclaim_options = MEMCG_RECLAIM_MAY_SWAP | MEMCG_RECLAIM_NO_DEMOTION;
 	if (!do_memsw_account() ||
 	    page_counter_try_charge(&memcg->memsw, batch, &counter)) {
 		if (page_counter_try_charge(&memcg->memory, batch, &counter))
@@ -2610,7 +2612,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 
 	psi_memstall_enter(&pflags);
 	nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages,
-					  gfp_mask, reclaim_options, NULL);
+						    gfp_mask, reclaim_options, NULL);
 	psi_memstall_leave(&pflags);
 
 	if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
@@ -4638,6 +4640,8 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
 	unsigned int nr_retries = MAX_RECLAIM_RETRIES;
+	unsigned int reclaim_options = MEMCG_RECLAIM_MAY_SWAP |
+				       MEMCG_RECLAIM_NO_DEMOTION;
 	bool drained = false;
 	unsigned long high;
 	int err;
@@ -4669,7 +4673,7 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
 		}
 
 		reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
-					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, NULL);
+					GFP_KERNEL, reclaim_options, NULL);
 
 		if (!reclaimed && !nr_retries--)
 			break;
@@ -4690,6 +4694,8 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
 	unsigned int nr_reclaims = MAX_RECLAIM_RETRIES;
+	unsigned int reclaim_options = MEMCG_RECLAIM_MAY_SWAP |
+				       MEMCG_RECLAIM_NO_DEMOTION;
 	bool drained = false;
 	unsigned long max;
 	int err;
@@ -4721,7 +4727,7 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 
 		if (nr_reclaims) {
 			if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - max,
-					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, NULL))
+					GFP_KERNEL, reclaim_options, NULL))
 				nr_reclaims--;
 			continue;
 		}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 33287ba4a500..7a8617ba1748 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6809,6 +6809,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 		.may_unmap = 1,
 		.may_swap = !!(reclaim_options & MEMCG_RECLAIM_MAY_SWAP),
 		.proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE),
+		.no_demotion = !!(reclaim_options & MEMCG_RECLAIM_NO_DEMOTION),
 	};
 	/*
 	 * Traverse the ZONELIST_FALLBACK zonelist of the current node to put
-- 
2.53.0.851.ga537e3e6e9-goog
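
Illustration only, not part of the patch: the "fake progress" described
in the commit message can be sketched with a small user-space model.
Everything prefixed toy_ below is hypothetical and exists only for this
example; the three flag values are copied from the swap.h hunk, and the
bit decoding mirrors the vmscan.c hunk. The model is deliberately
simplified (demotion is assumed to always succeed, and only top-tier
pages are touched while demotion is allowed), so it should be read as a
sketch of the accounting problem rather than of the real reclaim path.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in the include/linux/swap.h hunk of this patch. */
#define MEMCG_RECLAIM_MAY_SWAP		(1 << 1)
#define MEMCG_RECLAIM_PROACTIVE		(1 << 2)
#define MEMCG_RECLAIM_NO_DEMOTION	(1 << 3)

/* Hypothetical stand-in for the relevant bits of struct scan_control. */
struct toy_scan_control {
	bool may_swap;
	bool proactive;
	bool no_demotion;
};

/* Hypothetical cgroup model: charged pages, split by memory tier. */
struct toy_memcg {
	unsigned long top_tier;	/* charged pages on near nodes */
	unsigned long far_tier;	/* charged pages on far nodes */
};

static unsigned long toy_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Total charge: demoted pages still count against the cgroup. */
static unsigned long toy_charge(const struct toy_memcg *memcg)
{
	return memcg->top_tier + memcg->far_tier;
}

/* Decode the option bits the same way the mm/vmscan.c hunk does. */
static struct toy_scan_control toy_decode(unsigned int reclaim_options)
{
	struct toy_scan_control sc = {
		.may_swap = !!(reclaim_options & MEMCG_RECLAIM_MAY_SWAP),
		.proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE),
		.no_demotion = !!(reclaim_options & MEMCG_RECLAIM_NO_DEMOTION),
	};
	return sc;
}

/*
 * Toy reclaim pass; returns the number of pages reported as reclaimed.
 * With demotion allowed, top-tier pages are moved to the far tier and
 * reported as progress although the charge does not change. With
 * no_demotion set, only pages that actually leave the cgroup count.
 */
static unsigned long toy_reclaim(struct toy_memcg *memcg,
				 unsigned long nr_pages,
				 unsigned int reclaim_options)
{
	struct toy_scan_control sc = toy_decode(reclaim_options);
	unsigned long freed = 0, n;

	if (!sc.no_demotion && memcg->top_tier) {
		n = toy_min(nr_pages, memcg->top_tier);
		memcg->top_tier -= n;
		memcg->far_tier += n;	/* demoted, still charged */
		return n;
	}

	n = toy_min(nr_pages, memcg->top_tier);
	memcg->top_tier -= n;
	freed += n;
	n = toy_min(nr_pages - freed, memcg->far_tier);
	memcg->far_tier -= n;
	freed += n;
	return freed;
}

/* Run both cases on identical cgroups and compare the resulting charge. */
void toy_demo(void)
{
	struct toy_memcg a = { .top_tier = 100, .far_tier = 0 };
	struct toy_memcg b = { .top_tier = 100, .far_tier = 0 };

	/* Demotion allowed: 32 pages "reclaimed", charge unchanged. */
	assert(toy_reclaim(&a, 32, MEMCG_RECLAIM_MAY_SWAP) == 32);
	assert(toy_charge(&a) == 100);

	/* Demotion disabled: 32 pages reclaimed, charge drops by 32. */
	assert(toy_reclaim(&b, 32, MEMCG_RECLAIM_MAY_SWAP |
				   MEMCG_RECLAIM_NO_DEMOTION) == 32);
	assert(toy_charge(&b) == 68);
}
```

In this model both calls report the same progress to the caller, but
only the second one lowers the charge, which is the distinction the
MEMCG_RECLAIM_NO_DEMOTION flag is meant to enforce.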