From: Waiman Long
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
    Muchun Song, Andrew Morton, Tejun Heo, Michal Koutný, Shuah Khan
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Waiman Long
Subject: [PATCH v6 1/2] mm/vmscan: Skip memcg with !usage in shrink_node_memcgs()
Date: Sun, 13 Apr 2025 22:12:48 -0400
Message-ID: <20250414021249.3232315-2-longman@redhat.com>
In-Reply-To: <20250414021249.3232315-1-longman@redhat.com>
References: <20250414021249.3232315-1-longman@redhat.com>

The test_memcontrol selftest consistently fails its test_memcg_low
sub-test because two of its test child cgroups, which have a memory.low
of 0 or an effective memory.low of 0, still have low events generated
for them, since mem_cgroup_below_low() uses the ">=" operator when
comparing usage to elow.

The two failing cases are as follows:

 1) memory.low is set to 0, but low events can still be triggered, so
    the cgroup may have a non-zero low event count.

 2) memory.low is set to a non-zero value, but the cgroup has no task
    in it, so it has an effective low value of 0. Again, it may have a
    non-zero low event count if memory reclaim happens. This is
    probably not what users expect, and it is doubtful that anyone
    checking an empty cgroup with no task in it would expect non-zero
    event counts.

In the first case, even though memory.low isn't set, the cgroup may
still have some low protection if memory.low is set in the parent and
the cgroup2 memory_recursiveprot mount option is enabled. So low events
may still be recorded. The test_memcontrol.c test has to be modified to
account for that.

For the second case, it doesn't make sense to have a non-zero low event
count if the cgroup has 0 usage. So skip this corner case in
shrink_node_memcgs() by skipping the !usage case.

With this patch applied, the test_memcg_low sub-test finishes
successfully in most cases. However, both the test_memcg_low and
test_memcg_min sub-tests may still fail occasionally if the
memory.current values fall outside of the expected ranges.

Suggested-by: Johannes Weiner
Suggested-by: Michal Koutný
Signed-off-by: Waiman Long
---
 mm/internal.h                                    |  9 +++++++++
 mm/memcontrol-v1.h                               |  2 --
 mm/vmscan.c                                      |  4 ++++
 tools/testing/selftests/cgroup/test_memcontrol.c | 16 +++++++++++-----
 4 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 50c2f590b2d0..c06fb0e8d75c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1535,6 +1535,15 @@ void __meminit __init_page_from_nid(unsigned long pfn, int nid);
 unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
 			  int priority);
 
+#ifdef CONFIG_MEMCG
+unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
+#else
+static inline unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
+{
+	return 1UL;
+}
+#endif
+
 #ifdef CONFIG_SHRINKER_DEBUG
 static inline __printf(2, 0) int shrinker_debugfs_name_alloc(
 			struct shrinker *shrinker, const char *fmt, va_list ap)
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 6358464bb416..e92b21af92b1 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -22,8 +22,6 @@
 	     iter != NULL;				\
 	     iter = mem_cgroup_iter(NULL, iter, NULL))
 
-unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
-
 void drain_all_stock(struct mem_cgroup *root_memcg);
 
 unsigned long memcg_events(struct mem_cgroup *memcg, int event);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b620d74b0f66..a771a0145a12 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5963,6 +5963,10 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 
 		mem_cgroup_calculate_protection(target_memcg, memcg);
 
+		/* Skip memcg with no usage */
+		if (!mem_cgroup_usage(memcg, false))
+			continue;
+
 		if (mem_cgroup_below_min(target_memcg, memcg)) {
 			/*
 			 * Hard protection.
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 16f5d74ae762..5a5dcbe57b56 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -380,10 +380,10 @@ static bool reclaim_until(const char *memcg, long goal);
  *
  * Then it checks actual memory usages and expects that:
  * A/B    memory.current ~= 50M
- * A/B/C  memory.current ~= 29M
- * A/B/D  memory.current ~= 21M
- * A/B/E  memory.current ~= 0
- * A/B/F  memory.current  = 0
+ * A/B/C  memory.current ~= 29M [memory.events:low > 0]
+ * A/B/D  memory.current ~= 21M [memory.events:low > 0]
+ * A/B/E  memory.current ~= 0   [memory.events:low == 0 if !memory_recursiveprot, > 0 otherwise]
+ * A/B/F  memory.current  = 0   [memory.events:low == 0]
  * (for origin of the numbers, see model in memcg_protection.m.)
  *
  * After that it tries to allocate more than there is
@@ -525,8 +525,14 @@ static int test_memcg_protection(const char *root, bool min)
 		goto cleanup;
 	}
 
+	/*
+	 * Child 2 has memory.low=0, but some low protection is still being
+	 * distributed down from its parent with memory.low=50M if cgroup2
+	 * memory_recursiveprot mount option is enabled. So the low event
+	 * count will be non-zero in this case.
+	 */
 	for (i = 0; i < ARRAY_SIZE(children); i++) {
-		int no_low_events_index = 1;
+		int no_low_events_index = has_recursiveprot ? 2 : 1;
 		long low, oom;
 
 		oom = cg_read_key_long(children[i], "memory.events", "oom ");
-- 
2.48.1
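
For illustration only, the effect of the !usage check can be modelled in
userspace. The sketch below is not kernel code: struct toy_memcg,
toy_below_low() and toy_shrink() are hypothetical names, and it assumes a
simplified reclaim pass in which every cgroup found at or below its
effective low has one low event counted. It shows why an empty cgroup with
elow == 0 passes the ">=" comparison, and how skipping zero-usage cgroups
keeps its event count at zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup; not struct mem_cgroup. */
struct toy_memcg {
	const char *name;
	unsigned long usage;      /* charged pages */
	unsigned long elow;       /* effective memory.low */
	unsigned long low_events; /* modelled memory.events:low */
};

/* Same ">=" comparison the commit message describes for elow. */
static bool toy_below_low(const struct toy_memcg *cg)
{
	return cg->elow >= cg->usage;
}

/*
 * Simplified reclaim pass (assumption: one low event is counted for
 * every cgroup that is at or below its effective low). With
 * skip_unused set, zero-usage cgroups are skipped before the
 * protection check, as the patch does in shrink_node_memcgs().
 */
static void toy_shrink(struct toy_memcg *cgs, size_t n, bool skip_unused)
{
	assert(cgs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (skip_unused && !cgs[i].usage)
			continue;	/* nothing to reclaim, no event */

		if (toy_below_low(&cgs[i]))
			cgs[i].low_events++;
	}
}
```

Calling toy_shrink() with skip_unused == false on a cgroup that has
usage == 0 and elow == 0 increments its low_events, because 0 >= 0 holds.
With skip_unused == true the same cgroup is never examined, while a
populated cgroup below its effective low is still counted.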