From: Sourav Panda <souravpanda@google.com>
Date: Wed, 18 Mar 2026 23:41:25 +0000
Subject: [LSF/MM/BPF TOPIC][RFC PATCH 1/2] mm: add hugepage shrinker for frozen memory
To: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: lsf-pc@lists.linux-foundation.org, songmuchun@bytedance.com, osalvador@suse.de,
    mike.kravetz@oracle.com, mathieu.desnoyers@efficios.com, willy@infradead.org,
    david@redhat.com, pasha.tatashin@soleen.com, rientjes@google.com,
    weixugc@google.com, gthelen@google.com, souravpanda@google.com, surenb@google.com
Message-ID: <20260318234126.3216529-2-souravpanda@google.com>
In-Reply-To: <20260318234126.3216529-1-souravpanda@google.com>
References: <20260318234126.3216529-1-souravpanda@google.com>

Implement a shrinker for the hugetlbfs subsystem to provide one-way
fungibility, converting unused persistent huge pages back to the buddy
system, one huge page at a time.

This is designed for virtualization use cases, where a large pool of
huge pages is reserved but kept free, acting as a "frozen" memory
reservoir. When the host experiences memory pressure, this shrinker
thaws the memory by reclaiming huge pages on demand.

Pass the hugetlb_shrinker_enabled=1 kernel command line parameter to
enable it. Note that nr_huge_pages will then change without user
intervention.

Both kswapd and direct reclaim can shrink gigantic hugepages when the
system is under memory pressure. To safely support concurrent
reclaimers (e.g., kswapd and multiple direct reclaim tasks), a new
mutex, hugepage_shrink_mutex, is introduced.

Signed-off-by: Sourav Panda <souravpanda@google.com>
---
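As an illustrative sketch (not part of the patch itself), the intended
admin flow might look like the following. It assumes 1 GiB gigantic
pages on x86-64; the sysfs path is the standard per-size nr_hugepages
counter, and hugetlb_shrinker_enabled is the parameter added by this
patch:

    # Boot with a frozen pool of 1 GiB pages and the shrinker enabled:
    #   hugepagesz=1G hugepages=64 hugetlb_shrinker_enabled=1
    #
    # Under sustained host memory pressure the pool shrinks on its own,
    # one gigantic page at a time:
    cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
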
 include/linux/shrinker.h |   2 +
 mm/Kconfig               |   9 +++
 mm/hugetlb.c             | 125 +++++++++++++++++++++++++++++++++++++++
 mm/shrinker.c            |   2 +
 4 files changed, 138 insertions(+)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 1a00be90d93a..5374c251ee9e 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -51,6 +51,8 @@ struct shrink_control {
 	 */
 	unsigned long nr_scanned;

+	s8 priority;
+
 	/* current memcg being shrunk (for memcg aware shrinkers) */
 	struct mem_cgroup *memcg;
 };
diff --git a/mm/Kconfig b/mm/Kconfig
index ebd8ea353687..a88f370c7485 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -769,6 +769,15 @@ config NOMMU_INITIAL_TRIM_EXCESS
 config ARCH_WANT_GENERAL_HUGETLB
 	bool

+config HUGETLB_FROZEN_MEMORY_SHRINKER
+	bool "HugeTLB Frozen Memory Shrinker"
+	depends on HUGETLBFS
+	help
+	  Enables a shrinker for the hugetlb subsystem that allows
+	  unused huge pages to be released back to the buddy
+	  system under memory pressure. One huge page at a time.
+	  Further gated by kernel cmdline hugetlb_shrinker_enabled.
+
 config ARCH_WANTS_THP_SWAP
 	def_bool n

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 327eaa4074d3..d4953ff1dda1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -4127,6 +4128,129 @@ ssize_t __nr_hugepages_store_common(bool obey_mempolicy,
 	return err ? err : len;
 }

+#ifdef CONFIG_HUGETLB_FROZEN_MEMORY_SHRINKER
+
+static bool hugetlb_shrinker_enabled;
+static int __init cmdline_parse_hugetlb_shrinker_enabled(char *p)
+{
+	return kstrtobool(p, &hugetlb_shrinker_enabled);
+}
+early_param("hugetlb_shrinker_enabled", cmdline_parse_hugetlb_shrinker_enabled);
+
+static unsigned long hugepage_shrinker_count(struct shrinker *s,
+					     struct shrink_control *sc)
+{
+	struct hstate *h;
+
+	if (sc->priority >= DEF_PRIORITY - 6)
+		return 0;
+
+	if (!gigantic_page_runtime_supported())
+		return 0;
+
+	for_each_hstate(h) {
+		if (hstate_is_gigantic(h) && h->nr_huge_pages_node[sc->nid] > 0)
+			return SWAP_CLUSTER_MAX;
+	}
+	return 0;
+}
+
+static bool hugepage_shrinker_is_watermark_ok(int nid)
+{
+	int i;
+	pg_data_t *pgdat = NODE_DATA(nid);
+
+	for (i = 0; i < MAX_NR_ZONES; i++) {
+		unsigned long mark;
+		unsigned long free_pages;
+		struct zone *zone = pgdat->node_zones + i;
+
+		if (!managed_zone(zone))
+			continue;
+
+		mark = high_wmark_pages(zone);
+		free_pages = zone_page_state(zone, NR_FREE_PAGES);
+		if (__zone_watermark_ok(zone, MAX_PAGE_ORDER, mark,
+					MAX_NR_ZONES, 0, free_pages))
+			return true;
+	}
+	return false;
+}
+
+static DEFINE_MUTEX(hugepage_shrink_mutex);
+
+static unsigned long hugepage_shrinker_scan(struct shrinker *s,
+					    struct shrink_control *sc)
+{
+	int err;
+	struct hstate *h;
+	unsigned long old_nr;
+	nodemask_t nodes_allowed;
+
+	if (sc->priority >= DEF_PRIORITY - 6)
+		return SHRINK_STOP;
+
+	if (sc->nr_to_scan == 0)
+		return SHRINK_STOP;
+
+	if (!gigantic_page_runtime_supported())
+		return SHRINK_STOP;
+
+	if (hugepage_shrinker_is_watermark_ok(sc->nid))
+		return SHRINK_STOP;
+
+	mutex_lock(&hugepage_shrink_mutex);
+
+	if (hugepage_shrinker_is_watermark_ok(sc->nid))
+		goto unlock;
+
+	init_nodemask_of_node(&nodes_allowed, sc->nid);
+
+	for_each_hstate(h) {
+		if (!hstate_is_gigantic(h))
+			continue;
+
+		old_nr = h->nr_huge_pages_node[sc->nid];
+		if (!old_nr)
+			continue;
+
+		err = set_max_huge_pages(h, old_nr - 1, sc->nid, &nodes_allowed);
+		if (!err)
+			goto unlock;
+	}
+unlock:
+	mutex_unlock(&hugepage_shrink_mutex);
+	return SHRINK_STOP;
+}
+
+static struct shrinker *hugepage_shrinker;
+
+static int __init hugetlb_shrinker_init(void)
+{
+	if (!hugetlb_shrinker_enabled)
+		return 0;
+
+	hugepage_shrinker = shrinker_alloc(0, "hugetlbfs");
+	if (!hugepage_shrinker)
+		return -ENOMEM;
+
+	hugepage_shrinker->count_objects = hugepage_shrinker_count;
+	hugepage_shrinker->scan_objects = hugepage_shrinker_scan;
+	hugepage_shrinker->seeks = 0;
+	hugepage_shrinker->batch = 1;
+
+	pr_info("Registering hugetlbfs shrinker\n");
+	shrinker_register(hugepage_shrinker);
+
+	return 0;
+}
+#else
+static int __init hugetlb_shrinker_init(void)
+{
+	return 0;
+}
+#endif
+
 static int __init hugetlb_init(void)
 {
 	int i;
@@ -4183,6 +4307,7 @@ static int __init hugetlb_init(void)
 	hugetlb_sysfs_init();
 	hugetlb_cgroup_file_init();
 	hugetlb_sysctl_init();
+	hugetlb_shrinker_init();

 #ifdef CONFIG_SMP
 	num_fault_mutexes = roundup_pow_of_two(8 * num_possible_cpus());
diff --git a/mm/shrinker.c b/mm/shrinker.c
index 7b61fc0ee78f..8a7a05182465 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -529,6 +529,7 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 		.gfp_mask = gfp_mask,
 		.nid = nid,
 		.memcg = memcg,
+		.priority = priority,
 	};
 	struct shrinker *shrinker;
 	int shrinker_id = calc_shrinker_id(index, offset);
@@ -654,6 +655,7 @@ unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
 		.gfp_mask = gfp_mask,
 		.nid = nid,
 		.memcg = memcg,
+		.priority = priority,
 	};

 	if (!shrinker_try_get(shrinker))
-- 
2.53.0.983.g0bb29b3bc5-goog