From nobody Tue Oct 7 19:26:33 2025
From: "Pankaj Raghav (Samsung)"
To: Suren Baghdasaryan, Ryan Roberts, Baolin Wang, Borislav Petkov,
    Ingo Molnar, "H. Peter Anvin", Vlastimil Babka, Zi Yan, Mike Rapoport,
    Dave Hansen, Michal Hocko, David Hildenbrand, Lorenzo Stoakes,
    Andrew Morton, Thomas Gleixner, Nico Pache, Dev Jain,
    "Liam R. Howlett", Jens Axboe
Cc: linux-kernel@vger.kernel.org, willy@infradead.org, linux-mm@kvack.org,
    x86@kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    "Darrick J. Wong", mcgrof@kernel.org, gost.dev@samsung.com,
    kernel@pankajraghav.com, hch@lst.de, Pankaj Raghav
Subject: [PATCH v2 1/5] mm: move huge_zero_page declaration from huge_mm.h to mm.h
Date: Mon, 7 Jul 2025 16:23:15 +0200
Message-ID: <20250707142319.319642-2-kernel@pankajraghav.com>
In-Reply-To: <20250707142319.319642-1-kernel@pankajraghav.com>
References: <20250707142319.319642-1-kernel@pankajraghav.com>

From: Pankaj Raghav

Move the declarations and inline helpers associated with huge_zero_page
from huge_mm.h to mm.h. This is in preparation for adding a static PMD
zero page, as we will be reusing some of the huge_zero_page
infrastructure.

No functional changes.

Signed-off-by: Pankaj Raghav
---
 include/linux/huge_mm.h | 31 -------------------------------
 include/linux/mm.h      | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 31 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2f190c90192d..3e887374892c 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -478,22 +478,6 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 
 vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf);
 
-extern struct folio *huge_zero_folio;
-extern unsigned long huge_zero_pfn;
-
-static inline bool is_huge_zero_folio(const struct folio *folio)
-{
-	return READ_ONCE(huge_zero_folio) == folio;
-}
-
-static inline bool is_huge_zero_pmd(pmd_t pmd)
-{
-	return pmd_present(pmd) && READ_ONCE(huge_zero_pfn) == pmd_pfn(pmd);
-}
-
-struct folio *mm_get_huge_zero_folio(struct mm_struct *mm);
-void mm_put_huge_zero_folio(struct mm_struct *mm);
-
 static inline bool thp_migration_supported(void)
 {
 	return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
@@ -631,21 +615,6 @@ static inline vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 	return 0;
 }
 
-static inline bool is_huge_zero_folio(const struct folio *folio)
-{
-	return false;
-}
-
-static inline bool is_huge_zero_pmd(pmd_t pmd)
-{
-	return false;
-}
-
-static inline void mm_put_huge_zero_folio(struct mm_struct *mm)
-{
-	return;
-}
-
 static inline struct page *follow_devmap_pmd(struct vm_area_struct *vma,
 	unsigned long addr, pmd_t *pmd, int flags, struct dev_pagemap **pgmap)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0ef2ba0c667a..c8fbeaacf896 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4018,6 +4018,40 @@ static inline bool vma_is_special_huge(const struct vm_area_struct *vma)
 
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+extern struct folio *huge_zero_folio;
+extern unsigned long huge_zero_pfn;
+
+static inline bool is_huge_zero_folio(const struct folio *folio)
+{
+	return READ_ONCE(huge_zero_folio) == folio;
+}
+
+static inline bool is_huge_zero_pmd(pmd_t pmd)
+{
+	return pmd_present(pmd) && READ_ONCE(huge_zero_pfn) == pmd_pfn(pmd);
+}
+
+struct folio *mm_get_huge_zero_folio(struct mm_struct *mm);
+void mm_put_huge_zero_folio(struct mm_struct *mm);
+
+#else
+static inline bool is_huge_zero_folio(const struct folio *folio)
+{
+	return false;
+}
+
+static inline bool is_huge_zero_pmd(pmd_t pmd)
+{
+	return false;
+}
+
+static inline void mm_put_huge_zero_folio(struct mm_struct *mm)
+{
+	return;
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
 #if MAX_NUMNODES > 1
 void __init setup_nr_node_ids(void);
 #else
-- 
2.49.0
From nobody Tue Oct 7 19:26:33 2025
From: "Pankaj Raghav (Samsung)"
To: Suren Baghdasaryan, Ryan Roberts, Baolin Wang, Borislav Petkov,
    Ingo Molnar, "H. Peter Anvin", Vlastimil Babka, Zi Yan, Mike Rapoport,
    Dave Hansen, Michal Hocko, David Hildenbrand, Lorenzo Stoakes,
    Andrew Morton, Thomas Gleixner, Nico Pache, Dev Jain,
    "Liam R. Howlett", Jens Axboe
Cc: linux-kernel@vger.kernel.org, willy@infradead.org, linux-mm@kvack.org,
    x86@kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    "Darrick J. Wong", mcgrof@kernel.org, gost.dev@samsung.com,
    kernel@pankajraghav.com, hch@lst.de, Pankaj Raghav
Subject: [PATCH v2 2/5] huge_memory: add huge_zero_page_shrinker_(init|exit) functions
Date: Mon, 7 Jul 2025 16:23:16 +0200
Message-ID: <20250707142319.319642-3-kernel@pankajraghav.com>
In-Reply-To: <20250707142319.319642-1-kernel@pankajraghav.com>
References: <20250707142319.319642-1-kernel@pankajraghav.com>

From: Pankaj Raghav

Add huge_zero_page_shrinker_init() and huge_zero_page_shrinker_exit().
As the shrinker will not be needed when the static PMD zero page is
enabled, these two functions can become no-ops.

This is a preparation patch for the static PMD zero page. No functional
changes.

Signed-off-by: Pankaj Raghav
Reviewed-by: Lorenzo Stoakes
---
 mm/huge_memory.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d3e66136e41a..101b67ab2eb6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -289,6 +289,24 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
 }
 
 static struct shrinker *huge_zero_page_shrinker;
+static int huge_zero_page_shrinker_init(void)
+{
+	huge_zero_page_shrinker = shrinker_alloc(0, "thp-zero");
+	if (!huge_zero_page_shrinker)
+		return -ENOMEM;
+
+	huge_zero_page_shrinker->count_objects = shrink_huge_zero_page_count;
+	huge_zero_page_shrinker->scan_objects = shrink_huge_zero_page_scan;
+	shrinker_register(huge_zero_page_shrinker);
+	return 0;
+}
+
+static void huge_zero_page_shrinker_exit(void)
+{
+	shrinker_free(huge_zero_page_shrinker);
+	return;
+}
+
 
 #ifdef CONFIG_SYSFS
 static ssize_t enabled_show(struct kobject *kobj,
@@ -850,33 +868,31 @@ static inline void hugepage_exit_sysfs(struct kobject *hugepage_kobj)
 
 static int __init thp_shrinker_init(void)
 {
-	huge_zero_page_shrinker = shrinker_alloc(0, "thp-zero");
-	if (!huge_zero_page_shrinker)
-		return -ENOMEM;
+	int ret = 0;
 
 	deferred_split_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE |
						 SHRINKER_MEMCG_AWARE |
						 SHRINKER_NONSLAB,
						 "thp-deferred_split");
-	if (!deferred_split_shrinker) {
-		shrinker_free(huge_zero_page_shrinker);
+	if (!deferred_split_shrinker)
 		return -ENOMEM;
-	}
-
-	huge_zero_page_shrinker->count_objects = shrink_huge_zero_page_count;
-	huge_zero_page_shrinker->scan_objects = shrink_huge_zero_page_scan;
-	shrinker_register(huge_zero_page_shrinker);
 
 	deferred_split_shrinker->count_objects = deferred_split_count;
 	deferred_split_shrinker->scan_objects = deferred_split_scan;
 	shrinker_register(deferred_split_shrinker);
 
+	ret = huge_zero_page_shrinker_init();
+	if (ret) {
+		shrinker_free(deferred_split_shrinker);
+		return ret;
+	}
+
 	return 0;
 }
 
 static void __init thp_shrinker_exit(void)
 {
-	shrinker_free(huge_zero_page_shrinker);
+	huge_zero_page_shrinker_exit();
 	shrinker_free(deferred_split_shrinker);
 }
 
-- 
2.49.0
From nobody Tue Oct 7 19:26:33 2025
From: "Pankaj Raghav (Samsung)"
To: Suren Baghdasaryan, Ryan Roberts, Baolin Wang, Borislav Petkov,
    Ingo Molnar, "H. Peter Anvin", Vlastimil Babka, Zi Yan, Mike Rapoport,
    Dave Hansen, Michal Hocko, David Hildenbrand, Lorenzo Stoakes,
    Andrew Morton, Thomas Gleixner, Nico Pache, Dev Jain,
    "Liam R. Howlett", Jens Axboe
Cc: linux-kernel@vger.kernel.org, willy@infradead.org, linux-mm@kvack.org,
    x86@kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    "Darrick J. Wong", mcgrof@kernel.org, gost.dev@samsung.com,
    kernel@pankajraghav.com, hch@lst.de, Pankaj Raghav
Subject: [PATCH v2 3/5] mm: add static PMD zero page
Date: Mon, 7 Jul 2025 16:23:17 +0200
Message-ID: <20250707142319.319642-4-kernel@pankajraghav.com>
In-Reply-To: <20250707142319.319642-1-kernel@pankajraghav.com>
References: <20250707142319.319642-1-kernel@pankajraghav.com>

From: Pankaj Raghav

There are many places in the kernel where we need to zero out chunks
larger than PAGE_SIZE, but the largest segment ZERO_PAGE can cover at a
time is a single page. This is especially annoying in block devices and
filesystems, where we attach multiple ZERO_PAGEs to a bio in separate
bvecs. With multipage bvec support in the block layer, it is much more
efficient to send out a larger zero page as part of a single bvec.

This concern was raised during the review of adding LBS support to
XFS[1][2].

Usually huge_zero_folio is allocated on demand and deallocated by the
shrinker once no users are left. At the moment, the huge_zero_folio
infrastructure's refcount is tied to the lifetime of the process that
created it. That does not work for the bio layer, as completions can be
asynchronous and the process that created the huge_zero_folio might no
longer be alive.

Add a config option STATIC_PMD_ZERO_PAGE that always allocates the
huge_zero_folio and never frees it. This makes it possible to use
huge_zero_folio without passing any mm struct and does not tie the
lifetime of the zero folio to anything. memblock is used to allocate
this PMD zero page during early boot.

If the STATIC_PMD_ZERO_PAGE config option is enabled,
mm_get_huge_zero_folio() will simply return this page instead of
dynamically allocating a new PMD page.

As STATIC_PMD_ZERO_PAGE does not depend on THP, declare huge_zero_folio
and huge_zero_pfn outside the THP config.

[1] https://lore.kernel.org/linux-xfs/20231027051847.GA7885@lst.de/
[2] https://lore.kernel.org/linux-xfs/ZitIK5OnR7ZNY0IG@infradead.org/

Suggested-by: David Hildenbrand
Signed-off-by: Pankaj Raghav
---
 include/linux/mm.h | 25 ++++++++++++++++++++++++-
 mm/Kconfig         |  9 +++++++++
 mm/huge_memory.c   | 24 ++++++++++++++++++++----
 mm/memory.c        | 25 +++++++++++++++++++++++++
 mm/mm_init.c       |  1 +
 5 files changed, 79 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c8fbeaacf896..428fe6d36b3c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4018,10 +4018,19 @@ static inline bool vma_is_special_huge(const struct vm_area_struct *vma)
 
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_STATIC_PMD_ZERO_PAGE
+extern void __init static_pmd_zero_init(void);
+#else
+static inline void __init static_pmd_zero_init(void)
+{
+	return;
+}
+#endif
+
 extern struct folio *huge_zero_folio;
 extern unsigned long huge_zero_pfn;
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline bool is_huge_zero_folio(const struct folio *folio)
 {
 	return READ_ONCE(huge_zero_folio) == folio;
@@ -4032,9 +4041,23 @@ static inline bool is_huge_zero_pmd(pmd_t pmd)
 	return pmd_present(pmd) && READ_ONCE(huge_zero_pfn) == pmd_pfn(pmd);
 }
 
+#ifdef CONFIG_STATIC_PMD_ZERO_PAGE
+static inline struct folio *mm_get_huge_zero_folio(struct mm_struct *mm)
+{
+	return READ_ONCE(huge_zero_folio);
+}
+
+static inline void mm_put_huge_zero_folio(struct mm_struct *mm)
+{
+	return;
+}
+
+#else
 struct folio *mm_get_huge_zero_folio(struct mm_struct *mm);
 void mm_put_huge_zero_folio(struct mm_struct *mm);
 
+#endif /* CONFIG_STATIC_PMD_ZERO_PAGE */
+
 #else
 static inline bool is_huge_zero_folio(const struct folio *folio)
 {
diff --git a/mm/Kconfig b/mm/Kconfig
index 781be3240e21..89d5971cf180 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -826,6 +826,15 @@ config ARCH_WANTS_THP_SWAP
 config MM_ID
 	def_bool n
 
+config STATIC_PMD_ZERO_PAGE
+	bool "Allocate a PMD page for zeroing"
+	help
+	  Typically huge_zero_folio, which is a PMD page of zeroes, is allocated
+	  on demand and deallocated when not in use. This option will
+	  allocate a PMD sized zero page during early boot and huge_zero_folio
+	  will use it instead of allocating dynamically.
+	  Not suitable for memory constrained systems.
+
 menuconfig TRANSPARENT_HUGEPAGE
 	bool "Transparent Hugepage Support"
 	depends on HAVE_ARCH_TRANSPARENT_HUGEPAGE && !PREEMPT_RT
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 101b67ab2eb6..c12ca7134e88 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -75,9 +75,6 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 					 struct shrink_control *sc);
 static bool split_underused_thp = true;
 
-static atomic_t huge_zero_refcount;
-struct folio *huge_zero_folio __read_mostly;
-unsigned long huge_zero_pfn __read_mostly = ~0UL;
 unsigned long huge_anon_orders_always __read_mostly;
 unsigned long huge_anon_orders_madvise __read_mostly;
 unsigned long huge_anon_orders_inherit __read_mostly;
@@ -208,6 +205,23 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 	return orders;
 }
 
+#ifdef CONFIG_STATIC_PMD_ZERO_PAGE
+static int huge_zero_page_shrinker_init(void)
+{
+	return 0;
+}
+
+static void huge_zero_page_shrinker_exit(void)
+{
+	return;
+}
+#else
+
+static struct shrinker *huge_zero_page_shrinker;
+static atomic_t huge_zero_refcount;
+struct folio *huge_zero_folio __read_mostly;
+unsigned long huge_zero_pfn __read_mostly = ~0UL;
+
 static bool get_huge_zero_page(void)
 {
 	struct folio *zero_folio;
@@ -288,7 +302,6 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
 	return 0;
 }
 
-static struct shrinker *huge_zero_page_shrinker;
 static int huge_zero_page_shrinker_init(void)
 {
 	huge_zero_page_shrinker = shrinker_alloc(0, "thp-zero");
@@ -307,6 +320,7 @@ static void huge_zero_page_shrinker_exit(void)
 	return;
 }
 
+#endif
 
 #ifdef CONFIG_SYSFS
 static ssize_t enabled_show(struct kobject *kobj,
@@ -2843,6 +2857,8 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
 	pte_t *pte;
 	int i;
 
+	// FIXME: can this be called with static zero page?
+	VM_BUG_ON(IS_ENABLED(CONFIG_STATIC_PMD_ZERO_PAGE));
 	/*
 	 * Leave pmd empty until pte is filled note that it is fine to delay
 	 * notification until mmu_notifier_invalidate_range_end() as we are
diff --git a/mm/memory.c b/mm/memory.c
index b0cda5aab398..42c4c31ad14c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -42,6 +42,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <linux/memblock.h>
 #include <...>
 #include <...>
 #include <...>
@@ -159,6 +160,30 @@ static int __init init_zero_pfn(void)
 }
 early_initcall(init_zero_pfn);
 
+#ifdef CONFIG_STATIC_PMD_ZERO_PAGE
+struct folio *huge_zero_folio __read_mostly = NULL;
+unsigned long huge_zero_pfn __read_mostly = ~0UL;
+
+void __init static_pmd_zero_init(void)
+{
+	void *alloc = memblock_alloc(PMD_SIZE, PAGE_SIZE);
+
+	if (!alloc)
+		return;
+
+	huge_zero_folio = virt_to_folio(alloc);
+	huge_zero_pfn = page_to_pfn(virt_to_page(alloc));
+
+	__folio_set_head(huge_zero_folio);
+	prep_compound_head((struct page *)huge_zero_folio, PMD_ORDER);
+	/* Ensure zero folio won't have large_rmappable flag set. */
+	folio_clear_large_rmappable(huge_zero_folio);
+	folio_zero_range(huge_zero_folio, 0, PMD_SIZE);
+
+	return;
+}
+#endif
+
 void mm_trace_rss_stat(struct mm_struct *mm, int member)
 {
 	trace_rss_stat(mm, member);
diff --git a/mm/mm_init.c b/mm/mm_init.c
index f2944748f526..56d7ec372af1 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2765,6 +2765,7 @@ void __init mm_core_init(void)
 	 */
 	kho_memory_init();
 
+	static_pmd_zero_init();
 	memblock_free_all();
 	mem_init();
 	kmem_cache_init();
-- 
2.49.0
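To see why the dynamic scheme cannot serve the bio layer, consider the
lifetime contract it imposes. A rough sketch (the caller is hypothetical; the
get/put pairing is as the commit message above describes):

	/*
	 * Hypothetical user of the dynamic huge zero folio. The folio is only
	 * guaranteed to stay alive while this mm holds its reference, which is
	 * dropped via mm_put_huge_zero_folio() when the mm goes away; an async
	 * bio completion could therefore outlive the folio it points at.
	 */
	static struct folio *zero_source_for(struct mm_struct *mm)
	{
		return mm_get_huge_zero_folio(mm);	/* may be NULL */
	}

With CONFIG_STATIC_PMD_ZERO_PAGE, mm_get_huge_zero_folio() collapses to a
READ_ONCE() and mm_put_huge_zero_folio() to a no-op, so no mm has to stay
alive for the folio to remain valid.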
From nobody Tue Oct 7 19:26:33 2025
From: "Pankaj Raghav (Samsung)"
To: Suren Baghdasaryan, Ryan Roberts, Baolin Wang, Borislav Petkov,
    Ingo Molnar, "H. Peter Anvin", Vlastimil Babka, Zi Yan, Mike Rapoport,
    Dave Hansen, Michal Hocko, David Hildenbrand, Lorenzo Stoakes,
    Andrew Morton, Thomas Gleixner, Nico Pache, Dev Jain,
    "Liam R. Howlett", Jens Axboe
Cc: linux-kernel@vger.kernel.org, willy@infradead.org, linux-mm@kvack.org,
    x86@kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    "Darrick J. Wong", mcgrof@kernel.org, gost.dev@samsung.com,
    kernel@pankajraghav.com, hch@lst.de, Pankaj Raghav
Subject: [PATCH v2 4/5] mm: add largest_zero_folio() routine
Date: Mon, 7 Jul 2025 16:23:18 +0200
Message-ID: <20250707142319.319642-5-kernel@pankajraghav.com>
In-Reply-To: <20250707142319.319642-1-kernel@pankajraghav.com>
References: <20250707142319.319642-1-kernel@pankajraghav.com>

From: Pankaj Raghav

Add a largest_zero_folio() routine so that huge_zero_folio can be used
without needing to pass any mm struct. It returns the ZERO_PAGE folio
if CONFIG_STATIC_PMD_ZERO_PAGE is disabled or if the PMD page could not
be allocated from memblock.

This routine can be called even if THP is disabled.

Signed-off-by: Pankaj Raghav
---
 include/linux/mm.h | 28 ++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 428fe6d36b3c..d5543cf7b8e9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4018,17 +4018,41 @@ static inline bool vma_is_special_huge(const struct vm_area_struct *vma)
 
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
 
+extern struct folio *huge_zero_folio;
+extern unsigned long huge_zero_pfn;
+
 #ifdef CONFIG_STATIC_PMD_ZERO_PAGE
 extern void __init static_pmd_zero_init(void);
+
+/*
+ * largest_zero_folio - Get the largest zero folio available
+ *
+ * This function will return a PMD sized zero folio if
+ * CONFIG_STATIC_PMD_ZERO_PAGE is enabled. Otherwise, a ZERO_PAGE folio
+ * is returned.
+ *
+ * Deduce the size of the folio with folio_size() instead of assuming
+ * the folio size.
+ */
+static inline struct folio *largest_zero_folio(void)
+{
+	if (!huge_zero_folio)
+		return page_folio(ZERO_PAGE(0));
+
+	return READ_ONCE(huge_zero_folio);
+}
+
 #else
 static inline void __init static_pmd_zero_init(void)
 {
 	return;
 }
+
+static inline struct folio *largest_zero_folio(void)
+{
+	return page_folio(ZERO_PAGE(0));
+}
 #endif
 
-extern struct folio *huge_zero_folio;
-extern unsigned long huge_zero_pfn;
 
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline bool is_huge_zero_folio(const struct folio *folio)
-- 
2.49.0
From nobody Tue Oct 7 19:26:33 2025
From: "Pankaj Raghav (Samsung)"
To: Suren Baghdasaryan, Ryan Roberts, Baolin Wang, Borislav Petkov,
    Ingo Molnar, "H. Peter Anvin", Vlastimil Babka, Zi Yan, Mike Rapoport,
    Dave Hansen, Michal Hocko, David Hildenbrand, Lorenzo Stoakes,
    Andrew Morton, Thomas Gleixner, Nico Pache, Dev Jain,
    "Liam R. Howlett", Jens Axboe
Cc: linux-kernel@vger.kernel.org, willy@infradead.org, linux-mm@kvack.org,
    x86@kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    "Darrick J. Wong", mcgrof@kernel.org, gost.dev@samsung.com,
    kernel@pankajraghav.com, hch@lst.de, Pankaj Raghav
Subject: [PATCH v2 5/5] block: use largest_zero_folio in __blkdev_issue_zero_pages()
Date: Mon, 7 Jul 2025 16:23:19 +0200
Message-ID: <20250707142319.319642-6-kernel@pankajraghav.com>
In-Reply-To: <20250707142319.319642-1-kernel@pankajraghav.com>
References: <20250707142319.319642-1-kernel@pankajraghav.com>

From: Pankaj Raghav

Use largest_zero_folio() in __blkdev_issue_zero_pages(). On systems
with CONFIG_STATIC_PMD_ZERO_PAGE enabled, we end up sending larger
bvecs instead of multiple small ones.

We noticed a 4% increase in performance on a commercial NVMe SSD that
does not support OP_WRITE_ZEROES. The device's MDTS was 128K. The
performance gains might be bigger if the device supports a bigger MDTS.

Signed-off-by: Pankaj Raghav
---
 block/blk-lib.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 4c9f20a689f7..70a5700b6717 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -196,6 +196,10 @@ static void __blkdev_issue_zero_pages(struct block_device *bdev,
 		sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
 		struct bio **biop, unsigned int flags)
 {
+	struct folio *zero_folio;
+
+	zero_folio = largest_zero_folio();
+
 	while (nr_sects) {
 		unsigned int nr_vecs = __blkdev_sectors_to_bio_pages(nr_sects);
 		struct bio *bio;
@@ -208,15 +212,14 @@ static void __blkdev_issue_zero_pages(struct block_device *bdev,
 			break;
 
 		do {
-			unsigned int len, added;
+			unsigned int len;
 
-			len = min_t(sector_t,
-				PAGE_SIZE, nr_sects << SECTOR_SHIFT);
-			added = bio_add_page(bio, ZERO_PAGE(0), len, 0);
-			if (added < len)
+			len = min_t(sector_t, folio_size(zero_folio),
+				    nr_sects << SECTOR_SHIFT);
+			if (!bio_add_folio(bio, zero_folio, len, 0))
 				break;
-			nr_sects -= added >> SECTOR_SHIFT;
-			sector += added >> SECTOR_SHIFT;
+			nr_sects -= len >> SECTOR_SHIFT;
+			sector += len >> SECTOR_SHIFT;
 		} while (nr_sects);
 
 		*biop = bio_chain_and_submit(*biop, bio);
-- 
2.49.0