From: Kiryl Shutsemau
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox, Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes, Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner, Jonathan Corbet, kernel-team@meta.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, x86@kernel.org
Subject: [PATCHv3 01/15] x86/vdso32: Prepare for inclusion
Date: Thu, 15 Jan 2026 14:45:47 +0000
Message-ID: <20260115144604.822702-2-kas@kernel.org>
In-Reply-To: <20260115144604.822702-1-kas@kernel.org>

The 32-bit vDSO for 64-bit kernels is built by faking a 32-bit
environment through various #undefs and #defines in fake_32bit_build.h.

An upcoming change adds a new include that exposes the 32-bit vDSO
build to more 64-bit definitions and breaks it: CONFIG_PHYS_ADDR_T_64BIT
triggers a "Missing MAX_POSSIBLE_PHYSMEM_BITS definition" error, and
CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS leads to "shift count >= width
of type" errors in pte_flags_pkey().

Undefine CONFIG_PHYS_ADDR_T_64BIT and
CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS in fake_32bit_build.h to fix the
problem.

Signed-off-by: Kiryl Shutsemau
Cc: x86@kernel.org
---
 arch/x86/entry/vdso/vdso32/fake_32bit_build.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/entry/vdso/vdso32/fake_32bit_build.h b/arch/x86/entry/vdso/vdso32/fake_32bit_build.h
index db1b15f686e3..900cdcde1029 100644
--- a/arch/x86/entry/vdso/vdso32/fake_32bit_build.h
+++ b/arch/x86/entry/vdso/vdso32/fake_32bit_build.h
@@ -13,6 +13,8 @@
 #undef CONFIG_SPARSEMEM_VMEMMAP
 #undef CONFIG_NR_CPUS
 #undef CONFIG_PARAVIRT_XXL
+#undef CONFIG_PHYS_ADDR_T_64BIT
+#undef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
 
 #define CONFIG_X86_32 1
 #define CONFIG_PGTABLE_LEVELS 2
-- 
2.51.2
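For illustration, a minimal userspace sketch of the class of error the
commit message cites, not the kernel's pte_flags_pkey() implementation;
pteval32_t, FAKE_PKEY_BIT and fake_pte_flags_pkey() are hypothetical
stand-ins. Once a 64-bit-only option leaks into a 32-bit build, a bit
position computed for the 64-bit pte layout can exceed the 32-bit type
width:

#include <stdio.h>

typedef unsigned int pteval32_t;	/* 32-bit pte, as in a 2-level build */

#define FAKE_PKEY_BIT	59		/* hypothetical 64-bit bit position */

static pteval32_t fake_pte_flags_pkey(unsigned int pkey)
{
#ifdef BREAK_THE_BUILD
	/* compiler diagnostic: shift count >= width of type */
	return (pteval32_t)pkey << FAKE_PKEY_BIT;
#else
	(void)pkey;
	return 0;
#endif
}

int main(void)
{
	printf("%u\n", fake_pte_flags_pkey(1));
	return 0;
}

Compiling with -DBREAK_THE_BUILD reproduces the diagnostic; undefining
the config option in fake_32bit_build.h avoids compiling such code in
the 32-bit vDSO at all.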
From: Kiryl Shutsemau
Subject: [PATCHv3 02/15] mm: Move MAX_FOLIO_ORDER definition to mmzone.h
Date: Thu, 15 Jan 2026 14:45:48 +0000
Message-ID: <20260115144604.822702-3-kas@kernel.org>

Move MAX_FOLIO_ORDER definition from mm.h to
mmzone.h. This is preparation for adding the vmemmap_tails array to
struct pglist_data, which requires MAX_FOLIO_ORDER to be available in
mmzone.h.

Signed-off-by: Kiryl Shutsemau
Acked-by: David Hildenbrand (Red Hat)
---
 include/linux/mm.h     | 31 -------------------------------
 include/linux/mmzone.h | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7c79b3369b82..2c409f583569 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -26,7 +26,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -2074,36 +2073,6 @@ static inline unsigned long folio_nr_pages(const struct folio *folio)
 	return folio_large_nr_pages(folio);
 }
 
-#if !defined(CONFIG_HAVE_GIGANTIC_FOLIOS)
-/*
- * We don't expect any folios that exceed buddy sizes (and consequently
- * memory sections).
- */
-#define MAX_FOLIO_ORDER		MAX_PAGE_ORDER
-#elif defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
-/*
- * Only pages within a single memory section are guaranteed to be
- * contiguous. By limiting folios to a single memory section, all folio
- * pages are guaranteed to be contiguous.
- */
-#define MAX_FOLIO_ORDER		PFN_SECTION_SHIFT
-#elif defined(CONFIG_HUGETLB_PAGE)
-/*
- * There is no real limit on the folio size. We limit them to the maximum we
- * currently expect (see CONFIG_HAVE_GIGANTIC_FOLIOS): with hugetlb, we expect
- * no folios larger than 16 GiB on 64bit and 1 GiB on 32bit.
- */
-#define MAX_FOLIO_ORDER		get_order(IS_ENABLED(CONFIG_64BIT) ? SZ_16G : SZ_1G)
-#else
-/*
- * Without hugetlb, gigantic folios that are bigger than a single PUD are
- * currently impossible.
- */
-#define MAX_FOLIO_ORDER		PUD_ORDER
-#endif
-
-#define MAX_FOLIO_NR_PAGES	(1UL << MAX_FOLIO_ORDER)
-
 /*
  * compound_nr() returns the number of pages in this potentially compound
  * page. compound_nr() can be called on a tail page, and is defined to
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7fb7331c5725..6a2f3696068e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -21,8 +21,10 @@
 #include
 #include
 #include
+#include
 #include
 #include
+#include
 #include
 
 /* Free memory management - zoned buddy allocator. */
@@ -61,6 +63,36 @@
  */
 #define PAGE_ALLOC_COSTLY_ORDER 3
 
+#if !defined(CONFIG_HAVE_GIGANTIC_FOLIOS)
+/*
+ * We don't expect any folios that exceed buddy sizes (and consequently
+ * memory sections).
+ */
+#define MAX_FOLIO_ORDER		MAX_PAGE_ORDER
+#elif defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
+/*
+ * Only pages within a single memory section are guaranteed to be
+ * contiguous. By limiting folios to a single memory section, all folio
+ * pages are guaranteed to be contiguous.
+ */
+#define MAX_FOLIO_ORDER		PFN_SECTION_SHIFT
+#elif defined(CONFIG_HUGETLB_PAGE)
+/*
+ * There is no real limit on the folio size. We limit them to the maximum we
+ * currently expect (see CONFIG_HAVE_GIGANTIC_FOLIOS): with hugetlb, we expect
+ * no folios larger than 16 GiB on 64bit and 1 GiB on 32bit.
+ */
+#define MAX_FOLIO_ORDER		get_order(IS_ENABLED(CONFIG_64BIT) ? SZ_16G : SZ_1G)
+#else
+/*
+ * Without hugetlb, gigantic folios that are bigger than a single PUD are
+ * currently impossible.
+ */
+#define MAX_FOLIO_ORDER		PUD_ORDER
+#endif
+
+#define MAX_FOLIO_NR_PAGES	(1UL << MAX_FOLIO_ORDER)
+
 enum migratetype {
 	MIGRATE_UNMOVABLE,
 	MIGRATE_MOVABLE,
-- 
2.51.2
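For a sense of scale, here is a minimal userspace sketch of the
arithmetic behind the CONFIG_HUGETLB_PAGE branch above, for one common
configuration (64-bit, 4 KiB pages). PAGE_SHIFT = 12 is an assumption
here, and get_order() below only mirrors the kernel helper rather than
reusing it:

#include <stdio.h>

#define PAGE_SHIFT	12
#define SZ_16G		(16ULL << 30)

/* Same idea as the kernel's get_order(): orders count pages, not bytes. */
static unsigned int get_order(unsigned long long size)
{
	unsigned int order = 0;

	size = (size - 1) >> PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}

int main(void)
{
	unsigned int max_folio_order = get_order(SZ_16G);

	/* 16 GiB / 4 KiB = 2^22 pages, so MAX_FOLIO_ORDER is 22 here */
	printf("MAX_FOLIO_ORDER = %u, MAX_FOLIO_NR_PAGES = %llu\n",
	       max_folio_order, 1ULL << max_folio_order);
	return 0;
}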
From: Kiryl Shutsemau
Subject: [PATCHv3 03/15] mm: Change the interface of prep_compound_tail()
Date: Thu, 15 Jan 2026 14:45:49 +0000
Message-ID: <20260115144604.822702-4-kas@kernel.org>

Instead of passing down the head page and tail page index, pass the
tail and head pages directly, as well as the order of the compound
page.

This is a preparation for changing how the head position is encoded in
the tail page.

Signed-off-by: Kiryl Shutsemau
Reviewed-by: Muchun Song
---
 include/linux/page-flags.h |  4 +++-
 mm/hugetlb.c               |  8 +++++---
 mm/internal.h              | 12 ++++++------
 mm/mm_init.c               |  2 +-
 mm/page_alloc.c            |  2 +-
 5 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0091ad1986bf..d4952573a4af 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -865,7 +865,9 @@ static inline bool folio_test_large(const struct folio *folio)
 	return folio_test_head(folio);
 }
 
-static __always_inline void set_compound_head(struct page *page, struct page *head)
+static __always_inline void set_compound_head(struct page *page,
+					      const struct page *head,
+					      unsigned int order)
 {
 	WRITE_ONCE(page->compound_head, (unsigned long)head + 1);
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0455119716ec..a55d638975bd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3212,6 +3212,7 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 
 /* Initialize [start_page:end_page_number] tail struct pages of a hugepage */
 static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
+					struct hstate *h,
 					unsigned long start_page_number,
 					unsigned long end_page_number)
 {
@@ -3220,6 +3221,7 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
 	struct page *page = folio_page(folio, start_page_number);
 	unsigned long head_pfn = folio_pfn(folio);
 	unsigned long pfn, end_pfn = head_pfn + end_page_number;
+	unsigned int order = huge_page_order(h);
 
 	/*
 	 * As we marked all tail pages with memblock_reserved_mark_noinit(),
@@ -3227,7 +3229,7 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
 	 */
 	for (pfn = head_pfn + start_page_number; pfn < end_pfn; page++, pfn++) {
 		__init_single_page(page, pfn, zone, nid);
-		prep_compound_tail((struct page *)folio, pfn - head_pfn);
+		prep_compound_tail(page, &folio->page, order);
 		set_page_count(page, 0);
 	}
 }
@@ -3247,7 +3249,7 @@ static void __init hugetlb_folio_init_vmemmap(struct folio *folio,
 	__folio_set_head(folio);
 	ret = folio_ref_freeze(folio, 1);
 	VM_BUG_ON(!ret);
-	hugetlb_folio_init_tail_vmemmap(folio, 1, nr_pages);
+	hugetlb_folio_init_tail_vmemmap(folio, h, 1, nr_pages);
 	prep_compound_head((struct page *)folio, huge_page_order(h));
 }
 
@@ -3304,7 +3306,7 @@ static void __init prep_and_add_bootmem_folios(struct hstate *h,
 			 * time as this is early in boot and there should
 			 * be no contention.
 			 */
-			hugetlb_folio_init_tail_vmemmap(folio,
+			hugetlb_folio_init_tail_vmemmap(folio, h,
 					HUGETLB_VMEMMAP_RESERVE_PAGES,
 					pages_per_huge_page(h));
 		}
diff --git a/mm/internal.h b/mm/internal.h
index 1561fc2ff5b8..f385370256b9 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -810,13 +810,13 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
 	INIT_LIST_HEAD(&folio->_deferred_list);
 }
 
-static inline void prep_compound_tail(struct page *head, int tail_idx)
+static inline void prep_compound_tail(struct page *tail,
+				      const struct page *head,
+				      unsigned int order)
 {
-	struct page *p = head + tail_idx;
-
-	p->mapping = TAIL_MAPPING;
-	set_compound_head(p, head);
-	set_page_private(p, 0);
+	tail->mapping = TAIL_MAPPING;
+	set_compound_head(tail, head, order);
+	set_page_private(tail, 0);
 }
 
 void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags);
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 7712d887b696..87d1e0277318 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1102,7 +1102,7 @@ static void __ref memmap_init_compound(struct page *head,
 		struct page *page = pfn_to_page(pfn);
 
 		__init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
-		prep_compound_tail(head, pfn - head_pfn);
+		prep_compound_tail(page, head, order);
 		set_page_count(page, 0);
 	}
 	prep_compound_head(head, order);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ed82ee55e66a..fe77c00c99df 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -717,7 +717,7 @@ void prep_compound_page(struct page *page, unsigned int order)
 
 	__SetPageHead(page);
 	for (i = 1; i < nr_pages; i++)
-		prep_compound_tail(page, i);
+		prep_compound_tail(page + i, page, order);
 
 	prep_compound_head(page, order);
 }
-- 
2.51.2
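A toy model of the interface change, not kernel code: struct toy_page,
TOY_TAIL_MAPPING and the prep_tail_*() helpers below are simplified
stand-ins. The old form took the head page plus a tail index; the new
form takes the tail page itself, the head, and the compound order, and
both initialize the same element the same way:

#include <assert.h>
#include <stdio.h>

struct toy_page {
	void *mapping;
	unsigned long compound_info;
	unsigned long private;
};

#define TOY_TAIL_MAPPING	((void *)0x400)	/* stand-in for the TAIL_MAPPING poison */

/* Old-style: head + index */
static void prep_tail_old(struct toy_page *head, int tail_idx)
{
	struct toy_page *p = head + tail_idx;

	p->mapping = TOY_TAIL_MAPPING;
	p->compound_info = (unsigned long)head + 1;
	p->private = 0;
}

/* New-style: tail, head, order (order is unused until the mask encoding lands) */
static void prep_tail_new(struct toy_page *tail, const struct toy_page *head,
			  unsigned int order)
{
	(void)order;
	tail->mapping = TOY_TAIL_MAPPING;
	tail->compound_info = (unsigned long)head + 1;
	tail->private = 0;
}

int main(void)
{
	struct toy_page a[8] = { 0 }, b[8] = { 0 };
	unsigned int order = 3, i;

	for (i = 1; i < (1u << order); i++) {
		prep_tail_old(a, i);
		prep_tail_new(&b[i], &b[0], order);
	}
	/* Same fields initialized either way (modulo the base array address). */
	assert(a[3].mapping == b[3].mapping && a[3].private == b[3].private);
	printf("ok\n");
	return 0;
}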
From: Kiryl Shutsemau
Subject: [PATCHv3 04/15] mm: Rename the 'compound_head' field in the 'struct page' to 'compound_info'
Date: Thu, 15 Jan 2026 14:45:50 +0000
Message-ID: <20260115144604.822702-5-kas@kernel.org>

The 'compound_head' field in the 'struct page' encodes whether the page
is a tail and where to locate the head page.
Bit 0 is set if the page is a tail, and the remaining bits in the field
point to the head page.

As preparation for changing how the field encodes information about the
head page, rename the field to 'compound_info'.

Signed-off-by: Kiryl Shutsemau
Reviewed-by: Muchun Song
---
 .../admin-guide/kdump/vmcoreinfo.rst |  2 +-
 Documentation/mm/vmemmap_dedup.rst   |  6 +++---
 include/linux/mm_types.h             | 20 +++++++++----------
 include/linux/page-flags.h           | 18 ++++++++---------
 include/linux/types.h                |  2 +-
 kernel/vmcore_info.c                 |  2 +-
 mm/page_alloc.c                      |  2 +-
 mm/slab.h                            |  2 +-
 mm/util.c                            |  2 +-
 9 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index 404a15f6782c..7663c610fe90 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -141,7 +141,7 @@ nodemask_t
 The size of a nodemask_t type. Used to compute the number of online
 nodes.
 
-(page, flags|_refcount|mapping|lru|_mapcount|private|compound_order|compound_head)
+(page, flags|_refcount|mapping|lru|_mapcount|private|compound_order|compound_info)
 -----------------------------------------------------------------------------------
 
 User-space tools compute their values based on the offset of these
diff --git a/Documentation/mm/vmemmap_dedup.rst b/Documentation/mm/vmemmap_dedup.rst
index b4a55b6569fa..1863d88d2dcb 100644
--- a/Documentation/mm/vmemmap_dedup.rst
+++ b/Documentation/mm/vmemmap_dedup.rst
@@ -24,7 +24,7 @@ For each base page, there is a corresponding ``struct page``.
 Within the HugeTLB subsystem, only the first 4 ``struct page`` are used to
 contain unique information about a HugeTLB page. ``__NR_USED_SUBPAGE`` provides
 this upper limit. The only 'useful' information in the remaining ``struct page``
-is the compound_head field, and this field is the same for all tail pages.
+is the compound_info field, and this field is the same for all tail pages.
 
 By removing redundant ``struct page`` for HugeTLB pages, memory can be returned
 to the buddy allocator for other uses.
@@ -124,10 +124,10 @@ Here is how things look before optimization::
    |           |
    +-----------+
 
-The value of page->compound_head is the same for all tail pages. The first
+The value of page->compound_info is the same for all tail pages. The first
 page of ``struct page`` (page 0) associated with the HugeTLB page contains the 4
 ``struct page`` necessary to describe the HugeTLB. The only use of the remaining
-pages of ``struct page`` (page 1 to page 7) is to point to page->compound_head.
+pages of ``struct page`` (page 1 to page 7) is to point to page->compound_info.
 Therefore, we can remap pages 1 to 7 to page 0. Only 1 page of ``struct page``
 will be used for each HugeTLB page. This will allow us to free the remaining 7
 pages to the buddy allocator.
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 90e5790c318f..a94683272869 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -125,14 +125,14 @@ struct page {
 			atomic_long_t pp_ref_count;
 		};
 		struct {	/* Tail pages of compound page */
-			unsigned long compound_head;	/* Bit zero is set */
+			unsigned long compound_info;	/* Bit zero is set */
 		};
 		struct {	/* ZONE_DEVICE pages */
 			/*
-			 * The first word is used for compound_head or folio
+			 * The first word is used for compound_info or folio
 			 * pgmap
 			 */
-			void *_unused_pgmap_compound_head;
+			void *_unused_pgmap_compound_info;
 			void *zone_device_data;
 			/*
 			 * ZONE_DEVICE private pages are counted as being
@@ -383,7 +383,7 @@ struct folio {
 	/* private: avoid cluttering the output */
 			/* For the Unevictable "LRU list" slot */
 			struct {
-				/* Avoid compound_head */
+				/* Avoid compound_info */
 				void *__filler;
 				/* public: */
 				unsigned int mlock_count;
@@ -484,7 +484,7 @@ struct folio {
 FOLIO_MATCH(flags, flags);
 FOLIO_MATCH(lru, lru);
 FOLIO_MATCH(mapping, mapping);
-FOLIO_MATCH(compound_head, lru);
+FOLIO_MATCH(compound_info, lru);
 FOLIO_MATCH(__folio_index, index);
 FOLIO_MATCH(private, private);
 FOLIO_MATCH(_mapcount, _mapcount);
@@ -503,7 +503,7 @@ FOLIO_MATCH(_last_cpupid, _last_cpupid);
 	static_assert(offsetof(struct folio, fl) ==			\
 		      offsetof(struct page, pg) + sizeof(struct page))
 FOLIO_MATCH(flags, _flags_1);
-FOLIO_MATCH(compound_head, _head_1);
+FOLIO_MATCH(compound_info, _head_1);
 FOLIO_MATCH(_mapcount, _mapcount_1);
 FOLIO_MATCH(_refcount, _refcount_1);
 #undef FOLIO_MATCH
@@ -511,13 +511,13 @@ FOLIO_MATCH(_refcount, _refcount_1);
 	static_assert(offsetof(struct folio, fl) ==			\
 		      offsetof(struct page, pg) + 2 * sizeof(struct page))
 FOLIO_MATCH(flags, _flags_2);
-FOLIO_MATCH(compound_head, _head_2);
+FOLIO_MATCH(compound_info, _head_2);
 #undef FOLIO_MATCH
 #define FOLIO_MATCH(pg, fl)						\
 	static_assert(offsetof(struct folio, fl) ==			\
 		      offsetof(struct page, pg) + 3 * sizeof(struct page))
 FOLIO_MATCH(flags, _flags_3);
-FOLIO_MATCH(compound_head, _head_3);
+FOLIO_MATCH(compound_info, _head_3);
 #undef FOLIO_MATCH
 
 /**
@@ -583,8 +583,8 @@ struct ptdesc {
 #define TABLE_MATCH(pg, pt)						\
 	static_assert(offsetof(struct page, pg) == offsetof(struct ptdesc, pt))
 TABLE_MATCH(flags, pt_flags);
-TABLE_MATCH(compound_head, pt_list);
-TABLE_MATCH(compound_head, _pt_pad_1);
+TABLE_MATCH(compound_info, pt_list);
+TABLE_MATCH(compound_info, _pt_pad_1);
 TABLE_MATCH(mapping, __page_mapping);
 TABLE_MATCH(__folio_index, pt_index);
 TABLE_MATCH(rcu_head, pt_rcu_head);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index d4952573a4af..72c933a43b6a 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -213,7 +213,7 @@ static __always_inline const struct page *page_fixed_fake_head(const struct page
 	/*
 	 * Only addresses aligned with PAGE_SIZE of struct page may be fake head
 	 * struct page. The alignment check aims to avoid access the fields (
-	 * e.g. compound_head) of the @page[1]. It can avoid touch a (possibly)
+	 * e.g. compound_info) of the @page[1]. It can avoid touch a (possibly)
 	 * cold cacheline in some cases.
 	 */
 	if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
@@ -223,7 +223,7 @@ static __always_inline const struct page *page_fixed_fake_head(const struct page
 		 * because the @page is a compound page composed with at least
 		 * two contiguous pages.
 		 */
-		unsigned long head = READ_ONCE(page[1].compound_head);
+		unsigned long head = READ_ONCE(page[1].compound_info);
 
 		if (likely(head & 1))
 			return (const struct page *)(head - 1);
@@ -281,7 +281,7 @@ static __always_inline int page_is_fake_head(const struct page *page)
 
 static __always_inline unsigned long _compound_head(const struct page *page)
 {
-	unsigned long head = READ_ONCE(page->compound_head);
+	unsigned long head = READ_ONCE(page->compound_info);
 
 	if (unlikely(head & 1))
 		return head - 1;
@@ -320,13 +320,13 @@ static __always_inline unsigned long _compound_head(const struct page *page)
 
 static __always_inline int PageTail(const struct page *page)
 {
-	return READ_ONCE(page->compound_head) & 1 || page_is_fake_head(page);
+	return READ_ONCE(page->compound_info) & 1 || page_is_fake_head(page);
 }
 
 static __always_inline int PageCompound(const struct page *page)
 {
 	return test_bit(PG_head, &page->flags.f) ||
-		READ_ONCE(page->compound_head) & 1;
+		READ_ONCE(page->compound_info) & 1;
 }
 
 #define PAGE_POISON_PATTERN	-1l
@@ -348,7 +348,7 @@ static const unsigned long *const_folio_flags(const struct folio *folio,
 {
 	const struct page *page = &folio->page;
 
-	VM_BUG_ON_PGFLAGS(page->compound_head & 1, page);
+	VM_BUG_ON_PGFLAGS(page->compound_info & 1, page);
 	VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags.f), page);
 	return &page[n].flags.f;
 }
@@ -357,7 +357,7 @@ static unsigned long *folio_flags(struct folio *folio, unsigned n)
 {
 	struct page *page = &folio->page;
 
-	VM_BUG_ON_PGFLAGS(page->compound_head & 1, page);
+	VM_BUG_ON_PGFLAGS(page->compound_info & 1, page);
 	VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags.f), page);
 	return &page[n].flags.f;
 }
@@ -869,12 +869,12 @@ static __always_inline void set_compound_head(struct page *page,
 					      const struct page *head,
 					      unsigned int order)
 {
-	WRITE_ONCE(page->compound_head, (unsigned long)head + 1);
+	WRITE_ONCE(page->compound_info, (unsigned long)head + 1);
 }
 
 static __always_inline void clear_compound_head(struct page *page)
 {
-	WRITE_ONCE(page->compound_head, 0);
+	WRITE_ONCE(page->compound_info, 0);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/include/linux/types.h b/include/linux/types.h
index 6dfdb8e8e4c3..3a65f0ef4a73 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -234,7 +234,7 @@ struct ustat {
  *
  * This guarantee is important for few reasons:
  *  - future call_rcu_lazy() will make use of lower bits in the pointer;
- *  - the structure shares storage space in struct page with @compound_head,
+ *  - the structure shares storage space in struct page with @compound_info,
  *    which encode PageTail() in bit 0. The guarantee is needed to avoid
  *    false-positive PageTail().
  */
diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c
index e066d31d08f8..782bc2050a40 100644
--- a/kernel/vmcore_info.c
+++ b/kernel/vmcore_info.c
@@ -175,7 +175,7 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_OFFSET(page, lru);
 	VMCOREINFO_OFFSET(page, _mapcount);
 	VMCOREINFO_OFFSET(page, private);
-	VMCOREINFO_OFFSET(page, compound_head);
+	VMCOREINFO_OFFSET(page, compound_info);
 	VMCOREINFO_OFFSET(pglist_data, node_zones);
 	VMCOREINFO_OFFSET(pglist_data, nr_zones);
 #ifdef CONFIG_FLATMEM
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fe77c00c99df..cecd6d89ff60 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -704,7 +704,7 @@ static inline bool pcp_allowed_order(unsigned int order)
 * The first PAGE_SIZE page is called the "head page" and have PG_head set.
 *
 * The remaining PAGE_SIZE pages are called "tail pages". PageTail() is encoded
- * in bit 0 of page->compound_head. The rest of bits is pointer to head page.
+ * in bit 0 of page->compound_info. The rest of bits is pointer to head page.
 *
 * The first tail page's ->compound_order holds the order of allocation.
 * This usage means that zero-order pages may not be compound.
diff --git a/mm/slab.h b/mm/slab.h
index 078daecc7cf5..b471877af296 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -104,7 +104,7 @@ struct slab {
 #define SLAB_MATCH(pg, sl)						\
 	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
 SLAB_MATCH(flags, flags);
-SLAB_MATCH(compound_head, slab_cache);	/* Ensure bit 0 is clear */
+SLAB_MATCH(compound_info, slab_cache);	/* Ensure bit 0 is clear */
 SLAB_MATCH(_refcount, __page_refcount);
 #ifdef CONFIG_MEMCG
 SLAB_MATCH(memcg_data, obj_exts);
diff --git a/mm/util.c b/mm/util.c
index 8989d5767528..cbf93cf3223a 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1244,7 +1244,7 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
 again:
 	memset(&ps->folio_snapshot, 0, sizeof(struct folio));
 	memcpy(&ps->page_snapshot, page, sizeof(*page));
-	head = ps->page_snapshot.compound_head;
+	head = ps->page_snapshot.compound_info;
 	if ((head & 1) == 0) {
 		ps->idx = 0;
 		foliop = (struct folio *)&ps->page_snapshot;
-- 
2.51.2
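As a reading aid, a minimal userspace sketch of the bit-0 encoding the
commit message describes: pointer-plus-one marks a tail page, while head
pages keep bit 0 clear because the same word overlays pointer-aligned
fields such as lru or slab_cache. struct fake_page and the helpers are
simplified stand-ins, not kernel code:

#include <assert.h>
#include <stdio.h>

struct fake_page {
	unsigned long compound_info;	/* tail: head pointer | 1 */
};

static void set_tail(struct fake_page *tail, const struct fake_page *head)
{
	/* struct page is at least word-aligned, so bit 0 of the pointer is free */
	tail->compound_info = (unsigned long)head + 1;
}

static const struct fake_page *get_head(const struct fake_page *page)
{
	unsigned long info = page->compound_info;

	/* bit 0 set -> tail page; strip the flag to recover the head pointer */
	return (info & 1) ? (const struct fake_page *)(info - 1) : page;
}

int main(void)
{
	struct fake_page pages[4] = { { 0 }, { 0 }, { 0 }, { 0 } };

	set_tail(&pages[3], &pages[0]);
	assert(get_head(&pages[3]) == &pages[0]);
	assert(get_head(&pages[0]) == &pages[0]);
	printf("ok\n");
	return 0;
}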
From: Kiryl Shutsemau
Subject: [PATCHv3 05/15] mm: Move set/clear_compound_head() next to compound_head()
Date: Thu, 15 Jan 2026 14:45:51 +0000
Message-ID: <20260115144604.822702-6-kas@kernel.org>

Move set_compound_head() and clear_compound_head() to be adjacent to
the compound_head() function in page-flags.h.
These functions encode and decode the same compound_info field, so
keeping them together makes it easier to verify their logic is
consistent, especially when the encoding changes.

Signed-off-by: Kiryl Shutsemau
Reviewed-by: Muchun Song
---
 include/linux/page-flags.h | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 72c933a43b6a..0de7db7efb00 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -290,6 +290,18 @@ static __always_inline unsigned long _compound_head(const struct page *page)
 
 #define compound_head(page)	((typeof(page))_compound_head(page))
 
+static __always_inline void set_compound_head(struct page *page,
+					      const struct page *head,
+					      unsigned int order)
+{
+	WRITE_ONCE(page->compound_info, (unsigned long)head + 1);
+}
+
+static __always_inline void clear_compound_head(struct page *page)
+{
+	WRITE_ONCE(page->compound_info, 0);
+}
+
 /**
  * page_folio - Converts from page to folio.
  * @p: The page.
@@ -865,18 +877,6 @@ static inline bool folio_test_large(const struct folio *folio)
 	return folio_test_head(folio);
 }
 
-static __always_inline void set_compound_head(struct page *page,
-					      const struct page *head,
-					      unsigned int order)
-{
-	WRITE_ONCE(page->compound_info, (unsigned long)head + 1);
-}
-
-static __always_inline void clear_compound_head(struct page *page)
-{
-	WRITE_ONCE(page->compound_info, 0);
-}
-
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline void ClearPageCompound(struct page *page)
 {
-- 
2.51.2
From: Kiryl Shutsemau
Subject: [PATCHv3 06/15] mm: Rework compound_head() for power-of-2 sizeof(struct page)
Date: Thu, 15 Jan 2026 14:45:52 +0000
Message-ID: <20260115144604.822702-7-kas@kernel.org>

For tail pages, the kernel uses the 'compound_info' field to get to the
head page. Bit 0 of the field indicates whether the page is a tail page
and, if set, the remaining bits represent a pointer to the head page.

For the case when sizeof(struct page) is a power of 2, change the
encoding of compound_info to store a mask that can be applied to the
virtual address of the tail page in order to reach the head page. This
is possible because the struct page of the head page is naturally
aligned with regard to the order of the page.

The significant consequence of this change is that all tail pages of
the same order now have identical 'compound_info', regardless of which
compound page they belong to. This paves the way for eliminating fake
heads.
The HugeTLB Vmemmap Optimization (HVO) creates fake heads, and it is
only applied when sizeof(struct page) is a power of 2. Having identical
tail pages allows the same page of tail struct pages to be mapped into
the vmemmap of all huge pages, preserving the memory savings without
fake heads.

If sizeof(struct page) is not a power of 2, there is no functional
change.

Limit mask usage to SPARSEMEM_VMEMMAP, where it makes a difference
because of HVO. The mask approach would work for any memory model, but
it requires validating that struct pages are naturally aligned for all
orders up to the MAX_FOLIO order, which can be tricky.

Signed-off-by: Kiryl Shutsemau
Reviewed-by: Muchun Song
---
 include/linux/page-flags.h | 81 ++++++++++++++++++++++++++++++++++----
 mm/util.c                  | 16 ++++++--
 2 files changed, 85 insertions(+), 12 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0de7db7efb00..e16a4bc82856 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -198,6 +198,29 @@ enum pageflags {
 
 #ifndef __GENERATING_BOUNDS_H
 
+/*
+ * For tail pages, if the size of struct page is power-of-2 ->compound_info
+ * encodes the mask that converts the address of the tail page to
+ * the head page address.
+ *
+ * Otherwise, ->compound_info has direct pointer to head pages.
+ */
+static __always_inline bool compound_info_has_mask(void)
+{
+	/*
+	 * Limit mask usage to SPARSEMEM_VMEMMAP where it makes a difference
+	 * because of the HugeTLB vmemmap optimization (HVO).
+	 *
+	 * The approach with mask would work for any memory model, but it
+	 * requires validating that struct pages are naturally aligned for
+	 * all orders up to the MAX_FOLIO order, which can be tricky.
+	 */
+	if (!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
+		return false;
+
+	return is_power_of_2(sizeof(struct page));
+}
+
 #ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
 DECLARE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
 
@@ -210,6 +233,10 @@ static __always_inline const struct page *page_fixed_fake_head(const struct page
 	if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key))
 		return page;
 
+	/* Fake heads only exist if compound_info_has_mask() is true */
+	if (!compound_info_has_mask())
+		return page;
+
 	/*
 	 * Only addresses aligned with PAGE_SIZE of struct page may be fake head
 	 * struct page. The alignment check aims to avoid access the fields (
@@ -223,10 +250,14 @@ static __always_inline const struct page *page_fixed_fake_head(const struct page
 		 * because the @page is a compound page composed with at least
 		 * two contiguous pages.
 		 */
-		unsigned long head = READ_ONCE(page[1].compound_info);
+		unsigned long info = READ_ONCE(page[1].compound_info);
 
-		if (likely(head & 1))
-			return (const struct page *)(head - 1);
+		/* See set_compound_head() */
+		if (likely(info & 1)) {
+			unsigned long p = (unsigned long)page;
+
+			return (const struct page *)(p & info);
+		}
 	}
 	return page;
 }
@@ -281,11 +312,26 @@ static __always_inline int page_is_fake_head(const struct page *page)
 
 static __always_inline unsigned long _compound_head(const struct page *page)
 {
-	unsigned long head = READ_ONCE(page->compound_info);
+	unsigned long info = READ_ONCE(page->compound_info);
 
-	if (unlikely(head & 1))
-		return head - 1;
-	return (unsigned long)page_fixed_fake_head(page);
+	/* Bit 0 encodes PageTail() */
+	if (!(info & 1))
+		return (unsigned long)page_fixed_fake_head(page);
+
+	/*
+	 * If compound_info_has_mask() is false, the rest of compound_info is
+	 * the pointer to the head page.
+	 */
+	if (!compound_info_has_mask())
+		return info - 1;
+
+	/*
+	 * If compound_info_has_mask() is true, the rest of the info encodes
+	 * the mask that converts the address of the tail page to the head page.
+	 *
+	 * No need to clear bit 0 in the mask as 'page' always has it clear.
+	 */
+	return (unsigned long)page & info;
 }
 
 #define compound_head(page)	((typeof(page))_compound_head(page))
@@ -294,7 +340,26 @@ static __always_inline void set_compound_head(struct page *page,
 					      const struct page *head,
 					      unsigned int order)
 {
-	WRITE_ONCE(page->compound_info, (unsigned long)head + 1);
+	unsigned int shift;
+	unsigned long mask;
+
+	if (!compound_info_has_mask()) {
+		WRITE_ONCE(page->compound_info, (unsigned long)head | 1);
+		return;
+	}
+
+	/*
+	 * If the size of struct page is power-of-2, bits [shift:0] of the
+	 * virtual address of compound head are zero.
+	 *
+	 * Calculate mask that can be applied to the virtual address of
+	 * the tail page to get address of the head page.
+	 */
+	shift = order + order_base_2(sizeof(struct page));
+	mask = GENMASK(BITS_PER_LONG - 1, shift);
+
+	/* Bit 0 encodes PageTail() */
+	WRITE_ONCE(page->compound_info, mask | 1);
 }
 
 static __always_inline void clear_compound_head(struct page *page)
diff --git a/mm/util.c b/mm/util.c
index cbf93cf3223a..f01a9655067f 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1234,7 +1234,7 @@ static void set_ps_flags(struct page_snapshot *ps, const struct folio *folio,
  */
 void snapshot_page(struct page_snapshot *ps, const struct page *page)
 {
-	unsigned long head, nr_pages = 1;
+	unsigned long info, nr_pages = 1;
 	struct folio *foliop;
 	int loops = 5;
 
@@ -1244,8 +1244,8 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
 again:
 	memset(&ps->folio_snapshot, 0, sizeof(struct folio));
 	memcpy(&ps->page_snapshot, page, sizeof(*page));
-	head = ps->page_snapshot.compound_info;
-	if ((head & 1) == 0) {
+	info = ps->page_snapshot.compound_info;
+	if ((info & 1) == 0) {
 		ps->idx = 0;
 		foliop = (struct folio *)&ps->page_snapshot;
 		if (!folio_test_large(foliop)) {
@@ -1256,7 +1256,15 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
 		}
 		foliop = (struct folio *)page;
 	} else {
-		foliop = (struct folio *)(head - 1);
+		/* See compound_head() */
+		if (compound_info_has_mask()) {
+			unsigned long p = (unsigned long)page;
+
+			foliop = (struct folio *)(p & info);
+		} else {
+			foliop = (struct folio *)(info - 1);
+		}
+
 		ps->idx = folio_page_idx(foliop, page);
 	}
 
-- 
2.51.2
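To make the arithmetic concrete, a minimal userspace sketch of the mask
encoding, assuming sizeof(struct page) == 64 (so order_base_2() gives 6)
and an order-9 (2 MiB with 4 KiB pages) compound page; STRUCT_PAGE_SIZE
and the aligned_alloc() buffer are illustrative stand-ins for the
vmemmap, not kernel code:

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define STRUCT_PAGE_SIZE	64	/* assumed power-of-2 sizeof(struct page) */
#define ORDER_BASE_2		6	/* log2(STRUCT_PAGE_SIZE) */

int main(void)
{
	unsigned int order = 9;				/* 2 MiB compound page */
	unsigned int shift = order + ORDER_BASE_2;	/* 15 */
	unsigned long mask = ~0UL << shift;		/* like GENMASK(BITS_PER_LONG - 1, shift) */
	unsigned long info = mask | 1;			/* bit 0 marks a tail page */

	/* A naturally aligned block standing in for 2^order struct pages. */
	char *head = aligned_alloc(1UL << shift, (1UL << order) * STRUCT_PAGE_SIZE);
	assert(head);

	for (unsigned long i = 1; i < (1UL << order); i++) {
		unsigned long tail = (unsigned long)head + i * STRUCT_PAGE_SIZE;

		/* Decoding: mask the tail's own address; bit 0 of 'tail' is clear. */
		assert((tail & info) == (unsigned long)head);
	}
	printf("shift=%u mask=%#lx\n", shift, mask);
	free(head);
	return 0;
}

Because the mask depends only on the order, every tail page of the same
order carries the same compound_info value, which is what lets HVO map
one shared tail page without fake heads.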
b=nl3Nr/wRJBwfvYQ806Ooa+d27C7uc79uaNYMXgEz2D28xN91pdst9KBcRVpmNkHL21NXKMBeBcD5ZUT5ZTpouzH03qxF6VI7GL0Ox0crRNeGm4w2cOofcjZ6645miiI3/9V7CSb7qV4KTDYItsk12nYzEI2cIaA5eMWVoRS/LEY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=WZ3g8gbn; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="WZ3g8gbn" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 51395C19422; Thu, 15 Jan 2026 14:46:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1768488381; bh=T8UK/c799JlRwYME3UvlwD+wczrXw9AaT7vDCkxDE1M=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=WZ3g8gbnEiCuhI8vMJDySSJbBbUKB/9dP+m6ZqhSJibYs4S/FWvO4YDs9A6Wyc6xV G4hjEhzGcCBoUsrgPDJLPumCmCs+SY1gFt6+q1155iTdVaqZ4elE2bMIbKZcGXRZBg AbXxkSUiF3cVak2vqJGG+QsTohAbcxAl0QWXG6HN8gwZhdwdPwhxcATH2tj61O+BFf 9aSZwge4QVwtWWbdyCYBOJX5Vf5A2YsGvT/rtmJG9nIcv9l8s7IWusrOfAWMPwdVt3 ctw4WkyFvmhH8oDTivbbK+QRUYoCUa2j4YGWkgXPu5jzj1K8+J0eyfYYY1Qof+B2Ph xqM8AKKNGWhgA== Received: from phl-compute-08.internal (phl-compute-08.internal [10.202.2.48]) by mailfauth.phl.internal (Postfix) with ESMTP id 797CDF4006B; Thu, 15 Jan 2026 09:46:20 -0500 (EST) Received: from phl-frontend-03 ([10.202.2.162]) by phl-compute-08.internal (MEProxy); Thu, 15 Jan 2026 09:46:20 -0500 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgdduvdeifeegucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhephffvvefufffkofgjfhgggfestdekredtredttdenucfhrhhomhepmfhirhihlhcu ufhhuhhtshgvmhgruhcuoehkrghssehkvghrnhgvlhdrohhrgheqnecuggftrfgrthhtvg hrnhephfdufeejhefhkedtuedvfeevjeffvdfhvedtudfgudffjeefieekleehvdetvdev necuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepkhhirh hilhhlodhmvghsmhhtphgruhhthhhpvghrshhonhgrlhhithihqdduieduudeivdeiheeh qddvkeeggeegjedvkedqkhgrsheppehkvghrnhgvlhdrohhrghesshhhuhhtvghmohhvrd hnrghmvgdpnhgspghrtghpthhtohepvddtpdhmohguvgepshhmthhpohhuthdprhgtphht thhopegrkhhpmheslhhinhhugidqfhhouhhnuggrthhiohhnrdhorhhgpdhrtghpthhtoh epmhhutghhuhhnrdhsohhngheslhhinhhugidruggvvhdprhgtphhtthhopegurghvihgu sehkvghrnhgvlhdrohhrghdprhgtphhtthhopeifihhllhihsehinhhfrhgruggvrggurd horhhgpdhrtghpthhtohepuhhsrghmrggrrhhifheigedvsehgmhgrihhlrdgtohhmpdhr tghpthhtohepfhhvughlsehgohhoghhlvgdrtghomhdprhgtphhtthhopehoshgrlhhvrg guohhrsehsuhhsvgdruggvpdhrtghpthhtoheprhhpphhtsehkvghrnhgvlhdrohhrghdp rhgtphhtthhopehvsggrsghkrgesshhushgvrdgtii X-ME-Proxy: Feedback-ID: i10464835:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 15 Jan 2026 09:46:20 -0500 (EST) From: Kiryl Shutsemau To: Andrew Morton , Muchun Song , David Hildenbrand , Matthew Wilcox , Usama Arif , Frank van der Linden Cc: Oscar Salvador , Mike Rapoport , Vlastimil Babka , Lorenzo Stoakes , Zi Yan , Baoquan He , Michal Hocko , Johannes Weiner , Jonathan Corbet , kernel-team@meta.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Kiryl Shutsemau Subject: [PATCHv3 07/15] mm: Make page_zonenum() use head page Date: Thu, 15 Jan 2026 14:45:53 +0000 Message-ID: <20260115144604.822702-8-kas@kernel.org> X-Mailer: git-send-email 2.51.2 In-Reply-To: <20260115144604.822702-1-kas@kernel.org> References: <20260115144604.822702-1-kas@kernel.org> Precedence: bulk X-Mailing-List: 
linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" With the upcoming changes to HVO, a single page of tail struct pages will be shared across all huge pages of the same order on a node. Since huge pages on the same node may belong to different zones, the zone information stored in shared tail page flags would be incorrect. Always fetch zone information from the head page, which has unique and correct zone flags for each compound page. Signed-off-by: Kiryl Shutsemau --- include/linux/mmzone.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 6a2f3696068e..590d1a494c4c 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1224,6 +1224,7 @@ static inline enum zone_type memdesc_zonenum(memdesc_= flags_t flags) =20 static inline enum zone_type page_zonenum(const struct page *page) { + page =3D compound_head(page); return memdesc_zonenum(page->flags); } =20 --=20 2.51.2 From nobody Sat Feb 7 22:34:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 60AC13A7848 for ; Thu, 15 Jan 2026 14:46:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768488383; cv=none; b=a7bU1n+V5La5IhSv6v/fcS1M+ipKh1+1FtfbkywgEJ339NyTlI0ajFVixN/Fw99vVRJZ6gUONrDlptXJC9nVrfho0bAtec3/ogmocxxwk3iyLm1dG2rhVT13/VDXa7il+tD+0NLLTRBFH4kbHjbNT87sZwUQLerop13LoUxZ2mA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768488383; c=relaxed/simple; bh=6g9s/Zv5Bx6BabFSpoMhFmX+Y4IAWEXocX3aiWMnCgE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=QYLkCiOUXWd8ILUnsQVYKlTjjnFS0WL0gtnmpVXAYiWohj5aFP53PDGN6crKEHMfTPghNpziOrHEPBJ7sW0GRdUpSZj6IgUSjWo4ibx89n3ZkzFqTuQRMAgu9qwS8Bzh7ROrngXs4vIe5CPA1vF5N52MN4OXWu2f0Lkn9TG60Cc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=EZikTKWW; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="EZikTKWW" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B89C1C116D0; Thu, 15 Jan 2026 14:46:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1768488383; bh=6g9s/Zv5Bx6BabFSpoMhFmX+Y4IAWEXocX3aiWMnCgE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=EZikTKWWRDRCncKDnsWIQuNB3lIxtLlBsbSUOlbBs3oJB3Bzbek7Y1pB8x2RUgq/T gN+gyS2t/ABkcjykSQ7QYZfvOlLpEhN9d94MN3JaOQwn8P0nQbJOimRXtbZ2g0Kwgc Pa4Vihv4FpvENgxaNep3Wi88Dv6zSh9ePIPZ1yF+a66L3DIJRpi3jPGq5NFXxmfg3q mKr5mKXFqrX0+NZzdGr8nOSkqdKc5LJoj3bTuP0RH1PRd19TPiu5p8Oh3uEUV9BD9M 2yyoS0+CbkfUWbMd5V2ZHqBovGXKW1xorCIuV3wBGKNmWCAhKt1mC6u/K25n7+WhJz jtSGyDOSOLgvQ== Received: from phl-compute-04.internal (phl-compute-04.internal [10.202.2.44]) by mailfauth.phl.internal (Postfix) with ESMTP id E8866F40068; Thu, 15 Jan 2026 09:46:21 -0500 (EST) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-04.internal (MEProxy); Thu, 15 Jan 2026 09:46:21 -0500 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: 
gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgdduvdeifeegucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhephffvvefufffkofgjfhgggfestdekredtredttdenucfhrhhomhepmfhirhihlhcu ufhhuhhtshgvmhgruhcuoehkrghssehkvghrnhgvlhdrohhrgheqnecuggftrfgrthhtvg hrnhephfdufeejhefhkedtuedvfeevjeffvdfhvedtudfgudffjeefieekleehvdetvdev necuvehluhhsthgvrhfuihiivgepvdenucfrrghrrghmpehmrghilhhfrhhomhepkhhirh hilhhlodhmvghsmhhtphgruhhthhhpvghrshhonhgrlhhithihqdduieduudeivdeiheeh qddvkeeggeegjedvkedqkhgrsheppehkvghrnhgvlhdrohhrghesshhhuhhtvghmohhvrd hnrghmvgdpnhgspghrtghpthhtohepvddtpdhmohguvgepshhmthhpohhuthdprhgtphht thhopegrkhhpmheslhhinhhugidqfhhouhhnuggrthhiohhnrdhorhhgpdhrtghpthhtoh epmhhutghhuhhnrdhsohhngheslhhinhhugidruggvvhdprhgtphhtthhopegurghvihgu sehkvghrnhgvlhdrohhrghdprhgtphhtthhopeifihhllhihsehinhhfrhgruggvrggurd horhhgpdhrtghpthhtohepuhhsrghmrggrrhhifheigedvsehgmhgrihhlrdgtohhmpdhr tghpthhtohepfhhvughlsehgohhoghhlvgdrtghomhdprhgtphhtthhopehoshgrlhhvrg guohhrsehsuhhsvgdruggvpdhrtghpthhtoheprhhpphhtsehkvghrnhgvlhdrohhrghdp rhgtphhtthhopehvsggrsghkrgesshhushgvrdgtii X-ME-Proxy: Feedback-ID: i10464835:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 15 Jan 2026 09:46:21 -0500 (EST) From: Kiryl Shutsemau To: Andrew Morton , Muchun Song , David Hildenbrand , Matthew Wilcox , Usama Arif , Frank van der Linden Cc: Oscar Salvador , Mike Rapoport , Vlastimil Babka , Lorenzo Stoakes , Zi Yan , Baoquan He , Michal Hocko , Johannes Weiner , Jonathan Corbet , kernel-team@meta.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Kiryl Shutsemau Subject: [PATCHv3 08/15] mm/sparse: Check memmap alignment for compound_info_has_mask() Date: Thu, 15 Jan 2026 14:45:54 +0000 Message-ID: <20260115144604.822702-9-kas@kernel.org> X-Mailer: git-send-email 2.51.2 In-Reply-To: <20260115144604.822702-1-kas@kernel.org> References: <20260115144604.822702-1-kas@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" If page->compound_info encodes a mask, it is expected that memmap to be naturally aligned to the maximum folio size. Add a warning if it is not. A warning is sufficient as MAX_FOLIO_ORDER is very rarely used, so the kernel is still likely to be functional if this strict check fails. 
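To illustrate why this alignment matters, here is a minimal user-space sketch (not kernel code; the 64-byte struct page size, 4 KiB base pages, order 9, and the toy_page/memmap names are assumptions for illustration). Clearing the low bits of a tail entry's address, as set_compound_head() arranges, only lands on the head entry if the folio's whole struct page array is naturally aligned:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct toy_page { unsigned char pad[64]; };	/* stand-in for a 64-byte struct page */

int main(void)
{
	unsigned int order = 9;				/* a 2 MiB folio with 4 KiB base pages */
	size_t nr = (size_t)1 << order;
	size_t span = nr * sizeof(struct toy_page);	/* natural alignment the mask needs */
	unsigned int shift = order + 6;			/* order_base_2(sizeof(struct toy_page)) == 6 */
	uintptr_t mask = ~(((uintptr_t)1 << shift) - 1); /* analogous to GENMASK(BITS_PER_LONG - 1, shift) */
	struct toy_page *memmap, *tail, *head;

	/* aligned_alloc() models a naturally aligned memmap section */
	memmap = aligned_alloc(span, span);
	if (!memmap)
		return 1;

	tail = memmap + 123;				/* an arbitrary tail entry within the folio */
	head = (struct toy_page *)((uintptr_t)tail & mask);

	printf("mask recovers the head entry: %s\n", head == memmap ? "yes" : "no");
	free(memmap);
	return 0;
}

With an unaligned array, the same AND could land before the array's first entry, which is what the warning added below is meant to catch.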
Signed-off-by: Kiryl Shutsemau --- include/linux/mmzone.h | 1 + mm/sparse.c | 5 +++++ 2 files changed, 6 insertions(+) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 590d1a494c4c..322ed4c42cfc 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -92,6 +92,7 @@ #endif =20 #define MAX_FOLIO_NR_PAGES (1UL << MAX_FOLIO_ORDER) +#define MAX_FOLIO_SIZE (PAGE_SIZE << MAX_FOLIO_ORDER) =20 enum migratetype { MIGRATE_UNMOVABLE, diff --git a/mm/sparse.c b/mm/sparse.c index 17c50a6415c2..5f41a3edcc24 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -600,6 +600,11 @@ void __init sparse_init(void) BUILD_BUG_ON(!is_power_of_2(sizeof(struct mem_section))); memblocks_present(); =20 + if (compound_info_has_mask()) { + WARN_ON(!IS_ALIGNED((unsigned long)pfn_to_page(0), + MAX_FOLIO_SIZE / sizeof(struct page))); + } + pnum_begin =3D first_present_section_nr(); nid_begin =3D sparse_early_nid(__nr_to_section(pnum_begin)); =20 --=20 2.51.2 From nobody Sat Feb 7 22:34:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5965F3A7E18; Thu, 15 Jan 2026 14:46:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768488385; cv=none; b=SiIxa3n2HWevr8aIrWkh3foL57PpofPJ7034r2RiinIPWZjAYLjEfFok/6UcZw+O05LiuSxthr6Gp3Ka3bjt1V8NBHsqTCkBeCpBSF++yEgw6JZFBA/swXIuJKXeqJ0V58AYaXHxBmdS7H9KPHktwTMuqIGq/iv7/0xnJdD2vbE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768488385; c=relaxed/simple; bh=1HTQBltWYLmkQyjOip6NWnISwSivq5xaZ3eChCpcmuw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=JM2GVrILAYDn6Tq2LrQtHDj7TKL/JvGSQf+yFXhccnPdH1sH72KlfE/5fIkbAX0B5En+Ms1zD+xuW0qor8O7Kcv/420uYYcpDXAjumgd2CXZLNzUQ8mFd4fZF0arAQ5TJiHRR3l5vIJY7J77ykqloFqmv2Lney/EnNtTUgAhLnk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=lu0+u1FY; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="lu0+u1FY" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4EEC5C4AF09; Thu, 15 Jan 2026 14:46:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1768488385; bh=1HTQBltWYLmkQyjOip6NWnISwSivq5xaZ3eChCpcmuw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=lu0+u1FYIUQ1LFvM7cFe+eZ/jS27u791/ilxlbLmvfgaYkyS+jFLUKwjJ5aAHX/mK rrC5/gmxr4e0cLpVZs8SoaM8nJYGKLFA2g6Coz1YGF1KarVTgp2oXqKJ06i1MFOOBc 3a0YZzE3XhVQzbO3sVnevLlsBb+7R8kEWsdH63PBWaXrj/CoJeD8gIpRlN5nU5cyWf pigzeZ05A8YLOlOPkkM+3hrQVBmN2J2pZlvi/hkIooeGO/bh0rFso3ejB4wc9LSJIV 1tUHOeEAeH2aDq4gDbHVHzCuD5rC8VDQmw+IaPKfhQkHm8i8jbHPjnrLB8uON7hBQ8 zgA/nrom9Hiew== Received: from phl-compute-04.internal (phl-compute-04.internal [10.202.2.44]) by mailfauth.phl.internal (Postfix) with ESMTP id 79F80F4006B; Thu, 15 Jan 2026 09:46:23 -0500 (EST) Received: from phl-frontend-03 ([10.202.2.162]) by phl-compute-04.internal (MEProxy); Thu, 15 Jan 2026 09:46:23 -0500 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgdduvdeifeegucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu 
rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhephffvvefufffkofgjfhgggfestdekredtredttdenucfhrhhomhepmfhirhihlhcu ufhhuhhtshgvmhgruhcuoehkrghssehkvghrnhgvlhdrohhrgheqnecuggftrfgrthhtvg hrnhephfdufeejhefhkedtuedvfeevjeffvdfhvedtudfgudffjeefieekleehvdetvdev necuvehluhhsthgvrhfuihiivgepvdenucfrrghrrghmpehmrghilhhfrhhomhepkhhirh hilhhlodhmvghsmhhtphgruhhthhhpvghrshhonhgrlhhithihqdduieduudeivdeiheeh qddvkeeggeegjedvkedqkhgrsheppehkvghrnhgvlhdrohhrghesshhhuhhtvghmohhvrd hnrghmvgdpnhgspghrtghpthhtohepvddtpdhmohguvgepshhmthhpohhuthdprhgtphht thhopegrkhhpmheslhhinhhugidqfhhouhhnuggrthhiohhnrdhorhhgpdhrtghpthhtoh epmhhutghhuhhnrdhsohhngheslhhinhhugidruggvvhdprhgtphhtthhopegurghvihgu sehkvghrnhgvlhdrohhrghdprhgtphhtthhopeifihhllhihsehinhhfrhgruggvrggurd horhhgpdhrtghpthhtohepuhhsrghmrggrrhhifheigedvsehgmhgrihhlrdgtohhmpdhr tghpthhtohepfhhvughlsehgohhoghhlvgdrtghomhdprhgtphhtthhopehoshgrlhhvrg guohhrsehsuhhsvgdruggvpdhrtghpthhtoheprhhpphhtsehkvghrnhgvlhdrohhrghdp rhgtphhtthhopehvsggrsghkrgesshhushgvrdgtii X-ME-Proxy: Feedback-ID: i10464835:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 15 Jan 2026 09:46:22 -0500 (EST) From: Kiryl Shutsemau To: Andrew Morton , Muchun Song , David Hildenbrand , Matthew Wilcox , Usama Arif , Frank van der Linden Cc: Oscar Salvador , Mike Rapoport , Vlastimil Babka , Lorenzo Stoakes , Zi Yan , Baoquan He , Michal Hocko , Johannes Weiner , Jonathan Corbet , kernel-team@meta.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Kiryl Shutsemau Subject: [PATCHv3 09/15] mm/hugetlb: Refactor code around vmemmap_walk Date: Thu, 15 Jan 2026 14:45:55 +0000 Message-ID: <20260115144604.822702-10-kas@kernel.org> X-Mailer: git-send-email 2.51.2 In-Reply-To: <20260115144604.822702-1-kas@kernel.org> References: <20260115144604.822702-1-kas@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" To prepare for removing fake head pages, the vmemmap_walk code is being rew= orked. The reuse_page and reuse_addr variables are being eliminated. There will no longer be an expectation regarding the reuse address in relation to the operated range. Instead, the caller will provide head and tail vmemmap pages, along with the vmemmap_start address where the head page is located. Currently, vmemmap_head and vmemmap_tail are set to the same page, but this will change in the future. The only functional change is that __hugetlb_vmemmap_optimize_folio() will abandon optimization if memory allocation fails. Signed-off-by: Kiryl Shutsemau --- mm/hugetlb_vmemmap.c | 198 ++++++++++++++++++------------------------- 1 file changed, 83 insertions(+), 115 deletions(-) diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c index ba0fb1b6a5a8..2b19c2205091 100644 --- a/mm/hugetlb_vmemmap.c +++ b/mm/hugetlb_vmemmap.c @@ -24,8 +24,9 @@ * * @remap_pte: called for each lowest-level entry (PTE). * @nr_walked: the number of walked pte. - * @reuse_page: the page which is reused for the tail vmemmap pages. - * @reuse_addr: the virtual address of the @reuse_page page. 
+ * @vmemmap_start: the start of vmemmap range, where head page is located + * @vmemmap_head: the page to be installed as first in the vmemmap range + * @vmemmap_tail: the page to be installed as non-first in the vmemmap ran= ge * @vmemmap_pages: the list head of the vmemmap pages that can be freed * or is mapped from. * @flags: used to modify behavior in vmemmap page table walking @@ -34,11 +35,14 @@ struct vmemmap_remap_walk { void (*remap_pte)(pte_t *pte, unsigned long addr, struct vmemmap_remap_walk *walk); + unsigned long nr_walked; - struct page *reuse_page; - unsigned long reuse_addr; + unsigned long vmemmap_start; + struct page *vmemmap_head; + struct page *vmemmap_tail; struct list_head *vmemmap_pages; =20 + /* Skip the TLB flush when we split the PMD */ #define VMEMMAP_SPLIT_NO_TLB_FLUSH BIT(0) /* Skip the TLB flush when we remap the PTE */ @@ -140,14 +144,7 @@ static int vmemmap_pte_entry(pte_t *pte, unsigned long= addr, { struct vmemmap_remap_walk *vmemmap_walk =3D walk->private; =20 - /* - * The reuse_page is found 'first' in page table walking before - * starting remapping. - */ - if (!vmemmap_walk->reuse_page) - vmemmap_walk->reuse_page =3D pte_page(ptep_get(pte)); - else - vmemmap_walk->remap_pte(pte, addr, vmemmap_walk); + vmemmap_walk->remap_pte(pte, addr, vmemmap_walk); vmemmap_walk->nr_walked++; =20 return 0; @@ -207,18 +204,12 @@ static void free_vmemmap_page_list(struct list_head *= list) static void vmemmap_remap_pte(pte_t *pte, unsigned long addr, struct vmemmap_remap_walk *walk) { - /* - * Remap the tail pages as read-only to catch illegal write operation - * to the tail pages. - */ - pgprot_t pgprot =3D PAGE_KERNEL_RO; struct page *page =3D pte_page(ptep_get(pte)); pte_t entry; =20 /* Remapping the head page requires r/w */ - if (unlikely(addr =3D=3D walk->reuse_addr)) { - pgprot =3D PAGE_KERNEL; - list_del(&walk->reuse_page->lru); + if (unlikely(addr =3D=3D walk->vmemmap_start)) { + list_del(&walk->vmemmap_head->lru); =20 /* * Makes sure that preceding stores to the page contents from @@ -226,9 +217,16 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned lon= g addr, * write. */ smp_wmb(); + + entry =3D mk_pte(walk->vmemmap_head, PAGE_KERNEL); + } else { + /* + * Remap the tail pages as read-only to catch illegal write + * operation to the tail pages. + */ + entry =3D mk_pte(walk->vmemmap_tail, PAGE_KERNEL_RO); } =20 - entry =3D mk_pte(walk->reuse_page, pgprot); list_add(&page->lru, walk->vmemmap_pages); set_pte_at(&init_mm, addr, pte, entry); } @@ -255,16 +253,13 @@ static inline void reset_struct_pages(struct page *st= art) static void vmemmap_restore_pte(pte_t *pte, unsigned long addr, struct vmemmap_remap_walk *walk) { - pgprot_t pgprot =3D PAGE_KERNEL; struct page *page; void *to; =20 - BUG_ON(pte_page(ptep_get(pte)) !=3D walk->reuse_page); - page =3D list_first_entry(walk->vmemmap_pages, struct page, lru); list_del(&page->lru); to =3D page_to_virt(page); - copy_page(to, (void *)walk->reuse_addr); + copy_page(to, (void *)walk->vmemmap_start); reset_struct_pages(to); =20 /* @@ -272,7 +267,7 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned lo= ng addr, * before the set_pte_at() write. */ smp_wmb(); - set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot)); + set_pte_at(&init_mm, addr, pte, mk_pte(page, PAGE_KERNEL)); } =20 /** @@ -282,33 +277,29 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned = long addr, * to remap. * @end: end address of the vmemmap virtual address range that we wa= nt to * remap. - * @reuse: reuse address. 
- * * Return: %0 on success, negative error code otherwise. */ -static int vmemmap_remap_split(unsigned long start, unsigned long end, - unsigned long reuse) +static int vmemmap_remap_split(unsigned long start, unsigned long end) { struct vmemmap_remap_walk walk =3D { .remap_pte =3D NULL, + .vmemmap_start =3D start, .flags =3D VMEMMAP_SPLIT_NO_TLB_FLUSH, }; =20 - /* See the comment in the vmemmap_remap_free(). */ - BUG_ON(start - reuse !=3D PAGE_SIZE); - - return vmemmap_remap_range(reuse, end, &walk); + return vmemmap_remap_range(start, end, &walk); } =20 /** * vmemmap_remap_free - remap the vmemmap virtual address range [@start, @= end) - * to the page which @reuse is mapped to, then free vmemmap - * which the range are mapped to. + * to use @vmemmap_head/tail, then free vmemmap which + * the range are mapped to. * @start: start address of the vmemmap virtual address range that we want * to remap. * @end: end address of the vmemmap virtual address range that we want to * remap. - * @reuse: reuse address. + * @vmemmap_head: the page to be installed as first in the vmemmap range + * @vmemmap_tail: the page to be installed as non-first in the vmemmap ran= ge * @vmemmap_pages: list to deposit vmemmap pages to be freed. It is calle= rs * responsibility to free pages. * @flags: modifications to vmemmap_remap_walk flags @@ -316,69 +307,40 @@ static int vmemmap_remap_split(unsigned long start, u= nsigned long end, * Return: %0 on success, negative error code otherwise. */ static int vmemmap_remap_free(unsigned long start, unsigned long end, - unsigned long reuse, + struct page *vmemmap_head, + struct page *vmemmap_tail, struct list_head *vmemmap_pages, unsigned long flags) { int ret; struct vmemmap_remap_walk walk =3D { .remap_pte =3D vmemmap_remap_pte, - .reuse_addr =3D reuse, + .vmemmap_start =3D start, + .vmemmap_head =3D vmemmap_head, + .vmemmap_tail =3D vmemmap_tail, .vmemmap_pages =3D vmemmap_pages, .flags =3D flags, }; - int nid =3D page_to_nid((struct page *)reuse); - gfp_t gfp_mask =3D GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN; + + ret =3D vmemmap_remap_range(start, end, &walk); + if (!ret || !walk.nr_walked) + return ret; + + end =3D start + walk.nr_walked * PAGE_SIZE; =20 /* - * Allocate a new head vmemmap page to avoid breaking a contiguous - * block of struct page memory when freeing it back to page allocator - * in free_vmemmap_page_list(). This will allow the likely contiguous - * struct page backing memory to be kept contiguous and allowing for - * more allocations of hugepages. Fallback to the currently - * mapped head page in case should it fail to allocate. + * vmemmap_pages contains pages from the previous vmemmap_remap_range() + * call which failed. These are pages which were removed from + * the vmemmap. They will be restored in the following call. */ - walk.reuse_page =3D alloc_pages_node(nid, gfp_mask, 0); - if (walk.reuse_page) { - copy_page(page_to_virt(walk.reuse_page), - (void *)walk.reuse_addr); - list_add(&walk.reuse_page->lru, vmemmap_pages); - memmap_pages_add(1); - } + walk =3D (struct vmemmap_remap_walk) { + .remap_pte =3D vmemmap_restore_pte, + .vmemmap_start =3D start, + .vmemmap_pages =3D vmemmap_pages, + .flags =3D 0, + }; =20 - /* - * In order to make remapping routine most efficient for the huge pages, - * the routine of vmemmap page table walking has the following rules - * (see more details from the vmemmap_pte_range()): - * - * - The range [@start, @end) and the range [@reuse, @reuse + PAGE_SIZE) - * should be continuous. 
- * - The @reuse address is part of the range [@reuse, @end) that we are - * walking which is passed to vmemmap_remap_range(). - * - The @reuse address is the first in the complete range. - * - * So we need to make sure that @start and @reuse meet the above rules. - */ - BUG_ON(start - reuse !=3D PAGE_SIZE); - - ret =3D vmemmap_remap_range(reuse, end, &walk); - if (ret && walk.nr_walked) { - end =3D reuse + walk.nr_walked * PAGE_SIZE; - /* - * vmemmap_pages contains pages from the previous - * vmemmap_remap_range call which failed. These - * are pages which were removed from the vmemmap. - * They will be restored in the following call. - */ - walk =3D (struct vmemmap_remap_walk) { - .remap_pte =3D vmemmap_restore_pte, - .reuse_addr =3D reuse, - .vmemmap_pages =3D vmemmap_pages, - .flags =3D 0, - }; - - vmemmap_remap_range(reuse, end, &walk); - } + vmemmap_remap_range(start, end, &walk); =20 return ret; } @@ -415,29 +377,27 @@ static int alloc_vmemmap_page_list(unsigned long star= t, unsigned long end, * to remap. * @end: end address of the vmemmap virtual address range that we want to * remap. - * @reuse: reuse address. * @flags: modifications to vmemmap_remap_walk flags * * Return: %0 on success, negative error code otherwise. */ static int vmemmap_remap_alloc(unsigned long start, unsigned long end, - unsigned long reuse, unsigned long flags) + unsigned long flags) { LIST_HEAD(vmemmap_pages); struct vmemmap_remap_walk walk =3D { .remap_pte =3D vmemmap_restore_pte, - .reuse_addr =3D reuse, + .vmemmap_start =3D start, .vmemmap_pages =3D &vmemmap_pages, .flags =3D flags, }; =20 - /* See the comment in the vmemmap_remap_free(). */ - BUG_ON(start - reuse !=3D PAGE_SIZE); + start +=3D HUGETLB_VMEMMAP_RESERVE_SIZE; =20 if (alloc_vmemmap_page_list(start, end, &vmemmap_pages)) return -ENOMEM; =20 - return vmemmap_remap_range(reuse, end, &walk); + return vmemmap_remap_range(start, end, &walk); } =20 DEFINE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key); @@ -454,8 +414,7 @@ static int __hugetlb_vmemmap_restore_folio(const struct= hstate *h, struct folio *folio, unsigned long flags) { int ret; - unsigned long vmemmap_start =3D (unsigned long)&folio->page, vmemmap_end; - unsigned long vmemmap_reuse; + unsigned long vmemmap_start, vmemmap_end; =20 VM_WARN_ON_ONCE_FOLIO(!folio_test_hugetlb(folio), folio); VM_WARN_ON_ONCE_FOLIO(folio_ref_count(folio), folio); @@ -466,18 +425,16 @@ static int __hugetlb_vmemmap_restore_folio(const stru= ct hstate *h, if (flags & VMEMMAP_SYNCHRONIZE_RCU) synchronize_rcu(); =20 + vmemmap_start =3D (unsigned long)folio; vmemmap_end =3D vmemmap_start + hugetlb_vmemmap_size(h); - vmemmap_reuse =3D vmemmap_start; - vmemmap_start +=3D HUGETLB_VMEMMAP_RESERVE_SIZE; =20 /* * The pages which the vmemmap virtual address range [@vmemmap_start, - * @vmemmap_end) are mapped to are freed to the buddy allocator, and - * the range is mapped to the page which @vmemmap_reuse is mapped to. + * @vmemmap_end) are mapped to are freed to the buddy allocator. * When a HugeTLB page is freed to the buddy allocator, previously * discarded vmemmap pages must be allocated and remapping. 
*/ - ret =3D vmemmap_remap_alloc(vmemmap_start, vmemmap_end, vmemmap_reuse, fl= ags); + ret =3D vmemmap_remap_alloc(vmemmap_start, vmemmap_end, flags); if (!ret) { folio_clear_hugetlb_vmemmap_optimized(folio); static_branch_dec(&hugetlb_optimize_vmemmap_key); @@ -565,9 +522,9 @@ static int __hugetlb_vmemmap_optimize_folio(const struc= t hstate *h, struct list_head *vmemmap_pages, unsigned long flags) { - int ret =3D 0; - unsigned long vmemmap_start =3D (unsigned long)&folio->page, vmemmap_end; - unsigned long vmemmap_reuse; + unsigned long vmemmap_start, vmemmap_end; + struct page *vmemmap_head, *vmemmap_tail; + int nid, ret =3D 0; =20 VM_WARN_ON_ONCE_FOLIO(!folio_test_hugetlb(folio), folio); VM_WARN_ON_ONCE_FOLIO(folio_ref_count(folio), folio); @@ -592,18 +549,31 @@ static int __hugetlb_vmemmap_optimize_folio(const str= uct hstate *h, */ folio_set_hugetlb_vmemmap_optimized(folio); =20 + nid =3D folio_nid(folio); + vmemmap_head =3D alloc_pages_node(nid, GFP_KERNEL, 0); + + if (!vmemmap_head) { + ret =3D -ENOMEM; + goto out; + } + + copy_page(page_to_virt(vmemmap_head), folio); + list_add(&vmemmap_head->lru, vmemmap_pages); + memmap_pages_add(1); + + vmemmap_tail =3D vmemmap_head; + vmemmap_start =3D (unsigned long)folio; vmemmap_end =3D vmemmap_start + hugetlb_vmemmap_size(h); - vmemmap_reuse =3D vmemmap_start; - vmemmap_start +=3D HUGETLB_VMEMMAP_RESERVE_SIZE; =20 /* - * Remap the vmemmap virtual address range [@vmemmap_start, @vmemmap_end) - * to the page which @vmemmap_reuse is mapped to. Add pages previously - * mapping the range to vmemmap_pages list so that they can be freed by - * the caller. + * Remap the vmemmap virtual address range [@vmemmap_start, @vmemmap_end). + * Add pages previously mapping the range to vmemmap_pages list so that + * they can be freed by the caller. 
*/ - ret =3D vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse, + ret =3D vmemmap_remap_free(vmemmap_start, vmemmap_end, + vmemmap_head, vmemmap_tail, vmemmap_pages, flags); +out: if (ret) { static_branch_dec(&hugetlb_optimize_vmemmap_key); folio_clear_hugetlb_vmemmap_optimized(folio); @@ -632,21 +602,19 @@ void hugetlb_vmemmap_optimize_folio(const struct hsta= te *h, struct folio *folio) =20 static int hugetlb_vmemmap_split_folio(const struct hstate *h, struct foli= o *folio) { - unsigned long vmemmap_start =3D (unsigned long)&folio->page, vmemmap_end; - unsigned long vmemmap_reuse; + unsigned long vmemmap_start, vmemmap_end; =20 if (!vmemmap_should_optimize_folio(h, folio)) return 0; =20 + vmemmap_start =3D (unsigned long)folio; vmemmap_end =3D vmemmap_start + hugetlb_vmemmap_size(h); - vmemmap_reuse =3D vmemmap_start; - vmemmap_start +=3D HUGETLB_VMEMMAP_RESERVE_SIZE; =20 /* * Split PMDs on the vmemmap virtual address range [@vmemmap_start, * @vmemmap_end] */ - return vmemmap_remap_split(vmemmap_start, vmemmap_end, vmemmap_reuse); + return vmemmap_remap_split(vmemmap_start, vmemmap_end); } =20 static void __hugetlb_vmemmap_optimize_folios(struct hstate *h, --=20 2.51.2 From nobody Sat Feb 7 22:34:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7F1263A7F57 for ; Thu, 15 Jan 2026 14:46:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768488386; cv=none; b=BV0+toyWhurBMnt0iMKZrLtx+Iikj1q/Ie7oLPtXkHkgxBB6A+EYIdbEBbiBMMTLdY7veaPS5AiAJl8d4zMlXvRSr8IfsnH8SJePnWhnV5quPhdXRVasVjgQUtA3Rh1YCUQMzvVB3I3ulruvAhi5DMUC4m+sdfs3jWZU1yyj83o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768488386; c=relaxed/simple; bh=I0peSzQJvlIZz5uMjSKzPiwpkY7ufWY5RidKc3yKlbU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=cUUV6/8SjBbBi1zgsIDcS2e4xtIsCmAZgwS07Da5OYnsjlT3udZLmGO75m6PmiL8KDy+mRXLxm2uFlCf47G1p7z4VzlVw1mzbTIzVQxXBr5epGvmEi36X0hOUL56rEQBK5761W21D/3anE5KNAM8Gs6IjjEpkn+ANMBCG2Jk9FI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=R699iQhV; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="R699iQhV" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C3C9FC4AF09; Thu, 15 Jan 2026 14:46:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1768488386; bh=I0peSzQJvlIZz5uMjSKzPiwpkY7ufWY5RidKc3yKlbU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=R699iQhV4OOHiDgDYTjylFpQBC6eLW+aiX9cTvJJ4tgeLRRFggUCDCtDpoRKfsh/S 02U6AOEO2KAqxxfFd57tns0l2SHnNpWa+pJdCRYhD9f44xqT2iit4TbiowmIcB0H6e /V8JkqSpD4hVBTYr1zeOGzqb/+wmIJclkfDxNt8t3h+BWZctFpERXLcr+ktikmBS2V RFo5qY7fwFh64Xd08iQrcIFstqyelGubITEBBEZqh0DO4+YaTX5u1FQhEgvagvYz4B 2EtjG4u2+ORD+G5l/Z9Us+8RZxXz6hY+cvTGxaJ3rLUjPWsVxRKTWmY1tuCrChmyI+ 2yysEf2GfKojg== Received: from phl-compute-04.internal (phl-compute-04.internal [10.202.2.44]) by mailfauth.phl.internal (Postfix) with ESMTP id 0016EF40068; Thu, 15 Jan 2026 09:46:25 -0500 (EST) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-04.internal (MEProxy); 
Thu, 15 Jan 2026 09:46:25 -0500 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgdduvdeifeegucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhephffvvefufffkofgjfhgggfestdekredtredttdenucfhrhhomhepmfhirhihlhcu ufhhuhhtshgvmhgruhcuoehkrghssehkvghrnhgvlhdrohhrgheqnecuggftrfgrthhtvg hrnhephfdufeejhefhkedtuedvfeevjeffvdfhvedtudfgudffjeefieekleehvdetvdev necuvehluhhsthgvrhfuihiivgepvdenucfrrghrrghmpehmrghilhhfrhhomhepkhhirh hilhhlodhmvghsmhhtphgruhhthhhpvghrshhonhgrlhhithihqdduieduudeivdeiheeh qddvkeeggeegjedvkedqkhgrsheppehkvghrnhgvlhdrohhrghesshhhuhhtvghmohhvrd hnrghmvgdpnhgspghrtghpthhtohepvddtpdhmohguvgepshhmthhpohhuthdprhgtphht thhopegrkhhpmheslhhinhhugidqfhhouhhnuggrthhiohhnrdhorhhgpdhrtghpthhtoh epmhhutghhuhhnrdhsohhngheslhhinhhugidruggvvhdprhgtphhtthhopegurghvihgu sehkvghrnhgvlhdrohhrghdprhgtphhtthhopeifihhllhihsehinhhfrhgruggvrggurd horhhgpdhrtghpthhtohepuhhsrghmrggrrhhifheigedvsehgmhgrihhlrdgtohhmpdhr tghpthhtohepfhhvughlsehgohhoghhlvgdrtghomhdprhgtphhtthhopehoshgrlhhvrg guohhrsehsuhhsvgdruggvpdhrtghpthhtoheprhhpphhtsehkvghrnhgvlhdrohhrghdp rhgtphhtthhopehvsggrsghkrgesshhushgvrdgtii X-ME-Proxy: Feedback-ID: i10464835:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 15 Jan 2026 09:46:24 -0500 (EST) From: Kiryl Shutsemau To: Andrew Morton , Muchun Song , David Hildenbrand , Matthew Wilcox , Usama Arif , Frank van der Linden Cc: Oscar Salvador , Mike Rapoport , Vlastimil Babka , Lorenzo Stoakes , Zi Yan , Baoquan He , Michal Hocko , Johannes Weiner , Jonathan Corbet , kernel-team@meta.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Kiryl Shutsemau Subject: [PATCHv3 10/15] mm/hugetlb: Remove fake head pages Date: Thu, 15 Jan 2026 14:45:56 +0000 Message-ID: <20260115144604.822702-11-kas@kernel.org> X-Mailer: git-send-email 2.51.2 In-Reply-To: <20260115144604.822702-1-kas@kernel.org> References: <20260115144604.822702-1-kas@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" HugeTLB Vmemmap Optimization (HVO) reduces memory usage by freeing most vmemmap pages for huge pages and remapping the freed range to a single page containing the struct page metadata. With the new mask-based compound_info encoding (for power-of-2 struct page sizes), all tail pages of the same order are now identical regardless of which compound page they belong to. This means the tail pages can be truly shared without fake heads. Allocate a single page of initialized tail struct pages per NUMA node per order in the vmemmap_tails[] array in pglist_data. All huge pages of that order on the node share this tail page, mapped read-only into their vmemmap. The head page remains unique per huge page. This eliminates fake heads while maintaining the same memory savings, and simplifies compound_head() by removing fake head detection. 
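As a back-of-the-envelope illustration of the bookkeeping (a user-space sketch, not kernel code; the 4 KiB base page and 64-byte struct page sizes are assumptions), the per-folio accounting this scheme preserves looks like:

#include <stdio.h>

int main(void)
{
	const unsigned long page_size = 4096;		/* assumption: 4 KiB base pages */
	const unsigned long struct_page_size = 64;	/* assumption: 64-byte struct page */
	const unsigned long order = 9;			/* one 2 MiB hugetlb folio */
	unsigned long nr_pages = 1UL << order;
	unsigned long vmemmap_pages = nr_pages * struct_page_size / page_size;

	printf("vmemmap pages backing one folio: %lu\n", vmemmap_pages);
	printf("kept unique per folio: 1 (the head page)\n");
	printf("net vmemmap pages saved per folio: %lu\n", vmemmap_pages - 1);
	printf("shared tail pages: one per (node, order), amortized across all folios\n");
	return 0;
}

The per-folio savings are unchanged from the current HVO; what changes is that the read-only tail mapping now points at a per-node, per-order page instead of a per-folio one.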
Signed-off-by: Kiryl Shutsemau --- include/linux/mmzone.h | 16 ++++++++++++++- mm/hugetlb_vmemmap.c | 44 ++++++++++++++++++++++++++++++++++++++++-- mm/sparse-vmemmap.c | 44 ++++++++++++++++++++++++++++++++++-------- 3 files changed, 93 insertions(+), 11 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 322ed4c42cfc..2ee3eb610291 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -82,7 +82,11 @@ * currently expect (see CONFIG_HAVE_GIGANTIC_FOLIOS): with hugetlb, we ex= pect * no folios larger than 16 GiB on 64bit and 1 GiB on 32bit. */ -#define MAX_FOLIO_ORDER get_order(IS_ENABLED(CONFIG_64BIT) ? SZ_16G : SZ_= 1G) +#ifdef CONFIG_64BIT +#define MAX_FOLIO_ORDER (34 - PAGE_SHIFT) +#else +#define MAX_FOLIO_ORDER (30 - PAGE_SHIFT) +#endif #else /* * Without hugetlb, gigantic folios that are bigger than a single PUD are @@ -1408,6 +1412,13 @@ struct memory_failure_stats { }; #endif =20 +/* + * vmemmap optimization (like HVO) is only possible for page orders that f= ill + * two or more pages with struct pages. + */ +#define VMEMMAP_TAIL_MIN_ORDER (ilog2(2 * PAGE_SIZE / sizeof(struct page))) +#define NR_VMEMMAP_TAILS (MAX_FOLIO_ORDER - VMEMMAP_TAIL_MIN_ORDER + 1) + /* * On NUMA machines, each NUMA node would have a pg_data_t to describe * it's memory layout. On UMA machines there is a single pglist_data which @@ -1556,6 +1567,9 @@ typedef struct pglist_data { #ifdef CONFIG_MEMORY_FAILURE struct memory_failure_stats mf_stats; #endif +#ifdef CONFIG_SPARSEMEM_VMEMMAP + unsigned long vmemmap_tails[NR_VMEMMAP_TAILS]; +#endif } pg_data_t; =20 #define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages) diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c index 2b19c2205091..cbdca4684db1 100644 --- a/mm/hugetlb_vmemmap.c +++ b/mm/hugetlb_vmemmap.c @@ -18,6 +18,7 @@ #include #include #include "hugetlb_vmemmap.h" +#include "internal.h" =20 /** * struct vmemmap_remap_walk - walk vmemmap page table @@ -517,6 +518,41 @@ static bool vmemmap_should_optimize_folio(const struct= hstate *h, struct folio * return true; } =20 +static struct page *vmemmap_get_tail(unsigned int order, int node) +{ + unsigned long pfn; + unsigned int idx; + struct page *tail, *p; + + idx =3D order - VMEMMAP_TAIL_MIN_ORDER; + pfn =3D NODE_DATA(node)->vmemmap_tails[idx]; + if (pfn) + return pfn_to_page(pfn); + + tail =3D alloc_pages_node(node, GFP_KERNEL, 0); + if (!tail) + return NULL; + + p =3D page_to_virt(tail); + for (int i =3D 0; i < PAGE_SIZE / sizeof(struct page); i++) + prep_compound_tail(p + i, NULL, order); + + spin_lock(&hugetlb_lock); + if (!NODE_DATA(node)->vmemmap_tails[idx]) { + pfn =3D PHYS_PFN(virt_to_phys(p)); + NODE_DATA(node)->vmemmap_tails[idx] =3D pfn; + tail =3D NULL; + } else { + pfn =3D NODE_DATA(node)->vmemmap_tails[idx]; + } + spin_unlock(&hugetlb_lock); + + if (tail) + __free_page(tail); + + return pfn_to_page(pfn); +} + static int __hugetlb_vmemmap_optimize_folio(const struct hstate *h, struct folio *folio, struct list_head *vmemmap_pages, @@ -532,6 +568,12 @@ static int __hugetlb_vmemmap_optimize_folio(const stru= ct hstate *h, if (!vmemmap_should_optimize_folio(h, folio)) return ret; =20 + nid =3D folio_nid(folio); + + vmemmap_tail =3D vmemmap_get_tail(h->order, nid); + if (!vmemmap_tail) + return -ENOMEM; + static_branch_inc(&hugetlb_optimize_vmemmap_key); =20 if (flags & VMEMMAP_SYNCHRONIZE_RCU) @@ -549,7 +591,6 @@ static int __hugetlb_vmemmap_optimize_folio(const struc= t hstate *h, */ folio_set_hugetlb_vmemmap_optimized(folio); =20 - nid =3D 
folio_nid(folio); vmemmap_head =3D alloc_pages_node(nid, GFP_KERNEL, 0); =20 if (!vmemmap_head) { @@ -561,7 +602,6 @@ static int __hugetlb_vmemmap_optimize_folio(const struc= t hstate *h, list_add(&vmemmap_head->lru, vmemmap_pages); memmap_pages_add(1); =20 - vmemmap_tail =3D vmemmap_head; vmemmap_start =3D (unsigned long)folio; vmemmap_end =3D vmemmap_start + hugetlb_vmemmap_size(h); =20 diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index dbd8daccade2..94b4e90fa00f 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -378,16 +378,45 @@ void vmemmap_wrprotect_hvo(unsigned long addr, unsign= ed long end, } } =20 -/* - * Populate vmemmap pages HVO-style. The first page contains the head - * page and needed tail pages, the other ones are mirrors of the first - * page. - */ +static __meminit unsigned long vmemmap_get_tail(unsigned int order, int no= de) +{ + unsigned long pfn; + unsigned int idx; + struct page *p; + + BUG_ON(order < VMEMMAP_TAIL_MIN_ORDER); + BUG_ON(order > MAX_FOLIO_ORDER); + + idx =3D order - VMEMMAP_TAIL_MIN_ORDER; + pfn =3D NODE_DATA(node)->vmemmap_tails[idx]; + if (pfn) + return pfn; + + p =3D vmemmap_alloc_block_zero(PAGE_SIZE, node); + if (!p) + return 0; + + for (int i =3D 0; i < PAGE_SIZE / sizeof(struct page); i++) + prep_compound_tail(p + i, NULL, order); + + pfn =3D PHYS_PFN(virt_to_phys(p)); + NODE_DATA(node)->vmemmap_tails[idx] =3D pfn; + + return pfn; +} + int __meminit vmemmap_populate_hvo(unsigned long addr, unsigned long end, int node, unsigned long headsize) { + unsigned long maddr, len, tail_pfn; + unsigned int order; pte_t *pte; - unsigned long maddr; + + len =3D end - addr; + order =3D ilog2(len * sizeof(struct page) / PAGE_SIZE); + tail_pfn =3D vmemmap_get_tail(order, node); + if (!tail_pfn) + return -ENOMEM; =20 for (maddr =3D addr; maddr < addr + headsize; maddr +=3D PAGE_SIZE) { pte =3D vmemmap_populate_address(maddr, node, NULL, -1, 0); @@ -398,8 +427,7 @@ int __meminit vmemmap_populate_hvo(unsigned long addr, = unsigned long end, /* * Reuse the last page struct page mapped above for the rest. 
*/ - return vmemmap_populate_range(maddr, end, node, NULL, - pte_pfn(ptep_get(pte)), 0); + return vmemmap_populate_range(maddr, end, node, NULL, tail_pfn, 0); } =20 void __weak __meminit vmemmap_set_pmd(pmd_t *pmd, void *p, int node, --=20 2.51.2 From nobody Sat Feb 7 22:34:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3F9023A7F4E; Thu, 15 Jan 2026 14:46:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768488388; cv=none; b=j5a4Bv+Ah5ZFcT+jN8R06XXH0sNnnrLv5cHn4LnDKdV278/gzHhmtTk8L3YmpM6afWmb6sS3fj8zEyL6eJeqlEIGpuuFzoKljiMA5spgr8jLezXO/6cD06iD/H2qvzXtndnD0lS4XMZa83/qnfGYp4VYZBn3c4MSjSlYS7JxUnY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768488388; c=relaxed/simple; bh=YqZnPHtWAbhYe/dyoR0Wa8mgCurb8e3E4uC+0tBgKL8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=MTwRg+6HAqLtGtfJ+LC5TmSAA59EfGT9j5H1MuCracBpyGnckKOS6f42azLvQMT3Z4fqarqRN7qbHr1eIVyd9kd78UyjWcxGyZ/ehoQsQuulykfvB/ajol4/tl1gXjwT266w6YBIj/vNmQnBB9wdHy4mZeo02YjGOZeFcYVfKg4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=rWv4qe+s; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="rWv4qe+s" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 463FDC16AAE; Thu, 15 Jan 2026 14:46:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1768488387; bh=YqZnPHtWAbhYe/dyoR0Wa8mgCurb8e3E4uC+0tBgKL8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=rWv4qe+sJ+/hFsRhoutKxYJ8CJi+1CusM17y46kUQBOMKSUm/MhDuVzP0DGfLPF9H oZfOrBpfbuI8nzW1nVm6hJL0+mUtSghpqW30SE18xSuxgpvAZLxASGNT+42YLhLiLz cAO6rBFlP6v7XJs1p2Q9p3NABva9NV/JHSHpYfhS8HcSWO2wTuYxE4JGYSUxK+w0lJ sIbPo7vZis0AK28ZdJv7ZP1HiZqnegbrWVFQLH9eqiN0u+JM4oNZKojNPxTfGTN9FF 8JGgAvudtokffov/+XeYX08WHKIybpy7+PEL74ChIlqjjES6rlSg4MRUPm0BR6FSGg mDCHI+SdCpQUA== Received: from phl-compute-05.internal (phl-compute-05.internal [10.202.2.45]) by mailfauth.phl.internal (Postfix) with ESMTP id 73FE6F4006B; Thu, 15 Jan 2026 09:46:26 -0500 (EST) Received: from phl-frontend-03 ([10.202.2.162]) by phl-compute-05.internal (MEProxy); Thu, 15 Jan 2026 09:46:26 -0500 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgdduvdeifeefucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhephffvvefufffkofgjfhgggfestdekredtredttdenucfhrhhomhepmfhirhihlhcu ufhhuhhtshgvmhgruhcuoehkrghssehkvghrnhgvlhdrohhrgheqnecuggftrfgrthhtvg hrnhephfdufeejhefhkedtuedvfeevjeffvdfhvedtudfgudffjeefieekleehvdetvdev necuvehluhhsthgvrhfuihiivgepudenucfrrghrrghmpehmrghilhhfrhhomhepkhhirh hilhhlodhmvghsmhhtphgruhhthhhpvghrshhonhgrlhhithihqdduieduudeivdeiheeh qddvkeeggeegjedvkedqkhgrsheppehkvghrnhgvlhdrohhrghesshhhuhhtvghmohhvrd hnrghmvgdpnhgspghrtghpthhtohepvddtpdhmohguvgepshhmthhpohhuthdprhgtphht thhopegrkhhpmheslhhinhhugidqfhhouhhnuggrthhiohhnrdhorhhgpdhrtghpthhtoh epmhhutghhuhhnrdhsohhngheslhhinhhugidruggvvhdprhgtphhtthhopegurghvihgu 
sehkvghrnhgvlhdrohhrghdprhgtphhtthhopeifihhllhihsehinhhfrhgruggvrggurd horhhgpdhrtghpthhtohepuhhsrghmrggrrhhifheigedvsehgmhgrihhlrdgtohhmpdhr tghpthhtohepfhhvughlsehgohhoghhlvgdrtghomhdprhgtphhtthhopehoshgrlhhvrg guohhrsehsuhhsvgdruggvpdhrtghpthhtoheprhhpphhtsehkvghrnhgvlhdrohhrghdp rhgtphhtthhopehvsggrsghkrgesshhushgvrdgtii X-ME-Proxy: Feedback-ID: i10464835:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 15 Jan 2026 09:46:26 -0500 (EST) From: Kiryl Shutsemau To: Andrew Morton , Muchun Song , David Hildenbrand , Matthew Wilcox , Usama Arif , Frank van der Linden Cc: Oscar Salvador , Mike Rapoport , Vlastimil Babka , Lorenzo Stoakes , Zi Yan , Baoquan He , Michal Hocko , Johannes Weiner , Jonathan Corbet , kernel-team@meta.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Kiryl Shutsemau Subject: [PATCHv3 11/15] mm: Drop fake head checks Date: Thu, 15 Jan 2026 14:45:57 +0000 Message-ID: <20260115144604.822702-12-kas@kernel.org> X-Mailer: git-send-email 2.51.2 In-Reply-To: <20260115144604.822702-1-kas@kernel.org> References: <20260115144604.822702-1-kas@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" With fake head pages eliminated in the previous commit, remove the supporting infrastructure: - page_fixed_fake_head(): no longer needed to detect fake heads; - page_is_fake_head(): no longer needed; - page_count_writable(): no longer needed for RCU protection; - RCU read_lock in page_ref_add_unless(): no longer needed; This substantially simplifies compound_head() and page_ref_add_unless(), removing both branches and RCU overhead from these hot paths. Signed-off-by: Kiryl Shutsemau Reviewed-by: Muchun Song --- include/linux/page-flags.h | 93 ++------------------------------------ include/linux/page_ref.h | 8 +--- 2 files changed, 4 insertions(+), 97 deletions(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index e16a4bc82856..660f9154a211 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -221,102 +221,15 @@ static __always_inline bool compound_info_has_mask(v= oid) return is_power_of_2(sizeof(struct page)); } =20 -#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP DECLARE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key); =20 -/* - * Return the real head page struct iff the @page is a fake head page, oth= erwise - * return the @page itself. See Documentation/mm/vmemmap_dedup.rst. - */ -static __always_inline const struct page *page_fixed_fake_head(const struc= t page *page) -{ - if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key)) - return page; - - /* Fake heads only exists if compound_info_has_mask() is true */ - if (!compound_info_has_mask()) - return page; - - /* - * Only addresses aligned with PAGE_SIZE of struct page may be fake head - * struct page. The alignment check aims to avoid access the fields ( - * e.g. compound_info) of the @page[1]. It can avoid touch a (possibly) - * cold cacheline in some cases. - */ - if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) && - test_bit(PG_head, &page->flags.f)) { - /* - * We can safely access the field of the @page[1] with PG_head - * because the @page is a compound page composed with at least - * two contiguous pages. 
- */ - unsigned long info =3D READ_ONCE(page[1].compound_info); - - /* See set_compound_head() */ - if (likely(info & 1)) { - unsigned long p =3D (unsigned long)page; - - return (const struct page *)(p & info); - } - } - return page; -} - -static __always_inline bool page_count_writable(const struct page *page, i= nt u) -{ - if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key)) - return true; - - /* - * The refcount check is ordered before the fake-head check to prevent - * the following race: - * CPU 1 (HVO) CPU 2 (speculative PFN walker) - * - * page_ref_freeze() - * synchronize_rcu() - * rcu_read_lock() - * page_is_fake_head() is false - * vmemmap_remap_pte() - * XXX: struct page[] becomes r/o - * - * page_ref_unfreeze() - * page_ref_count() is not zero - * - * atomic_add_unless(&page->_refcount) - * XXX: try to modify r/o struct page[] - * - * The refcount check also prevents modification attempts to other (r/o) - * tail pages that are not fake heads. - */ - if (atomic_read_acquire(&page->_refcount) =3D=3D u) - return false; - - return page_fixed_fake_head(page) =3D=3D page; -} -#else -static inline const struct page *page_fixed_fake_head(const struct page *p= age) -{ - return page; -} - -static inline bool page_count_writable(const struct page *page, int u) -{ - return true; -} -#endif - -static __always_inline int page_is_fake_head(const struct page *page) -{ - return page_fixed_fake_head(page) !=3D page; -} - static __always_inline unsigned long _compound_head(const struct page *pag= e) { unsigned long info =3D READ_ONCE(page->compound_info); =20 /* Bit 0 encodes PageTail() */ if (!(info & 1)) - return (unsigned long)page_fixed_fake_head(page); + return (unsigned long)page; =20 /* * If compound_info_has_mask() is false, the rest of compound_info is @@ -397,7 +310,7 @@ static __always_inline void clear_compound_head(struct = page *page) =20 static __always_inline int PageTail(const struct page *page) { - return READ_ONCE(page->compound_info) & 1 || page_is_fake_head(page); + return READ_ONCE(page->compound_info) & 1; } =20 static __always_inline int PageCompound(const struct page *page) @@ -924,7 +837,7 @@ static __always_inline bool folio_test_head(const struc= t folio *folio) static __always_inline int PageHead(const struct page *page) { PF_POISONED_CHECK(page); - return test_bit(PG_head, &page->flags.f) && !page_is_fake_head(page); + return test_bit(PG_head, &page->flags.f); } =20 __SETPAGEFLAG(Head, head, PF_ANY) diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h index 544150d1d5fd..490d0ad6e56d 100644 --- a/include/linux/page_ref.h +++ b/include/linux/page_ref.h @@ -230,13 +230,7 @@ static inline int folio_ref_dec_return(struct folio *f= olio) =20 static inline bool page_ref_add_unless(struct page *page, int nr, int u) { - bool ret =3D false; - - rcu_read_lock(); - /* avoid writing to the vmemmap area being remapped */ - if (page_count_writable(page, u)) - ret =3D atomic_add_unless(&page->_refcount, nr, u); - rcu_read_unlock(); + bool ret =3D atomic_add_unless(&page->_refcount, nr, u); =20 if (page_ref_tracepoint_active(page_ref_mod_unless)) __page_ref_mod_unless(page, nr, ret); --=20 2.51.2 From nobody Sat Feb 7 22:34:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8B7163A89B6 for ; Thu, 15 Jan 2026 14:46:29 +0000 (UTC) Authentication-Results: 
smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768488389; cv=none; b=TrifWvnx5hvYQYkcpKH4T4G36oug6/TiLIClsC6sX9+PXJBTdNNi/2ju6+P0KG42WZKMF5m64jhG4NJXtmvWIvF4ljpRkfGW/j5qRx3GVcP1mbG6Q8wbxG1tLDgIdW5fFLTw8/qtwNErLctk6swqyDcTBEe+bG0tKWZhvBCs69w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768488389; c=relaxed/simple; bh=vihP4PIaZOeW2WsiFsW3RxkvVukSadfEbbD8jojLpjU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=GKjR50vzl3xf3vuLfJbVxUWKfArTbGlhIZi3X1NrTwqDkww6hTop9fDp+OASpH0Qv6G0lRYNcCQttqn/7jxkfD1IgOoXXe5EYrgRuF0lFRuXLoO9IOT5P48CRHYOUnwNqEzJTaNbn6o5saf5V/Pk5LvEDHq2lFc6Q8xNjo88Rso= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=f16iBKwJ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="f16iBKwJ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CD867C4AF09; Thu, 15 Jan 2026 14:46:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1768488389; bh=vihP4PIaZOeW2WsiFsW3RxkvVukSadfEbbD8jojLpjU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=f16iBKwJuF7CDueME/0dyCAazzZAskjd+uQsSGr/+cm+PQnlLJwRgrDNXZgXrQLKw 1TLcNPSU12uU8lrwcw9z4dJ9uE9tOTEmfzoiXBESqR+9JG4RmpXqtTQk/dZADgh/iq DNjQ/aZmWy2O/QzRF4m9o4PgBB8EjGYWplGS+FpHNLgjSBXSc0QfLoz7R9qsVK3GU2 9mvZKxKoe2pVo6ERpiwlcFFn1cv7Yl8jxTzVMH+1YWiRQ6Fq7tm8XoKD+JOGvZ5pdc xCcl91SE3XboTkVMtDBdpU6/OQXLc+ggupGbnQ5Ro6N9BMKqUp4A4P35o27UrOMnAa 2OTywiMiw5NSg== Received: from phl-compute-03.internal (phl-compute-03.internal [10.202.2.43]) by mailfauth.phl.internal (Postfix) with ESMTP id 04076F40068; Thu, 15 Jan 2026 09:46:28 -0500 (EST) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-03.internal (MEProxy); Thu, 15 Jan 2026 09:46:28 -0500 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgdduvdeifeefucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhephffvvefufffkofgjfhgggfestdekredtredttdenucfhrhhomhepmfhirhihlhcu ufhhuhhtshgvmhgruhcuoehkrghssehkvghrnhgvlhdrohhrgheqnecuggftrfgrthhtvg hrnhephfdufeejhefhkedtuedvfeevjeffvdfhvedtudfgudffjeefieekleehvdetvdev necuvehluhhsthgvrhfuihiivgepudenucfrrghrrghmpehmrghilhhfrhhomhepkhhirh hilhhlodhmvghsmhhtphgruhhthhhpvghrshhonhgrlhhithihqdduieduudeivdeiheeh qddvkeeggeegjedvkedqkhgrsheppehkvghrnhgvlhdrohhrghesshhhuhhtvghmohhvrd hnrghmvgdpnhgspghrtghpthhtohepvddtpdhmohguvgepshhmthhpohhuthdprhgtphht thhopegrkhhpmheslhhinhhugidqfhhouhhnuggrthhiohhnrdhorhhgpdhrtghpthhtoh epmhhutghhuhhnrdhsohhngheslhhinhhugidruggvvhdprhgtphhtthhopegurghvihgu sehkvghrnhgvlhdrohhrghdprhgtphhtthhopeifihhllhihsehinhhfrhgruggvrggurd horhhgpdhrtghpthhtohepuhhsrghmrggrrhhifheigedvsehgmhgrihhlrdgtohhmpdhr tghpthhtohepfhhvughlsehgohhoghhlvgdrtghomhdprhgtphhtthhopehoshgrlhhvrg guohhrsehsuhhsvgdruggvpdhrtghpthhtoheprhhpphhtsehkvghrnhgvlhdrohhrghdp rhgtphhtthhopehvsggrsghkrgesshhushgvrdgtii X-ME-Proxy: Feedback-ID: i10464835:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 15 Jan 2026 09:46:27 -0500 (EST) From: Kiryl Shutsemau To: Andrew Morton , Muchun Song , David Hildenbrand , Matthew Wilcox , Usama Arif , Frank van der Linden Cc: Oscar Salvador , Mike Rapoport , 
Vlastimil Babka , Lorenzo Stoakes , Zi Yan , Baoquan He , Michal Hocko , Johannes Weiner , Jonathan Corbet , kernel-team@meta.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Kiryl Shutsemau Subject: [PATCHv3 12/15] hugetlb: Remove VMEMMAP_SYNCHRONIZE_RCU Date: Thu, 15 Jan 2026 14:45:58 +0000 Message-ID: <20260115144604.822702-13-kas@kernel.org> X-Mailer: git-send-email 2.51.2 In-Reply-To: <20260115144604.822702-1-kas@kernel.org> References: <20260115144604.822702-1-kas@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The VMEMMAP_SYNCHRONIZE_RCU flag triggered synchronize_rcu() calls to prevent a race between HVO remapping and page_ref_add_unless(). The race could occur when a speculative PFN walker tried to modify the refcount on a struct page that was in the process of being remapped to a fake head. With fake heads eliminated, page_ref_add_unless() no longer needs RCU protection. Remove the flag and synchronize_rcu() calls. Signed-off-by: Kiryl Shutsemau Reviewed-by: Muchun Song --- mm/hugetlb_vmemmap.c | 20 ++++---------------- 1 file changed, 4 insertions(+), 16 deletions(-) diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c index cbdca4684db1..5a802e292bac 100644 --- a/mm/hugetlb_vmemmap.c +++ b/mm/hugetlb_vmemmap.c @@ -48,8 +48,6 @@ struct vmemmap_remap_walk { #define VMEMMAP_SPLIT_NO_TLB_FLUSH BIT(0) /* Skip the TLB flush when we remap the PTE */ #define VMEMMAP_REMAP_NO_TLB_FLUSH BIT(1) -/* synchronize_rcu() to avoid writes from page_ref_add_unless() */ -#define VMEMMAP_SYNCHRONIZE_RCU BIT(2) unsigned long flags; }; =20 @@ -423,9 +421,6 @@ static int __hugetlb_vmemmap_restore_folio(const struct= hstate *h, if (!folio_test_hugetlb_vmemmap_optimized(folio)) return 0; =20 - if (flags & VMEMMAP_SYNCHRONIZE_RCU) - synchronize_rcu(); - vmemmap_start =3D (unsigned long)folio; vmemmap_end =3D vmemmap_start + hugetlb_vmemmap_size(h); =20 @@ -456,7 +451,7 @@ static int __hugetlb_vmemmap_restore_folio(const struct= hstate *h, */ int hugetlb_vmemmap_restore_folio(const struct hstate *h, struct folio *fo= lio) { - return __hugetlb_vmemmap_restore_folio(h, folio, VMEMMAP_SYNCHRONIZE_RCU); + return __hugetlb_vmemmap_restore_folio(h, folio, 0); } =20 /** @@ -479,14 +474,11 @@ long hugetlb_vmemmap_restore_folios(const struct hsta= te *h, struct folio *folio, *t_folio; long restored =3D 0; long ret =3D 0; - unsigned long flags =3D VMEMMAP_REMAP_NO_TLB_FLUSH | VMEMMAP_SYNCHRONIZE_= RCU; + unsigned long flags =3D VMEMMAP_REMAP_NO_TLB_FLUSH; =20 list_for_each_entry_safe(folio, t_folio, folio_list, lru) { if (folio_test_hugetlb_vmemmap_optimized(folio)) { ret =3D __hugetlb_vmemmap_restore_folio(h, folio, flags); - /* only need to synchronize_rcu() once for each batch */ - flags &=3D ~VMEMMAP_SYNCHRONIZE_RCU; - if (ret) break; restored++; @@ -576,8 +568,6 @@ static int __hugetlb_vmemmap_optimize_folio(const struc= t hstate *h, =20 static_branch_inc(&hugetlb_optimize_vmemmap_key); =20 - if (flags & VMEMMAP_SYNCHRONIZE_RCU) - synchronize_rcu(); /* * Very Subtle * If VMEMMAP_REMAP_NO_TLB_FLUSH is set, TLB flushing is not performed @@ -636,7 +626,7 @@ void hugetlb_vmemmap_optimize_folio(const struct hstate= *h, struct folio *folio) { LIST_HEAD(vmemmap_pages); =20 - __hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, VMEMMAP_SYNCHR= ONIZE_RCU); + __hugetlb_vmemmap_optimize_folio(h, folio, 
&vmemmap_pages, 0); free_vmemmap_page_list(&vmemmap_pages); } =20 @@ -664,7 +654,7 @@ static void __hugetlb_vmemmap_optimize_folios(struct hs= tate *h, struct folio *folio; int nr_to_optimize; LIST_HEAD(vmemmap_pages); - unsigned long flags =3D VMEMMAP_REMAP_NO_TLB_FLUSH | VMEMMAP_SYNCHRONIZE_= RCU; + unsigned long flags =3D VMEMMAP_REMAP_NO_TLB_FLUSH; =20 nr_to_optimize =3D 0; list_for_each_entry(folio, folio_list, lru) { @@ -717,8 +707,6 @@ static void __hugetlb_vmemmap_optimize_folios(struct hs= tate *h, int ret; =20 ret =3D __hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, flags= ); - /* only need to synchronize_rcu() once for each batch */ - flags &=3D ~VMEMMAP_SYNCHRONIZE_RCU; =20 /* * Pages to be freed may have been accumulated. If we --=20 2.51.2 From nobody Sat Feb 7 22:34:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C02CD3A89DF for ; Thu, 15 Jan 2026 14:46:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768488391; cv=none; b=matNnAuu0JzYlxw0vxARQQdVSjb61xhtSV1lByRU3KgkIdtNbG39PFSuqqpvdsQIoSAJ8OBReMy76nthlPQ6rQEkBDiuHisYGn0k3Tw7R6G/IWj/sIbEIYpwwZvLT6ZpylV67BR2rYAAwTbDimpjAA+0YJ6p8bJk/OCF9okrlcg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768488391; c=relaxed/simple; bh=56Ngy67RioCpKMVBbaFXYfTVQpop0DnoZP0CjrywvUo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Lhi9aPPNOpvoIJGUorPV3qBNHi6lLtmAShq8rxMQSRE+SDoAh+/J9Dgm9624QQu36B/YQzY0wBPVOdMmDZAM/Ok4Y7qMujBle4IxoRoeSZJSxkonwJvlz41O87AZABlxAvIbge+yS7AtFuSUNpSSeTKXvhj5TSTIknjYBzqpng0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=f+4yh2N1; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="f+4yh2N1" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 751A9C4AF0C; Thu, 15 Jan 2026 14:46:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1768488391; bh=56Ngy67RioCpKMVBbaFXYfTVQpop0DnoZP0CjrywvUo=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=f+4yh2N1Ui5vDxiqaa6nDoQS70tqyMV4l0hk6BZRLjBISP92a7NNssKlOaF1MJg3/ l2Qd/2rEHQZV9AU5pv8Yybj2GS4xWwDLQETriDFX60jH0d+rSx+xwy5VxWnpF6Z+Uj VeR/JeOytDJybekS7QRxMTa82aOcS+RBzeJ/HC1/+ddps1xK01AkJvR5OdAX+HPWuE s06wvq3szAUcN0W+u2MLn6QZf9OXaI0t29SSAhz9iGlsU5NnZNKdVbApylrI7/Rsnr TxpffDc3TI3pGphP6Dfw7C8Jkm0fuGUbStpTRiK7GrVj80T9XlImza8G/mqFeM4Bw3 kxOsGfxIpaf8w== Received: from phl-compute-02.internal (phl-compute-02.internal [10.202.2.42]) by mailfauth.phl.internal (Postfix) with ESMTP id A2945F40068; Thu, 15 Jan 2026 09:46:29 -0500 (EST) Received: from phl-frontend-03 ([10.202.2.162]) by phl-compute-02.internal (MEProxy); Thu, 15 Jan 2026 09:46:29 -0500 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgdduvdeifeefucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhephffvvefufffkofgjfhgggfestdekredtredttdenucfhrhhomhepmfhirhihlhcu 
ufhhuhhtshgvmhgruhcuoehkrghssehkvghrnhgvlhdrohhrgheqnecuggftrfgrthhtvg hrnhephfdufeejhefhkedtuedvfeevjeffvdfhvedtudfgudffjeefieekleehvdetvdev necuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepkhhirh hilhhlodhmvghsmhhtphgruhhthhhpvghrshhonhgrlhhithihqdduieduudeivdeiheeh qddvkeeggeegjedvkedqkhgrsheppehkvghrnhgvlhdrohhrghesshhhuhhtvghmohhvrd hnrghmvgdpnhgspghrtghpthhtohepvddtpdhmohguvgepshhmthhpohhuthdprhgtphht thhopegrkhhpmheslhhinhhugidqfhhouhhnuggrthhiohhnrdhorhhgpdhrtghpthhtoh epmhhutghhuhhnrdhsohhngheslhhinhhugidruggvvhdprhgtphhtthhopegurghvihgu sehkvghrnhgvlhdrohhrghdprhgtphhtthhopeifihhllhihsehinhhfrhgruggvrggurd horhhgpdhrtghpthhtohepuhhsrghmrggrrhhifheigedvsehgmhgrihhlrdgtohhmpdhr tghpthhtohepfhhvughlsehgohhoghhlvgdrtghomhdprhgtphhtthhopehoshgrlhhvrg guohhrsehsuhhsvgdruggvpdhrtghpthhtoheprhhpphhtsehkvghrnhgvlhdrohhrghdp rhgtphhtthhopehvsggrsghkrgesshhushgvrdgtii X-ME-Proxy: Feedback-ID: i10464835:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 15 Jan 2026 09:46:29 -0500 (EST) From: Kiryl Shutsemau To: Andrew Morton , Muchun Song , David Hildenbrand , Matthew Wilcox , Usama Arif , Frank van der Linden Cc: Oscar Salvador , Mike Rapoport , Vlastimil Babka , Lorenzo Stoakes , Zi Yan , Baoquan He , Michal Hocko , Johannes Weiner , Jonathan Corbet , kernel-team@meta.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Kiryl Shutsemau Subject: [PATCHv3 13/15] mm/hugetlb: Remove hugetlb_optimize_vmemmap_key static key Date: Thu, 15 Jan 2026 14:45:59 +0000 Message-ID: <20260115144604.822702-14-kas@kernel.org> X-Mailer: git-send-email 2.51.2 In-Reply-To: <20260115144604.822702-1-kas@kernel.org> References: <20260115144604.822702-1-kas@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The hugetlb_optimize_vmemmap_key static key was used to guard fake head detection in compound_head() and related functions. It allowed skipping the fake head checks entirely when HVO was not in use. With fake heads eliminated and the detection code removed, the static key serves no purpose. Remove its definition and all increment/decrement calls. 
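For reference, the kind of fast-path guard the static key made possible looks
roughly like the sketch below. This is an illustration only: the helper name
and body are simplified stand-ins, not the exact code this patch removes; only
DECLARE_STATIC_KEY_FALSE()/static_branch_unlikely() are the real jump-label API.

    #include <linux/jump_label.h>
    #include <linux/mm_types.h>

    DECLARE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);

    static __always_inline bool page_is_fake_head(const struct page *page)
    {
            /* Fast path: the key stays false until some folio is HVO-optimized,
             * so the check below is patched out entirely in that case. */
            if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key))
                    return false;

            /* Slow path (gone with this series): detect a fake head here. */
            return false;
    }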
Signed-off-by: Kiryl Shutsemau Reviewed-by: Muchun Song --- include/linux/page-flags.h | 2 -- mm/hugetlb_vmemmap.c | 14 ++------------ 2 files changed, 2 insertions(+), 14 deletions(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 660f9154a211..f89702e101e8 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -221,8 +221,6 @@ static __always_inline bool compound_info_has_mask(void) return is_power_of_2(sizeof(struct page)); } =20 -DECLARE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key); - static __always_inline unsigned long _compound_head(const struct page *pag= e) { unsigned long info =3D READ_ONCE(page->compound_info); diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c index 5a802e292bac..5afd9f4231e3 100644 --- a/mm/hugetlb_vmemmap.c +++ b/mm/hugetlb_vmemmap.c @@ -399,9 +399,6 @@ static int vmemmap_remap_alloc(unsigned long start, uns= igned long end, return vmemmap_remap_range(start, end, &walk); } =20 -DEFINE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key); -EXPORT_SYMBOL(hugetlb_optimize_vmemmap_key); - static bool vmemmap_optimize_enabled =3D IS_ENABLED(CONFIG_HUGETLB_PAGE_OP= TIMIZE_VMEMMAP_DEFAULT_ON); static int __init hugetlb_vmemmap_optimize_param(char *buf) { @@ -431,10 +428,8 @@ static int __hugetlb_vmemmap_restore_folio(const struc= t hstate *h, * discarded vmemmap pages must be allocated and remapping. */ ret =3D vmemmap_remap_alloc(vmemmap_start, vmemmap_end, flags); - if (!ret) { + if (!ret) folio_clear_hugetlb_vmemmap_optimized(folio); - static_branch_dec(&hugetlb_optimize_vmemmap_key); - } =20 return ret; } @@ -566,8 +561,6 @@ static int __hugetlb_vmemmap_optimize_folio(const struc= t hstate *h, if (!vmemmap_tail) return -ENOMEM; =20 - static_branch_inc(&hugetlb_optimize_vmemmap_key); - /* * Very Subtle * If VMEMMAP_REMAP_NO_TLB_FLUSH is set, TLB flushing is not performed @@ -604,10 +597,8 @@ static int __hugetlb_vmemmap_optimize_folio(const stru= ct hstate *h, vmemmap_head, vmemmap_tail, vmemmap_pages, flags); out: - if (ret) { - static_branch_dec(&hugetlb_optimize_vmemmap_key); + if (ret) folio_clear_hugetlb_vmemmap_optimized(folio); - } =20 return ret; } @@ -673,7 +664,6 @@ static void __hugetlb_vmemmap_optimize_folios(struct hs= tate *h, register_page_bootmem_memmap(pfn_to_section_nr(spfn), &folio->page, HUGETLB_VMEMMAP_RESERVE_SIZE); - static_branch_inc(&hugetlb_optimize_vmemmap_key); continue; } =20 --=20 2.51.2 From nobody Sat Feb 7 22:34:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7578C3A7F57; Thu, 15 Jan 2026 14:46:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768488393; cv=none; b=qQL9CQC3BGqA1dw09oP2RM6VzG9cczQgyMTFTATtAQg+Tn0uhIkWPmxriEm/PcIlCASlbnaQDOok8SvVsFbN5i3Rff8KeUy877ObrZr+TPxdcgYKJLPVZ5Bt2XI4lC5IjwRud1Lk9IpiTPRNf/waTsOHPLzMsAV7MU/Djx+68no= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768488393; c=relaxed/simple; bh=G+MNU7DgEj1sJhXfeeHNFOyPFz1DCwEooo5Fb5WVje8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Eoy9dNfT8MxHe2LeI5i1YJPjh5TSarKZzy82wQzK02LP9//9hD7EjjwPx4nAkPM9ogjkfMuap2tZe6l8zmC5eCot5AwFnAPnfCuL2jQcsKVeCF+zmQu+uih7WsQa+BUy/bVtbsuGn2Ql7BEDr8Y78yEHBWOqAuA5g0eYCmWqENY= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=TH5fO+yM; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="TH5fO+yM" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1A9F5C2BC9E; Thu, 15 Jan 2026 14:46:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1768488392; bh=G+MNU7DgEj1sJhXfeeHNFOyPFz1DCwEooo5Fb5WVje8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=TH5fO+yMgEA1svpjFaAUNdTpySr4fbVFPv4QdN9JW4Hma8kixvp9X0lmWEzRNEq4g fDJ62es05oxfLpQhp768VtI2TC3YmYwTGu7VAkQNIZX7/A43BOEztoOo8ujeq1mQjW o/2CtwIjdhWd+D7/nX2tqdDdRFiIE5wBUEIzlglOaYvdIJE2owBH6fh/hUDW0Fh1Bp iNt0bJccXiuc8LgvfMKrlwdYqo3wz9qmyJZWblR+i/iJxooCdQIrsg8L3w6lFQpFzv ngpzqt//A+Cuir5WGkyXl+erJyfM9KoBIGx8En1EXeG5vhpbcONuouL2gO9OV4x61e 1psnlI+DBdcjw== Received: from phl-compute-03.internal (phl-compute-03.internal [10.202.2.43]) by mailfauth.phl.internal (Postfix) with ESMTP id 444DDF40068; Thu, 15 Jan 2026 09:46:31 -0500 (EST) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-03.internal (MEProxy); Thu, 15 Jan 2026 09:46:31 -0500 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgdduvdeifeefucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhephffvvefufffkofgjfhgggfestdekredtredttdenucfhrhhomhepmfhirhihlhcu ufhhuhhtshgvmhgruhcuoehkrghssehkvghrnhgvlhdrohhrgheqnecuggftrfgrthhtvg hrnhephfdufeejhefhkedtuedvfeevjeffvdfhvedtudfgudffjeefieekleehvdetvdev necuvehluhhsthgvrhfuihiivgepudenucfrrghrrghmpehmrghilhhfrhhomhepkhhirh hilhhlodhmvghsmhhtphgruhhthhhpvghrshhonhgrlhhithihqdduieduudeivdeiheeh qddvkeeggeegjedvkedqkhgrsheppehkvghrnhgvlhdrohhrghesshhhuhhtvghmohhvrd hnrghmvgdpnhgspghrtghpthhtohepvddtpdhmohguvgepshhmthhpohhuthdprhgtphht thhopegrkhhpmheslhhinhhugidqfhhouhhnuggrthhiohhnrdhorhhgpdhrtghpthhtoh epmhhutghhuhhnrdhsohhngheslhhinhhugidruggvvhdprhgtphhtthhopegurghvihgu sehkvghrnhgvlhdrohhrghdprhgtphhtthhopeifihhllhihsehinhhfrhgruggvrggurd horhhgpdhrtghpthhtohepuhhsrghmrggrrhhifheigedvsehgmhgrihhlrdgtohhmpdhr tghpthhtohepfhhvughlsehgohhoghhlvgdrtghomhdprhgtphhtthhopehoshgrlhhvrg guohhrsehsuhhsvgdruggvpdhrtghpthhtoheprhhpphhtsehkvghrnhgvlhdrohhrghdp rhgtphhtthhopehvsggrsghkrgesshhushgvrdgtii X-ME-Proxy: Feedback-ID: i10464835:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 15 Jan 2026 09:46:30 -0500 (EST) From: Kiryl Shutsemau To: Andrew Morton , Muchun Song , David Hildenbrand , Matthew Wilcox , Usama Arif , Frank van der Linden Cc: Oscar Salvador , Mike Rapoport , Vlastimil Babka , Lorenzo Stoakes , Zi Yan , Baoquan He , Michal Hocko , Johannes Weiner , Jonathan Corbet , kernel-team@meta.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Kiryl Shutsemau Subject: [PATCHv3 14/15] mm: Remove the branch from compound_head() Date: Thu, 15 Jan 2026 14:46:00 +0000 Message-ID: <20260115144604.822702-15-kas@kernel.org> X-Mailer: git-send-email 2.51.2 In-Reply-To: <20260115144604.822702-1-kas@kernel.org> References: <20260115144604.822702-1-kas@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The 
compound_head() function is a hot path. For example, the zap path calls it for every leaf page table entry. Rewrite the helper function in a branchless manner to eliminate the risk of CPU branch misprediction. Signed-off-by: Kiryl Shutsemau Reviewed-by: Muchun Song --- include/linux/page-flags.h | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index f89702e101e8..95ac963b78af 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -224,25 +224,32 @@ static __always_inline bool compound_info_has_mask(vo= id) static __always_inline unsigned long _compound_head(const struct page *pag= e) { unsigned long info =3D READ_ONCE(page->compound_info); + unsigned long mask; + + if (!compound_info_has_mask()) { + /* Bit 0 encodes PageTail() */ + if (info & 1) + return info - 1; =20 - /* Bit 0 encodes PageTail() */ - if (!(info & 1)) return (unsigned long)page; - - /* - * If compound_info_has_mask() is false, the rest of compound_info is - * the pointer to the head page. - */ - if (!compound_info_has_mask()) - return info - 1; + } =20 /* * If compoun_info_has_mask() is true the rest of the info encodes * the mask that converts the address of the tail page to the head page. * * No need to clear bit 0 in the mask as 'page' always has it clear. + * + * Let's do it in a branchless manner. */ - return (unsigned long)page & info; + + /* Non-tail: -1UL, Tail: 0 */ + mask =3D (info & 1) - 1; + + /* Non-tail: -1UL, Tail: info */ + mask |=3D info; + + return (unsigned long)page & mask; } =20 #define compound_head(page) ((typeof(page))_compound_head(page)) --=20 2.51.2 From nobody Sat Feb 7 22:34:20 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E9A433A4F43 for ; Thu, 15 Jan 2026 14:46:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768488395; cv=none; b=Q+X10gBoeHTiWjohwoQffF3Xj1d0I7tXIxHPvbR4i4jxxSt4cJ3PzdCGhoJCJqzoz53Csl06PiUC6lAXVLGxD1HRMBH7DcX7GeEVMnOhx/u/kphoGuJV4IAeM0tVewC2fDrGd87Wutcf4YvWo5B/zOM+6bbXcu/E3p3hyPVNkCM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768488395; c=relaxed/simple; bh=HdI2DN25qqaNRHiMcOMwkD0s20/vCqB8MRYI0Frqk18=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=o3hMi6t/tm57dE0ee1ILpYupUs+8/43oziFTy9TmVEaqIvE4C4j+u9qg/QBGlbZejJcmEKNg4KtuS/7r5g92WtSMgq5LQUY04glwtJxPW2H9nrdvgTvG54UbQ6FHEEoBqt/ij81dQ4BkJkts8WLMr/hG8tgYibgrpfd6MFVD3Rw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=lAelqERT; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="lAelqERT" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A6487C19423; Thu, 15 Jan 2026 14:46:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1768488394; bh=HdI2DN25qqaNRHiMcOMwkD0s20/vCqB8MRYI0Frqk18=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=lAelqERTWP5khOctgvKCsEaUvSTJhRUnR/A2GJLPL1Zj/Y+Gx8hJdJBONvZE+rbUt 
FhW1M0qP4pkgOZv2Bh2emL8Vd/L2arg+xMAEsbgK/MPdLywF0lNRz6AeX3HRpnG9Kd 5CFurJIzXh0TUQ2DEjh6T4fdSkS2Y4DbJFifncO2c6e2E5yjh0kZdjZ0lGAVtdZLbX RftmxQLWnsXK4K737qkUdieMPWXY1te+C/Qe5w6815IfjDcQIOpSMDCym+o7hLm6AS sn71EWWDI+t6eEguPJh0FXb2CSd/k+2TI9pu6+2nrjbvXQc9yEfqs7lmTiZQJWlF4s 4h1/CbLntkQhQ== Received: from phl-compute-08.internal (phl-compute-08.internal [10.202.2.48]) by mailfauth.phl.internal (Postfix) with ESMTP id D1ACBF40068; Thu, 15 Jan 2026 09:46:32 -0500 (EST) Received: from phl-frontend-03 ([10.202.2.162]) by phl-compute-08.internal (MEProxy); Thu, 15 Jan 2026 09:46:32 -0500 X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgdduvdeifeegucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhephffvvefufffkofgjfhggtgfgsehtkeertdertdejnecuhfhrohhmpefmihhrhihl ucfuhhhuthhsvghmrghuuceokhgrsheskhgvrhhnvghlrdhorhhgqeenucggtffrrghtth gvrhhnpeffkefffedugfeiudejheefleehteevtefgvefhveetheehkefhjeefhefgleej veenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehkih hrihhllhdomhgvshhmthhprghuthhhphgvrhhsohhnrghlihhthidqudeiudduiedvieeh hedqvdekgeeggeejvdekqdhkrghspeepkhgvrhhnvghlrdhorhhgsehshhhuthgvmhhovh drnhgrmhgvpdhnsggprhgtphhtthhopedvtddpmhhouggvpehsmhhtphhouhhtpdhrtghp thhtoheprghkphhmsehlihhnuhigqdhfohhunhgurghtihhonhdrohhrghdprhgtphhtth hopehmuhgthhhunhdrshhonhhgsehlihhnuhigrdguvghvpdhrtghpthhtohepuggrvhhi ugeskhgvrhhnvghlrdhorhhgpdhrtghpthhtohepfihilhhlhiesihhnfhhrrgguvggrug drohhrghdprhgtphhtthhopehushgrmhgrrghrihhfieegvdesghhmrghilhdrtghomhdp rhgtphhtthhopehfvhgulhesghhoohhglhgvrdgtohhmpdhrtghpthhtohepohhsrghlvh grughorhesshhushgvrdguvgdprhgtphhtthhopehrphhptheskhgvrhhnvghlrdhorhhg pdhrtghpthhtohepvhgsrggskhgrsehsuhhsvgdrtgii X-ME-Proxy: Feedback-ID: i10464835:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 15 Jan 2026 09:46:32 -0500 (EST) From: Kiryl Shutsemau To: Andrew Morton , Muchun Song , David Hildenbrand , Matthew Wilcox , Usama Arif , Frank van der Linden Cc: Oscar Salvador , Mike Rapoport , Vlastimil Babka , Lorenzo Stoakes , Zi Yan , Baoquan He , Michal Hocko , Johannes Weiner , Jonathan Corbet , kernel-team@meta.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Kiryl Shutsemau Subject: [PATCHv3 15/15] hugetlb: Update vmemmap_dedup.rst Date: Thu, 15 Jan 2026 14:46:01 +0000 Message-ID: <20260115144604.822702-16-kas@kernel.org> X-Mailer: git-send-email 2.51.2 In-Reply-To: <20260115144604.822702-1-kas@kernel.org> References: <20260115144604.822702-1-kas@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Update the documentation regarding vmemmap optimization for hugetlb to reflect the changes in how the kernel maps the tail pages. Fake heads no longer exist. Remove their description. Signed-off-by: Kiryl Shutsemau --- Documentation/mm/vmemmap_dedup.rst | 60 +++++++++++++----------------- 1 file changed, 26 insertions(+), 34 deletions(-) diff --git a/Documentation/mm/vmemmap_dedup.rst b/Documentation/mm/vmemmap_= dedup.rst index 1863d88d2dcb..fca9d0ce282a 100644 --- a/Documentation/mm/vmemmap_dedup.rst +++ b/Documentation/mm/vmemmap_dedup.rst @@ -124,33 +124,35 @@ Here is how things look before optimization:: | | +-----------+ =20 -The value of page->compound_info is the same for all tail pages. 
The first -page of ``struct page`` (page 0) associated with the HugeTLB page contains= the 4 -``struct page`` necessary to describe the HugeTLB. The only use of the rem= aining -pages of ``struct page`` (page 1 to page 7) is to point to page->compound_= info. -Therefore, we can remap pages 1 to 7 to page 0. Only 1 page of ``struct pa= ge`` -will be used for each HugeTLB page. This will allow us to free the remaini= ng -7 pages to the buddy allocator. +The first page of ``struct page`` (page 0) associated with the HugeTLB page +contains the 4 ``struct page`` necessary to describe the HugeTLB. The rema= ining +pages of ``struct page`` (page 1 to page 7) are tail pages. + +The optimization is only applied when the size of the struct page is a pow= er-of-2 +In this case, all tail pages of the same order are identical. See +compound_head(). This allows us to remap the tail pages of the vmemmap to a +shared, read-only page. The head page is also remapped to a new page. This +allows the original vmemmap pages to be freed. =20 Here is how things look after remapping:: =20 - HugeTLB struct pages(8 pages) page frame(8 pa= ges) - +-----------+ ---virt_to_page---> +-----------+ mapping to +---------= --+ - | | | 0 | -------------> | 0 = | - | | +-----------+ +---------= --+ - | | | 1 | ---------------^ ^ ^ ^ ^ = ^ ^ - | | +-----------+ | | | | = | | - | | | 2 | -----------------+ | | | = | | - | | +-----------+ | | | = | | - | | | 3 | -------------------+ | | = | | - | | +-----------+ | | = | | - | | | 4 | ---------------------+ | = | | - | PMD | +-----------+ | = | | - | level | | 5 | -----------------------+ = | | - | mapping | +-----------+ = | | - | | | 6 | -------------------------= + | - | | +-----------+ = | - | | | 7 | -------------------------= --+ + HugeTLB struct pages(8 pages) page fr= ame + +-----------+ ---virt_to_page---> +-----------+ mapping to +---------= -------+ + | | | 0 | -------------> | 0 = | + | | +-----------+ +---------= -------+ + | | | 1 | ------=E2=94=90 + | | +-----------+ | + | | | 2 | ------=E2=94=BC +-= ---------------------------+ + | | +-----------+ | | A single= , per-node page | + | | | 3 | ------=E2=94=BC------> | = frame shared among all | + | | +-----------+ | | hugepage= s of the same size | + | | | 4 | ------=E2=94=BC +-= ---------------------------+ + | | +-----------+ | + | | | 5 | ------=E2=94=BC + | PMD | +-----------+ | + | level | | 6 | ------=E2=94=BC + | mapping | +-----------+ | + | | | 7 | ------=E2=94=98 | | +-----------+ | | | | @@ -172,16 +174,6 @@ The contiguous bit is used to increase the mapping siz= e at the pmd and pte (last) level. So this type of HugeTLB page can be optimized only when its size of the ``struct page`` structs is greater than **1** page. =20 -Notice: The head vmemmap page is not freed to the buddy allocator and all -tail vmemmap pages are mapped to the head vmemmap page frame. So we can see -more than one ``struct page`` struct with ``PG_head`` (e.g. 8 per 2 MB Hug= eTLB -page) associated with each HugeTLB page. The ``compound_head()`` can handle -this correctly. There is only **one** head ``struct page``, the tail -``struct page`` with ``PG_head`` are fake head ``struct page``. We need an -approach to distinguish between those two different types of ``struct page= `` so -that ``compound_head()`` can return the real head ``struct page`` when the -parameter is the tail ``struct page`` but with ``PG_head``. - Device DAX =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 --=20 2.51.2
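As a closing illustration of the compound_head() rewrite earlier in this
series, here is a standalone userspace sketch of the branchless mask trick.
The addresses and the compound_info value are made up for demonstration and
only follow the shape described above (bit 0 marks a tail page, the remaining
bits act as a mask that converts a tail address into its head address); they
are not the kernel's actual encoding constants.

    #include <assert.h>

    int main(void)
    {
            unsigned long page = 0x1fc0;            /* stand-in tail page address, bit 0 clear */
            unsigned long info = ~0xfffUL | 1;      /* bit 0 set: tail; rest: mask to the head */
            unsigned long mask;

            mask = (info & 1) - 1;                  /* tail: 0, non-tail: ~0UL */
            mask |= info;                           /* tail: info, non-tail: ~0UL */
            assert((page & mask) == 0x1000);        /* tail resolves to its head */

            info = 0;                               /* head or order-0 page: bit 0 clear */
            mask = (info & 1) - 1;
            mask |= info;
            assert((page & mask) == page);          /* non-tail maps to itself */

            return 0;
    }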