From nobody Wed Dec 17 07:28:47 2025
From: "Kirill A. Shutemov"
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, "H. Peter Anvin"
Cc: Jonathan Corbet, Andy Lutomirski, Peter Zijlstra, Ard Biesheuvel, Jan Kiszka, Kieran Bingham, "Kirill A. Shutemov", Michael Roth, Rick Edgecombe, Brijesh Singh, Sandipan Das, Juergen Gross, Tom Lendacky, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-efi@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCHv2 1/3] x86/64/mm: Always use dynamic memory layout
Date: Fri, 16 May 2025 12:15:31 +0300
Message-ID: <20250516091534.3414310-2-kirill.shutemov@linux.intel.com>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250516091534.3414310-1-kirill.shutemov@linux.intel.com>
References: <20250516091534.3414310-1-kirill.shutemov@linux.intel.com>

Dynamic memory layout is used by KASLR and 5-level paging.

CONFIG_X86_5LEVEL is going to be removed, making 5-level paging support
unconditional, which requires unconditional support of dynamic memory
layout.

Remove CONFIG_DYNAMIC_MEMORY_LAYOUT.

Signed-off-by: Kirill A. Shutemov
Reviewed-by: Ard Biesheuvel
---
 arch/x86/Kconfig                        | 8 --------
 arch/x86/include/asm/page_64_types.h    | 4 ----
 arch/x86/include/asm/pgtable_64_types.h | 6 ------
 arch/x86/kernel/head64.c                | 2 --
 scripts/gdb/linux/pgtable.py            | 4 +---
 5 files changed, 1 insertion(+), 23 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 891a69b308cb..d3c2da3b2f0b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1467,7 +1467,6 @@ config X86_PAE
 config X86_5LEVEL
 	bool "Enable 5-level page tables support"
 	default y
-	select DYNAMIC_MEMORY_LAYOUT
 	select SPARSEMEM_VMEMMAP
 	depends on X86_64
 	help
@@ -2167,17 +2166,10 @@ config PHYSICAL_ALIGN
 
 	  Don't change this unless you know what you are doing.
 
-config DYNAMIC_MEMORY_LAYOUT
-	bool
-	help
-	  This option makes base addresses of vmalloc and vmemmap as well as
-	  __PAGE_OFFSET movable during boot.
-
 config RANDOMIZE_MEMORY
 	bool "Randomize the kernel memory sections"
 	depends on X86_64
 	depends on RANDOMIZE_BASE
-	select DYNAMIC_MEMORY_LAYOUT
 	default RANDOMIZE_BASE
 	help
 	  Randomizes the base virtual address of kernel memory sections
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 1faa8f88850a..6b8c8169c71d 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -41,11 +41,7 @@
 #define __PAGE_OFFSET_BASE_L5	_AC(0xff11000000000000, UL)
 #define __PAGE_OFFSET_BASE_L4	_AC(0xffff888000000000, UL)
 
-#ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT
 #define __PAGE_OFFSET		page_offset_base
-#else
-#define __PAGE_OFFSET		__PAGE_OFFSET_BASE_L4
-#endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */
 
 #define __START_KERNEL_map	_AC(0xffffffff80000000, UL)
 
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index e83721db18c9..eee06f77b245 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -128,15 +128,9 @@ extern unsigned int ptrs_per_p4d;
 #define __VMEMMAP_BASE_L4	0xffffea0000000000UL
 #define __VMEMMAP_BASE_L5	0xffd4000000000000UL
 
-#ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT
 # define VMALLOC_START		vmalloc_base
 # define VMALLOC_SIZE_TB	(pgtable_l5_enabled() ? VMALLOC_SIZE_TB_L5 : VMALLOC_SIZE_TB_L4)
 # define VMEMMAP_START		vmemmap_base
-#else
-# define VMALLOC_START		__VMALLOC_BASE_L4
-# define VMALLOC_SIZE_TB	VMALLOC_SIZE_TB_L4
-# define VMEMMAP_START		__VMEMMAP_BASE_L4
-#endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */
 
 #ifdef CONFIG_RANDOMIZE_MEMORY
 # define DIRECT_MAP_PHYSMEM_END	direct_map_physmem_end
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 14f7dda20954..9f617be64fa9 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -59,14 +59,12 @@ unsigned int ptrs_per_p4d __ro_after_init = 1;
 EXPORT_SYMBOL(ptrs_per_p4d);
 #endif
 
-#ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT
 unsigned long page_offset_base __ro_after_init = __PAGE_OFFSET_BASE_L4;
 EXPORT_SYMBOL(page_offset_base);
 unsigned long vmalloc_base __ro_after_init = __VMALLOC_BASE_L4;
 EXPORT_SYMBOL(vmalloc_base);
 unsigned long vmemmap_base __ro_after_init = __VMEMMAP_BASE_L4;
 EXPORT_SYMBOL(vmemmap_base);
-#endif
 
 /* Wipe all early page tables except for the kernel symbol map */
 static void __init reset_early_page_tables(void)
diff --git a/scripts/gdb/linux/pgtable.py b/scripts/gdb/linux/pgtable.py
index 30d837f3dfae..09aac2421fb8 100644
--- a/scripts/gdb/linux/pgtable.py
+++ b/scripts/gdb/linux/pgtable.py
@@ -29,11 +29,9 @@ def page_mask(level=1):
         raise Exception(f'Unknown page level: {level}')
 
 
-#page_offset_base in case CONFIG_DYNAMIC_MEMORY_LAYOUT is disabled
-POB_NO_DYNAMIC_MEM_LAYOUT = '0xffff888000000000'
 def _page_offset_base():
     pob_symbol = gdb.lookup_global_symbol('page_offset_base')
-    pob = pob_symbol.name if pob_symbol else POB_NO_DYNAMIC_MEM_LAYOUT
+    pob = pob_symbol.name
     return gdb.parse_and_eval(pob)
 
 
-- 
2.47.2

From nobody Wed Dec 17 07:28:47 2025
From: "Kirill A. Shutemov"
Subject: [PATCHv2 2/3] x86/64/mm: Make SPARSEMEM_VMEMMAP the only memory model
Date: Fri, 16 May 2025 12:15:32 +0300
Message-ID: <20250516091534.3414310-3-kirill.shutemov@linux.intel.com>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250516091534.3414310-1-kirill.shutemov@linux.intel.com>
References: <20250516091534.3414310-1-kirill.shutemov@linux.intel.com>

5-level paging only supports SPARSEMEM_VMEMMAP.

CONFIG_X86_5LEVEL is being phased out, making 5-level paging support
mandatory.

Make CONFIG_SPARSEMEM_VMEMMAP mandatory for x86-64 and eliminate any
associated conditional statements.

Signed-off-by: Kirill A. Shutemov
Reviewed-by: Ard Biesheuvel
---
 arch/x86/Kconfig      | 2 +-
 arch/x86/mm/init_64.c | 9 +--------
 2 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d3c2da3b2f0b..45b36a019b5e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1467,7 +1467,6 @@ config X86_PAE
 config X86_5LEVEL
 	bool "Enable 5-level page tables support"
 	default y
-	select SPARSEMEM_VMEMMAP
 	depends on X86_64
 	help
 	  5-level paging enables access to larger address space:
@@ -1579,6 +1578,7 @@ config ARCH_SPARSEMEM_ENABLE
 	def_bool y
 	select SPARSEMEM_STATIC if X86_32
 	select SPARSEMEM_VMEMMAP_ENABLE if X86_64
+	select SPARSEMEM_VMEMMAP if X86_64
 
 config ARCH_SPARSEMEM_DEFAULT
 	def_bool X86_64 || (NUMA && X86_32)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index bf45c7aed336..66330fe4e18c 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -833,7 +833,6 @@ void __init paging_init(void)
 	zone_sizes_init();
 }
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
 #define PAGE_UNUSED 0xFD
 
 /*
@@ -932,7 +931,6 @@ static void __meminit vmemmap_use_new_sub_pmd(unsigned long start, unsigned long
 	if (!IS_ALIGNED(end, PMD_SIZE))
 		unused_pmd_start = end;
 }
-#endif
 
 /*
  * Memory hotplug specific functions
@@ -1152,16 +1150,13 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 				pmd_clear(pmd);
 				spin_unlock(&init_mm.page_table_lock);
 				pages++;
-			}
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-			else if (vmemmap_pmd_is_unused(addr, next)) {
+			} else if (vmemmap_pmd_is_unused(addr, next)) {
 					free_hugepage_table(pmd_page(*pmd),
 							    altmap);
 					spin_lock(&init_mm.page_table_lock);
 					pmd_clear(pmd);
 					spin_unlock(&init_mm.page_table_lock);
 			}
-#endif
 			continue;
 		}
 
@@ -1500,7 +1495,6 @@ unsigned long memory_block_size_bytes(void)
 	return memory_block_size_probed;
 }
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
 /*
  * Initialise the sparsemem vmemmap using huge-pages at the PMD level.
  */
@@ -1647,4 +1641,3 @@ void __meminit vmemmap_populate_print_last(void)
 		node_start = 0;
 	}
 }
-#endif
-- 
2.47.2

From nobody Wed Dec 17 07:28:47 2025
From: "Kirill A. Shutemov"
Subject: [PATCHv2 3/3] x86/64/mm: Make 5-level paging support unconditional
Date: Fri, 16 May 2025 12:15:33 +0300
Message-ID: <20250516091534.3414310-4-kirill.shutemov@linux.intel.com>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250516091534.3414310-1-kirill.shutemov@linux.intel.com>
References: <20250516091534.3414310-1-kirill.shutemov@linux.intel.com>

Both Intel and AMD CPUs support 5-level paging, which is expected to
become more widely adopted in the future.

Remove CONFIG_X86_5LEVEL and the ifdeffery around it to make the code
more readable.

Signed-off-by: Kirill A. Shutemov
Suggested-by: Borislav Petkov
Reviewed-by: Ard Biesheuvel
Reviewed-by: Borislav Petkov (AMD)
---
 Documentation/arch/x86/cpuinfo.rst            |  8 +++----
 .../arch/x86/x86_64/5level-paging.rst         |  9 --------
 arch/x86/Kconfig                              | 22 +------------------
 arch/x86/Kconfig.cpufeatures                  |  4 ----
 arch/x86/boot/compressed/pgtable_64.c         | 11 ++--------
 arch/x86/boot/header.S                        |  4 ----
 arch/x86/boot/startup/map_kernel.c            |  5 +----
 arch/x86/include/asm/page_64.h                |  2 --
 arch/x86/include/asm/page_64_types.h          |  7 ------
 arch/x86/include/asm/pgtable_64_types.h       | 18 ---------------
 arch/x86/kernel/alternative.c                 |  2 +-
 arch/x86/kernel/head64.c                      |  2 --
 arch/x86/kernel/head_64.S                     |  2 --
 arch/x86/mm/init.c                            |  4 ----
 arch/x86/mm/pgtable.c                         |  2 +-
 drivers/firmware/efi/libstub/x86-5lvl.c       |  2 +-
 16 files changed, 10 insertions(+), 94 deletions(-)

diff --git a/Documentation/arch/x86/cpuinfo.rst b/Documentation/arch/x86/cpuinfo.rst
index f80e2a558d2a..dd8b7806944e 100644
--- a/Documentation/arch/x86/cpuinfo.rst
+++ b/Documentation/arch/x86/cpuinfo.rst
@@ -173,10 +173,10 @@ For example, when an old kernel is running on new hardware.
 The kernel disabled support for it at compile-time
 --------------------------------------------------
 
-For example, if 5-level-paging is not enabled when building (i.e.,
-CONFIG_X86_5LEVEL is not selected) the flag "la57" will not show up [#f1]_.
+For example, if Linear Address Masking (LAM) is not enabled when building (i.e.,
+CONFIG_ADDRESS_MASKING is not selected) the flag "lam" will not show up.
 Even though the feature will still be detected via CPUID, the kernel disables
-it by clearing via setup_clear_cpu_cap(X86_FEATURE_LA57).
+it by clearing via setup_clear_cpu_cap(X86_FEATURE_LAM).
 
 The feature is disabled at boot-time
 ------------------------------------
@@ -200,5 +200,3 @@ missing at runtime. For example, AVX flags will not show up if XSAVE feature
 is disabled since they depend on XSAVE feature. Another example would be broken
 CPUs and them missing microcode patches. Due to that, the kernel decides not to
 enable a feature.
-
-.. [#f1] 5-level paging uses linear address of 57 bits.
diff --git a/Documentation/arch/x86/x86_64/5level-paging.rst b/Documentation/arch/x86/x86_64/5level-paging.rst
index 71f882f4a173..ad7ddc13f79d 100644
--- a/Documentation/arch/x86/x86_64/5level-paging.rst
+++ b/Documentation/arch/x86/x86_64/5level-paging.rst
@@ -22,15 +22,6 @@ QEMU 2.9 and later support 5-level paging.
 Virtual memory layout for 5-level paging is described in
 Documentation/arch/x86/x86_64/mm.rst
 
-
-Enabling 5-level paging
-=======================
-CONFIG_X86_5LEVEL=y enables the feature.
-
-Kernel with CONFIG_X86_5LEVEL=y still able to boot on 4-level hardware.
-In this case additional page table level -- p4d -- will be folded at
-runtime.
-
 User-space and large virtual address space
 ==========================================
 On x86, 5-level paging enables 56-bit userspace virtual address space.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 45b36a019b5e..7aed3fa0e780 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -427,8 +427,7 @@ config DYNAMIC_PHYSICAL_MASK
 
 config PGTABLE_LEVELS
 	int
-	default 5 if X86_5LEVEL
-	default 4 if X86_64
+	default 5 if X86_64
 	default 3 if X86_PAE
 	default 2
 
@@ -1464,25 +1463,6 @@ config X86_PAE
 	  has the cost of more pagetable lookup overhead, and also
 	  consumes more pagetable space per process.
 
-config X86_5LEVEL
-	bool "Enable 5-level page tables support"
-	default y
-	depends on X86_64
-	help
-	  5-level paging enables access to larger address space:
-	  up to 128 PiB of virtual address space and 4 PiB of
-	  physical address space.
-
-	  It will be supported by future Intel CPUs.
-
-	  A kernel with the option enabled can be booted on machines that
-	  support 4- or 5-level paging.
-
-	  See Documentation/arch/x86/x86_64/5level-paging.rst for more
-	  information.
-
-	  Say N if unsure.
-
 config X86_DIRECT_GBPAGES
 	def_bool y
 	depends on X86_64
diff --git a/arch/x86/Kconfig.cpufeatures b/arch/x86/Kconfig.cpufeatures
index e12d5b7e39a2..250c10627ab3 100644
--- a/arch/x86/Kconfig.cpufeatures
+++ b/arch/x86/Kconfig.cpufeatures
@@ -132,10 +132,6 @@ config X86_DISABLED_FEATURE_OSPKE
 	def_bool y
 	depends on !X86_INTEL_MEMORY_PROTECTION_KEYS
 
-config X86_DISABLED_FEATURE_LA57
-	def_bool y
-	depends on !X86_5LEVEL
-
 config X86_DISABLED_FEATURE_PTI
 	def_bool y
 	depends on !MITIGATION_PAGE_TABLE_ISOLATION
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 5a6c7a190e5b..bdd26050dff7 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -10,12 +10,10 @@
 #define BIOS_START_MIN		0x20000U	/* 128K, less than this is insane */
 #define BIOS_START_MAX		0x9f000U	/* 640K, absolute maximum */
 
-#ifdef CONFIG_X86_5LEVEL
 /* __pgtable_l5_enabled needs to be in .data to avoid being cleared along with .bss */
 unsigned int __section(".data") __pgtable_l5_enabled;
 unsigned int __section(".data") pgdir_shift = 39;
 unsigned int __section(".data") ptrs_per_p4d = 1;
-#endif
 
 /* Buffer to preserve trampoline memory */
 static char trampoline_save[TRAMPOLINE_32BIT_SIZE];
@@ -114,18 +112,13 @@ asmlinkage void configure_5level_paging(struct boot_params *bp, void *pgtable)
 	 * Check if LA57 is desired and supported.
 	 *
 	 * There are several parts to the check:
-	 *   - if the kernel supports 5-level paging: CONFIG_X86_5LEVEL=y
 	 *   - if user asked to disable 5-level paging: no5lvl in cmdline
 	 *   - if the machine supports 5-level paging:
 	 *     + CPUID leaf 7 is supported
 	 *     + the leaf has the feature bit set
-	 *
-	 * That's substitute for boot_cpu_has() in early boot code.
 	 */
-	if (IS_ENABLED(CONFIG_X86_5LEVEL) &&
-			!cmdline_find_option_bool("no5lvl") &&
-			native_cpuid_eax(0) >= 7 &&
-			(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31)))) {
+	if (!cmdline_find_option_bool("no5lvl") &&
+	    native_cpuid_eax(0) >= 7 && (native_cpuid_ecx(7) & BIT(16))) {
 		l5_required = true;
 
 		/* Initialize variables for 5-level paging */
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 9cb91421b4e4..e30649e44d8f 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -361,12 +361,8 @@ xloadflags:
 #endif
 
 #ifdef CONFIG_X86_64
-#ifdef CONFIG_X86_5LEVEL
 #define XLF56 (XLF_5LEVEL|XLF_5LEVEL_ENABLED)
 #else
-#define XLF56 XLF_5LEVEL
-#endif
-#else
 #define XLF56 0
 #endif
 
diff --git a/arch/x86/boot/startup/map_kernel.c b/arch/x86/boot/startup/map_kernel.c
index 905e8734b5a3..332dbe6688c4 100644
--- a/arch/x86/boot/startup/map_kernel.c
+++ b/arch/x86/boot/startup/map_kernel.c
@@ -16,9 +16,6 @@ extern unsigned int next_early_pgt;
 
 static inline bool check_la57_support(void)
 {
-	if (!IS_ENABLED(CONFIG_X86_5LEVEL))
-		return false;
-
 	/*
 	 * 5-level paging is detected and enabled at kernel decompression
 	 * stage. Only check if it has been enabled there.
@@ -129,7 +126,7 @@ unsigned long __head __startup_64(unsigned long p2v_offset,
 	pgd = rip_rel_ptr(early_top_pgt);
 	pgd[pgd_index(__START_KERNEL_map)] += load_delta;
 
-	if (IS_ENABLED(CONFIG_X86_5LEVEL) && la57) {
+	if (la57) {
 		p4d = (p4dval_t *)rip_rel_ptr(level4_kernel_pgt);
 		p4d[MAX_PTRS_PER_P4D - 1] += load_delta;
 
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index d3aab6f4e59a..015d23f3e01f 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -62,7 +62,6 @@ static inline void clear_page(void *page)
 void copy_page(void *to, void *from);
 KCFI_REFERENCE(copy_page);
 
-#ifdef CONFIG_X86_5LEVEL
 /*
  * User space process size. This is the first address outside the user range.
  * There are a few constraints that determine this:
@@ -93,7 +92,6 @@ static __always_inline unsigned long task_size_max(void)
 
 	return ret;
 }
-#endif	/* CONFIG_X86_5LEVEL */
 
 #endif	/* !__ASSEMBLER__ */
 
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 6b8c8169c71d..7400dab373fe 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -48,14 +48,7 @@
 /* See Documentation/arch/x86/x86_64/mm.rst for a description of the memory map. */
 
 #define __PHYSICAL_MASK_SHIFT	52
-
-#ifdef CONFIG_X86_5LEVEL
 #define __VIRTUAL_MASK_SHIFT	(pgtable_l5_enabled() ? 56 : 47)
-/* See task_size_max() in */
-#else
-#define __VIRTUAL_MASK_SHIFT	47
-#define task_size_max()		((_AC(1,UL) << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)
-#endif
 
 #define TASK_SIZE_MAX		task_size_max()
 #define DEFAULT_MAP_WINDOW	((1UL << 47) - PAGE_SIZE)
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index eee06f77b245..4604f924d8b8 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -23,7 +23,6 @@ typedef struct { pmdval_t pmd; } pmd_t;
 
 extern unsigned int __pgtable_l5_enabled;
 
-#ifdef CONFIG_X86_5LEVEL
 #ifdef USE_EARLY_PGTABLE_L5
 /*
  * cpu_feature_enabled() is not available in early boot code.
@@ -37,17 +36,11 @@ static inline bool pgtable_l5_enabled(void)
 #define pgtable_l5_enabled() cpu_feature_enabled(X86_FEATURE_LA57)
 #endif /* USE_EARLY_PGTABLE_L5 */
 
-#else
-#define pgtable_l5_enabled() 0
-#endif /* CONFIG_X86_5LEVEL */
-
 extern unsigned int pgdir_shift;
 extern unsigned int ptrs_per_p4d;
 
 #endif	/* !__ASSEMBLER__ */
 
-#ifdef CONFIG_X86_5LEVEL
-
 /*
  * PGDIR_SHIFT determines what a top-level page table entry can map
  */
@@ -65,17 +58,6 @@ extern unsigned int ptrs_per_p4d;
 
 #define MAX_POSSIBLE_PHYSMEM_BITS	52
 
-#else /* CONFIG_X86_5LEVEL */
-
-/*
- * PGDIR_SHIFT determines what a top-level page table entry can map
- */
-#define PGDIR_SHIFT		39
-#define PTRS_PER_PGD		512
-#define MAX_PTRS_PER_P4D	1
-
-#endif /* CONFIG_X86_5LEVEL */
-
 /*
  * 3rd level page
  */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 29572927f9c5..ecfe7b497cad 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -596,7 +596,7 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
 	DPRINTK(ALT, "alt table %px, -> %px", start, end);
 
 	/*
-	 * In the case CONFIG_X86_5LEVEL=y, KASAN_SHADOW_START is defined using
+	 * KASAN_SHADOW_START is defined using
 	 * cpu_feature_enabled(X86_FEATURE_LA57) and is therefore patched here.
 	 * During the process, KASAN becomes confused seeing partial LA57
 	 * conversion and triggers a false-positive out-of-bound report.
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 9f617be64fa9..533fcf5636fc 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -51,13 +51,11 @@ unsigned int __initdata next_early_pgt;
 SYM_PIC_ALIAS(next_early_pgt);
 pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
 
-#ifdef CONFIG_X86_5LEVEL
 unsigned int __pgtable_l5_enabled __ro_after_init;
 unsigned int pgdir_shift __ro_after_init = 39;
 EXPORT_SYMBOL(pgdir_shift);
 unsigned int ptrs_per_p4d __ro_after_init = 1;
 EXPORT_SYMBOL(ptrs_per_p4d);
-#endif
 
 unsigned long page_offset_base __ro_after_init = __PAGE_OFFSET_BASE_L4;
 EXPORT_SYMBOL(page_offset_base);
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 069420853304..3e9b3a3bd039 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -649,13 +649,11 @@ SYM_DATA_START_PTI_ALIGNED(init_top_pgt)
 SYM_DATA_END(init_top_pgt)
 #endif
 
-#ifdef CONFIG_X86_5LEVEL
 SYM_DATA_START_PAGE_ALIGNED(level4_kernel_pgt)
 	.fill	511,8,0
 	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
 SYM_DATA_END(level4_kernel_pgt)
 SYM_PIC_ALIAS(level4_kernel_pgt)
-#endif
 
 SYM_DATA_START_PAGE_ALIGNED(level3_kernel_pgt)
 	.fill	L3_START_KERNEL,8,0
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index aa56d9ac0b8f..7456df985d96 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -174,11 +174,7 @@ __ref void *alloc_low_pages(unsigned int num)
  * randomization is enabled.
  */
 
-#ifndef CONFIG_X86_5LEVEL
-#define INIT_PGD_PAGE_TABLES    3
-#else
 #define INIT_PGD_PAGE_TABLES    4
-#endif
 
 #ifndef CONFIG_RANDOMIZE_MEMORY
 #define INIT_PGD_PAGE_COUNT      (2 * INIT_PGD_PAGE_TABLES)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 59c42dec7076..62777ba4de1a 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -592,7 +592,7 @@ void native_set_fixmap(unsigned /* enum fixed_addresses */ idx,
 }
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-#ifdef CONFIG_X86_5LEVEL
+#if CONFIG_PGTABLE_LEVELS > 4
 /**
  * p4d_set_huge - Set up kernel P4D mapping
  * @p4d: Pointer to the P4D entry
diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
index 77359e802181..f1c5fb45d5f7 100644
--- a/drivers/firmware/efi/libstub/x86-5lvl.c
+++ b/drivers/firmware/efi/libstub/x86-5lvl.c
@@ -62,7 +62,7 @@ efi_status_t efi_setup_5level_paging(void)
 
 void efi_5level_switch(void)
 {
-	bool want_la57 = IS_ENABLED(CONFIG_X86_5LEVEL) && !efi_no5lvl;
+	bool want_la57 = !efi_no5lvl;
 	bool have_la57 = native_read_cr4() & X86_CR4_LA57;
 	bool need_toggle = want_la57 ^ have_la57;
 	u64 *pgt = (void *)la57_toggle + PAGE_SIZE;
-- 
2.47.2
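
Reader-side note, not part of the series: patch 3/3 replaces the
(1 << (X86_FEATURE_LA57 & 31)) test with BIT(16) and makes
__VIRTUAL_MASK_SHIFT always (pgtable_l5_enabled() ? 56 : 47). The rough
Python sketch below checks that arithmetic. It assumes X86_FEATURE_LA57 is
encoded as 16*32+16 (as I understand arch/x86/include/asm/cpufeatures.h)
and a 4 KiB PAGE_SIZE; neither value appears in the diff. The helper names
are made up for illustration, and the 5-level task_size_max() value is
extrapolated from the removed 4-level macro.

```python
# Illustrative sketch only. The two constants below are assumptions and are
# not taken from the diff itself.
PAGE_SIZE = 4096                   # assumed 4 KiB base page size on x86-64
X86_FEATURE_LA57 = 16 * 32 + 16    # assumed encoding: feature word 16, bit 16


def la57_cpuid_mask() -> int:
    """Mask the removed code computed as 1 << (X86_FEATURE_LA57 & 31)."""
    return 1 << (X86_FEATURE_LA57 & 31)


def virtual_mask_shift(l5_enabled: bool) -> int:
    """Mirror of __VIRTUAL_MASK_SHIFT: (pgtable_l5_enabled() ? 56 : 47)."""
    return 56 if l5_enabled else 47


def task_size_max(l5_enabled: bool) -> int:
    """The removed 4-level macro, applied to whichever shift is active."""
    return (1 << virtual_mask_shift(l5_enabled)) - PAGE_SIZE


# Mirror of DEFAULT_MAP_WINDOW: ((1UL << 47) - PAGE_SIZE)
DEFAULT_MAP_WINDOW = (1 << 47) - PAGE_SIZE

if __name__ == "__main__":
    # BIT(16) selects the same bit as the expression it replaces.
    assert la57_cpuid_mask() == 1 << 16
    # With 4-level paging, the user range equals the default map window.
    assert task_size_max(False) == DEFAULT_MAP_WINDOW
    print(hex(task_size_max(False)))
    print(hex(task_size_max(True)))
```

Under these assumptions the hard-coded BIT(16) is equivalent to the old
expression, and a 4-level boot keeps the same user address range as before.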