From nobody Mon Jun 22 19:03:20 2026
From: Muchun Song
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org,
 mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com,
 osalvador@suse.de, david@redhat.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com,
 Muchun Song
Subject: [PATCH v4 1/4] mm: hugetlb_vmemmap: introduce STRUCT_PAGE_SIZE_IS_POWER_OF_2
Date: Fri, 18 Mar 2022 18:07:17 +0800
Message-Id: <20220318100720.14524-2-songmuchun@bytedance.com>
In-Reply-To: <20220318100720.14524-1-songmuchun@bytedance.com>
References: <20220318100720.14524-1-songmuchun@bytedance.com>

If the size of "struct page" is not a power of two and this feature is
enabled, then the vmemmap pages of HugeTLB will be corrupted after
remapping (in theory, a panic will follow).
But this can only happen when !CONFIG_MEMCG && !CONFIG_SLUB on x86_64,
which is not a conventional configuration nowadays. So it is not a
real-world issue, just the result of a code review. Still, we have to
prevent anyone from selecting that combination. To avoid scattering checks
like "is_power_of_2(sizeof(struct page))" throughout mm/hugetlb_vmemmap.c,
introduce STRUCT_PAGE_SIZE_IS_POWER_OF_2 to detect whether the size of
struct page is a power of 2, and make this feature depend on the new
config. This prevents anyone from ending up with an unexpected
configuration.

Signed-off-by: Muchun Song
---
 Kbuild                           | 12 ++++++++++++
 fs/Kconfig                       |  2 +-
 include/linux/mm_types.h         |  2 ++
 mm/Kconfig                       |  3 +++
 mm/hugetlb_vmemmap.c             |  6 ------
 mm/struct_page_size.c            | 19 +++++++++++++++++++
 scripts/check_struct_page_po2.sh | 11 +++++++++++
 7 files changed, 48 insertions(+), 7 deletions(-)
 create mode 100644 mm/struct_page_size.c
 create mode 100755 scripts/check_struct_page_po2.sh

diff --git a/Kbuild b/Kbuild
index fa441b98c9f6..6bb97d348d62 100644
--- a/Kbuild
+++ b/Kbuild
@@ -14,6 +14,18 @@ $(bounds-file): kernel/bounds.s FORCE
 	$(call filechk,offsets,__LINUX_BOUNDS_H__)
 
 #####
+# Generate struct_page_size.h. Must follow bounds.h.
+
+struct_page_size-file := include/generated/struct_page_size.h
+
+always-y := $(struct_page_size-file)
+targets := mm/struct_page_size.s
+
+$(struct_page_size-file): mm/struct_page_size.s FORCE
+	$(call filechk,offsets,__LINUX_STRUCT_PAGE_SIZE_H__)
+	$(Q)$(MAKE) -f $(srctree)/Makefile syncconfig
+
+#####
 # Generate timeconst.h
 
 timeconst-file := include/generated/timeconst.h
diff --git a/fs/Kconfig b/fs/Kconfig
index 7f2455e8e18a..b8b722f7f773 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -248,7 +248,7 @@ config HUGETLB_PAGE
 config HUGETLB_PAGE_FREE_VMEMMAP
 	def_bool HUGETLB_PAGE
 	depends on X86_64
-	depends on SPARSEMEM_VMEMMAP
+	depends on SPARSEMEM_VMEMMAP && STRUCT_PAGE_SIZE_IS_POWER_OF_2
 
 config HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
 	bool "Default freeing vmemmap pages of HugeTLB to on"
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8834e38c06a4..b4defcea6534 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -223,6 +223,7 @@ struct page {
 #endif
 } _struct_page_alignment;
 
+#ifndef __GENERATING_STRUCT_PAGE_SIZE_IS_POWER_OF_2_H
 /**
  * struct folio - Represents a contiguous set of bytes.
  * @flags: Identical to the page flags.
@@ -844,5 +845,6 @@ enum fault_flag {
 	FAULT_FLAG_INSTRUCTION = 1 << 8,
 	FAULT_FLAG_INTERRUPTIBLE = 1 << 9,
 };
+#endif /* __GENERATING_STRUCT_PAGE_SIZE_IS_POWER_OF_2_H */
 
 #endif /* _LINUX_MM_TYPES_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 034d87953600..9314bd34f49e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -2,6 +2,9 @@
 
 menu "Memory Management options"
 
+config STRUCT_PAGE_SIZE_IS_POWER_OF_2
+	def_bool $(success,test "$(shell, $(srctree)/scripts/check_struct_page_po2.sh)" = 1)
+
 config SELECT_MEMORY_MODEL
 	def_bool y
 	depends on ARCH_SELECT_MEMORY_MODEL
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 791626983c2e..33ecb77c2b2a 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -194,12 +194,6 @@ EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled_key);
 
 static int __init early_hugetlb_free_vmemmap_param(char *buf)
 {
-	/* We cannot optimize if a "struct page" crosses page boundaries. */
-	if (!is_power_of_2(sizeof(struct page))) {
-		pr_warn("cannot free vmemmap pages because \"struct page\" crosses page boundaries\n");
-		return 0;
-	}
-
 	if (!buf)
 		return -EINVAL;
 
diff --git a/mm/struct_page_size.c b/mm/struct_page_size.c
new file mode 100644
index 000000000000..5749609aa1b3
--- /dev/null
+++ b/mm/struct_page_size.c
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Generate definitions needed by the preprocessor.
+ * This code generates raw asm output which is post-processed
+ * to extract and format the required data.
+ */
+
+#define __GENERATING_STRUCT_PAGE_SIZE_IS_POWER_OF_2_H
+/* Include headers that define the enum constants of interest */
+#include
+#include
+#include
+
+int main(void)
+{
+	DEFINE(STRUCT_PAGE_SIZE_IS_POWER_OF_2, is_power_of_2(sizeof(struct page)));
+
+	return 0;
+}
diff --git a/scripts/check_struct_page_po2.sh b/scripts/check_struct_page_po2.sh
new file mode 100755
index 000000000000..9547ad3aca05
--- /dev/null
+++ b/scripts/check_struct_page_po2.sh
@@ -0,0 +1,11 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Check if the size of "struct page" is a power of 2
+
+file="include/generated/struct_page_size.h"
+if [ ! -f "$file" ]; then
+	exit 1
+fi
+
+grep STRUCT_PAGE_SIZE_IS_POWER_OF_2 "$file" | cut -d' ' -f3
-- 
2.11.0

From nobody Mon Jun 22 19:03:20 2026
From: Muchun Song
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org,
 mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com,
 osalvador@suse.de, david@redhat.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com,
 Muchun Song
Subject: [PATCH v4 2/4] mm: memory_hotplug: override memmap_on_memory when hugetlb_free_vmemmap=on
Date: Fri, 18 Mar 2022 18:07:18 +0800
Message-Id: <20220318100720.14524-3-songmuchun@bytedance.com>
In-Reply-To: <20220318100720.14524-1-songmuchun@bytedance.com>
References: <20220318100720.14524-1-songmuchun@bytedance.com>

When "hugetlb_free_vmemmap=on" and "memory_hotplug.memmap_on_memory" are
both passed on the boot cmdline, the variable "memmap_on_memory" is set
to 1 even though the vmemmap pages will not be allocated from the
hot-added memory, since the former takes precedence over the latter. In the
next patch, we want to enable or disable freeing of HugeTLB vmemmap pages
via sysctl. Because the two features are incompatible, we need a way to
know whether memory_hotplug.memmap_on_memory is in effect when enabling
vmemmap freeing; however, the variable "memmap_on_memory" cannot tell us
that at present.

Do not set "memmap_on_memory" to 1 when both parameters are passed on the
cmdline, so that "memmap_on_memory" reflects whether the feature is
actually enabled by the user. Also introduce the mhp_memmap_on_memory()
helper so that the definition of "memmap_on_memory" can move into the scope
of CONFIG_MHP_MEMMAP_ON_MEMORY. In the next patch, mhp_memmap_on_memory()
will also be exported for use in hugetlb_vmemmap.c.
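
For illustration only (this is not part of the patch), the precedence rule
described above can be modeled in user space. The struct and the model_*
names are hypothetical, and the value parsing is a simplified stand-in for
param_set_bool(), not the kernel's implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the boot-time state; not a kernel structure. */
struct boot_state {
	bool hugetlb_free_vmemmap;	/* models hugetlb_free_vmemmap_enabled() */
	bool memmap_on_memory;		/* models the module parameter variable */
};

/*
 * Models memmap_on_memory_set(): silently ignore the request when freeing of
 * HugeTLB vmemmap pages is enabled, otherwise parse the value. The parsing is
 * a simplified assumption; a missing value is treated as "on".
 */
static int model_memmap_on_memory_set(struct boot_state *s, const char *val)
{
	if (s->hugetlb_free_vmemmap)
		return 0;
	if (!val || !strcmp(val, "1") || !strcmp(val, "y")) {
		s->memmap_on_memory = true;
		return 0;
	}
	if (!strcmp(val, "0") || !strcmp(val, "n")) {
		s->memmap_on_memory = false;
		return 0;
	}
	return -1;	/* unparsable value */
}

/* Models mhp_memmap_on_memory(). */
static bool model_mhp_memmap_on_memory(const struct boot_state *s)
{
	return s->memmap_on_memory;
}

/* Walks through both cmdline combinations described in the text. */
static void model_examples(void)
{
	struct boot_state both = { .hugetlb_free_vmemmap = true };
	struct boot_state only_mhp = { .hugetlb_free_vmemmap = false };

	/* Both parameters given: the variable stays 0. */
	model_memmap_on_memory_set(&both, "1");
	assert(!model_mhp_memmap_on_memory(&both));

	/* Only memmap_on_memory given: the variable becomes 1. */
	model_memmap_on_memory_set(&only_mhp, "1");
	assert(model_mhp_memmap_on_memory(&only_mhp));
}
```

With this behavior, a later caller only has to consult the helper to learn
whether memmap_on_memory is really in effect.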

Signed-off-by: Muchun Song
---
 mm/memory_hotplug.c | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 416b38ca8def..da594b382829 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -42,14 +42,36 @@
 #include "internal.h"
 #include "shuffle.h"
 
+#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
+static int memmap_on_memory_set(const char *val, const struct kernel_param *kp)
+{
+	if (hugetlb_free_vmemmap_enabled())
+		return 0;
+	return param_set_bool(val, kp);
+}
+
+static const struct kernel_param_ops memmap_on_memory_ops = {
+	.flags = KERNEL_PARAM_OPS_FL_NOARG,
+	.set = memmap_on_memory_set,
+	.get = param_get_bool,
+};
 
 /*
  * memory_hotplug.memmap_on_memory parameter
  */
 static bool memmap_on_memory __ro_after_init;
-#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
-module_param(memmap_on_memory, bool, 0444);
+module_param_cb(memmap_on_memory, &memmap_on_memory_ops, &memmap_on_memory, 0444);
 MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
+
+static inline bool mhp_memmap_on_memory(void)
+{
+	return memmap_on_memory;
+}
+#else
+static inline bool mhp_memmap_on_memory(void)
+{
+	return false;
+}
 #endif
 
 enum {
@@ -1288,9 +1310,7 @@ bool mhp_supports_memmap_on_memory(unsigned long size)
 	 * altmap as an alternative source of memory, and we do not exactly
 	 * populate a single PMD.
 	 */
-	return memmap_on_memory &&
-	       !hugetlb_free_vmemmap_enabled() &&
-	       IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
+	return mhp_memmap_on_memory() &&
 	       size == memory_block_size_bytes() &&
 	       IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
 	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT));
@@ -2074,7 +2094,7 @@ static int __ref try_remove_memory(u64 start, u64 size)
 	 * We only support removing memory added with MHP_MEMMAP_ON_MEMORY in
 	 * the same granularity it was added - a single memory block.
 	 */
-	if (memmap_on_memory) {
+	if (mhp_memmap_on_memory()) {
 		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
 						      get_nr_vmemmap_pages_cb);
 		if (nr_vmemmap_pages) {
-- 
2.11.0

From nobody Mon Jun 22 19:03:20 2026
From: Muchun Song
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org,
 mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com,
 osalvador@suse.de, david@redhat.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com,
 Muchun Song
Subject: [PATCH v4 3/4] sysctl: allow to set extra1 to SYSCTL_ONE
Date: Fri, 18 Mar 2022 18:07:19 +0800
Message-Id: <20220318100720.14524-4-songmuchun@bytedance.com>
In-Reply-To: <20220318100720.14524-1-songmuchun@bytedance.com>
References: <20220318100720.14524-1-songmuchun@bytedance.com>

proc_do_static_key() does not consider the situation where a sysctl is
only allowed to be enabled and cannot be
disabled under certain circumstances, since it sets "->extra1" to
SYSCTL_ZERO unconditionally. Add the ability to honor "->extra1" when the
caller sets it to SYSCTL_ONE.

Signed-off-by: Muchun Song
---
 kernel/sysctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 770d5f7c7ae4..1e89c3e428ad 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1638,7 +1638,7 @@ int proc_do_static_key(struct ctl_table *table, int write,
 		.data = &val,
 		.maxlen = sizeof(val),
 		.mode = table->mode,
-		.extra1 = SYSCTL_ZERO,
+		.extra1 = table->extra1 == SYSCTL_ONE ? SYSCTL_ONE : SYSCTL_ZERO,
 		.extra2 = SYSCTL_ONE,
 	};
 
-- 
2.11.0

From nobody Mon Jun 22 19:03:20 2026
From: Muchun Song
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org,
 mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com,
 osalvador@suse.de, david@redhat.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com,
 Muchun Song
Subject: [PATCH v4 4/4] mm: hugetlb_vmemmap: add hugetlb_free_vmemmap sysctl
Date: Fri, 18 Mar 2022 18:07:20 +0800
Message-Id: <20220318100720.14524-5-songmuchun@bytedance.com>
In-Reply-To: <20220318100720.14524-1-songmuchun@bytedance.com>
References: <20220318100720.14524-1-songmuchun@bytedance.com>

Currently, "hugetlb_free_vmemmap=on" must be added to the boot cmdline and
the server rebooted to enable freeing of the vmemmap pages of HugeTLB
pages. Rebooting usually takes a long time. Add a sysctl to enable or
disable the feature at runtime without rebooting.

The feature can only be disabled when there are no optimized HugeTLB pages
in the system. If disabling fails, set "nr_hugepages" to 0 and then retry.

Signed-off-by: Muchun Song
---
 Documentation/admin-guide/sysctl/vm.rst |  14 +++++
 include/linux/memory_hotplug.h          |   9 +++
 mm/hugetlb_vmemmap.c                    | 101 +++++++++++++++++++++++++-------
 mm/hugetlb_vmemmap.h                    |   4 +-
 mm/memory_hotplug.c                     |   7 +--
 5 files changed, 108 insertions(+), 27 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f4804ce37c58..9e0e153ed935 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -561,6 +561,20 @@ Change the minimum size of the hugepage pool.
 See Documentation/admin-guide/mm/hugetlbpage.rst
 
 
+hugetlb_free_vmemmap
+====================
+
+Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap
+pages associated with each HugeTLB page.  Once enabled, the vmemmap pages of
+HugeTLB pages subsequently allocated from the buddy system will be optimized,
+whereas already allocated HugeTLB pages will not be optimized.
+If you fail to disable this feature, set "nr_hugepages" to 0 and then retry,
+since it can only be disabled when there are no optimized HugeTLB pages in
+the system.
+
+See Documentation/admin-guide/mm/hugetlbpage.rst
+
+
 nr_hugepages_mempolicy
 ======================
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 1ce6f8044f1e..9b015b254e86 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -348,4 +348,13 @@ void arch_remove_linear_mapping(u64 start, u64 size);
 extern bool mhp_supports_memmap_on_memory(unsigned long size);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
+#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
+bool mhp_memmap_on_memory(void);
+#else
+static inline bool mhp_memmap_on_memory(void)
+{
+	return false;
+}
+#endif
+
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 33ecb77c2b2a..f920073d52ba 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -176,6 +176,7 @@
  */
 #define pr_fmt(fmt)	"HugeTLB: " fmt
 
+#include
 #include "hugetlb_vmemmap.h"
 
 /*
@@ -192,6 +193,10 @@ DEFINE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON,
 			hugetlb_free_vmemmap_enabled_key);
 EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled_key);
 
+/* The number of HugeTLB pages whose vmemmap pages are optimized. */
+static atomic_long_t optimized_pages = ATOMIC_LONG_INIT(0);
+static DECLARE_RWSEM(sysctl_rwsem);
+
 static int __init early_hugetlb_free_vmemmap_param(char *buf)
 {
 	if (!buf)
@@ -208,11 +213,6 @@ static int __init early_hugetlb_free_vmemmap_param(char *buf)
 }
 early_param("hugetlb_free_vmemmap", early_hugetlb_free_vmemmap_param);
 
-static inline unsigned long free_vmemmap_pages_size_per_hpage(struct hstate *h)
-{
-	return (unsigned long)free_vmemmap_pages_per_hpage(h) << PAGE_SHIFT;
-}
-
 /*
  * Previously discarded vmemmap pages will be allocated and remapping
  * after this function returns zero.
@@ -221,14 +221,18 @@ int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
 {
 	int ret;
 	unsigned long vmemmap_addr = (unsigned long)head;
-	unsigned long vmemmap_end, vmemmap_reuse;
+	unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;
 
 	if (!HPageVmemmapOptimized(head))
 		return 0;
 
-	vmemmap_addr += RESERVE_VMEMMAP_SIZE;
-	vmemmap_end = vmemmap_addr + free_vmemmap_pages_size_per_hpage(h);
-	vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
+	vmemmap_addr	+= RESERVE_VMEMMAP_SIZE;
+	vmemmap_pages	= free_vmemmap_pages_per_hpage(h);
+	vmemmap_end	= vmemmap_addr + (vmemmap_pages << PAGE_SHIFT);
+	vmemmap_reuse	= vmemmap_addr - PAGE_SIZE;
+
+	VM_BUG_ON_PAGE(!vmemmap_pages, head);
+
 	/*
 	 * The pages which the vmemmap virtual address range [@vmemmap_addr,
 	 * @vmemmap_end) are mapped to are freed to the buddy allocator, and
@@ -238,8 +242,14 @@ int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
 	 */
 	ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
 				  GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
-	if (!ret)
+	if (!ret) {
 		ClearHPageVmemmapOptimized(head);
+		/*
+		 * Paired with acquire semantic in
+		 * hugetlb_free_vmemmap_handler().
+		 */
+		atomic_long_dec_return_release(&optimized_pages);
+	}
 
 	return ret;
 }
@@ -247,22 +257,28 @@ int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
 void free_huge_page_vmemmap(struct hstate *h, struct page *head)
 {
 	unsigned long vmemmap_addr = (unsigned long)head;
-	unsigned long vmemmap_end, vmemmap_reuse;
+	unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;
 
-	if (!free_vmemmap_pages_per_hpage(h))
-		return;
+	down_read(&sysctl_rwsem);
+	vmemmap_pages = free_vmemmap_pages_per_hpage(h);
+	if (!vmemmap_pages)
+		goto out;
 
-	vmemmap_addr += RESERVE_VMEMMAP_SIZE;
-	vmemmap_end = vmemmap_addr + free_vmemmap_pages_size_per_hpage(h);
-	vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
+	vmemmap_addr	+= RESERVE_VMEMMAP_SIZE;
+	vmemmap_end	= vmemmap_addr + (vmemmap_pages << PAGE_SHIFT);
+	vmemmap_reuse	= vmemmap_addr - PAGE_SIZE;
 
 	/*
 	 * Remap the vmemmap virtual address range [@vmemmap_addr, @vmemmap_end)
 	 * to the page which @vmemmap_reuse is mapped to, then free the pages
 	 * which the range [@vmemmap_addr, @vmemmap_end] is mapped to.
 	 */
-	if (!vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse))
+	if (!vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse)) {
 		SetHPageVmemmapOptimized(head);
+		atomic_long_inc(&optimized_pages);
+	}
+out:
+	up_read(&sysctl_rwsem);
 }
 
 void __init hugetlb_vmemmap_init(struct hstate *h)
@@ -278,9 +294,6 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
 	BUILD_BUG_ON(__NR_USED_SUBPAGE >=
 		     RESERVE_VMEMMAP_SIZE / sizeof(struct page));
 
-	if (!hugetlb_free_vmemmap_enabled())
-		return;
-
 	vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT;
 	/*
 	 * The head page is not to be freed to buddy allocator, the other tail
@@ -296,3 +309,51 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
 	pr_info("can free %d vmemmap pages for %s\n", h->nr_free_vmemmap_pages,
 		h->name);
 }
+
+static int hugetlb_free_vmemmap_handler(struct ctl_table *table, int write,
+					void *buffer, size_t *length,
+					loff_t *ppos)
+{
+	int ret;
+
+	down_write(&sysctl_rwsem);
+	/*
+	 * Cannot be disabled when there is at least one optimized
+	 * HugeTLB page in the system.
+	 *
+	 * The acquire semantic is paired with the release semantic in
+	 * alloc_huge_page_vmemmap(). If we see @optimized_pages equal
+	 * to 0, all the vmemmap page remapping operations done by
+	 * alloc_huge_page_vmemmap() are visible too, so we can
+	 * safely disable the static key.
+	 */
+	table->extra1 = atomic_long_read_acquire(&optimized_pages) ?
+			SYSCTL_ONE : SYSCTL_ZERO;
+	ret = proc_do_static_key(table, write, buffer, length, ppos);
+	up_write(&sysctl_rwsem);
+
+	return ret;
+}
+
+static struct ctl_table hugetlb_vmemmap_sysctls[] = {
+	{
+		.procname = "hugetlb_free_vmemmap",
+		.data = &hugetlb_free_vmemmap_enabled_key.key,
+		.mode = 0644,
+		.proc_handler = hugetlb_free_vmemmap_handler,
+	},
+	{ }
+};
+
+static __init int hugetlb_vmemmap_sysctls_init(void)
+{
+	/*
+	 * The vmemmap pages cannot be optimized if
+	 * "memory_hotplug.memmap_on_memory" is enabled.
+	 */
+	if (!mhp_memmap_on_memory())
+		register_sysctl_init("vm", hugetlb_vmemmap_sysctls);
+
+	return 0;
+}
+late_initcall(hugetlb_vmemmap_sysctls_init);
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index cb2bef8f9e73..b67a159027f4 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -21,7 +21,9 @@ void hugetlb_vmemmap_init(struct hstate *h);
  */
 static inline unsigned int free_vmemmap_pages_per_hpage(struct hstate *h)
 {
-	return h->nr_free_vmemmap_pages;
+	if (hugetlb_free_vmemmap_enabled())
+		return h->nr_free_vmemmap_pages;
+	return 0;
 }
 #else
 static inline int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index da594b382829..793c04cfe46f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -63,15 +63,10 @@ static bool memmap_on_memory __ro_after_init;
 module_param_cb(memmap_on_memory, &memmap_on_memory_ops, &memmap_on_memory, 0444);
 MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
 
-static inline bool mhp_memmap_on_memory(void)
+bool mhp_memmap_on_memory(void)
 {
 	return memmap_on_memory;
 }
-#else
-static inline bool mhp_memmap_on_memory(void)
-{
-	return false;
-}
 #endif
 
 enum {
-- 
2.11.0
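
For illustration only (this is not part of the series), the runtime rule
that patches 3/4 and 4/4 implement together can be sketched as a user-space
model: the lower bound accepted by the sysctl is 1 while any optimized
HugeTLB page exists and 0 otherwise. The struct and the model_* names are
hypothetical, and locking and memory ordering are left out on purpose:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model of the state kept in mm/hugetlb_vmemmap.c. */
struct vmemmap_model {
	long optimized_pages;	/* models the optimized_pages counter */
	bool enabled;		/* models the static key */
};

/*
 * Models a write to vm.hugetlb_free_vmemmap: the lower bound is 1 while any
 * optimized HugeTLB page exists, 0 otherwise; the upper bound is always 1.
 * Out-of-range values are rejected and leave the state untouched.
 */
static int model_sysctl_write(struct vmemmap_model *m, int val)
{
	int min = m->optimized_pages ? 1 : 0;

	if (val < min || val > 1)
		return -EINVAL;
	m->enabled = val;
	return 0;
}

/* Models optimizing one HugeTLB page, which only happens while enabled. */
static void model_optimize_page(struct vmemmap_model *m)
{
	if (m->enabled)
		m->optimized_pages++;
}

/* Models restoring the vmemmap pages of one optimized HugeTLB page. */
static void model_restore_page(struct vmemmap_model *m)
{
	if (m->optimized_pages > 0)
		m->optimized_pages--;
}
```

In this model, writing 0 fails with -EINVAL until every optimized page has
been restored, which corresponds to the advice to set "nr_hugepages" to 0
and then retry.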