From: Muchun Song
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org, mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com, osalvador@suse.de, david@redhat.com, masahiroy@kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com, Muchun Song
Subject: [PATCH v12 1/7] mm: hugetlb_vmemmap: disable hugetlb_optimize_vmemmap when struct page crosses page boundaries
Date: Mon, 16 May 2022 18:22:05 +0800
Message-Id: <20220516102211.41557-2-songmuchun@bytedance.com>
In-Reply-To: <20220516102211.41557-1-songmuchun@bytedance.com>

If the size of "struct page" is not a power of two and the feature of
minimizing the overhead of struct page associated with each HugeTLB page
is enabled, the vmemmap pages of HugeTLB will be corrupted after the
remapping (in theory, a panic should be expected). This can only happen
with !CONFIG_MEMCG && !CONFIG_SLUB on x86_64, which is not a conventional
configuration nowadays, so it is not a real-world issue, just the result
of a code review. But we cannot prevent anyone from building that
combination, so hugetlb_optimize_vmemmap should be disabled in this case
to fix the issue.

Signed-off-by: Muchun Song
Reviewed-by: Mike Kravetz
Acked-by: David Hildenbrand
Reviewed-by: Oscar Salvador
---
 mm/hugetlb_vmemmap.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 29554c6ef2ae..6254bb2d4ae5 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -28,12 +28,6 @@ EXPORT_SYMBOL(hugetlb_optimize_vmemmap_key);
 
 static int __init hugetlb_vmemmap_early_param(char *buf)
 {
-	/* We cannot optimize if a "struct page" crosses page boundaries. */
-	if (!is_power_of_2(sizeof(struct page))) {
-		pr_warn("cannot free vmemmap pages because \"struct page\" crosses page boundaries\n");
-		return 0;
-	}
-
 	if (!buf)
 		return -EINVAL;
 
@@ -119,6 +113,12 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
 	if (!hugetlb_optimize_vmemmap_enabled())
 		return;
 
+	if (!is_power_of_2(sizeof(struct page))) {
+		pr_warn_once("cannot optimize vmemmap pages because \"struct page\" crosses page boundaries\n");
+		static_branch_disable(&hugetlb_optimize_vmemmap_key);
+		return;
+	}
+
 	vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT;
 	/*
 	 * The head page is not to be freed to buddy allocator, the other tail
-- 
2.11.0
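The invariant this patch relies on can be demonstrated outside the kernel:
if sizeof(struct page) is a power of two, then PAGE_SIZE (itself a power of
two) is an exact multiple of it, so no struct page can straddle a vmemmap
page boundary. Below is a minimal userspace sketch of that check; the 4 KiB
PAGE_SIZE and the value 56 (standing in for a non-power-of-two struct page
layout) are illustrative assumptions, not the exact sizes of any particular
config:

#include <stdio.h>
#include <stdbool.h>

#define PAGE_SIZE 4096UL

/* Same test as the kernel's is_power_of_2(). */
static bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

int main(void)
{
	/* 64 bytes is the usual x86_64 struct page size; 56 is a
	 * hypothetical non-power-of-two size for illustration. */
	unsigned long sizes[] = { 64, 56 };

	for (int i = 0; i < 2; i++) {
		unsigned long sz = sizes[i];

		printf("size %2lu: power of 2? %d, can straddle a page boundary? %d\n",
		       sz, is_power_of_2(sz), PAGE_SIZE % sz != 0);
	}
	return 0;
}

With 64 bytes, 4096 % 64 == 0 and every struct page sits inside one vmemmap
page; with 56 bytes, 4096 % 56 != 0 and some struct page inevitably spans
two, which is the corruption scenario the commit message describes.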
From: Muchun Song
Subject: [PATCH v12 2/7] mm: hugetlb_vmemmap: use kstrtobool for hugetlb_vmemmap param parsing
Date: Mon, 16 May 2022 18:22:06 +0800
Message-Id: <20220516102211.41557-3-songmuchun@bytedance.com>
In-Reply-To: <20220516102211.41557-1-songmuchun@bytedance.com>

Use kstrtobool() rather than open-coding the "on" and "off" parsing in
mm/hugetlb_vmemmap.c; it also handles the other accepted spellings such
as 'Yy1Nn0' or [oO][NnFf] for "on" and "off".

Signed-off-by: Muchun Song
Reviewed-by: Mike Kravetz
Acked-by: David Hildenbrand
Reviewed-by: Oscar Salvador
---
 mm/hugetlb_vmemmap.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 6254bb2d4ae5..cc4ec752ec16 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -28,15 +28,15 @@ EXPORT_SYMBOL(hugetlb_optimize_vmemmap_key);
 
 static int __init hugetlb_vmemmap_early_param(char *buf)
 {
-	if (!buf)
+	bool enable;
+
+	if (kstrtobool(buf, &enable))
 		return -EINVAL;
 
-	if (!strcmp(buf, "on"))
+	if (enable)
 		static_branch_enable(&hugetlb_optimize_vmemmap_key);
-	else if (!strcmp(buf, "off"))
-		static_branch_disable(&hugetlb_optimize_vmemmap_key);
 	else
-		return -EINVAL;
+		static_branch_disable(&hugetlb_optimize_vmemmap_key);
 
 	return 0;
 }
-- 
2.11.0
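For reference, the accept set of kstrtobool() is wider than the old
strcmp()-based parsing. The sketch below is a userspace approximation of
the behaviour of lib/kstrtox.c's kstrtobool(), written for illustration
rather than copied from the kernel:

#include <stdio.h>
#include <stdbool.h>

/* Approximation of the kernel's kstrtobool() accept set. */
static int kstrtobool_model(const char *s, bool *res)
{
	if (!s)
		return -1;	/* -EINVAL in the kernel */
	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		switch (s[1]) {
		case 'n': case 'N':
			*res = true;
			return 0;
		case 'f': case 'F':
			*res = false;
			return 0;
		}
	}
	return -1;
}

int main(void)
{
	const char *inputs[] = { "on", "off", "Y", "0", "bogus" };

	for (int i = 0; i < 5; i++) {
		bool v;
		int err = kstrtobool_model(inputs[i], &v);

		printf("%-5s -> %s\n", inputs[i],
		       err ? "-EINVAL" : (v ? "true" : "false"));
	}
	return 0;
}

So "hugetlb_free_vmemmap=Y" and "hugetlb_free_vmemmap=0" become valid
spellings in addition to "on" and "off".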
From: Muchun Song
Subject: [PATCH v12 3/7] mm: memory_hotplug: enumerate all supported section flags
Date: Mon, 16 May 2022 18:22:07 +0800
Message-Id: <20220516102211.41557-4-songmuchun@bytedance.com>
In-Reply-To: <20220516102211.41557-1-songmuchun@bytedance.com>

We are almost running out of section flag bits: only one is available in
the worst case (powerpc with 256k pages). However, other architectures
still have free slots (e.g. x86_64 has 10 bits available, arm64 has 8 bits
available in its worst case of 64K pages). Those numbers are hard-coded,
which makes it inconvenient to use the spare bits on architectures other
than powerpc. So convert the section flags to an enumeration to make it
easy to add new flags in the future. Also, move SECTION_TAINT_ZONE_DEVICE
under CONFIG_ZONE_DEVICE to save a bit in the non-zone-device case.

Signed-off-by: Muchun Song
---
 include/linux/kconfig.h |  1 +
 include/linux/mmzone.h  | 37 +++++++++++++++++++++++++++++--------
 mm/memory_hotplug.c     |  6 ++++++
 3 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/include/linux/kconfig.h b/include/linux/kconfig.h
index 20d1079e92b4..7044032b9f42 100644
--- a/include/linux/kconfig.h
+++ b/include/linux/kconfig.h
@@ -10,6 +10,7 @@
 #define __LITTLE_ENDIAN 1234
 #endif
 
+#define __ARG_PLACEHOLDER_ 0,
 #define __ARG_PLACEHOLDER_1 0,
 #define __take_second_arg(__ignored, val, ...) val
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index aab70355d64f..af057e20b9d7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1418,16 +1418,37 @@ extern size_t mem_section_usage_size(void);
  *      (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
  *      worst combination is powerpc with 256k pages,
  *      which results in PFN_SECTION_SHIFT equal 6.
- * To sum it up, at least 6 bits are available.
+ * To sum it up, at least 6 bits are available on all architectures.
+ * However, we can exceed 6 bits on some other architectures except
+ * powerpc (e.g. 15 bits are available on x86_64, 13 bits are available
+ * with the worst case of 64K pages on arm64) if we make sure the
+ * exceeded bit is not applicable to powerpc.
  */
-#define SECTION_MARKED_PRESENT		(1UL<<0)
-#define SECTION_HAS_MEM_MAP		(1UL<<1)
-#define SECTION_IS_ONLINE		(1UL<<2)
-#define SECTION_IS_EARLY		(1UL<<3)
-#define SECTION_TAINT_ZONE_DEVICE	(1UL<<4)
-#define SECTION_MAP_LAST_BIT		(1UL<<5)
+#define ENUM_SECTION_FLAG(MAPPER)				\
+	MAPPER(MARKED_PRESENT)					\
+	MAPPER(HAS_MEM_MAP)					\
+	MAPPER(IS_ONLINE)					\
+	MAPPER(IS_EARLY)					\
+	MAPPER(TAINT_ZONE_DEVICE, CONFIG_ZONE_DEVICE)		\
+	MAPPER(MAP_LAST_BIT)
+
+#define __SECTION_SHIFT_FLAG_MAPPER_0(x)
+#define __SECTION_SHIFT_FLAG_MAPPER_1(x)	SECTION_##x##_SHIFT,
+#define __SECTION_SHIFT_FLAG_MAPPER(x, ...)	\
+	__PASTE(__SECTION_SHIFT_FLAG_MAPPER_, IS_ENABLED(__VA_ARGS__))(x)
+
+#define __SECTION_FLAG_MAPPER_0(x)
+#define __SECTION_FLAG_MAPPER_1(x)	SECTION_##x = BIT(SECTION_##x##_SHIFT),
+#define __SECTION_FLAG_MAPPER(x, ...)	\
+	__PASTE(__SECTION_FLAG_MAPPER_, IS_ENABLED(__VA_ARGS__))(x)
+
+enum {
+	ENUM_SECTION_FLAG(__SECTION_SHIFT_FLAG_MAPPER)
+	ENUM_SECTION_FLAG(__SECTION_FLAG_MAPPER)
+};
+
 #define SECTION_MAP_MASK	(~(SECTION_MAP_LAST_BIT-1))
-#define SECTION_NID_SHIFT	6
+#define SECTION_NID_SHIFT	SECTION_MAP_LAST_BIT_SHIFT
 
 static inline struct page *__section_mem_map_addr(struct mem_section *section)
 {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 111684878fd9..aef3f041dec7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -655,12 +655,18 @@ static void __meminit resize_pgdat_range(struct pglist_data *pgdat, unsigned lon
 
 }
 
+#ifdef CONFIG_ZONE_DEVICE
 static void section_taint_zone_device(unsigned long pfn)
 {
 	struct mem_section *ms = __pfn_to_section(pfn);
 
 	ms->section_mem_map |= SECTION_TAINT_ZONE_DEVICE;
 }
+#else
+static inline void section_taint_zone_device(unsigned long pfn)
+{
+}
+#endif
 
 /*
  * Associate the pfn range with the given zone, initializing the memmaps
-- 
2.11.0
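The way a flag with no config argument still gets a bit may not be obvious:
with the new empty __ARG_PLACEHOLDER_, IS_ENABLED() expands to 1 when it is
invoked with no argument at all. The standalone program below is an
illustration that compiles with gcc -std=gnu99; it uses simplified copies
of the kconfig.h helpers rather than the real header, and a cut-down flag
list, with CONFIG_ZONE_DEVICE deliberately left undefined:

#include <stdio.h>

/* Simplified from include/linux/kconfig.h.  The empty-name placeholder on
 * the first line is what this patch adds: it makes IS_ENABLED() with no
 * argument expand to 1, so flags declared without a config dependency are
 * always emitted. */
#define __ARG_PLACEHOLDER_ 0,
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define __is_defined(x)			___is_defined(x)
#define ___is_defined(val)		____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(arg1_or_junk)	__take_second_arg(arg1_or_junk 1, 0)
#define IS_ENABLED(option)		__is_defined(option)

#define ___PASTE(a, b)	a##b
#define __PASTE(a, b)	___PASTE(a, b)
#define BIT(n)		(1UL << (n))

#define ENUM_SECTION_FLAG(MAPPER)			\
	MAPPER(MARKED_PRESENT)				\
	MAPPER(HAS_MEM_MAP)				\
	MAPPER(TAINT_ZONE_DEVICE, CONFIG_ZONE_DEVICE)	\
	MAPPER(MAP_LAST_BIT)

#define __SHIFT_MAPPER_0(x)
#define __SHIFT_MAPPER_1(x) SECTION_##x##_SHIFT,
#define __SHIFT_MAPPER(x, ...) __PASTE(__SHIFT_MAPPER_, IS_ENABLED(__VA_ARGS__))(x)

#define __FLAG_MAPPER_0(x)
#define __FLAG_MAPPER_1(x) SECTION_##x = BIT(SECTION_##x##_SHIFT),
#define __FLAG_MAPPER(x, ...) __PASTE(__FLAG_MAPPER_, IS_ENABLED(__VA_ARGS__))(x)

enum {
	/* First pass: 0, 1, 2, ... for every *enabled* flag. */
	ENUM_SECTION_FLAG(__SHIFT_MAPPER)
	/* Second pass: SECTION_x = BIT(shift) for every enabled flag. */
	ENUM_SECTION_FLAG(__FLAG_MAPPER)
};

int main(void)
{
	printf("SECTION_MARKED_PRESENT = %d\n", SECTION_MARKED_PRESENT);
	printf("SECTION_HAS_MEM_MAP    = %d\n", SECTION_HAS_MEM_MAP);
	/* TAINT_ZONE_DEVICE was skipped, so MAP_LAST_BIT packs down. */
	printf("SECTION_MAP_LAST_BIT   = %d\n", SECTION_MAP_LAST_BIT);
	return 0;
}

Built as-is, this prints 1, 2 and 4: the disabled flag costs no bit and
SECTION_MAP_LAST_BIT lands on bit 2 instead of bit 3.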
From: Muchun Song
Subject: [PATCH v12 4/7] mm: hotplug: introduce SECTION_CANNOT_OPTIMIZE_VMEMMAP
Date: Mon, 16 May 2022 18:22:08 +0800
Message-Id: <20220516102211.41557-5-songmuchun@bytedance.com>
In-Reply-To: <20220516102211.41557-1-songmuchun@bytedance.com>

For now, hugetlb_free_vmemmap is not compatible with
memory_hotplug.memmap_on_memory, and hugetlb_free_vmemmap takes precedence
over memory_hotplug.memmap_on_memory. However, some users want
memory_hotplug.memmap_on_memory to take precedence over
hugetlb_free_vmemmap, since memmap_on_memory makes memory hotplug more
likely to succeed in close-to-OOM situations. So hard-wiring
hugetlb_free_vmemmap to win is neither wise nor elegant. The proper
approach is to have hugetlb_vmemmap.c check whether the sections that the
HugeTLB pages belong to can be optimized: if a section's vmemmap pages
were allocated from the added memory block itself, hugetlb_free_vmemmap
should refuse to optimize the vmemmap; otherwise it may do the
optimization. Then both kernel parameters become compatible. So this
patch introduces SECTION_CANNOT_OPTIMIZE_VMEMMAP to indicate whether a
section can be optimized.

Signed-off-by: Muchun Song
---
 Documentation/admin-guide/kernel-parameters.txt | 22 +++++++++++-----------
 include/linux/mmzone.h                          | 17 +++++++++++++++++
 mm/hugetlb_vmemmap.c                            | 16 +++++++++++++++-
 mm/memory_hotplug.c                             |  1 -
 mm/sparse.c                                     |  7 +++++++
 5 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 308da668bbb1..a0a014f2104c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1711,9 +1711,11 @@
 			Built with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=y,
 			the default is on.
 
-			This is not compatible with memory_hotplug.memmap_on_memory.
-			If both parameters are enabled, hugetlb_free_vmemmap takes
-			precedence over memory_hotplug.memmap_on_memory.
+			Note that the vmemmap pages may be allocated from the added
+			memory block itself when memory_hotplug.memmap_on_memory is
+			enabled; those vmemmap pages cannot be optimized even if this
+			feature is enabled. Other vmemmap pages not allocated from
+			the added memory block itself are not affected.
 
 	hung_task_panic=
 			[KNL] Should the hung task detector generate panics.
@@ -3038,10 +3040,12 @@
 			[KNL,X86,ARM] Boolean flag to enable this feature.
 			Format: {on | off (default)}
 			When enabled, runtime hotplugged memory will
-			allocate its internal metadata (struct pages)
-			from the hotadded memory which will allow to
-			hotadd a lot of memory without requiring
-			additional memory to do so.
+			allocate its internal metadata (struct pages,
+			those vmemmap pages cannot be optimized even
+			if hugetlb_free_vmemmap is enabled) from the
+			hotadded memory which will allow to hotadd a
+			lot of memory without requiring additional
+			memory to do so.
 			This feature is disabled by default because it
 			has some implication on large (e.g. GB)
 			allocations in some configurations (e.g. small
@@ -3051,10 +3055,6 @@
 			Note that even when enabled, there are a few cases where
 			the feature is not effective.
 
-			This is not compatible with hugetlb_free_vmemmap. If
-			both parameters are enabled, hugetlb_free_vmemmap takes
-			precedence over memory_hotplug.memmap_on_memory.
-
 	memtest=	[KNL,X86,ARM,M68K,PPC,RISCV] Enable memtest
 			Format:
 			default : 0
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index af057e20b9d7..7b69acc5c2a9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1430,6 +1430,7 @@ extern size_t mem_section_usage_size(void);
 	MAPPER(IS_ONLINE)					\
 	MAPPER(IS_EARLY)					\
 	MAPPER(TAINT_ZONE_DEVICE, CONFIG_ZONE_DEVICE)		\
+	MAPPER(CANNOT_OPTIMIZE_VMEMMAP, CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP)	\
 	MAPPER(MAP_LAST_BIT)
 
 #define __SECTION_SHIFT_FLAG_MAPPER_0(x)
@@ -1457,6 +1458,22 @@ static inline struct page *__section_mem_map_addr(struct mem_section *section)
 	return (struct page *)map;
 }
 
+#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
+static inline void section_mark_cannot_optimize_vmemmap(struct mem_section *ms)
+{
+	ms->section_mem_map |= SECTION_CANNOT_OPTIMIZE_VMEMMAP;
+}
+
+static inline int section_cannot_optimize_vmemmap(struct mem_section *ms)
+{
+	return (ms && (ms->section_mem_map & SECTION_CANNOT_OPTIMIZE_VMEMMAP));
+}
+#else
+static inline void section_mark_cannot_optimize_vmemmap(struct mem_section *ms)
+{
+}
+#endif
+
 static inline int present_section(struct mem_section *section)
 {
 	return (section && (section->section_mem_map & SECTION_MARKED_PRESENT));
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index cc4ec752ec16..970c36b8935f 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -75,12 +75,26 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
 	return ret;
 }
 
+static unsigned int optimizable_vmemmap_pages(struct hstate *h,
+					      struct page *head)
+{
+	unsigned long pfn = page_to_pfn(head);
+	unsigned long end = pfn + pages_per_huge_page(h);
+
+	for (; pfn < end; pfn += PAGES_PER_SECTION) {
+		if (section_cannot_optimize_vmemmap(__pfn_to_section(pfn)))
+			return 0;
+	}
+
+	return hugetlb_optimize_vmemmap_pages(h);
+}
+
 void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
 {
 	unsigned long vmemmap_addr = (unsigned long)head;
 	unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;
 
-	vmemmap_pages = hugetlb_optimize_vmemmap_pages(h);
+	vmemmap_pages = optimizable_vmemmap_pages(h, head);
 	if (!vmemmap_pages)
 		return;
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index aef3f041dec7..1d0225d57166 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1270,7 +1270,6 @@ bool mhp_supports_memmap_on_memory(unsigned long size)
 	 * populate a single PMD.
 	 */
 	return memmap_on_memory &&
-	       !hugetlb_optimize_vmemmap_enabled() &&
 	       IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
 	       size == memory_block_size_bytes() &&
 	       IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
diff --git a/mm/sparse.c b/mm/sparse.c
index d2d76d158b39..8197ef9b7c4c 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -913,6 +913,13 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 	ms = __nr_to_section(section_nr);
 	set_section_nid(section_nr, nid);
 	__section_mark_present(ms, section_nr);
+	/*
+	 * Mark whole section as non-optimizable once there is a subsection
+	 * whose vmemmap pages are allocated from alternative allocator. The
+	 * early section is always optimizable.
+	 */
+	if (!early_section(ms) && altmap)
+		section_mark_cannot_optimize_vmemmap(ms);
 
 	/* Align memmap to section boundary in the subsection case */
 	if (section_nr_to_pfn(section_nr) != start_pfn)
-- 
2.11.0
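To see the effect of the new per-section veto, here is a userspace model of
the optimizable_vmemmap_pages() walk; the section size and the toy bitmap
standing in for mem_section state are illustrative assumptions:

#include <stdio.h>
#include <stdbool.h>

#define PAGES_PER_SECTION	32768UL	/* 128 MiB sections, 4 KiB pages */

static bool section_cannot_optimize[8];	/* toy "mem_section" state */

static bool hugepage_optimizable(unsigned long head_pfn, unsigned long nr_pages)
{
	unsigned long pfn, end = head_pfn + nr_pages;

	/* Mirrors optimizable_vmemmap_pages(): one vetoing section
	 * is enough to refuse the whole HugeTLB page. */
	for (pfn = head_pfn; pfn < end; pfn += PAGES_PER_SECTION)
		if (section_cannot_optimize[pfn / PAGES_PER_SECTION])
			return false;
	return true;
}

int main(void)
{
	/* Pretend section 1 got its vmemmap from an altmap, i.e. from
	 * the hot-added memory block itself. */
	section_cannot_optimize[1] = true;

	/* A 1 GiB HugeTLB page spanning sections 0..7 is vetoed ... */
	printf("1 GiB page optimizable: %d\n", hugepage_optimizable(0, 262144));
	/* ... while a 2 MiB page entirely inside section 0 is fine. */
	printf("2 MiB page optimizable: %d\n", hugepage_optimizable(0, 512));
	return 0;
}

A single vetoing section keeps the whole HugeTLB page unoptimized, which is
exactly what makes the two boot parameters composable.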
From: Muchun Song
Subject: [PATCH v12 5/7] mm: hugetlb_vmemmap: remove hugetlb_optimize_vmemmap_enabled()
Date: Mon, 16 May 2022 18:22:09 +0800
Message-Id: <20220516102211.41557-6-songmuchun@bytedance.com>
In-Reply-To: <20220516102211.41557-1-songmuchun@bytedance.com>

There is only one user of hugetlb_optimize_vmemmap_enabled() outside of
hugetlb_vmemmap: flush_dcache_page() in arch/arm64/mm/flush.c. However,
flush_dcache_page() does not need to call it, since HugeTLB pages are
always fully mapped and only the head page is set PG_dcache_clean, meaning
only the head page's flag may need to be cleared (see commit
cf5a501d985b). After this change there are no users of
hugetlb_optimize_vmemmap_enabled() outside of hugetlb_vmemmap, so remove
it to simplify the code.

Signed-off-by: Muchun Song
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Anshuman Khandual
---
 arch/arm64/mm/flush.c      | 13 +++----------
 include/linux/page-flags.h | 14 ++------------
 mm/hugetlb_vmemmap.c       |  3 ++-
 3 files changed, 7 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index fc4f710e9820..5f9379b3c8c8 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -76,17 +76,10 @@ EXPORT_SYMBOL_GPL(__sync_icache_dcache);
 void flush_dcache_page(struct page *page)
 {
 	/*
-	 * Only the head page's flags of HugeTLB can be cleared since the tail
-	 * vmemmap pages associated with each HugeTLB page are mapped with
-	 * read-only when CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is enabled (more
-	 * details can refer to vmemmap_remap_pte()). Although
-	 * __sync_icache_dcache() only set PG_dcache_clean flag on the head
-	 * page struct, there is more than one page struct with PG_dcache_clean
-	 * associated with the HugeTLB page since the head vmemmap page frame
-	 * is reused (more details can refer to the comments above
-	 * page_fixed_fake_head()).
+	 * HugeTLB pages are always fully mapped and only head page will be
+	 * set PG_dcache_clean (see comments in __sync_icache_dcache()).
 	 */
-	if (hugetlb_optimize_vmemmap_enabled() && PageHuge(page))
+	if (PageHuge(page))
 		page = compound_head(page);
 
 	if (test_bit(PG_dcache_clean, &page->flags))
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index b70124b9c7c1..404f4ede17f5 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -203,12 +203,6 @@ enum pageflags {
 DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
 			 hugetlb_optimize_vmemmap_key);
 
-static __always_inline bool hugetlb_optimize_vmemmap_enabled(void)
-{
-	return static_branch_maybe(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
-				   &hugetlb_optimize_vmemmap_key);
-}
-
 /*
  * If the feature of optimizing vmemmap pages associated with each HugeTLB
  * page is enabled, the head vmemmap page frame is reused and all of the tail
@@ -227,7 +221,8 @@ static __always_inline bool hugetlb_optimize_vmemmap_enabled(void)
  */
 static __always_inline const struct page *page_fixed_fake_head(const struct page *page)
 {
-	if (!hugetlb_optimize_vmemmap_enabled())
+	if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
+				 &hugetlb_optimize_vmemmap_key))
 		return page;
 
 	/*
@@ -255,11 +250,6 @@ static inline const struct page *page_fixed_fake_head(const struct page *page)
 {
 	return page;
 }
-
-static inline bool hugetlb_optimize_vmemmap_enabled(void)
-{
-	return false;
-}
 #endif
 
 static __always_inline int page_is_fake_head(struct page *page)
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 970c36b8935f..d1fea65fec98 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -124,7 +124,8 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
 	BUILD_BUG_ON(__NR_USED_SUBPAGE >=
 		     RESERVE_VMEMMAP_SIZE / sizeof(struct page));
 
-	if (!hugetlb_optimize_vmemmap_enabled())
+	if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
+				 &hugetlb_optimize_vmemmap_key))
 		return;
 
 	if (!is_power_of_2(sizeof(struct page))) {
-- 
2.11.0
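As an aside, the reasoning in the commit message can be modelled in a few
lines: since only the head page carries PG_dcache_clean, redirecting any
HugeTLB page to its head is always correct, with or without the static-key
check this patch removes. The struct below is a toy stand-in, not the
kernel's struct page:

#include <stdio.h>
#include <stdbool.h>

struct page {
	bool dcache_clean;	/* models PG_dcache_clean */
	struct page *head;	/* models compound_head() */
};

static void flush_dcache_page(struct page *page)
{
	/* After the patch: unconditionally use the head page. This is
	 * safe whether or not the vmemmap of the page was optimized. */
	page = page->head;

	if (page->dcache_clean)
		page->dcache_clean = false;	/* mark dirty on head only */
}

int main(void)
{
	struct page head = { .dcache_clean = true };
	struct page tail = { .dcache_clean = false };

	head.head = &head;
	tail.head = &head;

	/* Flushing through a tail page must clear the head's flag. */
	flush_dcache_page(&tail);
	printf("head PG_dcache_clean after flush: %d\n", head.dcache_clean);
	return 0;
}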
From: Muchun Song
Subject: [PATCH v12 6/7] sysctl: handle table->maxlen properly for proc_dobool
Date: Mon, 16 May 2022 18:22:10 +0800
Message-Id: <20220516102211.41557-7-songmuchun@bytedance.com>
In-Reply-To: <20220516102211.41557-1-songmuchun@bytedance.com>

Setting ->proc_handler to proc_dobool while setting ->maxlen to
sizeof(int) is counter-intuitive and easy to get wrong. For robustness,
fix it by handling table->maxlen properly for proc_dobool in
__do_proc_dointvec(). The next patch will use proc_dobool and depends on
this change.

Signed-off-by: Muchun Song
Cc: Luis Chamberlain
Cc: Kees Cook
Cc: Iurii Zaikin
---
 fs/lockd/svc.c  |  2 +-
 kernel/sysctl.c | 22 ++++++++++++----------
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 59ef8a1f843f..6e48ee787f49 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -496,7 +496,7 @@ static struct ctl_table nlm_sysctls[] = {
 	{
 		.procname	= "nsm_use_hostnames",
 		.data		= &nsm_use_hostnames,
-		.maxlen		= sizeof(int),
+		.maxlen		= sizeof(nsm_use_hostnames),
 		.mode		= 0644,
 		.proc_handler	= proc_dobool,
 	},
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index e52b6e372c60..353fb9093012 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -428,6 +428,8 @@ static int do_proc_dobool_conv(bool *negp, unsigned long *lvalp,
 			       int write, void *data)
 {
 	if (write) {
+		if (*negp || (*lvalp != 0 && *lvalp != 1))
+			return -EINVAL;
 		*(bool *)valp = *lvalp;
 	} else {
 		int val = *(bool *)valp;
@@ -489,17 +491,17 @@ static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table,
 			      int write, void *data),
 		  void *data)
 {
-	int *i, vleft, first = 1, err = 0;
+	int vleft, first = 1, err = 0, size;
 	size_t left;
 	char *p;
-	
+
 	if (!tbl_data || !table->maxlen || !*lenp || (*ppos && !write)) {
 		*lenp = 0;
 		return 0;
 	}
-	
-	i = (int *) tbl_data;
-	vleft = table->maxlen / sizeof(*i);
+
+	size = conv == do_proc_dobool_conv ? sizeof(bool) : sizeof(int);
+	vleft = table->maxlen / size;
 	left = *lenp;
 
 	if (!conv)
@@ -514,7 +516,7 @@ static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table,
 		p = buffer;
 	}
 
-	for (; left && vleft--; i++, first=0) {
+	for (; left && vleft--; tbl_data = (char *)tbl_data + size, first=0) {
 		unsigned long lval;
 		bool neg;
 
@@ -528,12 +530,12 @@ static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table,
 					     sizeof(proc_wspace_sep), NULL);
 			if (err)
 				break;
-			if (conv(&neg, &lval, i, 1, data)) {
+			if (conv(&neg, &lval, tbl_data, 1, data)) {
 				err = -EINVAL;
 				break;
 			}
 		} else {
-			if (conv(&neg, &lval, i, 0, data)) {
+			if (conv(&neg, &lval, tbl_data, 0, data)) {
 				err = -EINVAL;
 				break;
 			}
@@ -708,8 +710,8 @@ int do_proc_douintvec(struct ctl_table *table, int write,
  * @lenp: the size of the user buffer
  * @ppos: file position
  *
- * Reads/writes up to table->maxlen/sizeof(unsigned int) integer
- * values from/to the user buffer, treated as an ASCII string.
+ * Reads/writes up to table->maxlen/sizeof(bool) bool values from/to
+ * the user buffer, treated as an ASCII string.
 *
 * Returns 0 on success.
 */
-- 
2.11.0
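The stricter write-side conversion added to do_proc_dobool_conv() is easy
to model in userspace; here -1 stands in for -EINVAL:

#include <stdio.h>
#include <stdbool.h>

/* Mirrors the new write path: reject negatives and anything but 0/1. */
static int do_proc_dobool_conv_model(bool negp, unsigned long lval, bool *valp)
{
	if (negp || (lval != 0 && lval != 1))
		return -1;	/* -EINVAL in the kernel */
	*valp = lval;
	return 0;
}

int main(void)
{
	bool v;

	printf("write 1  -> %d\n", do_proc_dobool_conv_model(false, 1, &v));
	printf("write 2  -> %d\n", do_proc_dobool_conv_model(false, 2, &v));
	printf("write -1 -> %d\n", do_proc_dobool_conv_model(true, 1, &v));
	return 0;
}

This matches the new behaviour where writing e.g. "2" to a bool sysctl is
rejected instead of being silently truncated into the bool.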
From: Muchun Song
Subject: [PATCH v12 7/7] mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl
Date: Mon, 16 May 2022 18:22:11 +0800
Message-Id: <20220516102211.41557-8-songmuchun@bytedance.com>
In-Reply-To: <20220516102211.41557-1-songmuchun@bytedance.com>

We must add hugetlb_free_vmemmap=on (or "off") to the boot cmdline and
reboot the server to enable or disable the feature of optimizing vmemmap
pages associated with HugeTLB pages. However, rebooting usually takes a
long time. So add a sysctl to enable or disable the feature at runtime
without rebooting.

Why do we need this? There are three use cases.

1) The feature of minimizing overhead of struct page associated with each
HugeTLB page is disabled by default unless "hugetlb_free_vmemmap=on" is
passed on the boot cmdline. When we (ByteDance) deliver servers to users
who want to enable this feature, they have to configure grub (change the
boot cmdline) and reboot the servers, and rebooting usually takes a long
time (we have thousands of servers). It is a very bad experience for the
users. So we need an approach to enable this feature after booting. This
is a use case from our production environment.

2) Some workloads allocate HugeTLB pages 'on the fly' instead of pulling
them from the HugeTLB pool; those workloads would be affected with this
feature enabled. They can be identified by the fact that they never
explicitly allocate huge pages via 'nr_hugepages' but only set
'nr_overcommit_hugepages' and then let the pages be allocated from the
buddy allocator at fault time. Commit 099730d67417 confirms this is a real
use case. For those workloads, the page fault time could be ~2x slower
than before. We suspect those users would want to disable this feature if
the system has enabled it and they do not think the memory savings are
enough to make up for the performance drop.

3) The workload that wants vmemmap pages to be optimized and the workload
that wants to set 'nr_overcommit_hugepages' without the extra overhead at
fault time when the overcommitted pages are allocated from the buddy
allocator may be deployed on the same server. The user could enable this
feature, set 'nr_hugepages' and 'nr_overcommit_hugepages', and then
disable the feature. In this case, the overcommitted HugeTLB pages will
not encounter the extra overhead at fault time.

Signed-off-by: Muchun Song
---
 Documentation/admin-guide/sysctl/vm.rst | 38 ++++++++++++++++++++
 include/linux/page-flags.h              |  6 ++--
 mm/hugetlb_vmemmap.c                    | 61 ++++++++++++++++++++-----------
 3 files changed, 81 insertions(+), 24 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 747e325ebcd0..d7374a1e8ac9 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -562,6 +562,44 @@ Change the minimum size of the hugepage pool.
 See Documentation/admin-guide/mm/hugetlbpage.rst
 
 
+hugetlb_optimize_vmemmap
+========================
+
+This knob is not available when the size of 'struct page' (a structure defined
+in include/linux/mm_types.h) is not a power of two (an unusual system
+configuration could result in this).
+
+Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap pages
+associated with each HugeTLB page.
+
+Once enabled, the vmemmap pages of subsequent allocation of HugeTLB pages from
+the buddy allocator will be optimized (7 pages per 2MB HugeTLB page and 4095
+pages per 1GB HugeTLB page), whereas already allocated HugeTLB pages will not
+be optimized.  When those optimized HugeTLB pages are freed from the HugeTLB
+pool to the buddy allocator, the vmemmap pages representing that range need to
+be remapped again and the vmemmap pages discarded earlier need to be
+reallocated again.  If your use case is that HugeTLB pages are allocated 'on
+the fly' (e.g. never explicitly allocating HugeTLB pages with 'nr_hugepages'
+but only setting 'nr_overcommit_hugepages', so that the overcommitted HugeTLB
+pages are allocated 'on the fly') instead of being pulled from the HugeTLB
+pool, you should weigh the benefits of memory savings against the extra
+overhead (~2x slower than before) of allocating or freeing HugeTLB pages
+between the HugeTLB pool and the buddy allocator.  Another behavior to note is
+that if the system is under heavy memory pressure, it could prevent the user
+from freeing HugeTLB pages from the HugeTLB pool to the buddy allocator, since
+the allocation of vmemmap pages could fail; you have to retry later if your
+system encounters this situation.
+
+Once disabled, the vmemmap pages of subsequent allocation of HugeTLB pages
+from the buddy allocator will not be optimized, meaning the extra overhead at
+allocation time from the buddy allocator disappears, whereas already optimized
+HugeTLB pages will not be affected.  If you want to make sure there are no
+optimized HugeTLB pages, you can set "nr_hugepages" to 0 first and then
+disable this.  Note that writing 0 to nr_hugepages will make any "in use"
+HugeTLB pages become surplus pages.  So, those surplus pages are still
+optimized until they are no longer in use.  You would need to wait for those
+surplus pages to be released before there are no optimized pages in the
+system.
+
+
 nr_hugepages_mempolicy
 ======================
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 404f4ede17f5..07d8d444d9f1 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -200,8 +200,7 @@ enum pageflags {
 #ifndef __GENERATING_BOUNDS_H
 
 #ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
-DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
-			 hugetlb_optimize_vmemmap_key);
+DECLARE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
 
 /*
  * If the feature of optimizing vmemmap pages associated with each HugeTLB
@@ -221,8 +220,7 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
  */
 static __always_inline const struct page *page_fixed_fake_head(const struct page *page)
 {
-	if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
-				 &hugetlb_optimize_vmemmap_key))
+	if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key))
 		return page;
 
 	/*
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index d1fea65fec98..02862f117c2b 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -22,23 +22,15 @@
 #define RESERVE_VMEMMAP_NR		1U
 #define RESERVE_VMEMMAP_SIZE		(RESERVE_VMEMMAP_NR << PAGE_SHIFT)
 
-DEFINE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
-			hugetlb_optimize_vmemmap_key);
+DEFINE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
 EXPORT_SYMBOL(hugetlb_optimize_vmemmap_key);
 
+static bool optimize_vmemmap_enabled =
+	IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON);
+
 static int __init hugetlb_vmemmap_early_param(char *buf)
 {
-	bool enable;
-
-	if (kstrtobool(buf, &enable))
-		return -EINVAL;
-
-	if (enable)
-		static_branch_enable(&hugetlb_optimize_vmemmap_key);
-	else
-		static_branch_disable(&hugetlb_optimize_vmemmap_key);
-
-	return 0;
+	return kstrtobool(buf, &optimize_vmemmap_enabled);
 }
 early_param("hugetlb_free_vmemmap", hugetlb_vmemmap_early_param);
 
@@ -69,8 +61,10 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
 	 */
 	ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
 				  GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
-	if (!ret)
+	if (!ret) {
 		ClearHPageVmemmapOptimized(head);
+		static_branch_dec(&hugetlb_optimize_vmemmap_key);
+	}
 
 	return ret;
 }
@@ -81,6 +75,9 @@ static unsigned int optimizable_vmemmap_pages(struct hstate *h,
 	unsigned long pfn = page_to_pfn(head);
 	unsigned long end = pfn + pages_per_huge_page(h);
 
+	if (!READ_ONCE(optimize_vmemmap_enabled))
+		return 0;
+
 	for (; pfn < end; pfn += PAGES_PER_SECTION) {
 		if (section_cannot_optimize_vmemmap(__pfn_to_section(pfn)))
 			return 0;
@@ -98,6 +95,8 @@ void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
 	if (!vmemmap_pages)
 		return;
 
+	static_branch_inc(&hugetlb_optimize_vmemmap_key);
+
 	vmemmap_addr += RESERVE_VMEMMAP_SIZE;
 	vmemmap_end = vmemmap_addr + (vmemmap_pages << PAGE_SHIFT);
 	vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
@@ -107,7 +106,9 @@ void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
 	 * to the page which @vmemmap_reuse is mapped to, then free the pages
 	 * which the range [@vmemmap_addr, @vmemmap_end] is mapped to.
 	 */
-	if (!vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse))
+	if (vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse))
+		static_branch_dec(&hugetlb_optimize_vmemmap_key);
+	else
 		SetHPageVmemmapOptimized(head);
 }
 
@@ -124,13 +125,8 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
 	BUILD_BUG_ON(__NR_USED_SUBPAGE >=
 		     RESERVE_VMEMMAP_SIZE / sizeof(struct page));
 
-	if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
-				 &hugetlb_optimize_vmemmap_key))
-		return;
-
 	if (!is_power_of_2(sizeof(struct page))) {
 		pr_warn_once("cannot optimize vmemmap pages because \"struct page\" crosses page boundaries\n");
-		static_branch_disable(&hugetlb_optimize_vmemmap_key);
 		return;
 	}
 
@@ -149,3 +145,28 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
 	pr_info("can optimize %d vmemmap pages for %s\n",
 		h->optimize_vmemmap_pages, h->name);
 }
+
+#ifdef CONFIG_PROC_SYSCTL
+static struct ctl_table hugetlb_vmemmap_sysctls[] = {
+	{
+		.procname	= "hugetlb_optimize_vmemmap",
+		.data		= &optimize_vmemmap_enabled,
+		.maxlen		= sizeof(optimize_vmemmap_enabled),
+		.mode		= 0644,
+		.proc_handler	= proc_dobool,
+	},
+	{ }
+};
+
+static int __init hugetlb_vmemmap_sysctls_init(void)
+{
+	/*
+	 * If "struct page" crosses page boundaries, the vmemmap pages cannot
+	 * be optimized.
+	 */
+	if (is_power_of_2(sizeof(struct page)))
+		register_sysctl_init("vm", hugetlb_vmemmap_sysctls);
+	return 0;
+}
+late_initcall(hugetlb_vmemmap_sysctls_init);
+#endif /* CONFIG_PROC_SYSCTL */
-- 
2.11.0
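Finally, a usage sketch. On a kernel with this series applied and
CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP=y, the knob can be flipped at runtime
by writing "0" or "1" to /proc/sys/vm/hugetlb_optimize_vmemmap; the C below
is equivalent to `sysctl -w vm.hugetlb_optimize_vmemmap=1` and is an
illustration, not part of the series:

#include <stdio.h>

int main(void)
{
	const char *path = "/proc/sys/vm/hugetlb_optimize_vmemmap";
	FILE *f = fopen(path, "w");	/* needs root */

	if (!f) {
		perror(path);
		return 1;
	}
	/* proc_dobool accepts only "0" or "1" (see patch 6). */
	fputs("1\n", f);
	fclose(f);

	/* Subsequently allocated HugeTLB pages will have their vmemmap
	 * optimized; already allocated ones are unaffected. */
	return 0;
}

Writes other than "0" or "1" are rejected with EINVAL by the proc_dobool
change in patch 6.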