From nobody Wed Dec 17 12:35:42 2025
From: Muchun Song
To: mike.kravetz@oracle.com, muchun.song@linux.dev, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Muchun Song
Subject: [PATCH 1/4] mm: pagewalk: assert write mmap lock only for walking the user page tables
Date: Mon, 27 Nov 2023 16:46:42 +0800
Message-Id: <20231127084645.27017-2-songmuchun@bytedance.com>
In-Reply-To: <20231127084645.27017-1-songmuchun@bytedance.com>
References: <20231127084645.27017-1-songmuchun@bytedance.com>

Commit 8782fb61cc848 ("mm: pagewalk: Fix race between unmap and page walker")
added an assertion to walk_page_range_novma() to make sure all users of the
page table walker are safe. However, the race only exists when walking the
user page tables, and it makes little sense to hold a particular process's
mmap write lock against changes to the kernel page tables. So only assert
that at least the mmap read lock is held when walking the kernel page tables.
Users matching this case can then downgrade to an mmap read lock to relieve
contention on the mmap lock of init_mm; hugetlb will do so (holding only the
mmap read lock) in the next patch.

Signed-off-by: Muchun Song
Acked-by: Mike Kravetz
---
 mm/pagewalk.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index b7d7e4fcfad7a..f46c80b18ce4f 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -539,6 +539,11 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
  * not backed by VMAs. Because 'unusual' entries may be walked this function
  * will also not lock the PTEs for the pte_entry() callback. This is useful for
  * walking the kernel pages tables or page tables for firmware.
+ *
+ * Note: Be careful when walking the kernel page tables; the caller may need
+ * to take other effective measures (the mmap lock may be insufficient) to
+ * prevent the intermediate kernel page tables belonging to the specified
+ * address range from being freed (e.g. by memory hot-remove).
  */
 int walk_page_range_novma(struct mm_struct *mm, unsigned long start,
                           unsigned long end, const struct mm_walk_ops *ops,
@@ -556,7 +561,29 @@ int walk_page_range_novma(struct mm_struct *mm, unsigned long start,
         if (start >= end || !walk.mm)
                 return -EINVAL;

-        mmap_assert_write_locked(walk.mm);
+        /*
+         * 1) For walking the user virtual address space:
+         *
+         *    The mmap lock protects the page walker from changes to the page
+         *    tables during the walk. However, a read lock is insufficient to
+         *    protect those areas which don't have a VMA as munmap() detaches
+         *    the VMAs before downgrading to a read lock and actually tearing
+         *    down PTEs/page tables. In that case, the mmap write lock must
+         *    be held.
+         *
+         * 2) For walking the kernel virtual address space:
+         *
+         *    Kernel intermediate page tables are usually not freed, so the
+         *    mmap read lock is sufficient. But there are some exceptions,
+         *    e.g. memory hot-remove, where the mmap lock is insufficient to
+         *    prevent the intermediate kernel page tables belonging to the
+         *    specified address range from being freed. The caller must take
+         *    other measures to prevent this race.
+         */
+        if (mm == &init_mm)
+                mmap_assert_locked(walk.mm);
+        else
+                mmap_assert_write_locked(walk.mm);

         return walk_pgd_range(start, end, &walk);
 }
-- 
2.20.1
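For illustration only (not part of this series): with the relaxed assertion, a walker that touches nothing but kernel page tables can run under the init_mm read lock. The callback and helper names below (count_pte_entry, count_kernel_ptes) are hypothetical; the walk_page_range_novma() and mm_walk_ops usage follows the interfaces touched by this patch, and the caller must still guarantee that the walked range's page tables cannot be freed underneath it (e.g. no concurrent memory hot-remove).

#include <linux/mm.h>
#include <linux/pagewalk.h>

/* Hypothetical pte-level callback: count populated kernel PTEs. */
static int count_pte_entry(pte_t *pte, unsigned long addr,
                           unsigned long next, struct mm_walk *walk)
{
        unsigned long *nr = walk->private;

        if (!pte_none(ptep_get(pte)))
                (*nr)++;
        return 0;
}

static const struct mm_walk_ops count_pte_ops = {
        .pte_entry = count_pte_entry,
};

/*
 * Walk a kernel virtual address range. After this patch the init_mm read
 * lock is enough; huge (leaf) PMD mappings are simply skipped since only a
 * pte_entry callback is provided.
 */
static unsigned long count_kernel_ptes(unsigned long start, unsigned long end)
{
        unsigned long nr = 0;

        mmap_read_lock(&init_mm);
        walk_page_range_novma(&init_mm, start, end, &count_pte_ops, NULL, &nr);
        mmap_read_unlock(&init_mm);

        return nr;
}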
From nobody Wed Dec 17 12:35:42 2025
From: Muchun Song
To: mike.kravetz@oracle.com, muchun.song@linux.dev, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Muchun Song
Subject: [PATCH 2/4] mm: hugetlb_vmemmap: use walk_page_range_novma() to simplify the code
Date: Mon, 27 Nov 2023 16:46:43 +0800
Message-Id: <20231127084645.27017-3-songmuchun@bytedance.com>
In-Reply-To: <20231127084645.27017-1-songmuchun@bytedance.com>
References: <20231127084645.27017-1-songmuchun@bytedance.com>

It is unnecessary to implement a series of dedicated page table walking
helpers since there is already a generic one, walk_page_range_novma(). Use
it to simplify the code.

Signed-off-by: Muchun Song
Reviewed-by: Mike Kravetz
---
 mm/hugetlb_vmemmap.c | 148 ++++++++++++-------------------------------
 1 file changed, 39 insertions(+), 109 deletions(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 87818ee7f01d7..ef14356855d13 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include <linux/pagewalk.h>
 #include
 #include
 #include "hugetlb_vmemmap.h"
@@ -45,21 +46,14 @@ struct vmemmap_remap_walk {
         unsigned long           flags;
 };

-static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start, bool flush)
+static int vmemmap_split_pmd(pmd_t *pmd, struct page *head, unsigned long start,
+                             struct vmemmap_remap_walk *walk)
 {
         pmd_t __pmd;
         int i;
         unsigned long addr = start;
-        struct page *head;
         pte_t *pgtable;

-        spin_lock(&init_mm.page_table_lock);
-        head = pmd_leaf(*pmd) ? pmd_page(*pmd) : NULL;
-        spin_unlock(&init_mm.page_table_lock);
-
-        if (!head)
-                return 0;
-
         pgtable = pte_alloc_one_kernel(&init_mm);
         if (!pgtable)
                 return -ENOMEM;
@@ -88,7 +82,7 @@ static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start, bool flush)
                 /* Make pte visible before pmd. See comment in pmd_install(). */
                 smp_wmb();
                 pmd_populate_kernel(&init_mm, pmd, pgtable);
-                if (flush)
+                if (!(walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH))
                         flush_tlb_kernel_range(start, start + PMD_SIZE);
         } else {
                 pte_free_kernel(&init_mm, pgtable);
@@ -98,123 +92,59 @@ static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start, bool flush)
         return 0;
 }

-static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
-                              unsigned long end,
-                              struct vmemmap_remap_walk *walk)
-{
-        pte_t *pte = pte_offset_kernel(pmd, addr);
-
-        /*
-         * The reuse_page is found 'first' in table walk before we start
-         * remapping (which is calling @walk->remap_pte).
-         */
-        if (!walk->reuse_page) {
-                walk->reuse_page = pte_page(ptep_get(pte));
-                /*
-                 * Because the reuse address is part of the range that we are
-                 * walking, skip the reuse address range.
-                 */
-                addr += PAGE_SIZE;
-                pte++;
-                walk->nr_walked++;
-        }
-
-        for (; addr != end; addr += PAGE_SIZE, pte++) {
-                walk->remap_pte(pte, addr, walk);
-                walk->nr_walked++;
-        }
-}
-
-static int vmemmap_pmd_range(pud_t *pud, unsigned long addr,
-                             unsigned long end,
-                             struct vmemmap_remap_walk *walk)
+static int vmemmap_pmd_entry(pmd_t *pmd, unsigned long addr,
+                             unsigned long next, struct mm_walk *walk)
 {
-        pmd_t *pmd;
-        unsigned long next;
-
-        pmd = pmd_offset(pud, addr);
-        do {
-                int ret;
-
-                ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK,
-                                !(walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH));
-                if (ret)
-                        return ret;
+        struct page *head;
+        struct vmemmap_remap_walk *vmemmap_walk = walk->private;

-                next = pmd_addr_end(addr, end);
+        /* Only splitting, not remapping the vmemmap pages. */
+        if (!vmemmap_walk->remap_pte)
+                walk->action = ACTION_CONTINUE;

-                /*
-                 * We are only splitting, not remapping the hugetlb vmemmap
-                 * pages.
-                 */
-                if (!walk->remap_pte)
-                        continue;
-
-                vmemmap_pte_range(pmd, addr, next, walk);
-        } while (pmd++, addr = next, addr != end);
+        spin_lock(&init_mm.page_table_lock);
+        head = pmd_leaf(*pmd) ? pmd_page(*pmd) : NULL;
+        spin_unlock(&init_mm.page_table_lock);
+        if (!head)
+                return 0;

-        return 0;
+        return vmemmap_split_pmd(pmd, head, addr & PMD_MASK, vmemmap_walk);
 }

-static int vmemmap_pud_range(p4d_t *p4d, unsigned long addr,
-                             unsigned long end,
-                             struct vmemmap_remap_walk *walk)
+static int vmemmap_pte_entry(pte_t *pte, unsigned long addr,
+                             unsigned long next, struct mm_walk *walk)
 {
-        pud_t *pud;
-        unsigned long next;
-
-        pud = pud_offset(p4d, addr);
-        do {
-                int ret;
+        struct vmemmap_remap_walk *vmemmap_walk = walk->private;

-                next = pud_addr_end(addr, end);
-                ret = vmemmap_pmd_range(pud, addr, next, walk);
-                if (ret)
-                        return ret;
-        } while (pud++, addr = next, addr != end);
+        /*
+         * The reuse_page is found 'first' in page table walking before
+         * starting remapping.
+         */
+        if (!vmemmap_walk->reuse_page)
+                vmemmap_walk->reuse_page = pte_page(ptep_get(pte));
+        else
+                vmemmap_walk->remap_pte(pte, addr, vmemmap_walk);
+        vmemmap_walk->nr_walked++;

         return 0;
 }

-static int vmemmap_p4d_range(pgd_t *pgd, unsigned long addr,
-                             unsigned long end,
-                             struct vmemmap_remap_walk *walk)
-{
-        p4d_t *p4d;
-        unsigned long next;
-
-        p4d = p4d_offset(pgd, addr);
-        do {
-                int ret;
-
-                next = p4d_addr_end(addr, end);
-                ret = vmemmap_pud_range(p4d, addr, next, walk);
-                if (ret)
-                        return ret;
-        } while (p4d++, addr = next, addr != end);
-
-        return 0;
-}
+static const struct mm_walk_ops vmemmap_remap_ops = {
+        .pmd_entry      = vmemmap_pmd_entry,
+        .pte_entry      = vmemmap_pte_entry,
+};

 static int vmemmap_remap_range(unsigned long start, unsigned long end,
                                struct vmemmap_remap_walk *walk)
 {
-        unsigned long addr = start;
-        unsigned long next;
-        pgd_t *pgd;
-
-        VM_BUG_ON(!PAGE_ALIGNED(start));
-        VM_BUG_ON(!PAGE_ALIGNED(end));
+        int ret;

-        pgd = pgd_offset_k(addr);
-        do {
-                int ret;
+        VM_BUG_ON(!PAGE_ALIGNED(start | end));

-                next = pgd_addr_end(addr, end);
-                ret = vmemmap_p4d_range(pgd, addr, next, walk);
-                if (ret)
-                        return ret;
-        } while (pgd++, addr = next, addr != end);
+        ret = walk_page_range_novma(&init_mm, start, end, &vmemmap_remap_ops,
+                                    NULL, walk);
+        if (ret)
+                return ret;

         if (walk->remap_pte && !(walk->flags & VMEMMAP_REMAP_NO_TLB_FLUSH))
                 flush_tlb_kernel_range(start, end);
-- 
2.20.1
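An aside on the pagewalk mechanics this conversion relies on (illustration only, hypothetical names, not part of the series): the caller's state travels through walk->private, and a pmd_entry callback can set walk->action = ACTION_CONTINUE to skip the pte_entry pass for that PMD, which is how the split-only case above avoids visiting individual PTEs.

#include <linux/pagewalk.h>

struct demo_walk_state {
        bool            pmd_only;       /* skip the PTE level when set */
        unsigned long   nr_pmds;
        unsigned long   nr_ptes;
};

static int demo_pmd_entry(pmd_t *pmd, unsigned long addr,
                          unsigned long next, struct mm_walk *walk)
{
        struct demo_walk_state *state = walk->private;

        state->nr_pmds++;
        /* ACTION_CONTINUE: stay at this level, do not call pte_entry below. */
        if (state->pmd_only)
                walk->action = ACTION_CONTINUE;
        return 0;
}

static int demo_pte_entry(pte_t *pte, unsigned long addr,
                          unsigned long next, struct mm_walk *walk)
{
        struct demo_walk_state *state = walk->private;

        state->nr_ptes++;
        return 0;
}

static const struct mm_walk_ops demo_walk_ops = {
        .pmd_entry      = demo_pmd_entry,
        .pte_entry      = demo_pte_entry,
};

Passed to walk_page_range_novma(&init_mm, start, end, &demo_walk_ops, NULL, &state) under the init_mm mmap read lock (see patch 1), this visits every populated PMD and, only when pmd_only is false, the PTEs beneath it.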
From nobody Wed Dec 17 12:35:42 2025
From: Muchun Song
To: mike.kravetz@oracle.com, muchun.song@linux.dev, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Muchun Song
Subject: [PATCH 3/4] mm: hugetlb_vmemmap: move PageVmemmapSelfHosted() check to split_vmemmap_huge_pmd()
Date: Mon, 27 Nov 2023 16:46:44 +0800
Message-Id: <20231127084645.27017-4-songmuchun@bytedance.com>
In-Reply-To: <20231127084645.27017-1-songmuchun@bytedance.com>
References: <20231127084645.27017-1-songmuchun@bytedance.com>

Checking whether a page is self-hosted requires traversing the page table
(e.g. via pmd_off_k()), but the subsequent call to vmemmap_remap_range()
already does that traversal. Moving the PageVmemmapSelfHosted() check into
vmemmap_pmd_entry() therefore simplifies the code a bit.

Signed-off-by: Muchun Song
Reviewed-by: Mike Kravetz
---
 mm/hugetlb_vmemmap.c | 70 +++++++++++++++-----------------------------
 1 file changed, 24 insertions(+), 46 deletions(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index ef14356855d13..ce920ca6c90ee 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -95,6 +95,7 @@ static int vmemmap_split_pmd(pmd_t *pmd, struct page *head, unsigned long start,
 static int vmemmap_pmd_entry(pmd_t *pmd, unsigned long addr,
                              unsigned long next, struct mm_walk *walk)
 {
+        int ret = 0;
         struct page *head;
         struct vmemmap_remap_walk *vmemmap_walk = walk->private;

@@ -104,9 +105,30 @@ static int vmemmap_pmd_entry(pmd_t *pmd, unsigned long addr,

         spin_lock(&init_mm.page_table_lock);
         head = pmd_leaf(*pmd) ? pmd_page(*pmd) : NULL;
+        /*
+         * Due to HugeTLB alignment requirements and the vmemmap pages
+         * being at the start of the hotplugged memory region in the
+         * memory_hotplug.memmap_on_memory case, checking whether the
+         * vmemmap page associated with the first vmemmap page is
+         * self-hosted is sufficient.
+         *
+         * [ hotplugged memory ]
+         * [  section  ][...][  section  ]
+         * [ vmemmap ][     usable memory     ]
+         *   ^   |      ^                     |
+         *   +---+      |                     |
+         *              +---------------------+
+         */
+        if (unlikely(!vmemmap_walk->nr_walked)) {
+                struct page *page = head ? head + pte_index(addr) :
+                                    pte_page(ptep_get(pte_offset_kernel(pmd, addr)));

+                if (PageVmemmapSelfHosted(page))
+                        ret = -ENOTSUPP;
+        }
         spin_unlock(&init_mm.page_table_lock);
-        if (!head)
-                return 0;
+        if (!head || ret)
+                return ret;

         return vmemmap_split_pmd(pmd, head, addr & PMD_MASK, vmemmap_walk);
 }
@@ -524,50 +546,6 @@ static bool vmemmap_should_optimize(const struct hstate *h, const struct page *h
         if (!hugetlb_vmemmap_optimizable(h))
                 return false;

-        if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) {
-                pmd_t *pmdp, pmd;
-                struct page *vmemmap_page;
-                unsigned long vaddr = (unsigned long)head;
-
-                /*
-                 * Only the vmemmap page's vmemmap page can be self-hosted.
-                 * Walking the page tables to find the backing page of the
-                 * vmemmap page.
-                 */
-                pmdp = pmd_off_k(vaddr);
-                /*
-                 * The READ_ONCE() is used to stabilize *pmdp in a register or
-                 * on the stack so that it will stop changing under the code.
-                 * The only concurrent operation where it can be changed is
-                 * split_vmemmap_huge_pmd() (*pmdp will be stable after this
-                 * operation).
-                 */
-                pmd = READ_ONCE(*pmdp);
-                if (pmd_leaf(pmd))
-                        vmemmap_page = pmd_page(pmd) + pte_index(vaddr);
-                else
-                        vmemmap_page = pte_page(*pte_offset_kernel(pmdp, vaddr));
-                /*
-                 * Due to HugeTLB alignment requirements and the vmemmap pages
-                 * being at the start of the hotplugged memory region in
-                 * memory_hotplug.memmap_on_memory case. Checking any vmemmap
-                 * page's vmemmap page if it is marked as VmemmapSelfHosted is
-                 * sufficient.
-                 *
-                 * [ hotplugged memory ]
-                 * [  section  ][...][  section  ]
-                 * [ vmemmap ][        usable memory         ]
-                 *   ^  |     |                              |
-                 * +---+      |                              |
-                 * ^          |                              |
-                 * +-------+  |                              |
-                 * ^                                         |
-                 * +-------------------------------------------+
-                 */
-                if (PageVmemmapSelfHosted(vmemmap_page))
-                        return false;
-        }
-
         return true;
 }
-- 
2.20.1
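One detail worth calling out from the hunks above: the self-hosted check must find the struct page backing a vmemmap address whether the kernel PMD is still a leaf or has already been split into a PTE table. A standalone sketch of that lookup (the helper name is hypothetical and not part of the patch; it mirrors the expression used in vmemmap_pmd_entry() and assumes init_mm.page_table_lock is held, as in the patch):

#include <linux/mm.h>
#include <linux/pgtable.h>

/*
 * Hypothetical helper: return the page that backs the kernel (vmemmap)
 * address @addr, for a PMD that either maps a huge page directly or
 * points to a PTE table.
 */
static struct page *vmemmap_backing_page(pmd_t *pmd, unsigned long addr)
{
        if (pmd_leaf(*pmd))
                return pmd_page(*pmd) + pte_index(addr);

        return pte_page(ptep_get(pte_offset_kernel(pmd, addr)));
}

The patch only performs this lookup for the first walked PMD (nr_walked == 0), since the HugeTLB alignment guarantees described in the added comment make checking the first vmemmap page sufficient.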
From nobody Wed Dec 17 12:35:42 2025
From: Muchun Song
To: mike.kravetz@oracle.com, muchun.song@linux.dev, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Muchun Song
Subject: [PATCH 4/4] mm: hugetlb_vmemmap: convert page to folio
Date: Mon, 27 Nov 2023 16:46:45 +0800
Message-Id: <20231127084645.27017-5-songmuchun@bytedance.com>
In-Reply-To: <20231127084645.27017-1-songmuchun@bytedance.com>
References: <20231127084645.27017-1-songmuchun@bytedance.com>

There are still some places that have not been converted to folios; convert
all of them. Also do some trivial cleanup to fix code style problems.

Signed-off-by: Muchun Song
Reviewed-by: Mike Kravetz
---
 mm/hugetlb_vmemmap.c | 51 ++++++++++++++++++++++----------------------
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index ce920ca6c90ee..54f388aa361fb 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -280,7 +280,7 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
  * Return: %0 on success, negative error code otherwise.
  */
 static int vmemmap_remap_split(unsigned long start, unsigned long end,
-                              unsigned long reuse)
+                               unsigned long reuse)
 {
         int ret;
         struct vmemmap_remap_walk walk = {
@@ -447,14 +447,14 @@ EXPORT_SYMBOL(hugetlb_optimize_vmemmap_key);
 static bool vmemmap_optimize_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON);
 core_param(hugetlb_free_vmemmap, vmemmap_optimize_enabled, bool, 0);

-static int __hugetlb_vmemmap_restore_folio(const struct hstate *h, struct folio *folio, unsigned long flags)
+static int __hugetlb_vmemmap_restore_folio(const struct hstate *h,
+                                           struct folio *folio, unsigned long flags)
 {
         int ret;
-        struct page *head = &folio->page;
-        unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
+        unsigned long vmemmap_start = (unsigned long)&folio->page, vmemmap_end;
         unsigned long vmemmap_reuse;

-        VM_WARN_ON_ONCE(!PageHuge(head));
+        VM_WARN_ON_ONCE_FOLIO(!folio_test_hugetlb(folio), folio);
         if (!folio_test_hugetlb_vmemmap_optimized(folio))
                 return 0;

@@ -517,7 +517,7 @@ long hugetlb_vmemmap_restore_folios(const struct hstate *h,
         list_for_each_entry_safe(folio, t_folio, folio_list, lru) {
                 if (folio_test_hugetlb_vmemmap_optimized(folio)) {
                         ret = __hugetlb_vmemmap_restore_folio(h, folio,
-                                                        VMEMMAP_REMAP_NO_TLB_FLUSH);
+                                                              VMEMMAP_REMAP_NO_TLB_FLUSH);
                         if (ret)
                                 break;
                         restored++;
@@ -535,9 +535,9 @@ long hugetlb_vmemmap_restore_folios(const struct hstate *h,
 }

 /* Return true iff a HugeTLB whose vmemmap should and can be optimized. */
-static bool vmemmap_should_optimize(const struct hstate *h, const struct page *head)
+static bool vmemmap_should_optimize_folio(const struct hstate *h, struct folio *folio)
 {
-        if (HPageVmemmapOptimized((struct page *)head))
+        if (folio_test_hugetlb_vmemmap_optimized(folio))
                 return false;

         if (!READ_ONCE(vmemmap_optimize_enabled))
@@ -550,17 +550,16 @@ static bool vmemmap_should_optimize(const struct hstate *h, const struct page *h
 }

 static int __hugetlb_vmemmap_optimize_folio(const struct hstate *h,
-                                        struct folio *folio,
-                                        struct list_head *vmemmap_pages,
-                                        unsigned long flags)
+                                            struct folio *folio,
+                                            struct list_head *vmemmap_pages,
+                                            unsigned long flags)
 {
         int ret = 0;
-        struct page *head = &folio->page;
-        unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
+        unsigned long vmemmap_start = (unsigned long)&folio->page, vmemmap_end;
         unsigned long vmemmap_reuse;

-        VM_WARN_ON_ONCE(!PageHuge(head));
-        if (!vmemmap_should_optimize(h, head))
+        VM_WARN_ON_ONCE_FOLIO(!folio_test_hugetlb(folio), folio);
+        if (!vmemmap_should_optimize_folio(h, folio))
                 return ret;

         static_branch_inc(&hugetlb_optimize_vmemmap_key);
@@ -588,7 +587,7 @@ static int __hugetlb_vmemmap_optimize_folio(const struct hstate *h,
          * the caller.
          */
         ret = vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse,
-                                vmemmap_pages, flags);
+                                 vmemmap_pages, flags);
         if (ret) {
                 static_branch_dec(&hugetlb_optimize_vmemmap_key);
                 folio_clear_hugetlb_vmemmap_optimized(folio);
@@ -615,12 +614,12 @@ void hugetlb_vmemmap_optimize_folio(const struct hstate *h, struct folio *folio)
         free_vmemmap_page_list(&vmemmap_pages);
 }

-static int hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
+static int hugetlb_vmemmap_split_folio(const struct hstate *h, struct folio *folio)
 {
-        unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
+        unsigned long vmemmap_start = (unsigned long)&folio->page, vmemmap_end;
         unsigned long vmemmap_reuse;

-        if (!vmemmap_should_optimize(h, head))
+        if (!vmemmap_should_optimize_folio(h, folio))
                 return 0;

         vmemmap_end = vmemmap_start + hugetlb_vmemmap_size(h);
@@ -640,7 +639,7 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l
         LIST_HEAD(vmemmap_pages);

         list_for_each_entry(folio, folio_list, lru) {
-                int ret = hugetlb_vmemmap_split(h, &folio->page);
+                int ret = hugetlb_vmemmap_split_folio(h, folio);

                 /*
                  * Spliting the PMD requires allocating a page, thus lets fail
@@ -655,9 +654,10 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l
         flush_tlb_all();

         list_for_each_entry(folio, folio_list, lru) {
-                int ret = __hugetlb_vmemmap_optimize_folio(h, folio,
-                                                        &vmemmap_pages,
-                                                        VMEMMAP_REMAP_NO_TLB_FLUSH);
+                int ret;
+
+                ret = __hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages,
+                                                       VMEMMAP_REMAP_NO_TLB_FLUSH);

                 /*
                  * Pages to be freed may have been accumulated.  If we
@@ -671,9 +671,8 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l
                         flush_tlb_all();
                         free_vmemmap_page_list(&vmemmap_pages);
                         INIT_LIST_HEAD(&vmemmap_pages);
-                        __hugetlb_vmemmap_optimize_folio(h, folio,
-                                                        &vmemmap_pages,
-                                                        VMEMMAP_REMAP_NO_TLB_FLUSH);
+                        __hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages,
+                                                         VMEMMAP_REMAP_NO_TLB_FLUSH);
                 }
         }
-- 
2.20.1