From: Suren Baghdasaryan <surenb@google.com>
Date: Tue, 1 Aug 2023 15:07:27 -0700
Subject: [PATCH v2 1/6] mm: enable page walking API to lock vmas during the walk
Message-ID: <20230801220733.1987762-2-surenb@google.com>
In-Reply-To: <20230801220733.1987762-1-surenb@google.com>
References: <20230801220733.1987762-1-surenb@google.com>
To: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org, jannh@google.com, willy@infradead.org,
    liam.howlett@oracle.com, david@redhat.com, peterx@redhat.com,
    ldufour@linux.ibm.com, vbabka@suse.cz, michel@lespinasse.org,
    jglisse@google.com, mhocko@suse.com, hannes@cmpxchg.org,
    dave@stgolabs.net, hughd@google.com, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, stable@vger.kernel.org,
    Suren Baghdasaryan <surenb@google.com>,
    Linus Torvalds <torvalds@linux-foundation.org>

walk_page_range() and friends often operate under a write-locked mmap_lock.
With the introduction of per-VMA locks, the vmas have to be locked as well
during such walks to prevent concurrent page faults in these areas. Add an
additional member to mm_walk_ops to indicate the locking requirements for
the walk.

Cc: stable@vger.kernel.org # 6.4.x
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
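A minimal usage sketch for reviewers, not part of the change itself: the
callback and ops names below (example_pmd_entry, example_read_ops,
example_write_ops) are made up for illustration; only the walk_lock values
and the walk_page_range() call reflect the interface this patch adds.

#include <linux/pagewalk.h>

static int example_pmd_entry(pmd_t *pmd, unsigned long addr,
			     unsigned long next, struct mm_walk *walk)
{
	/* inspect page table entries here; read-only in this example */
	return 0;
}

/* read-only scan: mmap_lock held for read is sufficient */
static const struct mm_walk_ops example_read_ops = {
	.pmd_entry	= example_pmd_entry,
	.walk_lock	= PGWALK_RDLOCK,
};

/* modifying walk: write-lock each vma as the walk enters it */
static const struct mm_walk_ops example_write_ops = {
	.pmd_entry	= example_pmd_entry,
	.walk_lock	= PGWALK_WRLOCK,
};

A PGWALK_RDLOCK walker is called under mmap_read_lock(mm), e.g.:

	mmap_read_lock(mm);
	err = walk_page_range(mm, start, end, &example_read_ops, NULL);
	mmap_read_unlock(mm);

PGWALK_WRLOCK and PGWALK_WRLOCK_VERIFY walkers require the mmap_lock to be
held for write; the latter only asserts that the caller has already
write-locked the vmas itself (as the mlock path does) instead of taking
the per-VMA lock again during the walk.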
 arch/powerpc/mm/book3s64/subpage_prot.c |  1 +
 arch/riscv/mm/pageattr.c                |  1 +
 arch/s390/mm/gmap.c                     |  5 ++++
 fs/proc/task_mmu.c                      |  5 ++++
 include/linux/pagewalk.h                | 11 ++++++++
 mm/damon/vaddr.c                        |  2 ++
 mm/hmm.c                                |  1 +
 mm/ksm.c                                | 25 ++++++++++-------
 mm/madvise.c                            |  3 +++
 mm/memcontrol.c                         |  2 ++
 mm/memory-failure.c                     |  1 +
 mm/mempolicy.c                          | 22 +++++++++------
 mm/migrate_device.c                     |  1 +
 mm/mincore.c                            |  1 +
 mm/mlock.c                              |  1 +
 mm/mprotect.c                           |  1 +
 mm/pagewalk.c                           | 36 ++++++++++++++++++++++---
 mm/vmscan.c                             |  1 +
 18 files changed, 100 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/subpage_prot.c b/arch/powerpc/mm/book3s64/subpage_prot.c
index 0dc85556dec5..ec98e526167e 100644
--- a/arch/powerpc/mm/book3s64/subpage_prot.c
+++ b/arch/powerpc/mm/book3s64/subpage_prot.c
@@ -145,6 +145,7 @@ static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
 
 static const struct mm_walk_ops subpage_walk_ops = {
 	.pmd_entry = subpage_walk_pmd_entry,
+	.walk_lock = PGWALK_WRLOCK_VERIFY,
 };
 
 static void subpage_mark_vma_nohuge(struct mm_struct *mm, unsigned long addr,
diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
index ea3d61de065b..161d0b34c2cb 100644
--- a/arch/riscv/mm/pageattr.c
+++ b/arch/riscv/mm/pageattr.c
@@ -102,6 +102,7 @@ static const struct mm_walk_ops pageattr_ops = {
 	.pmd_entry = pageattr_pmd_entry,
 	.pte_entry = pageattr_pte_entry,
 	.pte_hole = pageattr_pte_hole,
+	.walk_lock = PGWALK_RDLOCK,
 };
 
 static int __set_memory(unsigned long addr, int numpages, pgprot_t set_mask,
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 9c8af31be970..906a7bfc2a78 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2514,6 +2514,7 @@ static int thp_split_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
 
 static const struct mm_walk_ops thp_split_walk_ops = {
 	.pmd_entry = thp_split_walk_pmd_entry,
+	.walk_lock = PGWALK_WRLOCK_VERIFY,
 };
 
 static inline void thp_split_mm(struct mm_struct *mm)
@@ -2565,6 +2566,7 @@ static int __zap_zero_pages(pmd_t *pmd, unsigned long start,
 
 static const struct mm_walk_ops zap_zero_walk_ops = {
 	.pmd_entry = __zap_zero_pages,
+	.walk_lock = PGWALK_WRLOCK,
 };
 
 /*
@@ -2655,6 +2657,7 @@ static const struct mm_walk_ops enable_skey_walk_ops = {
 	.hugetlb_entry = __s390_enable_skey_hugetlb,
 	.pte_entry = __s390_enable_skey_pte,
 	.pmd_entry = __s390_enable_skey_pmd,
+	.walk_lock = PGWALK_WRLOCK,
 };
 
 int s390_enable_skey(void)
@@ -2692,6 +2695,7 @@ static int __s390_reset_cmma(pte_t *pte, unsigned long addr,
 
 static const struct mm_walk_ops reset_cmma_walk_ops = {
 	.pte_entry = __s390_reset_cmma,
+	.walk_lock = PGWALK_WRLOCK,
 };
 
 void s390_reset_cmma(struct mm_struct *mm)
@@ -2728,6 +2732,7 @@ static int s390_gather_pages(pte_t *ptep, unsigned long addr,
 
 static const struct mm_walk_ops gather_pages_ops = {
 	.pte_entry = s390_gather_pages,
+	.walk_lock = PGWALK_RDLOCK,
 };
 
 /*
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 507cd4e59d07..ef6ee330e3be 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -758,12 +758,14 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
 static const struct mm_walk_ops smaps_walk_ops = {
 	.pmd_entry = smaps_pte_range,
 	.hugetlb_entry = smaps_hugetlb_range,
+	.walk_lock = PGWALK_RDLOCK,
 };
 
 static const struct mm_walk_ops smaps_shmem_walk_ops = {
 	.pmd_entry = smaps_pte_range,
 	.hugetlb_entry = smaps_hugetlb_range,
 	.pte_hole = smaps_pte_hole,
+	.walk_lock = PGWALK_RDLOCK,
 };
 
 /*
@@ -1245,6 +1247,7 @@ static int clear_refs_test_walk(unsigned long start, unsigned long end,
 static const struct mm_walk_ops clear_refs_walk_ops = {
 	.pmd_entry = clear_refs_pte_range,
 	.test_walk = clear_refs_test_walk,
+	.walk_lock = PGWALK_WRLOCK,
 };
 
 static ssize_t clear_refs_write(struct file *file, const char __user *buf,
@@ -1622,6 +1625,7 @@ static const struct mm_walk_ops pagemap_ops = {
 	.pmd_entry = pagemap_pmd_range,
 	.pte_hole = pagemap_pte_hole,
 	.hugetlb_entry = pagemap_hugetlb_range,
+	.walk_lock = PGWALK_RDLOCK,
 };
 
 /*
@@ -1935,6 +1939,7 @@ static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
 static const struct mm_walk_ops show_numa_ops = {
 	.hugetlb_entry = gather_hugetlb_stats,
 	.pmd_entry = gather_pte_stats,
+	.walk_lock = PGWALK_RDLOCK,
 };
 
 /*
diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index 27a6df448ee5..27cd1e59ccf7 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -6,6 +6,16 @@
 
 struct mm_walk;
 
+/* Locking requirement during a page walk. */
+enum page_walk_lock {
+	/* mmap_lock should be locked for read to stabilize the vma tree */
+	PGWALK_RDLOCK = 0,
+	/* vma will be write-locked during the walk */
+	PGWALK_WRLOCK = 1,
+	/* vma is expected to be already write-locked during the walk */
+	PGWALK_WRLOCK_VERIFY = 2,
+};
+
 /**
  * struct mm_walk_ops - callbacks for walk_page_range
  * @pgd_entry: if set, called for each non-empty PGD (top-level) entry
@@ -66,6 +76,7 @@ struct mm_walk_ops {
 	int (*pre_vma)(unsigned long start, unsigned long end,
 		       struct mm_walk *walk);
 	void (*post_vma)(struct mm_walk *walk);
+	enum page_walk_lock walk_lock;
 };
 
 /*
diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index 2fcc9731528a..e0e59d420fca 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -386,6 +386,7 @@ static int damon_mkold_hugetlb_entry(pte_t *pte, unsigned long hmask,
 static const struct mm_walk_ops damon_mkold_ops = {
 	.pmd_entry = damon_mkold_pmd_entry,
 	.hugetlb_entry = damon_mkold_hugetlb_entry,
+	.walk_lock = PGWALK_RDLOCK,
 };
 
 static void damon_va_mkold(struct mm_struct *mm, unsigned long addr)
@@ -525,6 +526,7 @@ static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask,
 static const struct mm_walk_ops damon_young_ops = {
 	.pmd_entry = damon_young_pmd_entry,
 	.hugetlb_entry = damon_young_hugetlb_entry,
+	.walk_lock = PGWALK_RDLOCK,
 };
 
 static bool damon_va_young(struct mm_struct *mm, unsigned long addr,
diff --git a/mm/hmm.c b/mm/hmm.c
index 855e25e59d8f..277ddcab4947 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -562,6 +562,7 @@ static const struct mm_walk_ops hmm_walk_ops = {
 	.pte_hole = hmm_vma_walk_hole,
 	.hugetlb_entry = hmm_vma_walk_hugetlb_entry,
 	.test_walk = hmm_vma_walk_test,
+	.walk_lock = PGWALK_RDLOCK,
 };
 
 /**
diff --git a/mm/ksm.c b/mm/ksm.c
index ba266359da55..00c21fb4d94e 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -455,6 +455,12 @@ static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long next,
 
 static const struct mm_walk_ops break_ksm_ops = {
 	.pmd_entry = break_ksm_pmd_entry,
+	.walk_lock = PGWALK_RDLOCK,
+};
+
+static const struct mm_walk_ops break_ksm_lock_vma_ops = {
+	.pmd_entry = break_ksm_pmd_entry,
+	.walk_lock = PGWALK_WRLOCK,
 };
 
 /*
@@ -470,16 +476,17 @@ static const struct mm_walk_ops break_ksm_ops = {
  * of the process that owns 'vma'. We also do not want to enforce
  * protection keys here anyway.
  */
-static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
+static int break_ksm(struct vm_area_struct *vma, unsigned long addr, bool lock_vma)
 {
 	vm_fault_t ret = 0;
+	const struct mm_walk_ops *ops = lock_vma ?
+				&break_ksm_lock_vma_ops : &break_ksm_ops;
 
 	do {
 		int ksm_page;
 
 		cond_resched();
-		ksm_page = walk_page_range_vma(vma, addr, addr + 1,
-					       &break_ksm_ops, NULL);
+		ksm_page = walk_page_range_vma(vma, addr, addr + 1, ops, NULL);
 		if (WARN_ON_ONCE(ksm_page < 0))
 			return ksm_page;
 		if (!ksm_page)
@@ -565,7 +572,7 @@ static void break_cow(struct ksm_rmap_item *rmap_item)
 	mmap_read_lock(mm);
 	vma = find_mergeable_vma(mm, addr);
 	if (vma)
-		break_ksm(vma, addr);
+		break_ksm(vma, addr, false);
 	mmap_read_unlock(mm);
 }
 
@@ -871,7 +878,7 @@ static void remove_trailing_rmap_items(struct ksm_rmap_item **rmap_list)
  * in cmp_and_merge_page on one of the rmap_items we would be removing.
 */
 static int unmerge_ksm_pages(struct vm_area_struct *vma,
-			     unsigned long start, unsigned long end)
+			     unsigned long start, unsigned long end, bool lock_vma)
 {
 	unsigned long addr;
 	int err = 0;
@@ -882,7 +889,7 @@ static int unmerge_ksm_pages(struct vm_area_struct *vma,
 		if (signal_pending(current))
 			err = -ERESTARTSYS;
 		else
-			err = break_ksm(vma, addr);
+			err = break_ksm(vma, addr, lock_vma);
 	}
 	return err;
 }
@@ -1029,7 +1036,7 @@ static int unmerge_and_remove_all_rmap_items(void)
 			if (!(vma->vm_flags & VM_MERGEABLE) || !vma->anon_vma)
 				continue;
 			err = unmerge_ksm_pages(vma,
-						vma->vm_start, vma->vm_end);
+						vma->vm_start, vma->vm_end, false);
 			if (err)
 				goto error;
 		}
@@ -2530,7 +2537,7 @@ static int __ksm_del_vma(struct vm_area_struct *vma)
 		return 0;
 
 	if (vma->anon_vma) {
-		err = unmerge_ksm_pages(vma, vma->vm_start, vma->vm_end);
+		err = unmerge_ksm_pages(vma, vma->vm_start, vma->vm_end, true);
 		if (err)
 			return err;
 	}
@@ -2668,7 +2675,7 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
 		return 0;	/* just ignore the advice */
 
 	if (vma->anon_vma) {
-		err = unmerge_ksm_pages(vma, start, end);
+		err = unmerge_ksm_pages(vma, start, end, true);
 		if (err)
 			return err;
 	}
diff --git a/mm/madvise.c b/mm/madvise.c
index 886f06066622..bfe0e06427bd 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -233,6 +233,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
 
 static const struct mm_walk_ops swapin_walk_ops = {
 	.pmd_entry = swapin_walk_pmd_entry,
+	.walk_lock = PGWALK_RDLOCK,
 };
 
 static void shmem_swapin_range(struct vm_area_struct *vma,
@@ -534,6 +535,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 
 static const struct mm_walk_ops cold_walk_ops = {
 	.pmd_entry = madvise_cold_or_pageout_pte_range,
+	.walk_lock = PGWALK_RDLOCK,
 };
 
 static void madvise_cold_page_range(struct mmu_gather *tlb,
@@ -757,6 +759,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 
 static const struct mm_walk_ops madvise_free_walk_ops = {
 	.pmd_entry = madvise_free_pte_range,
+	.walk_lock = PGWALK_RDLOCK,
 };
 
 static int madvise_free_single_vma(struct vm_area_struct *vma,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e8ca4bdcb03c..315fd5f45e3c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6024,6 +6024,7 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
 
 static const struct mm_walk_ops precharge_walk_ops = {
 	.pmd_entry = mem_cgroup_count_precharge_pte_range,
+	.walk_lock = PGWALK_RDLOCK,
 };
 
 static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm)
@@ -6303,6 +6304,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 
 static const struct mm_walk_ops charge_walk_ops = {
 	.pmd_entry = mem_cgroup_move_charge_pte_range,
+	.walk_lock = PGWALK_RDLOCK,
 };
 
 static void mem_cgroup_move_charge(void)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ece5d481b5ff..6bfb762facab 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -831,6 +831,7 @@ static int hwpoison_hugetlb_range(pte_t *ptep, unsigned long hmask,
 static const struct mm_walk_ops hwp_walk_ops = {
 	.pmd_entry = hwpoison_pte_range,
 	.hugetlb_entry = hwpoison_hugetlb_range,
+	.walk_lock = PGWALK_RDLOCK,
 };
 
 /*
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index c53f8beeb507..ec2eaceffd74 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -718,6 +718,14 @@ static const struct mm_walk_ops queue_pages_walk_ops = {
 	.hugetlb_entry = queue_folios_hugetlb,
 	.pmd_entry = queue_folios_pte_range,
 	.test_walk = queue_pages_test_walk,
+	.walk_lock = PGWALK_RDLOCK,
+};
+
+static const struct mm_walk_ops queue_pages_lock_vma_walk_ops = {
+	.hugetlb_entry = queue_folios_hugetlb,
+	.pmd_entry = queue_folios_pte_range,
+	.test_walk = queue_pages_test_walk,
+	.walk_lock = PGWALK_WRLOCK,
 };
 
 /*
@@ -738,7 +746,7 @@ static const struct mm_walk_ops queue_pages_walk_ops = {
 static int
 queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
 		nodemask_t *nodes, unsigned long flags,
-		struct list_head *pagelist)
+		struct list_head *pagelist, bool lock_vma)
 {
 	int err;
 	struct queue_pages qp = {
@@ -749,8 +757,10 @@ queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
 		.end = end,
 		.first = NULL,
 	};
+	const struct mm_walk_ops *ops = lock_vma ?
+			&queue_pages_lock_vma_walk_ops : &queue_pages_walk_ops;
 
-	err = walk_page_range(mm, start, end, &queue_pages_walk_ops, &qp);
+	err = walk_page_range(mm, start, end, ops, &qp);
 
 	if (!qp.first)
 		/* whole range in hole */
@@ -1078,7 +1088,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
 	vma = find_vma(mm, 0);
 	VM_BUG_ON(!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)));
 	queue_pages_range(mm, vma->vm_start, mm->task_size, &nmask,
-			flags | MPOL_MF_DISCONTIG_OK, &pagelist);
+			flags | MPOL_MF_DISCONTIG_OK, &pagelist, false);
 
 	if (!list_empty(&pagelist)) {
 		err = migrate_pages(&pagelist, alloc_migration_target, NULL,
@@ -1321,12 +1331,8 @@ static long do_mbind(unsigned long start, unsigned long len,
 	 * Lock the VMAs before scanning for pages to migrate, to ensure we don't
 	 * miss a concurrently inserted page.
 	 */
-	vma_iter_init(&vmi, mm, start);
-	for_each_vma_range(vmi, vma, end)
-		vma_start_write(vma);
-
 	ret = queue_pages_range(mm, start, end, nmask,
-			  flags | MPOL_MF_INVERT, &pagelist);
+			  flags | MPOL_MF_INVERT, &pagelist, true);
 
 	if (ret < 0) {
 		err = ret;
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 8365158460ed..d5f492356e3e 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -279,6 +279,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 static const struct mm_walk_ops migrate_vma_walk_ops = {
 	.pmd_entry = migrate_vma_collect_pmd,
 	.pte_hole = migrate_vma_collect_hole,
+	.walk_lock = PGWALK_RDLOCK,
 };
 
 /*
diff --git a/mm/mincore.c b/mm/mincore.c
index b7f7a516b26c..dad3622cc963 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -176,6 +176,7 @@ static const struct mm_walk_ops mincore_walk_ops = {
 	.pmd_entry = mincore_pte_range,
 	.pte_hole = mincore_unmapped_range,
 	.hugetlb_entry = mincore_hugetlb,
+	.walk_lock = PGWALK_RDLOCK,
 };
 
 /*
diff --git a/mm/mlock.c b/mm/mlock.c
index 0a0c996c5c21..479e09d0994c 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -371,6 +371,7 @@ static void mlock_vma_pages_range(struct vm_area_struct *vma,
 {
 	static const struct mm_walk_ops mlock_walk_ops = {
 		.pmd_entry = mlock_pte_range,
+		.walk_lock = PGWALK_WRLOCK_VERIFY,
 	};
 
 	/*
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 6f658d483704..3aef1340533a 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -568,6 +568,7 @@ static const struct mm_walk_ops prot_none_walk_ops = {
 	.pte_entry = prot_none_pte_entry,
 	.hugetlb_entry = prot_none_hugetlb_entry,
 	.test_walk = prot_none_test,
+	.walk_lock = PGWALK_WRLOCK,
 };
 
 int
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 2022333805d3..9b2d23fbf4d3 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -400,6 +400,33 @@ static int __walk_page_range(unsigned long start, unsigned long end,
 	return err;
 }
 
+static inline void process_mm_walk_lock(struct mm_struct *mm,
+					enum page_walk_lock walk_lock)
+{
+	if (walk_lock == PGWALK_RDLOCK)
+		mmap_assert_locked(mm);
+	else
+		mmap_assert_write_locked(mm);
+}
+
+static inline void process_vma_walk_lock(struct vm_area_struct *vma,
+					 enum page_walk_lock walk_lock)
+{
+#ifdef CONFIG_PER_VMA_LOCK
+	switch (walk_lock) {
+	case PGWALK_WRLOCK:
+		vma_start_write(vma);
+		break;
+	case PGWALK_WRLOCK_VERIFY:
+		vma_assert_write_locked(vma);
+		break;
+	case PGWALK_RDLOCK:
+		/* PGWALK_RDLOCK is handled by process_mm_walk_lock */
+		break;
+	}
+#endif
+}
+
 /**
  * walk_page_range - walk page table with caller specific callbacks
  * @mm: mm_struct representing the target process of page table walk
@@ -459,7 +486,7 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
 	if (!walk.mm)
 		return -EINVAL;
 
-	mmap_assert_locked(walk.mm);
+	process_mm_walk_lock(walk.mm, ops->walk_lock);
 
 	vma = find_vma(walk.mm, start);
 	do {
@@ -474,6 +501,7 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
 			if (ops->pte_hole)
 				err = ops->pte_hole(start, next, -1, &walk);
 		} else { /* inside vma */
+			process_vma_walk_lock(vma, ops->walk_lock);
 			walk.vma = vma;
 			next = min(end, vma->vm_end);
 			vma = find_vma(mm, vma->vm_end);
@@ -549,7 +577,8 @@ int walk_page_range_vma(struct vm_area_struct *vma, unsigned long start,
 	if (start < vma->vm_start || end > vma->vm_end)
 		return -EINVAL;
 
-	mmap_assert_locked(walk.mm);
+	process_mm_walk_lock(walk.mm, ops->walk_lock);
+	process_vma_walk_lock(vma, ops->walk_lock);
 	return __walk_page_range(start, end, &walk);
 }
 
@@ -566,7 +595,8 @@ int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops,
 	if (!walk.mm)
 		return -EINVAL;
 
-	mmap_assert_locked(walk.mm);
+	process_mm_walk_lock(walk.mm, ops->walk_lock);
+	process_vma_walk_lock(vma, ops->walk_lock);
 	return __walk_page_range(vma->vm_start, vma->vm_end, &walk);
 }
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1080209a568b..3555927df9b5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4284,6 +4284,7 @@ static void walk_mm(struct lruvec *lruvec, struct mm_struct *mm, struct lru_gen_mm_walk *walk)
 	static const struct mm_walk_ops mm_walk_ops = {
 		.test_walk = should_skip_vma,
 		.p4d_entry = walk_pud_range,
+		.walk_lock = PGWALK_RDLOCK,
 	};
 
 	int err;
-- 
2.41.0.585.gd2178a4bd4-goog