From: Dev Jain <dev.jain@arm.com>
Subject: [PATCH v3 1/5] mm: Optimize mprotect() by batch-skipping PTEs
Date: Mon, 19 May 2025 13:18:20 +0530
Message-Id: <20250519074824.42909-2-dev.jain@arm.com>
In-Reply-To: <20250519074824.42909-1-dev.jain@arm.com>

In case of prot_numa, there are various cases in which we can skip to the
next iteration. Since the skip condition is based on the folio and not the
PTEs, we can skip a PTE batch.

Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
---
 mm/mprotect.c | 36 +++++++++++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 88608d0dc2c2..1ee160ed0b14 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -83,6 +83,18 @@ bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
 	return pte_dirty(pte);
 }

+static int mprotect_batch(struct folio *folio, unsigned long addr, pte_t *ptep,
+		pte_t pte, int max_nr_ptes)
+{
+	const fpb_t flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
+
+	if (!folio_test_large(folio) || (max_nr_ptes == 1))
+		return 1;
+
+	return folio_pte_batch(folio, addr, ptep, pte, max_nr_ptes, flags,
+			       NULL, NULL, NULL);
+}
+
 static long change_pte_range(struct mmu_gather *tlb,
 		struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr,
 		unsigned long end, pgprot_t newprot, unsigned long cp_flags)
@@ -94,6 +106,7 @@ static long change_pte_range(struct mmu_gather *tlb,
 	bool prot_numa = cp_flags & MM_CP_PROT_NUMA;
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
 	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
+	int nr_ptes;

 	tlb_change_page_size(tlb, PAGE_SIZE);
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
@@ -108,8 +121,10 @@ static long change_pte_range(struct mmu_gather *tlb,
 	flush_tlb_batched_pending(vma->vm_mm);
 	arch_enter_lazy_mmu_mode();
 	do {
+		nr_ptes = 1;
 		oldpte = ptep_get(pte);
 		if (pte_present(oldpte)) {
+			int max_nr_ptes = (end - addr) >> PAGE_SHIFT;
 			pte_t ptent;

 			/*
@@ -126,15 +141,18 @@ static long change_pte_range(struct mmu_gather *tlb,
 					continue;

 				folio = vm_normal_folio(vma, addr, oldpte);
-				if (!folio || folio_is_zone_device(folio) ||
-				    folio_test_ksm(folio))
+				if (!folio)
 					continue;

+				if (folio_is_zone_device(folio) ||
+				    folio_test_ksm(folio))
+					goto skip_batch;
+
 				/* Also skip shared copy-on-write pages */
 				if (is_cow_mapping(vma->vm_flags) &&
 				    (folio_maybe_dma_pinned(folio) ||
 				     folio_maybe_mapped_shared(folio)))
-					continue;
+					goto skip_batch;

 				/*
 				 * While migration can move some dirty pages,
@@ -143,7 +161,7 @@ static long change_pte_range(struct mmu_gather *tlb,
 				 */
 				if (folio_is_file_lru(folio) &&
 				    folio_test_dirty(folio))
-					continue;
+					goto skip_batch;

 				/*
 				 * Don't mess with PTEs if page is already on the node
@@ -151,7 +169,7 @@ static long change_pte_range(struct mmu_gather *tlb,
 				 */
 				nid = folio_nid(folio);
 				if (target_node == nid)
-					continue;
+					goto skip_batch;
 				toptier = node_is_toptier(nid);

 				/*
@@ -159,8 +177,12 @@ static long change_pte_range(struct mmu_gather *tlb,
 				 * balancing is disabled
 				 */
 				if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
-				    toptier)
+				    toptier) {
+skip_batch:
+					nr_ptes = mprotect_batch(folio, addr, pte,
+								 oldpte, max_nr_ptes);
 					continue;
+				}
 				if (folio_use_access_time(folio))
 					folio_xchg_access_time(folio,
 						jiffies_to_msecs(jiffies));
@@ -280,7 +302,7 @@ static long change_pte_range(struct mmu_gather *tlb,
 				pages++;
 			}
 		}
-	} while (pte++, addr += PAGE_SIZE, addr != end);
+	} while (pte += nr_ptes, addr += nr_ptes * PAGE_SIZE, addr != end);
 	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(pte - 1, ptl);

-- 
2.30.2
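
For reference, the kind of userspace workload whose PTE walk this series
speeds up can be reproduced with plain mmap()/madvise()/mprotect(). The
sketch below is illustrative only (not part of the patch) and assumes a
Linux system with transparent hugepages enabled, so the range is backed by
large folios.

	/* Illustrative only: build a THP-backed range and change its
	 * protection; each mprotect() call walks every PTE of the range
	 * under the page table lock. */
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t len = 64UL << 20;	/* 64 MiB */
		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		madvise(p, len, MADV_HUGEPAGE);	/* request large folios */
		memset(p, 1, len);		/* populate the range */

		mprotect(p, len, PROT_READ);
		mprotect(p, len, PROT_READ | PROT_WRITE);

		munmap(p, len);
		return 0;
	}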

From: Dev Jain <dev.jain@arm.com>
Subject: [PATCH v3 2/5] mm: Add batched versions of ptep_modify_prot_start/commit
Date: Mon, 19 May 2025 13:18:21 +0530
Message-Id: <20250519074824.42909-3-dev.jain@arm.com>
In-Reply-To: <20250519074824.42909-1-dev.jain@arm.com>

Batch ptep_modify_prot_start/commit in preparation for optimizing
mprotect(). An architecture can override these helpers; if it does not,
they fall back to a simple loop over the corresponding single-PTE helpers.

Signed-off-by: Dev Jain <dev.jain@arm.com>
---
 include/linux/pgtable.h | 75 +++++++++++++++++++++++++++++++++++++++++
 mm/mprotect.c           |  4 +--
 2 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index b50447ef1c92..e40ed57e034d 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1333,6 +1333,81 @@ static inline void ptep_modify_prot_commit(struct vm_area_struct *vma,
 	__ptep_modify_prot_commit(vma, addr, ptep, pte);
 }
 #endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
+
+/**
+ * modify_prot_start_ptes - Start a pte protection read-modify-write transaction
+ * over a batch of ptes, which protects against asynchronous hardware modifications
+ * to the ptes. The intention is not to prevent the hardware from making pte
+ * updates, but to prevent any updates it may make from being lost.
+ * Please see the comment above ptep_modify_prot_start() for full description.
+ *
+ * @vma: The virtual memory area the pages are mapped into.
+ * @addr: Address the first page is mapped at.
+ * @ptep: Page table pointer for the first entry.
+ * @nr: Number of entries.
+ *
+ * May be overridden by the architecture; otherwise, implemented as a simple
+ * loop over ptep_modify_prot_start(), collecting the a/d bits of the mapped
+ * folio.
+ *
+ * Note that PTE bits in the PTE range besides the PFN can differ.
+ *
+ * Context: The caller holds the page table lock. The PTEs map consecutive
+ * pages that belong to the same folio. The PTEs are all in the same PMD.
+ */
+#ifndef modify_prot_start_ptes
+static inline pte_t modify_prot_start_ptes(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
+{
+	pte_t pte, tmp_pte;
+
+	pte = ptep_modify_prot_start(vma, addr, ptep);
+	while (--nr) {
+		ptep++;
+		addr += PAGE_SIZE;
+		tmp_pte = ptep_modify_prot_start(vma, addr, ptep);
+		if (pte_dirty(tmp_pte))
+			pte = pte_mkdirty(pte);
+		if (pte_young(tmp_pte))
+			pte = pte_mkyoung(pte);
+	}
+	return pte;
+}
+#endif
+
+/**
+ * modify_prot_commit_ptes - Commit an update to a batch of ptes, leaving any
+ * hardware-controlled bits in the PTE unmodified.
+ *
+ * @vma: The virtual memory area the pages are mapped into.
+ * @addr: Address the first page is mapped at.
+ * @ptep: Page table pointer for the first entry.
+ * @nr: Number of entries.
+ *
+ * May be overridden by the architecture; otherwise, implemented as a simple
+ * loop over ptep_modify_prot_commit().
+ *
+ * Note that PTE bits in the PTE range besides the PFN can differ.
+ *
+ * Context: The caller holds the page table lock. The PTEs map consecutive
+ * pages that belong to the same folio. The PTEs are all in the same PMD.
+ */
+#ifndef modify_prot_commit_ptes
+static inline void modify_prot_commit_ptes(struct vm_area_struct *vma, unsigned long addr,
+		pte_t *ptep, pte_t old_pte, pte_t pte, unsigned int nr)
+{
+	int i;
+
+	for (i = 0; i < nr; ++i) {
+		ptep_modify_prot_commit(vma, addr, ptep, old_pte, pte);
+		ptep++;
+		addr += PAGE_SIZE;
+		old_pte = pte_next_pfn(old_pte);
+		pte = pte_next_pfn(pte);
+	}
+}
+#endif
+
 #endif /* CONFIG_MMU */

 /*
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 1ee160ed0b14..124612ce3d24 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -188,7 +188,7 @@ static long change_pte_range(struct mmu_gather *tlb,
 						jiffies_to_msecs(jiffies));
 			}

-			oldpte = ptep_modify_prot_start(vma, addr, pte);
+			oldpte = modify_prot_start_ptes(vma, addr, pte, nr_ptes);
 			ptent = pte_modify(oldpte, newprot);

 			if (uffd_wp)
@@ -214,7 +214,7 @@ static long change_pte_range(struct mmu_gather *tlb,
 			    can_change_pte_writable(vma, addr, ptent))
 				ptent = pte_mkwrite(ptent, vma);

-			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
+			modify_prot_commit_ptes(vma, addr, pte, oldpte, ptent, nr_ptes);
 			if (pte_needs_flush(oldpte, ptent))
 				tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
 			pages++;
-- 
2.30.2
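
The #ifndef guards above act as the usual override hook: an architecture
that wants its own batched implementation defines the macro to its own name
and declares the function, and the generic inline fallback compiles out.
The excerpt below sketches that hook; it mirrors the arm64 hunks added
later in this series and is shown only to illustrate the pattern, not as a
new user.

	/* Sketch of an arch <asm/pgtable.h> opting out of the generic
	 * fallbacks; the definitions themselves live in arch code. */
	#define modify_prot_start_ptes modify_prot_start_ptes
	extern pte_t modify_prot_start_ptes(struct vm_area_struct *vma,
					    unsigned long addr, pte_t *ptep,
					    unsigned int nr);

	#define modify_prot_commit_ptes modify_prot_commit_ptes
	extern void modify_prot_commit_ptes(struct vm_area_struct *vma,
					    unsigned long addr, pte_t *ptep,
					    pte_t old_pte, pte_t pte,
					    unsigned int nr);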

From: Dev Jain <dev.jain@arm.com>
Subject: [PATCH v3 3/5] mm: Optimize mprotect() by PTE batching
Date: Mon, 19 May 2025 13:18:22 +0530
Message-Id: <20250519074824.42909-4-dev.jain@arm.com>
In-Reply-To: <20250519074824.42909-1-dev.jain@arm.com>

Use folio_pte_batch() to batch-process a large folio. Reuse the folio from
the prot_numa case if possible. Since modify_prot_start_ptes() gathers the
access/dirty bits, it lets us batch around pte_needs_flush() (on parisc,
the definition includes the access bit).

For all cases other than the PageAnonExclusive case, if a condition holds
for one PTE in the batch, it holds for every other PTE in the batch too;
for pte_needs_soft_dirty_wp(), we do not pass FPB_IGNORE_SOFT_DIRTY.
modify_prot_start_ptes() collects the dirty and access bits across the
batch, so batching across pte_dirty() is correct: the dirty bit on the PTE
really is just an indication that the folio got written to, so even if one
particular PTE is not dirty (but some PTE in the batch is), the wp-fault
optimization can still be made.

The crux is how to batch around the PageAnonExclusive case; there we must
check the condition for every single page. Therefore, we split the large
folio batch into sub-batches of PTEs mapping pages with the same
PageAnonExclusive state, process one sub-batch, then determine and process
the next, and so on. Note that this does not add any extra overhead: if the
folio batch is 512 entries, the sub-batch processing takes at most 512
iterations in total, the same as before.

Signed-off-by: Dev Jain <dev.jain@arm.com>
---
 include/linux/mm.h |   7 ++-
 mm/mprotect.c      | 126 +++++++++++++++++++++++++++++++++++----------
 2 files changed, 104 insertions(+), 29 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 43748c8f3454..7d5b96f005dc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2542,8 +2542,11 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen);
 #define  MM_CP_UFFD_WP_ALL                 (MM_CP_UFFD_WP | \
 					    MM_CP_UFFD_WP_RESOLVE)

-bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
-			     pte_t pte);
+bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned long addr,
+			      pte_t pte, int max_len, int *len);
+#define can_change_pte_writable(vma, addr, pte) \
+	can_change_ptes_writable(vma, addr, pte, 1, NULL)
+
 extern long change_protection(struct mmu_gather *tlb,
 			      struct vm_area_struct *vma, unsigned long start,
 			      unsigned long end, unsigned long cp_flags);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 124612ce3d24..6cd8cdc168fa 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -40,25 +40,36 @@

 #include "internal.h"

-bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
-			     pte_t pte)
+bool can_change_ptes_writable(struct vm_area_struct *vma, unsigned long addr,
+			      pte_t pte, int max_len, int *len)
 {
 	struct page *page;
+	bool temp_ret;
+	bool ret;
+	int i;

-	if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
-		return false;
+	if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE))) {
+		ret = false;
+		goto out;
+	}

 	/* Don't touch entries that are not even readable. */
-	if (pte_protnone(pte))
-		return false;
+	if (pte_protnone(pte)) {
+		ret = false;
+		goto out;
+	}

 	/* Do we need write faults for softdirty tracking? */
-	if (pte_needs_soft_dirty_wp(vma, pte))
-		return false;
+	if (pte_needs_soft_dirty_wp(vma, pte)) {
+		ret = false;
+		goto out;
+	}

 	/* Do we need write faults for uffd-wp tracking? */
-	if (userfaultfd_pte_wp(vma, pte))
-		return false;
+	if (userfaultfd_pte_wp(vma, pte)) {
+		ret = false;
+		goto out;
+	}

 	if (!(vma->vm_flags & VM_SHARED)) {
 		/*
@@ -68,7 +79,19 @@ bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
 		 * any additional checks while holding the PT lock.
 		 */
 		page = vm_normal_page(vma, addr, pte);
-		return page && PageAnon(page) && PageAnonExclusive(page);
+		ret = (page && PageAnon(page) && PageAnonExclusive(page));
+		if (!len)
+			return ret;
+
+		/* Check how many consecutive pages are AnonExclusive or not */
+		for (i = 1; i < max_len; ++i) {
+			++page;
+			temp_ret = (page && PageAnon(page) && PageAnonExclusive(page));
+			if (temp_ret != ret)
+				break;
+		}
+		*len = i;
+		return ret;
 	}

 	VM_WARN_ON_ONCE(is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte));
@@ -80,21 +103,55 @@ bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
 	 * FS was already notified and we can simply mark the PTE writable
 	 * just like the write-fault handler would do.
 	 */
-	return pte_dirty(pte);
+	ret = pte_dirty(pte);
+
+out:
+	/* The entire batch is guaranteed to have the same return value */
+	if (len)
+		*len = max_len;
+	return ret;
 }

 static int mprotect_batch(struct folio *folio, unsigned long addr, pte_t *ptep,
-		pte_t pte, int max_nr_ptes)
+		pte_t pte, int max_nr_ptes, bool ignore_soft_dirty)
 {
-	const fpb_t flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
+	fpb_t flags = FPB_IGNORE_DIRTY;

-	if (!folio_test_large(folio) || (max_nr_ptes == 1))
+	if (ignore_soft_dirty)
+		flags |= FPB_IGNORE_SOFT_DIRTY;
+
+	if (!folio || !folio_test_large(folio) || (max_nr_ptes == 1))
 		return 1;

 	return folio_pte_batch(folio, addr, ptep, pte, max_nr_ptes, flags,
 			       NULL, NULL, NULL);
 }

+/**
+ * modify_sub_batch - Identifies a sub-batch which has the same return value
+ * of can_change_pte_writable(), from within a folio batch. max_len is the
+ * max length of the possible sub-batch. sub_batch_idx is the offset from
+ * the start of the original folio batch.
+ */
+static int modify_sub_batch(struct vm_area_struct *vma, struct mmu_gather *tlb,
+		unsigned long addr, pte_t *ptep, pte_t oldpte, pte_t ptent,
+		int max_len, int sub_batch_idx)
+{
+	unsigned long new_addr = addr + sub_batch_idx * PAGE_SIZE;
+	pte_t new_oldpte = pte_advance_pfn(oldpte, sub_batch_idx);
+	pte_t new_ptent = pte_advance_pfn(ptent, sub_batch_idx);
+	pte_t *new_ptep = ptep + sub_batch_idx;
+	int len = 1;
+
+	if (can_change_ptes_writable(vma, new_addr, new_ptent, max_len, &len))
+		new_ptent = pte_mkwrite(new_ptent, vma);
+
+	modify_prot_commit_ptes(vma, new_addr, new_ptep, new_oldpte, new_ptent, len);
+	if (pte_needs_flush(new_oldpte, new_ptent))
+		tlb_flush_pte_range(tlb, new_addr, len * PAGE_SIZE);
+	return len;
+}
+
 static long change_pte_range(struct mmu_gather *tlb,
 		struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr,
 		unsigned long end, pgprot_t newprot, unsigned long cp_flags)
@@ -106,7 +163,7 @@ static long change_pte_range(struct mmu_gather *tlb,
 	bool prot_numa = cp_flags & MM_CP_PROT_NUMA;
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
 	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
-	int nr_ptes;
+	int sub_batch_idx, max_len, len, nr_ptes;

 	tlb_change_page_size(tlb, PAGE_SIZE);
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
@@ -121,10 +178,12 @@ static long change_pte_range(struct mmu_gather *tlb,
 	flush_tlb_batched_pending(vma->vm_mm);
 	arch_enter_lazy_mmu_mode();
 	do {
+		sub_batch_idx = 0;
 		nr_ptes = 1;
 		oldpte = ptep_get(pte);
 		if (pte_present(oldpte)) {
 			int max_nr_ptes = (end - addr) >> PAGE_SHIFT;
+			struct folio *folio = NULL;
 			pte_t ptent;

 			/*
@@ -132,7 +191,6 @@ static long change_pte_range(struct mmu_gather *tlb,
 			 * pages. See similar comment in change_huge_pmd.
 			 */
 			if (prot_numa) {
-				struct folio *folio;
 				int nid;
 				bool toptier;

@@ -180,7 +238,8 @@ static long change_pte_range(struct mmu_gather *tlb,
 				    toptier) {
 skip_batch:
 					nr_ptes = mprotect_batch(folio, addr, pte,
-								 oldpte, max_nr_ptes);
+								 oldpte, max_nr_ptes,
+								 true);
 					continue;
 				}
 				if (folio_use_access_time(folio))
@@ -188,6 +247,11 @@ static long change_pte_range(struct mmu_gather *tlb,
 						jiffies_to_msecs(jiffies));
 			}

+			if (!folio)
+				folio = vm_normal_folio(vma, addr, oldpte);
+
+			nr_ptes = mprotect_batch(folio, addr, pte, oldpte,
+						 max_nr_ptes, false);
 			oldpte = modify_prot_start_ptes(vma, addr, pte, nr_ptes);
 			ptent = pte_modify(oldpte, newprot);

@@ -209,15 +273,23 @@ static long change_pte_range(struct mmu_gather *tlb,
 			 * example, if a PTE is already dirty and no other
 			 * COW or special handling is required.
 			 */
-			if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) &&
-			    !pte_write(ptent) &&
-			    can_change_pte_writable(vma, addr, ptent))
-				ptent = pte_mkwrite(ptent, vma);
-
-			modify_prot_commit_ptes(vma, addr, pte, oldpte, ptent, nr_ptes);
-			if (pte_needs_flush(oldpte, ptent))
-				tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
-			pages++;
+			if (cp_flags & MM_CP_TRY_CHANGE_WRITABLE) {
+				max_len = nr_ptes;
+				while (sub_batch_idx < nr_ptes) {
+
+					/* Get length of sub batch */
+					len = modify_sub_batch(vma, tlb, addr, pte,
+							       oldpte, ptent, max_len,
+							       sub_batch_idx);
+					sub_batch_idx += len;
+					max_len -= len;
+				}
+			} else {
+				modify_prot_commit_ptes(vma, addr, pte, oldpte, ptent, nr_ptes);
+				if (pte_needs_flush(oldpte, ptent))
+					tlb_flush_pte_range(tlb, addr, nr_ptes * PAGE_SIZE);
+			}
+			pages += nr_ptes;
 		} else if (is_swap_pte(oldpte)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);
 			pte_t newpte;
-- 
2.30.2
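
The sub-batch walk introduced above is run-length partitioning: the folio
batch is split into maximal runs of pages for which can_change_ptes_writable()
gives the same answer, and each run is committed in one call. The standalone
C sketch below is illustrative only (a boolean array stands in for the
per-page PageAnonExclusive check); it shows that the total iteration count
stays equal to the batch size.

	#include <stdbool.h>
	#include <stdio.h>

	/* Toy model: exclusive[i] stands in for the per-page writability
	 * test. Walk the batch in maximal runs with the same answer; the
	 * total work is still exactly nr steps, as with a per-page loop. */
	static int sub_batch_len(const bool *exclusive, int start, int nr)
	{
		int len = 1;

		while (start + len < nr &&
		       exclusive[start + len] == exclusive[start])
			len++;
		return len;
	}

	int main(void)
	{
		bool exclusive[] = { true, true, true, false, false, true };
		int nr = sizeof(exclusive) / sizeof(exclusive[0]);
		int idx = 0;

		while (idx < nr) {
			int len = sub_batch_len(exclusive, idx, nr);

			printf("commit %d pte(s) at offset %d, writable=%d\n",
			       len, idx, exclusive[idx]);
			idx += len;
		}
		return 0;
	}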

From: Dev Jain <dev.jain@arm.com>
Subject: [PATCH v3 4/5] arm64: Add batched version of ptep_modify_prot_start
Date: Mon, 19 May 2025 13:18:23 +0530
Message-Id: <20250519074824.42909-5-dev.jain@arm.com>
In-Reply-To: <20250519074824.42909-1-dev.jain@arm.com>

Override the generic definition to use get_and_clear_full_ptes(). This
helper does a TLBI only for the starting and ending contpte block of the
range, whereas the current implementation will call ptep_get_and_clear()
for every contpte block, thus doing a TLBI on every contpte block.
Therefore, we have a performance win.

The arm64 definition of pte_accessible() allows us to batch around it in
clear_flush_ptes():

#define pte_accessible(mm, pte)	\
	(mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid(pte))

All ptes are obviously present in the folio batch, and they are also valid.

Signed-off-by: Dev Jain <dev.jain@arm.com>
---
 arch/arm64/include/asm/pgtable.h |  5 +++++
 arch/arm64/mm/mmu.c              | 12 +++++++++---
 include/linux/pgtable.h          |  4 ++++
 mm/pgtable-generic.c             | 16 +++++++++++-----
 4 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 2a77f11b78d5..8872ea5f0642 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1553,6 +1553,11 @@ extern void ptep_modify_prot_commit(struct vm_area_struct *vma,
 				    unsigned long addr, pte_t *ptep,
 				    pte_t old_pte, pte_t new_pte);

+#define modify_prot_start_ptes modify_prot_start_ptes
+extern pte_t modify_prot_start_ptes(struct vm_area_struct *vma,
+				    unsigned long addr, pte_t *ptep,
+				    unsigned int nr);
+
 #ifdef CONFIG_ARM64_CONTPTE

 /*
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 8fcf59ba39db..fe60be8774f4 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1523,7 +1523,8 @@ static int __init prevent_bootmem_remove_init(void)
 early_initcall(prevent_bootmem_remove_init);
 #endif

-pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
+pte_t modify_prot_start_ptes(struct vm_area_struct *vma, unsigned long addr,
+			     pte_t *ptep, unsigned int nr)
 {
 	if (alternative_has_cap_unlikely(ARM64_WORKAROUND_2645198)) {
 		/*
@@ -1532,9 +1533,14 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte
 		 * in cases where cpu is affected with errata #2645198.
 		 */
 		if (pte_user_exec(ptep_get(ptep)))
-			return ptep_clear_flush(vma, addr, ptep);
+			return clear_flush_ptes(vma, addr, ptep, nr);
 	}
-	return ptep_get_and_clear(vma->vm_mm, addr, ptep);
+	return get_and_clear_full_ptes(vma->vm_mm, addr, ptep, nr, 0);
+}
+
+pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
+{
+	return modify_prot_start_ptes(vma, addr, ptep, 1);
 }

 void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep,
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e40ed57e034d..41f4a8de5c28 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -828,6 +828,10 @@ extern pte_t ptep_clear_flush(struct vm_area_struct *vma,
 			      pte_t *ptep);
 #endif

+extern pte_t clear_flush_ptes(struct vm_area_struct *vma,
+			      unsigned long address,
+			      pte_t *ptep, unsigned int nr);
+
 #ifndef __HAVE_ARCH_PMDP_HUGE_CLEAR_FLUSH
 extern pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma,
 				   unsigned long address,
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 5a882f2b10f9..e238f88c3cac 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -90,17 +90,23 @@ int ptep_clear_flush_young(struct vm_area_struct *vma,
 }
 #endif

-#ifndef __HAVE_ARCH_PTEP_CLEAR_FLUSH
-pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address,
-		       pte_t *ptep)
+pte_t clear_flush_ptes(struct vm_area_struct *vma, unsigned long address,
+		       pte_t *ptep, unsigned int nr)
 {
 	struct mm_struct *mm = (vma)->vm_mm;
 	pte_t pte;
-	pte = ptep_get_and_clear(mm, address, ptep);
+	pte = get_and_clear_full_ptes(mm, address, ptep, nr, 0);
 	if (pte_accessible(mm, pte))
-		flush_tlb_page(vma, address);
+		flush_tlb_range(vma, address, address + nr * PAGE_SIZE);
 	return pte;
 }
+
+#ifndef __HAVE_ARCH_PTEP_CLEAR_FLUSH
+pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address,
+		       pte_t *ptep)
+{
+	return clear_flush_ptes(vma, address, ptep, 1);
+}
 #endif

 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-- 
2.30.2

From: Dev Jain <dev.jain@arm.com>
Subject: [PATCH v3 5/5] arm64: Add batched version of ptep_modify_prot_commit
Date: Mon, 19 May 2025 13:18:24 +0530
Message-Id: <20250519074824.42909-6-dev.jain@arm.com>
In-Reply-To: <20250519074824.42909-1-dev.jain@arm.com>

Override the generic definition to simply use set_ptes() to map the new
ptes into the pagetable.

Signed-off-by: Dev Jain <dev.jain@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 5 +++++
 arch/arm64/mm/mmu.c              | 9 ++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 8872ea5f0642..0b13ca38f80c 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1558,6 +1558,11 @@ extern pte_t modify_prot_start_ptes(struct vm_area_struct *vma,
 				    unsigned long addr, pte_t *ptep,
 				    unsigned int nr);

+#define modify_prot_commit_ptes modify_prot_commit_ptes
+extern void modify_prot_commit_ptes(struct vm_area_struct *vma, unsigned long addr,
+				    pte_t *ptep, pte_t old_pte, pte_t pte,
+				    unsigned int nr);
+
 #ifdef CONFIG_ARM64_CONTPTE

 /*
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index fe60be8774f4..5f04bcdcd946 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1543,10 +1543,17 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte
 	return modify_prot_start_ptes(vma, addr, ptep, 1);
 }

+void modify_prot_commit_ptes(struct vm_area_struct *vma, unsigned long addr,
+			     pte_t *ptep, pte_t old_pte, pte_t pte,
+			     unsigned int nr)
+{
+	set_ptes(vma->vm_mm, addr, ptep, pte, nr);
+}
+
 void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep,
 			     pte_t old_pte, pte_t pte)
 {
-	set_pte_at(vma->vm_mm, addr, ptep, pte);
+	modify_prot_commit_ptes(vma, addr, ptep, old_pte, pte, 1);
 }

 /*
-- 
2.30.2