From: lance.yang@linux.dev
To: akpm@linux-foundation.org
Cc: david@kernel.org, dave.hansen@intel.com, dave.hansen@linux.intel.com,
    will@kernel.org, aneesh.kumar@kernel.org, npiggin@gmail.com,
    peterz@infradead.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    x86@kernel.org, hpa@zytor.com, arnd@arndb.de, lorenzo.stoakes@oracle.com,
    ziy@nvidia.com, baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
    npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
    baohua@kernel.org, shy828301@gmail.com, riel@surriel.com,
    jannh@google.com, linux-arch@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, ioworker0@gmail.com, Lance Yang
Subject: [PATCH v3 1/2] mm/tlb: skip redundant IPI when TLB flush already synchronized
Date: Tue, 6 Jan 2026 19:50:52 +0800
Message-ID: <20260106115053.32328-2-lance.yang@linux.dev>
In-Reply-To: <20260106115053.32328-1-lance.yang@linux.dev>
References: <20260106115053.32328-1-lance.yang@linux.dev>

From: Lance Yang

When unsharing hugetlb PMD page tables, we currently send two IPIs: one
for TLB invalidation, and another to synchronize with concurrent
GUP-fast walkers via tlb_remove_table_sync_one().

However, if the TLB flush already sent IPIs to all CPUs (when
freed_tables or unshared_tables is true), the second IPI is redundant.
GUP-fast runs with IRQs disabled, so once the TLB flush IPI has
completed, any concurrent GUP-fast must have finished.

To avoid the redundant IPI, we add a flag to mmu_gather that tracks
whether the TLB flush sent IPIs. We pass the mmu_gather pointer through
the TLB flush path via flush_tlb_info, so native_flush_tlb_multi() can
set the flag when it sends IPIs for freed_tables. We also set the flag
for local-only flushes, since disabling IRQs provides the same
guarantee.

Suggested-by: David Hildenbrand (Red Hat)
Suggested-by: Dave Hansen
Signed-off-by: Lance Yang
---
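As a quick aside before the diff: below is a small standalone model of
the handoff this patch introduces (plain userspace C, not kernel code;
the names only mirror the interfaces added in the diff, and the IPI is
simulated with printf). The flush records that it already interrupted
every CPU, and the later GUP-fast synchronization step consumes that
flag instead of raising a second IPI.

/*
 * Standalone model of the tlb_flush_sent_ipi handoff. Build with any C
 * compiler; nothing here touches real TLBs or IPIs.
 */
#include <stdbool.h>
#include <stdio.h>

struct mmu_gather_model {
	bool freed_tables;
	bool tlb_flush_sent_ipi;
};

/* Stands in for flush_tlb_mm_range()/native_flush_tlb_multi(). */
static void flush_tlb_model(struct mmu_gather_model *tlb)
{
	printf("TLB flush: IPI broadcast to all CPUs\n");
	if (tlb && tlb->freed_tables)
		tlb->tlb_flush_sent_ipi = true;	/* remember we already synced */
}

/* Stands in for tlb_gather_remove_table_sync_one(). */
static void sync_with_gup_fast_model(struct mmu_gather_model *tlb)
{
	if (tlb && tlb->tlb_flush_sent_ipi) {
		tlb->tlb_flush_sent_ipi = false;
		printf("sync: skipped, flush IPI already synchronized\n");
		return;
	}
	printf("sync: extra IPI broadcast\n");
}

int main(void)
{
	struct mmu_gather_model tlb = { .freed_tables = true };

	flush_tlb_model(&tlb);
	sync_with_gup_fast_model(&tlb);	/* no second "IPI" */
	return 0;
}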
 arch/x86/include/asm/tlb.h      |  3 ++-
 arch/x86/include/asm/tlbflush.h |  9 +++++----
 arch/x86/kernel/alternative.c   |  2 +-
 arch/x86/kernel/ldt.c           |  2 +-
 arch/x86/mm/tlb.c               | 22 ++++++++++++++++------
 include/asm-generic/tlb.h      | 14 +++++++++-----
 mm/mmu_gather.c                 | 26 +++++++++++++++++++-------
 7 files changed, 53 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
index 866ea78ba156..c5950a92058c 100644
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -20,7 +20,8 @@ static inline void tlb_flush(struct mmu_gather *tlb)
 		end = tlb->end;
 	}
 
-	flush_tlb_mm_range(tlb->mm, start, end, stride_shift, tlb->freed_tables);
+	flush_tlb_mm_range(tlb->mm, start, end, stride_shift,
+			   tlb->freed_tables || tlb->unshared_tables, tlb);
 }
 
 static inline void invlpg(unsigned long addr)
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 00daedfefc1b..83c260c88b80 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -220,6 +220,7 @@ struct flush_tlb_info {
 	 * will be zero.
 	 */
 	struct mm_struct	*mm;
+	struct mmu_gather	*tlb;
 	unsigned long		start;
 	unsigned long		end;
 	u64			new_tlb_gen;
@@ -305,23 +306,23 @@ static inline bool mm_in_asid_transition(struct mm_struct *mm) { return false; }
 #endif
 
 #define flush_tlb_mm(mm)						\
-	flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL, true)
+	flush_tlb_mm_range(mm, 0UL, TLB_FLUSH_ALL, 0UL, true, NULL)
 
 #define flush_tlb_range(vma, start, end)				\
 	flush_tlb_mm_range((vma)->vm_mm, start, end,			\
 			   ((vma)->vm_flags & VM_HUGETLB)		\
 				? huge_page_shift(hstate_vma(vma))	\
-				: PAGE_SHIFT, true)
+				: PAGE_SHIFT, true, NULL)
 
 extern void flush_tlb_all(void);
 extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 				unsigned long end, unsigned int stride_shift,
-				bool freed_tables);
+				bool freed_tables, struct mmu_gather *tlb);
 extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
 
 static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a)
 {
-	flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, PAGE_SHIFT, false);
+	flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, PAGE_SHIFT, false, NULL);
 }
 
 static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 28518371d8bf..006f3705b616 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2572,7 +2572,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
 	 */
 	flush_tlb_mm_range(text_poke_mm, text_poke_mm_addr,
 			   text_poke_mm_addr + (cross_page_boundary ? 2 : 1) * PAGE_SIZE,
-			   PAGE_SHIFT, false);
+			   PAGE_SHIFT, false, NULL);
 
 	if (func == text_poke_memcpy) {
 		/*
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 0f19ef355f5f..d8494706fec5 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -374,7 +374,7 @@ static void unmap_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt)
 	}
 
 	va = (unsigned long)ldt_slot_va(ldt->slot);
-	flush_tlb_mm_range(mm, va, va + nr_pages * PAGE_SIZE, PAGE_SHIFT, false);
+	flush_tlb_mm_range(mm, va, va + nr_pages * PAGE_SIZE, PAGE_SHIFT, false, NULL);
 }
 
 #else /* !CONFIG_MITIGATION_PAGE_TABLE_ISOLATION */
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index f5b93e01e347..be45976c0d16 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1374,6 +1374,9 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
 	else
 		on_each_cpu_cond_mask(should_flush_tlb, flush_tlb_func,
 				(void *)info, 1, cpumask);
+
+	if (info->freed_tables && info->tlb)
+		info->tlb->tlb_flush_sent_ipi = true;
 }
 
 void flush_tlb_multi(const struct cpumask *cpumask,
@@ -1403,7 +1406,7 @@ static DEFINE_PER_CPU(unsigned int, flush_tlb_info_idx);
 static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
 			unsigned long start, unsigned long end,
 			unsigned int stride_shift, bool freed_tables,
-			u64 new_tlb_gen)
+			u64 new_tlb_gen, struct mmu_gather *tlb)
 {
 	struct flush_tlb_info *info = this_cpu_ptr(&flush_tlb_info);
 
@@ -1433,6 +1436,7 @@ static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
 	info->new_tlb_gen = new_tlb_gen;
 	info->initiating_cpu = smp_processor_id();
 	info->trim_cpumask = 0;
+	info->tlb = tlb;
 
 	return info;
 }
@@ -1447,8 +1451,8 @@ static void put_flush_tlb_info(void)
 }
 
 void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
-				unsigned long end, unsigned int stride_shift,
-				bool freed_tables)
+				unsigned long end, unsigned int stride_shift,
+				bool freed_tables, struct mmu_gather *tlb)
 {
 	struct flush_tlb_info *info;
 	int cpu = get_cpu();
@@ -1458,7 +1462,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	new_tlb_gen = inc_mm_tlb_gen(mm);
 
 	info = get_flush_tlb_info(mm, start, end, stride_shift, freed_tables,
-				  new_tlb_gen);
+				  new_tlb_gen, tlb);
 
 	/*
 	 * flush_tlb_multi() is not optimized for the common case in which only
@@ -1476,6 +1480,12 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 		local_irq_disable();
 		flush_tlb_func(info);
 		local_irq_enable();
+		/*
+		 * Only current CPU uses this mm, so we can treat this as
+		 * having synchronized with GUP-fast. No sync IPI needed.
+		 */
+		if (tlb && freed_tables)
+			tlb->tlb_flush_sent_ipi = true;
 	}
 
 	put_flush_tlb_info();
@@ -1553,7 +1563,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 	guard(preempt)();
 
 	info = get_flush_tlb_info(NULL, start, end, PAGE_SHIFT, false,
-				  TLB_GENERATION_INVALID);
+				  TLB_GENERATION_INVALID, NULL);
 
 	if (info->end == TLB_FLUSH_ALL)
 		kernel_tlb_flush_all(info);
@@ -1733,7 +1743,7 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 	int cpu = get_cpu();
 
 	info = get_flush_tlb_info(NULL, 0, TLB_FLUSH_ALL, 0, false,
-				  TLB_GENERATION_INVALID);
+				  TLB_GENERATION_INVALID, NULL);
 	/*
 	 * flush_tlb_multi() is not optimized for the common case in which only
 	 * a local TLB flush is needed. Optimize this use-case by calling
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 3975f7d11553..cbbe008590ee 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -249,6 +249,7 @@ static inline void tlb_remove_table(struct mmu_gather *tlb, void *table)
 #define tlb_needs_table_invalidate() (true)
 #endif
 
+void tlb_gather_remove_table_sync_one(struct mmu_gather *tlb);
 void tlb_remove_table_sync_one(void);
 
 #else
@@ -257,6 +258,7 @@ void tlb_remove_table_sync_one(void);
 #error tlb_needs_table_invalidate() requires MMU_GATHER_RCU_TABLE_FREE
 #endif
 
+static inline void tlb_gather_remove_table_sync_one(struct mmu_gather *tlb) { }
 static inline void tlb_remove_table_sync_one(void) { }
 
 #endif /* CONFIG_MMU_GATHER_RCU_TABLE_FREE */
@@ -378,6 +380,12 @@ struct mmu_gather {
 	 */
 	unsigned int		fully_unshared_tables : 1;
 
+	/*
+	 * Did the TLB flush for freed/unshared tables send IPIs to all CPUs?
+	 * If true, we can skip the redundant IPI in tlb_remove_table_sync_one().
+	 */
+	unsigned int		tlb_flush_sent_ipi : 1;
+
 	unsigned int		batch_count;
 
 #ifndef CONFIG_MMU_GATHER_NO_GATHER
@@ -833,13 +841,9 @@ static inline void tlb_flush_unshared_tables(struct mmu_gather *tlb)
 	 *
 	 * We only perform this when we are the last sharer of a page table,
 	 * as the IPI will reach all CPUs: any GUP-fast.
-	 *
-	 * Note that on configs where tlb_remove_table_sync_one() is a NOP,
-	 * the expectation is that the tlb_flush_mmu_tlbonly() would have issued
-	 * required IPIs already for us.
 	 */
 	if (tlb->fully_unshared_tables) {
-		tlb_remove_table_sync_one();
+		tlb_gather_remove_table_sync_one(tlb);
 		tlb->fully_unshared_tables = false;
 	}
 }
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 2faa23d7f8d4..da36de52b281 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -273,8 +273,14 @@ static void tlb_remove_table_smp_sync(void *arg)
 	/* Simply deliver the interrupt */
 }
 
-void tlb_remove_table_sync_one(void)
+void tlb_gather_remove_table_sync_one(struct mmu_gather *tlb)
 {
+	/* Skip the IPI if the TLB flush already synchronized with other CPUs */
+	if (tlb && tlb->tlb_flush_sent_ipi) {
+		tlb->tlb_flush_sent_ipi = false;
+		return;
+	}
+
 	/*
 	 * This isn't an RCU grace period and hence the page-tables cannot be
 	 * assumed to be actually RCU-freed.
@@ -285,6 +291,11 @@ void tlb_remove_table_sync_one(void)
 	smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
 }
 
+void tlb_remove_table_sync_one(void)
+{
+	tlb_gather_remove_table_sync_one(NULL);
+}
+
 static void tlb_remove_table_rcu(struct rcu_head *head)
 {
 	__tlb_remove_table_free(container_of(head, struct mmu_table_batch, rcu));
@@ -328,7 +339,7 @@ static inline void __tlb_remove_table_one_rcu(struct rcu_head *head)
 	__tlb_remove_table(ptdesc);
 }
 
-static inline void __tlb_remove_table_one(void *table)
+static inline void __tlb_remove_table_one(void *table, struct mmu_gather *tlb)
 {
 	struct ptdesc *ptdesc;
 
@@ -336,16 +347,16 @@ static inline void __tlb_remove_table_one(void *table)
 	call_rcu(&ptdesc->pt_rcu_head, __tlb_remove_table_one_rcu);
 }
 #else
-static inline void __tlb_remove_table_one(void *table)
+static inline void __tlb_remove_table_one(void *table, struct mmu_gather *tlb)
 {
-	tlb_remove_table_sync_one();
+	tlb_gather_remove_table_sync_one(tlb);
 	__tlb_remove_table(table);
 }
 #endif /* CONFIG_PT_RECLAIM */
 
-static void tlb_remove_table_one(void *table)
+static void tlb_remove_table_one(void *table, struct mmu_gather *tlb)
 {
-	__tlb_remove_table_one(table);
+	__tlb_remove_table_one(table, tlb);
 }
 
 static void tlb_table_flush(struct mmu_gather *tlb)
@@ -367,7 +378,7 @@ void tlb_remove_table(struct mmu_gather *tlb, void *table)
 		*batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT);
 		if (*batch == NULL) {
 			tlb_table_invalidate(tlb);
-			tlb_remove_table_one(table);
+			tlb_remove_table_one(table, tlb);
 			return;
 		}
 		(*batch)->nr = 0;
@@ -427,6 +438,7 @@ static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
 	tlb->vma_pfn = 0;
 
 	tlb->fully_unshared_tables = 0;
+	tlb->tlb_flush_sent_ipi = 0;
 	__tlb_reset_range(tlb);
 	inc_tlb_flush_pending(tlb->mm);
 }
-- 
2.49.0

From: lance.yang@linux.dev
To: akpm@linux-foundation.org
Cc: david@kernel.org, dave.hansen@intel.com, dave.hansen@linux.intel.com,
    will@kernel.org, aneesh.kumar@kernel.org, npiggin@gmail.com,
    peterz@infradead.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    x86@kernel.org, hpa@zytor.com, arnd@arndb.de, lorenzo.stoakes@oracle.com,
    ziy@nvidia.com, baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
    npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
    baohua@kernel.org, shy828301@gmail.com, riel@surriel.com,
    jannh@google.com, linux-arch@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, ioworker0@gmail.com, Lance Yang
Subject: [PATCH v3 2/2] mm: introduce pmdp_collapse_flush_sync() to skip redundant IPI
Date: Tue, 6 Jan 2026 19:50:53 +0800
Message-ID: <20260106115053.32328-3-lance.yang@linux.dev>
In-Reply-To: <20260106115053.32328-1-lance.yang@linux.dev>
References: <20260106115053.32328-1-lance.yang@linux.dev>

From: Lance Yang

pmdp_collapse_flush() may already send IPIs to flush TLBs, and then
callers send another IPI via tlb_remove_table_sync_one() or
pmdp_get_lockless_sync() to synchronize with concurrent GUP-fast
walkers. However, since GUP-fast runs with IRQs disabled, the TLB flush
IPI already provides the necessary synchronization. We can avoid the
redundant second IPI.

Introduce pmdp_collapse_flush_sync(), which combines flush and sync:

- For architectures using the generic pmdp_collapse_flush()
  implementation (e.g., x86): use mmu_gather to track IPI sends. If the
  TLB flush sent an IPI, tlb_gather_remove_table_sync_one() will skip
  the redundant one.

- For architectures with a custom pmdp_collapse_flush() (s390, riscv,
  powerpc): fall back to calling pmdp_collapse_flush() followed by
  tlb_remove_table_sync_one(). No behavior change.

Update khugepaged to use pmdp_collapse_flush_sync() instead of separate
flush and sync calls. Remove the now-unused pmdp_get_lockless_sync()
macro.

Suggested-by: David Hildenbrand (Red Hat)
Signed-off-by: Lance Yang
---
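For callers, the change collapses two steps into one. A usage sketch
(illustrative kernel-style C, not part of this patch; the helper
function name collapse_pmd_and_sync() is made up here, and it assumes
only the pmdp_collapse_flush_sync() declaration added below):

/* Hypothetical caller; khugepaged's real call sites are in the hunks below. */
#include <linux/pgtable.h>

static pmd_t collapse_pmd_and_sync(struct vm_area_struct *vma,
				   unsigned long haddr, pmd_t *pmd)
{
	/*
	 * One call clears the PMD, flushes the TLB, and synchronizes with
	 * GUP-fast; the extra sync IPI is skipped whenever the flush
	 * already reached every CPU.
	 */
	return pmdp_collapse_flush_sync(vma, haddr, pmd);
}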
 include/linux/pgtable.h | 13 +++++++++----
 mm/khugepaged.c         |  9 +++------
 mm/pgtable-generic.c    | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index eb8aacba3698..69e290dab450 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -755,7 +755,6 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
 	return pmd;
 }
 #define pmdp_get_lockless pmdp_get_lockless
-#define pmdp_get_lockless_sync() tlb_remove_table_sync_one()
 #endif /* CONFIG_PGTABLE_LEVELS > 2 */
 #endif /* CONFIG_GUP_GET_PXX_LOW_HIGH */
 
@@ -774,9 +773,6 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
 {
 	return pmdp_get(pmdp);
 }
-static inline void pmdp_get_lockless_sync(void)
-{
-}
 #endif
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -1174,6 +1170,8 @@ static inline void pudp_set_wrprotect(struct mm_struct *mm,
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 extern pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
 				 unsigned long address, pmd_t *pmdp);
+extern pmd_t pmdp_collapse_flush_sync(struct vm_area_struct *vma,
+				      unsigned long address, pmd_t *pmdp);
 #else
 static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
 					unsigned long address,
@@ -1182,6 +1180,13 @@ static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
 	BUILD_BUG();
 	return *pmdp;
 }
+static inline pmd_t pmdp_collapse_flush_sync(struct vm_area_struct *vma,
+					     unsigned long address,
+					     pmd_t *pmdp)
+{
+	BUILD_BUG();
+	return *pmdp;
+}
 #define pmdp_collapse_flush pmdp_collapse_flush
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 9f790ec34400..0a98afc85c50 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1177,10 +1177,9 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
 	 * Parallel GUP-fast is fine since GUP-fast will back off when
 	 * it detects PMD is changed.
 	 */
-	_pmd = pmdp_collapse_flush(vma, address, pmd);
+	_pmd = pmdp_collapse_flush_sync(vma, address, pmd);
 	spin_unlock(pmd_ptl);
 	mmu_notifier_invalidate_range_end(&range);
-	tlb_remove_table_sync_one();
 
 	pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl);
 	if (pte) {
@@ -1663,8 +1662,7 @@ static enum scan_result try_collapse_pte_mapped_thp(struct mm_struct *mm, unsign
 			}
 		}
 	}
-	pgt_pmd = pmdp_collapse_flush(vma, haddr, pmd);
-	pmdp_get_lockless_sync();
+	pgt_pmd = pmdp_collapse_flush_sync(vma, haddr, pmd);
 	pte_unmap_unlock(start_pte, ptl);
 	if (ptl != pml)
 		spin_unlock(pml);
@@ -1817,8 +1815,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 		 * races against the prior checks.
 		 */
 		if (likely(file_backed_vma_is_retractable(vma))) {
-			pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
-			pmdp_get_lockless_sync();
+			pgt_pmd = pmdp_collapse_flush_sync(vma, addr, pmd);
 			success = true;
 		}
 
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index d3aec7a9926a..be2ee82e6fc4 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -233,6 +233,40 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 	flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
 	return pmd;
 }
+
+pmd_t pmdp_collapse_flush_sync(struct vm_area_struct *vma, unsigned long address,
+			       pmd_t *pmdp)
+{
+	struct mmu_gather tlb;
+	pmd_t pmd;
+
+	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+	VM_BUG_ON(pmd_trans_huge(*pmdp));
+
+	tlb_gather_mmu(&tlb, vma->vm_mm);
+	pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
+
+	flush_tlb_mm_range(vma->vm_mm, address, address + HPAGE_PMD_SIZE,
+			   PAGE_SHIFT, true, &tlb);
+
+	/*
+	 * Synchronize with GUP-fast. If the flush sent IPIs, skip the
+	 * redundant sync IPI.
+	 */
+	tlb_gather_remove_table_sync_one(&tlb);
+	tlb_finish_mmu(&tlb);
+	return pmd;
+}
+#else
+pmd_t pmdp_collapse_flush_sync(struct vm_area_struct *vma, unsigned long address,
+			       pmd_t *pmdp)
+{
+	pmd_t pmd;
+
+	pmd = pmdp_collapse_flush(vma, address, pmdp);
+	tlb_remove_table_sync_one();
+	return pmd;
+}
 #endif
 
 /* arch define pte_free_defer in asm/pgalloc.h for its own implementation */
-- 
2.49.0