When we need to flush the TLB of a remote CPU, there is no need to
send an IPI if the target CPU is not currently using the ASID we want
to flush. Instead, we can cache the TLB flush info in a percpu buffer
and defer the flush to the next context_switch.

This reduces the number of IPIs due to TLB flushes:

* ltp - mmapstress01
  Before: ~108k
  After:  ~46k

Future plan for the next version:

- This patch series reduces IPIs by deferring TLB flushes to
  context_switch. It does not clear the mm_cpumask of the target
  mm_struct. In the next version, I will apply a threshold to the
  number of ASIDs maintained by each CPU's TLB. Once the threshold is
  exceeded, the ASID that has not been used for the longest time will
  be flushed out, and the current CPU will be cleared from the
  mm_cpumask.

Thanks in advance for your comments.

Xu Lu (4):
  riscv: mm: Introduce percpu loaded_asid
  riscv: mm: Introduce percpu tlb flush queue
  riscv: mm: Enqueue tlbflush info if task is not running on target cpu
  riscv: mm: Perform tlb flush during context_switch

 arch/riscv/include/asm/mmu_context.h |  1 +
 arch/riscv/include/asm/tlbflush.h    |  4 ++
 arch/riscv/mm/context.c              | 10 ++++
 arch/riscv/mm/tlbflush.c             | 76 +++++++++++++++++++++++++++-
 4 files changed, 90 insertions(+), 1 deletion(-)

--
2.20.1
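To make the deferral scheme above concrete, here is a minimal C
sketch. The struct layout, queue depth, and helper names
(tlb_flush_info, tlb_flush_queue, defer_tlb_flush, flush_deferred_tlb)
are illustrative assumptions, not the series' actual code; only
loaded_asid is named by the patch titles, and
local_flush_tlb_range_asid() is the existing local range-flush helper
in arch/riscv/mm/tlbflush.c. Synchronization between the enqueuing CPU
and the draining CPU is omitted for brevity.

#include <linux/percpu.h>

/* Illustrative sketch only: names and layout are assumptions. */
struct tlb_flush_info {
	unsigned long asid;
	unsigned long start;
	unsigned long size;
	unsigned long stride;
};

#define TLB_FLUSH_QUEUE_LEN 16	/* arbitrary depth for the sketch */

struct tlb_flush_queue {
	unsigned int len;
	struct tlb_flush_info info[TLB_FLUSH_QUEUE_LEN];
};

static DEFINE_PER_CPU(struct tlb_flush_queue, tlb_flush_queue);
static DEFINE_PER_CPU(unsigned long, loaded_asid);

/*
 * Called on the initiating CPU for each remote CPU in mm_cpumask.
 * Returns true if the flush was queued and no IPI is needed.
 */
static bool defer_tlb_flush(int cpu, const struct tlb_flush_info *fi)
{
	struct tlb_flush_queue *q = &per_cpu(tlb_flush_queue, cpu);

	/* Target CPU is currently using this ASID: IPI it instead. */
	if (per_cpu(loaded_asid, cpu) == fi->asid)
		return false;

	/* Queue full: fall back to an immediate IPI flush. */
	if (q->len >= TLB_FLUSH_QUEUE_LEN)
		return false;

	q->info[q->len++] = *fi;
	return true;
}

/*
 * Called from context_switch on the target CPU. Draining the whole
 * queue (rather than only the incoming ASID's entries) over-flushes,
 * but over-flushing is always safe.
 */
static void flush_deferred_tlb(void)
{
	struct tlb_flush_queue *q = this_cpu_ptr(&tlb_flush_queue);
	unsigned int i;

	for (i = 0; i < q->len; i++)
		local_flush_tlb_range_asid(q->info[i].start, q->info[i].size,
					   q->info[i].stride, q->info[i].asid);
	q->len = 0;
}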
On Thu, Oct 30, 2025 at 9:57 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
>
> When we need to flush the TLB of a remote CPU, there is no need to
> send an IPI if the target CPU is not currently using the ASID we want
> to flush. Instead, we can cache the TLB flush info in a percpu buffer
> and defer the flush to the next context_switch.
>
> This reduces the number of IPIs due to TLB flushes:
>
> * ltp - mmapstress01
>   Before: ~108k
>   After:  ~46k

Could you add the results for these two test cases to the next version?

* lmbench - lat_pagefault
* lmbench - lat_mmap

Thank you!

>
> Future plan for the next version:
>
> - This patch series reduces IPIs by deferring TLB flushes to
>   context_switch. It does not clear the mm_cpumask of the target
>   mm_struct. In the next version, I will apply a threshold to the
>   number of ASIDs maintained by each CPU's TLB. Once the threshold is
>   exceeded, the ASID that has not been used for the longest time will
>   be flushed out, and the current CPU will be cleared from the
>   mm_cpumask.
>
> Thanks in advance for your comments.
>
> Xu Lu (4):
>   riscv: mm: Introduce percpu loaded_asid
>   riscv: mm: Introduce percpu tlb flush queue
>   riscv: mm: Enqueue tlbflush info if task is not running on target cpu
>   riscv: mm: Perform tlb flush during context_switch
>
>  arch/riscv/include/asm/mmu_context.h |  1 +
>  arch/riscv/include/asm/tlbflush.h    |  4 ++
>  arch/riscv/mm/context.c              | 10 ++++
>  arch/riscv/mm/tlbflush.c             | 76 +++++++++++++++++++++++++++-
>  4 files changed, 90 insertions(+), 1 deletion(-)
>
> --
> 2.20.1
>

--
Best Regards
 Guo Ren
On Fri, Nov 7, 2025 at 9:56 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Thu, Oct 30, 2025 at 9:57 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
> >
> > When we need to flush the TLB of a remote CPU, there is no need to
> > send an IPI if the target CPU is not currently using the ASID we want
> > to flush. Instead, we can cache the TLB flush info in a percpu buffer
> > and defer the flush to the next context_switch.
> >
> > This reduces the number of IPIs due to TLB flushes:
> >
> > * ltp - mmapstress01
> >   Before: ~108k
> >   After:  ~46k
>
> Could you add the results for these two test cases to the next version?
>
> * lmbench - lat_pagefault
> * lmbench - lat_mmap

Roger that. Thanks for the suggestion.

> Thank you!
>
> >
> > Future plan for the next version:
> >
> > - This patch series reduces IPIs by deferring TLB flushes to
> >   context_switch. It does not clear the mm_cpumask of the target
> >   mm_struct. In the next version, I will apply a threshold to the
> >   number of ASIDs maintained by each CPU's TLB. Once the threshold is
> >   exceeded, the ASID that has not been used for the longest time will
> >   be flushed out, and the current CPU will be cleared from the
> >   mm_cpumask.
> >
> > Thanks in advance for your comments.
> >
> > Xu Lu (4):
> >   riscv: mm: Introduce percpu loaded_asid
> >   riscv: mm: Introduce percpu tlb flush queue
> >   riscv: mm: Enqueue tlbflush info if task is not running on target cpu
> >   riscv: mm: Perform tlb flush during context_switch
> >
> >  arch/riscv/include/asm/mmu_context.h |  1 +
> >  arch/riscv/include/asm/tlbflush.h    |  4 ++
> >  arch/riscv/mm/context.c              | 10 ++++
> >  arch/riscv/mm/tlbflush.c             | 76 +++++++++++++++++++++++++++-
> >  4 files changed, 90 insertions(+), 1 deletion(-)
> >
> > --
> > 2.20.1
> >
>
> --
> Best Regards
>  Guo Ren
On Thu, Oct 30, 2025 at 9:57 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
>
> When we need to flush the TLB of a remote CPU, there is no need to
> send an IPI if the target CPU is not currently using the ASID we want
> to flush. Instead, we can cache the TLB flush info in a percpu buffer
> and defer the flush to the next context_switch.
>
> This reduces the number of IPIs due to TLB flushes:
>
> * ltp - mmapstress01
>   Before: ~108k
>   After:  ~46k

Great result! I have some questions:

1. Do we need an accurate address-range flush via a new queue of
flush_tlb_range_data? Why not just flush the whole ASID?

2. If we reused the context_tlb_flush_pending mechanism, could
mmapstress01 achieve a better result than ~46k?

3. For the kernel address space, we must use an immediate IPI flush,
but I didn't see your patches consider that case, or am I wrong?

>
> Future plan for the next version:
>
> - This patch series reduces IPIs by deferring TLB flushes to
>   context_switch. It does not clear the mm_cpumask of the target
>   mm_struct. In the next version, I will apply a threshold to the
>   number of ASIDs maintained by each CPU's TLB. Once the threshold is
>   exceeded, the ASID that has not been used for the longest time will
>   be flushed out, and the current CPU will be cleared from the
>   mm_cpumask.
>
> Thanks in advance for your comments.
>
> Xu Lu (4):
>   riscv: mm: Introduce percpu loaded_asid
>   riscv: mm: Introduce percpu tlb flush queue
>   riscv: mm: Enqueue tlbflush info if task is not running on target cpu
>   riscv: mm: Perform tlb flush during context_switch
>
>  arch/riscv/include/asm/mmu_context.h |  1 +
>  arch/riscv/include/asm/tlbflush.h    |  4 ++
>  arch/riscv/mm/context.c              | 10 ++++
>  arch/riscv/mm/tlbflush.c             | 76 +++++++++++++++++++++++++++-
>  4 files changed, 90 insertions(+), 1 deletion(-)
>
> --
> 2.20.1
>

--
Best Regards
 Guo Ren
Hi Guo Ren,

On Mon, Nov 3, 2025 at 11:44 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Thu, Oct 30, 2025 at 9:57 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
> >
> > When we need to flush the TLB of a remote CPU, there is no need to
> > send an IPI if the target CPU is not currently using the ASID we want
> > to flush. Instead, we can cache the TLB flush info in a percpu buffer
> > and defer the flush to the next context_switch.
> >
> > This reduces the number of IPIs due to TLB flushes:
> >
> > * ltp - mmapstress01
> >   Before: ~108k
> >   After:  ~46k
>
> Great result! I have some questions:
>
> 1. Do we need an accurate address-range flush via a new queue of
> flush_tlb_range_data? Why not just flush the whole ASID?

Flushing the whole address space may cause subsequent TLB misses.
Consider this case: a single user-mode thread runs frequently on the
target hart. When that thread sleeps and the CPU context switches to
the idle thread, another thread of the same process, running on
another hart, modifies a mapping and needs to flush the TLB. If we
flushed the whole ASID, the first thread would encounter a large
number of TLB misses when it resumed. I want to balance the IPI count
against TLB misses.

> 2. If we reused the context_tlb_flush_pending mechanism, could
> mmapstress01 achieve a better result than ~46k?

Besides the lazy TLB flush, another way to reduce IPI overhead is to
clear the mm_cpumask, and it does give a better result for
mmapstress01. I have sent a patch [1] that clears the mm_cpumask
whenever all TLB entries of a certain ASID are flushed, and it reduces
the IPI count from ~98k to 268. As mentioned in the previous email, in
the next version I will add the mm_cpumask clearing procedure.
Specifically, I will flush all TLB entries of an ASID and clear the
current CPU from the mm_cpumask whenever the ASID has not been
scheduled for enough context switches.

[1] https://lore.kernel.org/all/20250827131444.23893-3-luxu.kernel@bytedance.com/

> 3. For the kernel address space, we must use an immediate IPI flush,
> but I didn't see your patches consider that case, or am I wrong?

Nice catch! I forgot to add the kernel ASID check to the
should_ipi_flush function. I will add it in the next version.

I have considered skipping the IPI and deferring the TLB flush to the
next time the target hart enters S-mode, when the target hart is
currently running in user mode. But there are too many kernel entry
points to consider, especially now that we have SSE. For kernel TLB
flushes, it is safer to send an IPI.

Thanks.

Best Regards,
Xu Lu

> >
> > Future plan for the next version:
> >
> > - This patch series reduces IPIs by deferring TLB flushes to
> >   context_switch. It does not clear the mm_cpumask of the target
> >   mm_struct. In the next version, I will apply a threshold to the
> >   number of ASIDs maintained by each CPU's TLB. Once the threshold is
> >   exceeded, the ASID that has not been used for the longest time will
> >   be flushed out, and the current CPU will be cleared from the
> >   mm_cpumask.
> >
> > Thanks in advance for your comments.
> >
> > Xu Lu (4):
> >   riscv: mm: Introduce percpu loaded_asid
> >   riscv: mm: Introduce percpu tlb flush queue
> >   riscv: mm: Enqueue tlbflush info if task is not running on target cpu
> >   riscv: mm: Perform tlb flush during context_switch
> >
> >  arch/riscv/include/asm/mmu_context.h |  1 +
> >  arch/riscv/include/asm/tlbflush.h    |  4 ++
> >  arch/riscv/mm/context.c              | 10 ++++
> >  arch/riscv/mm/tlbflush.c             | 76 +++++++++++++++++++++++++++-
> >  4 files changed, 90 insertions(+), 1 deletion(-)
> >
> > --
> > 2.20.1
> >
>
> --
> Best Regards
>  Guo Ren
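Following up on question 3, here is a minimal sketch of the missing
check Xu Lu describes, under the assumption that the series gates
remote flushes on a should_ipi_flush()-style predicate and a percpu
loaded_asid (both per the discussion above; the exact function body is
an assumption, not the series' code). FLUSH_TLB_NO_ASID is the
existing sentinel in arch/riscv/include/asm/tlbflush.h for flushes
that carry no specific ASID, which covers kernel-address flushes.

/*
 * Sketch of the kernel ASID check. Kernel flushes (no specific ASID)
 * must IPI immediately; user flushes only need an IPI when the target
 * hart is currently running the ASID being flushed.
 */
static bool should_ipi_flush(int cpu, unsigned long asid)
{
	/* Kernel mappings must be flushed immediately on all harts. */
	if (asid == FLUSH_TLB_NO_ASID)
		return true;

	/* Otherwise, flush now only if the ASID is currently loaded. */
	return per_cpu(loaded_asid, cpu) == asid;
}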