On Thu, 5 Dec 2024 16:43:24 +0800
kernel test robot <oliver.sang@intel.com> wrote:

> besides the performance report
> "[tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression"
> in
> https://lore.kernel.org/all/202411282207.6bd28eae-lkp@intel.com/
>

Anxiously awaiting the bot to get around to v3 or v4 of that patch,
on the extra-large 2 socket system ;)

> we now also observed a WARNING from another test. the issue doesn't always
> happen, so we run it more to make sure the parent keep clean.

Thank you for spotting this corner case, too!

The warning appears to be fairly harmless, and luckily also easy
to fix.

---8<---

From 5b5d1d548fbe07b415ba9e80a2f60deed5aead62 Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel@surriel.com>
Date: Thu, 5 Dec 2024 10:20:28 -0500
Subject: [PATCH 2/2] x86,mm: also remove local CPU from mm_cpumask if stale

The code in flush_tlb_func that removes a remote CPU from the
cpumask if it is no longer running the target mm is also needed
on the originating CPU of a TLB flush, now that CPUs are no
longer cleared from the mm_cpumask at context switch time.

Flushing the TLB when we are not running the target mm is
harmless, because the CPU's tlb_gen only gets updated to
match the mm_tlb_gen, but it does hit this warning:

	WARN_ON_ONCE(local_tlb_gen > mm_tlb_gen);

[ 210.343902][ T4668] WARNING: CPU: 38 PID: 4668 at arch/x86/mm/tlb.c:815 flush_tlb_func (arch/x86/mm/tlb.c:815)

Removing both local and remote CPUs from the mm_cpumask
when doing a flush for a not currently loaded mm avoids
that warning.

Signed-off-by: Rik van Riel <riel@surriel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202412051551.690e9656-lkp@intel.com
---
 arch/x86/mm/tlb.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 0507a6773a37..458a5d5be594 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -756,13 +756,13 @@ static void flush_tlb_func(void *info)
 	if (!local) {
 		inc_irq_stat(irq_tlb_count);
 		count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
+	}
 
-		/* Can only happen on remote CPUs */
-		if (f->mm && f->mm != loaded_mm) {
-			cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(f->mm));
-			trace_tlb_flush(TLB_REMOTE_WRONG_CPU, 0);
-			return;
-		}
+	/* The CPU was left in the mm_cpumask of the target mm. Clear it. */
+	if (f->mm && f->mm != loaded_mm) {
+		cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(f->mm));
+		trace_tlb_flush(TLB_REMOTE_WRONG_CPU, 0);
+		return;
 	}
 
 	if (unlikely(loaded_mm == &init_mm))
--
2.47.0
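
For readers following the thread, the control-flow change in the hunk above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct mm_model`, `model_clear_cpu()`, `flush_before()` and `flush_after()` are hypothetical names that do not exist in the kernel, the cpumask is reduced to a 64-bit word, and the tlb_gen bookkeeping that follows the check in the real flush_tlb_func() is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the stale-mm check in flush_tlb_func().
 * None of these types or names exist in the kernel; they only mirror the
 * control flow touched by the patch.
 */
struct mm_model {
	uint64_t cpumask;	/* bit n set: CPU n is still listed for this mm */
};

enum flush_result {
	FLUSH_SKIPPED_STALE,	/* CPU was not running the target mm */
	FLUSH_CONTINUED,	/* fell through to the later tlb_gen checks */
};

/* Clear @cpu from the model cpumask of @mm. */
static void model_clear_cpu(int cpu, struct mm_model *mm)
{
	assert(cpu >= 0 && cpu < 64);
	mm->cpumask &= ~(UINT64_C(1) << cpu);
}

/* Before the patch: only remote CPUs take the early return. */
static enum flush_result flush_before(int cpu, bool local,
				      const struct mm_model *loaded_mm,
				      struct mm_model *target_mm)
{
	if (!local) {
		if (target_mm && target_mm != loaded_mm) {
			model_clear_cpu(cpu, target_mm);
			return FLUSH_SKIPPED_STALE;
		}
	}
	return FLUSH_CONTINUED;
}

/* After the patch: the originating (local) CPU takes it as well. */
static enum flush_result flush_after(int cpu, bool local,
				     const struct mm_model *loaded_mm,
				     struct mm_model *target_mm)
{
	(void)local;	/* no longer gates the stale-mm check */
	if (target_mm && target_mm != loaded_mm) {
		model_clear_cpu(cpu, target_mm);
		return FLUSH_SKIPPED_STALE;
	}
	return FLUSH_CONTINUED;
}
```

In this model, a local flush for an mm that is not currently loaded falls through in `flush_before()` and leaves the CPU in the mask, while `flush_after()` clears the bit and returns early, which is the path the commit message describes as avoiding the warning.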
hi, Rik van Riel,
On Thu, Dec 05, 2024 at 10:46:30AM -0500, Rik van Riel wrote:
> On Thu, 5 Dec 2024 16:43:24 +0800
> kernel test robot <oliver.sang@intel.com> wrote:
>
> > besides the performance report
> > "[tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression"
> > in
> > https://lore.kernel.org/all/202411282207.6bd28eae-lkp@intel.com/
> >
> Anxiously awaiting the bot to get around to v3 or v4 of that patch,
> on the extra-large 2 socket system ;)
>
> > we now also observed a WARNING from another test. the issue doesn't always
> > happen, so we run it more to make sure the parent keep clean.
>
> Thank you for spotting this corner case, too!
>
> The warning appears to be fairly harmless, and luckily also easy
> to fix.
The patch below fixes the WARNING in our tests.
Tested-by: kernel test robot <oliver.sang@intel.com>
Our bot applied the patch as follows:
fbf932edb3630 x86,mm: also remove local CPU from mm_cpumask if stale <-----
2815a56e4b725 (tip/x86/mm) x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU
209954cbc7d0c x86/mm/tlb: Update mm_cpumask lazily
7e33001b8b9a7 x86/mm/tlb: Put cpumask_test_cpu() check in switch_mm_irqs_off() under CONFIG_DEBUG_VM
The issue no longer reproduces on fbf932edb3630:
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_ssd/nr_task/priority/rootfs/runtime/tbox_group/test/testcase/thp_defrag/thp_enabled:
  gcc-12/performance/x86_64-rhel-9.4/1/32/1/debian-12-x86_64-20240206.cgz/300/lkp-icl-2sp4/swap-w-seq/vm-scalability/always/always

7e33001b8b9a7806 209954cbc7d0ce1a190fc725d20 fbf932edb3630024b60b22df596
---------------- --------------------------- ---------------------------
       fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
           |             |             |             |             |
            :50          58%          29:50           0%            :50     dmesg.RIP:flush_tlb_func
            :50          58%          29:50           0%            :50     dmesg.WARNING:at_arch/x86/mm/tlb.c:#flush_tlb_func

>
> ---8<---
>
> From 5b5d1d548fbe07b415ba9e80a2f60deed5aead62 Mon Sep 17 00:00:00 2001
> From: Rik van Riel <riel@surriel.com>
> Date: Thu, 5 Dec 2024 10:20:28 -0500
> Subject: [PATCH 2/2] x86,mm: also remove local CPU from mm_cpumask if stale
>
> The code in flush_tlb_func that removes a remote CPU from the
> cpumask if it is no longer running the target mm is also needed
> on the originating CPU of a TLB flush, now that CPUs are no
> longer cleared from the mm_cpumask at context switch time.
>
> Flushing the TLB when we are not running the target mm is
> harmless, because the CPU's tlb_gen only gets updated to
> match the mm_tlb_gen, but it does hit this warning:
>
> WARN_ON_ONCE(local_tlb_gen > mm_tlb_gen);
>
> [ 210.343902][ T4668] WARNING: CPU: 38 PID: 4668 at arch/x86/mm/tlb.c:815 flush_tlb_func (arch/x86/mm/tlb.c:815)
>
> Removing both local and remote CPUs from the mm_cpumask
> when doing a flush for a not currently loaded mm avoids
> that warning.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202412051551.690e9656-lkp@intel.com
> ---
> arch/x86/mm/tlb.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 0507a6773a37..458a5d5be594 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -756,13 +756,13 @@ static void flush_tlb_func(void *info)
>  	if (!local) {
>  		inc_irq_stat(irq_tlb_count);
>  		count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
> +	}
>  
> -		/* Can only happen on remote CPUs */
> -		if (f->mm && f->mm != loaded_mm) {
> -			cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(f->mm));
> -			trace_tlb_flush(TLB_REMOTE_WRONG_CPU, 0);
> -			return;
> -		}
> +	/* The CPU was left in the mm_cpumask of the target mm. Clear it. */
> +	if (f->mm && f->mm != loaded_mm) {
> +		cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(f->mm));
> +		trace_tlb_flush(TLB_REMOTE_WRONG_CPU, 0);
> +		return;
>  	}
>  
>  	if (unlikely(loaded_mm == &init_mm))
> --
> 2.47.0
>
>
The following commit has been merged into the x86/mm branch of tip:

Commit-ID:     953753db887f9d70f70f61d6ecbe5cf209107672
Gitweb:        https://git.kernel.org/tip/953753db887f9d70f70f61d6ecbe5cf209107672
Author:        Rik van Riel <riel@surriel.com>
AuthorDate:    Thu, 05 Dec 2024 10:46:30 -05:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 06 Dec 2024 10:25:53 +01:00

x86/mm/tlb: Also remove local CPU from mm_cpumask if stale

The code in flush_tlb_func() that removes a remote CPU from the
cpumask if it is no longer running the target mm is also needed
on the originating CPU of a TLB flush, now that CPUs are no
longer cleared from the mm_cpumask at context switch time.

Flushing the TLB when we are not running the target mm is
harmless, because the CPU's tlb_gen only gets updated to
match the mm_tlb_gen, but it does hit this warning:

	WARN_ON_ONCE(local_tlb_gen > mm_tlb_gen);

[ 210.343902][ T4668] WARNING: CPU: 38 PID: 4668 at arch/x86/mm/tlb.c:815 flush_tlb_func (arch/x86/mm/tlb.c:815)

Removing both local and remote CPUs from the mm_cpumask
when doing a flush for a not currently loaded mm avoids
that warning.

Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241205104630.755706ca@fangorn
Closes: https://lore.kernel.org/oe-lkp/202412051551.690e9656-lkp@intel.com
---
 arch/x86/mm/tlb.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 1aac4fa..3c30817 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -756,13 +756,13 @@ static void flush_tlb_func(void *info)
 	if (!local) {
 		inc_irq_stat(irq_tlb_count);
 		count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
+	}
 
-		/* Can only happen on remote CPUs */
-		if (f->mm && f->mm != loaded_mm) {
-			cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(f->mm));
-			trace_tlb_flush(TLB_REMOTE_WRONG_CPU, 0);
-			return;
-		}
+	/* The CPU was left in the mm_cpumask of the target mm. Clear it. */
+	if (f->mm && f->mm != loaded_mm) {
+		cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(f->mm));
+		trace_tlb_flush(TLB_REMOTE_WRONG_CPU, 0);
+		return;
 	}
 
 	if (unlikely(loaded_mm == &init_mm))