From nobody Sat Nov 23 22:02:40 2024
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org,
 kernel-team@meta.com, hpa@zytor.com, Rik van Riel
Subject: [PATCH 1/3]
 x86,tlb: update mm_cpumask lazily
Date: Fri, 8 Nov 2024 19:27:48 -0500
Message-ID: <20241109003727.3958374-2-riel@surriel.com>
In-Reply-To: <20241109003727.3958374-1-riel@surriel.com>
References: <20241109003727.3958374-1-riel@surriel.com>
Sender: riel@surriel.com

On busy multi-threaded workloads, there can be significant contention
on the mm_cpumask at context switch time.

Reduce that contention by updating mm_cpumask lazily, setting the CPU bit
at context switch time (if not already set), and clearing the CPU bit at
the first TLB flush sent to a CPU where the process isn't running.

When a flurry of TLB flushes for a process happens, only the first one
will be sent to CPUs where the process isn't running. The others will
be sent only to CPUs where the process is currently running.

On an AMD Milan system with 36 cores, there is a noticeable difference:
$ hackbench --groups 20 --loops 10000

Before: ~4.5s +/- 0.1s
After:  ~4.2s +/- 0.1s

Signed-off-by: Rik van Riel
Reported-by: Borislav Petkov
---
 arch/x86/mm/tlb.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 86593d1b787d..f19f6378cabf 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -606,18 +606,15 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		cond_mitigation(tsk);
 
 		/*
-		 * Stop remote flushes for the previous mm.
-		 * Skip kernel threads; we never send init_mm TLB flushing IPIs,
-		 * but the bitmap manipulation can cause cache line contention.
+		 * Leave this CPU in prev's mm_cpumask. Atomic writes to
+		 * mm_cpumask can be expensive under contention. The CPU
+		 * will be removed lazily at TLB flush time.
 		 */
-		if (prev != &init_mm) {
-			VM_WARN_ON_ONCE(!cpumask_test_cpu(cpu,
-						mm_cpumask(prev)));
-			cpumask_clear_cpu(cpu, mm_cpumask(prev));
-		}
+		VM_WARN_ON_ONCE(prev != &init_mm && !cpumask_test_cpu(cpu,
+				mm_cpumask(prev)));
 
 		/* Start receiving IPIs and then read tlb_gen (and LAM below) */
-		if (next != &init_mm)
+		if (next != &init_mm && !cpumask_test_cpu(cpu, mm_cpumask(next)))
 			cpumask_set_cpu(cpu, mm_cpumask(next));
 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
 
@@ -761,8 +758,10 @@ static void flush_tlb_func(void *info)
 		count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
 
 		/* Can only happen on remote CPUs */
-		if (f->mm && f->mm != loaded_mm)
+		if (f->mm && f->mm != loaded_mm) {
+			cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(f->mm));
 			return;
+		}
 	}
 
 	if (unlikely(loaded_mm == &init_mm))
-- 
2.45.2

From nobody Sat Nov 23 22:02:40 2024
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org,
 kernel-team@meta.com, hpa@zytor.com, Rik van Riel, Dave Hansen
Subject: [PATCH 2/3] x86,tlb: add tracepoint for TLB flush IPI to stale CPU
Date: Fri, 8 Nov 2024 19:27:49 -0500
Message-ID: <20241109003727.3958374-3-riel@surriel.com>
In-Reply-To: <20241109003727.3958374-1-riel@surriel.com>
References: <20241109003727.3958374-1-riel@surriel.com>
Sender: riel@surriel.com

Add a tracepoint when we send a TLB flush IPI to a CPU that used to be
in the mm_cpumask, but isn't any more.

This can be used to evaluate whether there are any workloads where we
end up in this path problematically often. Hopefully they don't exist.
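
The stale-CPU path that this tracepoint marks can be modeled in
userspace. The sketch below is an illustration only, not kernel code:
struct toy_mm, toy_switch_mm(), toy_flush_one(), toy_flush_mm(), the
8-CPU limit and the wrong_cpu_events counter are all made-up names, and
a plain counter stands in for trace_tlb_flush(TLB_REMOTE_WRONG_CPU, 0).
It also assumes every CPU in the mask is reached the same way, which
glosses over the local-versus-remote distinction in the real code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS 8                 /* hypothetical small machine */

/* Toy stand-in for struct mm_struct: only a CPU bitmask. */
struct toy_mm {
	_Atomic uint64_t cpumask; /* bit n set: CPU n may still get flush IPIs */
};

static struct toy_mm *loaded_mm[NR_CPUS]; /* what each CPU runs right now */
static unsigned long wrong_cpu_events;    /* models TLB_REMOTE_WRONG_CPU */

/*
 * Context switch: set the bit for next only if it is not already set,
 * and leave the previous mm's bit alone (the lazy part).
 */
static void toy_switch_mm(int cpu, struct toy_mm *next)
{
	uint64_t bit;

	assert(cpu >= 0 && cpu < NR_CPUS);
	bit = UINT64_C(1) << cpu;
	if (next && !(atomic_load(&next->cpumask) & bit))
		atomic_fetch_or(&next->cpumask, bit);
	loaded_mm[cpu] = next;
}

/* Flush handler on one CPU: returns 1 if it flushed, 0 if the CPU was stale. */
static int toy_flush_one(int cpu, struct toy_mm *mm)
{
	if (loaded_mm[cpu] != mm) {
		/* Stale CPU: drop it from the mask and record the event. */
		atomic_fetch_and(&mm->cpumask, ~(UINT64_C(1) << cpu));
		wrong_cpu_events++;
		return 0;
	}
	return 1;
}

/* Send a flush to every CPU in the mask; returns the number of "IPIs". */
static int toy_flush_mm(struct toy_mm *mm)
{
	uint64_t mask = atomic_load(&mm->cpumask);
	int sent = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (mask & (UINT64_C(1) << cpu)) {
			toy_flush_one(cpu, mm);
			sent++;
		}
	}
	return sent;
}
```

If CPUs 0 and 1 have both run an mm and CPU 1 then switches away, the
first toy_flush_mm() still reaches both CPUs and records one stale
event; the second reaches only CPU 0. A workload where that counter
grows quickly would be the problematic case the tracepoint is meant to
expose.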

Suggested-by: Dave Hansen
Signed-off-by: Rik van Riel
Reported-by: Borislav Petkov
---
 arch/x86/mm/tlb.c        | 1 +
 include/linux/mm_types.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index f19f6378cabf..9d0d34576928 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -760,6 +760,7 @@ static void flush_tlb_func(void *info)
 		/* Can only happen on remote CPUs */
 		if (f->mm && f->mm != loaded_mm) {
 			cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(f->mm));
+			trace_tlb_flush(TLB_REMOTE_WRONG_CPU, 0);
 			return;
 		}
 	}
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6e3bdf8e38bc..6b6f05404304 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1335,6 +1335,7 @@ enum tlb_flush_reason {
 	TLB_LOCAL_SHOOTDOWN,
 	TLB_LOCAL_MM_SHOOTDOWN,
 	TLB_REMOTE_SEND_IPI,
+	TLB_REMOTE_WRONG_CPU,
 	NR_TLB_FLUSH_REASONS,
 };
 
-- 
2.45.2

From nobody Sat Nov 23 22:02:40 2024
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org,
 kernel-team@meta.com, hpa@zytor.com, Rik van Riel
Subject: [PATCH 3/3] x86,tlb: put cpumask_test_cpu in prev == next under
 CONFIG_DEBUG_VM
Date: Fri, 8 Nov 2024 19:27:50 -0500
Message-ID: <20241109003727.3958374-4-riel@surriel.com>
In-Reply-To: <20241109003727.3958374-1-riel@surriel.com>
References: <20241109003727.3958374-1-riel@surriel.com>
Sender: riel@surriel.com

On a web server workload, the cpumask_test_cpu inside the
WARN_ON_ONCE in the prev == next branch takes about 17% of
all the CPU time of switch_mm_irqs_off.

On a large fleet, this WARN_ON_ONCE has not fired in at least
a month, possibly never.

Move this test under CONFIG_DEBUG_VM so it does not get compiled
into production kernels.
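
The effect of the IS_ENABLED() guard can be illustrated outside the
kernel. In the sketch below, TOY_DEBUG_VM is a hypothetical stand-in
for IS_ENABLED(CONFIG_DEBUG_VM), and toy_mask_has_cpu() and
toy_same_mm_path() are invented helpers; a counter makes it visible
whether the mask test actually ran. It is a simplified model, not the
kernel's WARN_ON_ONCE machinery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical build switch standing in for IS_ENABLED(CONFIG_DEBUG_VM). */
#ifndef TOY_DEBUG_VM
#define TOY_DEBUG_VM 0
#endif

static unsigned long expensive_checks; /* how often the mask test ran */

/* Stand-in for cpumask_test_cpu(): the read we would like to avoid. */
static bool toy_mask_has_cpu(unsigned long mask, int cpu)
{
	expensive_checks++;
	return (mask & (1UL << cpu)) != 0;
}

/*
 * Models the prev == next branch: only a debug build tests the mask and
 * repairs it if the CPU bit is missing. Returns the resulting mask.
 */
static unsigned long toy_same_mm_path(unsigned long mask, int cpu)
{
	assert(cpu >= 0 && cpu < (int)(8 * sizeof(mask)));

	/* The constant comes first, so && never evaluates the test when 0. */
	if (TOY_DEBUG_VM && !toy_mask_has_cpu(mask, cpu))
		mask |= 1UL << cpu;
	return mask;
}
```

Built with the default TOY_DEBUG_VM of 0, the && short-circuits on a
compile-time constant, so the mask test never runs and a compiler is
free to drop it as dead code. Building with -DTOY_DEBUG_VM=1 restores
both the check and the repair of the mask.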

Signed-off-by: Rik van Riel
Reported-by: Borislav Petkov
---
 arch/x86/mm/tlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 9d0d34576928..1aac4fa90d3d 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -568,7 +568,7 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		 * mm_cpumask. The TLB shootdown code can figure out from
 		 * cpu_tlbstate_shared.is_lazy whether or not to send an IPI.
 		 */
-		if (WARN_ON_ONCE(prev != &init_mm &&
+		if (IS_ENABLED(CONFIG_DEBUG_VM) && WARN_ON_ONCE(prev != &init_mm &&
 				 !cpumask_test_cpu(cpu, mm_cpumask(next))))
 			cpumask_set_cpu(cpu, mm_cpumask(next));
 
-- 
2.45.2