From: Breno Leitao
Date: Fri, 05 Jun 2026 09:53:12 -0700
Subject: [PATCH RFC] futex: avoid false sharing between hb->chain and the bucket lock
Message-Id: <20260605-futex-v1-1-4ad4a0d6f265@debian.org>
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart, Davidlohr Bueso, André Almeida
Cc: linux-kernel@vger.kernel.org, puranjay@kernel.org, rmikey@meta.com, stuclar@meta.com, namhyung@kernel.org, kernel-team@meta.com, Breno Leitao

struct futex_hash_bucket packs (atomic_t waiters, spinlock_t lock,
struct plist_head chain, struct futex_private_hash *priv) into a single
____cacheline_aligned_in_smp 64-byte block. Three distinct access
patterns hit that line:

  1. Lockless atomic_read(&hb->waiters) via futex_hb_waiters_pending()
     on the fast path before taking the lock.

  2. spin_lock(&hb->lock) contenders writing the lock word.

  3. The lock holder modifying chain.{next,prev} on every futex_wake,
     futex_q_unlock, plist_add, __futex_unqueue.

This was first noticed on a Meta cache (ucache) production workload:
perf c2c on a busy 176-core AMD EPYC 9D64 ranked this exact cacheline
as the #1 HITM source: 129 Local + 31 Remote HITM, hit by 156 distinct
CPUs in a second.

The contention is not specific to that workload, though. Our very own
"perf bench futex hash" exercises the same buckets and shows the same
false sharing, so the rest of this changelog quantifies the fix with
perf bench futex.

Move chain to its own cacheline so:

  - Lockless waiters_pending() readers no longer invalidate the line
    that lock contenders are spinning to acquire.

  - Cross-CCD lock handoffs ship only the (waiters, lock) line; the
    next holder reads chain from its own L2/L3 instead of fetching
    chain entries together with the lock byte.
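
For reviewers who want to see the layout change in isolation, here is a
minimal userspace sketch, not kernel code. Every sketch_* name is
hypothetical. It assumes 64-byte cachelines, a plain int standing in for
both atomic_t and a 4-byte spinlock_t (no lock debugging), and a
two-pointer struct standing in for struct plist_head. C11 alignas plays
the role of ____cacheline_aligned_in_smp.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas */
#include <stddef.h>   /* offsetof, size_t */

/* Assumption: 64-byte cachelines, as described in the changelog. */
#define SKETCH_CACHELINE 64

/* Hypothetical stand-in for struct plist_head: two list pointers. */
struct sketch_plist_head {
	void *next;
	void *prev;
};

/* Layout before the patch: all four fields share one cacheline. */
struct sketch_bucket_before {
	alignas(SKETCH_CACHELINE) int waiters;	/* stands in for atomic_t */
	int lock;				/* stands in for spinlock_t */
	struct sketch_plist_head chain;
	void *priv;
};

/* Layout after the patch: chain starts a second cacheline. */
struct sketch_bucket_after {
	alignas(SKETCH_CACHELINE) int waiters;
	int lock;
	alignas(SKETCH_CACHELINE) struct sketch_plist_head chain;
	void *priv;
};

static_assert(sizeof(struct sketch_bucket_before) == SKETCH_CACHELINE,
	      "before: the whole bucket fits in one cacheline");
static_assert(offsetof(struct sketch_bucket_after, chain) == SKETCH_CACHELINE,
	      "after: chain begins exactly one cacheline in");
static_assert(sizeof(struct sketch_bucket_after) == 2 * SKETCH_CACHELINE,
	      "after: the bucket occupies two cachelines");

/* Padding bytes between the end of lock and the start of chain. */
static inline size_t sketch_padding_bytes(void)
{
	return offsetof(struct sketch_bucket_after, chain) -
	       (offsetof(struct sketch_bucket_after, lock) + sizeof(int));
}
```

Under those assumptions, sketch_padding_bytes() gives the gap that the
alignment opens up after the (waiters, lock) pair, and the static
assertions fail the build if the stand-in layout stops matching the one
described above.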

This improves "perf bench futex hash" on a 176-core AMD EPYC 9D64 by 15%:

                baseline       +fix      delta
  average      1,394,938  1,616,781    +15.9 %
  median       1,430,012  1,617,072    +13.1 %
  min          1,214,488  1,501,741    +23.5 %
  max          1,488,167  1,730,734    +16.3 %

The distributions do not overlap: the slowest +fix run (1.50 M) is
faster than even the fastest baseline run (1.49 M).

This improves wakeup latency as well:

  perf bench futex wake -s (broadcast wakeup latency, lower is better):

    baseline: 0.300 / 0.329 / 0.266 ms  (avg 0.298)
    +fix:     0.292 / 0.253 / 0.270 ms  (avg 0.272, -9 %)

Cost: one extra cacheline (56 B padding) per bucket. Would it be
acceptable?

Signed-off-by: Breno Leitao
---
 kernel/futex/futex.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index 79ef2c709c81..4981dcf465a9 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -142,7 +142,16 @@ static inline bool should_fail_futex(bool fshared)
 struct futex_hash_bucket {
 	atomic_t waiters;
 	spinlock_t lock;
-	struct plist_head chain;
+	/*
+	 * Keep the plist_head chain on its own cacheline. Lockless
+	 * futex_hb_waiters_pending() readers and lock contenders touch
+	 * the (waiters, lock) line; the lock holder modifies chain on
+	 * every wake/queue. perf c2c on a busy 176-core AMD host showed
+	 * this bucket cacheline as the #1 HITM source (129 Lcl + 31 Rmt
+	 * in 5s), hit by 156 distinct CPUs at offset 0x4 (lock) and
+	 * 0x8/0x10 (chain.{next,prev}).
+	 */
+	struct plist_head chain ____cacheline_aligned_in_smp;
 	struct futex_private_hash *priv;
 } ____cacheline_aligned_in_smp;
 
---
base-commit: b99ae45861eccff1e1d8c7b05a13650be805d437
change-id: 20260605-futex-c5478d627985

Best regards,
-- 
Breno Leitao