From nobody Fri Dec 19 04:19:00 2025
From: Andrew Cooper
To: Xen-devel
Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné
Subject: [PATCH] x86/bitops: Account for POPCNT errata on earlier Intel CPUs
Date: Tue, 25 Mar 2025 18:52:19 +0000
Message-Id: <20250325185219.315319-1-andrew.cooper3@citrix.com>
X-Mailer: git-send-email 2.39.5
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Manually break the false dependency for the benefit of cases such as
bitmap_weight(), which is a reasonably hot path.

Signed-off-by: Andrew Cooper
Acked-by: Jan Beulich
---
CC: Jan Beulich
CC: Roger Pau Monné

Not sure if this warrants a Fixes: tag or not, but:

Fixes: 6978602334d9 ("x86/bitops: Use the POPCNT instruction when available")

Many examples online suggest a 2x perf improvement on tight loops by breaking
this dependency.  cpumasks in particular make frequent use of this loop.

Still TODO:

 1) Put a double CS prefix on the CALL instruction to avoid a trailing 2-byte
    NOP, but this depends on x86_decode_lite() in order to work.
 2) Revert a buggy GAS diagnostic:

      ./arch/x86/include/asm/bitops.h: Assembler messages:
      ./arch/x86/include/asm/bitops.h:493: Error: same type of prefix used twice
      ./arch/x86/include/asm/bitops.h:493: Error: same type of prefix used twice

    Multiple prefixes are not an error, and are sometimes the best choice
    available.

It turns out that LZCNT/TZCNT have the same input dependency bug, prior to
Skylake.  There are no such instructions in the "cleaned up" part of bitops
yet, and I don't expect any to survive cleaning.
---
 xen/arch/x86/include/asm/bitops.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h
index bb9d75646023..87eac7782f10 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -488,10 +488,16 @@ static always_inline unsigned int arch_hweightl(unsigned long x)
      *
      * This limits the POPCNT instruction to using the same ABI as a function
      * call (input in %rdi, output in %eax) but that's fine.
+     *
+     * On Intel CPUs prior to Cannon Lake, the POPCNT instruction has a false
+     * input dependency on its destination register (errata HSD146, SKL029
+     * amongst others), impacting loops such as bitmap_weight().  Insert an
+     * XOR to manually break the dependency.
      */
     alternative_io("call arch_generic_hweightl",
+                   "xor %k[res], %k[res]\n\t"
                    "popcnt %[val], %q[res]", X86_FEATURE_POPCNT,
-                   ASM_OUTPUT2([res] "=a" (r) ASM_CALL_CONSTRAINT),
+                   ASM_OUTPUT2([res] "=&a" (r) ASM_CALL_CONSTRAINT),
                    [val] "D" (x));
 
     return r;
-- 
2.39.5
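
For readers outside the Xen tree, here is a minimal standalone sketch of the
same idea.  It is not part of the patch.  The names hweightl_sketch() and
bitmap_weight_sketch() are hypothetical, GCC or Clang is assumed, and a
runtime __builtin_cpu_supports() check stands in for the alternative_io() /
X86_FEATURE_POPCNT selection used above.  The early-clobber "=&r" constraint
plays the same role as the "=&a" change in the patch: the XOR writes the
output before the input is read, so the two must not share a register.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not Xen code: count the set bits in x.
 *
 * On x86-64, zero the destination register before POPCNT so that the
 * instruction does not wait on whatever last wrote that register.
 */
static inline unsigned int hweightl_sketch(unsigned long x)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* Assumption: a runtime check replaces Xen's alternative_io(). */
    if (__builtin_cpu_supports("popcnt")) {
        unsigned long r;

        /*
         * "=&r" is early-clobber: the compiler must not place the output
         * in a register used by the input, because the XOR overwrites the
         * output before POPCNT reads the input.
         */
        __asm__ ("xor %k[res], %k[res]\n\t"
                 "popcnt %[val], %[res]"
                 : [res] "=&r" (r)
                 : [val] "rm" (x)
                 : "cc");

        return (unsigned int)r;
    }
#endif
    /* Generic fallback for other targets or CPUs without POPCNT. */
    return (unsigned int)__builtin_popcountl(x);
}

/* Hypothetical loop of the kind the commit message refers to. */
static unsigned int bitmap_weight_sketch(const unsigned long *bits, size_t n)
{
    unsigned int w = 0;

    for (size_t i = 0; i < n; i++)
        w += hweightl_sketch(bits[i]);

    return w;
}
```

Without the XOR, each POPCNT in such a loop may be serialised behind the
previous write to its destination register on the affected CPUs, even though
the result does not depend on it.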