From: Charlie Jenkins
Date: Wed, 10 Dec 2025 08:13:47 -0800
Subject: [PATCH RFC 10/10] riscv: csum: Remove inline assembly
Message-Id: <20251210-profiles-v1-10-315a6ff2ca5a@gmail.com>
References: <20251210-profiles-v1-0-315a6ff2ca5a@gmail.com>
In-Reply-To: <20251210-profiles-v1-0-315a6ff2ca5a@gmail.com>
To: Paul Walmsley, Palmer Dabbelt, Alexandre Ghiti, Anup Patel,
 Atish Patra, Samuel Holland, Björn Töpel, Luke Nelson, Xi Wang,
 Eric Biggers, Conor Dooley
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
 Charlie Jenkins

When the kernel is built with Zbb enabled by default, the compiler
generates better code than is possible with the inline assembly.
Removing the inline assembly greatly simplifies the checksumming code
and improves performance when Zbb is enabled. However, performance will
decrease on kernels where the extension is only discovered at runtime.
Moving towards this performance model of optimizing for compiled-in
extensions will help keep the kernel code from spinning out of control
given the vast number of extensions available to RISC-V.
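For illustration, below is a minimal stand-alone C sketch of the generic
fold that remains after this patch. The helpers ror64(), csum_fold() and
fold_csum64() are re-implemented here for the example and are not the
exact kernel definitions; the point is that compilers targeting an
-march string that includes Zbb can typically lower the rotate idiom to
a single rori, so the hand-written assembly no longer pays for itself:

	#include <stdint.h>

	/* Rotate right; with Zbb in -march, GCC and Clang are expected
	 * to lower this idiom to a single rori instruction. */
	static inline uint64_t ror64(uint64_t x, unsigned int n)
	{
		return (x >> n) | (x << (64 - n));
	}

	/* Fold a 32-bit partial checksum down to 16 bits (generic). */
	static inline uint16_t csum_fold(uint32_t csum)
	{
		csum = (csum & 0xffff) + (csum >> 16);
		csum = (csum & 0xffff) + (csum >> 16);
		return (uint16_t)~csum;
	}

	/* Fold a 64-bit accumulated sum to the final 16-bit checksum,
	 * the same shape as the code kept in arch/riscv/lib/csum.c. */
	static uint16_t fold_csum64(uint64_t csum)
	{
		csum += ror64(csum, 32);	/* add upper and lower halves */
		csum >>= 32;
		return csum_fold((uint32_t)csum);
	}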
Signed-off-by: Charlie Jenkins
---
 arch/riscv/include/asm/checksum.h | 32 -------------
 arch/riscv/lib/csum.c             | 94 ---------------------------------------
 2 files changed, 126 deletions(-)

diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
index e747af23eea2..ecc4779209b9 100644
--- a/arch/riscv/include/asm/checksum.h
+++ b/arch/riscv/include/asm/checksum.h
@@ -45,38 +45,6 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
 		csum += csum < ((const unsigned int *)iph)[pos];
 	} while (++pos < ihl);
 
-	/*
-	 * ZBB only saves three instructions on 32-bit and five on 64-bit so not
-	 * worth checking if supported without Alternatives.
-	 */
-	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
-	    IS_ENABLED(CONFIG_TOOLCHAIN_HAS_ZBB) &&
-	    riscv_has_extension_likely(ZBB)) {
-		unsigned long fold_temp;
-
-		if (IS_ENABLED(CONFIG_32BIT)) {
-			asm(".option push				\n\
-			.option arch,+zbb				\n\
-				not	%[fold_temp], %[csum]		\n\
-				rori	%[csum], %[csum], 16		\n\
-				sub	%[csum], %[fold_temp], %[csum]	\n\
-			.option pop"
-			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
-		} else {
-			asm(".option push				\n\
-			.option arch,+zbb				\n\
-				rori	%[fold_temp], %[csum], 32	\n\
-				add	%[csum], %[fold_temp], %[csum]	\n\
-				srli	%[csum], %[csum], 32		\n\
-				not	%[fold_temp], %[csum]		\n\
-				roriw	%[csum], %[csum], 16		\n\
-				subw	%[csum], %[fold_temp], %[csum]	\n\
-			.option pop"
-			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp));
-		}
-		return (__force __sum16)(csum >> 16);
-	}
-
 #ifndef CONFIG_32BIT
 	csum += ror64(csum, 32);
 	csum >>= 32;
diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
index 4db35dd698eb..93c073f2b883 100644
--- a/arch/riscv/lib/csum.c
+++ b/arch/riscv/lib/csum.c
@@ -40,24 +40,6 @@ __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
 	uproto = (__force unsigned int)htonl(proto);
 	sum += uproto;
 
-	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
-	    IS_ENABLED(CONFIG_TOOLCHAIN_HAS_ZBB) &&
-	    riscv_has_extension_likely(ZBB)) {
-		unsigned long fold_temp;
-
-		asm(".option push					\n\
-		.option arch,+zbb					\n\
-			rori	%[fold_temp], %[sum], 32		\n\
-			add	%[sum], %[fold_temp], %[sum]		\n\
-			srli	%[sum], %[sum], 32			\n\
-			not	%[fold_temp], %[sum]			\n\
-			roriw	%[sum], %[sum], 16			\n\
-			subw	%[sum], %[fold_temp], %[sum]		\n\
-		.option pop"
-		: [sum] "+r" (sum), [fold_temp] "=&r" (fold_temp));
-		return (__force __sum16)(sum >> 16);
-	}
-
 	sum += ror64(sum, 32);
 	sum >>= 32;
 	return csum_fold((__force __wsum)sum);
@@ -142,51 +124,6 @@ do_csum_with_alignment(const unsigned char *buff, int len)
 	end = (const unsigned long *)(buff + len);
 	csum = do_csum_common(ptr, end, data);
 
-#ifdef CC_HAS_ASM_GOTO_TIED_OUTPUT
-	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
-	    IS_ENABLED(CONFIG_TOOLCHAIN_HAS_ZBB) &&
-	    riscv_has_extension_likely(ZBB)) {
-		unsigned long fold_temp;
-
-#ifdef CONFIG_32BIT
-		asm_goto_output(".option push			\n\
-		.option arch,+zbb				\n\
-			rori	%[fold_temp], %[csum], 16	\n\
-			andi	%[offset], %[offset], 1		\n\
-			add	%[csum], %[fold_temp], %[csum]	\n\
-			beq	%[offset], zero, %l[end]	\n\
-			rev8	%[csum], %[csum]		\n\
-		.option pop"
-			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
-			: [offset] "r" (offset)
-			:
-			: end);
-
-		return (unsigned short)csum;
-#else /* !CONFIG_32BIT */
-		asm_goto_output(".option push			\n\
-		.option arch,+zbb				\n\
-			rori	%[fold_temp], %[csum], 32	\n\
-			add	%[csum], %[fold_temp], %[csum]	\n\
-			srli	%[csum], %[csum], 32		\n\
-			roriw	%[fold_temp], %[csum], 16	\n\
-			addw	%[csum], %[fold_temp], %[csum]	\n\
-			andi	%[offset], %[offset], 1		\n\
-			beq	%[offset], zero, %l[end]	\n\
-			rev8	%[csum], %[csum]		\n\
-		.option pop"
-			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
-			: [offset] "r" (offset)
-			:
-			: end);
-
-		return (csum << 16) >> 48;
-#endif /* !CONFIG_32BIT */
-end:
-		return csum >> 16;
-	}
-
-#endif /* CC_HAS_ASM_GOTO_TIED_OUTPUT */
 #ifndef CONFIG_32BIT
 	csum += ror64(csum, 32);
 	csum >>= 32;
@@ -215,37 +152,6 @@ do_csum_no_alignment(const unsigned char *buff, int len)
 	end = (const unsigned long *)(buff + len);
 	csum = do_csum_common(ptr, end, data);
 
-	if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) &&
-	    IS_ENABLED(CONFIG_TOOLCHAIN_HAS_ZBB) &&
-	    riscv_has_extension_likely(ZBB)) {
-		unsigned long fold_temp;
-
-#ifdef CONFIG_32BIT
-		asm (".option push			\n\
-		.option arch,+zbb			\n\
-			rori	%[fold_temp], %[csum], 16	\n\
-			add	%[csum], %[fold_temp], %[csum]	\n\
-		.option pop"
-			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
-			:
-			: );
-
-#else /* !CONFIG_32BIT */
-		asm (".option push			\n\
-		.option arch,+zbb			\n\
-			rori	%[fold_temp], %[csum], 32	\n\
-			add	%[csum], %[fold_temp], %[csum]	\n\
-			srli	%[csum], %[csum], 32		\n\
-			roriw	%[fold_temp], %[csum], 16	\n\
-			addw	%[csum], %[fold_temp], %[csum]	\n\
-		.option pop"
-			: [csum] "+r" (csum), [fold_temp] "=&r" (fold_temp)
-			:
-			: );
-#endif /* !CONFIG_32BIT */
-		return csum >> 16;
-	}
-
 #ifndef CONFIG_32BIT
 	csum += ror64(csum, 32);
 	csum >>= 32;
-- 
2.43.0