From: Milan Tripkovic
To: pjw@kernel.org, palmer@dabbelt.com, aou@eecs.berkeley.edu
Cc: alex@ghiti.fr, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Dusan.Stojkovic@rt-rk.com, Milan Tripkovic
Subject: [PATCH v2] riscv: lib: add strrchr() zbb implementation
Date: Fri, 15 May 2026 16:49:56 +0200
Message-ID: <20260515144956.1389792-1-milant2002@gmail.com>
X-Mailer: git-send-email 2.43.0
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Milan Tripkovic

Add a Zbb assembly implementation of strrchr() for RISC-V.

The implementation uses Zbb bit-manipulation instructions such as orc.b,
ctz, and clz to process one register-width word (SZREG bytes) per
iteration instead of one byte. Compared to the generic byte-by-byte
implementation, this improves performance for strings of 8 bytes and
longer; the 1-byte and 7-byte cases regress by about 13% and 20%.

For the test case, I used the existing string_bench_strrchr benchmark,
but changed the searched character from '\0' to 'a' to obtain more
realistic results, because the assembly code handles '\0' in a separate
path that only looks for the end of the string.
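The per-word step described above can be modeled in C roughly as follows. This is an illustrative sketch, not the code in the patch: orc_b() emulates the Zbb orc.b instruction in software, load_le64() and last_match_in_word() are hypothetical helper names, a 64-bit little-endian word is assumed, and __builtin_ctzll/__builtin_clzll are GCC/Clang builtins standing in for ctz and clz.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of Zbb orc.b: each nonzero byte becomes 0xff, zero stays 0. */
static uint64_t orc_b(uint64_t x)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++)
		if ((x >> (8 * i)) & 0xff)
			r |= (uint64_t)0xff << (8 * i);
	return r;
}

/* Assemble 8 bytes into a little-endian word, independent of host order. */
static uint64_t load_le64(const unsigned char *p)
{
	uint64_t w = 0;

	for (int i = 0; i < 8; i++)
		w |= (uint64_t)p[i] << (8 * i);
	return w;
}

/*
 * Hypothetical helper: scan one little-endian 64-bit word.
 * Returns the byte index (0..7) of the last occurrence of c that lies
 * before the first NUL byte, or -1 if there is none. *saw_nul is set
 * when the word contains a NUL byte, i.e. the string ends in this word.
 * Assumes c != 0; searching for NUL is treated as a separate case.
 */
static int last_match_in_word(uint64_t w, uint8_t c, int *saw_nul)
{
	uint64_t pattern, nul, match;

	assert(c != 0);
	pattern = c * 0x0101010101010101ULL;	/* c copied into every byte */
	nul = ~orc_b(w);			/* 0xff where w has a NUL */
	match = ~orc_b(w ^ pattern);		/* 0xff where the byte == c */

	*saw_nul = nul != 0;
	if (nul) {
		/* ctz locates the first NUL; drop matches at or above it. */
		int first_nul_bit = __builtin_ctzll(nul);

		match &= ~(~0ULL << first_nul_bit);
	}
	if (!match)
		return -1;
	/* clz counts the empty bytes above the highest match. */
	return 7 - (__builtin_clzll(match) >> 3);
}
```

A full strrchr() built on this model would call the helper once per aligned word, remember the most recent non-negative result, and stop as soon as saw_nul is set.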

Benchmark results (QEMU TCG, rv64):

Len    | ZBB    | WoZBB  | %ZBB/WoZBB
-------|--------|--------|------------
1 B    | 20.0   | 22.9   | -12.7%
7 B    | 87.5   | 110.1  | -20.5%
8 B    | 166.8  | 130.3  | +28.0%
16 B   | 329.5  | 189.1  | +74.2%
31 B   | 366.9  | 195.7  | +87.5%
64 B   | 870.3  | 231.5  | +275.9%
127 B  | 1007.0 | 278.9  | +261.1%
512 B  | 1751.9 | 305.5  | +473.5%
1024 B | 1841.9 | 294.7  | +525.0%
2048 B | 1955.4 | 310.4  | +530.0%
4096 B | 2034.6 | 312.5  | +551.1%

Signed-off-by: Milan Tripkovic
---
Change in v2:
- Added #if defined(CONFIG_RISCV_ISA_ZBB)...
- Link to v1: https://lore.kernel.org/all/20260514160910.1796966-1-milant2002@gmail.com/

 arch/riscv/lib/strrchr.S | 130 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 128 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/lib/strrchr.S b/arch/riscv/lib/strrchr.S
index ac58b20ca..82a50d440 100644
--- a/arch/riscv/lib/strrchr.S
+++ b/arch/riscv/lib/strrchr.S
@@ -6,13 +6,17 @@
 
 #include
 #include
+#include
+#include
 
 /* char *strrchr(const char *s, int c) */
 SYM_FUNC_START(strrchr)
+	__ALTERNATIVE_CFG("nop", "j strrchr_zbb", 0, RISCV_ISA_EXT_ZBB,
+		IS_ENABLED(CONFIG_RISCV_ISA_ZBB) && IS_ENABLED(CONFIG_TOOLCHAIN_HAS_ZBB))
 	/*
 	 * Parameters
 	 *	a0 - The string to be searched
-	 *	a1 - The character to seaerch for
+	 *	a1 - The character to search for
 	 *
 	 * Returns
 	 *	a0 - Address of last occurrence of 'c' or 0
@@ -31,7 +35,129 @@ SYM_FUNC_START(strrchr)
 	addi	t1, t1, 1
 	bnez	t0, 1b
 	ret
-SYM_FUNC_END(strrchr)
 
+/*
+ * Variant of strrchr using the ZBB extension if available
+ */
+#if defined(CONFIG_RISCV_ISA_ZBB) && defined(CONFIG_TOOLCHAIN_HAS_ZBB)
+strrchr_zbb:
+.option push
+.option arch,+zbb
+	/*
+	 * Parameters
+	 *	a0 - The string to be searched
+	 *	a1 - The character to search for
+	 *
+	 * Returns
+	 *	a0 - Address of last occurrence of 'c' or 0
+	 *
+	 * Clobbers
+	 *	t0, t1, t2, t3, t4, t5, t6
+	 */
+	andi	a1, a1, 0xff
+	mv	t1, a0
+	li	a0, 0
+	beqz	a1, .Lfind_end_zbb
+
+	slli	t5, a1, 8
+	or	t5, t5, a1
+	slli	t2, t5, 16
+	or	t5, t5, t2
+#if __riscv_xlen == 64
+	slli	t2, t5, 32
+	or	t5, t5, t2
+#endif
+
+	andi	t2, t1, SZREG-1
+	bnez	t2, .Lmisaligned_start
+
+.Lmain_loop_pre:
+	li	t4, -1
+
+	.balign 16
+.Lmain_loop:
+	REG_L	t0, 0(t1)
+	addi	t1, t1, SZREG
+	xor	t6, t0, t5
+	orc.b	t2, t0
+	orc.b	t6, t6
+	and	t3, t2, t6
+	beq	t3, t4, .Lmain_loop
+
+	not	t2, t2
+	not	t6, t6
+
+	beqz	t2, .Lonly_matches
+
+	addi	t1, t1, -SZREG
+	ctz	t3, t2
+	sll	t4, t4, t3
+	andn	t6, t6, t4
+	beqz	t6, .Ldone
+
+	clz	t3, t6
+	srli	t3, t3, 3
+	xori	t3, t3, SZREG-1
+	add	a0, t1, t3
+.Ldone:
+	ret
+
+.Lonly_matches:
+	clz	t3, t6
+	srli	t3, t3, 3
+	not	t3, t3
+	add	a0, t1, t3
+	j	.Lmain_loop
+
+.Lfind_end_zbb:
+	andi	t2, t1, SZREG-1
+	bnez	t2, .Lmisaligned_end_start
+
+.Lfind_end_pre:
+	li	t4, -1
+
+	.balign 16
+.Lfind_end_loop:
+	REG_L	t0, 0(t1)
+	addi	t1, t1, SZREG
+	orc.b	t2, t0
+	beq	t2, t4, .Lfind_end_loop
+
+	addi	t1, t1, -SZREG
+	not	t2, t2
+	ctz	t3, t2
+	srli	t3, t3, 3
+	add	a0, t1, t3
+	ret
+
+.Lfound_zero:
+	mv	a0, t1
+	ret
+.Lmisaligned_start:
+	ori	t2, t1, SZREG-1
+	addi	t2, t2, 1
+.Lalign_loop:
+	lbu	t0, 0(t1)
+	beqz	t0, .Ldone
+	bne	t0, a1, 1f
+	mv	a0, t1
+1:
+	addi	t1, t1, 1
+	bne	t1, t2, .Lalign_loop
+	j	.Lmain_loop_pre
+
+.Lmisaligned_end_start:
+	ori	t2, t1, SZREG-1
+	addi	t2, t2, 1
+.Lfind_end_align:
+	lbu	t0, 0(t1)
+	beqz	t0, .Lfound_zero
+	addi	t1, t1, 1
+	bne	t1, t2, .Lfind_end_align
+	j	.Lfind_end_pre
+
+.option pop
+#endif
+SYM_FUNC_END(strrchr)
 SYM_FUNC_ALIAS_WEAK(__pi_strrchr, strrchr)
 EXPORT_SYMBOL(strrchr)
-- 
2.43.0
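
The patch has separate paths for misaligned starts, for the aligned word loop, and for c == '\0', so a differential check against a byte-by-byte reference over every start offset and a range of lengths is one way to exercise all of them. The following is a minimal user-space sketch under stated assumptions: ref_strrchr() and check_strrchr() are hypothetical names, the "abc" fill pattern and the 64-byte length limit are arbitrary choices, and it is not part of the patch. In the kernel, the same comparison would have to be run against the patched strrchr().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte-by-byte reference with the same contract as strrchr(). */
static char *ref_strrchr(const char *s, int c)
{
	const char *last = NULL;
	char ch = (char)c;	/* only the low 8 bits of c are significant */

	for (;; s++) {
		if (*s == ch)
			last = s;
		if (*s == '\0')
			return (char *)last;
	}
}

typedef char *(*strrchr_fn)(const char *, int);

/*
 * Compare fn against the reference for every start offset within an
 * 8-byte word and for lengths 0..64. Returns the number of mismatches.
 */
static int check_strrchr(strrchr_fn fn)
{
	/* Present, absent, NUL, and a value with bits set above bit 7. */
	static const int needles[] = { 'a', 'b', 'z', '\0', 'a' + 256 };
	_Alignas(8) char buf[8 + 64 + 1];
	int failures = 0;

	for (size_t off = 0; off < 8; off++) {
		for (size_t len = 0; len <= 64; len++) {
			char *s = buf + off;

			for (size_t i = 0; i < len; i++)
				s[i] = "abc"[i % 3];
			s[len] = '\0';

			for (size_t n = 0; n < sizeof(needles) / sizeof(needles[0]); n++)
				if (fn(s, needles[n]) != ref_strrchr(s, needles[n]))
					failures++;
		}
	}
	return failures;
}
```

Calling check_strrchr(strrchr) in user space only checks the C library's implementation; it is shown here as a stand-in for the function under test.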