From: Charlie Jenkins
Date: Wed, 19 Mar 2025 11:35:20 -0700
Subject: [PATCH v10 2/2] riscv: Add runtime constant support
X-Mailing-List: linux-kernel@vger.kernel.org
Message-Id: <20250319-runtime_const_riscv-v10-2-745b31a11d65@rivosinc.com>
References: <20250319-runtime_const_riscv-v10-0-745b31a11d65@rivosinc.com>
In-Reply-To: <20250319-runtime_const_riscv-v10-0-745b31a11d65@rivosinc.com>
To: Paul Walmsley, Palmer Dabbelt, Ard Biesheuvel, Ben Dooks,
 Pasha Bouzarjomehri, Emil Renner Berthing, Alexandre Ghiti,
 Steven Rostedt, Masami Hiramatsu, Mark Rutland, Albert Ou,
 Peter Zijlstra, Josh Poimboeuf, Jason Baron, Andrew Jones
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-trace-kernel@vger.kernel.org, Charlie Jenkins
X-Mailer: b4 0.14.2

Implement the runtime constant infrastructure for riscv. Use this
infrastructure to generate constants to be used by the d_hash()
function.

This is the riscv variant of commit 94a2bc0f611c ("arm64: add 'runtime
constant' support") and commit e3c92e81711d ("runtime constants: add
x86 architecture support").

Signed-off-by: Charlie Jenkins
Reviewed-by: Alexandre Ghiti
Tested-by: Alexandre Ghiti
---
 arch/riscv/Kconfig                     |  22 +++
 arch/riscv/include/asm/asm.h           |   1 +
 arch/riscv/include/asm/runtime-const.h | 265 +++++++++++++++++++++++++++++++++
 arch/riscv/kernel/vmlinux.lds.S        |   3 +
 4 files changed, 291 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 7612c52e9b1e35607f1dd4603a596416d3357a71..c123f7c0579c1aca839e3c04bdb662d6856ae765 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -783,6 +783,28 @@ config RISCV_ISA_ZBC
 
 	  If you don't know what to do here, say Y.
 
+config TOOLCHAIN_HAS_ZBKB
+	bool
+	default y
+	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zbkb)
+	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zbkb)
+	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
+	depends on AS_HAS_OPTION_ARCH
+
+config RISCV_ISA_ZBKB
+	bool "Zbkb extension support for bit manipulation instructions"
+	depends on TOOLCHAIN_HAS_ZBKB
+	depends on RISCV_ALTERNATIVE
+	default y
+	help
+	  Adds support to dynamically detect the presence of the ZBKB
+	  extension (bit manipulation for cryptography) and enable its usage.
+
+	  The Zbkb extension provides instructions to accelerate a number
+	  of common cryptography operations (pack, zip, etc).
+
+	  If you don't know what to do here, say Y.
+
 config RISCV_ISA_ZICBOM
 	bool "Zicbom extension support for non-coherent DMA operation"
 	depends on MMU
diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
index 776354895b81e7dc332e58265548aaf7365a6037..a8a2af6dfe9d2406625ca8fc94014fe5180e4fec 100644
--- a/arch/riscv/include/asm/asm.h
+++ b/arch/riscv/include/asm/asm.h
@@ -27,6 +27,7 @@
 #define REG_ASM		__REG_SEL(.dword, .word)
 #define SZREG		__REG_SEL(8, 4)
 #define LGREG		__REG_SEL(3, 2)
+#define SRLI		__REG_SEL(srliw, srli)
 
 #if __SIZEOF_POINTER__ == 8
 #ifdef __ASSEMBLY__
diff --git a/arch/riscv/include/asm/runtime-const.h b/arch/riscv/include/asm/runtime-const.h
new file mode 100644
index 0000000000000000000000000000000000000000..a23a9bd47903b2765608c75cd83f01ae578dffaa
--- /dev/null
+++ b/arch/riscv/include/asm/runtime-const.h
@@ -0,0 +1,265 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_RISCV_RUNTIME_CONST_H
+#define _ASM_RISCV_RUNTIME_CONST_H
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+#ifdef CONFIG_32BIT
+#define runtime_const_ptr(sym) \
+({ \
+	typeof(sym) __ret; \
+	asm_inline(".option push\n\t" \
+		".option norvc\n\t" \
+		"1:\t" \
+		"lui %[__ret],0x89abd\n\t" \
+		"addi %[__ret],%[__ret],-0x211\n\t" \
+		".option pop\n\t" \
+		".pushsection runtime_ptr_" #sym ",\"a\"\n\t" \
+		".long 1b - .\n\t" \
+		".popsection" \
+		: [__ret] "=r" (__ret)); \
+	__ret; \
+})
+#else
+/*
+ * Loading 64-bit constants into a register from immediates is a non-trivial
+ * task on riscv64. To get it somewhat performant, load 32 bits into two
+ * different registers and then combine the results.
+ *
+ * If the processor supports the Zbkb extension, we can combine the final
+ * "slli,slli,srli,add" into the single "pack" instruction. If the processor
+ * doesn't support Zbkb but does support the Zba extension, we can
+ * combine the final "slli,srli,add" into one instruction "add.uw".
+ */
+#define RISCV_RUNTIME_CONST_64_PREAMBLE \
+	".option push\n\t" \
+	".option norvc\n\t" \
+	"1:\t" \
+	"lui %[__ret],0x89abd\n\t" \
+	"lui %[__tmp],0x1234\n\t" \
+	"addiw %[__ret],%[__ret],-0x211\n\t" \
+	"addiw %[__tmp],%[__tmp],0x567\n\t" \
+
+#define RISCV_RUNTIME_CONST_64_BASE \
+	"slli %[__tmp],%[__tmp],32\n\t" \
+	"slli %[__ret],%[__ret],32\n\t" \
+	"srli %[__ret],%[__ret],32\n\t" \
+	"add %[__ret],%[__ret],%[__tmp]\n\t" \
+
+#define RISCV_RUNTIME_CONST_64_ZBA \
+	".option push\n\t" \
+	".option arch,+zba\n\t" \
+	"slli %[__tmp],%[__tmp],32\n\t" \
+	"add.uw %[__ret],%[__ret],%[__tmp]\n\t" \
+	"nop\n\t" \
+	"nop\n\t" \
+	".option pop\n\t" \
+
+#define RISCV_RUNTIME_CONST_64_ZBKB \
+	".option push\n\t" \
+	".option arch,+zbkb\n\t" \
+	"pack %[__ret],%[__ret],%[__tmp]\n\t" \
+	"nop\n\t" \
+	"nop\n\t" \
+	"nop\n\t" \
+	".option pop\n\t" \
+
+#define RISCV_RUNTIME_CONST_64_POSTAMBLE(sym) \
+	".option pop\n\t" \
+	".pushsection runtime_ptr_" #sym ",\"a\"\n\t" \
+	".long 1b - .\n\t" \
+	".popsection" \
+
+#if defined(CONFIG_RISCV_ISA_ZBA) && defined(CONFIG_RISCV_ISA_ZBKB)
+#define runtime_const_ptr(sym) \
+({ \
+	typeof(sym) __ret, __tmp; \
+	asm_inline(RISCV_RUNTIME_CONST_64_PREAMBLE \
+		ALTERNATIVE_2( \
+			RISCV_RUNTIME_CONST_64_BASE, \
+			RISCV_RUNTIME_CONST_64_ZBA, \
+			0, RISCV_ISA_EXT_ZBA, 1, \
+			RISCV_RUNTIME_CONST_64_ZBKB, \
+			0, RISCV_ISA_EXT_ZBKB, 1 \
+		) \
+		RISCV_RUNTIME_CONST_64_POSTAMBLE(sym) \
+		: [__ret] "=r" (__ret), [__tmp] "=r" (__tmp)); \
+	__ret; \
+})
+#elif defined(CONFIG_RISCV_ISA_ZBA)
+#define runtime_const_ptr(sym) \
+({ \
+	typeof(sym) __ret, __tmp; \
+	asm_inline(RISCV_RUNTIME_CONST_64_PREAMBLE \
+		ALTERNATIVE( \
+			RISCV_RUNTIME_CONST_64_BASE, \
+			RISCV_RUNTIME_CONST_64_ZBA, \
+			0, RISCV_ISA_EXT_ZBA, 1 \
+		) \
+		RISCV_RUNTIME_CONST_64_POSTAMBLE(sym) \
+		: [__ret] "=r" (__ret), [__tmp] "=r" (__tmp)); \
+	__ret; \
+})
+#elif defined(CONFIG_RISCV_ISA_ZBKB)
+#define runtime_const_ptr(sym) \
+({ \
+	typeof(sym) __ret, __tmp; \
+	asm_inline(RISCV_RUNTIME_CONST_64_PREAMBLE \
+		ALTERNATIVE( \
+			RISCV_RUNTIME_CONST_64_BASE, \
+			RISCV_RUNTIME_CONST_64_ZBKB, \
+			0, RISCV_ISA_EXT_ZBKB, 1 \
+		) \
+		RISCV_RUNTIME_CONST_64_POSTAMBLE(sym) \
+		: [__ret] "=r" (__ret), [__tmp] "=r" (__tmp)); \
+	__ret; \
+})
+#else
+#define runtime_const_ptr(sym) \
+({ \
+	typeof(sym) __ret, __tmp; \
+	asm_inline(RISCV_RUNTIME_CONST_64_PREAMBLE \
+		RISCV_RUNTIME_CONST_64_BASE \
+		RISCV_RUNTIME_CONST_64_POSTAMBLE(sym) \
+		: [__ret] "=r" (__ret), [__tmp] "=r" (__tmp)); \
+	__ret; \
+})
+#endif
+#endif
+
+#define runtime_const_shift_right_32(val, sym) \
+({ \
+	u32 __ret; \
+	asm_inline(".option push\n\t" \
+		".option norvc\n\t" \
+		"1:\t" \
+		SRLI " %[__ret],%[__val],12\n\t" \
+		".option pop\n\t" \
+		".pushsection runtime_shift_" #sym ",\"a\"\n\t" \
+		".long 1b - .\n\t" \
+		".popsection" \
+		: [__ret] "=r" (__ret) \
+		: [__val] "r" (val)); \
+	__ret; \
+})
+
+#define runtime_const_init(type, sym) do { \
+	extern s32 __start_runtime_##type##_##sym[]; \
+	extern s32 __stop_runtime_##type##_##sym[]; \
+	\
+	runtime_const_fixup(__runtime_fixup_##type, \
+			    (unsigned long)(sym), \
+			    __start_runtime_##type##_##sym, \
+			    __stop_runtime_##type##_##sym); \
+} while (0)
+
+static inline void __runtime_fixup_caches(void *where, unsigned int insns)
+{
+	/* On riscv there are currently only cache-wide flushes so va is ignored. */
+	__always_unused uintptr_t va = (uintptr_t)where;
+
+	flush_icache_range(va, va + 4 * insns);
+}
+
+/*
+ * The 32-bit immediate is stored in a lui+addi pairing.
+ * lui holds the upper 20 bits of the immediate in bits 31:12 of the instruction.
+ * addi holds the lower 12 bits of the immediate in bits 31:20 of the instruction.
+ */
+static inline void __runtime_fixup_32(__le16 *lui_parcel, __le16 *addi_parcel, unsigned int val)
+{
+	unsigned int lower_immediate, upper_immediate;
+	u32 lui_insn, addi_insn, addi_insn_mask;
+	__le32 lui_res, addi_res;
+
+	/* Mask out the upper 12 bits of addi */
+	addi_insn_mask = 0x000fffff;
+
+	lui_insn = (u32)le16_to_cpu(lui_parcel[0]) | (u32)le16_to_cpu(lui_parcel[1]) << 16;
+	addi_insn = (u32)le16_to_cpu(addi_parcel[0]) | (u32)le16_to_cpu(addi_parcel[1]) << 16;
+
+	lower_immediate = sign_extend32(val, 11);
+	upper_immediate = (val - lower_immediate);
+
+	if (upper_immediate & 0xfffff000) {
+		/* replace upper 20 bits of lui with upper immediate */
+		lui_insn &= 0x00000fff;
+		lui_insn |= upper_immediate & 0xfffff000;
+	} else {
+		/* replace lui with nop if immediate is small enough to fit in addi */
+		lui_insn = RISCV_INSN_NOP4;
+		/*
+		 * lui is being skipped, so do a load instead of an add. A load
+		 * is performed by adding with the x0 register. Setting rs1 to
+		 * zero with the following mask will accomplish this goal.
+		 */
+		addi_insn_mask &= 0x07fff;
+	}
+
+	if (lower_immediate & 0x00000fff) {
+		/* replace upper 12 bits of addi with lower 12 bits of val */
+		addi_insn &= addi_insn_mask;
+		addi_insn |= (lower_immediate & 0x00000fff) << 20;
+	} else {
+		/* replace addi with nop if lower_immediate is empty */
+		addi_insn = RISCV_INSN_NOP4;
+	}
+
+	addi_res = cpu_to_le32(addi_insn);
+	lui_res = cpu_to_le32(lui_insn);
+	mutex_lock(&text_mutex);
+	patch_insn_write(addi_parcel, &addi_res, sizeof(addi_res));
+	patch_insn_write(lui_parcel, &lui_res, sizeof(lui_res));
+	mutex_unlock(&text_mutex);
+}
+
+static inline void __runtime_fixup_ptr(void *where, unsigned long val)
+{
+#ifdef CONFIG_32BIT
+	__runtime_fixup_32(where, where + 4, val);
+	__runtime_fixup_caches(where, 2);
+#else
+	__runtime_fixup_32(where, where + 8, val);
+	__runtime_fixup_32(where + 4, where + 12, val >> 32);
+	__runtime_fixup_caches(where, 4);
+#endif
+}
+
+/*
+ * Replace the least significant 5 bits of the srli/srliw immediate that is
+ * located at bits 20-24
+ */
+static inline void __runtime_fixup_shift(void *where, unsigned long val)
+{
+	__le16 *parcel = where;
+	__le32 res;
+	u32 insn;
+
+	insn = (u32)le16_to_cpu(parcel[0]) | (u32)le16_to_cpu(parcel[1]) << 16;
+
+	insn &= 0xfe0fffff;
+	insn |= (val & 0b11111) << 20;
+
+	res = cpu_to_le32(insn);
+	mutex_lock(&text_mutex);
+	patch_text_nosync(where, &res, sizeof(insn));
+	mutex_unlock(&text_mutex);
+}
+
+static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
+				       unsigned long val, s32 *start, s32 *end)
+{
+	while (start < end) {
+		fn(*start + (void *)start, val);
+		start++;
+	}
+}
+
+#endif /* _ASM_RISCV_RUNTIME_CONST_H */
diff --git a/arch/riscv/kernel/vmlinux.lds.S b/arch/riscv/kernel/vmlinux.lds.S
index 002ca58dd998cb78b662837b5ebac988fb6c77bb..61bd5ba6680a786bf1db7dc37bf1acda0639b5c7 100644
--- a/arch/riscv/kernel/vmlinux.lds.S
+++ b/arch/riscv/kernel/vmlinux.lds.S
@@ -97,6 +97,9 @@ SECTIONS
 	{
 		EXIT_DATA
 	}
+
+	RUNTIME_CONST_VARIABLES
+
 	PERCPU_SECTION(L1_CACHE_BYTES)
 
 	.rel.dyn : {
-- 
2.43.0
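
Note on the lui+addi split used by __runtime_fixup_32(): addi sign-extends
its 12-bit immediate, so the lui half has to be computed as the value minus
the sign-extended low half, not simply as the top 20 bits. The following is a
minimal userspace sketch of that arithmetic and of the bit-field patching,
assuming nothing beyond standard C. The names split_imm32(), patch_lui() and
patch_addi() are hypothetical stand-ins for illustration and are not part of
the patch; the nop substitution and the text_mutex/patch_insn_write() steps
are deliberately left out.

```c
#include <stdint.h>

/* Hypothetical helper type: the two halves of a 32-bit constant. */
struct imm_split {
	uint32_t upper;	/* value loaded by lui; low 12 bits are zero */
	int32_t lower;	/* sign-extended 12-bit addi immediate, -2048..2047 */
};

/*
 * Mirror of the arithmetic in __runtime_fixup_32():
 *   lower = sign_extend32(val, 11);
 *   upper = val - lower;
 * The sign extension is spelled out by hand to stay portable C.
 */
static struct imm_split split_imm32(uint32_t val)
{
	struct imm_split s;

	s.lower = (int32_t)(val & 0xfff);
	if (s.lower & 0x800)
		s.lower -= 0x1000;
	/* Unsigned wrap-around compensates for a negative low half. */
	s.upper = val - (uint32_t)s.lower;
	return s;
}

/* Keep opcode and rd (bits 11:0), replace the U-type immediate (bits 31:12). */
static uint32_t patch_lui(uint32_t insn, uint32_t upper)
{
	return (insn & 0x00000fff) | (upper & 0xfffff000);
}

/* Keep opcode, rd, funct3 and rs1 (bits 19:0), replace imm[11:0] (bits 31:20). */
static uint32_t patch_addi(uint32_t insn, int32_t lower)
{
	return (insn & 0x000fffff) | (((uint32_t)lower & 0xfff) << 20);
}
```

With val = 0x89abcdef this yields upper = 0x89abd000 and lower = -0x211,
which are exactly the placeholder operands in the "lui ...,0x89abd" and
"addi ...,-0x211" templates above; 0x01234567 likewise gives 0x1234 and
0x567. The encodings used to exercise patch_lui() and patch_addi() assume
register a0 (x10) purely for illustration; the compiler picks the real
register in the kernel.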