From: Drew Fustini
To: Palmer Dabbelt, Björn Töpel, Alexandre Ghiti, Paul Walmsley,
    Samuel Holland, Drew Fustini, Andy Chiu, Conor Dooley,
    linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org
Cc: Drew Fustini
Subject: [PATCH] riscv: Add sysctl to control discard of vstate during syscall
Date: Fri, 18 Jul 2025 20:39:13 -0700
Message-Id: <20250719033912.1313955-1-fustini@kernel.org>

From: Drew Fustini

Clobbering the vector registers can significantly increase system call
latency for some implementations. To mitigate this performance impact, a
policy mechanism is provided to administrators, distro maintainers, and
developers to control vector state discard in the form of a sysctl knob:

/proc/sys/abi/riscv_v_vstate_discard

Valid values are:

0: Do not discard vector state during syscall
1: Discard vector state during syscall

The initial state is controlled by CONFIG_RISCV_ISA_V_VSTATE_DISCARD.

Fixes: 9657e9b7d253 ("riscv: Discard vector state on syscalls")
Signed-off-by: Drew Fustini
---
 Documentation/arch/riscv/vector.rst | 15 +++++++++++++++
 arch/riscv/Kconfig                  | 10 ++++++++++
 arch/riscv/include/asm/vector.h     |  4 ++++
 arch/riscv/kernel/vector.c          | 16 +++++++++++++++-
 4 files changed, 44 insertions(+), 1 deletion(-)

I've tested the impact of riscv_v_vstate_discard() on the SiFive X280
cores [1] in the Tenstorrent Blackhole SoC [2]. The results from the
Blackhole P100 [3] card show that discarding the vector registers
increases null syscall latency by about 33% (put the other way, skipping
the discard cuts it by about 25%). The null syscall program [4] executes
vsetvli and then calls getppid() in a loop.
The average duration of getppid() is 198 ns when registers are clobbered
in riscv_v_vstate_discard(). The average duration drops to 149 ns when
riscv_v_vstate_discard() skips clobbering the registers as a result of
riscv_v_vstate_discard being set to 0.

$ sudo sysctl abi.riscv_v_vstate_discard=1
abi.riscv_v_vstate_discard = 1

$ ./null_syscall --vsetvli
vsetvli complete
iterations: 1000000000
duration: 198 seconds
avg latency: 198.73 ns

$ sudo sysctl abi.riscv_v_vstate_discard=0
abi.riscv_v_vstate_discard = 0

$ ./null_syscall --vsetvli
vsetvli complete
iterations: 1000000000
duration: 149 seconds
avg latency: 149.89 ns

I'm testing on the tt-blackhole-v6.16-rc1_vstate_discard [5] branch, which
has 13 patches, including this one, on top of v6.16-rc1. Most are simple
yaml patches for dt bindings along with dts files and a bespoke network
driver. I don't think the other patches are relevant to this discussion.

This patch applies cleanly on its own to riscv/for-next and next-20250718.

[1] https://www.sifive.com/cores/intelligence-x200-series
[2] https://tenstorrent.com/en/hardware/blackhole
[3] https://github.com/tenstorrent/tt-bh-linux
[4] https://gist.github.com/tt-fustini/ab9b217756912ce75522b3cce11d0d58
[5] https://github.com/tenstorrent/linux/tree/tt-blackhole-v6.16-rc1_vstate_discard

diff --git a/Documentation/arch/riscv/vector.rst b/Documentation/arch/riscv/vector.rst
index 3987f5f76a9d..1edbce436015 100644
--- a/Documentation/arch/riscv/vector.rst
+++ b/Documentation/arch/riscv/vector.rst
@@ -137,4 +137,19 @@ processes in form of sysctl knob:
 As indicated by version 1.0 of the V extension [1], vector registers are
 clobbered by system calls.
 
+Clobbering the vector registers can significantly increase system call latency
+for some implementations. To mitigate the performance impact, a policy mechanism
+is provided to the administrators, distro maintainers, and developers to control
+the vstate discard in the form of a sysctl knob:
+
+* /proc/sys/abi/riscv_v_vstate_discard
+
+    Valid values are:
+
+    * 0: Do not discard vector state during syscall
+    * 1: Discard vector state during syscall
+
+    Reading this file returns the current discard behavior. The initial state is
+    controlled by CONFIG_RISCV_ISA_V_VSTATE_DISCARD.
+
 1: https://github.com/riscv/riscv-v-spec/blob/master/calling-convention.adoc
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 0aeee50da016..c0039f21d1f0 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -655,6 +655,16 @@ config RISCV_ISA_V_DEFAULT_ENABLE
 
 	  If you don't know what to do here, say Y.
 
+config RISCV_ISA_V_VSTATE_DISCARD
+	bool "Enable Vector state discard by default"
+	depends on RISCV_ISA_V
+	default n
+	help
+	  Say Y here if you want to enable Vector state discard on syscall.
+	  Otherwise, userspace has to enable it via the sysctl interface.
+
+	  If you don't know what to do here, say N.
+
 config RISCV_ISA_V_UCOPY_THRESHOLD
 	int "Threshold size for vectorized user copies"
 	depends on RISCV_ISA_V
diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 45c9b426fcc5..77991013216b 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -40,6 +40,7 @@
 	_res;								\
 })
 
+extern bool riscv_v_vstate_discard_ctl;
 extern unsigned long riscv_v_vsize;
 int riscv_v_setup_vsize(void);
 bool insn_is_vector(u32 insn_buf);
@@ -270,6 +271,9 @@ static inline void __riscv_v_vstate_discard(void)
 {
 	unsigned long vl, vtype_inval = 1UL << (BITS_PER_LONG - 1);
 
+	if (READ_ONCE(riscv_v_vstate_discard_ctl) == 0)
+		return;
+
 	riscv_v_enable();
 	if (has_xtheadvector())
 		asm volatile (THEAD_VSETVLI_T4X0E8M8D1 : : : "t4");
diff --git a/arch/riscv/kernel/vector.c b/arch/riscv/kernel/vector.c
index 184f780c932d..7a4c209ad337 100644
--- a/arch/riscv/kernel/vector.c
+++ b/arch/riscv/kernel/vector.c
@@ -26,6 +26,7 @@ static struct kmem_cache *riscv_v_user_cachep;
 static struct kmem_cache *riscv_v_kernel_cachep;
 #endif
 
+bool riscv_v_vstate_discard_ctl = IS_ENABLED(CONFIG_RISCV_ISA_V_VSTATE_DISCARD);
 unsigned long riscv_v_vsize __read_mostly;
 EXPORT_SYMBOL_GPL(riscv_v_vsize);
 
@@ -307,11 +308,24 @@ static const struct ctl_table riscv_v_default_vstate_table[] = {
 	},
 };
 
+static const struct ctl_table riscv_v_vstate_discard_table[] = {
+	{
+		.procname	= "riscv_v_vstate_discard",
+		.data		= &riscv_v_vstate_discard_ctl,
+		.maxlen		= sizeof(riscv_v_vstate_discard_ctl),
+		.mode		= 0644,
+		.proc_handler	= proc_dobool,
+	},
+};
+
 static int __init riscv_v_sysctl_init(void)
 {
-	if (has_vector() || has_xtheadvector())
+	if (has_vector() || has_xtheadvector()) {
 		if (!register_sysctl("abi", riscv_v_default_vstate_table))
 			return -EINVAL;
+		if (!register_sysctl("abi", riscv_v_vstate_discard_table))
+			return -EINVAL;
+	}
 	return 0;
 }
 
-- 
2.34.1