From: daichengrong
To: Paul Walmsley, Palmer Dabbelt
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, daichengrong
Subject: [RFC PATCH] riscv: add userspace interface to voluntarily release vector state
Date: Tue, 17 Mar 2026 10:22:32 +0800
Message-Id: <20260317022232.11022-1-daichengrong@iscas.ac.cn>

Vector registers in RVV can be large, and saving/restoring them on
context switches introduces overhead. Some workloads only use vector
instructions in short phases, after which the vector state does not
need to be preserved.

This patch introduces a userspace-controlled mechanism:

- Userspace can declare that it no longer needs the vector state.
- The kernel skips saving/restoring vector registers during context
  switches while the declaration is active.
- If the thread executes vector instructions after releasing its
  vector state, the kernel revokes the declaration automatically.

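As a reading aid, the three rules above can be modeled in plain
userspace C. This is a simplified sketch, not kernel code: struct
vec_model and the model_* helpers are hypothetical names, and the model
assumes that a save happens only when the state is dirty and that a
restore is scheduled only when vector is enabled and no release is
active.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one thread's vector bookkeeping (not kernel code). */
struct vec_model {
	bool on;		/* vector has been enabled for this thread */
	bool dirty;		/* stands in for the DIRTY vector status */
	bool released;		/* stands in for thread.riscv_v_release_flags */
	unsigned int saves;	/* simulated register saves */
	unsigned int restores;	/* simulated register restores */
};

/* The thread executes a vector instruction: the state becomes dirty. */
static void model_use_vector(struct vec_model *m)
{
	m->on = true;
	m->dirty = true;
}

/* Models the proposed syscall: mark the state clean and record the release. */
static void model_release(struct vec_model *m)
{
	m->dirty = false;
	m->released = true;
}

/* Models switching the thread out and then back in. */
static void model_context_switch(struct vec_model *m)
{
	/* Switch out: save only dirty state; a save revokes the release. */
	if (m->dirty) {
		m->saves++;
		m->dirty = false;
		m->released = false;
	}
	/* Switch in: schedule a restore unless the state was released. */
	if (m->on && !m->released)
		m->restores++;
}

/* Walks through use, switch, release, switch, use, switch. */
static void model_demo(struct vec_model *m)
{
	model_use_vector(m);
	model_context_switch(m);	/* dirty: one save, one restore */
	model_release(m);
	model_context_switch(m);	/* released: neither save nor restore */
	assert(m->released);
	model_use_vector(m);
	model_context_switch(m);	/* dirty again: release is revoked */
}
```

Calling model_demo() on a zero-initialized struct vec_model ends with
two saves and two restores: the switch that follows the release does no
vector work, and the later vector use clears the release flag at the
next switch.
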
This reduces unnecessary vector context switch overhead and improves
performance in workloads with intermittent vector usage.

This is an RFC patch to solicit feedback on the API design and
implementation approach.

Signed-off-by: daichengrong
---
 arch/riscv/include/asm/processor.h |  1 +
 arch/riscv/include/asm/syscall.h   |  2 ++
 arch/riscv/include/asm/vector.h    |  7 +++++--
 arch/riscv/kernel/process.c        |  1 +
 arch/riscv/kernel/sys_riscv.c      | 12 ++++++++++++
 scripts/syscall.tbl                |  1 +
 6 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 4c3dd94d0f63..b59f1456918b 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -113,6 +113,7 @@ struct thread_struct {
 	unsigned long envcfg;
 	unsigned long sum;
 	u32 riscv_v_flags;
+	unsigned long riscv_v_release_flags;
 	u32 vstate_ctrl;
 	struct __riscv_v_ext_state vstate;
 	unsigned long align_ctl;
diff --git a/arch/riscv/include/asm/syscall.h b/arch/riscv/include/asm/syscall.h
index 8067e666a4ca..f6be37b01a67 100644
--- a/arch/riscv/include/asm/syscall.h
+++ b/arch/riscv/include/asm/syscall.h
@@ -121,4 +121,6 @@ asmlinkage long sys_riscv_flush_icache(uintptr_t, uintptr_t, uintptr_t);
 
 asmlinkage long sys_riscv_hwprobe(struct riscv_hwprobe *, size_t, size_t,
 				  unsigned long *, unsigned int);
+// asmlinkage long sys_riscv_release_vector_register(uintptr_t);
+asmlinkage long sys_riscv_release_vector_register(void);
 #endif /* _ASM_RISCV_SYSCALL_H */
diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 00cb9c0982b1..4bccccc20cc3 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -309,6 +309,7 @@ static inline void riscv_v_vstate_save(struct __riscv_v_ext_state *vstate,
 	if (__riscv_v_vstate_check(regs->status, DIRTY)) {
 		__riscv_v_vstate_save(vstate, vstate->datap);
 		__riscv_v_vstate_clean(regs);
+		WRITE_ONCE(current->thread.riscv_v_release_flags, 0);
 	}
 }
 
@@ -325,8 +326,10 @@ static inline void riscv_v_vstate_set_restore(struct task_struct *task,
 						      struct pt_regs *regs)
 {
 	if (riscv_v_vstate_query(regs)) {
-		set_tsk_thread_flag(task, TIF_RISCV_V_DEFER_RESTORE);
-		riscv_v_vstate_on(regs);
+		if (!READ_ONCE(current->thread.riscv_v_release_flags)) {
+			set_tsk_thread_flag(task, TIF_RISCV_V_DEFER_RESTORE);
+			riscv_v_vstate_on(regs);
+		}
 	}
 }
 
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index aacb23978f93..f1f36a3c7914 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -279,6 +279,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 		p->thread.ra = (unsigned long)ret_from_fork_user_asm;
 	}
 	p->thread.riscv_v_flags = 0;
+	p->thread.riscv_v_release_flags = 0;
 	if (has_vector() || has_xtheadvector())
 		riscv_v_thread_alloc(p);
 	p->thread.sp = (unsigned long)childregs; /* kernel sp */
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index 22fc9b3268be..934ddc06858d 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 
 static long riscv_sys_mmap(unsigned long addr, unsigned long len,
 			   unsigned long prot, unsigned long flags,
@@ -78,6 +79,17 @@ SYSCALL_DEFINE3(riscv_flush_icache, uintptr_t, start, uintptr_t, end,
 	return 0;
 }
 
+SYSCALL_DEFINE0(riscv_release_vector_register)
+{
+	struct pt_regs *regs = task_pt_regs(current);
+
+	if (__riscv_v_vstate_check(regs->status, DIRTY))
+		__riscv_v_vstate_clean(regs);
+
+	WRITE_ONCE(current->thread.riscv_v_release_flags, 1);
+	return 0;
+}
+
 /* Not defined using SYSCALL_DEFINE0 to avoid error injection */
 asmlinkage long __riscv_sys_ni_syscall(const struct pt_regs *__unused)
 {
diff --git a/scripts/syscall.tbl b/scripts/syscall.tbl
index 7a42b32b6577..1d0a493b87c3 100644
--- a/scripts/syscall.tbl
+++ b/scripts/syscall.tbl
@@ -302,6 +302,7 @@
 
 244	or1k	or1k_atomic			sys_or1k_atomic
 
+257	riscv	riscv_release_vector_register	sys_riscv_release_vector_register
 258	riscv	riscv_hwprobe			sys_riscv_hwprobe
 259	riscv	riscv_flush_icache		sys_riscv_flush_icache
 
-- 
2.25.1
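
For anyone who wants to try the interface, a userspace caller might
reach the new entry through syscall(2). This is a minimal sketch that
assumes the patch is applied and that the number 257 from the
syscall.tbl hunk is kept. HYP_NR_riscv_release_vector_register and both
helper names are hypothetical; no libc wrapper or uapi constant exists
for this call. On non-RISC-V builds the helper only reports ENOSYS, and
a kernel without the patch would be expected to do the same.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for syscall() under strict -std= modes */
#endif
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical constant: the number proposed in the syscall.tbl hunk. */
#define HYP_NR_riscv_release_vector_register 257

/*
 * Declare that the calling thread no longer needs its vector state.
 * Returns 0 on success, or -1 with errno set on failure.
 */
static long hyp_release_vector_state(void)
{
#if defined(__riscv)
	return syscall(HYP_NR_riscv_release_vector_register);
#else
	/* The number means something else on other architectures. */
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Best-effort variant: returns 1 if the kernel accepted the declaration
 * and 0 otherwise, so callers can treat the release as an optional hint.
 */
static int hyp_try_release_vector_state(void)
{
	long ret = hyp_release_vector_state();

	if (ret == 0)
		return 1;
	assert(ret == -1);	/* syscall(2) reports failure as -1 plus errno */
	return 0;
}
```

A thread would call hyp_try_release_vector_state() once its vector
phase is finished and its vector register contents are no longer
needed. Because the release is only a hint, a return value of 0 can
simply be ignored.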