From nobody Mon Jun 8 14:35:13 2026
From: Nam Cao
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti, Clément Léger, Andrew Jones, Charlie Jenkins, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org
Cc: Nam Cao
Subject: [PATCH 1/2] riscv: misaligned: Fix fast_unaligned_access_speed_key init
Date: Thu, 28 May 2026 23:12:30 +0200
Message-ID: <2468816ceb433394099a00d7822f819745276b49.1780002199.git.namcao@linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

When booting with unaligned_scalar_speed=fast,
fast_unaligned_access_speed_key is initialized incorrectly.

The key is currently derived from the fast_misaligned_access cpumask, but
that mask is only populated when the unaligned access speed probe runs.
Specifying unaligned_scalar_speed=fast skips the probe entirely, leaving
the mask empty.

The information tracked by fast_misaligned_access is already available in
the misaligned_access_speed per-CPU variable. Use that to initialize
fast_unaligned_access_speed_key instead and remove the redundant cpumask.

Signed-off-by: Nam Cao
---
 arch/riscv/kernel/unaligned_access_speed.c | 69 +++++++---------------
 1 file changed, 22 insertions(+), 47 deletions(-)

diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index 11c781a4de73..bb57eb5d19df 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -27,8 +27,6 @@ DEFINE_PER_CPU(long, vector_misaligned_access) = RISCV_HWPROBE_MISALIGNED_VECTOR
 static long unaligned_scalar_speed_param = RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN;
 static long unaligned_vector_speed_param = RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN;
 
-static cpumask_t fast_misaligned_access;
-
 static u64 __maybe_unused
 measure_cycles(void (*func)(void *dst, const void *src, size_t len),
 	       void *dst, void *src, size_t len)
@@ -131,13 +129,10 @@ static int check_unaligned_access(struct page *page)
 	 * Set the value of fast_misaligned_access of a CPU. These operations
 	 * are atomic to avoid race conditions.
 	 */
-	if (ret) {
+	if (ret)
 		per_cpu(misaligned_access_speed, cpu) = RISCV_HWPROBE_MISALIGNED_SCALAR_FAST;
-		cpumask_set_cpu(cpu, &fast_misaligned_access);
-	} else {
+	else
 		per_cpu(misaligned_access_speed, cpu) = RISCV_HWPROBE_MISALIGNED_SCALAR_SLOW;
-		cpumask_clear_cpu(cpu, &fast_misaligned_access);
-	}
 
 	return 0;
 }
@@ -192,49 +187,24 @@ static void __init check_unaligned_access_speed_all_cpus(void)
 
 DEFINE_STATIC_KEY_FALSE(fast_unaligned_access_speed_key);
 
-static void modify_unaligned_access_branches(cpumask_t *mask, int weight)
+static void modify_unaligned_access_branches(const cpumask_t *mask)
 {
-	if (cpumask_weight(mask) == weight)
+	bool fast = true;
+	int cpu;
+
+	for_each_cpu(cpu, mask) {
+		if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_FAST) {
+			fast = false;
+			break;
+		}
+	}
+
+	if (fast)
 		static_branch_enable_cpuslocked(&fast_unaligned_access_speed_key);
 	else
 		static_branch_disable_cpuslocked(&fast_unaligned_access_speed_key);
 }
 
-static void set_unaligned_access_static_branches_except_cpu(int cpu)
-{
-	/*
-	 * Same as set_unaligned_access_static_branches, except excludes the
-	 * given CPU from the result. When a CPU is hotplugged into an offline
-	 * state, this function is called before the CPU is set to offline in
-	 * the cpumask, and thus the CPU needs to be explicitly excluded.
-	 */
-
-	cpumask_t fast_except_me;
-
-	cpumask_and(&fast_except_me, &fast_misaligned_access, cpu_online_mask);
-	cpumask_clear_cpu(cpu, &fast_except_me);
-
-	modify_unaligned_access_branches(&fast_except_me, num_online_cpus() - 1);
-}
-
-static void set_unaligned_access_static_branches(void)
-{
-	/*
-	 * This will be called after check_unaligned_access_all_cpus so the
-	 * result of unaligned access speed for all CPUs will be available.
-	 *
-	 * To avoid the number of online cpus changing between reading
-	 * cpu_online_mask and calling num_online_cpus, cpus_read_lock must be
-	 * held before calling this function.
-	 */
-
-	cpumask_t fast_and_online;
-
-	cpumask_and(&fast_and_online, &fast_misaligned_access, cpu_online_mask);
-
-	modify_unaligned_access_branches(&fast_and_online, num_online_cpus());
-}
-
 static int riscv_online_cpu(unsigned int cpu)
 {
 	int ret = cpu_online_unaligned_access_init(cpu);
@@ -266,14 +236,19 @@ static int riscv_online_cpu(unsigned int cpu)
 #endif
 
 exit:
-	set_unaligned_access_static_branches();
+	modify_unaligned_access_branches(cpu_online_mask);
 
 	return 0;
 }
 
 static int riscv_offline_cpu(unsigned int cpu)
 {
-	set_unaligned_access_static_branches_except_cpu(cpu);
+	cpumask_t mask;
+
+	cpumask_copy(&mask, cpu_online_mask);
+	cpumask_clear_cpu(cpu, &mask);
+
+	modify_unaligned_access_branches(&mask);
 
 	return 0;
 }
@@ -430,7 +405,7 @@ static int __init check_unaligned_access_all_cpus(void)
 				  riscv_online_cpu_vec, NULL);
 
 	cpus_read_lock();
-	set_unaligned_access_static_branches();
+	modify_unaligned_access_branches(cpu_online_mask);
 	cpus_read_unlock();
 
 	return 0;
-- 
2.47.3

From nobody Mon Jun 8 14:35:13 2026
From: Nam Cao
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti, Clément Léger, Andrew Jones, Charlie Jenkins, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org
Cc: Nam Cao
Subject: [PATCH 2/2] riscv: traps_misaligned: Avoid redundant unaligned access speed probe
Date: Thu, 28 May 2026 23:12:31 +0200
X-Mailing-List: linux-kernel@vger.kernel.org

When a CPU is taken offline and then brought back online, the unaligned
access speed probe always runs even though the unaligned access speed is
already known, wasting CPU cycles.

This is because the following happens when a CPU comes online:

  1. check_unaligned_access_emulated() is called, which clears
     misaligned_access_speed if there is no emulation.

  2. check_unaligned_access() is called because misaligned_access_speed
     is cleared, wasting CPU cycles determining something that is
     already known.

Avoid the redundant access speed probe by no longer clearing
misaligned_access_speed in (1). If the access speed is already known, just
reuse it.

On my VisionFive 2, this reduces CPU bring-up time from 26ms to 0.8ms.
Signed-off-by: Nam Cao
---
 arch/riscv/kernel/traps_misaligned.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/kernel/traps_misaligned.c b/arch/riscv/kernel/traps_misaligned.c
index 81b7682e6c6d..6e8ae6c66322 100644
--- a/arch/riscv/kernel/traps_misaligned.c
+++ b/arch/riscv/kernel/traps_misaligned.c
@@ -522,10 +522,10 @@ static bool unaligned_ctl __read_mostly;
 static void check_unaligned_access_emulated(void *arg __always_unused)
 {
 	int cpu = smp_processor_id();
-	long *mas_ptr = per_cpu_ptr(&misaligned_access_speed, cpu);
 	unsigned long tmp_var, tmp_val;
 
-	*mas_ptr = RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN;
+	if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN)
+		return;
 
 	__asm__ __volatile__ (
 		"       "REG_L" %[tmp], 1(%[ptr])\n"
-- 
2.47.3
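
For readers following the series, the combined behaviour of both patches
can be modelled in user space. The sketch below is not kernel code and is
not part of either patch. NR_CPUS, enum speed, cpu_speed[], probe_count,
cpu_online() and all_fast() are hypothetical stand-ins for the per-CPU
misaligned_access_speed variable, the speed probe, and
modify_unaligned_access_branches(). A plain bitmask stands in for a cpumask.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical; small fixed CPU count for the model */

/* Hypothetical stand-ins for the RISCV_HWPROBE_MISALIGNED_SCALAR_* values. */
enum speed { SPEED_UNKNOWN = 0, SPEED_SLOW, SPEED_FAST };

/* Models the misaligned_access_speed per-CPU variable (starts UNKNOWN). */
static enum speed cpu_speed[NR_CPUS];

/* Counts how often the (expensive) probe would have run. */
static int probe_count;

/*
 * Models CPU bring-up after patch 2: the speed is only probed while it
 * is still unknown, otherwise the earlier result is reused.
 */
static void cpu_online(unsigned int cpu, enum speed measured)
{
	assert(cpu < NR_CPUS);

	if (cpu_speed[cpu] != SPEED_UNKNOWN)
		return;

	probe_count++;
	cpu_speed[cpu] = measured;
}

/*
 * Models the decision in modify_unaligned_access_branches() after
 * patch 1: the key is enabled only if every CPU in the mask is fast.
 * As in the patch, an empty mask yields true.
 */
static bool all_fast(unsigned int mask)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
		if ((mask & (1u << cpu)) && cpu_speed[cpu] != SPEED_FAST)
			return false;
	}

	return true;
}
```

In this model, taking a CPU offline corresponds to calling all_fast() with
that CPU's bit cleared from the mask, mirroring what riscv_offline_cpu()
does with cpumask_clear_cpu() in patch 1.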