From nobody Thu Oct 2 11:48:18 2025 Received: from mail-wm1-f74.google.com (mail-wm1-f74.google.com [209.85.128.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5E5DD2D97B6 for ; Thu, 18 Sep 2025 06:36:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758177376; cv=none; b=LXCrF13qsfnZv/k5761UZZEbGvXBMkFJcEIdxx3vOffEiB1UKDJNyParvW0hiFqDEnQe3urUv4VlUdCDv1JSl5xnGw4gBZkQVLV6k3vTEqTREbvXjHn06DBexdV1RdzrpK46p+zZgF4leyX3dK+y1lQkMCWSJpom+lIV70Tqdh8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758177376; c=relaxed/simple; bh=yZilTx72GF8qqKvNUbLyRsBvojfMOoic5NxQVYR1doQ=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=h78E27mXPpJrT53JEgIamyz7EMKD0BLhIRht5XmkikD7R+kBa4mB/kdZwImyCFGcVnfTVMfbxkV2NgyrTTkdYFbn/GWLwLAm8QsGOV3OROGX1cltVOvNwSIwvnLiQw/MLhy93QIPJDYLoN6tt01BOzK8Prpuw7/4DrXAHCMQt98= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--ardb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=gCdRHsIt; arc=none smtp.client-ip=209.85.128.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--ardb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="gCdRHsIt" Received: by mail-wm1-f74.google.com with SMTP id 5b1f17b1804b1-45b990eb77cso4265025e9.0 for ; Wed, 17 Sep 2025 23:36:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1758177373; x=1758782173; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=A1ta9KttEF8sJmGrgUmmN67wbPwjz6EsYqXVUKnq6iI=; b=gCdRHsItTVNUcyqQGCJDAlulH7ER5iVlFUaZTwNwI4eMZ2XDpV1v4GCbIoTT5G0qGt R6bNas8vIzA2zVfnuHKk0s8JwBM1G5wIMRhkHpZpA3JIGSR3PCw+qMbMjAVnXC9CYQSO zpPvkP72QHOImC5H2/wcIqjsRLz9Z/CSHM8Wvuyj/CCWHF6tPnSGM9sbkis76GFW4D0n c6F7Yf1U3+aMygwyBY67Hk/ZyrXNMzW0/OMq16xHUOgv/MMuo3QkelOxPuupKgttarGv js5/vmTnJgmC1jg9014BSiC9YTyNaYfx0sEw1Gs3NlgUCJcOGxMKq+m4D38rYZn6aSCj KawQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1758177373; x=1758782173; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=A1ta9KttEF8sJmGrgUmmN67wbPwjz6EsYqXVUKnq6iI=; b=THmX1osgbpbiEFE0z04o0vCFtElATVPFkVrbz+gM1FQIkTymGBGum0Zm/saeYgVYB3 /z0ei5WrIAW8r5GOezAl/JONTwe9CdgBEXaPoODGZT754sZp6Ahs7BDwoZ/gZymdR0AB Jec0y5jFgoBtlGA9u1fhR/gBQtuwl6tVXsyALUYOoMt2R0F/n4RnNIDTwTszYu9+cvZR RneZaU5M/8soQ+HZzNo3HhntkhtnNFP/xtzzda5jz7ZNCOG5TMckKOnnkqNLnTdTjQv9 1qHDA6DAaQHmVW2EKd8o7gzf8iLQrop7ptONql1gnMiqOGMqyx9PyGxfMIjVtX5kt9DV ezeQ== X-Forwarded-Encrypted: i=1; AJvYcCV+GKbP0hkLy90yO48YzAGytEAnpmo9rC/9Vxds+iE4i15xC0xBTJwRQIrZKK9zZiSjVrQz6EXaN+mFe2k=@vger.kernel.org X-Gm-Message-State: AOJu0YzKbEvyBiaMcU/3Us2II5MrVz2s0US4JPxTqgogXqHyumysZ56T fIIp3ttMcY1lespPSTKFCboh4QKzLdDhCZ4V2dKFKCp3tCdaN4N0ijtlXCiXVJN9IjG0gwbYKw= = X-Google-Smtp-Source: AGHT+IHwDdNQKSpulh2BKmpGNYTuQzvpQWSRjIlrRJfiSKNGg3YFQhrNwINDqw6AqWcxEWBgOnrglZXe X-Received: from wmbhc26.prod.google.com ([2002:a05:600c:871a:b0:45f:2859:5428]) (user=ardb job=prod-delivery.src-stubby-dispatcher) by 2002:a05:600c:c0d7:b0:45b:9b9f:88cb with SMTP id 5b1f17b1804b1-464fdf4425emr15559625e9.16.1758177372848; Wed, 17 Sep 2025 23:36:12 -0700 (PDT) Date: Thu, 18 Sep 2025 08:35:45 +0200 In-Reply-To: <20250918063539.2640512-7-ardb+git@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250918063539.2640512-7-ardb+git@google.com> X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 X-Developer-Signature: v=1; a=openpgp-sha256; l=5265; i=ardb@kernel.org; h=from:subject; bh=Dk5FXoMF39rDUZtpDj8iIP76qecw7z+7kErWyEtMPzo=; b=owGbwMvMwCVmkMcZplerG8N4Wi2JIeP0CufThftPBJ5ss7Q590bVUEs2Ter/i0lV+lcLvH5sO OEvxvmxo5SFQYyLQVZMkUVg9t93O09PlKp1niULM4eVCWQIAxenAExE8A0jQ1dfScM10ctyTn+n ty7RPK65fneCi7xyrPjMkFVnk3OWVDH8r2H2uPotW6BoL8e7uzOTizP+SbnZ3nFqrFr5/s6HEP6 VLAA= X-Mailer: git-send-email 2.51.0.384.g4c02a37b29-goog Message-ID: <20250918063539.2640512-12-ardb+git@google.com> Subject: [PATCH 5/5] arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack From: Ard Biesheuvel To: linux-arm-kernel@lists.infradead.org Cc: linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org, herbert@gondor.apana.org.au, ebiggers@kernel.org, Ard Biesheuvel , Marc Zyngier , Will Deacon , Mark Rutland , Kees Cook , Catalin Marinas , Mark Brown Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Ard Biesheuvel Commit aefbab8e77eb16b5 ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch") added a 'kernel_fpsimd_state' field to struct thread_struct, which is the arch-specific portion of struct task_struct, and is allocated for each task in the system. The size of this field is 528 bytes, resulting in non-trivial bloat of task_struct, and the resulting memory overhead may impact performance on systems with many processes. This allocation is only used if the task is scheduled out or interrupted by a softirq while using the FP/SIMD unit in kernel mode, and given that calls to kernel_neon_begin() and kernel_neon_end() are now guaranteed to originate from the same lexical scope, it is possible to transparently allocate this buffer on the caller's stack instead. Signed-off-by: Ard Biesheuvel --- arch/arm64/include/asm/neon.h | 4 +-- arch/arm64/include/asm/processor.h | 2 +- arch/arm64/kernel/fpsimd.c | 26 ++++++++++++++------ 3 files changed, 21 insertions(+), 11 deletions(-) diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h index 4e24f1058b55..acaac98ff449 100644 --- a/arch/arm64/include/asm/neon.h +++ b/arch/arm64/include/asm/neon.h @@ -13,10 +13,10 @@ =20 #define cpu_has_neon() system_supports_fpsimd() =20 -void __kernel_neon_begin(void); +void __kernel_neon_begin(struct user_fpsimd_state *); void __kernel_neon_end(void); =20 -#define kernel_neon_begin() do { __kernel_neon_begin() +#define kernel_neon_begin() do { __kernel_neon_begin(&(struct user_fpsimd_= state){}) #define kernel_neon_end() __kernel_neon_end(); } while (0) =20 #endif /* ! __ASM_NEON_H */ diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/pr= ocessor.h index 61d62bfd5a7b..226e635c53d9 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -172,7 +172,7 @@ struct thread_struct { unsigned long fault_code; /* ESR_EL1 value */ struct debug_info debug; /* debugging */ =20 - struct user_fpsimd_state kernel_fpsimd_state; + struct user_fpsimd_state *kernel_fpsimd_state; unsigned int kernel_fpsimd_cpu; #ifdef CONFIG_ARM64_PTR_AUTH struct ptrauth_keys_user keys_user; diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index d7eb073d1366..919c53a26484 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -1488,21 +1488,23 @@ static void fpsimd_load_kernel_state(struct task_st= ruct *task) * Elide the load if this CPU holds the most recent kernel mode * FPSIMD context of the current task. */ - if (last->st =3D=3D &task->thread.kernel_fpsimd_state && + if (last->st =3D=3D task->thread.kernel_fpsimd_state && task->thread.kernel_fpsimd_cpu =3D=3D smp_processor_id()) return; =20 - fpsimd_load_state(&task->thread.kernel_fpsimd_state); + fpsimd_load_state(task->thread.kernel_fpsimd_state); } =20 static void fpsimd_save_kernel_state(struct task_struct *task) { struct cpu_fp_state cpu_fp_state =3D { - .st =3D &task->thread.kernel_fpsimd_state, + .st =3D task->thread.kernel_fpsimd_state, .to_save =3D FP_STATE_FPSIMD, }; =20 - fpsimd_save_state(&task->thread.kernel_fpsimd_state); + BUG_ON(!cpu_fp_state.st); + + fpsimd_save_state(task->thread.kernel_fpsimd_state); fpsimd_bind_state_to_cpu(&cpu_fp_state); =20 task->thread.kernel_fpsimd_cpu =3D smp_processor_id(); @@ -1773,6 +1775,7 @@ void fpsimd_update_current_state(struct user_fpsimd_s= tate const *state) void fpsimd_flush_task_state(struct task_struct *t) { t->thread.fpsimd_cpu =3D NR_CPUS; + t->thread.kernel_fpsimd_state =3D NULL; /* * If we don't support fpsimd, bail out after we have * reset the fpsimd_cpu for this task and clear the @@ -1833,7 +1836,7 @@ void fpsimd_save_and_flush_cpu_state(void) * The caller may freely use the FPSIMD registers until kernel_neon_end() = is * called. */ -void __kernel_neon_begin(void) +void __kernel_neon_begin(struct user_fpsimd_state *s) { if (WARN_ON(!system_supports_fpsimd())) return; @@ -1849,6 +1852,13 @@ void __kernel_neon_begin(void) } else { fpsimd_save_user_state(); =20 + /* + * Record the caller provided buffer as the kernel mode FP/SIMD + * buffer for this task, so that the state can be preserved and + * restored on a context switch. + */ + current->thread.kernel_fpsimd_state =3D s; + /* * Set the thread flag so that the kernel mode FPSIMD state * will be context switched along with the rest of the task @@ -1899,8 +1909,8 @@ void __kernel_neon_end(void) if (!IS_ENABLED(CONFIG_PREEMPT_RT) && in_serving_softirq() && test_thread_flag(TIF_KERNEL_FPSTATE)) fpsimd_load_kernel_state(current); - else - clear_thread_flag(TIF_KERNEL_FPSTATE); + else if (test_and_clear_thread_flag(TIF_KERNEL_FPSTATE)) + current->thread.kernel_fpsimd_state =3D NULL; } EXPORT_SYMBOL_GPL(__kernel_neon_end); =20 @@ -1936,7 +1946,7 @@ void __efi_fpsimd_begin(void) WARN_ON(preemptible()); =20 if (may_use_simd()) { - __kernel_neon_begin(); + __kernel_neon_begin(&efi_fpsimd_state); } else { /* * If !efi_sve_state, SVE can't be in use yet and doesn't need --=20 2.51.0.384.g4c02a37b29-goog