From nobody Tue Jun 30 04:37:39 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D02B9C433F5 for ; Tue, 25 Jan 2022 15:30:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234470AbiAYPaE (ORCPT ); Tue, 25 Jan 2022 10:30:04 -0500 Received: from Galois.linutronix.de ([193.142.43.55]:50034 "EHLO galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345447AbiAYP1N (ORCPT ); Tue, 25 Jan 2022 10:27:13 -0500 From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1643124418; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=OqTj/MWZ/KJ2RN1mgDHy03k6dnz1y2VoL1uj4/vvUAQ=; b=LbbMdRo1XXg6KVBHbbqPBcqdbG25p3Spy5dyBjTDPwNUCaIo1Et6mXdc9EdWJ9RKagpE8P sdXgUE914mMUVB54BVaZrA15CKnzZ1EXhfbgG0TuW88AIpdytOJ18S/dLyS79RcUvI+Bxc mwdiLCIdbLZEwKFHWlfh2PYFyREEqcpyJZyEV/JrJBw/xEW8jXJf/HpsG6qr3gpfH016Es ZWlHUrx6j1U+Ho2rQwYxaGySdQOYhYS9J4f4scTOKgrt4mmgkOxKStCwrHi7h6zS+zaVo6 0irxj1KR1lJEc+QvGUkaOjaleOZg2/aiqcIxlxXnzw0fg5gNrR9pUyZ5en8RJw== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1643124418; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=OqTj/MWZ/KJ2RN1mgDHy03k6dnz1y2VoL1uj4/vvUAQ=; b=Ztyu45yMZ5XpxvMPjIGx8jVo3+SiuxbOyIjYqQ8Zrcfg3HpFXcPvaOZb6qVCWRcytPYA30 4W3V+fxptbgZFaDg== To: linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org Cc: Andy Lutomirski , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Ingo Molnar , 
Juri Lelli , Peter Zijlstra , Steven Rostedt , Thomas Gleixner , Vincent Guittot , Sebastian Andrzej Siewior Subject: [PATCH 1/8] kernel/fork: Redo ifdefs around task's handling. Date: Tue, 25 Jan 2022 16:26:45 +0100 Message-Id: <20220125152652.1963111-2-bigeasy@linutronix.de> In-Reply-To: <20220125152652.1963111-1-bigeasy@linutronix.de> References: <20220125152652.1963111-1-bigeasy@linutronix.de> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The use of ifdef CONFIG_VMAP_STACK is confusing in terms of what is actually happening and what can happen. For instance, from reading free_thread_stack() it appears that in the CONFIG_VMAP_STACK case we may receive a non-NULL vm pointer but it may also be NULL in which case __free_pages() is used to free the stack. This is however not the case because in the VMAP case a non-NULL pointer is always returned here. Since it looks like this might happen, the compiler creates the correct dead code with the invocation to __free_pages() and everything around it. Twice. Add a space between the '#' and the ifdef to recognize the ifdef level that we are currently in. Add the current identifier as a comment behind #else and #endif. Move the code within free_thread_stack() and alloc_thread_stack_node() into the relevant ifdef block. Signed-off-by: Sebastian Andrzej Siewior --- kernel/fork.c | 74 +++++++++++++++++++++++++++------------------------ 1 file changed, 39 insertions(+), 35 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index d75a528f7b219..f63c0af6002da 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -185,7 +185,7 @@ static inline void free_task_struct(struct task_struct = *tsk) */ # if THREAD_SIZE >=3D PAGE_SIZE || defined(CONFIG_VMAP_STACK) =20 -#ifdef CONFIG_VMAP_STACK +# ifdef CONFIG_VMAP_STACK /* * vmalloc() is a bit slow, and calling vfree() enough times will force a = TLB * flush. 
Try to minimize the number of calls by caching stacks. @@ -210,11 +210,9 @@ static int free_vm_stack_cache(unsigned int cpu) =20 return 0; } -#endif =20 static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int= node) { -#ifdef CONFIG_VMAP_STACK void *stack; int i; =20 @@ -258,7 +256,34 @@ static unsigned long *alloc_thread_stack_node(struct t= ask_struct *tsk, int node) tsk->stack =3D stack; } return stack; -#else +} + +static void free_thread_stack(struct task_struct *tsk) +{ + struct vm_struct *vm =3D task_stack_vm_area(tsk); + int i; + + for (i =3D 0; i < THREAD_SIZE / PAGE_SIZE; i++) + memcg_kmem_uncharge_page(vm->pages[i], 0); + + for (i =3D 0; i < NR_CACHED_STACKS; i++) { + if (this_cpu_cmpxchg(cached_stacks[i], NULL, + tsk->stack_vm_area) !=3D NULL) + continue; + + tsk->stack =3D NULL; + tsk->stack_vm_area =3D NULL; + return; + } + vfree_atomic(tsk->stack); + tsk->stack =3D NULL; + tsk->stack_vm_area =3D NULL; +} + +# else /* !CONFIG_VMAP_STACK */ + +static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int= node) +{ struct page *page =3D alloc_pages_node(node, THREADINFO_GFP, THREAD_SIZE_ORDER); =20 @@ -267,36 +292,17 @@ static unsigned long *alloc_thread_stack_node(struct = task_struct *tsk, int node) return tsk->stack; } return NULL; -#endif } =20 -static inline void free_thread_stack(struct task_struct *tsk) +static void free_thread_stack(struct task_struct *tsk) { -#ifdef CONFIG_VMAP_STACK - struct vm_struct *vm =3D task_stack_vm_area(tsk); - - if (vm) { - int i; - - for (i =3D 0; i < THREAD_SIZE / PAGE_SIZE; i++) - memcg_kmem_uncharge_page(vm->pages[i], 0); - - for (i =3D 0; i < NR_CACHED_STACKS; i++) { - if (this_cpu_cmpxchg(cached_stacks[i], - NULL, tsk->stack_vm_area) !=3D NULL) - continue; - - return; - } - - vfree_atomic(tsk->stack); - return; - } -#endif - __free_pages(virt_to_page(tsk->stack), THREAD_SIZE_ORDER); + tsk->stack =3D NULL; } -# else + +# endif /* CONFIG_VMAP_STACK */ +# else /* !(THREAD_SIZE >=3D 
PAGE_SIZE || defined(CONFIG_VMAP_STACK)) */ + static struct kmem_cache *thread_stack_cache; =20 static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, @@ -312,6 +318,7 @@ static unsigned long *alloc_thread_stack_node(struct ta= sk_struct *tsk, static void free_thread_stack(struct task_struct *tsk) { kmem_cache_free(thread_stack_cache, tsk->stack); + tsk->stack =3D NULL; } =20 void thread_stack_cache_init(void) @@ -321,8 +328,9 @@ void thread_stack_cache_init(void) THREAD_SIZE, NULL); BUG_ON(thread_stack_cache =3D=3D NULL); } -# endif -#endif + +# endif /* THREAD_SIZE >=3D PAGE_SIZE || defined(CONFIG_VMAP_STACK) */ +#endif /* !CONFIG_ARCH_THREAD_STACK_ALLOCATOR */ =20 /* SLAB cache for signal_struct structures (tsk->signal) */ static struct kmem_cache *signal_cachep; @@ -432,10 +440,6 @@ static void release_task_stack(struct task_struct *tsk) =20 account_kernel_stack(tsk, -1); free_thread_stack(tsk); - tsk->stack =3D NULL; -#ifdef CONFIG_VMAP_STACK - tsk->stack_vm_area =3D NULL; -#endif } =20 #ifdef CONFIG_THREAD_INFO_IN_TASK --=20 2.34.1
From nobody Tue Jun 30 04:37:39 2026 From: Sebastian Andrzej Siewior To: linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org Cc: Andy Lutomirski , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Zijlstra , Steven Rostedt , Thomas Gleixner , Vincent Guittot , Sebastian Andrzej Siewior Subject: [PATCH 2/8] kernel/fork: Duplicate task_struct before stack allocation. Date: Tue, 25 Jan 2022 16:26:46 +0100 Message-Id: <20220125152652.1963111-3-bigeasy@linutronix.de> In-Reply-To: <20220125152652.1963111-1-bigeasy@linutronix.de> References: <20220125152652.1963111-1-bigeasy@linutronix.de> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" alloc_thread_stack_node() already populates the task_struct::stack member except on IA64. The stack pointer is saved and populated again because IA64 needs it and arch_dup_task_struct() overwrites it. Allocate thread's stack after task_struct has been duplicated as a preparation.
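The effect of the reordering can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct task, arch_dup(), alloc_stack() and dup_task() are hypothetical stand-ins for task_struct, arch_dup_task_struct(), alloc_thread_stack_node() and dup_task_struct(), and malloc() replaces the kernel allocators.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for struct task_struct. */
struct task {
	void *stack;	/* models task_struct::stack */
	int pid;	/* any field that really should be copied */
};

/*
 * Models arch_dup_task_struct(): a plain structure copy, which also
 * copies the parent's stack pointer into the child.
 */
static int arch_dup(struct task *dst, const struct task *src)
{
	*dst = *src;
	return 0;
}

/* Models alloc_thread_stack_node(): allocates and assigns ->stack. */
static int alloc_stack(struct task *tsk)
{
	tsk->stack = malloc(4096);
	return tsk->stack ? 0 : -ENOMEM;
}

/*
 * Duplicate first, allocate second: the allocator's assignment is the
 * last write to ->stack, so no save/restore around the copy is needed.
 */
static struct task *dup_task(const struct task *orig)
{
	struct task *tsk = malloc(sizeof(*tsk));

	if (!tsk)
		return NULL;
	if (arch_dup(tsk, orig))
		goto free_tsk;
	if (alloc_stack(tsk))
		goto free_tsk;
	return tsk;

free_tsk:
	free(tsk);
	return NULL;
}
```

With the opposite order, the structure copy would overwrite the freshly assigned stack pointer with the parent's, which is why the old code had to save and restore it.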
Signed-off-by: Sebastian Andrzej Siewior --- kernel/fork.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index f63c0af6002da..c47dcba5d66d2 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -888,6 +888,10 @@ static struct task_struct *dup_task_struct(struct task= _struct *orig, int node) if (!tsk) return NULL; =20 + err =3D arch_dup_task_struct(tsk, orig); + if (err) + goto free_tsk; + stack =3D alloc_thread_stack_node(tsk, node); if (!stack) goto free_tsk; @@ -897,8 +901,6 @@ static struct task_struct *dup_task_struct(struct task_= struct *orig, int node) =20 stack_vm_area =3D task_stack_vm_area(tsk); =20 - err =3D arch_dup_task_struct(tsk, orig); - /* * arch_dup_task_struct() clobbers the stack-related fields. Make * sure they're properly initialized before using any stack-related @@ -912,9 +914,6 @@ static struct task_struct *dup_task_struct(struct task_= struct *orig, int node) refcount_set(&tsk->stack_refcount, 1); #endif =20 - if (err) - goto free_stack; - err =3D scs_prepare(tsk, node); if (err) goto free_stack; --=20 2.34.1
From nobody Tue Jun 30 04:37:39 2026 From: Sebastian Andrzej Siewior To: linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org Cc: Andy Lutomirski , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Zijlstra , Steven Rostedt , Thomas Gleixner , Vincent Guittot , Sebastian Andrzej Siewior Subject: [PATCH 3/8] kernel/fork, IA64: Provide a alloc_thread_stack_node() for IA64. Date: Tue, 25 Jan 2022 16:26:47 +0100 Message-Id: <20220125152652.1963111-4-bigeasy@linutronix.de> In-Reply-To: <20220125152652.1963111-1-bigeasy@linutronix.de> References: <20220125152652.1963111-1-bigeasy@linutronix.de> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Provide a generic alloc_thread_stack_node() for IA64/ CONFIG_ARCH_THREAD_STACK_ALLOCATOR which returns stack pointer and sets task_struct::stack so it behaves exactly like the other implementations.
Rename IA64's alloc_thread_stack_node() and add the generic version to the fork code so it is in one place _and_ to drastically lower chances of fat fingering the IA64 code. Do the same for free_thread_stack(). Signed-off-by: Sebastian Andrzej Siewior --- arch/ia64/include/asm/thread_info.h | 6 +++--- kernel/fork.c | 16 ++++++++++++++++ 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/arch/ia64/include/asm/thread_info.h b/arch/ia64/include/asm/th= read_info.h index 51d20cb377062..1684716f08201 100644 --- a/arch/ia64/include/asm/thread_info.h +++ b/arch/ia64/include/asm/thread_info.h @@ -55,15 +55,15 @@ struct thread_info { #ifndef ASM_OFFSETS_C /* how to get the thread information struct from C */ #define current_thread_info() ((struct thread_info *) ((char *) current + = IA64_TASK_SIZE)) -#define alloc_thread_stack_node(tsk, node) \ +#define arch_alloc_thread_stack_node(tsk, node) \ ((unsigned long *) ((char *) (tsk) + IA64_TASK_SIZE)) #define task_thread_info(tsk) ((struct thread_info *) ((char *) (tsk) + IA= 64_TASK_SIZE)) #else #define current_thread_info() ((struct thread_info *) 0) -#define alloc_thread_stack_node(tsk, node) ((unsigned long *) 0) +#define arch_alloc_thread_stack_node(tsk, node) ((unsigned long *) 0) #define task_thread_info(tsk) ((struct thread_info *) 0) #endif -#define free_thread_stack(tsk) /* nothing */ +#define arch_free_thread_stack(tsk) /* nothing */ #define task_stack_page(tsk) ((void *)(tsk)) =20 #define __HAVE_THREAD_FUNCTIONS diff --git a/kernel/fork.c b/kernel/fork.c index c47dcba5d66d2..a0d58ae6fac76 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -330,6 +330,22 @@ void thread_stack_cache_init(void) } =20 # endif /* THREAD_SIZE >=3D PAGE_SIZE || defined(CONFIG_VMAP_STACK) */ +#else /* CONFIG_ARCH_THREAD_STACK_ALLOCATOR */ + +static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int= node) +{ + unsigned long *stack; + + stack =3D arch_alloc_thread_stack_node(tsk, node); + tsk->stack =3D stack; + 
return stack; +} + +static void free_thread_stack(struct task_struct *tsk, bool cache_only) +{ + arch_free_thread_stack(tsk); +} + #endif /* !CONFIG_ARCH_THREAD_STACK_ALLOCATOR */ =20 /* SLAB cache for signal_struct structures (tsk->signal) */ --=20 2.34.1
From nobody Tue Jun 30 04:37:39 2026 From: Sebastian Andrzej Siewior To: linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org Cc: Andy Lutomirski , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Zijlstra , Steven Rostedt , Thomas Gleixner , Vincent Guittot , Sebastian Andrzej Siewior Subject: [PATCH 4/8] kernel/fork: Don't assign the stack pointer in dup_task_struct(). Date: Tue, 25 Jan 2022 16:26:48 +0100 Message-Id: <20220125152652.1963111-5-bigeasy@linutronix.de> In-Reply-To: <20220125152652.1963111-1-bigeasy@linutronix.de> References: <20220125152652.1963111-1-bigeasy@linutronix.de> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" All four versions of alloc_thread_stack_node() now assign task_struct::stack if the allocation was successful. Let alloc_thread_stack_node() return an error code instead of the stack pointer and remove the stack assignment in dup_task_struct().
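The calling convention described here can be sketched in user space as follows. The names struct task, alloc_fn, failing_alloc(), alloc_stack(), setup_task() and MODEL_STACK_SIZE are hypothetical and exist only for this illustration. The allocator is passed in as a function pointer so that the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_STACK_SIZE 8192	/* hypothetical, stands in for THREAD_SIZE */

/* Hypothetical, reduced stand-in for struct task_struct. */
struct task {
	void *stack;
};

typedef void *(*alloc_fn)(size_t);

/* An allocator that always fails, used to model the -ENOMEM path. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Models alloc_thread_stack_node() after the change: the stack is
 * published through tsk->stack and only a status code is returned.
 */
static int alloc_stack(struct task *tsk, alloc_fn alloc)
{
	void *stack = alloc(MODEL_STACK_SIZE);

	if (!stack)
		return -ENOMEM;
	tsk->stack = stack;
	return 0;
}

/* Models the caller: it checks the status and never touches the pointer. */
static int setup_task(struct task *tsk, alloc_fn alloc)
{
	int err;

	tsk->stack = NULL;
	err = alloc_stack(tsk, alloc);
	if (err)
		return err;	/* nothing to undo, ->stack is still NULL */
	/* Further setup would follow here and can use tsk->stack directly. */
	return 0;
}
```

Because every variant assigns the stack itself, the caller no longer needs a local stack variable or a second assignment.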
Signed-off-by: Sebastian Andrzej Siewior --- kernel/fork.c | 47 ++++++++++++++++------------------------------- 1 file changed, 16 insertions(+), 31 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index a0d58ae6fac76..e6337b3c34ff7 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -211,7 +211,7 @@ static int free_vm_stack_cache(unsigned int cpu) return 0; } =20 -static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int= node) +static int alloc_thread_stack_node(struct task_struct *tsk, int node) { void *stack; int i; @@ -232,7 +232,7 @@ static unsigned long *alloc_thread_stack_node(struct ta= sk_struct *tsk, int node) =20 tsk->stack_vm_area =3D s; tsk->stack =3D s->addr; - return s->addr; + return 0; } =20 /* @@ -245,17 +245,16 @@ static unsigned long *alloc_thread_stack_node(struct = task_struct *tsk, int node) THREADINFO_GFP & ~__GFP_ACCOUNT, PAGE_KERNEL, 0, node, __builtin_return_address(0)); - + if (!stack) + return -ENOMEM; /* * We can't call find_vm_area() in interrupt context, and * free_thread_stack() can be called in interrupt context, * so cache the vm_struct. 
*/ - if (stack) { - tsk->stack_vm_area =3D find_vm_area(stack); - tsk->stack =3D stack; - } - return stack; + tsk->stack_vm_area =3D find_vm_area(stack); + tsk->stack =3D stack; + return 0; } =20 static void free_thread_stack(struct task_struct *tsk) @@ -282,16 +281,16 @@ static void free_thread_stack(struct task_struct *tsk) =20 # else /* !CONFIG_VMAP_STACK */ =20 -static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int= node) +static int alloc_thread_stack_node(struct task_struct *tsk, int node) { struct page *page =3D alloc_pages_node(node, THREADINFO_GFP, THREAD_SIZE_ORDER); =20 if (likely(page)) { tsk->stack =3D kasan_reset_tag(page_address(page)); - return tsk->stack; + return 0; } - return NULL; + return -ENOMEM; } =20 static void free_thread_stack(struct task_struct *tsk) @@ -305,14 +304,13 @@ static void free_thread_stack(struct task_struct *tsk) =20 static struct kmem_cache *thread_stack_cache; =20 -static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, - int node) +static int alloc_thread_stack_node(struct task_struct *tsk, int node) { unsigned long *stack; stack =3D kmem_cache_alloc_node(thread_stack_cache, THREADINFO_GFP, node); stack =3D kasan_reset_tag(stack); tsk->stack =3D stack; - return stack; + return stack ? 0 : -ENOMEM; } =20 static void free_thread_stack(struct task_struct *tsk) @@ -332,13 +330,13 @@ void thread_stack_cache_init(void) # endif /* THREAD_SIZE >=3D PAGE_SIZE || defined(CONFIG_VMAP_STACK) */ #else /* CONFIG_ARCH_THREAD_STACK_ALLOCATOR */ =20 -static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int= node) +static int alloc_thread_stack_node(struct task_struct *tsk, int node) { unsigned long *stack; =20 stack =3D arch_alloc_thread_stack_node(tsk, node); tsk->stack =3D stack; - return stack; + return stack ? 
0 : -ENOMEM; } =20 static void free_thread_stack(struct task_struct *tsk, bool cache_only) @@ -894,8 +892,6 @@ void set_task_stack_end_magic(struct task_struct *tsk) static struct task_struct *dup_task_struct(struct task_struct *orig, int n= ode) { struct task_struct *tsk; - unsigned long *stack; - struct vm_struct *stack_vm_area __maybe_unused; int err; =20 if (node =3D=3D NUMA_NO_NODE) @@ -908,24 +904,13 @@ static struct task_struct *dup_task_struct(struct tas= k_struct *orig, int node) if (err) goto free_tsk; =20 - stack =3D alloc_thread_stack_node(tsk, node); - if (!stack) + err =3D alloc_thread_stack_node(tsk, node); + if (err) goto free_tsk; =20 if (memcg_charge_kernel_stack(tsk)) goto free_stack; =20 - stack_vm_area =3D task_stack_vm_area(tsk); - - /* - * arch_dup_task_struct() clobbers the stack-related fields. Make - * sure they're properly initialized before using any stack-related - * functions again. - */ - tsk->stack =3D stack; -#ifdef CONFIG_VMAP_STACK - tsk->stack_vm_area =3D stack_vm_area; -#endif #ifdef CONFIG_THREAD_INFO_IN_TASK refcount_set(&tsk->stack_refcount, 1); #endif --=20 2.34.1
From nobody Tue Jun 30 04:37:39 2026 From: Sebastian Andrzej Siewior To: linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org Cc: Andy Lutomirski , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Zijlstra , Steven Rostedt , Thomas Gleixner , Vincent Guittot , Sebastian Andrzej Siewior Subject: [PATCH 5/8] kernel/fork: Move memcg_charge_kernel_stack() into CONFIG_VMAP_STACK. Date: Tue, 25 Jan 2022 16:26:49 +0100 Message-Id: <20220125152652.1963111-6-bigeasy@linutronix.de> In-Reply-To: <20220125152652.1963111-1-bigeasy@linutronix.de> References: <20220125152652.1963111-1-bigeasy@linutronix.de> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" memcg_charge_kernel_stack() is only used in the CONFIG_VMAP_STACK case. Move memcg_charge_kernel_stack() into the CONFIG_VMAP_STACK block and invoke it from within alloc_thread_stack_node().
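The charge-all-pages-or-roll-back pattern that the moved function follows can be modelled in user space. This is a sketch, not the kernel code: struct page_model, budget, charge_page(), uncharge_page(), charge_stack() and NR_PAGES are hypothetical names, and a simple counter stands in for the memory cgroup limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_PAGES 4	/* hypothetical, models THREAD_SIZE / PAGE_SIZE */

/* Hypothetical stand-in for one page of a vmapped stack. */
struct page_model {
	bool charged;
};

/* Remaining number of page charges allowed; models a memcg limit. */
static int budget;

static int charge_page(struct page_model *p)
{
	if (budget <= 0)
		return -ENOMEM;
	budget--;
	p->charged = true;
	return 0;
}

/* Uncharging a page that was never charged is a no-op. */
static void uncharge_page(struct page_model *p)
{
	if (!p->charged)
		return;
	p->charged = false;
	budget++;
}

/* Charge every page of the stack, or none at all. */
static int charge_stack(struct page_model pages[NR_PAGES])
{
	int ret = 0;
	int i;

	for (i = 0; i < NR_PAGES; i++) {
		ret = charge_page(&pages[i]);
		if (ret)
			goto err;
	}
	return 0;
err:
	/* Walk all pages; the ones that were not charged are skipped. */
	for (i = 0; i < NR_PAGES; i++)
		uncharge_page(&pages[i]);
	return ret;
}
```

The rollback loop can walk every page unconditionally because uncharging an uncharged page does nothing, which is the same property the comment in the patch relies on.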
Signed-off-by: Sebastian Andrzej Siewior --- kernel/fork.c | 69 +++++++++++++++++++++++++++------------------------ 1 file changed, 36 insertions(+), 33 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index e6337b3c34ff7..73f644482e932 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -211,6 +211,32 @@ static int free_vm_stack_cache(unsigned int cpu) return 0; } =20 +static int memcg_charge_kernel_stack(struct task_struct *tsk) +{ + struct vm_struct *vm =3D task_stack_vm_area(tsk); + int i; + int ret; + + BUILD_BUG_ON(IS_ENABLED(CONFIG_VMAP_STACK) && PAGE_SIZE % 1024 !=3D 0); + BUG_ON(vm->nr_pages !=3D THREAD_SIZE / PAGE_SIZE); + + for (i =3D 0; i < THREAD_SIZE / PAGE_SIZE; i++) { + ret =3D memcg_kmem_charge_page(vm->pages[i], GFP_KERNEL, 0); + if (ret) + goto err; + } + return 0; +err: + /* + * If memcg_kmem_charge_page() fails, page's memory cgroup pointer is + * NULL, and memcg_kmem_uncharge_page() in free_thread_stack() will + * ignore this page. + */ + for (i =3D 0; i < THREAD_SIZE / PAGE_SIZE; i++) + memcg_kmem_uncharge_page(vm->pages[i], 0); + return ret; +} + static int alloc_thread_stack_node(struct task_struct *tsk, int node) { void *stack; @@ -230,6 +256,11 @@ static int alloc_thread_stack_node(struct task_struct = *tsk, int node) /* Clear stale pointers from reused stack. 
*/ memset(s->addr, 0, THREAD_SIZE); =20 + if (memcg_charge_kernel_stack(tsk)) { + vfree(s->addr); + return -ENOMEM; + } + tsk->stack_vm_area =3D s; tsk->stack =3D s->addr; return 0; @@ -247,6 +278,11 @@ static int alloc_thread_stack_node(struct task_struct = *tsk, int node) 0, node, __builtin_return_address(0)); if (!stack) return -ENOMEM; + + if (memcg_charge_kernel_stack(tsk)) { + vfree(stack); + return -ENOMEM; + } /* * We can't call find_vm_area() in interrupt context, and * free_thread_stack() can be called in interrupt context, @@ -417,36 +453,6 @@ static void account_kernel_stack(struct task_struct *t= sk, int account) } } =20 -static int memcg_charge_kernel_stack(struct task_struct *tsk) -{ -#ifdef CONFIG_VMAP_STACK - struct vm_struct *vm =3D task_stack_vm_area(tsk); - int ret; - - BUILD_BUG_ON(IS_ENABLED(CONFIG_VMAP_STACK) && PAGE_SIZE % 1024 !=3D 0); - - if (vm) { - int i; - - BUG_ON(vm->nr_pages !=3D THREAD_SIZE / PAGE_SIZE); - - for (i =3D 0; i < THREAD_SIZE / PAGE_SIZE; i++) { - /* - * If memcg_kmem_charge_page() fails, page's - * memory cgroup pointer is NULL, and - * memcg_kmem_uncharge_page() in free_thread_stack() - * will ignore this page. 
- */ - ret =3D memcg_kmem_charge_page(vm->pages[i], GFP_KERNEL, - 0); - if (ret) - return ret; - } - } -#endif - return 0; -} - static void release_task_stack(struct task_struct *tsk) { if (WARN_ON(READ_ONCE(tsk->__state) !=3D TASK_DEAD)) @@ -908,9 +914,6 @@ static struct task_struct *dup_task_struct(struct task_= struct *orig, int node) if (err) goto free_tsk; =20 - if (memcg_charge_kernel_stack(tsk)) - goto free_stack; - #ifdef CONFIG_THREAD_INFO_IN_TASK refcount_set(&tsk->stack_refcount, 1); #endif --=20 2.34.1
From nobody Tue Jun 30 04:37:39 2026 From: Sebastian Andrzej Siewior To: linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org Cc: Andy Lutomirski , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Zijlstra , Steven Rostedt , Thomas Gleixner , Vincent Guittot , Sebastian Andrzej Siewior Subject: [PATCH 6/8] kernel/fork: Move task stack account to do_exit(). Date: Tue, 25 Jan 2022 16:26:50 +0100 Message-Id: <20220125152652.1963111-7-bigeasy@linutronix.de> In-Reply-To: <20220125152652.1963111-1-bigeasy@linutronix.de> References: <20220125152652.1963111-1-bigeasy@linutronix.de> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" There is no need to perform the stack accounting of the outgoing task in its final schedule() invocation which happens with disabled preemption. The task is leaving, the resources will be freed and the accounting can happen in do_exit() before the actual schedule invocation which frees the stack memory. Move the accounting of the stack memory from release_task_stack() to exit_task_stack_account() which then can be invoked from do_exit().
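The separation of bookkeeping from freeing can be pictured with a small user-space model. Everything here is an assumption made for illustration: struct task, accounted_kb, account_stack(), exit_stack_account(), release_stack() and MODEL_STACK_KB are hypothetical names, and free() replaces the kernel's stack freeing.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_STACK_KB 16	/* hypothetical, models THREAD_SIZE / 1024 */

/* Models the kernel stack statistics counter. */
static long accounted_kb;

/* Hypothetical, reduced stand-in for struct task_struct. */
struct task {
	void *stack;
	int dead;	/* models __state == TASK_DEAD */
};

/* Models account_kernel_stack(): account is +1 on fork, -1 on exit. */
static void account_stack(struct task *tsk, int account)
{
	(void)tsk;
	accounted_kb += (long)account * MODEL_STACK_KB;
}

/*
 * Models exit_task_stack_account(): bookkeeping only. It runs on the
 * exit path, before the final schedule, and does not free anything.
 */
static void exit_stack_account(struct task *tsk)
{
	account_stack(tsk, -1);
}

/* Models release_task_stack() after the change: no accounting left. */
static void release_stack(struct task *tsk)
{
	if (!tsk->dead)
		return;	/* better to leak the stack than to free prematurely */
	free(tsk->stack);
	tsk->stack = NULL;
}
```

In this model the counter already reads zero while the stack memory is still allocated, which mirrors the ordering the commit message describes.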
Signed-off-by: Sebastian Andrzej Siewior --- include/linux/sched/task_stack.h | 2 ++ kernel/exit.c | 1 + kernel/fork.c | 35 +++++++++++++++++++++----------- 3 files changed, 26 insertions(+), 12 deletions(-) diff --git a/include/linux/sched/task_stack.h b/include/linux/sched/task_st= ack.h index d10150587d819..892562ebbd3aa 100644 --- a/include/linux/sched/task_stack.h +++ b/include/linux/sched/task_stack.h @@ -79,6 +79,8 @@ static inline void *try_get_task_stack(struct task_struct= *tsk) static inline void put_task_stack(struct task_struct *tsk) {} #endif =20 +void exit_task_stack_account(struct task_struct *tsk); + #define task_stack_end_corrupted(task) \ (*(end_of_stack(task)) !=3D STACK_END_MAGIC) =20 diff --git a/kernel/exit.c b/kernel/exit.c index b00a25bb4ab93..c303cffe7fdb4 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -845,6 +845,7 @@ void __noreturn do_exit(long code) put_page(tsk->task_frag.page); =20 validate_creds_for_do_exit(tsk); + exit_task_stack_account(tsk); =20 check_stack_usage(); preempt_disable(); diff --git a/kernel/fork.c b/kernel/fork.c index 73f644482e932..5f4e659a922e1 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -211,9 +211,8 @@ static int free_vm_stack_cache(unsigned int cpu) return 0; } =20 -static int memcg_charge_kernel_stack(struct task_struct *tsk) +static int memcg_charge_kernel_stack(struct vm_struct *vm) { - struct vm_struct *vm =3D task_stack_vm_area(tsk); int i; int ret; =20 @@ -239,6 +238,7 @@ static int memcg_charge_kernel_stack(struct task_struct= *tsk) =20 static int alloc_thread_stack_node(struct task_struct *tsk, int node) { + struct vm_struct *vm; void *stack; int i; =20 @@ -256,7 +256,7 @@ static int alloc_thread_stack_node(struct task_struct *= tsk, int node) /* Clear stale pointers from reused stack. 
*/ memset(s->addr, 0, THREAD_SIZE); =20 - if (memcg_charge_kernel_stack(tsk)) { + if (memcg_charge_kernel_stack(s)) { vfree(s->addr); return -ENOMEM; } @@ -279,7 +279,8 @@ static int alloc_thread_stack_node(struct task_struct *= tsk, int node) if (!stack) return -ENOMEM; =20 - if (memcg_charge_kernel_stack(tsk)) { + vm =3D find_vm_area(stack); + if (memcg_charge_kernel_stack(vm)) { vfree(stack); return -ENOMEM; } @@ -288,19 +289,15 @@ static int alloc_thread_stack_node(struct task_struct= *tsk, int node) * free_thread_stack() can be called in interrupt context, * so cache the vm_struct. */ - tsk->stack_vm_area =3D find_vm_area(stack); + tsk->stack_vm_area =3D vm; tsk->stack =3D stack; return 0; } =20 static void free_thread_stack(struct task_struct *tsk) { - struct vm_struct *vm =3D task_stack_vm_area(tsk); int i; =20 - for (i =3D 0; i < THREAD_SIZE / PAGE_SIZE; i++) - memcg_kmem_uncharge_page(vm->pages[i], 0); - for (i =3D 0; i < NR_CACHED_STACKS; i++) { if (this_cpu_cmpxchg(cached_stacks[i], NULL, tsk->stack_vm_area) !=3D NULL) @@ -453,12 +450,25 @@ static void account_kernel_stack(struct task_struct *= tsk, int account) } } =20 +void exit_task_stack_account(struct task_struct *tsk) +{ + account_kernel_stack(tsk, -1); + + if (IS_ENABLED(CONFIG_VMAP_STACK)) { + struct vm_struct *vm; + int i; + + vm =3D task_stack_vm_area(tsk); + for (i =3D 0; i < THREAD_SIZE / PAGE_SIZE; i++) + memcg_kmem_uncharge_page(vm->pages[i], 0); + } +} + static void release_task_stack(struct task_struct *tsk) { if (WARN_ON(READ_ONCE(tsk->__state) !=3D TASK_DEAD)) return; /* Better to leak the stack than to free prematurely */ =20 - account_kernel_stack(tsk, -1); free_thread_stack(tsk); } =20 @@ -917,6 +927,7 @@ static struct task_struct *dup_task_struct(struct task_= struct *orig, int node) #ifdef CONFIG_THREAD_INFO_IN_TASK refcount_set(&tsk->stack_refcount, 1); #endif + account_kernel_stack(tsk, 1); =20 err =3D scs_prepare(tsk, node); if (err) @@ -960,8 +971,6 @@ static struct task_struct 
*dup_task_struct(struct task_= struct *orig, int node) tsk->wake_q.next =3D NULL; tsk->worker_private =3D NULL; =20 - account_kernel_stack(tsk, 1); - kcov_task_init(tsk); kmap_local_fork(tsk); =20 @@ -980,6 +989,7 @@ static struct task_struct *dup_task_struct(struct task_= struct *orig, int node) return tsk; =20 free_stack: + exit_task_stack_account(tsk); free_thread_stack(tsk); free_tsk: free_task_struct(tsk); @@ -2448,6 +2458,7 @@ static __latent_entropy struct task_struct *copy_proc= ess( exit_creds(p); bad_fork_free: WRITE_ONCE(p->__state, TASK_DEAD); + exit_task_stack_account(p); put_task_stack(p); delayed_free_task(p); fork_out: --=20 2.34.1 From nobody Tue Jun 30 04:37:39 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9A1E5C433F5 for ; Tue, 25 Jan 2022 15:30:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1350987AbiAYPam (ORCPT ); Tue, 25 Jan 2022 10:30:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51292 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344425AbiAYP1N (ORCPT ); Tue, 25 Jan 2022 10:27:13 -0500 Received: from galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8CACAC06173E; Tue, 25 Jan 2022 07:27:06 -0800 (PST) From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1643124420; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=R2FY0o8V0KFXEgGTnMxiQrp5x0oq2+r+qes9izZZ8w4=; b=FmahXHnnXXRRnk8Jip7FdEQaCv0whJSVvkWxZPvOVbNg0IB248N23Jz+aS6QHbPs1LiSVM 
dV8EHuMMJjveDa0NtI2brFyBt1X+dpw0eTwU06ws1NbTqgVMWNGes2qhWWcgbHxJGyLKLh EZueV4G2zFkA2NAdnlL58NqzGf3YevOkIbnnSI8bOwnLXIQFlAqZPGp2c7eacHtvxnDpOB Z0XEkce0zOieLAJizQ4upWBRocRpv8JPpHqoq+ml4BtcqXLMKzNp0Gej5vK7DRvd0eTyxJ EkjBG9vRTL0R7pVnIbaYKrt/UOMvWkMWfHewqLu4TiEpBl1FHDJFTEeU+bfI+A== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1643124420; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=R2FY0o8V0KFXEgGTnMxiQrp5x0oq2+r+qes9izZZ8w4=; b=2IVdTZn09W8uu4VwMs31BkHJWFJn19SokiBmh/UrV/AeVg3PGlE9q7RAMHEOqS3/iEix1t Yj1W/mayhDeujzDA== To: linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org Cc: Andy Lutomirski , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Zijlstra , Steven Rostedt , Thomas Gleixner , Vincent Guittot , Sebastian Andrzej Siewior Subject: [PATCH 7/8] kernel/fork: Only cache the VMAP stack in finish_task_switch(). Date: Tue, 25 Jan 2022 16:26:51 +0100 Message-Id: <20220125152652.1963111-8-bigeasy@linutronix.de> In-Reply-To: <20220125152652.1963111-1-bigeasy@linutronix.de> References: <20220125152652.1963111-1-bigeasy@linutronix.de> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The task stack could be deallocated later in delayed_put_task_struct(). For fork()/exec() kind of workloads (say a shell script executing several commands) it is important that the stack is released in finish_task_switch() so that in VMAP_STACK case it can be cached and reused in the new task. If the free/ caching is RCU-delayed then a new stack has to be allocated because the cache is filled in batches of which only two stacks, out of many, are recycled. 
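Why early release matters for the cache can be seen in a minimal, single-threaded model of a two-slot stack cache. This is only a sketch under stated assumptions: the `demo_` names are hypothetical, the slot count of two follows the "only two stacks" remark above, and the kernel itself uses per-CPU slots updated with this_cpu_cmpxchg() rather than plain stores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded model; not the kernel's per-CPU implementation. */
#define DEMO_NR_CACHED_STACKS 2

static void *demo_cached_stacks[DEMO_NR_CACHED_STACKS];

/* Try to park a stack in a free slot. Returns 1 if cached, 0 if the cache is full. */
static int demo_cache_put(void *stack)
{
	int i;

	assert(stack != NULL);
	for (i = 0; i < DEMO_NR_CACHED_STACKS; i++) {
		if (demo_cached_stacks[i] == NULL) {
			demo_cached_stacks[i] = stack;
			return 1;
		}
	}
	return 0;	/* caller has to free the stack some other way */
}

/* Take a cached stack for a new task, or return NULL if the cache is empty. */
static void *demo_cache_get(void)
{
	int i;

	for (i = 0; i < DEMO_NR_CACHED_STACKS; i++) {
		void *stack = demo_cached_stacks[i];

		if (stack != NULL) {
			demo_cached_stacks[i] = NULL;
			return stack;
		}
	}
	return NULL;
}
```

If stacks arrive one at a time, as they do when each exiting task is released right away, every put is likely to find a free slot between forks. If many arrive in one batch, all but the first two fall through to the slow path.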
For PREEMPT_RT it would be good if the wake-up in vfree_atomic() could be avoided in the scheduling path. Far worse are the other free_thread_stack() implementations which invoke __free_pages()/kmem_cache_free() with disabled preemption. Introduce put_task_stack_sched() which is invoked from finish_task_switch() and only caches the VMAP stack. If the cache is full or !CONFIG_VMAP_STACK is used then the stack is freed from delayed_put_task_struct(). In the VMAP case this is another opportunity to fill the cache. The stack is finally released in delayed_put_task_struct() which means that a valid stack reference can be held during its invocation. As such, no assumption can be made about whether the task_struct::stack pointer can be freed if it is non-NULL. Set the lowest bit of task_struct::stack if the stack was released via put_task_stack_sched() and needs a final free in delayed_put_task_struct(). If the bit is missing then a reference is held and put_task_stack() will release it. Signed-off-by: Sebastian Andrzej Siewior --- include/linux/sched/task_stack.h | 8 +++++ kernel/exit.c | 1 + kernel/fork.c | 60 ++++++++++++++++++++++++++------ kernel/sched/core.c | 7 ++-- 4 files changed, 64 insertions(+), 12 deletions(-) diff --git a/include/linux/sched/task_stack.h b/include/linux/sched/task_st= ack.h index 892562ebbd3aa..ccd1336aa7f42 100644 --- a/include/linux/sched/task_stack.h +++ b/include/linux/sched/task_stack.h @@ -70,6 +70,7 @@ static inline void *try_get_task_stack(struct task_struct= *tsk) } =20 extern void put_task_stack(struct task_struct *tsk); +extern void put_task_stack_sched(struct task_struct *tsk); #else static inline void *try_get_task_stack(struct task_struct *tsk) { @@ -77,6 +78,13 @@ static inline void *try_get_task_stack(struct task_struc= t *tsk) } =20 static inline void put_task_stack(struct task_struct *tsk) {} +static inline void put_task_stack_sched(struct task_struct *tsk) {} +#endif + +#ifdef CONFIG_ARCH_THREAD_STACK_ALLOCATOR +static
inline void task_stack_cleanup(struct task_struct *tsk) {} +#else +extern void task_stack_cleanup(struct task_struct *tsk); #endif =20 void exit_task_stack_account(struct task_struct *tsk); diff --git a/kernel/exit.c b/kernel/exit.c index c303cffe7fdb4..293b280d23192 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -171,6 +171,7 @@ static void delayed_put_task_struct(struct rcu_head *rh= p) kprobe_flush_task(tsk); perf_event_delayed_put(tsk); trace_sched_process_free(tsk); + task_stack_cleanup(tsk); put_task_struct(tsk); } =20 diff --git a/kernel/fork.c b/kernel/fork.c index 5f4e659a922e1..f48f666582b09 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -179,6 +179,16 @@ static inline void free_task_struct(struct task_struct= *tsk) =20 #ifndef CONFIG_ARCH_THREAD_STACK_ALLOCATOR =20 +#define THREAD_STACK_DELAYED_FREE 1UL + +static void thread_stack_mark_delayed_free(struct task_struct *tsk) +{ + unsigned long val =3D (unsigned long)tsk->stack; + + val |=3D THREAD_STACK_DELAYED_FREE; + WRITE_ONCE(tsk->stack, (void *)val); +} + /* * Allocate pages if THREAD_SIZE is >=3D PAGE_SIZE, otherwise use a * kmemcache based allocator. 
@@ -294,7 +304,7 @@ static int alloc_thread_stack_node(struct task_struct *= tsk, int node) return 0; } =20 -static void free_thread_stack(struct task_struct *tsk) +static void free_thread_stack(struct task_struct *tsk, bool cache_only) { int i; =20 @@ -307,7 +317,12 @@ static void free_thread_stack(struct task_struct *tsk) tsk->stack_vm_area =3D NULL; return; } - vfree_atomic(tsk->stack); + if (cache_only) { + thread_stack_mark_delayed_free(tsk); + return; + } + + vfree(tsk->stack); tsk->stack =3D NULL; tsk->stack_vm_area =3D NULL; } @@ -326,8 +341,12 @@ static int alloc_thread_stack_node(struct task_struct = *tsk, int node) return -ENOMEM; } =20 -static void free_thread_stack(struct task_struct *tsk) +static void free_thread_stack(struct task_struct *tsk, bool cache_only) { + if (cache_only) { + thread_stack_mark_delayed_free(tsk); + return; + } __free_pages(virt_to_page(tsk->stack), THREAD_SIZE_ORDER); tsk->stack =3D NULL; } @@ -346,8 +365,12 @@ static int alloc_thread_stack_node(struct task_struct = *tsk, int node) return stack ? 
0 : -ENOMEM; } =20 -static void free_thread_stack(struct task_struct *tsk) +static void free_thread_stack(struct task_struct *tsk, bool cache_only) { + if (cache_only) { + thread_stack_mark_delayed_free(tsk); + return; + } kmem_cache_free(thread_stack_cache, tsk->stack); tsk->stack =3D NULL; } @@ -361,8 +384,19 @@ void thread_stack_cache_init(void) } =20 # endif /* THREAD_SIZE >=3D PAGE_SIZE || defined(CONFIG_VMAP_STACK) */ -#else /* CONFIG_ARCH_THREAD_STACK_ALLOCATOR */ =20 +void task_stack_cleanup(struct task_struct *tsk) +{ + unsigned long val =3D (unsigned long)tsk->stack; + + if (!(val & THREAD_STACK_DELAYED_FREE)) + return; + + WRITE_ONCE(tsk->stack, (void *)(val & ~THREAD_STACK_DELAYED_FREE)); + free_thread_stack(tsk, false); +} + +#else /* CONFIG_ARCH_THREAD_STACK_ALLOCATOR */ static int alloc_thread_stack_node(struct task_struct *tsk, int node) { unsigned long *stack; @@ -464,19 +498,25 @@ void exit_task_stack_account(struct task_struct *tsk) } } =20 -static void release_task_stack(struct task_struct *tsk) +static void release_task_stack(struct task_struct *tsk, bool cache_only) { if (WARN_ON(READ_ONCE(tsk->__state) !=3D TASK_DEAD)) return; /* Better to leak the stack than to free prematurely */ =20 - free_thread_stack(tsk); + free_thread_stack(tsk, cache_only); } =20 #ifdef CONFIG_THREAD_INFO_IN_TASK void put_task_stack(struct task_struct *tsk) { if (refcount_dec_and_test(&tsk->stack_refcount)) - release_task_stack(tsk); + release_task_stack(tsk, false); +} + +void put_task_stack_sched(struct task_struct *tsk) +{ + if (refcount_dec_and_test(&tsk->stack_refcount)) + release_task_stack(tsk, true); } #endif =20 @@ -490,7 +530,7 @@ void free_task(struct task_struct *tsk) * The task is finally done with both the stack and thread_info, * so free both. 
*/ - release_task_stack(tsk); + release_task_stack(tsk, false); #else /* * If the task had a separate stack allocation, it should be gone @@ -990,7 +1030,7 @@ static struct task_struct *dup_task_struct(struct task= _struct *orig, int node) =20 free_stack: exit_task_stack_account(tsk); - free_thread_stack(tsk); + free_thread_stack(tsk, false); free_tsk: free_task_struct(tsk); return NULL; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 2e4ae00e52d14..bfcb45c3e59dc 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4894,8 +4894,11 @@ static struct rq *finish_task_switch(struct task_str= uct *prev) if (prev->sched_class->task_dead) prev->sched_class->task_dead(prev); =20 - /* Task is done with its stack. */ - put_task_stack(prev); + /* + * Cache only the VMAP stack. The final deallocation is in + * delayed_put_task_struct. + */ + put_task_stack_sched(prev); =20 put_task_struct_rcu_user(prev); } --=20 2.34.1 From nobody Tue Jun 30 04:37:39 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 104DCC433F5 for ; Tue, 25 Jan 2022 15:29:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349383AbiAYP3K (ORCPT ); Tue, 25 Jan 2022 10:29:10 -0500 Received: from Galois.linutronix.de ([193.142.43.55]:50162 "EHLO galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245413AbiAYP1G (ORCPT ); Tue, 25 Jan 2022 10:27:06 -0500 From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1643124421; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5xPl6JJmaxFMiMF1HCiC/9kPvONkq8tgAsmevFg4jzY=; 
b=iVdADx/xPjvAQGiDep5yk3M0GtKam0QpkJOF+cPo0VKvQoISkzoHagUSU74m7xt4GbZO64 8dHmy5Sf2ScJDgSLbB+DX9c1bbfUGg6Yf+0peMgk8xnsRFjFrXrQguesBC3tgrCpQPEhGY O4tZNAcZqe4TMpFCS/hVDyO3124Em6DCBKVvqyQ4uMSat3vH9sOZ6pTWyASI1IItTNh2Ya HnzTyhVls1ZPs2gDBKlNuWuX+dNLY2Mu+67j1Ll5VyVmqcR3vkZQ9YAIJgIlLGNjEgoOeq fAn2EXm1+Etl8EHs9sxCoGytaEZ3zZXmZOxXg1Rsm0uDszEYYtmyTb05G3qm3g== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1643124421; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5xPl6JJmaxFMiMF1HCiC/9kPvONkq8tgAsmevFg4jzY=; b=Ha/zzR85LuJA1nsKu3bh/UKX5GfnHRbj7uPJ6fYxNZv6dZoWppUB1t5SYEWJOSgBPxhm6V U4RNFtKaHZUPpVBg== To: linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org Cc: Andy Lutomirski , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Peter Zijlstra , Steven Rostedt , Thomas Gleixner , Vincent Guittot , Sebastian Andrzej Siewior Subject: [PATCH 8/8] kernel/fork: Use IS_ENABLED() in account_kernel_stack(). Date: Tue, 25 Jan 2022 16:26:52 +0100 Message-Id: <20220125152652.1963111-9-bigeasy@linutronix.de> In-Reply-To: <20220125152652.1963111-1-bigeasy@linutronix.de> References: <20220125152652.1963111-1-bigeasy@linutronix.de> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Not strictly needed, but checking CONFIG_VMAP_STACK instead of the result of task_stack_vm_area() allows the compiler to remove the else path in the CONFIG_VMAP_STACK case, where the pointer can't be NULL. Check for CONFIG_VMAP_STACK in order to use the proper path.
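The effect of branching on a compile-time constant instead of a runtime pointer can be sketched in plain C. This is a userspace illustration, not the kernel code: `DEMO_VMAP_STACK_ENABLED` is a hypothetical macro standing in for IS_ENABLED(CONFIG_VMAP_STACK), and the page and stack sizes are assumed values chosen for the example.

```c
#include <assert.h>

/* Hypothetical compile-time switch standing in for IS_ENABLED(CONFIG_VMAP_STACK). */
#define DEMO_VMAP_STACK_ENABLED 1

/* Assumed sizes for the example only. */
#define DEMO_PAGE_SIZE   4096
#define DEMO_THREAD_SIZE (4 * DEMO_PAGE_SIZE)

static long demo_stack_kb;	/* stand-in for the NR_KERNEL_STACK_KB counter */

/* account is +1 when a stack is set up and -1 when it goes away. */
static void demo_account_kernel_stack(int account)
{
	assert(account == 1 || account == -1);

	if (DEMO_VMAP_STACK_ENABLED) {
		int i;

		/* Constant condition: the compiler can discard the else branch. */
		for (i = 0; i < DEMO_THREAD_SIZE / DEMO_PAGE_SIZE; i++)
			demo_stack_kb += account * (DEMO_PAGE_SIZE / 1024);
	} else {
		/* Contiguous stack: account the whole stack in one step. */
		demo_stack_kb += account * (DEMO_THREAD_SIZE / 1024);
	}
}
```

Both branches are still parsed and type-checked, which is the usual argument for this pattern over an #ifdef block: the unused path cannot silently rot.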
Signed-off-by: Sebastian Andrzej Siewior --- kernel/fork.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index f48f666582b09..92cafad00c653 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -468,16 +468,16 @@ void vm_area_free(struct vm_area_struct *vma) =20 static void account_kernel_stack(struct task_struct *tsk, int account) { - void *stack =3D task_stack_page(tsk); - struct vm_struct *vm =3D task_stack_vm_area(tsk); - - if (vm) { + if (IS_ENABLED(CONFIG_VMAP_STACK)) { + struct vm_struct *vm =3D task_stack_vm_area(tsk); int i; =20 for (i =3D 0; i < THREAD_SIZE / PAGE_SIZE; i++) mod_lruvec_page_state(vm->pages[i], NR_KERNEL_STACK_KB, account * (PAGE_SIZE / 1024)); } else { + void *stack =3D task_stack_page(tsk); + /* All stack pages are in the same node. */ mod_lruvec_kmem_state(stack, NR_KERNEL_STACK_KB, account * (THREAD_SIZE / 1024)); --=20 2.34.1