From nobody Thu Dec 18 19:06:15 2025
From: Rick Edgecombe
To: x86@kernel.org, Thomas Gleixner, Ingo Molnar, Andy Lutomirski,
 Borislav Petkov, Dave Hansen, "H . Peter Anvin", Peter Zijlstra,
 hjl.tools@gmail.com, linux-kernel@vger.kernel.org
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH 1/3] x86/shstk: Handle vfork clone failure correctly
Date: Fri, 8 Sep 2023 13:36:53 -0700
Message-Id: <20230908203655.543765-2-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230908203655.543765-1-rick.p.edgecombe@intel.com>
References: <20230908203655.543765-1-rick.p.edgecombe@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Shadow stacks are allocated automatically and freed on exit, depending
on the clone flags. The two cases where new shadow stacks are not
allocated are !CLONE_VM (fork()) and CLONE_VFORK (vfork()). For
!CLONE_VM, although a new stack is not allocated, it can be freed
normally because the free will happen in the child's copy of the VM.

However, for CLONE_VFORK the parent and the child are actually using
the same shadow stack. So the kernel doesn't need to allocate *or* free
a shadow stack for a CLONE_VFORK child. CLONE_VFORK children already
need special tracking to keep the parent from returning to userspace
until the child exits or execs. Shadow stack uses this same tracking to
avoid freeing CLONE_VFORK shadow stacks.

However, the tracking is not set up until the clone has succeeded
(internally). This means that if a CLONE_VFORK fails, the existing
logic will not know it is a CLONE_VFORK and will proceed to unmap the
parent's shadow stack. This error handling cleanup logic runs via
exit_thread() in the bad_fork_cleanup_thread label in copy_process().

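For illustration only, the allocation rule described above can be
written as a small userspace predicate. This is a sketch, not kernel
code: the MODEL_ constants and model_allocates_shstk() are hypothetical
names, and the test mirrors the combined CLONE_VFORK/CLONE_VM condition
that this patch splits up in shstk_alloc_thread_stack().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the CLONE_VM and CLONE_VFORK flag bits. */
#define MODEL_CLONE_VM    0x00000100UL
#define MODEL_CLONE_VFORK 0x00004000UL

/*
 * Hypothetical model of the pre-patch allocation decision: a new shadow
 * stack is allocated only for CLONE_VM without CLONE_VFORK (a thread).
 */
static bool model_allocates_shstk(unsigned long clone_flags)
{
	unsigned long mask = MODEL_CLONE_VFORK | MODEL_CLONE_VM;

	return (clone_flags & mask) == MODEL_CLONE_VM;
}
```

Only a plain CLONE_VM clone yields true here. Both fork() (no CLONE_VM)
and vfork() (CLONE_VFORK) yield false, which are the two cases singled
out above.
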
The issue was seen in the glibc test "posix/tst-spawn3-pidfd" while
running with shadow stack using currently out-of-tree glibc patches.

Fix it by not unmapping the vfork shadow stack in the error case as
well. Since clone is implemented in core code, it is not ideal to pass
the clone flags along the error path in order to have shadow stack code
have symmetric logic in the freeing half of the thread shadow stack
handling.

Instead use the existing state for thread shadow stacks to track
whether the thread is managing its own shadow stack. For CLONE_VFORK,
simply set shstk->base and shstk->size to 0, and have it mean the
thread is not managing a shadow stack and so should skip cleanup work.
Implement this by breaking up the CLONE_VFORK and !CLONE_VM cases in
shstk_alloc_thread_stack() into separate conditionals, since the logic
is now different between them. In the case of CLONE_VFORK && !CLONE_VM,
the existing behavior is to not clean up the shadow stack in the child
(which should go away quickly with either exit or exec), so maintain
that behavior by handling the CLONE_VFORK case first in the allocation
path.

This new logic also cleanly handles the case of normal, successful
CLONE_VFORKs skipping the cleanup of their shadow stacks on exit. So
remove the existing vfork shadow stack freeing logic. This is in
deactivate_mm(), where vfork_done is used to tell if it is a vfork
child that can skip cleaning up the thread shadow stack.

Reported-by: H.J. Lu
Tested-by: H.J. Lu
Fixes: b2926a36b97a ("x86/shstk: Handle thread shadow stack")
Signed-off-by: Rick Edgecombe
---
 arch/x86/include/asm/mmu_context.h |  3 +--
 arch/x86/kernel/shstk.c            | 22 ++++++++++++++++++++--
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 416901d406f8..8dac45a2c7fc 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -186,8 +186,7 @@ do {						\
 #else
 #define deactivate_mm(tsk, mm)			\
 do {						\
-	if (!tsk->vfork_done)			\
-		shstk_free(tsk);		\
+	shstk_free(tsk);			\
 	load_gs_index(0);			\
 	loadsegment(fs, 0);			\
 } while (0)
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index fd689921a1db..ad63252ebebc 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -205,10 +205,21 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long cl
 		return 0;
 
 	/*
-	 * For CLONE_VM, except vfork, the child needs a separate shadow
+	 * For CLONE_VFORK the child will share the parents shadow stack.
+	 * Make sure to clear the internal tracking of the thread shadow
+	 * stack so the freeing logic run for child knows to leave it alone.
+	 */
+	if (clone_flags & CLONE_VFORK) {
+		shstk->base = 0;
+		shstk->size = 0;
+		return 0;
+	}
+
+	/*
+	 * For !CLONE_VM the child will use a copy of the parents shadow
 	 * stack.
 	 */
-	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
+	if (!(clone_flags & CLONE_VM))
 		return 0;
 
 	size = adjust_shstk_size(stack_size);
@@ -408,6 +419,13 @@ void shstk_free(struct task_struct *tsk)
 	if (!tsk->mm || tsk->mm != current->mm)
 		return;
 
+	/*
+	 * If shstk->base is NULL, then this task is not managing its
+	 * own shadow stack (CLONE_VFORK). So skip freeing it.
+	 */
+	if (!shstk->base)
+		return;
+
 	unmap_shadow_stack(shstk->base, shstk->size);
 }
 
-- 
2.34.1

From nobody Thu Dec 18 19:06:15 2025
From: Rick Edgecombe
To: x86@kernel.org, Thomas Gleixner, Ingo Molnar, Andy Lutomirski,
 Borislav Petkov, Dave Hansen, "H . Peter Anvin", Peter Zijlstra,
 hjl.tools@gmail.com, linux-kernel@vger.kernel.org
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH 2/3] x86/shstk: Remove useless clone error handling
Date: Fri, 8 Sep 2023 13:36:54 -0700
Message-Id: <20230908203655.543765-3-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230908203655.543765-1-rick.p.edgecombe@intel.com>
References: <20230908203655.543765-1-rick.p.edgecombe@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

When clone fails after the shadow stack is allocated, any allocated
shadow stack is cleaned up in exit_thread() in copy_process(). So the
logic in copy_thread() is unneeded, and it also does not handle
failures that happen outside of copy_thread().

In addition, since there is a second attempt to unmap the same shadow
stack, there is a race where a newly mapped region could get unmapped.

So remove the logic in copy_thread() and rely on exit_thread() to
handle clone failure.

Fixes: b2926a36b97a ("x86/shstk: Handle thread shadow stack")
Tested-by: H.J. Lu
Signed-off-by: Rick Edgecombe
---
 arch/x86/kernel/process.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9f0909142a0a..b6f4e8399fca 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -257,13 +257,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP)))
 		io_bitmap_share(p);
 
-	/*
-	 * If copy_thread() if failing, don't leak the shadow stack possibly
-	 * allocated in shstk_alloc_thread_stack() above.
-	 */
-	if (ret)
-		shstk_free(p);
-
 	return ret;
 }
 
-- 
2.34.1

From nobody Thu Dec 18 19:06:15 2025
From: Rick Edgecombe
To: x86@kernel.org, Thomas Gleixner, Ingo Molnar, Andy Lutomirski,
 Borislav Petkov, Dave Hansen, "H . Peter Anvin", Peter Zijlstra,
 hjl.tools@gmail.com, linux-kernel@vger.kernel.org
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH 3/3] x86/shstk: Add warning for shadow stack double unmap
Date: Fri, 8 Sep 2023 13:36:55 -0700
Message-Id: <20230908203655.543765-4-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230908203655.543765-1-rick.p.edgecombe@intel.com>
References: <20230908203655.543765-1-rick.p.edgecombe@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

There are several ways a thread's shadow stack can get unmapped. This
can happen on exit or exec, as well as in error handling in exec or
clone. The task struct already keeps track of the thread's shadow
stack. Use the size variable to track whether the shadow stack has
already been freed.

When an attempt to double unmap the thread shadow stack is caught, warn
about it and abort the operation.

Tested-by: H.J. Lu
Signed-off-by: Rick Edgecombe
---
 arch/x86/kernel/shstk.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index ad63252ebebc..59e15dd8d0f8 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -426,7 +426,18 @@ void shstk_free(struct task_struct *tsk)
 	if (!shstk->base)
 		return;
 
+	/*
+	 * shstk->base is NULL for CLONE_VFORK child tasks, and so is
+	 * normal. But size = 0 on a shstk->base is not normal and
+	 * indicated an attempt to free the thread shadow stack twice.
+	 * Warn about it.
+	 */
+	if (WARN_ON(!shstk->size))
+		return;
+
 	unmap_shadow_stack(shstk->base, shstk->size);
+
+	shstk->size = 0;
 }
 
 static int wrss_control(bool enable)
-- 
2.34.1
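
Taken together, patches 1 and 3 make shstk->base and shstk->size act as
a small state machine: base == 0 means "not managing a shadow stack",
and a non-zero base with size == 0 means "already freed". The userspace
sketch below models only that bookkeeping, under stated assumptions.
All model_ names and MODEL_ constants are hypothetical, the counters
stand in for unmap_shadow_stack() and WARN_ON(), and the tsk->mm checks
in shstk_free() are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the CLONE_VM and CLONE_VFORK flag bits. */
#define MODEL_CLONE_VM    0x00000100UL
#define MODEL_CLONE_VFORK 0x00004000UL

/* Hypothetical model of the per-thread shadow stack tracking. */
struct model_shstk {
	unsigned long base;
	unsigned long size;
};

static int model_unmap_count;	/* stands in for unmap_shadow_stack() calls */
static int model_warn_count;	/* stands in for WARN_ON() firing */

/* Models the allocation path after patch 1: CLONE_VFORK is checked first. */
static void model_alloc_thread_stack(struct model_shstk *s,
				     unsigned long clone_flags,
				     unsigned long new_base,
				     unsigned long new_size)
{
	if (clone_flags & MODEL_CLONE_VFORK) {
		/* The child borrows the parent's stack: mark it unmanaged. */
		s->base = 0;
		s->size = 0;
		return;
	}

	if (!(clone_flags & MODEL_CLONE_VM))
		return;		/* fork(): keep the inherited values */

	s->base = new_base;
	s->size = new_size;
}

/* Models the free path after patches 1 and 3; true if it "unmapped". */
static bool model_free(struct model_shstk *s)
{
	if (!s->base)
		return false;		/* vfork child: nothing to free */

	if (!s->size) {
		model_warn_count++;	/* double free attempt caught */
		return false;
	}

	model_unmap_count++;
	s->size = 0;			/* remember that it was freed */
	return true;
}
```

In this model a vfork child never unmaps anything, whether the clone
succeeds or fails, because the free path needs no clone flags to decide.
A second model_free() on the same thread only bumps the warning counter.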