From nobody Fri Dec 27 06:06:14 2024
From: Jinjie Ruan
Subject: [PATCH -next v5 01/22] arm64: ptrace: Replace interrupts_enabled() with regs_irqs_disabled()
Date: Fri, 6 Dec 2024 18:17:23 +0800
Message-ID: <20241206101744.4161990-2-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>
References: <20241206101744.4161990-1-ruanjinjie@huawei.com>

The generic entry code expects architecture code to provide a
regs_irqs_disabled(regs) function, but arm64 does not have this and
instead provides interrupts_enabled(regs), which has the opposite
polarity. In preparation for moving arm64 over to the generic entry
code, replace arm64's interrupts_enabled() with regs_irqs_disabled()
and update its callers under arch/arm64.

For the moment, a definition of interrupts_enabled() is provided for
the GICv3 driver. Once arch/arm implements regs_irqs_disabled(), this
can be removed.

No functional changes.
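As an illustration of the polarity flip, here is a minimal userspace
sketch (editor's illustration, not part of the patch; struct fake_regs
and the macro bodies are stand-ins mirroring the arm64 definitions)
that checks the new predicate is the exact complement of the old one
for every combination of the I bit and the priority mask:

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define PSR_I_BIT        0x80UL  /* mirrors arm64's PSR_I_BIT */

/* Minimal stand-in for pt_regs, just enough for the two predicates. */
struct fake_regs {
        unsigned long pstate;
        bool prio_unmasked;
};

#define irqs_priority_unmasked(r)       ((r)->prio_unmasked)

#define interrupts_enabled(r) \
        (!((r)->pstate & PSR_I_BIT) && irqs_priority_unmasked(r))

#define regs_irqs_disabled(r) \
        (((r)->pstate & PSR_I_BIT) || (!irqs_priority_unmasked(r)))

int main(void)
{
        /* Check all four combinations of the I bit and the priority mask. */
        for (int i = 0; i < 2; i++) {
                for (int p = 0; p < 2; p++) {
                        struct fake_regs r = {
                                .pstate = i ? PSR_I_BIT : 0,
                                .prio_unmasked = p,
                        };
                        assert(regs_irqs_disabled(&r) == !interrupts_enabled(&r));
                }
        }
        puts("regs_irqs_disabled() is the exact complement of interrupts_enabled()");
        return 0;
}

This is why every converted caller simply inverts its condition.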
Suggested-by: Mark Rutland
Signed-off-by: Jinjie Ruan
---
 arch/arm64/include/asm/daifflags.h  | 2 +-
 arch/arm64/include/asm/ptrace.h     | 7 +++++++
 arch/arm64/include/asm/xen/events.h | 2 +-
 arch/arm64/kernel/acpi.c            | 2 +-
 arch/arm64/kernel/debug-monitors.c  | 2 +-
 arch/arm64/kernel/entry-common.c    | 4 ++--
 arch/arm64/kernel/sdei.c            | 2 +-
 7 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index fbb5c99eb2f9..5fca48009043 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -128,7 +128,7 @@ static inline void local_daif_inherit(struct pt_regs *regs)
 {
        unsigned long flags = regs->pstate & DAIF_MASK;
 
-       if (interrupts_enabled(regs))
+       if (!regs_irqs_disabled(regs))
                trace_hardirqs_on();
 
        if (system_uses_irq_prio_masking())
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index 47ff8654c5ec..bcfa96880377 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -214,9 +214,16 @@ static inline void forget_syscall(struct pt_regs *regs)
         (regs)->pmr == GIC_PRIO_IRQON :                        \
         true)
 
+/*
+ * Used by the GICv3 driver, can be removed once arch/arm implements
+ * regs_irqs_disabled() directly.
+ */
 #define interrupts_enabled(regs)                       \
        (!((regs)->pstate & PSR_I_BIT) && irqs_priority_unmasked(regs))
 
+#define regs_irqs_disabled(regs)                       \
+       (((regs)->pstate & PSR_I_BIT) || (!irqs_priority_unmasked(regs)))
+
 #define fast_interrupts_enabled(regs) \
        (!((regs)->pstate & PSR_F_BIT))
 
diff --git a/arch/arm64/include/asm/xen/events.h b/arch/arm64/include/asm/xen/events.h
index 2788e95d0ff0..2977b5fe068d 100644
--- a/arch/arm64/include/asm/xen/events.h
+++ b/arch/arm64/include/asm/xen/events.h
@@ -14,7 +14,7 @@ enum ipi_vector {
 
 static inline int xen_irqs_disabled(struct pt_regs *regs)
 {
-       return !interrupts_enabled(regs);
+       return regs_irqs_disabled(regs);
 }
 
 #define xchg_xen_ulong(ptr, val) xchg((ptr), (val))
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index e6f66491fbe9..732f89daae23 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -403,7 +403,7 @@ int apei_claim_sea(struct pt_regs *regs)
        return_to_irqs_enabled = !irqs_disabled_flags(arch_local_save_flags());
 
        if (regs)
-               return_to_irqs_enabled = interrupts_enabled(regs);
+               return_to_irqs_enabled = !regs_irqs_disabled(regs);
 
        /*
         * SEA can interrupt SError, mask it and describe this as an NMI so
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 58f047de3e1c..460c09d03a73 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -231,7 +231,7 @@ static void send_user_sigtrap(int si_code)
        if (WARN_ON(!user_mode(regs)))
                return;
 
-       if (interrupts_enabled(regs))
+       if (!regs_irqs_disabled(regs))
                local_irq_enable();
 
        arm64_force_sig_fault(SIGTRAP, si_code, instruction_pointer(regs),
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index b260ddc4d3e9..c547e70428d3 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -73,7 +73,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
 {
        lockdep_assert_irqs_disabled();
 
-       if (interrupts_enabled(regs)) {
+       if (!regs_irqs_disabled(regs)) {
                if (regs->exit_rcu) {
                        trace_hardirqs_on_prepare();
                        lockdep_hardirqs_on_prepare();
@@ -569,7 +569,7 @@ static void noinstr el1_interrupt(struct pt_regs *regs,
 {
        write_sysreg(DAIF_PROCCTX_NOIRQ, daif);
 
-       if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
+       if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && regs_irqs_disabled(regs))
                __el1_pnmi(regs, handler);
        else
                __el1_irq(regs, handler);
diff --git a/arch/arm64/kernel/sdei.c b/arch/arm64/kernel/sdei.c
index 255d12f881c2..27a17da635d8 100644
--- a/arch/arm64/kernel/sdei.c
+++ b/arch/arm64/kernel/sdei.c
@@ -247,7 +247,7 @@ unsigned long __kprobes do_sdei_event(struct pt_regs *regs,
         * If we interrupted the kernel with interrupts masked, we always go
         * back to wherever we came from.
         */
-       if (mode == kernel_mode && !interrupts_enabled(regs))
+       if (mode == kernel_mode && regs_irqs_disabled(regs))
                return SDEI_EV_HANDLED;
 
        /*
-- 
2.34.1

From nobody Fri Dec 27 06:06:14 2024
From: Jinjie Ruan
Subject: [PATCH -next v5 02/22] arm64: entry: Refactor the entry and exit for exceptions from EL1
Date: Fri, 6 Dec 2024 18:17:24 +0800
Message-ID: <20241206101744.4161990-3-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>
References: <20241206101744.4161990-1-ruanjinjie@huawei.com>
The generic entry code uses irqentry_state_t to track lockdep and RCU
state across exception entry and return. For historical reasons, arm64
embeds similar fields within its pt_regs structure.

In preparation for moving arm64 over to the generic entry code, pull
these fields out of arm64's pt_regs and into a separate structure,
matching the style of the generic entry code.

No functional changes.

Suggested-by: Mark Rutland
Signed-off-by: Jinjie Ruan
---
 arch/arm64/include/asm/ptrace.h  |   4 -
 arch/arm64/kernel/entry-common.c | 136 +++++++++++++++++++------------
 2 files changed, 85 insertions(+), 55 deletions(-)

diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index bcfa96880377..e90dfc9982aa 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -169,10 +169,6 @@ struct pt_regs {
 
        u64 sdei_ttbr1;
        struct frame_record_meta stackframe;
-
-       /* Only valid for some EL1 exceptions. */
-       u64 lockdep_hardirqs;
-       u64 exit_rcu;
 };
 
 /* For correct stack alignment, pt_regs has to be a multiple of 16 bytes. */
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index c547e70428d3..1687627b2ecf 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -28,6 +28,13 @@
 #include
 #include
 
+typedef struct irqentry_state {
+       union {
+               bool    exit_rcu;
+               bool    lockdep;
+       };
+} irqentry_state_t;
+
 /*
  * Handle IRQ/context state management when entering from kernel mode.
  * Before this function is called it is not safe to call regular kernel code,
@@ -36,29 +43,36 @@
  * This is intended to match the logic in irqentry_enter(), handling the kernel
  * mode transitions only.
  */
-static __always_inline void __enter_from_kernel_mode(struct pt_regs *regs)
+static __always_inline irqentry_state_t __enter_from_kernel_mode(struct pt_regs *regs)
 {
-       regs->exit_rcu = false;
+       irqentry_state_t state = {
+               .exit_rcu = false,
+       };
 
        if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
                lockdep_hardirqs_off(CALLER_ADDR0);
                ct_irq_enter();
                trace_hardirqs_off_finish();
 
-               regs->exit_rcu = true;
-               return;
+               state.exit_rcu = true;
+               return state;
        }
 
        lockdep_hardirqs_off(CALLER_ADDR0);
        rcu_irq_enter_check_tick();
        trace_hardirqs_off_finish();
+
+       return state;
 }
 
-static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
+static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
 {
-       __enter_from_kernel_mode(regs);
+       irqentry_state_t state = __enter_from_kernel_mode(regs);
+
        mte_check_tfsr_entry();
        mte_disable_tco_entry(current);
+
+       return state;
 }
 
 /*
@@ -69,12 +83,13 @@ static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
  * This is intended to match the logic in irqentry_exit(), handling the kernel
  * mode transitions only, and with preemption handled elsewhere.
  */
-static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
+static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
+                                                 irqentry_state_t state)
 {
        lockdep_assert_irqs_disabled();
 
        if (!regs_irqs_disabled(regs)) {
-               if (regs->exit_rcu) {
+               if (state.exit_rcu) {
                        trace_hardirqs_on_prepare();
                        lockdep_hardirqs_on_prepare();
                        ct_irq_exit();
@@ -84,15 +99,16 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
 
                trace_hardirqs_on();
        } else {
-               if (regs->exit_rcu)
+               if (state.exit_rcu)
                        ct_irq_exit();
        }
 }
 
-static void noinstr exit_to_kernel_mode(struct pt_regs *regs)
+static void noinstr exit_to_kernel_mode(struct pt_regs *regs,
+                                       irqentry_state_t state)
 {
        mte_check_tfsr_exit();
-       __exit_to_kernel_mode(regs);
+       __exit_to_kernel_mode(regs, state);
 }
 
 /*
@@ -190,9 +206,11 @@ asmlinkage void noinstr asm_exit_to_user_mode(struct pt_regs *regs)
  * mode. Before this function is called it is not safe to call regular kernel
  * code, instrumentable code, or any code which may trigger an exception.
  */
-static void noinstr arm64_enter_nmi(struct pt_regs *regs)
+static noinstr irqentry_state_t arm64_enter_nmi(struct pt_regs *regs)
 {
-       regs->lockdep_hardirqs = lockdep_hardirqs_enabled();
+       irqentry_state_t state;
+
+       state.lockdep = lockdep_hardirqs_enabled();
 
        __nmi_enter();
        lockdep_hardirqs_off(CALLER_ADDR0);
@@ -201,6 +219,8 @@ static void noinstr arm64_enter_nmi(struct pt_regs *regs)
 
        trace_hardirqs_off_finish();
        ftrace_nmi_enter();
+
+       return state;
 }
 
 /*
@@ -208,19 +228,18 @@ static void noinstr arm64_enter_nmi(struct pt_regs *regs)
  * mode. After this function returns it is not safe to call regular kernel
  * code, instrumentable code, or any code which may trigger an exception.
  */
-static void noinstr arm64_exit_nmi(struct pt_regs *regs)
+static void noinstr arm64_exit_nmi(struct pt_regs *regs,
+                                  irqentry_state_t state)
 {
-       bool restore = regs->lockdep_hardirqs;
-
        ftrace_nmi_exit();
-       if (restore) {
+       if (state.lockdep) {
                trace_hardirqs_on_prepare();
                lockdep_hardirqs_on_prepare();
        }
 
        ct_nmi_exit();
        lockdep_hardirq_exit();
-       if (restore)
+       if (state.lockdep)
                lockdep_hardirqs_on(CALLER_ADDR0);
        __nmi_exit();
 }
@@ -230,14 +249,18 @@ static void noinstr arm64_exit_nmi(struct pt_regs *regs)
  * kernel mode. Before this function is called it is not safe to call regular
  * kernel code, instrumentable code, or any code which may trigger an exception.
  */
-static void noinstr arm64_enter_el1_dbg(struct pt_regs *regs)
+static noinstr irqentry_state_t arm64_enter_el1_dbg(struct pt_regs *regs)
 {
-       regs->lockdep_hardirqs = lockdep_hardirqs_enabled();
+       irqentry_state_t state;
+
+       state.lockdep = lockdep_hardirqs_enabled();
 
        lockdep_hardirqs_off(CALLER_ADDR0);
        ct_nmi_enter();
 
        trace_hardirqs_off_finish();
+
+       return state;
 }
 
 /*
@@ -245,17 +268,16 @@ static void noinstr arm64_enter_el1_dbg(struct pt_regs *regs)
  * kernel mode. After this function returns it is not safe to call regular
  * kernel code, instrumentable code, or any code which may trigger an exception.
  */
-static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs)
+static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs,
+                                      irqentry_state_t state)
 {
-       bool restore = regs->lockdep_hardirqs;
-
-       if (restore) {
+       if (state.lockdep) {
                trace_hardirqs_on_prepare();
                lockdep_hardirqs_on_prepare();
        }
 
        ct_nmi_exit();
-       if (restore)
+       if (state.lockdep)
                lockdep_hardirqs_on(CALLER_ADDR0);
 }
 
@@ -426,78 +448,86 @@ UNHANDLED(el1t, 64, error)
 static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
 {
        unsigned long far = read_sysreg(far_el1);
+       irqentry_state_t state;
 
-       enter_from_kernel_mode(regs);
+       state = enter_from_kernel_mode(regs);
        local_daif_inherit(regs);
        do_mem_abort(far, esr, regs);
        local_daif_mask();
-       exit_to_kernel_mode(regs);
+       exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_pc(struct pt_regs *regs, unsigned long esr)
 {
        unsigned long far = read_sysreg(far_el1);
+       irqentry_state_t state;
 
-       enter_from_kernel_mode(regs);
+       state = enter_from_kernel_mode(regs);
        local_daif_inherit(regs);
        do_sp_pc_abort(far, esr, regs);
        local_daif_mask();
-       exit_to_kernel_mode(regs);
+       exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_undef(struct pt_regs *regs, unsigned long esr)
 {
-       enter_from_kernel_mode(regs);
+       irqentry_state_t state = enter_from_kernel_mode(regs);
+
        local_daif_inherit(regs);
        do_el1_undef(regs, esr);
        local_daif_mask();
-       exit_to_kernel_mode(regs);
+       exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_bti(struct pt_regs *regs, unsigned long esr)
 {
-       enter_from_kernel_mode(regs);
+       irqentry_state_t state = enter_from_kernel_mode(regs);
+
        local_daif_inherit(regs);
        do_el1_bti(regs, esr);
        local_daif_mask();
-       exit_to_kernel_mode(regs);
+       exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_gcs(struct pt_regs *regs, unsigned long esr)
 {
-       enter_from_kernel_mode(regs);
+       irqentry_state_t state = enter_from_kernel_mode(regs);
+
        local_daif_inherit(regs);
        do_el1_gcs(regs, esr);
        local_daif_mask();
-       exit_to_kernel_mode(regs);
+       exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_mops(struct pt_regs *regs, unsigned long esr)
 {
-       enter_from_kernel_mode(regs);
+       irqentry_state_t state = enter_from_kernel_mode(regs);
+
        local_daif_inherit(regs);
        do_el1_mops(regs, esr);
        local_daif_mask();
-       exit_to_kernel_mode(regs);
+       exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_dbg(struct pt_regs *regs, unsigned long esr)
 {
        unsigned long far = read_sysreg(far_el1);
+       irqentry_state_t state;
 
-       arm64_enter_el1_dbg(regs);
+       state = arm64_enter_el1_dbg(regs);
        if (!cortex_a76_erratum_1463225_debug_handler(regs))
                do_debug_exception(far, esr, regs);
-       arm64_exit_el1_dbg(regs);
+       arm64_exit_el1_dbg(regs, state);
 }
 
 static void noinstr el1_fpac(struct pt_regs *regs, unsigned long esr)
 {
-       enter_from_kernel_mode(regs);
+       irqentry_state_t state = enter_from_kernel_mode(regs);
+
        local_daif_inherit(regs);
        do_el1_fpac(regs, esr);
        local_daif_mask();
-       exit_to_kernel_mode(regs);
+       exit_to_kernel_mode(regs, state);
 }
 
 asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
@@ -546,15 +576,16 @@ asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
 static __always_inline void __el1_pnmi(struct pt_regs *regs,
                                       void (*handler)(struct pt_regs *))
 {
-       arm64_enter_nmi(regs);
+       irqentry_state_t state = arm64_enter_nmi(regs);
+
        do_interrupt_handler(regs, handler);
-       arm64_exit_nmi(regs);
+       arm64_exit_nmi(regs, state);
 }
 
 static __always_inline void __el1_irq(struct pt_regs *regs,
                                      void (*handler)(struct pt_regs *))
 {
-       enter_from_kernel_mode(regs);
+       irqentry_state_t state = enter_from_kernel_mode(regs);
 
        irq_enter_rcu();
        do_interrupt_handler(regs, handler);
@@ -562,7 +593,7 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
 
        arm64_preempt_schedule_irq();
 
-       exit_to_kernel_mode(regs);
+       exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_interrupt(struct pt_regs *regs,
                                  void (*handler)(struct pt_regs *))
@@ -588,11 +619,12 @@ asmlinkage void noinstr el1h_64_fiq_handler(struct pt_regs *regs)
 asmlinkage void noinstr el1h_64_error_handler(struct pt_regs *regs)
 {
        unsigned long esr = read_sysreg(esr_el1);
+       irqentry_state_t state;
 
        local_daif_restore(DAIF_ERRCTX);
-       arm64_enter_nmi(regs);
+       state = arm64_enter_nmi(regs);
        do_serror(regs, esr);
-       arm64_exit_nmi(regs);
+       arm64_exit_nmi(regs, state);
 }
 
 static void noinstr el0_da(struct pt_regs *regs, unsigned long esr)
@@ -855,12 +887,13 @@ asmlinkage void noinstr el0t_64_fiq_handler(struct pt_regs *regs)
 static void noinstr __el0_error_handler_common(struct pt_regs *regs)
 {
        unsigned long esr = read_sysreg(esr_el1);
+       irqentry_state_t state;
 
        enter_from_user_mode(regs);
        local_daif_restore(DAIF_ERRCTX);
-       arm64_enter_nmi(regs);
+       state = arm64_enter_nmi(regs);
        do_serror(regs, esr);
-       arm64_exit_nmi(regs);
+       arm64_exit_nmi(regs, state);
        local_daif_restore(DAIF_PROCCTX);
        exit_to_user_mode(regs);
 }
@@ -968,6 +1001,7 @@ asmlinkage void noinstr __noreturn handle_bad_stack(struct pt_regs *regs)
 asmlinkage noinstr unsigned long
 __sdei_handler(struct pt_regs *regs, struct sdei_registered_event *arg)
 {
+       irqentry_state_t state;
        unsigned long ret;
 
        /*
@@ -992,9 +1026,9 @@ __sdei_handler(struct pt_regs *regs, struct sdei_registered_event *arg)
        else if (cpu_has_pan())
                set_pstate_pan(0);
 
-       arm64_enter_nmi(regs);
+       state = arm64_enter_nmi(regs);
        ret = do_sdei_event(regs, arg);
-       arm64_exit_nmi(regs);
+       arm64_exit_nmi(regs, state);
 
        return ret;
 }
-- 
2.34.1

From nobody Fri Dec 27 06:06:14 2024
From: Jinjie Ruan
Subject: [PATCH -next v5 03/22] arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode()
Date: Fri, 6 Dec 2024 18:17:25 +0800
Message-ID: <20241206101744.4161990-4-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>
References: <20241206101744.4161990-1-ruanjinjie@huawei.com>

The generic entry code tries to reschedule on every return from a
kernel-mode non-NMI exception. At the moment, arm64 reschedules only
on return from EL1 IRQ exceptions.

In preparation for moving arm64 over to the generic entry code, move
arm64_preempt_schedule_irq() into exit_to_kernel_mode(), so that not
only EL1 IRQ returns but all EL1 non-NMI exception returns get a
chance to reschedule. Rescheduling is only possible if IRQs were
enabled when the exception was taken, so call
arm64_preempt_schedule_irq() in the branch where
regs_irqs_disabled() is false; it will still only try to reschedule
when TINY_RCU is enabled or current is not an idle task.

As Mark pointed out, this change will have the following two key
impacts:

- "We'll preempt even without taking a "real" interrupt. That
   shouldn't result in preemption that wasn't possible before, but it
   does change the probability of preempting at certain points, and
   might have a performance impact, so probably warrants a benchmark."

- "We will not preempt when taking interrupts from a region of kernel
   code where IRQs are enabled but RCU is not watching, matching the
   behaviour of the generic entry code. This has the potential to
   introduce livelock if we can ever have a screaming interrupt in
   such a region, so we'll need to go figure out whether that's
   actually a problem. Having this as a separate patch will make it
   easier to test/bisect for that specifically."
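A simplified model of the control-flow change (editor's sketch; all
names here are hypothetical stand-ins for the kernel functions, not
the kernel code itself):

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for preempt_schedule_irq(). */
static void fake_preempt_schedule_irq(void)
{
        puts("reschedule point reached");
}

/*
 * Model of __exit_to_kernel_mode() after this change: the reschedule
 * point sits on the common exit path taken by every non-NMI EL1
 * exception, but fires only when the interrupted context had IRQs
 * enabled and RCU was watching (exit_rcu == false).
 */
static void exit_to_kernel_mode_model(bool irqs_disabled_at_entry,
                                      bool exit_rcu)
{
        if (!irqs_disabled_at_entry) {
                if (exit_rcu)
                        return; /* interrupted idle: RCU off, no preemption */
                fake_preempt_schedule_irq();
        }
}

int main(void)
{
        exit_to_kernel_mode_model(false, false);        /* may reschedule */
        exit_to_kernel_mode_model(false, true);         /* idle: skipped */
        exit_to_kernel_mode_model(true, false);         /* IRQs masked: skipped */
        return 0;
}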
Suggested-by: Mark Rutland
Signed-off-by: Jinjie Ruan
---
 arch/arm64/kernel/entry-common.c | 88 ++++++++++++++++----------------
 1 file changed, 44 insertions(+), 44 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 1687627b2ecf..7a588515ee07 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -75,6 +75,48 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
        return state;
 }
 
+#ifdef CONFIG_PREEMPT_DYNAMIC
+DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
+#define need_irq_preemption() \
+       (static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
+#else
+#define need_irq_preemption()  (IS_ENABLED(CONFIG_PREEMPTION))
+#endif
+
+static void __sched arm64_preempt_schedule_irq(void)
+{
+       if (!need_irq_preemption())
+               return;
+
+       /*
+        * Note: thread_info::preempt_count includes both thread_info::count
+        * and thread_info::need_resched, and is not equivalent to
+        * preempt_count().
+        */
+       if (READ_ONCE(current_thread_info()->preempt_count) != 0)
+               return;
+
+       /*
+        * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
+        * priority masking is used the GIC irqchip driver will clear DAIF.IF
+        * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
+        * DAIF we must have handled an NMI, so skip preemption.
+        */
+       if (system_uses_irq_prio_masking() && read_sysreg(daif))
+               return;
+
+       /*
+        * Preempting a task from an IRQ means we leave copies of PSTATE
+        * on the stack. cpufeature's enable calls may modify PSTATE, but
+        * resuming one of these preempted tasks would undo those changes.
+        *
+        * Only allow a task to be preempted once cpufeatures have been
+        * enabled.
+        */
+       if (system_capabilities_finalized())
+               preempt_schedule_irq();
+}
+
 /*
  * Handle IRQ/context state management when exiting to kernel mode.
  * After this function returns it is not safe to call regular kernel code,
@@ -97,6 +139,8 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
                        return;
                }
 
+               arm64_preempt_schedule_irq();
+
                trace_hardirqs_on();
        } else {
                if (state.exit_rcu)
@@ -281,48 +325,6 @@ static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs,
                lockdep_hardirqs_on(CALLER_ADDR0);
 }
 
-#ifdef CONFIG_PREEMPT_DYNAMIC
-DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
-#define need_irq_preemption() \
-       (static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
-#else
-#define need_irq_preemption()  (IS_ENABLED(CONFIG_PREEMPTION))
-#endif
-
-static void __sched arm64_preempt_schedule_irq(void)
-{
-       if (!need_irq_preemption())
-               return;
-
-       /*
-        * Note: thread_info::preempt_count includes both thread_info::count
-        * and thread_info::need_resched, and is not equivalent to
-        * preempt_count().
-        */
-       if (READ_ONCE(current_thread_info()->preempt_count) != 0)
-               return;
-
-       /*
-        * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
-        * priority masking is used the GIC irqchip driver will clear DAIF.IF
-        * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
-        * DAIF we must have handled an NMI, so skip preemption.
-        */
-       if (system_uses_irq_prio_masking() && read_sysreg(daif))
-               return;
-
-       /*
-        * Preempting a task from an IRQ means we leave copies of PSTATE
-        * on the stack. cpufeature's enable calls may modify PSTATE, but
-        * resuming one of these preempted tasks would undo those changes.
-        *
-        * Only allow a task to be preempted once cpufeatures have been
-        * enabled.
-        */
-       if (system_capabilities_finalized())
-               preempt_schedule_irq();
-}
-
 static void do_interrupt_handler(struct pt_regs *regs,
                                 void (*handler)(struct pt_regs *))
 {
@@ -591,8 +593,6 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
        do_interrupt_handler(regs, handler);
        irq_exit_rcu();
 
-       arm64_preempt_schedule_irq();
-
        exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_interrupt(struct pt_regs *regs,
-- 
2.34.1

From nobody Fri Dec 27 06:06:14 2024
From: Jinjie Ruan
Subject: [PATCH -next v5 04/22] arm64: entry: Rework arm64_preempt_schedule_irq()
Date: Fri, 6 Dec 2024 18:17:26 +0800
Message-ID: <20241206101744.4161990-5-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>
References: <20241206101744.4161990-1-ruanjinjie@huawei.com>

The generic entry code calls preempt_schedule_irq() after checking
whether need_resched() is satisfied, but arm64 has some additional
checks of its own, such as GIC priority masking (see the sketch
below).
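The shape of the refactoring, as a minimal userspace sketch (editor's
illustration; the fake_* helpers and *_model names are hypothetical
stand-ins, not kernel code): the arch-specific checks collapse into a
boolean predicate, so a caller can decide *whether* to reschedule
while the arch only answers *if it is safe* to do so.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the arm64-specific conditions. */
static bool fake_prio_mask_blocks_irqs(void) { return false; }
static bool fake_cpufeatures_finalized(void) { return true; }

static bool arm64_need_resched_model(void)
{
        if (fake_prio_mask_blocks_irqs())
                return false;   /* we must have handled an NMI */

        if (!fake_cpufeatures_finalized())
                return false;   /* PSTATE may still change */

        return true;
}

int main(void)
{
        /* The caller, not the predicate, performs the reschedule. */
        if (arm64_need_resched_model())
                puts("would call preempt_schedule_irq()");
        return 0;
}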
In preparation for moving arm64 over to the generic entry code, rework
arm64_preempt_schedule_irq() so that the resched conditions are
checked in a helper function called arm64_need_resched().

No functional changes.

Signed-off-by: Jinjie Ruan
---
 arch/arm64/kernel/entry-common.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 7a588515ee07..da68c089b74b 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -83,10 +83,10 @@ DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
 #define need_irq_preemption()  (IS_ENABLED(CONFIG_PREEMPTION))
 #endif
 
-static void __sched arm64_preempt_schedule_irq(void)
+static inline bool arm64_need_resched(void)
 {
        if (!need_irq_preemption())
-               return;
+               return false;
 
        /*
         * Note: thread_info::preempt_count includes both thread_info::count
@@ -94,7 +94,7 @@ static void __sched arm64_preempt_schedule_irq(void)
         * preempt_count().
         */
        if (READ_ONCE(current_thread_info()->preempt_count) != 0)
-               return;
+               return false;
 
        /*
         * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
@@ -103,7 +103,7 @@ static void __sched arm64_preempt_schedule_irq(void)
         * DAIF we must have handled an NMI, so skip preemption.
         */
        if (system_uses_irq_prio_masking() && read_sysreg(daif))
-               return;
+               return false;
 
        /*
         * Preempting a task from an IRQ means we leave copies of PSTATE
@@ -113,8 +113,10 @@ static void __sched arm64_preempt_schedule_irq(void)
         * Only allow a task to be preempted once cpufeatures have been
         * enabled.
         */
-       if (system_capabilities_finalized())
-               preempt_schedule_irq();
+       if (!system_capabilities_finalized())
+               return false;
+
+       return true;
 }
 
 /*
@@ -139,7 +141,8 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
                        return;
                }
 
-               arm64_preempt_schedule_irq();
+               if (arm64_need_resched())
+                       preempt_schedule_irq();
 
                trace_hardirqs_on();
        } else {
-- 
2.34.1

From nobody Fri Dec 27 06:06:14 2024
From: Jinjie Ruan
Subject: [PATCH -next v5 05/22] arm64: entry: Use preempt_count() and need_resched() helper
Date: Fri, 6 Dec 2024 18:17:27 +0800
Message-ID: <20241206101744.4161990-6-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>
References: <20241206101744.4161990-1-ruanjinjie@huawei.com>

The generic entry code uses the preempt_count() and need_resched()
helpers to check whether it is time to resched. Currently, arm64 uses
its own check logic, "READ_ONCE(current_thread_info()->preempt_count)
== 0", which is equivalent to "preempt_count() == 0 &&
need_resched()".

In preparation for moving arm64 over to the generic entry code, use
these helpers to replace arm64's own code and move it ahead.

No functional changes.

Signed-off-by: Jinjie Ruan
---
 arch/arm64/kernel/entry-common.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index da68c089b74b..efd1a990d138 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -88,14 +88,6 @@ static inline bool arm64_need_resched(void)
        if (!need_irq_preemption())
                return false;
 
-       /*
-        * Note: thread_info::preempt_count includes both thread_info::count
-        * and thread_info::need_resched, and is not equivalent to
-        * preempt_count().
-        */
-       if (READ_ONCE(current_thread_info()->preempt_count) != 0)
-               return false;
-
        /*
         * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
         * priority masking is used the GIC irqchip driver will clear DAIF.IF
@@ -141,8 +133,10 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
                        return;
                }
 
-               if (arm64_need_resched())
-                       preempt_schedule_irq();
+               if (!preempt_count() && need_resched()) {
+                       if (arm64_need_resched())
+                               preempt_schedule_irq();
+               }
 
                trace_hardirqs_on();
        } else {
-- 
2.34.1

From nobody Fri Dec 27 06:06:14 2024
From: Jinjie Ruan
Subject: [PATCH -next v5 06/22] arm64: entry: Expand the need_irq_preemption() macro ahead
Date: Fri, 6 Dec 2024 18:17:28 +0800
Message-ID: <20241206101744.4161990-7-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>
References: <20241206101744.4161990-1-ruanjinjie@huawei.com>

The generic entry code has the same logic as the need_irq_preemption()
macro and uses a helper function to check the remaining resched
conditions, roughly as sketched below.
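A minimal userspace sketch of the generic-entry shape this patch moves
towards (editor's illustration; the fake_* helpers and *_model names
are hypothetical stand-ins, not the kernel's code): one common helper
owns the preempt_count()/need_resched() checks, and the architecture
contributes only its extra predicate.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel helpers. */
static int  fake_preempt_count(void)        { return 0; }
static bool fake_need_resched(void)         { return true; }
static bool arch_extra_resched_checks(void) { return true; }

static void fake_preempt_schedule_irq(void)
{
        puts("preempt_schedule_irq()");
}

static void raw_irqentry_exit_cond_resched_model(void)
{
        /* Common checks first, then the arch-specific predicate. */
        if (!fake_preempt_count()) {
                if (fake_need_resched() && arch_extra_resched_checks())
                        fake_preempt_schedule_irq();
        }
}

int main(void)
{
        raw_irqentry_exit_cond_resched_model();
        return 0;
}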
In preparation for moving arm64 over to the generic entry code, expand
need_irq_preemption() at its call site and extract arm64's resched
check code into a helper function.

No functional changes.

Signed-off-by: Jinjie Ruan
---
 arch/arm64/include/asm/preempt.h |  1 +
 arch/arm64/kernel/entry-common.c | 28 +++++++++++++++++-----------
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/preempt.h b/arch/arm64/include/asm/preempt.h
index 0159b625cc7f..d0f93385bd85 100644
--- a/arch/arm64/include/asm/preempt.h
+++ b/arch/arm64/include/asm/preempt.h
@@ -85,6 +85,7 @@ static inline bool should_resched(int preempt_offset)
 void preempt_schedule(void);
 void preempt_schedule_notrace(void);
 
+void raw_irqentry_exit_cond_resched(void);
 #ifdef CONFIG_PREEMPT_DYNAMIC
 
 DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index efd1a990d138..80b47ca02db2 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -77,17 +77,10 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
 
 #ifdef CONFIG_PREEMPT_DYNAMIC
 DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
-#define need_irq_preemption() \
-       (static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
-#else
-#define need_irq_preemption()  (IS_ENABLED(CONFIG_PREEMPTION))
 #endif
 
 static inline bool arm64_need_resched(void)
 {
-       if (!need_irq_preemption())
-               return false;
-
        /*
         * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
         * priority masking is used the GIC irqchip driver will clear DAIF.IF
@@ -111,6 +104,22 @@ static inline bool arm64_need_resched(void)
        return true;
 }
 
+void raw_irqentry_exit_cond_resched(void)
+{
+#ifdef CONFIG_PREEMPT_DYNAMIC
+       if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
+               return;
+#else
+       if (!IS_ENABLED(CONFIG_PREEMPTION))
+               return;
+#endif
+
+       if (!preempt_count()) {
+               if (need_resched() && arm64_need_resched())
+                       preempt_schedule_irq();
+       }
+}
+
 /*
  * Handle IRQ/context state management when exiting to kernel mode.
  * After this function returns it is not safe to call regular kernel code,
@@ -133,10 +142,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
                        return;
                }
 
-               if (!preempt_count() && need_resched()) {
-                       if (arm64_need_resched())
-                               preempt_schedule_irq();
-               }
+               raw_irqentry_exit_cond_resched();
 
                trace_hardirqs_on();
        } else {
-- 
2.34.1

From nobody Fri Dec 27 06:06:14 2024
From: Jinjie Ruan
Subject: [PATCH -next v5 07/22] arm64: entry: preempt_schedule_irq() only if PREEMPTION enabled
Date: Fri, 6 Dec 2024 18:17:29 +0800
Message-ID: <20241206101744.4161990-8-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>
References: <20241206101744.4161990-1-ruanjinjie@huawei.com>

The generic entry code checks PREEMPTION whether PREEMPT_DYNAMIC is
enabled or not: either way, PREEMPTION must be enabled to allow a
reschedule before an EL1 exception return. So move the PREEMPTION
check ahead, in preparation for moving arm64 over to the generic entry
code.
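A userspace model of the IS_ENABLED() idiom used at the call site
(editor's sketch; the *_MODEL names are stand-ins, and the real kernel
IS_ENABLED() additionally copes with undefined Kconfig symbols). The
call is written unconditionally, and the compiler folds the whole
branch away when the option is off, so no #ifdef is needed:

#include <stdio.h>

#define CONFIG_PREEMPTION_MODEL 1
#define IS_ENABLED_MODEL(option) (option)

static void raw_irqentry_exit_cond_resched_model(void)
{
        puts("reschedule point");
}

int main(void)
{
        /* Compile-time-constant branch; dead code when the option is 0. */
        if (IS_ENABLED_MODEL(CONFIG_PREEMPTION_MODEL))
                raw_irqentry_exit_cond_resched_model();
        return 0;
}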
Signed-off-by: Jinjie Ruan
---
 arch/arm64/kernel/entry-common.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 80b47ca02db2..029f8bd72f8a 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -109,9 +109,6 @@ void raw_irqentry_exit_cond_resched(void)
 #ifdef CONFIG_PREEMPT_DYNAMIC
        if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
                return;
-#else
-       if (!IS_ENABLED(CONFIG_PREEMPTION))
-               return;
 #endif
 
        if (!preempt_count()) {
@@ -142,7 +139,8 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
                        return;
                }
 
-               raw_irqentry_exit_cond_resched();
+               if (IS_ENABLED(CONFIG_PREEMPTION))
+                       raw_irqentry_exit_cond_resched();
 
                trace_hardirqs_on();
        } else {
-- 
2.34.1

From nobody Fri Dec 27 06:06:14 2024
From: Jinjie Ruan
Subject: [PATCH -next v5 08/22] arm64: entry: Use different helpers to check resched for PREEMPT_DYNAMIC
Date: Fri, 6 Dec 2024 18:17:30 +0800
Message-ID: <20241206101744.4161990-9-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>
References: <20241206101744.4161990-1-ruanjinjie@huawei.com>
In the generic entry code, two different helpers are used to check
whether a resched is required, depending on whether PREEMPT_DYNAMIC is
enabled or disabled, and some common code is reused between them.

In preparation for moving arm64 over to the generic entry code, use
the new helper to check for resched when PREEMPT_DYNAMIC is enabled,
and reuse the common code for the disabled case.

No functional changes.

Signed-off-by: Jinjie Ruan
---
 arch/arm64/include/asm/preempt.h |  3 +++
 arch/arm64/kernel/entry-common.c | 21 +++++++++++----------
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/preempt.h b/arch/arm64/include/asm/preempt.h
index d0f93385bd85..0f0ba250efe8 100644
--- a/arch/arm64/include/asm/preempt.h
+++ b/arch/arm64/include/asm/preempt.h
@@ -93,11 +93,14 @@ void dynamic_preempt_schedule(void);
 #define __preempt_schedule()           dynamic_preempt_schedule()
 void dynamic_preempt_schedule_notrace(void);
 #define __preempt_schedule_notrace()   dynamic_preempt_schedule_notrace()
+void dynamic_irqentry_exit_cond_resched(void);
+#define irqentry_exit_cond_resched()   dynamic_irqentry_exit_cond_resched()
 
 #else /* CONFIG_PREEMPT_DYNAMIC */
 
 #define __preempt_schedule()           preempt_schedule()
 #define __preempt_schedule_notrace()   preempt_schedule_notrace()
+#define irqentry_exit_cond_resched()   raw_irqentry_exit_cond_resched()
 
 #endif /* CONFIG_PREEMPT_DYNAMIC */
 #endif /* CONFIG_PREEMPTION */
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 029f8bd72f8a..015a65d19b52 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -75,10 +75,6 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
        return state;
 }
 
-#ifdef CONFIG_PREEMPT_DYNAMIC
-DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
-#endif
-
 static inline bool arm64_need_resched(void)
 {
        /*
@@ -106,17 +102,22 @@ static inline bool arm64_need_resched(void)
 
 void raw_irqentry_exit_cond_resched(void)
 {
-#ifdef CONFIG_PREEMPT_DYNAMIC
-       if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
-               return;
-#endif
-
        if (!preempt_count()) {
                if (need_resched() && arm64_need_resched())
                        preempt_schedule_irq();
        }
 }
 
+#ifdef CONFIG_PREEMPT_DYNAMIC
+DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
+void dynamic_irqentry_exit_cond_resched(void)
+{
+       if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
+               return;
+       raw_irqentry_exit_cond_resched();
+}
+#endif
+
 /*
  * Handle IRQ/context state management when exiting to kernel mode.
  * After this function returns it is not safe to call regular kernel code,
@@ -140,7 +141,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs,
                }
 
                if (IS_ENABLED(CONFIG_PREEMPTION))
-                       raw_irqentry_exit_cond_resched();
+                       irqentry_exit_cond_resched();
 
                trace_hardirqs_on();
        } else {
-- 
2.34.1

From nobody Fri Dec 27 06:06:14 2024
From: Jinjie Ruan
Subject: [PATCH -next v5 09/22] entry: Split generic entry into irq and syscall
Date: Fri, 6 Dec 2024 18:17:31 +0800
Message-ID: <20241206101744.4161990-10-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>
References: <20241206101744.4161990-1-ruanjinjie@huawei.com>

As Mark pointed out, do not try to switch to *all* of the generic
entry code in one go. The regular entry state management (e.g.
enter_from_user_mode() and exit_to_user_mode()) is largely separate
from the syscall state management, so move arm64 over to
enter_from_user_mode() and exit_to_user_mode() without needing to use
any of the generic syscall logic.
Doing that first, *then* moving over to the generic syscall handling
would be much easier to review/test/bisect, and if there are any ABI
issues with the syscall handling in particular, it will be easier to
handle those in isolation.

So split the generic entry code into irq entry and syscall parts, which
makes the review work easier and the eventual switch to the generic
entry cleaner. Introduce two new config options, GENERIC_IRQ_ENTRY and
GENERIC_SYSCALL, which control the irq entry and syscall parts of the
generic code respectively. Also split the header file
irq-entry-common.h out of entry-common.h for GENERIC_IRQ_ENTRY.

Suggested-by: Mark Rutland
Signed-off-by: Jinjie Ruan
---
 MAINTAINERS                      |   1 +
 arch/Kconfig                     |   8 +
 include/linux/entry-common.h     | 382 +-----------------------------
 include/linux/irq-entry-common.h | 389 +++++++++++++++++++++++++++++++
 kernel/entry/Makefile            |   3 +-
 kernel/entry/common.c            | 160 +-------------
 kernel/entry/syscall-common.c    | 159 +++++++++++++
 kernel/sched/core.c              |   8 +-
 8 files changed, 565 insertions(+), 545 deletions(-)
 create mode 100644 include/linux/irq-entry-common.h
 create mode 100644 kernel/entry/syscall-common.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 21f855fe468b..7a6e87587101 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9585,6 +9585,7 @@ S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core/entry
 F:	include/linux/entry-common.h
 F:	include/linux/entry-kvm.h
+F:	include/linux/irq-entry-common.h
 F:	kernel/entry/
 
 GENERIC GPIO I2C DRIVER
diff --git a/arch/Kconfig b/arch/Kconfig
index 6682b2a53e34..5a454eff780b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -64,8 +64,16 @@ config HOTPLUG_PARALLEL
 	bool
 	select HOTPLUG_SPLIT_STARTUP
 
+config GENERIC_IRQ_ENTRY
+	bool
+
+config GENERIC_SYSCALL
+	bool
+
 config GENERIC_ENTRY
 	bool
+	select GENERIC_IRQ_ENTRY
+	select GENERIC_SYSCALL
 
 config KPROBES
 	bool "Kprobes"
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index fc61d0205c97..b3233e8328c5 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -2,27 +2,15 @@
 #ifndef __LINUX_ENTRYCOMMON_H
 #define __LINUX_ENTRYCOMMON_H
 
-#include
+#include
 #include
-#include
 #include
 #include
-#include
 #include
 #include
-#include
-#include
 
 #include
 
-/*
- * Define dummy _TIF work flags if not defined by the architecture or for
- * disabled functionality.
- */
-#ifndef _TIF_PATCH_PENDING
-# define _TIF_PATCH_PENDING	(0)
-#endif
-
 #ifndef _TIF_UPROBE
 # define _TIF_UPROBE	(0)
 #endif
@@ -55,69 +43,6 @@
 	 SYSCALL_WORK_SYSCALL_EXIT_TRAP	| \
 	 ARCH_SYSCALL_WORK_EXIT)
 
-/*
- * TIF flags handled in exit_to_user_mode_loop()
- */
-#ifndef ARCH_EXIT_TO_USER_MODE_WORK
-# define ARCH_EXIT_TO_USER_MODE_WORK	(0)
-#endif
-
-#define EXIT_TO_USER_MODE_WORK \
-	(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE | \
-	 _TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY | \
-	 _TIF_PATCH_PENDING | _TIF_NOTIFY_SIGNAL | \
-	 ARCH_EXIT_TO_USER_MODE_WORK)
-
-/**
- * arch_enter_from_user_mode - Architecture specific sanity check for user mode regs
- * @regs:	Pointer to currents pt_regs
- *
- * Defaults to an empty implementation. Can be replaced by architecture
- * specific code.
- *
- * Invoked from syscall_enter_from_user_mode() in the non-instrumentable
- * section. Use __always_inline so the compiler cannot push it out of line
- * and make it instrumentable.
- */ -static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs= ); - -#ifndef arch_enter_from_user_mode -static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs= ) {} -#endif - -/** - * enter_from_user_mode - Establish state when coming from user mode - * - * Syscall/interrupt entry disables interrupts, but user mode is traced as - * interrupts enabled. Also with NO_HZ_FULL RCU might be idle. - * - * 1) Tell lockdep that interrupts are disabled - * 2) Invoke context tracking if enabled to reactivate RCU - * 3) Trace interrupts off state - * - * Invoked from architecture specific syscall entry code with interrupts - * disabled. The calling code has to be non-instrumentable. When the - * function returns all state is correct and interrupts are still - * disabled. The subsequent functions can be instrumented. - * - * This is invoked when there is architecture specific functionality to be - * done between establishing state and enabling interrupts. The caller must - * enable interrupts before invoking syscall_enter_from_user_mode_work(). - */ -static __always_inline void enter_from_user_mode(struct pt_regs *regs) -{ - arch_enter_from_user_mode(regs); - lockdep_hardirqs_off(CALLER_ADDR0); - - CT_WARN_ON(__ct_state() !=3D CT_STATE_USER); - user_exit_irqoff(); - - instrumentation_begin(); - kmsan_unpoison_entry_regs(regs); - trace_hardirqs_off_finish(); - instrumentation_end(); -} - /** * syscall_enter_from_user_mode_prepare - Establish state and enable inter= rupts * @regs: Pointer to currents pt_regs @@ -202,170 +127,6 @@ static __always_inline long syscall_enter_from_user_m= ode(struct pt_regs *regs, l return ret; } =20 -/** - * local_irq_enable_exit_to_user - Exit to user variant of local_irq_enabl= e() - * @ti_work: Cached TIF flags gathered with interrupts disabled - * - * Defaults to local_irq_enable(). Can be supplied by architecture specific - * code. - */ -static inline void local_irq_enable_exit_to_user(unsigned long ti_work); - -#ifndef local_irq_enable_exit_to_user -static inline void local_irq_enable_exit_to_user(unsigned long ti_work) -{ - local_irq_enable(); -} -#endif - -/** - * local_irq_disable_exit_to_user - Exit to user variant of local_irq_disa= ble() - * - * Defaults to local_irq_disable(). Can be supplied by architecture specif= ic - * code. - */ -static inline void local_irq_disable_exit_to_user(void); - -#ifndef local_irq_disable_exit_to_user -static inline void local_irq_disable_exit_to_user(void) -{ - local_irq_disable(); -} -#endif - -/** - * arch_exit_to_user_mode_work - Architecture specific TIF work for exit - * to user mode. - * @regs: Pointer to currents pt_regs - * @ti_work: Cached TIF flags gathered with interrupts disabled - * - * Invoked from exit_to_user_mode_loop() with interrupt enabled - * - * Defaults to NOOP. Can be supplied by architecture specific code. - */ -static inline void arch_exit_to_user_mode_work(struct pt_regs *regs, - unsigned long ti_work); - -#ifndef arch_exit_to_user_mode_work -static inline void arch_exit_to_user_mode_work(struct pt_regs *regs, - unsigned long ti_work) -{ -} -#endif - -/** - * arch_exit_to_user_mode_prepare - Architecture specific preparation for - * exit to user mode. - * @regs: Pointer to currents pt_regs - * @ti_work: Cached TIF flags gathered with interrupts disabled - * - * Invoked from exit_to_user_mode_prepare() with interrupt disabled as the= last - * function before return. Defaults to NOOP. 
- */ -static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs, - unsigned long ti_work); - -#ifndef arch_exit_to_user_mode_prepare -static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs, - unsigned long ti_work) -{ -} -#endif - -/** - * arch_exit_to_user_mode - Architecture specific final work before - * exit to user mode. - * - * Invoked from exit_to_user_mode() with interrupt disabled as the last - * function before return. Defaults to NOOP. - * - * This needs to be __always_inline because it is non-instrumentable code - * invoked after context tracking switched to user mode. - * - * An architecture implementation must not do anything complex, no locking - * etc. The main purpose is for speculation mitigations. - */ -static __always_inline void arch_exit_to_user_mode(void); - -#ifndef arch_exit_to_user_mode -static __always_inline void arch_exit_to_user_mode(void) { } -#endif - -/** - * arch_do_signal_or_restart - Architecture specific signal delivery func= tion - * @regs: Pointer to currents pt_regs - * - * Invoked from exit_to_user_mode_loop(). - */ -void arch_do_signal_or_restart(struct pt_regs *regs); - -/** - * exit_to_user_mode_loop - do any pending work before leaving to user spa= ce - */ -unsigned long exit_to_user_mode_loop(struct pt_regs *regs, - unsigned long ti_work); - -/** - * exit_to_user_mode_prepare - call exit_to_user_mode_loop() if required - * @regs: Pointer to pt_regs on entry stack - * - * 1) check that interrupts are disabled - * 2) call tick_nohz_user_enter_prepare() - * 3) call exit_to_user_mode_loop() if any flags from - * EXIT_TO_USER_MODE_WORK are set - * 4) check that interrupts are still disabled - */ -static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs) -{ - unsigned long ti_work; - - lockdep_assert_irqs_disabled(); - - /* Flush pending rcuog wakeup before the last need_resched() check */ - tick_nohz_user_enter_prepare(); - - ti_work =3D read_thread_flags(); - if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK)) - ti_work =3D exit_to_user_mode_loop(regs, ti_work); - - arch_exit_to_user_mode_prepare(regs, ti_work); - - /* Ensure that kernel state is sane for a return to userspace */ - kmap_assert_nomap(); - lockdep_assert_irqs_disabled(); - lockdep_sys_exit(); -} - -/** - * exit_to_user_mode - Fixup state when exiting to user mode - * - * Syscall/interrupt exit enables interrupts, but the kernel state is - * interrupts disabled when this is invoked. Also tell RCU about it. - * - * 1) Trace interrupts on state - * 2) Invoke context tracking if enabled to adjust RCU state - * 3) Invoke architecture specific last minute exit code, e.g. speculation - * mitigations, etc.: arch_exit_to_user_mode() - * 4) Tell lockdep that interrupts are enabled - * - * Invoked from architecture specific code when syscall_exit_to_user_mode() - * is not suitable as the last step before returning to userspace. Must be - * invoked with interrupts disabled and the caller must be - * non-instrumentable. - * The caller has to invoke syscall_exit_to_user_mode_work() before this. 
- */ -static __always_inline void exit_to_user_mode(void) -{ - instrumentation_begin(); - trace_hardirqs_on_prepare(); - lockdep_hardirqs_on_prepare(); - instrumentation_end(); - - user_enter_irqoff(); - arch_exit_to_user_mode(); - lockdep_hardirqs_on(CALLER_ADDR0); -} - /** * syscall_exit_to_user_mode_work - Handle work before returning to user m= ode * @regs: Pointer to currents pt_regs @@ -412,145 +173,4 @@ void syscall_exit_to_user_mode_work(struct pt_regs *r= egs); */ void syscall_exit_to_user_mode(struct pt_regs *regs); =20 -/** - * irqentry_enter_from_user_mode - Establish state before invoking the irq= handler - * @regs: Pointer to currents pt_regs - * - * Invoked from architecture specific entry code with interrupts disabled. - * Can only be called when the interrupt entry came from user mode. The - * calling code must be non-instrumentable. When the function returns all - * state is correct and the subsequent functions can be instrumented. - * - * The function establishes state (lockdep, RCU (context tracking), tracin= g) - */ -void irqentry_enter_from_user_mode(struct pt_regs *regs); - -/** - * irqentry_exit_to_user_mode - Interrupt exit work - * @regs: Pointer to current's pt_regs - * - * Invoked with interrupts disabled and fully valid regs. Returns with all - * work handled, interrupts disabled such that the caller can immediately - * switch to user mode. Called from architecture specific interrupt - * handling code. - * - * The call order is #2 and #3 as described in syscall_exit_to_user_mode(). - * Interrupt exit is not invoking #1 which is the syscall specific one time - * work. - */ -void irqentry_exit_to_user_mode(struct pt_regs *regs); - -#ifndef irqentry_state -/** - * struct irqentry_state - Opaque object for exception state storage - * @exit_rcu: Used exclusively in the irqentry_*() calls; signals whether = the - * exit path has to invoke ct_irq_exit(). - * @lockdep: Used exclusively in the irqentry_nmi_*() calls; ensures that - * lockdep state is restored correctly on exit from nmi. - * - * This opaque object is filled in by the irqentry_*_enter() functions and - * must be passed back into the corresponding irqentry_*_exit() functions - * when the exception is complete. - * - * Callers of irqentry_*_[enter|exit]() must consider this structure opaque - * and all members private. Descriptions of the members are provided to a= id in - * the maintenance of the irqentry_*() functions. - */ -typedef struct irqentry_state { - union { - bool exit_rcu; - bool lockdep; - }; -} irqentry_state_t; -#endif - -/** - * irqentry_enter - Handle state tracking on ordinary interrupt entries - * @regs: Pointer to pt_regs of interrupted context - * - * Invokes: - * - lockdep irqflag state tracking as low level ASM entry disabled - * interrupts. - * - * - Context tracking if the exception hit user mode. - * - * - The hardirq tracer to keep the state consistent as low level ASM - * entry disabled interrupts. - * - * As a precondition, this requires that the entry came from user mode, - * idle, or a kernel context in which RCU is watching. - * - * For kernel mode entries RCU handling is done conditional. If RCU is - * watching then the only RCU requirement is to check whether the tick has - * to be restarted. If RCU is not watching then ct_irq_enter() has to be - * invoked on entry and ct_irq_exit() on exit. 
- * - * Avoiding the ct_irq_enter/exit() calls is an optimization but also - * solves the problem of kernel mode pagefaults which can schedule, which - * is not possible after invoking ct_irq_enter() without undoing it. - * - * For user mode entries irqentry_enter_from_user_mode() is invoked to - * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit - * would not be possible. - * - * Returns: An opaque object that must be passed to idtentry_exit() - */ -irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs); - -/** - * irqentry_exit_cond_resched - Conditionally reschedule on return from in= terrupt - * - * Conditional reschedule with additional sanity checks. - */ -void raw_irqentry_exit_cond_resched(void); -#ifdef CONFIG_PREEMPT_DYNAMIC -#if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL) -#define irqentry_exit_cond_resched_dynamic_enabled raw_irqentry_exit_cond_= resched -#define irqentry_exit_cond_resched_dynamic_disabled NULL -DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_res= ched); -#define irqentry_exit_cond_resched() static_call(irqentry_exit_cond_resche= d)() -#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY) -DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched); -void dynamic_irqentry_exit_cond_resched(void); -#define irqentry_exit_cond_resched() dynamic_irqentry_exit_cond_resched() -#endif -#else /* CONFIG_PREEMPT_DYNAMIC */ -#define irqentry_exit_cond_resched() raw_irqentry_exit_cond_resched() -#endif /* CONFIG_PREEMPT_DYNAMIC */ - -/** - * irqentry_exit - Handle return from exception that used irqentry_enter() - * @regs: Pointer to pt_regs (exception entry regs) - * @state: Return value from matching call to irqentry_enter() - * - * Depending on the return target (kernel/user) this runs the necessary - * preemption and work checks if possible and required and returns to - * the caller with interrupts disabled and no further work pending. - * - * This is the last action before returning to the low level ASM code which - * just needs to return to the appropriate context. - * - * Counterpart to irqentry_enter(). - */ -void noinstr irqentry_exit(struct pt_regs *regs, irqentry_state_t state); - -/** - * irqentry_nmi_enter - Handle NMI entry - * @regs: Pointer to currents pt_regs - * - * Similar to irqentry_enter() but taking care of the NMI constraints. - */ -irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs); - -/** - * irqentry_nmi_exit - Handle return from NMI handling - * @regs: Pointer to pt_regs (NMI entry regs) - * @irq_state: Return value from matching call to irqentry_nmi_enter() - * - * Last action before returning to the low level assembly code. - * - * Counterpart to irqentry_nmi_enter(). - */ -void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_= state); - #endif diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-com= mon.h new file mode 100644 index 000000000000..8af374331900 --- /dev/null +++ b/include/linux/irq-entry-common.h @@ -0,0 +1,389 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __LINUX_IRQENTRYCOMMON_H +#define __LINUX_IRQENTRYCOMMON_H + +#include +#include +#include +#include +#include + +#include + +/* + * Define dummy _TIF work flags if not defined by the architecture or for + * disabled functionality. 
+ */ +#ifndef _TIF_PATCH_PENDING +# define _TIF_PATCH_PENDING (0) +#endif + +/* + * TIF flags handled in exit_to_user_mode_loop() + */ +#ifndef ARCH_EXIT_TO_USER_MODE_WORK +# define ARCH_EXIT_TO_USER_MODE_WORK (0) +#endif + +#define EXIT_TO_USER_MODE_WORK \ + (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE | \ + _TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY | \ + _TIF_PATCH_PENDING | _TIF_NOTIFY_SIGNAL | \ + ARCH_EXIT_TO_USER_MODE_WORK) + +/** + * arch_enter_from_user_mode - Architecture specific sanity check for user= mode regs + * @regs: Pointer to currents pt_regs + * + * Defaults to an empty implementation. Can be replaced by architecture + * specific code. + * + * Invoked from syscall_enter_from_user_mode() in the non-instrumentable + * section. Use __always_inline so the compiler cannot push it out of line + * and make it instrumentable. + */ +static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs= ); + +#ifndef arch_enter_from_user_mode +static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs= ) {} +#endif + +/** + * enter_from_user_mode - Establish state when coming from user mode + * + * Syscall/interrupt entry disables interrupts, but user mode is traced as + * interrupts enabled. Also with NO_HZ_FULL RCU might be idle. + * + * 1) Tell lockdep that interrupts are disabled + * 2) Invoke context tracking if enabled to reactivate RCU + * 3) Trace interrupts off state + * + * Invoked from architecture specific syscall entry code with interrupts + * disabled. The calling code has to be non-instrumentable. When the + * function returns all state is correct and interrupts are still + * disabled. The subsequent functions can be instrumented. + * + * This is invoked when there is architecture specific functionality to be + * done between establishing state and enabling interrupts. The caller must + * enable interrupts before invoking syscall_enter_from_user_mode_work(). + */ +static __always_inline void enter_from_user_mode(struct pt_regs *regs) +{ + arch_enter_from_user_mode(regs); + lockdep_hardirqs_off(CALLER_ADDR0); + + CT_WARN_ON(__ct_state() !=3D CT_STATE_USER); + user_exit_irqoff(); + + instrumentation_begin(); + kmsan_unpoison_entry_regs(regs); + trace_hardirqs_off_finish(); + instrumentation_end(); +} + +/** + * local_irq_enable_exit_to_user - Exit to user variant of local_irq_enabl= e() + * @ti_work: Cached TIF flags gathered with interrupts disabled + * + * Defaults to local_irq_enable(). Can be supplied by architecture specific + * code. + */ +static inline void local_irq_enable_exit_to_user(unsigned long ti_work); + +#ifndef local_irq_enable_exit_to_user +static inline void local_irq_enable_exit_to_user(unsigned long ti_work) +{ + local_irq_enable(); +} +#endif + +/** + * local_irq_disable_exit_to_user - Exit to user variant of local_irq_disa= ble() + * + * Defaults to local_irq_disable(). Can be supplied by architecture specif= ic + * code. + */ +static inline void local_irq_disable_exit_to_user(void); + +#ifndef local_irq_disable_exit_to_user +static inline void local_irq_disable_exit_to_user(void) +{ + local_irq_disable(); +} +#endif + +/** + * arch_exit_to_user_mode_work - Architecture specific TIF work for exit + * to user mode. + * @regs: Pointer to currents pt_regs + * @ti_work: Cached TIF flags gathered with interrupts disabled + * + * Invoked from exit_to_user_mode_loop() with interrupt enabled + * + * Defaults to NOOP. Can be supplied by architecture specific code. 
+ */ +static inline void arch_exit_to_user_mode_work(struct pt_regs *regs, + unsigned long ti_work); + +#ifndef arch_exit_to_user_mode_work +static inline void arch_exit_to_user_mode_work(struct pt_regs *regs, + unsigned long ti_work) +{ +} +#endif + +/** + * arch_exit_to_user_mode_prepare - Architecture specific preparation for + * exit to user mode. + * @regs: Pointer to currents pt_regs + * @ti_work: Cached TIF flags gathered with interrupts disabled + * + * Invoked from exit_to_user_mode_prepare() with interrupt disabled as the= last + * function before return. Defaults to NOOP. + */ +static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs, + unsigned long ti_work); + +#ifndef arch_exit_to_user_mode_prepare +static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs, + unsigned long ti_work) +{ +} +#endif + +/** + * arch_exit_to_user_mode - Architecture specific final work before + * exit to user mode. + * + * Invoked from exit_to_user_mode() with interrupt disabled as the last + * function before return. Defaults to NOOP. + * + * This needs to be __always_inline because it is non-instrumentable code + * invoked after context tracking switched to user mode. + * + * An architecture implementation must not do anything complex, no locking + * etc. The main purpose is for speculation mitigations. + */ +static __always_inline void arch_exit_to_user_mode(void); + +#ifndef arch_exit_to_user_mode +static __always_inline void arch_exit_to_user_mode(void) { } +#endif + +/** + * arch_do_signal_or_restart - Architecture specific signal delivery func= tion + * @regs: Pointer to currents pt_regs + * + * Invoked from exit_to_user_mode_loop(). + */ +void arch_do_signal_or_restart(struct pt_regs *regs); + +/** + * exit_to_user_mode_loop - do any pending work before leaving to user spa= ce + */ +unsigned long exit_to_user_mode_loop(struct pt_regs *regs, + unsigned long ti_work); + +/** + * exit_to_user_mode_prepare - call exit_to_user_mode_loop() if required + * @regs: Pointer to pt_regs on entry stack + * + * 1) check that interrupts are disabled + * 2) call tick_nohz_user_enter_prepare() + * 3) call exit_to_user_mode_loop() if any flags from + * EXIT_TO_USER_MODE_WORK are set + * 4) check that interrupts are still disabled + */ +static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs) +{ + unsigned long ti_work; + + lockdep_assert_irqs_disabled(); + + /* Flush pending rcuog wakeup before the last need_resched() check */ + tick_nohz_user_enter_prepare(); + + ti_work =3D read_thread_flags(); + if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK)) + ti_work =3D exit_to_user_mode_loop(regs, ti_work); + + arch_exit_to_user_mode_prepare(regs, ti_work); + + /* Ensure that kernel state is sane for a return to userspace */ + kmap_assert_nomap(); + lockdep_assert_irqs_disabled(); + lockdep_sys_exit(); +} + +/** + * exit_to_user_mode - Fixup state when exiting to user mode + * + * Syscall/interrupt exit enables interrupts, but the kernel state is + * interrupts disabled when this is invoked. Also tell RCU about it. + * + * 1) Trace interrupts on state + * 2) Invoke context tracking if enabled to adjust RCU state + * 3) Invoke architecture specific last minute exit code, e.g. speculation + * mitigations, etc.: arch_exit_to_user_mode() + * 4) Tell lockdep that interrupts are enabled + * + * Invoked from architecture specific code when syscall_exit_to_user_mode() + * is not suitable as the last step before returning to userspace. 
Must be + * invoked with interrupts disabled and the caller must be + * non-instrumentable. + * The caller has to invoke syscall_exit_to_user_mode_work() before this. + */ +static __always_inline void exit_to_user_mode(void) +{ + instrumentation_begin(); + trace_hardirqs_on_prepare(); + lockdep_hardirqs_on_prepare(); + instrumentation_end(); + + user_enter_irqoff(); + arch_exit_to_user_mode(); + lockdep_hardirqs_on(CALLER_ADDR0); +} + +/** + * irqentry_enter_from_user_mode - Establish state before invoking the irq= handler + * @regs: Pointer to currents pt_regs + * + * Invoked from architecture specific entry code with interrupts disabled. + * Can only be called when the interrupt entry came from user mode. The + * calling code must be non-instrumentable. When the function returns all + * state is correct and the subsequent functions can be instrumented. + * + * The function establishes state (lockdep, RCU (context tracking), tracin= g) + */ +void irqentry_enter_from_user_mode(struct pt_regs *regs); + +/** + * irqentry_exit_to_user_mode - Interrupt exit work + * @regs: Pointer to current's pt_regs + * + * Invoked with interrupts disabled and fully valid regs. Returns with all + * work handled, interrupts disabled such that the caller can immediately + * switch to user mode. Called from architecture specific interrupt + * handling code. + * + * The call order is #2 and #3 as described in syscall_exit_to_user_mode(). + * Interrupt exit is not invoking #1 which is the syscall specific one time + * work. + */ +void irqentry_exit_to_user_mode(struct pt_regs *regs); + +#ifndef irqentry_state +/** + * struct irqentry_state - Opaque object for exception state storage + * @exit_rcu: Used exclusively in the irqentry_*() calls; signals whether = the + * exit path has to invoke ct_irq_exit(). + * @lockdep: Used exclusively in the irqentry_nmi_*() calls; ensures that + * lockdep state is restored correctly on exit from nmi. + * + * This opaque object is filled in by the irqentry_*_enter() functions and + * must be passed back into the corresponding irqentry_*_exit() functions + * when the exception is complete. + * + * Callers of irqentry_*_[enter|exit]() must consider this structure opaque + * and all members private. Descriptions of the members are provided to a= id in + * the maintenance of the irqentry_*() functions. + */ +typedef struct irqentry_state { + union { + bool exit_rcu; + bool lockdep; + }; +} irqentry_state_t; +#endif + +/** + * irqentry_enter - Handle state tracking on ordinary interrupt entries + * @regs: Pointer to pt_regs of interrupted context + * + * Invokes: + * - lockdep irqflag state tracking as low level ASM entry disabled + * interrupts. + * + * - Context tracking if the exception hit user mode. + * + * - The hardirq tracer to keep the state consistent as low level ASM + * entry disabled interrupts. + * + * As a precondition, this requires that the entry came from user mode, + * idle, or a kernel context in which RCU is watching. + * + * For kernel mode entries RCU handling is done conditional. If RCU is + * watching then the only RCU requirement is to check whether the tick has + * to be restarted. If RCU is not watching then ct_irq_enter() has to be + * invoked on entry and ct_irq_exit() on exit. + * + * Avoiding the ct_irq_enter/exit() calls is an optimization but also + * solves the problem of kernel mode pagefaults which can schedule, which + * is not possible after invoking ct_irq_enter() without undoing it. 
+ * + * For user mode entries irqentry_enter_from_user_mode() is invoked to + * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit + * would not be possible. + * + * Returns: An opaque object that must be passed to idtentry_exit() + */ +irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs); + +/** + * irqentry_exit_cond_resched - Conditionally reschedule on return from in= terrupt + * + * Conditional reschedule with additional sanity checks. + */ +void raw_irqentry_exit_cond_resched(void); +#ifdef CONFIG_PREEMPT_DYNAMIC +#if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL) +#define irqentry_exit_cond_resched_dynamic_enabled raw_irqentry_exit_cond_= resched +#define irqentry_exit_cond_resched_dynamic_disabled NULL +DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_res= ched); +#define irqentry_exit_cond_resched() static_call(irqentry_exit_cond_resche= d)() +#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY) +DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched); +void dynamic_irqentry_exit_cond_resched(void); +#define irqentry_exit_cond_resched() dynamic_irqentry_exit_cond_resched() +#endif +#else /* CONFIG_PREEMPT_DYNAMIC */ +#define irqentry_exit_cond_resched() raw_irqentry_exit_cond_resched() +#endif /* CONFIG_PREEMPT_DYNAMIC */ + +/** + * irqentry_exit - Handle return from exception that used irqentry_enter() + * @regs: Pointer to pt_regs (exception entry regs) + * @state: Return value from matching call to irqentry_enter() + * + * Depending on the return target (kernel/user) this runs the necessary + * preemption and work checks if possible and required and returns to + * the caller with interrupts disabled and no further work pending. + * + * This is the last action before returning to the low level ASM code which + * just needs to return to the appropriate context. + * + * Counterpart to irqentry_enter(). + */ +void noinstr irqentry_exit(struct pt_regs *regs, irqentry_state_t state); + +/** + * irqentry_nmi_enter - Handle NMI entry + * @regs: Pointer to currents pt_regs + * + * Similar to irqentry_enter() but taking care of the NMI constraints. + */ +irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs); + +/** + * irqentry_nmi_exit - Handle return from NMI handling + * @regs: Pointer to pt_regs (NMI entry regs) + * @irq_state: Return value from matching call to irqentry_nmi_enter() + * + * Last action before returning to the low level assembly code. + * + * Counterpart to irqentry_nmi_enter(). 
+ */ +void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_= state); + +#endif diff --git a/kernel/entry/Makefile b/kernel/entry/Makefile index 095c775e001e..d38f3a7e7396 100644 --- a/kernel/entry/Makefile +++ b/kernel/entry/Makefile @@ -9,5 +9,6 @@ KCOV_INSTRUMENT :=3D n CFLAGS_REMOVE_common.o =3D -fstack-protector -fstack-protector-strong CFLAGS_common.o +=3D -fno-stack-protector =20 -obj-$(CONFIG_GENERIC_ENTRY) +=3D common.o syscall_user_dispatch.o +obj-$(CONFIG_GENERIC_IRQ_ENTRY) +=3D common.o +obj-$(CONFIG_GENERIC_SYSCALL) +=3D syscall-common.o syscall_user_dispatc= h.o obj-$(CONFIG_KVM_XFER_TO_GUEST_WORK) +=3D kvm.o diff --git a/kernel/entry/common.c b/kernel/entry/common.c index e33691d5adf7..b82032777310 100644 --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -1,84 +1,13 @@ // SPDX-License-Identifier: GPL-2.0 =20 -#include -#include +#include #include #include #include #include #include -#include #include =20 -#include "common.h" - -#define CREATE_TRACE_POINTS -#include - -static inline void syscall_enter_audit(struct pt_regs *regs, long syscall) -{ - if (unlikely(audit_context())) { - unsigned long args[6]; - - syscall_get_arguments(current, regs, args); - audit_syscall_entry(syscall, args[0], args[1], args[2], args[3]); - } -} - -long syscall_trace_enter(struct pt_regs *regs, long syscall, - unsigned long work) -{ - long ret =3D 0; - - /* - * Handle Syscall User Dispatch. This must comes first, since - * the ABI here can be something that doesn't make sense for - * other syscall_work features. - */ - if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) { - if (syscall_user_dispatch(regs)) - return -1L; - } - - /* Handle ptrace */ - if (work & (SYSCALL_WORK_SYSCALL_TRACE | SYSCALL_WORK_SYSCALL_EMU)) { - ret =3D ptrace_report_syscall_entry(regs); - if (ret || (work & SYSCALL_WORK_SYSCALL_EMU)) - return -1L; - } - - /* Do seccomp after ptrace, to catch any tracer changes. */ - if (work & SYSCALL_WORK_SECCOMP) { - ret =3D __secure_computing(NULL); - if (ret =3D=3D -1L) - return ret; - } - - /* Either of the above might have changed the syscall number */ - syscall =3D syscall_get_nr(current, regs); - - if (unlikely(work & SYSCALL_WORK_SYSCALL_TRACEPOINT)) { - trace_sys_enter(regs, syscall); - /* - * Probes or BPF hooks in the tracepoint may have changed the - * system call number as well. - */ - syscall =3D syscall_get_nr(current, regs); - } - - syscall_enter_audit(regs, syscall); - - return ret ? : syscall; -} - -noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs) -{ - enter_from_user_mode(regs); - instrumentation_begin(); - local_irq_enable(); - instrumentation_end(); -} - /* Workaround to allow gradual conversion of architecture code */ void __weak arch_do_signal_or_restart(struct pt_regs *regs) { } =20 @@ -133,93 +62,6 @@ __always_inline unsigned long exit_to_user_mode_loop(st= ruct pt_regs *regs, return ti_work; } =20 -/* - * If SYSCALL_EMU is set, then the only reason to report is when - * SINGLESTEP is set (i.e. PTRACE_SYSEMU_SINGLESTEP). This syscall - * instruction has been already reported in syscall_enter_from_user_mode(). 
- */ -static inline bool report_single_step(unsigned long work) -{ - if (work & SYSCALL_WORK_SYSCALL_EMU) - return false; - - return work & SYSCALL_WORK_SYSCALL_EXIT_TRAP; -} - -static void syscall_exit_work(struct pt_regs *regs, unsigned long work) -{ - bool step; - - /* - * If the syscall was rolled back due to syscall user dispatching, - * then the tracers below are not invoked for the same reason as - * the entry side was not invoked in syscall_trace_enter(): The ABI - * of these syscalls is unknown. - */ - if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) { - if (unlikely(current->syscall_dispatch.on_dispatch)) { - current->syscall_dispatch.on_dispatch =3D false; - return; - } - } - - audit_syscall_exit(regs); - - if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT) - trace_sys_exit(regs, syscall_get_return_value(current, regs)); - - step =3D report_single_step(work); - if (step || work & SYSCALL_WORK_SYSCALL_TRACE) - ptrace_report_syscall_exit(regs, step); -} - -/* - * Syscall specific exit to user mode preparation. Runs with interrupts - * enabled. - */ -static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs) -{ - unsigned long work =3D READ_ONCE(current_thread_info()->syscall_work); - unsigned long nr =3D syscall_get_nr(current, regs); - - CT_WARN_ON(ct_state() !=3D CT_STATE_KERNEL); - - if (IS_ENABLED(CONFIG_PROVE_LOCKING)) { - if (WARN(irqs_disabled(), "syscall %lu left IRQs disabled", nr)) - local_irq_enable(); - } - - rseq_syscall(regs); - - /* - * Do one-time syscall specific work. If these work items are - * enabled, we want to run them exactly once per syscall exit with - * interrupts enabled. - */ - if (unlikely(work & SYSCALL_WORK_EXIT)) - syscall_exit_work(regs, work); -} - -static __always_inline void __syscall_exit_to_user_mode_work(struct pt_reg= s *regs) -{ - syscall_exit_to_user_mode_prepare(regs); - local_irq_disable_exit_to_user(); - exit_to_user_mode_prepare(regs); -} - -void syscall_exit_to_user_mode_work(struct pt_regs *regs) -{ - __syscall_exit_to_user_mode_work(regs); -} - -__visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs) -{ - instrumentation_begin(); - __syscall_exit_to_user_mode_work(regs); - instrumentation_end(); - exit_to_user_mode(); -} - noinstr void irqentry_enter_from_user_mode(struct pt_regs *regs) { enter_from_user_mode(regs); diff --git a/kernel/entry/syscall-common.c b/kernel/entry/syscall-common.c new file mode 100644 index 000000000000..0eb036986ad4 --- /dev/null +++ b/kernel/entry/syscall-common.c @@ -0,0 +1,159 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include "common.h" + +#define CREATE_TRACE_POINTS +#include + +static inline void syscall_enter_audit(struct pt_regs *regs, long syscall) +{ + if (unlikely(audit_context())) { + unsigned long args[6]; + + syscall_get_arguments(current, regs, args); + audit_syscall_entry(syscall, args[0], args[1], args[2], args[3]); + } +} + +long syscall_trace_enter(struct pt_regs *regs, long syscall, + unsigned long work) +{ + long ret =3D 0; + + /* + * Handle Syscall User Dispatch. This must comes first, since + * the ABI here can be something that doesn't make sense for + * other syscall_work features. 
+ */ + if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) { + if (syscall_user_dispatch(regs)) + return -1L; + } + + /* Handle ptrace */ + if (work & (SYSCALL_WORK_SYSCALL_TRACE | SYSCALL_WORK_SYSCALL_EMU)) { + ret =3D ptrace_report_syscall_entry(regs); + if (ret || (work & SYSCALL_WORK_SYSCALL_EMU)) + return -1L; + } + + /* Do seccomp after ptrace, to catch any tracer changes. */ + if (work & SYSCALL_WORK_SECCOMP) { + ret =3D __secure_computing(NULL); + if (ret =3D=3D -1L) + return ret; + } + + /* Either of the above might have changed the syscall number */ + syscall =3D syscall_get_nr(current, regs); + + if (unlikely(work & SYSCALL_WORK_SYSCALL_TRACEPOINT)) { + trace_sys_enter(regs, syscall); + /* + * Probes or BPF hooks in the tracepoint may have changed the + * system call number as well. + */ + syscall =3D syscall_get_nr(current, regs); + } + + syscall_enter_audit(regs, syscall); + + return ret ? : syscall; +} + +noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs) +{ + enter_from_user_mode(regs); + instrumentation_begin(); + local_irq_enable(); + instrumentation_end(); +} + +/* + * If SYSCALL_EMU is set, then the only reason to report is when + * SINGLESTEP is set (i.e. PTRACE_SYSEMU_SINGLESTEP). This syscall + * instruction has been already reported in syscall_enter_from_user_mode(). + */ +static inline bool report_single_step(unsigned long work) +{ + if (work & SYSCALL_WORK_SYSCALL_EMU) + return false; + + return work & SYSCALL_WORK_SYSCALL_EXIT_TRAP; +} + +static void syscall_exit_work(struct pt_regs *regs, unsigned long work) +{ + bool step; + + /* + * If the syscall was rolled back due to syscall user dispatching, + * then the tracers below are not invoked for the same reason as + * the entry side was not invoked in syscall_trace_enter(): The ABI + * of these syscalls is unknown. + */ + if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) { + if (unlikely(current->syscall_dispatch.on_dispatch)) { + current->syscall_dispatch.on_dispatch =3D false; + return; + } + } + + audit_syscall_exit(regs); + + if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT) + trace_sys_exit(regs, syscall_get_return_value(current, regs)); + + step =3D report_single_step(work); + if (step || work & SYSCALL_WORK_SYSCALL_TRACE) + ptrace_report_syscall_exit(regs, step); +} + +/* + * Syscall specific exit to user mode preparation. Runs with interrupts + * enabled. + */ +static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs) +{ + unsigned long work =3D READ_ONCE(current_thread_info()->syscall_work); + unsigned long nr =3D syscall_get_nr(current, regs); + + CT_WARN_ON(ct_state() !=3D CT_STATE_KERNEL); + + if (IS_ENABLED(CONFIG_PROVE_LOCKING)) { + if (WARN(irqs_disabled(), "syscall %lu left IRQs disabled", nr)) + local_irq_enable(); + } + + rseq_syscall(regs); + + /* + * Do one-time syscall specific work. If these work items are + * enabled, we want to run them exactly once per syscall exit with + * interrupts enabled. 
+	 */
+	if (unlikely(work & SYSCALL_WORK_EXIT))
+		syscall_exit_work(regs, work);
+}
+
+static __always_inline void __syscall_exit_to_user_mode_work(struct pt_regs *regs)
+{
+	syscall_exit_to_user_mode_prepare(regs);
+	local_irq_disable_exit_to_user();
+	exit_to_user_mode_prepare(regs);
+}
+
+void syscall_exit_to_user_mode_work(struct pt_regs *regs)
+{
+	__syscall_exit_to_user_mode_work(regs);
+}
+
+__visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
+{
+	instrumentation_begin();
+	__syscall_exit_to_user_mode_work(regs);
+	instrumentation_end();
+	exit_to_user_mode();
+}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 27a8fbd58091..2d560bb3efaa 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -68,8 +68,8 @@
 #include
 
 #ifdef CONFIG_PREEMPT_DYNAMIC
-# ifdef CONFIG_GENERIC_ENTRY
-# include
+# ifdef CONFIG_GENERIC_IRQ_ENTRY
+# include
 # endif
 #endif
 
@@ -7398,8 +7398,8 @@ EXPORT_SYMBOL(__cond_resched_rwlock_write);
 
 #ifdef CONFIG_PREEMPT_DYNAMIC
 
-#ifdef CONFIG_GENERIC_ENTRY
-#include
+#ifdef CONFIG_GENERIC_IRQ_ENTRY
+#include
 #endif
 
 /*
-- 
2.34.1

From nobody Fri Dec 27 06:06:14 2024
From: Jinjie Ruan
Subject: [PATCH -next v5 10/22] entry: Add arch_irqentry_exit_need_resched() for arm64
Date: Fri, 6 Dec 2024 18:17:32 +0800
Message-ID: <20241206101744.4161990-11-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>
References: <20241206101744.4161990-1-ruanjinjie@huawei.com>

arm64 requires an additional check to decide whether to reschedule on
return from an interrupt. Add arch_irqentry_exit_need_resched() as a
default implementation that returns true, and hook it into the
need_resched() condition in raw_irqentry_exit_cond_resched(). This
allows arm64 to provide its architecture-specific version when
switching over to the generic entry code.

Suggested-by: Mark Rutland
Suggested-by: Kevin Brodsky
Suggested-by: Thomas Gleixner
Signed-off-by: Jinjie Ruan
---
 kernel/entry/common.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index b82032777310..4aa9656fa1b4 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -142,6 +142,20 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 	return ret;
 }
 
+/**
+ * arch_irqentry_exit_need_resched - Architecture-specific need-resched check
+ *
+ * Invoked from raw_irqentry_exit_cond_resched() to check whether a
+ * reschedule is needed. Defaults to returning true.
+ *
+ * The main purpose is to permit an architecture to skip preemption of a
+ * task from an IRQ.
+ */
+static inline bool arch_irqentry_exit_need_resched(void);
+
+#ifndef arch_irqentry_exit_need_resched
+static inline bool arch_irqentry_exit_need_resched(void) { return true; }
+#endif
+
 void raw_irqentry_exit_cond_resched(void)
 {
 	if (!preempt_count()) {
@@ -149,7 +163,7 @@ void raw_irqentry_exit_cond_resched(void)
 		rcu_irq_exit_check_preempt();
 		if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
 			WARN_ON_ONCE(!on_thread_stack());
-		if (need_resched())
+		if (need_resched() && arch_irqentry_exit_need_resched())
 			preempt_schedule_irq();
 	}
 }
-- 
2.34.1
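For illustration, a minimal sketch of how an architecture can use the
hook above to opt out of IRQ preemption. The override pattern (a static
inline plus a matching #define in the architecture's asm/entry-common.h,
which makes kernel/entry/common.c skip its default) is the one this
patch introduces; arch_in_unsafe_window() is a hypothetical helper
standing in for whatever condition the architecture needs to check.
arm64's real implementation follows in patch 11/22.

	/* Hypothetical asm/entry-common.h sketch, not part of the series. */
	static inline bool arch_irqentry_exit_need_resched(void)
	{
		/* Skip IRQ preemption while in some unsafe window. */
		if (arch_in_unsafe_window())	/* hypothetical helper */
			return false;

		return true;
	}
	#define arch_irqentry_exit_need_resched arch_irqentry_exit_need_resched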
From nobody Fri Dec 27 06:06:14 2024
From: Jinjie Ruan
Subject: [PATCH -next v5 11/22] arm64: entry: Switch to generic IRQ entry
Date: Fri, 6 Dec 2024 18:17:33 +0800
Message-ID: <20241206101744.4161990-12-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>
References: <20241206101744.4161990-1-ruanjinjie@huawei.com>

Currently x86, RISC-V and LoongArch use the generic entry code. Convert
arm64 to use the generic entry infrastructure from kernel/entry/*. The
generic entry code makes maintenance easier and the code more elegant.

Switch arm64 over to the generic IRQ entry first, which removes more
than 100 lines of duplicated code; the switch to the full generic entry
will follow later. Converting in two steps, as Mark suggested, makes
the series easier to review. The changes are:

- Remove *enter_from/exit_to_kernel_mode(), and wrap with the generic
  irqentry_enter/exit() (see the sketch below). Also remove
  *enter_from/exit_to_user_mode(), and wrap with the generic
  enter_from/exit_to_user_mode(), because they are exactly the same so
  far.

- Remove arm64_enter/exit_nmi() and use the generic
  irqentry_nmi_enter/exit(), because they are exactly the same, so the
  temporary arm64 version of irqentry_state can also be removed.

- Remove the PREEMPT_DYNAMIC code, as the generic entry does the same
  thing once arm64 implements arch_irqentry_exit_need_resched().
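As a quick preview of the first item, the arm64 kernel-mode helpers
become thin wrappers around the generic state management; this is
condensed from the diff below, not new behaviour:

	static __always_inline irqentry_state_t
	__enter_from_kernel_mode(struct pt_regs *regs)
	{
		return irqentry_enter(regs);
	}

	static __always_inline void
	__exit_to_kernel_mode(struct pt_regs *regs, irqentry_state_t state)
	{
		irqentry_exit(regs, state);
	}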
Suggested-by: Mark Rutland Signed-off-by: Jinjie Ruan --- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/entry-common.h | 64 ++++++ arch/arm64/include/asm/preempt.h | 6 - arch/arm64/kernel/entry-common.c | 307 ++++++-------------------- arch/arm64/kernel/signal.c | 3 +- 5 files changed, 129 insertions(+), 252 deletions(-) create mode 100644 arch/arm64/include/asm/entry-common.h diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 0cd423d9aa5b..3751ab9f2a21 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -150,6 +150,7 @@ config ARM64 select GENERIC_EARLY_IOREMAP select GENERIC_IDLE_POLL_SETUP select GENERIC_IOREMAP + select GENERIC_IRQ_ENTRY select GENERIC_IRQ_IPI select GENERIC_IRQ_KEXEC_CLEAR_VM_FORWARD select GENERIC_IRQ_PROBE diff --git a/arch/arm64/include/asm/entry-common.h b/arch/arm64/include/asm= /entry-common.h new file mode 100644 index 000000000000..1cc9d966a6c3 --- /dev/null +++ b/arch/arm64/include/asm/entry-common.h @@ -0,0 +1,64 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _ASM_ARM64_ENTRY_COMMON_H +#define _ASM_ARM64_ENTRY_COMMON_H + +#include + +#include +#include +#include +#include + +#define ARCH_EXIT_TO_USER_MODE_WORK (_TIF_MTE_ASYNC_FAULT | _TIF_FOREIGN_F= PSTATE) + +static __always_inline void arch_exit_to_user_mode_work(struct pt_regs *re= gs, + unsigned long ti_work) +{ + if (ti_work & _TIF_MTE_ASYNC_FAULT) { + clear_thread_flag(TIF_MTE_ASYNC_FAULT); + send_sig_fault(SIGSEGV, SEGV_MTEAERR, (void __user *)NULL, current); + } + + if (ti_work & _TIF_FOREIGN_FPSTATE) + fpsimd_restore_current_state(); +} + +#define arch_exit_to_user_mode_work arch_exit_to_user_mode_work + +static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs, + unsigned long ti_work) +{ + local_daif_mask(); +} + +#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare + +static inline bool arch_irqentry_exit_need_resched(void) +{ + /* + * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC + * priority masking is used the GIC irqchip driver will clear DAIF.IF + * using gic_arch_enable_irqs() for normal IRQs. If anything is set in + * DAIF we must have handled an NMI, so skip preemption. + */ + if (system_uses_irq_prio_masking() && read_sysreg(daif)) + return false; + + /* + * Preempting a task from an IRQ means we leave copies of PSTATE + * on the stack. cpufeature's enable calls may modify PSTATE, but + * resuming one of these preempted tasks would undo those changes. + * + * Only allow a task to be preempted once cpufeatures have been + * enabled. 
+ */ + if (!system_capabilities_finalized()) + return false; + + return true; +} + +#define arch_irqentry_exit_need_resched arch_irqentry_exit_need_resched + +#endif /* _ASM_ARM64_ENTRY_COMMON_H */ diff --git a/arch/arm64/include/asm/preempt.h b/arch/arm64/include/asm/pree= mpt.h index 0f0ba250efe8..932ea4b62042 100644 --- a/arch/arm64/include/asm/preempt.h +++ b/arch/arm64/include/asm/preempt.h @@ -2,7 +2,6 @@ #ifndef __ASM_PREEMPT_H #define __ASM_PREEMPT_H =20 -#include #include =20 #define PREEMPT_NEED_RESCHED BIT(32) @@ -85,22 +84,17 @@ static inline bool should_resched(int preempt_offset) void preempt_schedule(void); void preempt_schedule_notrace(void); =20 -void raw_irqentry_exit_cond_resched(void); #ifdef CONFIG_PREEMPT_DYNAMIC =20 -DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched); void dynamic_preempt_schedule(void); #define __preempt_schedule() dynamic_preempt_schedule() void dynamic_preempt_schedule_notrace(void); #define __preempt_schedule_notrace() dynamic_preempt_schedule_notrace() -void dynamic_irqentry_exit_cond_resched(void); -#define irqentry_exit_cond_resched() dynamic_irqentry_exit_cond_resched() =20 #else /* CONFIG_PREEMPT_DYNAMIC */ =20 #define __preempt_schedule() preempt_schedule() #define __preempt_schedule_notrace() preempt_schedule_notrace() -#define irqentry_exit_cond_resched() raw_irqentry_exit_cond_resched() =20 #endif /* CONFIG_PREEMPT_DYNAMIC */ #endif /* CONFIG_PREEMPTION */ diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-com= mon.c index 015a65d19b52..95885da2d776 100644 --- a/arch/arm64/kernel/entry-common.c +++ b/arch/arm64/kernel/entry-common.c @@ -6,6 +6,7 @@ */ =20 #include +#include #include #include #include @@ -28,13 +29,6 @@ #include #include =20 -typedef struct irqentry_state { - union { - bool exit_rcu; - bool lockdep; - }; -} irqentry_state_t; - /* * Handle IRQ/context state management when entering from kernel mode. * Before this function is called it is not safe to call regular kernel co= de, @@ -45,24 +39,7 @@ typedef struct irqentry_state { */ static __always_inline irqentry_state_t __enter_from_kernel_mode(struct pt= _regs *regs) { - irqentry_state_t state =3D { - .exit_rcu =3D false, - }; - - if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) { - lockdep_hardirqs_off(CALLER_ADDR0); - ct_irq_enter(); - trace_hardirqs_off_finish(); - - state.exit_rcu =3D true; - return state; - } - - lockdep_hardirqs_off(CALLER_ADDR0); - rcu_irq_enter_check_tick(); - trace_hardirqs_off_finish(); - - return state; + return irqentry_enter(regs); } =20 static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *reg= s) @@ -75,49 +52,6 @@ static noinstr irqentry_state_t enter_from_kernel_mode(s= truct pt_regs *regs) return state; } =20 -static inline bool arm64_need_resched(void) -{ - /* - * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC - * priority masking is used the GIC irqchip driver will clear DAIF.IF - * using gic_arch_enable_irqs() for normal IRQs. If anything is set in - * DAIF we must have handled an NMI, so skip preemption. - */ - if (system_uses_irq_prio_masking() && read_sysreg(daif)) - return false; - - /* - * Preempting a task from an IRQ means we leave copies of PSTATE - * on the stack. cpufeature's enable calls may modify PSTATE, but - * resuming one of these preempted tasks would undo those changes. - * - * Only allow a task to be preempted once cpufeatures have been - * enabled. 
- */ - if (!system_capabilities_finalized()) - return false; - - return true; -} - -void raw_irqentry_exit_cond_resched(void) -{ - if (!preempt_count()) { - if (need_resched() && arm64_need_resched()) - preempt_schedule_irq(); - } -} - -#ifdef CONFIG_PREEMPT_DYNAMIC -DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched); -void dynamic_irqentry_exit_cond_resched(void) -{ - if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched)) - return; - raw_irqentry_exit_cond_resched(); -} -#endif - /* * Handle IRQ/context state management when exiting to kernel mode. * After this function returns it is not safe to call regular kernel code, @@ -129,25 +63,7 @@ void dynamic_irqentry_exit_cond_resched(void) static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs, irqentry_state_t state) { - lockdep_assert_irqs_disabled(); - - if (!regs_irqs_disabled(regs)) { - if (state.exit_rcu) { - trace_hardirqs_on_prepare(); - lockdep_hardirqs_on_prepare(); - ct_irq_exit(); - lockdep_hardirqs_on(CALLER_ADDR0); - return; - } - - if (IS_ENABLED(CONFIG_PREEMPTION)) - irqentry_exit_cond_resched(); - - trace_hardirqs_on(); - } else { - if (state.exit_rcu) - ct_irq_exit(); - } + irqentry_exit(regs, state); } =20 static void noinstr exit_to_kernel_mode(struct pt_regs *regs, @@ -162,18 +78,15 @@ static void noinstr exit_to_kernel_mode(struct pt_regs= *regs, * Before this function is called it is not safe to call regular kernel co= de, * instrumentable code, or any code which may trigger an exception. */ -static __always_inline void __enter_from_user_mode(void) +static __always_inline void __enter_from_user_mode(struct pt_regs *regs) { - lockdep_hardirqs_off(CALLER_ADDR0); - CT_WARN_ON(ct_state() !=3D CT_STATE_USER); - user_exit_irqoff(); - trace_hardirqs_off_finish(); + enter_from_user_mode(regs); mte_disable_tco_entry(current); } =20 -static __always_inline void enter_from_user_mode(struct pt_regs *regs) +static __always_inline void arm64_enter_from_user_mode(struct pt_regs *reg= s) { - __enter_from_user_mode(); + __enter_from_user_mode(regs); } =20 /* @@ -181,113 +94,17 @@ static __always_inline void enter_from_user_mode(stru= ct pt_regs *regs) * After this function returns it is not safe to call regular kernel code, * instrumentable code, or any code which may trigger an exception. 
*/ -static __always_inline void __exit_to_user_mode(void) +static __always_inline void arm64_exit_to_user_mode(struct pt_regs *regs) { - trace_hardirqs_on_prepare(); - lockdep_hardirqs_on_prepare(); - user_enter_irqoff(); - lockdep_hardirqs_on(CALLER_ADDR0); -} - -static void do_notify_resume(struct pt_regs *regs, unsigned long thread_fl= ags) -{ - do { - local_irq_enable(); - - if (thread_flags & _TIF_NEED_RESCHED) - schedule(); - - if (thread_flags & _TIF_UPROBE) - uprobe_notify_resume(regs); - - if (thread_flags & _TIF_MTE_ASYNC_FAULT) { - clear_thread_flag(TIF_MTE_ASYNC_FAULT); - send_sig_fault(SIGSEGV, SEGV_MTEAERR, - (void __user *)NULL, current); - } - - if (thread_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) - do_signal(regs); - - if (thread_flags & _TIF_NOTIFY_RESUME) - resume_user_mode_work(regs); - - if (thread_flags & _TIF_FOREIGN_FPSTATE) - fpsimd_restore_current_state(); - - local_irq_disable(); - thread_flags =3D read_thread_flags(); - } while (thread_flags & _TIF_WORK_MASK); -} - -static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs) -{ - unsigned long flags; - local_irq_disable(); - - flags =3D read_thread_flags(); - if (unlikely(flags & _TIF_WORK_MASK)) - do_notify_resume(regs, flags); - - local_daif_mask(); - - lockdep_sys_exit(); -} - -static __always_inline void exit_to_user_mode(struct pt_regs *regs) -{ exit_to_user_mode_prepare(regs); mte_check_tfsr_exit(); - __exit_to_user_mode(); + exit_to_user_mode(); } =20 asmlinkage void noinstr asm_exit_to_user_mode(struct pt_regs *regs) { - exit_to_user_mode(regs); -} - -/* - * Handle IRQ/context state management when entering an NMI from user/kern= el - * mode. Before this function is called it is not safe to call regular ker= nel - * code, instrumentable code, or any code which may trigger an exception. - */ -static noinstr irqentry_state_t arm64_enter_nmi(struct pt_regs *regs) -{ - irqentry_state_t state; - - state.lockdep =3D lockdep_hardirqs_enabled(); - - __nmi_enter(); - lockdep_hardirqs_off(CALLER_ADDR0); - lockdep_hardirq_enter(); - ct_nmi_enter(); - - trace_hardirqs_off_finish(); - ftrace_nmi_enter(); - - return state; -} - -/* - * Handle IRQ/context state management when exiting an NMI from user/kernel - * mode. After this function returns it is not safe to call regular kernel - * code, instrumentable code, or any code which may trigger an exception. 
- */
-static void noinstr arm64_exit_nmi(struct pt_regs *regs,
-                                   irqentry_state_t state)
-{
-        ftrace_nmi_exit();
-        if (state.lockdep) {
-                trace_hardirqs_on_prepare();
-                lockdep_hardirqs_on_prepare();
-        }
-
-        ct_nmi_exit();
-        lockdep_hardirq_exit();
-        if (state.lockdep)
-                lockdep_hardirqs_on(CALLER_ADDR0);
-        __nmi_exit();
+        arm64_exit_to_user_mode(regs);
 }

 /*
@@ -346,7 +163,7 @@ extern void (*handle_arch_fiq)(struct pt_regs *);
 static void noinstr __panic_unhandled(struct pt_regs *regs, const char *vector,
                                       unsigned long esr)
 {
-        arm64_enter_nmi(regs);
+        irqentry_nmi_enter(regs);

         console_verbose();

@@ -580,10 +397,10 @@ asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
 static __always_inline void __el1_pnmi(struct pt_regs *regs,
                                        void (*handler)(struct pt_regs *))
 {
-        irqentry_state_t state = arm64_enter_nmi(regs);
+        irqentry_state_t state = irqentry_nmi_enter(regs);

         do_interrupt_handler(regs, handler);
-        arm64_exit_nmi(regs, state);
+        irqentry_nmi_exit(regs, state);
 }

 static __always_inline void __el1_irq(struct pt_regs *regs,
@@ -624,19 +441,19 @@ asmlinkage void noinstr el1h_64_error_handler(struct pt_regs *regs)
         irqentry_state_t state;

         local_daif_restore(DAIF_ERRCTX);
-        state = arm64_enter_nmi(regs);
+        state = irqentry_nmi_enter(regs);
         do_serror(regs, esr);
-        arm64_exit_nmi(regs, state);
+        irqentry_nmi_exit(regs, state);
 }

 static void noinstr el0_da(struct pt_regs *regs, unsigned long esr)
 {
         unsigned long far = read_sysreg(far_el1);

-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         local_daif_restore(DAIF_PROCCTX);
         do_mem_abort(far, esr, regs);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr el0_ia(struct pt_regs *regs, unsigned long esr)
@@ -651,50 +468,50 @@ static void noinstr el0_ia(struct pt_regs *regs, unsigned long esr)
         if (!is_ttbr0_addr(far))
                 arm64_apply_bp_hardening();

-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         local_daif_restore(DAIF_PROCCTX);
         do_mem_abort(far, esr, regs);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr el0_fpsimd_acc(struct pt_regs *regs, unsigned long esr)
 {
-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         local_daif_restore(DAIF_PROCCTX);
         do_fpsimd_acc(esr, regs);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr el0_sve_acc(struct pt_regs *regs, unsigned long esr)
 {
-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         local_daif_restore(DAIF_PROCCTX);
         do_sve_acc(esr, regs);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr el0_sme_acc(struct pt_regs *regs, unsigned long esr)
 {
-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         local_daif_restore(DAIF_PROCCTX);
         do_sme_acc(esr, regs);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr el0_fpsimd_exc(struct pt_regs *regs, unsigned long esr)
 {
-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         local_daif_restore(DAIF_PROCCTX);
         do_fpsimd_exc(esr, regs);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr el0_sys(struct pt_regs *regs, unsigned long esr)
 {
-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         local_daif_restore(DAIF_PROCCTX);
         do_el0_sys(esr, regs);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr el0_pc(struct pt_regs *regs, unsigned long esr)
@@ -704,58 +521,58 @@ static void noinstr el0_pc(struct pt_regs *regs, unsigned long esr)
         if (!is_ttbr0_addr(instruction_pointer(regs)))
                 arm64_apply_bp_hardening();

-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         local_daif_restore(DAIF_PROCCTX);
         do_sp_pc_abort(far, esr, regs);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr el0_sp(struct pt_regs *regs, unsigned long esr)
 {
-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         local_daif_restore(DAIF_PROCCTX);
         do_sp_pc_abort(regs->sp, esr, regs);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr el0_undef(struct pt_regs *regs, unsigned long esr)
 {
-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         local_daif_restore(DAIF_PROCCTX);
         do_el0_undef(regs, esr);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr el0_bti(struct pt_regs *regs)
 {
-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         local_daif_restore(DAIF_PROCCTX);
         do_el0_bti(regs);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr el0_mops(struct pt_regs *regs, unsigned long esr)
 {
-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         local_daif_restore(DAIF_PROCCTX);
         do_el0_mops(regs, esr);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr el0_gcs(struct pt_regs *regs, unsigned long esr)
 {
-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         local_daif_restore(DAIF_PROCCTX);
         do_el0_gcs(regs, esr);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr el0_inv(struct pt_regs *regs, unsigned long esr)
 {
-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         local_daif_restore(DAIF_PROCCTX);
         bad_el0_sync(regs, 0, esr);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr el0_dbg(struct pt_regs *regs, unsigned long esr)
@@ -763,28 +580,28 @@ static void noinstr el0_dbg(struct pt_regs *regs, unsigned long esr)
         /* Only watchpoints write FAR_EL1, otherwise its UNKNOWN */
         unsigned long far = read_sysreg(far_el1);

-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         do_debug_exception(far, esr, regs);
         local_daif_restore(DAIF_PROCCTX);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr el0_svc(struct pt_regs *regs)
 {
-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         cortex_a76_erratum_1463225_svc_handler();
         fp_user_discard();
         local_daif_restore(DAIF_PROCCTX);
         do_el0_svc(regs);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr el0_fpac(struct pt_regs *regs, unsigned long esr)
 {
-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         local_daif_restore(DAIF_PROCCTX);
         do_el0_fpac(regs, esr);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 asmlinkage void noinstr el0t_64_sync_handler(struct pt_regs *regs)
@@ -852,7 +669,7 @@ asmlinkage void noinstr el0t_64_sync_handler(struct pt_regs *regs)
 static void noinstr el0_interrupt(struct pt_regs *regs,
                                   void (*handler)(struct pt_regs *))
 {
-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);

         write_sysreg(DAIF_PROCCTX_NOIRQ, daif);

@@ -863,7 +680,7 @@ static void noinstr el0_interrupt(struct pt_regs *regs,
         do_interrupt_handler(regs, handler);
         irq_exit_rcu();

-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr __el0_irq_handler_common(struct pt_regs *regs)
@@ -891,13 +708,13 @@ static void noinstr __el0_error_handler_common(struct pt_regs *regs)
         unsigned long esr = read_sysreg(esr_el1);
         irqentry_state_t state;

-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         local_daif_restore(DAIF_ERRCTX);
-        state = arm64_enter_nmi(regs);
+        state = irqentry_nmi_enter(regs);
         do_serror(regs, esr);
-        arm64_exit_nmi(regs, state);
+        irqentry_nmi_exit(regs, state);
         local_daif_restore(DAIF_PROCCTX);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 asmlinkage void noinstr el0t_64_error_handler(struct pt_regs *regs)
@@ -908,19 +725,19 @@ asmlinkage void noinstr el0t_64_error_handler(struct pt_regs *regs)
 #ifdef CONFIG_COMPAT
 static void noinstr el0_cp15(struct pt_regs *regs, unsigned long esr)
 {
-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         local_daif_restore(DAIF_PROCCTX);
         do_el0_cp15(esr, regs);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 static void noinstr el0_svc_compat(struct pt_regs *regs)
 {
-        enter_from_user_mode(regs);
+        arm64_enter_from_user_mode(regs);
         cortex_a76_erratum_1463225_svc_handler();
         local_daif_restore(DAIF_PROCCTX);
         do_el0_svc_compat(regs);
-        exit_to_user_mode(regs);
+        arm64_exit_to_user_mode(regs);
 }

 asmlinkage void noinstr el0t_32_sync_handler(struct pt_regs *regs)
@@ -994,7 +811,7 @@ asmlinkage void noinstr __noreturn handle_bad_stack(struct pt_regs *regs)
         unsigned long esr = read_sysreg(esr_el1);
         unsigned long far = read_sysreg(far_el1);

-        arm64_enter_nmi(regs);
+        irqentry_nmi_enter(regs);
         panic_bad_stack(regs, esr, far);
 }
 #endif /* CONFIG_VMAP_STACK */
@@ -1028,9 +845,9 @@ __sdei_handler(struct pt_regs *regs, struct sdei_registered_event *arg)
         else if (cpu_has_pan())
                 set_pstate_pan(0);

-        state = arm64_enter_nmi(regs);
+        state = irqentry_nmi_enter(regs);
         ret = do_sdei_event(regs, arg);
-        arm64_exit_nmi(regs, state);
+        irqentry_nmi_exit(regs, state);

         return ret;
 }
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 14ac6fdb872b..84b6628647c7 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1603,7 +1604,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
  * the kernel can handle, and then we build all the user-level signal handling
  * stack-frames in one go after that.
  */
-void do_signal(struct pt_regs *regs)
+void arch_do_signal_or_restart(struct pt_regs *regs)
 {
         unsigned long continue_addr = 0, restart_addr = 0;
         int retval = 0;
--
2.34.1

From: Jinjie Ruan
Subject: [PATCH -next v5 12/22] arm64/ptrace: Split report_syscall() function
Date: Fri, 6 Dec 2024 18:17:34 +0800
Message-ID: <20241206101744.4161990-13-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>

Split report_syscall() into two separate enter and exit functions, so the
flow will be clearer when arm64 switches over to the generic entry code.

No functional changes.
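For context, the entry/exit marker that both halves preserve is visible to
userspace: at a PTRACE_SYSCALL stop on arm64, the tracee's scratch register
(x7 on AArch64, ip/r12 on AArch32) holds 0 at syscall entry and 1 at syscall
exit, matching the ptrace_syscall_dir enum in the diff below. A hypothetical
tracer-side sketch (illustrative only, not part of this patch; the helper
name and error handling are assumptions):

        #include <sys/ptrace.h>
        #include <sys/types.h>
        #include <sys/uio.h>
        #include <elf.h>                /* NT_PRSTATUS */
        #include <asm/ptrace.h>         /* struct user_pt_regs */

        /* Returns 1 at a syscall-exit stop, 0 at entry, -1 on error. */
        static int stop_is_syscall_exit(pid_t pid)
        {
                struct user_pt_regs regs;
                struct iovec iov = {
                        .iov_base = &regs,
                        .iov_len = sizeof(regs),
                };

                if (ptrace(PTRACE_GETREGSET, pid, NT_PRSTATUS, &iov) != 0)
                        return -1;

                /* x7 == 1 denotes PTRACE_SYSCALL_EXIT on AArch64 */
                return regs.regs[7] == 1;
        }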
Suggested-by: Mark Rutland
Signed-off-by: Jinjie Ruan
---
 arch/arm64/kernel/ptrace.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index e4437f62a2cd..d0d801a4094a 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2298,7 +2298,7 @@ enum ptrace_syscall_dir {
         PTRACE_SYSCALL_EXIT,
 };

-static void report_syscall(struct pt_regs *regs, enum ptrace_syscall_dir dir)
+static void report_syscall_enter(struct pt_regs *regs)
 {
         int regno;
         unsigned long saved_reg;
@@ -2321,13 +2321,24 @@ static void report_syscall(struct pt_regs *regs, enum ptrace_syscall_dir dir)
          */
         regno = (is_compat_task() ? 12 : 7);
         saved_reg = regs->regs[regno];
-        regs->regs[regno] = dir;
+        regs->regs[regno] = PTRACE_SYSCALL_ENTER;

-        if (dir == PTRACE_SYSCALL_ENTER) {
-                if (ptrace_report_syscall_entry(regs))
-                        forget_syscall(regs);
-                regs->regs[regno] = saved_reg;
-        } else if (!test_thread_flag(TIF_SINGLESTEP)) {
+        if (ptrace_report_syscall_entry(regs))
+                forget_syscall(regs);
+        regs->regs[regno] = saved_reg;
+}
+
+static void report_syscall_exit(struct pt_regs *regs)
+{
+        int regno;
+        unsigned long saved_reg;
+
+        /* See comment for report_syscall_enter() */
+        regno = (is_compat_task() ? 12 : 7);
+        saved_reg = regs->regs[regno];
+        regs->regs[regno] = PTRACE_SYSCALL_EXIT;
+
+        if (!test_thread_flag(TIF_SINGLESTEP)) {
                 ptrace_report_syscall_exit(regs, 0);
                 regs->regs[regno] = saved_reg;
         } else {
@@ -2347,7 +2358,7 @@ int syscall_trace_enter(struct pt_regs *regs)
         unsigned long flags = read_thread_flags();

         if (flags & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) {
-                report_syscall(regs, PTRACE_SYSCALL_ENTER);
+                report_syscall_enter(regs);
                 if (flags & _TIF_SYSCALL_EMU)
                         return NO_SYSCALL;
         }
@@ -2375,7 +2386,7 @@ void syscall_trace_exit(struct pt_regs *regs)
                 trace_sys_exit(regs, syscall_get_return_value(current, regs));

         if (flags & (_TIF_SYSCALL_TRACE | _TIF_SINGLESTEP))
-                report_syscall(regs, PTRACE_SYSCALL_EXIT);
+                report_syscall_exit(regs);

         rseq_syscall(regs);
 }
--
2.34.1
From: Jinjie Ruan
Subject: [PATCH -next v5 13/22] arm64/ptrace: Refactor syscall_trace_enter()
Date: Fri, 6 Dec 2024 18:17:35 +0800
Message-ID: <20241206101744.4161990-14-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>

The generic entry syscall_trace_enter() takes the syscall work flags and
the syscall number as inputs. In preparation for moving arm64 over to the
generic entry code, refactor arm64's syscall_trace_enter() to also take
the syscall number and thread flags, using the syscall_get_nr() helper to
re-read the number where it may have changed.

No functional changes.

Signed-off-by: Jinjie Ruan
---
 arch/arm64/include/asm/syscall.h |  2 +-
 arch/arm64/kernel/ptrace.c       | 20 ++++++++++++++------
 arch/arm64/kernel/syscall.c      |  2 +-
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index ab8e14b96f68..6b71d335c224 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -85,7 +85,7 @@ static inline int syscall_get_arch(struct task_struct *task)
         return AUDIT_ARCH_AARCH64;
 }

-int syscall_trace_enter(struct pt_regs *regs);
+int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags);
 void syscall_trace_exit(struct pt_regs *regs);

 #endif /* __ASM_SYSCALL_H */
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index d0d801a4094a..48bb813e0ef6 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2353,10 +2353,8 @@ static void report_syscall_exit(struct pt_regs *regs)
         }
 }

-int syscall_trace_enter(struct pt_regs *regs)
+int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags)
 {
-        unsigned long flags = read_thread_flags();
-
         if (flags & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) {
                 report_syscall_enter(regs);
                 if (flags & _TIF_SYSCALL_EMU)
@@ -2367,10 +2365,20 @@ int syscall_trace_enter(struct pt_regs *regs)
         if (secure_computing() == -1)
                 return NO_SYSCALL;

-        if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
-                trace_sys_enter(regs, regs->syscallno);
+        /* Either of the above might have changed the syscall number */
+        syscall = syscall_get_nr(current, regs);
+
+        if (test_thread_flag(TIF_SYSCALL_TRACEPOINT)) {
+                trace_sys_enter(regs, syscall);
+
+                /*
+                 * Probes or BPF hooks in the tracepoint may have changed the
+                 * system call number as well.
+                 */
+                syscall = syscall_get_nr(current, regs);
+        }

-        audit_syscall_entry(regs->syscallno, regs->orig_x0, regs->regs[1],
+        audit_syscall_entry(syscall, regs->orig_x0, regs->regs[1],
                             regs->regs[2], regs->regs[3]);

         return regs->syscallno;
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index c442fcec6b9e..eb328ee1423c 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -124,7 +124,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
                  */
                 if (scno == NO_SYSCALL)
                         syscall_set_return_value(current, regs, -ENOSYS, 0);
-                scno = syscall_trace_enter(regs);
+                scno = syscall_trace_enter(regs, regs->syscallno, flags);
                 if (scno == NO_SYSCALL)
                         goto trace_exit;
         }
--
2.34.1

From: Jinjie Ruan
Subject: [PATCH -next v5 14/22] arm64/ptrace: Refactor syscall_trace_exit()
Date: Fri, 6 Dec 2024 18:17:36 +0800
Message-ID: <20241206101744.4161990-15-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>
The generic entry syscall_exit_work() takes the syscall work flags as an
input. In preparation for moving arm64 over to the generic entry code,
refactor syscall_trace_exit() to also take the thread flags.

No functional changes.

Signed-off-by: Jinjie Ruan
---
 arch/arm64/include/asm/syscall.h | 2 +-
 arch/arm64/kernel/ptrace.c       | 4 +---
 arch/arm64/kernel/syscall.c      | 3 ++-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index 6b71d335c224..925a257145f9 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -86,6 +86,6 @@ static inline int syscall_get_arch(struct task_struct *task)
 }

 int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags);
-void syscall_trace_exit(struct pt_regs *regs);
+void syscall_trace_exit(struct pt_regs *regs, unsigned long flags);

 #endif /* __ASM_SYSCALL_H */
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 48bb813e0ef6..bb994d668d74 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2384,10 +2384,8 @@ int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags)
         return regs->syscallno;
 }

-void syscall_trace_exit(struct pt_regs *regs)
+void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
 {
-        unsigned long flags = read_thread_flags();
-
         audit_syscall_exit(regs);

         if (flags & _TIF_SYSCALL_TRACEPOINT)
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index eb328ee1423c..064dc114fb9b 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -143,7 +143,8 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
         }

 trace_exit:
-        syscall_trace_exit(regs);
+        flags = read_thread_flags();
+        syscall_trace_exit(regs, flags);
 }

 void do_el0_svc(struct pt_regs *regs)
--
2.34.1
From: Jinjie Ruan
Subject: [PATCH -next v5 15/22] arm64/ptrace: Refactor el0_svc_common()
Date: Fri, 6 Dec 2024 18:17:37 +0800
Message-ID: <20241206101744.4161990-16-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>

In the generic entry code, if the syscall was issued within a restartable
sequence, the process is terminated before report_syscall_exit(). In
preparation for moving arm64 over to the generic entry code, refactor
el0_svc_common() as below (the resulting shape is sketched after this
list):

- Extract a syscall_exit_to_user_mode_prepare() helper to replace the
  combination of read_thread_flags() and syscall_trace_exit(), and move
  the syscall exit check logic into it.

- Move rseq_syscall() ahead of that check, so the CONFIG_DEBUG_RSEQ
  special case is no longer needed.

- Move the has_syscall_work() helper into asm/syscall.h so it can be
  reused from ptrace.c.
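Distilled from the diff below, el0_svc_common() ends up with the following
shape (sketch only; unrelated lines trimmed):

        static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
                                   const syscall_fn_t syscall_table[])
        {
                unsigned long flags = read_thread_flags();
                ...
                if (has_syscall_work(flags)) {
                        if (scno == NO_SYSCALL)
                                syscall_set_return_value(current, regs, -ENOSYS, 0);
                        scno = syscall_trace_enter(regs, regs->syscallno, flags);
                        if (scno == NO_SYSCALL) {
                                syscall_exit_to_user_mode_prepare(regs);
                                return;
                        }
                }

                invoke_syscall(regs, scno, sc_nr, syscall_table);

                syscall_exit_to_user_mode_prepare(regs);
        }

where syscall_exit_to_user_mode_prepare() re-reads the thread flags, calls
rseq_syscall() unconditionally, and only then does the trace/audit exit
work when syscall work or single-step is pending.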
Signed-off-by: Jinjie Ruan
---
 arch/arm64/include/asm/syscall.h |  7 ++++++-
 arch/arm64/kernel/ptrace.c       | 10 +++++++++-
 arch/arm64/kernel/syscall.c      | 26 +++++---------------------
 3 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index 925a257145f9..6eeb1e7b033f 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -85,7 +85,12 @@ static inline int syscall_get_arch(struct task_struct *task)
         return AUDIT_ARCH_AARCH64;
 }

+static inline bool has_syscall_work(unsigned long flags)
+{
+        return unlikely(flags & _TIF_SYSCALL_WORK);
+}
+
 int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags);
-void syscall_trace_exit(struct pt_regs *regs, unsigned long flags);
+void syscall_exit_to_user_mode_prepare(struct pt_regs *regs);

 #endif /* __ASM_SYSCALL_H */
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index bb994d668d74..23df2e558fe9 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2384,7 +2384,7 @@ int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags)
         return regs->syscallno;
 }

-void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
+static void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
 {
         audit_syscall_exit(regs);

@@ -2393,8 +2393,16 @@ void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)

         if (flags & (_TIF_SYSCALL_TRACE | _TIF_SINGLESTEP))
                 report_syscall_exit(regs);
+}
+
+void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
+{
+        unsigned long flags = read_thread_flags();

         rseq_syscall(regs);
+
+        if (has_syscall_work(flags) || flags & _TIF_SINGLESTEP)
+                syscall_trace_exit(regs, flags);
 }

 /*
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 064dc114fb9b..a50db885fc34 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -65,11 +65,6 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
         choose_random_kstack_offset(get_random_u16());
 }

-static inline bool has_syscall_work(unsigned long flags)
-{
-        return unlikely(flags & _TIF_SYSCALL_WORK);
-}
-
 static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
                            const syscall_fn_t syscall_table[])
 {
@@ -125,26 +120,15 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
                 if (scno == NO_SYSCALL)
                         syscall_set_return_value(current, regs, -ENOSYS, 0);
                 scno = syscall_trace_enter(regs, regs->syscallno, flags);
-                if (scno == NO_SYSCALL)
-                        goto trace_exit;
+                if (scno == NO_SYSCALL) {
+                        syscall_exit_to_user_mode_prepare(regs);
+                        return;
+                }
         }

         invoke_syscall(regs, scno, sc_nr, syscall_table);

-        /*
-         * The tracing status may have changed under our feet, so we have to
-         * check again. However, if we were tracing entry, then we always trace
-         * exit regardless, as the old entry assembly did.
-         */
-        if (!has_syscall_work(flags) && !IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
-                flags = read_thread_flags();
-                if (!has_syscall_work(flags) && !(flags & _TIF_SINGLESTEP))
-                        return;
-        }
-
-trace_exit:
-        flags = read_thread_flags();
-        syscall_trace_exit(regs, flags);
+        syscall_exit_to_user_mode_prepare(regs);
 }

 void do_el0_svc(struct pt_regs *regs)
--
2.34.1

From: Jinjie Ruan
Subject: [PATCH -next v5 16/22] entry: Make syscall_exit_to_user_mode_prepare() not static
Date: Fri, 6 Dec 2024 18:17:38 +0800
Message-ID: <20241206101744.4161990-17-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>

In order to switch arm64 over to the generic entry code, make
syscall_exit_to_user_mode_prepare() non-static so that it can be called
from architecture code.

No functional changes.
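Once exposed, architecture code can run the generic syscall-exit work
directly. A minimal sketch of the intended call pattern (the arch-side
function name here is hypothetical, not taken from this series):

        #include <linux/entry-common.h>

        /*
         * Hypothetical arch syscall path: run the generic exit work
         * (audit, tracepoints, ptrace reporting) with interrupts enabled,
         * before the architecture's own exit-to-user handling.
         */
        static void arch_el0_syscall_exit(struct pt_regs *regs)
        {
                syscall_exit_to_user_mode_prepare(regs);
        }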
Signed-off-by: Jinjie Ruan
---
 include/linux/entry-common.h  | 1 +
 kernel/entry/syscall-common.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index b3233e8328c5..d11bdb4679b3 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -172,5 +172,6 @@ void syscall_exit_to_user_mode_work(struct pt_regs *regs);
  * compelling architectural reason to use the separate functions.
  */
 void syscall_exit_to_user_mode(struct pt_regs *regs);
+void syscall_exit_to_user_mode_prepare(struct pt_regs *regs);

 #endif
diff --git a/kernel/entry/syscall-common.c b/kernel/entry/syscall-common.c
index 0eb036986ad4..f78285097111 100644
--- a/kernel/entry/syscall-common.c
+++ b/kernel/entry/syscall-common.c
@@ -115,7 +115,7 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
  * Syscall specific exit to user mode preparation. Runs with interrupts
  * enabled.
  */
-static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
+void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
 {
         unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
         unsigned long nr = syscall_get_nr(current, regs);
--
2.34.1

From: Jinjie Ruan
Subject: [PATCH -next v5 17/22] arm64/ptrace: Return early for ptrace_report_syscall_entry() error
Date: Fri, 6 Dec 2024 18:17:39 +0800
Message-ID: <20241206101744.4161990-18-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>

As its comment says, if ptrace_report_syscall_entry() returns nonzero,
the calling arch code should abort the system call and must prevent
normal entry so that no system call is made. The generic entry code
already checks this return value; in preparation for moving arm64 over
to the generic entry code, return early here as well when it reports an
error.

Signed-off-by: Jinjie Ruan
---
 arch/arm64/kernel/ptrace.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 23df2e558fe9..b53d3759baf8 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2298,10 +2298,10 @@ enum ptrace_syscall_dir {
         PTRACE_SYSCALL_EXIT,
 };

-static void report_syscall_enter(struct pt_regs *regs)
+static int report_syscall_enter(struct pt_regs *regs)
 {
-        int regno;
         unsigned long saved_reg;
+        int regno, ret;

         /*
          * We have some ABI weirdness here in the way that we handle syscall
@@ -2323,9 +2323,13 @@ static void report_syscall_enter(struct pt_regs *regs)
         saved_reg = regs->regs[regno];
         regs->regs[regno] = PTRACE_SYSCALL_ENTER;

-        if (ptrace_report_syscall_entry(regs))
+        ret = ptrace_report_syscall_entry(regs);
+        if (ret)
                 forget_syscall(regs);
+
         regs->regs[regno] = saved_reg;
+
+        return ret;
 }

 static void report_syscall_exit(struct pt_regs *regs)
@@ -2355,9 +2359,11 @@ static void report_syscall_exit(struct pt_regs *regs)

 int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags)
 {
+        int ret;
+
         if (flags & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) {
-                report_syscall_enter(regs);
-                if (flags & _TIF_SYSCALL_EMU)
+                ret = report_syscall_enter(regs);
+                if (ret || (flags & _TIF_SYSCALL_EMU))
                         return NO_SYSCALL;
         }
--
2.34.1
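This mirrors the structure of the generic entry code that arm64 is
converging on; the corresponding generic ptrace handling (visible as
context in patch 20/22 below) reads:

        /* Handle ptrace */
        if (work & (SYSCALL_WORK_SYSCALL_TRACE | SYSCALL_WORK_SYSCALL_EMU)) {
                ret = ptrace_report_syscall_entry(regs);
                if (ret || (work & SYSCALL_WORK_SYSCALL_EMU))
                        return -1L;
        }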
From: Jinjie Ruan
Subject: [PATCH -next v5 18/22] arm64/ptrace: Expand secure_computing() in place
Date: Fri, 6 Dec 2024 18:17:40 +0800
Message-ID: <20241206101744.4161990-19-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>

The generic entry code expands secure_computing() in place and calls
__secure_computing() directly. In order to switch arm64 over to the
generic entry code, do the same in syscall_trace_enter().

No functional changes.

Signed-off-by: Jinjie Ruan
---
 arch/arm64/kernel/ptrace.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index b53d3759baf8..c0c00e173f61 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2368,8 +2368,11 @@ int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags)
         }

         /* Do the secure computing after ptrace; failures should be fast.
          */
-        if (secure_computing() == -1)
-                return NO_SYSCALL;
+        if (flags & _TIF_SECCOMP) {
+                ret = __secure_computing(NULL);
+                if (ret == -1L)
+                        return NO_SYSCALL;
+        }

         /* Either of the above might have changed the syscall number */
         syscall = syscall_get_nr(current, regs);
--
2.34.1

From: Jinjie Ruan
Subject: [PATCH -next v5 19/22] arm64/ptrace: Use syscall_get_arguments() helper
Date: Fri, 6 Dec 2024 18:17:41 +0800
Message-ID: <20241206101744.4161990-20-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>

The generic entry code checks the audit context first and uses the
syscall_get_arguments() helper. In order to switch arm64 over to the
generic entry code:

- Use the same helper.

- Extract a syscall_enter_audit() helper to make this clear.

- Check the audit context in syscall_enter_audit(). This only adds one
  extra check and makes no other difference, since audit_syscall_entry()
  checks it first and otherwise does nothing.

No functional changes.
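The audited values are unchanged by this conversion: on arm64,
syscall_get_arguments() is expected to assemble the array from orig_x0 and
x1-x5. A sketch of that behaviour (an assumption based on arm64's
asm/syscall.h, not part of this patch):

        /* args[0] is the original first argument, preserved across the call */
        args[0] = regs->orig_x0;
        memcpy(&args[1], &regs->regs[1], 5 * sizeof(args[0]));

so audit_syscall_entry(syscall, args[0], args[1], args[2], args[3]) sees
exactly the values the open-coded call passed before.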
Signed-off-by: Jinjie Ruan
---
 arch/arm64/kernel/ptrace.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index c0c00e173f61..3a7a1eaca0a9 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2357,6 +2357,17 @@ static void report_syscall_exit(struct pt_regs *regs)
         }
 }

+static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
+{
+        if (unlikely(audit_context())) {
+                unsigned long args[6];
+
+                syscall_get_arguments(current, regs, args);
+                audit_syscall_entry(syscall, args[0], args[1], args[2], args[3]);
+        }
+
+}
+
 int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags)
 {
         int ret;
@@ -2387,8 +2398,7 @@ int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags)
                 syscall = syscall_get_nr(current, regs);
         }

-        audit_syscall_entry(syscall, regs->orig_x0, regs->regs[1],
-                            regs->regs[2], regs->regs[3]);
+        syscall_enter_audit(regs, syscall);

         return regs->syscallno;
 }
--
2.34.1

From: Jinjie Ruan
Subject: [PATCH -next v5 20/22] entry: Add arch_ptrace_report_syscall_entry/exit()
Date: Fri, 6 Dec 2024 18:17:42 +0800
Message-ID: <20241206101744.4161990-21-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>
For historical reasons, arm64 needs to save/restore a scratch register
(ip/r12 on AArch32, x7 on AArch64) around syscall entry/exit, because
that register is clobbered to denote syscall entry vs. exit. This
differs from the generic entry implementation.

Add arch_ptrace_report_syscall_entry/exit() as overridable wrappers that
default to ptrace_report_syscall_entry/exit(). This allows arm64 to
provide architecture-specific versions when switching over to the
generic entry code.

Suggested-by: Mark Rutland
Suggested-by: Kevin Brodsky
Suggested-by: Thomas Gleixner
Signed-off-by: Jinjie Ruan
---
 kernel/entry/syscall-common.c | 43 +++++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/kernel/entry/syscall-common.c b/kernel/entry/syscall-common.c
index f78285097111..9ffa6349e769 100644
--- a/kernel/entry/syscall-common.c
+++ b/kernel/entry/syscall-common.c
@@ -17,6 +17,25 @@ static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
         }
 }

+/**
+ * arch_ptrace_report_syscall_entry - Architecture specific
+ *                                    ptrace_report_syscall_entry().
+ *
+ * Invoked from syscall_trace_enter() to wrap ptrace_report_syscall_entry().
+ * Defaults to ptrace_report_syscall_entry.
+ *
+ * The main purpose is to support arch-specific ptrace_report_syscall_entry()
+ * implementation.
+ */
+static inline int arch_ptrace_report_syscall_entry(struct pt_regs *regs);
+
+#ifndef arch_ptrace_report_syscall_entry
+static inline int arch_ptrace_report_syscall_entry(struct pt_regs *regs)
+{
+        return ptrace_report_syscall_entry(regs);
+}
+#endif
+
 long syscall_trace_enter(struct pt_regs *regs, long syscall,
                          unsigned long work)
 {
@@ -34,7 +53,7 @@ long syscall_trace_enter(struct pt_regs *regs, long syscall,

         /* Handle ptrace */
         if (work & (SYSCALL_WORK_SYSCALL_TRACE | SYSCALL_WORK_SYSCALL_EMU)) {
-                ret = ptrace_report_syscall_entry(regs);
+                ret = arch_ptrace_report_syscall_entry(regs);
                 if (ret || (work & SYSCALL_WORK_SYSCALL_EMU))
                         return -1L;
         }
@@ -84,6 +103,26 @@ static inline bool report_single_step(unsigned long work)
         return work & SYSCALL_WORK_SYSCALL_EXIT_TRAP;
 }

+/**
+ * arch_ptrace_report_syscall_exit - Architecture specific
+ *                                   ptrace_report_syscall_exit.
+ *
+ * Invoked from syscall_exit_work() to wrap ptrace_report_syscall_exit().
+ *
+ * The main purpose is to support arch-specific ptrace_report_syscall_exit
+ * implementation.
+ */
+static inline void arch_ptrace_report_syscall_exit(struct pt_regs *regs,
+                                                   int step);
+
+#ifndef arch_ptrace_report_syscall_exit
+static inline void arch_ptrace_report_syscall_exit(struct pt_regs *regs,
+                                                   int step)
+{
+        ptrace_report_syscall_exit(regs, step);
+}
+#endif
+
 static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
 {
         bool step;
@@ -108,7 +147,7 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long work)

         step = report_single_step(work);
         if (step || work & SYSCALL_WORK_SYSCALL_TRACE)
-                ptrace_report_syscall_exit(regs, step);
+                arch_ptrace_report_syscall_exit(regs, step);
 }

 /*
--
2.34.1

From: Jinjie Ruan
Subject: [PATCH -next v5 21/22] entry: Add has_syscall_work() helper
Date: Fri, 6 Dec 2024 18:17:43 +0800
Message-ID: <20241206101744.4161990-22-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>

Add a has_syscall_work() helper and use it in entry-common.h.
This lets architecture code that uses the generic entry infrastructure
reuse the helper.

No functional changes.

Signed-off-by: Jinjie Ruan
---
 include/linux/entry-common.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index d11bdb4679b3..3bb5d7d839f4 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -43,6 +43,11 @@
 					 SYSCALL_WORK_SYSCALL_EXIT_TRAP |	\
 					 ARCH_SYSCALL_WORK_EXIT)
 
+static inline bool has_syscall_work(unsigned long work)
+{
+	return unlikely(work & SYSCALL_WORK_ENTER);
+}
+
 /**
  * syscall_enter_from_user_mode_prepare - Establish state and enable interrupts
  * @regs:	Pointer to currents pt_regs
@@ -90,7 +95,7 @@ static __always_inline long syscall_enter_from_user_mode_work(struct pt_regs *regs,
 {
 	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
 
-	if (work & SYSCALL_WORK_ENTER)
+	if (has_syscall_work(work))
 		syscall = syscall_trace_enter(regs, syscall, work);
 
 	return syscall;
--
2.34.1
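has_syscall_work() is just a named, unlikely-annotated test of the
SYSCALL_WORK_ENTER mask, so architecture code that includes
<linux/entry-common.h> can use it exactly as the generic path does. A
minimal sketch, assuming the surrounding function (invented for
illustration) runs in arch syscall-entry code with generic entry
enabled:

/* Hypothetical arch-side caller mirroring syscall_enter_from_user_mode_work(). */
static long example_trace_syscall(struct pt_regs *regs, long scno)
{
	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);

	/* Only take the slow tracing path when entry work is pending. */
	if (has_syscall_work(work))
		scno = syscall_trace_enter(regs, scno, work);

	return scno;
}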
From nobody Fri Dec 27 06:06:14 2024
From: Jinjie Ruan
Subject: [PATCH -next v5 22/22] arm64: entry: Convert to generic entry
Date: Fri, 6 Dec 2024 18:17:44 +0800
Message-ID: <20241206101744.4161990-23-ruanjinjie@huawei.com>
In-Reply-To: <20241206101744.4161990-1-ruanjinjie@huawei.com>
References: <20241206101744.4161990-1-ruanjinjie@huawei.com>

Currently x86, RISC-V and LoongArch use the generic entry code. Convert
arm64 to use the generic entry infrastructure from kernel/entry/*. The
generic entry code makes maintainers' work easier and the code cleaner.
The changes are:

- Remove the TIF_SYSCALL_* flags, _TIF_WORK_MASK and _TIF_SYSCALL_WORK.
- Remove syscall_trace_enter/exit() and use the identical generic
  functions.

Tested OK with the following test cases on a QEMU virt platform:

- Perf tests.
- Switching between the different `dynamic preempt` modes.
- Pseudo-NMI tests.
- Stress-ng CPU stress test.
- The MTE test case in
  Documentation/arch/arm64/memory-tagging-extension.rst and all test
  cases in tools/testing/selftests/arm64/mte/*.

Suggested-by: Mark Rutland
Signed-off-by: Jinjie Ruan
---
v5:
- Rebased on the previous patch update.
- Define ARCH_SYSCALL_WORK_EXIT.
---
 arch/arm64/Kconfig                    |   2 +-
 arch/arm64/include/asm/entry-common.h |  70 ++++++++++++++
 arch/arm64/include/asm/syscall.h      |   7 +-
 arch/arm64/include/asm/thread_info.h  |  23 +----
 arch/arm64/kernel/debug-monitors.c    |   7 ++
 arch/arm64/kernel/ptrace.c            | 134 --------------------------
 arch/arm64/kernel/signal.c            |   2 +-
 arch/arm64/kernel/syscall.c           |   6 +-
 8 files changed, 87 insertions(+), 164 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3751ab9f2a21..a1d96712428e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -148,9 +148,9 @@ config ARM64
 	select GENERIC_CPU_DEVICES
 	select GENERIC_CPU_VULNERABILITIES
 	select GENERIC_EARLY_IOREMAP
+	select GENERIC_ENTRY
 	select GENERIC_IDLE_POLL_SETUP
 	select GENERIC_IOREMAP
-	select GENERIC_IRQ_ENTRY
 	select GENERIC_IRQ_IPI
 	select GENERIC_IRQ_KEXEC_CLEAR_VM_FORWARD
 	select GENERIC_IRQ_PROBE
diff --git a/arch/arm64/include/asm/entry-common.h b/arch/arm64/include/asm/entry-common.h
index 1cc9d966a6c3..6082393c61f2 100644
--- a/arch/arm64/include/asm/entry-common.h
+++ b/arch/arm64/include/asm/entry-common.h
@@ -10,6 +10,12 @@
 #include
 #include
 
+enum ptrace_syscall_dir {
+	PTRACE_SYSCALL_ENTER = 0,
+	PTRACE_SYSCALL_EXIT,
+};
+
+#define ARCH_SYSCALL_WORK_EXIT	(SYSCALL_WORK_SECCOMP | SYSCALL_WORK_SYSCALL_EMU)
 #define ARCH_EXIT_TO_USER_MODE_WORK (_TIF_MTE_ASYNC_FAULT | _TIF_FOREIGN_FPSTATE)
 
 static __always_inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
@@ -61,4 +67,68 @@ static inline bool arch_irqentry_exit_need_resched(void)
 
 #define arch_irqentry_exit_need_resched arch_irqentry_exit_need_resched
 
+static inline int arch_ptrace_report_syscall_entry(struct pt_regs *regs)
+{
+	unsigned long saved_reg;
+	int regno, ret;
+
+	/*
+	 * We have some ABI weirdness here in the way that we handle syscall
+	 * exit stops because we indicate whether or not the stop has been
+	 * signalled from syscall entry or syscall exit by clobbering a general
+	 * purpose register (ip/r12 for AArch32, x7 for AArch64) in the tracee
+	 * and restoring its old value after the stop. This means that:
+	 *
+	 * - Any writes by the tracer to this register during the stop are
+	 *   ignored/discarded.
+	 *
+	 * - The actual value of the register is not available during the stop,
+	 *   so the tracer cannot save it and restore it later.
+	 *
+	 * - Syscall stops behave differently to seccomp and pseudo-step traps
+	 *   (the latter do not nobble any registers).
+	 */
+	regno = (is_compat_task() ? 12 : 7);
+	saved_reg = regs->regs[regno];
+	regs->regs[regno] = PTRACE_SYSCALL_ENTER;
+
+	ret = ptrace_report_syscall_entry(regs);
+	if (ret)
+		forget_syscall(regs);
+
+	regs->regs[regno] = saved_reg;
+
+	return ret;
+}
+
+#define arch_ptrace_report_syscall_entry arch_ptrace_report_syscall_entry
+
+static inline void arch_ptrace_report_syscall_exit(struct pt_regs *regs,
+						   int step)
+{
+	unsigned long saved_reg;
+	int regno;
+
+	/* See comment for arch_ptrace_report_syscall_entry() */
+	regno = (is_compat_task() ? 12 : 7);
+	saved_reg = regs->regs[regno];
+	regs->regs[regno] = PTRACE_SYSCALL_EXIT;
+
+	if (!test_thread_flag(TIF_SINGLESTEP)) {
+		ptrace_report_syscall_exit(regs, 0);
+		regs->regs[regno] = saved_reg;
+	} else {
+		regs->regs[regno] = saved_reg;
+
+		/*
+		 * Signal a pseudo-step exception since we are stepping but
+		 * tracer modifications to the registers may have rewound the
+		 * state machine.
+		 */
+		ptrace_report_syscall_exit(regs, 1);
+	}
+}
+
+#define arch_ptrace_report_syscall_exit arch_ptrace_report_syscall_exit
+
 #endif /* _ASM_ARM64_ENTRY_COMMON_H */
diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index 6eeb1e7b033f..9891b15da4c3 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -85,12 +85,9 @@ static inline int syscall_get_arch(struct task_struct *task)
 	return AUDIT_ARCH_AARCH64;
 }
 
-static inline bool has_syscall_work(unsigned long flags)
+static inline bool arch_syscall_is_vdso_sigreturn(struct pt_regs *regs)
 {
-	return unlikely(flags & _TIF_SYSCALL_WORK);
+	return false;
 }
 
-int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags);
-void syscall_exit_to_user_mode_prepare(struct pt_regs *regs);
-
 #endif /* __ASM_SYSCALL_H */
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 1114c1c3300a..543fdb00d713 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -43,6 +43,7 @@ struct thread_info {
 	void			*scs_sp;
 #endif
 	u32			cpu;
+	unsigned long		syscall_work;	/* SYSCALL_WORK_ flags */
 };
 
 #define thread_saved_pc(tsk)	\
@@ -64,11 +65,6 @@ void arch_setup_new_exec(void);
 #define TIF_UPROBE		4	/* uprobe breakpoint or singlestep */
 #define TIF_MTE_ASYNC_FAULT	5	/* MTE Asynchronous Tag Check Fault */
 #define TIF_NOTIFY_SIGNAL	6	/* signal notifications exist */
-#define TIF_SYSCALL_TRACE	8	/* syscall trace active */
-#define TIF_SYSCALL_AUDIT	9	/* syscall auditing */
-#define TIF_SYSCALL_TRACEPOINT	10	/* syscall tracepoint for ftrace */
-#define TIF_SECCOMP		11	/* syscall secure computing */
-#define TIF_SYSCALL_EMU		12	/* syscall emulation active */
 #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
 #define TIF_FREEZE		19
 #define TIF_RESTORE_SIGMASK	20
@@ -87,28 +83,13 @@ void arch_setup_new_exec(void);
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
 #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)
 #define _TIF_FOREIGN_FPSTATE	(1 << TIF_FOREIGN_FPSTATE)
-#define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
-#define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
-#define _TIF_SYSCALL_TRACEPOINT	(1 << TIF_SYSCALL_TRACEPOINT)
-#define _TIF_SECCOMP		(1 << TIF_SECCOMP)
-#define _TIF_SYSCALL_EMU	(1 << TIF_SYSCALL_EMU)
-#define _TIF_UPROBE		(1 << TIF_UPROBE)
-#define _TIF_SINGLESTEP		(1 << TIF_SINGLESTEP)
+#define _TIF_UPROBE		(1 << TIF_UPROBE)
 #define _TIF_32BIT		(1 << TIF_32BIT)
 #define _TIF_SVE		(1 << TIF_SVE)
 #define _TIF_MTE_ASYNC_FAULT	(1 << TIF_MTE_ASYNC_FAULT)
 #define _TIF_NOTIFY_SIGNAL	(1 << TIF_NOTIFY_SIGNAL)
 #define _TIF_TSC_SIGSEGV	(1 << TIF_TSC_SIGSEGV)
 
-#define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
-				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
-				 _TIF_UPROBE | _TIF_MTE_ASYNC_FAULT | \
-				 _TIF_NOTIFY_SIGNAL)
-
-#define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
-				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
-				 _TIF_SYSCALL_EMU)
-
 #ifdef CONFIG_SHADOW_CALL_STACK
 #define INIT_SCS							\
 	.scs_base	= init_shadow_call_stack,			\
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 460c09d03a73..95b70555a1a8 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -452,11 +452,18 @@ void user_enable_single_step(struct task_struct *task)
 
 	if (!test_and_set_ti_thread_flag(ti, TIF_SINGLESTEP))
 		set_regs_spsr_ss(task_pt_regs(task));
+
+	/*
+	 * Ensure that a trap is triggered once stepping out of a system
+	 * call prior to executing any user instruction.
+	 */
+	set_task_syscall_work(task, SYSCALL_EXIT_TRAP);
 }
 NOKPROBE_SYMBOL(user_enable_single_step);
 
 void user_disable_single_step(struct task_struct *task)
 {
 	clear_ti_thread_flag(task_thread_info(task), TIF_SINGLESTEP);
+	clear_task_syscall_work(task, SYSCALL_EXIT_TRAP);
 }
 NOKPROBE_SYMBOL(user_disable_single_step);
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 3a7a1eaca0a9..a09058b9b7fb 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -42,9 +42,6 @@
 #include
 #include
 
-#define CREATE_TRACE_POINTS
-#include
-
 struct pt_regs_offset {
 	const char	*name;
 	int		offset;
@@ -2293,137 +2290,6 @@ long arch_ptrace(struct task_struct *child, long request,
 	return ptrace_request(child, request, addr, data);
 }
 
-enum ptrace_syscall_dir {
-	PTRACE_SYSCALL_ENTER = 0,
-	PTRACE_SYSCALL_EXIT,
-};
-
-static int report_syscall_enter(struct pt_regs *regs)
-{
-	unsigned long saved_reg;
-	int regno, ret;
-
-	/*
-	 * We have some ABI weirdness here in the way that we handle syscall
-	 * exit stops because we indicate whether or not the stop has been
-	 * signalled from syscall entry or syscall exit by clobbering a general
-	 * purpose register (ip/r12 for AArch32, x7 for AArch64) in the tracee
-	 * and restoring its old value after the stop. This means that:
-	 *
-	 * - Any writes by the tracer to this register during the stop are
-	 *   ignored/discarded.
-	 *
-	 * - The actual value of the register is not available during the stop,
-	 *   so the tracer cannot save it and restore it later.
-	 *
-	 * - Syscall stops behave differently to seccomp and pseudo-step traps
-	 *   (the latter do not nobble any registers).
-	 */
-	regno = (is_compat_task() ? 12 : 7);
-	saved_reg = regs->regs[regno];
-	regs->regs[regno] = PTRACE_SYSCALL_ENTER;
-
-	ret = ptrace_report_syscall_entry(regs);
-	if (ret)
-		forget_syscall(regs);
-
-	regs->regs[regno] = saved_reg;
-
-	return ret;
-}
-
-static void report_syscall_exit(struct pt_regs *regs)
-{
-	int regno;
-	unsigned long saved_reg;
-
-	/* See comment for report_syscall_enter() */
-	regno = (is_compat_task() ? 12 : 7);
-	saved_reg = regs->regs[regno];
-	regs->regs[regno] = PTRACE_SYSCALL_EXIT;
-
-	if (!test_thread_flag(TIF_SINGLESTEP)) {
-		ptrace_report_syscall_exit(regs, 0);
-		regs->regs[regno] = saved_reg;
-	} else {
-		regs->regs[regno] = saved_reg;
-
-		/*
-		 * Signal a pseudo-step exception since we are stepping but
-		 * tracer modifications to the registers may have rewound the
-		 * state machine.
-		 */
-		ptrace_report_syscall_exit(regs, 1);
-	}
-}
-
-static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
-{
-	if (unlikely(audit_context())) {
-		unsigned long args[6];
-
-		syscall_get_arguments(current, regs, args);
-		audit_syscall_entry(syscall, args[0], args[1], args[2], args[3]);
-	}
-
-}
-
-int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags)
-{
-	int ret;
-
-	if (flags & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) {
-		ret = report_syscall_enter(regs);
-		if (ret || (flags & _TIF_SYSCALL_EMU))
-			return NO_SYSCALL;
-	}
-
-	/* Do the secure computing after ptrace; failures should be fast. */
-	if (flags & _TIF_SECCOMP) {
-		ret = __secure_computing(NULL);
-		if (ret == -1L)
-			return NO_SYSCALL;
-	}
-
-	/* Either of the above might have changed the syscall number */
-	syscall = syscall_get_nr(current, regs);
-
-	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT)) {
-		trace_sys_enter(regs, syscall);
-
-		/*
-		 * Probes or BPF hooks in the tracepoint may have changed the
-		 * system call number as well.
-		 */
-		syscall = syscall_get_nr(current, regs);
-	}
-
-	syscall_enter_audit(regs, syscall);
-
-	return regs->syscallno;
-}
-
-static void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
-{
-	audit_syscall_exit(regs);
-
-	if (flags & _TIF_SYSCALL_TRACEPOINT)
-		trace_sys_exit(regs, syscall_get_return_value(current, regs));
-
-	if (flags & (_TIF_SYSCALL_TRACE | _TIF_SINGLESTEP))
-		report_syscall_exit(regs);
-}
-
-void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
-{
-	unsigned long flags = read_thread_flags();
-
-	rseq_syscall(regs);
-
-	if (has_syscall_work(flags) || flags & _TIF_SINGLESTEP)
-		syscall_trace_exit(regs, flags);
-}
-
 /*
  * SPSR_ELx bits which are always architecturally RES0 per ARM DDI 0487D.a.
  * We permit userspace to set SSBS (AArch64 bit 12, AArch32 bit 23) which is
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 84b6628647c7..6cc8fe19e6a0 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -8,8 +8,8 @@
 
 #include
 #include
+#include
 #include
-#include
 #include
 #include
 #include
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index a50db885fc34..5aa585111c4b 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -2,6 +2,7 @@
 
 #include
 #include
+#include
 #include
 #include
 #include
@@ -68,6 +69,7 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 			   const syscall_fn_t syscall_table[])
 {
+	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
 	unsigned long flags = read_thread_flags();
 
 	regs->orig_x0 = regs->regs[0];
@@ -101,7 +103,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 		return;
 	}
 
-	if (has_syscall_work(flags)) {
+	if (has_syscall_work(work)) {
 		/*
 		 * The de-facto standard way to skip a system call using ptrace
 		 * is to set the system call to -1 (NO_SYSCALL) and set x0 to a
@@ -119,7 +121,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 	 */
 	if (scno == NO_SYSCALL)
 		syscall_set_return_value(current, regs, -ENOSYS, 0);
-	scno = syscall_trace_enter(regs, regs->syscallno, flags);
+	scno = syscall_trace_enter(regs, regs->syscallno, work);
 	if (scno == NO_SYSCALL) {
 		syscall_exit_to_user_mode_prepare(regs);
 		return;
--
2.34.1
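The x7/r12 convention preserved across this conversion is visible from
the tracer side: during a ptrace syscall stop on arm64, the clobbered
register reads PTRACE_SYSCALL_ENTER (0) at entry stops and
PTRACE_SYSCALL_EXIT (1) at exit stops. A rough userspace sketch for an
AArch64 tracee already stopped by PTRACE_SYSCALL (error handling
elided; this is an illustration of the documented ABI, not part of the
series):

#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <elf.h>		/* NT_PRSTATUS */
#include <asm/ptrace.h>		/* struct user_pt_regs */

/* Print whether the current syscall stop is an entry or an exit stop. */
static void classify_syscall_stop(pid_t pid)
{
	struct user_pt_regs regs;
	struct iovec iov = { .iov_base = &regs, .iov_len = sizeof(regs) };

	if (ptrace(PTRACE_GETREGSET, pid, NT_PRSTATUS, &iov) == 0)
		printf("x7 = %llu: syscall %s stop\n",
		       (unsigned long long)regs.regs[7],
		       regs.regs[7] ? "exit" : "entry");
}

Note that, per the comment carried from ptrace.c into
<asm/entry-common.h> above, any value the tracer writes to x7 during
the stop is discarded when the saved register is restored.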