From nobody Wed Nov 27 23:32:06 2024
Message-Id: <20241007075055.219540785@infradead.org>
Date: Mon, 07 Oct 2024 09:46:10 +0200
From: Peter Zijlstra
To: bigeasy@linutronix.de, tglx@linutronix.de, mingo@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
    juri.lelli@redhat.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, vschneid@redhat.com, ankur.a.arora@oracle.com,
    efault@gmx.de
Subject: [PATCH 1/5] sched: Add TIF_NEED_RESCHED_LAZY infrastructure
References: <20241007074609.447006177@infradead.org>

Add the basic infrastructure to split the TIF_NEED_RESCHED bit in two.
Either bit will cause a resched on return-to-user, but only
TIF_NEED_RESCHED will drive IRQ preemption.

No behavioural change intended.

Suggested-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Sebastian Andrzej Siewior
---
 include/linux/entry-common.h |    3 ++-
 include/linux/entry-kvm.h    |    5 +++--
 include/linux/sched.h        |    3 ++-
 include/linux/thread_info.h  |   21 +++++++++++++++++----
 kernel/entry/common.c        |    2 +-
 kernel/entry/kvm.c           |    4 ++--
 kernel/sched/core.c          |   34 +++++++++++++++++++++-------------
 7 files changed, 48 insertions(+), 24 deletions(-)

--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -64,7 +64,8 @@
 
 #define EXIT_TO_USER_MODE_WORK						\
 	(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE |		\
-	 _TIF_NEED_RESCHED | _TIF_PATCH_PENDING | _TIF_NOTIFY_SIGNAL |	\
+	 _TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY |			\
+	 _TIF_PATCH_PENDING | _TIF_NOTIFY_SIGNAL |			\
	 ARCH_EXIT_TO_USER_MODE_WORK)
 
 /**
--- a/include/linux/entry-kvm.h
+++ b/include/linux/entry-kvm.h
@@ -17,8 +17,9 @@
 #endif
 
 #define XFER_TO_GUEST_MODE_WORK						\
-	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL |	\
-	 _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)
+	(_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY | _TIF_SIGPENDING |	\
+	 _TIF_NOTIFY_SIGNAL | _TIF_NOTIFY_RESUME |			\
+	 ARCH_XFER_TO_GUEST_MODE_WORK)
 
 struct kvm_vcpu;
 
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2002,7 +2002,8 @@ static inline void set_tsk_need_resched(
 
 static inline void clear_tsk_need_resched(struct task_struct *tsk)
 {
-	clear_tsk_thread_flag(tsk,TIF_NEED_RESCHED);
+	atomic_long_andnot(_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY,
+			   (atomic_long_t *)&task_thread_info(tsk)->flags);
 }
 
 static inline int test_tsk_need_resched(struct task_struct *tsk)
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -59,6 +59,14 @@ enum syscall_work_bit {
 
 #include <asm/thread_info.h>
 
+#ifndef TIF_NEED_RESCHED_LAZY
+#ifdef CONFIG_ARCH_HAS_PREEMPT_LAZY
+#error Inconsistent PREEMPT_LAZY
+#endif
+#define TIF_NEED_RESCHED_LAZY TIF_NEED_RESCHED
+#define _TIF_NEED_RESCHED_LAZY _TIF_NEED_RESCHED
+#endif
+
 #ifdef __KERNEL__
 
 #ifndef arch_set_restart_data
@@ -179,22 +187,27 @@ static __always_inline unsigned long rea
 
 #ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H
 
-static __always_inline bool tif_need_resched(void)
+static __always_inline bool tif_test_bit(int bit)
 {
-	return arch_test_bit(TIF_NEED_RESCHED,
+	return arch_test_bit(bit,
			     (unsigned long *)(&current_thread_info()->flags));
 }
 
 #else
 
-static __always_inline bool tif_need_resched(void)
+static __always_inline bool tif_test_bit(int bit)
 {
-	return test_bit(TIF_NEED_RESCHED,
+	return test_bit(bit,
			(unsigned long *)(&current_thread_info()->flags));
 }
 
 #endif /* _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H */
 
+static __always_inline bool tif_need_resched(void)
+{
+	return tif_test_bit(TIF_NEED_RESCHED);
+}
+
 #ifndef CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES
 static inline int arch_within_stack_frames(const void * const stack,
					    const void * const stackend,
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -98,7 +98,7 @@ __always_inline unsigned long exit_to_us
 
		local_irq_enable_exit_to_user(ti_work);
 
-		if (ti_work & _TIF_NEED_RESCHED)
+		if (ti_work & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY))
			schedule();
 
		if (ti_work & _TIF_UPROBE)
--- a/kernel/entry/kvm.c
+++ b/kernel/entry/kvm.c
@@ -13,7 +13,7 @@ static int xfer_to_guest_mode_work(struc
			return -EINTR;
		}
 
-		if (ti_work & _TIF_NEED_RESCHED)
+		if (ti_work & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY))
			schedule();
 
		if (ti_work & _TIF_NOTIFY_RESUME)
@@ -24,7 +24,7 @@ static int xfer_to_guest_mode_work(struc
			return ret;
 
		ti_work = read_thread_flags();
-	} while (ti_work & XFER_TO_GUEST_MODE_WORK || need_resched());
+	} while (ti_work & XFER_TO_GUEST_MODE_WORK);
	return 0;
 }
 
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -936,10 +936,9 @@ static inline void hrtick_rq_init(struct
  * this avoids any races wrt polling state changes and thereby avoids
  * spurious IPIs.
  */
-static inline bool set_nr_and_not_polling(struct task_struct *p)
+static inline bool set_nr_and_not_polling(struct thread_info *ti, int tif)
 {
-	struct thread_info *ti = task_thread_info(p);
-	return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
+	return !(fetch_or(&ti->flags, 1 << tif) & _TIF_POLLING_NRFLAG);
 }
 
 /*
@@ -964,9 +963,9 @@ static bool set_nr_if_polling(struct tas
 }
 
 #else
-static inline bool set_nr_and_not_polling(struct task_struct *p)
+static inline bool set_nr_and_not_polling(struct thread_info *ti, int tif)
 {
-	set_tsk_need_resched(p);
+	atomic_long_or(1 << tif, (atomic_long_t *)&ti->flags);
	return true;
 }
 
@@ -1071,28 +1070,37 @@ void wake_up_q(struct wake_q_head *head)
  * might also involve a cross-CPU call to trigger the scheduler on
  * the target CPU.
  */
-void resched_curr(struct rq *rq)
+static void __resched_curr(struct rq *rq, int tif)
 {
	struct task_struct *curr = rq->curr;
+	struct thread_info *cti = task_thread_info(curr);
	int cpu;
 
	lockdep_assert_rq_held(rq);
 
-	if (test_tsk_need_resched(curr))
+	if (cti->flags & ((1 << tif) | _TIF_NEED_RESCHED))
		return;
 
	cpu = cpu_of(rq);
 
	if (cpu == smp_processor_id()) {
-		set_tsk_need_resched(curr);
-		set_preempt_need_resched();
+		set_ti_thread_flag(cti, tif);
+		if (tif == TIF_NEED_RESCHED)
+			set_preempt_need_resched();
		return;
	}
 
-	if (set_nr_and_not_polling(curr))
-		smp_send_reschedule(cpu);
-	else
+	if (set_nr_and_not_polling(cti, tif)) {
+		if (tif == TIF_NEED_RESCHED)
+			smp_send_reschedule(cpu);
+	} else {
		trace_sched_wake_idle_without_ipi(cpu);
+	}
+}
+
+void resched_curr(struct rq *rq)
+{
+	__resched_curr(rq, TIF_NEED_RESCHED);
 }
 
 void resched_cpu(int cpu)
@@ -1187,7 +1195,7 @@ static void wake_up_idle_cpu(int cpu)
	 * and testing of the above solutions didn't appear to report
	 * much benefits.
	 */
-	if (set_nr_and_not_polling(rq->idle))
+	if (set_nr_and_not_polling(task_thread_info(rq->idle), TIF_NEED_RESCHED))
		smp_send_reschedule(cpu);
	else
		trace_sched_wake_idle_without_ipi(cpu);
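
A standalone sketch of the intended semantics (illustrative C only, not
part of the patch; the bit numbers are placeholders):

#include <stdbool.h>

#define TIF_NEED_RESCHED	3
#define TIF_NEED_RESCHED_LAZY	4
#define _TIF_NEED_RESCHED	(1UL << TIF_NEED_RESCHED)
#define _TIF_NEED_RESCHED_LAZY	(1UL << TIF_NEED_RESCHED_LAZY)

/* Return-to-user reschedules on either bit ... */
static bool want_resched_user(unsigned long ti_flags)
{
	return ti_flags & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY);
}

/* ... while IRQ-return (kernel) preemption honours only the full bit. */
static bool want_resched_irq(unsigned long ti_flags)
{
	return ti_flags & _TIF_NEED_RESCHED;
}
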
From nobody Wed Nov 27 23:32:06 2024
Message-Id: <20241007075055.331243614@infradead.org>
Date: Mon, 07 Oct 2024 09:46:11 +0200
From: Peter Zijlstra
To: bigeasy@linutronix.de, tglx@linutronix.de, mingo@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
    juri.lelli@redhat.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, vschneid@redhat.com, ankur.a.arora@oracle.com,
    efault@gmx.de
Subject: [PATCH 2/5] sched: Add Lazy preemption model
References: <20241007074609.447006177@infradead.org>

Change fair to use resched_curr_lazy(), which, when the lazy
preemption model is selected, will set TIF_NEED_RESCHED_LAZY.

This LAZY bit will be promoted to the full NEED_RESCHED bit on tick.
As such, the average delay between setting LAZY and actually
rescheduling will be TICK_NSEC/2.

In short, Lazy preemption delays preemption for the fair class, but
functions as Full preemption for all the other classes, most notably
the realtime (RR/FIFO/DEADLINE) classes.

The goal is to bridge the performance gap with Voluntary, such that we
might eventually remove that option entirely.

Suggested-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Sebastian Andrzej Siewior
---
 include/linux/preempt.h |    8 +++++-
 kernel/Kconfig.preempt  |   15 ++++++++++
 kernel/sched/core.c     |   76 ++++++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/debug.c    |    5 +--
 kernel/sched/fair.c     |    6 +--
 kernel/sched/sched.h    |    1 
 6 files changed, 103 insertions(+), 8 deletions(-)

--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -486,6 +486,7 @@ DEFINE_LOCK_GUARD_0(migrate, migrate_dis
 extern bool preempt_model_none(void);
 extern bool preempt_model_voluntary(void);
 extern bool preempt_model_full(void);
+extern bool preempt_model_lazy(void);
 
 #else
 
@@ -502,6 +503,11 @@ static inline bool preempt_model_full(vo
	return IS_ENABLED(CONFIG_PREEMPT);
 }
 
+static inline bool preempt_model_lazy(void)
+{
+	return IS_ENABLED(CONFIG_PREEMPT_LAZY);
+}
+
 #endif
 
 static inline bool preempt_model_rt(void)
@@ -519,7 +525,7 @@ static inline bool preempt_model_rt(void
  */
 static inline bool preempt_model_preemptible(void)
 {
-	return preempt_model_full() || preempt_model_rt();
+	return preempt_model_full() || preempt_model_lazy() || preempt_model_rt();
 }
 
 #endif /* __LINUX_PREEMPT_H */
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -11,6 +11,9 @@ config PREEMPT_BUILD
	select PREEMPTION
	select UNINLINE_SPIN_UNLOCK if !ARCH_INLINE_SPIN_UNLOCK
 
+config ARCH_HAS_PREEMPT_LAZY
+	bool
+
 choice
	prompt "Preemption Model"
	default PREEMPT_NONE
@@ -67,6 +70,18 @@ config PREEMPT
	  embedded system with latency requirements in the milliseconds
	  range.
 
+config PREEMPT_LAZY
+	bool "Scheduler controlled preemption model"
+	depends on !ARCH_NO_PREEMPT
+	depends on ARCH_HAS_PREEMPT_LAZY
+	select PREEMPT_BUILD
+	help
+	  This option provides a scheduler driven preemption model that
+	  is fundamentally similar to full preemption, but is less
+	  eager to preempt SCHED_NORMAL tasks in an attempt to
+	  reduce lock holder preemption and recover some of the performance
+	  gains seen from using Voluntary preemption.
+
 config PREEMPT_RT
	bool "Fully Preemptible Kernel (Real-Time)"
	depends on EXPERT && ARCH_SUPPORTS_RT
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1078,6 +1078,9 @@ static void __resched_curr(struct rq *rq
 
	lockdep_assert_rq_held(rq);
 
+	if (is_idle_task(curr) && tif == TIF_NEED_RESCHED_LAZY)
+		tif = TIF_NEED_RESCHED;
+
	if (cti->flags & ((1 << tif) | _TIF_NEED_RESCHED))
		return;
 
@@ -1103,6 +1106,32 @@ void resched_curr(struct rq *rq)
	__resched_curr(rq, TIF_NEED_RESCHED);
 }
 
+#ifdef CONFIG_PREEMPT_DYNAMIC
+static DEFINE_STATIC_KEY_FALSE(sk_dynamic_preempt_lazy);
+static __always_inline bool dynamic_preempt_lazy(void)
+{
+	return static_branch_unlikely(&sk_dynamic_preempt_lazy);
+}
+#else
+static __always_inline bool dynamic_preempt_lazy(void)
+{
+	return IS_ENABLED(CONFIG_PREEMPT_LAZY);
+}
+#endif
+
+static __always_inline int tif_need_resched_lazy(void)
+{
+	if (dynamic_preempt_lazy())
+		return TIF_NEED_RESCHED_LAZY;
+
+	return TIF_NEED_RESCHED;
+}
+
+void resched_curr_lazy(struct rq *rq)
+{
+	__resched_curr(rq, tif_need_resched_lazy());
+}
+
 void resched_cpu(int cpu)
 {
	struct rq *rq = cpu_rq(cpu);
@@ -5598,6 +5627,10 @@ void sched_tick(void)
	update_rq_clock(rq);
	hw_pressure = arch_scale_hw_pressure(cpu_of(rq));
	update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure);
+
+	if (dynamic_preempt_lazy() && tif_test_bit(TIF_NEED_RESCHED_LAZY))
+		resched_curr(rq);
+
	curr->sched_class->task_tick(rq, curr, 0);
	if (sched_feat(LATENCY_WARN))
		resched_latency = cpu_resched_latency(rq);
@@ -7334,6 +7367,7 @@ EXPORT_SYMBOL(__cond_resched_rwlock_writ
  *   preempt_schedule           <- NOP
  *   preempt_schedule_notrace   <- NOP
  *   irqentry_exit_cond_resched <- NOP
+ *   dynamic_preempt_lazy       <- false
  *
  * VOLUNTARY:
  *   cond_resched               <- __cond_resched
@@ -7341,6 +7375,7 @@ EXPORT_SYMBOL(__cond_resched_rwlock_writ
  *   preempt_schedule           <- NOP
  *   preempt_schedule_notrace   <- NOP
  *   irqentry_exit_cond_resched <- NOP
+ *   dynamic_preempt_lazy       <- false
  *
  * FULL:
  *   cond_resched               <- RET0
@@ -7348,6 +7383,15 @@ EXPORT_SYMBOL(__cond_resched_rwlock_writ
  *   preempt_schedule           <- preempt_schedule
  *   preempt_schedule_notrace   <- preempt_schedule_notrace
  *   irqentry_exit_cond_resched <- irqentry_exit_cond_resched
+ *   dynamic_preempt_lazy       <- false
+ *
+ * LAZY:
+ *   cond_resched               <- RET0
+ *   might_resched              <- RET0
+ *   preempt_schedule           <- preempt_schedule
+ *   preempt_schedule_notrace   <- preempt_schedule_notrace
+ *   irqentry_exit_cond_resched <- irqentry_exit_cond_resched
+ *   dynamic_preempt_lazy       <- true
  */
 
 enum {
@@ -7355,6 +7399,7 @@
	preempt_dynamic_none,
	preempt_dynamic_voluntary,
	preempt_dynamic_full,
+	preempt_dynamic_lazy,
 };
 
 int preempt_dynamic_mode = preempt_dynamic_undefined;
@@ -7370,15 +7415,23 @@ int sched_dynamic_mode(const char *str)
	if (!strcmp(str, "full"))
		return preempt_dynamic_full;
 
+#ifdef CONFIG_ARCH_HAS_PREEMPT_LAZY
+	if (!strcmp(str, "lazy"))
+		return preempt_dynamic_lazy;
+#endif
+
	return -EINVAL;
 }
 
+#define preempt_dynamic_key_enable(f)	static_key_enable(&sk_dynamic_##f.key)
+#define preempt_dynamic_key_disable(f)	static_key_disable(&sk_dynamic_##f.key)
+
 #if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
 #define preempt_dynamic_enable(f)	static_call_update(f, f##_dynamic_enabled)
 #define preempt_dynamic_disable(f)	static_call_update(f, f##_dynamic_disabled)
 #elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
-#define preempt_dynamic_enable(f)	static_key_enable(&sk_dynamic_##f.key)
-#define preempt_dynamic_disable(f)	static_key_disable(&sk_dynamic_##f.key)
+#define preempt_dynamic_enable(f)	preempt_dynamic_key_enable(f)
+#define preempt_dynamic_disable(f)	preempt_dynamic_key_disable(f)
 #else
 #error "Unsupported PREEMPT_DYNAMIC mechanism"
 #endif
@@ -7398,6 +7451,7 @@ static void __sched_dynamic_update(int m
	preempt_dynamic_enable(preempt_schedule);
	preempt_dynamic_enable(preempt_schedule_notrace);
	preempt_dynamic_enable(irqentry_exit_cond_resched);
+	preempt_dynamic_key_disable(preempt_lazy);
 
	switch (mode) {
	case preempt_dynamic_none:
@@ -7407,6 +7461,7 @@ static void __sched_dynamic_update(int m
		preempt_dynamic_disable(preempt_schedule);
		preempt_dynamic_disable(preempt_schedule_notrace);
		preempt_dynamic_disable(irqentry_exit_cond_resched);
+		preempt_dynamic_key_disable(preempt_lazy);
		if (mode != preempt_dynamic_mode)
			pr_info("Dynamic Preempt: none\n");
		break;
@@ -7418,6 +7473,7 @@ static void __sched_dynamic_update(int m
		preempt_dynamic_disable(preempt_schedule);
		preempt_dynamic_disable(preempt_schedule_notrace);
		preempt_dynamic_disable(irqentry_exit_cond_resched);
+		preempt_dynamic_key_disable(preempt_lazy);
		if (mode != preempt_dynamic_mode)
			pr_info("Dynamic Preempt: voluntary\n");
		break;
@@ -7429,9 +7485,22 @@ static void __sched_dynamic_update(int m
		preempt_dynamic_enable(preempt_schedule);
		preempt_dynamic_enable(preempt_schedule_notrace);
		preempt_dynamic_enable(irqentry_exit_cond_resched);
+		preempt_dynamic_key_disable(preempt_lazy);
		if (mode != preempt_dynamic_mode)
			pr_info("Dynamic Preempt: full\n");
		break;
+
+	case preempt_dynamic_lazy:
+		if (!klp_override)
+			preempt_dynamic_disable(cond_resched);
+		preempt_dynamic_disable(might_resched);
+		preempt_dynamic_enable(preempt_schedule);
+		preempt_dynamic_enable(preempt_schedule_notrace);
+		preempt_dynamic_enable(irqentry_exit_cond_resched);
+		preempt_dynamic_key_enable(preempt_lazy);
+		if (mode != preempt_dynamic_mode)
+			pr_info("Dynamic Preempt: lazy\n");
+		break;
	}
 
	preempt_dynamic_mode = mode;
@@ -7494,6 +7563,8 @@ static void __init preempt_dynamic_init(
		sched_dynamic_update(preempt_dynamic_none);
	} else if (IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY)) {
		sched_dynamic_update(preempt_dynamic_voluntary);
+	} else if (IS_ENABLED(CONFIG_PREEMPT_LAZY)) {
+		sched_dynamic_update(preempt_dynamic_lazy);
	} else {
		/* Default static call setting, nothing to do */
		WARN_ON_ONCE(!IS_ENABLED(CONFIG_PREEMPT));
@@ -7514,6 +7585,7 @@ static void __init preempt_dynamic_init(
 PREEMPT_MODEL_ACCESSOR(none);
 PREEMPT_MODEL_ACCESSOR(voluntary);
 PREEMPT_MODEL_ACCESSOR(full);
+PREEMPT_MODEL_ACCESSOR(lazy);
 
 #else /* !CONFIG_PREEMPT_DYNAMIC: */
 
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -245,11 +245,12 @@ static ssize_t sched_dynamic_write(struc
 static int sched_dynamic_show(struct seq_file *m, void *v)
 {
	static const char * preempt_modes[] = {
-		"none", "voluntary", "full"
+		"none", "voluntary", "full", "lazy",
	};
+	int j = ARRAY_SIZE(preempt_modes) - !IS_ENABLED(CONFIG_ARCH_HAS_PREEMPT_LAZY);
	int i;
 
-	for (i = 0; i < ARRAY_SIZE(preempt_modes); i++) {
+	for (i = 0; i < j; i++) {
		if (preempt_dynamic_mode == i)
			seq_puts(m, "(");
		seq_puts(m, preempt_modes[i]);
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1251,7 +1251,7 @@ static void update_curr(struct cfs_rq *c
		return;
 
	if (resched || did_preempt_short(cfs_rq, curr)) {
-		resched_curr(rq);
+		resched_curr_lazy(rq);
		clear_buddies(cfs_rq, curr);
	}
 }
@@ -5677,7 +5677,7 @@ entity_tick(struct cfs_rq *cfs_rq, struc
	 * validating it and just reschedule.
	 */
	if (queued) {
-		resched_curr(rq_of(cfs_rq));
+		resched_curr_lazy(rq_of(cfs_rq));
		return;
	}
	/*
@@ -8832,7 +8832,7 @@ static void check_preempt_wakeup_fair(st
		return;
 
 preempt:
-	resched_curr(rq);
+	resched_curr_lazy(rq);
 }
 
 static struct task_struct *pick_task_fair(struct rq *rq)
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2692,6 +2692,7 @@ extern void init_sched_rt_class(void);
 extern void init_sched_fair_class(void);
 
 extern void resched_curr(struct rq *rq);
+extern void resched_curr_lazy(struct rq *rq);
 extern void resched_cpu(int cpu);
 
 extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime);
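
The lazy machinery above reduces to a two-step state machine; a minimal
standalone sketch (illustrative only, the enum and helpers are made up):

/* With HZ=1000, TICK_NSEC = 1,000,000 ns, so a request that only sets
 * LAZY waits on average TICK_NSEC/2 = 0.5 ms before the tick promotes
 * it and the actual reschedule happens.
 */
enum resched_state { RESCHED_NONE, RESCHED_LAZY, RESCHED_FULL };

/* resched_curr_lazy(): the fair class asks politely. */
static enum resched_state on_fair_preempt(enum resched_state s)
{
	return s == RESCHED_NONE ? RESCHED_LAZY : s;
}

/* sched_tick(): a pending LAZY request is promoted to the real thing. */
static enum resched_state on_tick(enum resched_state s)
{
	return s == RESCHED_LAZY ? RESCHED_FULL : s;
}
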
From nobody Wed Nov 27 23:32:06 2024
Message-Id: <20241007075055.441622332@infradead.org>
Date: Mon, 07 Oct 2024 09:46:12 +0200
From: Peter Zijlstra
To: bigeasy@linutronix.de, tglx@linutronix.de, mingo@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
    juri.lelli@redhat.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, vschneid@redhat.com, ankur.a.arora@oracle.com,
    efault@gmx.de
Subject: [PATCH 3/5] sched: Enable PREEMPT_DYNAMIC for PREEMPT_RT
References: <20241007074609.447006177@infradead.org>

In order to enable PREEMPT_DYNAMIC for PREEMPT_RT, remove PREEMPT_RT
from the 'Preemption Model' choice. Strictly speaking, PREEMPT_RT is
not a change in how preemption works, but rather it makes a ton more
code preemptible.

Notably, take away the NONE and VOLUNTARY options for PREEMPT_RT; they
make no sense (but are technically possible).

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Sebastian Andrzej Siewior
---
 kernel/Kconfig.preempt |   12 +++++++-----
 kernel/sched/core.c    |    2 ++
 kernel/sched/debug.c   |    4 ++--
 3 files changed, 11 insertions(+), 7 deletions(-)

--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -20,6 +20,7 @@ choice
 
 config PREEMPT_NONE
	bool "No Forced Preemption (Server)"
+	depends on !PREEMPT_RT
	select PREEMPT_NONE_BUILD if !PREEMPT_DYNAMIC
	help
	  This is the traditional Linux preemption model, geared towards
@@ -35,6 +36,7 @@ config PREEMPT_NONE
 config PREEMPT_VOLUNTARY
	bool "Voluntary Kernel Preemption (Desktop)"
	depends on !ARCH_NO_PREEMPT
+	depends on !PREEMPT_RT
	select PREEMPT_VOLUNTARY_BUILD if !PREEMPT_DYNAMIC
	help
	  This option reduces the latency of the kernel by adding more
@@ -54,7 +56,7 @@ config PREEMPT_VOLUNTARY
 config PREEMPT
	bool "Preemptible Kernel (Low-Latency Desktop)"
	depends on !ARCH_NO_PREEMPT
-	select PREEMPT_BUILD
+	select PREEMPT_BUILD if !PREEMPT_DYNAMIC
	help
	  This option reduces the latency of the kernel by making
	  all kernel code (that is not executing in a critical section)
@@ -74,7 +76,7 @@ config PREEMPT_LAZY
	bool "Scheduler controlled preemption model"
	depends on !ARCH_NO_PREEMPT
	depends on ARCH_HAS_PREEMPT_LAZY
-	select PREEMPT_BUILD
+	select PREEMPT_BUILD if !PREEMPT_DYNAMIC
	help
	  This option provides a scheduler driven preemption model that
	  is fundamentally similar to full preemption, but is less
@@ -82,6 +84,8 @@ config PREEMPT_LAZY
	  reduce lock holder preemption and recover some of the performance
	  gains seen from using Voluntary preemption.
 
+endchoice
+
 config PREEMPT_RT
	bool "Fully Preemptible Kernel (Real-Time)"
	depends on EXPERT && ARCH_SUPPORTS_RT
@@ -99,8 +103,6 @@ config PREEMPT_RT
	  Select this if you are building a kernel for systems which
	  require real-time guarantees.
 
-endchoice
-
 config PREEMPT_COUNT
	bool
 
@@ -110,7 +112,7 @@ config PREEMPTION
 
 config PREEMPT_DYNAMIC
	bool "Preemption behaviour defined on boot"
-	depends on HAVE_PREEMPT_DYNAMIC && !PREEMPT_RT
+	depends on HAVE_PREEMPT_DYNAMIC
	select JUMP_LABEL if HAVE_PREEMPT_DYNAMIC_KEY
	select PREEMPT_BUILD
	default y if HAVE_PREEMPT_DYNAMIC_CALL
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7406,11 +7406,13 @@ int preempt_dynamic_mode = preempt_dynam
 
 int sched_dynamic_mode(const char *str)
 {
+#ifndef CONFIG_PREEMPT_RT
	if (!strcmp(str, "none"))
		return preempt_dynamic_none;
 
	if (!strcmp(str, "voluntary"))
		return preempt_dynamic_voluntary;
+#endif
 
	if (!strcmp(str, "full"))
		return preempt_dynamic_full;
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -248,9 +248,9 @@ static int sched_dynamic_show(struct seq
		"none", "voluntary", "full", "lazy",
	};
	int j = ARRAY_SIZE(preempt_modes) - !IS_ENABLED(CONFIG_ARCH_HAS_PREEMPT_LAZY);
-	int i;
+	int i = IS_ENABLED(CONFIG_PREEMPT_RT) * 2;
 
-	for (i = 0; i < j; i++) {
+	for (; i < j; i++) {
		if (preempt_dynamic_mode == i)
			seq_puts(m, "(");
		seq_puts(m, preempt_modes[i]);
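
The index arithmetic in sched_dynamic_show() is just a window over the
mode table; the same logic in isolation (standalone sketch, CONFIG_
values faked as plain ints):

#include <stdio.h>

#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

int main(void)
{
	static const char *preempt_modes[] = {
		"none", "voluntary", "full", "lazy",
	};
	int rt = 1;		/* pretend CONFIG_PREEMPT_RT=y */
	int has_lazy = 1;	/* pretend CONFIG_ARCH_HAS_PREEMPT_LAZY=y */
	int i = rt * 2;		/* RT hides none/voluntary */
	int j = ARRAY_SIZE(preempt_modes) - !has_lazy;

	for (; i < j; i++)
		printf("%s ", preempt_modes[i]);	/* prints: full lazy */
	printf("\n");
	return 0;
}
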
From nobody Wed Nov 27 23:32:06 2024
Message-Id: <20241007075055.555778919@infradead.org>
Date: Mon, 07 Oct 2024 09:46:13 +0200
From: Peter Zijlstra
To: bigeasy@linutronix.de, tglx@linutronix.de, mingo@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
    juri.lelli@redhat.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, vschneid@redhat.com, ankur.a.arora@oracle.com,
    efault@gmx.de
Subject: [PATCH 4/5] sched, x86: Enable Lazy preemption
References: <20241007074609.447006177@infradead.org>

Add the TIF bit and select the Kconfig symbol to make it go.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Sebastian Andrzej Siewior
---
 arch/x86/Kconfig                   |    1 +
 arch/x86/include/asm/thread_info.h |    6 ++++--
 2 files changed, 5 insertions(+), 2 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -93,6 +93,7 @@ config X86
	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
	select ARCH_HAS_PMEM_API		if X86_64
+	select ARCH_HAS_PREEMPT_LAZY
	select ARCH_HAS_PTE_DEVMAP		if X86_64
	select ARCH_HAS_PTE_SPECIAL
	select ARCH_HAS_HW_PTE_YOUNG
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -87,8 +87,9 @@ struct thread_info {
 #define TIF_NOTIFY_RESUME	1	/* callback before returning to user */
 #define TIF_SIGPENDING		2	/* signal pending */
 #define TIF_NEED_RESCHED	3	/* rescheduling necessary */
-#define TIF_SINGLESTEP		4	/* reenable singlestep on user return*/
-#define TIF_SSBD		5	/* Speculative store bypass disable */
+#define TIF_NEED_RESCHED_LAZY	4	/* rescheduling necessary */
+#define TIF_SINGLESTEP		5	/* reenable singlestep on user return*/
+#define TIF_SSBD		6	/* Speculative store bypass disable */
 #define TIF_SPEC_IB		9	/* Indirect branch speculation mitigation */
 #define TIF_SPEC_L1D_FLUSH	10	/* Flush L1D on mm switches (processes) */
 #define TIF_USER_RETURN_NOTIFY	11	/* notify kernel of userspace return */
@@ -110,6 +111,7 @@ struct thread_info {
 #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
+#define _TIF_NEED_RESCHED_LAZY	(1 << TIF_NEED_RESCHED_LAZY)
 #define _TIF_SINGLESTEP		(1 << TIF_SINGLESTEP)
 #define _TIF_SSBD		(1 << TIF_SSBD)
 #define _TIF_SPEC_IB		(1 << TIF_SPEC_IB)
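
Schematically, any other architecture opts in the same way: select the
Kconfig symbol and provide the two defines (hypothetical arch; the bit
number only has to be a free TIF slot):

/* arch/foo/Kconfig:	select ARCH_HAS_PREEMPT_LAZY	*/
/* arch/foo/include/asm/thread_info.h:			*/
#define TIF_NEED_RESCHED	2	/* rescheduling necessary */
#define TIF_NEED_RESCHED_LAZY	3	/* lazy rescheduling necessary */

#define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
#define _TIF_NEED_RESCHED_LAZY	(1 << TIF_NEED_RESCHED_LAZY)
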
From nobody Wed Nov 27 23:32:06 2024
Message-Id: <20241007075055.671722644@infradead.org>
Date: Mon, 07 Oct 2024 09:46:14 +0200
From: Peter Zijlstra
To: bigeasy@linutronix.de, tglx@linutronix.de, mingo@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
    juri.lelli@redhat.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, vschneid@redhat.com, ankur.a.arora@oracle.com,
    efault@gmx.de
Subject: [PATCH 5/5] sched: Add laziest preempt model
References: <20241007074609.447006177@infradead.org>

Much like LAZY, except lazier still. It will not promote LAZY to the
full NEED_RESCHED on tick, and competes with None for suckage.

(do we really want this?)

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Sebastian Andrzej Siewior
---
 include/linux/preempt.h |   10 ++++++++-
 kernel/Kconfig.preempt  |   12 +++++++++++
 kernel/sched/core.c     |   49 ++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/debug.c    |    4 +--
 4 files changed, 71 insertions(+), 4 deletions(-)

--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -487,6 +487,7 @@ extern bool preempt_model_none(void);
 extern bool preempt_model_voluntary(void);
 extern bool preempt_model_full(void);
 extern bool preempt_model_lazy(void);
+extern bool preempt_model_laziest(void);
 
 #else
 
@@ -507,6 +508,10 @@ static inline bool preempt_model_lazy(vo
 {
	return IS_ENABLED(CONFIG_PREEMPT_LAZY);
 }
+static inline bool preempt_model_laziest(void)
+{
+	return IS_ENABLED(CONFIG_PREEMPT_LAZIEST);
+}
 
 #endif
 
@@ -525,7 +530,10 @@ static inline bool preempt_model_rt(void
  */
 static inline bool preempt_model_preemptible(void)
 {
-	return preempt_model_full() || preempt_model_lazy() || preempt_model_rt();
+	return preempt_model_full() ||
+	       preempt_model_lazy() ||
+	       preempt_model_laziest() ||
+	       preempt_model_rt();
 }
 
 #endif /* __LINUX_PREEMPT_H */
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -84,6 +84,18 @@ config PREEMPT_LAZY
	  reduce lock holder preemption and recover some of the performance
	  gains seen from using Voluntary preemption.
 
+config PREEMPT_LAZIEST
+	bool "Scheduler controlled preemption model (laziest)"
+	depends on !ARCH_NO_PREEMPT
+	depends on ARCH_HAS_PREEMPT_LAZY
+	select PREEMPT_BUILD if !PREEMPT_DYNAMIC
+	help
+	  This option provides a scheduler driven preemption model that
+	  is fundamentally similar to full preemption, but is least
+	  eager to preempt SCHED_NORMAL tasks in an attempt to
+	  reduce lock holder preemption and recover some of the performance
+	  gains seen from using no preemption.
+
 endchoice
 
 config PREEMPT_RT
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1108,13 +1108,22 @@ void resched_curr(struct rq *rq)
 
 #ifdef CONFIG_PREEMPT_DYNAMIC
 static DEFINE_STATIC_KEY_FALSE(sk_dynamic_preempt_lazy);
+static DEFINE_STATIC_KEY_FALSE(sk_dynamic_preempt_promote);
 static __always_inline bool dynamic_preempt_lazy(void)
 {
	return static_branch_unlikely(&sk_dynamic_preempt_lazy);
 }
+static __always_inline bool dynamic_preempt_promote(void)
+{
+	return static_branch_unlikely(&sk_dynamic_preempt_promote);
+}
 #else
 static __always_inline bool dynamic_preempt_lazy(void)
 {
+	return IS_ENABLED(CONFIG_PREEMPT_LAZY) || IS_ENABLED(CONFIG_PREEMPT_LAZIEST);
+}
+static __always_inline bool dynamic_preempt_promote(void)
+{
	return IS_ENABLED(CONFIG_PREEMPT_LAZY);
 }
 #endif
@@ -5628,7 +5637,7 @@ void sched_tick(void)
	hw_pressure = arch_scale_hw_pressure(cpu_of(rq));
	update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure);
 
-	if (dynamic_preempt_lazy() && tif_test_bit(TIF_NEED_RESCHED_LAZY))
+	if (dynamic_preempt_promote() && tif_test_bit(TIF_NEED_RESCHED_LAZY))
		resched_curr(rq);
 
	curr->sched_class->task_tick(rq, curr, 0);
@@ -7368,6 +7377,7 @@ EXPORT_SYMBOL(__cond_resched_rwlock_writ
  *   preempt_schedule_notrace   <- NOP
  *   irqentry_exit_cond_resched <- NOP
  *   dynamic_preempt_lazy       <- false
+ *   dynamic_preempt_promote    <- false
  *
  * VOLUNTARY:
@@ -7376,6 +7386,7 @@ EXPORT_SYMBOL(__cond_resched_rwlock_writ
  *   preempt_schedule_notrace   <- NOP
  *   irqentry_exit_cond_resched <- NOP
  *   dynamic_preempt_lazy       <- false
+ *   dynamic_preempt_promote    <- false
  *
  * FULL:
@@ -7384,6 +7395,7 @@ EXPORT_SYMBOL(__cond_resched_rwlock_writ
  *   preempt_schedule_notrace   <- preempt_schedule_notrace
  *   irqentry_exit_cond_resched <- irqentry_exit_cond_resched
  *   dynamic_preempt_lazy       <- false
+ *   dynamic_preempt_promote    <- false
  *
  * LAZY:
@@ -7392,6 +7404,16 @@ EXPORT_SYMBOL(__cond_resched_rwlock_writ
  *   preempt_schedule_notrace   <- preempt_schedule_notrace
  *   irqentry_exit_cond_resched <- irqentry_exit_cond_resched
  *   dynamic_preempt_lazy       <- true
+ *   dynamic_preempt_promote    <- true
+ *
+ * LAZIEST:
+ *   cond_resched               <- RET0
+ *   might_resched              <- RET0
+ *   preempt_schedule           <- preempt_schedule
+ *   preempt_schedule_notrace   <- preempt_schedule_notrace
+ *   irqentry_exit_cond_resched <- irqentry_exit_cond_resched
+ *   dynamic_preempt_lazy       <- true
+ *   dynamic_preempt_promote    <- false
  */
 
 enum {
@@ -7400,6 +7422,7 @@
	preempt_dynamic_voluntary,
	preempt_dynamic_full,
	preempt_dynamic_lazy,
+	preempt_dynamic_laziest,
 };
 
 int preempt_dynamic_mode = preempt_dynamic_undefined;
@@ -7420,6 +7443,9 @@ int sched_dynamic_mode(const char *str)
 #ifdef CONFIG_ARCH_HAS_PREEMPT_LAZY
	if (!strcmp(str, "lazy"))
		return preempt_dynamic_lazy;
+
+	if (!strcmp(str, "laziest"))
+		return preempt_dynamic_laziest;
 #endif
 
	return -EINVAL;
@@ -7454,6 +7480,7 @@ static void __sched_dynamic_update(int m
	preempt_dynamic_enable(preempt_schedule_notrace);
	preempt_dynamic_enable(irqentry_exit_cond_resched);
	preempt_dynamic_key_disable(preempt_lazy);
+	preempt_dynamic_key_disable(preempt_promote);
 
	switch (mode) {
	case preempt_dynamic_none:
@@ -7464,6 +7491,7 @@ static void __sched_dynamic_update(int m
		preempt_dynamic_disable(preempt_schedule_notrace);
		preempt_dynamic_disable(irqentry_exit_cond_resched);
		preempt_dynamic_key_disable(preempt_lazy);
+		preempt_dynamic_key_disable(preempt_promote);
		if (mode != preempt_dynamic_mode)
			pr_info("Dynamic Preempt: none\n");
		break;
@@ -7476,6 +7504,7 @@ static void __sched_dynamic_update(int m
		preempt_dynamic_disable(preempt_schedule_notrace);
		preempt_dynamic_disable(irqentry_exit_cond_resched);
		preempt_dynamic_key_disable(preempt_lazy);
+		preempt_dynamic_key_disable(preempt_promote);
		if (mode != preempt_dynamic_mode)
			pr_info("Dynamic Preempt: voluntary\n");
		break;
@@ -7488,6 +7517,7 @@ static void __sched_dynamic_update(int m
		preempt_dynamic_enable(preempt_schedule_notrace);
		preempt_dynamic_enable(irqentry_exit_cond_resched);
		preempt_dynamic_key_disable(preempt_lazy);
+		preempt_dynamic_key_disable(preempt_promote);
		if (mode != preempt_dynamic_mode)
			pr_info("Dynamic Preempt: full\n");
		break;
@@ -7500,9 +7530,23 @@ static void __sched_dynamic_update(int m
		preempt_dynamic_enable(preempt_schedule_notrace);
		preempt_dynamic_enable(irqentry_exit_cond_resched);
		preempt_dynamic_key_enable(preempt_lazy);
+		preempt_dynamic_key_enable(preempt_promote);
		if (mode != preempt_dynamic_mode)
			pr_info("Dynamic Preempt: lazy\n");
		break;
+
+	case preempt_dynamic_laziest:
+		if (!klp_override)
+			preempt_dynamic_disable(cond_resched);
+		preempt_dynamic_disable(might_resched);
+		preempt_dynamic_enable(preempt_schedule);
+		preempt_dynamic_enable(preempt_schedule_notrace);
+		preempt_dynamic_enable(irqentry_exit_cond_resched);
+		preempt_dynamic_key_enable(preempt_lazy);
+		preempt_dynamic_key_disable(preempt_promote);
+		if (mode != preempt_dynamic_mode)
+			pr_info("Dynamic Preempt: laziest\n");
+		break;
	}
 
	preempt_dynamic_mode = mode;
@@ -7567,6 +7611,8 @@ static void __init preempt_dynamic_init(
		sched_dynamic_update(preempt_dynamic_voluntary);
	} else if (IS_ENABLED(CONFIG_PREEMPT_LAZY)) {
		sched_dynamic_update(preempt_dynamic_lazy);
+	} else if (IS_ENABLED(CONFIG_PREEMPT_LAZIEST)) {
+		sched_dynamic_update(preempt_dynamic_laziest);
	} else {
		/* Default static call setting, nothing to do */
		WARN_ON_ONCE(!IS_ENABLED(CONFIG_PREEMPT));
@@ -7588,6 +7634,7 @@ PREEMPT_MODEL_ACCESSOR(none);
 PREEMPT_MODEL_ACCESSOR(voluntary);
 PREEMPT_MODEL_ACCESSOR(full);
 PREEMPT_MODEL_ACCESSOR(lazy);
+PREEMPT_MODEL_ACCESSOR(laziest);
 
 #else /* !CONFIG_PREEMPT_DYNAMIC: */
 
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -245,9 +245,9 @@ static ssize_t sched_dynamic_write(struc
 static int sched_dynamic_show(struct seq_file *m, void *v)
 {
	static const char * preempt_modes[] = {
-		"none", "voluntary", "full", "lazy",
+		"none", "voluntary", "full", "lazy", "laziest",
	};
-	int j = ARRAY_SIZE(preempt_modes) - !IS_ENABLED(CONFIG_ARCH_HAS_PREEMPT_LAZY);
+	int j = ARRAY_SIZE(preempt_modes) - 2*!IS_ENABLED(CONFIG_ARCH_HAS_PREEMPT_LAZY);
	int i = IS_ENABLED(CONFIG_PREEMPT_RT) * 2;
 
	for (; i < j; i++) {
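
Taken together, the two static keys decompose the three lazy-ish
behaviours; a standalone sketch of the decision logic (illustrative
only, the helpers are made up):

#include <stdbool.h>

#define TIF_NEED_RESCHED	3
#define TIF_NEED_RESCHED_LAZY	4

/*
 *   model     dynamic_preempt_lazy   dynamic_preempt_promote
 *   full             false                  false
 *   lazy             true                   true
 *   laziest          true                   false
 */
static int fair_resched_tif(bool lazy)
{
	return lazy ? TIF_NEED_RESCHED_LAZY : TIF_NEED_RESCHED;
}

static bool tick_promotes(bool lazy_pending, bool promote)
{
	/* LAZIEST leaves the LAZY bit pending until return-to-user. */
	return lazy_pending && promote;
}
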