Ensure that kcov is disabled and enabled with the same call stack.
This will be relied on by subsequent patches for recording function
entry/exit records via kcov.
This patch should not affect compilation of normal kernels without KCOV
(though it changes "inline" to "__always_inline").
To: Ingo Molnar <mingo@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jann Horn <jannh@google.com>
---
kernel/sched/core.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b7f77c165a6e..c470f0a669ec 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5072,8 +5072,10 @@ static inline void kmap_local_sched_in(void)
*
* prepare_task_switch sets up locking and calls architecture specific
* hooks.
+ *
+ * Must be inlined for kcov_prepare_switch().
*/
-static inline void
+static __always_inline void
prepare_task_switch(struct rq *rq, struct task_struct *prev,
struct task_struct *next)
__must_hold(__rq_lockp(rq))
@@ -5149,7 +5151,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
tick_nohz_task_switch();
finish_lock_switch(rq);
finish_arch_post_lock_switch();
- kcov_finish_switch(current);
/*
* kmap_local_sched_out() is invoked with rq::lock held and
* interrupts disabled. There is no requirement for that, but the
@@ -5295,7 +5296,13 @@ context_switch(struct rq *rq, struct task_struct *prev,
switch_to(prev, next, prev);
barrier();
- return finish_task_switch(prev);
+ rq = finish_task_switch(prev);
+ /*
+ * This has to happen outside finish_task_switch() to ensure that
+ * entry/exit records are balanced.
+ */
+ kcov_finish_switch(current);
+ return rq;
}
/*
--
2.53.0.851.ga537e3e6e9-goog
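[Aside, not part of the patch: the balancing property the commit message
relies on can be illustrated with a minimal userspace sketch. The
enter()/leave()/prepare()/finish() names below are hypothetical stand-ins
for compiler-inserted tracing hooks and kcov_prepare_switch() /
kcov_finish_switch(), not the real KCOV API.]

```c
#include <assert.h>

/* Each traced function entry pushes, each exit pops; records emitted
 * between prepare() and finish() are suppressed. The stream only stays
 * balanced if both run at the same stack depth. */
static int depth;
static int tracing;

static void enter(void)   { if (tracing) depth++; }
static void leave(void)   { if (tracing) depth--; }
static void prepare(void) { tracing = 0; }
static void finish(void)  { tracing = 1; }

/* prepare() and finish() called from the same frame: the suppressed
 * window covers matching entries and exits. */
static int balanced_switch(void)
{
	depth = 0; tracing = 1;
	enter();        /* this frame's entry record */
	prepare();
	enter();        /* callee entry, suppressed */
	leave();        /* callee exit, suppressed */
	finish();
	leave();        /* this frame's exit record */
	return depth;   /* 0: stream is balanced */
}

/* finish() called one frame too deep: a callee's exit record is emitted
 * without its matching entry. */
static int unbalanced_switch(void)
{
	depth = 0; tracing = 1;
	enter();
	prepare();
	enter();        /* callee entry, suppressed */
	finish();       /* re-enabled inside the callee */
	leave();        /* callee exit, recorded without its entry */
	leave();
	return depth;   /* -1: an exit without a matching entry */
}
```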
On Wed, Mar 18, 2026 at 05:26:58PM +0100, Jann Horn wrote:
> Ensure that kcov is disabled and enabled with the same call stack.
> This will be relied on by subsequent patches for recording function
> entry/exit records via kcov.
>
> This patch should not affect compilation of normal kernels without KCOV
> (though it changes "inline" to "__always_inline").
>
> To: Ingo Molnar <mingo@redhat.com>
> To: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
> kernel/sched/core.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b7f77c165a6e..c470f0a669ec 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5072,8 +5072,10 @@ static inline void kmap_local_sched_in(void)
> *
> * prepare_task_switch sets up locking and calls architecture specific
> * hooks.
> + *
> + * Must be inlined for kcov_prepare_switch().
> */
> -static inline void
> +static __always_inline void
> prepare_task_switch(struct rq *rq, struct task_struct *prev,
> struct task_struct *next)
> __must_hold(__rq_lockp(rq))
> @@ -5149,7 +5151,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> tick_nohz_task_switch();
> finish_lock_switch(rq);
> finish_arch_post_lock_switch();
> - kcov_finish_switch(current);
> /*
> * kmap_local_sched_out() is invoked with rq::lock held and
> * interrupts disabled. There is no requirement for that, but the
> @@ -5295,7 +5296,13 @@ context_switch(struct rq *rq, struct task_struct *prev,
> switch_to(prev, next, prev);
> barrier();
>
> - return finish_task_switch(prev);
> + rq = finish_task_switch(prev);
> + /*
> + * This has to happen outside finish_task_switch() to ensure that
> + * entry/exit records are balanced.
> + */
That's not exactly right; the requirement is that kcov_prepare_switch()
and kcov_finish_switch() are called from the exact same frame.
The wording above "outside finish_task_switch" could be anywhere and
doesn't cover the relation to prepare_switch().
> + kcov_finish_switch(current);
> + return rq;
> }
That said, there was a patch that marked finish_task_switch() as
__always_inline too:
https://lkml.kernel.org/r/20260301083520.110969-4-qq570070308@gmail.com
Except I think that does a little too much for one patch.
Anyway, I'm a little divided on this. Perhaps the simplest and most
obvious way is something like so.
But what about compiler funnies like the various IPA optimizations that
can do partial clones and whatnot? That could result in violating this
constraint, no?
---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6e509e292f99..d9925220d51b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5135,7 +5135,6 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
struct task_struct *next)
__must_hold(__rq_lockp(rq))
{
- kcov_prepare_switch(prev);
sched_info_switch(rq, prev, next);
perf_event_task_sched_out(prev, next);
fire_sched_out_preempt_notifiers(prev, next);
@@ -5206,7 +5205,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
tick_nohz_task_switch();
finish_lock_switch(rq);
finish_arch_post_lock_switch();
- kcov_finish_switch(current);
/*
* kmap_local_sched_out() is invoked with rq::lock held and
* interrupts disabled. There is no requirement for that, but the
@@ -5294,6 +5292,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
struct task_struct *next, struct rq_flags *rf)
__releases(__rq_lockp(rq))
{
+ kcov_prepare_switch(prev);
prepare_task_switch(rq, prev, next);
/*
@@ -5352,7 +5351,13 @@ context_switch(struct rq *rq, struct task_struct *prev,
switch_to(prev, next, prev);
barrier();
- return finish_task_switch(prev);
+ rq = finish_task_switch(prev);
+ /*
+ * kcov_prepare_switch() above, and kcov_finish_switch() must be
+ * called from the same stack frame.
+ */
+ kcov_finish_switch(current);
+ return rq;
}
/*
On Fri, Mar 20, 2026 at 11:10 PM Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Mar 18, 2026 at 05:26:58PM +0100, Jann Horn wrote:
> > + /*
> > + * This has to happen outside finish_task_switch() to ensure that
> > + * entry/exit records are balanced.
> > + */
>
> That's not exactly right; the requirement is that kcov_prepare_switch()
> and kcov_finish_switch() are called from the exact same frame.
>
> The wording above "outside finish_task_switch" could be anywhere and
> doesn't cover the relation to prepare_switch().
>
> > + kcov_finish_switch(current);
> > + return rq;
> > }
>
> That said, there was a patch that marked finish_task_switch() as
> __always_inline too:
>
> https://lkml.kernel.org/r/20260301083520.110969-4-qq570070308@gmail.com
>
> Except I think that does a little too much for one patch.
>
> Anyway, I'm a little divided on this. Perhaps the simplest and most
> obvious way is something like so.
>
> But what about compiler funnies like the various IPA optimizations that
> can do partial clones and whatnot? That could result in violating this
> constraint, no?
Ah, good point, bah. And commit 0ed557aa8139 ("sched/core / kcov:
avoid kcov_area during task switch") explains that we also can't just
keep emitting kcov records throughout a context switch.
So options I have are:
1. In the scheduler: Try to put enough magic annotations on __schedule
to prevent any such optimizations. (Probably "noinline __noclone"?)
But that's kind of invasive; I assume the scheduler wouldn't want to
have that unconditionally on a core function because of a niche KCOV
usecase.
2. In KCOV (without touching the scheduler code): Keep track of the
current and minimum stack depth while record generation is suppressed,
and emit those values when record generation is reenabled. That's a
bit more complexity but keeps it contained within KCOV, and I guess a
way to suppress events might be useful for other stuff too.
I guess I'll probably go with option 2...
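[To make option 2 concrete, here is a rough userspace sketch; all names
are hypothetical and this is not the real KCOV interface. While records
are suppressed, only a relative depth counter and the minimum depth
reached are tracked; on re-enable, synthetic exit and entry records are
emitted so that consumers still see a balanced stream.]

```c
#include <stdbool.h>

static bool suppressed;
static int cur_depth;   /* depth relative to where suppression began */
static int min_depth;   /* lowest depth reached while suppressed */

/* Stand-ins for emitting real entry/exit records. */
static int emitted_entries, emitted_exits;

static void trace_enter(void)
{
	if (suppressed) {
		cur_depth++;
		return;
	}
	emitted_entries++;
}

static void trace_exit(void)
{
	if (suppressed) {
		cur_depth--;
		if (cur_depth < min_depth)
			min_depth = cur_depth;
		return;
	}
	emitted_exits++;
}

static void suppress_begin(void)
{
	suppressed = true;
	cur_depth = 0;
	min_depth = 0;
}

static void suppress_end(void)
{
	int i;

	suppressed = false;
	/* Exits taken below the frame where suppression started. */
	for (i = 0; i > min_depth; i--)
		emitted_exits++;
	/* Entries needed to climb back to the current frame. */
	for (i = min_depth; i < cur_depth; i++)
		emitted_entries++;
}
```

With this, the scheduler only needs to bracket the context switch with
suppress_begin()/suppress_end(); the exact frames they run in no longer
matter, since the depth bookkeeping reconstructs a balanced stream.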
On Tue, Mar 24, 2026 at 04:47:51PM +0100, Jann Horn wrote:
> > Anyway, I'm a little divided on this. Perhaps the simplest and most
> > obvious way is something like so.
> >
> > But what about compiler funnies like the various IPA optimizations that
> > can do partial clones and whatnot? That could result in violating this
> > constraint, no?
>
> Ah, good point, bah. And commit 0ed557aa8139 ("sched/core / kcov:
> avoid kcov_area during task switch") explains that we also can't just
> keep emitting kcov records throughout a context switch.
Oh, fun!
> So options I have are:
>
> 1. In the scheduler: Try to put enough magic annotations on __schedule
> to prevent any such optimizations. (Probably "noinline __noclone"?)
> But that's kind of invasive; I assume the scheduler wouldn't want to
> have that unconditionally on a core function because of a niche KCOV
> usecase.
So I noticed that clang doesn't even support __noclone, which is a bit
of a stumbling block there. But while going through the various function
attributes I did note __flatten, and I think we can use that.
But yeah, without then inhibiting cloning and the like this isn't going
to help. I found that linkers can also do cloning (as part of LTO I
guess).
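[For reference, a minimal standalone example of the attribute in
question; both GCC and clang accept __attribute__((flatten)) on a
definition, which asks the compiler to inline, where possible, every
call made from that function's body, without annotating each callee.]

```c
/* helper() is not itself marked inline in any way. */
static int helper(int x)
{
	return x * 2;
}

/* flatten on the caller pulls helper()'s body into caller()'s frame,
 * which is the effect wanted for __schedule(): one frame containing
 * both the prepare and finish sides. As noted above, this does not by
 * itself inhibit cloning. */
__attribute__((flatten))
static int caller(int x)
{
	return helper(x) + 1;
}
```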
> 2. In KCOV (without touching the scheduler code): Keep track of the
> current and minimum stack depth while record generation is suppressed,
> and emit those values when record generation is reenabled. That's a
> bit more complexity but keeps it contained within KCOV, and I guess a
> way to suppress events might be useful for other stuff too.
>
> I guess I'll probably go with option 2...
Yes, I suppose not relying on the compiler doing things exactly so is
the more robust approach.