When the kernel is compiled without preemption, eval_map_work_func()
(which calls trace_event_eval_update()) will not be preempted until it
has run to completion. This can be a problem because if another CPU
calls stop_machine(), that call has to wait for eval_map_work_func()
to finish executing in the workqueue before the stop_machine() task
can be scheduled. This problem was observed on an SMP system at boot
time, when the CPU running the initcalls executed
clocksource_done_booting(), which ultimately calls stop_machine(). We
observed a 1 second delay because one CPU was executing
eval_map_work_func() and was not preempted by the stop_machine() task.

Adding a call to schedule() in trace_event_eval_update() lets other
tasks run, so the eval map update keeps working asynchronously as
before without blocking any pending task at boot time.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
---
kernel/trace/trace_events.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 91951d038ba4..dbdf57a081c0 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2770,6 +2770,7 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
 				update_event_fields(call, map[i]);
 			}
 		}
+		schedule();
 	}
 	up_write(&trace_event_sem);
 }
--
2.40.1
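
For context on why this code runs in a workqueue in the first place: the eval
map update is deferred at boot so that the initcalls do not have to wait for
it. The following is a simplified sketch, loosely based on the kernel's tracing
code; the helper names, the error handling and the exact call chain are
illustrative assumptions rather than verbatim upstream code.

/*
 * Simplified sketch: defer the eval map update to a workqueue at boot.
 * __start_ftrace_eval_maps/__stop_ftrace_eval_maps are linker-provided
 * section boundaries; names and details here are illustrative.
 */
static void eval_map_work_func(struct work_struct *work)
{
	int len;

	len = __stop_ftrace_eval_maps - __start_ftrace_eval_maps;
	/*
	 * On a non-preemptible kernel this loop runs to completion on one
	 * CPU, which is what delays a concurrent stop_machine() at boot.
	 */
	trace_event_eval_update(__start_ftrace_eval_maps, len);
}

static struct workqueue_struct *eval_map_wq;
static DECLARE_WORK(eval_map_work, eval_map_work_func);

static int __init trace_eval_init(void)
{
	eval_map_wq = alloc_workqueue("eval_map_wq", WQ_UNBOUND, 0);
	if (!eval_map_wq) {
		/* Fall back to doing the update synchronously. */
		eval_map_work_func(&eval_map_work);
		return 0;
	}

	/* Deferred so boot continues while the update runs asynchronously. */
	queue_work(eval_map_wq, &eval_map_work);
	return 0;
}
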
On Fri, 29 Sep 2023 16:13:48 +0200 Clément Léger <cleger@rivosinc.com> wrote:

> @@ -2770,6 +2770,7 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
> 				update_event_fields(call, map[i]);
> 			}
> 		}
> +		schedule();

The proper answer to this is "cond_resched()" but still, there's going
to be work to get rid of all that soon [1]. But I'll take a cond_resched()
now until that is implemented.

-- Steve

> 	}
> 	up_write(&trace_event_sem);
> }

[1] https://lore.kernel.org/all/87cyyfxd4k.ffs@tglx/
On 29/09/2023 17:06, Steven Rostedt wrote:
> On Fri, 29 Sep 2023 16:13:48 +0200
> Clément Léger <cleger@rivosinc.com> wrote:
>
>> @@ -2770,6 +2770,7 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
>> 				update_event_fields(call, map[i]);
>> 			}
>> 		}
>> +		schedule();
>
> The proper answer to this is "cond_resched()" but still, there's going
> to be work to get rid of all that soon [1]. But I'll take a cond_resched()
> now until that is implemented.

Hi Steven,

Thanks for the information, I'll update the patch and send a V2.

Clément

> -- Steve
>
> [1] https://lore.kernel.org/all/87cyyfxd4k.ffs@tglx/
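
The v2 itself is not shown in this thread. Assuming the only change requested
is swapping schedule() for cond_resched(), the updated hunk would presumably
look like the following (a sketch, not the actual v2 posting):

@@ -2770,6 +2770,7 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
 				update_event_fields(call, map[i]);
 			}
 		}
+		cond_resched();
 	}
 	up_write(&trace_event_sem);
 }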