The init_blk_tracer() function causes significant boot delay because it
waits for the trace_event_sem lock held by trace_event_update_all().
Specifically, its child function register_trace_event() requires this
lock, which is held for an extended period during boot.

To resolve this, when the trace_async_init parameter is enabled, the
body of init_blk_tracer() is moved to the trace_init_wq workqueue,
allowing it to run asynchronously and preventing it from blocking the
main boot thread.
Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
---
kernel/trace/blktrace.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index d031c8d80be4..56c7270ec447 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -1832,7 +1832,9 @@ static struct trace_event trace_blk_event = {
.funcs = &trace_blk_event_funcs,
};

-static int __init init_blk_tracer(void)
+static struct work_struct blktrace_works __initdata;
+
+static int __init __init_blk_tracer(void)
{
if (!register_trace_event(&trace_blk_event)) {
pr_warn("Warning: could not register block events\n");
@@ -1852,6 +1854,25 @@ static int __init init_blk_tracer(void)
return 0;
}

+static void __init blktrace_works_func(struct work_struct *work)
+{
+ __init_blk_tracer();
+}
+
+static int __init init_blk_tracer(void)
+{
+ int ret = 0;
+
+ if (trace_init_wq && trace_async_init) {
+ INIT_WORK(&blktrace_works, blktrace_works_func);
+ queue_work(trace_init_wq, &blktrace_works);
+ } else {
+ ret = __init_blk_tracer();
+ }
+
+ return ret;
+}
+
device_initcall(init_blk_tracer);

static int blk_trace_remove_queue(struct request_queue *q)
--
2.25.1

Jens,
Can you give me an acked-by on this patch and I can take the series through
my tree.
Or perhaps this doesn't even need to test the trace_async_init flag and can
always do the work queue? Does blk_trace ever do tracing at boot up? That
is, before user space starts?
Thanks,
-- Steve
On Wed, 28 Jan 2026 20:55:54 +0800
Yaxiong Tian <tianyaxiong@kylinos.cn> wrote:
> The init_blk_tracer() function causes significant boot delay because it
> waits for the trace_event_sem lock held by trace_event_update_all().
> Specifically, its child function register_trace_event() requires this
> lock, which is held for an extended period during boot.
>
> To resolve this, when the trace_async_init parameter is enabled, the
> body of init_blk_tracer() is moved to the trace_init_wq workqueue,
> allowing it to run asynchronously and preventing it from blocking the
> main boot thread.
>
> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
> ---
> kernel/trace/blktrace.c | 23 ++++++++++++++++++++++-
> 1 file changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
> index d031c8d80be4..56c7270ec447 100644
> --- a/kernel/trace/blktrace.c
> +++ b/kernel/trace/blktrace.c
> @@ -1832,7 +1832,9 @@ static struct trace_event trace_blk_event = {
> .funcs = &trace_blk_event_funcs,
> };
>
> -static int __init init_blk_tracer(void)
> +static struct work_struct blktrace_works __initdata;
> +
> +static int __init __init_blk_tracer(void)
> {
> if (!register_trace_event(&trace_blk_event)) {
> pr_warn("Warning: could not register block events\n");
> @@ -1852,6 +1854,25 @@ static int __init init_blk_tracer(void)
> return 0;
> }
>
> +static void __init blktrace_works_func(struct work_struct *work)
> +{
> + __init_blk_tracer();
> +}
> +
> +static int __init init_blk_tracer(void)
> +{
> + int ret = 0;
> +
> + if (trace_init_wq && trace_async_init) {
> + INIT_WORK(&blktrace_works, blktrace_works_func);
> + queue_work(trace_init_wq, &blktrace_works);
> + } else {
> + ret = __init_blk_tracer();
> + }
> +
> + return ret;
> +}
> +
> device_initcall(init_blk_tracer);
>
> static int blk_trace_remove_queue(struct request_queue *q)

On Jan 28, 2026, at 5:40 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
>
> Jens,
>
> Can you give me an acked-by on this patch and I can take the series through
> my tree.

On phone, hope this works:

Acked-by: Jens Axboe <axboe@kernel.dk>

> Or perhaps this doesn't even need to test the trace_async_init flag and can
> always do the work queue? Does blk_trace ever do tracing at boot up? That
> is, before user space starts?

Not via the traditional way of running blktrace.

On Wed, 28 Jan 2026 19:25:46 -0700
Jens Axboe <axboe@kernel.dk> wrote:

> On Jan 28, 2026, at 5:40 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> >
> > Jens,
> >
> > Can you give me an acked-by on this patch and I can take the series through
> > my tree.
>
> On phone, hope this works:
>
> Acked-by: Jens Axboe <axboe@kernel.dk>

Thanks!

> > Or perhaps this doesn't even need to test the trace_async_init flag and can
> > always do the work queue? Does blk_trace ever do tracing at boot up? That
> > is, before user space starts?
>
> Not via the traditional way of running blktrace.

Masami and Yaxiong,

I've been thinking about this more and I'm not sure we need the
trace_async_init kernel parameter at all. As blktrace should only be
enabled by user space, it can always use the work queue.

For kprobes, if someone is adding a kprobe on the kernel command line, then
they are already specifying that tracing is more important.

Patch 3 already keeps kprobes from being an issue with contention of the
tracing locks, so I don't think it ever needs to use the work queue.

Wouldn't it just be better to remove the trace_async_init and make blktrace
always use the work queue and kprobes never do it (but exit out early if
there were no kprobes registered)?

That is, remove patch 2 and 4 and make this patch always use the work queue.

-- Steve
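
For reference, the simplification proposed above would look roughly like
this. A sketch only; it assumes the trace_init_wq workqueue introduced
earlier in the series is still created unconditionally:

static int __init init_blk_tracer(void)
{
        /*
         * blktrace is only enabled from user space, so its event
         * registration can always be deferred past the boot-time
         * trace_event_sem contention.
         */
        if (trace_init_wq) {
                INIT_WORK(&blktrace_works, blktrace_works_func);
                queue_work(trace_init_wq, &blktrace_works);
                return 0;
        }

        /* Fall back to synchronous registration if the workqueue is absent. */
        return __init_blk_tracer();
}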
On 2026/1/30 04:29, Steven Rostedt wrote:
> On Wed, 28 Jan 2026 19:25:46 -0700
> Jens Axboe <axboe@kernel.dk> wrote:
>
>> On Jan 28, 2026, at 5:40 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>>>
>>> Jens,
>>>
>>> Can you give me an acked-by on this patch and I can take the series through
>>> my tree.
>> On phone, hope this works:
>>
>> Acked-by: Jens Axboe <axboe@kernel.dk>
> Thanks!
>
>>> Or perhaps this doesn't even need to test the trace_async_init flag and can
>>> always do the work queue? Does blk_trace ever do tracing at boot up? That
>>> is, before user space starts?
>> Not via the traditional way of running blktrace.
> Masami and Yaxiong,
>
> I've been thinking about this more and I'm not sure we need the
> trace_async_init kernel parameter at all. As blktrace should only be
> enabled by user space, it can always use the work queue.
Hi Steven and Jens:
I've been thinking about this further. If we need to consider the
possibility of non-traditional blktrace usage during the boot phase,
could we perhaps use a grub parameter like 'ftrace=blk' to handle this?
More specifically, we could check this through the
default_bootup_tracer mechanism:
+bool __init trace_check_need_bootup_tracer(struct tracer *type)
+{
+ if (!default_bootup_tracer)
+ return false;
+
+ if (strncmp(default_bootup_tracer, type->name, MAX_TRACER_SIZE))
+ return false;
+ else
+ return true;
+}
+
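
init_blk_tracer() could then use this check to decide between synchronous
and deferred registration. A rough sketch only; it assumes
trace_check_need_bootup_tracer() is exported to blktrace.c and reuses the
existing blk_tracer instance:

static int __init init_blk_tracer(void)
{
        /*
         * 'ftrace=blk' on the command line means the user wants
         * boot-time block tracing, so register synchronously.
         */
        if (!trace_init_wq || trace_check_need_bootup_tracer(&blk_tracer))
                return __init_blk_tracer();

        INIT_WORK(&blktrace_works, blktrace_works_func);
        queue_work(trace_init_wq, &blktrace_works);
        return 0;
}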
>
> For kprobes, if someone is adding a kprobe on the kernel command line, then
> they are already specifying that tracing is more important.
>
> Patch 3 already keeps kprobes from being an issue with contention of the
> tracing locks, so I don't think it ever needs to use the work queue.
>
> Wouldn't it just be better to remove the trace_async_init and make blktrace
> always use the work queue and kprobes never do it (but exit out early if
> there were no kprobes registered)?
>
> That is, remove patch 2 and 4 and make this patch always use the work queue.
>
> -- Steve

On Thu, 29 Jan 2026 15:29:58 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 28 Jan 2026 19:25:46 -0700
> Jens Axboe <axboe@kernel.dk> wrote:
>
> > On Jan 28, 2026, at 5:40 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > >
> > > Jens,
> > >
> > > Can you give me an acked-by on this patch and I can take the series through
> > > my tree.
> >
> > On phone, hope this works:
> >
> > Acked-by: Jens Axboe <axboe@kernel.dk>
>
> Thanks!
>
> > > Or perhaps this doesn't even need to test the trace_async_init flag and can
> > > always do the work queue? Does blk_trace ever do tracing at boot up? That
> > > is, before user space starts?
> >
> > Not via the traditional way of running blktrace.
>
> Masami and Yaxiong,
>
> I've been thinking about this more and I'm not sure we need the
> trace_async_init kernel parameter at all. As blktrace should only be
> enabled by user space, it can always use the work queue.
>
> For kprobes, if someone is adding a kprobe on the kernel command line, then
> they are already specifying that tracing is more important.
>
> Patch 3 already keeps kprobes from being an issue with contention of the
> tracing locks, so I don't think it ever needs to use the work queue.
>
> Wouldn't it just be better to remove the trace_async_init and make blktrace
> always use the work queue and kprobes never do it (but exit out early if
> there were no kprobes registered)?

Yeah, for kprobes event case, that sounds good to me. I think [3/5] is
enough to speed it up if user does not define kprobe events on cmdline.

Thank you,

> That is, remove patch 2 and 4 and make this patch always use the work queue.
>
> -- Steve

--
Masami Hiramatsu (Google) <mhiramat@kernel.org>

On 2026/1/30 17:30, Masami Hiramatsu (Google) wrote:
> On Thu, 29 Jan 2026 15:29:58 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> On Wed, 28 Jan 2026 19:25:46 -0700
>> Jens Axboe <axboe@kernel.dk> wrote:
>>
>>> On Jan 28, 2026, at 5:40 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>>>>
>>>> Jens,
>>>>
>>>> Can you give me an acked-by on this patch and I can take the series through
>>>> my tree.
>>> On phone, hope this works:
>>>
>>> Acked-by: Jens Axboe <axboe@kernel.dk>
>> Thanks!
>>
>>>> Or perhaps this doesn't even need to test the trace_async_init flag and can
>>>> always do the work queue? Does blk_trace ever do tracing at boot up? That
>>>> is, before user space starts?
>>> Not via the traditional way of running blktrace.
>> Masami and Yaxiong,
>>
>> I've been thinking about this more and I'm not sure we need the
>> trace_async_init kernel parameter at all. As blktrace should only be
>> enabled by user space, it can always use the work queue.
>>
>> For kprobes, if someone is adding a kprobe on the kernel command line, then
>> they are already specifying that tracing is more important.
>>
>> Patch 3 already keeps kprobes from being an issue with contention of the
>> tracing locks, so I don't think it ever needs to use the work queue.
>>
>> Wouldn't it just be better to remove the trace_async_init and make blktrace
>> always use the work queue and kprobes never do it (but exit out early if
>> there were no kprobes registered)?
> Yeah, for kprobes event case, that sounds good to me. I think [3/5] is
> enough to speed it up if user does not define kprobe events on cmdline.
>
> Thank you,

Agreed.

Hi Jens: what do you think about this proposal (making blktrace always
use the work queue)?

>
>> That is, remove patch 2 and 4 and make this patch always use the work queue.
>>
>> -- Steve
>

On 2026/1/30 04:29, Steven Rostedt wrote:
> On Wed, 28 Jan 2026 19:25:46 -0700
> Jens Axboe <axboe@kernel.dk> wrote:
>
>> On Jan 28, 2026, at 5:40 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>>>
>>> Jens,
>>>
>>> Can you give me an acked-by on this patch and I can take the series through
>>> my tree.
>> On phone, hope this works:
>>
>> Acked-by: Jens Axboe <axboe@kernel.dk>
> Thanks!
>
>>> Or perhaps this doesn't even need to test the trace_async_init flag and can
>>> always do the work queue? Does blk_trace ever do tracing at boot up? That
>>> is, before user space starts?
>> Not via the traditional way of running blktrace.
> Masami and Yaxiong,
>
> I've been thinking about this more and I'm not sure we need the
> trace_async_init kernel parameter at all. As blktrace should only be
> enabled by user space, it can always use the work queue.
>
> For kprobes, if someone is adding a kprobe on the kernel command line, then
> they are already specifying that tracing is more important.
>
> Patch 3 already keeps kprobes from being an issue with contention of the
> tracing locks, so I don't think it ever needs to use the work queue.
>
> Wouldn't it just be better to remove the trace_async_init and make blktrace
> always use the work queue and kprobes never do it (but exit out early if
> there were no kprobes registered)?
>
> That is, remove patch 2 and 4 and make this patch always use the work queue.

Yesterday, I was curious about trace_event_update_all(), so I added
pr_err(xx) prints within the function's loop. I discovered that these
prints appeared as late as 14 seconds later (printing is time-consuming),
by which time the desktop had already been up for quite a while. However,
trace_eval_sync() had already finished running at 0.6 seconds.

This suggests my original assumption was wrong: I thought
trace_eval_sync()'s destroy_workqueue() would wait for all queued work to
complete, but that might not be the case. If that conclusion is true,
then strictly speaking, tasks queued with queue_work() cannot be
guaranteed to finish before the init process executes. If it is necessary
to strictly ensure initialization completes before user space starts,
using async_synchronize_full() or async_synchronize_full_domain() would
be better in such scenarios.

Of course, the situation described above is an extreme case. I don't
oppose this approach; I only hope to make startup faster for ordinary
users who don't use trace, while minimizing the impact on others as much
as possible.

>
> -- Steve
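
For illustration, the async alternative mentioned above might look
roughly like this. A sketch only, using the generic <linux/async.h> API
rather than anything from this series:

#include <linux/async.h>

static ASYNC_DOMAIN(blktrace_init_domain);

static void __init blktrace_async_init(void *data, async_cookie_t cookie)
{
        __init_blk_tracer();
}

static int __init init_blk_tracer(void)
{
        /* Kick off the registration asynchronously during boot. */
        async_schedule_domain(blktrace_async_init, NULL,
                              &blktrace_init_domain);
        return 0;
}
device_initcall(init_blk_tracer);

static int __init blk_tracer_sync(void)
{
        /* Guarantee registration has finished before user space starts. */
        async_synchronize_full_domain(&blktrace_init_domain);
        return 0;
}
late_initcall_sync(blk_tracer_sync);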

On 2026/1/30 09:35, Yaxiong Tian wrote:
>
> On 2026/1/30 04:29, Steven Rostedt wrote:
>> On Wed, 28 Jan 2026 19:25:46 -0700
>> Jens Axboe <axboe@kernel.dk> wrote:
>>
>>> On Jan 28, 2026, at 5:40 PM, Steven Rostedt <rostedt@goodmis.org>
>>> wrote:
>>>>
>>>> Jens,
>>>>
>>>> Can you give me an acked-by on this patch and I can take the series
>>>> through
>>>> my tree.
>>> On phone, hope this works:
>>>
>>> Acked-by: Jens Axboe <axboe@kernel.dk>
>> Thanks!
>>
>>>> Or perhaps this doesn't even need to test the trace_async_init flag
>>>> and can
>>>> always do the work queue? Does blk_trace ever do tracing at boot
>>>> up? That
>>>> is, before user space starts?
>>> Not via the traditional way of running blktrace.
>> Masami and Yaxiong,
>>
>> I've been thinking about this more and I'm not sure we need the
>> trace_async_init kernel parameter at all. As blktrace should only be
>> enabled by user space, it can always use the work queue.
>>
>> For kprobes, if someone is adding a kprobe on the kernel command
>> line, then
>> they are already specifying that tracing is more important.
>>
>> Patch 3 already keeps kprobes from being an issue with contention of the
>> tracing locks, so I don't think it ever needs to use the work queue.
>>
>> Wouldn't it just be better to remove the trace_async_init and make
>> blktrace
>> always do the work queue and kprobes never do it (but exit out early if
>> there were no kprobes registered)?
>>
>> That is, remove patch 2 and 4 and make this patch always use the work
>> queue.
>
> Yesterday, I was curious about trace_event_update_all(), so I added
> pr_err(xx) prints within the function's loop. I discovered that these
> prints appeared as late as 14 seconds later (printing is
> time-consuming), by which time the desktop had already been up for
> quite a while. However, trace_eval_sync() had already finished running
> at 0.6 seconds.
>
> This suggests my original assumption was wrong: I thought
> trace_eval_sync()'s destroy_workqueue() would wait for all queued work
> to complete, but that might not be the case. If that conclusion is
> true, then strictly speaking, tasks queued with queue_work() cannot be
> guaranteed to finish before the init process executes. If it is
> necessary to strictly ensure initialization completes before user
> space starts, using async_synchronize_full() or
> async_synchronize_full_domain() would be better in such scenarios.

I need to double-check this issue; theoretically, it shouldn't exist. But
I'm not sure why the print appeared at the 14-second mark.

>
> Of course, the situation described above is an extreme case. I don't
> oppose this approach; I only hope to make startup faster for ordinary
> users who don't use trace, while minimizing the impact on others as
> much as possible.
>
>>
>> -- Steve

On Fri, 30 Jan 2026 11:09:26 +0800
Yaxiong Tian <tianyaxiong@kylinos.cn> wrote:

> > This suggests my original assumption was wrong: I thought
> > trace_eval_sync()'s destroy_workqueue() would wait for all queued
> > work to complete, but that might not be the case. If that conclusion
> > is true, then strictly speaking, tasks queued with queue_work()
> > cannot be guaranteed to finish before the init process executes. If
> > it is necessary to strictly ensure initialization completes before
> > user space starts, using async_synchronize_full() or
> > async_synchronize_full_domain() would be better in such scenarios.
> I need to double-check this issue; theoretically, it shouldn't exist.
> But I'm not sure why the print appeared at the 14-second mark.

Use trace_printk() instead. printk now has a "deferred" output. I'm not
sure if the timestamps of when it prints is when the print took place
or when it got to the console :-/

-- Steve

> >
> > Of course, the situation described above is an extreme case. I don't
> > oppose this approach; I only hope to make startup faster for ordinary
> > users who don't use trace, while minimizing the impact on others as
> > much as possible.
> >
On Thu, 29 Jan 2026 22:26:08 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> Use trace_printk() instead. printk now has a "deferred" output. I'm not
> sure if the timestamps of when it prints is when the print took place
> or when it got to the console :-/
I added the below patch and have this result:
kworker/u33:1-79 [002] ..... 1.840855: trace_event_update_all: Start syncing
swapper/0-1 [005] ..... 6.045742: trace_eval_sync: sync maps
kworker/u33:1-79 [002] ..... 12.289296: trace_event_update_all: Finish syncing
swapper/0-1 [005] ..... 12.289387: trace_eval_sync: sync maps complete
Which shows that the final initcall waited for the work queue to complete:
-- Steve
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 396d59202438..33180d5622a8 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -10817,9 +10817,11 @@ subsys_initcall(trace_eval_init);
static int __init trace_eval_sync(void)
{
+ trace_printk("sync maps\n");
/* Make sure the eval map updates are finished */
if (eval_map_wq)
destroy_workqueue(eval_map_wq);
+ trace_printk("sync maps complete\n");
return 0;
}
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index af6d1fe5cab7..194b344400e9 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -3555,6 +3555,7 @@ void trace_event_update_all(struct trace_eval_map **map, int len)
int last_i;
int i;
+ trace_printk("Start syncing\n");
down_write(&trace_event_sem);
list_for_each_entry_safe(call, p, &ftrace_events, list) {
/* events are usually grouped together with systems */
@@ -3593,6 +3594,8 @@ void trace_event_update_all(struct trace_eval_map **map, int len)
cond_resched();
}
up_write(&trace_event_sem);
+ msleep(10000);
+ trace_printk("Finish syncing\n");
}
static bool event_in_systems(struct trace_event_call *call,

On Thu, 29 Jan 2026 22:31:16 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> I added the below patch and have this result:
>
> kworker/u33:1-79 [002] ..... 1.840855: trace_event_update_all: Start syncing
> swapper/0-1 [005] ..... 6.045742: trace_eval_sync: sync maps
> kworker/u33:1-79 [002] ..... 12.289296: trace_event_update_all: Finish syncing
> swapper/0-1 [005] ..... 12.289387: trace_eval_sync: sync maps complete
>
> Which shows that the final initcall waited for the work queue to complete:

Switching to printk() gives me the same results:

# dmesg |grep sync
[ 1.117856] Start syncing
[ 4.498360] sync maps
[ 11.173304] Finish syncing
[ 11.175660] sync maps complete

-- Steve

On 2026/1/30 11:45, Steven Rostedt wrote:
> On Thu, 29 Jan 2026 22:31:16 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> I added the below patch and have this result:
>>
>> kworker/u33:1-79 [002] ..... 1.840855: trace_event_update_all: Start syncing
>> swapper/0-1 [005] ..... 6.045742: trace_eval_sync: sync maps
>> kworker/u33:1-79 [002] ..... 12.289296: trace_event_update_all: Finish syncing
>> swapper/0-1 [005] ..... 12.289387: trace_eval_sync: sync maps complete
>>
>> Which shows that the final initcall waited for the work queue to complete:
> Switching to printk() gives me the same results:
>
> # dmesg |grep sync
> [ 1.117856] Start syncing
> [ 4.498360] sync maps
> [ 11.173304] Finish syncing
> [ 11.175660] sync maps complete
>
> -- Steve

Sorry, yes, no problem. I confirmed that init_blk_tracer() is running
properly (when executed sequentially); if there were an issue, it would
already have gotten stuck on the lock. The late prints seem to be related
to the print buffer; I'll look into that myself.

Back to this topic: I don't object to that proposal. I think each
approach has its own advantages. Let's see what others think.