RE: [PATCH v3] printk: fix zero-valued printk timestamps in early boot

Posted by Thomas Gleixner 7 hours ago
Tim!

On Tue, Mar 31 2026 at 11:10, Thomas Gleixner wrote:
> So the real good question is whether the extra information of how long
> that earliest init takes is really relevant to the goal of optimizing
> boot time. The expensive part of the boot process definitely comes after
> that.

That actually made me curious and so I hacked up the kernel with the
patch below to compensate for the difference between:

  x86_64_start_reservations()

	i.e. the C entry point of the kernel and the actual earliest
	point (ASM entry code aside) where the kernel can take a
	timestamp, which is modulo the sanity checks in the PoC the same
	thing, right?

and

  tsc_early_init()

        where the upstream kernel enables TSC sched clock today with all
        sanity checks and enumeration in place.

Here is the result on a bare metal 256 CPU machine:

[    0.000000] Linux version 7.0.0-rc3-dirty ...

....

[    0.000000] tsc: Detected 2100.000 MHz processor
[    0.012482] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved

That's ~12ms of time which is not accounted for in the overall boot time
until the machine reaches the init process:

[   12.289141] Run /init as init process

That means we are talking about ~0.1% of overall boot time in this case.

Starting a 4 CPU guest with the same kernel image on the same physical
machine and additionally 'no-kvmclock' on the command line to make the
hack work:

[    0.000000] Linux version 7.0.0-rc3-dirty ...

...

[    0.000000] tsc: Detected 2094.965 MHz processor
[    0.015122] last_pfn = 0x280000 max_arch_pfn = 0x400000000

Unsurprisingly it takes a bit longer because during that phase the guest
takes a gazillion vmexits.

[    0.995082] Run /init as init process

Now in this 4 CPU case we are talking about 1.5% of the overall boot
time.

With the same setup and 32 CPUs in the VM:

[    0.015150] e820: remove [mem 0x000a0000-0x000fffff] System RAM

The initial phase takes 30us more than with 4 CPUs, which is in the
noise, and the machine ends up in init at:

[    3.329398] Run /init as init process

which means in total we are up to 0.45% of the overall boot time now.

I'm honestly confused. May I politely ask which problem you are trying
to solve?

Thanks,

        tglx
---
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -291,6 +291,8 @@ asmlinkage __visible void __init __noret
 	x86_64_start_reservations(real_mode_data);
 }
 
+extern u64 start_ts;
+
 void __init __noreturn x86_64_start_reservations(char *real_mode_data)
 {
 	/* version is always not zero if it is copied */
@@ -307,6 +309,8 @@ void __init __noreturn x86_64_start_rese
 		break;
 	}
 
+	start_ts = rdtsc();
+
 	start_kernel();
 }
 
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -53,6 +53,9 @@ static DEFINE_STATIC_KEY_FALSE_RO(__use_
 
 int tsc_clocksource_reliable;
 
+extern u64 start_ts;
+u64 start_ts;
+
 static int __read_mostly tsc_force_recalibrate;
 
 static struct clocksource_base art_base_clk = {
@@ -207,7 +210,7 @@ static void __init cyc2ns_init_boot_cpu(
 	struct cyc2ns *c2n = this_cpu_ptr(&cyc2ns);
 
 	seqcount_latch_init(&c2n->seq);
-	__set_cyc2ns_scale(tsc_khz, smp_processor_id(), rdtsc());
+	__set_cyc2ns_scale(tsc_khz, smp_processor_id(), start_ts);
 }
 
 /*
Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
Posted by Brian Masney 5 hours ago
Hi Thomas,

On Wed, Apr 01, 2026 at 01:36:26AM +0200, Thomas Gleixner wrote:
> [...]
> I'm honestly confused. May I politely ask which problem you are trying
> to solve?

A recent example of where this was a problem was in the creation of the
arm64 linear map:

https://lore.kernel.org/all/20240412131908.433043-1-ryan.roberts@arm.com/

The boot time was all reported as 0, however we could tell there was a
boot delay based on the timing of the firmware / other linux logs in the
serial console. Eric Chanudet @ Red Hat (CCed) used the cntvct arch
counters on arm64 to track down this unreported time to the linear map
creation.

With this patch set, the linear map creation time on a 32GB RAM arm64
system went from ~350ms to ~25ms. Again, that time was all reported as 0
in dmesg.

What Tim is trying to do is to identify if we have points like this on
other systems, or if boot regressions are introduced in the future.

Brian
Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
Posted by Steven Rostedt 6 hours ago
On Wed, 01 Apr 2026 01:36:26 +0200
Thomas Gleixner <tglx@kernel.org> wrote:

> [...]
> Here is the result on a bare metal 256 CPU machine:
> 
> [    0.000000] Linux version 7.0.0-rc3-dirty ...
> 
> ....
> 
> [    0.000000] tsc: Detected 2100.000 MHz processor
> [    0.012482] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
> 
> That's ~12ms of time which is not accounted for in the overall boot time
> until the machine reaches the init process:
> 
> [   12.289141] Run /init as init process

12 seconds! Holy crap, that would fail every Chromebook requirement we
have. In most cases, we shoot for a 3 second boot up and 8 second max.
That's from hitting the power button to login screen. FYI, 30ms is
considered a really long time.

We spend a lot of time looking at everything that slows down the boot
process, and yes we do a lot of tricks in user space as well (see
ureadahead[1]).

I'm sure there are various devices that have issues between the kernel's C
entry point and where timekeeping begins. When I was looking into boot times,
I did find that the biggest issue with boot up was the initcalls. They are
all serial. If one sleeps, everything behind it backs up. I would love to
give initcalls more parallelism, but there's a lot of code that depends
on things running serially :-(

Point being, I'm sure Tim is worried about devices that require fast boot
up times. Not machines with 256 CPUs.

I haven't needed to look at what is happening before time stamps are
running, as we track that time via the firmware timestamps. I'm just
pointing out that the machine you used in this example isn't the target of
the work that is looking into fast bootup times.

-- Steve

[1] https://github.com/rostedt/ureadahead