RE: [PATCH v3] printk: fix zero-valued printk timestamps in early boot

Thomas Gleixner posted 1 patch 2 months, 1 week ago
RE: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
Posted by Thomas Gleixner 2 months, 1 week ago
Tim!

On Tue, Mar 31 2026 at 11:10, Thomas Gleixner wrote:
> So the real good question is whether the extra information of how long
> that earliest init takes is really relevant to the goal of optimizing
> boot time. The expensive part of the boot process definitely comes after
> that.

That actually made me curious and so I hacked up the kernel with the
patch below to compensate for the difference between:

  x86_64_start_reservations()

	i.e. the C entry point of the kernel and the actual earliest
	point (ASM entry code aside) where the kernel can take a
	timestamp, which is modulo the sanity checks in the PoC the same
	thing, right?

and

  tsc_early_init()

        where the upstream kernel enables TSC sched clock today with all
        sanity checks and enumeration in place.

Here is the result on a bare metal 256 CPU machine:

[    0.000000] Linux version 7.0.0-rc3-dirty ...

....

[    0.000000] tsc: Detected 2100.000 MHz processor
[    0.012482] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved

That's ~12ms of time which is not accounted for in the overall boot time
until the machine reaches the init process:

[   12.289141] Run /init as init process

That means we are talking about ~0.1% of overall boot time in this case.

Starting a 4 CPU guest with the same kernel image on the same physical
machine and additionally 'no-kvmclock' on the command line to make the
hack work:

[    0.000000] Linux version 7.0.0-rc3-dirty ...

...

[    0.000000] tsc: Detected 2094.965 MHz processor
[    0.015122] last_pfn = 0x280000 max_arch_pfn = 0x400000000

Unsurpringly it takes a bit longer because during that phase the guest
takes a gazillion of vmexits.

[    0.995082] Run /init as init process

Now in this 4 CPU case we are talking about 1.5% of the overall boot
time.

With the same setup and 32 CPUs in the VM:

[    0.015150] e820: remove [mem 0x000a0000-0x000fffff] System RAM

The initial phase takes 30us more than with 4 CPUs, which is in the
noise and the machine ends up in init at:

[    3.329398] Run /init as init process

which means in total we are up to 0.45% of the overall boot time now.

I'm honestly confused. May I politely ask which problem you are trying
to solve?

Thanks,

        tglx
---
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -291,6 +291,8 @@ asmlinkage __visible void __init __noret
 	x86_64_start_reservations(real_mode_data);
 }
 
+extern u64 start_ts;
+
 void __init __noreturn x86_64_start_reservations(char *real_mode_data)
 {
 	/* version is always not zero if it is copied */
@@ -307,6 +309,8 @@ void __init __noreturn x86_64_start_rese
 		break;
 	}
 
+	start_ts = rdtsc();
+
 	start_kernel();
 }
 
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -53,6 +53,9 @@ static DEFINE_STATIC_KEY_FALSE_RO(__use_
 
 int tsc_clocksource_reliable;
 
+extern u64 start_ts;
+u64 start_ts;
+
 static int __read_mostly tsc_force_recalibrate;
 
 static struct clocksource_base art_base_clk = {
@@ -207,7 +210,7 @@ static void __init cyc2ns_init_boot_cpu(
 	struct cyc2ns *c2n = this_cpu_ptr(&cyc2ns);
 
 	seqcount_latch_init(&c2n->seq);
-	__set_cyc2ns_scale(tsc_khz, smp_processor_id(), rdtsc());
+	__set_cyc2ns_scale(tsc_khz, smp_processor_id(), start_ts);
 }
 
 /*
Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
Posted by Brian Masney 2 months, 1 week ago
Hi Thomas,

On Wed, Apr 01, 2026 at 01:36:26AM +0200, Thomas Gleixner wrote:
> On Tue, Mar 31 2026 at 11:10, Thomas Gleixner wrote:
> > So the real good question is whether the extra information of how long
> > that earliest init takes is really relevant to the goal of optimizing
> > boot time. The expensive part of the boot process definitely comes after
> > that.
> 
> That actually made me curious and so I hacked up the kernel with the
> patch below to compensate for the difference between:
> 
>   x86_64_start_reservations()
> 
> 	i.e. the C entry point of the kernel and the actual earliest
> 	point (ASM entry code aside) where the kernel can take a
> 	timestamp, which is modulo the sanity checks in the PoC the same
> 	thing, right?
> 
> and
> 
>   tsc_early_init()
> 
>         where the upstream kernel enables TSC sched clock today with all
>         sanity checks and enumeration in place.
> 
> Here is the result on a bare metal 256 CPU machine:
> 
> [    0.000000] Linux version 7.0.0-rc3-dirty ...
> 
> ....
> 
> [    0.000000] tsc: Detected 2100.000 MHz processor
> [    0.012482] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
> 
> That's ~12ms of time which is not accounted for in the overall boot time
> until the machine reaches the init process:
> 
> [   12.289141] Run /init as init process
> 
> That means we are talking about ~0.1% of overall boot time in this case.
> 
> Starting a 4 CPU guest with the same kernel image on the same physical
> machine and additionally 'no-kvmclock' on the command line to make the
> hack work:
> 
> [    0.000000] Linux version 7.0.0-rc3-dirty ...
> 
> ...
> 
> [    0.000000] tsc: Detected 2094.965 MHz processor
> [    0.015122] last_pfn = 0x280000 max_arch_pfn = 0x400000000
> 
> Unsurpringly it takes a bit longer because during that phase the guest
> takes a gazillion of vmexits.
> 
> [    0.995082] Run /init as init process
> 
> Now in this 4 CPU case we are talking about 1.5% of the overall boot
> time.
> 
> With the same setup and 32 CPUs in the VM:
> 
> [    0.015150] e820: remove [mem 0x000a0000-0x000fffff] System RAM
> 
> The initial phase takes 30us more than with 4 CPUs, which is in the
> noise and the machine ends up in init at:
> 
> [    3.329398] Run /init as init process
> 
> which means in total we are up to 0.45% of the overall boot time now.
> 
> I'm honestly confused. May I politely ask which problem you are trying
> to solve?

A recent example of where this was a problem was in the creation of the
arm64 linear map:

https://lore.kernel.org/all/20240412131908.433043-1-ryan.roberts@arm.com/

The boot time was all reported as 0, however we could tell there was a
boot delay based on the timing of the firmware / other linux logs in the
serial console. Eric Chanudet @ Red Hat (CCed) used the cntvct arch
counters on arm64 to track down this unreported time to the linear map
creation.

With this patch set, on a 32GB RAM arm64 system we have the linear map
creation time went from ~350ms to 25ms. Again, the boot time was all
reported as 0 in dmesg.

What Tim is trying to do is to identify if we have points like this on
other systems, or if boot regressions are introduced in the future.

Brian
Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
Posted by Thomas Gleixner 2 months, 1 week ago
Brian!

On Tue, Mar 31 2026 at 21:16, Brian Masney wrote:
> With this patch set, on a 32GB RAM arm64 system we have the linear map
> creation time went from ~350ms to 25ms. Again, the boot time was all
> reported as 0 in dmesg.

Impressive!

> What Tim is trying to do is to identify if we have points like this on
> other systems, or if boot regressions are introduced in the future.

I completely understand that and as I clearly said from the very
beginning I'm not trying to prevent that at all.

What I'm arguing against is the approach of creating an ad hoc wart in
printk as that's creating yet another long term maintenance burden.

Thanks,

        tglx
Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
Posted by Steven Rostedt 2 months, 1 week ago
On Wed, 01 Apr 2026 01:36:26 +0200
Thomas Gleixner <tglx@kernel.org> wrote:

> On Tue, Mar 31 2026 at 11:10, Thomas Gleixner wrote:
> > So the real good question is whether the extra information of how long
> > that earliest init takes is really relevant to the goal of optimizing
> > boot time. The expensive part of the boot process definitely comes after
> > that.  
> 
> That actually made me curious and so I hacked up the kernel with the
> patch below to compensate for the difference between:
> 
>   x86_64_start_reservations()
> 
> 	i.e. the C entry point of the kernel and the actual earliest
> 	point (ASM entry code aside) where the kernel can take a
> 	timestamp, which is modulo the sanity checks in the PoC the same
> 	thing, right?
> 
> and
> 
>   tsc_early_init()
> 
>         where the upstream kernel enables TSC sched clock today with all
>         sanity checks and enumeration in place.
> 
> Here is the result on a bare metal 256 CPU machine:
> 
> [    0.000000] Linux version 7.0.0-rc3-dirty ...
> 
> ....
> 
> [    0.000000] tsc: Detected 2100.000 MHz processor
> [    0.012482] e820: update [mem 0x00000000-0x00000fff] System RAM ==> device reserved
> 
> That's ~12ms of time which is not accounted for in the overall boot time
> until the machine reaches the init process:
> 
> [   12.289141] Run /init as init process

12 seconds! Holly crap, that would fail every Chromebook requirement we
have. In most cases, we shoot for a 3 second boot up and 8 second max.
That's from hitting the power button to login screen. FYI, 30ms is
considered a really long time.

We spend a lot of time looking at every thing that slows down the boot
process, and yes we do a lot of tricks in user space as well (see
ureadahead[1]).

I'm sure there's various devices that have issues between the kernel C
starting point and where timing begins. When I was looking into boot times,
I did find that the biggest issue with boot up was the initcalls. They are
all serial. If one sleeps, it causes everything to back up. I would love to
have initcalls have more parallelism, but there's a lot of code that
depends on things running in serial :-(

Point being, I'm sure Tim is worried about devices that require fast boot
up times. Not machines with 256 CPUs.

I haven't needed to look at what is happening before time stamps are
running, as we track that time via the firmware timestamps. I'm just
pointing out that the machine you used in this example isn't the target of
the work that is looking into fast bootup times.

-- Steve

[1] https://github.com/rostedt/ureadahead
Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
Posted by Thomas Gleixner 2 months, 1 week ago
Steven!

On Tue, Mar 31 2026 at 20:05, Steven Rostedt wrote:
> On Wed, 01 Apr 2026 01:36:26 +0200
> Thomas Gleixner <tglx@kernel.org> wrote:
>> That's ~12ms of time which is not accounted for in the overall boot time
>> until the machine reaches the init process:
>> 
>> [   12.289141] Run /init as init process
>
> 12 seconds! Holly crap, that would fail every Chromebook requirement we
> have. In most cases, we shoot for a 3 second boot up and 8 second max.
> That's from hitting the power button to login screen. FYI, 30ms is
> considered a really long time.

Which 30ms are you talking about?

The first timestamp is 12ms into the boot on bare metal as you can see
from the quoted text above. On a 4 CPU VM running on that very same host
it's 15ms due to the VM exit costs.

Which means with your 3 seconds aim, that's ~0.4% of the overall
time. Even with your 30ms it's still only 1% according to my old school
understanding of math.

> Point being, I'm sure Tim is worried about devices that require fast boot
> up times. Not machines with 256 CPUs.

You're clearly missing the point.

As I've demonstrated the time until the first time stamp is available
has nothing to do with the number of CPUs and is more or less constant:

VM with 4 CPUs:

[    0.015122] last_pfn = 0x280000 max_arch_pfn = 0x400000000

VM with 32 CPUs:

[    0.015150] e820: remove [mem 0x000a0000-0x000fffff] System RAM

The costs of the number of CPUs comes later in the boot process:

VM with 4 CPUs:

[    0.995082] Run /init as init process

VM with 32 CPUs:

[    3.329398] Run /init as init process

No?

Thanks,

        tglx
Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
Posted by Steven Rostedt 2 months, 1 week ago
On Wed, 01 Apr 2026 10:33:26 +0200
Thomas Gleixner <tglx@kernel.org> wrote:

> Steven!
> 
> On Tue, Mar 31 2026 at 20:05, Steven Rostedt wrote:
> > On Wed, 01 Apr 2026 01:36:26 +0200
> > Thomas Gleixner <tglx@kernel.org> wrote:  
> >> That's ~12ms of time which is not accounted for in the overall boot time
> >> until the machine reaches the init process:
> >> 
> >> [   12.289141] Run /init as init process  
> >
> > 12 seconds! Holly crap, that would fail every Chromebook requirement we
> > have. In most cases, we shoot for a 3 second boot up and 8 second max.
> > That's from hitting the power button to login screen. FYI, 30ms is
> > considered a really long time.  
> 
> Which 30ms are you talking about?

Sorry, that was from your previous email:

   The existing early TSC enablement on x86 starts providing printk
   timestamps at about 30ms after boot out of the box when I remove
   'earlyprintk' from the command line on the same VM/host combo I tested
   the PoC. That too uses sched clock and not some special mechanism.

We actually try to account for pretty much every 10ms and if there's a 30ms
gap somewhere, we look to see what it was doing.

> 
> As I've demonstrated the time until the first time stamp is available
> has nothing to do with the number of CPUs and is more or less constant:

So my question is, if something changes in this time frame, where it goes
from 30ms to 300ms, how would we find out?

Again, for Chromebooks, we get these timestamps from the firmware and it's
very useful.

-- Steve
Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
Posted by Thomas Gleixner 2 months, 1 week ago
On Wed, Apr 01 2026 at 11:12, Steven Rostedt wrote:
> On Wed, 01 Apr 2026 10:33:26 +0200
> Thomas Gleixner <tglx@kernel.org> wrote:
>> As I've demonstrated the time until the first time stamp is available
>> has nothing to do with the number of CPUs and is more or less constant:
>
> So my question is, if something changes in this time frame, where it goes
> from 30ms to 300ms, how would we find out?
>
> Again, for Chromebooks, we get these timestamps from the firmware and it's
> very useful.

I understand all of that and I'm not opposed to make that happen if it's
integrated properly into sched clock as I demonstrated with the PoC
hack.

Thanks,

        tglx
Re: [PATCH v3] printk: fix zero-valued printk timestamps in early boot
Posted by Geert Uytterhoeven 2 months, 1 week ago
On Wed, 1 Apr 2026 at 02:04, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Wed, 01 Apr 2026 01:36:26 +0200
> Thomas Gleixner <tglx@kernel.org> wrote:
> > [   12.289141] Run /init as init process
>
> 12 seconds! Holly crap, that would fail every Chromebook requirement we
> have. In most cases, we shoot for a 3 second boot up and 8 second max.

So that's why my Sony HD Handycam never boots Linux, but instead still
resumes the system Tim suspended 18 years ago ;-)

Looks like the old CE Linux Forum Boot Time Working Group needs more
blood...

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds