[RFC PATCH 5/5] x86/entry: Store CPU info on exception entry

ira.weiny@intel.com posted 5 patches 3 years, 6 months ago
There is a newer version of this series
[RFC PATCH 5/5] x86/entry: Store CPU info on exception entry
Posted by ira.weiny@intel.com 3 years, 6 months ago
From: Ira Weiny <ira.weiny@intel.com>

The CPU information of an exception is useful in determining where bad
CPUs are in a large data center.

Define arch_{save|restore}_auxiliary_pt_regs() and set
ARCH_HAS_PTREGS_AUXILIARY default to yes.

Store the CPU on exception entry and use it later.

Cc: Rik van Riel <riel@surriel.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 arch/x86/Kconfig                    |  2 +-
 arch/x86/include/asm/entry-common.h | 12 ++++++++++++
 arch/x86/include/asm/ptrace.h       |  1 +
 arch/x86/mm/fault.c                 |  4 ++--
 4 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b35f6a472e09..707650a6ecb2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1876,7 +1876,7 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 
 config ARCH_HAS_PTREGS_AUXILIARY
 	depends on X86_64
-	bool
+	def_bool y
 
 choice
 	prompt "TSX enable mode"
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index 674ed46d3ced..eb145106929a 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -95,4 +95,16 @@ static __always_inline void arch_exit_to_user_mode(void)
 }
 #define arch_exit_to_user_mode arch_exit_to_user_mode
 
+#ifdef CONFIG_ARCH_HAS_PTREGS_AUXILIARY
+
+static inline void arch_save_aux_pt_regs(struct pt_regs *regs)
+{
+	struct pt_regs_auxiliary *aux_pt_regs = &to_extended_pt_regs(regs)->aux;
+
+	aux_pt_regs->cpu = raw_smp_processor_id();
+}
+#define arch_save_aux_pt_regs arch_save_aux_pt_regs
+
+#endif
+
 #endif
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 5a9c85893459..b403b469996f 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -97,6 +97,7 @@ struct pt_regs {
  * ARCH_HAS_PTREGS_AUXILIARY.  Failure to do so will result in a build failure.
  */
 struct pt_regs_auxiliary {
+	int cpu;
 };
 
 struct pt_regs_extended {
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 82cf23975aa1..5df99fe49494 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -768,9 +768,9 @@ static inline void
 show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 		unsigned long address, struct task_struct *tsk)
 {
+	struct pt_regs_auxiliary *aux_pt_regs = &to_extended_pt_regs(regs)->aux;
 	const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
-	/* This is a racy snapshot, but it's better than nothing. */
-	int cpu = raw_smp_processor_id();
+	int cpu = aux_pt_regs->cpu;
 
 	if (!unhandled_signal(tsk, SIGSEGV))
 		return;
-- 
2.35.3
Re: [RFC PATCH 5/5] x86/entry: Store CPU info on exception entry
Posted by Rik van Riel 3 years, 6 months ago
On Fri, 2022-08-05 at 10:30 -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> The CPU information of an exception is useful in determining where
> bad
> CPUs are in a large data center.
> 
> Define arch_{save|restore}_auxiliary_pt_regs() and set
> ARCH_HAS_PTREGS_AUXILIARY default to yes.
> 
> Store the CPU on exception entry and use it later.
> 
> Cc: Rik van Riel <riel@surriel.com>
> Suggested-by: Borislav Petkov <bp@alien8.de>
> Suggested-by: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Acked-by: Rik van Riel <riel@surriel.com>

-- 
All Rights Reversed.
Re: [RFC PATCH 5/5] x86/entry: Store CPU info on exception entry
Posted by Dave Hansen 3 years, 6 months ago
On 8/5/22 10:30, ira.weiny@intel.com wrote:
> +static inline void arch_save_aux_pt_regs(struct pt_regs *regs)
> +{
> +	struct pt_regs_auxiliary *aux_pt_regs = &to_extended_pt_regs(regs)->aux;
> +
> +	aux_pt_regs->cpu = raw_smp_processor_id();
> +}

This is in a fast path that all interrupt and exception entry uses.  So,
I was curious what the overhead is.

Code generation in irqentry_enter() gets a _bit_ more complicated
because arch_save_aux_pt_regs() has to be done on the way out of the
function and the compiler can't (for instance) do a

	mov    $0x1,%eax
	ret

to return.  But, the gist of the change is still only two instructions
that read a pretty hot, read-only per-cpu cacheline:

	mov    %gs:0x7e21fa4a(%rip),%eax        # 15a38 <cpu_number>
	mov    %eax,-0x8(%rbx)

That doesn't seem too bad.
Re: [RFC PATCH 5/5] x86/entry: Store CPU info on exception entry
Posted by Ingo Molnar 3 years, 6 months ago
* Dave Hansen <dave.hansen@intel.com> wrote:

> On 8/5/22 10:30, ira.weiny@intel.com wrote:
> > +static inline void arch_save_aux_pt_regs(struct pt_regs *regs)
> > +{
> > +	struct pt_regs_auxiliary *aux_pt_regs = &to_extended_pt_regs(regs)->aux;
> > +
> > +	aux_pt_regs->cpu = raw_smp_processor_id();
> > +}
> 
> This is in a fast path that all interrupt and exception entry uses.  So,
> I was curious what the overhead is.
> 
> Code generation in irqentry_enter() gets a _bit_ more complicated
> because arch_save_aux_pt_regs() has to be done on the way out of the
> function and the compiler can't (for instance) do a
> 
> 	mov    $0x1,%eax
> 	ret
> 
> to return.  But, the gist of the change is still only two instructions
> that read a pretty hot, read-only per-cpu cacheline:
> 
> 	mov    %gs:0x7e21fa4a(%rip),%eax        # 15a38 <cpu_number>
> 	mov    %eax,-0x8(%rbx)
> 
> That doesn't seem too bad.

It's still 2 instructions more than what we had before, while the 
fault-time CPU number is only needed infrequently AFAICS.

Thanks,

	Ingo
Re: [RFC PATCH 5/5] x86/entry: Store CPU info on exception entry
Posted by Borislav Petkov 3 years, 6 months ago
On Sat, Aug 06, 2022 at 11:01:06AM +0200, Ingo Molnar wrote:
> It's still 2 instructions more than what we had before, while the 
> fault-time CPU number is only needed infrequently AFAICS.

With the amount of logical cores ever increasing and how CPU packages
(nodes, L3 sharing, you name it) get more and more complex topology,
I'd say the 2 insns to show the CPU number in every exception is a good
thing to do.

Arguably, we probably should've even done it already...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [RFC PATCH 5/5] x86/entry: Store CPU info on exception entry
Posted by Ingo Molnar 3 years, 6 months ago
* Borislav Petkov <bp@alien8.de> wrote:

> On Sat, Aug 06, 2022 at 11:01:06AM +0200, Ingo Molnar wrote:
> > It's still 2 instructions more than what we had before, while the 
> > fault-time CPU number is only needed infrequently AFAICS.
> 
> With the amount of logical cores ever increasing and how CPU packages
> (nodes, L3 sharing, you name it) get more and more complex topology,
> I'd say the 2 insns to show the CPU number in every exception is a good
> thing to do.

We can show it - I'm arguing against extracting it too early, which costs 
us 2 instructions in the exception fast path - while in 99.999999999% of 
the cases we don't use that field at all ...

> Arguably, we probably should've even done it already...

Yeah, so I'm not against Rik's patch that prints the CPU number - that's 
indeed useful and I'd like to see it merged.

I'm arguing against extracting the CPU so early as to impact the exception 
fast path.

Thanks,

	Ingo
Re: [RFC PATCH 5/5] x86/entry: Store CPU info on exception entry
Posted by Borislav Petkov 3 years, 6 months ago
On Sun, Aug 07, 2022 at 12:02:41PM +0200, Ingo Molnar wrote:
> * Borislav Petkov <bp@alien8.de> wrote:
> > With the amount of logical cores ever increasing and how CPU packages
> > (nodes, L3 sharing, you name it) get more and more complex topology,
> > I'd say the 2 insns to show the CPU number in every exception is a good
> > thing to do.
> 
> We can show it - I'm arguing against extracting it too early, which costs

Not early - more correct. We can say which CPU executed the exception
handler *exactly*. Not which CPU executed the exception handler *maybe*.

> us 2 instructions in the exception fast path

2 insns? They don't matter at all. FWIW, they'll pull in the per-CPU
cacheline earlier which should be a net win later, for code which does
smp_processor_id().

> - while in 99.999999999% of the cases we don't use that field at all ...

See my text above about the ever-increasing complexity of CPU topology.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [RFC PATCH 5/5] x86/entry: Store CPU info on exception entry
Posted by Ira Weiny 3 years, 6 months ago
On Sun, Aug 07, 2022 at 12:35:03PM +0200, Borislav Petkov wrote:
> On Sun, Aug 07, 2022 at 12:02:41PM +0200, Ingo Molnar wrote:
> > * Borislav Petkov <bp@alien8.de> wrote:
> > > With the amount of logical cores ever increasing and how CPU packages
> > > (nodes, L3 sharing, you name it) get more and more complex topology,
> > > I'd say the 2 insns to show the CPU number in every exception is a good
> > > thing to do.
> > 
> > We can show it - I'm arguing against extracting it too early, which costs
> 
> Not early - more correct. We can say which CPU executed the exception
> handler *exactly*. Not which CPU executed the exception handler *maybe*.
> 
> > us 2 instructions in the exception fast path
> 
> 2 insns? They don't matter at all. FWIW, they'll pull in the per-CPU
> cacheline earlier which should be a net win later, for code which does
> smp_processor_id().

I agree with Boris; however I feel that I have to mention that in patch 3/5 you
also have 1 instruction on each of entry and exit to push the extra stack
space.  So all told it would cost 4 instructions.

Again, I don't believe this is too much overhead but I don't want people to say
it was not discussed.

Ira

> 
> > - while in 99.999999999% of the cases we don't use that field at all ...
> 
> See my text above about the ever-increasing complexity of CPU topology.
> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
Re: [RFC PATCH 5/5] x86/entry: Store CPU info on exception entry
Posted by Ingo Molnar 3 years, 6 months ago
* Ira Weiny <ira.weiny@intel.com> wrote:

> On Sun, Aug 07, 2022 at 12:35:03PM +0200, Borislav Petkov wrote:
> > On Sun, Aug 07, 2022 at 12:02:41PM +0200, Ingo Molnar wrote:
> > > * Borislav Petkov <bp@alien8.de> wrote:
> > > > With the amount of logical cores ever increasing and how CPU packages
> > > > (nodes, L3 sharing, you name it) get more and more complex topology,
> > > > I'd say the 2 insns to show the CPU number in every exception is a good
> > > > thing to do.
> > > 
> > > We can show it - I'm arguing against extracting it too early, which costs
> > 
> > Not early - more correct. We can say which CPU executed the exception
> > handler *exactly*. Not which CPU executed the exception handler *maybe*.
> > 
> > > us 2 instructions in the exception fast path
> > 
> > 2 insns? They don't matter at all. FWIW, they'll pull in the per-CPU
> > cacheline earlier which should be a net win later, for code which does
> > smp_processor_id().

I'd like to hear what Andy Lutomirski thinks about the notion that
"2 instructions don't matter at all" ...

Especially since it's now 4 instructions:

> I agree with Boris; however I feel that I have to mention that in patch 
> 3/5 you also have 1 instruction on each of entry and exit to push the 
> extra stack space.  So all told it would cost 4 instructions.

... 4 instructions in the exception path is a non-trivial impact.

> Again, I don't believe this is too much overhead but I don't want people 
> to say it was not discussed.

Is it necessary to do this, what are the alternatives, can this overhead be 
avoided?

Thanks,

	Ingo
Re: [RFC PATCH 5/5] x86/entry: Store CPU info on exception entry
Posted by Dave Hansen 3 years, 6 months ago
On 8/8/22 04:03, Ingo Molnar wrote:
>> Again, I don't believe this is too much overhead but I don't want people 
>> to say it was not discussed.
> Is it necessary to do this, what are the alternatives, can this overhead be 
> avoided?

One thing Andy mentioned is that we _could_ get it down to two instructions:

	rdgsbase $reg
	push $reg

This could be hidden in:

	PUSH_PTREGS_AUXILIARY

where, today, it would only net add a single instruction.  But, if we
ever add more stuff to PUSH_PTREGS_AUXILIARY, it would move back to
needing two instructions since we'd need both the:

	subq $PTREGS_AUX_SIZE, %rsp

and something to write gsbase to the stack.

That doesn't get us the smp_processor_id() directly, but we can derive
it later on from the gsbase value.

The downside is that we're doing it in assembly.  We'd also have
something additional which is a bit uglier and that reads memory on
!X86_FEATURE_FSGSBASE, probably:
	
	movq    PER_CPU_VAR(cpu_number), %reg
	push %reg

Which would require some different code to decode what was there:

int read_exception_cpu_number(ext_pt_regs *e)
{
	if (cpu_feature_enabled(X86_FEATURE_FSGSBASE))
		return gsbase_to_cpu_number(e->ext_cpu_nr);
	else
		return e->ext_cpu_nr;
}

I'm thinking that the whole racy smp_processor_id() thing wasn't so bad
in the first place.
Re: [RFC PATCH 5/5] x86/entry: Store CPU info on exception entry
Posted by Andy Lutomirski 3 years, 6 months ago

On Mon, Aug 8, 2022, at 9:16 AM, Dave Hansen wrote:
> On 8/8/22 04:03, Ingo Molnar wrote:
>>> Again, I don't believe this is too much overhead but I don't want people 
>>> to say it was not discussed.
>> Is it necessary to do this, what are the alternatives, can this overhead be 
>> avoided?
>
> One thing Andy mentioned is that we _could_ get it down to two instructions:
>
> 	rdgsbase $reg
> 	push $reg
>
> This could be hidden in:
>
> 	PUSH_PTREGS_AUXILIARY
>
> where, today, it would only net add a single instruction.  But, if we
> ever add more stuff to PUSH_PTREGS_AUXILIARY, it would move back to
> needing two instructions since we'd need both the:
>
> 	subq $PTREGS_AUX_SIZE, %rsp
>
> and something to write gsbase to the stack.
>
> That doesn't get us the smp_processor_id() directly, but we can derive
> it later on from the gsbase value.
>
> The downside is that we're doing it in assembly.  We'd also have
> something additional which is a bit uglier and that reads memory on
> !X86_FEATURE_FSGSBASE, probably:
>	
> 	movq    PER_CPU_VAR(cpu_number), %reg
> 	push %reg

Nah, I believe the same value that RDGSBASE reads is already in percpu memory as 'per_cpu_offset', so the alternative can just read that and the code that uses it doesn’t need to care about the alternative.

>
> Which would require some different code to decode what was there:
>
> int read_exception_cpu_number(ext_pt_regs *e)
> {
> 	if (cpu_feature_enabled(X86_FEATURE_FSGSBASE))
> 		return gsbase_to_cpu_number(e->ext_cpu_nr);
> 	else
> 		return e->ext_cpu_nr;
> }
>
> I'm thinking that the whole racy smp_processor_id() thing wasn't so bad
> in the first place.
Re: [RFC PATCH 5/5] x86/entry: Store CPU info on exception entry
Posted by Rik van Riel 3 years, 6 months ago
On Mon, 2022-08-08 at 09:16 -0700, Dave Hansen wrote:
> On 8/8/22 04:03, Ingo Molnar wrote:
> > > Again, I don't believe this is too much overhead but I don't want
> > > people 
> > > to say it was not discussed.
> > Is it necessary to do this, what are the alternatives, can this
> > overhead be 
> > avoided?
> 
> I'm thinking that the whole racy smp_processor_id() thing wasn't so
> bad
> in the first place.
> 

FWIW, just grabbing the CPU number in show_signal_msg()
appears to be good enough for our use. 

It will typically show >90% of the errors happening on the
CPU core that went bad, which is more than enough to diagnose 
that a server has a hardware issue and should probably have
the CPU repaired.

-- 
All Rights Reversed.
Re: [RFC PATCH 5/5] x86/entry: Store CPU info on exception entry
Posted by Borislav Petkov 3 years, 6 months ago
On Mon, Aug 08, 2022 at 01:03:24PM +0200, Ingo Molnar wrote:
> I'd like to hear what Andy Lutomirski thinks about the notion that
> "2 instructions don't matter at all" ...
> 
> Especially since it's now 4 instructions:

He wasn't opposed to it when we talked on IRC last week.

> ... 4 instructions in the exception path is a non-trivial impact.

How do I measure this "impact"?

Hell, we recently added retbleed - and IBRS especially on Intel - on
the entry path which is whopping 30% perf impact in some cases. And
now we're arguing about a handful of insns. I'm sceptical they'll be
anything else but "in-the-noise" in any sensible workload.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [RFC PATCH 5/5] x86/entry: Store CPU info on exception entry
Posted by Thomas Gleixner 3 years, 6 months ago
On Mon, Aug 08 2022 at 14:01, Borislav Petkov wrote:
> On Mon, Aug 08, 2022 at 01:03:24PM +0200, Ingo Molnar wrote:
>> I'd like to hear what Andy Lutomirski thinks about the notion that
>> "2 instructions don't matter at all" ...
>> 
>> Especially since it's now 4 instructions:
>
> He wasn't opposed to it when we talked on IRC last week.
>
>> ... 4 instructions in the exception path is a non-trivial impact.
>
> How do I measure this "impact"?
>
> Hell, we recently added retbleed - and IBRS especially on Intel - on
> the entry path which is whopping 30% perf impact in some cases. And
> now we're arguing about a handful of insns. I'm sceptical they'll be
> anything else but "in-the-noise" in any sensible workload.

I'm not worried about the 4 instructions per se, but storing the CPU
number on every exception and interrupt entry just to use it in exactly
one place, i.e. the user mode #PF handler, does not make any sense at
all.

Get the CPU number before enabling interrupts and hand it in to the
bad area variants.

I know that the aux reg code is required for other things, but just for
this it's complete overkill.

Thanks,

        tglx