[RFC PATCH 2/5] entry: Add calls for save/restore auxiliary pt_regs

ira.weiny@intel.com posted 5 patches 3 years, 6 months ago
There is a newer version of this series
[RFC PATCH 2/5] entry: Add calls for save/restore auxiliary pt_regs
Posted by ira.weiny@intel.com 3 years, 6 months ago
From: Ira Weiny <ira.weiny@intel.com>

Some architectures have auxiliary pt_regs space available to store
information on the stack during exceptions.  This information is easier
to obtain and store within C code rather than in arch specific assembly.

Define empty calls to architecture specific save and restore auxiliary
pt_regs functions.  Call these functions on generic entry/exit.

NOTE: Due to the split nature of the Xen exit code
irqentry_exit_cond_resched() requires an unbalanced call to
arch_restore_aux_pt_regs().

Cc: Rik van Riel <riel@surriel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Forward ported from PKS series
	https://lore.kernel.org/lkml/20220419170649.1022246-20-ira.weiny@intel.com/
---
 include/linux/entry-common.h |  8 ++++++++
 kernel/entry/common.c        | 16 ++++++++++++++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index 976cce7cf803..1c09ba64ad28 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -79,6 +79,14 @@ static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs);
 static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs) {}
 #endif
 
+#ifndef arch_save_aux_pt_regs
+static inline void arch_save_aux_pt_regs(struct pt_regs *regs) { }
+#endif
+
+#ifndef arch_restore_aux_pt_regs
+static inline void arch_restore_aux_pt_regs(struct pt_regs *regs) { }
+#endif
+
 /**
  * enter_from_user_mode - Establish state when coming from user mode
  *
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 8c0f334c4b75..a70a0f314aee 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -317,7 +317,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 
 	if (user_mode(regs)) {
 		irqentry_enter_from_user_mode(regs);
-		return ret;
+		goto aux_save;
 	}
 
 	/*
@@ -356,7 +356,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 		instrumentation_end();
 
 		ret.exit_rcu = true;
-		return ret;
+		goto aux_save;
 	}
 
 	/*
@@ -371,6 +371,11 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 	trace_hardirqs_off_finish();
 	instrumentation_end();
 
+aux_save:
+	instrumentation_begin();
+	arch_save_aux_pt_regs(regs);
+	instrumentation_end();
+
 	return ret;
 }
 
@@ -401,6 +406,7 @@ void dynamic_irqentry_exit_cond_resched(void)
 
 void irqentry_exit_cond_resched(struct pt_regs *regs)
 {
+	arch_restore_aux_pt_regs(regs);
 	irqentry_exit_cond_resched_internal();
 }
 
@@ -408,6 +414,10 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
 {
 	lockdep_assert_irqs_disabled();
 
+	instrumentation_begin();
+	arch_restore_aux_pt_regs(regs);
+	instrumentation_end();
+
 	/* Check whether this returns to user mode */
 	if (user_mode(regs)) {
 		irqentry_exit_to_user_mode(regs);
@@ -459,6 +469,7 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
 	instrumentation_begin();
 	trace_hardirqs_off_finish();
 	ftrace_nmi_enter();
+	arch_save_aux_pt_regs(regs);
 	instrumentation_end();
 
 	return irq_state;
@@ -467,6 +478,7 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
 void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state)
 {
 	instrumentation_begin();
+	arch_restore_aux_pt_regs(regs);
 	ftrace_nmi_exit();
 	if (irq_state.lockdep) {
 		trace_hardirqs_on_prepare();
-- 
2.35.3
Re: [RFC PATCH 2/5] entry: Add calls for save/restore auxiliary pt_regs
Posted by Borislav Petkov 3 years, 6 months ago
On Fri, Aug 05, 2022 at 10:30:06AM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> Some architectures have auxiliary pt_regs space available to store
> information on the stack during exceptions.  This information is easier
> to obtain and store within C code rather than in arch specific assembly.

There are others?

Because I would've done this whole thing in arch/x86/ only...

> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index 8c0f334c4b75..a70a0f314aee 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -317,7 +317,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
>  
>  	if (user_mode(regs)) {
>  		irqentry_enter_from_user_mode(regs);
> -		return ret;
> +		goto aux_save;

Why do you have to goto and do the instrumentation sandwitch around it
at the goto label?

Why not simply do

	if (user_mode(regs)) {
		irqentry_enter_from_user_mode(regs);
		arch_save_aux_pt_regs(regs);
		return ret;

and so on?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [RFC PATCH 2/5] entry: Add calls for save/restore auxiliary pt_regs
Posted by Ira Weiny 3 years, 6 months ago
On Tue, Aug 09, 2022 at 02:05:15PM +0200, Borislav Petkov wrote:
> On Fri, Aug 05, 2022 at 10:30:06AM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > Some architectures have auxiliary pt_regs space available to store
> > information on the stack during exceptions.  This information is easier
> > to obtain and store within C code rather than in arch specific assembly.
> 
> There are others?

Other archs?  not now.

> 
> Because I would've done this whole thing in arch/x86/ only...

Thomas did a lot of work to make the entry code generic.  So I was keeping that
work consistent.  This also helps to ensure I did not miss any places.

> 
> > diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> > index 8c0f334c4b75..a70a0f314aee 100644
> > --- a/kernel/entry/common.c
> > +++ b/kernel/entry/common.c
> > @@ -317,7 +317,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
> >  
> >  	if (user_mode(regs)) {
> >  		irqentry_enter_from_user_mode(regs);
> > -		return ret;
> > +		goto aux_save;
> 
> Why do you have to goto and do the instrumentation sandwitch around it
> at the goto label?
> 
> Why not simply do
> 
> 	if (user_mode(regs)) {
> 		irqentry_enter_from_user_mode(regs);
> 		arch_save_aux_pt_regs(regs);
> 		return ret;

I don't believe this is correct because instrumentation is not enabled here.
See below.[1]

There are 3 exit paths from irqentry_enter().  All must call
arch_save_aux_pt_regs().  I felt the maintenance of the code would be easier if
I did not scatter those calls through the function and simply exited 1
place.[1]

FWICT calling instrumentation_{begin,end}() is a noop in production code.  So
there is no real cost to calling instrumentation_begin() -> end -> begin ->
end.  And the goto seemed low enough overhead.  Given the current discussion I
admit I may have made the wrong choice.  But I think I would add some comments
to the below to help future developers.

Ira

[1]

--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -317,6 +317,9 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)

        if (user_mode(regs)) {
                irqentry_enter_from_user_mode(regs);
+               instrumentation_begin();
+               arch_save_aux_pt_regs(regs);
+               instrumentation_end();
                return ret;
        }

@@ -353,6 +356,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
                ct_irq_enter();
                instrumentation_begin();
                trace_hardirqs_off_finish();
+               arch_save_aux_pt_regs(regs);
                instrumentation_end();

                ret.exit_rcu = true;
@@ -369,6 +373,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
        instrumentation_begin();
        rcu_irq_enter_check_tick();
        trace_hardirqs_off_finish();
+       arch_save_aux_pt_regs(regs);
        instrumentation_end();

        return ret;
Re: [RFC PATCH 2/5] entry: Add calls for save/restore auxiliary pt_regs
Posted by Borislav Petkov 3 years, 6 months ago
On Tue, Aug 09, 2022 at 11:38:03AM -0700, Ira Weiny wrote:
> Thomas did a lot of work to make the entry code generic. So I was
> keeping that work consistent. This also helps to ensure I did not miss
> any places.

How about you worry about the other arches when you actually cross that
bridge?

> I don't believe this is correct because instrumentation is not enabled
> here.

Why do you have to run

	arch_save_aux_pt_regs()

with instrumentation enabled?

Patch 5 does

+       struct pt_regs_auxiliary *aux_pt_regs = &to_extended_pt_regs(regs)->aux;
+
+       aux_pt_regs->cpu = raw_smp_processor_id();

only?

Why does that need to run with instrumentation enabled?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [RFC PATCH 2/5] entry: Add calls for save/restore auxiliary pt_regs
Posted by Ira Weiny 3 years, 6 months ago
On Tue, Aug 09, 2022 at 08:49:47PM +0200, Borislav Petkov wrote:
> On Tue, Aug 09, 2022 at 11:38:03AM -0700, Ira Weiny wrote:
> > Thomas did a lot of work to make the entry code generic. So I was
> > keeping that work consistent. This also helps to ensure I did not miss
> > any places.
> 
> How about you worry about the other arches when you actually cross that
> bridge?

I guess I miss understood the reasoning behind Thomas' work.  No one mentioned
trying to isolate this to x86 during the PKS review.

> 
> > I don't believe this is correct because instrumentation is not enabled
> > here.
> 
> Why do you have to run
> 
> 	arch_save_aux_pt_regs()
> 
> with instrumentation enabled?

Sorry, that was carried over from the PKS series where I did need to call it.

I pulled this series over quickly to show basically what needed to be done.
But I did not review it for this detail.

> 
> Patch 5 does
> 
> +       struct pt_regs_auxiliary *aux_pt_regs = &to_extended_pt_regs(regs)->aux;
> +
> +       aux_pt_regs->cpu = raw_smp_processor_id();
> 
> only?
> 
> Why does that need to run with instrumentation enabled?

This does not.

Am I wrong that instrumentation begin/end are 0 overhead in production builds?

Ira

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
Re: [RFC PATCH 2/5] entry: Add calls for save/restore auxiliary pt_regs
Posted by Thomas Gleixner 3 years, 6 months ago
On Tue, Aug 09 2022 at 14:49, Ira Weiny wrote:
> On Tue, Aug 09, 2022 at 08:49:47PM +0200, Borislav Petkov wrote:
>> Why does that need to run with instrumentation enabled?
>
> This does not.
>
> Am I wrong that instrumentation begin/end are 0 overhead in production builds?

instrumentation_begin/end() are annotations and never emit executable
code. Neither in production nor in debug builds. Why keep you harping on
this question? Is it so hard to figure out what it does?

But that's not the question at all. The question is where auxregs need
to be set up.

Once the entry code leaves non-instrumentable code it's obviously
desired to have consistent state for instrumentation.

This obviously applies also to the auxreg space unless there is a
compelling reason why this is not required.

Thanks,

        tglx
Re: [RFC PATCH 2/5] entry: Add calls for save/restore auxiliary pt_regs
Posted by Thomas Gleixner 3 years, 6 months ago
On Tue, Aug 09 2022 at 20:49, Borislav Petkov wrote:
> On Tue, Aug 09, 2022 at 11:38:03AM -0700, Ira Weiny wrote:
>> Thomas did a lot of work to make the entry code generic. So I was
>> keeping that work consistent. This also helps to ensure I did not miss
>> any places.
>
> How about you worry about the other arches when you actually cross that
> bridge?

Ira is right. If we want it for everything, then the generic code is the
right place.

>> I don't believe this is correct because instrumentation is not enabled
>> here.
>
> Why do you have to run
>
> 	arch_save_aux_pt_regs()
>
> with instrumentation enabled?
>
> Patch 5 does
>
> +       struct pt_regs_auxiliary *aux_pt_regs = &to_extended_pt_regs(regs)->aux;
> +
> +       aux_pt_regs->cpu = raw_smp_processor_id();
>
> only?
>
> Why does that need to run with instrumentation enabled?

There is no reason and actually _if_ we go there then this _has_ to
happen _before_ instrumentation comes into play as instrumentation might
want to have access to extended pt_regs. Not necessarily the cpu number,
but the other things which need to go there for a real reason.

Thanks,

        tglx
Re: [RFC PATCH 2/5] entry: Add calls for save/restore auxiliary pt_regs
Posted by Borislav Petkov 3 years, 6 months ago
On Tue, Aug 09, 2022 at 11:14:19PM +0200, Thomas Gleixner wrote:
> Ira is right. If we want it for everything, then the generic code is the
> right place.

But what is "everything"? Currently, and AFAIU, it is for the PKS use
case on x86 only.

I'm not saying it should not be done this way eventually - all I'm
saying is we should not design "preemptively" before it is really needed
for other arches.

Unless putting it in generic code makes it all simpler and cleaner to
do, that is.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [RFC PATCH 2/5] entry: Add calls for save/restore auxiliary pt_regs
Posted by Ira Weiny 3 years, 6 months ago
On Tue, Aug 09, 2022 at 11:38:48PM +0200, Borislav Petkov wrote:
> On Tue, Aug 09, 2022 at 11:14:19PM +0200, Thomas Gleixner wrote:
> > Ira is right. If we want it for everything, then the generic code is the
> > right place.
> 
> But what is "everything"? Currently, and AFAIU, it is for the PKS use
> case on x86 only.
> 
> I'm not saying it should not be done this way eventually - all I'm
> saying is we should not design "preemptively" before it is really needed
> for other arches.
> 
> Unless putting it in generic code makes it all simpler and cleaner to
> do, that is.

For the cpu use case we could limit the number of call sites.  However, for PKS
the patch would have required changing x86 code in approximately 9 places for
the enter code.

$ git grep 'irqentry_enter(regs)' arch/x86 | wc -l
9

How about we drop patch 1 (I'll rework it to be less churn and submit it for
clean up separately because it will no longer be needed).  Keep patch 3 as is.
Then combine 2 and 5 as below.  The saving of the CPU can be lifted later if
needed.

Ira


commit 4c1d646888dd7471ae71a24109d587901a00f87d                                 
Author: Ira Weiny <ira.weiny@intel.com>                                         
Date:   Mon Jan 10 15:06:07 2022 -0800                                          
                                                                                
    x86/entry: Store CPU info on exception entry                                
                                                                                
    x86 has auxiliary pt_regs space available to store information on the       
    stack during exceptions.  This information is easier to obtain and store    
    within C code.                                                              
                                                                                
    The CPU information of a page fault is useful in determining where bad      
    CPUs are in a large data center.                                            
                                                                                
    Define aux_pt_regs_save_cpu() and set ARCH_HAS_PTREGS_AUXILIARY default     
    to yes.                                                                     
                                                                                
    Store the CPU on page fault entry and use it later.                         
                                                                                
    Cc: Rik van Riel <riel@surriel.com>                                         
    Suggested-by: Borislav Petkov <bp@alien8.de>                                
    Suggested-by: Dave Hansen <dave.hansen@intel.com>                           
    Suggested-by: Thomas Gleixner <tglx@linutronix.de>                          
    Signed-off-by: Ira Weiny <ira.weiny@intel.com>                              
                                                                                
    ---                                                                         
    Changes from RFC:                                                           
            New patch combining 2 and 5 from original series and modified.      
            Boris/Thomas - eliminate generic calls to save the cpu and call     
            only from exc_page_fault                                            
                                                                                
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig                                
index b35f6a472e09..707650a6ecb2 100644                                         
--- a/arch/x86/Kconfig                                                          
+++ b/arch/x86/Kconfig                                                          
@@ -1876,7 +1876,7 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS                   
                                                                                
 config ARCH_HAS_PTREGS_AUXILIARY                                               
        depends on X86_64                                                       
-       bool                                                                    
+       def_bool y                                                              
                                                                                
 choice                                                                         
        prompt "TSX enable mode"                                                
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h      
index 5a9c85893459..b403b469996f 100644                                         
--- a/arch/x86/include/asm/ptrace.h                                             
+++ b/arch/x86/include/asm/ptrace.h                                             
@@ -97,6 +97,7 @@ struct pt_regs {                                              
  * ARCH_HAS_PTREGS_AUXILIARY.  Failure to do so will result in a build failure.
  */                                                                            
 struct pt_regs_auxiliary {                                                     
+       int cpu;                                                                
 };                                                                             
                                                                                
 struct pt_regs_extended {                                                      
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c                          
index 82cf23975aa1..b9b8344b69ad 100644                                         
--- a/arch/x86/mm/fault.c                                                       
+++ b/arch/x86/mm/fault.c                                                       
@@ -768,9 +768,9 @@ static inline void                                          
 show_signal_msg(struct pt_regs *regs, unsigned long error_code,                
                unsigned long address, struct task_struct *tsk)                 
 {                                                                              
+       struct pt_regs_auxiliary *aux_pt_regs = &to_extended_pt_regs(regs)->aux;
        const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;     
-       /* This is a racy snapshot, but it's better than nothing. */            
-       int cpu = raw_smp_processor_id();                                       
+       int cpu = aux_pt_regs->cpu;                                             
                                                                                
        if (!unhandled_signal(tsk, SIGSEGV))                                    
                return;                                                         
@@ -1503,6 +1503,13 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
        }                                                                       
 }                                                                              
                                                                                
+static void aux_pt_regs_save_cpu(struct pt_regs *regs)                         
+{                                                                              
+       struct pt_regs_auxiliary *aux_pt_regs = &to_extended_pt_regs(regs)->aux;
+                                                                               
+       aux_pt_regs->cpu = raw_smp_processor_id();                              
+}                                                                              
+                                                                               
 DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)                                  
 {                                                                              
        unsigned long address = read_cr2();                                     
@@ -1546,6 +1553,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)             
         */                                                                     
        state = irqentry_enter(regs);                                           
                                                                                
+       aux_pt_regs_save_cpu(regs);                                             
        instrumentation_begin();                                                
        handle_page_fault(regs, error_code, address);                           
        instrumentation_end();
Re: [RFC PATCH 2/5] entry: Add calls for save/restore auxiliary pt_regs
Posted by Rik van Riel 3 years, 6 months ago
On Fri, 2022-08-05 at 10:30 -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> Some architectures have auxiliary pt_regs space available to store
> information on the stack during exceptions.  This information is
> easier
> to obtain and store within C code rather than in arch specific
> assembly.
> 
> Define empty calls to architecture specific save and restore
> auxiliary
> pt_regs functions.  Call these functions on generic entry/exit.
> 
> NOTE: Due to the split nature of the Xen exit code
> irqentry_exit_cond_resched() requires an unbalanced call to
> arch_restore_aux_pt_regs().
> 
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
Acked-by: Rik van Riel <riel@surriel.com>

-- 
All Rights Reversed.