[PATCH] x86/mm: Inline mm_mangle_tif_spec_bits() and l1d_flush_evaluate()

Posted by Khalid Ali 3 months, 2 weeks ago
From: Khalid Ali <khaliidcaliy@gmail.com>

These two functions are called from a performance-critical path, the
context switch.

So make sure the compiler inlines them. This will not increase code size
because each function has only one call site.

Signed-off-by: Khalid Ali <khaliidcaliy@gmail.com>
---
 arch/x86/mm/tlb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 39f80111e6f1..657e8e254b69 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -624,7 +624,7 @@ static void l1d_flush_force_sigbus(struct callback_head *ch)
 	force_sig(SIGBUS);
 }
 
-static void l1d_flush_evaluate(unsigned long prev_mm, unsigned long next_mm,
+static __always_inline void l1d_flush_evaluate(unsigned long prev_mm, unsigned long next_mm,
 				struct task_struct *next)
 {
 	/* Flush L1D if the outgoing task requests it */
@@ -648,7 +648,7 @@ static void l1d_flush_evaluate(unsigned long prev_mm, unsigned long next_mm,
 	}
 }
 
-static unsigned long mm_mangle_tif_spec_bits(struct task_struct *next)
+static __always_inline unsigned long mm_mangle_tif_spec_bits(struct task_struct *next)
 {
 	unsigned long next_tif = read_task_thread_flags(next);
 	unsigned long spec_bits = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_SPEC_MASK;
-- 
2.49.0
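
For reference, __always_inline as used in the patch is the kernel's wrapper
around the compiler's always_inline attribute. A minimal sketch of the idea
(the macro is paraphrased from include/linux/compiler_types.h; the helper and
mask are made-up stand-ins, not the kernel code):

	/* roughly what the kernel header provides */
	#define __always_inline	inline __attribute__((__always_inline__))

	/* a static helper with one call site, forced inline into its caller */
	static __always_inline unsigned long mangle_bits(unsigned long tif)
	{
		return tif & 0xffUL;	/* stand-in for the TIF/SPEC mask logic */
	}

With the attribute, the call is inlined regardless of the compiler's usual
heuristics (GCC diagnoses cases where it cannot inline); without it, a static
function with a single caller is merely a strong candidate for inlining at -O2.
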
Re: [PATCH] x86/mm: Inline mm_mangle_tif_spec_bits() and l1d_flush_evaluate()
Posted by Dave Hansen 3 months, 2 weeks ago
On 6/23/25 10:43, Khalid Ali wrote:
> These two functions are called from a performance-critical path, the
> context switch.
> 
> So make sure the compiler inlines them. This will not increase code size
> because each function has only one call site.

Khalid,

The compiler is currently given the latitude to choose an inlining
strategy for these functions. Generally, I'd assume that it's doing
something sane unless there's a specific compiler making suboptimal
decisions.

Do you have some evidence that compilers are doing the wrong thing?
Perhaps some generated code that looks wrong to you or some evidence
that the proposed change improves performance?
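
One way to gather that evidence (a sketch, not taken from the thread) is to
compare the generated code; for a static helper with a single call site, GCC
and Clang at -O2 normally inline it without any annotation:

	/* toy.c -- stand-alone illustration, not the kernel code */
	static unsigned long helper(unsigned long x)	/* single call site below */
	{
		return x * 3 + 1;
	}

	unsigned long caller(unsigned long x)
	{
		return helper(x);
	}

	/*
	 * Build with "gcc -O2 -c toy.c" and inspect with "nm toy.o" or
	 * "objdump -d toy.o": helper() typically leaves no symbol and no
	 * call instruction, i.e. it was inlined without any hint.  The
	 * same inspection of arch/x86/mm/tlb.o would show whether
	 * l1d_flush_evaluate() or mm_mangle_tif_spec_bits() is actually
	 * being emitted out of line.
	 */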