[RFC PATCH 01/22] kernel/entry/common: Move syscall_enter_from_user_mode_work() out of header

K Prateek Nayak posted 22 patches 10 months ago
[RFC PATCH 01/22] kernel/entry/common: Move syscall_enter_from_user_mode_work() out of header
Posted by K Prateek Nayak 10 months ago
Retain the prototype of syscall_enter_from_user_mode_work() in
linux/entry-common.h and move the function definition to
kernel/entry/common.c in preparation to notify the scheduler of task
entering and exiting kernel mode for syscall. The two architectures that
use it directly (x86, s390) and the four that call it via
syscall_enter_from_user_mode() (x86, riscv, loongarch, s390) end up
selecting GENERIC_ENTRY, hence, no functional changes are intended.

syscall_enter_from_user_mode_work() will end up calling function whose
visibility needs to be limited for kernel/* use only for cfs throttling
deferral.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 include/linux/entry-common.h | 10 +---------
 kernel/entry/common.c        | 10 ++++++++++
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index fc61d0205c97..7569a49cf7a0 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -161,15 +161,7 @@ long syscall_trace_enter(struct pt_regs *regs, long syscall,
  *     ptrace_report_syscall_entry(), __secure_computing(), trace_sys_enter()
  *  2) Invocation of audit_syscall_entry()
  */
-static __always_inline long syscall_enter_from_user_mode_work(struct pt_regs *regs, long syscall)
-{
-	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
-
-	if (work & SYSCALL_WORK_ENTER)
-		syscall = syscall_trace_enter(regs, syscall, work);
-
-	return syscall;
-}
+long syscall_enter_from_user_mode_work(struct pt_regs *regs, long syscall);
 
 /**
  * syscall_enter_from_user_mode - Establish state and check and handle work
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index e33691d5adf7..cc93cdcc36d0 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -79,6 +79,16 @@ noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs)
 	instrumentation_end();
 }
 
+__always_inline long syscall_enter_from_user_mode_work(struct pt_regs *regs, long syscall)
+{
+	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
+
+	if (work & SYSCALL_WORK_ENTER)
+		syscall = syscall_trace_enter(regs, syscall, work);
+
+	return syscall;
+}
+
 /* Workaround to allow gradual conversion of architecture code */
 void __weak arch_do_signal_or_restart(struct pt_regs *regs) { }
 

base-commit: d34e798094ca7be935b629a42f8b237d4d5b7f1d
-- 
2.43.0
Re: [RFC PATCH 01/22] kernel/entry/common: Move syscall_enter_from_user_mode_work() out of header
Posted by Peter Zijlstra 10 months ago
On Thu, Feb 20, 2025 at 09:32:36AM +0000, K Prateek Nayak wrote:
> Retain the prototype of syscall_enter_from_user_mode_work() in
> linux/entry-common.h and move the function definition to
> kernel/entry/common.c in preparation to notify the scheduler of task
> entering and exiting kernel mode for syscall. The two architectures that
> use it directly (x86, s390) and the four that call it via
> syscall_enter_from_user_mode() (x86, riscv, loongarch, s390) end up
> selecting GENERIC_ENTRY, hence, no functional changes are intended.
> 
> syscall_enter_from_user_mode_work() will end up calling function whose
> visibility needs to be limited for kernel/* use only for cfs throttling
> deferral.
> 
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
>  include/linux/entry-common.h | 10 +---------
>  kernel/entry/common.c        | 10 ++++++++++
>  2 files changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
> index fc61d0205c97..7569a49cf7a0 100644
> --- a/include/linux/entry-common.h
> +++ b/include/linux/entry-common.h
> @@ -161,15 +161,7 @@ long syscall_trace_enter(struct pt_regs *regs, long syscall,
>   *     ptrace_report_syscall_entry(), __secure_computing(), trace_sys_enter()
>   *  2) Invocation of audit_syscall_entry()
>   */
> -static __always_inline long syscall_enter_from_user_mode_work(struct pt_regs *regs, long syscall)
> -{
> -	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
> -
> -	if (work & SYSCALL_WORK_ENTER)
> -		syscall = syscall_trace_enter(regs, syscall, work);
> -
> -	return syscall;
> -}
> +long syscall_enter_from_user_mode_work(struct pt_regs *regs, long syscall);
>  
>  /**
>   * syscall_enter_from_user_mode - Establish state and check and handle work
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index e33691d5adf7..cc93cdcc36d0 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -79,6 +79,16 @@ noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs)
>  	instrumentation_end();
>  }
>  
> +__always_inline long syscall_enter_from_user_mode_work(struct pt_regs *regs, long syscall)
> +{
> +	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
> +
> +	if (work & SYSCALL_WORK_ENTER)
> +		syscall = syscall_trace_enter(regs, syscall, work);
> +
> +	return syscall;
> +}
> +
>  /* Workaround to allow gradual conversion of architecture code */
>  void __weak arch_do_signal_or_restart(struct pt_regs *regs) { }

This breaks s390. While you retain an external linkage, the function
looses the noinstr tag that's needed for correctness.

Also, extern __always_inline is flaky as heck. Please don't do this.
Re: [RFC PATCH 01/22] kernel/entry/common: Move syscall_enter_from_user_mode_work() out of header
Posted by K Prateek Nayak 10 months ago
Hello Peter,

Thank you for taking a look.

On 2/20/2025 4:13 PM, Peter Zijlstra wrote:
> On Thu, Feb 20, 2025 at 09:32:36AM +0000, K Prateek Nayak wrote:
>> Retain the prototype of syscall_enter_from_user_mode_work() in
>> linux/entry-common.h and move the function definition to
>> kernel/entry/common.c in preparation to notify the scheduler of task
>> entering and exiting kernel mode for syscall. The two architectures that
>> use it directly (x86, s390) and the four that call it via
>> syscall_enter_from_user_mode() (x86, riscv, loongarch, s390) end up
>> selecting GENERIC_ENTRY, hence, no functional changes are intended.
>>
>> [..snip..]
>>
>> @@ -79,6 +79,16 @@ noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs)
>>   	instrumentation_end();
>>   }
>>   
>> +__always_inline long syscall_enter_from_user_mode_work(struct pt_regs *regs, long syscall)
>> +{
>> +	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
>> +
>> +	if (work & SYSCALL_WORK_ENTER)
>> +		syscall = syscall_trace_enter(regs, syscall, work);
>> +
>> +	return syscall;
>> +}
>> +
>>   /* Workaround to allow gradual conversion of architecture code */
>>   void __weak arch_do_signal_or_restart(struct pt_regs *regs) { }
> 
> This breaks s390. While you retain an external linkage, the function
> looses the noinstr tag that's needed for correctness.
> 
> Also, extern __always_inline is flaky as heck. Please don't do this.

Noted! I'll try to find another way around this.

-- 
Thanks and Regards,
Prateek