[PATCH] kernel/entry: Remove some redundant checks on syscall work

Posted by Khalid Ali 4 months ago
There is a newer version of this series
There are redundant checks of thread syscall work.

After we read the thread syscall work we are checking the work bits using
SYSCALL_WORK_ENTER and SYSCALL_WORK_EXIT on syscall entry and exit
respectively, and at the same time syscall_trace_enter() and
syscall_exit_work() check the bits one by one, the same bits we already
checked. This is redundant. So either we need to check the work bits one
by one, as I did, or check them as a whole. From my perspective, I think
the way the code is implemented now, checking the work bits one by one,
is simpler and gives us more granular control.

I also check whether SYSCALL_WORK_SYSCALL_AUDIT is set in the work, as the
other parts do. Thus we don't need SYSCALL_WORK_ENTER and SYSCALL_WORK_EXIT
anymore. We may need them in the future, so we can safely leave them there.

Signed-off-by: Khalid Ali <khaliidcaliy@gmail.com>
---
 include/linux/entry-common.h | 9 ++-------
 kernel/entry/common.c        | 8 +++++---
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index f94f3fdf15fc..60a6741d5c5a 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -165,11 +165,7 @@ long syscall_trace_enter(struct pt_regs *regs, long syscall,
 static __always_inline long syscall_enter_from_user_mode_work(struct pt_regs *regs, long syscall)
 {
 	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
-
-	if (work & SYSCALL_WORK_ENTER)
-		syscall = syscall_trace_enter(regs, syscall, work);
-
-	return syscall;
+	return syscall_trace_enter(regs, syscall, work);
 }
 
 /**
@@ -408,8 +404,7 @@ static __always_inline void syscall_exit_to_user_mode_work(struct pt_regs *regs)
 	 * enabled, we want to run them exactly once per syscall exit with
 	 * interrupts enabled.
 	 */
-	if (unlikely(work & SYSCALL_WORK_EXIT))
-		syscall_exit_work(regs, work);
+	syscall_exit_work(regs, work);
 	local_irq_disable_exit_to_user();
 	exit_to_user_mode_prepare(regs);
 }
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index e774dc9e7eaf..9accf5fde8e7 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -64,8 +64,9 @@ long syscall_trace_enter(struct pt_regs *regs, long syscall,
 		 */
 		syscall = syscall_get_nr(current, regs);
 	}
 
-	syscall_enter_audit(regs, syscall);
+	if (work & SYSCALL_WORK_SYSCALL_AUDIT)
+		syscall_enter_audit(regs, syscall);
 
 	return ret ? : syscall;
 }
@@ -162,7 +163,8 @@ void syscall_exit_work(struct pt_regs *regs, unsigned long work)
 		}
 	}
 
-	audit_syscall_exit(regs);
+	if (work & SYSCALL_WORK_SYSCALL_AUDIT)
+		audit_syscall_exit(regs);
 
 	if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
 		trace_sys_exit(regs, syscall_get_return_value(current, regs));
-- 
2.49.0
Re: [PATCH] kernel/entry: Remove some redundant checks on syscall work
Posted by Thomas Gleixner 3 months, 4 weeks ago
On Wed, Jun 11 2025 at 11:43, Khalid Ali wrote:
> There are redundant checks of thread syscall work.

Not really.

> After we read the thread syscall work we are checking the work bits using

We are doing nothing. Please write your changelogs in imperative mood
and do not try to impersonate code.

> SYSCALL_WORK_ENTER and SYSCALL_WORK_EXIT on syscall entry and exit
> respectively, and at the same time syscall_trace_enter() and
> syscall_exit_work() check the bits one by one, the same bits we already
> checked. This is redundant. So either we need to check the work bits one
> by one, as I did, or check them as a whole. From my perspective, I think
> the way the code is implemented now, checking the work bits one by one,
> is simpler and gives us more granular control.

That's just wrong and absolutely not redundant. Care to look at the
definition of SYSCALL_WORK_ENTER:

#define SYSCALL_WORK_ENTER	(SYSCALL_WORK_SECCOMP |			\
				 SYSCALL_WORK_SYSCALL_TRACEPOINT |	\
				 SYSCALL_WORK_SYSCALL_TRACE |		\
				 SYSCALL_WORK_SYSCALL_EMU |		\
				 SYSCALL_WORK_SYSCALL_AUDIT |		\
				 SYSCALL_WORK_SYSCALL_USER_DISPATCH |	\
				 ARCH_SYSCALL_WORK_ENTER)

So this initial check avoids:

    1) Doing an unconditional out of line call

    2) Checking bit for bit to figure out that there is none set.

Same applies for SYSCALL_WORK_EXIT.
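
To make the cost concrete, here is a minimal user-space sketch
(hypothetical flag values and helper names, not the actual kernel code)
of the two shapes being compared:

#include <stdio.h>

/* Hypothetical bit values for illustration only. */
#define WORK_SECCOMP     0x01UL
#define WORK_TRACEPOINT  0x02UL
#define WORK_AUDIT       0x04UL
#define WORK_ENTER       (WORK_SECCOMP | WORK_TRACEPOINT | WORK_AUDIT)

/* Stands in for the out-of-line syscall_trace_enter(): tests the
 * bits one by one. */
static void enter_work(unsigned long work)
{
	if (work & WORK_SECCOMP)
		puts("seccomp");
	if (work & WORK_TRACEPOINT)
		puts("tracepoint");
	if (work & WORK_AUDIT)
		puts("audit");
}

int main(void)
{
	unsigned long work = 0;	/* the common case: no work pending */

	/* Before the patch: one AND of the combined mask skips the
	 * call entirely when no bit is set. */
	if (work & WORK_ENTER)
		enter_work(work);

	/* With the patch: the call is taken unconditionally and every
	 * bit is tested just to find that none is set. */
	enter_work(work);

	return 0;
}

In the common case work is 0, so the masked variant costs one AND and one
never-taken branch, while the unconditional variant pays the out-of-line
call plus one test per bit just to discover there is nothing to do.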

Your change neither makes anything simpler nor provides more granular
control.

All it does is add overhead, and it is therefore guaranteed to introduce
a performance regression.

Not going to happen.

Thanks,

        tglx