[PATCH v6 1/2] ptrace,syscall_user_dispatch: Implement Syscall User Dispatch Suspension

Gregory Price posted 2 patches 2 years, 7 months ago
There is a newer version of this series
Posted by Gregory Price 2 years, 7 months ago
Add PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH to the ptrace options, and
modify Syscall User Dispatch to suspend interception when it is set.

This is modeled after the SUSPEND_SECCOMP feature, which suspends
SECCOMP interposition.  Without this, system calls that software like
CRIU injects into a process are intercepted by Syscall User Dispatch,
either causing a crash (due to blocked signals) or delivering those
signals to the ptracer (not the intended behavior).

Since Syscall User Dispatch is not a privileged feature, no permission
check is required.  However, attempting to set this option when
CONFIG_CHECKPOINT_RESTORE is not enabled should be disallowed, as its
intended use is checkpoint/restore.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/ptrace.h               | 2 ++
 include/uapi/linux/ptrace.h          | 6 +++++-
 kernel/entry/syscall_user_dispatch.c | 5 +++++
 kernel/ptrace.c                      | 4 ++++
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index eaaef3ffec22..461ae5c99d57 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -45,6 +45,8 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 
 #define PT_EXITKILL		(PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT)
 #define PT_SUSPEND_SECCOMP	(PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT)
+#define PT_SUSPEND_SYSCALL_USER_DISPATCH \
+	(PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH << PT_OPT_FLAG_SHIFT)
 
 extern long arch_ptrace(struct task_struct *child, long request,
 			unsigned long addr, unsigned long data);
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index 195ae64a8c87..ba9e3f19a22c 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -146,9 +146,13 @@ struct ptrace_rseq_configuration {
 /* eventless options */
 #define PTRACE_O_EXITKILL		(1 << 20)
 #define PTRACE_O_SUSPEND_SECCOMP	(1 << 21)
+#define PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH	(1 << 22)
 
 #define PTRACE_O_MASK		(\
-	0x000000ff | PTRACE_O_EXITKILL | PTRACE_O_SUSPEND_SECCOMP)
+	0x000000ff | \
+	PTRACE_O_EXITKILL | \
+	PTRACE_O_SUSPEND_SECCOMP | \
+	PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH)
 
 #include <asm/ptrace.h>
 
diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 0b6379adff6b..b5ec75164805 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -8,6 +8,7 @@
 #include <linux/uaccess.h>
 #include <linux/signal.h>
 #include <linux/elf.h>
+#include <linux/ptrace.h>
 
 #include <linux/sched/signal.h>
 #include <linux/sched/task_stack.h>
@@ -36,6 +37,10 @@ bool syscall_user_dispatch(struct pt_regs *regs)
 	struct syscall_user_dispatch *sd = &current->syscall_dispatch;
 	char state;
 
+	if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
+	    unlikely(current->ptrace & PT_SUSPEND_SYSCALL_USER_DISPATCH))
+		return false;
+
 	if (likely(instruction_pointer(regs) - sd->offset < sd->len))
 		return false;
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 54482193e1ed..a348b68d07a2 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -370,6 +370,10 @@ static int check_ptrace_options(unsigned long data)
 	if (data & ~(unsigned long)PTRACE_O_MASK)
 		return -EINVAL;
 
+	if (unlikely(data & PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH) &&
+	    !IS_ENABLED(CONFIG_CHECKPOINT_RESTORE))
+		return -EINVAL;
+
 	if (unlikely(data & PTRACE_O_SUSPEND_SECCOMP)) {
 		if (!IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) ||
 		    !IS_ENABLED(CONFIG_SECCOMP))
-- 
2.39.0
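
For readers following the option plumbing, the two checks this patch adds can be modeled in plain userspace C. This is a minimal sketch, not kernel code: every `MODEL_`/`model_` name is hypothetical, the `cr_enabled` parameter stands in for `IS_ENABLED(CONFIG_CHECKPOINT_RESTORE)`, and the shift value of 3 for `PT_OPT_FLAG_SHIFT` is an assumption about include/linux/ptrace.h rather than something shown in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Bit values copied from the diff; the MODEL_ prefix avoids clashing with real headers. */
#define MODEL_O_EXITKILL        (1UL << 20)
#define MODEL_O_SUSPEND_SECCOMP (1UL << 21)
#define MODEL_O_SUSPEND_SUD     (1UL << 22)
#define MODEL_O_MASK \
	(0x000000ffUL | MODEL_O_EXITKILL | MODEL_O_SUSPEND_SECCOMP | MODEL_O_SUSPEND_SUD)

/* Assumption: PT_OPT_FLAG_SHIFT is 3 in include/linux/ptrace.h. */
#define MODEL_PT_OPT_FLAG_SHIFT 3
#define MODEL_PT_SUSPEND_SUD    (MODEL_O_SUSPEND_SUD << MODEL_PT_OPT_FLAG_SHIFT)

/* The new bit must be part of the mask, otherwise the first check rejects it. */
static_assert((MODEL_O_MASK & MODEL_O_SUSPEND_SUD) != 0,
	      "mask must include the new option");

/* Models check_ptrace_options() for the new option only (SECCOMP branch omitted). */
static int model_check_options(unsigned long data, bool cr_enabled)
{
	if (data & ~MODEL_O_MASK)
		return -EINVAL;
	if ((data & MODEL_O_SUSPEND_SUD) && !cr_enabled)
		return -EINVAL;
	return 0;
}

/* Models the early return added to syscall_user_dispatch(): true means "skip SUD". */
static bool model_sud_suspended(unsigned long task_ptrace_flags, bool cr_enabled)
{
	return cr_enabled && (task_ptrace_flags & MODEL_PT_SUSPEND_SUD) != 0;
}
```

Under these assumptions, `model_check_options(MODEL_O_SUSPEND_SUD, false)` yields `-EINVAL`, mirroring the rule that the option is rejected when checkpoint/restore support is not built in.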
Re: [PATCH v6 1/2] ptrace,syscall_user_dispatch: Implement Syscall User Dispatch Suspension
Posted by Oleg Nesterov 2 years, 7 months ago
On 01/24, Gregory Price wrote:
>
> Add PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH to the ptrace options, and
> modify Syscall User Dispatch to suspend interception when it is set.
>
> This is modeled after the SUSPEND_SECCOMP feature, which suspends
> SECCOMP interposition.  Without this, system calls that software like
> CRIU injects into a process are intercepted by Syscall User Dispatch,
> either causing a crash (due to blocked signals) or delivering those
> signals to the ptracer (not the intended behavior).

Cough... Gregory, I am sorry ;)

but can't we drop this patch too?

CRIU needs to do PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG and check
config->mode anyway as we discussed.

Then it can simply set *config->selector = SYSCALL_DISPATCH_FILTER_ALLOW
with the same effect, no?

Oleg.
Re: [PATCH v6 1/2] ptrace,syscall_user_dispatch: Implement Syscall User Dispatch Suspension
Posted by Gregory Price 2 years, 7 months ago
On Thu, Jan 26, 2023 at 01:30:08AM +0100, Oleg Nesterov wrote:
> On 01/24, Gregory Price wrote:
> >
> > Add PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH to the ptrace options, and
> > modify Syscall User Dispatch to suspend interception when it is set.
> >
> > This is modeled after the SUSPEND_SECCOMP feature, which suspends
> > SECCOMP interposition.  Without this, system calls that software like
> > CRIU injects into a process are intercepted by Syscall User Dispatch,
> > either causing a crash (due to blocked signals) or delivering those
> > signals to the ptracer (not the intended behavior).
> 
> Cough... Gregory, I am sorry ;)
> 
> but can't we drop this patch too?
> 
> CRIU needs to do PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG and check
> config->mode anyway as we discussed.
> 
> Then it can simply set *config->selector = SYSCALL_DISPATCH_FILTER_ALLOW
> with the same effect, no?
> 
> Oleg.
> 

After further investigation, I believe we can drop 1/2, but for a
different reason:  suspending SUD is actually insane behavior during the
quiesce phase.  Quiesce lets the program run until it reaches a
particular state, which means we can't turn SUD off lest we interfere
with intended behavior (cough cough prior review said this cough cough
I'm dumb).

I'll drop patch 1/2 and resubmit (there's an unused variable warning I
need to clean up).

Thanks again for the reviews, all
~Gregory
Re: [PATCH v6 1/2] ptrace,syscall_user_dispatch: Implement Syscall User Dispatch Suspension
Posted by Gregory Price 2 years, 7 months ago
On Thu, Jan 26, 2023 at 01:30:08AM +0100, Oleg Nesterov wrote:
> On 01/24, Gregory Price wrote:
> >
> > Add PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH to the ptrace options, and
> > modify Syscall User Dispatch to suspend interception when it is set.
> >
> > This is modeled after the SUSPEND_SECCOMP feature, which suspends
> > SECCOMP interposition.  Without this, system calls that software like
> > CRIU injects into a process are intercepted by Syscall User Dispatch,
> > either causing a crash (due to blocked signals) or delivering those
> > signals to the ptracer (not the intended behavior).
> 
> Cough... Gregory, I am sorry ;)
> 
> but can't we drop this patch too?
> 
> CRIU needs to do PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG and check
> config->mode anyway as we discussed.
> 
> Then it can simply set *config->selector = SYSCALL_DISPATCH_FILTER_ALLOW
> with the same effect, no?
> 
> Oleg.
> 

The selector is optional, but the core idea seems reasonable.
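
To make the "selector is optional" caveat concrete, here is a rough userspace model of the decision Syscall User Dispatch makes on each syscall. It is a sketch under stated assumptions, not the kernel implementation: all `model_` names are hypothetical, the vDSO sigreturn exception and the faulting-selector-read path are left out, and the ALLOW/BLOCK values (0 and 1) are what I understand the SUD uapi header to define.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Selector byte values as assumed from the SUD uapi header. */
#define MODEL_FILTER_ALLOW 0
#define MODEL_FILTER_BLOCK 1

enum model_verdict { MODEL_RUN_SYSCALL, MODEL_RAISE_SIGSYS, MODEL_KILL };

struct model_sud {
	uint64_t offset;       /* start of the always-allowed code region */
	uint64_t len;          /* length of that region */
	const char *selector;  /* optional: NULL when the task registered none */
};

static enum model_verdict model_dispatch(const struct model_sud *sd, uint64_t ip)
{
	assert(sd != NULL);

	/* Syscalls issued from inside [offset, offset + len) always run. */
	if (ip - sd->offset < sd->len)
		return MODEL_RUN_SYSCALL;

	if (sd->selector) {
		if (*sd->selector == MODEL_FILTER_ALLOW)
			return MODEL_RUN_SYSCALL;
		if (*sd->selector != MODEL_FILTER_BLOCK)
			return MODEL_KILL;  /* invalid selector value is fatal */
	}

	/* Selector says BLOCK, or there is no selector to flip at all. */
	return MODEL_RAISE_SIGSYS;
}
```

In this model, writing ALLOW to the selector turns interception off for code outside the region, but a task that registered no selector still gets SIGSYS there, so flipping the selector alone would not cover every task.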

Though I think this complicates the quiesce vs checkpoint phases a bit.

My best understanding of CRIU is there are (at least) two checkpoint
phases: quiesce and checkpoint. The intent of patch 1/2 is to aid the
quiesce phase, not the checkpoint phase.

In both phases the `compel` code is used to inject system calls, so
turning SUD off is required.  That can be achieved by saving the config
with get_config and then clearing it entirely with set_config.
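
The save/clear/restore sequence described above might look roughly like the sketch below. Everything here is an assumption made for illustration: the `model_sud_config` layout is a guess at what the get/set requests from patch 2/2 exchange, `model_tracee_ops` is a hypothetical transport that a real tracer would back with ptrace(2) calls, and the `fake_` backend is an in-memory stand-in so the logic can be exercised without a tracee.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_DISPATCH_OFF 0
#define MODEL_DISPATCH_ON  1

/* Assumed layout of the config exchanged by get_config/set_config. */
struct model_sud_config {
	uint64_t mode;      /* MODEL_DISPATCH_OFF or MODEL_DISPATCH_ON */
	uint64_t selector;  /* user address of the selector byte, 0 if none */
	uint64_t offset;
	uint64_t len;
};

/* Hypothetical transport: a real tracer would wrap ptrace(2) here. */
struct model_tracee_ops {
	int (*get_config)(void *tracee, struct model_sud_config *out);
	int (*set_config)(void *tracee, const struct model_sud_config *in);
};

/* Save the tracee's SUD config, then switch SUD off so injected syscalls run. */
static int model_sud_suspend(const struct model_tracee_ops *ops, void *tracee,
			     struct model_sud_config *saved)
{
	struct model_sud_config off;
	int ret;

	assert(saved != NULL);
	ret = ops->get_config(tracee, saved);
	if (ret)
		return ret;
	if (saved->mode == MODEL_DISPATCH_OFF)
		return 0;              /* nothing to suspend */

	memset(&off, 0, sizeof(off));  /* mode off, all other fields cleared */
	return ops->set_config(tracee, &off);
}

/* Put back exactly what model_sud_suspend() saved. */
static int model_sud_restore(const struct model_tracee_ops *ops, void *tracee,
			     const struct model_sud_config *saved)
{
	assert(saved != NULL);
	if (saved->mode == MODEL_DISPATCH_OFF)
		return 0;
	return ops->set_config(tracee, saved);
}

/* In-memory stand-in for a tracee, only for exercising the logic above. */
static int fake_get(void *tracee, struct model_sud_config *out)
{
	*out = *(struct model_sud_config *)tracee;
	return 0;
}

static int fake_set(void *tracee, const struct model_sud_config *in)
{
	*(struct model_sud_config *)tracee = *in;
	return 0;
}

static const struct model_tracee_ops fake_ops = { fake_get, fake_set };
```

The saved copy is the piece that would have to survive between phases, which is why it matters whether `compel` can hand state to the code that writes the checkpoint image.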

I'm NOT sure whether the `compel` code can save settings that the
`cr-check` code then saves to disk, or if `compel` is standalone. I will
go check this and report back.

The only other concern is how it's restored, and in what order relative
to SECCOMP, for the absolutely insane case of someone running a SUD
task inside a locked-down cgroup. Technically possible (TM)!

We may find that the suspend flag is "just easier" but not required.

I do think more-simple-is-more-better, though, so I will investigate.

~Gregory