[RFC PATCH] piped/ptraced coredump (was: Dump smaller VMAs first in ELF cores)

Oleg Nesterov posted 1 patch 1 year, 4 months ago
[RFC PATCH] piped/ptraced coredump (was: Dump smaller VMAs first in ELF cores)
Posted by Oleg Nesterov 1 year, 4 months ago
On 08/02, Brian Mak wrote:
>
> Large cores may be truncated in some scenarios, such as with daemons
> with stop timeouts that are not large enough or lack of disk space. This
> impacts debuggability with large core dumps since critical information
> necessary to form a usable backtrace, such as stacks and shared library
> information, are omitted.
>
> Attempting to find all the VMAs necessary to form a proper backtrace and
> then prioritizing those VMAs specifically while core dumping is complex.
> So instead, we can mitigate the impact of core dump truncation by
> dumping smaller VMAs first, which may be more likely to contain memory
> necessary to form a usable backtrace.

I thought of a another approach... See the simple patch below,

	- Incomplete, obviously not for inclusion. I think a new
	  PTRACE_EVENT_COREDUMP makes sense, this will simplify
	  the code even more.

	- Needs some preparations. In particular, I still think we
	  should reintroduce SIGNAL_GROUP_COREDUMP regardless of
	  this feature, but lets not discuss this right now.

This patch adds the new %T specifier to core_pattern, so that

	$ echo '|/path/to/dumper %T' /proc/sys/kernel/core_pattern

means that the coredumping thread will run as a traced child of the
"dumper" process, and it will stop in TASK_TRACED before it calls
binfmt->core_dump().

So the dumper process can extract/save the backtrace/registers/whatever
first, then do PTRACE_CONT or kill the tracee if it doesn't need the
"full" coredump.

Of course this won't work if the dumping thread is already ptraced,
but in this case the debugger has all the necessary info.

What do you think?

Oleg.
---

diff --git a/fs/coredump.c b/fs/coredump.c
index 7f12ff6ad1d3..fbe8e5ae7c00 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -337,6 +337,10 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
 			case 'C':
 				err = cn_printf(cn, "%d", cprm->cpu);
 				break;
+			case 'T':
+				// XXX explain that we don't need get_task_struct()
+				cprm->traceme = current;
+				break;
 			default:
 				break;
 			}
@@ -516,9 +520,30 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
 	/* and disallow core files too */
 	current->signal->rlim[RLIMIT_CORE] = (struct rlimit){1, 1};
 
+	if (cp->traceme) {
+		if (ptrace_attach(cp->traceme, PTRACE_SEIZE, 0,0))
+			cp->traceme = NULL;
+	}
+
 	return err;
 }
 
+static void umh_pipe_cleanup(struct subprocess_info *info)
+{
+	struct coredump_params *cp = (struct coredump_params *)info->data;
+	// XXX: we can't rely on this check, for example
+	// CONFIG_STATIC_USERMODEHELPER_PATH == ""
+	if (cp->traceme) {
+		// XXX: meaningful exit_code/message, maybe new PTRACE_EVENT_
+		ptrace_notify(SIGTRAP, 0);
+
+		spin_lock_irq(&current->sighand->siglock);
+		if (!__fatal_signal_pending(current))
+			clear_thread_flag(TIF_SIGPENDING);
+		spin_unlock_irq(&current->sighand->siglock);
+	}
+}
+
 void do_coredump(const kernel_siginfo_t *siginfo)
 {
 	struct core_state core_state;
@@ -637,7 +662,8 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		retval = -ENOMEM;
 		sub_info = call_usermodehelper_setup(helper_argv[0],
 						helper_argv, NULL, GFP_KERNEL,
-						umh_pipe_setup, NULL, &cprm);
+						umh_pipe_setup, umh_pipe_cleanup,
+						&cprm);
 		if (sub_info)
 			retval = call_usermodehelper_exec(sub_info,
 							  UMH_WAIT_EXEC);
diff --git a/include/linux/coredump.h b/include/linux/coredump.h
index 0904ba010341..490b6c5e05d8 100644
--- a/include/linux/coredump.h
+++ b/include/linux/coredump.h
@@ -28,6 +28,7 @@ struct coredump_params {
 	int vma_count;
 	size_t vma_data_size;
 	struct core_vma_metadata *vma_meta;
+	struct task_struct *traceme;
 };
 
 extern unsigned int core_file_note_size_limit;
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 90507d4afcd6..13aed4c358b6 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -46,6 +46,9 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 #define PT_EXITKILL		(PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT)
 #define PT_SUSPEND_SECCOMP	(PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT)
 
+extern int ptrace_attach(struct task_struct *task, long request,
+			 unsigned long addr, unsigned long flags);
+
 extern long arch_ptrace(struct task_struct *child, long request,
 			unsigned long addr, unsigned long data);
 extern int ptrace_readdata(struct task_struct *tsk, unsigned long src, char __user *dst, int len);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index d5f89f9ef29f..47f1e09f8fc9 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -406,9 +406,8 @@ static inline void ptrace_set_stopped(struct task_struct *task, bool seize)
 	}
 }
 
-static int ptrace_attach(struct task_struct *task, long request,
-			 unsigned long addr,
-			 unsigned long flags)
+int ptrace_attach(struct task_struct *task, long request,
+		  unsigned long addr, unsigned long flags)
 {
 	bool seize = (request == PTRACE_SEIZE);
 	int retval;
Re: [RFC PATCH] piped/ptraced coredump (was: Dump smaller VMAs first in ELF cores)
Posted by Linus Torvalds 1 year, 4 months ago
On Sun, 4 Aug 2024 at 08:23, Oleg Nesterov <oleg@redhat.com> wrote:
>
> What do you think?

Eww. I really don't like giving the dumper ptrace rights.

I think the real limitations of the "dump to pipe" is that it's just
being very stupid. Which is fine in the sense that core dumps aren't
likely to be a huge priority. But if or when they _are_ a priority,
it's not a great model.

So I prefer the original patch because it's also small, but it's
conceptually much smaller.

That said, even that simplified v2 looks a bit excessive to me.

Does it really help so much to create a new array of core_vma_metadata
pointers - could we not just sort those things in place?

Also, honestly, if the issue is core dump truncation, at some point we
should just support truncating individual mappings rather than the
whole core file anyway. No?

Depending on what the major issue is, we might also tweak the
heuristics for which vmas get written out.

For example, I wouldn't be surprised if there's a fair number of "this
read-only private file mapping gets written out because it has been
written to" due to runtime linking. And I kind of suspect that in many
cases that's not all that interesting.

Anyway, I assume that Brian had some specific problem case that
triggered this all, and I'd like to know a bit more.

           Linus
Re: [RFC PATCH] piped/ptraced coredump (was: Dump smaller VMAs first in ELF cores)
Posted by Oleg Nesterov 1 year, 4 months ago
OK, I won't insist, just a couple of notes.

On 08/04, Linus Torvalds wrote:
>
> On Sun, 4 Aug 2024 at 08:23, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > What do you think?
>
> Eww. I really don't like giving the dumper ptrace rights.

Why?

Apart from SIGKILL, the dumper already has the full control.

And note that the dumper can already use ptrace. It can do, say,
ptrace(PTRACE_SEIZE, PTRACE_O_TRACEEXIT), close stdin, and wait
for PTRACE_EVENT_EXIT.

IIRC some people already do this, %T just makes the usage of ptrace
more convenient/powerful in this case.

> So I prefer the original patch because it's also small, but it's
> conceptually much smaller.

Ah, sorry. I didn't mean that %T makes the Brian's patch unnecessary,
I just wanted to discuss this feature "on a related note".

Oleg.
Re: [RFC PATCH] piped/ptraced coredump (was: Dump smaller VMAs first in ELF cores)
Posted by Linus Torvalds 1 year, 4 months ago
On Sun, 4 Aug 2024 at 11:53, Oleg Nesterov <oleg@redhat.com> wrote:
>
> Apart from SIGKILL, the dumper already has the full control.

What do you mean? It's a regular usermodehelper. It gets the dump data
as input. That's all the control it has.

> And note that the dumper can already use ptrace.

.. with the normal ptrace() rules, yes.

You realize that some setups literally disable ptrace() system calls,
right? Which your patch now effectively sidesteps.

THAT is why I don't like it. ptrace() is *dangerous*. It is very
typically one of the things that people limit for various reasons.

Just adding some implicit tracing willy-nilly needs to be something
people really worry about.

             Linus
Re: [RFC PATCH] piped/ptraced coredump (was: Dump smaller VMAs first in ELF cores)
Posted by Oleg Nesterov 1 year, 4 months ago
On 08/04, Linus Torvalds wrote:
>
> On Sun, 4 Aug 2024 at 11:53, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Apart from SIGKILL, the dumper already has the full control.
>
> What do you mean? It's a regular usermodehelper. It gets the dump data
> as input. That's all the control it has.

I meant, the dumping thread can't exit until the dumper reads the data
from stdin or closes the pipe. Until then the damper can read /proc/pid/mem
and do other things.

> > And note that the dumper can already use ptrace.
>
> .. with the normal ptrace() rules, yes.
>
> You realize that some setups literally disable ptrace() system calls,
> right? Which your patch now effectively sidesteps.

Well. If, say, selinux disables ptrace, then ptrace_attach() in this
patch should also fail.

But if some setups disable sys_ptrace() as a system call... then yes,
I didn't know that.

> THAT is why I don't like it. ptrace() is *dangerous*.

And horrible ;)

> Just adding some implicit tracing willy-nilly needs to be something
> people really worry about.

Ok, as I said I won't insist.

Oleg.