On x86, occasionally, the removal of a breakpoint (i.e., removal of
the int3 instruction) fails because the text_mutex is taken by another
CPU (mainly due to the static_key mechanism, I think). The function
kgdb_skipexception catches exceptions from these spurious int3
instructions, bails out of KGDB, and continues execution from the
previous PC address.
However, this led to an endless loop between the int3 instruction and
kgdb_skipexception since the int3 instruction (being still present)
triggered again. This effectively caused the system to hang.
With this patch, we try to remove the concerned spurious int3
instruction in kgdb_skipexception before continuing execution. This
may take a few attempts until the concurrent holders of the text_mutex
have released it, but eventually succeeds and the kernel can continue.
Signed-off-by: Florian Rommel <mail@florommel.de>
---
arch/x86/kernel/kgdb.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 64c332151af7..585a7a72af74 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -723,7 +723,31 @@ void kgdb_arch_exit(void)
int kgdb_skipexception(int exception, struct pt_regs *regs)
{
if (exception == 3 && kgdb_isremovedbreak(regs->ip - 1)) {
+ struct kgdb_bkpt *bpt;
+ int i, error;
+
regs->ip -= 1;
+
+ /*
+ * Try to remove the spurious int3 instruction.
+ * These int3s can result from failed breakpoint removals
+ * in kgdb_arch_remove_breakpoint.
+ */
+ for (bpt = NULL, i = 0; i < KGDB_MAX_BREAKPOINTS; i++) {
+ if (kgdb_break[i].bpt_addr == regs->ip &&
+ kgdb_break[i].state == BP_REMOVED &&
+ (kgdb_break[i].type == BP_BREAKPOINT ||
+ kgdb_break[i].type == BP_POKE_BREAKPOINT)) {
+ bpt = &kgdb_break[i];
+ break;
+ }
+ }
+ if (!bpt)
+ return 1;
+ error = kgdb_arch_remove_breakpoint(bpt);
+ if (error)
+ pr_err("skipexception: breakpoint remove failed: %lx\n",
+ bpt->bpt_addr);
return 1;
}
return 0;
--
2.46.0
Florian!
On Mon, Aug 12 2024 at 19:43, Florian Rommel wrote:
> On x86, occasionally, the removal of a breakpoint (i.e., removal of
> the int3 instruction) fails because the text_mutex is taken by another
> CPU (mainly due to the static_key mechanism, I think).
Either you know it or not. Speculation is reserved for CPUs.
> The function kgdb_skipexception catches exceptions from these spurious
> int3 instructions, bails out of KGDB, and continues execution from the
> previous PC address.
>
> However, this led to an endless loop between the int3 instruction and
> kgdb_skipexception since the int3 instruction (being still present)
> triggered again. This effectively caused the system to hang.
>
> With this patch, we try to remove the concerned spurious int3
> instruction in kgdb_skipexception before continuing execution. This
> may take a few attempts until the concurrent holders of the text_mutex
> have released it, but eventually succeeds and the kernel can continue.
What guarantees the release of text mutex?
> Signed-off-by: Florian Rommel <mail@florommel.de>
> ---
> arch/x86/kernel/kgdb.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
> index 64c332151af7..585a7a72af74 100644
> --- a/arch/x86/kernel/kgdb.c
> +++ b/arch/x86/kernel/kgdb.c
> @@ -723,7 +723,31 @@ void kgdb_arch_exit(void)
> int kgdb_skipexception(int exception, struct pt_regs *regs)
Btw, kgdb_skipexception() is a gross nisnomer because it rewinds the
instruction pointer to the exception address and does not skip anything,
but that's an orthogonal issue though it could be cleaned up along the
way...
> {
> if (exception == 3 && kgdb_isremovedbreak(regs->ip - 1)) {
> + struct kgdb_bkpt *bpt;
> + int i, error;
> +
> regs->ip -= 1;
> +
> + /*
> + * Try to remove the spurious int3 instruction.
> + * These int3s can result from failed breakpoint removals
> + * in kgdb_arch_remove_breakpoint.
> + */
> + for (bpt = NULL, i = 0; i < KGDB_MAX_BREAKPOINTS; i++) {
> + if (kgdb_break[i].bpt_addr == regs->ip &&
> + kgdb_break[i].state == BP_REMOVED &&
> + (kgdb_break[i].type == BP_BREAKPOINT ||
> + kgdb_break[i].type == BP_POKE_BREAKPOINT)) {
> + bpt = &kgdb_break[i];
> + break;
> + }
> + }
Seriously? The KGBD core already walked that array in
kgdb_isremovedbreak() just so you can walk it again here.
struct kkgdb_bkpt *kgdb_get_removed_breakpoint(unsigned long addr)
{
struct kgdb_bkpt *bp = kgdb_break;
for (int i = 0; i < KGDB_MAX_BREAKPOINTS; i++, bp++) {
if (bp>.state == BP_REMOVED && bp->kgdb_bpt_addr == addr)
return bp;
}
return NULL;
}
bool kgdb_isremovedbreak(unsigned long addr)
{
return !!kgdb_get_removed_breakpoint(addr);
}
bool kgdb_rewind_exception(int exception, struct pt_regs *regs)
{
struct kgdb_bkpt *bp;
if (exception != 3)
return false;
bp = kgdb_get_removed_breakpoint(--regs->ip);
if (!bp || !bp->type == BP_BREAKPOINT)
return false;
Hmm?
> + error = kgdb_arch_remove_breakpoint(bpt);
> + if (error)
> + pr_err("skipexception: breakpoint remove failed: %lx\n",
> + bpt->bpt_addr);
Lacks curly brackets. See Documentation.
return !error;
Aside of that the same problem exists on PowerPC. So you can move the
attempt to remove the breakpoint into the generic code, no?
Thanks,
tglx
Hi Thomas,
Thanks for the feedback.
On Mon, 2024-08-12 at 23:04 +0200, Thomas Gleixner wrote:
> Either you know it or not. Speculation is reserved for CPUs.
I have now checked it, it is always the static_key patching mechanism
when I reproduce the issue (in QEMU with a varying number of CPUs), but
we can skip this statement.
>
> > {
> > if (exception == 3 && kgdb_isremovedbreak(regs->ip - 1)) {
> > + struct kgdb_bkpt *bpt;
> > + int i, error;
> > +
> > regs->ip -= 1;
> > +
> > + /*
> > + * Try to remove the spurious int3 instruction.
> > + * These int3s can result from failed breakpoint removals
> > + * in kgdb_arch_remove_breakpoint.
> > + */
> > + for (bpt = NULL, i = 0; i < KGDB_MAX_BREAKPOINTS; i++) {
> > + if (kgdb_break[i].bpt_addr == regs->ip &&
> > + kgdb_break[i].state == BP_REMOVED &&
> > + (kgdb_break[i].type == BP_BREAKPOINT ||
> > + kgdb_break[i].type == BP_POKE_BREAKPOINT)) {
> > + bpt = &kgdb_break[i];
> > + break;
> > + }
> > + }
>
> Seriously? The KGBD core already walked that array in
> kgdb_isremovedbreak() just so you can walk it again here.
>
> struct kkgdb_bkpt *kgdb_get_removed_breakpoint(unsigned long addr)
> {
> struct kgdb_bkpt *bp = kgdb_break;
>
> for (int i = 0; i < KGDB_MAX_BREAKPOINTS; i++, bp++) {
> if (bp>.state == BP_REMOVED && bp->kgdb_bpt_addr == addr)
> return bp;
> }
> return NULL;
> }
>
> bool kgdb_isremovedbreak(unsigned long addr)
> {
> return !!kgdb_get_removed_breakpoint(addr);
> }
>
> bool kgdb_rewind_exception(int exception, struct pt_regs *regs)
> {
> struct kgdb_bkpt *bp;
>
> if (exception != 3)
> return false;
>
> bp = kgdb_get_removed_breakpoint(--regs->ip);
> if (!bp || !bp->type == BP_BREAKPOINT)
> return false;
>
> Hmm?
Ok, ok, looks much better. My intention was to keep the changes in the
generic debug core minimal, especially since efficiency is not
important here.. but I see, it should be done right.
Best regards,
Flo
On Mon, Aug 12, 2024 at 11:04:13PM +0200, Thomas Gleixner wrote:
> Florian!
>
> On Mon, Aug 12 2024 at 19:43, Florian Rommel wrote:
> > On x86, occasionally, the removal of a breakpoint (i.e., removal of
> > the int3 instruction) fails because the text_mutex is taken by another
> > CPU (mainly due to the static_key mechanism, I think).
>
> Either you know it or not. Speculation is reserved for CPUs.
>
> > The function kgdb_skipexception catches exceptions from these spurious
> > int3 instructions, bails out of KGDB, and continues execution from the
> > previous PC address.
> >
> > However, this led to an endless loop between the int3 instruction and
> > kgdb_skipexception since the int3 instruction (being still present)
> > triggered again. This effectively caused the system to hang.
> >
> > With this patch, we try to remove the concerned spurious int3
> > instruction in kgdb_skipexception before continuing execution. This
> > may take a few attempts until the concurrent holders of the text_mutex
> > have released it, but eventually succeeds and the kernel can continue.
>
> What guarantees the release of text mutex?
>
> > Signed-off-by: Florian Rommel <mail@florommel.de>
> > ---
> > arch/x86/kernel/kgdb.c | 24 ++++++++++++++++++++++++
> > 1 file changed, 24 insertions(+)
> >
> > diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
> > index 64c332151af7..585a7a72af74 100644
> > --- a/arch/x86/kernel/kgdb.c
> > +++ b/arch/x86/kernel/kgdb.c
> > @@ -723,7 +723,31 @@ void kgdb_arch_exit(void)
> > int kgdb_skipexception(int exception, struct pt_regs *regs)
>
> Btw, kgdb_skipexception() is a gross nisnomer because it rewinds the
> instruction pointer to the exception address and does not skip anything,
> but that's an orthogonal issue though it could be cleaned up along the
> way...
kgdb_skipexception() is not a directive from debug core to architecture.
It is a question to the archictecture: should the debug core skip normal
debug exception handling and resume immediately?.
It allows an architecture to direct the debug core to ignore a spurious
trap that, according to the comments, can occur on some
(micro)architectures when we have already restored the original
not-a-breakpoint instruction.
Florian's patch changes things so that we will also skip debug exception
handling if we can successfully poke to the text section. I don't think
it is sufficient on its own since the text_mutex could be owned by the
core that is stuck trapping on the int3 that we can't remove.
> > {
> > if (exception == 3 && kgdb_isremovedbreak(regs->ip - 1)) {
> > + struct kgdb_bkpt *bpt;
> > + int i, error;
> > +
> > regs->ip -= 1;
> > +
> > + /*
> > + * Try to remove the spurious int3 instruction.
> > + * These int3s can result from failed breakpoint removals
> > + * in kgdb_arch_remove_breakpoint.
> > + */
> > + for (bpt = NULL, i = 0; i < KGDB_MAX_BREAKPOINTS; i++) {
> > + if (kgdb_break[i].bpt_addr == regs->ip &&
> > + kgdb_break[i].state == BP_REMOVED &&
> > + (kgdb_break[i].type == BP_BREAKPOINT ||
> > + kgdb_break[i].type == BP_POKE_BREAKPOINT)) {
> > + bpt = &kgdb_break[i];
> > + break;
> > + }
> > + }
>
> Seriously? The KGBD core already walked that array in
> kgdb_isremovedbreak() just so you can walk it again here.
>
> struct kkgdb_bkpt *kgdb_get_removed_breakpoint(unsigned long addr)
> {
> struct kgdb_bkpt *bp = kgdb_break;
>
> for (int i = 0; i < KGDB_MAX_BREAKPOINTS; i++, bp++) {
> if (bp>.state == BP_REMOVED && bp->kgdb_bpt_addr == addr)
> return bp;
> }
> return NULL;
> }
>
> bool kgdb_isremovedbreak(unsigned long addr)
> {
> return !!kgdb_get_removed_breakpoint(addr);
> }
>
> bool kgdb_rewind_exception(int exception, struct pt_regs *regs)
> {
> struct kgdb_bkpt *bp;
>
> if (exception != 3)
> return false;
>
> bp = kgdb_get_removed_breakpoint(--regs->ip);
> if (!bp || !bp->type == BP_BREAKPOINT)
> return false;
>
> Hmm?
>
> > + error = kgdb_arch_remove_breakpoint(bpt);
> > + if (error)
> > + pr_err("skipexception: breakpoint remove failed: %lx\n",
> > + bpt->bpt_addr);
>
> Lacks curly brackets. See Documentation.
>
> return !error;
>
> Aside of that the same problem exists on PowerPC. So you can move the
> attempt to remove the breakpoint into the generic code, no?
Getting the debug core to track breakpoints that are stuck on would be a
good improvement.
We would be able to use that logic to retry the removal of stuck
breakpoint once other SMP cores are running again. That would be great
for architectures like arm64 that use spin-with-irqsave locking in their
noinstr text_poke() machinery.
However it won't solve the problem for x86 since it uses mutex locking.
A partial solution might be to get kgdb_arch_remove_breakpoint() to
disregard the text_mutex completely if kdb_continue_catastrophic is set
(and adding stuck breakpoints to the reasons to inhibit a continue).
This is a partial solution in the sense that it is not safe: we will
simply tell the kernel dev driving the debugger that they are
responsible for the safety of the continue and then disable the safety
rails.
I haven't yet come up with a full fix that doesn't require
text_poke_kgdb() to not require text_mutex to be free. I did note that
comment in __text_poke() that says calling get_locked_pte() "is not
really needed, but this allows to avoid open-coding" but I got a bit lost
trying to figure out other locks and nesting concerns.
Daniel.
[1] I try to avoid calling them "users" because
Daniel!
On Tue, Aug 13 2024 at 12:31, Daniel Thompson wrote:
> On Mon, Aug 12, 2024 at 11:04:13PM +0200, Thomas Gleixner wrote:
>> Btw, kgdb_skipexception() is a gross nisnomer because it rewinds the
>> instruction pointer to the exception address and does not skip anything,
>> but that's an orthogonal issue though it could be cleaned up along the
>> way...
>
> kgdb_skipexception() is not a directive from debug core to architecture.
> It is a question to the archictecture: should the debug core skip normal
> debug exception handling and resume immediately?.
Ah. This code is so exceptionally intuitive ....
> It allows an architecture to direct the debug core to ignore a spurious
> trap that, according to the comments, can occur on some
> (micro)architectures when we have already restored the original
> not-a-breakpoint instruction.
Potentially due to the lack of sync_core() after the copy.
And of course the removal can fail as Florian discussed.
> Florian's patch changes things so that we will also skip debug exception
> handling if we can successfully poke to the text section. I don't think
> it is sufficient on its own since the text_mutex could be owned by the
> core that is stuck trapping on the int3 that we can't remove.
I was asking exactly that:
> What guarantees the release of text mutex?
:)
>> Aside of that the same problem exists on PowerPC. So you can move the
>> attempt to remove the breakpoint into the generic code, no?
>
> Getting the debug core to track breakpoints that are stuck on would be a
> good improvement.
>
> We would be able to use that logic to retry the removal of stuck
> breakpoint once other SMP cores are running again. That would be great
> for architectures like arm64 that use spin-with-irqsave locking in their
> noinstr text_poke() machinery.
>
> However it won't solve the problem for x86 since it uses mutex locking.
>
> A partial solution might be to get kgdb_arch_remove_breakpoint() to
> disregard the text_mutex completely if kdb_continue_catastrophic is set
> (and adding stuck breakpoints to the reasons to inhibit a continue).
The interrupted owner of text_mutex might be in the middle of modifying
the poking_mm page tables or in the worst case modifying the code which
kgdb wants to play with.
Dragons are guaranteed.
> This is a partial solution in the sense that it is not safe: we will
> simply tell the kernel dev driving the debugger that they are
> responsible for the safety of the continue and then disable the safety
> rails.
>
> I haven't yet come up with a full fix that doesn't require
> text_poke_kgdb() to not require text_mutex to be free. I did note that
> comment in __text_poke() that says calling get_locked_pte() "is not
> really needed, but this allows to avoid open-coding" but I got a bit lost
> trying to figure out other locks and nesting concerns.
So there are no other locks involved. The PTE lock is not strictly
required because the access to poking_mm is fully serialized.
You'd need to add a separate kgdb_poking_mm and kgdb_poking_addr. Now
make __text_poke() take a @mm and @pokeaddr argument and let it operate
on them. But that solves only part of the problem:
1) A concurrent modification of the same code will result in
undefined behaviour. Not sure whether that's actually an issue,
but I would not bet on it.
2) switch_mm() and the related code are not reentrancy safe either
3) TLB flushing. That can't use tlb_flush_mm_range() simply because
that's not reentrant.
Which makes me wonder about #2. As this stuff can run in NMI context,
then even if text_mutex is uncontended, then tlb_flush_mm_range() might
be what had been interrupted, so reentrancy would be fatal.
That's all a horrorshow as you can't play with CR0.WP either at least
not when CR4.CET is set.
So if you can force disable CET when KGDB is enabled, then you might get
away with:
bp->code = *p;
clear_cr0_wp();
*p = int3;
set_cr0_wp();
Though the hardening people might not be really happy about this.
But let's take a step back. Installing a breakpoint is done by the
debugger. The debugger knows the original code, right?
So why cant the debugger provide a trampoline:
ORIG_OP // with fixed up displacements
JMP NEXT_INSN
Now you stick that into a trampoline page slot and then patch the INT3
in. Now on removal can be a two stage approach:
bp->state = INACTIVE;
kick_breakpoint_gc();
breakpoint_gc() removes the breakpoint and invalidates the trampoline
from a safe context and up to that point kgdb_skipexception() can do:
bool kgdb_skip_exception(int exception, struct pt_regs *regs)
{
struct kgdb_bkpt *bp;
if (exception != 3)
return false;
bp = kgdb_get_inactive_breakpoint(--regs->ip, BP_BREAKPOINT);
if (bp)
regs->ip = bp->trampoline;
return true;
}
Hmm?
Thanks,
tglx
Experimental implementation of Thomas' trampoline idea.
Link: https://lore.kernel.org/all/87wmkkpf5v.ffs@tglx/
Every breakpoint gets a trampoline entry containing the original
instruction (with a corrected displacement if necessary) and a jump
back into the function to the logically subsequent instruction.
With this, KGDB can skip a still present debug trap for a removed
breakpoint by routing the control flow through the trampoline.
In this experimental implementation, the actual removal of the debug
trap instructions is completely disabled. So, all trap instructions
planted by KGDB currently remain in the code, and KGDB will skip all
these breakpoints through the trampolines when they are in the removed
state. There is not yet a dedicated breakpoint state for the
to-be-removed-but-still-present breakpoints.
Inspect the trampolines via:
(gdb) x/16i kgdb_trampoline_page
Signed-off-by: Florian Rommel <mail@florommel.de>
---
I have been experimenting a bit with Thomas' idea of the trampoline
approach. It seems to work so far but of course has a lot of rough
edges.
I am stuck for now, as I am not sure about the best way to implement
the safe context where KGDB finally removes the int3 instructions.
Maybe, this is useful.. at least it was a fun ride.
arch/x86/kernel/kgdb.c | 128 +++++++++++++++++++++++++++++++++-----
include/linux/kgdb.h | 2 +
kernel/debug/debug_core.c | 12 ++++
3 files changed, 128 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 9c9faa1634fb..e09090f14894 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -35,6 +35,7 @@
#include <linux/hw_breakpoint.h>
#include <linux/uaccess.h>
#include <linux/memory.h>
+#include <linux/execmem.h>
#include <asm/text-patching.h>
#include <asm/debugreg.h>
@@ -42,6 +43,11 @@
#include <asm/apic.h>
#include <asm/nmi.h>
#include <asm/switch_to.h>
+#include <asm/insn.h>
+#include <asm/set_memory.h>
+
+static void *kgdb_trampoline_page;
+static void *kgdb_trampoline_page_curr_slot;
struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] =
{
@@ -402,6 +408,60 @@ static void kgdb_disable_hw_debug(struct pt_regs *regs)
}
}
+static int kgdb_init_trampolines(void)
+{
+ /* FIXME: We should reserve enough space for all breakpoints
+ or make the trampoline table dynamic somehow.. */
+ /* FIXME: Currently borrowing EXECMEM_KPROBES */
+ kgdb_trampoline_page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
+ if (!kgdb_trampoline_page)
+ return 1;
+ memset(kgdb_trampoline_page, INT3_INSN_OPCODE, PAGE_SIZE);
+ set_memory_rox((unsigned long)kgdb_trampoline_page, 1);
+ kgdb_trampoline_page_curr_slot = kgdb_trampoline_page;
+ return 0;
+}
+
+static int kgdb_make_trampoline(struct kgdb_bkpt *bpt, void *orig_insn)
+{
+ void *kernel_addr, *slot_addr, *jmp_addr, *jmp_code;
+ int slot_size;
+ struct insn insn;
+ u8 insn_buff[MAX_INSN_SIZE + JMP32_INSN_SIZE];
+
+ if (WARN_ON(insn_decode_kernel(&insn, orig_insn)))
+ return -EINVAL;
+
+ slot_size = insn.length + JMP32_INSN_SIZE;
+
+ if (!bpt->trampoline) {
+ /* FIXME: Not enough space for all possible breakpoints */
+ if (kgdb_trampoline_page_curr_slot + slot_size
+ > kgdb_trampoline_page + PAGE_SIZE)
+ return -ENOSPC;
+
+ bpt->trampoline = kgdb_trampoline_page_curr_slot;
+ kgdb_trampoline_page_curr_slot += slot_size;
+ }
+
+ kernel_addr = (void *)bpt->bpt_addr;
+ slot_addr = bpt->trampoline;
+ jmp_addr = slot_addr + insn.length;
+ jmp_code = text_gen_insn(JMP32_INSN_OPCODE, jmp_addr,
+ kernel_addr + insn.length);
+
+ memcpy(insn_buff, kernel_addr, insn.length);
+ memcpy(insn_buff + insn.length, jmp_code, JMP32_INSN_SIZE);
+ apply_relocation(insn_buff, slot_addr, insn.length,
+ kernel_addr, insn.length);
+
+ if (mutex_is_locked(&text_mutex))
+ return -EBUSY;
+ text_poke_kgdb(slot_addr, insn_buff, insn.length + JMP32_INSN_SIZE);
+
+ return 0;
+}
+
#ifdef CONFIG_SMP
/**
* kgdb_roundup_cpus - Get other CPUs into a holding pattern
@@ -598,6 +658,10 @@ int kgdb_arch_init(void)
{
int retval;
+ retval = kgdb_init_trampolines();
+ if (retval)
+ goto out;
+
retval = register_die_notifier(&kgdb_notifier);
if (retval)
goto out;
@@ -708,11 +772,18 @@ void kgdb_arch_exit(void)
*/
int kgdb_skipexception(int exception, struct pt_regs *regs)
{
- if (exception == 3 && kgdb_isremovedbreak(regs->ip - 1)) {
- regs->ip -= 1;
- return 1;
+ struct kgdb_bkpt *bp;
+
+ if (exception != 3)
+ return false;
+
+ bp = kgdb_get_removedbreak(regs->ip - 1);
+ if (bp) {
+ regs->ip = (unsigned long)bp->trampoline;
+ return true;
}
- return 0;
+
+ return false;
}
unsigned long kgdb_arch_pc(int exception, struct pt_regs *regs)
@@ -730,12 +801,33 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long ip)
int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
{
int err;
+ char orig_insn[MAX_INSN_SIZE];
bpt->type = BP_BREAKPOINT;
- err = copy_from_kernel_nofault(bpt->saved_instr, (char *)bpt->bpt_addr,
- BREAK_INSTR_SIZE);
+
+ /*
+ * FIXME: Currently, debug traps are not removed, so all logically
+ * removed breakpoints are still there.
+ * For now, abort if the trampoline is already present.
+ * Later, we should use a dedicated breakpoint state to signal this.
+ */
+ if (bpt->trampoline) {
+ bpt->type = BP_POKE_BREAKPOINT;
+ return 0;
+ }
+
+ /* FIXME: What if we cannot read all of MAX_INSN_SIZE */
+ err = copy_from_kernel_nofault(orig_insn, (char *)bpt->bpt_addr,
+ MAX_INSN_SIZE);
if (err)
return err;
+
+ memcpy(bpt->saved_instr, orig_insn, BREAK_INSTR_SIZE);
+
+ err = kgdb_make_trampoline(bpt, orig_insn);
+ if (err)
+ return err;
+
err = copy_to_kernel_nofault((char *)bpt->bpt_addr,
arch_kgdb_ops.gdb_bpt_instr, BREAK_INSTR_SIZE);
if (!err)
@@ -755,16 +847,11 @@ int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
{
+ /* FIXME: Do not mess with early breakpoints for now */
if (bpt->type != BP_POKE_BREAKPOINT)
goto knl_write;
- /*
- * It is safe to call text_poke_kgdb() because normal kernel execution
- * is stopped on all cores, so long as the text_mutex is not locked.
- */
- if (mutex_is_locked(&text_mutex))
- goto knl_write;
- text_poke_kgdb((void *)bpt->bpt_addr, bpt->saved_instr,
- BREAK_INSTR_SIZE);
+
+ /* FIXME: Removal is not necessary anymore.. for now */
return 0;
knl_write:
@@ -772,6 +859,19 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
(char *)bpt->saved_instr, BREAK_INSTR_SIZE);
}
+int kgdb_validate_break_address(unsigned long addr)
+{
+ if (kgdb_within_blocklist(addr))
+ return -EINVAL;
+
+ /*
+ * With breakpoint trampolines, there is no need to further validate
+ * setting and removal of breakpoints.
+ */
+
+ return 0;
+}
+
const struct kgdb_arch arch_kgdb_ops = {
/* Breakpoint instruction: */
.gdb_bpt_instr = { 0xcc },
diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
index 76e891ee9e37..19c2d8ee8124 100644
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -79,6 +79,7 @@ struct kgdb_bkpt {
unsigned char saved_instr[BREAK_INSTR_SIZE];
enum kgdb_bptype type;
enum kgdb_bpstate state;
+ void *trampoline;
};
struct dbg_reg_def_t {
@@ -324,6 +325,7 @@ extern int kgdb_hex2long(char **ptr, unsigned long *long_val);
extern char *kgdb_mem2hex(char *mem, char *buf, int count);
extern int kgdb_hex2mem(char *buf, char *mem, int count);
+extern struct kgdb_bkpt *kgdb_get_removedbreak(unsigned long addr);
extern int kgdb_isremovedbreak(unsigned long addr);
extern int kgdb_has_hit_break(unsigned long addr);
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index ce1bb2301c06..60bc63d1000a 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -393,6 +393,18 @@ int dbg_remove_sw_break(unsigned long addr)
return -ENOENT;
}
+struct kgdb_bkpt *kgdb_get_removedbreak(unsigned long addr)
+{
+ int i;
+
+ for (i = 0; i < KGDB_MAX_BREAKPOINTS; i++) {
+ if ((kgdb_break[i].state == BP_REMOVED) &&
+ (kgdb_break[i].bpt_addr == addr))
+ return &kgdb_break[i];
+ }
+ return NULL;
+}
+
int kgdb_isremovedbreak(unsigned long addr)
{
int i;
--
2.46.0
Hi Florian, kernel test robot noticed the following build errors: [auto build test ERROR on tip/master] [also build test ERROR on tip/x86/core linus/master v6.11-rc3 next-20240816] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Florian-Rommel/x86-kgdb-trampolines-for-shadowed-instructions/20240814-230936 base: tip/master patch link: https://lore.kernel.org/r/20240814085141.171564-1-mail%40florommel.de patch subject: [PATCH WIP] x86/kgdb: trampolines for shadowed instructions config: x86_64-buildonly-randconfig-004-20240816 (https://download.01.org/0day-ci/archive/20240816/202408162013.3Nz53QCv-lkp@intel.com/config) compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240816/202408162013.3Nz53QCv-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202408162013.3Nz53QCv-lkp@intel.com/ All errors (new ones prefixed by >>): >> ld.lld: error: undefined symbol: execmem_alloc >>> referenced by usercopy_64.c >>> vmlinux.o:(kgdb_arch_init) -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
Hi Florian,
kernel test robot noticed the following build errors:
[auto build test ERROR on tip/master]
[also build test ERROR on tip/x86/core linus/master v6.11-rc3 next-20240815]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Florian-Rommel/x86-kgdb-trampolines-for-shadowed-instructions/20240814-230936
base: tip/master
patch link: https://lore.kernel.org/r/20240814085141.171564-1-mail%40florommel.de
patch subject: [PATCH WIP] x86/kgdb: trampolines for shadowed instructions
config: x86_64-buildonly-randconfig-004-20240815 (https://download.01.org/0day-ci/archive/20240816/202408160303.FvgjFUth-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240816/202408160303.FvgjFUth-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408160303.FvgjFUth-lkp@intel.com/
All errors (new ones prefixed by >>):
ld: vmlinux.o: in function `kgdb_arch_init':
>> arch/x86/kernel/kgdb.c:416: undefined reference to `execmem_alloc'
vim +416 arch/x86/kernel/kgdb.c
410
411 static int kgdb_init_trampolines(void)
412 {
413 /* FIXME: We should reserve enough space for all breakpoints
414 or make the trampoline table dynamic somehow.. */
415 /* FIXME: Currently borrowing EXECMEM_KPROBES */
> 416 kgdb_trampoline_page = execmem_alloc(EXECMEM_KPROBES, PAGE_SIZE);
417 if (!kgdb_trampoline_page)
418 return 1;
419 memset(kgdb_trampoline_page, INT3_INSN_OPCODE, PAGE_SIZE);
420 set_memory_rox((unsigned long)kgdb_trampoline_page, 1);
421 kgdb_trampoline_page_curr_slot = kgdb_trampoline_page;
422 return 0;
423 }
424
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Wed, Aug 14, 2024 at 10:51:41AM +0200, Florian Rommel wrote: > Experimental implementation of Thomas' trampoline idea. > Link: https://lore.kernel.org/all/87wmkkpf5v.ffs@tglx/ > > Every breakpoint gets a trampoline entry containing the original > instruction (with a corrected displacement if necessary) and a jump > back into the function to the logically subsequent instruction. > With this, KGDB can skip a still present debug trap for a removed > breakpoint by routing the control flow through the trampoline. > > In this experimental implementation, the actual removal of the debug > trap instructions is completely disabled. So, all trap instructions > planted by KGDB currently remain in the code, and KGDB will skip all > these breakpoints through the trampolines when they are in the removed > state. There is not yet a dedicated breakpoint state for the > to-be-removed-but-still-present breakpoints. > > Inspect the trampolines via: > (gdb) x/16i kgdb_trampoline_page > > Signed-off-by: Florian Rommel <mail@florommel.de> > --- > I have been experimenting a bit with Thomas' idea of the trampoline > approach. It seems to work so far but of course has a lot of rough > edges. Interesting. Perhaps a dumb question but is this trampoline code doing the same thing as the out-of-line single-step code in kprobes? > I am stuck for now, as I am not sure about the best way to implement > the safe context where KGDB finally removes the int3 instructions. I think this would actually fit really nicely into the debug core code. Firstly dbg_deactivate_sw_breakpoints() should strictly maintain BP_ACTIVE and BP_SET states (e.g. if the kgdb_arch_remove_breakpoint() fails then do not transition from BP_ACTIVE and BP_SET). There would be a little bit of extra logic to clean things up here (each equality check on BP_SET needs to be reviewed) but the resulting state tracking is more correct. Once we've done that we can add a new state BP_REMOVE_PENDING and arrange for this to be set by dbg_remove_sw_break() if the breakpoint is BP_ACTIVE. At this stage we can also arrange for dbg_deactivate_sw_breakpoints() to handle BP_REMOVE_PENDING to BP_REMOVE transitions by calling kgdb_arch_remove_breakpoint(). That's enough to eventuallyremove the int3 instructions but it relies on entering the debug trap handler and there's no limit on how long could take before that happens. For that reason I think the core should also attempt to transition BP_REMOVE_PENDING breakpoints to BP_REMOVE after kgdb_skipexception() returns true. That means if we keep trapping on a disabled breakpoint eventually we will hit a window where the text_mutex is free and clean things up. Daniel.
On Wed, Aug 14 2024 at 11:29, Daniel Thompson wrote:
> On Wed, Aug 14, 2024 at 10:51:41AM +0200, Florian Rommel wrote:
> That's enough to eventuallyremove the int3 instructions but it relies
> on entering the debug trap handler and there's no limit on how long
> could take before that happens. For that reason I think the core should
> also attempt to transition BP_REMOVE_PENDING breakpoints to BP_REMOVE
> after kgdb_skipexception() returns true. That means if we keep trapping
> on a disabled breakpoint eventually we will hit a window where the
> text_mutex is free and clean things up.
Even when text_mutex is uncontended then text_poke_kgdb() is completely
broken in the KGDB NMI context when the NMI hit into anything related to
mm switching and tlb flushing, which is utilized in __text_poke().
The same problem is obviously true for installing a breakpoint from that
context.
I'm starting to be more convinced that the only sane solution for all of
this is to disable CET when KGDB is on and use CRO.WP to work around all
of this.
Thanks,
tglx
On Tue, 2024-08-13 at 12:31 +0100, Daniel Thompson wrote:
> On Mon, Aug 12, 2024 at 11:04:13PM +0200, Thomas Gleixner wrote:
>
> >
> > Btw, kgdb_skipexception() is a gross nisnomer because it rewinds the
> > instruction pointer to the exception address and does not skip anything,
> > but that's an orthogonal issue though it could be cleaned up along the
> > way...
>
> kgdb_skipexception() is not a directive from debug core to architecture.
> It is a question to the archictecture: should the debug core skip normal
> debug exception handling and resume immediately?.
>
> It allows an architecture to direct the debug core to ignore a spurious
> trap that, according to the comments, can occur on some
> (micro)architectures when we have already restored the original
> not-a-breakpoint instruction.
>
> Florian's patch changes things so that we will also skip debug exception
> handling if we can successfully poke to the text section. I don't think
> it is sufficient on its own since the text_mutex could be owned by the
> core that is stuck trapping on the int3 that we can't remove.
Yes, if the text_mutex is owned by the trapping core, that's a inherent
problem. This won't be solved by my patch. But this will probably be
difficult to be solved at all..
The patch only adds an removal attempt, without changing the result of
kgdb_skipexception(). It was thought as a best-effort solution: Better
try removing the breakpoint than getting stuck in the int3 loop for
sure. Of course, there is no guarantee for this being successful.
>
>
> > > {
> > > if (exception == 3 && kgdb_isremovedbreak(regs->ip - 1)) {
> > > + struct kgdb_bkpt *bpt;
> > > + int i, error;
> > > +
> > > regs->ip -= 1;
> > > +
> > > + /*
> > > + * Try to remove the spurious int3 instruction.
> > > + * These int3s can result from failed breakpoint removals
> > > + * in kgdb_arch_remove_breakpoint.
> > > + */
> > > + for (bpt = NULL, i = 0; i < KGDB_MAX_BREAKPOINTS; i++) {
> > > + if (kgdb_break[i].bpt_addr == regs->ip &&
> > > + kgdb_break[i].state == BP_REMOVED &&
> > > + (kgdb_break[i].type == BP_BREAKPOINT ||
> > > + kgdb_break[i].type == BP_POKE_BREAKPOINT)) {
> > > + bpt = &kgdb_break[i];
> > > + break;
> > > + }
> > > + }
> >
> > Seriously? The KGBD core already walked that array in
> > kgdb_isremovedbreak() just so you can walk it again here.
> >
> > struct kkgdb_bkpt *kgdb_get_removed_breakpoint(unsigned long addr)
> > {
> > struct kgdb_bkpt *bp = kgdb_break;
> >
> > for (int i = 0; i < KGDB_MAX_BREAKPOINTS; i++, bp++) {
> > if (bp>.state == BP_REMOVED && bp->kgdb_bpt_addr == addr)
> > return bp;
> > }
> > return NULL;
> > }
> >
> > bool kgdb_isremovedbreak(unsigned long addr)
> > {
> > return !!kgdb_get_removed_breakpoint(addr);
> > }
> >
> > bool kgdb_rewind_exception(int exception, struct pt_regs *regs)
> > {
> > struct kgdb_bkpt *bp;
> >
> > if (exception != 3)
> > return false;
> >
> > bp = kgdb_get_removed_breakpoint(--regs->ip);
> > if (!bp || !bp->type == BP_BREAKPOINT)
> > return false;
> >
> > Hmm?
> >
> > > + error = kgdb_arch_remove_breakpoint(bpt);
> > > + if (error)
> > > + pr_err("skipexception: breakpoint remove failed: %lx\n",
> > > + bpt->bpt_addr);
> >
> > Lacks curly brackets. See Documentation.
> >
> > return !error;
> >
> > Aside of that the same problem exists on PowerPC. So you can move the
> > attempt to remove the breakpoint into the generic code, no?
>
> Getting the debug core to track breakpoints that are stuck on would be a
> good improvement.
Do you mean tracking failed removals with an extra kgdb_bptype variant?
>
> We would be able to use that logic to retry the removal of stuck
> breakpoint once other SMP cores are running again. That would be great
> for architectures like arm64 that use spin-with-irqsave locking in their
> noinstr text_poke() machinery.
>
> However it won't solve the problem for x86 since it uses mutex locking.
>
> A partial solution might be to get kgdb_arch_remove_breakpoint() to
> disregard the text_mutex completely if kdb_continue_catastrophic is set
> (and adding stuck breakpoints to the reasons to inhibit a continue).
> This is a partial solution in the sense that it is not safe: we will
> simply tell the kernel dev driving the debugger that they are
> responsible for the safety of the continue and then disable the safety
> rails.
>
> I haven't yet come up with a full fix that doesn't require
> text_poke_kgdb() to not require text_mutex to be free. I did note that
> comment in __text_poke() that says calling get_locked_pte() "is not
> really needed, but this allows to avoid open-coding" but I got a bit lost
> trying to figure out other locks and nesting concerns.
>
This is a bit off-topic, but isn't setting software breakpoints in
combination with other text modifications unsafe anyway? If we remove a
breakpoint in a code location that has been modified in the meantime,
we would restore an outdated saved_instr. What am I missing here?
Best regards,
Flo
© 2016 - 2026 Red Hat, Inc.