[PATCH 0/2] Fix for multi-process gdbstub breakpoints

Roque Arcudia Hernandez posted 2 patches 2 months, 2 weeks ago
There is a newer version of this series
accel/tcg/tcg-accel-ops.c | 15 +++++++++++++++
gdbstub/gdbstub.c         | 11 ++++++++---
include/exec/gdbstub.h    | 15 +++++++++++++++
3 files changed, 38 insertions(+), 3 deletions(-)
[PATCH 0/2] Fix for multi-process gdbstub breakpoints
Posted by Roque Arcudia Hernandez 2 months, 2 weeks ago
This patch series modifies the gdbstub to address a bug running a
multi cluster machine in QEMU using TCG. The machine where the
problem was seen had several clusters of CPUs with similar
architectures and similar memory layout all working with physical
addresses. It was discovered under gdb debugging that a breakpoint
targeting one cluster misfired on the wrong cluster quite frequently
with no possible workaround since gdb was also unaware of any
breakpoint in that cluster and simply reported SIGTRAP.

The sequence that discovered the bug adds N inferiors and adds a
breakpoint on inferior N. Then after continuing an unrelated thread
stops the execution when its PC hits the same address of the break
targeting a different inferior.

target extended-remote :1234
add-inferior
inferior 2
attach 2
...
add-inferior
inferior N
attach N
add-symbol-file /path/to/foo.elf
break foo
> Breakpoint 1 at 0xf00add
info break
> Num     Type           Disp Enb Address    What
> 1       breakpoint     keep y   0x00f00add in foo
>                                            at foo.c:1234 inf N
continue
> Continuing.
>
> Thread 2.1 received signal SIGTRAP, Trace/breakpoint trap.
> [Switching to Thread 2.2]
> 0xf00add in ?? ()

The main problem is the current implementation of
'tcg_insert_breakpoint' and 'tcg_remove_breakpoint' insert and remove
breakpoints to all the CPUs in the system regardless of what the
remote gdb protocol implements.

If we look at the current source code of GDB we can see that the
function 'remote_target::insert_breakpoint' in file gdb/remote.c has
the intention to select the process ID of the inferior where the
break was inserted.

int
remote_target::insert_breakpoint (struct gdbarch *gdbarch,
                                  struct bp_target_info *bp_tgt)
{
...
  /* Make sure the remote is pointing at the right process, if
     necessary.  */
  if (!gdbarch_has_global_breakpoints (current_inferior ()->arch ()))
    set_general_process ();
...
}

https:sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/remote.c;h=2c3988cb5075655e8a799d1cc5d4760ad8ed426e;hb=HEAD#l11023

This would not happen when we input the 'break' in gdb but it is
deferred until the time we execute the 'continue' command. Because we
might be currently selecting an inferior that is not the one where we
previously set the breakpoint, the remote gdb has to make sure we
move the focus to the process ID of the inferior where we inserted
the break.

In the previous example this will translate to something like:

HgpN.M
Z0,00f00add,4

Another thing that is wrong with the current implementation (and it
affects both TCG and KVM mode) is that the remote gdb protocol uses
'Hg' and not 'Hc' to select the process. Functions
'gdb_breakpoint_insert' and 'gdb_breakpoint_remove' receive wrongly
'gdbserver_state.c_cpu' instead of 'gdbserver_state.g_cpu'.

This is supported by the documentation of 'H op thread-id' where op =
'c' is reserved to the step and continue:

https:sourceware.org/gdb/current/onlinedocs/gdb.html/Packets.html

And it can be verified that the function 'set_general_process' in the
previous code snippet will eventually call
'remote_target::set_general_thread' and not
'remote_target::set_continue_thread' if it needs to change focus.

A third scenario that has to be taken into account is the case of a
break on a specific thread, for instance the sequence:

inferior 1
break bar thread 1.3
break bar thread 1.4

The remote protocol expects the gdbstub to apply the break to the
process ID of inferior 1 and considers the specific thread-id as a
breakpoint condition (not too different from a 'break if').

In this case the packet exchange may look like:

Hgp1.1
Z0,00ba2add,4

There wouldn't be an independent set of packets for 'Hgp1.3' and
'Hgp1.4'.

In the gdb source code, the handling of the specific thread-id
happens during breakpoint evaluation in function
'bpstat_check_breakpoint_conditions' of file gdb/breakpoint.c

https:sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/breakpoint.c;h=17bd627f867cf3d4dc81322ed1919ba40cbb237d;hb=HEAD#l5550

The proposed fix inserts or removes a breakpoint to all the cpus
sharing the same process ID as the one selected with the latest
received 'Hg' packet.

Roque Arcudia Hernandez (2):
  gdbstub: Fix wrong CPUState pointer in breakpoint functions
  gdbstub: Apply breakpoints only to the selected PID

 accel/tcg/tcg-accel-ops.c | 15 +++++++++++++++
 gdbstub/gdbstub.c         | 11 ++++++++---
 include/exec/gdbstub.h    | 15 +++++++++++++++
 3 files changed, 38 insertions(+), 3 deletions(-)

-- 
2.46.0.469.g59c65b2a67-goog