[PATCH] xen/trace: Don't dump offline CPUs in debugtrace_dump_worker()

Andrew Cooper posted 1 patch 3 years, 11 months ago
git fetch https://github.com/patchew-project/xen tags/patchew/20200521084422.24073-1-andrew.cooper3@citrix.com
xen/common/debugtrace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Posted by Andrew Cooper 3 years, 11 months ago
The 'T' debugkey reliably wedges on one of my systems, which has a sparse
APIC_ID layout due to a non power-of-2 number of cores per socket.  The
per_cpu(dt_cpu_data, cpu) calculation falls over the deliberately
non-canonical poison value.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Julien Grall <julien@xen.org>
CC: Juergen Gross <jgross@suse.com>

What is weird, however, is that instead of crashing, Xen wedges without
printing a clean backtrace.  Usually it blocks after just a few characters.
The best I managed to get (and can't reproduce) is:

88 cpupool_rm_domain(dom=1,pool=0) n_dom 1
(XEN) wrap: 0
(XEN) debugtrace_dump() global buffer finished
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff82d040207b51>] common/debugtrace.c#debugtrace_dump_worker+0x6c/0xa1
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor (d0v13)
(XEN) rax: 80007d2fbf6d1000   rbx: 0000000000000030   rcx: 00000000fffffffa
(XEN) rdx: ffff82d040473c04   rsi: ffff83103ff0fc48   rdi: ffff83103ff0fc3e
(XEN) rbp: ffff83103ff0fc78   rsp: ffff83103ff0fc38   r8:  0000000000000001
(XEN) r9:  0000000000000038   r10: 0000000000000030   r11: 0000000000000002
(XEN) r12: ffff82d0409535a0   r13: ffff83103ff0fc38   r14: ffff82d040930000
(XEN) r15: ffff82d040473bfe   cr0: 0000000080050033   cr4: 0000000000362660
(XEN) cr3: 0000001dd0f74000   cr2: 000000000041e5b0
(XEN) fsb: 00007f5bb0f15780   gsb: ffff88827ad40000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d040207b51> (common/debugtrace.c#debugtrace_dump_worker+0x6c/0xa1):
(XEN)  5e 2d 03 00 49 8b 04 24 <4a> 8b 3c 30 4c 89 ee e8 e6 fe ff ff 83 c3 01 49
(XEN) Xen stack trace from rsp=ffff83103ff0fc38:
(XEN)    ff00383420757063 ffff83103ff0fc48 0000000000000292 0000000000000292
(XEN)    ffff83103ff0fef8 ffff83103ff0ffff ffff83103ff0fd28 ffff83103ff0fef8
(XEN)    ffff83103ff0fc98 ffff82d040207c05 ffff83103ff0fc98 0000000000000054
(XEN)    ffff83103ff0fcb8 ffff82d04021d04a 0000000000000000 0000000000000000
(XEN)    ffff83103ff0fe48 ffff82d0402329f6 ffff831033cd9000 0000000000000000
(XEN)    ffff831000800027 0000000000000001 000000003ff0fcf8 0000000000000286
(XEN)    ffff83103ff0fd28 0000000000000000 00007f5bb0f27010 0000000000000000
(XEN)    0000000000000000 ffff82004009c938 ffff83103ff0fe54 ffff82d04035b055
(XEN)    ffff831033cd9000 ffff82c000000000 ffff83103ff0fd68 ffff82d040234612
(XEN)    ffff831033c8e068 0000000000000003 0000000000823679 ffff83103ff0fdf8
(XEN)    ffff83103ff0fd88 ffff82d040350f14 ffff83103ff0fdb8 0000001300000007
(XEN)    00007f5bb0f28010 0000000000000001 00007f5bafeb02c4 000000000000001c
(XEN)    00007f5bb01f02a0 0000000000000001 00007ffd208ae538 0000000000637a70
(XEN)    0000000000637a30 0000000000424e59 00007f5baff0d88b 00007f5bb01f33c0
(XEN)    0000000000000000 0000000000000002 00007f5baff0d913 0000000000000000
(XEN)    ffff82d0403b33d4 ffff83103ff0fef8 0000000000000230 ffff831033c8e000
(XEN)    0000000000000001 deadbeefdeadf00d ffff83103ff0fee8 ffff82d04032f0e0
(XEN)    ffff82d0403b33d4 00007f5bb0f27010 deadbeefdeadf00d deadbeefdeadf00d
(XEN)    deadbeefdeadf00d deadbeefdeadf00d ffff82d0403b33d4 ffff82d0403b33c8
(XEN)    ffff82d0403b33d4 ffff82d0403b33c8 ffff82d0403b33d4 ffff82d0403b33c8
(XEN) Xen call trace:
(XEN)    [<ffff82d040207b51>] R common/debugtrace.c#debugtrace_dump_worker+0x6c/0xa1
(XEN)    [<ffff82d040207c05>] F common/debugtrace.c#debugtrace_key+0x7f/0x81
(XEN)    [<ffff82d04021d04a>] F handle_keypress+0xb2/0xc9
(XEN)    [<ffff82d0402329f6>] F do_sysctl+0x6bc/0x148b
(XEN)    [<ffff82d04032f0e0>] F pv_hypercall+0x2fd/0x578
(XEN)    [<ffff82d0403b3432>] F lstar_enter+0x112/0x120
(XEN)

which is lacking the remainder of the #GP output from the non-canonical memory
reference in mov (%rax,%r14,1), %rdi.  The wedge also doesn't suffer a
watchdog timeout, which is even more concerning.
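
As a rough cross-check of the register dump above, the following sketch
recomputes the effective address of the faulting mov from the logged rax and
r14 values and tests whether it is canonical.  It assumes 48-bit virtual
addresses (4-level paging); the helper names are hypothetical and are not part
of Xen.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register values copied from the dump above. */
#define DUMP_RAX UINT64_C(0x80007d2fbf6d1000)
#define DUMP_R14 UINT64_C(0xffff82d040930000)

/*
 * Hypothetical helper: with 48-bit virtual addresses, an address is
 * canonical when bits 63:47 are either all clear or all set.
 */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;   /* the upper 17 bits */

    return top == 0 || top == UINT64_C(0x1ffff);
}

/* Effective address of mov (%rax,%r14,1), %rdi: base + index * 1. */
static uint64_t effective_address(uint64_t base, uint64_t index)
{
    return base + index;         /* wraps modulo 2^64, as the CPU does */
}
```

With the values from the dump, effective_address(DUMP_RAX, DUMP_R14) is
0x8000000000001000, which is_canonical_48() rejects, whereas r14 on its own is
an ordinary canonical hypervisor address.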
---
 xen/common/debugtrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/common/debugtrace.c b/xen/common/debugtrace.c
index c21ec99ee0..f3794b9453 100644
--- a/xen/common/debugtrace.c
+++ b/xen/common/debugtrace.c
@@ -95,7 +95,7 @@ static void debugtrace_dump_worker(void)
 
     debugtrace_dump_buffer(dt_data, "global");
 
-    for ( cpu = 0; cpu < nr_cpu_ids; cpu++ )
+    for_each_online_cpu ( cpu )
     {
         char buf[16];
 
-- 
2.11.0
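
For readers unfamiliar with the difference between the two loop forms in the
hunk above, here is a minimal standalone model.  It is not Xen code: the
struct, the POISON constant, the 8-entry ID space and all function names are
hypothetical stand-ins.  A bitmap plays the role of the online mask, and
offline CPU IDs carry a poisoned per-CPU offset.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPU_IDS 8u                           /* hypothetical ID space */
#define POISON     UINT64_C(0x8000000000000000) /* hypothetical poison   */

struct cpu_state {
    uint64_t online_mask;           /* bit n set => CPU n is online */
    uint64_t offset[NR_CPU_IDS];    /* per-CPU offset, or POISON    */
};

static bool cpu_is_online(const struct cpu_state *s, unsigned int cpu)
{
    return cpu < NR_CPU_IDS && ((s->online_mask >> cpu) & 1u);
}

/* Give online CPUs a plausible offset and poison everything else. */
static void init_state(struct cpu_state *s, uint64_t online_mask)
{
    unsigned int cpu;

    s->online_mask = online_mask;
    for ( cpu = 0; cpu < NR_CPU_IDS; cpu++ )
        s->offset[cpu] = cpu_is_online(s, cpu)
                         ? UINT64_C(0x1000) * (cpu + 1) : POISON;
}

/* Old loop shape: visit every possible ID, online or not. */
static unsigned int count_poisoned_all_ids(const struct cpu_state *s)
{
    unsigned int cpu, n = 0;

    for ( cpu = 0; cpu < NR_CPU_IDS; cpu++ )
        if ( s->offset[cpu] == POISON )
            n++;

    return n;
}

/* New loop shape: skip IDs that are not in the online mask. */
static unsigned int count_poisoned_online_only(const struct cpu_state *s)
{
    unsigned int cpu, n = 0;

    for ( cpu = 0; cpu < NR_CPU_IDS; cpu++ )
    {
        if ( !cpu_is_online(s, cpu) )
            continue;
        if ( s->offset[cpu] == POISON )
            n++;
    }

    return n;
}
```

With a sparse mask such as 0x37 (CPUs 0, 1, 2, 4 and 5 online), the first loop
reaches three poisoned entries and the second reaches none.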


Re: [PATCH] xen/trace: Don't dump offline CPUs in debugtrace_dump_worker()
Posted by Jan Beulich 3 years, 11 months ago
On 21.05.2020 10:44, Andrew Cooper wrote:
> The 'T' debugkey reliably wedges on one of my systems, which has a sparse
> APIC_ID layout due to a non power-of-2 number of cores per socket.  The
> per_cpu(dt_cpu_data, cpu) calculation falls over the deliberately
> non-canonical poison value.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>