Concerns regarding e17bebd049 ("dump: Set correct vaddr for ELF dump")

Posted by Stephen Brennan 7 months, 1 week ago

Hello all,

I've started working on better support and documentation around
hypervisor vmcores in the Drgn debugger[1]. Of course there's quite a
lot of different implementations out there, but recently I'm looking at
Qemu kdump and ELF vmcores generated via dump-guest-memory, and one
thing caught my eye. I generated a ELF vmcore without the paging option
enabled, and without the guest note loaded, and the resulting core
dump's program header looked like this:

$ eu-readelf -l dumpfile2
Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  NOTE           0x000168 0x0000000000000000 0x0000000000000000 0x001980 0x001980     0x0
  LOAD           0x001ae8 0x0000000000000000 0x0000000000000000 0x80000000 0x80000000     0x0
  LOAD           0x80001ae8 0x00000000fffc0000 0x00000000fffc0000 0x040000 0x040000     0x0

In particular, the "VirtAddr" field for the loadable segment shows a
confusing address - it appears to reuse the segment's physical address,
despite the fact that there's no actual corresponding mapping.

By comparison, the /proc/kcore and /proc/vmcore ELF vmcores use the
VirtAddr in the program header to represent the real virtual memory
mappings in use by the kernel. Debuggers can directly use these without
needing to walk page tables. If there is no virtual memory mapping
information available, I would have expected a placeholder value such as
0000... or FFFF... to take the place of VirtAddr here so a debugger can
detect the lack of virtual mappings and know that it needs to use
architecture-specific details (and the vmcoreinfo) to find the page
tables and accurately determine memory mappings. As it is, this program
header seems to advertise to a debugger, "yes, we have the virtual
memory mappings" when in fact, that's not the case.

It seems that this behavior was introduced in e17bebd049 ("dump: Set
correct vaddr for ELF dump")[2], a small commit I'll reproduce below.
The justification seems to be that it fixes an issue reading the vmcore
with GDB, but I wonder if that's not a GDB bug which should have been
fixed with them? If GDB aims to support ELF kernel core dumps,
presumably it should be handling physical addresses separately from
virtual addresses. And if GDB doesn't aim for this, but you'd like to
con it into reading your core dump, presumably the onus is on you to
edit the ELF VirtAddr field to suit your needs? It should be QEMU's
primary goal to produce a *correct* vmcore, not work around limitations
or bugs in GDB.

I'd like to propose reverting this, since it makes it impossible to
interpret QEMU ELF vmcores, unless you discard all the virtual addresses
in the program headers, and unconditionally do all the page table walks
yourself. But I wanted to see if there was some justification for this
behavior that I missed.

Thanks,
Stephen

[1]: https://github.com/osandov/drgn
[2]: https://lore.kernel.org/qemu-devel/20181225125344.4482-1-arilou@gmail.com/

---

commit e17bebd049d78f489c2cff755e2b66a0536a156e
Author: Jon Doron <arilou@gmail.com>
Date:   Wed Jan 9 10:22:03 2019 +0200

    dump: Set correct vaddr for ELF dump
    
    vaddr needs to be equal to the paddr since the dump file represents the
    physical memory image.
    
    Without setting vaddr correctly, GDB would load all the different memory
    regions on top of each other to vaddr 0, thus making GDB showing the wrong
    memory data for a given address.
    
    Signed-off-by: Jon Doron <arilou@gmail.com>
    Message-Id: <20190109082203.27142-1-arilou@gmail.com>
    Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Tested-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Acked-by: Laszlo Ersek <lersek@redhat.com>

diff --git a/dump.c b/dump.c
index ef1d8025c9..107a67165a 100644
--- a/dump.c
+++ b/dump.c
@@ -192,7 +192,7 @@ static void write_elf64_load(DumpState *s, MemoryMapping *memory_mapping,
     phdr.p_paddr = cpu_to_dump64(s, memory_mapping->phys_addr);
     phdr.p_filesz = cpu_to_dump64(s, filesz);
     phdr.p_memsz = cpu_to_dump64(s, memory_mapping->length);
-    phdr.p_vaddr = cpu_to_dump64(s, memory_mapping->virt_addr);
+    phdr.p_vaddr = cpu_to_dump64(s, memory_mapping->virt_addr) ?: phdr.p_paddr;
 
     assert(memory_mapping->length >= filesz);
 
@@ -216,7 +216,8 @@ static void write_elf32_load(DumpState *s, MemoryMapping *memory_mapping,
     phdr.p_paddr = cpu_to_dump32(s, memory_mapping->phys_addr);
     phdr.p_filesz = cpu_to_dump32(s, filesz);
     phdr.p_memsz = cpu_to_dump32(s, memory_mapping->length);
-    phdr.p_vaddr = cpu_to_dump32(s, memory_mapping->virt_addr);
+    phdr.p_vaddr =
+        cpu_to_dump32(s, memory_mapping->virt_addr) ?: phdr.p_paddr;
 
     assert(memory_mapping->length >= filesz);
 
diff --git a/scripts/dump-guest-memory.py b/scripts/dump-guest-memory.py
index 198cd0fe40..2c587cbefc 100644
--- a/scripts/dump-guest-memory.py
+++ b/scripts/dump-guest-memory.py
@@ -163,6 +163,7 @@ def add_segment(self, p_type, p_paddr, p_size):
         phdr = get_arch_phdr(self.endianness, self.elfclass)
         phdr.p_type = p_type
         phdr.p_paddr = p_paddr
+        phdr.p_vaddr = p_paddr
         phdr.p_filesz = p_size
         phdr.p_memsz = p_size
         self.segments.append(phdr)

Re: Concerns regarding e17bebd049 ("dump: Set correct vaddr for ELF dump")

Posted by Jon Doron 7 months, 1 week ago

Hi Stephen,

Like you have said the reason is as I wrote in the commit message, 
without "fixing" the vaddr GDB is messing up mapping and working with 
the generated core file.

This patch is almost 4 years old, perhaps some changes to GDB has been 
introduced to resolve this, I have not checked since then.

As I'm no longer using this feature and have not worked and tested it 
in a long while, so I have no obligations to this change, but perhaps
someone else might be using it...

-- Jon.

On 19/09/2023, Stephen Brennan wrote:
>Hello all,
>
>I've started working on better support and documentation around
>hypervisor vmcores in the Drgn debugger[1]. Of course there's quite a
>lot of different implementations out there, but recently I'm looking at
>Qemu kdump and ELF vmcores generated via dump-guest-memory, and one
>thing caught my eye. I generated a ELF vmcore without the paging option
>enabled, and without the guest note loaded, and the resulting core
>dump's program header looked like this:
>
>$ eu-readelf -l dumpfile2
>Program Headers:
>  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
>  NOTE           0x000168 0x0000000000000000 0x0000000000000000 0x001980 0x001980     0x0
>  LOAD           0x001ae8 0x0000000000000000 0x0000000000000000 0x80000000 0x80000000     0x0
>  LOAD           0x80001ae8 0x00000000fffc0000 0x00000000fffc0000 0x040000 0x040000     0x0
>
>In particular, the "VirtAddr" field for the loadable segment shows a
>confusing address - it appears to reuse the segment's physical address,
>despite the fact that there's no actual corresponding mapping.
>
>By comparison, the /proc/kcore and /proc/vmcore ELF vmcores use the
>VirtAddr in the program header to represent the real virtual memory
>mappings in use by the kernel. Debuggers can directly use these without
>needing to walk page tables. If there is no virtual memory mapping
>information available, I would have expected a placeholder value such as
>0000... or FFFF... to take the place of VirtAddr here so a debugger can
>detect the lack of virtual mappings and know that it needs to use
>architecture-specific details (and the vmcoreinfo) to find the page
>tables and accurately determine memory mappings. As it is, this program
>header seems to advertise to a debugger, "yes, we have the virtual
>memory mappings" when in fact, that's not the case.
>
>It seems that this behavior was introduced in e17bebd049 ("dump: Set
>correct vaddr for ELF dump")[2], a small commit I'll reproduce below.
>The justification seems to be that it fixes an issue reading the vmcore
>with GDB, but I wonder if that's not a GDB bug which should have been
>fixed with them? If GDB aims to support ELF kernel core dumps,
>presumably it should be handling physical addresses separately from
>virtual addresses. And if GDB doesn't aim for this, but you'd like to
>con it into reading your core dump, presumably the onus is on you to
>edit the ELF VirtAddr field to suit your needs? It should be QEMU's
>primary goal to produce a *correct* vmcore, not work around limitations
>or bugs in GDB.
>
>I'd like to propose reverting this, since it makes it impossible to
>interpret QEMU ELF vmcores, unless you discard all the virtual addresses
>in the program headers, and unconditionally do all the page table walks
>yourself. But I wanted to see if there was some justification for this
>behavior that I missed.
>
>Thanks,
>Stephen
>
>[1]: https://github.com/osandov/drgn
>[2]: https://lore.kernel.org/qemu-devel/20181225125344.4482-1-arilou@gmail.com/
>
>---
>
>commit e17bebd049d78f489c2cff755e2b66a0536a156e
>Author: Jon Doron <arilou@gmail.com>
>Date:   Wed Jan 9 10:22:03 2019 +0200
>
>    dump: Set correct vaddr for ELF dump
>
>    vaddr needs to be equal to the paddr since the dump file represents the
>    physical memory image.
>
>    Without setting vaddr correctly, GDB would load all the different memory
>    regions on top of each other to vaddr 0, thus making GDB showing the wrong
>    memory data for a given address.
>
>    Signed-off-by: Jon Doron <arilou@gmail.com>
>    Message-Id: <20190109082203.27142-1-arilou@gmail.com>
>    Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>    Tested-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>    Acked-by: Laszlo Ersek <lersek@redhat.com>
>
>diff --git a/dump.c b/dump.c
>index ef1d8025c9..107a67165a 100644
>--- a/dump.c
>+++ b/dump.c
>@@ -192,7 +192,7 @@ static void write_elf64_load(DumpState *s, MemoryMapping *memory_mapping,
>     phdr.p_paddr = cpu_to_dump64(s, memory_mapping->phys_addr);
>     phdr.p_filesz = cpu_to_dump64(s, filesz);
>     phdr.p_memsz = cpu_to_dump64(s, memory_mapping->length);
>-    phdr.p_vaddr = cpu_to_dump64(s, memory_mapping->virt_addr);
>+    phdr.p_vaddr = cpu_to_dump64(s, memory_mapping->virt_addr) ?: phdr.p_paddr;
>
>     assert(memory_mapping->length >= filesz);
>
>@@ -216,7 +216,8 @@ static void write_elf32_load(DumpState *s, MemoryMapping *memory_mapping,
>     phdr.p_paddr = cpu_to_dump32(s, memory_mapping->phys_addr);
>     phdr.p_filesz = cpu_to_dump32(s, filesz);
>     phdr.p_memsz = cpu_to_dump32(s, memory_mapping->length);
>-    phdr.p_vaddr = cpu_to_dump32(s, memory_mapping->virt_addr);
>+    phdr.p_vaddr =
>+        cpu_to_dump32(s, memory_mapping->virt_addr) ?: phdr.p_paddr;
>
>     assert(memory_mapping->length >= filesz);
>
>diff --git a/scripts/dump-guest-memory.py b/scripts/dump-guest-memory.py
>index 198cd0fe40..2c587cbefc 100644
>--- a/scripts/dump-guest-memory.py
>+++ b/scripts/dump-guest-memory.py
>@@ -163,6 +163,7 @@ def add_segment(self, p_type, p_paddr, p_size):
>         phdr = get_arch_phdr(self.endianness, self.elfclass)
>         phdr.p_type = p_type
>         phdr.p_paddr = p_paddr
>+        phdr.p_vaddr = p_paddr
>         phdr.p_filesz = p_size
>         phdr.p_memsz = p_size
>         self.segments.append(phdr)

Re: Concerns regarding e17bebd049 ("dump: Set correct vaddr for ELF dump")

Posted by Stephen Brennan 7 months, 1 week ago

Hi Jon,

Jon Doron <arilou@gmail.com> writes:
> Hi Stephen,
> Like you have said the reason is as I wrote in the commit message, 
> without "fixing" the vaddr GDB is messing up mapping and working with 
> the generated core file.

For the record I totally love this workaround :)

It's clever and gets the job done and I would have done it in a
heartbeat. It's just that it does end up making vmcores that have
incorrect data, which is a pain for debuggers that are actually designed
to look at kernel core dumps.

> This patch is almost 4 years old, perhaps some changes to GDB has been 
> introduced to resolve this, I have not checked since then.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  NOTE           0x0000000000000168 0x0000000000000000 0x0000000000000000
                 0x0000000000001980 0x0000000000001980         0x0
  LOAD           0x0000000000001ae8 0x0000000000000000 0x0000000000000000
                 0x0000000080000000 0x0000000080000000         0x0
  LOAD           0x0000000080001ae8 0x0000000000000000 0x00000000fffc0000
                 0x0000000000040000 0x0000000000040000         0x0

(gdb) info files
Local core dump file:
        `/home/stepbren/repos/test_code/elf/dumpfile', file type elf64-x86-64.
        0x0000000000000000 - 0x0000000080000000 is load1
        0x0000000000000000 - 0x0000000000040000 is load2

$ gdb --version 
GNU gdb (GDB) Red Hat Enterprise Linux 10.2-10.0.2.el9
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.


It doesn't *look like* anything has changed in this version of GDB. But
I'm not really certain that GDB is expected to use the physical
addresses in the load segments: it's not a kernel debugger.

I think hacking the p_vaddr field _is_ the way to get GDB to behave in
the way you want: allow you to read physical memory addresses.

> As I'm no longer using this feature and have not worked and tested it 
> in a long while, so I have no obligations to this change, but perhaps
> someone else might be using it...

I definitely think it's valuable for people to continue being able to
use QEMU vmcores generated with paging=off in GDB, even if GDB isn't
desgined for it. It seems like a useful hack that appeals to the lowest
common denominator: most people have GDB and not a purpose-built kernel
debugger. But maybe we could point to a program like the below that will
tweak the p_paddr field after the fact, in order to appeal to GDB's
sensibilities?

Thanks,
Stephen

---
#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <byteswap.h>

#include <elf.h>

static void fail(const char *msg)
{
	fprintf(stderr, "%s\n", msg);
	exit(EXIT_FAILURE);
}

static void perror_fail(const char *pfx)
{
	perror(pfx);
	exit(EXIT_FAILURE);
}

static void usage(void)
{
	puts("usage: phys2virt COREFILE");
	puts("Modifies the ELF COREFILE so that load segments have their virtual");
	puts("address value copied from the physical address field.");
	exit(EXIT_SUCCESS);
}

static int endian(void)
{
	union {
		uint32_t ival;
		char     cval[4];
	} data;
	data.ival = 1;
	if (data.cval[0])
		return ELFDATA2LSB;
	else
		return ELFDATA2MSB;
}

int main(int argc, char **argv)
{
	char *filename;
	FILE *f;
	Elf64_Ehdr hdr;
	Elf64_Phdr *phdrs;
	off_t phoff;
	int phnum, phentsize;
	bool bswap;

	if (argc != 2 || strcmp(argv[1], "-h") == 0)
		usage();

	filename = argv[1];
	f = fopen(filename,  "r+");
	if (!f)
		perror_fail("open");

	if (fread(&hdr, sizeof(hdr), 1, f) != 1)
		perror_fail("read elf header");

	if (memcmp(hdr.e_ident, ELFMAG, 4) != 0)
		fail("not an ELF file");

	if (hdr.e_ident[EI_CLASS] != ELFCLASS64)
		fail("file is not 64-bits: unsupported");

	if (bswap) {
		phoff = bswap_64(hdr.e_phoff);
		phnum = bswap_16(hdr.e_phnum);
		phentsize = bswap_16(hdr.e_phentsize);
	} else {
		phoff = hdr.e_phoff;
		phnum = hdr.e_phnum;
		phentsize = hdr.e_phentsize;
	}
	if (phentsize != sizeof(Elf64_Phdr))
		fail("error: mismatch between phentsize and sizeof(Elf64_Phdr)");

	if (fseek(f, phoff, SEEK_SET) < 0)
		perror_fail("fseek");

	phdrs = calloc(phnum, phentsize);
	if (!phdrs)
		fail("error: allocation error");

	if (fread(phdrs, phentsize, phnum, f) != phnum)
		perror_fail("fread phdrs");

	for (int i = 0; i < phnum; i++)
		phdrs[i].p_vaddr = 0; //phdrs[i].p_paddr;

	if (fseek(f, phoff, SEEK_SET) < 0)
		perror_fail("fseek");
	if (fwrite(phdrs, phentsize, phnum, f) != phnum)
		perror_fail("fwrite phdrs");

	fclose(f);
	return EXIT_SUCCESS;
}

Re: Concerns regarding e17bebd049 ("dump: Set correct vaddr for ELF dump")

Posted by Laszlo Ersek 7 months, 1 week ago

On 9/20/23 19:35, Stephen Brennan wrote:
> Hi Jon,
> 
> Jon Doron <arilou@gmail.com> writes:
>> Hi Stephen,
>> Like you have said the reason is as I wrote in the commit message, 
>> without "fixing" the vaddr GDB is messing up mapping and working with 
>> the generated core file.
> 
> For the record I totally love this workaround :)
> 
> It's clever and gets the job done and I would have done it in a
> heartbeat. It's just that it does end up making vmcores that have
> incorrect data, which is a pain for debuggers that are actually designed
> to look at kernel core dumps.
> 
>> This patch is almost 4 years old, perhaps some changes to GDB has been 
>> introduced to resolve this, I have not checked since then.
> 
> Program Headers:
>   Type           Offset             VirtAddr           PhysAddr
>                  FileSiz            MemSiz              Flags  Align
>   NOTE           0x0000000000000168 0x0000000000000000 0x0000000000000000
>                  0x0000000000001980 0x0000000000001980         0x0
>   LOAD           0x0000000000001ae8 0x0000000000000000 0x0000000000000000
>                  0x0000000080000000 0x0000000080000000         0x0
>   LOAD           0x0000000080001ae8 0x0000000000000000 0x00000000fffc0000
>                  0x0000000000040000 0x0000000000040000         0x0
> 
> (gdb) info files
> Local core dump file:
>         `/home/stepbren/repos/test_code/elf/dumpfile', file type elf64-x86-64.
>         0x0000000000000000 - 0x0000000080000000 is load1
>         0x0000000000000000 - 0x0000000000040000 is load2
> 
> $ gdb --version 
> GNU gdb (GDB) Red Hat Enterprise Linux 10.2-10.0.2.el9
> Copyright (C) 2021 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> 
> 
> It doesn't *look like* anything has changed in this version of GDB. But
> I'm not really certain that GDB is expected to use the physical
> addresses in the load segments: it's not a kernel debugger.

The paging=off vmcores dumped by QEMU are primarily meant for the
"crash" utility <https://github.com/crash-utility/crash.git>, not gdb.
Crash builds upon gdb (it downloads a gdb tarball at build time, IIRC),
but either way, the vmcores are meant to be consumed by "crash", and
crash *is* a kernel debugger (both live, and post-mortem).

So, from my perspective: "whatever works with 'crash'". If you revert
Jon's commit and the vmcores continue working with "crash", I won't object.

I commented similarly under Jon's v1 patch -- as long as paging=off
dumps continue working with "crash", I'm fine:

http://mid.mail-archive.com/7961a154-f139-af73-613d-94b88bf95392@redhat.com

For reference, these are the v1 through v3 patch threads, from 2019:

http://mid.mail-archive.com/20181225125344.4482-1-arilou@gmail.com
http://mid.mail-archive.com/20190108130219.18550-1-arilou@gmail.com
http://mid.mail-archive.com/20190109082203.27142-1-arilou@gmail.com

Laszlo


> 
> I think hacking the p_vaddr field _is_ the way to get GDB to behave in
> the way you want: allow you to read physical memory addresses.
> 
>> As I'm no longer using this feature and have not worked and tested it 
>> in a long while, so I have no obligations to this change, but perhaps
>> someone else might be using it...
> 
> I definitely think it's valuable for people to continue being able to
> use QEMU vmcores generated with paging=off in GDB, even if GDB isn't
> desgined for it. It seems like a useful hack that appeals to the lowest
> common denominator: most people have GDB and not a purpose-built kernel
> debugger. But maybe we could point to a program like the below that will
> tweak the p_paddr field after the fact, in order to appeal to GDB's
> sensibilities?
> 
> Thanks,
> Stephen
> 
> ---
> #include <stdbool.h>
> #include <stdlib.h>
> #include <stdio.h>
> #include <string.h>
> #include <byteswap.h>
> 
> #include <elf.h>
> 
> static void fail(const char *msg)
> {
> 	fprintf(stderr, "%s\n", msg);
> 	exit(EXIT_FAILURE);
> }
> 
> static void perror_fail(const char *pfx)
> {
> 	perror(pfx);
> 	exit(EXIT_FAILURE);
> }
> 
> static void usage(void)
> {
> 	puts("usage: phys2virt COREFILE");
> 	puts("Modifies the ELF COREFILE so that load segments have their virtual");
> 	puts("address value copied from the physical address field.");
> 	exit(EXIT_SUCCESS);
> }
> 
> static int endian(void)
> {
> 	union {
> 		uint32_t ival;
> 		char     cval[4];
> 	} data;
> 	data.ival = 1;
> 	if (data.cval[0])
> 		return ELFDATA2LSB;
> 	else
> 		return ELFDATA2MSB;
> }
> 
> int main(int argc, char **argv)
> {
> 	char *filename;
> 	FILE *f;
> 	Elf64_Ehdr hdr;
> 	Elf64_Phdr *phdrs;
> 	off_t phoff;
> 	int phnum, phentsize;
> 	bool bswap;
> 
> 	if (argc != 2 || strcmp(argv[1], "-h") == 0)
> 		usage();
> 
> 	filename = argv[1];
> 	f = fopen(filename,  "r+");
> 	if (!f)
> 		perror_fail("open");
> 
> 	if (fread(&hdr, sizeof(hdr), 1, f) != 1)
> 		perror_fail("read elf header");
> 
> 	if (memcmp(hdr.e_ident, ELFMAG, 4) != 0)
> 		fail("not an ELF file");
> 
> 	if (hdr.e_ident[EI_CLASS] != ELFCLASS64)
> 		fail("file is not 64-bits: unsupported");
> 
> 	if (bswap) {
> 		phoff = bswap_64(hdr.e_phoff);
> 		phnum = bswap_16(hdr.e_phnum);
> 		phentsize = bswap_16(hdr.e_phentsize);
> 	} else {
> 		phoff = hdr.e_phoff;
> 		phnum = hdr.e_phnum;
> 		phentsize = hdr.e_phentsize;
> 	}
> 	if (phentsize != sizeof(Elf64_Phdr))
> 		fail("error: mismatch between phentsize and sizeof(Elf64_Phdr)");
> 
> 	if (fseek(f, phoff, SEEK_SET) < 0)
> 		perror_fail("fseek");
> 
> 	phdrs = calloc(phnum, phentsize);
> 	if (!phdrs)
> 		fail("error: allocation error");
> 
> 	if (fread(phdrs, phentsize, phnum, f) != phnum)
> 		perror_fail("fread phdrs");
> 
> 	for (int i = 0; i < phnum; i++)
> 		phdrs[i].p_vaddr = 0; //phdrs[i].p_paddr;
> 
> 	if (fseek(f, phoff, SEEK_SET) < 0)
> 		perror_fail("fseek");
> 	if (fwrite(phdrs, phentsize, phnum, f) != phnum)
> 		perror_fail("fwrite phdrs");
> 
> 	fclose(f);
> 	return EXIT_SUCCESS;
> }
>

Re: Concerns regarding e17bebd049 ("dump: Set correct vaddr for ELF dump")

Posted by Stephen Brennan 7 months, 1 week ago

Stephen Brennan <stephen.s.brennan@oracle.com> writes:
> Hi Jon,
>
> Jon Doron <arilou@gmail.com> writes:
>> Hi Stephen,
>> Like you have said the reason is as I wrote in the commit message, 
>> without "fixing" the vaddr GDB is messing up mapping and working with 
>> the generated core file.
>
> For the record I totally love this workaround :)
>
> It's clever and gets the job done and I would have done it in a
> heartbeat. It's just that it does end up making vmcores that have
> incorrect data, which is a pain for debuggers that are actually designed
> to look at kernel core dumps.
>
>> This patch is almost 4 years old, perhaps some changes to GDB has been 
>> introduced to resolve this, I have not checked since then.
>
> Program Headers:
>   Type           Offset             VirtAddr           PhysAddr
>                  FileSiz            MemSiz              Flags  Align
>   NOTE           0x0000000000000168 0x0000000000000000 0x0000000000000000
>                  0x0000000000001980 0x0000000000001980         0x0
>   LOAD           0x0000000000001ae8 0x0000000000000000 0x0000000000000000
>                  0x0000000080000000 0x0000000080000000         0x0
>   LOAD           0x0000000080001ae8 0x0000000000000000 0x00000000fffc0000
>                  0x0000000000040000 0x0000000000040000         0x0
>
> (gdb) info files
> Local core dump file:
>         `/home/stepbren/repos/test_code/elf/dumpfile', file type elf64-x86-64.
>         0x0000000000000000 - 0x0000000080000000 is load1
>         0x0000000000000000 - 0x0000000000040000 is load2
>
> $ gdb --version 
> GNU gdb (GDB) Red Hat Enterprise Linux 10.2-10.0.2.el9
> Copyright (C) 2021 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
>
>
> It doesn't *look like* anything has changed in this version of GDB. But
> I'm not really certain that GDB is expected to use the physical
> addresses in the load segments: it's not a kernel debugger.
>
> I think hacking the p_vaddr field _is_ the way to get GDB to behave in
> the way you want: allow you to read physical memory addresses.
>
>> As I'm no longer using this feature and have not worked and tested it 
>> in a long while, so I have no obligations to this change, but perhaps
>> someone else might be using it...
>
> I definitely think it's valuable for people to continue being able to
> use QEMU vmcores generated with paging=off in GDB, even if GDB isn't
> desgined for it. It seems like a useful hack that appeals to the lowest
> common denominator: most people have GDB and not a purpose-built kernel
> debugger. But maybe we could point to a program like the below that will
> tweak the p_paddr field after the fact, in order to appeal to GDB's
> sensibilities?

And of course I sent the wrong copy of the file. Attached is the program
I intended to send (which properly handles endianness and sets the vaddr
as expected).

Re: Concerns regarding e17bebd049 ("dump: Set correct vaddr for ELF dump")

Posted by Dave Young 7 months, 1 week ago

Not sure if crash people subscribed to linux-debuggers, let's add more
cc for awareness about this thread.

On Thu, 21 Sept 2023 at 01:45, Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Stephen Brennan <stephen.s.brennan@oracle.com> writes:
> > Hi Jon,
> >
> > Jon Doron <arilou@gmail.com> writes:
> >> Hi Stephen,
> >> Like you have said the reason is as I wrote in the commit message,
> >> without "fixing" the vaddr GDB is messing up mapping and working with
> >> the generated core file.
> >
> > For the record I totally love this workaround :)
> >
> > It's clever and gets the job done and I would have done it in a
> > heartbeat. It's just that it does end up making vmcores that have
> > incorrect data, which is a pain for debuggers that are actually designed
> > to look at kernel core dumps.
> >
> >> This patch is almost 4 years old, perhaps some changes to GDB has been
> >> introduced to resolve this, I have not checked since then.
> >
> > Program Headers:
> >   Type           Offset             VirtAddr           PhysAddr
> >                  FileSiz            MemSiz              Flags  Align
> >   NOTE           0x0000000000000168 0x0000000000000000 0x0000000000000000
> >                  0x0000000000001980 0x0000000000001980         0x0
> >   LOAD           0x0000000000001ae8 0x0000000000000000 0x0000000000000000
> >                  0x0000000080000000 0x0000000080000000         0x0
> >   LOAD           0x0000000080001ae8 0x0000000000000000 0x00000000fffc0000
> >                  0x0000000000040000 0x0000000000040000         0x0
> >
> > (gdb) info files
> > Local core dump file:
> >         `/home/stepbren/repos/test_code/elf/dumpfile', file type elf64-x86-64.
> >         0x0000000000000000 - 0x0000000080000000 is load1
> >         0x0000000000000000 - 0x0000000000040000 is load2
> >
> > $ gdb --version
> > GNU gdb (GDB) Red Hat Enterprise Linux 10.2-10.0.2.el9
> > Copyright (C) 2021 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.
> >
> >
> > It doesn't *look like* anything has changed in this version of GDB. But
> > I'm not really certain that GDB is expected to use the physical
> > addresses in the load segments: it's not a kernel debugger.
> >
> > I think hacking the p_vaddr field _is_ the way to get GDB to behave in
> > the way you want: allow you to read physical memory addresses.
> >
> >> As I'm no longer using this feature and have not worked and tested it
> >> in a long while, so I have no obligations to this change, but perhaps
> >> someone else might be using it...
> >
> > I definitely think it's valuable for people to continue being able to
> > use QEMU vmcores generated with paging=off in GDB, even if GDB isn't
> > desgined for it. It seems like a useful hack that appeals to the lowest
> > common denominator: most people have GDB and not a purpose-built kernel
> > debugger. But maybe we could point to a program like the below that will
> > tweak the p_paddr field after the fact, in order to appeal to GDB's
> > sensibilities?
>
> And of course I sent the wrong copy of the file. Attached is the program
> I intended to send (which properly handles endianness and sets the vaddr
> as expected).
>