GDB causing OOPS on insmod

Posted by Vishal Moola (Oracle) 5 months ago

I'm on an x86 defconfig + GDB_SCRIPTS + DEBUG_VM + PAGE_OWNER kernel. Running
'lx-symbols' in gdb before loading modules causes the kernel to OOPS on
module load:

[   13.627373] BUG: kernel NULL pointer dereference, address: 0000000000000900
[   13.627376] #PF: supervisor write access in kernel mode
[   13.627377] #PF: error_code(0x0002) - not-present page
[   13.627378] PGD 0 P4D 0 
[   13.627379] Oops: Oops: 0002 [#1] SMP PTI
[   13.627383] CPU: 0 UID: 0 PID: 279 Comm: insmod Not tainted 6.18.0-rc3+ #163 PREEMPT(voluntary) 
[   13.627384] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-6.fc43 04/01/2014
[   13.627385] RIP: 0010:__kernel_read+0x210/0x2f0
[   13.627390] Code: 00 40 0f 84 bd 00 00 00 48 3b 7f 18 0f 84 c3 00 00 00 48 89 f2 b9 02 00 00 00 44 89 d6 e8 78 6c 06 00 4d 01 ac 24 f0 08 00 00 <49> 83 84 24 00 09 00 00 01 48 8b 45 e0 65 48 2b 05 53 38 c7 01 0f
[   13.627391] RSP: 0018:ffffc900002f7c68 EFLAGS: 00010246
[   13.627393] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[   13.627393] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffffffff82d47e70
[   13.627394] RBP: 00000000002f7cf8 R08: 0000000000000000 R09: 0000000000000000
[   13.627394] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[   13.627395] R13: 0000000000000000 R14: ffffc900002f7d10 R15: ffffc900002f7d10
[   13.627399] FS:  00007f704851c740(0000) GS:ffff8880bba45000(0000) knlGS:0000000000000000
[   13.627401] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   13.627402] CR2: 0000000000000900 CR3: 00000000095f2000 CR4: 00000000000006f0
[   13.627406] Call Trace:
[   13.627407]  <TASK>
[   13.627409]  ? init_module_from_file+0x92/0xd0
[   13.627412]  ? init_module_from_file+0x92/0xd0
[   13.627414]  ? idempotent_init_module+0x109/0x2f0
[   13.627416]  ? __x64_sys_finit_module+0x60/0xb0
[   13.627418]  ? x64_sys_call+0x1a74/0x1da0
[   13.627421]  ? do_syscall_64+0xa4/0x290
[   13.627429]  ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   13.627431]  </TASK>
[   13.627431] Modules linked in: test_xarray
[   13.627433] CR2: 0000000000000900
[   13.627434] ---[ end trace 0000000000000000 ]---
[   13.627435] RIP: 0010:__kernel_read+0x210/0x2f0
[   13.627437] Code: 00 40 0f 84 bd 00 00 00 48 3b 7f 18 0f 84 c3 00 00 00 48 89 f2 b9 02 00 00 00 44 89 d6 e8 78 6c 06 00 4d 01 ac 24 f0 08 00 00 <49> 83 84 24 00 09 00 00 01 48 8b 45 e0 65 48 2b 05 53 38 c7 01 0f
[   13.627438] RSP: 0018:ffffc900002f7c68 EFLAGS: 00010246
[   13.627439] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[   13.627439] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffffffff82d47e70
[   13.627440] RBP: 00000000002f7cf8 R08: 0000000000000000 R09: 0000000000000000
[   13.627440] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[   13.627440] R13: 0000000000000000 R14: ffffc900002f7d10 R15: ffffc900002f7d10
[   13.627442] FS:  00007f704851c740(0000) GS:ffff8880bba45000(0000) knlGS:0000000000000000
[   13.627444] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   13.627445] CR2: 0000000000000900 CR3: 00000000095f2000 CR4: 00000000000006f0

I used module test_xarray for the purpose of demonstration, but this does
happen with all modules.
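
For anyone reading the oops: the faulting instruction can be decoded by hand
from the "Code:" line. Below is a minimal sketch, assuming the bytes starting
at the <49> marker form one complete instruction. The helper name
decode_group1_imm8 is made up for illustration, and it handles only this one
encoding form (REX prefix, opcode 0x83, ModRM with a SIB byte and a 32-bit
displacement).

```python
# Only R12 is needed; its value is copied from the register dump in the oops.
REGS = {12: 0x0000000000000000}

# Bytes starting at the <49> marker in the "Code:" line.
FAULTING = bytes.fromhex("49 83 84 24 00 09 00 00 01")

# Opcode 0x83 is "group 1": the ModRM reg field selects the operation.
GROUP1_OPS = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def decode_group1_imm8(code, regs):
    """Decode one 'REX 83 /digit ib' instruction using SIB + disp32 addressing.

    Returns (operation, base_register, displacement, immediate, address).
    Any other encoding form raises ValueError.
    """
    rex, opcode, modrm, sib = code[0], code[1], code[2], code[3]
    if rex & 0xF0 != 0x40 or opcode != 0x83:
        raise ValueError("not a REX-prefixed 0x83 instruction")

    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 2 or rm != 4:
        raise ValueError("only mod=10 (disp32) with a SIB byte is handled")

    index = ((sib >> 3) & 7) | ((rex & 0x2) << 2)  # REX.X extends the index
    base = (sib & 7) | ((rex & 0x1) << 3)          # REX.B extends the base
    if index != 4:
        raise ValueError("scaled index registers are not handled")

    disp = int.from_bytes(code[4:8], "little", signed=True)
    imm = int.from_bytes(code[8:9], "little", signed=True)
    address = (regs[base] + disp) & 0xFFFFFFFFFFFFFFFF
    return GROUP1_OPS[reg], base, disp, imm, address


op, base, disp, imm, address = decode_group1_imm8(FAULTING, REGS)
print(f"{op} ${imm:#x}, {disp:#x}(%r{base}) -> address {address:#x}")
```

The instruction is a 64-bit add of 1 to memory at R12 + 0x900. With R12 = 0
in the dump, that is a write to 0x900, which matches CR2 and the "supervisor
write access" line. The preceding bytes (4d 01 ac 24 f0 08 00 00) appear to be
a store through the same base register at offset 0x8f0, so R12 was presumably
still valid one instruction earlier.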

I have no clue what patch caused this, or when this bug was introduced.
I played around with the scripts a bit and found the diff below eliminates
this issue entirely:

diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py
index 6edb99221675..8b507907e044 100644
--- a/scripts/gdb/linux/symbols.py
+++ b/scripts/gdb/linux/symbols.py
@@ -44,8 +44,7 @@ if hasattr(gdb, 'Breakpoint'):
                               "'{0}'\n".format(module_name))
                     cmd.load_all_symbols()
                 else:
-                    cmd.load_module_symbols(module)
-
+                    cmd.load_all_symbols()
             return False

Does anyone know what's going on here? And is this the fix we should upstream?

Re: GDB causing OOPS on insmod

Posted by Jan Kiszka 5 months ago

On 06.11.25 00:26, Vishal Moola (Oracle) wrote:
> Does anyone know what's going on here? And is this the fix we should upstream?

Are you using kvm or tcg with qemu? Is the issue gone when switching the
accelerator mode?

And when do you attach to the kernel here? System booted, idle, attach,
continue, load (another) module?

Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center

Re: GDB causing OOPS on insmod

Posted by Vishal Moola (Oracle) 5 months ago

On Thu, Nov 06, 2025 at 07:07:28AM +0100, Jan Kiszka wrote:
> Are you using kvm or tcg with qemu? Is the issue gone when switching the
> accelerator mode?

I'm using kvm. Switching to tcg works; I hadn't thought to try that :)

kvm is definitely faster though, so support for that is my preferred
option.
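
To compare the two accelerators with everything else held constant, a small
helper along these lines can generate both command lines. This is only a
sketch: the kernel image path and the kernel command line are assumptions,
qemu_argv is a made-up name, and a root filesystem option would still need to
be added for a real boot.

```python
import shlex


def qemu_argv(accel, kernel="arch/x86/boot/bzImage"):
    """Build a qemu-system-x86_64 command line for one accelerator."""
    if accel not in ("kvm", "tcg"):
        raise ValueError("accel must be 'kvm' or 'tcg'")
    return [
        "qemu-system-x86_64",
        "-accel", accel,      # the only option that differs between runs
        "-s",                 # start the gdbstub on tcp::1234
        "-kernel", kernel,    # assumed path to the built kernel image
        "-append", "console=ttyS0 nokaslr",  # assumed kernel command line
        "-nographic",
    ]


for accel in ("kvm", "tcg"):
    print(shlex.join(qemu_argv(accel)))
```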

> And when do you attach to the kernel here? System booted, idle, attach,
> continue, load (another) module?

I've tried attaching at all of those points; it always kills whatever
module I attempt to load after attaching gdb and running lx-symbols. In
other words, I do not run into this if I never run gdb, or if I detach
gdb before loading a module.