drivers/char/ipmi/ipmi_msghandler.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
In kernel 6.18.7, we encountered the following panic.
[164050.860241] list_add double add: new=ffff8a5833cd0000, prev=ffff8a5833cd0000, next=ffff8a387b2491b0.
[164050.869744] ------------[ cut here ]------------
[164050.874698] kernel BUG at lib/list_debug.c:35!
[164050.879435] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[164050.884742] CPU: 5 UID: 0 PID: 99228 Comm: kworker/5:2 Kdump: loaded Tainted: G S E 6.18.7-20260127.el9.x86_64 #1 PREEMPT(voluntary)
[164050.899481] Tainted: [S]=CPU_OUT_OF_SPEC, [E]=UNSIGNED_MODULE
[164050.905470] Hardware name: Dell Inc. PowerEdge R640/0X45NX, BIOS 2.15.1 06/15/2022
[164050.913285] Workqueue: events smi_work [ipmi_msghandler]
[164050.918865] RIP: 0010:__list_add_valid_or_report+0xb6/0xc0
[164050.924609] Code: c7 e8 b1 c3 89 48 8b 16 48 89 f1 4c 89 e6 e8 e1 16 a9 ff 0f 0b 48 89 f2 4c 89 e1 48 89 fe 48 c7 c7 40 b2 c3 89 e8 ca 16 a9 ff <0f> 0b 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90
[164050.943787] RSP: 0018:ffffceacac91fdc0 EFLAGS: 00010246
[164050.949271] RAX: 0000000000000058 RBX: ffff8a5833cd0000 RCX: 0000000000000000
[164050.956665] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8a773f89c1c0
[164050.964054] RBP: ffff8a5833cd0000 R08: 0000000000000000 R09: ffffceacac91fc78
[164050.971441] R10: ffffceacac91fc70 R11: ffffffff8a7e10c8 R12: ffff8a387b2491b0
[164050.978837] R13: 0000000000000000 R14: ffff8a387b249190 R15: ffff8a387b2491b0
[164050.986229] FS: 0000000000000000(0000) GS:ffff8a77b459d000(0000) knlGS:0000000000000000
[164050.994581] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[164051.000597] CR2: 00007ff95841be6c CR3: 000000063b022001 CR4: 00000000007726f0
[164051.007997] PKRU: 55555554
[164051.010970] Call Trace:
[164051.013690] <TASK>
[164051.016055] ? mutex_lock+0xe/0x30
[164051.019724] deliver_response+0x59/0x100 [ipmi_msghandler]
[164051.025495] smi_work+0xa0/0x370 [ipmi_msghandler]
[164051.030563] process_one_work+0x19d/0x3d0
[164051.034844] worker_thread+0x23e/0x360
[164051.038873] ? __pfx_worker_thread+0x10/0x10
[164051.043423] kthread+0xfb/0x230
[164051.046850] ? __pfx_kthread+0x10/0x10
[164051.050872] ? __pfx_kthread+0x10/0x10
[164051.054894] ret_from_fork+0xe9/0x100
[164051.058826] ? __pfx_kthread+0x10/0x10
[164051.062852] ret_from_fork_asm+0x1a/0x30
[164051.067065] </TASK>

Because kdump was not properly configured, I could not inspect the
vmcore. Based on the oops and the current implementation, I infer
that the issue occurred via the following mechanism:

- The BMC becomes unstable
- Some kind of msg is queued in (hp_)xmit_msgs and smi_work runs
- (Because the BMC is unstable) intf->handlers->sender returns an error
- deliver_err_response() queues newmsg into intf->user_msg
- goto restart, but since intf->curr_msg is naturally non-NULL, no
dequeue is performed from (hp_)xmit_msgs
- The same newmsg as before the restart goes through the same flow and
deliver_err_response is executed, leading to a double add
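
The sequence above can be modeled in user space with a minimal sketch. This is not the kernel code: the list helpers only imitate the "double add" check in lib/list_debug.c, and `simulate_smi_work`, `failing_sender`, and the `clear_on_error` switch are hypothetical names introduced for illustration. The switch shows one possible remedy under these assumptions; it is not necessarily what any actual patch does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_head {
    struct list_head *prev, *next;
};

static void init_list_head(struct list_head *h)
{
    h->prev = h->next = h;
}

/*
 * Imitates the "double add" check: refuse to insert an entry that is
 * already one of the two neighbours it would be linked between.
 * The kernel would BUG() at this point instead of returning false.
 */
static bool list_add_tail_checked(struct list_head *new, struct list_head *head)
{
    struct list_head *prev = head->prev, *next = head;

    if (new == prev || new == next)
        return false;
    new->prev = prev;
    new->next = next;
    prev->next = new;
    next->prev = new;
    return true;
}

struct msg {
    struct list_head link; /* linkage on the response list */
};

/* Hypothetical sender that always fails, standing in for an unstable BMC. */
static int failing_sender(struct msg *m)
{
    (void)m;
    return -1;
}

/*
 * Simplified model of the loop described above (an assumption, not the
 * real smi_work). Returns 1 if a double add is detected, 0 otherwise.
 */
int simulate_smi_work(bool clear_on_error)
{
    struct list_head user_msg;          /* response list from the report */
    struct msg pending = { { NULL, NULL } };
    struct msg *xmit_queue = &pending;  /* one-element transmit queue */
    struct msg *curr_msg = NULL;
    struct msg *newmsg = NULL;

    init_list_head(&user_msg);

restart:
    if (curr_msg == NULL) {
        /* Dequeue only when no message is in flight. */
        newmsg = xmit_queue;
        xmit_queue = NULL;
        curr_msg = newmsg;
    }

    if (newmsg && failing_sender(newmsg) != 0) {
        /* Error path: hand the message back as an error response. */
        if (!list_add_tail_checked(&newmsg->link, &user_msg))
            return 1;                   /* same entry queued twice */
        if (clear_on_error) {
            /* Hypothetical remedy: forget the failed message. */
            curr_msg = NULL;
            newmsg = NULL;
        }
        goto restart;
    }
    return 0;
}
```

Under this model, `simulate_smi_work(false)` returns 1 because the second pass tries to append an entry that is already the list tail (new == prev, as in the oops), while `simulate_smi_work(true)` returns 0 because the failed message is dropped before the restart.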

I took a quick look at the BMC logs and there was a watchdog BMC reset
around the time of the panic, so I'm pretty sure the BMC was unstable.

I'm not sure whether this is the correct approach, but I am submitting an
RFC patch in the spirit of a bug report. I would appreciate your feedback.
Feel free to discard mine and fix it with a separate patch if you prefer.

Thanks.

Kenta Akagi (1):
  ipmi: Fix double list_add when sender returns an error

 drivers/char/ipmi/ipmi_msghandler.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--
2.50.1

On Thu, Feb 05, 2026 at 11:47:38PM +0900, Kenta Akagi wrote:
> In kernel 6.18.7, we encountered the following panic.
>
> [164050.860241] list_add double add: new=ffff8a5833cd0000, prev=ffff8a5833cd0000, next=ffff8a387b2491b0.
> [164050.869744] ------------[ cut here ]------------
> [164050.874698] kernel BUG at lib/list_debug.c:35!
> [164050.879435] Oops: invalid opcode: 0000 [#1] SMP NOPTI
> [164050.884742] CPU: 5 UID: 0 PID: 99228 Comm: kworker/5:2 Kdump: loaded Tainted: G S E 6.18.7-20260127.el9.x86_64 #1 PREEMPT(voluntary)
> [164050.899481] Tainted: [S]=CPU_OUT_OF_SPEC, [E]=UNSIGNED_MODULE
> [164050.905470] Hardware name: Dell Inc. PowerEdge R640/0X45NX, BIOS 2.15.1 06/15/2022
> [164050.913285] Workqueue: events smi_work [ipmi_msghandler]
> [164050.918865] RIP: 0010:__list_add_valid_or_report+0xb6/0xc0
> [164050.924609] Code: c7 e8 b1 c3 89 48 8b 16 48 89 f1 4c 89 e6 e8 e1 16 a9 ff 0f 0b 48 89 f2 4c 89 e1 48 89 fe 48 c7 c7 40 b2 c3 89 e8 ca 16 a9 ff <0f> 0b 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90
> [164050.943787] RSP: 0018:ffffceacac91fdc0 EFLAGS: 00010246
> [164050.949271] RAX: 0000000000000058 RBX: ffff8a5833cd0000 RCX: 0000000000000000
> [164050.956665] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8a773f89c1c0
> [164050.964054] RBP: ffff8a5833cd0000 R08: 0000000000000000 R09: ffffceacac91fc78
> [164050.971441] R10: ffffceacac91fc70 R11: ffffffff8a7e10c8 R12: ffff8a387b2491b0
> [164050.978837] R13: 0000000000000000 R14: ffff8a387b249190 R15: ffff8a387b2491b0
> [164050.986229] FS: 0000000000000000(0000) GS:ffff8a77b459d000(0000) knlGS:0000000000000000
> [164050.994581] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [164051.000597] CR2: 00007ff95841be6c CR3: 000000063b022001 CR4: 00000000007726f0
> [164051.007997] PKRU: 55555554
> [164051.010970] Call Trace:
> [164051.013690] <TASK>
> [164051.016055] ? mutex_lock+0xe/0x30
> [164051.019724] deliver_response+0x59/0x100 [ipmi_msghandler]
> [164051.025495] smi_work+0xa0/0x370 [ipmi_msghandler]
> [164051.030563] process_one_work+0x19d/0x3d0
> [164051.034844] worker_thread+0x23e/0x360
> [164051.038873] ? __pfx_worker_thread+0x10/0x10
> [164051.043423] kthread+0xfb/0x230
> [164051.046850] ? __pfx_kthread+0x10/0x10
> [164051.050872] ? __pfx_kthread+0x10/0x10
> [164051.054894] ret_from_fork+0xe9/0x100
> [164051.058826] ? __pfx_kthread+0x10/0x10
> [164051.062852] ret_from_fork_asm+0x1a/0x30
> [164051.067065] </TASK>
>
> Because kdump was not properly configured, I was unable to inspect the
> vmcore, but based on the oops and the current implementation, I infer
> that the issue occurred via the following mechanism.

A fix for this is already queued in the next tree. I should have it
out soon.

-corey

>
> - The BMC becomes unstable
> - Some kind of msg is queued in (hp_)xmit_msgs and smi_work runs
> - (Because the BMC is unstable) intf->handlers->sender returns an error
> - deliver_err_response() queues newmsg into intf->user_msg
> - goto restart, but since intf->curr_msg is naturally non-NULL, no
> dequeue is performed from (hp_)xmit_msgs
> - The same newmsg as before the restart goes through the same flow and
> deliver_err_response is executed, leading to a double add
>
> I took a quick look at the BMC logs and there was a watchdog BMC reset
> around the time of the panic, so I'm pretty sure the BMC was unstable.
>
> I'm not sure if this is the correct approach, but I submit a RFC PATCH
> in the spirit of a bug report. I would appreciate your feedback. You
> can completely discard mine and fix it as a separate patch if you
> prefer.
>
> Thanks.
>
>
> Kenta Akagi (1):
> ipmi: Fix double list_add when sender returns an error
>
> drivers/char/ipmi/ipmi_msghandler.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> --
> 2.50.1
>

On 2026/02/06 2:50, Corey Minyard wrote:
> On Thu, Feb 05, 2026 at 11:47:38PM +0900, Kenta Akagi wrote:
>> In kernel 6.18.7, we encountered the following panic.
>>
>> [164050.860241] list_add double add: new=ffff8a5833cd0000, prev=ffff8a5833cd0000, next=ffff8a387b2491b0.
>> [164050.869744] ------------[ cut here ]------------
>> [164050.874698] kernel BUG at lib/list_debug.c:35!
>> [164050.879435] Oops: invalid opcode: 0000 [#1] SMP NOPTI
>> [164050.884742] CPU: 5 UID: 0 PID: 99228 Comm: kworker/5:2 Kdump: loaded Tainted: G S E 6.18.7-20260127.el9.x86_64 #1 PREEMPT(voluntary)
>> [164050.899481] Tainted: [S]=CPU_OUT_OF_SPEC, [E]=UNSIGNED_MODULE
>> [164050.905470] Hardware name: Dell Inc. PowerEdge R640/0X45NX, BIOS 2.15.1 06/15/2022
>> [164050.913285] Workqueue: events smi_work [ipmi_msghandler]
>> [164050.918865] RIP: 0010:__list_add_valid_or_report+0xb6/0xc0
>> [164050.924609] Code: c7 e8 b1 c3 89 48 8b 16 48 89 f1 4c 89 e6 e8 e1 16 a9 ff 0f 0b 48 89 f2 4c 89 e1 48 89 fe 48 c7 c7 40 b2 c3 89 e8 ca 16 a9 ff <0f> 0b 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90
>> [164050.943787] RSP: 0018:ffffceacac91fdc0 EFLAGS: 00010246
>> [164050.949271] RAX: 0000000000000058 RBX: ffff8a5833cd0000 RCX: 0000000000000000
>> [164050.956665] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8a773f89c1c0
>> [164050.964054] RBP: ffff8a5833cd0000 R08: 0000000000000000 R09: ffffceacac91fc78
>> [164050.971441] R10: ffffceacac91fc70 R11: ffffffff8a7e10c8 R12: ffff8a387b2491b0
>> [164050.978837] R13: 0000000000000000 R14: ffff8a387b249190 R15: ffff8a387b2491b0
>> [164050.986229] FS: 0000000000000000(0000) GS:ffff8a77b459d000(0000) knlGS:0000000000000000
>> [164050.994581] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [164051.000597] CR2: 00007ff95841be6c CR3: 000000063b022001 CR4: 00000000007726f0
>> [164051.007997] PKRU: 55555554
>> [164051.010970] Call Trace:
>> [164051.013690] <TASK>
>> [164051.016055] ? mutex_lock+0xe/0x30
>> [164051.019724] deliver_response+0x59/0x100 [ipmi_msghandler]
>> [164051.025495] smi_work+0xa0/0x370 [ipmi_msghandler]
>> [164051.030563] process_one_work+0x19d/0x3d0
>> [164051.034844] worker_thread+0x23e/0x360
>> [164051.038873] ? __pfx_worker_thread+0x10/0x10
>> [164051.043423] kthread+0xfb/0x230
>> [164051.046850] ? __pfx_kthread+0x10/0x10
>> [164051.050872] ? __pfx_kthread+0x10/0x10
>> [164051.054894] ret_from_fork+0xe9/0x100
>> [164051.058826] ? __pfx_kthread+0x10/0x10
>> [164051.062852] ret_from_fork_asm+0x1a/0x30
>> [164051.067065] </TASK>
>>
>> Because kdump was not properly configured, I was unable to inspect the
>> vmcore, but based on the oops and the current implementation, I infer
>> that the issue occurred via the following mechanism.
>
> A fix for this is already queued in the next tree. I should have it
> out soon.

Ah, sorry, I didn't notice that.
I'll wait for "ipmi: Fix use-after-free and list corruption on sender error".

Thanks,
Akagi

>
> -corey
>
>>
>> - The BMC becomes unstable
>> - Some kind of msg is queued in (hp_)xmit_msgs and smi_work runs
>> - (Because the BMC is unstable) intf->handlers->sender returns an error
>> - deliver_err_response() queues newmsg into intf->user_msg
>> - goto restart, but since intf->curr_msg is naturally non-NULL, no
>> dequeue is performed from (hp_)xmit_msgs
>> - The same newmsg as before the restart goes through the same flow and
>> deliver_err_response is executed, leading to a double add
>>
>> I took a quick look at the BMC logs and there was a watchdog BMC reset
>> around the time of the panic, so I'm pretty sure the BMC was unstable.
>>
>> I'm not sure if this is the correct approach, but I submit a RFC PATCH
>> in the spirit of a bug report. I would appreciate your feedback. You
>> can completely discard mine and fix it as a separate patch if you
>> prefer.
>>
>> Thanks.
>>
>>
>> Kenta Akagi (1):
>> ipmi: Fix double list_add when sender returns an error
>>
>> drivers/char/ipmi/ipmi_msghandler.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> --
>> 2.50.1
>>
>