drivers/vhost/vdpa.c | 1 + 1 file changed, 1 insertion(+)
The following workflow triggers the crash referenced below:
1) vhost_vdpa_unsetup_vq_irq() unregisters the irq bypass producer
but the producer->token is still valid.
2) vq context gets released and reassigned to another vq.
3) That other vq registers it's producer with the same vq context
pointer as token in vhost_vdpa_setup_vq_irq().
4) The original vq tries to unregister it's producer which it has
already unlinked in step 1. irq_bypass_unregister_producer() will go
ahead and unlink the producer once again. That happens because:
a) The producer has a token.
b) An element with that token is found. But that element comes
from step 3.
I see 3 ways to fix this:
1) Fix the vhost-vdpa part. What this patch does. vfio has a different
workflow.
2) Set the token to NULL directly in irq_bypass_unregister_producer()
after unlinking the producer. But that makes the API asymmetrical.
3) Make irq_bypass_unregister_producer() also compare the pointer
elements not just the tokens and do the unlink only on match.
Any thoughts?
Oops: general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] SMP
CPU: 8 PID: 5190 Comm: qemu-system-x86 Not tainted 6.10.0-rc7+ #6
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:irq_bypass_unregister_producer+0xa5/0xd0
RSP: 0018:ffffc900034d7e50 EFLAGS: 00010246
RAX: dead000000000122 RBX: ffff888353d12718 RCX: ffff88810336a000
RDX: dead000000000100 RSI: ffffffff829243a0 RDI: 0000000000000000
RBP: ffff888353c42000 R08: ffff888104882738 R09: ffff88810336a000
R10: ffff888448ab2050 R11: 0000000000000000 R12: ffff888353d126a0
R13: 0000000000000004 R14: 0000000000000055 R15: 0000000000000004
FS: 00007f9df9403c80(0000) GS:ffff88852cc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000562dffc6b568 CR3: 000000012efbb006 CR4: 0000000000772ef0
PKRU: 55555554
Call Trace:
<TASK>
? die_addr+0x36/0x90
? exc_general_protection+0x1a8/0x390
? asm_exc_general_protection+0x26/0x30
? irq_bypass_unregister_producer+0xa5/0xd0
vhost_vdpa_setup_vq_irq+0x5a/0xc0 [vhost_vdpa]
vhost_vdpa_unlocked_ioctl+0xdcd/0xe00 [vhost_vdpa]
? vhost_vdpa_config_cb+0x30/0x30 [vhost_vdpa]
__x64_sys_ioctl+0x90/0xc0
do_syscall_64+0x4f/0x110
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f9df930774f
RSP: 002b:00007ffc55013080 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000562dfe134d20 RCX: 00007f9df930774f
RDX: 00007ffc55013200 RSI: 000000004008af21 RDI: 0000000000000011
RBP: 00007ffc55013200 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000562dfe134360
R13: 0000562dfe134d20 R14: 0000000000000000 R15: 00007f9df801e190
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
drivers/vhost/vdpa.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 478cd46a49ed..d4a7a3918d86 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -226,6 +226,7 @@ static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, u16 qid)
struct vhost_virtqueue *vq = &v->vqs[qid];
irq_bypass_unregister_producer(&vq->call_ctx.producer);
+ vq->call_ctx.producer.token = NULL;
}
static int _compat_vdpa_reset(struct vhost_vdpa *v)
--
2.45.2
On Thu, Aug 1, 2024 at 11:38 PM Dragos Tatulea <dtatulea@nvidia.com> wrote: > > The following workflow triggers the crash referenced below: > > 1) vhost_vdpa_unsetup_vq_irq() unregisters the irq bypass producer > but the producer->token is still valid. > 2) vq context gets released and reassigned to another vq. Just to make sure I understand here, which structure is referred to as "vq context" here? I guess it's not call_ctx as it is a part of the vq itself. > 3) That other vq registers it's producer with the same vq context > pointer as token in vhost_vdpa_setup_vq_irq(). Or did you mean when a single eventfd is shared among different vqs? > 4) The original vq tries to unregister it's producer which it has > already unlinked in step 1. irq_bypass_unregister_producer() will go > ahead and unlink the producer once again. That happens because: > a) The producer has a token. > b) An element with that token is found. But that element comes > from step 3. > > I see 3 ways to fix this: > 1) Fix the vhost-vdpa part. What this patch does. vfio has a different > workflow. > 2) Set the token to NULL directly in irq_bypass_unregister_producer() > after unlinking the producer. But that makes the API asymmetrical. > 3) Make irq_bypass_unregister_producer() also compare the pointer > elements not just the tokens and do the unlink only on match. > > Any thoughts? > > Oops: general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] SMP > CPU: 8 PID: 5190 Comm: qemu-system-x86 Not tainted 6.10.0-rc7+ #6 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 > RIP: 0010:irq_bypass_unregister_producer+0xa5/0xd0 > RSP: 0018:ffffc900034d7e50 EFLAGS: 00010246 > RAX: dead000000000122 RBX: ffff888353d12718 RCX: ffff88810336a000 > RDX: dead000000000100 RSI: ffffffff829243a0 RDI: 0000000000000000 > RBP: ffff888353c42000 R08: ffff888104882738 R09: ffff88810336a000 > R10: ffff888448ab2050 R11: 0000000000000000 R12: ffff888353d126a0 > R13: 0000000000000004 R14: 0000000000000055 R15: 0000000000000004 > FS: 00007f9df9403c80(0000) GS:ffff88852cc00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000562dffc6b568 CR3: 000000012efbb006 CR4: 0000000000772ef0 > PKRU: 55555554 > Call Trace: > <TASK> > ? die_addr+0x36/0x90 > ? exc_general_protection+0x1a8/0x390 > ? asm_exc_general_protection+0x26/0x30 > ? irq_bypass_unregister_producer+0xa5/0xd0 > vhost_vdpa_setup_vq_irq+0x5a/0xc0 [vhost_vdpa] > vhost_vdpa_unlocked_ioctl+0xdcd/0xe00 [vhost_vdpa] > ? vhost_vdpa_config_cb+0x30/0x30 [vhost_vdpa] > __x64_sys_ioctl+0x90/0xc0 > do_syscall_64+0x4f/0x110 > entry_SYSCALL_64_after_hwframe+0x4b/0x53 > RIP: 0033:0x7f9df930774f > RSP: 002b:00007ffc55013080 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 0000562dfe134d20 RCX: 00007f9df930774f > RDX: 00007ffc55013200 RSI: 000000004008af21 RDI: 0000000000000011 > RBP: 00007ffc55013200 R08: 0000000000000002 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000562dfe134360 > R13: 0000562dfe134d20 R14: 0000000000000000 R15: 00007f9df801e190 > > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> > --- > drivers/vhost/vdpa.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > index 478cd46a49ed..d4a7a3918d86 100644 > --- a/drivers/vhost/vdpa.c > +++ b/drivers/vhost/vdpa.c > @@ -226,6 +226,7 @@ static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, u16 qid) > struct vhost_virtqueue *vq = &v->vqs[qid]; > > irq_bypass_unregister_producer(&vq->call_ctx.producer); > + vq->call_ctx.producer.token = NULL; > } > > static int _compat_vdpa_reset(struct vhost_vdpa *v) > -- > 2.45.2 > Thanks
© 2016 - 2024 Red Hat, Inc.