When destroy inst we should make sure that we don't race
against threaded IRQ (or pending IRQ), otherwise we can
concurrently kfree() inst context and inst itself.
BUG: KASAN: slab-use-after-free in vb2_queue_error+0x80/0x90
Call trace:
dump_backtrace+0x1c4/0x1f8
show_stack+0x38/0x60
dump_stack_lvl+0x168/0x1f0
print_report+0x170/0x4c8
kasan_report+0x94/0xd0
__asan_report_load2_noabort+0x20/0x30
vb2_queue_error+0x80/0x90
venus_helper_vb2_queue_error+0x54/0x78
venc_event_notify+0xec/0x158
hfi_event_notify+0x878/0xd20
hfi_process_msg_packet+0x27c/0x4e0
venus_isr_thread+0x258/0x6e8
hfi_isr_thread+0x70/0x90
venus_isr_thread+0x34/0x50
irq_thread_fn+0x88/0x130
irq_thread+0x160/0x2c0
kthread+0x294/0x328
ret_from_fork+0x10/0x20
Allocated by task 20291:
kasan_set_track+0x4c/0x80
kasan_save_alloc_info+0x28/0x38
__kasan_kmalloc+0x84/0xa0
kmalloc_trace+0x7c/0x98
v4l2_m2m_ctx_init+0x74/0x280
venc_open+0x444/0x6d0
v4l2_open+0x19c/0x2a0
chrdev_open+0x374/0x3f0
do_dentry_open+0x710/0x10a8
vfs_open+0x88/0xa8
path_openat+0x1e6c/0x2700
do_filp_open+0x1a4/0x2e0
do_sys_openat2+0xe8/0x508
do_sys_open+0x15c/0x1a0
__arm64_sys_openat+0xa8/0xc8
invoke_syscall+0xdc/0x270
el0_svc_common+0x1ec/0x250
do_el0_svc+0x54/0x70
el0_svc+0x50/0xe8
el0t_64_sync_handler+0x48/0x120
el0t_64_sync+0x1a8/0x1b0
Guard inst destruction (both dec and enc) with hard and threaded
IRQ synchronization.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/media/platform/qcom/venus/vdec.c | 13 +++++++++++++
drivers/media/platform/qcom/venus/venc.c | 13 +++++++++++++
2 files changed, 26 insertions(+)
diff --git a/drivers/media/platform/qcom/venus/vdec.c b/drivers/media/platform/qcom/venus/vdec.c
index 0013c4704f03..ff1823bc967c 100644
--- a/drivers/media/platform/qcom/venus/vdec.c
+++ b/drivers/media/platform/qcom/venus/vdec.c
@@ -1747,6 +1747,13 @@ static int vdec_close(struct file *file)
{
struct venus_inst *inst = to_inst(file);
+ /*
+ * Make sure we don't have pending IRQs and no threaded IRQ handler is
+ * running nor pending (synchronize_irq)), which can race with inst
+ * destruction.
+ */
+ disable_irq(inst->core->irq);
+
vdec_pm_get(inst);
cancel_work_sync(&inst->delayed_process_work);
@@ -1763,6 +1770,12 @@ static int vdec_close(struct file *file)
vdec_pm_put(inst, false);
+ /*
+ * inst is gone from the core->instances list, re-enable IRQ and
+ * threaded IRQ
+ */
+ enable_irq(inst->core->irq);
+
kfree(inst);
return 0;
}
diff --git a/drivers/media/platform/qcom/venus/venc.c b/drivers/media/platform/qcom/venus/venc.c
index 6a26a6592424..6575e84312fe 100644
--- a/drivers/media/platform/qcom/venus/venc.c
+++ b/drivers/media/platform/qcom/venus/venc.c
@@ -1515,6 +1515,13 @@ static int venc_close(struct file *file)
{
struct venus_inst *inst = to_inst(file);
+ /*
+ * Make sure we don't have pending IRQs and no threaded IRQ handler is
+ * running nor pending (synchronize_irq)), which can race with inst
+ * destruction.
+ */
+ disable_irq(inst->core->irq);
+
venc_pm_get(inst);
v4l2_m2m_ctx_release(inst->m2m_ctx);
@@ -1530,6 +1537,12 @@ static int venc_close(struct file *file)
venc_pm_put(inst, false);
+ /*
+ * inst is gone from the core->instances list, re-enable IRQ and
+ * threaded IRQ
+ */
+ enable_irq(inst->core->irq);
+
kfree(inst);
return 0;
}
--
2.47.0.105.g07ac214952-goog
On (24/10/23 14:24), Sergey Senozhatsky wrote: > Guard inst destruction (both dec and enc) with hard and threaded > IRQ synchronization. Folks, please ignore this patch. Stand by for v2.
On (24/10/24 13:58), Sergey Senozhatsky wrote: > Date: Thu, 24 Oct 2024 13:58:36 +0900 > From: Sergey Senozhatsky <senozhatsky@chromium.org> > To: Sergey Senozhatsky <senozhatsky@chromium.org> > Cc: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>, Vikash Garodia > <quic_vgarodia@quicinc.com>, Bryan O'Donoghue > <bryan.odonoghue@linaro.org>, linux-media@vger.kernel.org, > linux-kernel@vger.kernel.org > Subject: Re: [PATCH 2/2] media: venus: sync with threaded IRQ during inst > destruction > Message-ID: <20241024045836.GJ1279924@google.com> > > On (24/10/23 14:24), Sergey Senozhatsky wrote: > > Guard inst destruction (both dec and enc) with hard and threaded > > IRQ synchronization. > > Folks, please ignore this patch. Stand by for v2. I think it probably should be something like this (both for dec and enc). --- @@ -1538,9 +1538,25 @@ static int venc_close(struct file *file) venc_pm_get(inst); + /* + * First, remove the inst from the ->instances list, so that + * to_instance() will return NULL. + */ + hfi_session_destroy(inst); + /* + * Second, make sure we don't have IRQ/IRQ-thread currently running or + * pending execution (disable_irq() calls synchronize_irq()), which + * can race with the inst destruction. + */ + disable_irq(inst->core->irq); + /* + * Lastly, inst is gone from the core->instances list and we don't + * have running/pending IRQ/IRQ-thread, proceed with the destruction + */ + enable_irq(inst->core->irq); + v4l2_m2m_ctx_release(inst->m2m_ctx); v4l2_m2m_release(inst->m2m_dev); - hfi_session_destroy(inst); v4l2_fh_del(&inst->fh); v4l2_fh_exit(&inst->fh); venc_ctrl_deinit(inst);
Hi Sergey, On Thu, Oct 24, 2024 at 2:13 PM Sergey Senozhatsky <senozhatsky@chromium.org> wrote: > > On (24/10/24 13:58), Sergey Senozhatsky wrote: > > Date: Thu, 24 Oct 2024 13:58:36 +0900 > > From: Sergey Senozhatsky <senozhatsky@chromium.org> > > To: Sergey Senozhatsky <senozhatsky@chromium.org> > > Cc: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>, Vikash Garodia > > <quic_vgarodia@quicinc.com>, Bryan O'Donoghue > > <bryan.odonoghue@linaro.org>, linux-media@vger.kernel.org, > > linux-kernel@vger.kernel.org > > Subject: Re: [PATCH 2/2] media: venus: sync with threaded IRQ during inst > > destruction > > Message-ID: <20241024045836.GJ1279924@google.com> > > > > On (24/10/23 14:24), Sergey Senozhatsky wrote: > > > Guard inst destruction (both dec and enc) with hard and threaded > > > IRQ synchronization. > > > > Folks, please ignore this patch. Stand by for v2. > > I think it probably should be something like this (both for dec and > enc). > > --- > > @@ -1538,9 +1538,25 @@ static int venc_close(struct file *file) > > venc_pm_get(inst); > > + /* > + * First, remove the inst from the ->instances list, so that > + * to_instance() will return NULL. > + */ > + hfi_session_destroy(inst); > + /* > + * Second, make sure we don't have IRQ/IRQ-thread currently running or > + * pending execution (disable_irq() calls synchronize_irq()), which > + * can race with the inst destruction. > + */ > + disable_irq(inst->core->irq); > + /* > + * Lastly, inst is gone from the core->instances list and we don't > + * have running/pending IRQ/IRQ-thread, proceed with the destruction > + */ > + enable_irq(inst->core->irq); > + Thanks a lot for looking into this. Wouldn't it be enough to just call synchronize_irq() at this point, since the instance was removed from the list already? I guess the question is if that's the only way the interrupt handler can get hold of the instance. Best, Tomasz > v4l2_m2m_ctx_release(inst->m2m_ctx); > v4l2_m2m_release(inst->m2m_dev); > - hfi_session_destroy(inst); > v4l2_fh_del(&inst->fh); > v4l2_fh_exit(&inst->fh); > venc_ctrl_deinit(inst); >
On (24/10/24 14:18), Tomasz Figa wrote: > > @@ -1538,9 +1538,25 @@ static int venc_close(struct file *file) > > > > venc_pm_get(inst); > > > > + /* > > + * First, remove the inst from the ->instances list, so that > > + * to_instance() will return NULL. > > + */ > > + hfi_session_destroy(inst); > > + /* > > + * Second, make sure we don't have IRQ/IRQ-thread currently running or > > + * pending execution (disable_irq() calls synchronize_irq()), which > > + * can race with the inst destruction. > > + */ > > + disable_irq(inst->core->irq); > > + /* > > + * Lastly, inst is gone from the core->instances list and we don't > > + * have running/pending IRQ/IRQ-thread, proceed with the destruction > > + */ > > + enable_irq(inst->core->irq); > > + > > Thanks a lot for looking into this. Wouldn't it be enough to just call > synchronize_irq() at this point, since the instance was removed from > the list already? I guess the question is if that's the only way the > interrupt handler can get hold of the instance. Good question. synchronize_irq() waits for IRQ-threads, so if inst is accessed only from IRQ-thread then we are fine. If, however, inst is also accessed from hard IRQ, then synchronize_irq() won't work, I guess, because it doesn't wait for "in flight hard IRQs". disable_irq() OTOH "waits for completion", so we cover in-flight hard IRQs too.
On Thu, Oct 24, 2024 at 2:46 PM Sergey Senozhatsky <senozhatsky@chromium.org> wrote: > > On (24/10/24 14:18), Tomasz Figa wrote: > > > @@ -1538,9 +1538,25 @@ static int venc_close(struct file *file) > > > > > > venc_pm_get(inst); > > > > > > + /* > > > + * First, remove the inst from the ->instances list, so that > > > + * to_instance() will return NULL. > > > + */ > > > + hfi_session_destroy(inst); > > > + /* > > > + * Second, make sure we don't have IRQ/IRQ-thread currently running or > > > + * pending execution (disable_irq() calls synchronize_irq()), which > > > + * can race with the inst destruction. > > > + */ > > > + disable_irq(inst->core->irq); > > > + /* > > > + * Lastly, inst is gone from the core->instances list and we don't > > > + * have running/pending IRQ/IRQ-thread, proceed with the destruction > > > + */ > > > + enable_irq(inst->core->irq); > > > + > > > > Thanks a lot for looking into this. Wouldn't it be enough to just call > > synchronize_irq() at this point, since the instance was removed from > > the list already? I guess the question is if that's the only way the > > interrupt handler can get hold of the instance. > > Good question. > > synchronize_irq() waits for IRQ-threads, so if inst is accessed only from > IRQ-thread then we are fine. If, however, inst is also accessed from hard > IRQ, then synchronize_irq() won't work, I guess, because it doesn't wait > for "in flight hard IRQs". disable_irq() OTOH "waits for completion", so > we cover in-flight hard IRQs too. Looking at the code, synchronize_irq() internally also calls synchronize_hardirq() and that in turn waits for the IRQD_IRQ_INPROGESS flag to be cleared before returning [1]. The flag is set by handle_irq_event() before most of the IRQ handling is run and cleared at the end of the function [2], which makes me believe that it would actually ensure all the hardirq and threaded IRQ handlers would be waited for. [1] https://elixir.bootlin.com/linux/v6.11.5/source/kernel/irq/manage.c#L38 [2] https://elixir.bootlin.com/linux/v6.11.5/source/kernel/irq/handle.c#L202 Although I guess it would be the best if someone confirmed that, because with all the IRQ handling complexities of SMP, nothing can be certain today. :P Best, Tomasz
© 2016 - 2024 Red Hat, Inc.