drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 1 + 1 file changed, 1 insertion(+)
In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
and mtk_jpeg_enc_device_run may be called to start the
work.
If we remove the module which will call mtk_jpeg_remove
to make cleanup, there may be a unfinished work. The
possible sequence is as follows, which will cause a
typical UAF bug.
Fix it by canceling the work before cleanup in the mtk_jpeg_remove
CPU0 CPU1
|mtk_jpeg_job_timeout_work
mtk_jpeg_remove |
v4l2_m2m_release |
kfree(m2m_dev); |
|
| v4l2_m2m_get_curr_priv
| m2m_dev->curr_ctx //use
Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
---
- v2: use cancel_delayed_work_sync instead of cancel_delayed_work suggested by Kyrie.
---
drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
index 0051f372a66c..6069ecf420b0 100644
--- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
+++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
@@ -1816,6 +1816,7 @@ static void mtk_jpeg_remove(struct platform_device *pdev)
{
struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
+ cancel_delayed_work_sync(&jpeg->job_timeout_work);
pm_runtime_disable(&pdev->dev);
video_unregister_device(jpeg->vdev);
v4l2_m2m_release(jpeg->m2m_dev);
--
2.25.1
Hello Zheng,
On 7/7/23 12:24, Zheng Wang wrote:
> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> and mtk_jpeg_enc_device_run may be called to start the
> work.
> If we remove the module which will call mtk_jpeg_remove
> to make cleanup, there may be a unfinished work. The
> possible sequence is as follows, which will cause a
> typical UAF bug.
>
> Fix it by canceling the work before cleanup in the mtk_jpeg_remove
>
> CPU0 CPU1
>
> |mtk_jpeg_job_timeout_work
> mtk_jpeg_remove |
> v4l2_m2m_release |
> kfree(m2m_dev); |
> |
> | v4l2_m2m_get_curr_priv
> | m2m_dev->curr_ctx //use
> Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver")
> Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
> ---
> - v2: use cancel_delayed_work_sync instead of cancel_delayed_work suggested by Kyrie.
> ---
> drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> index 0051f372a66c..6069ecf420b0 100644
> --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> @@ -1816,6 +1816,7 @@ static void mtk_jpeg_remove(struct platform_device *pdev)
> {
> struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
>
> + cancel_delayed_work_sync(&jpeg->job_timeout_work);
> pm_runtime_disable(&pdev->dev);
> video_unregister_device(jpeg->vdev);
> v4l2_m2m_release(jpeg->m2m_dev);
AFAICS, there is a fundamental problem here. The job_timeout_work uses
v4l2_m2m_get_curr_priv() and at the time when driver module is unloaded,
all the v4l contexts must be closed and released. Hence the
v4l2_m2m_get_curr_priv() shall return NULL and crash the kernel when
work is executed before cancel_delayed_work_sync().
At the time when mtk_jpeg_remove() is invoked, there shall be no
job_timeout_work running in background because all jobs should be
completed before context is released. If you'll look at
v4l2_m2m_cancel_job(), you can see that it waits for the task completion
before closing context.
You shouldn't be able to remove driver module while it has active/opened
v4l contexts. If you can do that, then this is yours bug that needs to
be fixed.
In addition to this all, the job_timeout_work is initialized only for
the single-core JPEG device. I'd expect this patch should crash
multi-core JPEG devices.
--
Best regards,
Dmitry
Dmitry Osipenko <dmitry.osipenko@collabora.com> 于2023年8月23日周三 02:51写道:
>
> Hello Zheng,
>
> On 7/7/23 12:24, Zheng Wang wrote:
> > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> > and mtk_jpeg_enc_device_run may be called to start the
> > work.
> > If we remove the module which will call mtk_jpeg_remove
> > to make cleanup, there may be a unfinished work. The
> > possible sequence is as follows, which will cause a
> > typical UAF bug.
> >
> > Fix it by canceling the work before cleanup in the mtk_jpeg_remove
> >
> > CPU0 CPU1
> >
> > |mtk_jpeg_job_timeout_work
> > mtk_jpeg_remove |
> > v4l2_m2m_release |
> > kfree(m2m_dev); |
> > |
> > | v4l2_m2m_get_curr_priv
> > | m2m_dev->curr_ctx //use
> > Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver")
> > Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
> > ---
> > - v2: use cancel_delayed_work_sync instead of cancel_delayed_work suggested by Kyrie.
> > ---
> > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > index 0051f372a66c..6069ecf420b0 100644
> > --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > @@ -1816,6 +1816,7 @@ static void mtk_jpeg_remove(struct platform_device *pdev)
> > {
> > struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
> >
> > + cancel_delayed_work_sync(&jpeg->job_timeout_work);
> > pm_runtime_disable(&pdev->dev);
> > video_unregister_device(jpeg->vdev);
> > v4l2_m2m_release(jpeg->m2m_dev);
>
> AFAICS, there is a fundamental problem here. The job_timeout_work uses
> v4l2_m2m_get_curr_priv() and at the time when driver module is unloaded,
> all the v4l contexts must be closed and released. Hence the
> v4l2_m2m_get_curr_priv() shall return NULL and crash the kernel when
> work is executed before cancel_delayed_work_sync().
>
Hi Dmitry,
Thanks for your reply. I think you're right. As m2m_dev is freed in
v4l2_m2m_release,
the invoking in v4l2_m2m_get_curr_priv might cause either UAF or null
pointer dereference
bug. I am sure that context is closed when we invoke mtk_jpeg_remove.
But I'm not sure if
context is released when mtk_jpegdec_timeout_work running.
> At the time when mtk_jpeg_remove() is invoked, there shall be no
> job_timeout_work running in background because all jobs should be
> completed before context is released. If you'll look at
> v4l2_m2m_cancel_job(), you can see that it waits for the task completion
> before closing context.
Yes, so I think the better way is to put the cancel_delayed_work_sync
invoking into
v4l2_m2m_ctx_release function?
>
> You shouldn't be able to remove driver module while it has active/opened
> v4l contexts. If you can do that, then this is yours bug that needs to
> be fixed.
>
> In addition to this all, the job_timeout_work is initialized only for
> the single-core JPEG device. I'd expect this patch should crash
> multi-core JPEG devices.
>
I think that's true. As I'm not familiar with the code here. Could you
please give me some advice about the patch?
Best regards,
Zheng
> --
> Best regards,
> Dmitry
>
On 8/24/23 11:20, Zheng Hacker wrote:
> Dmitry Osipenko <dmitry.osipenko@collabora.com> 于2023年8月23日周三 02:51写道:
>
>>
>> Hello Zheng,
>>
>> On 7/7/23 12:24, Zheng Wang wrote:
>>> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
>>> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
>>> and mtk_jpeg_enc_device_run may be called to start the
>>> work.
>>> If we remove the module which will call mtk_jpeg_remove
>>> to make cleanup, there may be a unfinished work. The
>>> possible sequence is as follows, which will cause a
>>> typical UAF bug.
>>>
>>> Fix it by canceling the work before cleanup in the mtk_jpeg_remove
>>>
>>> CPU0 CPU1
>>>
>>> |mtk_jpeg_job_timeout_work
>>> mtk_jpeg_remove |
>>> v4l2_m2m_release |
>>> kfree(m2m_dev); |
>>> |
>>> | v4l2_m2m_get_curr_priv
>>> | m2m_dev->curr_ctx //use
>>> Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver")
>>> Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
>>> ---
>>> - v2: use cancel_delayed_work_sync instead of cancel_delayed_work suggested by Kyrie.
>>> ---
>>> drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
>>> index 0051f372a66c..6069ecf420b0 100644
>>> --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
>>> +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
>>> @@ -1816,6 +1816,7 @@ static void mtk_jpeg_remove(struct platform_device *pdev)
>>> {
>>> struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
>>>
>>> + cancel_delayed_work_sync(&jpeg->job_timeout_work);
>>> pm_runtime_disable(&pdev->dev);
>>> video_unregister_device(jpeg->vdev);
>>> v4l2_m2m_release(jpeg->m2m_dev);
>>
>> AFAICS, there is a fundamental problem here. The job_timeout_work uses
>> v4l2_m2m_get_curr_priv() and at the time when driver module is unloaded,
>> all the v4l contexts must be closed and released. Hence the
>> v4l2_m2m_get_curr_priv() shall return NULL and crash the kernel when
>> work is executed before cancel_delayed_work_sync().
>>
>
> Hi Dmitry,
>
> Thanks for your reply. I think you're right. As m2m_dev is freed in
> v4l2_m2m_release,
> the invoking in v4l2_m2m_get_curr_priv might cause either UAF or null
> pointer dereference
> bug. I am sure that context is closed when we invoke mtk_jpeg_remove.
> But I'm not sure if
> context is released when mtk_jpegdec_timeout_work running.
>
>> At the time when mtk_jpeg_remove() is invoked, there shall be no
>> job_timeout_work running in background because all jobs should be
>> completed before context is released. If you'll look at
>> v4l2_m2m_cancel_job(), you can see that it waits for the task completion
>> before closing context.
>
> Yes, so I think the better way is to put the cancel_delayed_work_sync
> invoking into
> v4l2_m2m_ctx_release function?
The v4l2_m2m_ctx_release() already should wait for the job_timeout_work
completion or for the interrupt fire. Apparently it doesn't work in
yours case. You'll need to debug why v4l job or job_timeout_work is
running after v4l2_m2m_ctx_release(), it shouldn't happen.
The interrupt handler cancels job_timeout_work, you shouldn't need to
flush the work.
Technically, interrupt handler may race with job_timeout_work, but the
timeout is set to 1 second and in practice should be difficult to
trigger the race. The interrupt handler needs to be threaded, it should
use cancel_delayed_work_sync() and check the return value of this function.
>>
>> You shouldn't be able to remove driver module while it has active/opened
>> v4l contexts. If you can do that, then this is yours bug that needs to
>> be fixed.
>>
>> In addition to this all, the job_timeout_work is initialized only for
>> the single-core JPEG device. I'd expect this patch should crash
>> multi-core JPEG devices.
>>
>
> I think that's true. As I'm not familiar with the code here. Could you
> please give me some advice about the patch?
We'll need to understand why v4l2_m2m_ctx_release() doesn't work as
expected before thinking about the patch.
--
Best regards,
Dmitry
Dmitry Osipenko <dmitry.osipenko@collabora.com> 于2023年8月28日周一 10:04写道:
>
> On 8/24/23 11:20, Zheng Hacker wrote:
> > Dmitry Osipenko <dmitry.osipenko@collabora.com> 于2023年8月23日周三 02:51写道:
> >
> >>
> >> Hello Zheng,
> >>
> >> On 7/7/23 12:24, Zheng Wang wrote:
> >>> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> >>> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> >>> and mtk_jpeg_enc_device_run may be called to start the
> >>> work.
> >>> If we remove the module which will call mtk_jpeg_remove
> >>> to make cleanup, there may be a unfinished work. The
> >>> possible sequence is as follows, which will cause a
> >>> typical UAF bug.
> >>>
> >>> Fix it by canceling the work before cleanup in the mtk_jpeg_remove
> >>>
> >>> CPU0 CPU1
> >>>
> >>> |mtk_jpeg_job_timeout_work
> >>> mtk_jpeg_remove |
> >>> v4l2_m2m_release |
> >>> kfree(m2m_dev); |
> >>> |
> >>> | v4l2_m2m_get_curr_priv
> >>> | m2m_dev->curr_ctx //use
> >>> Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver")
> >>> Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
> >>> ---
> >>> - v2: use cancel_delayed_work_sync instead of cancel_delayed_work suggested by Kyrie.
> >>> ---
> >>> drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 1 +
> >>> 1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> >>> index 0051f372a66c..6069ecf420b0 100644
> >>> --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> >>> +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> >>> @@ -1816,6 +1816,7 @@ static void mtk_jpeg_remove(struct platform_device *pdev)
> >>> {
> >>> struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
> >>>
> >>> + cancel_delayed_work_sync(&jpeg->job_timeout_work);
> >>> pm_runtime_disable(&pdev->dev);
> >>> video_unregister_device(jpeg->vdev);
> >>> v4l2_m2m_release(jpeg->m2m_dev);
> >>
> >> AFAICS, there is a fundamental problem here. The job_timeout_work uses
> >> v4l2_m2m_get_curr_priv() and at the time when driver module is unloaded,
> >> all the v4l contexts must be closed and released. Hence the
> >> v4l2_m2m_get_curr_priv() shall return NULL and crash the kernel when
> >> work is executed before cancel_delayed_work_sync().
> >>
> >
> > Hi Dmitry,
> >
> > Thanks for your reply. I think you're right. As m2m_dev is freed in
> > v4l2_m2m_release,
> > the invoking in v4l2_m2m_get_curr_priv might cause either UAF or null
> > pointer dereference
> > bug. I am sure that context is closed when we invoke mtk_jpeg_remove.
> > But I'm not sure if
> > context is released when mtk_jpegdec_timeout_work running.
> >
> >> At the time when mtk_jpeg_remove() is invoked, there shall be no
> >> job_timeout_work running in background because all jobs should be
> >> completed before context is released. If you'll look at
> >> v4l2_m2m_cancel_job(), you can see that it waits for the task completion
> >> before closing context.
> >
> > Yes, so I think the better way is to put the cancel_delayed_work_sync
> > invoking into
> > v4l2_m2m_ctx_release function?
>
> The v4l2_m2m_ctx_release() already should wait for the job_timeout_work
> completion or for the interrupt fire. Apparently it doesn't work in
> yours case. You'll need to debug why v4l job or job_timeout_work is
> running after v4l2_m2m_ctx_release(), it shouldn't happen.
>
Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING,
the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish
to trigger that.
However, this is not the only path to call v4l2_m2m_job_finish. Here
is a invoking chain:
v4l_streamon
->v4l2_m2m_ioctl_streamon
->v4l2_m2m_streamon
->v4l2_m2m_try_schedule
->v4l2_m2m_try_run
->mtk_jpeg_dec_device_run
->schedule_delayed_work(&jpeg->job_timeout_work...
->error path goto dec_end
->v4l2_m2m_job_finish
In some specific situation, it starts the worker and also calls
v4l2_m2m_job_finish, which might
make v4l2_m2m_cancel_job continues.
> The interrupt handler cancels job_timeout_work, you shouldn't need to
> flush the work.
It will, but as I said, there might be an early invocation chain to
start the work.(Not very sure)
>
> Technically, interrupt handler may race with job_timeout_work, but the
> timeout is set to 1 second and in practice should be difficult to
> trigger the race. The interrupt handler needs to be threaded, it should
> use cancel_delayed_work_sync() and check the return value of this function.
>
Yes, it's better to use cancel_delayed_work_sync here.
> >>
> >> You shouldn't be able to remove driver module while it has active/opened
> >> v4l contexts. If you can do that, then this is yours bug that needs to
> >> be fixed.
> >>
> >> In addition to this all, the job_timeout_work is initialized only for
> >> the single-core JPEG device. I'd expect this patch should crash
> >> multi-core JPEG devices.
> >>
> >
> > I think that's true. As I'm not familiar with the code here. Could you
> > please give me some advice about the patch?
>
> We'll need to understand why v4l2_m2m_ctx_release() doesn't work as
> expected before thinking about the patch.
>
> --
> Best regards,
> Dmitry
>
On 8/31/23 11:18, Zheng Hacker wrote: >> The v4l2_m2m_ctx_release() already should wait for the job_timeout_work >> completion or for the interrupt fire. Apparently it doesn't work in >> yours case. You'll need to debug why v4l job or job_timeout_work is >> running after v4l2_m2m_ctx_release(), it shouldn't happen. >> > Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING, > the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish > to trigger that. > > However, this is not the only path to call v4l2_m2m_job_finish. Here > is a invoking chain: > v4l_streamon > ->v4l2_m2m_ioctl_streamon > ->v4l2_m2m_streamon > ->v4l2_m2m_try_schedule > ->v4l2_m2m_try_run > ->mtk_jpeg_dec_device_run > ->schedule_delayed_work(&jpeg->job_timeout_work... > ->error path goto dec_end > ->v4l2_m2m_job_finish > > In some specific situation, it starts the worker and also calls > v4l2_m2m_job_finish, which might > make v4l2_m2m_cancel_job continues. Then the error path should cancel the job_timeout_work, or better job needs to be run after the dec/enc has been started and not before. Looking further at the code, I'm confused by this hunk: mtk_jpeg_dec_start(comp_jpeg[hw_id]->reg_base); v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx); The job should be marked as finished when h/w has finished processing the job and not right after the job has been started. So the job is always completed and mtk_jpeg_job_timeout_work() doesn't work as expected, am I missing something? -- Best regards, Dmitry
Dmitry Osipenko <dmitry.osipenko@collabora.com> 于2023年9月20日周三 02:24写道: > > On 8/31/23 11:18, Zheng Hacker wrote: > >> The v4l2_m2m_ctx_release() already should wait for the job_timeout_work > >> completion or for the interrupt fire. Apparently it doesn't work in > >> yours case. You'll need to debug why v4l job or job_timeout_work is > >> running after v4l2_m2m_ctx_release(), it shouldn't happen. > >> > > Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING, > > the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish > > to trigger that. > > > > However, this is not the only path to call v4l2_m2m_job_finish. Here > > is a invoking chain: > > v4l_streamon > > ->v4l2_m2m_ioctl_streamon > > ->v4l2_m2m_streamon > > ->v4l2_m2m_try_schedule > > ->v4l2_m2m_try_run > > ->mtk_jpeg_dec_device_run > > ->schedule_delayed_work(&jpeg->job_timeout_work... > > ->error path goto dec_end > > ->v4l2_m2m_job_finish > > > > In some specific situation, it starts the worker and also calls > > v4l2_m2m_job_finish, which might > > make v4l2_m2m_cancel_job continues. > > Then the error path should cancel the job_timeout_work, or better job > needs to be run after the dec/enc has been started and not before. > Hi, Sorry for my late reply for I just went on a long vacation. Get it. I'll write another patch and change the summary to the lack of canceling job in error path. > Looking further at the code, I'm confused by this hunk: > > mtk_jpeg_dec_start(comp_jpeg[hw_id]->reg_base); > v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx); > > The job should be marked as finished when h/w has finished processing > the job and not right after the job has been started. So the job is > always completed and mtk_jpeg_job_timeout_work() doesn't work as > expected, am I missing something? After reading the code I still don't know. I didn't see any function like mtk_jpeg_dec_end. The same thing happens on mtk_jpeg_enc_start. I think I'd better fix the first problem and wait for someone familiar with the second part. Best regards, Zheng > > -- > Best regards, > Dmitry >
On 10/8/23 12:13, Zheng Hacker wrote: > Dmitry Osipenko <dmitry.osipenko@collabora.com> 于2023年9月20日周三 02:24写道: >> >> On 8/31/23 11:18, Zheng Hacker wrote: >>>> The v4l2_m2m_ctx_release() already should wait for the job_timeout_work >>>> completion or for the interrupt fire. Apparently it doesn't work in >>>> yours case. You'll need to debug why v4l job or job_timeout_work is >>>> running after v4l2_m2m_ctx_release(), it shouldn't happen. >>>> >>> Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING, >>> the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish >>> to trigger that. >>> >>> However, this is not the only path to call v4l2_m2m_job_finish. Here >>> is a invoking chain: >>> v4l_streamon >>> ->v4l2_m2m_ioctl_streamon >>> ->v4l2_m2m_streamon >>> ->v4l2_m2m_try_schedule >>> ->v4l2_m2m_try_run >>> ->mtk_jpeg_dec_device_run >>> ->schedule_delayed_work(&jpeg->job_timeout_work... >>> ->error path goto dec_end >>> ->v4l2_m2m_job_finish >>> >>> In some specific situation, it starts the worker and also calls >>> v4l2_m2m_job_finish, which might >>> make v4l2_m2m_cancel_job continues. >> >> Then the error path should cancel the job_timeout_work, or better job >> needs to be run after the dec/enc has been started and not before. >> > > Hi, > > Sorry for my late reply for I just went on a long vacation. > > Get it. I'll write another patch and change the summary to the lack of > canceling job in error path. > >> Looking further at the code, I'm confused by this hunk: >> >> mtk_jpeg_dec_start(comp_jpeg[hw_id]->reg_base); >> v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx); >> >> The job should be marked as finished when h/w has finished processing >> the job and not right after the job has been started. So the job is >> always completed and mtk_jpeg_job_timeout_work() doesn't work as >> expected, am I missing something? > > After reading the code I still don't know. I didn't see any function > like mtk_jpeg_dec_end. The same thing > happens on mtk_jpeg_enc_start. I think I'd better fix the first > problem and wait for someone familiar with > the second part. I missed that the code mentioned above is related to the multi-core hw version, while you care about single-core. Nevertheless, the multi-core device_run() looks incorrect, So, the error code paths need to be corrected. Please try to revert yours fix and test this change: diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c index 0051f372a66c..fd3b0587fcad 100644 --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c @@ -1254,9 +1254,6 @@ static void mtk_jpegdec_worker(struct work_struct *work) v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx); v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx); - schedule_delayed_work(&comp_jpeg[hw_id]->job_timeout_work, - msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC)); - mtk_jpeg_set_dec_src(ctx, &src_buf->vb2_buf, &bs); if (mtk_jpeg_set_dec_dst(ctx, &jpeg_src_buf->dec_param, @@ -1266,6 +1263,9 @@ static void mtk_jpegdec_worker(struct work_struct *work) goto setdst_end; } + schedule_delayed_work(&comp_jpeg[hw_id]->job_timeout_work, + msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC)); + spin_lock_irqsave(&comp_jpeg[hw_id]->hw_lock, flags); ctx->total_frame_num++; mtk_jpeg_dec_reset(comp_jpeg[hw_id]->reg_base); @@ -1330,13 +1330,13 @@ static void mtk_jpeg_dec_device_run(void *priv) if (ret < 0) goto dec_end; - schedule_delayed_work(&jpeg->job_timeout_work, - msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC)); - mtk_jpeg_set_dec_src(ctx, &src_buf->vb2_buf, &bs); if (mtk_jpeg_set_dec_dst(ctx, &jpeg_src_buf->dec_param, &dst_buf->vb2_buf, &fb)) goto dec_end; + schedule_delayed_work(&jpeg->job_timeout_work, + msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC)); + spin_lock_irqsave(&jpeg->hw_lock, flags); mtk_jpeg_dec_reset(jpeg->reg_base); mtk_jpeg_dec_set_config(jpeg->reg_base, -- Best regards, Dmitry
Thanks for your patch. I think this should fix the problem. As I have no experience in reverting, can I submit the patch with your fix as well as reverting my fix? Best regards, Zheng Dmitry Osipenko <dmitry.osipenko@collabora.com> 于2023年10月20日周五 03:56写道: > > On 10/8/23 12:13, Zheng Hacker wrote: > > Dmitry Osipenko <dmitry.osipenko@collabora.com> 于2023年9月20日周三 02:24写道: > >> > >> On 8/31/23 11:18, Zheng Hacker wrote: > >>>> The v4l2_m2m_ctx_release() already should wait for the job_timeout_work > >>>> completion or for the interrupt fire. Apparently it doesn't work in > >>>> yours case. You'll need to debug why v4l job or job_timeout_work is > >>>> running after v4l2_m2m_ctx_release(), it shouldn't happen. > >>>> > >>> Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING, > >>> the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish > >>> to trigger that. > >>> > >>> However, this is not the only path to call v4l2_m2m_job_finish. Here > >>> is a invoking chain: > >>> v4l_streamon > >>> ->v4l2_m2m_ioctl_streamon > >>> ->v4l2_m2m_streamon > >>> ->v4l2_m2m_try_schedule > >>> ->v4l2_m2m_try_run > >>> ->mtk_jpeg_dec_device_run > >>> ->schedule_delayed_work(&jpeg->job_timeout_work... > >>> ->error path goto dec_end > >>> ->v4l2_m2m_job_finish > >>> > >>> In some specific situation, it starts the worker and also calls > >>> v4l2_m2m_job_finish, which might > >>> make v4l2_m2m_cancel_job continues. > >> > >> Then the error path should cancel the job_timeout_work, or better job > >> needs to be run after the dec/enc has been started and not before. > >> > > > > Hi, > > > > Sorry for my late reply for I just went on a long vacation. > > > > Get it. I'll write another patch and change the summary to the lack of > > canceling job in error path. > > > >> Looking further at the code, I'm confused by this hunk: > >> > >> mtk_jpeg_dec_start(comp_jpeg[hw_id]->reg_base); > >> v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx); > >> > >> The job should be marked as finished when h/w has finished processing > >> the job and not right after the job has been started. So the job is > >> always completed and mtk_jpeg_job_timeout_work() doesn't work as > >> expected, am I missing something? > > > > After reading the code I still don't know. I didn't see any function > > like mtk_jpeg_dec_end. The same thing > > happens on mtk_jpeg_enc_start. I think I'd better fix the first > > problem and wait for someone familiar with > > the second part. > > I missed that the code mentioned above is related to the multi-core hw version, while you care about single-core. Nevertheless, the multi-core device_run() looks incorrect, > > So, the error code paths need to be corrected. Please try to revert yours fix and test this change: > > diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > index 0051f372a66c..fd3b0587fcad 100644 > --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > @@ -1254,9 +1254,6 @@ static void mtk_jpegdec_worker(struct work_struct *work) > v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx); > v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx); > > - schedule_delayed_work(&comp_jpeg[hw_id]->job_timeout_work, > - msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC)); > - > mtk_jpeg_set_dec_src(ctx, &src_buf->vb2_buf, &bs); > if (mtk_jpeg_set_dec_dst(ctx, > &jpeg_src_buf->dec_param, > @@ -1266,6 +1263,9 @@ static void mtk_jpegdec_worker(struct work_struct *work) > goto setdst_end; > } > > + schedule_delayed_work(&comp_jpeg[hw_id]->job_timeout_work, > + msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC)); > + > spin_lock_irqsave(&comp_jpeg[hw_id]->hw_lock, flags); > ctx->total_frame_num++; > mtk_jpeg_dec_reset(comp_jpeg[hw_id]->reg_base); > @@ -1330,13 +1330,13 @@ static void mtk_jpeg_dec_device_run(void *priv) > if (ret < 0) > goto dec_end; > > - schedule_delayed_work(&jpeg->job_timeout_work, > - msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC)); > - > mtk_jpeg_set_dec_src(ctx, &src_buf->vb2_buf, &bs); > if (mtk_jpeg_set_dec_dst(ctx, &jpeg_src_buf->dec_param, &dst_buf->vb2_buf, &fb)) > goto dec_end; > > + schedule_delayed_work(&jpeg->job_timeout_work, > + msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC)); > + > spin_lock_irqsave(&jpeg->hw_lock, flags); > mtk_jpeg_dec_reset(jpeg->reg_base); > mtk_jpeg_dec_set_config(jpeg->reg_base, > > -- > Best regards, > Dmitry >
On 9/19/23 21:24, Dmitry Osipenko wrote: > On 8/31/23 11:18, Zheng Hacker wrote: >>> The v4l2_m2m_ctx_release() already should wait for the job_timeout_work >>> completion or for the interrupt fire. Apparently it doesn't work in >>> yours case. You'll need to debug why v4l job or job_timeout_work is >>> running after v4l2_m2m_ctx_release(), it shouldn't happen. >>> >> Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING, >> the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish >> to trigger that. >> >> However, this is not the only path to call v4l2_m2m_job_finish. Here >> is a invoking chain: >> v4l_streamon >> ->v4l2_m2m_ioctl_streamon >> ->v4l2_m2m_streamon >> ->v4l2_m2m_try_schedule >> ->v4l2_m2m_try_run >> ->mtk_jpeg_dec_device_run >> ->schedule_delayed_work(&jpeg->job_timeout_work... >> ->error path goto dec_end >> ->v4l2_m2m_job_finish >> >> In some specific situation, it starts the worker and also calls >> v4l2_m2m_job_finish, which might >> make v4l2_m2m_cancel_job continues. > > Then the error path should cancel the job_timeout_work, or better job s/job/timeout work/ > needs to be run after the dec/enc has been started and not before. > > Looking further at the code, I'm confused by this hunk: > > mtk_jpeg_dec_start(comp_jpeg[hw_id]->reg_base); > v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx); > > The job should be marked as finished when h/w has finished processing > the job and not right after the job has been started. So the job is > always completed and mtk_jpeg_job_timeout_work() doesn't work as > expected, am I missing something? > -- Best regards, Dmitry
Friendly ping.
Zheng Hacker <hackerzheng666@gmail.com> 于2023年8月31日周四 16:18写道:
>
> Dmitry Osipenko <dmitry.osipenko@collabora.com> 于2023年8月28日周一 10:04写道:
> >
> > On 8/24/23 11:20, Zheng Hacker wrote:
> > > Dmitry Osipenko <dmitry.osipenko@collabora.com> 于2023年8月23日周三 02:51写道:
> > >
> > >>
> > >> Hello Zheng,
> > >>
> > >> On 7/7/23 12:24, Zheng Wang wrote:
> > >>> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> > >>> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> > >>> and mtk_jpeg_enc_device_run may be called to start the
> > >>> work.
> > >>> If we remove the module which will call mtk_jpeg_remove
> > >>> to make cleanup, there may be a unfinished work. The
> > >>> possible sequence is as follows, which will cause a
> > >>> typical UAF bug.
> > >>>
> > >>> Fix it by canceling the work before cleanup in the mtk_jpeg_remove
> > >>>
> > >>> CPU0 CPU1
> > >>>
> > >>> |mtk_jpeg_job_timeout_work
> > >>> mtk_jpeg_remove |
> > >>> v4l2_m2m_release |
> > >>> kfree(m2m_dev); |
> > >>> |
> > >>> | v4l2_m2m_get_curr_priv
> > >>> | m2m_dev->curr_ctx //use
> > >>> Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver")
> > >>> Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
> > >>> ---
> > >>> - v2: use cancel_delayed_work_sync instead of cancel_delayed_work suggested by Kyrie.
> > >>> ---
> > >>> drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 1 +
> > >>> 1 file changed, 1 insertion(+)
> > >>>
> > >>> diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > >>> index 0051f372a66c..6069ecf420b0 100644
> > >>> --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > >>> +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > >>> @@ -1816,6 +1816,7 @@ static void mtk_jpeg_remove(struct platform_device *pdev)
> > >>> {
> > >>> struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
> > >>>
> > >>> + cancel_delayed_work_sync(&jpeg->job_timeout_work);
> > >>> pm_runtime_disable(&pdev->dev);
> > >>> video_unregister_device(jpeg->vdev);
> > >>> v4l2_m2m_release(jpeg->m2m_dev);
> > >>
> > >> AFAICS, there is a fundamental problem here. The job_timeout_work uses
> > >> v4l2_m2m_get_curr_priv() and at the time when driver module is unloaded,
> > >> all the v4l contexts must be closed and released. Hence the
> > >> v4l2_m2m_get_curr_priv() shall return NULL and crash the kernel when
> > >> work is executed before cancel_delayed_work_sync().
> > >>
> > >
> > > Hi Dmitry,
> > >
> > > Thanks for your reply. I think you're right. As m2m_dev is freed in
> > > v4l2_m2m_release,
> > > the invoking in v4l2_m2m_get_curr_priv might cause either UAF or null
> > > pointer dereference
> > > bug. I am sure that context is closed when we invoke mtk_jpeg_remove.
> > > But I'm not sure if
> > > context is released when mtk_jpegdec_timeout_work running.
> > >
> > >> At the time when mtk_jpeg_remove() is invoked, there shall be no
> > >> job_timeout_work running in background because all jobs should be
> > >> completed before context is released. If you'll look at
> > >> v4l2_m2m_cancel_job(), you can see that it waits for the task completion
> > >> before closing context.
> > >
> > > Yes, so I think the better way is to put the cancel_delayed_work_sync
> > > invoking into
> > > v4l2_m2m_ctx_release function?
> >
> > The v4l2_m2m_ctx_release() already should wait for the job_timeout_work
> > completion or for the interrupt fire. Apparently it doesn't work in
> > yours case. You'll need to debug why v4l job or job_timeout_work is
> > running after v4l2_m2m_ctx_release(), it shouldn't happen.
> >
>
> Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING,
> the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish
> to trigger that.
>
> However, this is not the only path to call v4l2_m2m_job_finish. Here
> is a invoking chain:
> v4l_streamon
> ->v4l2_m2m_ioctl_streamon
> ->v4l2_m2m_streamon
> ->v4l2_m2m_try_schedule
> ->v4l2_m2m_try_run
> ->mtk_jpeg_dec_device_run
> ->schedule_delayed_work(&jpeg->job_timeout_work...
> ->error path goto dec_end
> ->v4l2_m2m_job_finish
>
> In some specific situation, it starts the worker and also calls
> v4l2_m2m_job_finish, which might
> make v4l2_m2m_cancel_job continues.
>
> > The interrupt handler cancels job_timeout_work, you shouldn't need to
> > flush the work.
>
> It will, but as I said, there might be an early invocation chain to
> start the work.(Not very sure)
>
> >
> > Technically, interrupt handler may race with job_timeout_work, but the
> > timeout is set to 1 second and in practice should be difficult to
> > trigger the race. The interrupt handler needs to be threaded, it should
> > use cancel_delayed_work_sync() and check the return value of this function.
> >
>
> Yes, it's better to use cancel_delayed_work_sync here.
>
> > >>
> > >> You shouldn't be able to remove driver module while it has active/opened
> > >> v4l contexts. If you can do that, then this is yours bug that needs to
> > >> be fixed.
> > >>
> > >> In addition to this all, the job_timeout_work is initialized only for
> > >> the single-core JPEG device. I'd expect this patch should crash
> > >> multi-core JPEG devices.
> > >>
> > >
> > > I think that's true. As I'm not familiar with the code here. Could you
> > > please give me some advice about the patch?
> >
> > We'll need to understand why v4l2_m2m_ctx_release() doesn't work as
> > expected before thinking about the patch.
> >
> > --
> > Best regards,
> > Dmitry
> >
Hi Dmitry,
The patch is on the stable queue. Could you please take a look at my
analysis? Thanks for your help.
Best regards,
Zheng
Zheng Hacker <hackerzheng666@gmail.com> 于2023年9月5日周二 12:24写道:
>
> Friendly ping.
>
> Zheng Hacker <hackerzheng666@gmail.com> 于2023年8月31日周四 16:18写道:
> >
> > Dmitry Osipenko <dmitry.osipenko@collabora.com> 于2023年8月28日周一 10:04写道:
> > >
> > > On 8/24/23 11:20, Zheng Hacker wrote:
> > > > Dmitry Osipenko <dmitry.osipenko@collabora.com> 于2023年8月23日周三 02:51写道:
> > > >
> > > >>
> > > >> Hello Zheng,
> > > >>
> > > >> On 7/7/23 12:24, Zheng Wang wrote:
> > > >>> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> > > >>> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> > > >>> and mtk_jpeg_enc_device_run may be called to start the
> > > >>> work.
> > > >>> If we remove the module which will call mtk_jpeg_remove
> > > >>> to make cleanup, there may be a unfinished work. The
> > > >>> possible sequence is as follows, which will cause a
> > > >>> typical UAF bug.
> > > >>>
> > > >>> Fix it by canceling the work before cleanup in the mtk_jpeg_remove
> > > >>>
> > > >>> CPU0 CPU1
> > > >>>
> > > >>> |mtk_jpeg_job_timeout_work
> > > >>> mtk_jpeg_remove |
> > > >>> v4l2_m2m_release |
> > > >>> kfree(m2m_dev); |
> > > >>> |
> > > >>> | v4l2_m2m_get_curr_priv
> > > >>> | m2m_dev->curr_ctx //use
> > > >>> Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver")
> > > >>> Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
> > > >>> ---
> > > >>> - v2: use cancel_delayed_work_sync instead of cancel_delayed_work suggested by Kyrie.
> > > >>> ---
> > > >>> drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 1 +
> > > >>> 1 file changed, 1 insertion(+)
> > > >>>
> > > >>> diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > >>> index 0051f372a66c..6069ecf420b0 100644
> > > >>> --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > >>> +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > >>> @@ -1816,6 +1816,7 @@ static void mtk_jpeg_remove(struct platform_device *pdev)
> > > >>> {
> > > >>> struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
> > > >>>
> > > >>> + cancel_delayed_work_sync(&jpeg->job_timeout_work);
> > > >>> pm_runtime_disable(&pdev->dev);
> > > >>> video_unregister_device(jpeg->vdev);
> > > >>> v4l2_m2m_release(jpeg->m2m_dev);
> > > >>
> > > >> AFAICS, there is a fundamental problem here. The job_timeout_work uses
> > > >> v4l2_m2m_get_curr_priv() and at the time when driver module is unloaded,
> > > >> all the v4l contexts must be closed and released. Hence the
> > > >> v4l2_m2m_get_curr_priv() shall return NULL and crash the kernel when
> > > >> work is executed before cancel_delayed_work_sync().
> > > >>
> > > >
> > > > Hi Dmitry,
> > > >
> > > > Thanks for your reply. I think you're right. As m2m_dev is freed in
> > > > v4l2_m2m_release,
> > > > the invoking in v4l2_m2m_get_curr_priv might cause either UAF or null
> > > > pointer dereference
> > > > bug. I am sure that context is closed when we invoke mtk_jpeg_remove.
> > > > But I'm not sure if
> > > > context is released when mtk_jpegdec_timeout_work running.
> > > >
> > > >> At the time when mtk_jpeg_remove() is invoked, there shall be no
> > > >> job_timeout_work running in background because all jobs should be
> > > >> completed before context is released. If you'll look at
> > > >> v4l2_m2m_cancel_job(), you can see that it waits for the task completion
> > > >> before closing context.
> > > >
> > > > Yes, so I think the better way is to put the cancel_delayed_work_sync
> > > > invoking into
> > > > v4l2_m2m_ctx_release function?
> > >
> > > The v4l2_m2m_ctx_release() already should wait for the job_timeout_work
> > > completion or for the interrupt fire. Apparently it doesn't work in
> > > yours case. You'll need to debug why v4l job or job_timeout_work is
> > > running after v4l2_m2m_ctx_release(), it shouldn't happen.
> > >
> >
> > Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING,
> > the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish
> > to trigger that.
> >
> > However, this is not the only path to call v4l2_m2m_job_finish. Here
> > is a invoking chain:
> > v4l_streamon
> > ->v4l2_m2m_ioctl_streamon
> > ->v4l2_m2m_streamon
> > ->v4l2_m2m_try_schedule
> > ->v4l2_m2m_try_run
> > ->mtk_jpeg_dec_device_run
> > ->schedule_delayed_work(&jpeg->job_timeout_work...
> > ->error path goto dec_end
> > ->v4l2_m2m_job_finish
> >
> > In some specific situation, it starts the worker and also calls
> > v4l2_m2m_job_finish, which might
> > make v4l2_m2m_cancel_job continues.
> >
> > > The interrupt handler cancels job_timeout_work, you shouldn't need to
> > > flush the work.
> >
> > It will, but as I said, there might be an early invocation chain to
> > start the work.(Not very sure)
> >
> > >
> > > Technically, interrupt handler may race with job_timeout_work, but the
> > > timeout is set to 1 second and in practice should be difficult to
> > > trigger the race. The interrupt handler needs to be threaded, it should
> > > use cancel_delayed_work_sync() and check the return value of this function.
> > >
> >
> > Yes, it's better to use cancel_delayed_work_sync here.
> >
> > > >>
> > > >> You shouldn't be able to remove driver module while it has active/opened
> > > >> v4l contexts. If you can do that, then this is yours bug that needs to
> > > >> be fixed.
> > > >>
> > > >> In addition to this all, the job_timeout_work is initialized only for
> > > >> the single-core JPEG device. I'd expect this patch should crash
> > > >> multi-core JPEG devices.
> > > >>
> > > >
> > > > I think that's true. As I'm not familiar with the code here. Could you
> > > > please give me some advice about the patch?
> > >
> > > We'll need to understand why v4l2_m2m_ctx_release() doesn't work as
> > > expected before thinking about the patch.
> > >
> > > --
> > > Best regards,
> > > Dmitry
> > >
Il 07/07/23 11:24, Zheng Wang ha scritto:
> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> and mtk_jpeg_enc_device_run may be called to start the
> work.
> If we remove the module which will call mtk_jpeg_remove
> to make cleanup, there may be a unfinished work. The
> possible sequence is as follows, which will cause a
> typical UAF bug.
>
> Fix it by canceling the work before cleanup in the mtk_jpeg_remove
>
> CPU0 CPU1
>
> |mtk_jpeg_job_timeout_work
> mtk_jpeg_remove |
> v4l2_m2m_release |
> kfree(m2m_dev); |
> |
> | v4l2_m2m_get_curr_priv
> | m2m_dev->curr_ctx //use
> Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver")
> Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Hello,
Is there any follow-up about this issue?
Thanks,
Zheng
AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
于2023年7月20日周四 15:45写道:
>
> Il 07/07/23 11:24, Zheng Wang ha scritto:
> > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> > and mtk_jpeg_enc_device_run may be called to start the
> > work.
> > If we remove the module which will call mtk_jpeg_remove
> > to make cleanup, there may be a unfinished work. The
> > possible sequence is as follows, which will cause a
> > typical UAF bug.
> >
> > Fix it by canceling the work before cleanup in the mtk_jpeg_remove
> >
> > CPU0 CPU1
> >
> > |mtk_jpeg_job_timeout_work
> > mtk_jpeg_remove |
> > v4l2_m2m_release |
> > kfree(m2m_dev); |
> > |
> > | v4l2_m2m_get_curr_priv
> > | m2m_dev->curr_ctx //use
> > Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver")
> > Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
>
> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
>
>
On Fri, Jul 7, 2023 at 5:25 PM Zheng Wang <zyytlz.wz@163.com> wrote:
>
> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> and mtk_jpeg_enc_device_run may be called to start the
> work.
> If we remove the module which will call mtk_jpeg_remove
> to make cleanup, there may be a unfinished work. The
> possible sequence is as follows, which will cause a
> typical UAF bug.
>
> Fix it by canceling the work before cleanup in the mtk_jpeg_remove
>
> CPU0 CPU1
>
> |mtk_jpeg_job_timeout_work
> mtk_jpeg_remove |
> v4l2_m2m_release |
> kfree(m2m_dev); |
> |
> | v4l2_m2m_get_curr_priv
> | m2m_dev->curr_ctx //use
> Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver")
> Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
On 07/07/2023 11:24, Zheng Wang wrote: > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > and mtk_jpeg_enc_device_run may be called to start the > work. > If we remove the module which will call mtk_jpeg_remove > to make cleanup, there may be a unfinished work. The > possible sequence is as follows, which will cause a > typical UAF bug. > > Fix it by canceling the work before cleanup in the mtk_jpeg_remove > > CPU0 CPU1 > > |mtk_jpeg_job_timeout_work > mtk_jpeg_remove | > v4l2_m2m_release | > kfree(m2m_dev); | > | > | v4l2_m2m_get_curr_priv > | m2m_dev->curr_ctx //use Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com> -- Regards, Alexandre
Hi, This issue has not been resolved for a long time. Is there anyone who can help? Best regards, Zheng Alexandre Mergnat <amergnat@baylibre.com> 于2023年7月7日周五 22:11写道: > > > > On 07/07/2023 11:24, Zheng Wang wrote: > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > > and mtk_jpeg_enc_device_run may be called to start the > > work. > > If we remove the module which will call mtk_jpeg_remove > > to make cleanup, there may be a unfinished work. The > > possible sequence is as follows, which will cause a > > typical UAF bug. > > > > Fix it by canceling the work before cleanup in the mtk_jpeg_remove > > > > CPU0 CPU1 > > > > |mtk_jpeg_job_timeout_work > > mtk_jpeg_remove | > > v4l2_m2m_release | > > kfree(m2m_dev); | > > | > > | v4l2_m2m_get_curr_priv > > | m2m_dev->curr_ctx //use > > Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com> > > -- > Regards, > Alexandre
Friendly ping Zheng Hacker <hackerzheng666@gmail.com> 于2023年7月16日周日 00:08写道: > > Hi, > > This issue has not been resolved for a long time. Is there anyone who can help? > > Best regards, > Zheng > > Alexandre Mergnat <amergnat@baylibre.com> 于2023年7月7日周五 22:11写道: > > > > > > > > On 07/07/2023 11:24, Zheng Wang wrote: > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > > > and mtk_jpeg_enc_device_run may be called to start the > > > work. > > > If we remove the module which will call mtk_jpeg_remove > > > to make cleanup, there may be a unfinished work. The > > > possible sequence is as follows, which will cause a > > > typical UAF bug. > > > > > > Fix it by canceling the work before cleanup in the mtk_jpeg_remove > > > > > > CPU0 CPU1 > > > > > > |mtk_jpeg_job_timeout_work > > > mtk_jpeg_remove | > > > v4l2_m2m_release | > > > kfree(m2m_dev); | > > > | > > > | v4l2_m2m_get_curr_priv > > > | m2m_dev->curr_ctx //use > > > > Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com> > > > > -- > > Regards, > > Alexandre
On 18/07/2023 05:07, Zheng Hacker wrote: > Friendly ping > > Zheng Hacker <hackerzheng666@gmail.com> 于2023年7月16日周日 00:08写道: >> >> Hi, >> >> This issue has not been resolved for a long time. Is there anyone who can help? >> >> Best regards, >> Zheng >> >> Alexandre Mergnat <amergnat@baylibre.com> 于2023年7月7日周五 22:11写道: >>> >>> >>> >>> On 07/07/2023 11:24, Zheng Wang wrote: >>>> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with >>>> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run >>>> and mtk_jpeg_enc_device_run may be called to start the >>>> work. >>>> If we remove the module which will call mtk_jpeg_remove >>>> to make cleanup, there may be a unfinished work. The >>>> possible sequence is as follows, which will cause a >>>> typical UAF bug. >>>> >>>> Fix it by canceling the work before cleanup in the mtk_jpeg_remove >>>> >>>> CPU0 CPU1 >>>> >>>> |mtk_jpeg_job_timeout_work >>>> mtk_jpeg_remove | >>>> v4l2_m2m_release | >>>> kfree(m2m_dev); | >>>> | >>>> | v4l2_m2m_get_curr_priv >>>> | m2m_dev->curr_ctx //use >>> >>> Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com> >>> >>> -- >>> Regards, >>> Alexandre Hi Zheng, If you asking me to merge patch, sorry but I can't, I'm just a reviewer. I invite you to ping the maintainers directly: Bin Liu <bin.liu@mediatek.com> (supporter:MEDIATEK JPEG DRIVER) Mauro Carvalho Chehab <mchehab@kernel.org> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB)) Matthias Brugger <matthias.bgg@gmail.com> (maintainer:ARM/Mediatek SoC support) Otherwise, I misunderstood what you asking me. If so, can you rephrase your question please? -- Regards, Alexandre
Thanks dor your kind reply. I'll try to connect them later. Best regards, Zheng Alexandre Mergnat <amergnat@baylibre.com> 于2023年7月19日周三 18:17写道: > > > > On 18/07/2023 05:07, Zheng Hacker wrote: > > Friendly ping > > > > Zheng Hacker <hackerzheng666@gmail.com> 于2023年7月16日周日 00:08写道: > >> > >> Hi, > >> > >> This issue has not been resolved for a long time. Is there anyone who can help? > >> > >> Best regards, > >> Zheng > >> > >> Alexandre Mergnat <amergnat@baylibre.com> 于2023年7月7日周五 22:11写道: > >>> > >>> > >>> > >>> On 07/07/2023 11:24, Zheng Wang wrote: > >>>> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > >>>> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > >>>> and mtk_jpeg_enc_device_run may be called to start the > >>>> work. > >>>> If we remove the module which will call mtk_jpeg_remove > >>>> to make cleanup, there may be a unfinished work. The > >>>> possible sequence is as follows, which will cause a > >>>> typical UAF bug. > >>>> > >>>> Fix it by canceling the work before cleanup in the mtk_jpeg_remove > >>>> > >>>> CPU0 CPU1 > >>>> > >>>> |mtk_jpeg_job_timeout_work > >>>> mtk_jpeg_remove | > >>>> v4l2_m2m_release | > >>>> kfree(m2m_dev); | > >>>> | > >>>> | v4l2_m2m_get_curr_priv > >>>> | m2m_dev->curr_ctx //use > >>> > >>> Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com> > >>> > >>> -- > >>> Regards, > >>> Alexandre > > Hi Zheng, > > If you asking me to merge patch, sorry but I can't, I'm just a reviewer. > I invite you to ping the maintainers directly: > > Bin Liu <bin.liu@mediatek.com> (supporter:MEDIATEK JPEG DRIVER) > Mauro Carvalho Chehab <mchehab@kernel.org> (maintainer:MEDIA INPUT > INFRASTRUCTURE (V4L/DVB)) > Matthias Brugger <matthias.bgg@gmail.com> (maintainer:ARM/Mediatek SoC > support) > > Otherwise, I misunderstood what you asking me. If so, can you rephrase > your question please? > > -- > Regards, > Alexandre
© 2016 - 2026 Red Hat, Inc.