[PATCH] drm/vblank: fix misuse of drm_WARN in drm_wait_one_vblank()

Vitaliy Shevtsov posted 1 patch 1 year ago
drivers/gpu/drm/drm_vblank.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
[PATCH] drm/vblank: fix misuse of drm_WARN in drm_wait_one_vblank()
Posted by Vitaliy Shevtsov 1 year ago
drm_wait_one_vblank() uses drm_WARN() to check for a time-dependent
condition. Since syzkaller runs the kernel with the panic_on_warn set, this
causes the entire kernel to panic with a "vblank wait timed out on crtc %i"
message.

In this case it does not mean that there is something wrong with the kernel
but is caused by time delays in vblanks handling that the fuzzer introduces
as a side effect when fail_alloc_pages, failslab, fail_usercopy faults are
injected with maximum verbosity. With lower verbosity this issue disappears.

drm_WARN() was introduced here by e8450f51a4b3 ("drm/irq: Implement a
generic vblank_wait function") and it is intended to indicate a failure with
vblank irqs handling by the underlying driver. The issue is raised during
testing of the vkms driver, but it may be potentially reproduced with other
drivers.

Fix this by using drm_warn() instead which does not cause the kernel to
panic with panic_on_warn set, but still provides a way to tell users about
this unexpected condition.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: e8450f51a4b3 ("drm/irq: Implement a generic vblank_wait function")
Cc: stable@vger.kernel.org
Reported-by: syzbot+9a8f87865d5e2e8ef57f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9a8f87865d5e2e8ef57f
Signed-off-by: Vitaliy Shevtsov <v.shevtsov@maxima.ru>
---
 drivers/gpu/drm/drm_vblank.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
index 94e45ed6869d..fa09ff5b1d48 100644
--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -1304,7 +1304,8 @@ void drm_wait_one_vblank(struct drm_device *dev, unsigned int pipe)
 				 last != drm_vblank_count(dev, pipe),
 				 msecs_to_jiffies(100));
 
-	drm_WARN(dev, ret == 0, "vblank wait timed out on crtc %i\n", pipe);
+	if (!ret)
+		drm_warn(dev, "vblank wait timed out on crtc %i\n", pipe);
 
 	drm_vblank_put(dev, pipe);
 }
-- 
2.47.1
Re: [PATCH] drm/vblank: fix misuse of drm_WARN in drm_wait_one_vblank()
Posted by Simona Vetter 1 year ago
On Fri, Jan 10, 2025 at 04:49:13PM +0000, Vitaliy Shevtsov wrote:
> drm_wait_one_vblank() uses drm_WARN() to check for a time-dependent
> condition. Since syzkaller runs the kernel with the panic_on_warn set, this
> causes the entire kernel to panic with a "vblank wait timed out on crtc %i"
> message.
> 
> In this case it does not mean that there is something wrong with the kernel
> but is caused by time delays in vblanks handling that the fuzzer introduces
> as a side effect when fail_alloc_pages, failslab, fail_usercopy faults are
> injected with maximum verbosity. With lower verbosity this issue disappears.

Hm, unless a drivers vblank handling code is extremely fun, there should
be absolutely no memory allocations or user copies in there at all. Hence
I think you're papering over a real bug here. The vblank itself should be
purely a free-wheeling hrtimer, if those stop we have serious kernel bug
at our hands.

Which wouldn't be a big surprise, because we've fixed a _lot_ of bugs in
vkms' vblank and page flip code, it's surprisingly tricky.

Iow, what kind of memory allocation is holding up vkms vblanks?

Cheers, Sima

> drm_WARN() was introduced here by e8450f51a4b3 ("drm/irq: Implement a
> generic vblank_wait function") and it is intended to indicate a failure with
> vblank irqs handling by the underlying driver. The issue is raised during
> testing of the vkms driver, but it may be potentially reproduced with other
> drivers.
> 
> Fix this by using drm_warn() instead which does not cause the kernel to
> panic with panic_on_warn set, but still provides a way to tell users about
> this unexpected condition.
> 
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
> 
> Fixes: e8450f51a4b3 ("drm/irq: Implement a generic vblank_wait function")
> Cc: stable@vger.kernel.org
> Reported-by: syzbot+9a8f87865d5e2e8ef57f@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=9a8f87865d5e2e8ef57f
> Signed-off-by: Vitaliy Shevtsov <v.shevtsov@maxima.ru>
> ---
>  drivers/gpu/drm/drm_vblank.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
> index 94e45ed6869d..fa09ff5b1d48 100644
> --- a/drivers/gpu/drm/drm_vblank.c
> +++ b/drivers/gpu/drm/drm_vblank.c
> @@ -1304,7 +1304,8 @@ void drm_wait_one_vblank(struct drm_device *dev, unsigned int pipe)
>  				 last != drm_vblank_count(dev, pipe),
>  				 msecs_to_jiffies(100));
>  
> -	drm_WARN(dev, ret == 0, "vblank wait timed out on crtc %i\n", pipe);
> +	if (!ret)
> +		drm_warn(dev, "vblank wait timed out on crtc %i\n", pipe);
>  
>  	drm_vblank_put(dev, pipe);
>  }
> -- 
> 2.47.1
> 

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [PATCH] drm/vblank: fix misuse of drm_WARN in drm_wait_one_vblank()
Posted by Vitaliy Shevtsov 1 year ago
On Fri, 10 Jan 2025 20:19:47 +0100
Simona Vetter <simona.vetter@ffwll.ch> wrote:

> Hm, unless a drivers vblank handling code is extremely fun, there should
> be absolutely no memory allocations or user copies in there at all. Hence
> I think you're papering over a real bug here. The vblank itself should be
> purely a free-wheeling hrtimer, if those stop we have serious kernel bug
> at our hands.
> 
> Which wouldn't be a big surprise, because we've fixed a _lot_ of bugs in
> vkms' vblank and page flip code, it's surprisingly tricky.
> 
> Iow, what kind of memory allocation is holding up vkms vblanks?
> 
> Cheers, Sima
> 

I don't think this is because of memory allocation. As far as I can see
there is no memory allocation in vblanks handling. Okay, there is a kzalloc()
call in vkms_atomic_crtc_reset() without checking a pointer but this is
not the root cause of this issue. My first thought was that somehow a
vblank might not be successfully processed by drm_crtc_handle_vblank() in
vkms_vblank_simulate() function which always returns HRTIMER_RESTART even
if a vblank handling failed. But this hypothesis was not confirmed -
all vblanks are fine. The hrtimers in vkms have a hardcoded framedur
value of 16ms and what I can see is that the fault injection creates
some delays by unwinding the call stack when it simulates an allocation
failure and this caused the hrtimers to lag. This what I was able to
investigate while I was debugging the kernel in the gdb.

A similar issue was being discussed in
https://lore.kernel.org/linux-kernel//0000000000009cd8d505bd545452@google.com/T/

-- 
Vitaliy Shevtsov <v.shevtsov@maxima.ru>
Re: [PATCH] drm/vblank: fix misuse of drm_WARN in drm_wait_one_vblank()
Posted by Jani Nikula 1 year ago
On Sat, 11 Jan 2025, Vitaliy Shevtsov <v.shevtsov@maxima.ru> wrote:
> On Fri, 10 Jan 2025 20:19:47 +0100
> Simona Vetter <simona.vetter@ffwll.ch> wrote:
>
>> Hm, unless a drivers vblank handling code is extremely fun, there should
>> be absolutely no memory allocations or user copies in there at all. Hence
>> I think you're papering over a real bug here. The vblank itself should be
>> purely a free-wheeling hrtimer, if those stop we have serious kernel bug
>> at our hands.
>> 
>> Which wouldn't be a big surprise, because we've fixed a _lot_ of bugs in
>> vkms' vblank and page flip code, it's surprisingly tricky.
>> 
>> Iow, what kind of memory allocation is holding up vkms vblanks?
>> 
>> Cheers, Sima
>> 
>
> I don't think this is because of memory allocation. As far as I can see
> there is no memory allocation in vblanks handling. Okay, there is a kzalloc()
> call in vkms_atomic_crtc_reset() without checking a pointer but this is
> not the root cause of this issue. My first thought was that somehow a
> vblank might not be successfully processed by drm_crtc_handle_vblank() in
> vkms_vblank_simulate() function which always returns HRTIMER_RESTART even
> if a vblank handling failed. But this hypothesis was not confirmed -
> all vblanks are fine. The hrtimers in vkms have a hardcoded framedur
> value of 16ms and what I can see is that the fault injection creates
> some delays by unwinding the call stack when it simulates an allocation
> failure and this caused the hrtimers to lag. This what I was able to
> investigate while I was debugging the kernel in the gdb.
>
> A similar issue was being discussed in
> https://lore.kernel.org/linux-kernel//0000000000009cd8d505bd545452@google.com/T/

Seems to me in most cases we do want the WARN, but there are corner
cases. Arguably those should be addressed instead to ensure we won't
ignore the real bugs. We want the warning, you want the panic.


BR,
Jani.


-- 
Jani Nikula, Intel