[PATCH 4/4] media: dw100: Split interrupt handler to fix timeout error

Stefan Klug posted 4 patches 1 month ago
There is a newer version of this series
Posted by Stefan Klug 1 month ago
The previous commit converted the interrupt handler to a threaded
handler. This sometimes leads to DW100_INTERRUPT_STATUS_INT_ERR_TIME_OUT
being set after changing the vertex map. It shows up as repeated error
messages in dmesg:

dw100 32e30000.dwe: Interrupt error: 0x1

As there is no documentation available, it is unclear why that happens
and whether this condition can simply be ignored. When the interrupt
handling is split into two parts, with only dw100_job_finish() called
from the threaded part, the error no longer occurs.

Signed-off-by: Stefan Klug <stefan.klug@ideasonboard.com>

---

As noted in the cover letter, this patch is still intended to start the
discussion about a proper fix.

While writing this I noticed that when
DW100_INTERRUPT_STATUS_INT_FRAME_DONE is set, the job is finished
without error even when err_irqs != 0. Is that on purpose?
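
To make the question concrete, here is a minimal userspace sketch of the
two possible policies. It is not driver code: the DEMO_* bit values and
the demo_* functions are hypothetical, do not reflect the DW100 register
layout, and leave out the masking of
DW100_INTERRUPT_STATUS_INT_ERR_FRAME_DONE that the handler performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for illustration; NOT the DW100 register map. */
#define DEMO_INT_FRAME_DONE	(1u << 0)
#define DEMO_INT_ERR_TIME_OUT	(1u << 1)
#define DEMO_INT_ERR_OTHER	(1u << 2)
#define DEMO_INT_ERR_MASK	(DEMO_INT_ERR_TIME_OUT | DEMO_INT_ERR_OTHER)

/*
 * Policy as described in the note above: the job only counts as failed
 * when the frame done bit is missing; error bits are not consulted.
 */
static bool demo_frame_failed(uint32_t pending)
{
	bool failed = true;

	if (pending & DEMO_INT_FRAME_DONE)
		failed = false;

	return failed;
}

/* Stricter alternative: any error bit marks the job as failed. */
static bool demo_frame_failed_strict(uint32_t pending)
{
	return !(pending & DEMO_INT_FRAME_DONE) ||
	       (pending & DEMO_INT_ERR_MASK) != 0;
}

/* The policies only disagree when frame done and an error are both set. */
static void demo_check(void)
{
	uint32_t both = DEMO_INT_FRAME_DONE | DEMO_INT_ERR_TIME_OUT;

	assert(!demo_frame_failed(both));
	assert(demo_frame_failed_strict(both));
}
```

Whether the stricter variant is appropriate depends on what the error
bits mean on real hardware, which is exactly the open question.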
---
 drivers/media/platform/nxp/dw100/dw100.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/media/platform/nxp/dw100/dw100.c b/drivers/media/platform/nxp/dw100/dw100.c
index 4f5ef70e5f4a052fb5f208e35f8785f9d30dc54e..67d941bdf768398edc611c94896cc42a70b88225 100644
--- a/drivers/media/platform/nxp/dw100/dw100.c
+++ b/drivers/media/platform/nxp/dw100/dw100.c
@@ -10,6 +10,7 @@
 #include <linux/clk.h>
 #include <linux/debugfs.h>
 #include <linux/interrupt.h>
+#include <linux/irqreturn.h>
 #include <linux/io.h>
 #include <linux/minmax.h>
 #include <linux/module.h>
@@ -74,6 +75,7 @@ struct dw100_device {
 	struct clk_bulk_data		*clks;
 	int				num_clks;
 	struct dentry			*debugfs_root;
+	bool				frame_failed;
 };
 
 struct dw100_q_data {
@@ -1411,7 +1413,8 @@ static irqreturn_t dw100_irq_handler(int irq, void *dev_id)
 {
 	struct dw100_device *dw_dev = dev_id;
 	u32 pending_irqs, err_irqs, frame_done_irq;
-	bool with_error = true;
+
+	dw_dev->frame_failed = true;
 
 	pending_irqs = dw_hw_get_pending_irqs(dw_dev);
 	frame_done_irq = pending_irqs & DW100_INTERRUPT_STATUS_INT_FRAME_DONE;
@@ -1419,7 +1422,7 @@ static irqreturn_t dw100_irq_handler(int irq, void *dev_id)
 
 	if (frame_done_irq) {
 		dev_dbg(&dw_dev->pdev->dev, "Frame done interrupt\n");
-		with_error = false;
+		dw_dev->frame_failed = false;
 		err_irqs &= ~DW100_INTERRUPT_STATUS_INT_ERR_STATUS
 			(DW100_INTERRUPT_STATUS_INT_ERR_FRAME_DONE);
 	}
@@ -1432,7 +1435,14 @@ static irqreturn_t dw100_irq_handler(int irq, void *dev_id)
 	dw100_hw_clear_irq(dw_dev, pending_irqs |
 			   DW100_INTERRUPT_STATUS_INT_ERR_TIME_OUT);
 
-	dw100_job_finish(dw_dev, with_error);
+	return IRQ_WAKE_THREAD;
+}
+
+static irqreturn_t dw100_irq_thread_fn(int irq, void *dev_id)
+{
+	struct dw100_device *dw_dev = dev_id;
+
+	dw100_job_finish(dw_dev, dw_dev->frame_failed);
 
 	return IRQ_HANDLED;
 }
@@ -1600,8 +1610,8 @@ static int dw100_probe(struct platform_device *pdev)
 
 	pm_runtime_put_sync(&pdev->dev);
 
-	ret = devm_request_threaded_irq(&pdev->dev, irq, NULL,
-					dw100_irq_handler, IRQF_ONESHOT,
+	ret = devm_request_threaded_irq(&pdev->dev, irq, dw100_irq_handler,
+					dw100_irq_thread_fn, IRQF_ONESHOT,
 					dev_name(&pdev->dev), dw_dev);
 	if (ret < 0) {
 		dev_err(&pdev->dev, "Failed to request irq: %d\n", ret);

-- 
2.51.0
Re: [PATCH 4/4] media: dw100: Split interrupt handler to fix timeout error
Posted by Nicolas Dufresne 1 month ago
Hi,

On Monday, 5 January 2026 at 12:35 +0100, Stefan Klug wrote:
> In the previous commit, the interrupt handler was changed to threaded.
> This sometimes leads to DW100_INTERRUPT_STATUS_INT_ERR_TIME_OUT being
> set after changing the vertex map. This can be seen by repeated error
> outputs in dmesg:
> 
> dw100 32e30000.dwe: Interrupt error: 0x1
> 
> As there is no documentation available, it is unclear why that happens
> and if this condition can simply be ignored. By splitting the interrupt
> handling into two parts and only handling the dw100_job_finish() within
> the threaded part, the error does not occur anymore.
> 
> Signed-off-by: Stefan Klug <stefan.klug@ideasonboard.com>

Ok, but arguably, this could be squashed.

Nicolas

Re: [PATCH 4/4] media: dw100: Split interrupt handler to fix timeout error
Posted by Steven Rostedt 1 month ago
On Mon, 05 Jan 2026 14:03:58 -0500
Nicolas Dufresne <nicolas@ndufresne.ca> wrote:

> On Monday, 5 January 2026 at 12:35 +0100, Stefan Klug wrote:
> > In the previous commit, the interrupt handler was changed to threaded.
> > This sometimes leads to DW100_INTERRUPT_STATUS_INT_ERR_TIME_OUT being
> > set after changing the vertex map. This can be seen by repeated error
> > outputs in dmesg:
> > 
> > dw100 32e30000.dwe: Interrupt error: 0x1
> > 
> > As there is no documentation available, it is unclear why that happens
> > and if this condition can simply be ignored. By splitting the interrupt
> > handling into two parts and only handling the dw100_job_finish() within
> > the threaded part, the error does not occur anymore.
> > 
> > Signed-off-by: Stefan Klug <stefan.klug@ideasonboard.com>  
> 
> Ok, but arguably, this could be squashed.

Agreed. Because it doesn't seem to make sense to have a oneshot threaded
irq handler that doesn't have the two parts (non-threaded to acknowledge the
irq, and the threaded to handle it and re-enable it).
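
The two parts can be sketched as a small userspace model. This is an
illustration only, not kernel code: every model_*/MODEL_* name is
hypothetical, the "thread" is called synchronously instead of being
scheduled later, and the status bit value is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not kernel symbols. */
enum model_irqreturn {
	MODEL_IRQ_NONE,
	MODEL_IRQ_HANDLED,
	MODEL_IRQ_WAKE_THREAD,
};

#define MODEL_INT_FRAME_DONE	(1u << 0)	/* made-up bit position */

struct model_dev {
	uint32_t status;		/* pretend interrupt status register */
	bool line_masked;		/* interrupt line state at the controller */
	bool frame_failed;		/* result passed from primary to thread */
	bool masked_during_thread;	/* records what the thread observed */
	int jobs_finished;
	int jobs_failed;
};

/* Primary (hard irq) part: read and acknowledge the device, nothing else. */
static enum model_irqreturn model_primary(struct model_dev *dev)
{
	uint32_t pending = dev->status;

	if (!pending)
		return MODEL_IRQ_NONE;

	dev->frame_failed = !(pending & MODEL_INT_FRAME_DONE);
	dev->status = 0;		/* acknowledge in a timely manner */

	return MODEL_IRQ_WAKE_THREAD;
}

/* Threaded part: the slow work, free to sleep and take locks. */
static enum model_irqreturn model_thread(struct model_dev *dev)
{
	dev->masked_during_thread = dev->line_masked;
	dev->jobs_finished++;
	if (dev->frame_failed)
		dev->jobs_failed++;

	return MODEL_IRQ_HANDLED;
}

/* Oneshot dispatch: the line stays masked until the thread is done. */
static void model_dispatch_oneshot(struct model_dev *dev)
{
	assert(dev != NULL);

	dev->line_masked = true;
	if (model_primary(dev) == MODEL_IRQ_WAKE_THREAD)
		model_thread(dev);	/* the kernel runs this later, in a kthread */
	dev->line_masked = false;
}
```

In this model the status is cleared before any job bookkeeping runs,
and the line is still masked while the threaded part executes.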

-- Steve
Re: [PATCH 4/4] media: dw100: Split interrupt handler to fix timeout error
Posted by Laurent Pinchart 1 month ago
On Mon, Jan 05, 2026 at 04:37:48PM -0500, Steven Rostedt wrote:
> On Mon, 05 Jan 2026 14:03:58 -0500 Nicolas Dufresne wrote:
> > > On Monday, 5 January 2026 at 12:35 +0100, Stefan Klug wrote:
> > > In the previous commit, the interrupt handler was changed to threaded.
> > > This sometimes leads to DW100_INTERRUPT_STATUS_INT_ERR_TIME_OUT being
> > > set after changing the vertex map. This can be seen by repeated error
> > > outputs in dmesg:
> > > 
> > > dw100 32e30000.dwe: Interrupt error: 0x1
> > > 
> > > As there is no documentation available, it is unclear why that happens
> > > and if this condition can simply be ignored. By splitting the interrupt
> > > handling into two parts and only handling the dw100_job_finish() within
> > > the threaded part, the error does not occur anymore.
> > > 
> > > Signed-off-by: Stefan Klug <stefan.klug@ideasonboard.com>  
> > 
> > Ok, but arguably, this could be squashed.

Stefan mentioned that in the cover letter, yes. The patches are
currently split because 4/4 shouldn't be needed based on our
understanding of the hardware. We're hoping for feedback on the issue
from someone with knowledge of the DW100 and access to its
documentation.

> Agreed. Because it doesn't seem to make sense to have a oneshot threaded
> irq handler that doesn't have the two parts (non-threaded to acknowledge the
> irq, and the threaded to handle it and re-enable it).

Why is that so? Isn't oneshot meant exactly for this purpose? It's
documented as not re-enabling the interrupt after the hardirq handler
(which is absent after 3/4) returns, so why would a hardirq handler be
mandatory?
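
For reference, my understanding of the NULL-handler case can be modelled
in userspace as below. This is a sketch, not the genirq code: the
sketch_*/SKETCH_* names and the flag value are hypothetical. It mirrors
two rules: a NULL primary handler is replaced by a default one that only
wakes the thread, and that combination is rejected without the oneshot
flag. The real check is also skipped for irqchips flagged as
oneshot-safe, which the sketch ignores.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical value; the kernel's IRQF_ONESHOT is defined elsewhere. */
#define SKETCH_IRQF_ONESHOT	0x1u

enum { SKETCH_IRQ_HANDLED = 1, SKETCH_IRQ_WAKE_THREAD = 2 };

typedef int (*sketch_handler_t)(void *dev_id);

struct sketch_irqaction {
	sketch_handler_t handler;	/* primary, runs in hard irq context */
	sketch_handler_t thread_fn;	/* runs in the irq thread */
	unsigned int flags;
};

/* Stand-in for the default primary handler: just wake the thread. */
static int sketch_default_primary(void *dev_id)
{
	(void)dev_id;
	return SKETCH_IRQ_WAKE_THREAD;
}

/* Example thread function used to exercise the model. */
static int sketch_thread_fn(void *dev_id)
{
	(void)dev_id;
	return SKETCH_IRQ_HANDLED;
}

static int sketch_request_threaded_irq(struct sketch_irqaction *act,
				       sketch_handler_t handler,
				       sketch_handler_t thread_fn,
				       unsigned int flags)
{
	assert(act != NULL);

	if (!handler) {
		if (!thread_fn)
			return -EINVAL;	/* nothing to run at all */
		handler = sketch_default_primary;
	}

	/* Without oneshot, the line could fire again before the thread acks. */
	if (handler == sketch_default_primary && !(flags & SKETCH_IRQF_ONESHOT))
		return -EINVAL;

	act->handler = handler;
	act->thread_fn = thread_fn;
	act->flags = flags;

	return 0;
}
```

So a thread-only oneshot handler is a valid configuration in this model;
it only means the device is first touched from thread context.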

-- 
Regards,

Laurent Pinchart
Re: [PATCH 4/4] media: dw100: Split interrupt handler to fix timeout error
Posted by Steven Rostedt 1 month ago
On Tue, 6 Jan 2026 01:44:52 +0200
Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote:

> > Agreed. Because it doesn't seem to make sense to have a oneshot threaded
> > irq handler that doesn't have the two parts (non-threaded to acknowledge the
> > irq, and the threaded to handle it and re-enable it).  
> 
> Why is so ? Isn't oneshot meant exactly for this purpose ? It's
> documented as not reenabling the interrupt after the hardirq handler
> (which is absent after 3/4) returns, why would a hardirq handler be
> mandatory then ?

Because it's timing out. The error in the change log states:

    In the previous commit, the interrupt handler was changed to threaded.
    This sometimes leads to DW100_INTERRUPT_STATUS_INT_ERR_TIME_OUT being
    set after changing the vertex map. This can be seen by repeated error
    outputs in dmesg:

    dw100 32e30000.dwe: Interrupt error: 0x1

It needs to be acknowledged in a timely manner. That is best done in
hard irq context, where no locks need to be taken. It looks like the handler
also disables the interrupt on the device, and the interrupt is re-enabled
after the handler has completed (in thread context).

-- Steve
Re: [PATCH 4/4] media: dw100: Split interrupt handler to fix timeout error
Posted by Laurent Pinchart 1 month ago
On Mon, Jan 05, 2026 at 07:43:50PM -0500, Steven Rostedt wrote:
> On Tue, 6 Jan 2026 01:44:52 +0200 Laurent Pinchart wrote:
> 
> > > Agreed. Because it doesn't seem to make sense to have a oneshot threaded
> > > irq handler that doesn't have the two parts (non-threaded to acknowledge the
> > > irq, and the threaded to handle it and re-enable it).  
> > 
> > Why is so ? Isn't oneshot meant exactly for this purpose ? It's
> > documented as not reenabling the interrupt after the hardirq handler
> > (which is absent after 3/4) returns, why would a hardirq handler be
> > mandatory then ?
> 
> Because it's timing out. The error in the change log states:
> 
>     In the previous commit, the interrupt handler was changed to threaded.
>     This sometimes leads to DW100_INTERRUPT_STATUS_INT_ERR_TIME_OUT being
>     set after changing the vertex map. This can be seen by repeated error
>     outputs in dmesg:
> 
>     dw100 32e30000.dwe: Interrupt error: 0x1
> 
> It needs to be acknowledged in a timely manner. That is best done in the
> hard irq context where no locks need to be taken. It looks like the handler
> also disables the interrupt on the device and will be reenabled after the
> handler has completed (in thread context).

My point is that neither Stefan nor I know why it's "timing out" or
what that means. There's no publicly available documentation. I'd like
to get to the bottom of this.

-- 
Regards,

Laurent Pinchart
Re: [PATCH 4/4] media: dw100: Split interrupt handler to fix timeout error
Posted by Steven Rostedt 1 month ago
On Tue, 6 Jan 2026 02:51:35 +0200
Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote:

> My point is that we (neither I nor Stefan) don't know why it's "timing
> out" and what it means. There's no documentation publicly available.
> I'd like to get to the bottom of this.

OK, that's beyond my knowledge as it's independent from PREEMPT_RT ;-)

-- Steve