[PATCH 1/7] drm/tidss: Fix issue in irq handling causing irq-flood issue

Tomi Valkeinen posted 7 patches 1 month ago
[PATCH 1/7] drm/tidss: Fix issue in irq handling causing irq-flood issue
Posted by Tomi Valkeinen 1 month ago
It has been observed that sometimes DSS will trigger an interrupt and
the top level interrupt (DISPC_IRQSTATUS) is not zero, but the VP and
VID level interrupt-statuses are zero.

As the top level irqstatus is supposed to tell whether we have VP/VID
interrupts, the thinking of the driver authors was that this particular
case could never happen. Thus the driver only clears the DISPC_IRQSTATUS
bits which has corresponding interrupts in VP/VID status. So when this
issue happens, the driver will not clear DISPC_IRQSTATUS, and we get an
interrupt flood.

It is unclear why the issue happens. It could be a race issue in the
driver, but no such race has been found. It could also be an issue with
the HW. However a similar case can be easily triggered by manually
writing to DISPC_IRQSTATUS_RAW. This will forcibly set a bit in the
DISPC_IRQSTATUS and trigger an interrupt, and as the driver never clears
the bit, we get an interrupt flood.

To fix the issue, always clear DISPC_IRQSTATUS. The concern with this
solution is that if the top level irqstatus is the one that triggers the
interrupt, always clearing DISPC_IRQSTATUS might leave some interrupts
unhandled if VP/VID interrupt statuses have bits set. However, testing
shows that if any of the irqstatuses is set (i.e. even if
DISPC_IRQSTATUS == 0, but a VID irqstatus has a bit set), we will get an
interrupt.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Co-developed-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Bin Liu <b-liu@ti.com>
Co-developed-by: Devarsh Thakkar <devarsht@ti.com>
Signed-off-by: Devarsh Thakkar <devarsht@ti.com>
Co-developed-by: Jonathan Cormier <jcormier@criticallink.com>
Signed-off-by: Jonathan Cormier <jcormier@criticallink.com>
Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem")
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/tidss/tidss_dispc.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
index 1ad711f8d2a8..f81111067578 100644
--- a/drivers/gpu/drm/tidss/tidss_dispc.c
+++ b/drivers/gpu/drm/tidss/tidss_dispc.c
@@ -780,24 +780,20 @@ static
 void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask)
 {
 	unsigned int i;
-	u32 top_clear = 0;
 
 	for (i = 0; i < dispc->feat->num_vps; ++i) {
-		if (clearmask & DSS_IRQ_VP_MASK(i)) {
+		if (clearmask & DSS_IRQ_VP_MASK(i))
 			dispc_k3_vp_write_irqstatus(dispc, i, clearmask);
-			top_clear |= BIT(i);
-		}
 	}
 	for (i = 0; i < dispc->feat->num_planes; ++i) {
-		if (clearmask & DSS_IRQ_PLANE_MASK(i)) {
+		if (clearmask & DSS_IRQ_PLANE_MASK(i))
 			dispc_k3_vid_write_irqstatus(dispc, i, clearmask);
-			top_clear |= BIT(4 + i);
-		}
 	}
 	if (dispc->feat->subrev == DISPC_K2G)
 		return;
 
-	dispc_write(dispc, DISPC_IRQSTATUS, top_clear);
+	/* always clear the top level irqstatus */
+	dispc_write(dispc, DISPC_IRQSTATUS, dispc_read(dispc, DISPC_IRQSTATUS));
 
 	/* Flush posted writes */
 	dispc_read(dispc, DISPC_IRQSTATUS);

-- 
2.43.0
Re: [PATCH 1/7] drm/tidss: Fix issue in irq handling causing irq-flood issue
Posted by Aradhya Bhatia 1 day, 10 hours ago
Hi Tomi, Devarsh,

On 10/21/24 19:37, Tomi Valkeinen wrote:
> It has been observed that sometimes DSS will trigger an interrupt and
> the top level interrupt (DISPC_IRQSTATUS) is not zero, but the VP and
> VID level interrupt-statuses are zero.

Does this mean that there was a legitimate interrupt that potentially
went unrecognized? Or that there was a, for the lack of a better word,
fake interrupt trigger that doesn't need handling but just clearing?

> 
> As the top level irqstatus is supposed to tell whether we have VP/VID
> interrupts, the thinking of the driver authors was that this particular
> case could never happen. Thus the driver only clears the DISPC_IRQSTATUS
> bits which has corresponding interrupts in VP/VID status. So when this
> issue happens, the driver will not clear DISPC_IRQSTATUS, and we get an
> interrupt flood.
> 
> It is unclear why the issue happens. It could be a race issue in the
> driver, but no such race has been found. It could also be an issue with
> the HW. However a similar case can be easily triggered by manually
> writing to DISPC_IRQSTATUS_RAW. This will forcibly set a bit in the
> DISPC_IRQSTATUS and trigger an interrupt, and as the driver never clears
> the bit, we get an interrupt flood.
> 
> To fix the issue, always clear DISPC_IRQSTATUS. The concern with this
> solution is that if the top level irqstatus is the one that triggers the
> interrupt, always clearing DISPC_IRQSTATUS might leave some interrupts
> unhandled if VP/VID interrupt statuses have bits set. However, testing
> shows that if any of the irqstatuses is set (i.e. even if
> DISPC_IRQSTATUS == 0, but a VID irqstatus has a bit set), we will get an
> interrupt.

Does this mean if VID/VP irqstatus has been set right around the time
the equivalent DISPC_IRQSTATUS bit is being cleared, the equivalent
DISPC_IRQSTATUS bit is going to get set again, and make the driver
handle the event as we expect it to?

> 
> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
> Co-developed-by: Bin Liu <b-liu@ti.com>
> Signed-off-by: Bin Liu <b-liu@ti.com>
> Co-developed-by: Devarsh Thakkar <devarsht@ti.com>
> Signed-off-by: Devarsh Thakkar <devarsht@ti.com>
> Co-developed-by: Jonathan Cormier <jcormier@criticallink.com>
> Signed-off-by: Jonathan Cormier <jcormier@criticallink.com>
> Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem")
> Cc: stable@vger.kernel.org
> ---
>  drivers/gpu/drm/tidss/tidss_dispc.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
> index 1ad711f8d2a8..f81111067578 100644
> --- a/drivers/gpu/drm/tidss/tidss_dispc.c
> +++ b/drivers/gpu/drm/tidss/tidss_dispc.c
> @@ -780,24 +780,20 @@ static
>  void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask)
>  {
>  	unsigned int i;
> -	u32 top_clear = 0;
>  
>  	for (i = 0; i < dispc->feat->num_vps; ++i) {
> -		if (clearmask & DSS_IRQ_VP_MASK(i)) {
> +		if (clearmask & DSS_IRQ_VP_MASK(i))
>  			dispc_k3_vp_write_irqstatus(dispc, i, clearmask);
> -			top_clear |= BIT(i);
> -		}
>  	}
>  	for (i = 0; i < dispc->feat->num_planes; ++i) {
> -		if (clearmask & DSS_IRQ_PLANE_MASK(i)) {
> +		if (clearmask & DSS_IRQ_PLANE_MASK(i))
>  			dispc_k3_vid_write_irqstatus(dispc, i, clearmask);
> -			top_clear |= BIT(4 + i);
> -		}
>  	}

nit: Maybe these for-loop braces could be dropped as well.

Otherwise, LGTM,

Reviewed-by: Aradhya Bhatia <aradhya.bhatia@linux.dev>

Regards
Aradhya


[..]
Re: [PATCH 1/7] drm/tidss: Fix issue in irq handling causing irq-flood issue
Posted by Tomi Valkeinen 17 hours ago
Hi,

On 24/11/2024 19:18, Aradhya Bhatia wrote:
> Hi Tomi, Devarsh,
> 
> On 10/21/24 19:37, Tomi Valkeinen wrote:
>> It has been observed that sometimes DSS will trigger an interrupt and
>> the top level interrupt (DISPC_IRQSTATUS) is not zero, but the VP and
>> VID level interrupt-statuses are zero.
> 
> Does this mean that there was a legitimate interrupt that potentially
> went unrecognized? Or that there was a, for the lack of a better word,
> fake interrupt trigger that doesn't need handling but just clearing?

I don't have an answer to that. I haven't been able to trigger this 
issue, and I guess it's difficult to say for certain in any case.

My guess is that it's some kind of race issue either in the HW or the 
combination of HW+SW.

>> As the top level irqstatus is supposed to tell whether we have VP/VID
>> interrupts, the thinking of the driver authors was that this particular
>> case could never happen. Thus the driver only clears the DISPC_IRQSTATUS
>> bits which has corresponding interrupts in VP/VID status. So when this
>> issue happens, the driver will not clear DISPC_IRQSTATUS, and we get an
>> interrupt flood.
>>
>> It is unclear why the issue happens. It could be a race issue in the
>> driver, but no such race has been found. It could also be an issue with
>> the HW. However a similar case can be easily triggered by manually
>> writing to DISPC_IRQSTATUS_RAW. This will forcibly set a bit in the
>> DISPC_IRQSTATUS and trigger an interrupt, and as the driver never clears
>> the bit, we get an interrupt flood.
>>
>> To fix the issue, always clear DISPC_IRQSTATUS. The concern with this
>> solution is that if the top level irqstatus is the one that triggers the
>> interrupt, always clearing DISPC_IRQSTATUS might leave some interrupts
>> unhandled if VP/VID interrupt statuses have bits set. However, testing
>> shows that if any of the irqstatuses is set (i.e. even if
>> DISPC_IRQSTATUS == 0, but a VID irqstatus has a bit set), we will get an
>> interrupt.
> 
> Does this mean if VID/VP irqstatus has been set right around the time
> the equivalent DISPC_IRQSTATUS bit is being cleared, the equivalent
> DISPC_IRQSTATUS bit is going to get set again, and make the driver
> handle the event as we expect it to?

(If I recall right) no, DISPC_IRQSTATUS won't be set. But it doesn't 
matter, the interrupt will be triggered anyway, and the driver will 
handle the interrupt.

>>
>> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
>> Co-developed-by: Bin Liu <b-liu@ti.com>
>> Signed-off-by: Bin Liu <b-liu@ti.com>
>> Co-developed-by: Devarsh Thakkar <devarsht@ti.com>
>> Signed-off-by: Devarsh Thakkar <devarsht@ti.com>
>> Co-developed-by: Jonathan Cormier <jcormier@criticallink.com>
>> Signed-off-by: Jonathan Cormier <jcormier@criticallink.com>
>> Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem")
>> Cc: stable@vger.kernel.org
>> ---
>>   drivers/gpu/drm/tidss/tidss_dispc.c | 12 ++++--------
>>   1 file changed, 4 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
>> index 1ad711f8d2a8..f81111067578 100644
>> --- a/drivers/gpu/drm/tidss/tidss_dispc.c
>> +++ b/drivers/gpu/drm/tidss/tidss_dispc.c
>> @@ -780,24 +780,20 @@ static
>>   void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask)
>>   {
>>   	unsigned int i;
>> -	u32 top_clear = 0;
>>   
>>   	for (i = 0; i < dispc->feat->num_vps; ++i) {
>> -		if (clearmask & DSS_IRQ_VP_MASK(i)) {
>> +		if (clearmask & DSS_IRQ_VP_MASK(i))
>>   			dispc_k3_vp_write_irqstatus(dispc, i, clearmask);
>> -			top_clear |= BIT(i);
>> -		}
>>   	}
>>   	for (i = 0; i < dispc->feat->num_planes; ++i) {
>> -		if (clearmask & DSS_IRQ_PLANE_MASK(i)) {
>> +		if (clearmask & DSS_IRQ_PLANE_MASK(i))
>>   			dispc_k3_vid_write_irqstatus(dispc, i, clearmask);
>> -			top_clear |= BIT(4 + i);
>> -		}
>>   	}
> 
> nit: Maybe these for-loop braces could be dropped as well.

I like to have braces if there are multiple lines under it.

> Otherwise, LGTM,
> 
> Reviewed-by: Aradhya Bhatia <aradhya.bhatia@linux.dev>

Thanks!

  Tomi
Re: [PATCH 1/7] drm/tidss: Fix issue in irq handling causing irq-flood issue
Posted by Jon Cormier 1 month ago
On Mon, Oct 21, 2024 at 10:08 AM Tomi Valkeinen
<tomi.valkeinen@ideasonboard.com> wrote:
>
> It has been observed that sometimes DSS will trigger an interrupt and
> the top level interrupt (DISPC_IRQSTATUS) is not zero, but the VP and
> VID level interrupt-statuses are zero.
>
> As the top level irqstatus is supposed to tell whether we have VP/VID
> interrupts, the thinking of the driver authors was that this particular
> case could never happen. Thus the driver only clears the DISPC_IRQSTATUS
> bits which has corresponding interrupts in VP/VID status. So when this
> issue happens, the driver will not clear DISPC_IRQSTATUS, and we get an
> interrupt flood.
>
> It is unclear why the issue happens. It could be a race issue in the
> driver, but no such race has been found. It could also be an issue with
> the HW. However a similar case can be easily triggered by manually
> writing to DISPC_IRQSTATUS_RAW. This will forcibly set a bit in the
> DISPC_IRQSTATUS and trigger an interrupt, and as the driver never clears
> the bit, we get an interrupt flood.
>
> To fix the issue, always clear DISPC_IRQSTATUS. The concern with this
> solution is that if the top level irqstatus is the one that triggers the
> interrupt, always clearing DISPC_IRQSTATUS might leave some interrupts
> unhandled if VP/VID interrupt statuses have bits set. However, testing
> shows that if any of the irqstatuses is set (i.e. even if
> DISPC_IRQSTATUS == 0, but a VID irqstatus has a bit set), we will get an
> interrupt.
>
> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
> Co-developed-by: Bin Liu <b-liu@ti.com>
> Signed-off-by: Bin Liu <b-liu@ti.com>
> Co-developed-by: Devarsh Thakkar <devarsht@ti.com>
> Signed-off-by: Devarsh Thakkar <devarsht@ti.com>
> Co-developed-by: Jonathan Cormier <jcormier@criticallink.com>
> Signed-off-by: Jonathan Cormier <jcormier@criticallink.com>
> Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem")
> Cc: stable@vger.kernel.org
> ---
>  drivers/gpu/drm/tidss/tidss_dispc.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)

I assume a reviewed by doesn't make sense since I co-developed this
patch but adding my tested by, hopefully, that makes sense.

Tested an equivalent patch for several weeks.
Tested-by: Jonathan Cormier <jcormier@criticallink.com>
>
> diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
> index 1ad711f8d2a8..f81111067578 100644
> --- a/drivers/gpu/drm/tidss/tidss_dispc.c
> +++ b/drivers/gpu/drm/tidss/tidss_dispc.c
> @@ -780,24 +780,20 @@ static
>  void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask)
>  {
>         unsigned int i;
> -       u32 top_clear = 0;
>
>         for (i = 0; i < dispc->feat->num_vps; ++i) {
> -               if (clearmask & DSS_IRQ_VP_MASK(i)) {
> +               if (clearmask & DSS_IRQ_VP_MASK(i))
>                         dispc_k3_vp_write_irqstatus(dispc, i, clearmask);
> -                       top_clear |= BIT(i);
> -               }
>         }
>         for (i = 0; i < dispc->feat->num_planes; ++i) {
> -               if (clearmask & DSS_IRQ_PLANE_MASK(i)) {
> +               if (clearmask & DSS_IRQ_PLANE_MASK(i))
>                         dispc_k3_vid_write_irqstatus(dispc, i, clearmask);
> -                       top_clear |= BIT(4 + i);
> -               }
>         }
>         if (dispc->feat->subrev == DISPC_K2G)
>                 return;
>
> -       dispc_write(dispc, DISPC_IRQSTATUS, top_clear);
> +       /* always clear the top level irqstatus */
> +       dispc_write(dispc, DISPC_IRQSTATUS, dispc_read(dispc, DISPC_IRQSTATUS));
>
>         /* Flush posted writes */
>         dispc_read(dispc, DISPC_IRQSTATUS);
>
> --
> 2.43.0
>