[PATCH 2/2] dmaengine: dw: Fix XFER bit set, but channel not idle error

Serge Semin posted 2 patches 2 months, 2 weeks ago
Posted by Serge Semin 2 months, 2 weeks ago
If a client driver uses the DW DMAC engine device more intensively than
usual, occasionally terminating and restarting DMA transfers, then the
following error can randomly show up in the system log:

> dma dma0chan0: BUG: XFER bit set, but channel not idle!

For instance, that happens when the 8250 UART port driver handles
looped-back high-speed traffic (in my case > 1.5Mbaud) by means of the
DMA-engine interface.

The error is caused by the two-stage nature of the DW DMAC IRQ handling
procedure and by the critical section being broken in between the stages.
In particular, if the DMA transfer is terminated and restarted:
1. after the IRQ handler has scheduled the tasklet but before the tasklet
   has started handling the DMA descriptors in dwc_scan_descriptors();
2. after the XFER completion flag has been detected in the
   dwc_scan_descriptors() method, but before the dwc_complete_all() method
   is called,
then the error above is printed because the completion of the last
transfer overlaps with the execution of the new one.

Two places need to be altered in order to fix the problem:
1. Clear the IRQs in the dwc_chan_disable() method. That prevents the
   dwc_scan_descriptors() method from being called if the DMA transfer is
   restarted in the middle of the two-stage IRQ handling procedure.
2. Execute the dwc_complete_all() code inseparably from (in the same
   atomic section as) the DMA-descriptor scanning procedure. That prevents
   DMA-transfer restarts after a transfer completion has been spotted but
   before the completion is actually processed.
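
A rough way to see both windows, and why each change closes one of them,
is a single-threaded user-space model. This is only a sketch under loud
assumptions: struct chan_model, client_restart(), tasklet_scan() and the
window*_bug_reports() helpers are made-up names for illustration and do not
exist in drivers/dma/dw/core.c, and the "split" flag merely stands in for
the lock being dropped between detection and completion.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one DMA channel. All names here are hypothetical. */
struct chan_model {
	bool ch_en;       /* stands in for the channel's CH_EN bit */
	bool raw_xfer;    /* stands in for the channel's RAW.XFER status bit */
	int bug_reports;  /* how often "XFER set, channel not idle" was hit */
};

/* The hardware starts a transfer: the channel becomes busy. */
static void hw_start(struct chan_model *c)
{
	c->ch_en = true;
}

/* The hardware finishes a transfer: channel idle, XFER status latched. */
static void hw_finish(struct chan_model *c)
{
	assert(c->ch_en);
	c->ch_en = false;
	c->raw_xfer = true;
}

/* The client terminates the transfer and immediately restarts one. */
static void client_restart(struct chan_model *c, bool clear_irqs)
{
	c->ch_en = false;              /* channel disable */
	if (clear_irqs)
		c->raw_xfer = false;   /* change 1: drop the stale status */
	hw_start(c);                   /* the new transfer is now running */
}

/* Second stage: detect the completion, then run the idle check. */
static void tasklet_scan(struct chan_model *c, bool split, bool clear_irqs)
{
	if (!c->raw_xfer)
		return;                /* no completion to handle */
	c->raw_xfer = false;

	/* Without change 2 the lock is dropped here, so a restart slips in. */
	if (split)
		client_restart(c, clear_irqs);

	if (c->ch_en) {
		c->bug_reports++;      /* the error from the log */
		c->ch_en = false;      /* "reset the channel" */
	}
}

/* Window 1: restart between the IRQ and the tasklet. */
int window1_bug_reports(bool clear_irqs)
{
	struct chan_model c = { 0 };

	hw_start(&c);
	hw_finish(&c);                   /* IRQ fires, tasklet is scheduled */
	client_restart(&c, clear_irqs);  /* happens before the tasklet runs */
	tasklet_scan(&c, false, clear_irqs);
	return c.bug_reports;
}

/* Window 2: restart between completion detection and the idle check. */
int window2_bug_reports(bool split)
{
	struct chan_model c = { 0 };

	hw_start(&c);
	hw_finish(&c);
	tasklet_scan(&c, split, true);
	return c.bug_reports;
}
```

In this model window1_bug_reports(false) and window2_bug_reports(true)
each count one report, while window1_bug_reports(true) and
window2_bug_reports(false) count none. Note that clearing the status does
not help in the second window, because the flag has already been consumed
by the time the restart happens; that is why both changes are needed.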

Fixes: 69cea5a00d31 ("dmaengine/dw_dmac: Replace spin_lock* with irqsave variants and enable submission from callback")
Fixes: 3bfb1d20b547 ("dmaengine: Driver for the Synopsys DesignWare DMA controller")
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
---
 drivers/dma/dw/core.c | 54 ++++++++++++++++++++-----------------------
 1 file changed, 25 insertions(+), 29 deletions(-)

diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c
index af1871646eb9..fbc46cbfe259 100644
--- a/drivers/dma/dw/core.c
+++ b/drivers/dma/dw/core.c
@@ -143,6 +143,12 @@ static inline void dwc_chan_disable(struct dw_dma *dw, struct dw_dma_chan *dwc)
 	channel_clear_bit(dw, CH_EN, dwc->mask);
 	while (dma_readl(dw, CH_EN) & dwc->mask)
 		cpu_relax();
+
+	dma_writel(dw, CLEAR.XFER, dwc->mask);
+	dma_writel(dw, CLEAR.BLOCK, dwc->mask);
+	dma_writel(dw, CLEAR.SRC_TRAN, dwc->mask);
+	dma_writel(dw, CLEAR.DST_TRAN, dwc->mask);
+	dma_writel(dw, CLEAR.ERROR, dwc->mask);
 }
 
 /*----------------------------------------------------------------------*/
@@ -259,34 +265,6 @@ dwc_descriptor_complete(struct dw_dma_chan *dwc, struct dw_desc *desc,
 	dmaengine_desc_callback_invoke(&cb, NULL);
 }
 
-static void dwc_complete_all(struct dw_dma *dw, struct dw_dma_chan *dwc)
-{
-	struct dw_desc *desc, *_desc;
-	LIST_HEAD(list);
-	unsigned long flags;
-
-	spin_lock_irqsave(&dwc->lock, flags);
-	if (dma_readl(dw, CH_EN) & dwc->mask) {
-		dev_err(chan2dev(&dwc->chan),
-			"BUG: XFER bit set, but channel not idle!\n");
-
-		/* Try to continue after resetting the channel... */
-		dwc_chan_disable(dw, dwc);
-	}
-
-	/*
-	 * Submit queued descriptors ASAP, i.e. before we go through
-	 * the completed ones.
-	 */
-	list_splice_init(&dwc->active_list, &list);
-	dwc_dostart_first_queued(dwc);
-
-	spin_unlock_irqrestore(&dwc->lock, flags);
-
-	list_for_each_entry_safe(desc, _desc, &list, desc_node)
-		dwc_descriptor_complete(dwc, desc, true);
-}
-
 /* Returns how many bytes were already received from source */
 static inline u32 dwc_get_sent(struct dw_dma_chan *dwc)
 {
@@ -303,6 +281,7 @@ static void dwc_scan_descriptors(struct dw_dma *dw, struct dw_dma_chan *dwc)
 	struct dw_desc *child;
 	u32 status_xfer;
 	unsigned long flags;
+	LIST_HEAD(list);
 
 	spin_lock_irqsave(&dwc->lock, flags);
 	status_xfer = dma_readl(dw, RAW.XFER);
@@ -341,9 +320,26 @@ static void dwc_scan_descriptors(struct dw_dma *dw, struct dw_dma_chan *dwc)
 			clear_bit(DW_DMA_IS_SOFT_LLP, &dwc->flags);
 		}
 
+		/*
+		 * No more active descriptors left to handle. So submit the
+		 * queued descriptors and finish up the already handled ones.
+		 */
+		if (dma_readl(dw, CH_EN) & dwc->mask) {
+			dev_err(chan2dev(&dwc->chan),
+				"BUG: XFER bit set, but channel not idle!\n");
+
+			/* Try to continue after resetting the channel... */
+			dwc_chan_disable(dw, dwc);
+		}
+
+		list_splice_init(&dwc->active_list, &list);
+		dwc_dostart_first_queued(dwc);
+
 		spin_unlock_irqrestore(&dwc->lock, flags);
 
-		dwc_complete_all(dw, dwc);
+		list_for_each_entry_safe(desc, _desc, &list, desc_node)
+			dwc_descriptor_complete(dwc, desc, true);
+
 		return;
 	}
 
-- 
2.43.0
Re: [PATCH 2/2] dmaengine: dw: Fix XFER bit set, but channel not idle error
Posted by Greg Kroah-Hartman 2 months, 2 weeks ago
On Wed, Sep 11, 2024 at 09:46:10PM +0300, Serge Semin wrote:
> [...]

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- You have marked a patch with a "Fixes:" tag for a commit that is in an
  older released kernel, yet you do not have a cc: stable line in the
  signed-off-by area at all, which means that the patch will not be
  applied to any older kernel releases.  To properly fix this, please
  follow the documented rules in the
  Documentation/process/stable-kernel-rules.rst file for how to resolve
  this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot
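
For illustration only, and assuming the fix is indeed meant for the stable
trees, the tag block of a resubmission could carry the stable tag described
in Documentation/process/stable-kernel-rules.rst next to the existing
"Fixes:" lines. The placement shown here is a sketch, not the author's
actual follow-up:

```
Fixes: 69cea5a00d31 ("dmaengine/dw_dmac: Replace spin_lock* with irqsave variants and enable submission from callback")
Fixes: 3bfb1d20b547 ("dmaengine: Driver for the Synopsys DesignWare DMA controller")
Cc: stable@vger.kernel.org
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
```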