If a client driver uses the DW DMAC engine more intensively than usual,
with occasional DMA-transfer terminations and restarts, the following
error can randomly show up in the system log:
> dma dma0chan0: BUG: XFER bit set, but channel not idle!
For instance, this happens when the 8250 UART port driver handles
looped-back high-speed traffic (in my case > 1.5 Mbaud) by means of the
DMA-engine interface.
The error is caused by the two-stage nature of the DW DMAC IRQ-handling
procedure and by the critical section being dropped between the stages.
In particular, if the DMA transfer is terminated and restarted either:
1. after the IRQ handler has submitted the tasklet but before the
tasklet starts handling the DMA descriptors in dwc_scan_descriptors();
2. after the XFER completion flag has been detected in
dwc_scan_descriptors(), but before dwc_complete_all() is called,
then the error denoted above is printed because the completion of the
last transfer overlaps with the execution of the new one.
Two places need to be altered in order to fix the problem:
1. Clear the IRQ status in dwc_chan_disable(). This prevents the
dwc_scan_descriptors() call if the DMA transfer is restarted in the
middle of the two-stage IRQ-handling procedure.
2. Execute the dwc_complete_all() code inseparably (in the same atomic
section) from the DMA-descriptor scanning procedure. This prevents a
DMA-transfer restart after the transfer completion has been detected
but before the completion is actually handled.
Fixes: 69cea5a00d31 ("dmaengine/dw_dmac: Replace spin_lock* with irqsave variants and enable submission from callback")
Fixes: 3bfb1d20b547 ("dmaengine: Driver for the Synopsys DesignWare DMA controller")
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
---
drivers/dma/dw/core.c | 54 ++++++++++++++++++++-----------------------
1 file changed, 25 insertions(+), 29 deletions(-)
diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c
index af1871646eb9..fbc46cbfe259 100644
--- a/drivers/dma/dw/core.c
+++ b/drivers/dma/dw/core.c
@@ -143,6 +143,12 @@ static inline void dwc_chan_disable(struct dw_dma *dw, struct dw_dma_chan *dwc)
channel_clear_bit(dw, CH_EN, dwc->mask);
while (dma_readl(dw, CH_EN) & dwc->mask)
cpu_relax();
+
+ dma_writel(dw, CLEAR.XFER, dwc->mask);
+ dma_writel(dw, CLEAR.BLOCK, dwc->mask);
+ dma_writel(dw, CLEAR.SRC_TRAN, dwc->mask);
+ dma_writel(dw, CLEAR.DST_TRAN, dwc->mask);
+ dma_writel(dw, CLEAR.ERROR, dwc->mask);
}
/*----------------------------------------------------------------------*/
@@ -259,34 +265,6 @@ dwc_descriptor_complete(struct dw_dma_chan *dwc, struct dw_desc *desc,
dmaengine_desc_callback_invoke(&cb, NULL);
}
-static void dwc_complete_all(struct dw_dma *dw, struct dw_dma_chan *dwc)
-{
- struct dw_desc *desc, *_desc;
- LIST_HEAD(list);
- unsigned long flags;
-
- spin_lock_irqsave(&dwc->lock, flags);
- if (dma_readl(dw, CH_EN) & dwc->mask) {
- dev_err(chan2dev(&dwc->chan),
- "BUG: XFER bit set, but channel not idle!\n");
-
- /* Try to continue after resetting the channel... */
- dwc_chan_disable(dw, dwc);
- }
-
- /*
- * Submit queued descriptors ASAP, i.e. before we go through
- * the completed ones.
- */
- list_splice_init(&dwc->active_list, &list);
- dwc_dostart_first_queued(dwc);
-
- spin_unlock_irqrestore(&dwc->lock, flags);
-
- list_for_each_entry_safe(desc, _desc, &list, desc_node)
- dwc_descriptor_complete(dwc, desc, true);
-}
-
/* Returns how many bytes were already received from source */
static inline u32 dwc_get_sent(struct dw_dma_chan *dwc)
{
@@ -303,6 +281,7 @@ static void dwc_scan_descriptors(struct dw_dma *dw, struct dw_dma_chan *dwc)
struct dw_desc *child;
u32 status_xfer;
unsigned long flags;
+ LIST_HEAD(list);
spin_lock_irqsave(&dwc->lock, flags);
status_xfer = dma_readl(dw, RAW.XFER);
@@ -341,9 +320,26 @@ static void dwc_scan_descriptors(struct dw_dma *dw, struct dw_dma_chan *dwc)
clear_bit(DW_DMA_IS_SOFT_LLP, &dwc->flags);
}
+ /*
+ * No more active descriptors left to handle. So submit the
+ * queued descriptors and finish up the already handled ones.
+ */
+ if (dma_readl(dw, CH_EN) & dwc->mask) {
+ dev_err(chan2dev(&dwc->chan),
+ "BUG: XFER bit set, but channel not idle!\n");
+
+ /* Try to continue after resetting the channel... */
+ dwc_chan_disable(dw, dwc);
+ }
+
+ list_splice_init(&dwc->active_list, &list);
+ dwc_dostart_first_queued(dwc);
+
spin_unlock_irqrestore(&dwc->lock, flags);
- dwc_complete_all(dw, dwc);
+ list_for_each_entry_safe(desc, _desc, &list, desc_node)
+ dwc_descriptor_complete(dwc, desc, true);
+
return;
}
--
2.43.0
On Wed, Sep 11, 2024 at 09:46:10PM +0300, Serge Semin wrote:
> [...]

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman. You have sent him
a patch that has triggered this response. He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created. Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- You have marked a patch with a "Fixes:" tag for a commit that is in an
  older released kernel, yet you do not have a cc: stable line in the
  signed-off-by area at all, which means that the patch will not be
  applied to any older kernel releases. To properly fix this, please
  follow the documented rules in the
  Documentation/process/stable-kernel-rules.rst file for how to resolve
  this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot