[PATCH v3] serial: 8250: Fix fifo underflow on flush

John Keeping posted 1 patch 1 year ago
drivers/tty/serial/8250/8250.h      |  2 ++
drivers/tty/serial/8250/8250_dma.c  | 16 ++++++++++++++++
drivers/tty/serial/8250/8250_port.c |  9 +++++++++
3 files changed, 27 insertions(+)
[PATCH v3] serial: 8250: Fix fifo underflow on flush
Posted by John Keeping 1 year ago
When flushing the serial port's buffer, uart_flush_buffer() calls
kfifo_reset() but if there is an outstanding DMA transfer then the
completion function will consume data from the kfifo via
uart_xmit_advance(), underflowing and leading to ongoing DMA as the
driver tries to transmit another 2^32 bytes.

This is readily reproduced with serial-generic and amidi sending even
short messages as closing the device on exit will wait for the fifo to
drain and in the underflow case amidi hangs for 30 seconds on exit in
tty_wait_until_sent().  A trace of that gives:

     kworker/1:1-84    [001]    51.769423: bprint:               serial8250_tx_dma: tx_size=3 fifo_len=3
           amidi-763   [001]    51.769460: bprint:               uart_flush_buffer: resetting fifo
 irq/21-fe530000-76    [000]    51.769474: bprint:               __dma_tx_complete: tx_size=3
 irq/21-fe530000-76    [000]    51.769479: bprint:               serial8250_tx_dma: tx_size=4096 fifo_len=4294967293
 irq/21-fe530000-76    [000]    51.781295: bprint:               __dma_tx_complete: tx_size=4096
 irq/21-fe530000-76    [000]    51.781301: bprint:               serial8250_tx_dma: tx_size=4096 fifo_len=4294963197
 irq/21-fe530000-76    [000]    51.793131: bprint:               __dma_tx_complete: tx_size=4096
 irq/21-fe530000-76    [000]    51.793135: bprint:               serial8250_tx_dma: tx_size=4096 fifo_len=4294959101
 irq/21-fe530000-76    [000]    51.804949: bprint:               __dma_tx_complete: tx_size=4096

Since the port lock is held in when the kfifo is reset in
uart_flush_buffer() and in __dma_tx_complete(), adding a flush_buffer
hook to adjust the outstanding DMA byte count is sufficient to avoid the
kfifo underflow.

Fixes: 9ee4b83e51f74 ("serial: 8250: Add support for dmaengine")
Cc: stable@vger.kernel.org
Signed-off-by: John Keeping <jkeeping@inmusicbrands.com>
---
Changes in v3:
- Fix !CONFIG_SERIAL_8250_DMA build
Changes in v2:
- Add Fixes: tag
- Return early to reduce indentation in serial8250_tx_dma_flush()

 drivers/tty/serial/8250/8250.h      |  2 ++
 drivers/tty/serial/8250/8250_dma.c  | 16 ++++++++++++++++
 drivers/tty/serial/8250/8250_port.c |  9 +++++++++
 3 files changed, 27 insertions(+)

diff --git a/drivers/tty/serial/8250/8250.h b/drivers/tty/serial/8250/8250.h
index 11e05aa014e54..b861585ca02ac 100644
--- a/drivers/tty/serial/8250/8250.h
+++ b/drivers/tty/serial/8250/8250.h
@@ -374,6 +374,7 @@ static inline int is_omap1510_8250(struct uart_8250_port *pt)
 
 #ifdef CONFIG_SERIAL_8250_DMA
 extern int serial8250_tx_dma(struct uart_8250_port *);
+extern void serial8250_tx_dma_flush(struct uart_8250_port *);
 extern int serial8250_rx_dma(struct uart_8250_port *);
 extern void serial8250_rx_dma_flush(struct uart_8250_port *);
 extern int serial8250_request_dma(struct uart_8250_port *);
@@ -406,6 +407,7 @@ static inline int serial8250_tx_dma(struct uart_8250_port *p)
 {
 	return -1;
 }
+static inline void serial8250_tx_dma_flush(struct uart_8250_port *p) { }
 static inline int serial8250_rx_dma(struct uart_8250_port *p)
 {
 	return -1;
diff --git a/drivers/tty/serial/8250/8250_dma.c b/drivers/tty/serial/8250/8250_dma.c
index d215c494ee24c..f245a84f4a508 100644
--- a/drivers/tty/serial/8250/8250_dma.c
+++ b/drivers/tty/serial/8250/8250_dma.c
@@ -149,6 +149,22 @@ int serial8250_tx_dma(struct uart_8250_port *p)
 	return ret;
 }
 
+void serial8250_tx_dma_flush(struct uart_8250_port *p)
+{
+	struct uart_8250_dma *dma = p->dma;
+
+	if (!dma->tx_running)
+		return;
+
+	/*
+	 * kfifo_reset() has been called by the serial core, avoid
+	 * advancing and underflowing in __dma_tx_complete().
+	 */
+	dma->tx_size = 0;
+
+	dmaengine_terminate_async(dma->rxchan);
+}
+
 int serial8250_rx_dma(struct uart_8250_port *p)
 {
 	struct uart_8250_dma		*dma = p->dma;
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index d7976a21cca9c..442967a6cd52d 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -2555,6 +2555,14 @@ static void serial8250_shutdown(struct uart_port *port)
 		serial8250_do_shutdown(port);
 }
 
+static void serial8250_flush_buffer(struct uart_port *port)
+{
+	struct uart_8250_port *up = up_to_u8250p(port);
+
+	if (up->dma)
+		serial8250_tx_dma_flush(up);
+}
+
 static unsigned int serial8250_do_get_divisor(struct uart_port *port,
 					      unsigned int baud,
 					      unsigned int *frac)
@@ -3244,6 +3252,7 @@ static const struct uart_ops serial8250_pops = {
 	.break_ctl	= serial8250_break_ctl,
 	.startup	= serial8250_startup,
 	.shutdown	= serial8250_shutdown,
+	.flush_buffer	= serial8250_flush_buffer,
 	.set_termios	= serial8250_set_termios,
 	.set_ldisc	= serial8250_set_ldisc,
 	.pm		= serial8250_pm,
-- 
2.48.1
Re: [PATCH v3] serial: 8250: Fix fifo underflow on flush
Posted by Raul E Rangel 1 week, 2 days ago
On Sat, Feb 08, 2025 at 12:41:44PM +0000, John Keeping wrote:
> When flushing the serial port's buffer, uart_flush_buffer() calls
> kfifo_reset() but if there is an outstanding DMA transfer then the
> completion function will consume data from the kfifo via
> uart_xmit_advance(), underflowing and leading to ongoing DMA as the
> driver tries to transmit another 2^32 bytes.
> 
> This is readily reproduced with serial-generic and amidi sending even
> short messages as closing the device on exit will wait for the fifo to
> drain and in the underflow case amidi hangs for 30 seconds on exit in
> tty_wait_until_sent().  A trace of that gives:
> 
>      kworker/1:1-84    [001]    51.769423: bprint:               serial8250_tx_dma: tx_size=3 fifo_len=3
>            amidi-763   [001]    51.769460: bprint:               uart_flush_buffer: resetting fifo
>  irq/21-fe530000-76    [000]    51.769474: bprint:               __dma_tx_complete: tx_size=3
>  irq/21-fe530000-76    [000]    51.769479: bprint:               serial8250_tx_dma: tx_size=4096 fifo_len=4294967293
>  irq/21-fe530000-76    [000]    51.781295: bprint:               __dma_tx_complete: tx_size=4096
>  irq/21-fe530000-76    [000]    51.781301: bprint:               serial8250_tx_dma: tx_size=4096 fifo_len=4294963197
>  irq/21-fe530000-76    [000]    51.793131: bprint:               __dma_tx_complete: tx_size=4096
>  irq/21-fe530000-76    [000]    51.793135: bprint:               serial8250_tx_dma: tx_size=4096 fifo_len=4294959101
>  irq/21-fe530000-76    [000]    51.804949: bprint:               __dma_tx_complete: tx_size=4096
> 
> Since the port lock is held in when the kfifo is reset in
> uart_flush_buffer() and in __dma_tx_complete(), adding a flush_buffer
> hook to adjust the outstanding DMA byte count is sufficient to avoid the
> kfifo underflow.
> 
> Fixes: 9ee4b83e51f74 ("serial: 8250: Add support for dmaengine")
> Cc: stable@vger.kernel.org
> Signed-off-by: John Keeping <jkeeping@inmusicbrands.com>
> ---
> Changes in v3:
> - Fix !CONFIG_SERIAL_8250_DMA build
> Changes in v2:
> - Add Fixes: tag
> - Return early to reduce indentation in serial8250_tx_dma_flush()
> 
>  drivers/tty/serial/8250/8250.h      |  2 ++
>  drivers/tty/serial/8250/8250_dma.c  | 16 ++++++++++++++++
>  drivers/tty/serial/8250/8250_port.c |  9 +++++++++
>  3 files changed, 27 insertions(+)
> 
> diff --git a/drivers/tty/serial/8250/8250.h b/drivers/tty/serial/8250/8250.h
> index 11e05aa014e54..b861585ca02ac 100644
> --- a/drivers/tty/serial/8250/8250.h
> +++ b/drivers/tty/serial/8250/8250.h
> @@ -374,6 +374,7 @@ static inline int is_omap1510_8250(struct uart_8250_port *pt)
>  
>  #ifdef CONFIG_SERIAL_8250_DMA
>  extern int serial8250_tx_dma(struct uart_8250_port *);
> +extern void serial8250_tx_dma_flush(struct uart_8250_port *);
>  extern int serial8250_rx_dma(struct uart_8250_port *);
>  extern void serial8250_rx_dma_flush(struct uart_8250_port *);
>  extern int serial8250_request_dma(struct uart_8250_port *);
> @@ -406,6 +407,7 @@ static inline int serial8250_tx_dma(struct uart_8250_port *p)
>  {
>  	return -1;
>  }
> +static inline void serial8250_tx_dma_flush(struct uart_8250_port *p) { }
>  static inline int serial8250_rx_dma(struct uart_8250_port *p)
>  {
>  	return -1;
> diff --git a/drivers/tty/serial/8250/8250_dma.c b/drivers/tty/serial/8250/8250_dma.c
> index d215c494ee24c..f245a84f4a508 100644
> --- a/drivers/tty/serial/8250/8250_dma.c
> +++ b/drivers/tty/serial/8250/8250_dma.c
> @@ -149,6 +149,22 @@ int serial8250_tx_dma(struct uart_8250_port *p)
>  	return ret;
>  }
>  
> +void serial8250_tx_dma_flush(struct uart_8250_port *p)
> +{
> +	struct uart_8250_dma *dma = p->dma;
> +
> +	if (!dma->tx_running)
> +		return;
> +
> +	/*
> +	 * kfifo_reset() has been called by the serial core, avoid
> +	 * advancing and underflowing in __dma_tx_complete().
> +	 */
> +	dma->tx_size = 0;
> +
> +	dmaengine_terminate_async(dma->rxchan);

Hrmm, I think this is causing a deadlock.
If the DMA transaction is canceled, then the callback might not run.
This leaves dma->tx_running set to 1 causing the channel to never
restart the channel.

I tried the following fix (and it works for me), but I'm not sure it's
the correct fix.

diff --git a/drivers/tty/serial/8250/8250_dma.c b/drivers/tty/serial/8250/8250_dma.c
index bdd26c9f34bd..58b914c20d98 100644
--- a/drivers/tty/serial/8250/8250_dma.c
+++ b/drivers/tty/serial/8250/8250_dma.c
@@ -162,7 +162,22 @@ void serial8250_tx_dma_flush(struct uart_8250_port *p)
         */
        dma->tx_size = 0;

+       /*
+        * We can't use `dmaengine_terminate_sync` because `uart_flush_buffer` is
+        * holding the uart port spinlock.
+        */
        dmaengine_terminate_async(dma->txchan);
+
+       /*
+        * The callback might or might not run. If it doesn't run, we need to ensure
+        * that `tx_running` is cleared so that we can schedule new transactions.
+        * If it does run, then the zombie callback will clear `tx_running` for us
+        * and perform a no-op since `tx_size` was cleared.
+        *
+        * In either case, we assume DMA transaction will be terminated or the
+        * callback will run before we issue a new `serial8250_tx_dma`.
+        */
+       dma->tx_running = 0;
 }

 int serial8250_rx_dma(struct uart_8250_port *p)

The problem is IIUC, that we need to either use dmaengine_terminate_sync
or dmaengine_synchronize so we can guarantee that the hardware is
stopped, the callback is done, and we can free-reuse the resources.
I think a better approach might be to add a `tx_flushing` bool and push
a job onto a work queue that calls `dmaengine_terminate_sync`, and then
sets `tx_running=0` and `tx_flushing=0`. I'm not sure what happens
though if something comes along and starts writing to the UART registers
while we are waiting for the DMA transaction to complete.

Thoughts?

---
Here is the debug trace I took that shows the problem:

< lots of tty_write omitted >
3250.941306 |    0)               |  tty_write() {
3250.941307 |    0)   0.174 us    |    tty_ldisc_ref_wait();
3250.941307 |    0)   0.154 us    |    tty_hung_up_p();
3250.941307 |    0)               |    tty_write_room() {
3250.941308 |    0)   0.211 us    |      uart_write_room();
3250.941308 |    0)   0.542 us    |    }
3250.941308 |    0)               |    uart_write() {
3250.941308 |    0)               |      serial8250_start_tx() {
3250.941309 |    0)               |        serial8250_tx_dma() {
3250.941309 |    0)               |          /* serial8250_tx_dma: tx_running => 1 */
3250.941309 |    0)   0.408 us    |        }
3250.941309 |    0)   0.714 us    |      }
3250.941309 |    0)   1.261 us    |    }
3250.941310 |    0)               |    tty_write_room() {
3250.941310 |    0)   0.217 us    |      uart_write_room();
3250.941310 |    0)   0.489 us    |    }
3250.941310 |    0)               |    uart_flush_chars() {
3250.941310 |    0)               |      uart_start() {
3250.941311 |    0)               |        serial8250_start_tx() {
3250.941311 |    0)               |          serial8250_tx_dma() {
3250.941311 |    0)               |            /* serial8250_tx_dma: tx_running => 1 */
3250.941311 |    0)   0.389 us    |          }
3250.941311 |    0)   0.681 us    |        }
3250.941312 |    0)   1.207 us    |      }
3250.941312 |    0)   1.488 us    |    }
3251.008825 |   10)               |  serial8250_interrupt() {
3251.008828 |   10)               |    serial8250_default_handle_irq() {
3251.008830 |   10) + 17.915 us   |      dw8250_serial_in32 [8250_dw]();
3251.008848 |   10)   0.207 us    |      serial8250_handle_irq();
3251.008849 |   10) + 20.208 us   |    }
3251.008849 |   10) + 27.505 us   |  }
3251.008862 |   10)               |  /* __dma_tx_complete: tx_running <= 0 */
3251.008863 |   10)               |  serial8250_tx_dma() {
3251.008864 |   10)               |  /* serial8250_tx_dma: tx_running => 0 */
3251.008872 |   10)               |  /* serial8250_tx_dma: tx_running <= 1 */
3251.008879 |   10)   9.410 us    |  }
3251.107685 |   10)               |  serial8250_interrupt() {
3251.107688 |   10)               |    serial8250_default_handle_irq() {
3251.107689 |   10) + 18.007 us   |      dw8250_serial_in32 [8250_dw]();
3251.107707 |   10)               |      serial8250_handle_irq() {
3251.107708 |   10)   1.847 us    |        dw8250_serial_in32 [8250_dw]();
3251.107713 |   10)   1.003 us    |        serial8250_rx_dma_flush();
3251.107714 |   10)               |        serial8250_read_char() {
3251.107715 |   10) + 17.452 us   |          dw8250_serial_in32 [8250_dw]();
3251.107732 |   10)   1.008 us    |          uart_insert_char();
3251.107734 |   10) + 19.136 us   |        }
3251.107734 |   10)   3.914 us    |        dw8250_serial_in32 [8250_dw]();
3251.107739 |   10)   6.789 us    |        tty_flip_buffer_push();
3251.107746 |   10)               |        serial8250_modem_status() {
3251.107746 |   10) + 17.556 us   |          dw8250_serial_in32 [8250_dw]();
3251.107764 |   10) + 18.039 us   |        }
3251.107764 |   10) + 56.710 us   |      }
3251.107764 |   10) + 76.870 us   |    }
3251.107765 |   10)               |    serial8250_default_handle_irq() {
3251.107765 |   10)   4.205 us    |      dw8250_serial_in32 [8250_dw]();
3251.107769 |   10)   0.219 us    |      serial8250_handle_irq();
3251.107770 |   10)   5.076 us    |    }
3251.107770 |   10) + 88.331 us   |  }
3251.107805 |    6)               |  tty_port_default_receive_buf() {
3251.107808 |    6)   1.377 us    |    tty_ldisc_ref();
3251.107810 |    6)               |    tty_ldisc_receive_buf() {
3251.107813 |    6)   0.639 us    |      tty_get_pgrp();
3251.107823 |    6)               |      tty_driver_flush_buffer() {
3251.107824 |    6)               |        uart_flush_buffer() {
3251.107825 |    6)               |          serial8250_flush_buffer() {
3251.107826 |    6)               |            serial8250_tx_dma_flush() {
3251.107829 |    6)               |              /* serial8250_tx_dma_flush: tx_running => 1 */
                                                 /* Notice the callback is never ran */
3251.108186 |    6) ! 360.293 us  |            }
3251.108186 |    6) ! 361.125 us  |          }
3251.108196 |    6)               |          tty_port_tty_wakeup() {
3251.108196 |    6)               |            tty_port_default_wakeup() {
3251.108196 |    6)   1.146 us    |              tty_wakeup();
3251.108198 |    6)   0.247 us    |              tty_kref_put();
3251.108198 |    6)   2.359 us    |            }
3251.108199 |    6)   2.987 us    |          }
3251.108199 |    6) ! 374.727 us  |        }
3251.108199 |    6) ! 375.679 us  |      }
3251.108202 |    6)               |      tty_write_room() {
3251.108203 |    6)   0.286 us    |        uart_write_room();
3251.108203 |    6)   1.074 us    |      }
3251.108204 |    6)               |      tty_put_char() {
3251.108204 |    6)   0.487 us    |        uart_put_char();
3251.108205 |    6)   1.177 us    |      }
3251.108205 |    6)               |      tty_put_char() {
3251.108205 |    6)   0.261 us    |        uart_put_char();
3251.108206 |    6)   0.677 us    |      }
3251.108206 |    6)               |      uart_flush_chars() {
3251.108207 |    6)               |        uart_start() {
3251.108208 |    6)               |          serial8250_start_tx() {
3251.108209 |    6)               |            serial8250_tx_dma() {
3251.108210 |    6)               |              /* serial8250_tx_dma: tx_running => 1 */
3251.108210 |    6)   1.194 us    |            }
3251.108211 |    6)   2.251 us    |          }
3251.108213 |    6)   5.829 us    |        }
3251.108213 |    6)   6.532 us    |      }
3251.108214 |    6) ! 404.132 us  |    }
3251.108214 |    6)   0.238 us    |    tty_ldisc_deref();
3251.108214 |    6) ! 411.672 us  |  }
3251.108260 |    0)   3.102 us    |    tty_ldisc_deref();
3251.108263 |    0) @ 166956.2 us |  }
3251.108270 |    0)   0.465 us    |  tty_audit_exit();
3251.109013 |   11)   2.680 us    |  tty_kref_put();
3251.109059 |   11)               |  tty_ioctl() {
3251.109060 |   11)               |    tty_jobctrl_ioctl() {
3251.109060 |   11)   0.622 us    |      __tty_check_change();
3251.109062 |   11)   2.450 us    |    }
3251.109063 |   11)   4.003 us    |  }
3251.109064 |   11)               |  tty_ioctl() {
3251.109064 |   11)   0.225 us    |    tty_jobctrl_ioctl();
3251.109065 |   11)   1.550 us    |    uart_ioctl();
3251.109067 |   11)   0.554 us    |    tty_ldisc_ref_wait();
3251.109069 |   11)               |    tty_mode_ioctl() {
3251.109069 |   11)               |      tty_check_change() {
3251.109070 |   11)   0.282 us    |        __tty_check_change();
3251.109070 |   11)   0.662 us    |      }
3251.109071 |   11)   0.477 us    |      tty_termios_input_baud_rate();
3251.109071 |   11)   0.214 us    |      tty_termios_baud_rate();
3251.109072 |   11)   0.483 us    |      uart_chars_in_buffer();
3251.109073 |   11)   0.247 us    |      uart_chars_in_buffer();
------------------------------------------
11)    sh-7708     =>   kworker-90
------------------------------------------

3251.686493 |   11)               |  serial8250_start_tx() {
3251.686496 |   11)               |    serial8250_tx_dma() {
3251.686500 |   11)               |      /* serial8250_tx_dma: tx_running => 1 */
3251.686501 |   11)   4.696 us    |    }
3251.686501 |   11) + 12.203 us   |  }
3252.311575 |   11)               |  serial8250_start_tx() {
3252.311579 |   11)               |    serial8250_tx_dma() {
3252.311582 |   11)               |      /* serial8250_tx_dma: tx_running => 1 */
3252.311583 |   11)   4.020 us    |    }
3252.311583 |   11) + 10.556 us   |  }
3252.936657 |   11)               |  serial8250_start_tx() {
3252.936661 |   11)               |    serial8250_tx_dma() {
3252.936664 |   11)               |      /* serial8250_tx_dma: tx_running => 1 */
3252.936665 |   11)   3.989 us    |    }
3252.936665 |   11) + 10.880 us   |  }

Thanks,
Raul
Re: [PATCH v3] serial: 8250: Fix fifo underflow on flush
Posted by Wentao Guan 11 months, 3 weeks ago
Hello John,

It seems strange that call 'dmaengine_terminate_async( **dma->rxchan** );' in
'serial8250_ **tx** _dma_flush' during code review.
I am not a professional reviewer in this module, could you explaim the change?

BRs
Wentao Guan
Re: [PATCH v3] serial: 8250: Fix fifo underflow on flush
Posted by John Keeping 11 months, 2 weeks ago
On Sat, Feb 22, 2025 at 09:26:27PM +0800, Wentao Guan wrote:
> It seems strange that call 'dmaengine_terminate_async( **dma->rxchan** );' in
> 'serial8250_ **tx** _dma_flush' during code review.
> I am not a professional reviewer in this module, could you explaim the change?

It's a typo!  Nice catch, I will send a patch to fix this.


Regards,
John