drivers/tty/serial/8250/8250_port.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)
When the PSLVERR_RESP_EN parameter is set to 1, the device generates
an error response if an attempt is made to read an empty RBR (Receive
Buffer Register) while the FIFO is enabled.
In serial8250_do_startup, calling serial_port_out(port, UART_LCR,
UART_LCR_WLEN8) triggers dw8250_check_lcr(), which invokes
dw8250_force_idle() and serial8250_clear_and_reinit_fifos(). The latter
function enables the FIFO via serial_out(p, UART_FCR, p->fcr).
Execution proceeds to the dont_test_tx_en label:
...
serial_port_in(port, UART_RX);
This satisfies the PSLVERR trigger condition.
Because another CPU(e.g., using printk) is accessing the UART (UART
is busy), the current CPU fails the check (value & ~UART_LCR_SPAR) ==
(lcr & ~UART_LCR_SPAR), causing it to enter dw8250_force_idle().
To resolve this issue, relevant serial_port_out operations should be
placed in a critical section, and UART_RX data should only be read
when the UART_LSR DR bit is set.
Panic message:
[ 0.442336] Oops - unknown exception [#1]
[ 0.442337] Modules linked in:
[ 0.442339] CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.12.13-00102-gf1f43e345877 #1
[ 0.442342] Tainted: [W]=WARN
[ 0.442343] epc : dw8250_serial_in32+0x1e/0x4a
[ 0.442351] ra : serial8250_do_startup+0x2c8/0x88e
[ 0.442354] epc : ffffffff8064efca ra : ffffffff8064af28 sp : ffff8f8000103990
[ 0.442355] gp : ffffffff815bad28 tp : ffffaf807e36d400 t0 : ffffaf80804cf080
[ 0.442356] t1 : 0000000000000001 t2 : 0000000000000000 s0 : ffff8f80001039a0
[ 0.442358] s1 : ffffffff81626fc0 a0 : ffffffff81626fc0 a1 : 0000000000000000
[ 0.442359] a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff81626fc0
[ 0.442360] a5 : ffff8f800012d900 a6 : 000000000000000f a7 : 000000000fc648c1
[ 0.442361] s2 : 0000000000000000 s3 : 0000000200000022 s4 : 0000000000000000
[ 0.442362] s5 : ffffffff81626fc0 s6 : ffffaf8085227000 s7 : ffffffff81073c58
[ 0.442363] s8 : 0000000000500000 s9 : ffffaf80851a5a60 s10: ffffaf80851a5a60
[ 0.442365] s11: ffffffff80e85980 t3 : ffffaf807e324600 t4 : 0000000000000002
[ 0.442365] t5 : 0000000000000003 t6 : ffffaf80804cf072
[ 0.442366] status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000013
[ 0.442368] [<ffffffff8064efca>] dw8250_serial_in32+0x1e/0x4a
[ 0.442371] [<ffffffff8064af28>] serial8250_do_startup+0x2c8/0x88e
[ 0.442373] [<ffffffff8064b514>] serial8250_startup+0x26/0x2e
[ 0.442375] [<ffffffff806428a2>] uart_startup+0x13a/0x308
[ 0.442377] [<ffffffff80642aa4>] uart_port_activate+0x34/0x50
[ 0.442378] [<ffffffff8062ab6a>] tty_port_open+0xb4/0x110
[ 0.442383] [<ffffffff8063f548>] uart_open+0x22/0x36
[ 0.442389] [<ffffffff806234b4>] tty_open+0x1be/0x5e6
[ 0.442396] [<ffffffff802f2d52>] chrdev_open+0x10a/0x2a8
[ 0.442400] [<ffffffff802e7ab6>] do_dentry_open+0xf6/0x34e
[ 0.442405] [<ffffffff802e9456>] vfs_open+0x2a/0xb4
[ 0.442408] [<ffffffff80300124>] path_openat+0x676/0xf36
[ 0.442410] [<ffffffff80300a58>] do_filp_open+0x74/0xfa
[ 0.442412] [<ffffffff802e9900>] file_open_name+0x84/0x144
[ 0.442414] [<ffffffff802e99f6>] filp_open+0x36/0x54
[ 0.442416] [<ffffffff80a01232>] console_on_rootfs+0x26/0x70
[ 0.442420] [<ffffffff80a0154e>] kernel_init_freeable+0x2d2/0x30e
[ 0.442422] [<ffffffff8099c730>] kernel_init+0x2a/0x15e
[ 0.442427] [<ffffffff809a7666>] ret_from_fork+0xe/0x1c
[ 0.442430] Code: e022 e406 0800 4683 0c15 691c 872a 96bb 00d5 97b6 (439c) 851b
[ 0.442432] ---[ end trace 0000000000000000 ]---
[ 0.442434] Kernel panic - not syncing: Fatal exception in interrupt
[ 0.442435] SMP: stopping secondary CPUs
[ 0.451111] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
drivers/tty/serial/8250/8250_port.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 3f256e96c722..6909c81109db 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -2264,13 +2264,16 @@ int serial8250_do_startup(struct uart_port *port)
* Clear the FIFO buffers and disable them.
* (they will be reenabled in set_termios())
*/
+ uart_port_lock_irqsave(port, &flags);
serial8250_clear_fifos(up);
+ uart_port_unlock_irqrestore(port, flags);
/*
* Clear the interrupt registers.
*/
- serial_port_in(port, UART_LSR);
- serial_port_in(port, UART_RX);
+ lsr = serial_port_in(port, UART_LSR);
+ if (lsr & UART_LSR_DR)
+ serial_port_in(port, UART_RX);
serial_port_in(port, UART_IIR);
serial_port_in(port, UART_MSR);
@@ -2380,9 +2383,9 @@ int serial8250_do_startup(struct uart_port *port)
/*
* Now, initialize the UART
*/
+ uart_port_lock_irqsave(port, &flags);
serial_port_out(port, UART_LCR, UART_LCR_WLEN8);
- uart_port_lock_irqsave(port, &flags);
if (up->port.flags & UPF_FOURPORT) {
if (!up->port.irq)
up->port.mctrl |= TIOCM_OUT1;
@@ -2435,8 +2438,9 @@ int serial8250_do_startup(struct uart_port *port)
* saved flags to avoid getting false values from polling
* routines or the previous session.
*/
- serial_port_in(port, UART_LSR);
- serial_port_in(port, UART_RX);
+ lsr = serial_port_in(port, UART_LSR);
+ if (lsr & UART_LSR_DR)
+ serial_port_in(port, UART_RX);
serial_port_in(port, UART_IIR);
serial_port_in(port, UART_MSR);
up->lsr_saved_flags = 0;
--
2.39.2
On 2025-04-03, Yunhui Cui <cuiyunhui@bytedance.com> wrote: > When the PSLVERR_RESP_EN parameter is set to 1, the device generates > an error response if an attempt is made to read an empty RBR (Receive > Buffer Register) while the FIFO is enabled. > > In serial8250_do_startup, calling serial_port_out(port, UART_LCR, > UART_LCR_WLEN8) triggers dw8250_check_lcr(), which invokes > dw8250_force_idle() and serial8250_clear_and_reinit_fifos(). The latter > function enables the FIFO via serial_out(p, UART_FCR, p->fcr). > Execution proceeds to the dont_test_tx_en label: > ... > serial_port_in(port, UART_RX); > This satisfies the PSLVERR trigger condition. > > Because another CPU(e.g., using printk) is accessing the UART (UART > is busy), the current CPU fails the check (value & ~UART_LCR_SPAR) == > (lcr & ~UART_LCR_SPAR), causing it to enter dw8250_force_idle(). Didn't this[0] patch resolve this exact issue? John Ogness [0] https://lore.kernel.org/lkml/20220713131722.2316829-1-vamshigajjela@google.com
Hi John, On Thu, Apr 3, 2025 at 7:58 PM John Ogness <john.ogness@linutronix.de> wrote: > > On 2025-04-03, Yunhui Cui <cuiyunhui@bytedance.com> wrote: > > When the PSLVERR_RESP_EN parameter is set to 1, the device generates > > an error response if an attempt is made to read an empty RBR (Receive > > Buffer Register) while the FIFO is enabled. > > > > In serial8250_do_startup, calling serial_port_out(port, UART_LCR, > > UART_LCR_WLEN8) triggers dw8250_check_lcr(), which invokes > > dw8250_force_idle() and serial8250_clear_and_reinit_fifos(). The latter > > function enables the FIFO via serial_out(p, UART_FCR, p->fcr). > > Execution proceeds to the dont_test_tx_en label: > > ... > > serial_port_in(port, UART_RX); > > This satisfies the PSLVERR trigger condition. > > > > Because another CPU(e.g., using printk) is accessing the UART (UART > > is busy), the current CPU fails the check (value & ~UART_LCR_SPAR) == > > (lcr & ~UART_LCR_SPAR), causing it to enter dw8250_force_idle(). > > Didn't this[0] patch resolve this exact issue? > > John Ogness > > [0] https://lore.kernel.org/lkml/20220713131722.2316829-1-vamshigajjela@google.com No, these are two separate issues. This[0] patch is necessary, as expressed in this comment: /* * With PSLVERR_RESP_EN parameter set to 1, the device generates an * error response when an attempt to read an empty RBR with FIFO * enabled. */ The current patch addresses the following scenario: cpuA is accessing the UART via printk(), causing the UART to be busy. cpuB follows the CallTrace path: -serial8250_do_startup() --serial_port_out(port, UART_LCR, UART_LCR_WLEN8); ---dw8250_serial_out32 ----dw8250_check_lcr -----dw8250_force_idle (triggered by UART busy) ------serial8250_clear_and_reinit_fifos -------serial_out(p, UART_FCR, p->fcr); (enables FIFO here) cpuB proceeds to the dont_test_tx_en label: ... serial_port_in(port, UART_RX); //FIFO is enabled, and the UART has no data to read, causing the device to generate a PSLVERR error and panic. Our solution: Relevant serial_port_out operations should be placed in a critical section. Before reading UART_RX, check if data is available (e.g., by verifying the UART_LSR DR bit is set). Thanks, Yunhui
On 2025-04-03, yunhui cui <cuiyunhui@bytedance.com> wrote: >>> When the PSLVERR_RESP_EN parameter is set to 1, the device generates >>> an error response if an attempt is made to read an empty RBR (Receive >>> Buffer Register) while the FIFO is enabled. >>> >>> In serial8250_do_startup, calling serial_port_out(port, UART_LCR, >>> UART_LCR_WLEN8) triggers dw8250_check_lcr(), which invokes >>> dw8250_force_idle() and serial8250_clear_and_reinit_fifos(). The latter >>> function enables the FIFO via serial_out(p, UART_FCR, p->fcr). >>> Execution proceeds to the dont_test_tx_en label: >>> ... >>> serial_port_in(port, UART_RX); >>> This satisfies the PSLVERR trigger condition. >>> >>> Because another CPU(e.g., using printk) is accessing the UART (UART >>> is busy), the current CPU fails the check (value & ~UART_LCR_SPAR) == >>> (lcr & ~UART_LCR_SPAR), causing it to enter dw8250_force_idle(). >> >> Didn't this[0] patch resolve this exact issue? >> >> John Ogness >> >> [0] https://lore.kernel.org/lkml/20220713131722.2316829-1-vamshigajjela@google.com > > No, these are two separate issues. This[0] patch is necessary, as > expressed in this comment: > > /* > * With PSLVERR_RESP_EN parameter set to 1, the device generates an > * error response when an attempt to read an empty RBR with FIFO > * enabled. > */ > > The current patch addresses the following scenario: > > cpuA is accessing the UART via printk(), causing the UART to be busy. > cpuB follows the CallTrace path: > -serial8250_do_startup() > --serial_port_out(port, UART_LCR, UART_LCR_WLEN8); > ---dw8250_serial_out32 > ----dw8250_check_lcr > -----dw8250_force_idle (triggered by UART busy) > ------serial8250_clear_and_reinit_fifos > -------serial_out(p, UART_FCR, p->fcr); (enables FIFO here) > cpuB proceeds to the dont_test_tx_en label: > ... > serial_port_in(port, UART_RX); //FIFO is enabled, and the UART has > no data to read, causing the device to generate a PSLVERR error and > panic. > > Our solution: > Relevant serial_port_out operations should be placed in a critical section. > Before reading UART_RX, check if data is available (e.g., by verifying > the UART_LSR DR bit is set). OK, now I see. The problem is the explicit reads of UART_RX near the end of serial8250_do_startup(). >> diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c >> index 3f256e96c722..6909c81109db 100644 >> --- a/drivers/tty/serial/8250/8250_port.c >> +++ b/drivers/tty/serial/8250/8250_port.c >> @@ -2264,13 +2264,16 @@ int serial8250_do_startup(struct uart_port *port) >> * Clear the FIFO buffers and disable them. >> * (they will be reenabled in set_termios()) >> */ >> + uart_port_lock_irqsave(port, &flags); >> serial8250_clear_fifos(up); >> + uart_port_unlock_irqrestore(port, flags); Can you clarify why serial8250_clear_fifos() needs to be in a critical section? serial8250_do_shutdown() and do_set_rxtrig() also call serial8250_clear_fifos() without holding the port lock. >>> /* >>> * Clear the interrupt registers. >>> */ >>> - serial_port_in(port, UART_LSR); >>> - serial_port_in(port, UART_RX); >>> + lsr = serial_port_in(port, UART_LSR); >>> + if (lsr & UART_LSR_DR) >>> + serial_port_in(port, UART_RX); Do we care about the unchecked UART_RX in serial8250_do_shutdown()? /* * Read data port to reset things, and then unlink from * the IRQ chain. */ serial_port_in(port, UART_RX); serial8250_rpm_put(up); up->ops->release_irq(up); } Otherwise all other UART_RX reads are either checking UART_LSR_DR first or are under the port lock. John
Hi John, On Thu, Apr 3, 2025 at 9:46 PM John Ogness <john.ogness@linutronix.de> wrote: > > On 2025-04-03, yunhui cui <cuiyunhui@bytedance.com> wrote: > >>> When the PSLVERR_RESP_EN parameter is set to 1, the device generates > >>> an error response if an attempt is made to read an empty RBR (Receive > >>> Buffer Register) while the FIFO is enabled. > >>> > >>> In serial8250_do_startup, calling serial_port_out(port, UART_LCR, > >>> UART_LCR_WLEN8) triggers dw8250_check_lcr(), which invokes > >>> dw8250_force_idle() and serial8250_clear_and_reinit_fifos(). The latter > >>> function enables the FIFO via serial_out(p, UART_FCR, p->fcr). > >>> Execution proceeds to the dont_test_tx_en label: > >>> ... > >>> serial_port_in(port, UART_RX); > >>> This satisfies the PSLVERR trigger condition. > >>> > >>> Because another CPU(e.g., using printk) is accessing the UART (UART > >>> is busy), the current CPU fails the check (value & ~UART_LCR_SPAR) == > >>> (lcr & ~UART_LCR_SPAR), causing it to enter dw8250_force_idle(). > >> > >> Didn't this[0] patch resolve this exact issue? > >> > >> John Ogness > >> > >> [0] https://lore.kernel.org/lkml/20220713131722.2316829-1-vamshigajjela@google.com > > > > No, these are two separate issues. This[0] patch is necessary, as > > expressed in this comment: > > > > /* > > * With PSLVERR_RESP_EN parameter set to 1, the device generates an > > * error response when an attempt to read an empty RBR with FIFO > > * enabled. > > */ > > > > The current patch addresses the following scenario: > > > > cpuA is accessing the UART via printk(), causing the UART to be busy. > > cpuB follows the CallTrace path: > > -serial8250_do_startup() > > --serial_port_out(port, UART_LCR, UART_LCR_WLEN8); > > ---dw8250_serial_out32 > > ----dw8250_check_lcr > > -----dw8250_force_idle (triggered by UART busy) > > ------serial8250_clear_and_reinit_fifos > > -------serial_out(p, UART_FCR, p->fcr); (enables FIFO here) > > cpuB proceeds to the dont_test_tx_en label: > > ... > > serial_port_in(port, UART_RX); //FIFO is enabled, and the UART has > > no data to read, causing the device to generate a PSLVERR error and > > panic. > > > > Our solution: > > Relevant serial_port_out operations should be placed in a critical section. > > Before reading UART_RX, check if data is available (e.g., by verifying > > the UART_LSR DR bit is set). > > OK, now I see. The problem is the explicit reads of UART_RX near the end > of serial8250_do_startup(). > > >> diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c > >> index 3f256e96c722..6909c81109db 100644 > >> --- a/drivers/tty/serial/8250/8250_port.c > >> +++ b/drivers/tty/serial/8250/8250_port.c > >> @@ -2264,13 +2264,16 @@ int serial8250_do_startup(struct uart_port *port) > >> * Clear the FIFO buffers and disable them. > >> * (they will be reenabled in set_termios()) > >> */ > >> + uart_port_lock_irqsave(port, &flags); > >> serial8250_clear_fifos(up); > >> + uart_port_unlock_irqrestore(port, flags); > > Can you clarify why serial8250_clear_fifos() needs to be in a critical > section? There are two aspects. Firstly, if the lock is not held, the following situation may also occur when serial8250_clear_fifos() is called: ---dw8250_serial_out32 ----dw8250_check_lcr -----dw8250_force_idle (triggered by UART busy) ------serial8250_clear_and_reinit_fifos -------serial_out(p, UART_FCR, p->fcr); (enables FIFO here) This in itself goes against the semantics of clear_fifo. Secondly, if a CPU is accessing the UART normally and the current CPU suddenly clears the FIFO, it may cause problems. > > serial8250_do_shutdown() and do_set_rxtrig() also call > serial8250_clear_fifos() without holding the port lock. > > >>> /* > >>> * Clear the interrupt registers. > >>> */ > >>> - serial_port_in(port, UART_LSR); > >>> - serial_port_in(port, UART_RX); > >>> + lsr = serial_port_in(port, UART_LSR); > >>> + if (lsr & UART_LSR_DR) > >>> + serial_port_in(port, UART_RX); > > Do we care about the unchecked UART_RX in serial8250_do_shutdown()? I understand that it is required. > > /* > * Read data port to reset things, and then unlink from > * the IRQ chain. > */ > serial_port_in(port, UART_RX); > serial8250_rpm_put(up); > > up->ops->release_irq(up); > } > > Otherwise all other UART_RX reads are either checking UART_LSR_DR first > or are under the port lock. Agree > > John Thanks, Yunhui
On Thu, Apr 03, 2025 at 02:03:58PM +0206, John Ogness wrote: > On 2025-04-03, Yunhui Cui <cuiyunhui@bytedance.com> wrote: > > When the PSLVERR_RESP_EN parameter is set to 1, the device generates > > an error response if an attempt is made to read an empty RBR (Receive > > Buffer Register) while the FIFO is enabled. > > > > In serial8250_do_startup, calling serial_port_out(port, UART_LCR, > > UART_LCR_WLEN8) triggers dw8250_check_lcr(), which invokes > > dw8250_force_idle() and serial8250_clear_and_reinit_fifos(). The latter > > function enables the FIFO via serial_out(p, UART_FCR, p->fcr). > > Execution proceeds to the dont_test_tx_en label: > > ... > > serial_port_in(port, UART_RX); > > This satisfies the PSLVERR trigger condition. > > > > Because another CPU(e.g., using printk) is accessing the UART (UART > > is busy), the current CPU fails the check (value & ~UART_LCR_SPAR) == > > (lcr & ~UART_LCR_SPAR), causing it to enter dw8250_force_idle(). > > Didn't this[0] patch resolve this exact issue? I see it applied (in v6.0), but version in the traceback seems from v6.12. I think this is about race condition so only locking parts should suffice. > [0] https://lore.kernel.org/lkml/20220713131722.2316829-1-vamshigajjela@google.com -- With Best Regards, Andy Shevchenko
On Thu, Apr 03, 2025 at 05:03:36PM +0800, Yunhui Cui wrote: > When the PSLVERR_RESP_EN parameter is set to 1, the device generates > an error response if an attempt is made to read an empty RBR (Receive > Buffer Register) while the FIFO is enabled. > > In serial8250_do_startup, calling serial_port_out(port, UART_LCR, serial8250_do_startup() > UART_LCR_WLEN8) triggers dw8250_check_lcr(), which invokes > dw8250_force_idle() and serial8250_clear_and_reinit_fifos(). The latter > function enables the FIFO via serial_out(p, UART_FCR, p->fcr). > Execution proceeds to the dont_test_tx_en label: > ... > serial_port_in(port, UART_RX); > This satisfies the PSLVERR trigger condition. > > Because another CPU(e.g., using printk) is accessing the UART (UART printk() > is busy), the current CPU fails the check (value & ~UART_LCR_SPAR) == > (lcr & ~UART_LCR_SPAR), causing it to enter dw8250_force_idle(). > > To resolve this issue, relevant serial_port_out operations should be serial_port_out() > placed in a critical section, and UART_RX data should only be read > when the UART_LSR DR bit is set. The last one is made in the common code, are you sure that all supported UARTs will be okay with such a change? > Panic message: Please, read this https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages and act accordingly. > [ 0.442336] Oops - unknown exception [#1] > [ 0.442337] Modules linked in: > [ 0.442339] CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.12.13-00102-gf1f43e345877 #1 Is it still reproducible on v6.14 (and soon v6.15-rc1)? > [ 0.442342] Tainted: [W]=WARN > [ 0.442343] epc : dw8250_serial_in32+0x1e/0x4a > [ 0.442351] ra : serial8250_do_startup+0x2c8/0x88e > [ 0.442354] epc : ffffffff8064efca ra : ffffffff8064af28 sp : ffff8f8000103990 > [ 0.442355] gp : ffffffff815bad28 tp : ffffaf807e36d400 t0 : ffffaf80804cf080 > [ 0.442356] t1 : 0000000000000001 t2 : 0000000000000000 s0 : ffff8f80001039a0 > [ 0.442358] s1 : ffffffff81626fc0 a0 : ffffffff81626fc0 a1 : 0000000000000000 > [ 0.442359] a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff81626fc0 > [ 0.442360] a5 : ffff8f800012d900 a6 : 000000000000000f a7 : 000000000fc648c1 > [ 0.442361] s2 : 0000000000000000 s3 : 0000000200000022 s4 : 0000000000000000 > [ 0.442362] s5 : ffffffff81626fc0 s6 : ffffaf8085227000 s7 : ffffffff81073c58 > [ 0.442363] s8 : 0000000000500000 s9 : ffffaf80851a5a60 s10: ffffaf80851a5a60 > [ 0.442365] s11: ffffffff80e85980 t3 : ffffaf807e324600 t4 : 0000000000000002 > [ 0.442365] t5 : 0000000000000003 t6 : ffffaf80804cf072 > [ 0.442366] status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000013 > [ 0.442368] [<ffffffff8064efca>] dw8250_serial_in32+0x1e/0x4a > [ 0.442371] [<ffffffff8064af28>] serial8250_do_startup+0x2c8/0x88e > [ 0.442373] [<ffffffff8064b514>] serial8250_startup+0x26/0x2e > [ 0.442375] [<ffffffff806428a2>] uart_startup+0x13a/0x308 > [ 0.442377] [<ffffffff80642aa4>] uart_port_activate+0x34/0x50 > [ 0.442378] [<ffffffff8062ab6a>] tty_port_open+0xb4/0x110 > [ 0.442383] [<ffffffff8063f548>] uart_open+0x22/0x36 > [ 0.442389] [<ffffffff806234b4>] tty_open+0x1be/0x5e6 > [ 0.442396] [<ffffffff802f2d52>] chrdev_open+0x10a/0x2a8 > [ 0.442400] [<ffffffff802e7ab6>] do_dentry_open+0xf6/0x34e > [ 0.442405] [<ffffffff802e9456>] vfs_open+0x2a/0xb4 > [ 0.442408] [<ffffffff80300124>] path_openat+0x676/0xf36 > [ 0.442410] [<ffffffff80300a58>] do_filp_open+0x74/0xfa > [ 0.442412] [<ffffffff802e9900>] file_open_name+0x84/0x144 > [ 0.442414] [<ffffffff802e99f6>] filp_open+0x36/0x54 > [ 0.442416] [<ffffffff80a01232>] console_on_rootfs+0x26/0x70 > [ 0.442420] [<ffffffff80a0154e>] kernel_init_freeable+0x2d2/0x30e > [ 0.442422] [<ffffffff8099c730>] kernel_init+0x2a/0x15e > [ 0.442427] [<ffffffff809a7666>] ret_from_fork+0xe/0x1c > [ 0.442430] Code: e022 e406 0800 4683 0c15 691c 872a 96bb 00d5 97b6 (439c) 851b > [ 0.442432] ---[ end trace 0000000000000000 ]--- > [ 0.442434] Kernel panic - not syncing: Fatal exception in interrupt > [ 0.442435] SMP: stopping secondary CPUs > [ 0.451111] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fixes tag? Cc to stable@? ... > /* > * Now, initialize the UART > */ + Blank line. > + uart_port_lock_irqsave(port, &flags); > serial_port_out(port, UART_LCR, UART_LCR_WLEN8); > > - uart_port_lock_irqsave(port, &flags); > if (up->port.flags & UPF_FOURPORT) { > if (!up->port.irq) > up->port.mctrl |= TIOCM_OUT1; -- With Best Regards, Andy Shevchenko
Hi Andy, On Thu, Apr 3, 2025 at 7:36 PM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > > On Thu, Apr 03, 2025 at 05:03:36PM +0800, Yunhui Cui wrote: > > When the PSLVERR_RESP_EN parameter is set to 1, the device generates > > an error response if an attempt is made to read an empty RBR (Receive > > Buffer Register) while the FIFO is enabled. > > > > In serial8250_do_startup, calling serial_port_out(port, UART_LCR, > > serial8250_do_startup() > > > UART_LCR_WLEN8) triggers dw8250_check_lcr(), which invokes > > dw8250_force_idle() and serial8250_clear_and_reinit_fifos(). The latter > > function enables the FIFO via serial_out(p, UART_FCR, p->fcr). > > Execution proceeds to the dont_test_tx_en label: > > ... > > serial_port_in(port, UART_RX); > > This satisfies the PSLVERR trigger condition. > > > > Because another CPU(e.g., using printk) is accessing the UART (UART > > printk() Okay. > > > is busy), the current CPU fails the check (value & ~UART_LCR_SPAR) == > > (lcr & ~UART_LCR_SPAR), causing it to enter dw8250_force_idle(). > > > > To resolve this issue, relevant serial_port_out operations should be > > serial_port_out() Okay. > > > placed in a critical section, and UART_RX data should only be read > > when the UART_LSR DR bit is set. > > The last one is made in the common code, are you sure that all supported UARTs > will be okay with such a change? This change enhances code robustness without being intrusive. > > > Panic message: > > Please, read this > https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages > and act accordingly. Okay, I'll update the next version to follow the guideline: 'Avoid directly copying full dmesg output (e.g., timestamps, registers, and stack dumps); instead, extract the critical call chain.' > > > [ 0.442336] Oops - unknown exception [#1] > > [ 0.442337] Modules linked in: > > [ 0.442339] CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.12.13-00102-gf1f43e345877 #1 > > Is it still reproducible on v6.14 (and soon v6.15-rc1)? This is not a consistently reproducible issue, but even without this fix, I believe the issue still exists. > > > [ 0.442342] Tainted: [W]=WARN > > [ 0.442343] epc : dw8250_serial_in32+0x1e/0x4a > > [ 0.442351] ra : serial8250_do_startup+0x2c8/0x88e > > [ 0.442354] epc : ffffffff8064efca ra : ffffffff8064af28 sp : ffff8f8000103990 > > [ 0.442355] gp : ffffffff815bad28 tp : ffffaf807e36d400 t0 : ffffaf80804cf080 > > [ 0.442356] t1 : 0000000000000001 t2 : 0000000000000000 s0 : ffff8f80001039a0 > > [ 0.442358] s1 : ffffffff81626fc0 a0 : ffffffff81626fc0 a1 : 0000000000000000 > > [ 0.442359] a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff81626fc0 > > [ 0.442360] a5 : ffff8f800012d900 a6 : 000000000000000f a7 : 000000000fc648c1 > > [ 0.442361] s2 : 0000000000000000 s3 : 0000000200000022 s4 : 0000000000000000 > > [ 0.442362] s5 : ffffffff81626fc0 s6 : ffffaf8085227000 s7 : ffffffff81073c58 > > [ 0.442363] s8 : 0000000000500000 s9 : ffffaf80851a5a60 s10: ffffaf80851a5a60 > > [ 0.442365] s11: ffffffff80e85980 t3 : ffffaf807e324600 t4 : 0000000000000002 > > [ 0.442365] t5 : 0000000000000003 t6 : ffffaf80804cf072 > > [ 0.442366] status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000013 > > [ 0.442368] [<ffffffff8064efca>] dw8250_serial_in32+0x1e/0x4a > > [ 0.442371] [<ffffffff8064af28>] serial8250_do_startup+0x2c8/0x88e > > [ 0.442373] [<ffffffff8064b514>] serial8250_startup+0x26/0x2e > > [ 0.442375] [<ffffffff806428a2>] uart_startup+0x13a/0x308 > > [ 0.442377] [<ffffffff80642aa4>] uart_port_activate+0x34/0x50 > > [ 0.442378] [<ffffffff8062ab6a>] tty_port_open+0xb4/0x110 > > [ 0.442383] [<ffffffff8063f548>] uart_open+0x22/0x36 > > [ 0.442389] [<ffffffff806234b4>] tty_open+0x1be/0x5e6 > > [ 0.442396] [<ffffffff802f2d52>] chrdev_open+0x10a/0x2a8 > > [ 0.442400] [<ffffffff802e7ab6>] do_dentry_open+0xf6/0x34e > > [ 0.442405] [<ffffffff802e9456>] vfs_open+0x2a/0xb4 > > [ 0.442408] [<ffffffff80300124>] path_openat+0x676/0xf36 > > [ 0.442410] [<ffffffff80300a58>] do_filp_open+0x74/0xfa > > [ 0.442412] [<ffffffff802e9900>] file_open_name+0x84/0x144 > > [ 0.442414] [<ffffffff802e99f6>] filp_open+0x36/0x54 > > [ 0.442416] [<ffffffff80a01232>] console_on_rootfs+0x26/0x70 > > [ 0.442420] [<ffffffff80a0154e>] kernel_init_freeable+0x2d2/0x30e > > [ 0.442422] [<ffffffff8099c730>] kernel_init+0x2a/0x15e > > [ 0.442427] [<ffffffff809a7666>] ret_from_fork+0xe/0x1c > > [ 0.442430] Code: e022 e406 0800 4683 0c15 691c 872a 96bb 00d5 97b6 (439c) 851b > > [ 0.442432] ---[ end trace 0000000000000000 ]--- > > [ 0.442434] Kernel panic - not syncing: Fatal exception in interrupt > > [ 0.442435] SMP: stopping secondary CPUs > > [ 0.451111] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- > > Fixes tag? > Cc to stable@? Okay. > > ... > > > /* > > * Now, initialize the UART > > */ > > + Blank line. Okay. > > > + uart_port_lock_irqsave(port, &flags); > > serial_port_out(port, UART_LCR, UART_LCR_WLEN8); > > > > - uart_port_lock_irqsave(port, &flags); > > if (up->port.flags & UPF_FOURPORT) { > > if (!up->port.irq) > > up->port.mctrl |= TIOCM_OUT1; > > -- > With Best Regards, > Andy Shevchenko > > Thanks, Yunhui
On Fri, Apr 04, 2025 at 10:44:09AM +0800, yunhui cui wrote: > On Thu, Apr 3, 2025 at 7:36 PM Andy Shevchenko > <andriy.shevchenko@linux.intel.com> wrote: > > On Thu, Apr 03, 2025 at 05:03:36PM +0800, Yunhui Cui wrote: ... > > > To resolve this issue, relevant serial_port_out operations should be > > > > serial_port_out() > > Okay. > > > > > > placed in a critical section, and UART_RX data should only be read > > > when the UART_LSR DR bit is set. > > > > The last one is made in the common code, are you sure that all supported UARTs > > will be okay with such a change? > > This change enhances code robustness without being intrusive. It is intrusive as it touches the core part affecting basically _all_ of the 8250-based drivers. Yes, it's small, but still it needs to be done carefully with commit message pointing out to the other 8250 datasheets to show that this is _not_ DW specific change. ... > > > Panic message: > > > > Please, read this > > https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages > > and act accordingly. > > Okay, I'll update the next version to follow the guideline: 'Avoid > directly copying full dmesg output (e.g., timestamps, registers, and > stack dumps); instead, extract the critical call chain.' and make it short, e.g. ~3-5 lines only. -- With Best Regards, Andy Shevchenko
On Thu, Apr 03, 2025 at 02:36:16PM +0300, Andy Shevchenko wrote: > On Thu, Apr 03, 2025 at 05:03:36PM +0800, Yunhui Cui wrote: A couple of more questions here: 1) what is the DW IP version and where did you get the PSLVERR_RESP_EN parameter from? 2) what is the setting of the UART_16550_COMPATIBLE parameter? > > When the PSLVERR_RESP_EN parameter is set to 1, the device generates > > an error response if an attempt is made to read an empty RBR (Receive > > Buffer Register) while the FIFO is enabled. > > > > In serial8250_do_startup, calling serial_port_out(port, UART_LCR, > > serial8250_do_startup() > > > UART_LCR_WLEN8) triggers dw8250_check_lcr(), which invokes > > dw8250_force_idle() and serial8250_clear_and_reinit_fifos(). The latter > > function enables the FIFO via serial_out(p, UART_FCR, p->fcr). > > Execution proceeds to the dont_test_tx_en label: > > ... > > serial_port_in(port, UART_RX); > > This satisfies the PSLVERR trigger condition. > > > > Because another CPU(e.g., using printk) is accessing the UART (UART > > printk() > > > is busy), the current CPU fails the check (value & ~UART_LCR_SPAR) == > > (lcr & ~UART_LCR_SPAR), causing it to enter dw8250_force_idle(). > > > > To resolve this issue, relevant serial_port_out operations should be > > serial_port_out() > > > placed in a critical section, and UART_RX data should only be read > > when the UART_LSR DR bit is set. > > The last one is made in the common code, are you sure that all supported UARTs > will be okay with such a change? > > > Panic message: > > Please, read this > https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages > and act accordingly. > > > [ 0.442336] Oops - unknown exception [#1] > > [ 0.442337] Modules linked in: > > [ 0.442339] CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.12.13-00102-gf1f43e345877 #1 > > Is it still reproducible on v6.14 (and soon v6.15-rc1)? > > > [ 0.442342] Tainted: [W]=WARN > > [ 0.442343] epc : dw8250_serial_in32+0x1e/0x4a > > [ 0.442351] ra : serial8250_do_startup+0x2c8/0x88e > > [ 0.442354] epc : ffffffff8064efca ra : ffffffff8064af28 sp : ffff8f8000103990 > > [ 0.442355] gp : ffffffff815bad28 tp : ffffaf807e36d400 t0 : ffffaf80804cf080 > > [ 0.442356] t1 : 0000000000000001 t2 : 0000000000000000 s0 : ffff8f80001039a0 > > [ 0.442358] s1 : ffffffff81626fc0 a0 : ffffffff81626fc0 a1 : 0000000000000000 > > [ 0.442359] a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff81626fc0 > > [ 0.442360] a5 : ffff8f800012d900 a6 : 000000000000000f a7 : 000000000fc648c1 > > [ 0.442361] s2 : 0000000000000000 s3 : 0000000200000022 s4 : 0000000000000000 > > [ 0.442362] s5 : ffffffff81626fc0 s6 : ffffaf8085227000 s7 : ffffffff81073c58 > > [ 0.442363] s8 : 0000000000500000 s9 : ffffaf80851a5a60 s10: ffffaf80851a5a60 > > [ 0.442365] s11: ffffffff80e85980 t3 : ffffaf807e324600 t4 : 0000000000000002 > > [ 0.442365] t5 : 0000000000000003 t6 : ffffaf80804cf072 > > [ 0.442366] status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000013 > > [ 0.442368] [<ffffffff8064efca>] dw8250_serial_in32+0x1e/0x4a > > [ 0.442371] [<ffffffff8064af28>] serial8250_do_startup+0x2c8/0x88e > > [ 0.442373] [<ffffffff8064b514>] serial8250_startup+0x26/0x2e > > [ 0.442375] [<ffffffff806428a2>] uart_startup+0x13a/0x308 > > [ 0.442377] [<ffffffff80642aa4>] uart_port_activate+0x34/0x50 > > [ 0.442378] [<ffffffff8062ab6a>] tty_port_open+0xb4/0x110 > > [ 0.442383] [<ffffffff8063f548>] uart_open+0x22/0x36 > > [ 0.442389] [<ffffffff806234b4>] tty_open+0x1be/0x5e6 > > [ 0.442396] [<ffffffff802f2d52>] chrdev_open+0x10a/0x2a8 > > [ 0.442400] [<ffffffff802e7ab6>] do_dentry_open+0xf6/0x34e > > [ 0.442405] [<ffffffff802e9456>] vfs_open+0x2a/0xb4 > > [ 0.442408] [<ffffffff80300124>] path_openat+0x676/0xf36 > > [ 0.442410] [<ffffffff80300a58>] do_filp_open+0x74/0xfa > > [ 0.442412] [<ffffffff802e9900>] file_open_name+0x84/0x144 > > [ 0.442414] [<ffffffff802e99f6>] filp_open+0x36/0x54 > > [ 0.442416] [<ffffffff80a01232>] console_on_rootfs+0x26/0x70 > > [ 0.442420] [<ffffffff80a0154e>] kernel_init_freeable+0x2d2/0x30e > > [ 0.442422] [<ffffffff8099c730>] kernel_init+0x2a/0x15e > > [ 0.442427] [<ffffffff809a7666>] ret_from_fork+0xe/0x1c > > [ 0.442430] Code: e022 e406 0800 4683 0c15 691c 872a 96bb 00d5 97b6 (439c) 851b > > [ 0.442432] ---[ end trace 0000000000000000 ]--- > > [ 0.442434] Kernel panic - not syncing: Fatal exception in interrupt > > [ 0.442435] SMP: stopping secondary CPUs > > [ 0.451111] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- > > Fixes tag? > Cc to stable@? > > ... > > > /* > > * Now, initialize the UART > > */ > > + Blank line. > > > + uart_port_lock_irqsave(port, &flags); > > serial_port_out(port, UART_LCR, UART_LCR_WLEN8); > > > > - uart_port_lock_irqsave(port, &flags); > > if (up->port.flags & UPF_FOURPORT) { > > if (!up->port.irq) > > up->port.mctrl |= TIOCM_OUT1; -- With Best Regards, Andy Shevchenko
Hi Andy, On Thu, Apr 3, 2025 at 7:50 PM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > > On Thu, Apr 03, 2025 at 02:36:16PM +0300, Andy Shevchenko wrote: > > On Thu, Apr 03, 2025 at 05:03:36PM +0800, Yunhui Cui wrote: > > A couple of more questions here: > 1) what is the DW IP version and where did you get the PSLVERR_RESP_EN > parameter from? > 2) what is the setting of the UART_16550_COMPATIBLE parameter? 1): Refer to: https://www.synopsys.com/dw/ipdir.php?c=DW_apb_uart 2): data->uart_16550_compatible == 0 > > > > When the PSLVERR_RESP_EN parameter is set to 1, the device generates > > > an error response if an attempt is made to read an empty RBR (Receive > > > Buffer Register) while the FIFO is enabled. > > > > > > In serial8250_do_startup, calling serial_port_out(port, UART_LCR, > > > > serial8250_do_startup() > > > > > UART_LCR_WLEN8) triggers dw8250_check_lcr(), which invokes > > > dw8250_force_idle() and serial8250_clear_and_reinit_fifos(). The latter > > > function enables the FIFO via serial_out(p, UART_FCR, p->fcr). > > > Execution proceeds to the dont_test_tx_en label: > > > ... > > > serial_port_in(port, UART_RX); > > > This satisfies the PSLVERR trigger condition. > > > > > > Because another CPU(e.g., using printk) is accessing the UART (UART > > > > printk() > > > > > is busy), the current CPU fails the check (value & ~UART_LCR_SPAR) == > > > (lcr & ~UART_LCR_SPAR), causing it to enter dw8250_force_idle(). > > > > > > To resolve this issue, relevant serial_port_out operations should be > > > > serial_port_out() > > > > > placed in a critical section, and UART_RX data should only be read > > > when the UART_LSR DR bit is set. > > > > The last one is made in the common code, are you sure that all supported UARTs > > will be okay with such a change? > > > > > Panic message: > > > > Please, read this > > https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages > > and act accordingly. > > > > > [ 0.442336] Oops - unknown exception [#1] > > > [ 0.442337] Modules linked in: > > > [ 0.442339] CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.12.13-00102-gf1f43e345877 #1 > > > > Is it still reproducible on v6.14 (and soon v6.15-rc1)? > > > > > [ 0.442342] Tainted: [W]=WARN > > > [ 0.442343] epc : dw8250_serial_in32+0x1e/0x4a > > > [ 0.442351] ra : serial8250_do_startup+0x2c8/0x88e > > > [ 0.442354] epc : ffffffff8064efca ra : ffffffff8064af28 sp : ffff8f8000103990 > > > [ 0.442355] gp : ffffffff815bad28 tp : ffffaf807e36d400 t0 : ffffaf80804cf080 > > > [ 0.442356] t1 : 0000000000000001 t2 : 0000000000000000 s0 : ffff8f80001039a0 > > > [ 0.442358] s1 : ffffffff81626fc0 a0 : ffffffff81626fc0 a1 : 0000000000000000 > > > [ 0.442359] a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffff81626fc0 > > > [ 0.442360] a5 : ffff8f800012d900 a6 : 000000000000000f a7 : 000000000fc648c1 > > > [ 0.442361] s2 : 0000000000000000 s3 : 0000000200000022 s4 : 0000000000000000 > > > [ 0.442362] s5 : ffffffff81626fc0 s6 : ffffaf8085227000 s7 : ffffffff81073c58 > > > [ 0.442363] s8 : 0000000000500000 s9 : ffffaf80851a5a60 s10: ffffaf80851a5a60 > > > [ 0.442365] s11: ffffffff80e85980 t3 : ffffaf807e324600 t4 : 0000000000000002 > > > [ 0.442365] t5 : 0000000000000003 t6 : ffffaf80804cf072 > > > [ 0.442366] status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000013 > > > [ 0.442368] [<ffffffff8064efca>] dw8250_serial_in32+0x1e/0x4a > > > [ 0.442371] [<ffffffff8064af28>] serial8250_do_startup+0x2c8/0x88e > > > [ 0.442373] [<ffffffff8064b514>] serial8250_startup+0x26/0x2e > > > [ 0.442375] [<ffffffff806428a2>] uart_startup+0x13a/0x308 > > > [ 0.442377] [<ffffffff80642aa4>] uart_port_activate+0x34/0x50 > > > [ 0.442378] [<ffffffff8062ab6a>] tty_port_open+0xb4/0x110 > > > [ 0.442383] [<ffffffff8063f548>] uart_open+0x22/0x36 > > > [ 0.442389] [<ffffffff806234b4>] tty_open+0x1be/0x5e6 > > > [ 0.442396] [<ffffffff802f2d52>] chrdev_open+0x10a/0x2a8 > > > [ 0.442400] [<ffffffff802e7ab6>] do_dentry_open+0xf6/0x34e > > > [ 0.442405] [<ffffffff802e9456>] vfs_open+0x2a/0xb4 > > > [ 0.442408] [<ffffffff80300124>] path_openat+0x676/0xf36 > > > [ 0.442410] [<ffffffff80300a58>] do_filp_open+0x74/0xfa > > > [ 0.442412] [<ffffffff802e9900>] file_open_name+0x84/0x144 > > > [ 0.442414] [<ffffffff802e99f6>] filp_open+0x36/0x54 > > > [ 0.442416] [<ffffffff80a01232>] console_on_rootfs+0x26/0x70 > > > [ 0.442420] [<ffffffff80a0154e>] kernel_init_freeable+0x2d2/0x30e > > > [ 0.442422] [<ffffffff8099c730>] kernel_init+0x2a/0x15e > > > [ 0.442427] [<ffffffff809a7666>] ret_from_fork+0xe/0x1c > > > [ 0.442430] Code: e022 e406 0800 4683 0c15 691c 872a 96bb 00d5 97b6 (439c) 851b > > > [ 0.442432] ---[ end trace 0000000000000000 ]--- > > > [ 0.442434] Kernel panic - not syncing: Fatal exception in interrupt > > > [ 0.442435] SMP: stopping secondary CPUs > > > [ 0.451111] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- > > > > Fixes tag? > > Cc to stable@? > > > > ... > > > > > /* > > > * Now, initialize the UART > > > */ > > > > + Blank line. > > > > > + uart_port_lock_irqsave(port, &flags); > > > serial_port_out(port, UART_LCR, UART_LCR_WLEN8); > > > > > > - uart_port_lock_irqsave(port, &flags); > > > if (up->port.flags & UPF_FOURPORT) { > > > if (!up->port.irq) > > > up->port.mctrl |= TIOCM_OUT1; > > -- > With Best Regards, > Andy Shevchenko > > Thanks, Yunhui
On Fri, Apr 04, 2025 at 10:31:25AM +0800, yunhui cui wrote: > On Thu, Apr 3, 2025 at 7:50 PM Andy Shevchenko > <andriy.shevchenko@linux.intel.com> wrote: > > On Thu, Apr 03, 2025 at 02:36:16PM +0300, Andy Shevchenko wrote: > > > On Thu, Apr 03, 2025 at 05:03:36PM +0800, Yunhui Cui wrote: > > > > A couple of more questions here: > > 1) what is the DW IP version and where did you get the PSLVERR_RESP_EN > > parameter from? > > 2) what is the setting of the UART_16550_COMPATIBLE parameter? > > 1): Refer to: https://www.synopsys.com/dw/ipdir.php?c=DW_apb_uart I don't understand this. I asked about version of the IP, I have datasheets already for many of them, I can't find PSLVERR_RESP_EN there, that's why the question. > 2): data->uart_16550_compatible == 0 Thanks! -- With Best Regards, Andy Shevchenko
© 2016 - 2025 Red Hat, Inc.