[PATCH] serial: sc16is7xx: address RX timeout interrupt errata

Daniel Mack posted 1 patch 2 years, 1 month ago
There is a newer version of this series
drivers/tty/serial/sc16is7xx.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
[PATCH] serial: sc16is7xx: address RX timeout interrupt errata
Posted by Daniel Mack 2 years, 1 month ago
This devices has a silicon bug that makes it report a timeout interrupt
but no data in FIFO.

The datasheet states the following in the errata section 18.1.4:

  "If the host reads the receive FIFO at the at the same time as a
  time-out interrupt condition happens, the host might read 0xCC
  (time-out) in the Interrupt Indication Register (IIR), but bit 0
  of the Line Status Register (LSR) is not set (means there is not
  data in the receive FIFO)."

When this happens, the loop in sc16is7xx_irq() will run forever,
which effectively blocks the i2c bus and breaks the functionality
of the UART.

From the information above, it is assumed that when the bug is
triggered, the FIFO does in fact have payload in its buffer, but the
fill level reporting is off-by-one. Hence this patch fixes the issue
by reading one byte from the FIFO when that condition is detected.

This clears the interrupt and hence breaks the polling loop.

Signed-off-by: Daniel Mack <daniel@zonque.org>
Co-Developed-by: Maxim Popov <maxim.snafu@gmail.com>
Cc: stable@vger.kernel.org
---
 drivers/tty/serial/sc16is7xx.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index 289ca7d4e566..76f76e510ed1 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -765,6 +765,18 @@ static bool sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
 		case SC16IS7XX_IIR_RTOI_SRC:
 		case SC16IS7XX_IIR_XOFFI_SRC:
 			rxlen = sc16is7xx_port_read(port, SC16IS7XX_RXLVL_REG);
+
+			/*
+			 * There is a silicon bug that makes the chip report a
+			 * time-out interrupt but no data in the FIFO. This is
+			 * described in errata section 18.1.4.
+			 *
+			 * When this happens, read one byte from the FIFO to
+			 * clear the interrupt.
+			 */
+			if (iir == SC16IS7XX_IIR_RTOI_SRC && !rxlen)
+				rxlen = 1;
+
 			if (rxlen)
 				sc16is7xx_handle_rx(port, rxlen, iir);
 			break;
-- 
2.41.0
Re: [PATCH] serial: sc16is7xx: address RX timeout interrupt errata
Posted by Hugo Villeneuve 2 years, 1 month ago
On Tue, 14 Nov 2023 08:49:04 +0100
Daniel Mack <daniel@zonque.org> wrote:

Hi Daniel,

> This devices has a silicon bug that makes it report a timeout interrupt

devices -> device

> but no data in FIFO.
> 
> The datasheet states the following in the errata section 18.1.4:
> 
>   "If the host reads the receive FIFO at the at the same time as a

"at the at the" -> "at the"

Note: I know this error is part of the errata in NXP datasheet.


>   time-out interrupt condition happens, the host might read 0xCC
>   (time-out) in the Interrupt Indication Register (IIR), but bit 0
>   of the Line Status Register (LSR) is not set (means there is not
>   data in the receive FIFO)."
> 
> When this happens, the loop in sc16is7xx_irq() will run forever,
> which effectively blocks the i2c bus and breaks the functionality

i2c -> i2c/spi

Hugo.


> of the UART.
> 
> From the information above, it is assumed that when the bug is
> triggered, the FIFO does in fact have payload in its buffer, but the
> fill level reporting is off-by-one. Hence this patch fixes the issue
> by reading one byte from the FIFO when that condition is detected.
> 
> This clears the interrupt and hence breaks the polling loop.
> 
> Signed-off-by: Daniel Mack <daniel@zonque.org>
> Co-Developed-by: Maxim Popov <maxim.snafu@gmail.com>
> Cc: stable@vger.kernel.org
> ---
>  drivers/tty/serial/sc16is7xx.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
> index 289ca7d4e566..76f76e510ed1 100644
> --- a/drivers/tty/serial/sc16is7xx.c
> +++ b/drivers/tty/serial/sc16is7xx.c
> @@ -765,6 +765,18 @@ static bool sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
>  		case SC16IS7XX_IIR_RTOI_SRC:
>  		case SC16IS7XX_IIR_XOFFI_SRC:
>  			rxlen = sc16is7xx_port_read(port, SC16IS7XX_RXLVL_REG);
> +
> +			/*
> +			 * There is a silicon bug that makes the chip report a
> +			 * time-out interrupt but no data in the FIFO. This is
> +			 * described in errata section 18.1.4.
> +			 *
> +			 * When this happens, read one byte from the FIFO to
> +			 * clear the interrupt.
> +			 */
> +			if (iir == SC16IS7XX_IIR_RTOI_SRC && !rxlen)
> +				rxlen = 1;
> +
>  			if (rxlen)
>  				sc16is7xx_handle_rx(port, rxlen, iir);
>  			break;
> -- 
> 2.41.0
>
Re: [PATCH] serial: sc16is7xx: address RX timeout interrupt errata
Posted by Hugo Villeneuve 2 years, 1 month ago
On Tue, 14 Nov 2023 08:49:04 +0100
Daniel Mack <daniel@zonque.org> wrote:

Hi,

> This devices has a silicon bug that makes it report a timeout interrupt
> but no data in FIFO.
> 
> The datasheet states the following in the errata section 18.1.4:
> 
>   "If the host reads the receive FIFO at the at the same time as a
>   time-out interrupt condition happens, the host might read 0xCC
>   (time-out) in the Interrupt Indication Register (IIR), but bit 0
>   of the Line Status Register (LSR) is not set (means there is not
>   data in the receive FIFO)."
> 
> When this happens, the loop in sc16is7xx_irq() will run forever,
> which effectively blocks the i2c bus and breaks the functionality
> of the UART.
> 
> From the information above, it is assumed that when the bug is
> triggered, the FIFO does in fact have payload in its buffer, but the
> fill level reporting is off-by-one. Hence this patch fixes the issue
> by reading one byte from the FIFO when that condition is detected.

From what I understand from the errata, when the problem occurs, it
affects bit 0 of the LSR register. I see no mention that it
also affects the RX FIFO level register (SC16IS7XX_RXLVL_REG)?

LSR[0] would be checked only if we were using polled mode of
operation, but we always use the interrupt mode (IRQ), and therefore I
would say that this errata doesn't apply to this driver, and the
patch is not necessary...

Hugo.


> This clears the interrupt and hence breaks the polling loop.
> 
> Signed-off-by: Daniel Mack <daniel@zonque.org>
> Co-Developed-by: Maxim Popov <maxim.snafu@gmail.com>
> Cc: stable@vger.kernel.org
> ---
>  drivers/tty/serial/sc16is7xx.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
> index 289ca7d4e566..76f76e510ed1 100644
> --- a/drivers/tty/serial/sc16is7xx.c
> +++ b/drivers/tty/serial/sc16is7xx.c
> @@ -765,6 +765,18 @@ static bool sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
>  		case SC16IS7XX_IIR_RTOI_SRC:
>  		case SC16IS7XX_IIR_XOFFI_SRC:
>  			rxlen = sc16is7xx_port_read(port, SC16IS7XX_RXLVL_REG);
> +
> +			/*
> +			 * There is a silicon bug that makes the chip report a
> +			 * time-out interrupt but no data in the FIFO. This is
> +			 * described in errata section 18.1.4.
> +			 *
> +			 * When this happens, read one byte from the FIFO to
> +			 * clear the interrupt.
> +			 */
> +			if (iir == SC16IS7XX_IIR_RTOI_SRC && !rxlen)
> +				rxlen = 1;
> +
>  			if (rxlen)
>  				sc16is7xx_handle_rx(port, rxlen, iir);
>  			break;
> -- 
> 2.41.0
>
Re: [PATCH] serial: sc16is7xx: address RX timeout interrupt errata
Posted by Daniel Mack 2 years, 1 month ago
Hi Hugo,

On 11/14/23 16:20, Hugo Villeneuve wrote:
> On Tue, 14 Nov 2023 08:49:04 +0100
> Daniel Mack <daniel@zonque.org> wrote:
>> This devices has a silicon bug that makes it report a timeout interrupt
>> but no data in FIFO.
>>
>> The datasheet states the following in the errata section 18.1.4:
>>
>>   "If the host reads the receive FIFO at the at the same time as a
>>   time-out interrupt condition happens, the host might read 0xCC
>>   (time-out) in the Interrupt Indication Register (IIR), but bit 0
>>   of the Line Status Register (LSR) is not set (means there is not
>>   data in the receive FIFO)."
>>
>> When this happens, the loop in sc16is7xx_irq() will run forever,
>> which effectively blocks the i2c bus and breaks the functionality
>> of the UART.
>>
>> From the information above, it is assumed that when the bug is
>> triggered, the FIFO does in fact have payload in its buffer, but the
>> fill level reporting is off-by-one. Hence this patch fixes the issue
>> by reading one byte from the FIFO when that condition is detected.
> 
> From what I understand from the errata, when the problem occurs, it
> affects bit 0 of the LSR register. I see no mention that it
> also affects the RX FIFO level register (SC16IS7XX_RXLVL_REG)?

True, the errata doesn't explicitly mention that, but tests have shown
that the RXLVL register is equally affected.

> LSR[0] would be checked only if we were using polled mode of
> operation, but we always use the interrupt mode (IRQ), and therefore I
> would say that this errata doesn't apply to this driver, and the
> patch is not necessary...

Well, it is. We have seen this bug in the wild and extensively
stress-tested the patch on dozens of boards for many days. Without this
patch, kernels on affected systems would consume a lot of CPU cycles in
the interrupt threads and effectively render the I2C bus unusable due to
the busy polling.

With this patch applied, we were no longer able to reproduce the issue.


Thanks,
Daniel
Re: [PATCH] serial: sc16is7xx: address RX timeout interrupt errata
Posted by Hugo Villeneuve 2 years, 1 month ago
On Tue, 14 Nov 2023 16:55:33 +0100
Daniel Mack <daniel@zonque.org> wrote:

> Hi Hugo,
> 
> On 11/14/23 16:20, Hugo Villeneuve wrote:
> > On Tue, 14 Nov 2023 08:49:04 +0100
> > Daniel Mack <daniel@zonque.org> wrote:
> >> This devices has a silicon bug that makes it report a timeout interrupt
> >> but no data in FIFO.
> >>
> >> The datasheet states the following in the errata section 18.1.4:
> >>
> >>   "If the host reads the receive FIFO at the at the same time as a
> >>   time-out interrupt condition happens, the host might read 0xCC
> >>   (time-out) in the Interrupt Indication Register (IIR), but bit 0
> >>   of the Line Status Register (LSR) is not set (means there is not
> >>   data in the receive FIFO)."
> >>
> >> When this happens, the loop in sc16is7xx_irq() will run forever,
> >> which effectively blocks the i2c bus and breaks the functionality
> >> of the UART.
> >>
> >> From the information above, it is assumed that when the bug is
> >> triggered, the FIFO does in fact have payload in its buffer, but the
> >> fill level reporting is off-by-one. Hence this patch fixes the issue
> >> by reading one byte from the FIFO when that condition is detected.
> > 
> > From what I understand from the errata, when the problem occurs, it
> > affects bit 0 of the LSR register. I see no mention that it
> > also affects the RX FIFO level register (SC16IS7XX_RXLVL_REG)?
> 
> True, the errata doesn't explicitly mention that, but tests have shown
> that the RXLVL register is equally affected.

Hi Daniel,
ok, now it makes more sense if RXLVL is affected.

Have you contacted NXP about this? If not, I suggest you do open a
support case and let them know about your findings, because it is very
strange that it is not mentioned in the errata. And doing so may led to
an updated and better documentation on their side about this errata.

And incorporate this new info into your commit log for an eventual
patch V2.

Thank you,
Hugo.


> > LSR[0] would be checked only if we were using polled mode of
> > operation, but we always use the interrupt mode (IRQ), and therefore I
> > would say that this errata doesn't apply to this driver, and the
> > patch is not necessary...
> 
> Well, it is. We have seen this bug in the wild and extensively
> stress-tested the patch on dozens of boards for many days. Without this
> patch, kernels on affected systems would consume a lot of CPU cycles in
> the interrupt threads and effectively render the I2C bus unusable due to
> the busy polling.
> 
> With this patch applied, we were no longer able to reproduce the issue.
> 
> 
> Thanks,
> Daniel
> 
>
Re: [PATCH] serial: sc16is7xx: address RX timeout interrupt errata
Posted by Daniel Mack 2 years, 1 month ago
On 11/15/23 15:47, Hugo Villeneuve wrote:
> On Tue, 14 Nov 2023 16:55:33 +0100
> Daniel Mack <daniel@zonque.org> wrote:
> 
>> Hi Hugo,
>>
>> On 11/14/23 16:20, Hugo Villeneuve wrote:
>>> On Tue, 14 Nov 2023 08:49:04 +0100
>>> Daniel Mack <daniel@zonque.org> wrote:
>>>> This devices has a silicon bug that makes it report a timeout interrupt
>>>> but no data in FIFO.
>>>>
>>>> The datasheet states the following in the errata section 18.1.4:
>>>>
>>>>   "If the host reads the receive FIFO at the at the same time as a
>>>>   time-out interrupt condition happens, the host might read 0xCC
>>>>   (time-out) in the Interrupt Indication Register (IIR), but bit 0
>>>>   of the Line Status Register (LSR) is not set (means there is not
>>>>   data in the receive FIFO)."
>>>>
>>>> When this happens, the loop in sc16is7xx_irq() will run forever,
>>>> which effectively blocks the i2c bus and breaks the functionality
>>>> of the UART.
>>>>
>>>> From the information above, it is assumed that when the bug is
>>>> triggered, the FIFO does in fact have payload in its buffer, but the
>>>> fill level reporting is off-by-one. Hence this patch fixes the issue
>>>> by reading one byte from the FIFO when that condition is detected.
>>>
>>> From what I understand from the errata, when the problem occurs, it
>>> affects bit 0 of the LSR register. I see no mention that it
>>> also affects the RX FIFO level register (SC16IS7XX_RXLVL_REG)?
>>
>> True, the errata doesn't explicitly mention that, but tests have shown
>> that the RXLVL register is equally affected.
> 
> Hi Daniel,
> ok, now it makes more sense if RXLVL is affected.
> 
> Have you contacted NXP about this? If not, I suggest you do open a
> support case and let them know about your findings, because it is very
> strange that it is not mentioned in the errata. And doing so may led to
> an updated and better documentation on their side about this errata.

The errata is also wrong in other regards - the IIR register cannot
yield 0xcc according to their own documentation. It also makes no
suggestion on how to recover from that situation, which is common
practice usually.

We'll let them know through our FAE channels, but the latest datasheet
for this chip was released over a decade ago, and I don't expect any
update to the errata wording.

> And incorporate this new info into your commit log for an eventual
> patch V2.

It makes no sense IMO to have all users of this chip suffer from an
issue that was clearly identified to be present and which has an evident
fix. Why would we do that?


Thanks,
Daniel
Re: [PATCH] serial: sc16is7xx: address RX timeout interrupt errata
Posted by Hugo Villeneuve 2 years, 1 month ago
On Wed, 15 Nov 2023 15:57:38 +0100
Daniel Mack <daniel@zonque.org> wrote:

> On 11/15/23 15:47, Hugo Villeneuve wrote:
> > On Tue, 14 Nov 2023 16:55:33 +0100
> > Daniel Mack <daniel@zonque.org> wrote:
> > 
> >> Hi Hugo,
> >>
> >> On 11/14/23 16:20, Hugo Villeneuve wrote:
> >>> On Tue, 14 Nov 2023 08:49:04 +0100
> >>> Daniel Mack <daniel@zonque.org> wrote:
> >>>> This devices has a silicon bug that makes it report a timeout interrupt
> >>>> but no data in FIFO.
> >>>>
> >>>> The datasheet states the following in the errata section 18.1.4:
> >>>>
> >>>>   "If the host reads the receive FIFO at the at the same time as a
> >>>>   time-out interrupt condition happens, the host might read 0xCC
> >>>>   (time-out) in the Interrupt Indication Register (IIR), but bit 0
> >>>>   of the Line Status Register (LSR) is not set (means there is not
> >>>>   data in the receive FIFO)."
> >>>>
> >>>> When this happens, the loop in sc16is7xx_irq() will run forever,
> >>>> which effectively blocks the i2c bus and breaks the functionality
> >>>> of the UART.
> >>>>
> >>>> From the information above, it is assumed that when the bug is
> >>>> triggered, the FIFO does in fact have payload in its buffer, but the
> >>>> fill level reporting is off-by-one. Hence this patch fixes the issue
> >>>> by reading one byte from the FIFO when that condition is detected.
> >>>
> >>> From what I understand from the errata, when the problem occurs, it
> >>> affects bit 0 of the LSR register. I see no mention that it
> >>> also affects the RX FIFO level register (SC16IS7XX_RXLVL_REG)?
> >>
> >> True, the errata doesn't explicitly mention that, but tests have shown
> >> that the RXLVL register is equally affected.
> > 
> > Hi Daniel,
> > ok, now it makes more sense if RXLVL is affected.
> > 
> > Have you contacted NXP about this? If not, I suggest you do open a
> > support case and let them know about your findings, because it is very
> > strange that it is not mentioned in the errata. And doing so may led to
> > an updated and better documentation on their side about this errata.
> 

Hi Daniel,

> The errata is also wrong in other regards - the IIR register cannot
> yield 0xcc according to their own documentation. It also makes no
> suggestion on how to recover from that situation, which is common
> practice usually.

0xcc is valid according to the datasheet. Bits 7:6 are a mirror copy of
FCR[0], so bits 5:0 are 0x0c, which is documented in table 14?

But you are right about the recovery procedure, it should be documented
in the errata.


> We'll let them know through our FAE channels, but the latest datasheet
> for this chip was released over a decade ago, and I don't expect any
> update to the errata wording.

You cannot assume they would not update the datasheet, especially with
your findings about RXLVL which add a whole new dimension to this
errata. The fact that the latest release was long ago is irrelevant.


> > And incorporate this new info into your commit log for an eventual
> > patch V2.
> 
> It makes no sense IMO to have all users of this chip suffer from an
> issue that was clearly identified to be present and which has an evident
> fix. Why would we do that?

I don't know what you mean by that...

My suggestion was simply to incorporate your findings about RXLVL
register into your commit log for patch V2...

Hugo.
Re: [PATCH] serial: sc16is7xx: address RX timeout interrupt errata
Posted by Daniel Mack 2 years, 1 month ago
Hi Hugo,

On 11/15/23 16:31, Hugo Villeneuve wrote:
> On Wed, 15 Nov 2023 15:57:38 +0100
> Daniel Mack <daniel@zonque.org> wrote:
> 
>> On 11/15/23 15:47, Hugo Villeneuve wrote:
>>> On Tue, 14 Nov 2023 16:55:33 +0100
>>> Daniel Mack <daniel@zonque.org> wrote:
>>>
>>>> Hi Hugo,
>>>>
>>>> On 11/14/23 16:20, Hugo Villeneuve wrote:
>>>>> On Tue, 14 Nov 2023 08:49:04 +0100
>>>>> Daniel Mack <daniel@zonque.org> wrote:
>>>>>> This devices has a silicon bug that makes it report a timeout interrupt
>>>>>> but no data in FIFO.
>>>>>>
>>>>>> The datasheet states the following in the errata section 18.1.4:
>>>>>>
>>>>>>   "If the host reads the receive FIFO at the at the same time as a
>>>>>>   time-out interrupt condition happens, the host might read 0xCC
>>>>>>   (time-out) in the Interrupt Indication Register (IIR), but bit 0
>>>>>>   of the Line Status Register (LSR) is not set (means there is not
>>>>>>   data in the receive FIFO)."
>>>>>>
>>>>>> When this happens, the loop in sc16is7xx_irq() will run forever,
>>>>>> which effectively blocks the i2c bus and breaks the functionality
>>>>>> of the UART.
>>>>>>
>>>>>> From the information above, it is assumed that when the bug is
>>>>>> triggered, the FIFO does in fact have payload in its buffer, but the
>>>>>> fill level reporting is off-by-one. Hence this patch fixes the issue
>>>>>> by reading one byte from the FIFO when that condition is detected.
>>>>>
>>>>> From what I understand from the errata, when the problem occurs, it
>>>>> affects bit 0 of the LSR register. I see no mention that it
>>>>> also affects the RX FIFO level register (SC16IS7XX_RXLVL_REG)?
>>>>
>>>> True, the errata doesn't explicitly mention that, but tests have shown
>>>> that the RXLVL register is equally affected.
>>>
>>> Hi Daniel,
>>> ok, now it makes more sense if RXLVL is affected.
>>>
>>> Have you contacted NXP about this? If not, I suggest you do open a
>>> support case and let them know about your findings, because it is very
>>> strange that it is not mentioned in the errata. And doing so may led to
>>> an updated and better documentation on their side about this errata.
>>
> 
> Hi Daniel,
> 
>> The errata is also wrong in other regards - the IIR register cannot
>> yield 0xcc according to their own documentation. It also makes no
>> suggestion on how to recover from that situation, which is common
>> practice usually.
> 
> 0xcc is valid according to the datasheet. Bits 7:6 are a mirror copy of
> FCR[0], so bits 5:0 are 0x0c, which is documented in table 14?

Ah, right. I was looking at the masked value.

> But you are right about the recovery procedure, it should be documented
> in the errata.
> 
> 
>> We'll let them know through our FAE channels, but the latest datasheet
>> for this chip was released over a decade ago, and I don't expect any
>> update to the errata wording.
> 
> You cannot assume they would not update the datasheet, especially with
> your findings about RXLVL which add a whole new dimension to this
> errata. The fact that the latest release was long ago is irrelevant.

Yes, we will give them a note. Let's see what happens.

>>> And incorporate this new info into your commit log for an eventual
>>> patch V2.
>>
>> It makes no sense IMO to have all users of this chip suffer from an
>> issue that was clearly identified to be present and which has an evident
>> fix. Why would we do that?
> 
> I don't know what you mean by that...
> 
> My suggestion was simply to incorporate your findings about RXLVL
> register into your commit log for patch V2...

My apologies, I misread your message then. I thought you suggested to
wait for a new errata statement from NXP so we can quote from that in
the commit.

The RXLVL detail is part of the the commit log in the v2 submission.


Thanks,
Daniel
Re: [PATCH] serial: sc16is7xx: address RX timeout interrupt errata
Posted by Lech Perczak 2 years, 1 month ago
Hi Daniel,

W dniu 14.11.2023 o 16:55, Daniel Mack pisze:
> Hi Hugo,
>
> On 11/14/23 16:20, Hugo Villeneuve wrote:
>> On Tue, 14 Nov 2023 08:49:04 +0100
>> Daniel Mack <daniel@zonque.org> wrote:
>>> This devices has a silicon bug that makes it report a timeout interrupt
>>> but no data in FIFO.
>>>
>>> The datasheet states the following in the errata section 18.1.4:
>>>
>>>   "If the host reads the receive FIFO at the at the same time as a
>>>   time-out interrupt condition happens, the host might read 0xCC
>>>   (time-out) in the Interrupt Indication Register (IIR), but bit 0
>>>   of the Line Status Register (LSR) is not set (means there is not
>>>   data in the receive FIFO)."
>>>
>>> When this happens, the loop in sc16is7xx_irq() will run forever,
>>> which effectively blocks the i2c bus and breaks the functionality
>>> of the UART.
>>>
>>> From the information above, it is assumed that when the bug is
>>> triggered, the FIFO does in fact have payload in its buffer, but the
>>> fill level reporting is off-by-one. Hence this patch fixes the issue
>>> by reading one byte from the FIFO when that condition is detected.
>> From what I understand from the errata, when the problem occurs, it
>> affects bit 0 of the LSR register. I see no mention that it
>> also affects the RX FIFO level register (SC16IS7XX_RXLVL_REG)?
> True, the errata doesn't explicitly mention that, but tests have shown
> that the RXLVL register is equally affected.
>
>> LSR[0] would be checked only if we were using polled mode of
>> operation, but we always use the interrupt mode (IRQ), and therefore I
>> would say that this errata doesn't apply to this driver, and the
>> patch is not necessary...
> Well, it is. We have seen this bug in the wild and extensively
> stress-tested the patch on dozens of boards for many days. Without this
> patch, kernels on affected systems would consume a lot of CPU cycles in
> the interrupt threads and effectively render the I2C bus unusable due to
> the busy polling.
>
> With this patch applied, we were no longer able to reproduce the issue.
Could you share some more details on the setup you use to reproduce this? I'd like to try out as well.
>
>
> Thanks,
> Daniel

-- 
Pozdrawiam/With kind regards,
Lech Perczak
Re: [PATCH] serial: sc16is7xx: address RX timeout interrupt errata
Posted by Daniel Mack 2 years, 1 month ago
Hi Lech,

On 11/15/23 11:51, Lech Perczak wrote:
> W dniu 14.11.2023 o 16:55, Daniel Mack pisze:
>> Hi Hugo,
>>
>> On 11/14/23 16:20, Hugo Villeneuve wrote:
>>> On Tue, 14 Nov 2023 08:49:04 +0100
>>> Daniel Mack <daniel@zonque.org> wrote:
>>>> This devices has a silicon bug that makes it report a timeout interrupt
>>>> but no data in FIFO.
>>>>
>>>> The datasheet states the following in the errata section 18.1.4:
>>>>
>>>>   "If the host reads the receive FIFO at the at the same time as a
>>>>   time-out interrupt condition happens, the host might read 0xCC
>>>>   (time-out) in the Interrupt Indication Register (IIR), but bit 0
>>>>   of the Line Status Register (LSR) is not set (means there is not
>>>>   data in the receive FIFO)."
>>>>
>>>> When this happens, the loop in sc16is7xx_irq() will run forever,
>>>> which effectively blocks the i2c bus and breaks the functionality
>>>> of the UART.
>>>>
>>>> From the information above, it is assumed that when the bug is
>>>> triggered, the FIFO does in fact have payload in its buffer, but the
>>>> fill level reporting is off-by-one. Hence this patch fixes the issue
>>>> by reading one byte from the FIFO when that condition is detected.
>>> From what I understand from the errata, when the problem occurs, it
>>> affects bit 0 of the LSR register. I see no mention that it
>>> also affects the RX FIFO level register (SC16IS7XX_RXLVL_REG)?
>> True, the errata doesn't explicitly mention that, but tests have shown
>> that the RXLVL register is equally affected.
>>
>>> LSR[0] would be checked only if we were using polled mode of
>>> operation, but we always use the interrupt mode (IRQ), and therefore I
>>> would say that this errata doesn't apply to this driver, and the
>>> patch is not necessary...
>> Well, it is. We have seen this bug in the wild and extensively
>> stress-tested the patch on dozens of boards for many days. Without this
>> patch, kernels on affected systems would consume a lot of CPU cycles in
>> the interrupt threads and effectively render the I2C bus unusable due to
>> the busy polling.
>>
>> With this patch applied, we were no longer able to reproduce the issue.
> Could you share some more details on the setup you use to reproduce this? I'd like to try out as well.

We have boards with 2 I2C busses with an SC16IS752IBS on both. The UARTs
are configured in infrared mode, and they send receive IR signals
constantly. I guess the same would happen with other electrical
interfaces, but the important bit is that the UARTs see a steady stream
of inbound data.

The bug has hit us on production units and when it does, sc16is7xx_irq()
would spin forever because sc16is7xx_port_irq() keeps seeing an
interrupt in the IIR register that is not cleared because the driver
does not call into sc16is7xx_handle_rx() unless the RXLVL register
reports at least one byte in the FIFO.

Note that this issue might only occur in revision E of the silicon. And
there seems to be now way to read the revision code through I2C, so I
guess you won't be able to figure out easily whether your chip is affected.

Let me know if I can provide more information.


Thanks,
Daniel






Re: [PATCH] serial: sc16is7xx: address RX timeout interrupt errata
Posted by Hugo Villeneuve 2 years, 1 month ago
On Wed, 15 Nov 2023 12:22:10 +0100
Daniel Mack <daniel@zonque.org> wrote:

> Hi Lech,
> 
> On 11/15/23 11:51, Lech Perczak wrote:
> > W dniu 14.11.2023 o 16:55, Daniel Mack pisze:
> >> Hi Hugo,
> >>
> >> On 11/14/23 16:20, Hugo Villeneuve wrote:
> >>> On Tue, 14 Nov 2023 08:49:04 +0100
> >>> Daniel Mack <daniel@zonque.org> wrote:
> >>>> This devices has a silicon bug that makes it report a timeout interrupt
> >>>> but no data in FIFO.
> >>>>
> >>>> The datasheet states the following in the errata section 18.1.4:
> >>>>
> >>>>   "If the host reads the receive FIFO at the at the same time as a
> >>>>   time-out interrupt condition happens, the host might read 0xCC
> >>>>   (time-out) in the Interrupt Indication Register (IIR), but bit 0
> >>>>   of the Line Status Register (LSR) is not set (means there is not
> >>>>   data in the receive FIFO)."
> >>>>
> >>>> When this happens, the loop in sc16is7xx_irq() will run forever,
> >>>> which effectively blocks the i2c bus and breaks the functionality
> >>>> of the UART.
> >>>>
> >>>> From the information above, it is assumed that when the bug is
> >>>> triggered, the FIFO does in fact have payload in its buffer, but the
> >>>> fill level reporting is off-by-one. Hence this patch fixes the issue
> >>>> by reading one byte from the FIFO when that condition is detected.
> >>> From what I understand from the errata, when the problem occurs, it
> >>> affects bit 0 of the LSR register. I see no mention that it
> >>> also affects the RX FIFO level register (SC16IS7XX_RXLVL_REG)?
> >> True, the errata doesn't explicitly mention that, but tests have shown
> >> that the RXLVL register is equally affected.
> >>
> >>> LSR[0] would be checked only if we were using polled mode of
> >>> operation, but we always use the interrupt mode (IRQ), and therefore I
> >>> would say that this errata doesn't apply to this driver, and the
> >>> patch is not necessary...
> >> Well, it is. We have seen this bug in the wild and extensively
> >> stress-tested the patch on dozens of boards for many days. Without this
> >> patch, kernels on affected systems would consume a lot of CPU cycles in
> >> the interrupt threads and effectively render the I2C bus unusable due to
> >> the busy polling.
> >>
> >> With this patch applied, we were no longer able to reproduce the issue.
> > Could you share some more details on the setup you use to reproduce this? I'd like to try out as well.
> 
> We have boards with 2 I2C busses with an SC16IS752IBS on both. The UARTs
> are configured in infrared mode, and they send receive IR signals
> constantly. I guess the same would happen with other electrical
> interfaces, but the important bit is that the UARTs see a steady stream
> of inbound data.
> 

Hi Daniel,

> The bug has hit us on production units and when it does, sc16is7xx_irq()
> would spin forever because sc16is7xx_port_irq() keeps seeing an
> interrupt in the IIR register that is not cleared because the driver
> does not call into sc16is7xx_handle_rx() unless the RXLVL register
> reports at least one byte in the FIFO.

I would suggest that you replace the second paragraph or your original
commit message with this, it better explains what is the problem.

Also, when the problem happens, you say that "the fill level reporting
is off-by-one", so doest it mean that RXLVL can sometimes be non-zero
when the bug occurs?


> Note that this issue might only occur in revision E of the silicon. And

Is this just a supposition or based on NXP info or some actual tests?

> there seems to be now way to read the revision code through I2C, so I
> guess you won't be able to figure out easily whether your chip is affected.
> 
> Let me know if I can provide more information.

I have a board with two SC16IS752IPW using SPI interface, but I don't
know (yet) what is the revision. I will try to determine it if possible,
although I do not see any info on that in the datasheet.

I will also try to reproduce the issue, and test your patch.

Hugo.
Re: [PATCH] serial: sc16is7xx: address RX timeout interrupt errata
Posted by Daniel Mack 2 years, 1 month ago
Hi Hugo,

On 11/15/23 16:01, Hugo Villeneuve wrote:
> On Wed, 15 Nov 2023 12:22:10 +0100
> Daniel Mack <daniel@zonque.org> wrote:
> Hi Daniel,
> 
>> The bug has hit us on production units and when it does, sc16is7xx_irq()
>> would spin forever because sc16is7xx_port_irq() keeps seeing an
>> interrupt in the IIR register that is not cleared because the driver
>> does not call into sc16is7xx_handle_rx() unless the RXLVL register
>> reports at least one byte in the FIFO.
> 
> I would suggest that you replace the second paragraph or your original
> commit message with this, it better explains what is the problem.

Good idea, will do.

> Also, when the problem happens, you say that "the fill level reporting
> is off-by-one", so doest it mean that RXLVL can sometimes be non-zero
> when the bug occurs?

Maybe, but if that happens we would read one byte too less, which
doesn't harm as we would get another interrupt later to handle the rest.
The problematic case is when we don't read any data at all even though
there is something in the FIFO, and the interrupt suggested that as well.

>> Note that this issue might only occur in revision E of the silicon. And
> 
> Is this just a supposition or based on NXP info or some actual tests?

Well, the datasheet states "Errata for Rev. E added 12 August 2011", and
as "Revision E" does not seem to refer to a datasheet version, it will
most likely be about the silicon revision. And another assumption is
that the issue would fixed in subsequent versions of the chip, in case
there are any at all.

FTR, this is the document I'm looking at:

  https://www.nxp.com/docs/en/data-sheet/SC16IS752_SC16IS762.pdf

>> there seems to be now way to read the revision code through I2C, so I
>> guess you won't be able to figure out easily whether your chip is affected.
>>
>> Let me know if I can provide more information.
> 
> I have a board with two SC16IS752IPW using SPI interface, but I don't
> know (yet) what is the revision. I will try to determine it if possible,
> although I do not see any info on that in the datasheet.
> 
> I will also try to reproduce the issue, and test your patch.

Great, thanks!


Best regards,
Daniel