[PATCH v3] i2c: designware: fix __i2c_dw_disable() in case master is holding SCL low

Yann Sionneau posted 1 patch 2 years, 3 months ago
There is a newer version of this series
drivers/i2c/busses/i2c-designware-common.c | 17 +++++++++++++++++
drivers/i2c/busses/i2c-designware-core.h   |  3 +++
2 files changed, 20 insertions(+)
[PATCH v3] i2c: designware: fix __i2c_dw_disable() in case master is holding SCL low
Posted by Yann Sionneau 2 years, 3 months ago
The DesignWare IP can be synthesized with the IC_EMPTYFIFO_HOLD_MASTER_EN
parameter.
In this case, when the TX FIFO gets empty and the last command didn't have
the STOP bit (IC_DATA_CMD[9]), the controller will hold SCL low until
a new command is pushed into the TX FIFO or the transfer is aborted.

When the controller is holding SCL low, it cannot be disabled.
The transfer must first be aborted.
Also, the bus recovery won't work because SCL is held low by the master.

Check if the master is holding SCL low in __i2c_dw_disable() before trying
to disable the controller. If SCL is held low, an abort is initiated.
When the abort is done, then proceed with disabling the controller.

This whole situation can happen for instance during SMBus read data block
if the slave just responds with "byte count == 0".
This puts the driver in an unrecoverable state, because the controller is
holding SCL low and the current __i2c_dw_disable() procedure is not
working. In this situation only a SoC reset can fix the i2c bus.

Co-developed-by: Jonathan Borne <jborne@kalray.eu>
Signed-off-by: Jonathan Borne <jborne@kalray.eu>
Signed-off-by: Yann Sionneau <ysionneau@kalray.eu>
---
V2 -> V3:
* do not rename timeout variable for disabling loop
in order to ease backports
* replace abort loop with regmap_read_poll_timeout()
* remove extra empty line.

V1 -> V2:
* use reverse christmas tree order for variable declarations
* use unsigned int type instead of u32 for regmap_read
* give its own loop to the abort procedure with its own timeout
* print an error message if the abort never finishes during the timeout
* rename the timeout variable for the controller disabling loop
* with the usleep_range(10, 20) it takes only 1 loop iteration.
Without it takes 75 iterations and with udelay(1) it takes 16
loop iterations.

 drivers/i2c/busses/i2c-designware-common.c | 17 +++++++++++++++++
 drivers/i2c/busses/i2c-designware-core.h   |  3 +++
 2 files changed, 20 insertions(+)

diff --git a/drivers/i2c/busses/i2c-designware-common.c b/drivers/i2c/busses/i2c-designware-common.c
index 9f8574320eb2..0cd0f85e1350 100644
--- a/drivers/i2c/busses/i2c-designware-common.c
+++ b/drivers/i2c/busses/i2c-designware-common.c
@@ -438,8 +438,25 @@ int i2c_dw_set_sda_hold(struct dw_i2c_dev *dev)
 
 void __i2c_dw_disable(struct dw_i2c_dev *dev)
 {
+	unsigned int raw_intr_stats;
+	unsigned int enable;
 	int timeout = 100;
+	bool abort_needed;
 	u32 status;
+	int ret;
+
+	regmap_read(dev->map, DW_IC_RAW_INTR_STAT, &raw_intr_stats);
+	regmap_read(dev->map, DW_IC_ENABLE, &enable);
+
+	abort_needed = raw_intr_stats & DW_IC_INTR_MST_ON_HOLD;
+	if (abort_needed) {
+		regmap_write(dev->map, DW_IC_ENABLE, enable | DW_IC_ENABLE_ABORT);
+		ret = regmap_read_poll_timeout(dev->map, DW_IC_ENABLE, enable,
+					       !(enable & DW_IC_ENABLE_ABORT), 10,
+					       100);
+		if (ret)
+			dev_err(dev->dev, "timeout while trying to abort current transfer\n");
+	}
 
 	do {
 		__i2c_dw_disable_nowait(dev);
diff --git a/drivers/i2c/busses/i2c-designware-core.h b/drivers/i2c/busses/i2c-designware-core.h
index 19ae23575945..dcd9bd9ee00f 100644
--- a/drivers/i2c/busses/i2c-designware-core.h
+++ b/drivers/i2c/busses/i2c-designware-core.h
@@ -98,6 +98,7 @@
 #define DW_IC_INTR_START_DET	BIT(10)
 #define DW_IC_INTR_GEN_CALL	BIT(11)
 #define DW_IC_INTR_RESTART_DET	BIT(12)
+#define DW_IC_INTR_MST_ON_HOLD	BIT(13)
 
 #define DW_IC_INTR_DEFAULT_MASK		(DW_IC_INTR_RX_FULL | \
 					 DW_IC_INTR_TX_ABRT | \
@@ -109,6 +110,8 @@
 					 DW_IC_INTR_RX_UNDER | \
 					 DW_IC_INTR_RD_REQ)
 
+#define DW_IC_ENABLE_ABORT		BIT(1)
+
 #define DW_IC_STATUS_ACTIVITY		BIT(0)
 #define DW_IC_STATUS_TFE		BIT(2)
 #define DW_IC_STATUS_MASTER_ACTIVITY	BIT(5)
-- 
2.17.1
Re: [PATCH v3] i2c: designware: fix __i2c_dw_disable() in case master is holding SCL low
Posted by Jarkko Nikula 2 years, 3 months ago
Hi

On 8/22/23 12:02, Yann Sionneau wrote:
> The DesignWare IP can be synthesized with the IC_EMPTYFIFO_HOLD_MASTER_EN
> parameter.
> In this case, when the TX FIFO gets empty and the last command didn't have
> the STOP bit (IC_DATA_CMD[9]), the controller will hold SCL low until
> a new command is pushed into the TX FIFO or the transfer is aborted.
> 
> When the controller is holding SCL low, it cannot be disabled.
> The transfer must first be aborted.
> Also, the bus recovery won't work because SCL is held low by the master.
> 
> Check if the master is holding SCL low in __i2c_dw_disable() before trying
> to disable the controller. If SCL is held low, an abort is initiated.
> When the abort is done, then proceed with disabling the controller.
> 
> This whole situation can happen for instance during SMBus read data block
> if the slave just responds with "byte count == 0".
> This puts the driver in an unrecoverable state, because the controller is
> holding SCL low and the current __i2c_dw_disable() procedure is not
> working. In this situation only a SoC reset can fix the i2c bus.
> 
> Co-developed-by: Jonathan Borne <jborne@kalray.eu>
> Signed-off-by: Jonathan Borne <jborne@kalray.eu>
> Signed-off-by: Yann Sionneau <ysionneau@kalray.eu>
> ---
> V2 -> V3:
> * do not rename timeout variable for disabling loop
> in order to ease backports
> * replace abort loop with regmap_read_poll_timeout()
> * remove extra empty line.
> 
> V1 -> V2:
> * use reverse christmas tree order for variable declarations
> * use unsigned int type instead of u32 for regmap_read
> * give its own loop to the abort procedure with its own timeout
> * print an error message if the abort never finishes during the timeout
> * rename the timeout variable for the controller disabling loop
> * with the usleep_range(10, 20) it takes only 1 loop iteration.
> Without it takes 75 iterations and with udelay(1) it takes 16
> loop iterations.
> 
>   drivers/i2c/busses/i2c-designware-common.c | 17 +++++++++++++++++
>   drivers/i2c/busses/i2c-designware-core.h   |  3 +++
>   2 files changed, 20 insertions(+)
> 
This doesn't apply against current code. Looks like your base is older 
than v6.2? I.e. before commit 966b7d3c738a ("i2c: designware: Align 
defines in i2c-designware-core.h").

Jarkko
Re: [PATCH v3] i2c: designware: fix __i2c_dw_disable() in case master is holding SCL low
Posted by Yann Sionneau 2 years, 3 months ago
Hi,

On 8/30/23 12:42, Jarkko Nikula wrote:
> Hi
>
> On 8/22/23 12:02, Yann Sionneau wrote:
>> The DesignWare IP can be synthesized with the 
>> IC_EMPTYFIFO_HOLD_MASTER_EN
>> parameter.
>> In this case, when the TX FIFO gets empty and the last command didn't 
>> have
>> the STOP bit (IC_DATA_CMD[9]), the controller will hold SCL low until
>> a new command is pushed into the TX FIFO or the transfer is aborted.
>>
>> When the controller is holding SCL low, it cannot be disabled.
>> The transfer must first be aborted.
>> Also, the bus recovery won't work because SCL is held low by the master.
>>
>> Check if the master is holding SCL low in __i2c_dw_disable() before 
>> trying
>> to disable the controller. If SCL is held low, an abort is initiated.
>> When the abort is done, then proceed with disabling the controller.
>>
>> This whole situation can happen for instance during SMBus read data 
>> block
>> if the slave just responds with "byte count == 0".
>> This puts the driver in an unrecoverable state, because the 
>> controller is
>> holding SCL low and the current __i2c_dw_disable() procedure is not
>> working. In this situation only a SoC reset can fix the i2c bus.
>>
>> Co-developed-by: Jonathan Borne <jborne@kalray.eu>
>> Signed-off-by: Jonathan Borne <jborne@kalray.eu>
>> Signed-off-by: Yann Sionneau <ysionneau@kalray.eu>
>> ---
>> V2 -> V3:
>> * do not rename timeout variable for disabling loop
>> in order to ease backports
>> * replace abort loop with regmap_read_poll_timeout()
>> * remove extra empty line.
>>
>> V1 -> V2:
>> * use reverse christmas tree order for variable declarations
>> * use unsigned int type instead of u32 for regmap_read
>> * give its own loop to the abort procedure with its own timeout
>> * print an error message if the abort never finishes during the timeout
>> * rename the timeout variable for the controller disabling loop
>> * with the usleep_range(10, 20) it takes only 1 loop iteration.
>> Without it takes 75 iterations and with udelay(1) it takes 16
>> loop iterations.
>>
>>   drivers/i2c/busses/i2c-designware-common.c | 17 +++++++++++++++++
>>   drivers/i2c/busses/i2c-designware-core.h   |  3 +++
>>   2 files changed, 20 insertions(+)
>>
> This doesn't apply against current code. Looks like your base is older 
> than v6.2? I.e. before commit 966b7d3c738a ("i2c: designware: Align 
> defines in i2c-designware-core.h").


Oops, I've rebased it on v6.6-rc1 and re-sent it as V4.

Thanks.

Regards,

-- 

Yann





Re: [PATCH v3] i2c: designware: fix __i2c_dw_disable() in case master is holding SCL low
Posted by Jarkko Nikula 2 years, 3 months ago
Hi

On 8/22/23 12:02, Yann Sionneau wrote:
> The DesignWare IP can be synthesized with the IC_EMPTYFIFO_HOLD_MASTER_EN
> parameter.
> In this case, when the TX FIFO gets empty and the last command didn't have
> the STOP bit (IC_DATA_CMD[9]), the controller will hold SCL low until
> a new command is pushed into the TX FIFO or the transfer is aborted.
> 
> When the controller is holding SCL low, it cannot be disabled.
> The transfer must first be aborted.
> Also, the bus recovery won't work because SCL is held low by the master.
> 
> Check if the master is holding SCL low in __i2c_dw_disable() before trying
> to disable the controller. If SCL is held low, an abort is initiated.
> When the abort is done, then proceed with disabling the controller.
> 
> This whole situation can happen for instance during SMBus read data block
> if the slave just responds with "byte count == 0".
> This puts the driver in an unrecoverable state, because the controller is
> holding SCL low and the current __i2c_dw_disable() procedure is not
> working. In this situation only a SoC reset can fix the i2c bus.
> 
Is this fixed already by the commit 69f035c480d7 ("i2c: designware: 
Handle invalid SMBus block data response length value")?

https://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git/commit/?h=i2c/for-next&id=69f035c480d76f12bf061148ccfd578e1099e5fc
Re: [PATCH v3] i2c: designware: fix __i2c_dw_disable() in case master is holding SCL low
Posted by Yann Sionneau 2 years, 3 months ago
Hi,

On 8/22/23 12:14, Jarkko Nikula wrote:
> Hi
>
> On 8/22/23 12:02, Yann Sionneau wrote:
>> The DesignWare IP can be synthesized with the 
>> IC_EMPTYFIFO_HOLD_MASTER_EN
>> parameter.
>> In this case, when the TX FIFO gets empty and the last command didn't 
>> have
>> the STOP bit (IC_DATA_CMD[9]), the controller will hold SCL low until
>> a new command is pushed into the TX FIFO or the transfer is aborted.
>>
>> When the controller is holding SCL low, it cannot be disabled.
>> The transfer must first be aborted.
>> Also, the bus recovery won't work because SCL is held low by the master.
>>
>> Check if the master is holding SCL low in __i2c_dw_disable() before 
>> trying
>> to disable the controller. If SCL is held low, an abort is initiated.
>> When the abort is done, then proceed with disabling the controller.
>>
>> This whole situation can happen for instance during SMBus read data 
>> block
>> if the slave just responds with "byte count == 0".
>> This puts the driver in an unrecoverable state, because the 
>> controller is
>> holding SCL low and the current __i2c_dw_disable() procedure is not
>> working. In this situation only a SoC reset can fix the i2c bus.
>>
> Is this fixed already by the commit 69f035c480d7 ("i2c: designware: 
> Handle invalid SMBus block data response length value")?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git/commit/?h=i2c/for-next&id=69f035c480d76f12bf061148ccfd578e1099e5fc 
>

Indeed the bug that I am having is fixed by 
69f035c480d76f12bf061148ccfd578e1099e5fc

Meaning that thanks to 69f035c receiving a SMBus byte count of 0 won't 
put the controller into a state where the completion will timeout and it 
will need to start a recovery that will fail and then a controller 
disabling that will also fail.

But, still, the disabling procedure is wrong, it lacks the abort part 
(in case SCL is held low).

What my patch does, is fixing the disabling procedure. So that - for 
example - even without 69f035c, the controller will timeout when 
receiving byte count of 0, triggering the "disabling" procedure which 
will work to recover the bus.

My patch fixes the general disabling code, that could be triggered by 
the bug fixed by 69f035c but also by any other bug really.

Speaking of 69f035c btw ... I am really wondering if it's the best fix, 
because it will lie to the kernel saying "we have byte count of 1, read 
another byte" to trigger a read with STOP bit set so that the transfer 
does finish and the controller does not timeout. But to do this it will 
do an extra spurious byte read.

I propose another approach that will acknowledge that "we are in a bad 
condition" and directly abort the transfer without doing an extra read: 
https://marc.info/?l=linux-i2c&m=169175828013532&w=2

I hope my answer to your question is clear enough... English is not my 
native language.

Regards,

-- 

Yann
Re: [PATCH v3] i2c: designware: fix __i2c_dw_disable() in case master is holding SCL low
Posted by Yann Sionneau 2 years, 3 months ago
Hi,

Le 22/08/2023 à 12:28, Yann Sionneau a écrit :
> Hi,
>
> On 8/22/23 12:14, Jarkko Nikula wrote:
>> Hi
>>
>> On 8/22/23 12:02, Yann Sionneau wrote:
>>> The DesignWare IP can be synthesized with the 
>>> IC_EMPTYFIFO_HOLD_MASTER_EN
>>> parameter.
>>> In this case, when the TX FIFO gets empty and the last command 
>>> didn't have
>>> the STOP bit (IC_DATA_CMD[9]), the controller will hold SCL low until
>>> a new command is pushed into the TX FIFO or the transfer is aborted.
>>>
>>> When the controller is holding SCL low, it cannot be disabled.
>>> The transfer must first be aborted.
>>> Also, the bus recovery won't work because SCL is held low by the 
>>> master.
>>>
>>> Check if the master is holding SCL low in __i2c_dw_disable() before 
>>> trying
>>> to disable the controller. If SCL is held low, an abort is initiated.
>>> When the abort is done, then proceed with disabling the controller.
>>>
>>> This whole situation can happen for instance during SMBus read data 
>>> block
>>> if the slave just responds with "byte count == 0".
>>> This puts the driver in an unrecoverable state, because the 
>>> controller is
>>> holding SCL low and the current __i2c_dw_disable() procedure is not
>>> working. In this situation only a SoC reset can fix the i2c bus.
>>>
>> Is this fixed already by the commit 69f035c480d7 ("i2c: designware: 
>> Handle invalid SMBus block data response length value")?
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git/commit/?h=i2c/for-next&id=69f035c480d76f12bf061148ccfd578e1099e5fc 
>>
>
> Indeed the bug that I am having is fixed by 
> 69f035c480d76f12bf061148ccfd578e1099e5fc
>
> Meaning that thanks to 69f035c receiving a SMBus byte count of 0 won't 
> put the controller into a state where the completion will timeout and 
> it will need to start a recovery that will fail and then a controller 
> disabling that will also fail.
>
> But, still, the disabling procedure is wrong, it lacks the abort part 
> (in case SCL is held low).
>
> What my patch does, is fixing the disabling procedure. So that - for 
> example - even without 69f035c, the controller will timeout when 
> receiving byte count of 0, triggering the "disabling" procedure which 
> will work to recover the bus.
>
> My patch fixes the general disabling code, that could be triggered by 
> the bug fixed by 69f035c but also by any other bug really.
>
> Speaking of 69f035c btw ... I am really wondering if it's the best 
> fix, because it will lie to the kernel saying "we have byte count of 
> 1, read another byte" to trigger a read with STOP bit set so that the 
> transfer does finish and the controller does not timeout. But to do 
> this it will do an extra spurious byte read.
>
> I propose another approach that will acknowledge that "we are in a bad 
> condition" and directly abort the transfer without doing an extra 
> read: https://marc.info/?l=linux-i2c&m=169175828013532&w=2
>
> I hope my answer to your question is clear enough... English is not my 
> native language.
>
> Regards,
>
Is this any clearer?

Also, what do you think about my remarks on 69f035c?

Regards,

-- 

Yann