[PATCH] i2c: designware: Fix corrupted memory seen in the ISR

Jan Bottorff posted 1 patch 2 years, 5 months ago
There is a newer version of this series
drivers/i2c/busses/i2c-designware-master.c | 8 ++++++++
1 file changed, 8 insertions(+)
[PATCH] i2c: designware: Fix corrupted memory seen in the ISR
Posted by Jan Bottorff 2 years, 5 months ago
Errors were happening in the ISR that looked like corrupted
memory. This was because writes from the core enabling interrupts
where not yet visible to the core running the ISR. A write barrier
assures writes to driver data structures are visible to all cores
before interrupts are enabled.

The ARM Barrier Litmus Tests and Cookbook has an example under
Sending Interrupts and Barriers that matches the usage in this
driver. That document says a DSB barrier is required.

Signed-off-by: Jan Bottorff <janb@os.amperecomputing.com>
---
 drivers/i2c/busses/i2c-designware-master.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c
index ca1035e010c7..1694ac6bb592 100644
--- a/drivers/i2c/busses/i2c-designware-master.c
+++ b/drivers/i2c/busses/i2c-designware-master.c
@@ -248,6 +248,14 @@ static void i2c_dw_xfer_init(struct dw_i2c_dev *dev)
 	/* Dummy read to avoid the register getting stuck on Bay Trail */
 	regmap_read(dev->map, DW_IC_ENABLE_STATUS, &dummy);
 
+	/*
+	 * To guarantee data written by the current core is visible to
+	 * all cores, a write barrier is required. This needs to be
+	 * before an interrupt causes execution on another core.
+	 * For ARM processors, this needs to be a DSB barrier.
+	 */
+	wmb();
+
 	/* Clear and enable interrupts */
 	regmap_read(dev->map, DW_IC_CLR_INTR, &dummy);
 	regmap_write(dev->map, DW_IC_INTR_MASK, DW_IC_INTR_MASTER_MASK);
-- 
2.41.0
Re: [PATCH] i2c: designware: Fix corrupted memory seen in the ISR
Posted by Yann Sionneau 2 years, 5 months ago
Hi Jan,

On 13/09/2023 03:03, Jan Bottorff wrote:
> Errors were happening in the ISR that looked like corrupted
> memory. This was because writes from the core enabling interrupts
> where not yet visible to the core running the ISR. A write barrier
Typo where => were
> assures writes to driver data structures are visible to all cores
> before interrupts are enabled.
Maybe rephrase this using the direct style describing what the commit 
does like "Add a write barrier before enabling interrupts to assure [...]"
>
> The ARM Barrier Litmus Tests and Cookbook has an example under
> Sending Interrupts and Barriers that matches the usage in this
> driver. That document says a DSB barrier is required.
>
> Signed-off-by: Jan Bottorff <janb@os.amperecomputing.com>
> ---
>   drivers/i2c/busses/i2c-designware-master.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
>
> diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c
> index ca1035e010c7..1694ac6bb592 100644
> --- a/drivers/i2c/busses/i2c-designware-master.c
> +++ b/drivers/i2c/busses/i2c-designware-master.c
> @@ -248,6 +248,14 @@ static void i2c_dw_xfer_init(struct dw_i2c_dev *dev)
>   	/* Dummy read to avoid the register getting stuck on Bay Trail */
>   	regmap_read(dev->map, DW_IC_ENABLE_STATUS, &dummy);
>   
> +	/*
> +	 * To guarantee data written by the current core is visible to
> +	 * all cores, a write barrier is required. This needs to be
> +	 * before an interrupt causes execution on another core.
> +	 * For ARM processors, this needs to be a DSB barrier.
> +	 */
> +	wmb();
> +
>   	/* Clear and enable interrupts */
>   	regmap_read(dev->map, DW_IC_CLR_INTR, &dummy);
>   	regmap_write(dev->map, DW_IC_INTR_MASK, DW_IC_INTR_MASTER_MASK);

Apart from the commit message it looks good to me.

If I understand correctly without this wmb() it is possible that the 
writes to dev->msg_write_idx , dev->msg_read_idx = 0 etc would not yet 
be visible to another CPU running the ISR handler right after enabling 
those.

Reviewed-by: Yann Sionneau <ysionneau@kalrayinc.com>

Tested-by: Yann Sionneau <ysionneau@kalrayinc.com>

Thanks!

Regards,

-- 

Yann
Re: [PATCH] i2c: designware: Fix corrupted memory seen in the ISR
Posted by Andy Shevchenko 2 years, 5 months ago
On Wed, Sep 13, 2023 at 11:04:00AM +0200, Yann Sionneau wrote:
> On 13/09/2023 03:03, Jan Bottorff wrote:

...

> > +	/*
> > +	 * To guarantee data written by the current core is visible to
> > +	 * all cores, a write barrier is required. This needs to be
> > +	 * before an interrupt causes execution on another core.
> > +	 * For ARM processors, this needs to be a DSB barrier.
> > +	 */
> > +	wmb();
> 
> Apart from the commit message it looks good to me.
> 
> If I understand correctly without this wmb() it is possible that the writes
> to dev->msg_write_idx , dev->msg_read_idx = 0 etc would not yet be visible
> to another CPU running the ISR handler right after enabling those.

If this is the case, shouldn't we rather use READ_ONCE()/WRITE_ONCE() where
appropriate?

-- 
With Best Regards,
Andy Shevchenko
Re: [PATCH] i2c: designware: Fix corrupted memory seen in the ISR
Posted by Yann Sionneau 2 years, 5 months ago
On 13/09/2023 13:22, Andy Shevchenko wrote:
> On Wed, Sep 13, 2023 at 11:04:00AM +0200, Yann Sionneau wrote:
>> On 13/09/2023 03:03, Jan Bottorff wrote:
> ...
>
>>> +	/*
>>> +	 * To guarantee data written by the current core is visible to
>>> +	 * all cores, a write barrier is required. This needs to be
>>> +	 * before an interrupt causes execution on another core.
>>> +	 * For ARM processors, this needs to be a DSB barrier.
>>> +	 */
>>> +	wmb();
>> Apart from the commit message it looks good to me.
>>
>> If I understand correctly without this wmb() it is possible that the writes
>> to dev->msg_write_idx , dev->msg_read_idx = 0 etc would not yet be visible
>> to another CPU running the ISR handler right after enabling those.
> If this is the case, shouldn't we rather use READ_ONCE()/WRITE_ONCE() where
> appropriate?
>
To my knowledge the READ_ONCE()/WRITE_ONCE() only imply the use of 
volatile to access memory thus preventing the compiler to do weird 
optimizations like merging store/loads, moving store/loads, removing 
them etc

They don't imply a memory barrier.

Some systems need a memory barrier, to emit a "fence" like instruction, 
so that the pipeline stalls waiting for the store to "finish", for 
systems where the writes are posted.

Regards,

-- 

Yann
Re: [PATCH] i2c: designware: Fix corrupted memory seen in the ISR
Posted by Yann Sionneau 2 years, 5 months ago
On 13/09/2023 13:32, Yann Sionneau wrote:
>
> On 13/09/2023 13:22, Andy Shevchenko wrote:
>> On Wed, Sep 13, 2023 at 11:04:00AM +0200, Yann Sionneau wrote:
>>> On 13/09/2023 03:03, Jan Bottorff wrote:
>> ...
>>
>>>> +    /*
>>>> +     * To guarantee data written by the current core is visible to
>>>> +     * all cores, a write barrier is required. This needs to be
>>>> +     * before an interrupt causes execution on another core.
>>>> +     * For ARM processors, this needs to be a DSB barrier.
>>>> +     */
>>>> +    wmb();
>>> Apart from the commit message it looks good to me.
>>>
>>> If I understand correctly without this wmb() it is possible that the 
>>> writes
>>> to dev->msg_write_idx , dev->msg_read_idx = 0 etc would not yet be 
>>> visible
>>> to another CPU running the ISR handler right after enabling those.
>> If this is the case, shouldn't we rather use READ_ONCE()/WRITE_ONCE() 
>> where
>> appropriate?
>>
> To my knowledge the READ_ONCE()/WRITE_ONCE() only imply the use of 
> volatile to access memory thus preventing the compiler to do weird 
> optimizations like merging store/loads, moving store/loads, removing 
> them etc
>
> They don't imply a memory barrier.
>
> Some systems need a memory barrier, to emit a "fence" like 
> instruction, so that the pipeline stalls waiting for the store to 
> "finish", for systems where the writes are posted.
>
> Regards,
>
Well, for the READ_ONCE() actually I'm wrong, it's overloaded for Alpha 
and arm64 https://elixir.bootlin.com/linux/latest/C/ident/__READ_ONCE

-- 

Yann





Re: [PATCH] i2c: designware: Fix corrupted memory seen in the ISR
Posted by Jan Bottorff 2 years, 5 months ago
On 9/13/2023 4:54 AM, Yann Sionneau wrote:
> 
> On 13/09/2023 13:32, Yann Sionneau wrote:
>>
>> On 13/09/2023 13:22, Andy Shevchenko wrote:
...
>> To my knowledge the READ_ONCE()/WRITE_ONCE() only imply the use of 
>> volatile to access memory thus preventing the compiler to do weird 
>> optimizations like merging store/loads, moving store/loads, removing 
>> them etc
>>
>> They don't imply a memory barrier.
>>
>> Some systems need a memory barrier, to emit a "fence" like 
>> instruction, so that the pipeline stalls waiting for the store to 
>> "finish", for systems where the writes are posted.
>>
>> Regards,
>>
> Well, for the READ_ONCE() actually I'm wrong, it's overloaded for Alpha 
> and arm64 https://elixir.bootlin.com/linux/latest/C/ident/__READ_ONCE
>

Hi Yann,

I'll make a new submission with an improved commit message.

I looked at that ARM definition for READ_ONCE/WRITE_ONCE, and those 
didn't really look like they they addressed this kind of cross-core 
write visibility, it looked more like it was about avoiding read/write 
tearing. Read acquire and write release can sometimes replace explicit 
barrier instructions, but felt adding an explicit barrier was a simpler 
solution. If this were highly performance critical code, then the 
benefits of avoiding explicit barriers would be worth a lot more effort.

I know the ARM Barrier docs at 
https://developer.arm.com/documentation/genc007826/latest/ have an 
example under "Sending Interrupts and Barriers", page 19, that is very 
specific that a DSB barrier is required in cases where you write to 
normal memory on one core and then trigger an interrupt, and the normal 
memory is read on another core. It explicitly says that even if a memory 
write to strongly ordered memory (device memory) triggers the interrupt, 
you would need a DSB barrier. That example seemed like a very good match 
with what the DW i2c driver was doing.

I also found the video at https://www.youtube.com/watch?v=2I8OHacills 
had some useful explanations of when barriers are needed. It was a 
little puzzling in the video, it seemed to say a DMB barrier would be 
sufficient for this case, even though the ARM barrier docs say a DSB is 
needed. The DMB only stalls execution of memory accesses, the DSB stalls 
execution of all instructions. The Linux wmb macro is a DSB barrier on ARM.

Jan