[PATCH] Add io_sync stubs to generic IO memcpy/memset

Julian Vetter posted 1 patch 2 days, 22 hours ago
lib/iomem_copy.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
[PATCH] Add io_sync stubs to generic IO memcpy/memset
Posted by Julian Vetter 2 days, 22 hours ago
The recently added IO memcpy and memset functions lack support for
barriers or other sync functions before and/or after the transaction. To
convert more architectures to use the generic IO memcpy and memset
functions, add empty __pre_io_sync and __post_io_sync defines that can
be overwritten by individual architectures if needed.

Signed-off-by: Julian Vetter <julian@outer-limits.org>
---
 lib/iomem_copy.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/lib/iomem_copy.c b/lib/iomem_copy.c
index dec7eaea60e0..2e81182dd4d3 100644
--- a/lib/iomem_copy.c
+++ b/lib/iomem_copy.c
@@ -9,6 +9,14 @@
 #include <linux/types.h>
 #include <linux/unaligned.h>
 
+#ifndef __pre_io_sync
+#define __pre_io_sync
+#endif
+
+#ifndef __post_io_sync
+#define __post_io_sync
+#endif
+
 #ifndef memset_io
 /**
  * memset_io() - Set a range of I/O memory to a constant value
@@ -24,6 +32,8 @@ void memset_io(volatile void __iomem *addr, int val, size_t count)
 
 	qc *= ~0UL / 0xff;
 
+	__pre_io_sync;
+
 	while (count && !IS_ALIGNED((long)addr, sizeof(long))) {
 		__raw_writeb(val, addr);
 		addr++;
@@ -46,6 +56,8 @@ void memset_io(volatile void __iomem *addr, int val, size_t count)
 		addr++;
 		count--;
 	}
+
+	__post_io_sync;
 }
 EXPORT_SYMBOL(memset_io);
 #endif
@@ -61,6 +73,8 @@ EXPORT_SYMBOL(memset_io);
  */
 void memcpy_fromio(void *dst, const volatile void __iomem *src, size_t count)
 {
+	__pre_io_sync;
+
 	while (count && !IS_ALIGNED((long)src, sizeof(long))) {
 		*(u8 *)dst = __raw_readb(src);
 		src++;
@@ -88,6 +102,8 @@ void memcpy_fromio(void *dst, const volatile void __iomem *src, size_t count)
 		dst++;
 		count--;
 	}
+
+	__post_io_sync
 }
 EXPORT_SYMBOL(memcpy_fromio);
 #endif
@@ -103,6 +119,8 @@ EXPORT_SYMBOL(memcpy_fromio);
  */
 void memcpy_toio(volatile void __iomem *dst, const void *src, size_t count)
 {
+	__pre_io_sync;
+
 	while (count && !IS_ALIGNED((long)dst, sizeof(long))) {
 		__raw_writeb(*(u8 *)src, dst);
 		src++;
@@ -129,6 +147,8 @@ void memcpy_toio(volatile void __iomem *dst, const void *src, size_t count)
 		dst++;
 		count--;
 	}
+
+	__post_io_sync;
 }
 EXPORT_SYMBOL(memcpy_toio);
 #endif
-- 
2.34.1
Re: [PATCH] Add io_sync stubs to generic IO memcpy/memset
Posted by Arnd Bergmann 2 days, 22 hours ago
On Mon, Jan 27, 2025, at 11:04, Julian Vetter wrote:
> The recently added IO memcpy and memset functions lack support for
> barriers or other sync functions before and/or after the transaction. To
> convert more architectures to use the generic IO memcpy and memset
> functions, add empty __pre_io_sync and __post_io_sync defines that can
> be overwritten by individual architectures if needed.
>
> Signed-off-by: Julian Vetter <julian@outer-limits.org>
> ---
>  lib/iomem_copy.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/lib/iomem_copy.c b/lib/iomem_copy.c
> index dec7eaea60e0..2e81182dd4d3 100644
> --- a/lib/iomem_copy.c
> +++ b/lib/iomem_copy.c
> @@ -9,6 +9,14 @@
>  #include <linux/types.h>
>  #include <linux/unaligned.h>
> 
> +#ifndef __pre_io_sync
> +#define __pre_io_sync
> +#endif
> +
> +#ifndef __post_io_sync
> +#define __post_io_sync
> +#endif

I think we should define what these barriers are supposed to
do exactly, and how they relate to the __io_br/__io_ar/__io_bw/__io_aw
ones include/asm-generic/io.h.

Depending on what the barriers are meant to do, we probably
want to either use the existing ones directly or use a similar
naming scheme.

     Arnd
Re: [PATCH] Add io_sync stubs to generic IO memcpy/memset
Posted by Julian Vetter 2 days, 18 hours ago

On 1/27/25 11:27, Arnd Bergmann wrote:
> On Mon, Jan 27, 2025, at 11:04, Julian Vetter wrote:
>> The recently added IO memcpy and memset functions lack support for
>> barriers or other sync functions before and/or after the transaction. To
>> convert more architectures to use the generic IO memcpy and memset
>> functions, add empty __pre_io_sync and __post_io_sync defines that can
>> be overwritten by individual architectures if needed.
>>
>> Signed-off-by: Julian Vetter <julian@outer-limits.org>
>> ---
>>   lib/iomem_copy.c | 20 ++++++++++++++++++++
>>   1 file changed, 20 insertions(+)
>>
>> diff --git a/lib/iomem_copy.c b/lib/iomem_copy.c
>> index dec7eaea60e0..2e81182dd4d3 100644
>> --- a/lib/iomem_copy.c
>> +++ b/lib/iomem_copy.c
>> @@ -9,6 +9,14 @@
>>   #include <linux/types.h>
>>   #include <linux/unaligned.h>
>>
>> +#ifndef __pre_io_sync
>> +#define __pre_io_sync
>> +#endif
>> +
>> +#ifndef __post_io_sync
>> +#define __post_io_sync
>> +#endif
> 
> I think we should define what these barriers are supposed to
> do exactly, and how they relate to the __io_br/__io_ar/__io_bw/__io_aw
> ones include/asm-generic/io.h.
> 

Thank you for your quick reply. You're right, I was just going with the
naming used in the powerpc arch which has an io_sync define. I'm now
wondering if we can't simply use the read{l,q}/write{l,q} functions
(instead of the __raw_xxx version), there are already calls to __io_br
before and__io_ar after each read (and write). But this might have
performance implications on some architectures, depending what it
resolves to.
Otherwise I propose renaming the __pre_io_sync and __post_io_sync into a
single __io_mbr which is called before and after each loop. Looking at 
PowerPC and SuperH, both of them could be consolidated into the generic 
IO memcpy code when adding this. What do you think?
The existing ones, especially __io_br unfortunately don't resolve to the 
right define on these architectures. The __io_ar and __io_br resolve to 
the right mb() on SuperH and PowerPC as well, but this would again have 
implications on other architectures.
Thank you!

Julian

> Depending on what the barriers are meant to do, we probably
> want to either use the existing ones directly or use a similar
> naming scheme.
> 
>       Arnd
Re: [PATCH] Add io_sync stubs to generic IO memcpy/memset
Posted by Arnd Bergmann 2 days, 16 hours ago
On Mon, Jan 27, 2025, at 15:11, Julian Vetter wrote:
> On 1/27/25 11:27, Arnd Bergmann wrote:
>> On Mon, Jan 27, 2025, at 11:04, Julian Vetter wrote:
>
> Thank you for your quick reply. You're right, I was just going with the
> naming used in the powerpc arch which has an io_sync define. I'm now
> wondering if we can't simply use the read{l,q}/write{l,q} functions
> (instead of the __raw_xxx version), there are already calls to __io_br
> before and__io_ar after each read (and write). But this might have
> performance implications on some architectures, depending what it
> resolves to.
>
> Otherwise I propose renaming the __pre_io_sync and __post_io_sync into a
> single __io_mbr which is called before and after each loop. Looking at 
> PowerPC and SuperH, both of them could be consolidated into the generic 
> IO memcpy code when adding this. What do you think?

Having barriers between the accesses would be very expensive, and
prevent the write-combining and prefetching that can otherwise happen
(depending on mapping flags).

I suspect that the powerpc variant got this wrong for historic
reasons, but that's hard to tell now. The ppc32 variant didn't
have barriers at all originally, it was just memcpy/memset
before it got combined with ppc64 into arch/powerpc.

> The existing ones, especially __io_br unfortunately don't resolve to the 
> right define on these architectures. The __io_ar and __io_br resolve to 
> the right mb() on SuperH and PowerPC as well, but this would again have 
> implications on other architectures.

The barriers in the sh functions seem arbitrary, and I would
expect them to be wrong.


      ARnd
Re: [PATCH] Add io_sync stubs to generic IO memcpy/memset
Posted by Julian Vetter 2 days ago
On 1/27/25 16:48, Arnd Bergmann wrote:
> On Mon, Jan 27, 2025, at 15:11, Julian Vetter wrote:
>> On 1/27/25 11:27, Arnd Bergmann wrote:
>>> On Mon, Jan 27, 2025, at 11:04, Julian Vetter wrote:
>>
>> Thank you for your quick reply. You're right, I was just going with the
>> naming used in the powerpc arch which has an io_sync define. I'm now
>> wondering if we can't simply use the read{l,q}/write{l,q} functions
>> (instead of the __raw_xxx version), there are already calls to __io_br
>> before and__io_ar after each read (and write). But this might have
>> performance implications on some architectures, depending what it
>> resolves to.
>>
>> Otherwise I propose renaming the __pre_io_sync and __post_io_sync into a
>> single __io_mbr which is called before and after each loop. Looking at
>> PowerPC and SuperH, both of them could be consolidated into the generic
>> IO memcpy code when adding this. What do you think?
> 
> Having barriers between the accesses would be very expensive, and
> prevent the write-combining and prefetching that can otherwise happen
> (depending on mapping flags).

Yes, ok. I see.

> 
> I suspect that the powerpc variant got this wrong for historic
> reasons, but that's hard to tell now. The ppc32 variant didn't
> have barriers at all originally, it was just memcpy/memset
> before it got combined with ppc64 into arch/powerpc.
> 

hmmm... ok. I'm not sure what do make of this. But maybe I can just send 
a patch to the PowerPC mailinglist, without those "sync" calls and see 
what they have to say.

>> The existing ones, especially __io_br unfortunately don't resolve to the
>> right define on these architectures. The __io_ar and __io_br resolve to
>> the right mb() on SuperH and PowerPC as well, but this would again have
>> implications on other architectures.
> 
> The barriers in the sh functions seem arbitrary, and I would
> expect them to be wrong.
> 
Ok, same for SuperH, I will send a patch to the mailinglist without the 
'mb()' before and after and see what they say. If they don't like it, I 
will come back to this.

Julian

> 
>        ARnd
Re: [PATCH] Add io_sync stubs to generic IO memcpy/memset
Posted by Arnd Bergmann 1 day, 23 hours ago
On Tue, Jan 28, 2025, at 09:32, Julian Vetter wrote:
> On 1/27/25 16:48, Arnd Bergmann wrote:
>> I suspect that the powerpc variant got this wrong for historic
>> reasons, but that's hard to tell now. The ppc32 variant didn't
>> have barriers at all originally, it was just memcpy/memset
>> before it got combined with ppc64 into arch/powerpc.
>> 
>
> hmmm... ok. I'm not sure what do make of this. But maybe I can just send 
> a patch to the PowerPC mailinglist, without those "sync" calls and see 
> what they have to say.

I just looked again and I see that the powerpc I/O memcpy/memset
functions do use the same barriers as readl/writel -- sync
before each one, twi/isync after read and sync after write,
the only difference is the eieio in the middle, so you could
start with a patch that removes the eieio.  The question here
is whether we want to allow or prevent combining and reordering
accesses within the string operations, and my feeling is that
allowing them makes more sense here, but there may be a powerpc
specific reason we don't want that.

I'm still unsure about having barriers before/after the string
operations, I can very much see that debate go either way, but
I also feel like this should be done consistently across all
architectures.

A possible answer may be that we declare these helpers to
include the same barriers as readl_relaxed()/writel_relaxed(),
i.e. ordering between I/O operations is enforced, but not
the barriers for serializing against DMA and interrupts
that is provided by the normal readl()/writel().

Most architectures assume that the relaxed variants don't
need any barriers, so that is what asm-generic/io.h
implements, but I think alpha and mips need barriers here
and powerpc simply defines the relaxed accessors to
be identical to the non-relaxed ones for simplicity.

>>> The existing ones, especially __io_br unfortunately don't resolve to the
>>> right define on these architectures. The __io_ar and __io_br resolve to
>>> the right mb() on SuperH and PowerPC as well, but this would again have
>>> implications on other architectures.
>> 
>> The barriers in the sh functions seem arbitrary, and I would
>> expect them to be wrong.
>> 
> Ok, same for SuperH, I will send a patch to the mailinglist without the 
> 'mb()' before and after and see what they say. If they don't like it, I 
> will come back to this.

I doubt there is anyone left who understands the history behind
the exact implementation on sh, the only real options I see for it
are to not touch it at all, or to remove the functions in favor of
the generic ones.

sh also has custom readsl/writesl functions, which are related,
but I see that it is completely missing the corresponding
readsw/writesw and readsb/writesb interfaces and it prevents
the use of the generic implementations.

     Arnd