lib/iomem_copy.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
The recently added IO memcpy and memset functions lack support for
barriers or other sync functions before and/or after the transaction. To
convert more architectures to use the generic IO memcpy and memset
functions, add empty __pre_io_sync and __post_io_sync defines that can
be overwritten by individual architectures if needed.
Signed-off-by: Julian Vetter <julian@outer-limits.org>
---
lib/iomem_copy.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/lib/iomem_copy.c b/lib/iomem_copy.c
index dec7eaea60e0..2e81182dd4d3 100644
--- a/lib/iomem_copy.c
+++ b/lib/iomem_copy.c
@@ -9,6 +9,14 @@
#include <linux/types.h>
#include <linux/unaligned.h>
+#ifndef __pre_io_sync
+#define __pre_io_sync
+#endif
+
+#ifndef __post_io_sync
+#define __post_io_sync
+#endif
+
#ifndef memset_io
/**
* memset_io() - Set a range of I/O memory to a constant value
@@ -24,6 +32,8 @@ void memset_io(volatile void __iomem *addr, int val, size_t count)
qc *= ~0UL / 0xff;
+ __pre_io_sync;
+
while (count && !IS_ALIGNED((long)addr, sizeof(long))) {
__raw_writeb(val, addr);
addr++;
@@ -46,6 +56,8 @@ void memset_io(volatile void __iomem *addr, int val, size_t count)
addr++;
count--;
}
+
+ __post_io_sync;
}
EXPORT_SYMBOL(memset_io);
#endif
@@ -61,6 +73,8 @@ EXPORT_SYMBOL(memset_io);
*/
void memcpy_fromio(void *dst, const volatile void __iomem *src, size_t count)
{
+ __pre_io_sync;
+
while (count && !IS_ALIGNED((long)src, sizeof(long))) {
*(u8 *)dst = __raw_readb(src);
src++;
@@ -88,6 +102,8 @@ void memcpy_fromio(void *dst, const volatile void __iomem *src, size_t count)
dst++;
count--;
}
+
+ __post_io_sync
}
EXPORT_SYMBOL(memcpy_fromio);
#endif
@@ -103,6 +119,8 @@ EXPORT_SYMBOL(memcpy_fromio);
*/
void memcpy_toio(volatile void __iomem *dst, const void *src, size_t count)
{
+ __pre_io_sync;
+
while (count && !IS_ALIGNED((long)dst, sizeof(long))) {
__raw_writeb(*(u8 *)src, dst);
src++;
@@ -129,6 +147,8 @@ void memcpy_toio(volatile void __iomem *dst, const void *src, size_t count)
dst++;
count--;
}
+
+ __post_io_sync;
}
EXPORT_SYMBOL(memcpy_toio);
#endif
--
2.34.1
On Mon, Jan 27, 2025, at 11:04, Julian Vetter wrote: > The recently added IO memcpy and memset functions lack support for > barriers or other sync functions before and/or after the transaction. To > convert more architectures to use the generic IO memcpy and memset > functions, add empty __pre_io_sync and __post_io_sync defines that can > be overwritten by individual architectures if needed. > > Signed-off-by: Julian Vetter <julian@outer-limits.org> > --- > lib/iomem_copy.c | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/lib/iomem_copy.c b/lib/iomem_copy.c > index dec7eaea60e0..2e81182dd4d3 100644 > --- a/lib/iomem_copy.c > +++ b/lib/iomem_copy.c > @@ -9,6 +9,14 @@ > #include <linux/types.h> > #include <linux/unaligned.h> > > +#ifndef __pre_io_sync > +#define __pre_io_sync > +#endif > + > +#ifndef __post_io_sync > +#define __post_io_sync > +#endif I think we should define what these barriers are supposed to do exactly, and how they relate to the __io_br/__io_ar/__io_bw/__io_aw ones include/asm-generic/io.h. Depending on what the barriers are meant to do, we probably want to either use the existing ones directly or use a similar naming scheme. Arnd
On 1/27/25 11:27, Arnd Bergmann wrote: > On Mon, Jan 27, 2025, at 11:04, Julian Vetter wrote: >> The recently added IO memcpy and memset functions lack support for >> barriers or other sync functions before and/or after the transaction. To >> convert more architectures to use the generic IO memcpy and memset >> functions, add empty __pre_io_sync and __post_io_sync defines that can >> be overwritten by individual architectures if needed. >> >> Signed-off-by: Julian Vetter <julian@outer-limits.org> >> --- >> lib/iomem_copy.c | 20 ++++++++++++++++++++ >> 1 file changed, 20 insertions(+) >> >> diff --git a/lib/iomem_copy.c b/lib/iomem_copy.c >> index dec7eaea60e0..2e81182dd4d3 100644 >> --- a/lib/iomem_copy.c >> +++ b/lib/iomem_copy.c >> @@ -9,6 +9,14 @@ >> #include <linux/types.h> >> #include <linux/unaligned.h> >> >> +#ifndef __pre_io_sync >> +#define __pre_io_sync >> +#endif >> + >> +#ifndef __post_io_sync >> +#define __post_io_sync >> +#endif > > I think we should define what these barriers are supposed to > do exactly, and how they relate to the __io_br/__io_ar/__io_bw/__io_aw > ones include/asm-generic/io.h. > Thank you for your quick reply. You're right, I was just going with the naming used in the powerpc arch which has an io_sync define. I'm now wondering if we can't simply use the read{l,q}/write{l,q} functions (instead of the __raw_xxx version), there are already calls to __io_br before and__io_ar after each read (and write). But this might have performance implications on some architectures, depending what it resolves to. Otherwise I propose renaming the __pre_io_sync and __post_io_sync into a single __io_mbr which is called before and after each loop. Looking at PowerPC and SuperH, both of them could be consolidated into the generic IO memcpy code when adding this. What do you think? The existing ones, especially __io_br unfortunately don't resolve to the right define on these architectures. The __io_ar and __io_br resolve to the right mb() on SuperH and PowerPC as well, but this would again have implications on other architectures. Thank you! Julian > Depending on what the barriers are meant to do, we probably > want to either use the existing ones directly or use a similar > naming scheme. > > Arnd
On Mon, Jan 27, 2025, at 15:11, Julian Vetter wrote: > On 1/27/25 11:27, Arnd Bergmann wrote: >> On Mon, Jan 27, 2025, at 11:04, Julian Vetter wrote: > > Thank you for your quick reply. You're right, I was just going with the > naming used in the powerpc arch which has an io_sync define. I'm now > wondering if we can't simply use the read{l,q}/write{l,q} functions > (instead of the __raw_xxx version), there are already calls to __io_br > before and__io_ar after each read (and write). But this might have > performance implications on some architectures, depending what it > resolves to. > > Otherwise I propose renaming the __pre_io_sync and __post_io_sync into a > single __io_mbr which is called before and after each loop. Looking at > PowerPC and SuperH, both of them could be consolidated into the generic > IO memcpy code when adding this. What do you think? Having barriers between the accesses would be very expensive, and prevent the write-combining and prefetching that can otherwise happen (depending on mapping flags). I suspect that the powerpc variant got this wrong for historic reasons, but that's hard to tell now. The ppc32 variant didn't have barriers at all originally, it was just memcpy/memset before it got combined with ppc64 into arch/powerpc. > The existing ones, especially __io_br unfortunately don't resolve to the > right define on these architectures. The __io_ar and __io_br resolve to > the right mb() on SuperH and PowerPC as well, but this would again have > implications on other architectures. The barriers in the sh functions seem arbitrary, and I would expect them to be wrong. ARnd
On 1/27/25 16:48, Arnd Bergmann wrote: > On Mon, Jan 27, 2025, at 15:11, Julian Vetter wrote: >> On 1/27/25 11:27, Arnd Bergmann wrote: >>> On Mon, Jan 27, 2025, at 11:04, Julian Vetter wrote: >> >> Thank you for your quick reply. You're right, I was just going with the >> naming used in the powerpc arch which has an io_sync define. I'm now >> wondering if we can't simply use the read{l,q}/write{l,q} functions >> (instead of the __raw_xxx version), there are already calls to __io_br >> before and__io_ar after each read (and write). But this might have >> performance implications on some architectures, depending what it >> resolves to. >> >> Otherwise I propose renaming the __pre_io_sync and __post_io_sync into a >> single __io_mbr which is called before and after each loop. Looking at >> PowerPC and SuperH, both of them could be consolidated into the generic >> IO memcpy code when adding this. What do you think? > > Having barriers between the accesses would be very expensive, and > prevent the write-combining and prefetching that can otherwise happen > (depending on mapping flags). Yes, ok. I see. > > I suspect that the powerpc variant got this wrong for historic > reasons, but that's hard to tell now. The ppc32 variant didn't > have barriers at all originally, it was just memcpy/memset > before it got combined with ppc64 into arch/powerpc. > hmmm... ok. I'm not sure what do make of this. But maybe I can just send a patch to the PowerPC mailinglist, without those "sync" calls and see what they have to say. >> The existing ones, especially __io_br unfortunately don't resolve to the >> right define on these architectures. The __io_ar and __io_br resolve to >> the right mb() on SuperH and PowerPC as well, but this would again have >> implications on other architectures. > > The barriers in the sh functions seem arbitrary, and I would > expect them to be wrong. > Ok, same for SuperH, I will send a patch to the mailinglist without the 'mb()' before and after and see what they say. If they don't like it, I will come back to this. Julian > > ARnd
On Tue, Jan 28, 2025, at 09:32, Julian Vetter wrote: > On 1/27/25 16:48, Arnd Bergmann wrote: >> I suspect that the powerpc variant got this wrong for historic >> reasons, but that's hard to tell now. The ppc32 variant didn't >> have barriers at all originally, it was just memcpy/memset >> before it got combined with ppc64 into arch/powerpc. >> > > hmmm... ok. I'm not sure what do make of this. But maybe I can just send > a patch to the PowerPC mailinglist, without those "sync" calls and see > what they have to say. I just looked again and I see that the powerpc I/O memcpy/memset functions do use the same barriers as readl/writel -- sync before each one, twi/isync after read and sync after write, the only difference is the eieio in the middle, so you could start with a patch that removes the eieio. The question here is whether we want to allow or prevent combining and reordering accesses within the string operations, and my feeling is that allowing them makes more sense here, but there may be a powerpc specific reason we don't want that. I'm still unsure about having barriers before/after the string operations, I can very much see that debate go either way, but I also feel like this should be done consistently across all architectures. A possible answer may be that we declare these helpers to include the same barriers as readl_relaxed()/writel_relaxed(), i.e. ordering between I/O operations is enforced, but not the barriers for serializing against DMA and interrupts that is provided by the normal readl()/writel(). Most architectures assume that the relaxed variants don't need any barriers, so that is what asm-generic/io.h implements, but I think alpha and mips need barriers here and powerpc simply defines the relaxed accessors to be identical to the non-relaxed ones for simplicity. >>> The existing ones, especially __io_br unfortunately don't resolve to the >>> right define on these architectures. The __io_ar and __io_br resolve to >>> the right mb() on SuperH and PowerPC as well, but this would again have >>> implications on other architectures. >> >> The barriers in the sh functions seem arbitrary, and I would >> expect them to be wrong. >> > Ok, same for SuperH, I will send a patch to the mailinglist without the > 'mb()' before and after and see what they say. If they don't like it, I > will come back to this. I doubt there is anyone left who understands the history behind the exact implementation on sh, the only real options I see for it are to not touch it at all, or to remove the functions in favor of the generic ones. sh also has custom readsl/writesl functions, which are related, but I see that it is completely missing the corresponding readsw/writesw and readsb/writesb interfaces and it prevents the use of the generic implementations. Arnd
© 2016 - 2025 Red Hat, Inc.