arch/arm/include/asm/io.h | 11 ---------- arch/arm/kernel/io.c | 46 --------------------------------------- 2 files changed, 57 deletions(-)
From: Julian Vetter <julian@outer-limits.org>
Recently a new IO memcpy was added in libs/iomem_copy.c. So, remove the
byte-wise IO memcpy operations used in ARM big endian builds and fall
back to the new generic implementation. It will be slightly faster,
because it uses machine word accesses if the memory is aligned and falls
back to byte-wise accesses if its not.
Signed-off-by: Julian Vetter <julian@outer-limits.org>
---
arch/arm/include/asm/io.h | 11 ----------
arch/arm/kernel/io.c | 46 ---------------------------------------
2 files changed, 57 deletions(-)
diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 1815748f5d2a..146509d82e30 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -253,13 +253,6 @@ void __iomem *pci_remap_cfgspace(resource_size_t res_cookie, size_t size);
#define insl(p,d,l) __raw_readsl(__io(p),d,l)
#endif
-/*
- * String version of IO memory access ops:
- */
-extern void _memcpy_fromio(void *, const volatile void __iomem *, size_t);
-extern void _memcpy_toio(volatile void __iomem *, const void *, size_t);
-extern void _memset_io(volatile void __iomem *, int, size_t);
-
/*
* Memory access primitives
* ------------------------
@@ -322,10 +315,6 @@ static inline void memcpy_toio(volatile void __iomem *to, const void *from,
}
#define memcpy_toio(to,from,count) memcpy_toio(to,from,count)
-#else
-#define memset_io(c,v,l) _memset_io(c,(v),(l))
-#define memcpy_fromio(a,c,l) _memcpy_fromio((a),c,(l))
-#define memcpy_toio(c,a,l) _memcpy_toio(c,(a),(l))
#endif
#endif /* readl */
diff --git a/arch/arm/kernel/io.c b/arch/arm/kernel/io.c
index 60b621295d6c..afa4eff8c6c5 100644
--- a/arch/arm/kernel/io.c
+++ b/arch/arm/kernel/io.c
@@ -37,49 +37,3 @@ void atomic_io_modify(void __iomem *reg, u32 mask, u32 set)
raw_spin_unlock_irqrestore(&__io_lock, flags);
}
EXPORT_SYMBOL(atomic_io_modify);
-
-/*
- * Copy data from IO memory space to "real" memory space.
- * This needs to be optimized.
- */
-void _memcpy_fromio(void *to, const volatile void __iomem *from, size_t count)
-{
- unsigned char *t = to;
- while (count) {
- count--;
- *t = readb(from);
- t++;
- from++;
- }
-}
-EXPORT_SYMBOL(_memcpy_fromio);
-
-/*
- * Copy data from "real" memory space to IO memory space.
- * This needs to be optimized.
- */
-void _memcpy_toio(volatile void __iomem *to, const void *from, size_t count)
-{
- const unsigned char *f = from;
- while (count) {
- count--;
- writeb(*f, to);
- f++;
- to++;
- }
-}
-EXPORT_SYMBOL(_memcpy_toio);
-
-/*
- * "memset" on IO memory space.
- * This needs to be optimized.
- */
-void _memset_io(volatile void __iomem *dst, int c, size_t count)
-{
- while (count) {
- count--;
- writeb(c, dst);
- dst++;
- }
-}
-EXPORT_SYMBOL(_memset_io);
--
2.34.1
On Tue, Dec 3, 2024, at 09:38, Julian Vetter wrote:
> From: Julian Vetter <julian@outer-limits.org>
>
> Recently a new IO memcpy was added in libs/iomem_copy.c. So, remove the
> byte-wise IO memcpy operations used in ARM big endian builds and fall
> back to the new generic implementation. It will be slightly faster,
> because it uses machine word accesses if the memory is aligned and falls
> back to byte-wise accesses if its not.
>
> Signed-off-by: Julian Vetter <julian@outer-limits.org>
> ---
> arch/arm/include/asm/io.h | 11 ----------
> arch/arm/kernel/io.c | 46 ---------------------------------------
> 2 files changed, 57 deletions(-)
I'm not sure if this is safe on all platforms. Big-endian arm
is extremely rare in practice, and in comes in multiple variants
that behave slightly differently:
- On modern ARMv7 the byte-invariant big-endian "BE8" mode
generally well-behaved and works as one would expect it to.
- There is one ARMv5 "BE32" based platform, the ixp4xx, which
works differently, and this in turn allows multiple configurations
for its buses where a byte-swap is performed in the PCI
controller.
When the little-endian I/O string operations got optimized to
calling the word-based helpers in commit 7ddfe625cbc1 ("ARM:
optimize memset_io()/memcpy_fromio()/memcpy_toio()"), Russell
intentionally left the big-endian versions alone, which I think
was done for the case of PCI on ixp4xx, but could have been
out of general caution.
Before we apply your patch, I think the minimum would be to
have Linus Walleij try it out on an an ixp4xx with a driver
that uses these functions. Maybe Russell remembers the exact
constraints that led to using byte access for big-endian
mmio string operations, and whether the new lib/iomem_copy.c
version causes problems.
I also looked at the little-endian arm32 version, and
it seems that here the generic code would work fine, but
the custom variant is likely much faster when both the
source and destination buffers are aligned, as it can
do larger MMIO transactions using ldm/stm instructions,
though the generic version would be a bit better if the
in-memory buffer is unaligned.
We could get the best of both by implementing optimized
arm32 versions __iowrite32_copy()/__ioread32_copy and
using those in the generic memcpy_{from,to}io.
Arnd
On Tue, Dec 3, 2024 at 11:08 AM Arnd Bergmann <arnd@arndb.de> wrote: > - There is one ARMv5 "BE32" based platform, the ixp4xx, which > works differently, and this in turn allows multiple configurations > for its buses where a byte-swap is performed in the PCI > controller. Actually it affects all memory-mapped I/O. E.g. drivers/irqchip/irq-ixp4xx.c for this reason IXP4xx drivers that are not PCI usually use __raw_readl() and __raw_writel() to let the bus do its thing. IXP4xx PCI drivers are however *not* doing this, because PCI is (IIUC) specified to be little-endian, and the hardware deals with this. (Except for the PCI host driver itself, using __raw accesses.) > Before we apply your patch, I think the minimum would be to > have Linus Walleij try it out on an an ixp4xx with a driver > that uses these functions. I applied the patch, compiled v6.13-rc1 HEAD for the Linksys NSLU2 (oldest system there is! also a classic!) + Boots cleanly + Mounts root - this is on a USB drive which is on a PCI-attached USB controller, so PCI is tested + Reaches userspace and console + Gets network and IP address + LuCI web user inteface works (browsed to web server on device) + Dropbear SSH login works fine Tested-by: Linus Walleij <linus.walleij@linaro.org> I doubt any IXP4xx driver uses ioread/iowrite. The MTD driver does not for example, because of other weirdness with address swizzling (drivers/mtd/maps/physmap-ixp4xx.c) For similar reasons drivers/ata/pata_ixp4xx_cf.c is using readw/writew exclusively. It also looks like a good idea: Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Your, Linus Walleij
On 03.12.24 11:08, Arnd Bergmann wrote:
> On Tue, Dec 3, 2024, at 09:38, Julian Vetter wrote:
>> From: Julian Vetter <julian@outer-limits.org>
>>
>> Recently a new IO memcpy was added in libs/iomem_copy.c. So, remove the
>> byte-wise IO memcpy operations used in ARM big endian builds and fall
>> back to the new generic implementation. It will be slightly faster,
>> because it uses machine word accesses if the memory is aligned and falls
>> back to byte-wise accesses if its not.
>>
>> Signed-off-by: Julian Vetter <julian@outer-limits.org>
>> ---
>> arch/arm/include/asm/io.h | 11 ----------
>> arch/arm/kernel/io.c | 46 ---------------------------------------
>> 2 files changed, 57 deletions(-)
>
Hello Arnd,
first, sorry, that I messed up my sender email while sending the patch.
> I'm not sure if this is safe on all platforms. Big-endian arm
> is extremely rare in practice, and in comes in multiple variants
> that behave slightly differently:
>
> - On modern ARMv7 the byte-invariant big-endian "BE8" mode
> generally well-behaved and works as one would expect it to.
>
> - There is one ARMv5 "BE32" based platform, the ixp4xx, which
> works differently, and this in turn allows multiple configurations
> for its buses where a byte-swap is performed in the PCI
> controller.
>
> When the little-endian I/O string operations got optimized to
> calling the word-based helpers in commit 7ddfe625cbc1 ("ARM:
> optimize memset_io()/memcpy_fromio()/memcpy_toio()"), Russell
> intentionally left the big-endian versions alone, which I think
> was done for the case of PCI on ixp4xx, but could have been
> out of general caution.
>
Wow, thank you for the explanation!
> Before we apply your patch, I think the minimum would be to
> have Linus Walleij try it out on an an ixp4xx with a driver
> that uses these functions. Maybe Russell remembers the exact
> constraints that led to using byte access for big-endian
> mmio string operations, and whether the new lib/iomem_copy.c
> version causes problems.
>
> I also looked at the little-endian arm32 version, and
> it seems that here the generic code would work fine, but
> the custom variant is likely much faster when both the
> source and destination buffers are aligned, as it can
> do larger MMIO transactions using ldm/stm instructions,
> though the generic version would be a bit better if the
> in-memory buffer is unaligned.
> We could get the best of both by implementing optimized
> arm32 versions __iowrite32_copy()/__ioread32_copy and
> using those in the generic memcpy_{from,to}io.
>
ok, then I will wait for now, to see if it works with the current
version. If not, I will have a look at using the
__iowrite32_copy/__ioread32_copy functions.
> Arnd
© 2016 - 2026 Red Hat, Inc.