This commit introduces two helper functions, `memcpy-fromio` and
`memcpy-toio`, to provide a robust mechanism for copying data between
standard memory and memory-mapped I/O (MMIO) space for the ARM
architecture.
These helpers handle alignment safely by using ordered byte accesses for
any leading/trailing unaligned bytes and ordered 32-bit accesses for the
aligned bulk transfer. Using the ordered `readb/readl` and
`writeb/writel` accessors avoids unintended endianness conversion while
respecting device ordering requirements on ARM32/ARM64 hardware that may
not support 64-bit MMIO atomically.
The interface lives in the generic header so other architectures can
provide their own implementations (as macros or functions). The ARM
implementation is placed under `arch/arm/lib/` (mirroring the x86
reference layout) and is split into separate compilation units added via
the architecture-specific lib Makefile.
Signed-off-by: Oleksii Moisieiev <oleksii_moisieiev@epam.com>
---
Changes in v8:
- switched to ordered accessors to address the ordering and barrier
concerns.
- updated the documentation to match the implementation and explicitly
state the supported access sizes and granularity.
- rename memcpy_* implementation files to memcpy-* to follow naming
convention
- fix indentation to match Xen style
- move memcpy-{from/to}io to more convenient library place
Changes in v7:
- x86 guidance: removed the speculative note; header now just says
each arch supplies its own implementation or macro.
- name spacing: dropped the double-underscore; the helpers are now
memcpy_fromio / memcpy_toio. The header also explicitly allows an
arch to define these as macros before including it.
- updated io.c to keep 32-bit transfers safe on arm32
- moved to __raw_read*/__raw_write* accessors to avoid endianness conversion.
- split the helpers into separate compilation units
Changes in v6:
- sorted objs in Makefile alphabetically
- added newline at the end of Makefile
- used uint{N}_t instead of u{N}
- add comment about why 32 bit IO operations were used
- updated cast operations to avoid wrongly dropping constness
- move function definitions to generic place so they could be reused by
other arch
- add SPDX tag to io.c
Changes in v5:
- move memcpy_toio/fromio to the generic place
xen/arch/arm/Makefile | 1 +
xen/arch/arm/arch.mk | 1 +
xen/arch/arm/lib/Makefile | 2 +
xen/arch/arm/lib/memcpy-fromio.c | 48 +++++++++++++++++++++
xen/arch/arm/lib/memcpy-toio.c | 48 +++++++++++++++++++++
xen/include/xen/lib/io.h | 71 ++++++++++++++++++++++++++++++++
6 files changed, 171 insertions(+)
create mode 100644 xen/arch/arm/lib/Makefile
create mode 100644 xen/arch/arm/lib/memcpy-fromio.c
create mode 100644 xen/arch/arm/lib/memcpy-toio.c
create mode 100644 xen/include/xen/lib/io.h
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 7494a0f926..bd8638c8a7 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -10,6 +10,7 @@ endif
obj-y += firmware/
obj-$(CONFIG_TEE) += tee/
obj-$(CONFIG_HAS_VPCI) += vpci.o
+obj-y += lib/
obj-$(CONFIG_HAS_ALTERNATIVE) += alternative.o
obj-y += cpuerrata.o
diff --git a/xen/arch/arm/arch.mk b/xen/arch/arm/arch.mk
index dea8dbd18a..0c28dbeb87 100644
--- a/xen/arch/arm/arch.mk
+++ b/xen/arch/arm/arch.mk
@@ -2,6 +2,7 @@
# arm-specific definitions
ARCH_LIBS-y += arch/arm/$(ARCH)/lib/lib.a
+ALL_LIBS-y += arch/arm/lib/lib.a
$(call cc-options-add,CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS))
$(call cc-option-add,CFLAGS,CC,-Wnested-externs)
diff --git a/xen/arch/arm/lib/Makefile b/xen/arch/arm/lib/Makefile
new file mode 100644
index 0000000000..07a0d9186c
--- /dev/null
+++ b/xen/arch/arm/lib/Makefile
@@ -0,0 +1,2 @@
+lib-y += memcpy-fromio.o
+lib-y += memcpy-toio.o
diff --git a/xen/arch/arm/lib/memcpy-fromio.c b/xen/arch/arm/lib/memcpy-fromio.c
new file mode 100644
index 0000000000..85377137c3
--- /dev/null
+++ b/xen/arch/arm/lib/memcpy-fromio.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#include <asm/io.h>
+#include <xen/lib/io.h>
+
+/*
+ * Use ordered 8-bit and 32-bit IO accessors for portability across
+ * ARM32/ARM64 where 64-bit accesses may not be atomic and some devices
+ * only support 32-bit aligned accesses.
+ */
+
+void memcpy_fromio(void *to, const volatile void __iomem *from,
+ size_t count)
+{
+ while ( count && (!IS_ALIGNED((unsigned long)from, 4) ||
+ !IS_ALIGNED((unsigned long)to, 4)) )
+ {
+ *(uint8_t *)to = readb(from);
+ from++;
+ to++;
+ count--;
+ }
+
+ while ( count >= 4 )
+ {
+ *(uint32_t *)to = readl(from);
+ from += 4;
+ to += 4;
+ count -= 4;
+ }
+
+ while ( count )
+ {
+ *(uint8_t *)to = readb(from);
+ from++;
+ to++;
+ count--;
+ }
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 8
+ * tab-width: 8
+ * indent-tabs-mode: t
+ * End:
+ */
diff --git a/xen/arch/arm/lib/memcpy-toio.c b/xen/arch/arm/lib/memcpy-toio.c
new file mode 100644
index 0000000000..588497ed0f
--- /dev/null
+++ b/xen/arch/arm/lib/memcpy-toio.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#include <asm/io.h>
+#include <xen/lib/io.h>
+
+/*
+ * Use ordered 8-bit and 32-bit IO accessors for portability across
+ * ARM32/ARM64 where 64-bit accesses may not be atomic and some devices
+ * only support 32-bit aligned accesses.
+ */
+
+void memcpy_toio(volatile void __iomem *to, const void *from,
+ size_t count)
+{
+ while ( count && (!IS_ALIGNED((unsigned long)to, 4) ||
+ !IS_ALIGNED((unsigned long)from, 4)) )
+ {
+ writeb(*(const uint8_t *)from, to);
+ from++;
+ to++;
+ count--;
+ }
+
+ while ( count >= 4 )
+ {
+ writel(*(const uint32_t *)from, to);
+ from += 4;
+ to += 4;
+ count -= 4;
+ }
+
+ while ( count )
+ {
+ writeb(*(const uint8_t *)from, to);
+ from++;
+ to++;
+ count--;
+ }
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 8
+ * tab-width: 8
+ * indent-tabs-mode: t
+ * End:
+ */
diff --git a/xen/include/xen/lib/io.h b/xen/include/xen/lib/io.h
new file mode 100644
index 0000000000..1c0865401e
--- /dev/null
+++ b/xen/include/xen/lib/io.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Generic I/O memory copy function prototypes.
+ *
+ * These functions provide low-level implementation for copying data between
+ * regular memory and I/O memory regions. Each architecture must provide its
+ * own implementation based on the specific requirements of the architecture's
+ * memory model and I/O access patterns. An architecture may supply these as
+ * functions or as macros in its own headers before including this file.
+ *
+ * Architecture-specific implementations:
+ * =====================================
+ * Each architecture should implement these functions in xen/lib/<arch>/io.c
+ * (or define them as macros) based on their hardware requirements. See
+ * xen/lib/arm/io.c for an example using explicit I/O accessors.
+ */
+
+#ifndef _XEN_LIB_IO_H
+#define _XEN_LIB_IO_H
+
+#include <xen/types.h>
+
+/*
+ * memcpy_fromio - Copy data from I/O memory space to regular memory
+ * @to: Destination buffer in regular memory
+ * @from: Source address in I/O memory space (must be marked __iomem)
+ * @count: Number of bytes to copy
+ *
+ * This function handles copying from memory-mapped I/O regions using
+ * architecture-appropriate I/O accessor functions (e.g. readb/readl on Arm)
+ * that already impose the required ordering for device accesses. Typical
+ * implementations may:
+ * - Handle leading/trailing unaligned bytes with 8-bit accesses
+ * - Use the widest safe aligned access size supported by the target (often
+ * 32-bit on Arm where 64-bit MMIO may not be atomic)
+ * - Rely on MMIO accessors to provide the needed barriers
+ *
+ * Limitations:
+ * - Only suitable for devices that tolerate 8-bit and 32-bit accesses; it is
+ * not valid for devices that require strictly 16-bit or 64-bit access sizes.
+ * - Callers must ensure the target MMIO region is mapped with appropriate
+ * device attributes.
+ */
+extern void memcpy_fromio(void *to, const volatile void __iomem *from,
+ size_t count);
+
+/*
+ * memcpy_toio - Copy data from regular memory to I/O memory space
+ * @to: Destination address in I/O memory space (must be marked __iomem)
+ * @from: Source buffer in regular memory
+ * @count: Number of bytes to copy
+ *
+ * This function handles copying to memory-mapped I/O regions using
+ * architecture-appropriate I/O accessor functions (e.g. writeb/writel on Arm)
+ * that already impose the required ordering for device accesses. Typical
+ * implementations may:
+ * - Handle leading/trailing unaligned bytes with 8-bit accesses
+ * - Use the widest safe aligned access size supported by the target (often
+ * 32-bit on Arm where 64-bit MMIO may not be atomic)
+ * - Rely on MMIO accessors to provide the needed barriers
+ *
+ * Limitations:
+ * - Only suitable for devices that tolerate 8-bit and 32-bit accesses; it is
+ * not valid for devices that require strictly 16-bit or 64-bit access sizes.
+ * - Callers must ensure the target MMIO region is mapped with appropriate
+ * device attributes.
+ */
+extern void memcpy_toio(volatile void __iomem *to, const void *from,
+ size_t count);
+
+#endif /* _XEN_LIB_IO_H */
--
2.34.1
On 21.01.2026 19:43, Oleksii Moisieiev wrote:
> This commit introduces two helper functions, `memcpy-fromio` and
> `memcpy-toio`, to provide a robust mechanism for copying data between
> standard memory and memory-mapped I/O (MMIO) space for the ARM
> architecture.

No helpers of this name are being introduced, as what's spelled out above
aren't even identifiers. Also instead of the quoting we've been trying to
uniformly identify functions in descriptions by adding parentheses:
memcpy_fromio(). Plus, nit: Please don't use "This commit ..." or alike in
descriptions.

> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -10,6 +10,7 @@ endif
> obj-y += firmware/
> obj-$(CONFIG_TEE) += tee/
> obj-$(CONFIG_HAS_VPCI) += vpci.o
> +obj-y += lib/

Yes, sorting in this section was already screwed. But why make the problem
worse?

> --- a/xen/arch/arm/arch.mk
> +++ b/xen/arch/arm/arch.mk
> @@ -2,6 +2,7 @@
> # arm-specific definitions
>
> ARCH_LIBS-y += arch/arm/$(ARCH)/lib/lib.a
> +ALL_LIBS-y += arch/arm/lib/lib.a

Conceivable generic helpers of the same names could be introduced. In that
case this choice of yours would lead to them being used, instead of the Arm
ones. IOW I think you want to add to ARCH_LIBS here.

> --- /dev/null
> +++ b/xen/arch/arm/lib/memcpy-fromio.c
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#include <asm/io.h>
> +#include <xen/lib/io.h>

Preferably the other way around, and with a blank line between them. (But
see below as to the header being generic; if it wasn't, this remark wouldn't
apply anymore.)

> --- /dev/null
> +++ b/xen/include/xen/lib/io.h
> @@ -0,0 +1,71 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Generic I/O memory copy function prototypes.
> + *
> + * These functions provide low-level implementation for copying data between
> + * regular memory and I/O memory regions. Each architecture must provide its
> + * own implementation based on the specific requirements of the architecture's
> + * memory model and I/O access patterns. An architecture may supply these as
> + * functions or as macros in its own headers before including this file.
> + *
> + * Architecture-specific implementations:
> + * =====================================
> + * Each architecture should implement these functions in xen/lib/<arch>/io.c
> + * (or define them as macros) based on their hardware requirements. See
> + * xen/lib/arm/io.c for an example using explicit I/O accessors.
> + */

The file name referenced is unhelpful and actually wrong for the Arm
functions you add here.

> +#ifndef _XEN_LIB_IO_H
> +#define _XEN_LIB_IO_H
> +
> +#include <xen/types.h>
> +
> +/*
> + * memcpy_fromio - Copy data from I/O memory space to regular memory
> + * @to: Destination buffer in regular memory
> + * @from: Source address in I/O memory space (must be marked __iomem)
> + * @count: Number of bytes to copy
> + *
> + * This function handles copying from memory-mapped I/O regions using
> + * architecture-appropriate I/O accessor functions (e.g. readb/readl on Arm)
> + * that already impose the required ordering for device accesses. Typical
> + * implementations may:
> + * - Handle leading/trailing unaligned bytes with 8-bit accesses

This is either imprecise, or the implementation is wrong: From context, this
ought to be talking solely of the MMIO side of the operation. Yet if src and
dst are misaligned with one another, you'd do the entire operation in 8-bit
chunks. For devices requiring aligned 32-bit accesses that won't work at all.

> + * - Use the widest safe aligned access size supported by the target (often
> + * 32-bit on Arm where 64-bit MMIO may not be atomic)
> + * - Rely on MMIO accessors to provide the needed barriers
> + *
> + * Limitations:
> + * - Only suitable for devices that tolerate 8-bit and 32-bit accesses; it is
> + * not valid for devices that require strictly 16-bit or 64-bit access sizes.
> + * - Callers must ensure the target MMIO region is mapped with appropriate
> + * device attributes.
> + */

The description is now valid for the Arm implementation you supply, but the
header we're in is a generic one. Imo, generic constraints should be reduced
as much as possible, like dealing with leading / trailing sub-32-bit items by
doing at most one 8-bit access followed by at most one 16-bit one (the other
way around for the trailing part). Or else the header should be Arm-only as
well (more strict constraints on Arm would make these functions potentially
unusable from generic code, after all). Along these lines, "device
attributes" is Arm terminology, aiui.

Also, if indeed a generic header, why xen/lib/io.h and not xen/io.h (which
already exists)?

Jan
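
For illustration only, the alignment remarks above could be sketched roughly as
follows for the read direction. This is a userspace approximation, not the
patch's code and not a proposed replacement: sketch_readb(), sketch_readw(),
sketch_readl(), sketch_byte_accesses and sketch_memcpy_fromio() are all
hypothetical names standing in for the real accessors, and the volatile pointer
casts assume a suitably aligned backing buffer. Alignment decisions look only
at the I/O-side pointer, the leading part uses at most one 8-bit and then at
most one 16-bit access (reversed for the trailing part), and the memory side is
written via memcpy() so that mutual misalignment of source and destination no
longer forces the whole copy into 8-bit chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts 8-bit device accesses so a caller can see how few are needed. */
static unsigned int sketch_byte_accesses;

/*
 * Hypothetical userspace stand-ins for readb()/readw()/readl(). The asserts
 * model a device that rejects misaligned accesses.
 */
static uint8_t sketch_readb(const volatile uint8_t *p)
{
    sketch_byte_accesses++;
    return *p;
}

static uint16_t sketch_readw(const volatile uint8_t *p)
{
    assert(((uintptr_t)p & 1) == 0);
    return *(const volatile uint16_t *)p;
}

static uint32_t sketch_readl(const volatile uint8_t *p)
{
    assert(((uintptr_t)p & 3) == 0);
    return *(const volatile uint32_t *)p;
}

void sketch_memcpy_fromio(void *to, const volatile void *from, size_t count)
{
    uint8_t *dst = to;
    const volatile uint8_t *src = from;
    uint16_t v16;
    uint32_t v32;

    /* Leading part: at most one 8-bit access, then at most one 16-bit one. */
    if ( count >= 1 && ((uintptr_t)src & 1) )
    {
        *dst = sketch_readb(src);
        src += 1; dst += 1; count -= 1;
    }
    if ( count >= 2 && ((uintptr_t)src & 2) )
    {
        v16 = sketch_readw(src);
        memcpy(dst, &v16, sizeof(v16)); /* dst may be misaligned */
        src += 2; dst += 2; count -= 2;
    }

    /* Bulk: whenever count >= 4 here, src is 4-byte aligned. */
    while ( count >= 4 )
    {
        v32 = sketch_readl(src);
        memcpy(dst, &v32, sizeof(v32)); /* dst may be misaligned */
        src += 4; dst += 4; count -= 4;
    }

    /* Trailing part: at most one 16-bit access, then at most one 8-bit one. */
    if ( count >= 2 )
    {
        v16 = sketch_readw(src);
        memcpy(dst, &v16, sizeof(v16));
        src += 2; dst += 2; count -= 2;
    }
    if ( count )
        *dst = sketch_readb(src);
}
```

With this shape, copying 13 bytes from an odd I/O address into an arbitrarily
aligned buffer needs one 8-bit access, two 16-bit accesses and two 32-bit
accesses. Whether 16-bit accesses are acceptable for a given device is a
separate question that the sketch does not settle.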