[v7] xen/arm: scmi: introduce SCI SCMI SMC multi-agent support

[PATCH v7 3/5] lib/arm: Add I/O memory copy helpers

Posted by Oleksii Moisieiev 3 weeks, 5 days ago

This commit introduces two helper functions, `memcpy_fromio` and
`memcpy_toio`, to provide a robust mechanism for copying data between
standard memory and memory-mapped I/O (MMIO) space for the ARM
architecture.

These helpers handle alignment safely by using byte accesses for any
leading/trailing unaligned bytes and 32-bit raw accesses for the aligned
bulk transfer. Using `__raw_readb/__raw_readl` and
`__raw_writeb/__raw_writel` avoids unintended endianness conversion while
remaining safe across ARM32/ARM64 devices that only support 32-bit
accesses.
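The head/bulk/tail split described above can be modelled in user space against an ordinary array. This is only a sketch, and all names in it are hypothetical. mock_readb() and mock_readl() stand in for __raw_readb()/__raw_readl() and record the width of each access. model_memcpy_fromio() follows the structure of the patch's loop but is not the patch code. The model covers alignment only, not endianness, ordering or barriers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOG_MAX 64

/* Width (1 or 4) of each simulated device access, in issue order. */
static unsigned int access_log[LOG_MAX];
static size_t access_count;

/* Hypothetical stand-ins for __raw_readb()/__raw_readl(): they log widths. */
static uint8_t mock_readb(const uint8_t *p)
{
    assert(access_count < LOG_MAX);
    access_log[access_count++] = 1;
    return *p;
}

static uint32_t mock_readl(const uint8_t *p)
{
    uint32_t v;

    assert(access_count < LOG_MAX);
    access_log[access_count++] = 4;
    memcpy(&v, p, sizeof(v)); /* models a single 32-bit load */
    return v;
}

/* Same head/bulk/tail structure as the v7 memcpy_fromio() loop. */
static void model_memcpy_fromio(void *to, const uint8_t *from, size_t count)
{
    uint8_t *dst = to;

    /* Head: byte accesses until both pointers are 4-byte aligned. */
    while ( count && (((uintptr_t)from | (uintptr_t)dst) & 3) )
    {
        *dst++ = mock_readb(from++);
        count--;
    }

    /* Bulk: one 32-bit access per 4 bytes. */
    while ( count >= 4 )
    {
        uint32_t v = mock_readl(from);

        memcpy(dst, &v, sizeof(v));
        from += 4;
        dst += 4;
        count -= 4;
    }

    /* Tail: the remaining 0-3 bytes. */
    while ( count )
    {
        *dst++ = mock_readb(from++);
        count--;
    }
}
```

Take two 4-byte-aligned buffers and copy 10 bytes starting at offset 1. The model issues three byte reads, one 32-bit read, and then three more byte reads.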

The interface lives in the generic header so other architectures can
provide their own implementations (as macros or functions). The ARM
implementation is split into separate compilation units and added to the
architecture-specific lib Makefile.

Signed-off-by: Oleksii Moisieiev <oleksii_moisieiev@epam.com>
---

Changes in v7:
- x86 guidance: removed the speculative note; header now just says
  each arch supplies its own implementation or macro.
- name spacing: dropped the double-underscore; the helpers are now
  memcpy_fromio / memcpy_toio. The header also explicitly allows an
  arch to define these as macros before including it.
- updated io.c to keep 32-bit transfers safe on arm32
- moved to __raw_read*/__raw_write* accessors to avoid endianness conversion.
- split the helpers into separate compilation units

Changes in v6:
- sorted objs in Makefile alphabetically
- added newline at the end of Makefile
- used uint{N}_t instead of u{N}
- add comment about why 32-bit IO operations were used
- updated cast operations, which wrongly dropped constness
- move function definitions to a generic place so they could be reused by
  other architectures
- add SPDX tag to io.c

Changes in v5:
- move memcpy_toio/fromio to the generic place

 xen/include/xen/lib/io.h    | 65 +++++++++++++++++++++++++++++++++++++
 xen/lib/Makefile            |  1 +
 xen/lib/arm/Makefile        |  1 +
 xen/lib/arm/memcpy_fromio.c | 48 +++++++++++++++++++++++++++
 xen/lib/arm/memcpy_toio.c   | 48 +++++++++++++++++++++++++++
 5 files changed, 163 insertions(+)
 create mode 100644 xen/include/xen/lib/io.h
 create mode 100644 xen/lib/arm/Makefile
 create mode 100644 xen/lib/arm/memcpy_fromio.c
 create mode 100644 xen/lib/arm/memcpy_toio.c

diff --git a/xen/include/xen/lib/io.h b/xen/include/xen/lib/io.h
new file mode 100644
index 0000000000..cd2b6680d5
--- /dev/null
+++ b/xen/include/xen/lib/io.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Generic I/O memory copy function prototypes.
+ *
+ * These functions provide low-level implementation for copying data between
+ * regular memory and I/O memory regions. Each architecture must provide its
+ * own implementation based on the specific requirements of the architecture's
+ * memory model and I/O access patterns. An architecture may supply these as
+ * functions or as macros in its own headers before including this file.
+ *
+ * Architecture-specific implementations:
+ * =====================================
+ * Each architecture should implement these functions under xen/lib/<arch>/
+ * (or define them as macros) based on its hardware requirements. See
+ * xen/lib/arm/memcpy_{from,to}io.c for an example using explicit accessors.
+ */
+
+#ifndef _XEN_LIB_IO_H
+#define _XEN_LIB_IO_H
+
+#include <xen/types.h>
+
+/*
+ * memcpy_fromio - Copy data from I/O memory space to regular memory
+ * @to: Destination buffer in regular memory
+ * @from: Source address in I/O memory space (must be marked __iomem)
+ * @count: Number of bytes to copy
+ *
+ * This function handles copying from memory-mapped I/O regions using
+ * architecture-appropriate I/O accessor functions. It ensures proper:
+ * - Memory ordering and barriers
+ * - Alignment requirements
+ * - Hardware-specific access semantics
+ *
+ * Each architecture provides its own implementation that may:
+ * - Use special I/O accessor functions (ARM: readl_relaxed, readb_relaxed)
+ * - Implement alignment handling for devices requiring specific access sizes
+ * - Add memory barriers to ensure ordering with other I/O operations
+ * - Or simply map to memcpy() if the architecture allows direct I/O access
+ */
+extern void memcpy_fromio(void *to, const volatile void __iomem *from,
+                          size_t count);
+
+/*
+ * memcpy_toio - Copy data from regular memory to I/O memory space
+ * @to: Destination address in I/O memory space (must be marked __iomem)
+ * @from: Source buffer in regular memory
+ * @count: Number of bytes to copy
+ *
+ * This function handles copying to memory-mapped I/O regions using
+ * architecture-appropriate I/O accessor functions. It ensures proper:
+ * - Memory ordering and barriers
+ * - Alignment requirements
+ * - Hardware-specific access semantics
+ *
+ * Each architecture provides its own implementation that may:
+ * - Use special I/O accessor functions (ARM: writel_relaxed, writeb_relaxed)
+ * - Implement alignment handling for devices requiring specific access sizes
+ * - Add memory barriers to ensure ordering with other I/O operations
+ * - Or simply map to memcpy() if the architecture allows direct I/O access
+ */
+extern void memcpy_toio(volatile void __iomem *to, const void *from,
+                        size_t count);
+
+#endif /* _XEN_LIB_IO_H */
diff --git a/xen/lib/Makefile b/xen/lib/Makefile
index 5ccb1e5241..6bb0491d89 100644
--- a/xen/lib/Makefile
+++ b/xen/lib/Makefile
@@ -1,3 +1,4 @@
+obj-$(CONFIG_ARM) += arm/
 obj-$(CONFIG_X86) += x86/
 
 lib-y += bsearch.o
diff --git a/xen/lib/arm/Makefile b/xen/lib/arm/Makefile
new file mode 100644
index 0000000000..0bb1a825ce
--- /dev/null
+++ b/xen/lib/arm/Makefile
@@ -0,0 +1 @@
+obj-y += memcpy_fromio.o memcpy_toio.o
diff --git a/xen/lib/arm/memcpy_fromio.c b/xen/lib/arm/memcpy_fromio.c
new file mode 100644
index 0000000000..342a28cb49
--- /dev/null
+++ b/xen/lib/arm/memcpy_fromio.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#include <asm/io.h>
+#include <xen/lib/io.h>
+
+/*
+ * Use 32-bit raw IO operations for portability across ARM32/ARM64 where
+ * 64-bit accessors may not be atomic and some devices only support 32-bit
+ * aligned accesses.
+ */
+
+void memcpy_fromio(void *to, const volatile void __iomem *from,
+		   size_t count)
+{
+	while ( count && (!IS_ALIGNED((unsigned long)from, 4) ||
+			  !IS_ALIGNED((unsigned long)to, 4)) )
+	{
+		*(uint8_t *)to = __raw_readb(from);
+		from++;
+		to++;
+		count--;
+	}
+
+	while ( count >= 4 )
+	{
+		*(uint32_t *)to = __raw_readl(from);
+		from += 4;
+		to += 4;
+		count -= 4;
+	}
+
+	while ( count )
+	{
+		*(uint8_t *)to = __raw_readb(from);
+		from++;
+		to++;
+		count--;
+	}
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 8
+ * tab-width: 8
+ * indent-tabs-mode: t
+ * End:
+ */
diff --git a/xen/lib/arm/memcpy_toio.c b/xen/lib/arm/memcpy_toio.c
new file mode 100644
index 0000000000..e543c49124
--- /dev/null
+++ b/xen/lib/arm/memcpy_toio.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#include <asm/io.h>
+#include <xen/lib/io.h>
+
+/*
+ * Use 32-bit raw IO operations for portability across ARM32/ARM64 where
+ * 64-bit accessors may not be atomic and some devices only support 32-bit
+ * aligned accesses.
+ */
+
+void memcpy_toio(volatile void __iomem *to, const void *from,
+	       size_t count)
+{
+	while ( count && (!IS_ALIGNED((unsigned long)to, 4) ||
+			  !IS_ALIGNED((unsigned long)from, 4)) )
+	{
+		__raw_writeb(*(const uint8_t *)from, to);
+		from++;
+		to++;
+		count--;
+	}
+
+	while ( count >= 4 )
+	{
+		__raw_writel(*(const uint32_t *)from, to);
+		from += 4;
+		to += 4;
+		count -= 4;
+	}
+
+	while ( count )
+	{
+		__raw_writeb(*(const uint8_t *)from, to);
+		from++;
+		to++;
+		count--;
+	}
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 8
+ * tab-width: 8
+ * indent-tabs-mode: t
+ * End:
+ */
-- 
2.34.1

Re: [PATCH v7 3/5] lib/arm: Add I/O memory copy helpers

Posted by Jan Beulich 3 weeks, 4 days ago

On 14.01.2026 19:29, Oleksii Moisieiev wrote:
> --- /dev/null
> +++ b/xen/lib/arm/Makefile
> @@ -0,0 +1 @@
> +obj-y += memcpy_fromio.o memcpy_toio.o

lib-y please (requiring a change in Arm's arch.mk as well), and each
file on its own line. Plus if this is to be Arm-only (see below), it
really means to live in xen/arch/arm/lib/ - see how xen/lib/x86/ is
about to go away:
https://lists.xen.org/archives/html/xen-devel/2026-01/msg00138.html.

> --- /dev/null
> +++ b/xen/lib/arm/memcpy_fromio.c
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#include <asm/io.h>
> +#include <xen/lib/io.h>
> +
> +/*
> + * Use 32-bit raw IO operations for portability across ARM32/ARM64 where
> + * 64-bit accessors may not be atomic and some devices only support 32-bit
> + * aligned accesses.
> + */
> +
> +void memcpy_fromio(void *to, const volatile void __iomem *from,
> +		   size_t count)
> +{
> +	while ( count && (!IS_ALIGNED((unsigned long)from, 4) ||
> +			  !IS_ALIGNED((unsigned long)to, 4)) )

Nit: Xen style indentation (no hard tabs) please throughout.

> +	{
> +		*(uint8_t *)to = __raw_readb(from);
> +		from++;
> +		to++;
> +		count--;
> +	}
> +
> +	while ( count >= 4 )
> +	{
> +		*(uint32_t *)to = __raw_readl(from);
> +		from += 4;
> +		to += 4;
> +		count -= 4;
> +	}
> +
> +	while ( count )
> +	{
> +		*(uint8_t *)to = __raw_readb(from);
> +		from++;
> +		to++;
> +		count--;
> +	}
> +}

Barrier requirements on Arm aren't quite clear to me here: Is it really correct
to use __raw_read{b,w,l}() here, rather than read{b,w,l}()? If it was, wouldn't
a barrier then be needed at the end of the function?

And then, if it was read{b,w,l}() that is to be used here, what about all of
this would then still be Arm-specific? Hmm, I guess the IS_ALIGNED() on "to" is,
but that's Arm32-specific, with Arm64 not needing it? Plus then it's again not
exactly Arm-specific, but specific to all architectures where misaligned
accesses may fault.

Jan

Re: [PATCH v7 3/5] lib/arm: Add I/O memory copy helpers

Posted by Jan Beulich 3 weeks, 4 days ago

On 15.01.2026 10:26, Jan Beulich wrote:
> On 14.01.2026 19:29, Oleksii Moisieiev wrote:
>> [...]
> 
> Nit: Xen style indentation (no hard tabs) please throughout.
> 
>> [...]
> 
> Barrier requirements on Arm aren't quite clear to me here: Is it really correct
> to use __raw_read{b,w,l}() here, rather than read{b,w,l}()? If it was, wouldn't
> a barrier then be needed at the end of the function?

Thinking about it, as the order of MMIO accesses needs to be guaranteed
(unless you have extra information about the area's properties, like it
being a video frame buffer), I'm pretty sure now that read{b,w,l}() needs
using here. In fact the comment in the header says that it would handle
"Memory ordering and barriers" when it doesn't look as if it did.

Note how Linux looks to have grown multiple flavors: Besides
memcpy_fromio() I can also spot at least fb_memcpy_fromio() and
sdio_memcpy_fromio().

> And then, if it was read{b,w,l}() that is to be used here, what about all of
> this would then still be Arm-specific? Hmm, I guess the IS_ALIGNED() on "to" is,
> but that's Arm32-specific, with Arm64 not needing it? Plus then it's again not
> exactly Arm-specific, but specific to all architectures where misaligned
> accesses may fault.

There's a bigger issue here, with access granularity (despite the header
claiming "Implement alignment handling for devices requiring specific
access sizes"). MMIO can behave in interesting ways. The header comment
says nothing as to restrictions, i.e. when these functions may not be
used. Yet consider a device whose registers must be accessed in 32-bit
chunks. As long as the other pointer is suitably aligned, all would be
fine. But you handle the case where it isn't, and hence that case then
also needs to function correctly. At the same time, accesses to a device
requiring 16- or 64-bit granularity wouldn't work at all, while for
required 8-bit granularity it would again only work partly.
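The granularity concern can be made concrete with plan_io_accesses(), a hypothetical helper that is not part of the patch. It replays only the v7 loop conditions on two addresses and records the width of each device-side access, without copying any data. Suppose the I/O and memory addresses differ modulo 4. Then the head loop can never reach a state where both are aligned, so every access is a byte access. Even when the alignments match, a misaligned start still produces byte accesses at the head and tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: records the width of each device-side access the
 * v7 head/bulk/tail loop would issue for the given addresses and length.
 * Returns the number of entries written to widths[] (at most max).
 */
static size_t plan_io_accesses(uintptr_t io, uintptr_t mem, size_t count,
                               unsigned int *widths, size_t max)
{
    size_t n = 0;

    /* Head: byte accesses until both addresses are 4-byte aligned. */
    while ( count && ((io | mem) & 3) )
    {
        assert(n < max);
        widths[n++] = 1;
        io++;
        mem++;
        count--;
    }

    /* Bulk: 32-bit accesses. */
    while ( count >= 4 )
    {
        assert(n < max);
        widths[n++] = 4;
        io += 4;
        mem += 4;
        count -= 4;
    }

    /* Tail: byte accesses for the remaining 0-3 bytes. */
    while ( count )
    {
        assert(n < max);
        widths[n++] = 1;
        io++;
        mem++;
        count--;
    }

    return n;
}
```

For example, io = 0x1000 with mem = 0x2001 and count = 8 yields eight 1-byte accesses. A device that only accepts 32-bit accesses would not handle that. With io = 0x1000 and mem = 0x2000, the same copy is two 4-byte accesses.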

How much of the above requires code adjustments and how much should be
dealt with by updating commentary I don't know, as I know nothing about
your particular use case, nor about possible future ones.

Also note that the header comment still references the ..._relaxed()
functions, when the implementation doesn't use those anymore.

Jan

Re: [PATCH v7 3/5] lib/arm: Add I/O memory copy helpers

Posted by Oleksii Moisieiev 3 weeks, 4 days ago

On 15/01/2026 13:59, Jan Beulich wrote:
> On 15.01.2026 10:26, Jan Beulich wrote:
>> On 14.01.2026 19:29, Oleksii Moisieiev wrote:
>>> [...]
>> Nit: Xen style indentation (no hard tabs) please throughout.
>>
>>> [...]
>> Barrier requirements on Arm aren't quite clear to me here: Is it really correct
>> to use __raw_read{b,w,l}() here, rather than read{b,w,l}()? If it was, wouldn't
>> a barrier then be needed at the end of the function?
> Thinking about it, as the order of MMIO accesses needs to be guaranteed
> (unless you have extra information about the area's properties, like it
> being a video frame buffer), I'm pretty sure now that read{b,w,l}() needs
> using here. In fact the comment in the header says that it would handle
> "Memory ordering and barriers" when it doesn't look as if it did.
>
> Note how Linux looks to have grown multiple flavors: Besides
> memcpy_fromio() I can also spot at least fb_memcpy_fromio() and
> sdio_memcpy_fromio().
>
>> And then, if it was read{b,w,l}() that is to be used here, what about all of
>> this would then still be Arm-specific? Hmm, I guess the IS_ALIGNED() on "to" is,
>> but that's Arm32-specific, with Arm64 not needing it? Plus then it's again not
>> exactly Arm-specific, but specific to all architectures where misaligned
>> accesses may fault.
> There's a bigger issue here, with access granularity (despite the header
> claiming "Implement alignment handling for devices requiring specific
> access sizes"). MMIO can behave in interesting ways. The header comment
> says nothing as to restrictions, i.e. when these functions may not be
> used. Yet consider a device whose registers must be accessed in 32-bit
> chunks. As long as the other pointer is suitably aligned, all would be
> fine. But you handle the case where it isn't, and hence that case then
> also needs to function correctly. At the same time, accesses to a device
> requiring 16- or 64-bit granularity wouldn't work at all, while for
> required 8-bit granularity it would again only work partly.
>
> How much of the above requires code adjustments and how much should be
> dealt with by updating commentary I don't know, as I know nothing about
> your particular use case, nor about possible future ones.
>
> Also note that the header comment still references the ..._relaxed()
> functions, when the implementation doesn't use those anymore.
>
> Jan
Hi Jan,

Thank you very much for your valuable input and involvement.

After further research and reconsidering all the points you raised, I
have decided to switch to using the ordered MMIO accessors (readb/readl
and writeb/writel).

Here is my reasoning:

The concern you mentioned was valid: the __raw_read*/__raw_write*
accessors used in the helpers do not provide the ordering guarantees
expected for MMIO copies, which could allow reordering around the copy.
In addition, the header still referenced *_relaxed, even though the
implementation no longer used them, misrepresenting the intended
semantics.

A few additional thoughts regarding your other questions:

Is it still Arm-specific?
Functionally, the logic could work on any architecture that supports
8-/32-bit MMIO accesses and uses the same accessor semantics. However,
this implementation relies on Arm’s read*/write* ordering guarantees, as
well as on the IS_ALIGNED() checks Arm needs to avoid misaligned-access
faults. Therefore, it remains Arm-specific as written; other architectures
would need their own implementations if they have different alignment or
granularity requirements.

Ordered accessors vs. raw accessors + trailing barrier:
I chose ordered accessors over raw accessors with a trailing barrier
because readb/readl and writeb/writel already provide the required device
ordering and barrier semantics. That makes an additional barrier
unnecessary and resolves the potential ordering issues.

Use for register access:
These helpers are suitable for MMIO buffers, shared memory, and registers
that tolerate 8-/32-bit accesses. They are not appropriate for registers
that require 16- or 64-bit accesses, or where side effects depend on the
exact access width. I'll update the header to document this limitation;
drivers needing strict register semantics should continue to use
readl/writel (or readw/writew/readq/writeq) directly, rather than
memcpy_*io().
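Registers that accept only 32-bit accesses could also be protected with a stricter variant. It would reject unaligned I/O addresses and lengths instead of falling back to byte accesses. The sketch below is hypothetical and was not proposed in this thread. memcpy_fromio32() and model_readl() are illustrative names, and model_readl() reads an ordinary array where Xen code would use readl(). The regular-memory side is written with memcpy(), so its alignment never forces narrower device accesses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a single 32-bit MMIO read (readl() in Xen). */
static uint32_t model_readl(const uint8_t *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/*
 * Hypothetical strict variant: the I/O side only ever sees 32-bit
 * accesses. The I/O address and the length must be multiples of 4;
 * the regular-memory destination may have any alignment.
 */
static int memcpy_fromio32(void *to, const uint8_t *from, size_t count)
{
    uint8_t *dst = to;

    if ( ((uintptr_t)from & 3) || (count & 3) )
        return -EINVAL;

    for ( ; count; count -= 4, from += 4, dst += 4 )
    {
        uint32_t v = model_readl(from);

        memcpy(dst, &v, sizeof(v)); /* dst may be misaligned */
    }

    return 0;
}
```

With this design, the caller has to decide up front whether a region tolerates byte accesses. The helper does not silently change access width.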

Summary:

I have made the following changes to the helper functions:

- switched to ordered accessors to address the ordering and barrier
  concerns.
- updated the documentation to match the implementation and explicitly
  state the supported access sizes and granularity.

If this approach sounds good to you, I will proceed with the fix and
submit a new version.

Best regards,
Oleksii

Re: [PATCH v7 3/5] lib/arm: Add I/O memory copy helpers

Posted by Jan Beulich 3 weeks, 4 days ago

On 15.01.2026 16:34, Oleksii Moisieiev wrote:
> On 15/01/2026 13:59, Jan Beulich wrote:
>> [...]
> Hi Jan,
> 
> Thank you very much for your valuable input and involvement.
> 
> After further research and reconsidering all the points you raised, I
> have decided to switch to using the ordered MMIO accessors (readb/readl
> and writeb/writel).
>
> Here is my reasoning:
>
> The concern you mentioned was valid: the __raw_read*/__raw_write*
> accessors used in the helpers do not provide the ordering guarantees
> expected for MMIO copies, which could allow reordering around the copy.
> In addition, the header still referenced *_relaxed, even though the
> implementation no longer used them, misrepresenting the intended
> semantics.
>
> A few additional thoughts regarding your other questions:
>
> Is it still Arm-specific?
> Functionally, the logic could work on any architecture that supports
> 8-/32-bit MMIO accesses and uses the same accessor semantics. However,
> this implementation relies on Arm’s read*/write* ordering guarantees, as
> well as on the IS_ALIGNED() checks Arm needs to avoid misaligned-access
> faults. Therefore, it remains Arm-specific as written; other architectures
> would need their own implementations if they have different alignment or
> granularity requirements.
>
> Ordered accessors vs. raw accessors + trailing barrier:
> I chose ordered accessors over raw accessors with a trailing barrier
> because readb/readl and writeb/writel already provide the required device
> ordering and barrier semantics. That makes an additional barrier
> unnecessary and resolves the potential ordering issues.
>
> Use for register access:
> These helpers are suitable for MMIO buffers, shared memory, and registers
> that tolerate 8-/32-bit accesses. They are not appropriate for registers
> that require 16- or 64-bit accesses, or where side effects depend on the
> exact access width. I'll update the header to document this limitation;
> drivers needing strict register semantics should continue to use
> readl/writel (or readw/writew/readq/writeq) directly, rather than
> memcpy_*io().
>
> Summary:
>
> I have made the following changes to the helper functions:
>
> - switched to ordered accessors to address the ordering and barrier
>   concerns.
> - updated the documentation to match the implementation and explicitly
>   state the supported access sizes and granularity.
>
> If this approach sounds good to you, I will proceed with the fix and
> submit a new version.

At the first glance it looks okay, but please allow Arm folks to give their
opinion. And of course my final take will be available only once I see the
next version.

Jan

Re: [PATCH v7 3/5] lib/arm: Add I/O memory copy helpers

Posted by Oleksii Moisieiev 3 weeks, 4 days ago


On 15/01/2026 17:39, Jan Beulich wrote:
> On 15.01.2026 16:34, Oleksii Moisieiev wrote:
>> [...]
> At the first glance it looks okay, but please allow Arm folks to give their
> opinion. And of course my final take will be available only once I see the
> next version.
>
> Jan
Sure. I will post the new version once I have sorted out all the
documentation questions.
Thank you again for the thorough review!

Oleksii.

Re: [PATCH v7 3/5] lib/arm: Add I/O memory copy helpers

Posted by Stefano Stabellini 3 weeks, 5 days ago

On Wed, 14 Jan 2026, Oleksii Moisieiev wrote:
> This commit introduces two helper functions, `memcpy_fromio` and
> `memcpy_toio`, to provide a robust mechanism for copying data between
> standard memory and memory-mapped I/O (MMIO) space for the ARM
> architecture.
> 
> These helpers handle alignment safely by using byte accesses for any
> leading/trailing unaligned bytes and 32-bit raw accesses for the aligned
> bulk transfer. Using `__raw_readb/__raw_readl` and
> `__raw_writeb/__raw_writel` avoids unintended endianness conversion while
> remaining safe across ARM32/ARM64 devices that only support 32-bit
> accesses.
> 
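A small hosted model of the endianness point, assuming a little-endian device: an endian-converting accessor such as readl() applies le32_to_cpu(), which is a no-op on a little-endian CPU but a byte swap on a big-endian one. sim_word_copy() is a hypothetical helper that models that swap with the GCC/Clang builtin __builtin_bswap32(). With the swap enabled, each 4-byte group of a byte stream comes out reversed, while raw accesses preserve the byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model: copy 'count' bytes (a multiple of 4) with 32-bit
 * loads/stores. If 'swap' is non-zero, each word is byte-swapped between
 * load and store, as an endian-converting accessor would do on a CPU whose
 * endianness differs from the device's.
 */
static void sim_word_copy(uint8_t *dst, const uint8_t *src, size_t count,
                          int swap)
{
    for ( size_t i = 0; i + 4 <= count; i += 4 )
    {
        uint32_t w;

        memcpy(&w, src + i, 4);          /* raw 32-bit load */
        if ( swap )
            w = __builtin_bswap32(w);    /* models le32_to_cpu() on big-endian */
        memcpy(dst + i, &w, 4);          /* raw 32-bit store */
    }
}
```

For a byte stream such as a shared-memory message, only the raw form gives a faithful copy independent of CPU endianness.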
> The interface lives in the generic header so other architectures can
> provide their own implementations (as macros or functions). The ARM
> implementation is split into separate compilation units and added to the
> architecture-specific lib Makefile.
> 
> Signed-off-by: Oleksii Moisieiev <oleksii_moisieiev@epam.com>

From an ARM point of view:

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


Thanks, Jan, for the good feedback on the previous version, which has now
been addressed.


> ---
> 
> Changes in v7:
> - x86 guidance: removed the speculative note; header now just says
>   each arch supplies its own implementation or macro.
> - name spacing: dropped the double-underscore; the helpers are now
>   memcpy_fromio / memcpy_toio. The header also explicitly allows an
>   arch to define these as macros before including it.
> - updated io.c to keep 32-bit transfers safe on arm32
> - moved to __raw_read*/__raw_write* accessors to avoid endianness conversion.
> - split the helpers into separate compilation units
> 
> Changes in v6:
> - sorted objs in Makefile alphabetically
> - added newline at the end of Makefile
> - used uint{N}_t instead of u{N}
> - add comment about why 32 bit IO operations were used
> - updated cast operations to avoid dropping constness which is wrong
> - move function definitions to generic place so they could be reused by
> other arch
> - add SPDX tag to io.c
> 
> Changes in v5:
> - move memcpy_toio/fromio to the generic place
> 
>  xen/include/xen/lib/io.h    | 65 +++++++++++++++++++++++++++++++++++++
>  xen/lib/Makefile            |  1 +
>  xen/lib/arm/Makefile        |  1 +
>  xen/lib/arm/memcpy_fromio.c | 48 +++++++++++++++++++++++++++
>  xen/lib/arm/memcpy_toio.c   | 48 +++++++++++++++++++++++++++
>  5 files changed, 163 insertions(+)
>  create mode 100644 xen/include/xen/lib/io.h
>  create mode 100644 xen/lib/arm/Makefile
>  create mode 100644 xen/lib/arm/memcpy_fromio.c
>  create mode 100644 xen/lib/arm/memcpy_toio.c
> 
> diff --git a/xen/include/xen/lib/io.h b/xen/include/xen/lib/io.h
> new file mode 100644
> index 0000000000..cd2b6680d5
> --- /dev/null
> +++ b/xen/include/xen/lib/io.h
> @@ -0,0 +1,65 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Generic I/O memory copy function prototypes.
> + *
> + * These functions provide low-level implementation for copying data between
> + * regular memory and I/O memory regions. Each architecture must provide its
> + * own implementation based on the specific requirements of the architecture's
> + * memory model and I/O access patterns. An architecture may supply these as
> + * functions or as macros in its own headers before including this file.
> + *
> + * Architecture-specific implementations:
> + * =====================================
> + * Each architecture should implement these functions under xen/lib/<arch>/
> + * (or define them as macros) to suit its hardware requirements. See
> + * xen/lib/arm/memcpy_{from,to}io.c for an example using explicit accessors.
> + */
> +
> +#ifndef _XEN_LIB_IO_H
> +#define _XEN_LIB_IO_H
> +
> +#include <xen/types.h>
> +
> +/*
> + * memcpy_fromio - Copy data from I/O memory space to regular memory
> + * @to: Destination buffer in regular memory
> + * @from: Source address in I/O memory space (must be marked __iomem)
> + * @count: Number of bytes to copy
> + *
> + * This function handles copying from memory-mapped I/O regions using
> + * architecture-appropriate I/O accessor functions. It ensures proper:
> + * - Memory ordering and barriers
> + * - Alignment requirements
> + * - Hardware-specific access semantics
> + *
> + * Each architecture provides its own implementation that may:
> + * - Use special I/O accessor functions (ARM: __raw_readl, __raw_readb)
> + * - Implement alignment handling for devices requiring specific access sizes
> + * - Add memory barriers to ensure ordering with other I/O operations
> + * - Or simply map to memcpy() if the architecture allows direct I/O access
> + */
> +extern void memcpy_fromio(void *to, const volatile void __iomem *from,
> +                          size_t count);
> +
> +/*
> + * memcpy_toio - Copy data from regular memory to I/O memory space
> + * @to: Destination address in I/O memory space (must be marked __iomem)
> + * @from: Source buffer in regular memory
> + * @count: Number of bytes to copy
> + *
> + * This function handles copying to memory-mapped I/O regions using
> + * architecture-appropriate I/O accessor functions. It ensures proper:
> + * - Memory ordering and barriers
> + * - Alignment requirements
> + * - Hardware-specific access semantics
> + *
> + * Each architecture provides its own implementation that may:
> + * - Use special I/O accessor functions (ARM: __raw_writel, __raw_writeb)
> + * - Implement alignment handling for devices requiring specific access sizes
> + * - Add memory barriers to ensure ordering with other I/O operations
> + * - Or simply map to memcpy() if the architecture allows direct I/O access
> + */
> +extern void memcpy_toio(volatile void __iomem *to, const void *from,
> +                        size_t count);
> +
> +#endif /* _XEN_LIB_IO_H */
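The header comment allows an architecture to supply these helpers as macros before including the file, but the prototypes are declared unconditionally, so a function-like macro of the same name would be expanded inside the declaration and break the build. A minimal hosted sketch of one way to support that, assuming a hypothetical architecture that maps memcpy_fromio() onto memcpy(); the macro and the hosted __iomem definition are illustrative only, not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hosted stand-in: in Xen, __iomem is an annotation for static checkers. */
#ifndef __iomem
#define __iomem
#endif

/* Hypothetical arch header, seen before the generic one. */
#define memcpy_fromio(to, from, count) \
    memcpy((to), (const void *)(from), (count))

/*
 * Generic header with a guard: the prototype is only declared when the
 * architecture has not already provided a macro. Without the #ifndef, the
 * macro above would be expanded inside this declaration.
 */
#ifndef memcpy_fromio
extern void memcpy_fromio(void *to, const volatile void __iomem *from,
                          size_t count);
#endif
```

The same guard would apply to memcpy_toio().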
> diff --git a/xen/lib/Makefile b/xen/lib/Makefile
> index 5ccb1e5241..6bb0491d89 100644
> --- a/xen/lib/Makefile
> +++ b/xen/lib/Makefile
> @@ -1,3 +1,4 @@
> +obj-$(CONFIG_ARM) += arm/
>  obj-$(CONFIG_X86) += x86/
>  
>  lib-y += bsearch.o
> diff --git a/xen/lib/arm/Makefile b/xen/lib/arm/Makefile
> new file mode 100644
> index 0000000000..0bb1a825ce
> --- /dev/null
> +++ b/xen/lib/arm/Makefile
> @@ -0,0 +1 @@
> +obj-y += memcpy_fromio.o memcpy_toio.o
> diff --git a/xen/lib/arm/memcpy_fromio.c b/xen/lib/arm/memcpy_fromio.c
> new file mode 100644
> index 0000000000..342a28cb49
> --- /dev/null
> +++ b/xen/lib/arm/memcpy_fromio.c
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#include <asm/io.h>
> +#include <xen/lib/io.h>
> +
> +/*
> + * Use 32-bit raw IO operations for portability across ARM32/ARM64 where
> + * 64-bit accessors may not be atomic and some devices only support 32-bit
> + * aligned accesses.
> + */
> +
> +void memcpy_fromio(void *to, const volatile void __iomem *from,
> +		   size_t count)
> +{
> +	while ( count && (!IS_ALIGNED((unsigned long)from, 4) ||
> +			  !IS_ALIGNED((unsigned long)to, 4)) )
> +	{
> +		*(uint8_t *)to = __raw_readb(from);
> +		from++;
> +		to++;
> +		count--;
> +	}
> +
> +	while ( count >= 4 )
> +	{
> +		*(uint32_t *)to = __raw_readl(from);
> +		from += 4;
> +		to += 4;
> +		count -= 4;
> +	}
> +
> +	while ( count )
> +	{
> +		*(uint8_t *)to = __raw_readb(from);
> +		from++;
> +		to++;
> +		count--;
> +	}
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 8
> + * tab-width: 8
> + * indent-tabs-mode: t
> + * End:
> + */
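A minimal hosted sketch for exercising the loop above in user space, assuming plain volatile loads can stand in for __raw_readb()/__raw_readl(); host_raw_readb(), host_raw_readl() and host_memcpy_fromio() are hypothetical names, not Xen APIs. It keeps the same three loops but uses uint8_t pointers instead of GNU void-pointer arithmetic, and memcpy() for the normal-memory word store to stay within ISO C aliasing rules.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef __iomem
#define __iomem /* hosted stand-in */
#endif

/* Hypothetical hosted stand-ins for __raw_readb()/__raw_readl(). */
static inline uint8_t host_raw_readb(const volatile void __iomem *addr)
{
    return *(const volatile uint8_t *)addr;
}

static inline uint32_t host_raw_readl(const volatile void __iomem *addr)
{
    return *(const volatile uint32_t *)addr;
}

static void host_memcpy_fromio(void *to, const volatile void __iomem *from,
                               size_t count)
{
    uint8_t *d = to;
    const volatile uint8_t __iomem *s = from;

    /* Byte reads until both pointers are 4-byte aligned. */
    while ( count && (((uintptr_t)s | (uintptr_t)d) & 3) )
    {
        *d++ = host_raw_readb(s++);
        count--;
    }

    /* Aligned 32-bit reads for the bulk. */
    while ( count >= 4 )
    {
        uint32_t v = host_raw_readl(s);

        memcpy(d, &v, 4);
        d += 4; s += 4; count -= 4;
    }

    /* Trailing byte reads. */
    while ( count-- )
        *d++ = host_raw_readb(s++);
}
```

Checking every source/destination offset from 0 to 3 against plain memcpy() covers the co-aligned, co-misaligned and mismatched (all-byte) paths.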
> diff --git a/xen/lib/arm/memcpy_toio.c b/xen/lib/arm/memcpy_toio.c
> new file mode 100644
> index 0000000000..e543c49124
> --- /dev/null
> +++ b/xen/lib/arm/memcpy_toio.c
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#include <asm/io.h>
> +#include <xen/lib/io.h>
> +
> +/*
> + * Use 32-bit raw IO operations for portability across ARM32/ARM64 where
> + * 64-bit accessors may not be atomic and some devices only support 32-bit
> + * aligned accesses.
> + */
> +
> +void memcpy_toio(volatile void __iomem *to, const void *from,
> +		 size_t count)
> +{
> +	while ( count && (!IS_ALIGNED((unsigned long)to, 4) ||
> +			  !IS_ALIGNED((unsigned long)from, 4)) )
> +	{
> +		__raw_writeb(*(const uint8_t *)from, to);
> +		from++;
> +		to++;
> +		count--;
> +	}
> +
> +	while ( count >= 4 )
> +	{
> +		__raw_writel(*(const uint32_t *)from, to);
> +		from += 4;
> +		to += 4;
> +		count -= 4;
> +	}
> +
> +	while ( count )
> +	{
> +		__raw_writeb(*(const uint8_t *)from, to);
> +		from++;
> +		to++;
> +		count--;
> +	}
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 8
> + * tab-width: 8
> + * indent-tabs-mode: t
> + * End:
> + */
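The same hosted approach for memcpy_toio(), assuming plain volatile stores as stand-ins for __raw_writeb()/__raw_writel(); host_raw_writeb(), host_raw_writel() and host_memcpy_toio() are hypothetical names. Filling the "device" buffer with a marker value first makes it easy to confirm that no byte outside [to, to + count) is written.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef __iomem
#define __iomem /* hosted stand-in */
#endif

/* Hypothetical hosted stand-ins for __raw_writeb()/__raw_writel(). */
static inline void host_raw_writeb(uint8_t v, volatile void __iomem *addr)
{
    *(volatile uint8_t *)addr = v;
}

static inline void host_raw_writel(uint32_t v, volatile void __iomem *addr)
{
    *(volatile uint32_t *)addr = v;
}

static void host_memcpy_toio(volatile void __iomem *to, const void *from,
                             size_t count)
{
    volatile uint8_t __iomem *d = to;
    const uint8_t *s = from;

    /* Byte writes until both pointers are 4-byte aligned. */
    while ( count && (((uintptr_t)d | (uintptr_t)s) & 3) )
    {
        host_raw_writeb(*s++, d++);
        count--;
    }

    /* Aligned 32-bit writes for the bulk. */
    while ( count >= 4 )
    {
        uint32_t v;

        memcpy(&v, s, 4);
        host_raw_writel(v, d);
        d += 4; s += 4; count -= 4;
    }

    /* Trailing byte writes. */
    while ( count-- )
        host_raw_writeb(*s++, d++);
}
```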
> -- 
> 2.34.1
>
