[PATCH v6 4/7] x86: re-work memcpy()

Jan Beulich posted 7 patches 4 months, 2 weeks ago
There is a newer version of this series
[PATCH v6 4/7] x86: re-work memcpy()
Posted by Jan Beulich 4 months, 2 weeks ago
Move the function to its own assembly file. Having it in C just for the
entire body to be an asm() isn't really helpful. Then have two flavors:
A "basic" version using qword steps for the bulk of the operation, and an
ERMS version for modern hardware, to be substituted in via alternatives
patching.

Alternatives patching, however, requires an extra precaution: It uses
memcpy() itself, and hence the function may patch itself. Luckily the
patched-in code only replaces the prolog of the original function. Make
sure this remains this way.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Teddy Astie <teddy.astie@vates.tech>
---
We may want to consider branching over the REP MOVSQ as well, if the
number of qwords turns out to be zero.
We may also want to consider using non-REP MOVS{L,W,B} for the tail.

TBD: We may further need a workaround similar to Linux'es 8ca97812c3c8
     ("x86/mce: Work around an erratum on fast string copy
     instructions").

TBD: Some older AMD CPUs have an issue with REP MOVS when source and
     destination are misaligned with one another (modulo 32?), which may
     require a separate memcpy() flavor.
---
v6: Move alternatives.c change to a separate patch.
v5: Re-base.
v4: Use CR2 write as serializing insn, and limit its use to boot time.
v3: Re-base.

--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -47,6 +47,7 @@ obj-$(CONFIG_RETURN_THUNK) += indirect-t
 obj-$(CONFIG_PV) += ioport_emulate.o
 obj-y += irq.o
 obj-$(CONFIG_KEXEC) += machine_kexec.o
+obj-y += memcpy.o
 obj-y += memset.o
 obj-y += mm.o x86_64/mm.o
 obj-$(CONFIG_VM_EVENT) += monitor.o
--- /dev/null
+++ b/xen/arch/x86/memcpy.S
@@ -0,0 +1,20 @@
+#include <asm/asm_defns.h>
+
+FUNC(memcpy)
+        mov     %rdx, %rcx
+        mov     %rdi, %rax
+        /*
+         * We need to be careful here: memcpy() is involved in alternatives
+         * patching, so the code doing the actual copying (i.e. past setting
+         * up registers) may not be subject to patching (unless further
+         * precautions were taken).
+         */
+        ALTERNATIVE "and $7, %edx; shr $3, %rcx", \
+                    STR(rep movsb; RET), X86_FEATURE_ERMS
+        rep movsq
+        or      %edx, %ecx
+        jz      1f
+        rep movsb
+1:
+        RET
+END(memcpy)
--- a/xen/arch/x86/string.c
+++ b/xen/arch/x86/string.c
@@ -7,21 +7,6 @@
 
 #include <xen/lib.h>
 
-void *(memcpy)(void *dest, const void *src, size_t n)
-{
-    long d0, d1, d2;
-
-    asm volatile (
-        "   rep ; movs"__OS" ; "
-        "   mov %k4,%k3      ; "
-        "   rep ; movsb        "
-        : "=&c" (d0), "=&D" (d1), "=&S" (d2)
-        : "0" (n/BYTES_PER_LONG), "r" (n%BYTES_PER_LONG), "1" (dest), "2" (src)
-        : "memory" );
-
-    return dest;
-}
-
 void *(memmove)(void *dest, const void *src, size_t n)
 {
     long d0, d1, d2;
Re: [PATCH v6 4/7] x86: re-work memcpy()
Posted by Jason Andryuk 4 months, 1 week ago
On 2025-06-16 09:00, Jan Beulich wrote:
> Move the function to its own assembly file. Having it in C just for the
> entire body to be an asm() isn't really helpful. Then have two flavors:
> A "basic" version using qword steps for the bulk of the operation, and an
> ERMS version for modern hardware, to be substituted in via alternatives
> patching.
> 
> Alternatives patching, however, requires an extra precaution: It uses
> memcpy() itself, and hence the function may patch itself. Luckily the
> patched-in code only replaces the prolog of the original function. Make
> sure this remains this way.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Teddy Astie <teddy.astie@vates.tech>

> --- /dev/null
> +++ b/xen/arch/x86/memcpy.S
> @@ -0,0 +1,20 @@
> +#include <asm/asm_defns.h>
> +
> +FUNC(memcpy)
> +        mov     %rdx, %rcx
> +        mov     %rdi, %rax
> +        /*
> +         * We need to be careful here: memcpy() is involved in alternatives
> +         * patching, so the code doing the actual copying (i.e. past setting
> +         * up registers) may not be subject to patching (unless further
> +         * precautions were taken).
> +         */

I think this phrasing is a little clearer:

We need to be careful here: memcpy() is involved in alternatives 
patching.  Define the original code as only the register setup.  The 
code doing the actual copying (i.e. past setting up registers) is not 
subject to patching, which avoids it changing underneath the processor.

Your comment is okay if you prefer not to change it:

Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>

Regards,
Jason

> +        ALTERNATIVE "and $7, %edx; shr $3, %rcx", \
> +                    STR(rep movsb; RET), X86_FEATURE_ERMS
> +        rep movsq
> +        or      %edx, %ecx
> +        jz      1f
> +        rep movsb
> +1:
> +        RET
> +END(memcpy)