Stop the compiler from inlining non-trivial memset() and memcpy() (for
memset() examples see e.g. map_vcpu_info() or kimage_load_segments()).
This way we also keep the compiler from using REP STOSQ / REP MOVSQ
when we'd prefer REP STOSB / REP MOVSB (when ERMS is available).
With gcc10 this yields a modest .text size reduction of around 2k in a
release build.
Unfortunately these options aren't understood by the clang versions I
have readily available for testing; I'm unaware of equivalents.
Note also that using cc-option-add is not an option here, or at least I
couldn't make things work with it (in case the option was not supported
by the compiler): The embedded comma in the option looks to be getting
in the way. A direct $(call cc-option,...) works, as $(call) splits its
arguments on literal commas before expanding them, so $(comma) survives
intact; cc-option-add, however, goes through an extra $(eval) level,
which re-parses text in which the comma has by then become literal,
splitting the option apart (see the sketch below).
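
To illustrate, here is a minimal standalone sketch (these are not Xen's
actual macros; it merely assumes that, like ours, cc-option-add layers
an $(eval) of a textually substituted closure on top of $(call)):

comma := ,

pair = first=[$(1)] second=[$(2)]

# Mimics cc-option-add: the option ($(1)) is substituted textually into
# a deferred closure, which $(eval) then re-parses and re-expands.
define closure
result := $$(call pair,$(1))
endef
indirect = $(eval $(call closure,$(1)))

$(call indirect,-mfoo=a$(comma)b)

# Direct call: prints "direct:   first=[-mfoo=a,b] second=[]".
$(info direct:   $(call pair,-mfoo=a$(comma)b))
# Indirect call: prints "indirect: first=[-mfoo=a] second=[b]".
$(info indirect: $(result))

all: ;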
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Re-base.
v2: New.
---
The boundary values are of course up for discussion - I wasn't really
certain whether to use 16 or 32; I'd be less certain yet about using
larger values.
Similarly it is up for discussion whether to permit the compiler to
emit REP STOSQ / REP MOVSQ for known size, properly aligned blocks; a
hypothetical variant doing so is sketched below.
--- a/xen/arch/x86/arch.mk
+++ b/xen/arch/x86/arch.mk
@@ -65,6 +65,9 @@ endif
$(call cc-option-add,CFLAGS_stack_boundary,CC,-mpreferred-stack-boundary=3)
export CFLAGS_stack_boundary
+CFLAGS += $(call cc-option,$(CC),-mmemcpy-strategy=unrolled_loop:16:noalign$(comma)libcall:-1:noalign)
+CFLAGS += $(call cc-option,$(CC),-mmemset-strategy=unrolled_loop:16:noalign$(comma)libcall:-1:noalign)
+
ifeq ($(CONFIG_UBSAN),y)
# Don't enable alignment sanitisation. x86 has efficient unaligned accesses,
# and various things (ACPI tables, hypercall pages, stubs, etc) are wont-fix.