[PATCH 2/3] x86/boot: Drop move_memory() and use memcpy() directly

Andrew Cooper posted 3 patches 4 years, 2 months ago
[PATCH 2/3] x86/boot: Drop move_memory() and use memcpy() directly
Posted by Andrew Cooper 4 years, 2 months ago
The way move_memory() sets up the virtual mappings means that there are always
two non-overlapping regions.  The virtual layout means that memmove()'s
forward/backwards check doesn't do what the caller intends, as the check ought
to be performed in physical space rather than virtual.

Luckily both callers already provide non-overlapping mappings, so this bug
doesn't manifest, and we can move to memcpy() to avoid a backwards copy.
Backwards rep movs's are typically far slower than forwards copies.

Furthermore, both callers already have suitable directmap mappings.  There is
no need to spend time managing early boot mappings, or chunking the copy
through them.

For the main Xen relocation, we can read out of the virtual mapping that we're
executing on, and write directly into the directmap.  In fact, this removes
one dependency on Xen being "at 0" (the XEN_IMG_OFFSET passed as src) for
relocation to occur.

For the module relocation, just transcribe the move_memory() call into an
equivalent memcpy().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/arch/x86/setup.c | 58 +++++-----------------------------------------------
 1 file changed, 5 insertions(+), 53 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 0492856292cf..a6ff450daab7 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -413,53 +413,6 @@ void *__init bootstrap_map(const module_t *mod)
     return ret;
 }
 
-static void *__init move_memory(
-    uint64_t dst, uint64_t src, unsigned int size, bool keep)
-{
-    unsigned int blksz = BOOTSTRAP_MAP_LIMIT - BOOTSTRAP_MAP_BASE;
-    unsigned int mask = (1L << L2_PAGETABLE_SHIFT) - 1;
-
-    if ( src + size > BOOTSTRAP_MAP_BASE )
-        blksz >>= 1;
-
-    while ( size )
-    {
-        module_t mod;
-        unsigned int soffs = src & mask;
-        unsigned int doffs = dst & mask;
-        unsigned int sz;
-        void *d, *s;
-
-        mod.mod_start = (src - soffs) >> PAGE_SHIFT;
-        mod.mod_end = soffs + size;
-        if ( mod.mod_end > blksz )
-            mod.mod_end = blksz;
-        sz = mod.mod_end - soffs;
-        s = bootstrap_map(&mod);
-
-        mod.mod_start = (dst - doffs) >> PAGE_SHIFT;
-        mod.mod_end = doffs + size;
-        if ( mod.mod_end > blksz )
-            mod.mod_end = blksz;
-        if ( sz > mod.mod_end - doffs )
-            sz = mod.mod_end - doffs;
-        d = bootstrap_map(&mod);
-
-        memmove(d + doffs, s + soffs, sz);
-
-        dst += sz;
-        src += sz;
-        size -= sz;
-
-        if ( keep )
-            return size ? NULL : d + doffs;
-
-        bootstrap_map(NULL);
-    }
-
-    return NULL;
-}
-
 #undef BOOTSTRAP_MAP_LIMIT
 
 static uint64_t __init consider_modules(
@@ -1243,7 +1196,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
              * data until after we have switched to the relocated pagetables!
              */
             barrier();
-            move_memory(e, XEN_IMG_OFFSET, _end - _start, 1);
+            memcpy(__va(__pa(_start)), _start, _end - _start);
 
             /* Walk idle_pg_table, relocating non-leaf entries. */
             pl4e = __va(__pa(idle_pg_table));
@@ -1300,8 +1253,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
                    "1" (__va(__pa(cpu0_stack))), "2" (STACK_SIZE / 8)
                 : "memory" );
 
-            bootstrap_map(NULL);
-
             printk("New Xen image base address: %#lx\n", xen_phys_start);
         }
 
@@ -1325,9 +1276,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
                  (headroom ||
                   ((end - size) >> PAGE_SHIFT) > mod[j].mod_start) )
             {
-                move_memory(end - size + headroom,
-                            (uint64_t)mod[j].mod_start << PAGE_SHIFT,
-                            mod[j].mod_end, 0);
+                memcpy(__va(end - size + headroom),
+                       __va((uint64_t)mod[j].mod_start << PAGE_SHIFT),
+                       mod[j].mod_end);
+
                 mod[j].mod_start = (end - size) >> PAGE_SHIFT;
                 mod[j].mod_end += headroom;
                 mod[j].reserved = 1;
-- 
2.11.0


Re: [PATCH 2/3] x86/boot: Drop move_memory() and use memcpy() directly
Posted by Jan Beulich 4 years, 2 months ago
On 07.12.2021 11:53, Andrew Cooper wrote:
> @@ -1243,7 +1196,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>               * data until after we have switched to the relocated pagetables!
>               */
>              barrier();
> -            move_memory(e, XEN_IMG_OFFSET, _end - _start, 1);
> +            memcpy(__va(__pa(_start)), _start, _end - _start);
>  
>              /* Walk idle_pg_table, relocating non-leaf entries. */
>              pl4e = __va(__pa(idle_pg_table));
> @@ -1300,8 +1253,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>                     "1" (__va(__pa(cpu0_stack))), "2" (STACK_SIZE / 8)
>                  : "memory" );
>  
> -            bootstrap_map(NULL);
> -
>              printk("New Xen image base address: %#lx\n", xen_phys_start);
>          }

This bootstrap_map() must have been dead code already before, except
for the "keep" argument above needlessly having got passed as 1? Afaict
passing 1 was pointless without using the function's return value.

> @@ -1325,9 +1276,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>                   (headroom ||
>                    ((end - size) >> PAGE_SHIFT) > mod[j].mod_start) )
>              {
> -                move_memory(end - size + headroom,
> -                            (uint64_t)mod[j].mod_start << PAGE_SHIFT,
> -                            mod[j].mod_end, 0);
> +                memcpy(__va(end - size + headroom),
> +                       __va((uint64_t)mod[j].mod_start << PAGE_SHIFT),
> +                       mod[j].mod_end);

I'm not convinced this can be memcpy() - consider_modules() specifically
allows for the current module's source and destination areas to overlap.
See also the comment ahead of its invocation a few lines up from here.

I'm also not convinced we have the source range (fully) direct-mapped at
this point. Only full superpages have been mapped so far, and only those
for the current or higher address E820 entries (plus of course the pre-
built mappings of the space below 1Gb [PREBUILT_MAP_LIMIT]) located
below 4Gb.

As to the 2nd argument - if this can indeed be converted in the first
place, may I suggest to also switch to using pfn_to_paddr()?

Jan


Re: [PATCH 2/3] x86/boot: Drop move_memory() and use memcpy() directly
Posted by Andrew Cooper 4 years, 2 months ago
On 07/12/2021 12:03, Jan Beulich wrote:
> On 07.12.2021 11:53, Andrew Cooper wrote:
>> @@ -1243,7 +1196,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>               * data until after we have switched to the relocated pagetables!
>>               */
>>              barrier();
>> -            move_memory(e, XEN_IMG_OFFSET, _end - _start, 1);
>> +            memcpy(__va(__pa(_start)), _start, _end - _start);
>>  
>>              /* Walk idle_pg_table, relocating non-leaf entries. */
>>              pl4e = __va(__pa(idle_pg_table));
>> @@ -1300,8 +1253,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>                     "1" (__va(__pa(cpu0_stack))), "2" (STACK_SIZE / 8)
>>                  : "memory" );
>>  
>> -            bootstrap_map(NULL);
>> -
>>              printk("New Xen image base address: %#lx\n", xen_phys_start);
>>          }
> This bootstrap_map() must have been dead code already before, except
> for the "keep" argument above needlessly having got passed as 1? Afaict
> passing 1 was pointless without using the function's return value.

bootstrap_map(NULL) is necessary to zap the constructed mappings, but it
seems like the use of the return address was dropped by c/s 0b76ce20de
"x86/setup: don't relocate the VGA hole" in 2013.

>
>> @@ -1325,9 +1276,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>                   (headroom ||
>>                    ((end - size) >> PAGE_SHIFT) > mod[j].mod_start) )
>>              {
>> -                move_memory(end - size + headroom,
>> -                            (uint64_t)mod[j].mod_start << PAGE_SHIFT,
>> -                            mod[j].mod_end, 0);
>> +                memcpy(__va(end - size + headroom),
>> +                       __va((uint64_t)mod[j].mod_start << PAGE_SHIFT),
>> +                       mod[j].mod_end);
> I'm not convinced this can be memcpy() - consider_modules() specifically
> allows for the current module's source and destination areas to overlap.
> See also the comment ahead of its invocation a few lines up from here.

The comment which says:

/* Don't overlap with other modules (or Xen itself). */
end = consider_modules(s, e, size, mod,
                       mbi->mods_count + relocated, j);

?

memmove() in move_memory() is broken, and in fact always results in a
backwards copy, which means that one way or another, overlapping source
and destination doesn't work.

If it was really broken before, then it can be fixed now by using
memmove() here, because using 2 directmap mappings means the
forward/backward check will now work as expected.

> I'm also not convinced we have the source range (fully) direct-mapped at
> this point. Only full superpages have been mapped so far, and only those
> for the current or higher address E820 entries (plus of course the pre-
> built mappings of the space below 1Gb [PREBUILT_MAP_LIMIT]) located
> below 4Gb.

PREBUILT_MAP_LIMIT is 2M, and that's only to cover the fact that we
build l1_directmap[] with the VGA UC range at build time.  I was hoping
to remove it in due course.

As to the other mappings, that is awkward.  Perhaps what we ought to do
is split the loops.  First fill in all 2M superpages into the directmap,
then relocate Xen, at which point we've got plenty of frames to feed
into the allocator, to let us do a second pass filling in non-2M regions.

We can depend on the modules living in RAM regions, but might want to
explicitly confirm.

> As to the 2nd argument - if this can indeed be converted in the first
> place, may I suggest to also switch to using pfn_to_paddr()?

Honestly, that's taking a terrible situation and making it worse.

Calling pfn_to_paddr() on what is logically a paddr_t already ought to
be a compilation error, and the logic which makes this change
deliberately is some of the most nack-worthy logic I've ever come across.

It's very much not ok to have mod_start be a paddr or pfn, and for
mod_end to either be an end or a sized, epending on where you are during
boot.

~Andrew

Re: [PATCH 2/3] x86/boot: Drop move_memory() and use memcpy() directly
Posted by Jan Beulich 4 years, 2 months ago
On 09.12.2021 20:58, Andrew Cooper wrote:
> On 07/12/2021 12:03, Jan Beulich wrote:
>> On 07.12.2021 11:53, Andrew Cooper wrote:
>>> @@ -1243,7 +1196,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>>               * data until after we have switched to the relocated pagetables!
>>>               */
>>>              barrier();
>>> -            move_memory(e, XEN_IMG_OFFSET, _end - _start, 1);
>>> +            memcpy(__va(__pa(_start)), _start, _end - _start);
>>>  
>>>              /* Walk idle_pg_table, relocating non-leaf entries. */
>>>              pl4e = __va(__pa(idle_pg_table));
>>> @@ -1300,8 +1253,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>>                     "1" (__va(__pa(cpu0_stack))), "2" (STACK_SIZE / 8)
>>>                  : "memory" );
>>>  
>>> -            bootstrap_map(NULL);
>>> -
>>>              printk("New Xen image base address: %#lx\n", xen_phys_start);
>>>          }
>> This bootstrap_map() must have been dead code already before, except
>> for the "keep" argument above needlessly having got passed as 1? Afaict
>> passing 1 was pointless without using the function's return value.
> 
> bootstrap_map(NULL) is necessary to zap the constructed mappings, but it
> seems like the use of the return address was dropped by c/s 0b76ce20de
> "x86/setup: don't relocate the VGA hole" in 2013.
> 
>>
>>> @@ -1325,9 +1276,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>>                   (headroom ||
>>>                    ((end - size) >> PAGE_SHIFT) > mod[j].mod_start) )
>>>              {
>>> -                move_memory(end - size + headroom,
>>> -                            (uint64_t)mod[j].mod_start << PAGE_SHIFT,
>>> -                            mod[j].mod_end, 0);
>>> +                memcpy(__va(end - size + headroom),
>>> +                       __va((uint64_t)mod[j].mod_start << PAGE_SHIFT),
>>> +                       mod[j].mod_end);
>> I'm not convinced this can be memcpy() - consider_modules() specifically
>> allows for the current module's source and destination areas to overlap.
>> See also the comment ahead of its invocation a few lines up from here.
> 
> The comment which says:
> 
> /* Don't overlap with other modules (or Xen itself). */
> end = consider_modules(s, e, size, mod,
>                        mbi->mods_count + relocated, j);
> 
> ?

Yes, with the emphasis on "other".

> memmove() in move_memory() is broken, and in fact always results in a
> backwards copy, which means that one way or another, overlapping source
> and destination doesn't work.

We agree on this aspect.

> If it was really broken before, then it can be fixed now by using
> memmove() here, because using 2 directmap mappings means the
> forward/backward check will now work as expected.

Use of memmove() here is what I've been trying to suggest. I'm actually
surprised we've had no reports of breakage, so I will admit I'm not
fully certain that I'm not overlooking some aspect excluding the
potential for any overlap here.

>> I'm also not convinced we have the source range (fully) direct-mapped at
>> this point. Only full superpages have been mapped so far, and only those
>> for the current or higher address E820 entries (plus of course the pre-
>> built mappings of the space below 1Gb [PREBUILT_MAP_LIMIT]) located
>> below 4Gb.
> 
> PREBUILT_MAP_LIMIT is 2M, and that's only to cover the fact that we
> build l1_directmap[] with the VGA UC range at build time.  I was hoping
> to remove it in due course.
> 
> As to the other mappings, that is awkward.  Perhaps what we ought to do
> is split the loops.  First fill in all 2M superpages into the directmap,
> then relocate Xen, at which point we've got plenty of frames to feed
> into the allocator, to let us do a second pass filling in non-2M regions.

Well, we already have such a split of the loops, just that the 2nd part
occurs later right now. So what you suggest is either moving up that
2nd part or moving down the relocation of the modules. If you're
convinced either of these can be done without breaking anything, this
would indeed seem to be the way to go. I have to admit that I'm not
convinced (yet).

> We can depend on the modules living in RAM regions, but might want to
> explicitly confirm.
> 
>> As to the 2nd argument - if this can indeed be converted in the first
>> place, may I suggest to also switch to using pfn_to_paddr()?
> 
> Honestly, that's taking a terrible situation and making it worse.
> 
> Calling pfn_to_paddr() on what is logically a paddr_t already ought to
> be a compilation error, and the logic which makes this change
> deliberately is some of the most nack-worthy logic I've ever come across.
> 
> It's very much not ok to have mod_start be a paddr or pfn, and for
> mod_end to either be an end or a sized, epending on where you are during
> boot.

Well, feel free to propose an improvement. Using the modules array
for recording this data seemed better to me at the time than having
a 2nd, almost redundant data object.

Jan