[Xen-devel] [PATCH] x86/dom0: improve PVH initrd and metadata placement

Roger Pau Monne posted 1 patch 4 years, 1 month ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/xen tags/patchew/20200302155509.44753-1-roger.pau@citrix.com
There is a newer version of this series
xen/arch/x86/hvm/dom0_build.c | 51 ++++++++++++++++++++++++++++++++++-
1 file changed, 50 insertions(+), 1 deletion(-)
[Xen-devel] [PATCH] x86/dom0: improve PVH initrd and metadata placement
Posted by Roger Pau Monne 4 years, 1 month ago
Don't assume there's going to be enough space at the tail of the
loaded kernel and instead try to find a suitable memory area where the
initrd and metadata can be loaded.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/hvm/dom0_build.c | 51 ++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index eded87eaf5..a03bf2e663 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -490,6 +490,44 @@ static int __init pvh_populate_p2m(struct domain *d)
 #undef MB1_PAGES
 }
 
+static paddr_t find_memory(const struct domain *d, const struct elf_binary *elf,
+                           size_t size)
+{
+    paddr_t kernel_start = (paddr_t)elf->dest_base;
+    paddr_t kernel_end = (paddr_t)(elf->dest_base + elf->dest_size);
+    unsigned int i;
+
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        paddr_t start, end;
+
+        if ( d->arch.e820[i].addr < MB(1) &&
+             d->arch.e820[i].addr + d->arch.e820[i].size < MB(1) )
+            continue;
+
+        start = MAX(ROUNDUP(d->arch.e820[i].addr, PAGE_SIZE), MB(1));
+        end = d->arch.e820[i].addr + d->arch.e820[i].size;
+
+        /* Deal with the kernel being loaded in the region. */
+        if ( kernel_start <= start && kernel_end >= start )
+            /* Truncate the start of the region */
+            start = ROUNDUP(kernel_end, PAGE_SIZE);
+        else if ( kernel_start <= end && kernel_end >= end )
+            /* Truncate the end of the region */
+            end = kernel_start;
+        /* Pick the biggest of the split regions */
+        else if ( kernel_start - start > end - kernel_end )
+            end = kernel_start;
+        else
+            start = ROUNDUP(kernel_end, PAGE_SIZE);
+
+        if ( end - start >= size )
+            return start;
+    }
+
+    return INVALID_PADDR;
+}
+
 static int __init pvh_load_kernel(struct domain *d, const module_t *image,
                                   unsigned long image_headroom,
                                   module_t *initrd, void *image_base,
@@ -546,7 +584,18 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
         return rc;
     }
 
-    last_addr = ROUNDUP(parms.virt_kend - parms.virt_base, PAGE_SIZE);
+    last_addr = find_memory(d, &elf, sizeof(start_info) +
+                            initrd ? ROUNDUP(initrd->mod_end, PAGE_SIZE) +
+                                     sizeof(mod)
+                                   : 0 +
+                            cmdline ? ROUNDUP(strlen(cmdline) + 1,
+                                              elf_64bit(&elf) ? 8 : 4)
+                                    : 0);
+    if ( last_addr == INVALID_PADDR )
+    {
+        printk("Unable to find a memory region to load initrd and metadata\n");
+        return -ENOMEM;
+    }
 
     if ( initrd != NULL )
     {
-- 
2.25.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/dom0: improve PVH initrd and metadata placement
Posted by Jan Beulich 4 years, 1 month ago
On 02.03.2020 16:55, Roger Pau Monne wrote:
> Don't assume there's going to be enough space at the tail of the
> loaded kernel and instead try to find a suitable memory area where the
> initrd and metadata can be loaded.
> 
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  xen/arch/x86/hvm/dom0_build.c | 51 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
> index eded87eaf5..a03bf2e663 100644
> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -490,6 +490,44 @@ static int __init pvh_populate_p2m(struct domain *d)
>  #undef MB1_PAGES
>  }
>  
> +static paddr_t find_memory(const struct domain *d, const struct elf_binary *elf,
> +                           size_t size)
> +{
> +    paddr_t kernel_start = (paddr_t)elf->dest_base;
> +    paddr_t kernel_end = (paddr_t)(elf->dest_base + elf->dest_size);
> +    unsigned int i;
> +
> +    for ( i = 0; i < d->arch.nr_e820; i++ )
> +    {
> +        paddr_t start, end;
> +
> +        if ( d->arch.e820[i].addr < MB(1) &&
> +             d->arch.e820[i].addr + d->arch.e820[i].size < MB(1) )

I guess you mean <= here, and I also guess the left side of the
&& could be dropped altogether (as redundant - E820 entries
aren't supposed to wrap, or if they did we'd have bigger
problems). Also perhaps ...

> +            continue;
> +
> +        start = MAX(ROUNDUP(d->arch.e820[i].addr, PAGE_SIZE), MB(1));
> +        end = d->arch.e820[i].addr + d->arch.e820[i].size;

... calculate "end" earlier and use it in the if() above?

As to the aligning to a 1Mb boundary - why are you doing this?
I guess whatever the reason warrants a comment, the more that
further down you only align to page boundaries.

> +        /* Deal with the kernel being loaded in the region. */
> +        if ( kernel_start <= start && kernel_end >= start )

While it doesn't matter much, I think it would look better to
use > on the right side of the && here ...

> +            /* Truncate the start of the region */
> +            start = ROUNDUP(kernel_end, PAGE_SIZE);
> +        else if ( kernel_start <= end && kernel_end >= end )

and < on the left side of the && here. Furthermore - can this
really be "else if()" here, i.e. doesn't it need to be plain
"if()"?

> +            /* Truncate the end of the region */
> +            end = kernel_start;
> +        /* Pick the biggest of the split regions */
> +        else if ( kernel_start - start > end - kernel_end )

I don't think the logic above guarantees e.g. kernel_start > start
(i.e. the subtraction to not wrap). More generally I don't think
it follows that there are two split regions at this point. At the
very least I think this whole block wants to be wrapped in a
check that there's any overlap between kernel and the given region
in the first place.

> +            end = kernel_start;
> +        else
> +            start = ROUNDUP(kernel_end, PAGE_SIZE);
> +
> +        if ( end - start >= size )
> +            return start;

Are all blocks E820_RAM at this point in time? Otherwise there's
a type check missing. (Even if all are of this type now, adding
a type check may still be a good idea to be more future proof.)

> @@ -546,7 +584,18 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
>          return rc;
>      }
>  
> -    last_addr = ROUNDUP(parms.virt_kend - parms.virt_base, PAGE_SIZE);
> +    last_addr = find_memory(d, &elf, sizeof(start_info) +
> +                            initrd ? ROUNDUP(initrd->mod_end, PAGE_SIZE) +
> +                                     sizeof(mod)
> +                                   : 0 +
> +                            cmdline ? ROUNDUP(strlen(cmdline) + 1,
> +                                              elf_64bit(&elf) ? 8 : 4)
> +                                    : 0);

I guess you mean

    last_addr = find_memory(d, &elf, sizeof(start_info) +
                            (initrd ? ROUNDUP(initrd->mod_end, PAGE_SIZE) +
                                      sizeof(mod)
                                    : 0) +
                            (cmdline ? ROUNDUP(strlen(cmdline) + 1,
                                               elf_64bit(&elf) ? 8 : 4)
                                     : 0));

?

Also, do both regions need to be adjacent? If not, wouldn't it be
better to find slots for them one by one?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/dom0: improve PVH initrd and metadata placement
Posted by Roger Pau Monné 4 years, 1 month ago
On Tue, Mar 03, 2020 at 10:14:50AM +0100, Jan Beulich wrote:
> On 02.03.2020 16:55, Roger Pau Monne wrote:
> > Don't assume there's going to be enough space at the tail of the
> > loaded kernel and instead try to find a suitable memory area where the
> > initrd and metadata can be loaded.
> > 
> > Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> >  xen/arch/x86/hvm/dom0_build.c | 51 ++++++++++++++++++++++++++++++++++-
> >  1 file changed, 50 insertions(+), 1 deletion(-)
> > 
> > diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
> > index eded87eaf5..a03bf2e663 100644
> > --- a/xen/arch/x86/hvm/dom0_build.c
> > +++ b/xen/arch/x86/hvm/dom0_build.c
> > @@ -490,6 +490,44 @@ static int __init pvh_populate_p2m(struct domain *d)
> >  #undef MB1_PAGES
> >  }
> >  
> > +static paddr_t find_memory(const struct domain *d, const struct elf_binary *elf,
> > +                           size_t size)
> > +{
> > +    paddr_t kernel_start = (paddr_t)elf->dest_base;
> > +    paddr_t kernel_end = (paddr_t)(elf->dest_base + elf->dest_size);
> > +    unsigned int i;
> > +
> > +    for ( i = 0; i < d->arch.nr_e820; i++ )
> > +    {
> > +        paddr_t start, end;
> > +
> > +        if ( d->arch.e820[i].addr < MB(1) &&
> > +             d->arch.e820[i].addr + d->arch.e820[i].size < MB(1) )
> 
> I guess you mean <= here,

Hm, right, or else I would have to - 1.

> and I also guess the left side of the
> && could be dropped altogether (as redundant - E820 entries
> aren't supposed to wrap, or if they did we'd have bigger
> problems). Also perhaps ...
> 
> > +            continue;
> > +
> > +        start = MAX(ROUNDUP(d->arch.e820[i].addr, PAGE_SIZE), MB(1));
> > +        end = d->arch.e820[i].addr + d->arch.e820[i].size;
> 
> ... calculate "end" earlier and use it in the if() above?

Right.

> 
> As to the aligning to a 1Mb boundary - why are you doing this?

I'm not sure placing the initrd and metadata below 1MB is sensible,
even if a region big enough was found. In pvh_populate_p2m we copy the
data on the memory regions < 1MB, I'm not sure BDA/EBDA is marked as
reserved in the memory map always.

> I guess whatever the reason warrants a comment, the more that
> further down you only align to page boundaries.
> 
> > +        /* Deal with the kernel being loaded in the region. */
> > +        if ( kernel_start <= start && kernel_end >= start )
> 
> While it doesn't matter much, I think it would look better to
> use > on the right side of the && here ...
> 
> > +            /* Truncate the start of the region */
> > +            start = ROUNDUP(kernel_end, PAGE_SIZE);
> > +        else if ( kernel_start <= end && kernel_end >= end )
> 
> and < on the left side of the && here. Furthermore - can this
> really be "else if()" here, i.e. doesn't it need to be plain
> "if()"?

I don't think so, as the region where the kernel has been loaded must
always be a single RAM region. Ie: [kernel_start, kernel_end) is
always going to be a subset of a RAM region.

So either the kernel region doesn't overlap, or overlaps with the
head, tail or splits the region into two.

> > +            /* Truncate the end of the region */
> > +            end = kernel_start;
> > +        /* Pick the biggest of the split regions */
> > +        else if ( kernel_start - start > end - kernel_end )
> 
> I don't think the logic above guarantees e.g. kernel_start > start
> (i.e. the subtraction to not wrap). More generally I don't think
> it follows that there are two split regions at this point. At the
> very least I think this whole block wants to be wrapped in a
> check that there's any overlap between kernel and the given region
> in the first place.

Yes, that's indeed missing.

> 
> > +            end = kernel_start;
> > +        else
> > +            start = ROUNDUP(kernel_end, PAGE_SIZE);
> > +
> > +        if ( end - start >= size )
> > +            return start;
> 
> Are all blocks E820_RAM at this point in time? Otherwise there's
> a type check missing. (Even if all are of this type now, adding
> a type check may still be a good idea to be more future proof.)

Will add the check.

> > @@ -546,7 +584,18 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
> >          return rc;
> >      }
> >  
> > -    last_addr = ROUNDUP(parms.virt_kend - parms.virt_base, PAGE_SIZE);
> > +    last_addr = find_memory(d, &elf, sizeof(start_info) +
> > +                            initrd ? ROUNDUP(initrd->mod_end, PAGE_SIZE) +
> > +                                     sizeof(mod)
> > +                                   : 0 +
> > +                            cmdline ? ROUNDUP(strlen(cmdline) + 1,
> > +                                              elf_64bit(&elf) ? 8 : 4)
> > +                                    : 0);
> 
> I guess you mean
> 
>     last_addr = find_memory(d, &elf, sizeof(start_info) +
>                             (initrd ? ROUNDUP(initrd->mod_end, PAGE_SIZE) +
>                                       sizeof(mod)
>                                     : 0) +
>                             (cmdline ? ROUNDUP(strlen(cmdline) + 1,
>                                                elf_64bit(&elf) ? 8 : 4)
>                                      : 0));
> 
> ?

Uh, yes, sorry.

> 
> Also, do both regions need to be adjacent? If not, wouldn't it be
> better to find slots for them one by one?

That's going to be much more complicated, as you would have to account
for previous regions while searching for empty spaces. If we want to
go that route we would have to use a rangeset or some such in order to
keep track of used areas.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/dom0: improve PVH initrd and metadata placement
Posted by Jan Beulich 4 years, 1 month ago
On 03.03.2020 11:29, Roger Pau Monné wrote:
> On Tue, Mar 03, 2020 at 10:14:50AM +0100, Jan Beulich wrote:
>> On 02.03.2020 16:55, Roger Pau Monne wrote:
>>> +        start = MAX(ROUNDUP(d->arch.e820[i].addr, PAGE_SIZE), MB(1));
>>> +        end = d->arch.e820[i].addr + d->arch.e820[i].size;
>>
>> ... calculate "end" earlier and use it in the if() above?
> 
> Right.
> 
>>
>> As to the aligning to a 1Mb boundary - why are you doing this?
> 
> I'm not sure placing the initrd and metadata below 1MB is sensible,
> even if a region big enough was found. In pvh_populate_p2m we copy the
> data on the memory regions < 1MB, I'm not sure BDA/EBDA is marked as
> reserved in the memory map always.

I realize now that I misread the code - you don't align to a 1Mb
boundary, but you cap the range at the lower end. That's fine of
course.

>> I guess whatever the reason warrants a comment, the more that
>> further down you only align to page boundaries.
>>
>>> +        /* Deal with the kernel being loaded in the region. */
>>> +        if ( kernel_start <= start && kernel_end >= start )
>>
>> While it doesn't matter much, I think it would look better to
>> use > on the right side of the && here ...
>>
>>> +            /* Truncate the start of the region */
>>> +            start = ROUNDUP(kernel_end, PAGE_SIZE);
>>> +        else if ( kernel_start <= end && kernel_end >= end )
>>
>> and < on the left side of the && here. Furthermore - can this
>> really be "else if()" here, i.e. doesn't it need to be plain
>> "if()"?
> 
> I don't think so, as the region where the kernel has been loaded must
> always be a single RAM region. Ie: [kernel_start, kernel_end) is
> always going to be a subset of a RAM region.

I this true even with the page size alignment getting done?
IOW are all E820 ranges we produce exact multiples of 4k in
size and aligned to 4k boundaries?

>> Also, do both regions need to be adjacent? If not, wouldn't it be
>> better to find slots for them one by one?
> 
> That's going to be much more complicated, as you would have to account
> for previous regions while searching for empty spaces. If we want to
> go that route we would have to use a rangeset or some such in order to
> keep track of used areas.

I accept this being more complicated, and hence not really
wanting doing now and here. But perhaps you could leave a
comment to the effect that the choice of using a single
region is for simplicity reasons?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/dom0: improve PVH initrd and metadata placement
Posted by Roger Pau Monné 4 years, 1 month ago
On Tue, Mar 03, 2020 at 12:00:00PM +0100, Jan Beulich wrote:
> On 03.03.2020 11:29, Roger Pau Monné wrote:
> > On Tue, Mar 03, 2020 at 10:14:50AM +0100, Jan Beulich wrote:
> >> On 02.03.2020 16:55, Roger Pau Monne wrote:
> >>> +            /* Truncate the start of the region */
> >>> +            start = ROUNDUP(kernel_end, PAGE_SIZE);
> >>> +        else if ( kernel_start <= end && kernel_end >= end )
> >>
> >> and < on the left side of the && here. Furthermore - can this
> >> really be "else if()" here, i.e. doesn't it need to be plain
> >> "if()"?
> > 
> > I don't think so, as the region where the kernel has been loaded must
> > always be a single RAM region. Ie: [kernel_start, kernel_end) is
> > always going to be a subset of a RAM region.
> 
> I this true even with the page size alignment getting done?
> IOW are all E820 ranges we produce exact multiples of 4k in
> size and aligned to 4k boundaries?

Yes, pvh_setup_e820 guarantees that, as the EPT/NPT cannot handle
anything smaller than a page. The RAM regions in the native e820 are
adjusted to that effect.

> >> Also, do both regions need to be adjacent? If not, wouldn't it be
> >> better to find slots for them one by one?
> > 
> > That's going to be much more complicated, as you would have to account
> > for previous regions while searching for empty spaces. If we want to
> > go that route we would have to use a rangeset or some such in order to
> > keep track of used areas.
> 
> I accept this being more complicated, and hence not really
> wanting doing now and here. But perhaps you could leave a
> comment to the effect that the choice of using a single
> region is for simplicity reasons?

Sure.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel