From: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
xen/common/vmap.c | 23 +++++++++++++++++------
1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index 37922f735b..8343460794 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -68,7 +68,7 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
     spin_lock(&vm_lock);
     for ( ; ; )
     {
-        struct page_info *pg;
+        mfn_t mfn;

         ASSERT(vm_low[t] == vm_top[t] || !test_bit(vm_low[t], vm_bitmap(t)));
         for ( start = vm_low[t]; start < vm_top[t]; )
@@ -103,9 +103,17 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
         if ( vm_top[t] >= vm_end[t] )
             return NULL;

-        pg = alloc_domheap_page(NULL, 0);
-        if ( !pg )
-            return NULL;
+        if ( system_state == SYS_STATE_early_boot )
+        {
+            mfn = alloc_boot_pages(1, 1);
+        }
+        else
+        {
+            struct page_info *pg = alloc_domheap_page(NULL, 0);
+            if ( !pg )
+                return NULL;
+            mfn = page_to_mfn(pg);
+        }

         spin_lock(&vm_lock);

@@ -113,7 +121,7 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
         {
             unsigned long va = (unsigned long)vm_bitmap(t) + vm_top[t] / 8;

-            if ( !map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR) )
+            if ( !map_pages_to_xen(va, mfn, 1, PAGE_HYPERVISOR) )
             {
                 clear_page((void *)va);
                 vm_top[t] += PAGE_SIZE * 8;
@@ -123,7 +131,10 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
             }
         }

-        free_domheap_page(pg);
+        if ( system_state == SYS_STATE_early_boot )
+            init_boot_pages(mfn_to_maddr(mfn), mfn_to_maddr(mfn) + PAGE_SIZE);
+        else
+            free_domheap_page(mfn_to_page(mfn));

         if ( start >= vm_top[t] )
         {
--
2.21.0
Hi David,

On 01/02/2020 00:33, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>

I am a bit concerned with this change, particularly the consequences it
has for the page-tables. There is an assumption that intermediate
page-tables allocated via the boot allocator will never be freed.

On x86, a call to vunmap() will not free page-tables, but a subsequent
call to vmap() may free them depending on the mapping size. So we would
call free_domheap_pages() rather than init_heap_pages().

I am not entirely sure what the full consequences are, but I think this
calls for investigation, and for a summary to be written down in the
commit message.

Cheers,

--
Julien Grall
On Mon, 2020-02-03 at 14:00 +0000, Julien Grall wrote:
> Hi David,
>
> On 01/02/2020 00:33, David Woodhouse wrote:
> > From: David Woodhouse <dwmw@amazon.co.uk>
>
> I am a bit concerned with this change, particularly the consequences it
> has for the page-tables. There is an assumption that intermediate
> page-tables allocated via the boot allocator will never be freed.
>
> On x86, a call to vunmap() will not free page-tables, but a subsequent
> call to vmap() may free them depending on the mapping size. So we would
> call free_domheap_pages() rather than init_heap_pages().
>
> I am not entirely sure what the full consequences are, but I think this
> calls for investigation, and for a summary to be written down in the
> commit message.
This isn't just about page tables, right? It's about *any* allocation
given out by the boot allocator, being freed with free_heap_pages()?
Given the amount of code that has conditionals in both alloc and free
paths along the lines of…
    if (system_state > SYS_STATE_boot)
        use xenheap
    else
        use boot allocator
… I'm not sure I'd really trust the assumption that such a thing never
happens; that no pages are ever allocated from the boot allocator and
then freed into the heap.
In fact it does work fine except for some esoteric corner cases,
because init_heap_pages() is mostly just a trivial loop over
free_heap_pages().
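
For illustration, a minimal sketch of that relationship, with the
signatures simplified down to plain MFNs. This is a model of mine, not
the actual code in xen/common/page_alloc.c, which also has to handle
the corner cases listed below:

    void free_heap_pages(unsigned long mfn, unsigned int order);  /* simplified */

    /* Toy model: hand every page back to the buddy allocator, one
     * order-0 chunk at a time. */
    void init_heap_pages_model(unsigned long mfn, unsigned long nr_pages)
    {
        unsigned long i;

        for ( i = 0; i < nr_pages; i++ )
            free_heap_pages(mfn + i, 0);       /* one order-0 free per page */
    }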
The corner cases are if you call free_heap_pages() on boot-allocated
memory which matches one or more of the following criteria:
• Includes MFN #0,
• Includes the first page the heap has seen on a given node, so
init_node_heap() has to be called, or
• High-order allocations crossing from one node to another.
The first is trivial to fix by lifting the check for MFN#0 up into
free_heap_pages(). The second isn't hard either.
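
A rough sketch of what lifting that check might look like; hypothetical
code of mine, not a committed patch, again MFN-based for brevity:

    void free_heap_pages_sketch(unsigned long mfn, unsigned int order)
    {
        if ( mfn == 0 )
        {
            unsigned long i;

            /* Keep MFN 0 off the free lists: leak page 0 on purpose
             * and free the rest of the chunk as order-0 pages. */
            for ( i = 1; i < (1UL << order); i++ )
                free_heap_pages_sketch(mfn + i, 0);
            return;
        }

        /* ... usual buddy insertion and merging ... */
    }

The recursion never goes more than one level deep, since every page
other than MFN 0 takes the normal path.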
I'll have to think about the third a little longer... one option might
be to declare that it's OK to call free_heap_pages() on a *single* page
that was allocated from the boot allocator, but not higher orders (in
case they cross nodes). That would at least cover the specific concern
about page table pages and vm_init()/vmap().
Hi,

Answering on this thread as well, so it is easier to track.

On 03/02/2020 16:37, David Woodhouse wrote:
> On Mon, 2020-02-03 at 14:00 +0000, Julien Grall wrote:
>> Hi David,
>>
>> On 01/02/2020 00:33, David Woodhouse wrote:
>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>
>> I am a bit concerned with this change, particularly the consequences it
>> has for the page-tables. There is an assumption that intermediate
>> page-tables allocated via the boot allocator will never be freed.
>>
>> On x86, a call to vunmap() will not free page-tables, but a subsequent
>> call to vmap() may free them depending on the mapping size. So we would
>> call free_domheap_pages() rather than init_heap_pages().
>>
>> I am not entirely sure what the full consequences are, but I think this
>> calls for investigation, and for a summary to be written down in the
>> commit message.
>
> This isn't just about page tables, right? It's about *any* allocation
> given out by the boot allocator, being freed with free_heap_pages()?

That's correct.

> Given the amount of code that has conditionals in both alloc and free
> paths along the lines of…
>
>     if (system_state > SYS_STATE_boot)
>         use xenheap
>     else
>         use boot allocator
>
> … I'm not sure I'd really trust the assumption that such a thing never
> happens; that no pages are ever allocated from the boot allocator and
> then freed into the heap.

I believe this assumption holds on Arm because page-tables are never
freed so far. The only other case is in the ACPI code
(acpi_os_free_memory()), but I think it would result in a leak instead
(see is_xmalloc_memory()).

I hadn't realized that this assumption does not hold on x86 :(.
Actually, from the discussion on the MFN 0 thread, it looks like x86 is
abusing quite a few interfaces in page alloc. So as this is nothing
new, this is probably yet another good future clean-up/hardening, as
long as the assumption holds outside of x86.

Cheers,

--
Julien Grall
On Mon, Feb 3, 2020 at 4:37 PM David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Mon, 2020-02-03 at 14:00 +0000, Julien Grall wrote:
> > Hi David,
> >
> > On 01/02/2020 00:33, David Woodhouse wrote:
> > > From: David Woodhouse <dwmw@amazon.co.uk>
> >
> > I am a bit concerned with this change, particularly the consequences it
> > has for the page-tables. There is an assumption that intermediate
> > page-tables allocated via the boot allocator will never be freed.
> >
> > On x86, a call to vunmap() will not free page-tables, but a subsequent
> > call to vmap() may free them depending on the mapping size. So we would
> > call free_domheap_pages() rather than init_heap_pages().
> >
> > I am not entirely sure what the full consequences are, but I think this
> > calls for investigation, and for a summary to be written down in the
> > commit message.
>
> This isn't just about page tables, right? It's about *any* allocation
> given out by the boot allocator, being freed with free_heap_pages()?
>
> Given the amount of code that has conditionals in both alloc and free
> paths along the lines of…
>
>     if (system_state > SYS_STATE_boot)
>         use xenheap
>     else
>         use boot allocator
>
> … I'm not sure I'd really trust the assumption that such a thing never
> happens; that no pages are ever allocated from the boot allocator and
> then freed into the heap.
>
> In fact it does work fine except for some esoteric corner cases,
> because init_heap_pages() is mostly just a trivial loop over
> free_heap_pages().
>
> The corner cases are if you call free_heap_pages() on boot-allocated
> memory which matches one or more of the following criteria:
>
> • Includes MFN #0,
>
> • Includes the first page the heap has seen on a given node, so
>   init_node_heap() has to be called, or
>
> • High-order allocations crossing from one node to another.

I was asked to forward a message relating to MFN 0 and allocations
crossing zones from a private discussion on the security list:

8<---

> I am having difficulty seeing how invalidating MFN0 would solve the issue here.
> The zone number for a specific page is calculated from the most significant bit
> position set in its MFN. As a result, each successive zone contains an order of
> magnitude more pages. You would need to invalidate the first or last MFN in each
> zone.

Because (unless Jan and I are reading the code wrong):

* Chunks can only be merged such that they end up on order-boundaries.
* Chunks can only be merged if they are the same order.
* Zone boundaries are on order boundaries.

So say you're freeing mfn 0x100, and mfn 0xff is free. In that loop,
(1 << order) & mfn will always be 0, so it will always only look
"forward" for things to merge, not backwards.

Suppose on the other hand, that you're freeing mfn 0x101, and 0x98
through 0x100 are free. The loop will look "backwards" and merge with
0x100; but then it will look "forwards" again.

Now suppose you've merged 0x100-0x1ff, and the order moves up to size
0x100. Now the mask becomes 0x1ff; so it can't merge with 0x200-0x2ff
(which would cross zones); instead it looks backwards to 0x0-0xff.

We don't think it's possible for things to be merged across zones
unless it can (say) start at 0xff, and merge all the way back to 0x0;
which can't be done if 0x0 is never on the free list.

That's the idea anyway. That would explain why we've never seen it on
x86 -- due to the way the architecture is, mfn 0 is never on the free
list.
--->8

-George
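
(To make the merging argument concrete, here is a compact model of the
coalescing loop described above. It is a simplified sketch of mine:
buddy_is_free(), the MAX_ORDER value and the exact checks are
assumptions, not Xen's real free_heap_pages() code.)

    #include <stdbool.h>

    #define MAX_ORDER 20            /* assumed upper bound on chunk order */

    bool buddy_is_free(unsigned long mfn, unsigned int order);

    /* Grow the just-freed chunk [*mfn, *mfn + 2^*order) by absorbing
     * equal-sized free buddies.  "*mfn & mask" picks the direction, so
     * the chunk always stays aligned to its own size. */
    void coalesce(unsigned long *mfn, unsigned int *order)
    {
        while ( *order < MAX_ORDER )
        {
            unsigned long mask = 1UL << *order;

            if ( *mfn & mask )
            {
                /* Buddy is the equal-sized chunk just below us. */
                if ( !buddy_is_free(*mfn - mask, *order) )
                    break;
                *mfn -= mask;                    /* merge backwards */
            }
            else
            {
                /* Buddy is the equal-sized chunk just above us. */
                if ( !buddy_is_free(*mfn + mask, *order) )
                    break;
                /* Merge forwards: the chunk keeps its own mfn. */
            }
            ++*order;
        }
    }

Note how freeing 0x100 has (mfn & mask) == 0 for every order below 8,
so the loop only probes forward; only once the chunk has grown to
order 8 (mask 0x100) can it look backwards, and the candidate buddy is
then 0x0-0xff, which can never be fully free if MFN 0 is kept off the
free lists.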
On Tue, 2020-02-04 at 11:00 +0000, George Dunlap wrote:
> On Mon, Feb 3, 2020 at 4:37 PM David Woodhouse <dwmw2@infradead.org> wrote:
> >
> > On Mon, 2020-02-03 at 14:00 +0000, Julien Grall wrote:
> > > Hi David,
> > >
> > > On 01/02/2020 00:33, David Woodhouse wrote:
> > > > From: David Woodhouse <dwmw@amazon.co.uk>
> > >
> > > I am a bit concerned with this change, particularly the consequences it
> > > has for the page-tables. There is an assumption that intermediate
> > > page-tables allocated via the boot allocator will never be freed.
> > >
> > > On x86, a call to vunmap() will not free page-tables, but a subsequent
> > > call to vmap() may free them depending on the mapping size. So we would
> > > call free_domheap_pages() rather than init_heap_pages().
> > >
> > > I am not entirely sure what the full consequences are, but I think this
> > > calls for investigation, and for a summary to be written down in the
> > > commit message.
> >
> > This isn't just about page tables, right? It's about *any* allocation
> > given out by the boot allocator, being freed with free_heap_pages()?
> >
> > Given the amount of code that has conditionals in both alloc and free
> > paths along the lines of…
> >
> >     if (system_state > SYS_STATE_boot)
> >         use xenheap
> >     else
> >         use boot allocator
> >
> > … I'm not sure I'd really trust the assumption that such a thing never
> > happens; that no pages are ever allocated from the boot allocator and
> > then freed into the heap.
> >
> > In fact it does work fine except for some esoteric corner cases,
> > because init_heap_pages() is mostly just a trivial loop over
> > free_heap_pages().
> >
> > The corner cases are if you call free_heap_pages() on boot-allocated
> > memory which matches one or more of the following criteria:
> >
> > • Includes MFN #0,
> >
> > • Includes the first page the heap has seen on a given node, so
> >   init_node_heap() has to be called, or
> >
> > • High-order allocations crossing from one node to another.
>
> I was asked to forward a message relating to MFN 0 and allocations
> crossing zones from a private discussion on the security list:
>
> 8<---
>
> > I am having difficulty seeing how invalidating MFN0 would solve the issue here.
> > The zone number for a specific page is calculated from the most significant bit
> > position set in its MFN. As a result, each successive zone contains an order of
> > magnitude more pages. You would need to invalidate the first or last MFN in each
> > zone.
>
> Because (unless Jan and I are reading the code wrong):
>
> * Chunks can only be merged such that they end up on order-boundaries.
> * Chunks can only be merged if they are the same order.
> * Zone boundaries are on order boundaries.
>
> So say you're freeing mfn 0x100, and mfn 0xff is free. In that loop,
> (1 << order) & mfn will always be 0, so it will always only look
> "forward" for things to merge, not backwards.
>
> Suppose on the other hand, that you're freeing mfn 0x101, and 0x98
> through 0x100 are free. The loop will look "backwards" and merge with
> 0x100; but then it will look "forwards" again.
>
> Now suppose you've merged 0x100-0x1ff, and the order moves up to size
> 0x100. Now the mask becomes 0x1ff; so it can't merge with 0x200-0x2ff
> (which would cross zones); instead it looks backwards to 0x0-0xff.
>
> We don't think it's possible for things to be merged across zones
> unless it can (say) start at 0xff, and merge all the way back to 0x0;
> which can't be done if 0x0 is never on the free list.
>
> That's the idea anyway. That would explain why we've never seen it on
> x86 -- due to the way the architecture is, mfn 0 is never on the free
> list.
>
> --->8

Thanks.

I still don't really get it. What if the zone boundary is at MFN 0x300?

What prevents the buddy allocator from merging a range at 0x200-0x2FF
with another from 0x300-0x3FF, creating a single range 0x200-0x400
which crosses nodes?

The MFN0 trick only works if all zone boundaries must be at an address
which is 2ⁿ, doesn't it? Is that always true?
On Tue, 2020-02-04 at 11:06 +0000, David Woodhouse wrote:
>
> I still don't really get it. What if the zone boundary is at MFN 0x300?
>
> What prevents the buddy allocator from merging a range at 0x200-0x2FF
> with another from 0x300-0x3FF, creating a single range 0x200-0x400
> which crosses nodes?
>
> The MFN0 trick only works if all zone boundaries must be at an address
> which is 2ⁿ, doesn't it? Is that always true?

Ah yes, it is. And you called it out explicitly. Sorry. I think I was
conflating NUMA nodes with the allocator zones.

On the plus side, this is a whole lot easier for the boot allocator to
cope with. It's easy enough to avoid handing out any high-order
allocations from the boot allocator which would cross a *zone*
boundary. It's crossing *node* boundaries which I was thinking of,
which would be harder before we've even parsed the SRAT...
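(For reference, the power-of-two property follows from how the zone is
derived from the most significant set bit of a page's MFN, as stated in
the forwarded message above. A sketch of my own of that idea; the real
page_to_zone() macro in xen/common/page_alloc.c differs in detail:)

    unsigned int zone_of_mfn(unsigned long mfn)
    {
        unsigned int z = 0;

        while ( mfn )          /* effectively flsl(mfn) */
        {
            mfn >>= 1;
            z++;
        }
        return z ? z : 1;      /* MFN 0 would share zone 1 with MFN 1 */
    }

    /* Zone z covers MFNs [2^(z-1), 2^z), so every boundary is a power
     * of two and a size-aligned buddy chunk cannot straddle it.  A
     * boundary at MFN 0x300 cannot happen: 0x200-0x3ff all have the
     * same top bit and therefore land in the same zone. */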
On 01.02.2020 01:33, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
A word on the "why" would be nice, just like Wei has in the previous
patch. Besides that, just two cosmetic requests:
> --- a/xen/common/vmap.c
> +++ b/xen/common/vmap.c
> @@ -68,7 +68,7 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
>      spin_lock(&vm_lock);
>      for ( ; ; )
>      {
> -        struct page_info *pg;
> +        mfn_t mfn;
>
>          ASSERT(vm_low[t] == vm_top[t] || !test_bit(vm_low[t], vm_bitmap(t)));
>          for ( start = vm_low[t]; start < vm_top[t]; )
> @@ -103,9 +103,17 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
>          if ( vm_top[t] >= vm_end[t] )
>              return NULL;
>
> -        pg = alloc_domheap_page(NULL, 0);
> -        if ( !pg )
> -            return NULL;
> +        if ( system_state == SYS_STATE_early_boot )
> +        {
> +            mfn = alloc_boot_pages(1, 1);
> +        }

Please omit the braces in cases like this one.

> +        else
> +        {
> +            struct page_info *pg = alloc_domheap_page(NULL, 0);
> +            if ( !pg )

Please have a blank line between declaration(s) and statement(s).
Jan