[PATCH] page-alloc: fix initialization of cross-node regions

[PATCH] page-alloc: fix initialization of cross-node regions
Posted by Jan Beulich 1 year, 9 months ago
Quite obviously to determine the split condition successive pages'
attributes need to be evaluated, not always those of the initial page.

Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap init")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
Split init_heap_pages() in two"), but there it was still benign.

--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1885,11 +1885,11 @@ static void init_heap_pages(
              * range to cross zones.
              */
 #ifdef CONFIG_SEPARATE_XENHEAP
-            if ( zone != page_to_zone(pg) )
+            if ( zone != page_to_zone(pg + contig_pages) )
                 break;
 #endif
 
-            if ( nid != (phys_to_nid(page_to_maddr(pg))) )
+            if ( nid != (phys_to_nid(page_to_maddr(pg + contig_pages))) )
                 break;
         }
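
The effect of the two changed conditions can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the Xen code: the toy memory map, TOY_NODE_BOUNDARY, toy_page_to_nid() and the two contig_pages_*() helpers are hypothetical names made up for illustration, and toy_page_to_nid() merely stands in for phys_to_nid(page_to_maddr(...)).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy map: pages 0..3 sit on node 0, pages 4 and up on node 1. */
#define TOY_NODE_BOUNDARY 4

/* Hypothetical stand-in for phys_to_nid(page_to_maddr(pg)). */
static unsigned int toy_page_to_nid(size_t page)
{
    return page < TOY_NODE_BOUNDARY ? 0 : 1;
}

/*
 * Pre-fix shape: each iteration re-evaluates the first page, so the
 * comparison can never fail and the run always covers all "left" pages.
 */
static size_t contig_pages_buggy(size_t first, size_t left)
{
    unsigned int nid = toy_page_to_nid(first);
    size_t contig_pages;

    assert(left > 0);

    for ( contig_pages = 1; contig_pages < left; contig_pages++ )
        if ( nid != toy_page_to_nid(first) )
            break;

    return contig_pages;
}

/*
 * Post-fix shape: each iteration evaluates the next candidate page, so
 * the run stops at the first page belonging to a different node.
 */
static size_t contig_pages_fixed(size_t first, size_t left)
{
    unsigned int nid = toy_page_to_nid(first);
    size_t contig_pages;

    assert(left > 0);

    for ( contig_pages = 1; contig_pages < left; contig_pages++ )
        if ( nid != toy_page_to_nid(first + contig_pages) )
            break;

    return contig_pages;
}
```

With this toy map, a range of 6 pages starting at page 2 is reported as one 6-page run by the pre-fix shape, whereas the post-fix shape stops after 2 pages, at the node boundary.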
Re: [PATCH] page-alloc: fix initialization of cross-node regions
Posted by Andrew Cooper 1 year, 9 months ago
On 25/07/2022 14:10, Jan Beulich wrote:
> Quite obviously to determine the split condition successive pages'
> attributes need to be evaluated, not always those of the initial page.
>
> Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap init")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
> Split init_heap_pages() in two"), but there it was still benign.

This also fixes the crash that XenRT found on loads of hardware, which
looks something like:

(XEN) NUMA: Allocated memnodemap from 105bc81000 - 105bc92000
(XEN) NUMA: Using 8 for the hash shift.
(XEN) Early fatal page fault at e008:ffff82d04022ae1e (cr2=00000000000000b8, ec=0002)
(XEN) ----[ Xen-4.17.0  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d04022ae1e>] common/page_alloc.c#free_heap_pages+0x2dd/0x850
...
(XEN) Xen call trace:
(XEN)    [<ffff82d04022ae1e>] R common/page_alloc.c#free_heap_pages+0x2dd/0x850
(XEN)    [<ffff82d04022dd64>] F common/page_alloc.c#init_heap_pages+0x55f/0x720
(XEN)    [<ffff82d040415234>] F end_boot_allocator+0x187/0x1e7
(XEN)    [<ffff82d040452337>] F __start_xen+0x1a06/0x2779
(XEN)    [<ffff82d040204344>] F __high_start+0x94/0xa0

Debugging shows that it's always a block which crosses node 0 and 1,
where avail[1] has yet to be initialised.

What I'm confused by is how this manages to manifest broken swiotlb
issues without Xen crashing.

~Andrew
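
The cr2 and ec values in the log above can be decoded with a little arithmetic. The sketch below is one reading that is consistent with the log, not a confirmed diagnosis: the error code bits are the architectural x86 ones, but the PFEC_* macro names and the helpers are hypothetical, and the 8-byte element size is an assumption about how the per-node array is laid out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (names here are illustrative). */
#define PFEC_PRESENT (1u << 0)   /* set: protection violation; clear: not present */
#define PFEC_WRITE   (1u << 1)   /* set: the access was a write */
#define PFEC_USER    (1u << 2)   /* set: the access came from user mode */

static bool pf_page_present(uint32_t ec) { return (ec & PFEC_PRESENT) != 0; }
static bool pf_was_write(uint32_t ec)    { return (ec & PFEC_WRITE) != 0; }
static bool pf_from_user(uint32_t ec)    { return (ec & PFEC_USER) != 0; }

/*
 * Assumption: the fault came from indexing an array whose base pointer
 * was still NULL, so cr2 is simply index * elem_size.
 */
static uint64_t null_array_index(uint64_t cr2, uint64_t elem_size)
{
    assert(elem_size != 0);
    return cr2 / elem_size;
}
```

Under these assumptions, ec=0002 reads as a supervisor-mode write to a non-present page, and cr2=0xb8 would be entry 23 of an array of 8-byte values reached through a NULL base pointer. That fits a write through a not yet initialised avail[1], but the thread does not establish what that index corresponds to.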
Re: [PATCH] page-alloc: fix initialization of cross-node regions
Posted by Jan Beulich 1 year, 9 months ago
On 25.07.2022 20:54, Andrew Cooper wrote:
> On 25/07/2022 14:10, Jan Beulich wrote:
>> Quite obviously to determine the split condition successive pages'
>> attributes need to be evaluated, not always those of the initial page.
>>
>> Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap init")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
>> Split init_heap_pages() in two"), but there it was still benign.
> 
> This also fixes the crash that XenRT found on loads of hardware, which
> looks something like:
> 
> (XEN) NUMA: Allocated memnodemap from 105bc81000 - 105bc92000
> (XEN) NUMA: Using 8 for the hash shift.
> (XEN) Early fatal page fault at e008:ffff82d04022ae1e (cr2=00000000000000b8, ec=0002)
> (XEN) ----[ Xen-4.17.0  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d04022ae1e>] common/page_alloc.c#free_heap_pages+0x2dd/0x850
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d04022ae1e>] R common/page_alloc.c#free_heap_pages+0x2dd/0x850
> (XEN)    [<ffff82d04022dd64>] F common/page_alloc.c#init_heap_pages+0x55f/0x720
> (XEN)    [<ffff82d040415234>] F end_boot_allocator+0x187/0x1e7
> (XEN)    [<ffff82d040452337>] F __start_xen+0x1a06/0x2779
> (XEN)    [<ffff82d040204344>] F __high_start+0x94/0xa0
> 
> Debugging shows that it's always a block which crosses node 0 and 1,
> where avail[1] has yet to be initialised.
> 
> What I'm confused by is how this manages to manifest broken swiotlb
> issues without Xen crashing.

I didn't debug this in detail since I had managed to spot the issue
by staring at the offending patch, but from the observations some
of node 1's memory was actually accounted to node 0 (incl. off-by-65535
node_need_scrub[] values for both nodes), so I would guess
avail[1] simply wasn't accessed before being set up in my case.

Jan

Re: [PATCH] page-alloc: fix initialization of cross-node regions
Posted by Julien Grall 1 year, 9 months ago
Hi Jan,

(Sorry for the formatting)


On Mon, 25 Jul 2022, 14:10 Jan Beulich, <jbeulich@suse.com> wrote:

> Quite obviously to determine the split condition successive pages'
> attributes need to be evaluated, not always those of the initial page.
>
> Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap
> init")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
> Split init_heap_pages() in two"), but there it was still benign.
>

Is this because a range will never cross NUMA nodes? What about the fake
NUMA nodes?


> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1885,11 +1885,11 @@ static void init_heap_pages(
>               * range to cross zones.
>               */
>  #ifdef CONFIG_SEPARATE_XENHEAP
> -            if ( zone != page_to_zone(pg) )
> +            if ( zone != page_to_zone(pg + contig_pages) )
>                  break;
>  #endif
>
> -            if ( nid != (phys_to_nid(page_to_maddr(pg))) )
> +            if ( nid != (phys_to_nid(page_to_maddr(pg + contig_pages))) )
>                  break;
>          }
>

Hmmm I am not sure why I didn't spot this issue during my testing. It looks
like this was introduced in v2, sorry for that.

Reviewed-by: Julien Grall <jgrall@amazon.com>

Cheers,


Re: [PATCH] page-alloc: fix initialization of cross-node regions
Posted by Jan Beulich 1 year, 9 months ago
On 25.07.2022 18:05, Julien Grall wrote:
> (Sorry for the formatting)

No issues seen.

> On Mon, 25 Jul 2022, 14:10 Jan Beulich, <jbeulich@suse.com> wrote:
> 
>> Quite obviously to determine the split condition successive pages'
>> attributes need to be evaluated, not always those of the initial page.
>>
>> Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap
>> init")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
>> Split init_heap_pages() in two"), but there it was still benign.
>>
> 
> Is this because range will never cross numa node? How about the fake NUMA
> node?

No (afaict), because pages were still freed one by one (and hence node
boundaries still wouldn't end up in the middle of a buddy).

>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -1885,11 +1885,11 @@ static void init_heap_pages(
>>               * range to cross zones.
>>               */
>>  #ifdef CONFIG_SEPARATE_XENHEAP
>> -            if ( zone != page_to_zone(pg) )
>> +            if ( zone != page_to_zone(pg + contig_pages) )
>>                  break;
>>  #endif
>>
>> -            if ( nid != (phys_to_nid(page_to_maddr(pg))) )
>> +            if ( nid != (phys_to_nid(page_to_maddr(pg + contig_pages))) )
>>                  break;
>>          }
>>
> 
> Hmmm I am not sure why I didn't spot this issue during my testing. It looks
> like this was introduced in v2, sorry for that.
> 
> Reviewed-by: Julien Grall <jgrall@amazon.com>

Thanks.

Jan
Re: [PATCH] page-alloc: fix initialization of cross-node regions
Posted by Julien Grall 1 year, 9 months ago
Hi Jan,

On 25/07/2022 17:18, Jan Beulich wrote:
> On 25.07.2022 18:05, Julien Grall wrote:
>> (Sorry for the formatting)
> 
> No issues seen.

Good to know. I sent it from my phone and the gmail app used to mangle 
e-mails.

> 
>> On Mon, 25 Jul 2022, 14:10 Jan Beulich, <jbeulich@suse.com> wrote:
>>
>>> Quite obviously to determine the split condition successive pages'
>>> attributes need to be evaluated, not always those of the initial page.
>>>
>>> Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap
>>> init")
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
>>> Split init_heap_pages() in two"), but there it was still benign.
>>>
>>
>> Is this because range will never cross numa node? How about the fake NUMA
>> node?
> 
> No (afaict), because pages were still freed one by one (and hence node
> boundaries still wouldn't end up in the middle of a buddy).

So I agree that free_heap_pages() would be called with one page at a
time. However, I think _init_heap_pages() would end up being called with
the full range.

So we would initialize the first node but not the others (if the range 
spans over multiple ones). Therefore, I think free_heap_pages() could 
dereference a NULL pointer.

Anyway, I would not expect anyone to only backport the patch to split 
_init_heap_pages() and... in any case you already committed it (which is 
fine given this is a major regression).

Cheers,

-- 
Julien Grall
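
The concern raised in the last message (only the first node of a range being set up while pages are still freed one at a time) can be modelled in a few lines. This is a hedged sketch, not the Xen implementation: toy_node_ready[], toy_nid(), toy_reset() and toy_init_range() are hypothetical names, and the model counts the problematic frees instead of dereferencing a NULL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_NODES      2
#define TOY_NODE_BOUNDARY 4   /* hypothetical: pages 0..3 on node 0, the rest on node 1 */

/* Stand-in for "this node's per-node heap state has been set up". */
static bool toy_node_ready[TOY_NR_NODES];

static unsigned int toy_nid(size_t page)
{
    return page < TOY_NODE_BOUNDARY ? 0 : 1;
}

static void toy_reset(void)
{
    for ( size_t i = 0; i < TOY_NR_NODES; i++ )
        toy_node_ready[i] = false;
}

/*
 * Sets up only the node of the first page, then "frees" the pages one
 * by one. Returns how many pages were freed to a node that had not been
 * set up, which is where real code could dereference a NULL pointer.
 */
static size_t toy_init_range(size_t first, size_t nr)
{
    size_t bad = 0;

    assert(nr > 0);

    toy_node_ready[toy_nid(first)] = true;

    for ( size_t i = 0; i < nr; i++ )
        if ( !toy_node_ready[toy_nid(first + i)] )
            bad++;

    return bad;
}
```

Passing a range that straddles the toy node boundary in a single call leaves the second node's pages without initialised state, while passing the same pages as two per-node ranges does not.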