Quite obviously to determine the split condition successive pages'
attributes need to be evaluated, not always those of the initial page.
Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap init")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
Split init_heap_pages() in two"), but there it was still benign.
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1885,11 +1885,11 @@ static void init_heap_pages(
* range to cross zones.
*/
#ifdef CONFIG_SEPARATE_XENHEAP
- if ( zone != page_to_zone(pg) )
+ if ( zone != page_to_zone(pg + contig_pages) )
break;
#endif
- if ( nid != (phys_to_nid(page_to_maddr(pg))) )
+ if ( nid != (phys_to_nid(page_to_maddr(pg + contig_pages))) )
break;
}
On 25/07/2022 14:10, Jan Beulich wrote:
> Quite obviously to determine the split condition successive pages'
> attributes need to be evaluated, not always those of the initial page.
>
> Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap init")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
> Split init_heap_pages() in two"), but there it was still benign.
This also fixes the crash that XenRT found on loads of hardware, which
looks something like:
(XEN) NUMA: Allocated memnodemap from 105bc81000 - 105bc92000
(XEN) NUMA: Using 8 for the hash shift.
(XEN) Early fatal page fault at e008:ffff82d04022ae1e
(cr2=00000000000000b8, ec=0002)
(XEN) ----[ Xen-4.17.0 x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82d04022ae1e>]
common/page_alloc.c#free_heap_pages+0x2dd/0x850
...
(XEN) Xen call trace:
(XEN) [<ffff82d04022ae1e>] R
common/page_alloc.c#free_heap_pages+0x2dd/0x850
(XEN) [<ffff82d04022dd64>] F
common/page_alloc.c#init_heap_pages+0x55f/0x720
(XEN) [<ffff82d040415234>] F end_boot_allocator+0x187/0x1e7
(XEN) [<ffff82d040452337>] F __start_xen+0x1a06/0x2779
(XEN) [<ffff82d040204344>] F __high_start+0x94/0xa0
Debugging shows that it's always a block which crosses node 0 and 1,
where avail[1] has yet to be initialised.
What I'm confused by is how this manages to manifest broken swiotlb
issues without Xen crashing.
~Andrew
On 25.07.2022 20:54, Andrew Cooper wrote:
> On 25/07/2022 14:10, Jan Beulich wrote:
>> Quite obviously to determine the split condition successive pages'
>> attributes need to be evaluated, not always those of the initial page.
>>
>> Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap init")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
>> Split init_heap_pages() in two"), but there it was still benign.
>
> This also fixes the crash that XenRT found on loads of hardware, which
> looks something like:
>
> (XEN) NUMA: Allocated memnodemap from 105bc81000 - 105bc92000
> (XEN) NUMA: Using 8 for the hash shift.
> (XEN) Early fatal page fault at e008:ffff82d04022ae1e
> (cr2=00000000000000b8, ec=0002)
> (XEN) ----[ Xen-4.17.0 x86_64 debug=y Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82d04022ae1e>]
> common/page_alloc.c#free_heap_pages+0x2dd/0x850
> ...
> (XEN) Xen call trace:
> (XEN) [<ffff82d04022ae1e>] R
> common/page_alloc.c#free_heap_pages+0x2dd/0x850
> (XEN) [<ffff82d04022dd64>] F
> common/page_alloc.c#init_heap_pages+0x55f/0x720
> (XEN) [<ffff82d040415234>] F end_boot_allocator+0x187/0x1e7
> (XEN) [<ffff82d040452337>] F __start_xen+0x1a06/0x2779
> (XEN) [<ffff82d040204344>] F __high_start+0x94/0xa0
>
> Debugging shows that it's always a block which crosses node 0 and 1,
> where avail[1] has yet to be initialised.
>
> What I'm confused by is how this manages to manifest broken swiotlb
> issues without Xen crashing.
I didn't debug this in detail since I had managed to spot the issue
by staring at the offending patch, but from the observations some of
node 1's memory was actually accounted to node 0 (incl. off-by-65535
node_need_scrub[] values for both nodes), so I would guess avail[1]
simply wasn't accessed before being set up in my case.

Jan
Hi Jan,
(Sorry for the formatting)
On Mon, 25 Jul 2022, 14:10 Jan Beulich, <jbeulich@suse.com> wrote:
> Quite obviously to determine the split condition successive pages'
> attributes need to be evaluated, not always those of the initial page.
>
> Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap
> init")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
> Split init_heap_pages() in two"), but there it was still benign.
>
Is this because the range will never cross a NUMA node? How about the
fake NUMA node?
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1885,11 +1885,11 @@ static void init_heap_pages(
> * range to cross zones.
> */
> #ifdef CONFIG_SEPARATE_XENHEAP
> - if ( zone != page_to_zone(pg) )
> + if ( zone != page_to_zone(pg + contig_pages) )
> break;
> #endif
>
> - if ( nid != (phys_to_nid(page_to_maddr(pg))) )
> + if ( nid != (phys_to_nid(page_to_maddr(pg + contig_pages))) )
> break;
> }
>
Hmmm I am not sure why I didn't spot this issue during my testing. It looks
like this was introduced in v2, sorry for that.
Reviewed-by: Julien Grall <jgrall@amazon.com>
Cheers,
>
On 25.07.2022 18:05, Julien Grall wrote:
> (Sorry for the formatting)
No issues seen.
> On Mon, 25 Jul 2022, 14:10 Jan Beulich, <jbeulich@suse.com> wrote:
>
>> Quite obviously to determine the split condition successive pages'
>> attributes need to be evaluated, not always those of the initial page.
>>
>> Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap
>> init")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
>> Split init_heap_pages() in two"), but there it was still benign.
>>
>
> Is this because the range will never cross a NUMA node? How about the
> fake NUMA node?
No (afaict), because pages were still freed one by one (and hence node
boundaries still wouldn't end up in the middle of a buddy).
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -1885,11 +1885,11 @@ static void init_heap_pages(
>> * range to cross zones.
>> */
>> #ifdef CONFIG_SEPARATE_XENHEAP
>> - if ( zone != page_to_zone(pg) )
>> + if ( zone != page_to_zone(pg + contig_pages) )
>> break;
>> #endif
>>
>> - if ( nid != (phys_to_nid(page_to_maddr(pg))) )
>> + if ( nid != (phys_to_nid(page_to_maddr(pg + contig_pages))) )
>> break;
>> }
>>
>
> Hmmm I am not sure why I didn't spot this issue during my testing. It looks
> like this was introduced in v2, sorry for that.
>
> Reviewed-by: Julien Grall <jgrall@amazon.com>
Thanks.
Jan
Hi Jan,
On 25/07/2022 17:18, Jan Beulich wrote:
> On 25.07.2022 18:05, Julien Grall wrote:
>> (Sorry for the formatting)
>
> No issues seen.
Good to know. I sent it from my phone and the gmail app used to mangle
e-mails.
>
>> On Mon, 25 Jul 2022, 14:10 Jan Beulich, <jbeulich@suse.com> wrote:
>>
>>> Quite obviously to determine the split condition successive pages'
>>> attributes need to be evaluated, not always those of the initial page.
>>>
>>> Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap
>>> init")
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
>>> Split init_heap_pages() in two"), but there it was still benign.
>>>
>>
>> Is this because the range will never cross a NUMA node? How about the
>> fake NUMA node?
>
> No (afaict), because pages were still freed one by one (and hence node
> boundaries still wouldn't end up in the middle of a buddy).
So I agree that free_heap_pages() would be called with one page at a
time. However, I think _init_heap_pages() would end up being called
with the full range.
So we would initialize the first node but not the others (if the range
spans multiple nodes). Therefore, I think free_heap_pages() could
dereference a NULL pointer.
Anyway, I would not expect anyone to only backport the patch to split
_init_heap_pages() and... in any case you already committed it (which is
fine given this is a major regression).
Cheers,
--
Julien Grall