Quite obviously to determine the split condition successive pages'
attributes need to be evaluated, not always those of the initial page.
Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap init")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
Split init_heap_pages() in two"), but there it was still benign.
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1885,11 +1885,11 @@ static void init_heap_pages(
* range to cross zones.
*/
#ifdef CONFIG_SEPARATE_XENHEAP
- if ( zone != page_to_zone(pg) )
+ if ( zone != page_to_zone(pg + contig_pages) )
break;
#endif
- if ( nid != (phys_to_nid(page_to_maddr(pg))) )
+ if ( nid != (phys_to_nid(page_to_maddr(pg + contig_pages))) )
break;
}
On 25/07/2022 14:10, Jan Beulich wrote:
> Quite obviously to determine the split condition successive pages'
> attributes need to be evaluated, not always those of the initial page.
>
> Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap init")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
> Split init_heap_pages() in two"), but there it was still benign.

This also fixes the crash that XenRT found on loads of hardware, which
looks something like:

(XEN) NUMA: Allocated memnodemap from 105bc81000 - 105bc92000
(XEN) NUMA: Using 8 for the hash shift.
(XEN) Early fatal page fault at e008:ffff82d04022ae1e (cr2=00000000000000b8, ec=0002)
(XEN) ----[ Xen-4.17.0  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d04022ae1e>] common/page_alloc.c#free_heap_pages+0x2dd/0x850
...
(XEN) Xen call trace:
(XEN)    [<ffff82d04022ae1e>] R common/page_alloc.c#free_heap_pages+0x2dd/0x850
(XEN)    [<ffff82d04022dd64>] F common/page_alloc.c#init_heap_pages+0x55f/0x720
(XEN)    [<ffff82d040415234>] F end_boot_allocator+0x187/0x1e7
(XEN)    [<ffff82d040452337>] F __start_xen+0x1a06/0x2779
(XEN)    [<ffff82d040204344>] F __high_start+0x94/0xa0

Debugging shows that it's always a block which crosses node 0 and 1,
where avail[1] has yet to be initialised.

What I'm confused by is how this manages to manifest broken swiotlb
issues without Xen crashing.

~Andrew
On 25.07.2022 20:54, Andrew Cooper wrote:
> On 25/07/2022 14:10, Jan Beulich wrote:
>> Quite obviously to determine the split condition successive pages'
>> attributes need to be evaluated, not always those of the initial page.
>>
>> Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap init")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
>> Split init_heap_pages() in two"), but there it was still benign.
>
> This also fixes the crash that XenRT found on loads of hardware, which
> looks something like:
>
> (XEN) NUMA: Allocated memnodemap from 105bc81000 - 105bc92000
> (XEN) NUMA: Using 8 for the hash shift.
> (XEN) Early fatal page fault at e008:ffff82d04022ae1e (cr2=00000000000000b8, ec=0002)
> (XEN) ----[ Xen-4.17.0  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d04022ae1e>] common/page_alloc.c#free_heap_pages+0x2dd/0x850
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d04022ae1e>] R common/page_alloc.c#free_heap_pages+0x2dd/0x850
> (XEN)    [<ffff82d04022dd64>] F common/page_alloc.c#init_heap_pages+0x55f/0x720
> (XEN)    [<ffff82d040415234>] F end_boot_allocator+0x187/0x1e7
> (XEN)    [<ffff82d040452337>] F __start_xen+0x1a06/0x2779
> (XEN)    [<ffff82d040204344>] F __high_start+0x94/0xa0
>
> Debugging shows that it's always a block which crosses node 0 and 1,
> where avail[1] has yet to be initialised.
>
> What I'm confused by is how this manages to manifest broken swiotlb
> issues without Xen crashing.

I didn't debug this in detail since I had managed to spot the issue by
staring at the offending patch, but from the observations some of node
1's memory was actually accounted to node 0 (incl. off-by-65535
node_need_scrub[] values for both nodes), so I would guess avail[1]
simply wasn't accessed before being set up in my case.

Jan
Hi Jan,

(Sorry for the formatting)

On Mon, 25 Jul 2022, 14:10 Jan Beulich, <jbeulich@suse.com> wrote:
> Quite obviously to determine the split condition successive pages'
> attributes need to be evaluated, not always those of the initial page.
>
> Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap
> init")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
> Split init_heap_pages() in two"), but there it was still benign.

Is this because the range will never cross a NUMA node? How about the
fake NUMA node?

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1885,11 +1885,11 @@ static void init_heap_pages(
> * range to cross zones.
> */
> #ifdef CONFIG_SEPARATE_XENHEAP
> - if ( zone != page_to_zone(pg) )
> + if ( zone != page_to_zone(pg + contig_pages) )
> break;
> #endif
>
> - if ( nid != (phys_to_nid(page_to_maddr(pg))) )
> + if ( nid != (phys_to_nid(page_to_maddr(pg + contig_pages))) )
> break;
> }

Hmmm, I am not sure why I didn't spot this issue during my testing. It
looks like this was introduced in v2, sorry for that.

Reviewed-by: Julien Grall <jgrall@amazon.com>

Cheers,
On 25.07.2022 18:05, Julien Grall wrote:
> (Sorry for the formatting)

No issues seen.

> On Mon, 25 Jul 2022, 14:10 Jan Beulich, <jbeulich@suse.com> wrote:
>
>> Quite obviously to determine the split condition successive pages'
>> attributes need to be evaluated, not always those of the initial page.
>>
>> Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap
>> init")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
>> Split init_heap_pages() in two"), but there it was still benign.
>
> Is this because the range will never cross a NUMA node? How about the
> fake NUMA node?

No (afaict), because pages were still freed one by one (and hence node
boundaries still wouldn't end up in the middle of a buddy).

>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -1885,11 +1885,11 @@ static void init_heap_pages(
>> * range to cross zones.
>> */
>> #ifdef CONFIG_SEPARATE_XENHEAP
>> - if ( zone != page_to_zone(pg) )
>> + if ( zone != page_to_zone(pg + contig_pages) )
>> break;
>> #endif
>>
>> - if ( nid != (phys_to_nid(page_to_maddr(pg))) )
>> + if ( nid != (phys_to_nid(page_to_maddr(pg + contig_pages))) )
>> break;
>> }
>
> Hmmm, I am not sure why I didn't spot this issue during my testing. It
> looks like this was introduced in v2, sorry for that.
>
> Reviewed-by: Julien Grall <jgrall@amazon.com>

Thanks.

Jan
Hi Jan,

On 25/07/2022 17:18, Jan Beulich wrote:
> On 25.07.2022 18:05, Julien Grall wrote:
>> (Sorry for the formatting)
>
> No issues seen.

Good to know. I sent it from my phone and the gmail app used to mangle
e-mails.

>> On Mon, 25 Jul 2022, 14:10 Jan Beulich, <jbeulich@suse.com> wrote:
>>
>>> Quite obviously to determine the split condition successive pages'
>>> attributes need to be evaluated, not always those of the initial page.
>>>
>>> Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap
>>> init")
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> Part of the problem was already introduced in 24a53060bd37 ("xen/heap:
>>> Split init_heap_pages() in two"), but there it was still benign.
>>
>> Is this because the range will never cross a NUMA node? How about the
>> fake NUMA node?
>
> No (afaict), because pages were still freed one by one (and hence node
> boundaries still wouldn't end up in the middle of a buddy).

So I agree that free_heap_pages() would be called with one page at a
time. However, I think _init_heap_pages() would end up being called with
the full range. So we would initialize the first node but not the others
(if the range spans over multiple ones). Therefore, I think
free_heap_pages() could dereference a NULL pointer.

Anyway, I would not expect anyone to only backport the patch to split
_init_heap_pages() and... in any case you already committed it (which is
fine given this is a major regression).

Cheers,

-- 
Julien Grall