ASSERT in rangeset_remove_range

Stefano Stabellini posted 1 patch 2 years, 5 months ago
Test gitlab-ci failed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/alpine.DEB.2.22.394.2111081430090.3317@ubuntu-linux-20-04-desktop
ASSERT in rangeset_remove_range
Posted by Stefano Stabellini 2 years, 5 months ago
Hi Oleksandr, Julien,

I discovered a bug caused by the recent changes to introduce extended
regions in make_hypervisor_node (more logs appended):


(XEN) d1 BANK[0] 0x00000040000000-0x0000007e800000 (1000MB)
(XEN) d1 BANK[1] 0x00000200000000-0x00000200000000 (0MB)
(XEN) DEBUG find_unallocated_memory 994 s=40000000 e=7e7fffff
(XEN) DEBUG find_unallocated_memory 994 s=200000000 e=1ffffffff
(XEN) Assertion 's <= e' failed at rangeset.c:189


When a bank of memory is zero in size, then rangeset_remove_range is
called with end < start, triggering an ASSERT in rangeset_remove_range.

One solution is to avoid creating 0 size banks. The following change
does that:


diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 49b4eb2b13..3efe542d0f 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -459,9 +459,12 @@ static void __init allocate_memory(struct domain *d, struct kernel_info *kinfo)
         goto fail;
 
     bank_size = MIN(GUEST_RAM1_SIZE, kinfo->unassigned_mem);
-    if ( !allocate_bank_memory(d, kinfo, gaddr_to_gfn(GUEST_RAM1_BASE),
+    if ( bank_size > 0 )
+    {
+        if ( !allocate_bank_memory(d, kinfo, gaddr_to_gfn(GUEST_RAM1_BASE),
                                bank_size) )
-        goto fail;
+            goto fail;
+    }
 
     if ( kinfo->unassigned_mem )
         goto fail;



We have a couple of other options too:

- remove the ASSERT in rangeset_remove_range
There is an argument that we should simply return error
fromrangeset_remove_range, rather than a full assert.

- add a if (end > start) check before calling rangeset_remove_range
We could check that end > start before calling rangeset_remove_range at
all the call sites in domain_build.c. There are 5 call sites at the
moment.

Any other ideas or suggestions?

Cheers,

Stefano



(XEN) DEBUG find_memory_holes 1126 s=ff990000 e=ff990fff
(XEN) DEBUG find_memory_holes 1126 s=ff990000 e=ff990fff
(XEN) DEBUG find_memory_holes 1126 s=ff990000 e=ff990fff
(XEN) DEBUG find_memory_holes 1126 s=ff990000 e=ff990fff
(XEN) DEBUG find_memory_holes 1126 s=f9010000 e=f901ffff
(XEN) DEBUG find_memory_holes 1126 s=f9020000 e=f903ffff
(XEN) DEBUG find_memory_holes 1126 s=f9040000 e=f905ffff
(XEN) DEBUG find_memory_holes 1126 s=f9060000 e=f907ffff
(XEN) DEBUG find_memory_holes 1126 s=fd800000 e=fd81ffff
(XEN) DEBUG find_memory_holes 1126 s=ff060000 e=ff060fff
(XEN) DEBUG find_memory_holes 1126 s=ff070000 e=ff070fff
(XEN) DEBUG find_memory_holes 1126 s=fd6e0000 e=fd6e8fff
(XEN) DEBUG find_memory_holes 1126 s=fd6e9000 e=fd6edfff
(XEN) DEBUG find_memory_holes 1126 s=fd500000 e=fd500fff
(XEN) DEBUG find_memory_holes 1126 s=fd510000 e=fd510fff
(XEN) DEBUG find_memory_holes 1126 s=fd520000 e=fd520fff
(XEN) DEBUG find_memory_holes 1126 s=fd530000 e=fd530fff
(XEN) DEBUG find_memory_holes 1126 s=fd540000 e=fd540fff
(XEN) DEBUG find_memory_holes 1126 s=fd550000 e=fd550fff
(XEN) DEBUG find_memory_holes 1126 s=fd560000 e=fd560fff
(XEN) DEBUG find_memory_holes 1126 s=fd570000 e=fd570fff
(XEN) DEBUG find_memory_holes 1126 s=fd4b0000 e=fd4bffff
(XEN) DEBUG find_memory_holes 1126 s=ffa80000 e=ffa80fff
(XEN) DEBUG find_memory_holes 1126 s=ffa90000 e=ffa90fff
(XEN) DEBUG find_memory_holes 1126 s=ffaa0000 e=ffaa0fff
(XEN) DEBUG find_memory_holes 1126 s=ffab0000 e=ffab0fff
(XEN) DEBUG find_memory_holes 1126 s=ffac0000 e=ffac0fff
(XEN) DEBUG find_memory_holes 1126 s=ffad0000 e=ffad0fff
(XEN) DEBUG find_memory_holes 1126 s=ffae0000 e=ffae0fff
(XEN) DEBUG find_memory_holes 1126 s=ffaf0000 e=ffaf0fff
(XEN) DEBUG find_memory_holes 1126 s=fd070000 e=fd09ffff
(XEN) DEBUG find_memory_holes 1126 s=ff100000 e=ff100fff
(XEN) DEBUG find_memory_holes 1126 s=ff0b0000 e=ff0b0fff
(XEN) DEBUG find_memory_holes 1126 s=ff0c0000 e=ff0c0fff
(XEN) DEBUG find_memory_holes 1126 s=ff0d0000 e=ff0d0fff
(XEN) DEBUG find_memory_holes 1126 s=ff0e0000 e=ff0e0fff
(XEN) DEBUG find_memory_holes 1126 s=ff0a0000 e=ff0a0fff
(XEN) DEBUG find_memory_holes 1126 s=ff020000 e=ff020fff
(XEN) DEBUG find_memory_holes 1126 s=ff030000 e=ff030fff
(XEN) DEBUG find_memory_holes 1126 s=ff960000 e=ff960fff
(XEN) DEBUG find_memory_holes 1126 s=ffa00000 e=ffa0ffff
(XEN) DEBUG find_memory_holes 1126 s=fd0b0000 e=fd0bffff
(XEN) DEBUG find_memory_holes 1126 s=fd490000 e=fd49ffff
(XEN) DEBUG find_memory_holes 1126 s=ffa10000 e=ffa1ffff
(XEN) DEBUG find_memory_holes 1126 s=fd0e0000 e=fd0e0fff
(XEN) DEBUG find_memory_holes 1126 s=fd480000 e=fd480fff
(XEN) DEBUG find_memory_holes 1126 s=8000000000 e=8000ffffff
(XEN) DEBUG handle_pci_range 1056 s=e0000000 e=efffffff
(XEN) DEBUG handle_pci_range 1056 s=600000000 e=7ffffffff
(XEN) DEBUG find_memory_holes 1126 s=ff0f0000 e=ff0f0fff
(XEN) DEBUG find_memory_holes 1126 s=c0000000 e=c7ffffff
(XEN) DEBUG find_memory_holes 1126 s=ffa60000 e=ffa60fff
(XEN) DEBUG find_memory_holes 1126 s=fd400000 e=fd43ffff
(XEN) DEBUG find_memory_holes 1126 s=fd3d0000 e=fd3d0fff
(XEN) DEBUG find_memory_holes 1126 s=fd0c0000 e=fd0c1fff
(XEN) DEBUG find_memory_holes 1126 s=ff160000 e=ff160fff
(XEN) DEBUG find_memory_holes 1126 s=ff170000 e=ff170fff
(XEN) DEBUG find_memory_holes 1126 s=ff040000 e=ff040fff
(XEN) DEBUG find_memory_holes 1126 s=ff050000 e=ff050fff
(XEN) DEBUG find_memory_holes 1126 s=ff110000 e=ff110fff
(XEN) DEBUG find_memory_holes 1126 s=ff120000 e=ff120fff
(XEN) DEBUG find_memory_holes 1126 s=ff130000 e=ff130fff
(XEN) DEBUG find_memory_holes 1126 s=ff140000 e=ff140fff
(XEN) DEBUG find_memory_holes 1126 s=ff000000 e=ff000fff
(XEN) DEBUG find_memory_holes 1126 s=ff010000 e=ff010fff
(XEN) DEBUG find_memory_holes 1126 s=ff9d0000 e=ff9d0fff
(XEN) DEBUG find_memory_holes 1126 s=fe200000 e=fe23ffff
(XEN) DEBUG find_memory_holes 1126 s=ff9e0000 e=ff9e0fff
(XEN) DEBUG find_memory_holes 1126 s=fe300000 e=fe33ffff
(XEN) DEBUG find_memory_holes 1126 s=fd4d0000 e=fd4d0fff
(XEN) DEBUG find_memory_holes 1126 s=ff150000 e=ff150fff
(XEN) DEBUG find_memory_holes 1126 s=ffa50000 e=ffa50fff
(XEN) DEBUG find_memory_holes 1126 s=ffa50000 e=ffa50fff
(XEN) DEBUG find_memory_holes 1126 s=ffa50000 e=ffa50fff
(XEN) DEBUG find_memory_holes 1126 s=fd4c0000 e=fd4c0fff
(XEN) DEBUG find_memory_holes 1126 s=fd4a0000 e=fd4a0fff
(XEN) DEBUG find_memory_holes 1126 s=fd4aa000 e=fd4aafff
(XEN) DEBUG find_memory_holes 1126 s=fd4ab000 e=fd4abfff
(XEN) DEBUG find_memory_holes 1126 s=fd4ac000 e=fd4acfff
(XEN) DEBUG find_memory_holes 1126 s=0 e=7fefffff
(XEN) DEBUG find_memory_holes 1126 s=800000000 e=87fffffff
(XEN) Extended region 0: 0x80000000->0xc0000000
(XEN) Extended region 1: 0xc8000000->0xe0000000
(XEN) Extended region 2: 0xf0000000->0xf9000000
(XEN) Extended region 3: 0xffc00000->0x600000000
(XEN) Extended region 4: 0x880000000->0x8000000000
(XEN) Extended region 5: 0x8001000000->0x10000000000
(XEN) Loading zImage from 0000000000e00000 to 0000000020000000-0000000021367a00
(XEN) Loading d0 initrd from 0000000002200000 to 0x0000000028200000-0x00000000293f936d
(XEN) Loading d0 DTB to 0x0000000028000000-0x0000000028009604
(XEN) *** LOADING DOMU cpus=1 memory=fa000KB ***
(XEN) Loading d1 kernel from boot module @ 0000000003400000
(XEN) Loading ramdisk from boot module @ 0000000004800000
(XEN) Allocating mappings totalling 1000MB for d1:
(XEN) d1 BANK[0] 0x00000040000000-0x0000007e800000 (1000MB)
(XEN) d1 BANK[1] 0x00000200000000-0x00000200000000 (0MB)
(XEN) DEBUG find_unallocated_memory 994 s=40000000 e=7e7fffff
(XEN) DEBUG find_unallocated_memory 994 s=200000000 e=1ffffffff
(XEN) Assertion 's <= e' failed at rangeset.c:189
(XEN) ----[ Xen-4.16-rc  arm64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) PC:     0000000000220e6c rangeset_remove_range+0xbc/0x2bc
(XEN) LR:     00000000002cd508
(XEN) SP:     0000000000306f60
(XEN) CPSR:   0000000020000249 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 00008000fbf61e70  X1: 0000000200000000  X2: 00000001ffffffff
(XEN)      X3: 0000000000000005  X4: 0000000000000000  X5: 0000000000000028
(XEN)      X6: 0000000000000080  X7: fefefefefefeff09  X8: 7f7f7f7f7f7f7f7f
(XEN)      X9: 717164616f726051 X10: 7f7f7f7f7f7f7f7f X11: 0101010101010101
(XEN)     X12: 0000000000000008 X13: 0000000000000009 X14: 0000000000306cb8
(XEN)     X15: 0000000000000020 X16: 000000000028c5a8 X17: 0000000000000000
(XEN)     X18: 0180000000000000 X19: 00000001ffffffff X20: 0000000000000001
(XEN)     X21: 0000000200000000 X22: 0000000200000000 X23: 0000000000000002
(XEN)     X24: 0000000000000002 X25: 00000000003070e0 X26: 00000000002d9e68
(XEN)     X27: 00000000002d8d18 X28: 00008000fbf40000  FP: 0000000000306f60
(XEN) 
(XEN)   VTCR_EL2: 0000000080023558
(XEN)  VTTBR_EL2: 0000000000000000
(XEN) 
(XEN)  SCTLR_EL2: 0000000030cd183d
(XEN)    HCR_EL2: 0000000000000038
(XEN)  TTBR0_EL2: 0000000004b45000
(XEN) 
(XEN)    ESR_EL2: 00000000f2000001
(XEN)  HPFAR_EL2: 0000000000000000
(XEN)    FAR_EL2: 0000000000000000
(XEN) 
(XEN) Xen stack trace from sp=0000000000306f60:
(XEN)    0000000000307040 00000000002cf2a8 00008000fbf5a000 0000000000000000
(XEN)    00008000fbf40000 00000000003070a8 0000000000307de4 00000000002aa078
(XEN)    0000000880000000 0000000000000002 00000000002e8d08 00000000000fff00
(XEN)    00008000fbf61e70 00008000fbf5a000 00008000fbf61220 0000000000000000
(XEN)    0000000000307040 00000000002cf288 00008000fbf5a000 0000000000000000
(XEN)    00008000fbf40000 00000000003070a8 0000000000307de4 65782c32766e6578
(XEN)    000000000030006e 2d6e65782c6e6578 6e65780036312e34 000000006e65782c
(XEN)    0000000000307d80 00000000002d0440 00008000fbfd95a0 0000000000307dc8
(XEN)    00000000002d99b8 00000000002da338 0000000000307de4 00000000002aa078
(XEN)    000000000000000f 0000000000307110 0000000000000001 00c2010000000002
(XEN)    00000000003070c8 0000000000000000 6d933f29040f0000 0000002200000000
(XEN)    0010000000000000 0020000300000000 0020000000000000 00000000000fa000
(XEN)    0000000000000001 00008000fbf5a000 00008000fbf40000 0000000000000000
(XEN)    0000000000000002 0000000040000000 000000003e800000 0000000000000000
(XEN)    0000000200000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<0000000000220e6c>] rangeset_remove_range+0xbc/0x2bc (PC)
(XEN)    [<00000000002cd508>] domain_build.c#make_hypervisor_node+0x258/0x7f4 (LR)
(XEN)    [<00000000002cf2a8>] domain_build.c#construct_domU+0x9cc/0xa8c
(XEN)    [<00000000002d0440>] create_domUs+0xe8/0x224
(XEN)    [<00000000002d4988>] start_xen+0xafc/0xbf0
(XEN)    [<00000000002001a0>] arm64/head.o#primary_switched+0xc/0x1c
(XEN) 
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 's <= e' failed at rangeset.c:189
(XEN) ****************************************

Re: ASSERT in rangeset_remove_range
Posted by Julien Grall 2 years, 5 months ago
(+ Jan, Andrew, Wei for the common code)

On 08/11/2021 22:45, Stefano Stabellini wrote:
> Hi Oleksandr, Julien,

Hi,

> I discovered a bug caused by the recent changes to introduce extended
> regions in make_hypervisor_node (more logs appended):
> 
> 
> (XEN) d1 BANK[0] 0x00000040000000-0x0000007e800000 (1000MB)
> (XEN) d1 BANK[1] 0x00000200000000-0x00000200000000 (0MB)
> (XEN) DEBUG find_unallocated_memory 994 s=40000000 e=7e7fffff
> (XEN) DEBUG find_unallocated_memory 994 s=200000000 e=1ffffffff
> (XEN) Assertion 's <= e' failed at rangeset.c:189
> 
> 
> When a bank of memory is zero in size, then rangeset_remove_range is
> called with end < start, triggering an ASSERT in rangeset_remove_range.
> 
> One solution is to avoid creating 0 size banks. The following change
> does that:
> 
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 49b4eb2b13..3efe542d0f 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -459,9 +459,12 @@ static void __init allocate_memory(struct domain *d, struct kernel_info *kinfo)
>           goto fail;
>   
>       bank_size = MIN(GUEST_RAM1_SIZE, kinfo->unassigned_mem);
> -    if ( !allocate_bank_memory(d, kinfo, gaddr_to_gfn(GUEST_RAM1_BASE),
> +    if ( bank_size > 0 )
> +    {
> +        if ( !allocate_bank_memory(d, kinfo, gaddr_to_gfn(GUEST_RAM1_BASE),
>                                  bank_size) )
> -        goto fail;
> +            goto fail;
> +    }

I would move the size check in allocate_bank_memory().

>   
>       if ( kinfo->unassigned_mem )
>           goto fail;
> 
> 
> 
> We have a couple of other options too:
> 
> - remove the ASSERT in rangeset_remove_range
> There is an argument that we should simply return error
> fromrangeset_remove_range, rather than a full assert.

To be honest, this is a developper mistake to call with end < start. If 
we were going to return an error then we would completely hide (even in 
developper) it because we would fallback to not exposing extended regions.

So I am not sure switch from ASSERT() to a plain check is a good idea. 
Jan, Andrew, Wei, what do you think?

That said, this option would not be sufficient to fix your problem as 
extended regions will not work.

> 
> - add a if (end > start) check before calling rangeset_remove_range
> We could check that end > start before calling rangeset_remove_range at
> all the call sites in domain_build.c. There are 5 call sites at the
> moment.

I think we only want to add (end > start) where we expect the region 
size to be 0. AFAICT, the only other potential place where this can 
happens is ``find_memory_holes()`` (I vaguely recall a discussion in the 
past where some of the "reg"  property would have size == 0).

> 
> Any other ideas or suggestions?

My preference goes with your initial sugestion (so long the check is 
moved to allocate_bank_memory()).

[...]

> (XEN) Assertion 's <= e' failed at rangeset.c:189
> (XEN) ----[ Xen-4.16-rc  arm64  debug=y  Not tainted ]----
> (XEN) Xen call trace:
> (XEN)    [<0000000000220e6c>] rangeset_remove_range+0xbc/0x2bc (PC)
> (XEN)    [<00000000002cd508>] domain_build.c#make_hypervisor_node+0x258/0x7f4 (LR)
> (XEN)    [<00000000002cf2a8>] domain_build.c#construct_domU+0x9cc/0xa8c

Vanilla staging doesn't call make_hypervisor_node() from construct_domU. 
So what are you using?

> (XEN)    [<00000000002d0440>] create_domUs+0xe8/0x224
> (XEN)    [<00000000002d4988>] start_xen+0xafc/0xbf0
> (XEN)    [<00000000002001a0>] arm64/head.o#primary_switched+0xc/0x1c
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Assertion 's <= e' failed at rangeset.c:189
> (XEN) ****************************************
> 

Cheers,

-- 
Julien Grall

Re: ASSERT in rangeset_remove_range
Posted by Jan Beulich 2 years, 5 months ago
On 09.11.2021 12:58, Julien Grall wrote:
> On 08/11/2021 22:45, Stefano Stabellini wrote:
>> I discovered a bug caused by the recent changes to introduce extended
>> regions in make_hypervisor_node (more logs appended):
>>
>>
>> (XEN) d1 BANK[0] 0x00000040000000-0x0000007e800000 (1000MB)
>> (XEN) d1 BANK[1] 0x00000200000000-0x00000200000000 (0MB)
>> (XEN) DEBUG find_unallocated_memory 994 s=40000000 e=7e7fffff
>> (XEN) DEBUG find_unallocated_memory 994 s=200000000 e=1ffffffff
>> (XEN) Assertion 's <= e' failed at rangeset.c:189
>>
>>
>> When a bank of memory is zero in size, then rangeset_remove_range is
>> called with end < start, triggering an ASSERT in rangeset_remove_range.
>>
>> One solution is to avoid creating 0 size banks. The following change
>> does that:
>>
>>
>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>> index 49b4eb2b13..3efe542d0f 100644
>> --- a/xen/arch/arm/domain_build.c
>> +++ b/xen/arch/arm/domain_build.c
>> @@ -459,9 +459,12 @@ static void __init allocate_memory(struct domain *d, struct kernel_info *kinfo)
>>           goto fail;
>>   
>>       bank_size = MIN(GUEST_RAM1_SIZE, kinfo->unassigned_mem);
>> -    if ( !allocate_bank_memory(d, kinfo, gaddr_to_gfn(GUEST_RAM1_BASE),
>> +    if ( bank_size > 0 )
>> +    {
>> +        if ( !allocate_bank_memory(d, kinfo, gaddr_to_gfn(GUEST_RAM1_BASE),
>>                                  bank_size) )
>> -        goto fail;
>> +            goto fail;
>> +    }
> 
> I would move the size check in allocate_bank_memory().
> 
>>   
>>       if ( kinfo->unassigned_mem )
>>           goto fail;
>>
>>
>>
>> We have a couple of other options too:
>>
>> - remove the ASSERT in rangeset_remove_range
>> There is an argument that we should simply return error
>> fromrangeset_remove_range, rather than a full assert.
> 
> To be honest, this is a developper mistake to call with end < start. If 
> we were going to return an error then we would completely hide (even in 
> developper) it because we would fallback to not exposing extended regions.
> 
> So I am not sure switch from ASSERT() to a plain check is a good idea. 
> Jan, Andrew, Wei, what do you think?

There might be reasons to convert, but I don't think the situation here
warrants doing so.

Jan


Re: ASSERT in rangeset_remove_range
Posted by Stefano Stabellini 2 years, 5 months ago
On Tue, 9 Nov 2021, Julien Grall wrote:
> (+ Jan, Andrew, Wei for the common code)
> 
> On 08/11/2021 22:45, Stefano Stabellini wrote:
> > Hi Oleksandr, Julien,
> 
> Hi,
> 
> > I discovered a bug caused by the recent changes to introduce extended
> > regions in make_hypervisor_node (more logs appended):
> > 
> > 
> > (XEN) d1 BANK[0] 0x00000040000000-0x0000007e800000 (1000MB)
> > (XEN) d1 BANK[1] 0x00000200000000-0x00000200000000 (0MB)
> > (XEN) DEBUG find_unallocated_memory 994 s=40000000 e=7e7fffff
> > (XEN) DEBUG find_unallocated_memory 994 s=200000000 e=1ffffffff
> > (XEN) Assertion 's <= e' failed at rangeset.c:189
> > 
> > 
> > When a bank of memory is zero in size, then rangeset_remove_range is
> > called with end < start, triggering an ASSERT in rangeset_remove_range.
> > 
> > One solution is to avoid creating 0 size banks. The following change
> > does that:
> > 
> > 
> > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> > index 49b4eb2b13..3efe542d0f 100644
> > --- a/xen/arch/arm/domain_build.c
> > +++ b/xen/arch/arm/domain_build.c
> > @@ -459,9 +459,12 @@ static void __init allocate_memory(struct domain *d,
> > struct kernel_info *kinfo)
> >           goto fail;
> >         bank_size = MIN(GUEST_RAM1_SIZE, kinfo->unassigned_mem);
> > -    if ( !allocate_bank_memory(d, kinfo, gaddr_to_gfn(GUEST_RAM1_BASE),
> > +    if ( bank_size > 0 )
> > +    {
> > +        if ( !allocate_bank_memory(d, kinfo, gaddr_to_gfn(GUEST_RAM1_BASE),
> >                                  bank_size) )
> > -        goto fail;
> > +            goto fail;
> > +    }
> 
> I would move the size check in allocate_bank_memory().

Sure, I can do that


> >         if ( kinfo->unassigned_mem )
> >           goto fail;
> > 
> > 
> > 
> > We have a couple of other options too:
> > 
> > - remove the ASSERT in rangeset_remove_range
> > There is an argument that we should simply return error
> > fromrangeset_remove_range, rather than a full assert.
> 
> To be honest, this is a developper mistake to call with end < start. If we
> were going to return an error then we would completely hide (even in
> developper) it because we would fallback to not exposing extended regions.
> 
> So I am not sure switch from ASSERT() to a plain check is a good idea. Jan,
> Andrew, Wei, what do you think?
> 
> That said, this option would not be sufficient to fix your problem as extended
> regions will not work.
> 
> > 
> > - add a if (end > start) check before calling rangeset_remove_range
> > We could check that end > start before calling rangeset_remove_range at
> > all the call sites in domain_build.c. There are 5 call sites at the
> > moment.
> 
> I think we only want to add (end > start) where we expect the region size to
> be 0. AFAICT, the only other potential place where this can happens is
> ``find_memory_holes()`` (I vaguely recall a discussion in the past where some
> of the "reg"  property would have size == 0).
> 
> > 
> > Any other ideas or suggestions?
> 
> My preference goes with your initial sugestion (so long the check is moved to
> allocate_bank_memory()).
> 
> [...]
> 
> > (XEN) Assertion 's <= e' failed at rangeset.c:189
> > (XEN) ----[ Xen-4.16-rc  arm64  debug=y  Not tainted ]----
> > (XEN) Xen call trace:
> > (XEN)    [<0000000000220e6c>] rangeset_remove_range+0xbc/0x2bc (PC)
> > (XEN)    [<00000000002cd508>]
> > domain_build.c#make_hypervisor_node+0x258/0x7f4 (LR)
> > (XEN)    [<00000000002cf2a8>] domain_build.c#construct_domU+0x9cc/0xa8c
> 
> Vanilla staging doesn't call make_hypervisor_node() from construct_domU. So
> what are you using?

Well spotted. This is my WIP branch with PV drivers support for Dom0less
guests (soon to be sent to xen-devel). The underlying bug could affect
vanilla Xen too as it only takes a zero-size bank to trigger it, but it
is certainly harder to reproduce because make_hypervisor_node is only
called for Dom0 and allocate_bank_memory (the function that today always
adds a zero size bank) is called for DomUs.


> > (XEN)    [<00000000002d0440>] create_domUs+0xe8/0x224
> > (XEN)    [<00000000002d4988>] start_xen+0xafc/0xbf0
> > (XEN)    [<00000000002001a0>] arm64/head.o#primary_switched+0xc/0x1c
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) Assertion 's <= e' failed at rangeset.c:189
> > (XEN) ****************************************

Re: ASSERT in rangeset_remove_range
Posted by Oleksandr Tyshchenko 2 years, 5 months ago
On Tue, Nov 9, 2021 at 12:45 AM Stefano Stabellini <sstabellini@kernel.org>
wrote:

> Hi Oleksandr, Julien,
>

Hi Stefano, Julien.

[Sorry for the possible format issues]


>
> I discovered a bug caused by the recent changes to introduce extended
> regions in make_hypervisor_node (more logs appended):
>
>
> (XEN) d1 BANK[0] 0x00000040000000-0x0000007e800000 (1000MB)
> (XEN) d1 BANK[1] 0x00000200000000-0x00000200000000 (0MB)
> (XEN) DEBUG find_unallocated_memory 994 s=40000000 e=7e7fffff
> (XEN) DEBUG find_unallocated_memory 994 s=200000000 e=1ffffffff
> (XEN) Assertion 's <= e' failed at rangeset.c:189
>
>
> When a bank of memory is zero in size, then rangeset_remove_range is
> called with end < start, triggering an ASSERT in rangeset_remove_range.
>
> One solution is to avoid creating 0 size banks. The following change
> does that:
>
>
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 49b4eb2b13..3efe542d0f 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -459,9 +459,12 @@ static void __init allocate_memory(struct domain *d,
> struct kernel_info *kinfo)
>          goto fail;
>
>      bank_size = MIN(GUEST_RAM1_SIZE, kinfo->unassigned_mem);
> -    if ( !allocate_bank_memory(d, kinfo, gaddr_to_gfn(GUEST_RAM1_BASE),
> +    if ( bank_size > 0 )
> +    {
> +        if ( !allocate_bank_memory(d, kinfo,
> gaddr_to_gfn(GUEST_RAM1_BASE),
>                                 bank_size) )
> -        goto fail;
> +            goto fail;
> +    }
>
>      if ( kinfo->unassigned_mem )
>          goto fail;
>
>
>
> We have a couple of other options too:
>
> - remove the ASSERT in rangeset_remove_range
> There is an argument that we should simply return error
> fromrangeset_remove_range, rather than a full assert.
>
> - add a if (end > start) check before calling rangeset_remove_range
> We could check that end > start before calling rangeset_remove_range at
> all the call sites in domain_build.c. There are 5 call sites at the
> moment.
>
> Any other ideas or suggestions?
>


Personally I would avoid creating zero-sized banks (your first solution) as
I don't see any point in taking them into the account and exposing them to
the guest.

What I don't understand is how this assert is triggered by upstream Xen
(how the make_hypervisor_node() gets called for DomU)? Do you have some
changes on top?



>
> Cheers,
>
> Stefano
>
>
>
> (XEN) DEBUG find_memory_holes 1126 s=ff990000 e=ff990fff
> (XEN) DEBUG find_memory_holes 1126 s=ff990000 e=ff990fff
> (XEN) DEBUG find_memory_holes 1126 s=ff990000 e=ff990fff
> (XEN) DEBUG find_memory_holes 1126 s=ff990000 e=ff990fff
> (XEN) DEBUG find_memory_holes 1126 s=f9010000 e=f901ffff
> (XEN) DEBUG find_memory_holes 1126 s=f9020000 e=f903ffff
> (XEN) DEBUG find_memory_holes 1126 s=f9040000 e=f905ffff
> (XEN) DEBUG find_memory_holes 1126 s=f9060000 e=f907ffff
> (XEN) DEBUG find_memory_holes 1126 s=fd800000 e=fd81ffff
> (XEN) DEBUG find_memory_holes 1126 s=ff060000 e=ff060fff
> (XEN) DEBUG find_memory_holes 1126 s=ff070000 e=ff070fff
> (XEN) DEBUG find_memory_holes 1126 s=fd6e0000 e=fd6e8fff
> (XEN) DEBUG find_memory_holes 1126 s=fd6e9000 e=fd6edfff
> (XEN) DEBUG find_memory_holes 1126 s=fd500000 e=fd500fff
> (XEN) DEBUG find_memory_holes 1126 s=fd510000 e=fd510fff
> (XEN) DEBUG find_memory_holes 1126 s=fd520000 e=fd520fff
> (XEN) DEBUG find_memory_holes 1126 s=fd530000 e=fd530fff
> (XEN) DEBUG find_memory_holes 1126 s=fd540000 e=fd540fff
> (XEN) DEBUG find_memory_holes 1126 s=fd550000 e=fd550fff
> (XEN) DEBUG find_memory_holes 1126 s=fd560000 e=fd560fff
> (XEN) DEBUG find_memory_holes 1126 s=fd570000 e=fd570fff
> (XEN) DEBUG find_memory_holes 1126 s=fd4b0000 e=fd4bffff
> (XEN) DEBUG find_memory_holes 1126 s=ffa80000 e=ffa80fff
> (XEN) DEBUG find_memory_holes 1126 s=ffa90000 e=ffa90fff
> (XEN) DEBUG find_memory_holes 1126 s=ffaa0000 e=ffaa0fff
> (XEN) DEBUG find_memory_holes 1126 s=ffab0000 e=ffab0fff
> (XEN) DEBUG find_memory_holes 1126 s=ffac0000 e=ffac0fff
> (XEN) DEBUG find_memory_holes 1126 s=ffad0000 e=ffad0fff
> (XEN) DEBUG find_memory_holes 1126 s=ffae0000 e=ffae0fff
> (XEN) DEBUG find_memory_holes 1126 s=ffaf0000 e=ffaf0fff
> (XEN) DEBUG find_memory_holes 1126 s=fd070000 e=fd09ffff
> (XEN) DEBUG find_memory_holes 1126 s=ff100000 e=ff100fff
> (XEN) DEBUG find_memory_holes 1126 s=ff0b0000 e=ff0b0fff
> (XEN) DEBUG find_memory_holes 1126 s=ff0c0000 e=ff0c0fff
> (XEN) DEBUG find_memory_holes 1126 s=ff0d0000 e=ff0d0fff
> (XEN) DEBUG find_memory_holes 1126 s=ff0e0000 e=ff0e0fff
> (XEN) DEBUG find_memory_holes 1126 s=ff0a0000 e=ff0a0fff
> (XEN) DEBUG find_memory_holes 1126 s=ff020000 e=ff020fff
> (XEN) DEBUG find_memory_holes 1126 s=ff030000 e=ff030fff
> (XEN) DEBUG find_memory_holes 1126 s=ff960000 e=ff960fff
> (XEN) DEBUG find_memory_holes 1126 s=ffa00000 e=ffa0ffff
> (XEN) DEBUG find_memory_holes 1126 s=fd0b0000 e=fd0bffff
> (XEN) DEBUG find_memory_holes 1126 s=fd490000 e=fd49ffff
> (XEN) DEBUG find_memory_holes 1126 s=ffa10000 e=ffa1ffff
> (XEN) DEBUG find_memory_holes 1126 s=fd0e0000 e=fd0e0fff
> (XEN) DEBUG find_memory_holes 1126 s=fd480000 e=fd480fff
> (XEN) DEBUG find_memory_holes 1126 s=8000000000 e=8000ffffff
> (XEN) DEBUG handle_pci_range 1056 s=e0000000 e=efffffff
> (XEN) DEBUG handle_pci_range 1056 s=600000000 e=7ffffffff
> (XEN) DEBUG find_memory_holes 1126 s=ff0f0000 e=ff0f0fff
> (XEN) DEBUG find_memory_holes 1126 s=c0000000 e=c7ffffff
> (XEN) DEBUG find_memory_holes 1126 s=ffa60000 e=ffa60fff
> (XEN) DEBUG find_memory_holes 1126 s=fd400000 e=fd43ffff
> (XEN) DEBUG find_memory_holes 1126 s=fd3d0000 e=fd3d0fff
> (XEN) DEBUG find_memory_holes 1126 s=fd0c0000 e=fd0c1fff
> (XEN) DEBUG find_memory_holes 1126 s=ff160000 e=ff160fff
> (XEN) DEBUG find_memory_holes 1126 s=ff170000 e=ff170fff
> (XEN) DEBUG find_memory_holes 1126 s=ff040000 e=ff040fff
> (XEN) DEBUG find_memory_holes 1126 s=ff050000 e=ff050fff
> (XEN) DEBUG find_memory_holes 1126 s=ff110000 e=ff110fff
> (XEN) DEBUG find_memory_holes 1126 s=ff120000 e=ff120fff
> (XEN) DEBUG find_memory_holes 1126 s=ff130000 e=ff130fff
> (XEN) DEBUG find_memory_holes 1126 s=ff140000 e=ff140fff
> (XEN) DEBUG find_memory_holes 1126 s=ff000000 e=ff000fff
> (XEN) DEBUG find_memory_holes 1126 s=ff010000 e=ff010fff
> (XEN) DEBUG find_memory_holes 1126 s=ff9d0000 e=ff9d0fff
> (XEN) DEBUG find_memory_holes 1126 s=fe200000 e=fe23ffff
> (XEN) DEBUG find_memory_holes 1126 s=ff9e0000 e=ff9e0fff
> (XEN) DEBUG find_memory_holes 1126 s=fe300000 e=fe33ffff
> (XEN) DEBUG find_memory_holes 1126 s=fd4d0000 e=fd4d0fff
> (XEN) DEBUG find_memory_holes 1126 s=ff150000 e=ff150fff
> (XEN) DEBUG find_memory_holes 1126 s=ffa50000 e=ffa50fff
> (XEN) DEBUG find_memory_holes 1126 s=ffa50000 e=ffa50fff
> (XEN) DEBUG find_memory_holes 1126 s=ffa50000 e=ffa50fff
> (XEN) DEBUG find_memory_holes 1126 s=fd4c0000 e=fd4c0fff
> (XEN) DEBUG find_memory_holes 1126 s=fd4a0000 e=fd4a0fff
> (XEN) DEBUG find_memory_holes 1126 s=fd4aa000 e=fd4aafff
> (XEN) DEBUG find_memory_holes 1126 s=fd4ab000 e=fd4abfff
> (XEN) DEBUG find_memory_holes 1126 s=fd4ac000 e=fd4acfff
> (XEN) DEBUG find_memory_holes 1126 s=0 e=7fefffff
> (XEN) DEBUG find_memory_holes 1126 s=800000000 e=87fffffff
> (XEN) Extended region 0: 0x80000000->0xc0000000
> (XEN) Extended region 1: 0xc8000000->0xe0000000
> (XEN) Extended region 2: 0xf0000000->0xf9000000
> (XEN) Extended region 3: 0xffc00000->0x600000000
> (XEN) Extended region 4: 0x880000000->0x8000000000
> (XEN) Extended region 5: 0x8001000000->0x10000000000
> (XEN) Loading zImage from 0000000000e00000 to
> 0000000020000000-0000000021367a00
> (XEN) Loading d0 initrd from 0000000002200000 to
> 0x0000000028200000-0x00000000293f936d
> (XEN) Loading d0 DTB to 0x0000000028000000-0x0000000028009604
> (XEN) *** LOADING DOMU cpus=1 memory=fa000KB ***
> (XEN) Loading d1 kernel from boot module @ 0000000003400000
> (XEN) Loading ramdisk from boot module @ 0000000004800000
> (XEN) Allocating mappings totalling 1000MB for d1:
> (XEN) d1 BANK[0] 0x00000040000000-0x0000007e800000 (1000MB)
> (XEN) d1 BANK[1] 0x00000200000000-0x00000200000000 (0MB)
> (XEN) DEBUG find_unallocated_memory 994 s=40000000 e=7e7fffff
> (XEN) DEBUG find_unallocated_memory 994 s=200000000 e=1ffffffff
> (XEN) Assertion 's <= e' failed at rangeset.c:189
> (XEN) ----[ Xen-4.16-rc  arm64  debug=y  Not tainted ]----
> (XEN) CPU:    0
> (XEN) PC:     0000000000220e6c rangeset_remove_range+0xbc/0x2bc
> (XEN) LR:     00000000002cd508
> (XEN) SP:     0000000000306f60
> (XEN) CPSR:   0000000020000249 MODE:64-bit EL2h (Hypervisor, handler)
> (XEN)      X0: 00008000fbf61e70  X1: 0000000200000000  X2: 00000001ffffffff
> (XEN)      X3: 0000000000000005  X4: 0000000000000000  X5: 0000000000000028
> (XEN)      X6: 0000000000000080  X7: fefefefefefeff09  X8: 7f7f7f7f7f7f7f7f
> (XEN)      X9: 717164616f726051 X10: 7f7f7f7f7f7f7f7f X11: 0101010101010101
> (XEN)     X12: 0000000000000008 X13: 0000000000000009 X14: 0000000000306cb8
> (XEN)     X15: 0000000000000020 X16: 000000000028c5a8 X17: 0000000000000000
> (XEN)     X18: 0180000000000000 X19: 00000001ffffffff X20: 0000000000000001
> (XEN)     X21: 0000000200000000 X22: 0000000200000000 X23: 0000000000000002
> (XEN)     X24: 0000000000000002 X25: 00000000003070e0 X26: 00000000002d9e68
> (XEN)     X27: 00000000002d8d18 X28: 00008000fbf40000  FP: 0000000000306f60
> (XEN)
> (XEN)   VTCR_EL2: 0000000080023558
> (XEN)  VTTBR_EL2: 0000000000000000
> (XEN)
> (XEN)  SCTLR_EL2: 0000000030cd183d
> (XEN)    HCR_EL2: 0000000000000038
> (XEN)  TTBR0_EL2: 0000000004b45000
> (XEN)
> (XEN)    ESR_EL2: 00000000f2000001
> (XEN)  HPFAR_EL2: 0000000000000000
> (XEN)    FAR_EL2: 0000000000000000
> (XEN)
> (XEN) Xen stack trace from sp=0000000000306f60:
> (XEN)    0000000000307040 00000000002cf2a8 00008000fbf5a000
> 0000000000000000
> (XEN)    00008000fbf40000 00000000003070a8 0000000000307de4
> 00000000002aa078
> (XEN)    0000000880000000 0000000000000002 00000000002e8d08
> 00000000000fff00
> (XEN)    00008000fbf61e70 00008000fbf5a000 00008000fbf61220
> 0000000000000000
> (XEN)    0000000000307040 00000000002cf288 00008000fbf5a000
> 0000000000000000
> (XEN)    00008000fbf40000 00000000003070a8 0000000000307de4
> 65782c32766e6578
> (XEN)    000000000030006e 2d6e65782c6e6578 6e65780036312e34
> 000000006e65782c
> (XEN)    0000000000307d80 00000000002d0440 00008000fbfd95a0
> 0000000000307dc8
> (XEN)    00000000002d99b8 00000000002da338 0000000000307de4
> 00000000002aa078
> (XEN)    000000000000000f 0000000000307110 0000000000000001
> 00c2010000000002
> (XEN)    00000000003070c8 0000000000000000 6d933f29040f0000
> 0000002200000000
> (XEN)    0010000000000000 0020000300000000 0020000000000000
> 00000000000fa000
> (XEN)    0000000000000001 00008000fbf5a000 00008000fbf40000
> 0000000000000000
> (XEN)    0000000000000002 0000000040000000 000000003e800000
> 0000000000000000
> (XEN)    0000000200000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<0000000000220e6c>] rangeset_remove_range+0xbc/0x2bc (PC)
> (XEN)    [<00000000002cd508>]
> domain_build.c#make_hypervisor_node+0x258/0x7f4 (LR)
> (XEN)    [<00000000002cf2a8>] domain_build.c#construct_domU+0x9cc/0xa8c
> (XEN)    [<00000000002d0440>] create_domUs+0xe8/0x224
> (XEN)    [<00000000002d4988>] start_xen+0xafc/0xbf0
> (XEN)    [<00000000002001a0>] arm64/head.o#primary_switched+0xc/0x1c
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Assertion 's <= e' failed at rangeset.c:189
> (XEN) ****************************************
>
>

-- 
Regards,

Oleksandr Tyshchenko