RE: static-mem preventing dom0 from booting

Stefano Stabellini posted 1 patch 2 years, 4 months ago
Failed in applying to current master (apply log)
RE: static-mem preventing dom0 from booting
Posted by Stefano Stabellini 2 years, 4 months ago
On Fri, 5 Nov 2021, Stefano Stabellini wrote:
> The scenario is extremely simple; you can see the full device tree
> configuration in the attachment to my previous email.
> 
> - dom0
> - dom0less domU with static-mem
> 
> That's it! So basically it is just a normal dom0 + dom0less domU
> configuration, which already works fine, where I added static-mem to the
> domU and suddenly dom0 (not the domU!) stopped working.

I did some more debugging today and I found the problem. The issue is
that static-mem regions are added to the list of reserved_mem. However,
reserved_mem is automatically assigned to Dom0 by default at the bottom
of xen/arch/arm/domain_build.c:handle_node, see the second call to
make_memory_node. Really, we shouldn't give to dom0 static-mem ranges
meant for other domUs. E.g. the following change is sufficient to solve
the problem:

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 88a79247cb..dc609c4f0e 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -891,6 +891,9 @@ static int __init make_memory_node(const struct domain *d,
         u64 start = mem->bank[i].start;
         u64 size = mem->bank[i].size;
 
+        if ( mem->bank[i].xen_domain )
+            continue;
+
         dt_dprintk("  Bank %d: %#"PRIx64"->%#"PRIx64"\n",
                    i, start, start + size);
 
However, maybe a better fix would be to purge reserved_mem of any
xen_domain items before calling make_memory_node.


I found one additional issue regarding is_domain_direct_mapped which
doesn't return true for static-mem domains. I think we need to add a
direct_map bool to arch_domain and set it for both dom0 and static-mem
dom0less domUs, so that we can change the implementation of
is_domain_direct_mapped to:

#define is_domain_direct_mapped(d) (d->arch.direct_map)

RE: static-mem preventing dom0 from booting
Posted by Penny Zheng 2 years, 4 months ago
Hi Stefano

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: Saturday, November 6, 2021 7:03 AM
> To: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org;
> Wei Chen <Wei.Chen@arm.com>; Bertrand Marquis
> <Bertrand.Marquis@arm.com>
> Subject: RE: static-mem preventing dom0 from booting
> 
> On Fri, 5 Nov 2021, Stefano Stabellini wrote:
> > The scenario is extremely simple; you can see the full device tree
> > configuration in the attachment to my previous email.
> >
> > - dom0
> > - dom0less domU with static-mem
> >
> > That's it! So basically it is just a normal dom0 + dom0less domU
> > configuration, which already works fine, where I added static-mem to
> > the domU and suddenly dom0 (not the domU!) stopped working.
> 

Got it. Sorry, I missed the scenario you are talking about... I simply think what dom0less
means that dom0 is absent... ;/
  
> I did some more debugging today and I found the problem. The issue is that
> static-mem regions are added to the list of reserved_mem. However,
> reserved_mem is automatically assigned to Dom0 by default at the bottom of
> xen/arch/arm/domain_build.c:handle_node, see the second call to
> make_memory_node. Really, we shouldn't give to dom0 static-mem ranges
> meant for other domUs. E.g. the following change is sufficient to solve the
> problem:
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 88a79247cb..dc609c4f0e 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -891,6 +891,9 @@ static int __init make_memory_node(const struct
> domain *d,
>          u64 start = mem->bank[i].start;
>          u64 size = mem->bank[i].size;
> 
> +        if ( mem->bank[i].xen_domain )
> +            continue;
> +
>          dt_dprintk("  Bank %d: %#"PRIx64"->%#"PRIx64"\n",
>                     i, start, start + size);
> 
> However, maybe a better fix would be to purge reserved_mem of any
> xen_domain items before calling make_memory_node.
> 
> 
> I found one additional issue regarding is_domain_direct_mapped which
> doesn't return true for static-mem domains. I think we need to add a
> direct_map bool to arch_domain and set it for both dom0 and static-mem
> dom0less domUs, so that we can change the implementation of
> is_domain_direct_mapped to:
> 
> #define is_domain_direct_mapped(d) (d->arch.direct_map)

Yeah, I already pushed a patch serie regarding direct-map to community for review, 
and it is also based on your old direct-map serie. 

Today, I may push direct-map version 3 to community for review~~~ If you're free, plz take a look.

Cheers,
Penny Zheng

RE: static-mem preventing dom0 from booting
Posted by Stefano Stabellini 2 years, 4 months ago
+ Julien, Ian and others

On Fri, 5 Nov 2021, Stefano Stabellini wrote:
> On Fri, 5 Nov 2021, Stefano Stabellini wrote:
> > The scenario is extremely simple; you can see the full device tree
> > configuration in the attachment to my previous email.
> > 
> > - dom0
> > - dom0less domU with static-mem
> > 
> > That's it! So basically it is just a normal dom0 + dom0less domU
> > configuration, which already works fine, where I added static-mem to the
> > domU and suddenly dom0 (not the domU!) stopped working.
> 
> I did some more debugging today and I found the problem. The issue is
> that static-mem regions are added to the list of reserved_mem. However,
> reserved_mem is automatically assigned to Dom0 by default at the bottom
> of xen/arch/arm/domain_build.c:handle_node, see the second call to
> make_memory_node. Really, we shouldn't give to dom0 static-mem ranges
> meant for other domUs. E.g. the following change is sufficient to solve
> the problem:
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 88a79247cb..dc609c4f0e 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -891,6 +891,9 @@ static int __init make_memory_node(const struct domain *d,
>          u64 start = mem->bank[i].start;
>          u64 size = mem->bank[i].size;
>  
> +        if ( mem->bank[i].xen_domain )
> +            continue;
> +
>          dt_dprintk("  Bank %d: %#"PRIx64"->%#"PRIx64"\n",
>                     i, start, start + size);
>  
> However, maybe a better fix would be to purge reserved_mem of any
> xen_domain items before calling make_memory_node.
> 
> 
> I found one additional issue regarding is_domain_direct_mapped which
> doesn't return true for static-mem domains. I think we need to add a
> direct_map bool to arch_domain and set it for both dom0 and static-mem
> dom0less domUs, so that we can change the implementation of
> is_domain_direct_mapped to:
> 
> #define is_domain_direct_mapped(d) (d->arch.direct_map)
> 

Re: static-mem preventing dom0 from booting
Posted by Julien Grall 2 years, 4 months ago
Hi Stefano,

On 05/11/2021 23:05, Stefano Stabellini wrote:
> On Fri, 5 Nov 2021, Stefano Stabellini wrote:
>> On Fri, 5 Nov 2021, Stefano Stabellini wrote:
>>> The scenario is extremely simple; you can see the full device tree
>>> configuration in the attachment to my previous email.
>>>
>>> - dom0
>>> - dom0less domU with static-mem
>>>
>>> That's it! So basically it is just a normal dom0 + dom0less domU
>>> configuration, which already works fine, where I added static-mem to the
>>> domU and suddenly dom0 (not the domU!) stopped working.
>>
>> I did some more debugging today and I found the problem. The issue is
>> that static-mem regions are added to the list of reserved_mem. However,
>> reserved_mem is automatically assigned to Dom0 by default at the bottom
>> of xen/arch/arm/domain_build.c:handle_node, see the second call to
>> make_memory_node. Really, we shouldn't give to dom0 static-mem ranges
>> meant for other domUs. E.g. the following change is sufficient to solve
>> the problem:
>>
>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>> index 88a79247cb..dc609c4f0e 100644
>> --- a/xen/arch/arm/domain_build.c
>> +++ b/xen/arch/arm/domain_build.c
>> @@ -891,6 +891,9 @@ static int __init make_memory_node(const struct domain *d,
>>           u64 start = mem->bank[i].start;
>>           u64 size = mem->bank[i].size;
>>   
>> +        if ( mem->bank[i].xen_domain )
>> +            continue;
>> +
>>           dt_dprintk("  Bank %d: %#"PRIx64"->%#"PRIx64"\n",
>>                      i, start, start + size);
>>   
>> However, maybe a better fix would be to purge reserved_mem of any
>> xen_domain items before calling make_memory_node.

I would rather not modify boot_info.reserved_mem because it may be used 
afterwards. I think your approach is the right one.

Alternatively, we would rework make_memory_node() to create one node per 
range (rather than a node with multiple ranges). This would move the 
loop outside of make_memory_node(). The advantage is we have more 
flexibily how on to filter ranges (in the future we may need to pass 
some reserved ranges to a domain).

>>
>>
>> I found one additional issue regarding is_domain_direct_mapped which
>> doesn't return true for static-mem domains. I think we need to add a
>> direct_map bool to arch_domain and set it for both dom0 and static-mem
>> dom0less domUs, so that we can change the implementation of
>> is_domain_direct_mapped to:
>>
>> #define is_domain_direct_mapped(d) (d->arch.direct_map)

In Xen 4.16, static-mem domains are not direct mapped (i.e MFN == GFN). 
Instead, the static memory is used to allocate memory for the domain at 
the default regions in the guest memory layout.

If you want to direct map static-mem domains, then you would have to 
apply [1] from Penny which is still under review.

Cheers,

[1] <20211015030945.2082898-1-penny.zheng@arm.com>

-- 
Julien Grall

Re: static-mem preventing dom0 from booting
Posted by Stefano Stabellini 2 years, 4 months ago
On Sat, 6 Nov 2021, Julien Grall wrote:
> Hi Stefano,
> 
> On 05/11/2021 23:05, Stefano Stabellini wrote:
> > On Fri, 5 Nov 2021, Stefano Stabellini wrote:
> > > On Fri, 5 Nov 2021, Stefano Stabellini wrote:
> > > > The scenario is extremely simple; you can see the full device tree
> > > > configuration in the attachment to my previous email.
> > > > 
> > > > - dom0
> > > > - dom0less domU with static-mem
> > > > 
> > > > That's it! So basically it is just a normal dom0 + dom0less domU
> > > > configuration, which already works fine, where I added static-mem to the
> > > > domU and suddenly dom0 (not the domU!) stopped working.
> > > 
> > > I did some more debugging today and I found the problem. The issue is
> > > that static-mem regions are added to the list of reserved_mem. However,
> > > reserved_mem is automatically assigned to Dom0 by default at the bottom
> > > of xen/arch/arm/domain_build.c:handle_node, see the second call to
> > > make_memory_node. Really, we shouldn't give to dom0 static-mem ranges
> > > meant for other domUs. E.g. the following change is sufficient to solve
> > > the problem:
> > > 
> > > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> > > index 88a79247cb..dc609c4f0e 100644
> > > --- a/xen/arch/arm/domain_build.c
> > > +++ b/xen/arch/arm/domain_build.c
> > > @@ -891,6 +891,9 @@ static int __init make_memory_node(const struct domain
> > > *d,
> > >           u64 start = mem->bank[i].start;
> > >           u64 size = mem->bank[i].size;
> > >   +        if ( mem->bank[i].xen_domain )
> > > +            continue;
> > > +
> > >           dt_dprintk("  Bank %d: %#"PRIx64"->%#"PRIx64"\n",
> > >                      i, start, start + size);
> > >   However, maybe a better fix would be to purge reserved_mem of any
> > > xen_domain items before calling make_memory_node.
> 
> I would rather not modify boot_info.reserved_mem because it may be used
> afterwards. I think your approach is the right one.
> 
> Alternatively, we would rework make_memory_node() to create one node per range
> (rather than a node with multiple ranges). This would move the loop outside of
> make_memory_node(). The advantage is we have more flexibily how on to filter
> ranges (in the future we may need to pass some reserved ranges to a domain).

Thanks for the quick feedback, I'll send a proper patch. I'll follow the
first approach for now.


> > > 
> > > I found one additional issue regarding is_domain_direct_mapped which
> > > doesn't return true for static-mem domains. I think we need to add a
> > > direct_map bool to arch_domain and set it for both dom0 and static-mem
> > > dom0less domUs, so that we can change the implementation of
> > > is_domain_direct_mapped to:
> > > 
> > > #define is_domain_direct_mapped(d) (d->arch.direct_map)
> 
> In Xen 4.16, static-mem domains are not direct mapped (i.e MFN == GFN).
> Instead, the static memory is used to allocate memory for the domain at the
> default regions in the guest memory layout.

I see, I forgot that the memory is not already mapped 1:1. 


> If you want to direct map static-mem domains, then you would have to apply [1]
> from Penny which is still under review.
> 
> Cheers,
> 
> [1] <20211015030945.2082898-1-penny.zheng@arm.com>