[PATCH 04/22] x86/mm: ensure L4 idle_pg_table is not modified past boot

Roger Pau Monne posted 22 patches 1 month, 3 weeks ago
[PATCH 04/22] x86/mm: ensure L4 idle_pg_table is not modified past boot
Posted by Roger Pau Monne 1 month, 3 weeks ago
The idle_pg_table L4 is cloned to create all the other L4 Xen uses, and hence
it shouldn't be modified once further L4 are created.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/mm.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 6ffacab341ad..01380fd82c9d 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5023,6 +5023,12 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
         mfn_t l3mfn;
         l3_pgentry_t *l3t = alloc_mapped_pagetable(&l3mfn);
 
+        /*
+         * dom0 is build at smp_boot, at which point we already create new L4s
+         * based on idle_pg_table.
+         */
+        BUG_ON(system_state >= SYS_STATE_smp_boot);
+
         if ( !l3t )
             return NULL;
         UNMAP_DOMAIN_PAGE(l3t);
-- 
2.45.2


Re: [PATCH 04/22] x86/mm: ensure L4 idle_pg_table is not modified past boot
Posted by Jan Beulich 1 month ago
On 26.07.2024 17:21, Roger Pau Monne wrote:
> The idle_pg_table L4 is cloned to create all the other L4 Xen uses, and hence
> it shouldn't be modified once further L4 are created.

Yes, but the window between moving into SYS_STATE_smp_boot and Dom0 having
its initial page tables created is quite large. If the justification was
relative to AP bringup, that may be all fine. But when related to cloning,
I think that would then truly want keying to there being any non-system
domain(s).

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5023,6 +5023,12 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
>          mfn_t l3mfn;
>          l3_pgentry_t *l3t = alloc_mapped_pagetable(&l3mfn);
>  
> +        /*
> +         * dom0 is build at smp_boot, at which point we already create new L4s
> +         * based on idle_pg_table.
> +         */
> +        BUG_ON(system_state >= SYS_STATE_smp_boot);

Which effectively means most of this function could become __init (e.g. by
moving into a helper). We'd then hit the BUG_ON() prior to init_done()
destroying the .init.* mappings, and we'd simply #PF afterwards. That's
not so much for the space savings in .text, but to document the limited
lifetime of the (helper) function directly in its function head.

I further wonder whether in such a case the enclosing if() wouldn't want
to gain unlikely() at the same time.

Jan
Re: [PATCH 04/22] x86/mm: ensure L4 idle_pg_table is not modified past boot
Posted by Roger Pau Monné 6 days, 10 hours ago
On Tue, Aug 13, 2024 at 05:54:54PM +0200, Jan Beulich wrote:
> On 26.07.2024 17:21, Roger Pau Monne wrote:
> > The idle_pg_table L4 is cloned to create all the other L4 Xen uses, and hence
> > it shouldn't be modified once further L4 are created.
> 
> Yes, but the window between moving into SYS_STATE_smp_boot and Dom0 having
> its initial page tables created is quite large. If the justification was
> relative to AP bringup, that may be all fine. But when related to cloning,
> I think that would then truly want keying to there being any non-system
> domain(s).

Further changes in this series will add a per-CPU idle page table, and
hence we need to ensure that by the time APs are started the BSP L4 idle
page directory is not changed, as otherwise the copies in the APs
would get out of sync.

The idle system domain is indeed tied to the idle page talbes, but the
idle vCPU0 (the BSP) directly uses idle_pg_table (no copying), and
hence it's fine to allow modifications of the L4 idle page table
directory up to when APs are started (those will indeed make copies of
the idle L4.

> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -5023,6 +5023,12 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
> >          mfn_t l3mfn;
> >          l3_pgentry_t *l3t = alloc_mapped_pagetable(&l3mfn);
> >  
> > +        /*
> > +         * dom0 is build at smp_boot, at which point we already create new L4s
> > +         * based on idle_pg_table.
> > +         */
> > +        BUG_ON(system_state >= SYS_STATE_smp_boot);
> 
> Which effectively means most of this function could become __init (e.g. by
> moving into a helper). We'd then hit the BUG_ON() prior to init_done()
> destroying the .init.* mappings, and we'd simply #PF afterwards. That's
> not so much for the space savings in .text, but to document the limited
> lifetime of the (helper) function directly in its function head.

IMO the BUG_ON() is clearer to debug, but I won't mind splitting the
logic inside the if body into a separate helper.

> I further wonder whether in such a case the enclosing if() wouldn't want
> to gain unlikely() at the same time.

Yes, I can certainly add that.

Thanks, Roger.
Re: [PATCH 04/22] x86/mm: ensure L4 idle_pg_table is not modified past boot
Posted by Jan Beulich 6 days, 10 hours ago
On 10.09.2024 10:54, Roger Pau Monné wrote:
> On Tue, Aug 13, 2024 at 05:54:54PM +0200, Jan Beulich wrote:
>> On 26.07.2024 17:21, Roger Pau Monne wrote:
>>> The idle_pg_table L4 is cloned to create all the other L4 Xen uses, and hence
>>> it shouldn't be modified once further L4 are created.
>>
>> Yes, but the window between moving into SYS_STATE_smp_boot and Dom0 having
>> its initial page tables created is quite large. If the justification was
>> relative to AP bringup, that may be all fine. But when related to cloning,
>> I think that would then truly want keying to there being any non-system
>> domain(s).
> 
> Further changes in this series will add a per-CPU idle page table, and
> hence we need to ensure that by the time APs are started the BSP L4 idle
> page directory is not changed, as otherwise the copies in the APs
> would get out of sync.
> 
> The idle system domain is indeed tied to the idle page talbes, but the
> idle vCPU0 (the BSP) directly uses idle_pg_table (no copying), and
> hence it's fine to allow modifications of the L4 idle page table
> directory up to when APs are started (those will indeed make copies of
> the idle L4.

Which may want at least mentioning in the description then. I take it
that ...

>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -5023,6 +5023,12 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
>>>          mfn_t l3mfn;
>>>          l3_pgentry_t *l3t = alloc_mapped_pagetable(&l3mfn);
>>>  
>>> +        /*
>>> +         * dom0 is build at smp_boot, at which point we already create new L4s
>>> +         * based on idle_pg_table.
>>> +         */

... this comment is then refined by the later patches you refer to?

>>> +        BUG_ON(system_state >= SYS_STATE_smp_boot);
>>
>> Which effectively means most of this function could become __init (e.g. by
>> moving into a helper). We'd then hit the BUG_ON() prior to init_done()
>> destroying the .init.* mappings, and we'd simply #PF afterwards. That's
>> not so much for the space savings in .text, but to document the limited
>> lifetime of the (helper) function directly in its function head.
> 
> IMO the BUG_ON() is clearer to debug,

Fair point - it's indeed a balance between two possible goals. I guess ...

> but I won't mind splitting the
> logic inside the if body into a separate helper.

... simply keep it as you have it.

Jan

Re: [PATCH 04/22] x86/mm: ensure L4 idle_pg_table is not modified past boot
Posted by Roger Pau Monné 6 days, 9 hours ago
On Tue, Sep 10, 2024 at 11:00:27AM +0200, Jan Beulich wrote:
> On 10.09.2024 10:54, Roger Pau Monné wrote:
> > On Tue, Aug 13, 2024 at 05:54:54PM +0200, Jan Beulich wrote:
> >> On 26.07.2024 17:21, Roger Pau Monne wrote:
> >>> The idle_pg_table L4 is cloned to create all the other L4 Xen uses, and hence
> >>> it shouldn't be modified once further L4 are created.
> >>
> >> Yes, but the window between moving into SYS_STATE_smp_boot and Dom0 having
> >> its initial page tables created is quite large. If the justification was
> >> relative to AP bringup, that may be all fine. But when related to cloning,
> >> I think that would then truly want keying to there being any non-system
> >> domain(s).
> > 
> > Further changes in this series will add a per-CPU idle page table, and
> > hence we need to ensure that by the time APs are started the BSP L4 idle
> > page directory is not changed, as otherwise the copies in the APs
> > would get out of sync.
> > 
> > The idle system domain is indeed tied to the idle page talbes, but the
> > idle vCPU0 (the BSP) directly uses idle_pg_table (no copying), and
> > hence it's fine to allow modifications of the L4 idle page table
> > directory up to when APs are started (those will indeed make copies of
> > the idle L4.
> 
> Which may want at least mentioning in the description then. I take it
> that ...
> 
> >>> --- a/xen/arch/x86/mm.c
> >>> +++ b/xen/arch/x86/mm.c
> >>> @@ -5023,6 +5023,12 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
> >>>          mfn_t l3mfn;
> >>>          l3_pgentry_t *l3t = alloc_mapped_pagetable(&l3mfn);
> >>>  
> >>> +        /*
> >>> +         * dom0 is build at smp_boot, at which point we already create new L4s
> >>> +         * based on idle_pg_table.
> >>> +         */
> 
> ... this comment is then refined by the later patches you refer to?

Hm, I would have to double check, not sure I've updated it once the
idle_pg_table is cloned for AP bringup.

Will expand commit message and update the comment here if not done
already by later patches.

Thanks, Roger.