[PATCH 04/12] x86/boot: introduce module release

Daniel P. Smith posted 12 patches 3 weeks ago
There is a newer version of this series
[PATCH 04/12] x86/boot: introduce module release
Posted by Daniel P. Smith 3 weeks ago
A precarious approach was used to release the pages used to hold a boot module.
The precariousness stemmed from the fact that in the case of PV dom0, the
initrd module pages may be either mapped or explicitly copied into the dom0
address space. So to handle this situation, the PV dom0 construction code will
set the size of the module to zero, relying on discard_initial_images() to skip
any modules with a size of zero.

A function is introduced to release a module when it is no longer needed that
accepts a boolean parameter, free_mem, to indicate if the corresponding pages
can be freed. To track that a module has been released, the boot module flag
`released` is introduced.

The previous release model was a free all at once except those of size zeros,
which would handle any unused modules passed. The new model is one of, free
consumed module after usage is complete, thus unconsumed modules do not have a
consumer to free them. To address this, the discard_uknown_boot_modules() is
introduced and called after the last module identification occurs, initrd, to
free the pages of any boot modules that are identified as not being released.
After domain construction completes, all modules should be freed. A walk of the
boot modules is added after domain construction to validate and notify if a
module is found not to have been released.

Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
Changes since v7:
- This is a new approach as an alternative to the `consumed` flag.
---
 xen/arch/x86/cpu/microcode/core.c   |  4 ++
 xen/arch/x86/hvm/dom0_build.c       |  7 ++--
 xen/arch/x86/include/asm/bootinfo.h |  2 +
 xen/arch/x86/include/asm/setup.h    |  3 +-
 xen/arch/x86/pv/dom0_build.c        | 20 ++--------
 xen/arch/x86/setup.c                | 57 +++++++++++++++++++++++------
 xen/xsm/xsm_core.c                  |  5 +++
 7 files changed, 67 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
index b09cf83249f6..349e857f539a 100644
--- a/xen/arch/x86/cpu/microcode/core.c
+++ b/xen/arch/x86/cpu/microcode/core.c
@@ -756,6 +756,10 @@ static int __init cf_check microcode_init_cache(void)
         return rc;
     }
 
+    /* If raw module, we can free it now */
+    if ( !opt_scan )
+        release_boot_module(&bi->mods[early_mod_idx], true);
+
     if ( !patch )
         return -ENOENT;
 
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 3dd913bdb029..a4ac262db463 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -704,6 +704,8 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
         return rc;
     }
 
+    release_module(image, true);
+
     /*
      * Find a RAM region big enough (and that doesn't overlap with the loaded
      * kernel) in order to load the initrd and the metadata. Note it could be
@@ -751,10 +753,9 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
             last_addr += len;
         }
         last_addr = ROUNDUP(last_addr, PAGE_SIZE);
-    }
 
-    /* Free temporary buffers. */
-    discard_initial_images();
+        release_module(initrd, true);
+    }
 
     if ( cmdline != NULL )
     {
diff --git a/xen/arch/x86/include/asm/bootinfo.h b/xen/arch/x86/include/asm/bootinfo.h
index 0fbf36739782..b1549d8c8f93 100644
--- a/xen/arch/x86/include/asm/bootinfo.h
+++ b/xen/arch/x86/include/asm/bootinfo.h
@@ -33,8 +33,10 @@ struct boot_module {
     /*
      * Module State Flags:
      *   relocated: indicates module has been relocated in memory.
+     *   released:  indicates module's pages have been freed.
      */
     bool relocated:1;
+    bool released:1;
 };
 
 /*
diff --git a/xen/arch/x86/include/asm/setup.h b/xen/arch/x86/include/asm/setup.h
index d7ed4f40024c..d68d37a5293b 100644
--- a/xen/arch/x86/include/asm/setup.h
+++ b/xen/arch/x86/include/asm/setup.h
@@ -36,12 +36,13 @@ void setup_io_bitmap(struct domain *d);
 extern struct boot_info xen_boot_info;
 
 unsigned long initial_images_nrpages(nodeid_t node);
-void discard_initial_images(void);
+void release_module(const module_t *m, bool mapped);
 
 struct boot_module;
 void *bootstrap_map_bm(const struct boot_module *bm);
 void *bootstrap_map(const module_t *mod);
 void bootstrap_unmap(void);
+void release_boot_module(struct boot_module *bm, bool mapped);
 
 struct rangeset;
 int remove_xen_ranges(struct rangeset *r);
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index cc882bee61c3..c1f44502a1ac 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -630,9 +630,7 @@ static int __init dom0_construct(struct domain *d,
                 }
             memcpy(page_to_virt(page), mfn_to_virt(initrd->mod_start),
                    initrd_len);
-            mpt_alloc = (paddr_t)initrd->mod_start << PAGE_SHIFT;
-            init_domheap_pages(mpt_alloc,
-                               mpt_alloc + PAGE_ALIGN(initrd_len));
+            release_module(initrd, true);
             initrd->mod_start = initrd_mfn = mfn_x(page_to_mfn(page));
         }
         else
@@ -640,18 +638,9 @@ static int __init dom0_construct(struct domain *d,
             while ( count-- )
                 if ( assign_pages(mfn_to_page(_mfn(mfn++)), 1, d, 0) )
                     BUG();
+            release_module(initrd, false);
         }
 
-        /*
-         * We have either:
-         * - Mapped the initrd directly into dom0, or
-         * - Copied it and freed the module.
-         *
-         * Either way, tell discard_initial_images() to not free it a second
-         * time.
-         */
-        initrd->mod_end = 0;
-
         iommu_memory_setup(d, "initrd", mfn_to_page(_mfn(initrd_mfn)),
                            PFN_UP(initrd_len), &flush_flags);
     }
@@ -839,7 +828,9 @@ static int __init dom0_construct(struct domain *d,
         printk("Failed to load the kernel binary\n");
         goto out;
     }
+    /* All done with kernel, release the module pages */
     bootstrap_unmap();
+    release_module(image, true);
 
     if ( UNSET_ADDR != parms.virt_hypercall )
     {
@@ -854,9 +845,6 @@ static int __init dom0_construct(struct domain *d,
         init_hypercall_page(d, _p(parms.virt_hypercall));
     }
 
-    /* Free temporary buffers. */
-    discard_initial_images();
-
     /* Set up start info area. */
     si = (start_info_t *)vstartinfo_start;
     clear_page(si);
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index d061ece0541f..e6d2d25fd038 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -341,27 +341,55 @@ unsigned long __init initial_images_nrpages(nodeid_t node)
     return nr;
 }
 
-void __init discard_initial_images(void) /* a.k.a. Free boot modules */
+void __init release_boot_module(struct boot_module *bm, bool free_mem)
+{
+    uint64_t start = pfn_to_paddr(bm->mod->mod_start);
+    uint64_t size  = bm->mod->mod_end;
+
+    if ( bm->released )
+    {
+        printk(XENLOG_WARNING "Attempt second release boot module of type %d\n",
+               bm->type);
+        return;
+    }
+
+    if ( free_mem )
+        init_domheap_pages(start, start + PAGE_ALIGN(size));
+
+    bm->released = true;
+}
+
+void __init release_module(const module_t *m, bool free_mem)
 {
     struct boot_info *bi = &xen_boot_info;
     unsigned int i;
 
-    for ( i = 0; i < bi->nr_modules; ++i )
+    for ( i = 0; i < bi->nr_modules; i++ )
     {
-        uint64_t start = pfn_to_paddr(bi->mods[i].mod->mod_start);
-        uint64_t size  = bi->mods[i].mod->mod_end;
+        if ( bi->mods[i].mod == m )
+            release_boot_module(&bi->mods[i], free_mem);
+    }
+}
 
-        /*
-         * Sometimes the initrd is mapped, rather than copied, into dom0.
-         * Size being 0 is how we're instructed to leave the module alone.
-         */
-        if ( size == 0 )
+static void __init discard_unknown_boot_modules(void)
+{
+    struct boot_info *bi = &xen_boot_info;
+    unsigned int i, count = 0;
+
+    for_each_boot_module_by_type(i, bi, BOOTMOD_UNKNOWN)
+    {
+        struct boot_module *bm = &bi->mods[i];
+
+        if ( bm == NULL || bm->released )
             continue;
 
-        init_domheap_pages(start, start + PAGE_ALIGN(size));
+        release_boot_module(bm, true);
+        count++;
     }
 
-    bi->nr_modules = 0;
+    if ( count )
+        printk(XENLOG_DEBUG "Releasing pages for uknown boot module %d\n",
+               count);
 }
 
 static void __init init_idle_domain(void)
@@ -2111,6 +2139,8 @@ void asmlinkage __init noreturn __start_xen(void)
                    initrdidx);
     }
 
+    discard_unknown_boot_modules();
+
     /*
      * We're going to setup domain0 using the module(s) that we stashed safely
      * above our heap. The second module, if present, is an initrd ramdisk.
@@ -2122,6 +2152,11 @@ void asmlinkage __init noreturn __start_xen(void)
     if ( !dom0 )
         panic("Could not set up DOM0 guest OS\n");
 
+    /* Check and warn if any modules did not get released */
+    for ( i = 0; i < bi->nr_modules; i++ )
+        if ( !bi->mods[i].released )
+            printk(XENLOG_ERR "Boot module %d not released, memory leaked", i);
+
     heap_init_late();
 
     init_trace_bufs();
diff --git a/xen/xsm/xsm_core.c b/xen/xsm/xsm_core.c
index f255fb63bf6f..d5875599b63a 100644
--- a/xen/xsm/xsm_core.c
+++ b/xen/xsm/xsm_core.c
@@ -162,6 +162,11 @@ int __init xsm_multiboot_init(struct boot_info *bi)
     ret = xsm_core_init(policy_buffer, policy_size);
     bootstrap_unmap();
 
+    /* If the policy was loaded from a boot module, release it's memory */
+    ret = first_boot_module_index(bi, BOOTMOD_XSM_POLICY);
+    if ( ret >= 0 && ret < bi->nr_modules )
+        release_boot_module(&bi->mods[ret], true);
+
     return 0;
 }
 #endif
-- 
2.30.2
Re: [PATCH 04/12] x86/boot: introduce module release
Posted by Jason Andryuk 2 weeks, 2 days ago
On 2024-11-02 13:25, Daniel P. Smith wrote:
> A precarious approach was used to release the pages used to hold a boot module.
> The precariousness stemmed from the fact that in the case of PV dom0, the
> initrd module pages may be either mapped or explicitly copied into the dom0
> address space. So to handle this situation, the PV dom0 construction code will
> set the size of the module to zero, relying on discard_initial_images() to skip
> any modules with a size of zero.
> 
> A function is introduced to release a module when it is no longer needed that
> accepts a boolean parameter, free_mem, to indicate if the corresponding pages
> can be freed. To track that a module has been released, the boot module flag
> `released` is introduced.
> 
> The previous release model was a free all at once except those of size zeros,
> which would handle any unused modules passed. The new model is one of, free
> consumed module after usage is complete, thus unconsumed modules do not have a
> consumer to free them.

Slightly confusing.  Maybe just "The new model is to free modules after 
they are consumed.  Thus unconsumed modules are not freed."

> To address this, the discard_uknown_boot_modules() is

"unknown"

> introduced and called after the last module identification occurs, initrd, to
> free the pages of any boot modules that are identified as not being released.
> After domain construction completes, all modules should be freed. A walk of the
> boot modules is added after domain construction to validate and notify if a
> module is found not to have been released.
> 
> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
> ---
> Changes since v7:
> - This is a new approach as an alternative to the `consumed` flag.
> ---
>   xen/arch/x86/cpu/microcode/core.c   |  4 ++
>   xen/arch/x86/hvm/dom0_build.c       |  7 ++--
>   xen/arch/x86/include/asm/bootinfo.h |  2 +
>   xen/arch/x86/include/asm/setup.h    |  3 +-
>   xen/arch/x86/pv/dom0_build.c        | 20 ++--------
>   xen/arch/x86/setup.c                | 57 +++++++++++++++++++++++------
>   xen/xsm/xsm_core.c                  |  5 +++
>   7 files changed, 67 insertions(+), 31 deletions(-)
> 

> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index d061ece0541f..e6d2d25fd038 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -341,27 +341,55 @@ unsigned long __init initial_images_nrpages(nodeid_t node)
>       return nr;
>   }
>   
> -void __init discard_initial_images(void) /* a.k.a. Free boot modules */
> +void __init release_boot_module(struct boot_module *bm, bool free_mem)
> +{
> +    uint64_t start = pfn_to_paddr(bm->mod->mod_start);
> +    uint64_t size  = bm->mod->mod_end;
> +
> +    if ( bm->released )
> +    {
> +        printk(XENLOG_WARNING "Attempt second release boot module of type %d\n",
> +               bm->type);
> +        return;
> +    }
> +
> +    if ( free_mem )
> +        init_domheap_pages(start, start + PAGE_ALIGN(size));
> +
> +    bm->released = true;
> +}
> +
> +void __init release_module(const module_t *m, bool free_mem)
>   {
>       struct boot_info *bi = &xen_boot_info;
>       unsigned int i;
>   
> -    for ( i = 0; i < bi->nr_modules; ++i )
> +    for ( i = 0; i < bi->nr_modules; i++ )
>       {
> -        uint64_t start = pfn_to_paddr(bi->mods[i].mod->mod_start);
> -        uint64_t size  = bi->mods[i].mod->mod_end;
> +        if ( bi->mods[i].mod == m )
> +            release_boot_module(&bi->mods[i], free_mem);
> +    }
> +}
>   
> -        /*
> -         * Sometimes the initrd is mapped, rather than copied, into dom0.
> -         * Size being 0 is how we're instructed to leave the module alone.
> -         */
> -        if ( size == 0 )
> +static void __init discard_unknown_boot_modules(void)
> +{
> +    struct boot_info *bi = &xen_boot_info;
> +    unsigned int i, count = 0;
> +
> +    for_each_boot_module_by_type(i, bi, BOOTMOD_UNKNOWN)

for_each_boot_module_by_type ( i, bi, BOOTMOD_UNKNOWN )

To match style from 74af2d98276d ("x86/boot: eliminate module_map")

> +    {
> +        struct boot_module *bm = &bi->mods[i];
> +
> +        if ( bm == NULL || bm->released )
>               continue;
>   
> -        init_domheap_pages(start, start + PAGE_ALIGN(size));
> +        release_boot_module(bm, true);
> +        count++;
>       }
>   
> -    bi->nr_modules = 0;
> +    if ( count )
> +        printk(XENLOG_DEBUG "Releasing pages for uknown boot module %d\n",

"unknown".  Since the operation already happened, maybe "Released pages 
for %d unknown boot modules"?  I'm not sure of the value of that 
message.  It would be more informative to provide the module index and 
maybe a page count.  The index would at least point to which module is 
unused.

> +               count);
>   }
>   
>   static void __init init_idle_domain(void)
> @@ -2111,6 +2139,8 @@ void asmlinkage __init noreturn __start_xen(void)
>                      initrdidx);
>       }
>   
> +    discard_unknown_boot_modules();
> +
>       /*
>        * We're going to setup domain0 using the module(s) that we stashed safely
>        * above our heap. The second module, if present, is an initrd ramdisk.
> @@ -2122,6 +2152,11 @@ void asmlinkage __init noreturn __start_xen(void)
>       if ( !dom0 )
>           panic("Could not set up DOM0 guest OS\n");
>   
> +    /* Check and warn if any modules did not get released */
> +    for ( i = 0; i < bi->nr_modules; i++ )
> +        if ( !bi->mods[i].released )
> +            printk(XENLOG_ERR "Boot module %d not released, memory leaked", i);
> +

Why not release the memory here instead of leaking it?

I feel like the corner case of mapping the dom0 initrd is leading to 
this manual approach or releasing modules individually.  That logic then 
has to be spread around.  discard_initial_images() kept the logic 
centralized.  Maybe just replace the mod_end == 0 special case with a 
"don't release me" flag that is checked in discard_initial_images()?

Regards,
Jason
Re: [PATCH 04/12] x86/boot: introduce module release
Posted by Daniel P. Smith 2 weeks, 2 days ago
On 11/6/24 17:16, Jason Andryuk wrote:
> On 2024-11-02 13:25, Daniel P. Smith wrote:
>> A precarious approach was used to release the pages used to hold a 
>> boot module.
>> The precariousness stemmed from the fact that in the case of PV dom0, the
>> initrd module pages may be either mapped or explicitly copied into the 
>> dom0
>> address space. So to handle this situation, the PV dom0 construction 
>> code will
>> set the size of the module to zero, relying on 
>> discard_initial_images() to skip
>> any modules with a size of zero.
>>
>> A function is introduced to release a module when it is no longer 
>> needed that
>> accepts a boolean parameter, free_mem, to indicate if the 
>> corresponding pages
>> can be freed. To track that a module has been released, the boot 
>> module flag
>> `released` is introduced.
>>
>> The previous release model was a free all at once except those of size 
>> zeros,
>> which would handle any unused modules passed. The new model is one of, 
>> free
>> consumed module after usage is complete, thus unconsumed modules do 
>> not have a
>> consumer to free them.
> 
> Slightly confusing.  Maybe just "The new model is to free modules after 
> they are consumed.  Thus unconsumed modules are not freed."

okay.

>> To address this, the discard_uknown_boot_modules() is
> 
> "unknown"

Ack

>> introduced and called after the last module identification occurs, 
>> initrd, to
>> free the pages of any boot modules that are identified as not being 
>> released.
>> After domain construction completes, all modules should be freed. A 
>> walk of the
>> boot modules is added after domain construction to validate and notify 
>> if a
>> module is found not to have been released.
>>
>> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
>> ---
>> Changes since v7:
>> - This is a new approach as an alternative to the `consumed` flag.
>> ---
>>   xen/arch/x86/cpu/microcode/core.c   |  4 ++
>>   xen/arch/x86/hvm/dom0_build.c       |  7 ++--
>>   xen/arch/x86/include/asm/bootinfo.h |  2 +
>>   xen/arch/x86/include/asm/setup.h    |  3 +-
>>   xen/arch/x86/pv/dom0_build.c        | 20 ++--------
>>   xen/arch/x86/setup.c                | 57 +++++++++++++++++++++++------
>>   xen/xsm/xsm_core.c                  |  5 +++
>>   7 files changed, 67 insertions(+), 31 deletions(-)
>>
> 
>> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
>> index d061ece0541f..e6d2d25fd038 100644
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -341,27 +341,55 @@ unsigned long __init 
>> initial_images_nrpages(nodeid_t node)
>>       return nr;
>>   }
>> -void __init discard_initial_images(void) /* a.k.a. Free boot modules */
>> +void __init release_boot_module(struct boot_module *bm, bool free_mem)
>> +{
>> +    uint64_t start = pfn_to_paddr(bm->mod->mod_start);
>> +    uint64_t size  = bm->mod->mod_end;
>> +
>> +    if ( bm->released )
>> +    {
>> +        printk(XENLOG_WARNING "Attempt second release boot module of 
>> type %d\n",
>> +               bm->type);
>> +        return;
>> +    }
>> +
>> +    if ( free_mem )
>> +        init_domheap_pages(start, start + PAGE_ALIGN(size));
>> +
>> +    bm->released = true;
>> +}
>> +
>> +void __init release_module(const module_t *m, bool free_mem)
>>   {
>>       struct boot_info *bi = &xen_boot_info;
>>       unsigned int i;
>> -    for ( i = 0; i < bi->nr_modules; ++i )
>> +    for ( i = 0; i < bi->nr_modules; i++ )
>>       {
>> -        uint64_t start = pfn_to_paddr(bi->mods[i].mod->mod_start);
>> -        uint64_t size  = bi->mods[i].mod->mod_end;
>> +        if ( bi->mods[i].mod == m )
>> +            release_boot_module(&bi->mods[i], free_mem);
>> +    }
>> +}
>> -        /*
>> -         * Sometimes the initrd is mapped, rather than copied, into 
>> dom0.
>> -         * Size being 0 is how we're instructed to leave the module 
>> alone.
>> -         */
>> -        if ( size == 0 )
>> +static void __init discard_unknown_boot_modules(void)
>> +{
>> +    struct boot_info *bi = &xen_boot_info;
>> +    unsigned int i, count = 0;
>> +
>> +    for_each_boot_module_by_type(i, bi, BOOTMOD_UNKNOWN)
> 
> for_each_boot_module_by_type ( i, bi, BOOTMOD_UNKNOWN )
> 
> To match style from 74af2d98276d ("x86/boot: eliminate module_map")

Ack.

>> +    {
>> +        struct boot_module *bm = &bi->mods[i];
>> +
>> +        if ( bm == NULL || bm->released )
>>               continue;
>> -        init_domheap_pages(start, start + PAGE_ALIGN(size));
>> +        release_boot_module(bm, true);
>> +        count++;
>>       }
>> -    bi->nr_modules = 0;
>> +    if ( count )
>> +        printk(XENLOG_DEBUG "Releasing pages for uknown boot module 
>> %d\n",
> 
> "unknown".  Since the operation already happened, maybe "Released pages 
> for %d unknown boot modules"?  I'm not sure of the value of that 
> message.  It would be more informative to provide the module index and 
> maybe a page count.  The index would at least point to which module is 
> unused.

Ack to unknown.

Can adjust the phrasing, the question is there a desire to have a 
message for every boot module freed. Guess I could do a single log line 
split across multiple printks, Thinking about the case where someone 
tried to abuse the interface by loading a bunch of unused modules.

>> +               count);
>>   }
>>   static void __init init_idle_domain(void)
>> @@ -2111,6 +2139,8 @@ void asmlinkage __init noreturn __start_xen(void)
>>                      initrdidx);
>>       }
>> +    discard_unknown_boot_modules();
>> +
>>       /*
>>        * We're going to setup domain0 using the module(s) that we 
>> stashed safely
>>        * above our heap. The second module, if present, is an initrd 
>> ramdisk.
>> @@ -2122,6 +2152,11 @@ void asmlinkage __init noreturn __start_xen(void)
>>       if ( !dom0 )
>>           panic("Could not set up DOM0 guest OS\n");
>> +    /* Check and warn if any modules did not get released */
>> +    for ( i = 0; i < bi->nr_modules; i++ )
>> +        if ( !bi->mods[i].released )
>> +            printk(XENLOG_ERR "Boot module %d not released, memory 
>> leaked", i);
>> +
> 
> Why not release the memory here instead of leaking it?

Because you don't know if it was mapped or consumed.

> I feel like the corner case of mapping the dom0 initrd is leading to 
> this manual approach or releasing modules individually.  That logic then 
> has to be spread around.  discard_initial_images() kept the logic 
> centralized.  Maybe just replace the mod_end == 0 special case with a 
> "don't release me" flag that is checked in discard_initial_images()?

That is what started me at the options to deal with it. The two I came 
up with was a flag and this approach. Weighing the pros/cons of the two, 
the deciding factor is when multi-domain construction is introduced. 
With multi-domain with a large number of domains, a balance has to be 
struck between holding all the kernels and ramdisks in memory plus being 
able to allocate the desired amount of memory for each domain. Perhaps a 
balance can be struck, with a discard_consumed_boot_modules() that walks 
the boot module list, and discard the ones consumed. While only dom0 can 
be constructed, it would be called once after create_dom0 call returns 
(per Andy's suggestion in response to this comment). An expansion on 
this could be that instead of using a free flag, use a ref count that is 
incremented as it is claimed and the decremented when it has been consumed.

v/r,
dps


Re: [PATCH 04/12] x86/boot: introduce module release
Posted by Jason Andryuk 2 weeks, 2 days ago
On 2024-11-06 19:46, Daniel P. Smith wrote:
> On 11/6/24 17:16, Jason Andryuk wrote:
>> On 2024-11-02 13:25, Daniel P. Smith wrote:
>>> A precarious approach was used to release the pages used to hold a 
>>> boot module.
>>> The precariousness stemmed from the fact that in the case of PV dom0, 
>>> the
>>> initrd module pages may be either mapped or explicitly copied into 
>>> the dom0
>>> address space. So to handle this situation, the PV dom0 construction 
>>> code will
>>> set the size of the module to zero, relying on 
>>> discard_initial_images() to skip
>>> any modules with a size of zero.
>>>
>>> A function is introduced to release a module when it is no longer 
>>> needed that
>>> accepts a boolean parameter, free_mem, to indicate if the 
>>> corresponding pages
>>> can be freed. To track that a module has been released, the boot 
>>> module flag
>>> `released` is introduced.
>>>
>>> The previous release model was a free all at once except those of 
>>> size zeros,
>>> which would handle any unused modules passed. The new model is one 
>>> of, free
>>> consumed module after usage is complete, thus unconsumed modules do 
>>> not have a
>>> consumer to free them.
>>
>> Slightly confusing.  Maybe just "The new model is to free modules 
>> after they are consumed.  Thus unconsumed modules are not freed."
> 
> okay.
> 
>>> To address this, the discard_uknown_boot_modules() is
>>
>> "unknown"
> 
> Ack
> 
>>> introduced and called after the last module identification occurs, 
>>> initrd, to
>>> free the pages of any boot modules that are identified as not being 
>>> released.
>>> After domain construction completes, all modules should be freed. A 
>>> walk of the
>>> boot modules is added after domain construction to validate and 
>>> notify if a
>>> module is found not to have been released.
>>>
>>> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
>>> ---

>>> +    {
>>> +        struct boot_module *bm = &bi->mods[i];
>>> +
>>> +        if ( bm == NULL || bm->released )
>>>               continue;
>>> -        init_domheap_pages(start, start + PAGE_ALIGN(size));
>>> +        release_boot_module(bm, true);
>>> +        count++;
>>>       }
>>> -    bi->nr_modules = 0;
>>> +    if ( count )
>>> +        printk(XENLOG_DEBUG "Releasing pages for uknown boot module 
>>> %d\n",
>>
>> "unknown".  Since the operation already happened, maybe "Released 
>> pages for %d unknown boot modules"?  I'm not sure of the value of that 
>> message.  It would be more informative to provide the module index and 
>> maybe a page count.  The index would at least point to which module is 
>> unused.
> 
> Ack to unknown.
> 
> Can adjust the phrasing, the question is there a desire to have a 
> message for every boot module freed. Guess I could do a single log line 
> split across multiple printks, Thinking about the case where someone 
> tried to abuse the interface by loading a bunch of unused modules.

It's 63 modules, so not too many.  And this is the boot path, so it's 
administrator controlled and shouldn't really be subject to abuse.

I think printing the index of unused modules makes the message useful. 
If a module is provided to Xen, but Xen doesn't use it, it is worth 
flagging which one.  As an example, if I thought module 3 was ucode, but 
it isn't getting used, that is useful information for further investigation.

>>> +               count);
>>>   }
>>>   static void __init init_idle_domain(void)
>>> @@ -2111,6 +2139,8 @@ void asmlinkage __init noreturn __start_xen(void)
>>>                      initrdidx);
>>>       }
>>> +    discard_unknown_boot_modules();
>>> +
>>>       /*
>>>        * We're going to setup domain0 using the module(s) that we 
>>> stashed safely
>>>        * above our heap. The second module, if present, is an initrd 
>>> ramdisk.
>>> @@ -2122,6 +2152,11 @@ void asmlinkage __init noreturn __start_xen(void)
>>>       if ( !dom0 )
>>>           panic("Could not set up DOM0 guest OS\n");
>>> +    /* Check and warn if any modules did not get released */
>>> +    for ( i = 0; i < bi->nr_modules; i++ )
>>> +        if ( !bi->mods[i].released )
>>> +            printk(XENLOG_ERR "Boot module %d not released, memory 
>>> leaked", i);
>>> +
>>
>> Why not release the memory here instead of leaking it?
> 
> Because you don't know if it was mapped or consumed.

So this is more of a sanity check since it should not trigger?  i.e. 
it's a bug to see this message.

>> I feel like the corner case of mapping the dom0 initrd is leading to 
>> this manual approach or releasing modules individually.  That logic 
>> then has to be spread around.  discard_initial_images() kept the logic 
>> centralized.  Maybe just replace the mod_end == 0 special case with a 
>> "don't release me" flag that is checked in discard_initial_images()?
> 
> That is what started me at the options to deal with it. The two I came 
> up with was a flag and this approach. Weighing the pros/cons of the two, 
> the deciding factor is when multi-domain construction is introduced. 
> With multi-domain with a large number of domains, a balance has to be 
> struck between holding all the kernels and ramdisks in memory plus being 
> able to allocate the desired amount of memory for each domain.

So you're saying that by piece-wise free-ing memory, you can have more 
domUs loaded by the boot loader?  If free-ing is delayed to the end, 
memory is tied up that could otherwise be assigned to domUs?

Is this the real motivation?  If so, it belongs in the commit message. 
As it is, the commit message is lacking in a specify reason why this 
change is needed.

> Perhaps a 
> balance can be struck, with a discard_consumed_boot_modules() that walks 
> the boot module list, and discard the ones consumed. While only dom0 can 
> be constructed, it would be called once after create_dom0 call returns 
> (per Andy's suggestion in response to this comment). An expansion on 
> this could be that instead of using a free flag, use a ref count that is 
> incremented as it is claimed and the decremented when it has been consumed.

Are you expecting 1 kernel image gets used to create N domUs?  Or are 
you only handling 1:1?

Is mapping the initrd only expected for a single PV dom0, or do you 
envision mapping multiple initrds for multiple PV domUs?

It seems to me that more mapping might help alleviate keeping memory 
tied up.  Freeing XSM and ucode early helps, but those aren't 
particularly big.  An initrd in the 100s of MBs is significant though. 
If you hyperlaunch Xen and 2 domUs, you'd want to be able to assign ~all 
the memory.  Losing 2x ~100MB as unassigned for copies of the initrds is 
wasteful.

Regards,
Jason

Re: [PATCH v8 04/12] x86/boot: introduce module release
Posted by Andrew Cooper 2 weeks, 2 days ago
On 06/11/2024 10:16 pm, Jason Andryuk wrote:
> On 2024-11-02 13:25, Daniel P. Smith wrote:
>> @@ -2122,6 +2152,11 @@ void asmlinkage __init noreturn __start_xen(void)
>>       if ( !dom0 )
>>           panic("Could not set up DOM0 guest OS\n");
>>   +    /* Check and warn if any modules did not get released */
>> +    for ( i = 0; i < bi->nr_modules; i++ )
>> +        if ( !bi->mods[i].released )
>> +            printk(XENLOG_ERR "Boot module %d not released, memory
>> leaked", i);
>> +
>
> Why not release the memory here instead of leaking it?
>
> I feel like the corner case of mapping the dom0 initrd is leading to
> this manual approach or releasing modules individually.  That logic
> then has to be spread around.  discard_initial_images() kept the logic
> centralized.  Maybe just replace the mod_end == 0 special case with a
> "don't release me" flag that is checked in discard_initial_images()?

I've been conflicted on this for ages, but I agree.

There are many wonky things with the current arrangement.

First(ish), discard_initial_images() should be named
discard_boot_modules() (if it's saying a separate function).

Second, the individual dom0_build.c's should not be calling this
directly.  They're both right at the end of construction, so it would
make far more sense for __start_xen() to do this after create_dom0(). 
That also avoids needing to export the function.

And yes, as Jason says, the "initrd mapped instead of copied" is a weird
corner case, and the other path manually releasing and also saying
"don't free" is just pointless complexity.  I do like the idea of a
"dont_free" flag.

Thoughts?

I know this would be moving back to a more v7-like approach, but I think
the end approach is cleaner.

~Andrew

Re: [PATCH v8 04/12] x86/boot: introduce module release
Posted by Daniel P. Smith 2 weeks, 2 days ago
On 11/6/24 17:46, Andrew Cooper wrote:
> On 06/11/2024 10:16 pm, Jason Andryuk wrote:
>> On 2024-11-02 13:25, Daniel P. Smith wrote:
>>> @@ -2122,6 +2152,11 @@ void asmlinkage __init noreturn __start_xen(void)
>>>        if ( !dom0 )
>>>            panic("Could not set up DOM0 guest OS\n");
>>>    +    /* Check and warn if any modules did not get released */
>>> +    for ( i = 0; i < bi->nr_modules; i++ )
>>> +        if ( !bi->mods[i].released )
>>> +            printk(XENLOG_ERR "Boot module %d not released, memory
>>> leaked", i);
>>> +
>>
>> Why not release the memory here instead of leaking it?
>>
>> I feel like the corner case of mapping the dom0 initrd is leading to
>> this manual approach or releasing modules individually.  That logic
>> then has to be spread around.  discard_initial_images() kept the logic
>> centralized.  Maybe just replace the mod_end == 0 special case with a
>> "don't release me" flag that is checked in discard_initial_images()?
> 
> I've been conflicted on this for ages, but I agree.
> 
> There are many wonky things with the current arrangement.
> 
> First(ish), discard_initial_images() should be named
> discard_boot_modules() (if it's saying a separate function).
> 
> Second, the individual dom0_build.c's should not be calling this
> directly.  They're both right at the end of construction, so it would
> make far more sense for __start_xen() to do this after create_dom0().
> That also avoids needing to export the function.
> 
> And yes, as Jason says, the "initrd mapped instead of copied" is a weird
> corner case, and the other path manually releasing and also saying
> "don't free" is just pointless complexity.  I do like the idea of a
> "dont_free" flag.
> 
> Thoughts?

How about the ref count I suggested in response to Jason?

> I know this would be moving back to a more v7-like approach, but I think
> the end approach is cleaner.

No worries, the series is looking better each time.

v/r,
dps