[MINI-OS PATCH 1/2] mm: provide a way to do very early page table allocations

Juergen Gross posted 2 patches 3 months, 3 weeks ago
There is a newer version of this series
[MINI-OS PATCH 1/2] mm: provide a way to do very early page table allocations
Posted by Juergen Gross 3 months, 3 weeks ago
Add a small pool of statically allocated memory pages to be handed out
for very early page table allocations.

This will make it possible to do virtual allocations e.g. for mapping
the shared info page.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/mm.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm.c b/arch/x86/mm.c
index bdff38fd..3f5c7ea7 100644
--- a/arch/x86/mm.c
+++ b/arch/x86/mm.c
@@ -640,12 +640,17 @@ void change_readonly(bool readonly)
  * return a valid PTE for a given virtual address. If PTE does not exist,
  * allocate page-table pages.
  */
+#define N_PTS 4
+static char early_pt[PAGE_SIZE * N_PTS] __attribute__((aligned(PAGE_SIZE)));
+static unsigned long n_early_pt = N_PTS;
+
 static int need_pgt_func(unsigned long va, unsigned int lvl, bool is_leaf,
                          pgentry_t *pte, void *par)
 {
     pgentry_t **result = par;
     unsigned long pt_mfn;
     unsigned long pt_pfn;
+    unsigned long pt_addr;
     unsigned int idx;
 
     if ( !is_leaf )
@@ -664,7 +669,16 @@ static int need_pgt_func(unsigned long va, unsigned int lvl, bool is_leaf,
     }
 
     pt_mfn = virt_to_mfn(pte);
-    pt_pfn = virt_to_pfn(alloc_page());
+    if ( n_early_pt )
+    {
+        n_early_pt--;
+        pt_addr = (unsigned long)&early_pt[n_early_pt * PAGE_SIZE];
+    }
+    else
+    {
+        pt_addr = alloc_page();
+    }
+    pt_pfn = virt_to_pfn(pt_addr);
     if ( !pt_pfn )
         return -1;
     idx = idx_from_va_lvl(va, lvl);
-- 
2.43.0
Re: [MINI-OS PATCH 1/2] mm: provide a way to do very early page table allocations
Posted by Jan Beulich 3 months ago
On 08.07.2025 08:37, Juergen Gross wrote:
> Add a small pool of statically allocated memory pages to be handed out
> for very early page table allocations.
> 
> This will make it possible to do virtual allocations e.g. for mapping
> the shared info page.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  arch/x86/mm.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm.c b/arch/x86/mm.c
> index bdff38fd..3f5c7ea7 100644
> --- a/arch/x86/mm.c
> +++ b/arch/x86/mm.c
> @@ -640,12 +640,17 @@ void change_readonly(bool readonly)
>   * return a valid PTE for a given virtual address. If PTE does not exist,
>   * allocate page-table pages.
>   */
> +#define N_PTS 4

Wouldn't it be prudent to have a comment here mentioning how this number
was derived, i.e. what's known to be covered? (To map the shared info
page I expect you really only need 3? Hence without a comment things may
remain unclear.)

> +static char early_pt[PAGE_SIZE * N_PTS] __attribute__((aligned(PAGE_SIZE)));

Maybe better early_pt[N_PTS][PAGE_SIZE], simplifying the allocation
code below a little?

> +static unsigned long n_early_pt = N_PTS;

unsigned int would do, I expect? With the suggestion above this could
then also use ARRAY_SIZE(), at which point there would be no real need
for N_PTS anymore.

>  static int need_pgt_func(unsigned long va, unsigned int lvl, bool is_leaf,
>                           pgentry_t *pte, void *par)
>  {
>      pgentry_t **result = par;
>      unsigned long pt_mfn;
>      unsigned long pt_pfn;
> +    unsigned long pt_addr;
>      unsigned int idx;
>  
>      if ( !is_leaf )
> @@ -664,7 +669,16 @@ static int need_pgt_func(unsigned long va, unsigned int lvl, bool is_leaf,
>      }
>  
>      pt_mfn = virt_to_mfn(pte);
> -    pt_pfn = virt_to_pfn(alloc_page());
> +    if ( n_early_pt )
> +    {
> +        n_early_pt--;
> +        pt_addr = (unsigned long)&early_pt[n_early_pt * PAGE_SIZE];
> +    }
> +    else
> +    {
> +        pt_addr = alloc_page();
> +    }

The failure pattern when one fails to increase early_pt[] is likely going
to be problematic. Wouldn't it be better to check for failure here?

Jan
Re: [MINI-OS PATCH 1/2] mm: provide a way to do very early page table allocations
Posted by Jürgen Groß 3 months ago
On 28.07.25 16:09, Jan Beulich wrote:
> On 08.07.2025 08:37, Juergen Gross wrote:
>> Add a small pool of statically allocated memory pages to be handed out
>> for very early page table allocations.
>>
>> This will make it possible to do virtual allocations e.g. for mapping
>> the shared info page.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   arch/x86/mm.c | 16 +++++++++++++++-
>>   1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/mm.c b/arch/x86/mm.c
>> index bdff38fd..3f5c7ea7 100644
>> --- a/arch/x86/mm.c
>> +++ b/arch/x86/mm.c
>> @@ -640,12 +640,17 @@ void change_readonly(bool readonly)
>>    * return a valid PTE for a given virtual address. If PTE does not exist,
>>    * allocate page-table pages.
>>    */
>> +#define N_PTS 4
> 
> Wouldn't it be prudent to have a comment here mentioning how this number
> was derived, i.e. what's known to be covered? (To map the shared info
> page I expect you really only need 3? Hence without a comment things may
> remain unclear.)

Yes, 3 would have been enough. OTOH having another spare doesn't hurt, as
the memory will be allocated anyway.

I'll add a comment in this regard.

> 
>> +static char early_pt[PAGE_SIZE * N_PTS] __attribute__((aligned(PAGE_SIZE)));
> 
> Maybe better early_pt[N_PTS][PAGE_SIZE], simplifying the allocation
> code below a little?

Yes, good idea.

> 
>> +static unsigned long n_early_pt = N_PTS;
> 
> unsigned int would do, I expect? With the suggestion above this could
> then also use ARRAY_SIZE(), at which point there would be no real need
> for N_PTS anymore.

True.

> 
>>   static int need_pgt_func(unsigned long va, unsigned int lvl, bool is_leaf,
>>                            pgentry_t *pte, void *par)
>>   {
>>       pgentry_t **result = par;
>>       unsigned long pt_mfn;
>>       unsigned long pt_pfn;
>> +    unsigned long pt_addr;
>>       unsigned int idx;
>>   
>>       if ( !is_leaf )
>> @@ -664,7 +669,16 @@ static int need_pgt_func(unsigned long va, unsigned int lvl, bool is_leaf,
>>       }
>>   
>>       pt_mfn = virt_to_mfn(pte);
>> -    pt_pfn = virt_to_pfn(alloc_page());
>> +    if ( n_early_pt )
>> +    {
>> +        n_early_pt--;
>> +        pt_addr = (unsigned long)&early_pt[n_early_pt * PAGE_SIZE];
>> +    }
>> +    else
>> +    {
>> +        pt_addr = alloc_page();
>> +    }
> 
> The failure pattern when one fails to increase early_pt[] is likely going
> to be problematic. Wouldn't it be better to check for failure here?

Hmm, not sure this is true. I tried the shared info mapping without adding
the special early alloc code first and finding the bug was quite easy.


Juergen
Re: [MINI-OS PATCH 1/2] mm: provide a way to do very early page table allocations
Posted by Jan Beulich 3 months ago
On 28.07.2025 17:01, Jürgen Groß wrote:
> On 28.07.25 16:09, Jan Beulich wrote:
>> On 08.07.2025 08:37, Juergen Gross wrote:
>>> @@ -664,7 +669,16 @@ static int need_pgt_func(unsigned long va, unsigned int lvl, bool is_leaf,
>>>       }
>>>   
>>>       pt_mfn = virt_to_mfn(pte);
>>> -    pt_pfn = virt_to_pfn(alloc_page());
>>> +    if ( n_early_pt )
>>> +    {
>>> +        n_early_pt--;
>>> +        pt_addr = (unsigned long)&early_pt[n_early_pt * PAGE_SIZE];
>>> +    }
>>> +    else
>>> +    {
>>> +        pt_addr = alloc_page();
>>> +    }
>>
>> The failure pattern when one fails to increase early_pt[] is likely going
>> to be problematic. Wouldn't it be better to check for failure here?
> 
> Hmm, not sure this is true. I tried the shared info mapping without adding
> the special early alloc code first and finding the bug was quite easy.

Feel free to ignore my comment then.

Jan