[PATCH v10 00/13] switch to domheap for Xen page tables

Hongyan Xia posted 13 patches 3 years ago
Test gitlab-ci failed
Patches applied successfully
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/cover.1619014052.git.hongyxia@amazon.com
[PATCH v10 00/13] switch to domheap for Xen page tables
Posted by Hongyan Xia 3 years ago
From: Hongyan Xia <hongyxia@amazon.com>

This series rewrites all the remaining functions and finally makes the
switch from xenheap to domheap for Xen page tables, so that they no
longer need to rely on the direct map, which is a big step towards
removing the direct map.

---
Changed in v10:
- rebase.
- address comments in 01/13, which propagates a change into 02/13.

Changed in v9:
- drop the first 2 patches, which have been merged as part of XSA-345.
- adjust code around L3 page locking in mm.c.

Changed in v8:
- address comments in v7.
- rebase.

Changed in v7:
- rebase and cleanup.
- address comments in v6.
- add alloc_map_clear_xen_pt() helper to simplify the patches in this
  series.

Changed in v6:
- drop the patches that have already been merged.
- rebase and cleanup.
- rewrite map_pages_to_xen() and modify_xen_mappings() in a way that
  does not require an end_of_loop goto label.
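The v7 changelog mentions an `alloc_map_clear_xen_pt()` helper. As its name suggests, it bundles the sequence most call sites need: allocate a page-table page, map it, zero it, and return the mapping (plus, optionally, the frame number for the parent entry). The sketch below is a user-space model, not Xen code: every `toy_*` name is hypothetical, the "domheap" is a fixed array of frames, and "mapping" only returns a pointer while counting live mappings. The point it illustrates is that the returned pointer is a mapping the caller must later unmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE   4096u
#define TOY_NR_FRAMES   8u
#define TOY_INVALID_MFN (~0ul)

static unsigned char toy_ram[TOY_NR_FRAMES][TOY_PAGE_SIZE]; /* fake memory */
static bool toy_frame_used[TOY_NR_FRAMES];                  /* fake allocator state */
static unsigned int toy_live_maps;                          /* outstanding mappings */

/* Hypothetical stand-in for allocating a page-table page from the domheap. */
static unsigned long toy_alloc_pt_frame(void)
{
    for ( unsigned long mfn = 0; mfn < TOY_NR_FRAMES; mfn++ )
        if ( !toy_frame_used[mfn] )
        {
            toy_frame_used[mfn] = true;
            return mfn;
        }
    return TOY_INVALID_MFN;
}

/* Hypothetical stand-ins for map_domain_page()/unmap_domain_page(). */
static void *toy_map(unsigned long mfn)
{
    toy_live_maps++;
    return toy_ram[mfn];
}

static void toy_unmap(const void *va)
{
    if ( va )
    {
        assert(toy_live_maps > 0);
        toy_live_maps--;
    }
}

/*
 * Allocate, map and zero a page-table page.  On success, return the
 * mapping and store the frame number in *pmfn if pmfn is non-NULL.
 * The caller owns the mapping and must release it with toy_unmap().
 */
static void *toy_alloc_map_clear_pt(unsigned long *pmfn)
{
    unsigned long mfn = toy_alloc_pt_frame();
    void *va;

    if ( mfn == TOY_INVALID_MFN )
        return NULL;

    if ( pmfn )
        *pmfn = mfn;

    va = toy_map(mfn);
    memset(va, 0, TOY_PAGE_SIZE);
    return va;
}
```

A call site then shrinks to one NULL check: `pl2e = toy_alloc_map_clear_pt(&mfn);`, fill in the parent entry with `mfn`, and `toy_unmap(pl2e)` when done.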

Hongyan Xia (2):
  x86/mm: drop old page table APIs
  x86: switch to use domheap page for page tables

Wei Liu (11):
  x86/mm: rewrite virt_to_xen_l*e
  x86/mm: switch to new APIs in map_pages_to_xen
  x86/mm: switch to new APIs in modify_xen_mappings
  x86_64/mm: introduce pl2e in paging_init
  x86_64/mm: switch to new APIs in paging_init
  x86_64/mm: switch to new APIs in setup_m2p_table
  efi: use new page table APIs in copy_mapping
  efi: switch to new APIs in EFI code
  x86/smpboot: add exit path for clone_mapping()
  x86/smpboot: switch clone_mapping() to new APIs
  x86/mm: drop _new suffix for page table APIs

 xen/arch/x86/efi/runtime.h |  13 +-
 xen/arch/x86/mm.c          | 247 ++++++++++++++++++++++---------------
 xen/arch/x86/setup.c       |   4 +-
 xen/arch/x86/smpboot.c     |  70 +++++++----
 xen/arch/x86/x86_64/mm.c   |  80 +++++++-----
 xen/common/efi/boot.c      |  83 ++++++++-----
 xen/common/efi/efi.h       |   3 +-
 xen/common/efi/runtime.c   |   8 +-
 xen/include/asm-x86/mm.h   |   7 +-
 xen/include/asm-x86/page.h |   5 -
 10 files changed, 314 insertions(+), 206 deletions(-)



Re: [PATCH v10 00/13] switch to domheap for Xen page tables
Posted by Andrew Cooper 3 years ago
On 21/04/2021 15:15, Hongyan Xia wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
>
> This series rewrites all the remaining functions and finally makes the
> switch from xenheap to domheap for Xen page tables, so that they no
> longer need to rely on the direct map, which is a big step towards
> removing the direct map.

Staging is broken.  Xen hits an assertion just after dom0 starts.

(XEN) Freed 616kB init memory
mapping kernel into physical memory
about to get started...
(XEN) Assertion 'hashent->refcnt' failed at domain_page.c:204
(XEN) ----[ Xen-4.16-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d040316f80>] unmap_domain_page+0x2af/0x2e0
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor (d0v0)
(XEN) rax: 0000000000000000   rbx: ffff831c47bf9040   rcx: ffff831c47c1a000
(XEN) rdx: 0000000000000092   rsi: 0000000000000092   rdi: 0000000000000206
(XEN) rbp: ffff8300a5ca7c88   rsp: ffff8300a5ca7c78   r8:  0000000001c4f2fc
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000092018   r13: 0000000000800163   r14: fff0000000000000
(XEN) r15: 0000000000000001   cr0: 0000000080050033   cr4: 00000000003406e0
(XEN) cr3: 0000001c42008000   cr2: ffffc9000133d000
(XEN) fsb: 0000000000000000   gsb: ffff888266a00000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d040316f80> (unmap_domain_page+0x2af/0x2e0):
(XEN)  14 04 00 00 eb 19 0f 0b <0f> 0b 0f 0b ba 00 00 00 00 48 89 10 48 8b 81 d0
(XEN) Xen stack trace from rsp=ffff8300a5ca7c78:
(XEN)    ffff820040092018 0000000000000000 ffff8300a5ca7d58 ffff82d040327e20
(XEN)    a000000000000000 0000000000000000 ffff82d0405dbd40 008001e300000000
(XEN)    8000000000000000 8000000000000000 00000000000001e3 00000000000001e3
(XEN)    8000000000000000 0000000000000000 8000000000000163 0000000001440000
(XEN)    ffff82e0014b92e0 0000000301c1a000 0000000000000000 ffff820040090800
(XEN)    00000000026c10d8 0000000001c4f2fc 8010001c4240f067 ffff8300a5ca7df0
(XEN)    ffff82c00071c000 0000000000000001 0000000000001000 ffff8300a5ca7df8
(XEN)    ffff8300a5ca7dc8 ffff82d040232c08 ffff8300a5ca7db8 0000000140088078
(XEN)    ffff8300a5ca7df0 0080016300000001 ffffffff00000000 ffff82c00071c000
(XEN)    ffff82d0405b1300 ffff831c47bf9000 ffff82e04d821ae0 00000000026c10d7
(XEN)    ffff831c47c1a000 0000000000000100 ffff8300a5ca7dd8 ffff82d040232cdb
(XEN)    ffff8300a5ca7df8 ffff82d04031718b ffff8300a5ca7df8 00000000026c10d7
(XEN)    ffff8300a5ca7e38 ffff82d040209cb6 ffff831c47c1a018 0000000000000000
(XEN)    ffffffff82003e90 ffff831c47c1a018 ffff831c47bf9000 fffffffffffffff2
(XEN)    ffff8300a5ca7eb8 ffff82d04020a69a ffff82d04038a228 ffff82d04038a21c
(XEN)    00000000026c10d7 0000000000000100 ffff82d04038a228 ffff82d04038a21c
(XEN)    ffff82d04038a228 ffff82d04038a21c ffff82d04038a228 ffff8300a5ca7ef8
(XEN)    ffff831c47bf9000 0000000000000003 0000000000000000 0000000000000000
(XEN)    ffff8300a5ca7ee8 ffff82d040306e14 ffff82d04038a228 ffff831c47bf9000
(XEN)    0000000000000000 0000000000000000 00007cff5a3580e7 ffff82d04038a29d
(XEN) Xen call trace:
(XEN)    [<ffff82d040316f80>] R unmap_domain_page+0x2af/0x2e0
(XEN)    [<ffff82d040327e20>] F map_pages_to_xen+0x101a/0x1166
(XEN)    [<ffff82d040232c08>] F __vmap+0x332/0x3cd
(XEN)    [<ffff82d040232cdb>] F vmap+0x38/0x3a
(XEN)    [<ffff82d04031718b>] F map_domain_page_global+0x46/0x51
(XEN)    [<ffff82d040209cb6>] F map_vcpu_info+0x129/0x2c5
(XEN)    [<ffff82d04020a69a>] F do_vcpu_op+0x1eb/0x681
(XEN)    [<ffff82d040306e14>] F pv_hypercall+0x4e6/0x53d
(XEN)    [<ffff82d04038a29d>] F lstar_enter+0x12d/0x140
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'hashent->refcnt' failed at domain_page.c:204
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
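Reading the call trace from the bottom up: a PV `VCPUOP` hypercall reaches `map_vcpu_info()`, which calls `map_domain_page_global()`. That call goes through `vmap()` into `map_pages_to_xen()`, and the assertion fires inside an `unmap_domain_page()` call made from there. Roughly speaking, `hashent->refcnt` counts outstanding mappings of a frame cached in the map cache and is decremented on unmap. A failed `hashent->refcnt` check therefore suggests that a mapping was released more times than it was taken. The model below is a hypothetical illustration, not Xen's implementation. The `toy_*` names, the single cached entry, and returning -1 instead of asserting are all assumptions. It shows how an unbalanced unmap reaches that state.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of one map-cache hash entry (not Xen's struct).
 * It tracks a single frame at a time.
 */
struct toy_hashent {
    unsigned long mfn;    /* frame cached in this entry */
    unsigned int refcnt;  /* mappings of that frame still outstanding */
    bool valid;           /* entry holds a frame at all */
};

static struct toy_hashent toy_ent;

/* Map a frame: reuse the entry if it already caches this frame. */
static void toy_map(unsigned long mfn)
{
    /* Model limitation: only replace an entry nobody is using. */
    assert(!toy_ent.valid || toy_ent.mfn == mfn || toy_ent.refcnt == 0);

    if ( toy_ent.valid && toy_ent.mfn == mfn )
    {
        toy_ent.refcnt++;
        return;
    }

    toy_ent.mfn = mfn;
    toy_ent.refcnt = 1;
    toy_ent.valid = true;
}

/*
 * Unmap a frame.  Returns 0 on success, or -1 on error.  The refcnt == 0
 * case is the analogue of Xen's ASSERT(hashent->refcnt) firing: the entry
 * is still cached, but its count has already dropped to zero.
 */
static int toy_unmap(unsigned long mfn)
{
    if ( !toy_ent.valid || toy_ent.mfn != mfn )
        return -1;            /* unknown frame: toy-model error only */
    if ( toy_ent.refcnt == 0 )
        return -1;            /* more unmaps than maps */
    toy_ent.refcnt--;         /* entry stays cached for cheap reuse */
    return 0;
}
```

After one `toy_map(42)`, the first `toy_unmap(42)` returns 0 and leaves the entry cached with `refcnt == 0`. A second `toy_unmap(42)` returns -1, which is the toy counterpart of the panic in the log.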

I don't see an obvious candidate for the breakage.  Unless someone can
point one out quickly, I'll revert the lot to unblock staging.

~Andrew

Re: [PATCH v10 00/13] switch to domheap for Xen page tables
Posted by Hongyan Xia 3 years ago
Please see my reply in 03/13. Can you check this diff and see if you
can still trigger this issue:

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 50229e38d384..84e3ccf47e2a 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5532,7 +5532,6 @@ int map_pages_to_xen(
 
  out:
     L3T_UNLOCK(current_l3page);
-    unmap_domain_page(pl2e);
     unmap_domain_page(pl3e);
     unmap_domain_page(pl2e);
     return rc;
@@ -5830,6 +5829,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
  out:
     L3T_UNLOCK(current_l3page);
     unmap_domain_page(pl3e);
+    unmap_domain_page(pl2e);
     return rc;
 }
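The diff makes every map on the way in match exactly one unmap at the shared `out:` exits. In `map_pages_to_xen()`, `pl2e` was unmapped twice. That fits the trace, which shows an underflowing map-cache count hit from `map_pages_to_xen()`. In `modify_xen_mappings()`, `pl2e` was never unmapped at `out:`, which leaks a mapping instead of crashing. Below is a minimal user-space sketch of the single-exit pattern, not Xen code. The `toy_*` helpers are hypothetical and only count live mappings. The sketch also assumes that unmapping `NULL` is a no-op, as the unconditional unmaps at the `out:` labels suggest, so the label works no matter how far the function got.

```c
#include <assert.h>
#include <stddef.h>

static int toy_live_maps;             /* outstanding toy mappings */
static int toy_bad_unmaps;            /* unmaps attempted with nothing mapped */
static unsigned char toy_pages[2][64];

/* Hypothetical stand-in for map_domain_page(). */
static void *toy_map(unsigned int idx)
{
    toy_live_maps++;
    return toy_pages[idx];
}

/* Hypothetical NULL-tolerant stand-in for unmap_domain_page(). */
static void toy_unmap(void *va)
{
    if ( !va )
        return;
    if ( toy_live_maps == 0 )
        toy_bad_unmaps++;   /* Xen would trip an assertion instead */
    else
        toy_live_maps--;
}

/*
 * Single-exit shape: pointers start as NULL, every failure jumps to out,
 * and out releases each pointer exactly once.
 */
static int toy_update(int fail_after_l3)
{
    void *pl3e = NULL, *pl2e = NULL;
    int rc = -1;

    pl3e = toy_map(0);
    if ( fail_after_l3 )
        goto out;

    pl2e = toy_map(1);
    /* ... walk and modify entries here ... */
    rc = 0;

 out:
    toy_unmap(pl3e);
    toy_unmap(pl2e);   /* exactly once: no duplicate, no omission */
    return rc;
}
```

Adding a second `toy_unmap(pl2e)` at `out:` mimics the `map_pages_to_xen()` bug, because `toy_bad_unmaps` increments. Dropping the `pl2e` unmap mimics the `modify_xen_mappings()` leak, because `toy_live_maps` stays at 1 after the call.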

Hongyan
 
On Thu, 2021-04-22 at 17:21 +0100, Andrew Cooper wrote:
> Staging is broken.  Xen hits an assertion just after dom0 starts.
> 
> (XEN) Assertion 'hashent->refcnt' failed at domain_page.c:204
> [...]
> I don't see an obvious candidate for the breakage.  Unless someone
> can point one out quickly, I'll revert the lot to unblock staging.
> 
> ~Andrew


Re: [PATCH v10 00/13] switch to domheap for Xen page tables
Posted by Julien Grall 3 years ago
Hi Hongyan,

On 22/04/2021 17:35, Hongyan Xia wrote:
> Please see my reply in 03/13. Can you check this diff and see if you
> can still trigger this issue:

I could reproduce the same issue as Andrew. I have applied the patch and
can confirm it resolves the problem. Can you send a formal patch?

BTW, feel free to add my Tested-by.

Cheers,

-- 
Julien Grall

Re: [PATCH v10 00/13] switch to domheap for Xen page tables
Posted by Andrew Cooper 3 years ago
On 22/04/2021 17:35, Hongyan Xia wrote:
> Please see my reply in 03/13. Can you check this diff and see if you
> can still trigger this issue:
>
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 50229e38d384..84e3ccf47e2a 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5532,7 +5532,6 @@ int map_pages_to_xen(
>  
>   out:
>      L3T_UNLOCK(current_l3page);
> -    unmap_domain_page(pl2e);
>      unmap_domain_page(pl3e);
>      unmap_domain_page(pl2e);
>      return rc;
> @@ -5830,6 +5829,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
>   out:
>      L3T_UNLOCK(current_l3page);
>      unmap_domain_page(pl3e);
> +    unmap_domain_page(pl2e);
>      return rc;
>  }

Yup - that seems to fix things.

~Andrew