[PATCH v2 00/18] x86: adventures in Address Space Isolation

Roger Pau Monne posted 18 patches 3 weeks, 6 days ago
Patches applied successfully
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20250108142659.99490-1-roger.pau@citrix.com
[PATCH v2 00/18] x86: adventures in Address Space Isolation
Posted by Roger Pau Monne 3 weeks, 6 days ago
Hello,

The aim of this series is to introduce the functionality required to
create linear mappings visible to a single pCPU.

Doing so requires having a per-vCPU root page-table (L4), and hence
requires shadowing the guest-selected L4 on PV guests.  As a follow-up
(and partially to ensure the per-CPU mappings work fine) the CPU stacks
are switched to use per-CPU mappings, so that remote stack contents are
not by default mapped on all page-tables (note: for this to be true the
directmap entries for the stack pages would also need to be removed).
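
As a rough illustration of what the shadowing amounts to, consider the
following simplified sketch (this is not the code from the series; the
asi_l4_mfn field name is made up, while map_domain_page() and the
ROOT_PAGETABLE_{FIRST,LAST}_XEN_SLOT constants are existing Xen
definitions):

static void sync_vcpu_root_pgtbl(struct vcpu *v, mfn_t guest_l4_mfn)
{
    l4_pgentry_t *shadow = map_domain_page(v->arch.asi_l4_mfn);
    const l4_pgentry_t *guest = map_domain_page(guest_l4_mfn);
    unsigned int i;

    /* Guest-controlled slots are copied verbatim. */
    for ( i = 0; i < ROOT_PAGETABLE_FIRST_XEN_SLOT; i++ )
        shadow[i] = guest[i];
    for ( i = ROOT_PAGETABLE_LAST_XEN_SLOT + 1; i < L4_PAGETABLE_ENTRIES;
          i++ )
        shadow[i] = guest[i];

    /*
     * Xen's slots (per-domain mappings, directmap, linear slots, ...)
     * stay under Xen's control and are never taken from the guest L4.
     */

    unmap_domain_page(guest);
    unmap_domain_page(shadow);
}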

There's one known shortcoming with the presented code: migration of PV
guests using per-vCPU root page-tables is not working.  I need to
introduce extra logic to deal with PV shadow mode when using unique root
page-tables.  I don't think this should block the series, however; such
missing functionality can always be added as follow-up work.
paging_domctl() is adjusted to reflect this restriction.

The main differences compared to v1 are the usage of per-vCPU root
page-tables (as opposed to per-pCPU), and the usage of the existing
perdomain family of functions to manage the mappings in the per-domain
slot, which now becomes per-vCPU.

All patches up to 17 are mostly preparatory.  I think there's a nice
cleanup and generalization of the creation and management of per-domain
mappings, by no longer storing references to L1 page-tables in the vCPU
or domain struct.

Patch 13 introduces the command line option, and would need discussion
and integration with the sparse direct map series.  IMO we should get
consensus on how we want the command line to look ASAP, so that we can
get the basic parsing logic in place to be used by both the work here
and the direct map removal series.
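
For reference, one possible shape for the option in
docs/misc/xen-command-line.pandoc could be the following (the option
name and sub-options here are purely illustrative; settling on the
actual syntax is exactly what needs the consensus mentioned above):

### asi (x86)
> `= List of [ <bool>, {pv,hvm}=<bool> ]`

> Default: `false`

Offer Address Space Isolation support, optionally restricted to PV or
HVM guests only.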

As part of this series the map_domain_page() helpers are also switched
to create per-vCPU mappings (see patch 15).  Such an interface can be
used to hide (map per-vCPU) further data that we don't want to be part
of the direct map, or even to be shared between vCPUs of the same
domain.  Also, all existing users of the interface will already create
per-vCPU mappings without needing additional changes.
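
For clarity, the interface itself doesn't change; a typical caller
looks like the following sketch, the only difference being that after
patch 15 the mapping it gets back is per-vCPU:

/* Existing interface, unchanged by this series (xen/domain_page.h): */
void *map_domain_page(mfn_t mfn);
void unmap_domain_page(const void *va);

/* Typical caller: unmodified, yet now gets a per-vCPU mapping. */
static void zero_first_word(mfn_t mfn)
{
    uint32_t *ptr = map_domain_page(mfn);

    ptr[0] = 0;
    unmap_domain_page(ptr);
}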

Note that none of the logic introduced in the series removes entries
from the directmap, so even when creating the per-CPU mappings the
underlying physical addresses remain fully accessible through their
directmap entries.

I also haven't done any benchmarking.  It doesn't seem to cripple
performance to the point that XenRT jobs would time out before
finishing; that's the only objective reference I can provide at the
moment.

The series has been extensively tested on XenRT, but that doesn't cover
all possible use-cases, so it's likely to still have some rough edges;
handle with care.

Thanks, Roger.

Roger Pau Monne (18):
  x86/mm: purge unneeded destroy_perdomain_mapping()
  x86/domain: limit window where curr_vcpu != current on context switch
  x86/mm: introduce helper to detect per-domain L1 entries that need
    freeing
  x86/pv: introduce function to populate perdomain area and use it to
    map Xen GDT
  x86/mm: switch destroy_perdomain_mapping() parameter from domain to
    vCPU
  x86/pv: set/clear guest GDT mappings using
    {populate,destroy}_perdomain_mapping()
  x86/pv: update guest LDT mappings using the linear entries
  x86/pv: remove stashing of GDT/LDT L1 page-tables
  x86/mm: simplify create_perdomain_mapping() interface
  x86/mm: switch {create,destroy}_perdomain_mapping() domain parameter
    to vCPU
  x86/pv: untie issuing FLUSH_ROOT_PGTBL from XPTI
  x86/mm: move FLUSH_ROOT_PGTBL handling before TLB flush
  x86/spec-ctrl: introduce Address Space Isolation command line option
  x86/mm: introduce per-vCPU L3 page-table
  x86/mm: introduce a per-vCPU mapcache when using ASI
  x86/pv: allow using a unique per-pCPU root page table (L4)
  x86/mm: switch to a per-CPU mapped stack when using ASI
  x86/mm: zero stack on context switch

 docs/misc/xen-command-line.pandoc    |  24 +++
 xen/arch/x86/cpu/mcheck/mce.c        |   4 +
 xen/arch/x86/domain.c                | 157 +++++++++++----
 xen/arch/x86/domain_page.c           | 105 ++++++----
 xen/arch/x86/flushtlb.c              |  28 ++-
 xen/arch/x86/hvm/hvm.c               |   6 -
 xen/arch/x86/include/asm/config.h    |  16 +-
 xen/arch/x86/include/asm/current.h   |  58 +++++-
 xen/arch/x86/include/asm/desc.h      |   6 +-
 xen/arch/x86/include/asm/domain.h    |  50 +++--
 xen/arch/x86/include/asm/flushtlb.h  |   2 +-
 xen/arch/x86/include/asm/mm.h        |  15 +-
 xen/arch/x86/include/asm/processor.h |   5 +
 xen/arch/x86/include/asm/pv/mm.h     |   5 +
 xen/arch/x86/include/asm/smp.h       |  12 ++
 xen/arch/x86/include/asm/spec_ctrl.h |   4 +
 xen/arch/x86/mm.c                    | 291 +++++++++++++++++++++------
 xen/arch/x86/mm/hap/hap.c            |   2 +-
 xen/arch/x86/mm/paging.c             |   6 +
 xen/arch/x86/mm/shadow/hvm.c         |   2 +-
 xen/arch/x86/mm/shadow/multi.c       |   2 +-
 xen/arch/x86/pv/descriptor-tables.c  |  47 ++---
 xen/arch/x86/pv/dom0_build.c         |  12 +-
 xen/arch/x86/pv/domain.c             |  57 ++++--
 xen/arch/x86/pv/mm.c                 |  43 +++-
 xen/arch/x86/setup.c                 |  32 ++-
 xen/arch/x86/smp.c                   |  39 ++++
 xen/arch/x86/smpboot.c               |  26 ++-
 xen/arch/x86/spec_ctrl.c             | 205 ++++++++++++++++++-
 xen/arch/x86/traps.c                 |  25 ++-
 xen/arch/x86/x86_64/mm.c             |   7 +-
 xen/common/smp.c                     |  10 +
 xen/common/stop_machine.c            |  10 +
 xen/include/xen/smp.h                |   8 +
 34 files changed, 1052 insertions(+), 269 deletions(-)

-- 
2.46.0
Re: [PATCH v2 00/18] x86: adventures in Address Space Isolation
Posted by Jan Beulich 3 weeks ago
On 08.01.2025 15:26, Roger Pau Monne wrote:
> Hello,
> 
> The aim of this series is to introduce the functionality required to
> create linear mappings visible to a single pCPU.
> 
> Doing so requires having a per-vCPU root page-table (L4), and hence
> requires shadowing the guest-selected L4 on PV guests.  As a follow-up
> (and partially to ensure the per-CPU mappings work fine) the CPU stacks
> are switched to use per-CPU mappings, so that remote stack contents are
> not by default mapped on all page-tables (note: for this to be true the
> directmap entries for the stack pages would also need to be removed).
> 
> There's one known shortcoming with the presented code: migration of PV
> guests using per-vCPU root page-tables is not working.  I need to
> introduce extra logic to deal with PV shadow mode when using unique root
> page-tables.  I don't think this should block the series, however; such
> missing functionality can always be added as follow-up work.
> paging_domctl() is adjusted to reflect this restriction.
> 
> The main differences compared to v1 are the usage of per-vCPU root
> page-tables (as opposed to per-pCPU), and the usage of the existing
> perdomain family of functions to manage the mappings in the per-domain
> slot, which now becomes per-vCPU.
> 
> All patches up to 17 are mostly preparatory.  I think there's a nice
> cleanup and generalization of the creation and management of per-domain
> mappings, by no longer storing references to L1 page-tables in the vCPU
> or domain struct.

Since you referred me to the cover letter, I've looked back here after
making some more progress with the series. Along with my earlier comment
towards the need or ultimate goal, ...

> Patch 13 introduces the command line option, and would need discussion
> and integration with the sparse direct map series.  IMO we should get
> consensus on how we want the command line to look ASAP, so that we can
> get the basic parsing logic in place to be used by both the work here
> and the direct map removal series.
> 
> As part of this series the map_domain_page() helpers are also switched
> to create per-vCPU mappings (see patch 15).  Such an interface can be
> used to hide (map per-vCPU) further data that we don't want to be part
> of the direct map, or even to be shared between vCPUs of the same
> domain.  Also, all existing users of the interface will already create
> per-vCPU mappings without needing additional changes.
> 
> Note that none of the logic introduced in the series removes entries
> from the directmap, so even when creating the per-CPU mappings the
> underlying physical addresses remain fully accessible through their
> directmap entries.
> 
> I also haven't done any benchmarking.  It doesn't seem to cripple
> performance to the point that XenRT jobs would time out before
> finishing; that's the only objective reference I can provide at the
> moment.
> 
> The series has been extensively tested on XenRT, but that doesn't cover
> all possible use-cases, so it's likely to still have some rough edges;
> handle with care.
> 
> Thanks, Roger.
> 
> Roger Pau Monne (18):
>   x86/mm: purge unneeded destroy_perdomain_mapping()
>   x86/domain: limit window where curr_vcpu != current on context switch
>   x86/mm: introduce helper to detect per-domain L1 entries that need
>     freeing
>   x86/pv: introduce function to populate perdomain area and use it to
>     map Xen GDT
>   x86/mm: switch destroy_perdomain_mapping() parameter from domain to
>     vCPU
>   x86/pv: set/clear guest GDT mappings using
>     {populate,destroy}_perdomain_mapping()
>   x86/pv: update guest LDT mappings using the linear entries
>   x86/pv: remove stashing of GDT/LDT L1 page-tables
>   x86/mm: simplify create_perdomain_mapping() interface
>   x86/mm: switch {create,destroy}_perdomain_mapping() domain parameter
>     to vCPU
>   x86/pv: untie issuing FLUSH_ROOT_PGTBL from XPTI
>   x86/mm: move FLUSH_ROOT_PGTBL handling before TLB flush
>   x86/spec-ctrl: introduce Address Space Isolation command line option
>   x86/mm: introduce per-vCPU L3 page-table
>   x86/mm: introduce a per-vCPU mapcache when using ASI
>   x86/pv: allow using a unique per-pCPU root page table (L4)
>   x86/mm: switch to a per-CPU mapped stack when using ASI
>   x86/mm: zero stack on context switch
> 
>  docs/misc/xen-command-line.pandoc    |  24 +++
>  xen/arch/x86/cpu/mcheck/mce.c        |   4 +
>  xen/arch/x86/domain.c                | 157 +++++++++++----
>  xen/arch/x86/domain_page.c           | 105 ++++++----
>  xen/arch/x86/flushtlb.c              |  28 ++-
>  xen/arch/x86/hvm/hvm.c               |   6 -
>  xen/arch/x86/include/asm/config.h    |  16 +-
>  xen/arch/x86/include/asm/current.h   |  58 +++++-
>  xen/arch/x86/include/asm/desc.h      |   6 +-
>  xen/arch/x86/include/asm/domain.h    |  50 +++--
>  xen/arch/x86/include/asm/flushtlb.h  |   2 +-
>  xen/arch/x86/include/asm/mm.h        |  15 +-
>  xen/arch/x86/include/asm/processor.h |   5 +
>  xen/arch/x86/include/asm/pv/mm.h     |   5 +
>  xen/arch/x86/include/asm/smp.h       |  12 ++
>  xen/arch/x86/include/asm/spec_ctrl.h |   4 +
>  xen/arch/x86/mm.c                    | 291 +++++++++++++++++++++------
>  xen/arch/x86/mm/hap/hap.c            |   2 +-
>  xen/arch/x86/mm/paging.c             |   6 +
>  xen/arch/x86/mm/shadow/hvm.c         |   2 +-
>  xen/arch/x86/mm/shadow/multi.c       |   2 +-
>  xen/arch/x86/pv/descriptor-tables.c  |  47 ++---
>  xen/arch/x86/pv/dom0_build.c         |  12 +-
>  xen/arch/x86/pv/domain.c             |  57 ++++--
>  xen/arch/x86/pv/mm.c                 |  43 +++-
>  xen/arch/x86/setup.c                 |  32 ++-
>  xen/arch/x86/smp.c                   |  39 ++++
>  xen/arch/x86/smpboot.c               |  26 ++-
>  xen/arch/x86/spec_ctrl.c             | 205 ++++++++++++++++++-
>  xen/arch/x86/traps.c                 |  25 ++-
>  xen/arch/x86/x86_64/mm.c             |   7 +-
>  xen/common/smp.c                     |  10 +
>  xen/common/stop_machine.c            |  10 +
>  xen/include/xen/smp.h                |   8 +
>  34 files changed, 1052 insertions(+), 269 deletions(-)

... this diffstat (even after subtracting out the contribution of the last two
patches in the series) doesn't really look like a cleanup / simplification.
Things becoming slightly slower (because of the L1 no longer being
directly available to modify) may, otoh, not be a significant issue, if
we assume that GDT/LDT manipulation isn't normally a very frequent
operation.

IOW my earlier request stands: Can you please try to make more clear (in the
patch descriptions) what exactly the motivation for these changes is? Just
doing things differently with more code overall can't be it, I don't think.

Jan
Re: [PATCH v2 00/18] x86: adventures in Address Space Isolation
Posted by Roger Pau Monné 2 weeks, 4 days ago
On Tue, Jan 14, 2025 at 05:20:04PM +0100, Jan Beulich wrote:
> On 08.01.2025 15:26, Roger Pau Monne wrote:
> > Hello,
> > 
> > The aim of this series is to introduce the functionality required to
> > create linear mappings visible to a single pCPU.
> > 
> > Doing so requires having a per-vCPU root page-table (L4), and hence
> > requires shadowing the guest-selected L4 on PV guests.  As a follow-up
> > (and partially to ensure the per-CPU mappings work fine) the CPU stacks
> > are switched to use per-CPU mappings, so that remote stack contents are
> > not by default mapped on all page-tables (note: for this to be true the
> > directmap entries for the stack pages would also need to be removed).
> > 
> > There's one known shortcoming with the presented code: migration of PV
> > guests using per-vCPU root page-tables is not working.  I need to
> > introduce extra logic to deal with PV shadow mode when using unique root
> > page-tables.  I don't think this should block the series, however; such
> > missing functionality can always be added as follow-up work.
> > paging_domctl() is adjusted to reflect this restriction.
> > 
> > The main differences compared to v1 are the usage of per-vCPU root
> > page-tables (as opposed to per-pCPU), and the usage of the existing
> > perdomain family of functions to manage the mappings in the per-domain
> > slot, which now becomes per-vCPU.
> > 
> > All patches up to 17 are mostly preparatory.  I think there's a nice
> > cleanup and generalization of the creation and management of per-domain
> > mappings, by no longer storing references to L1 page-tables in the vCPU
> > or domain struct.
> 
> Since you referred me to the cover letter, I've looked back here after
> making some more progress with the series. Along with my earlier comment
> towards the need or ultimate goal, ...
> 
> > Patch 13 introduces the command line option, and would need discussion
> > and integration with the sparse direct map series.  IMO we should get
> > consensus on how we want the command line to look ASAP, so that we can
> > get the basic parsing logic in place to be used by both the work here
> > and the direct map removal series.
> > 
> > As part of this series the map_domain_page() helpers are also switched
> > to create per-vCPU mappings (see patch 15).  Such an interface can be
> > used to hide (map per-vCPU) further data that we don't want to be part
> > of the direct map, or even to be shared between vCPUs of the same
> > domain.  Also, all existing users of the interface will already create
> > per-vCPU mappings without needing additional changes.
> > 
> > Note that none of the logic introduced in the series removes entries
> > from the directmap, so even when creating the per-CPU mappings the
> > underlying physical addresses remain fully accessible through their
> > directmap entries.
> > 
> > I also haven't done any benchmarking.  It doesn't seem to cripple
> > performance to the point that XenRT jobs would time out before
> > finishing; that's the only objective reference I can provide at the
> > moment.
> > 
> > The series has been extensively tested on XenRT, but that doesn't cover
> > all possible use-cases, so it's likely to still have some rough edges;
> > handle with care.
> > 
> > Thanks, Roger.
> > 
> > Roger Pau Monne (18):
> >   x86/mm: purge unneeded destroy_perdomain_mapping()
> >   x86/domain: limit window where curr_vcpu != current on context switch
> >   x86/mm: introduce helper to detect per-domain L1 entries that need
> >     freeing
> >   x86/pv: introduce function to populate perdomain area and use it to
> >     map Xen GDT
> >   x86/mm: switch destroy_perdomain_mapping() parameter from domain to
> >     vCPU
> >   x86/pv: set/clear guest GDT mappings using
> >     {populate,destroy}_perdomain_mapping()
> >   x86/pv: update guest LDT mappings using the linear entries
> >   x86/pv: remove stashing of GDT/LDT L1 page-tables
> >   x86/mm: simplify create_perdomain_mapping() interface
> >   x86/mm: switch {create,destroy}_perdomain_mapping() domain parameter
> >     to vCPU
> >   x86/pv: untie issuing FLUSH_ROOT_PGTBL from XPTI
> >   x86/mm: move FLUSH_ROOT_PGTBL handling before TLB flush
> >   x86/spec-ctrl: introduce Address Space Isolation command line option
> >   x86/mm: introduce per-vCPU L3 page-table
> >   x86/mm: introduce a per-vCPU mapcache when using ASI
> >   x86/pv: allow using a unique per-pCPU root page table (L4)
> >   x86/mm: switch to a per-CPU mapped stack when using ASI
> >   x86/mm: zero stack on context switch
> > 
> >  docs/misc/xen-command-line.pandoc    |  24 +++
> >  xen/arch/x86/cpu/mcheck/mce.c        |   4 +
> >  xen/arch/x86/domain.c                | 157 +++++++++++----
> >  xen/arch/x86/domain_page.c           | 105 ++++++----
> >  xen/arch/x86/flushtlb.c              |  28 ++-
> >  xen/arch/x86/hvm/hvm.c               |   6 -
> >  xen/arch/x86/include/asm/config.h    |  16 +-
> >  xen/arch/x86/include/asm/current.h   |  58 +++++-
> >  xen/arch/x86/include/asm/desc.h      |   6 +-
> >  xen/arch/x86/include/asm/domain.h    |  50 +++--
> >  xen/arch/x86/include/asm/flushtlb.h  |   2 +-
> >  xen/arch/x86/include/asm/mm.h        |  15 +-
> >  xen/arch/x86/include/asm/processor.h |   5 +
> >  xen/arch/x86/include/asm/pv/mm.h     |   5 +
> >  xen/arch/x86/include/asm/smp.h       |  12 ++
> >  xen/arch/x86/include/asm/spec_ctrl.h |   4 +
> >  xen/arch/x86/mm.c                    | 291 +++++++++++++++++++++------
> >  xen/arch/x86/mm/hap/hap.c            |   2 +-
> >  xen/arch/x86/mm/paging.c             |   6 +
> >  xen/arch/x86/mm/shadow/hvm.c         |   2 +-
> >  xen/arch/x86/mm/shadow/multi.c       |   2 +-
> >  xen/arch/x86/pv/descriptor-tables.c  |  47 ++---
> >  xen/arch/x86/pv/dom0_build.c         |  12 +-
> >  xen/arch/x86/pv/domain.c             |  57 ++++--
> >  xen/arch/x86/pv/mm.c                 |  43 +++-
> >  xen/arch/x86/setup.c                 |  32 ++-
> >  xen/arch/x86/smp.c                   |  39 ++++
> >  xen/arch/x86/smpboot.c               |  26 ++-
> >  xen/arch/x86/spec_ctrl.c             | 205 ++++++++++++++++++-
> >  xen/arch/x86/traps.c                 |  25 ++-
> >  xen/arch/x86/x86_64/mm.c             |   7 +-
> >  xen/common/smp.c                     |  10 +
> >  xen/common/stop_machine.c            |  10 +
> >  xen/include/xen/smp.h                |   8 +
> >  34 files changed, 1052 insertions(+), 269 deletions(-)
> 
> ... this diffstat (even after subtracting out the contribution of the last two
> patches in the series) doesn't really look like a cleanup / simplification.

To be fair you would need to subtract the contribution of the last 8
patches, as all those are strictly related to ASI.  The perdomain
mapping interface cleanup is just the first 10 patches, which leaves
a diffstat of:

 xen/arch/x86/domain.c                |  81 ++++++++++++----
 xen/arch/x86/domain_page.c           |  19 ++--
 xen/arch/x86/hvm/hvm.c               |   6 --
 xen/arch/x86/include/asm/desc.h      |   6 +-
 xen/arch/x86/include/asm/domain.h    |  13 +--
 xen/arch/x86/include/asm/mm.h        |  11 ++-
 xen/arch/x86/include/asm/processor.h |   5 +
 xen/arch/x86/mm.c                    | 175 ++++++++++++++++++++++++++---------
 xen/arch/x86/pv/descriptor-tables.c  |  47 +++++-----
 xen/arch/x86/pv/domain.c             |  24 ++---
 xen/arch/x86/pv/mm.c                 |   3 +-
 xen/arch/x86/smpboot.c               |   6 +-
 xen/arch/x86/traps.c                 |  12 +--
 xen/arch/x86/x86_64/mm.c             |   7 +-
 14 files changed, 260 insertions(+), 155 deletions(-)

That's including the context switch change and not differentiating
between lines of code vs comments.

However, I don't think cleanups / simplifications should be judged
purely on diffstat LoC.  Arguably the current set of
create_perdomain_mapping() parameters is not the most obvious one:

int create_perdomain_mapping(struct domain *d, unsigned long va,
                             unsigned int nr, l1_pgentry_t **pl1tab,
                             struct page_info **ppg);

Compared to the result after the first 10 patches in the series:

int create_perdomain_mapping(struct vcpu *v, unsigned long va,
                             unsigned int nr, bool populate);

That's together with the fact that callers no longer need to keep a
reference to the L1 table(s) in order to populate such areas.
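
As a rough before/after sketch of a caller (variable declarations
omitted; the populate_perdomain_mapping() signature here is paraphrased
rather than quoted from the series, and the GDT example is simplified):

/* Before: the caller holds on to the L1 table and fills entries itself. */
rc = create_perdomain_mapping(d, GDT_VIRT_START(v), NR_RESERVED_GDT_PAGES,
                              &gdt_l1tab, NULL);
if ( !rc )
    l1e_write(&gdt_l1tab[0], l1e_from_mfn(mfn, __PAGE_HYPERVISOR_RW));

/* After: no L1 reference to stash; populate through the interface. */
rc = create_perdomain_mapping(v, GDT_VIRT_START(v), NR_RESERVED_GDT_PAGES,
                              false);
if ( !rc )
    populate_perdomain_mapping(v, GDT_VIRT_START(v), &mfn, 1);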

> Things becoming slightly slower (because of the L1 no longer being
> directly available to modify) may, otoh, not be a significant issue, if
> we assume that GDT/LDT manipulation isn't normally a very frequent
> operation.

I introduce a fast path in both {populate,destroy}_perdomain_mapping()
that uses the recursive linear slot for manipulating the L1
page-table.  There's still a slow path that relies on walking the
page-tables, but that should only be used when the vCPU is not
running, and hence the added latency shouldn't be too critical.
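
For reference, the fast path boils down to something like the following
(a simplified sketch; the real code additionally checks that the vCPU's
page-tables are the ones currently active):

static void set_perdomain_l1e_fast(unsigned long va, mfn_t mfn)
{
    /*
     * Through the recursive linear slot the L1 entry covering va can
     * be written directly, without walking the page-table hierarchy.
     */
    l1_pgentry_t *pl1e = &__linear_l1_table[l1_linear_offset(va)];

    l1e_write(pl1e, l1e_from_mfn(mfn, __PAGE_HYPERVISOR_RW));
    flush_tlb_one_local(va);
}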

As a side-effect of this logic, the pages allocated for the per-domain
region page-tables can now uniformly come from domheap.  Xenheap pages
are no longer needed for the L1 page-tables once those aren't stashed
in the domain structure.
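
In other words, allocation of such page-table pages can now follow the
usual domheap pattern, along the lines of this sketch (the helper name
is made up for illustration):

static l1_pgentry_t *alloc_perdomain_l1(void)
{
    /* Page-table page from domheap, accessed via a transient mapping. */
    struct page_info *pg = alloc_domheap_page(NULL, 0);
    l1_pgentry_t *l1t;

    if ( !pg )
        return NULL;

    l1t = map_domain_page(page_to_mfn(pg));
    clear_page(l1t);
    /* Caller fills in entries and unmaps with unmap_domain_page(). */
    return l1t;
}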

> IOW my earlier request stands: Can you please try to make more clear (in the
> patch descriptions) what exactly the motivation for these changes is? Just
> doing things differently with more code overall can't be it, I don't think.

The main motivation for the change is to stop stashing L1 page-tables
for the per-domain area(s) in the domain struct: with the introduction
of ASI, Xen would also need to stash L1 page-tables for the per-domain
area in the vcpu struct, as a result of the per-domain slot becoming
per-vCPU.  IMO managing such references, and having logic to deal with
both domain and vcpu L1 page-tables, is too complex and error-prone.

Instead I propose to add an interface,
{populate,destroy}_perdomain_mapping(), that can be used to manage
mappings in the per-domain area with callers being completely unaware
of whether the domain is running with per-vCPU mappings or not.

Please let me know if you are happy with the reasoning and arguments
provided.  I think the resulting perdomain mapping interface is much
better than what Xen currently does for manipulating per-domain
mapping entries, but I might be biased.

Thanks, Roger.