[PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange

[PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Stefano Stabellini 7 months, 3 weeks ago
From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>

Dom0 PVH might need XENMEM_exchange when passing contiguous memory
addresses to firmware or co-processors not behind an IOMMU.

XENMEM_exchange was blocked for HVM/PVH DomUs, and accidentally it
impacted Dom0 PVH as well.

Permit Dom0 PVH to call XENMEM_exchange while leaving it blocked for
HVM/PVH DomUs.

Signed-off-by: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 1cf2365167..e995944333 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4401,7 +4401,7 @@ int steal_page(
     const struct domain *owner;
     int rc;
 
-    if ( paging_mode_external(d) )
+    if ( paging_mode_external(d) && !is_hardware_domain(d) )
         return -EOPNOTSUPP;
 
     /* Grab a reference to make sure the page doesn't change under our feet */
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 8ca4e1a842..796eec081b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -794,7 +794,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
             rc = guest_physmap_add_page(d, _gfn(gpfn), mfn,
                                         exch.out.extent_order) ?: rc;
 
-            if ( !paging_mode_translate(d) &&
+            if ( (!paging_mode_translate(d) || is_hardware_domain(d)) &&
                  __copy_mfn_to_guest_offset(exch.out.extent_start,
                                             (i << out_chunk_order) + j,
                                             mfn) )
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Teddy Astie 7 months, 3 weeks ago
Hello Stefano,

On 25/04/2025 at 22:21, Stefano Stabellini wrote:
> From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> 
> Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> addresses to firmware or co-processors not behind an IOMMU.
> 
> XENMEM_exchange was blocked for HVM/PVH DomUs, and accidentally it
> impacted Dom0 PVH as well.
> 
> Permit Dom0 PVH to call XENMEM_exchange while leaving it blocked for
> HVM/PVH DomUs.
> 

In addition to Jan's remarks, I think it wants some additional
clarifications on the hypercall interface. public/memory.h indicates
"only PV guests can use this operation", so the interface is actually
unspecified for HVM guests (e.g. what are MFN/GMFN in this case?).

> Signed-off-by: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
> 
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 1cf2365167..e995944333 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4401,7 +4401,7 @@ int steal_page(
>       const struct domain *owner;
>       int rc;
>   
> -    if ( paging_mode_external(d) )
> +    if ( paging_mode_external(d) && !is_hardware_domain(d) )
>           return -EOPNOTSUPP;
>   
>       /* Grab a reference to make sure the page doesn't change under our feet */
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 8ca4e1a842..796eec081b 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -794,7 +794,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>               rc = guest_physmap_add_page(d, _gfn(gpfn), mfn,
>                                           exch.out.extent_order) ?: rc;
>   
> -            if ( !paging_mode_translate(d) &&
> +            if ( (!paging_mode_translate(d) || is_hardware_domain(d)) &&
>                    __copy_mfn_to_guest_offset(exch.out.extent_start,
>                                               (i << out_chunk_order) + j,
>                                               mfn) )
> 

Teddy


Vates

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Stefano Stabellini 7 months, 3 weeks ago
On Mon, 28 Apr 2025, Teddy Astie wrote:
> Hello Stefano,
> 
> On 25/04/2025 at 22:21, Stefano Stabellini wrote:
> > From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> >
> > Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> > addresses to firmware or co-processors not behind an IOMMU.
> >
> > XENMEM_exchange was blocked for HVM/PVH DomUs, and accidentally it
> > impacted Dom0 PVH as well.
> >
> > Permit Dom0 PVH to call XENMEM_exchange while leaving it blocked for
> > HVM/PVH DomUs.
> >
> 
> In addition to Jan's remarks, I think it wants some additional
> clarifications on the hypercall interface. public/memory.h indicates
> "only PV guests can use this operation", so the interface is actually
> unspecified about HVM guests (e.g what are MFN/GMFN in this case ?).

That is a new addition from c08a11ab98c. If you see fae7d5be8bb, there
is a statement that "we permitted this operation".


> > Signed-off-by: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> > Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
> >
> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> > index 1cf2365167..e995944333 100644
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -4401,7 +4401,7 @@ int steal_page(
> >       const struct domain *owner;
> >       int rc;
> >
> > -    if ( paging_mode_external(d) )
> > +    if ( paging_mode_external(d) && !is_hardware_domain(d) )
> >           return -EOPNOTSUPP;
> >
> >       /* Grab a reference to make sure the page doesn't change under our feet */
> > diff --git a/xen/common/memory.c b/xen/common/memory.c
> > index 8ca4e1a842..796eec081b 100644
> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -794,7 +794,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
> >               rc = guest_physmap_add_page(d, _gfn(gpfn), mfn,
> >                                           exch.out.extent_order) ?: rc;
> >
> > -            if ( !paging_mode_translate(d) &&
> > +            if ( (!paging_mode_translate(d) || is_hardware_domain(d)) &&
> >                    __copy_mfn_to_guest_offset(exch.out.extent_start,
> >                                               (i << out_chunk_order) + j,
> >                                               mfn) )
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Jan Beulich 7 months, 3 weeks ago
On 28.04.2025 22:02, Stefano Stabellini wrote:
> On Mon, 28 Apr 2025, Teddy Astie wrote:
>> Hello Stefano,
>>
>> On 25/04/2025 at 22:21, Stefano Stabellini wrote:
>>> From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
>>>
>>> Dom0 PVH might need XENMEM_exchange when passing contiguous memory
>>> addresses to firmware or co-processors not behind an IOMMU.
>>>
>>> XENMEM_exchange was blocked for HVM/PVH DomUs, and accidentally it
>>> impacted Dom0 PVH as well.
>>>
>>> Permit Dom0 PVH to call XENMEM_exchange while leaving it blocked for
>>> HVM/PVH DomUs.
>>>
>>
>> In addition to Jan's remarks, I think it wants some additional
>> clarifications on the hypercall interface. public/memory.h indicates
>> "only PV guests can use this operation", so the interface is actually
>> unspecified about HVM guests (e.g what are MFN/GMFN in this case ?).
> 
> That is a new addition from c08a11ab98c.

Later addition or not, I agree with Teddy that it wants amending.

Jan

> If you see fae7d5be8bb, there
> is a statement that "we permitted this operation".


Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Jan Beulich 7 months, 3 weeks ago
On 25.04.2025 22:19, Stefano Stabellini wrote:
> From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> 
> Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> addresses to firmware or co-processors not behind an IOMMU.

I definitely don't understand the firmware part: It's subject to the
same transparent P2M translations as the rest of the VM; it's just
another piece of software running there.

"Co-processors not behind an IOMMU" is also interesting; a more
concrete scenario might be nice, yet I realize you may be limited in
what you're allowed to say.

> XENMEM_exchange was blocked for HVM/PVH DomUs, and accidentally it
> impacted Dom0 PVH as well.

This wasn't accidental at all, I don't think.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4401,7 +4401,7 @@ int steal_page(
>      const struct domain *owner;
>      int rc;
>  
> -    if ( paging_mode_external(d) )
> +    if ( paging_mode_external(d) && !is_hardware_domain(d) )
>          return -EOPNOTSUPP;
>  
>      /* Grab a reference to make sure the page doesn't change under our feet */

Is this (in particular the code following below here) a safe thing to do
when we don't properly refcount page references from the P2M, yet? It's
Dom0, yes, but even there I might see potential security implications (such
as violating the privacy of a guest).

Furthermore cleanup_page_mappings() (called later in the function) has a
PV-only aspect which would apparently need widening to PVH Dom0 then,
too.

> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -794,7 +794,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>              rc = guest_physmap_add_page(d, _gfn(gpfn), mfn,
>                                          exch.out.extent_order) ?: rc;
>  
> -            if ( !paging_mode_translate(d) &&
> +            if ( (!paging_mode_translate(d) || is_hardware_domain(d)) &&
>                   __copy_mfn_to_guest_offset(exch.out.extent_start,
>                                              (i << out_chunk_order) + j,
>                                              mfn) )

Wait, no: A PVH domain (Dom0 or not) can't very well make use of MFNs, can
it?

Jan
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Stefano Stabellini 7 months, 3 weeks ago
On Mon, 28 Apr 2025, Jan Beulich wrote:
> On 25.04.2025 22:19, Stefano Stabellini wrote:
> > From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> > 
> > Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> > addresses to firmware or co-processors not behind an IOMMU.
> 
> I definitely don't understand the firmware part: It's subject to the
> same transparent P2M translations as the rest of the VM; it's just
> another piece of software running there.
> 
> "Co-processors not behind an IOMMU" is also interesting; a more
> concrete scenario might be nice, yet I realize you may be limited in
> what you're allowed to say.

Sure. On AMD x86 platforms there is a co-processor called PSP running
TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
pass addresses to it.  See drivers/tee/amdtee/ and
include/linux/psp-tee.h in Linux.

This is not a new problem. On ARM we have been dealing with this kind of
issue for more than a decade, and it is the reason why originally the
decision was to run Dom0 1:1 mapped on ARM. I am not suggesting to map
Dom0 PVH 1:1; I am only providing context and highlighting that we
have been lucky on x86 :-)


> > XENMEM_exchange was blocked for HVM/PVH DomUs, and accidentally it
> > impacted Dom0 PVH as well.
> 
> This wasn't accidental at all, I don't think.

You as the original author of the patch (fae7d5be8bb) in question would
surely know better. But the way the commit message was written was
pointing toward a Dom0/DeviceModel problem:

"The operation's success can't be controlled by the guest, as the device
model may have an active mapping of the page."


> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -4401,7 +4401,7 @@ int steal_page(
> >      const struct domain *owner;
> >      int rc;
> >  
> > -    if ( paging_mode_external(d) )
> > +    if ( paging_mode_external(d) && !is_hardware_domain(d) )
> >          return -EOPNOTSUPP;
> >  
> >      /* Grab a reference to make sure the page doesn't change under our feet */
> 
> Is this (in particular the code following below here) a safe thing to do
> when we don't properly refcount page references from the P2M, yet? It's
> Dom0, yes, but even there I might see potential security implications (as
> top violating privacy of a guest).

I don't think I am following; could you please elaborate? The change I
am proposing is to allow Dom0 to share its own pages with the
co-processor. DomUs are not in the picture. I would be happy to add a
further restriction to that effect. Is there something else you have in
mind?


> Furthermore cleanup_page_mappings() (called later in the function) has a
> PV-only aspect which would apparently need widening to PVH Dom0 then,
> too.

You are referring to:

        if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) )
            rc = iommu_legacy_unmap(d, _dfn(mfn), 1u << PAGE_ORDER_4K);

is that correct?


> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -794,7 +794,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
> >              rc = guest_physmap_add_page(d, _gfn(gpfn), mfn,
> >                                          exch.out.extent_order) ?: rc;
> >  
> > -            if ( !paging_mode_translate(d) &&
> > +            if ( (!paging_mode_translate(d) || is_hardware_domain(d)) &&
> >                   __copy_mfn_to_guest_offset(exch.out.extent_start,
> >                                              (i << out_chunk_order) + j,
> >                                              mfn) )
> 
> Wait, no: A PVH domain (Dom0 or not) can't very well make use of MFNs, can
> it?

One way or another Dom0 PVH needs to know the MFN to pass it to the
co-processor.
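
For illustration, a minimal sketch of the guest side of such an exchange,
loosely modelled on Linux's xen_create_contiguous_region(); the wrapper
name, the frame bookkeeping and the error handling are simplified and
hypothetical, not part of the patch:

    #include <linux/errno.h>
    #include <xen/interface/memory.h>   /* struct xen_memory_exchange */
    #include <asm/xen/hypercall.h>      /* HYPERVISOR_memory_op() */

    /*
     * Trade 2^order single frames (gfns[] must hold 1 << order entries)
     * for one physically contiguous machine extent, and learn its base
     * MFN so it can be handed to a device that bypasses the IOMMU.
     * With the patch above, a PVH Dom0 would get the MFN copied back
     * into out_frame; today only PV guests do.
     */
    static int exchange_for_contiguous(xen_pfn_t *gfns, unsigned int order,
                                       xen_pfn_t *out_frame)
    {
        struct xen_memory_exchange exchange = {
            .in = {
                .nr_extents   = 1UL << order,
                .extent_order = 0,          /* give up single frames */
                .domid        = DOMID_SELF,
            },
            .out = {
                .nr_extents   = 1,
                .extent_order = order,      /* ask for one contiguous chunk */
                .domid        = DOMID_SELF,
            },
        };
        int rc;

        set_xen_guest_handle(exchange.in.extent_start, gfns);
        set_xen_guest_handle(exchange.out.extent_start, out_frame);

        rc = HYPERVISOR_memory_op(XENMEM_exchange, &exchange);
        if (rc)
            return rc;

        /* Partial success is possible; callers must check nr_exchanged. */
        return exchange.nr_exchanged == exchange.in.nr_extents ? 0 : -ENOMEM;
    }
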
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Marek Marczykowski-Górecki 7 months, 2 weeks ago
On Mon, Apr 28, 2025 at 01:00:01PM -0700, Stefano Stabellini wrote:
> On Mon, 28 Apr 2025, Jan Beulich wrote:
> > On 25.04.2025 22:19, Stefano Stabellini wrote:
> > > From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> > > 
> > > Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> > > addresses to firmware or co-processors not behind an IOMMU.
> > 
> > I definitely don't understand the firmware part: It's subject to the
> > same transparent P2M translations as the rest of the VM; it's just
> > another piece of software running there.
> > 
> > "Co-processors not behind an IOMMU" is also interesting; a more
> > concrete scenario might be nice, yet I realize you may be limited in
> > what you're allowed to say.
> 
> Sure. On AMD x86 platforms there is a co-processor called PSP running
> TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
> pass addresses to it.  See drivers/tee/amdtee/ and
> include/linux/psp-tee.h in Linux.

We had (have?) similar issue with amdgpu (for integrated graphics) - it
uses PSP for loading its firmware. With PV dom0 there is a workaround as
dom0 kinda knows MFN. I haven't tried PVH dom0 on such system yet, but I
expect troubles (BTW, hw1 aka zen2 gitlab runner has amdgpu, and it's
the one I used for debugging this issue).

References:
https://lists.xenproject.org/archives/html/xen-devel/2022-06/msg01660.html
https://github.com/QubesOS/qubes-issues/issues/6923

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Roger Pau Monné 7 months, 2 weeks ago
On Mon, May 05, 2025 at 12:40:18PM +0200, Marek Marczykowski-Górecki wrote:
> On Mon, Apr 28, 2025 at 01:00:01PM -0700, Stefano Stabellini wrote:
> > On Mon, 28 Apr 2025, Jan Beulich wrote:
> > > On 25.04.2025 22:19, Stefano Stabellini wrote:
> > > > From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> > > > 
> > > > Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> > > > addresses to firmware or co-processors not behind an IOMMU.
> > > 
> > > I definitely don't understand the firmware part: It's subject to the
> > > same transparent P2M translations as the rest of the VM; it's just
> > > another piece of software running there.
> > > 
> > > "Co-processors not behind an IOMMU" is also interesting; a more
> > > concrete scenario might be nice, yet I realize you may be limited in
> > > what you're allowed to say.
> > 
> > Sure. On AMD x86 platforms there is a co-processor called PSP running
> > TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
> > pass addresses to it.  See drivers/tee/amdtee/ and
> > include/linux/psp-tee.h in Linux.
> 
> We had (have?) similar issue with amdgpu (for integrated graphics) - it
> uses PSP for loading its firmware. With PV dom0 there is a workaround as
> dom0 kinda knows MFN. I haven't tried PVH dom0 on such system yet, but I
> expect troubles (BTW, hw1 aka zen2 gitlab runner has amdgpu, and it's
> the one I used for debugging this issue).

That's ugly, and problematic when used in conjunction with AMD-SEV.

I wonder if Xen could emulate/mediate some parts of the PSP for dom0
to use, while allowing Xen to be the sole owner of the device.  Having
both Xen and dom0 use it (for different purposes) seems like asking
for trouble.  But I also have no idea how complex the PSP interface
is, neither whether it would be feasible to emulate the
interfaces/registers needed for firmware loading.

Regards, Roger.

Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Stefano Stabellini 7 months, 2 weeks ago
On Mon, 5 May 2025, Roger Pau Monné wrote:
> On Mon, May 05, 2025 at 12:40:18PM +0200, Marek Marczykowski-Górecki wrote:
> > On Mon, Apr 28, 2025 at 01:00:01PM -0700, Stefano Stabellini wrote:
> > > On Mon, 28 Apr 2025, Jan Beulich wrote:
> > > > On 25.04.2025 22:19, Stefano Stabellini wrote:
> > > > > From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> > > > > 
> > > > > Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> > > > > addresses to firmware or co-processors not behind an IOMMU.
> > > > 
> > > > I definitely don't understand the firmware part: It's subject to the
> > > > same transparent P2M translations as the rest of the VM; it's just
> > > > another piece of software running there.
> > > > 
> > > > "Co-processors not behind an IOMMU" is also interesting; a more
> > > > concrete scenario might be nice, yet I realize you may be limited in
> > > > what you're allowed to say.
> > > 
> > > Sure. On AMD x86 platforms there is a co-processor called PSP running
> > > TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
> > > pass addresses to it.  See drivers/tee/amdtee/ and
> > > include/linux/psp-tee.h in Linux.
> > 
> > We had (have?) similar issue with amdgpu (for integrated graphics) - it
> > uses PSP for loading its firmware. With PV dom0 there is a workaround as
> > dom0 kinda knows MFN. I haven't tried PVH dom0 on such system yet, but I
> > expect troubles (BTW, hw1 aka zen2 gitlab runner has amdgpu, and it's
> > the one I used for debugging this issue).
> 
> That's ugly, and problematic when used in conjunction with AMD-SEV.
> 
> I wonder if Xen could emulate/mediate some parts of the PSP for dom0
> to use, while allowing Xen to be the sole owner of the device.  Having
> both Xen and dom0 use it (for different purposes) seems like asking
> for trouble.  But I also have no idea how complex the PSP interface
> is, neither whether it would be feasible to emulate the
> interfaces/registers needed for firmware loading.

Let me take a step back from the PSP for a moment. I am not opposed to a
PSP mediator in Xen, but I want to emphasize that the issue is more
general and extends well beyond the PSP.

In my years working in embedded systems, I have consistently seen cases
where Dom0 needs to communicate with something that does not go through
the IOMMU. This could be due to special firmware on a co-processor, a
hardware erratum that prevents proper IOMMU usage, or a high-bandwidth
device that technically supports the IOMMU but performs poorly unless
the IOMMU is disabled. All of these are real-world examples that I have
seen personally.

In my opinion, we definitely need a solution like this patch for Dom0
PVH to function correctly in all scenarios.

Additionally, we could add a PSP mediator in Xen to provide best PSP
support, and also for cases where both Xen and Dom0 need access, but I
cannot fully assess the complexity involved, as I am not very familiar
with the API. Probably it is not going to be simple.

Even with the PSP mediator in place, I would still recommend this patch.
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Roger Pau Monné 7 months, 2 weeks ago
On Mon, May 05, 2025 at 11:11:10AM -0700, Stefano Stabellini wrote:
> On Mon, 5 May 2025, Roger Pau Monné wrote:
> > On Mon, May 05, 2025 at 12:40:18PM +0200, Marek Marczykowski-Górecki wrote:
> > > On Mon, Apr 28, 2025 at 01:00:01PM -0700, Stefano Stabellini wrote:
> > > > On Mon, 28 Apr 2025, Jan Beulich wrote:
> > > > > On 25.04.2025 22:19, Stefano Stabellini wrote:
> > > > > > From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> > > > > > 
> > > > > > Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> > > > > > addresses to firmware or co-processors not behind an IOMMU.
> > > > > 
> > > > > I definitely don't understand the firmware part: It's subject to the
> > > > > same transparent P2M translations as the rest of the VM; it's just
> > > > > another piece of software running there.
> > > > > 
> > > > > "Co-processors not behind an IOMMU" is also interesting; a more
> > > > > concrete scenario might be nice, yet I realize you may be limited in
> > > > > what you're allowed to say.
> > > > 
> > > > Sure. On AMD x86 platforms there is a co-processor called PSP running
> > > > TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
> > > > pass addresses to it.  See drivers/tee/amdtee/ and
> > > > include/linux/psp-tee.h in Linux.
> > > 
> > > We had (have?) similar issue with amdgpu (for integrated graphics) - it
> > > uses PSP for loading its firmware. With PV dom0 there is a workaround as
> > > dom0 kinda knows MFN. I haven't tried PVH dom0 on such system yet, but I
> > > expect troubles (BTW, hw1 aka zen2 gitlab runner has amdgpu, and it's
> > > the one I used for debugging this issue).
> > 
> > That's ugly, and problematic when used in conjunction with AMD-SEV.
> > 
> > I wonder if Xen could emulate/mediate some parts of the PSP for dom0
> > to use, while allowing Xen to be the sole owner of the device.  Having
> > both Xen and dom0 use it (for different purposes) seems like asking
> > for trouble.  But I also have no idea how complex the PSP interface
> > is, neither whether it would be feasible to emulate the
> > interfaces/registers needed for firmware loading.
> 
> Let me take a step back from the PSP for a moment. I am not opposed to a
> PSP mediator in Xen, but I want to emphasize that the issue is more
> general and extends well beyond the PSP.
> 
> In my years working in embedded systems, I have consistently seen cases
> where Dom0 needs to communicate with something that does not go through
> the IOMMU. This could be due to special firmware on a co-processor, a
> hardware erratum that prevents proper IOMMU usage, or a high-bandwidth
> device that technically supports the IOMMU but performs poorly unless
> the IOMMU is disabled. All of these are real-world examples that I have
> seen personally.

I wouldn't be surprised, classic PV dom0 avoided those issues because
it was dealing directly with host addresses (mfns), but that's not the
case with PVH dom0.

> In my opinion, we definitely need a solution like this patch for Dom0
> PVH to function correctly in all scenarios.

I'm not opposed to having such interface available for PVH hardware
domains.  I find it ugly, but I don't see much other way to deal with
those kind of "devices".  Xen mediating accesses for each one of them
is unlikely to be doable.

How do you hook this exchange interface into Linux to differentiate
which drivers need to use mfns when interacting with the hardware?

> Additionally, we could add a PSP mediator in Xen to provide best PSP
> support, and also for cases where both Xen and Dom0 need access, but I
> cannot fully assess the complexity involved, as I am not very familiar
> with the API. Probably it is not going to be simple.

For the specific PSP example, and kind of following from the question
above: if we add some mediation layer in Xen for PSP registers, how
could Linux differentiate whether it needs to use mfn of gfns when
interacting with the device?

Thanks, Roger.

Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Stefano Stabellini 7 months, 2 weeks ago
On Tue, 6 May 2025, Roger Pau Monné wrote:
> On Mon, May 05, 2025 at 11:11:10AM -0700, Stefano Stabellini wrote:
> > On Mon, 5 May 2025, Roger Pau Monné wrote:
> > > On Mon, May 05, 2025 at 12:40:18PM +0200, Marek Marczykowski-Górecki wrote:
> > > > On Mon, Apr 28, 2025 at 01:00:01PM -0700, Stefano Stabellini wrote:
> > > > > On Mon, 28 Apr 2025, Jan Beulich wrote:
> > > > > > On 25.04.2025 22:19, Stefano Stabellini wrote:
> > > > > > > From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> > > > > > > 
> > > > > > > Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> > > > > > > addresses to firmware or co-processors not behind an IOMMU.
> > > > > > 
> > > > > > I definitely don't understand the firmware part: It's subject to the
> > > > > > same transparent P2M translations as the rest of the VM; it's just
> > > > > > another piece of software running there.
> > > > > > 
> > > > > > "Co-processors not behind an IOMMU" is also interesting; a more
> > > > > > concrete scenario might be nice, yet I realize you may be limited in
> > > > > > what you're allowed to say.
> > > > > 
> > > > > Sure. On AMD x86 platforms there is a co-processor called PSP running
> > > > > TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
> > > > > pass addresses to it.  See drivers/tee/amdtee/ and
> > > > > include/linux/psp-tee.h in Linux.
> > > > 
> > > > We had (have?) similar issue with amdgpu (for integrated graphics) - it
> > > > uses PSP for loading its firmware. With PV dom0 there is a workaround as
> > > > dom0 kinda knows MFN. I haven't tried PVH dom0 on such system yet, but I
> > > > expect troubles (BTW, hw1 aka zen2 gitlab runner has amdgpu, and it's
> > > > the one I used for debugging this issue).
> > > 
> > > That's ugly, and problematic when used in conjunction with AMD-SEV.
> > > 
> > > I wonder if Xen could emulate/mediate some parts of the PSP for dom0
> > > to use, while allowing Xen to be the sole owner of the device.  Having
> > > both Xen and dom0 use it (for different purposes) seems like asking
> > > for trouble.  But I also have no idea how complex the PSP interface
> > > is, neither whether it would be feasible to emulate the
> > > interfaces/registers needed for firmware loading.
> > 
> > Let me take a step back from the PSP for a moment. I am not opposed to a
> > PSP mediator in Xen, but I want to emphasize that the issue is more
> > general and extends well beyond the PSP.
> > 
> > In my years working in embedded systems, I have consistently seen cases
> > where Dom0 needs to communicate with something that does not go through
> > the IOMMU. This could be due to special firmware on a co-processor, a
> > hardware erratum that prevents proper IOMMU usage, or a high-bandwidth
> > device that technically supports the IOMMU but performs poorly unless
> > the IOMMU is disabled. All of these are real-world examples that I have
> > seen personally.
> 
> I wouldn't be surprised, classic PV dom0 avoided those issues because
> it was dealing directly with host addresses (mfns), but that's not the
> case with PVH dom0.

Yeah


> > In my opinion, we definitely need a solution like this patch for Dom0
> > PVH to function correctly in all scenarios.
> 
> I'm not opposed to having such interface available for PVH hardware
> domains.  I find it ugly, but I don't see much other way to deal with
> those kind of "devices".  Xen mediating accesses for each one of them
> is unlikely to be doable.
> 
> How do you hook this exchange interface into Linux to differentiate
> which drivers need to use mfns when interacting with the hardware?

In the specific case we have at hand, the driver is in Linux userspace
and is specially written for our use case. It is not generic, so we
don't have this problem. But your question is valid.

In Linux, the issue is hidden behind drivers/xen/swiotlb-xen.c and
xen_arch_need_swiotlb. There are a few options:
- force swiotlb bounce for everything on the problematic SoC
- edit xen_arch_need_swiotlb to return true for the problematic device
  (see the sketch below)
- introduce a kernel command line option to specify which device
  xen_arch_need_swiotlb should return true for
- introduce an ACPI table with the relevant info
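
As a rough sketch of the second option above (the xen_dev_bypasses_iommu()
predicate is hypothetical; the real ARM implementation in arch/arm/xen/mm.c
instead keys off cache coherency and foreign pages):

    #include <linux/device.h>
    #include <linux/types.h>

    /*
     * Hypothetical predicate: would match devices known to bypass the
     * IOMMU, e.g. by ACPI path or a command-line supplied list.
     */
    static bool xen_dev_bypasses_iommu(struct device *dev)
    {
        return false;   /* placeholder */
    }

    int xen_arch_need_swiotlb(struct device *dev, phys_addr_t phys,
                              dma_addr_t dev_addr)
    {
        /*
         * Devices that bypass the IOMMU see machine addresses, so their
         * DMA buffers must be bounced through swiotlb-xen, whose bounce
         * buffer is made machine-contiguous via XENMEM_exchange.
         */
        if (xen_dev_bypasses_iommu(dev))
            return 1;

        return 0;
    }
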
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Roger Pau Monné 7 months, 1 week ago
On Wed, May 07, 2025 at 04:02:11PM -0700, Stefano Stabellini wrote:
> On Tue, 6 May 2025, Roger Pau Monné wrote:
> > On Mon, May 05, 2025 at 11:11:10AM -0700, Stefano Stabellini wrote:
> > > On Mon, 5 May 2025, Roger Pau Monné wrote:
> > > > On Mon, May 05, 2025 at 12:40:18PM +0200, Marek Marczykowski-Górecki wrote:
> > > > > On Mon, Apr 28, 2025 at 01:00:01PM -0700, Stefano Stabellini wrote:
> > > > > > On Mon, 28 Apr 2025, Jan Beulich wrote:
> > > > > > > On 25.04.2025 22:19, Stefano Stabellini wrote:
> > > > > > > > From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> > > > > > > > 
> > > > > > > > Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> > > > > > > > addresses to firmware or co-processors not behind an IOMMU.
> > > > > > > 
> > > > > > > I definitely don't understand the firmware part: It's subject to the
> > > > > > > same transparent P2M translations as the rest of the VM; it's just
> > > > > > > another piece of software running there.
> > > > > > > 
> > > > > > > "Co-processors not behind an IOMMU" is also interesting; a more
> > > > > > > concrete scenario might be nice, yet I realize you may be limited in
> > > > > > > what you're allowed to say.
> > > > > > 
> > > > > > Sure. On AMD x86 platforms there is a co-processor called PSP running
> > > > > > TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
> > > > > > pass addresses to it.  See drivers/tee/amdtee/ and
> > > > > > include/linux/psp-tee.h in Linux.
> > > > > 
> > > > > We had (have?) similar issue with amdgpu (for integrated graphics) - it
> > > > > uses PSP for loading its firmware. With PV dom0 there is a workaround as
> > > > > dom0 kinda knows MFN. I haven't tried PVH dom0 on such system yet, but I
> > > > > expect troubles (BTW, hw1 aka zen2 gitlab runner has amdgpu, and it's
> > > > > the one I used for debugging this issue).
> > > > 
> > > > That's ugly, and problematic when used in conjunction with AMD-SEV.
> > > > 
> > > > I wonder if Xen could emulate/mediate some parts of the PSP for dom0
> > > > to use, while allowing Xen to be the sole owner of the device.  Having
> > > > both Xen and dom0 use it (for different purposes) seems like asking
> > > > for trouble.  But I also have no idea how complex the PSP interface
> > > > is, neither whether it would be feasible to emulate the
> > > > interfaces/registers needed for firmware loading.
> > > 
> > > Let me take a step back from the PSP for a moment. I am not opposed to a
> > > PSP mediator in Xen, but I want to emphasize that the issue is more
> > > general and extends well beyond the PSP.
> > > 
> > > In my years working in embedded systems, I have consistently seen cases
> > > where Dom0 needs to communicate with something that does not go through
> > > the IOMMU. This could be due to special firmware on a co-processor, a
> > > hardware erratum that prevents proper IOMMU usage, or a high-bandwidth
> > > device that technically supports the IOMMU but performs poorly unless
> > > the IOMMU is disabled. All of these are real-world examples that I have
> > > seen personally.
> > 
> > I wouldn't be surprised, classic PV dom0 avoided those issues because
> > it was dealing directly with host addresses (mfns), but that's not the
> > case with PVH dom0.
> 
> Yeah
> 
> 
> > > In my opinion, we definitely need a solution like this patch for Dom0
> > > PVH to function correctly in all scenarios.
> > 
> > I'm not opposed to having such interface available for PVH hardware
> > domains.  I find it ugly, but I don't see much other way to deal with
> > those kind of "devices".  Xen mediating accesses for each one of them
> > is unlikely to be doable.
> > 
> > How do you hook this exchange interface into Linux to differentiate
> > which drivers need to use mfns when interacting with the hardware?
> 
> In the specific case we have at hands the driver is in Linux userspace
> and is specially-written for our use case. It is not generic, so we
> don't have this problem. But your question is valid.

Oh, so you then have some kind of ioctl interface that does the memory
exchange and bouncing inside of the kernel on behalf of the user-space
side I would think?

> In Linux, the issue is hidden behind drivers/xen/swiotlb-xen.c and
> xen_arch_need_swiotlb. There are a few options:
> - force swiotlb bounce for everything on the problematic SoC
> - edit xen_arch_need_swiotlb to return true for the problematic device
> - introduce a kernel command line option to specify which device
>   xen_arch_need_swiotlb should return true for

Isn't it a bit misleading to use the swiotlb for this purpose?  Won't
this usage of the swiotlb (to bounce from gfns to mfns) create issues
if there's any devices that have a DMA physical address limitation and
also needs to use the swiotlb while being behind the IOMMU?

> - introduce an ACPI table with the relevant info

Hm, best option might be an ACPI table so that Xen can signal to the
hardware domain whether communication with the device must be done
using mfns, or if accesses are mediated and hence can be done using
gfns?

It's a bit cumbersome however to have to resort to an ACPI table just
for this.  Not sure whether we could expand one of the existing tables
already under Xen control (STAO?) to contain this information.  It all
looks a bit ad-hoc.

I think we need some kind of list of devices that are not behind the
IOMMU, but I have no idea what URI to use to identify those.  I assume
the PSP doesn't have a PCI SBDF (as it's not on the PCI bus?).  Maybe
by ACPI path?

Or maybe it's fine to always communicate with those non-translated
devices using MFNs, and even if we later add some kind of PSP
mediation (so that both Xen and dom0 can use it), accesses by dom0
will still be assumed to be using MFNs, and thus need no translation.

Thanks, Roger.

Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Stefano Stabellini 7 months, 1 week ago
On Thu, 8 May 2025, Roger Pau Monné wrote:
> On Wed, May 07, 2025 at 04:02:11PM -0700, Stefano Stabellini wrote:
> > On Tue, 6 May 2025, Roger Pau Monné wrote:
> > > On Mon, May 05, 2025 at 11:11:10AM -0700, Stefano Stabellini wrote:
> > > > On Mon, 5 May 2025, Roger Pau Monné wrote:
> > > > > On Mon, May 05, 2025 at 12:40:18PM +0200, Marek Marczykowski-Górecki wrote:
> > > > > > On Mon, Apr 28, 2025 at 01:00:01PM -0700, Stefano Stabellini wrote:
> > > > > > > On Mon, 28 Apr 2025, Jan Beulich wrote:
> > > > > > > > On 25.04.2025 22:19, Stefano Stabellini wrote:
> > > > > > > > > From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> > > > > > > > > 
> > > > > > > > > Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> > > > > > > > > addresses to firmware or co-processors not behind an IOMMU.
> > > > > > > > 
> > > > > > > > I definitely don't understand the firmware part: It's subject to the
> > > > > > > > same transparent P2M translations as the rest of the VM; it's just
> > > > > > > > another piece of software running there.
> > > > > > > > 
> > > > > > > > "Co-processors not behind an IOMMU" is also interesting; a more
> > > > > > > > concrete scenario might be nice, yet I realize you may be limited in
> > > > > > > > what you're allowed to say.
> > > > > > > 
> > > > > > > Sure. On AMD x86 platforms there is a co-processor called PSP running
> > > > > > > TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
> > > > > > > pass addresses to it.  See drivers/tee/amdtee/ and
> > > > > > > include/linux/psp-tee.h in Linux.
> > > > > > 
> > > > > > We had (have?) similar issue with amdgpu (for integrated graphics) - it
> > > > > > uses PSP for loading its firmware. With PV dom0 there is a workaround as
> > > > > > dom0 kinda knows MFN. I haven't tried PVH dom0 on such system yet, but I
> > > > > > expect troubles (BTW, hw1 aka zen2 gitlab runner has amdgpu, and it's
> > > > > > the one I used for debugging this issue).
> > > > > 
> > > > > That's ugly, and problematic when used in conjunction with AMD-SEV.
> > > > > 
> > > > > I wonder if Xen could emulate/mediate some parts of the PSP for dom0
> > > > > to use, while allowing Xen to be the sole owner of the device.  Having
> > > > > both Xen and dom0 use it (for different purposes) seems like asking
> > > > > for trouble.  But I also have no idea how complex the PSP interface
> > > > > is, neither whether it would be feasible to emulate the
> > > > > interfaces/registers needed for firmware loading.
> > > > 
> > > > Let me take a step back from the PSP for a moment. I am not opposed to a
> > > > PSP mediator in Xen, but I want to emphasize that the issue is more
> > > > general and extends well beyond the PSP.
> > > > 
> > > > In my years working in embedded systems, I have consistently seen cases
> > > > where Dom0 needs to communicate with something that does not go through
> > > > the IOMMU. This could be due to special firmware on a co-processor, a
> > > > hardware erratum that prevents proper IOMMU usage, or a high-bandwidth
> > > > device that technically supports the IOMMU but performs poorly unless
> > > > the IOMMU is disabled. All of these are real-world examples that I have
> > > > seen personally.
> > > 
> > > I wouldn't be surprised, classic PV dom0 avoided those issues because
> > > it was dealing directly with host addresses (mfns), but that's not the
> > > case with PVH dom0.
> > 
> > Yeah
> > 
> > 
> > > > In my opinion, we definitely need a solution like this patch for Dom0
> > > > PVH to function correctly in all scenarios.
> > > 
> > > I'm not opposed to having such interface available for PVH hardware
> > > domains.  I find it ugly, but I don't see much other way to deal with
> > > those kind of "devices".  Xen mediating accesses for each one of them
> > > is unlikely to be doable.
> > > 
> > > How do you hook this exchange interface into Linux to differentiate
> > > which drivers need to use mfns when interacting with the hardware?
> > 
> > In the specific case we have at hands the driver is in Linux userspace
> > and is specially-written for our use case. It is not generic, so we
> > don't have this problem. But your question is valid.
> 
> Oh, so you then have some kind of ioctl interface that does the memory
> exchange and bouncing inside of the kernel on behalf of the user-space
> side I would think?

I am not sure... Xenia might know more than me here.


> > In Linux, the issue is hidden behind drivers/xen/swiotlb-xen.c and
> > xen_arch_need_swiotlb. There are a few options:
> > - force swiotlb bounce for everything on the problematic SoC
> > - edit xen_arch_need_swiotlb to return true for the problematic device
> > - introduce a kernel command line option to specify which device
> >   xen_arch_need_swiotlb should return true for
> 
> Isn't it a bit misleading to use the swiotlb for this purpose?  Won't
> this usage of the swiotlb (to bounce from gfns to mfns) create issues
> if there's any devices that have a DMA physical address limitation and
> also needs to use the swiotlb while being behind the IOMMU?

When I wrote swiotlb, I meant swiotlb-xen (drivers/xen/swiotlb-xen.c).
We have been using it exactly for this kind of address translations so
far. It can also deal with cases where genuine bouncing needs to happen.


> > - introduce an ACPI table with the relevant info
> 
> Hm, best option might be an ACPI table so that Xen can signal to the
> hardware domain whether communication with the device must be done
> using mfns, or if accesses are mediated and hence can be done using
> gfns?

Yeah


> It's a bit cumbersome however to have to resort to an ACPI table just
> for this.  Not sure whether we could expand one of the existing tables
> already under Xen control (STAO?) to contain this information.  It all
> looks a bit ad-hoc.
> 
> I think we need some kind of list of devices that are not behind the
> IOMMU, but I have no idea what URI to use to identify those.  I assume
> the PSP doesn't have a PCI SBDF (as it's not on the PCI bus?).  Maybe
> by ACPI path?

Yes, we have a similar issue with the STAO table in terms of which
identifiers to use. STAO uses: "A list of ACPI strings, where each
string is the full path name in the ACPI namespace of the device"


> Or maybe it's fine to always communicate with those non-translated
> devices using MFNs, and even if we later add some kind of PSP
> mediation (so that both Xen and dom0 can use it), accesses by dom0
> will still be assumed to be using MFNs, and thus need no translation.

I think it is easy to enable/disable GFN/MFN usage for a device like the
PSP. We can communicate this change in behavior to Linux in many ways.
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Roger Pau Monné 7 months, 1 week ago
On Thu, May 08, 2025 at 04:25:28PM -0700, Stefano Stabellini wrote:
> On Thu, 8 May 2025, Roger Pau Monné wrote:
> > On Wed, May 07, 2025 at 04:02:11PM -0700, Stefano Stabellini wrote:
> > > On Tue, 6 May 2025, Roger Pau Monné wrote:
> > > > On Mon, May 05, 2025 at 11:11:10AM -0700, Stefano Stabellini wrote:
> > > > > On Mon, 5 May 2025, Roger Pau Monné wrote:
> > > > > > On Mon, May 05, 2025 at 12:40:18PM +0200, Marek Marczykowski-Górecki wrote:
> > > > > > > On Mon, Apr 28, 2025 at 01:00:01PM -0700, Stefano Stabellini wrote:
> > > > > > > > On Mon, 28 Apr 2025, Jan Beulich wrote:
> > > > > > > > > On 25.04.2025 22:19, Stefano Stabellini wrote:
> > > > > > > > > > From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> > > > > > > > > > 
> > > > > > > > > > Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> > > > > > > > > > addresses to firmware or co-processors not behind an IOMMU.
> > > > > > > > > 
> > > > > > > > > I definitely don't understand the firmware part: It's subject to the
> > > > > > > > > same transparent P2M translations as the rest of the VM; it's just
> > > > > > > > > another piece of software running there.
> > > > > > > > > 
> > > > > > > > > "Co-processors not behind an IOMMU" is also interesting; a more
> > > > > > > > > concrete scenario might be nice, yet I realize you may be limited in
> > > > > > > > > what you're allowed to say.
> > > > > > > > 
> > > > > > > > Sure. On AMD x86 platforms there is a co-processor called PSP running
> > > > > > > > TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
> > > > > > > > pass addresses to it.  See drivers/tee/amdtee/ and
> > > > > > > > include/linux/psp-tee.h in Linux.
> > > > > > > 
> > > > > > > We had (have?) similar issue with amdgpu (for integrated graphics) - it
> > > > > > > uses PSP for loading its firmware. With PV dom0 there is a workaround as
> > > > > > > dom0 kinda knows MFN. I haven't tried PVH dom0 on such system yet, but I
> > > > > > > expect troubles (BTW, hw1 aka zen2 gitlab runner has amdgpu, and it's
> > > > > > > the one I used for debugging this issue).
> > > > > > 
> > > > > > That's ugly, and problematic when used in conjunction with AMD-SEV.
> > > > > > 
> > > > > > I wonder if Xen could emulate/mediate some parts of the PSP for dom0
> > > > > > to use, while allowing Xen to be the sole owner of the device.  Having
> > > > > > both Xen and dom0 use it (for different purposes) seems like asking
> > > > > > for trouble.  But I also have no idea how complex the PSP interface
> > > > > > is, neither whether it would be feasible to emulate the
> > > > > > interfaces/registers needed for firmware loading.
> > > > > 
> > > > > Let me take a step back from the PSP for a moment. I am not opposed to a
> > > > > PSP mediator in Xen, but I want to emphasize that the issue is more
> > > > > general and extends well beyond the PSP.
> > > > > 
> > > > > In my years working in embedded systems, I have consistently seen cases
> > > > > where Dom0 needs to communicate with something that does not go through
> > > > > the IOMMU. This could be due to special firmware on a co-processor, a
> > > > > hardware erratum that prevents proper IOMMU usage, or a high-bandwidth
> > > > > device that technically supports the IOMMU but performs poorly unless
> > > > > the IOMMU is disabled. All of these are real-world examples that I have
> > > > > seen personally.
> > > > 
> > > > I wouldn't be surprised, classic PV dom0 avoided those issues because
> > > > it was dealing directly with host addresses (mfns), but that's not the
> > > > case with PVH dom0.
> > > 
> > > Yeah
> > > 
> > > 
> > > > > In my opinion, we definitely need a solution like this patch for Dom0
> > > > > PVH to function correctly in all scenarios.
> > > > 
> > > > I'm not opposed to having such interface available for PVH hardware
> > > > domains.  I find it ugly, but I don't see much other way to deal with
> > > > those kind of "devices".  Xen mediating accesses for each one of them
> > > > is unlikely to be doable.
> > > > 
> > > > How do you hook this exchange interface into Linux to differentiate
> > > > which drivers need to use mfns when interacting with the hardware?
> > > 
> > > In the specific case we have at hands the driver is in Linux userspace
> > > and is specially-written for our use case. It is not generic, so we
> > > don't have this problem. But your question is valid.
> > 
> > Oh, so you then have some kind of ioctl interface that does the memory
> > exchange and bouncing inside of the kernel on behalf of the user-space
> > side I would think?
> 
> I am not sure... Xenia might know more than me here.

One further question I have regarding this approach: have you
considered just populating an empty p2m space with contiguous physical
memory instead of exchanging an existing area?  That would increase
dom0 memory usage, but would prevent super page shattering in the p2m.
You could use a dom0_mem=X,max:X+Y command line option, where Y
would be your extra room for swiotlb-xen bouncing usage.

XENMEM_increase_reservation documentation notes such hypercall already
returns the base MFN of the allocated page (see comment in
xen_memory_reservation struct declaration).

> > > In Linux, the issue is hidden behind drivers/xen/swiotlb-xen.c and
> > > xen_arch_need_swiotlb. There are a few options:
> > > - force swiotlb bounce for everything on the problematic SoC
> > > - edit xen_arch_need_swiotlb to return true for the problematic device
> > > - introduce a kernel command line option to specify which device
> > >   xen_arch_need_swiotlb should return true for
> > 
> > Isn't it a bit misleading to use the swiotlb for this purpose?  Won't
> > this usage of the swiotlb (to bounce from gfns to mfns) create issues
> > if there's any devices that have a DMA physical address limitation and
> > also needs to use the swiotlb while being behind the IOMMU?
> 
> When I wrote swiotlb, I meant swiotlb-xen (drivers/xen/swiotlb-xen.c).
> We have been using it exactly for this kind of address translations so
> far. It can also deal with cases where genuine bouncing needs to happen.

Oh, I see.  I've assumed you meant the generic Linux swiotlb.

So you have repurposed swiotlb-xen to be used on PVH for this purpose.
I think (currently?) swiotlb-xen is unconditionally disabled for
HVM/PVH guests?

> > > - introduce an ACPI table with the relevant info
> > 
> > Hm, best option might be an ACPI table so that Xen can signal to the
> > hardware domain whether communication with the device must be done
> > using mfns, or if accesses are mediated and hence can be done using
> > gfns?
> 
> Yeah
> 
> 
> > It's a bit cumbersome however to have to resort to an ACPI table just
> > for this.  Not sure whether we could expand one of the existing tables
> > already under Xen control (STAO?) to contain this information.  It all
> > looks a bit ad-hoc.
> > 
> > I think we need some kind of list of devices that are not behind the
> > IOMMU, but I have no idea what URI to use to identify those.  I assume
> > the PSP doesn't have a PCI SBDF (as it's not on the PCI bus?).  Maybe
> > by ACPI path?
> 
> Yes, we have a similar issue with the STAO table in terms of which
> identifiers to use. STAO uses: "A list of ACPI strings, where each
> string is the full path name in the ACPI namespace of the device"
> 
> 
> > Or maybe it's fine to always communicate with those non-translated
> > devices using MFNs, and even if we later add some kind of PSP
> > mediation (so that both Xen and dom0 can use it), accesses by dom0
> > will still be assumed to be using MFNs, and thus need no translation.
> 
> I think it is easy to enable/disable GFN/MFN for a device like PSP. We
> can communicate to Linux in many ways this change in behavior.

In fact my biggest concern with the PSP specifically is not so much the
usage of MFN/GFNs; I think it would be fine for the hardware domain to
unconditionally use MFNs for interactions with the device.

I worry about how we are going to reconcile both Xen and the hardware
domain having to use the device, as I have no idea how complex the
interface is, so no idea how much code we would need to add to Xen to
mediate an external domain's accesses.  Anyway, that's a question for
later really.

Thanks, Roger.

Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Jan Beulich 7 months, 1 week ago
On 09.05.2025 10:32, Roger Pau Monné wrote:
> On Thu, May 08, 2025 at 04:25:28PM -0700, Stefano Stabellini wrote:
>> On Thu, 8 May 2025, Roger Pau Monné wrote:
>>> On Wed, May 07, 2025 at 04:02:11PM -0700, Stefano Stabellini wrote:
>>>> On Tue, 6 May 2025, Roger Pau Monné wrote:
>>>>> On Mon, May 05, 2025 at 11:11:10AM -0700, Stefano Stabellini wrote:
>>>>>> In my opinion, we definitely need a solution like this patch for Dom0
>>>>>> PVH to function correctly in all scenarios.
>>>>>
>>>>> I'm not opposed to having such interface available for PVH hardware
>>>>> domains.  I find it ugly, but I don't see much other way to deal with
>>>>> those kind of "devices".  Xen mediating accesses for each one of them
>>>>> is unlikely to be doable.
>>>>>
>>>>> How do you hook this exchange interface into Linux to differentiate
>>>>> which drivers need to use mfns when interacting with the hardware?
>>>>
>>>> In the specific case we have at hands the driver is in Linux userspace
>>>> and is specially-written for our use case. It is not generic, so we
>>>> don't have this problem. But your question is valid.
>>>
>>> Oh, so you then have some kind of ioctl interface that does the memory
>>> exchange and bouncing inside of the kernel on behalf of the user-space
>>> side I would think?
>>
>> I am not sure... Xenia might know more than me here.
> 
> One further question I have regarding this approach: have you
> considered just populating an empty p2m space with contiguous physical
> memory instead of exchanging an existing area?  That would increase
> dom0 memory usage, but would prevent super page shattering in the p2m.
> You could use a dom0_mem=X,max:X+Y command line option, where Y
> would be your extra room for swiotlb-xen bouncing usage.
> 
> XENMEM_increase_reservation documentation notes such hypercall already
> returns the base MFN of the allocated page (see comment in
> xen_memory_reservation struct declaration).

Except that this looks to be stale. At the bottom of increase_reservation()
we have:

        if ( !paging_mode_translate(d) &&
             !guest_handle_is_null(a->extent_list) )
        {
            mfn_t mfn = page_to_mfn(page);

            if ( unlikely(__copy_mfn_to_guest_offset(a->extent_list, i, mfn)) )
                goto out;
        }

Consistent with us not exposing MFNs elsewhere as well, except for PV.

Jan
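
A minimal sketch of the guest side of Roger's alternative, with
hypothetical names; per Jan's point above, the MFN copy-back is today
limited to non-translated guests, so a PVH Dom0 would need a relaxation
along the lines of this patch:

    #include <xen/interface/memory.h>   /* struct xen_memory_reservation */
    #include <asm/xen/hypercall.h>      /* HYPERVISOR_memory_op() */

    /*
     * Ask Xen for one freshly allocated, physically contiguous 2^order
     * extent, to be used as a permanent bounce area, instead of
     * exchanging frames that are already populated in the p2m.
     */
    static long reserve_contiguous_extent(unsigned int order,
                                          xen_pfn_t *base_frame)
    {
        struct xen_memory_reservation reservation = {
            .nr_extents   = 1,
            .extent_order = order,
            .domid        = DOMID_SELF,
        };

        set_xen_guest_handle(reservation.extent_start, base_frame);

        /*
         * Returns the number of extents allocated (ideally 1) or a
         * negative error; base_frame would hold the base frame number.
         */
        return HYPERVISOR_memory_op(XENMEM_increase_reservation, &reservation);
    }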

Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Stefano Stabellini 7 months, 1 week ago
On Fri, 9 May 2025, Roger Pau Monné wrote:
> On Thu, May 08, 2025 at 04:25:28PM -0700, Stefano Stabellini wrote:
> > On Thu, 8 May 2025, Roger Pau Monné wrote:
> > > On Wed, May 07, 2025 at 04:02:11PM -0700, Stefano Stabellini wrote:
> > > > On Tue, 6 May 2025, Roger Pau Monné wrote:
> > > > > On Mon, May 05, 2025 at 11:11:10AM -0700, Stefano Stabellini wrote:
> > > > > > On Mon, 5 May 2025, Roger Pau Monné wrote:
> > > > > > > On Mon, May 05, 2025 at 12:40:18PM +0200, Marek Marczykowski-Górecki wrote:
> > > > > > > > On Mon, Apr 28, 2025 at 01:00:01PM -0700, Stefano Stabellini wrote:
> > > > > > > > > On Mon, 28 Apr 2025, Jan Beulich wrote:
> > > > > > > > > > On 25.04.2025 22:19, Stefano Stabellini wrote:
> > > > > > > > > > > From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> > > > > > > > > > > 
> > > > > > > > > > > Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> > > > > > > > > > > addresses to firmware or co-processors not behind an IOMMU.
> > > > > > > > > > 
> > > > > > > > > > I definitely don't understand the firmware part: It's subject to the
> > > > > > > > > > same transparent P2M translations as the rest of the VM; it's just
> > > > > > > > > > another piece of software running there.
> > > > > > > > > > 
> > > > > > > > > > "Co-processors not behind an IOMMU" is also interesting; a more
> > > > > > > > > > concrete scenario might be nice, yet I realize you may be limited in
> > > > > > > > > > what you're allowed to say.
> > > > > > > > > 
> > > > > > > > > Sure. On AMD x86 platforms there is a co-processor called PSP running
> > > > > > > > > TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
> > > > > > > > > pass addresses to it.  See drivers/tee/amdtee/ and
> > > > > > > > > include/linux/psp-tee.h in Linux.
> > > > > > > > 
> > > > > > > > We had (have?) similar issue with amdgpu (for integrated graphics) - it
> > > > > > > > uses PSP for loading its firmware. With PV dom0 there is a workaround as
> > > > > > > > dom0 kinda knows MFN. I haven't tried PVH dom0 on such system yet, but I
> > > > > > > > expect troubles (BTW, hw1 aka zen2 gitlab runner has amdgpu, and it's
> > > > > > > > the one I used for debugging this issue).
> > > > > > > 
> > > > > > > That's ugly, and problematic when used in conjunction with AMD-SEV.
> > > > > > > 
> > > > > > > I wonder if Xen could emulate/mediate some parts of the PSP for dom0
> > > > > > > to use, while allowing Xen to be the sole owner of the device.  Having
> > > > > > > both Xen and dom0 use it (for different purposes) seems like asking
> > > > > > > for trouble.  But I also have no idea how complex the PSP interface
> > > > > > > is, neither whether it would be feasible to emulate the
> > > > > > > interfaces/registers needed for firmware loading.
> > > > > > 
> > > > > > Let me take a step back from the PSP for a moment. I am not opposed to a
> > > > > > PSP mediator in Xen, but I want to emphasize that the issue is more
> > > > > > general and extends well beyond the PSP.
> > > > > > 
> > > > > > In my years working in embedded systems, I have consistently seen cases
> > > > > > where Dom0 needs to communicate with something that does not go through
> > > > > > the IOMMU. This could be due to special firmware on a co-processor, a
> > > > > > hardware erratum that prevents proper IOMMU usage, or a high-bandwidth
> > > > > > device that technically supports the IOMMU but performs poorly unless
> > > > > > the IOMMU is disabled. All of these are real-world examples that I have
> > > > > > seen personally.
> > > > > 
> > > > > I wouldn't be surprised, classic PV dom0 avoided those issues because
> > > > > it was dealing directly with host addresses (mfns), but that's not the
> > > > > case with PVH dom0.
> > > > 
> > > > Yeah
> > > > 
> > > > 
> > > > > > In my opinion, we definitely need a solution like this patch for Dom0
> > > > > > PVH to function correctly in all scenarios.
> > > > > 
> > > > > I'm not opposed to having such interface available for PVH hardware
> > > > > domains.  I find it ugly, but I don't see much other way to deal with
> > > > > those kind of "devices".  Xen mediating accesses for each one of them
> > > > > is unlikely to be doable.
> > > > > 
> > > > > How do you hook this exchange interface into Linux to differentiate
> > > > > which drivers need to use mfns when interacting with the hardware?
> > > > 
> > > > In the specific case we have at hands the driver is in Linux userspace
> > > > and is specially-written for our use case. It is not generic, so we
> > > > don't have this problem. But your question is valid.
> > > 
> > > Oh, so you then have some kind of ioctl interface that does the memory
> > > exchange and bouncing inside of the kernel on behalf of the user-space
> > > side I would think?
> > 
> > I am not sure... Xenia might know more than me here.
> 
> One further question I have regarding this approach: have you
> considered just populating an empty p2m space with contiguous physical
> memory instead of exchanging an existing area?  That would increase
> dom0 memory usage, but would prevent super page shattering in the p2m.
> You could use a dom0_mem=X,max:X+Y command line option, where Y
> would be your extra room for swiotlb-xen bouncing usage.
> 
> XENMEM_increase_reservation documentation notes such hypercall already
> returns the base MFN of the allocated page (see comment in
> xen_memory_reservation struct declaration).
 
XENMEM_exchange is the way it has been implemented traditionally in
Linux swiotlb-xen (it has been years). But your idea is good.

Another, more drastic, idea would be to attempt to map Dom0 PVH memory
1:1 at domain creation time like we do on ARM. If not all of it, as much
as possible. That would resolve the problem very efficiently. We could
communicate to Dom0 PVH the range that is 1:1 in one of the initial data
structures, and that would be the end of it.
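
For reference, the shape of the call swiotlb-xen relies on here -- a rough
sketch loosely based on what xen_create_contiguous_region() does in Linux,
with the helper name and the (absent) error/rollback handling being
illustrative only:

    #include <xen/interface/xen.h>
    #include <xen/interface/memory.h>
    #include <asm/xen/hypercall.h>

    /*
     * Trade 2^order guest-contiguous 4k frames for one machine-contiguous
     * 2^order extent.  With the patch under discussion, a PVH hardware
     * domain also gets the base MFN written back to *mfn_out.
     */
    static int exchange_for_contiguous(xen_pfn_t *gpfns, unsigned int order,
                                       xen_pfn_t *mfn_out)
    {
        struct xen_memory_exchange exchange = {
            .in = {
                .nr_extents   = 1UL << order,
                .extent_order = 0,
                .domid        = DOMID_SELF,
            },
            .out = {
                .nr_extents   = 1,
                .extent_order = order,
                .domid        = DOMID_SELF,
            },
        };

        set_xen_guest_handle(exchange.in.extent_start, gpfns);
        set_xen_guest_handle(exchange.out.extent_start, mfn_out);

        return HYPERVISOR_memory_op(XENMEM_exchange, &exchange);
    }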


> > > > In Linux, the issue is hidden behind drivers/xen/swiotlb-xen.c and
> > > > xen_arch_need_swiotlb. There are a few options:
> > > > - force swiotlb bounce for everything on the problematic SoC
> > > > - edit xen_arch_need_swiotlb to return true for the problematic device
> > > > - introduce a kernel command line option to specify which device
> > > >   xen_arch_need_swiotlb should return true for
> > > 
> > > Isn't it a bit misleading to use the swiotlb for this purpose?  Won't
> > > this usage of the swiotlb (to bounce from gfns to mfns) create issues
> > > if there's any devices that have a DMA physical address limitation and
> > > also needs to use the swiotlb while being behind the IOMMU?
> > 
> > When I wrote swiotlb, I meant swiotlb-xen (drivers/xen/swiotlb-xen.c).
> > We have been using it exactly for this kind of address translations so
> > far. It can also deal with cases where genuine bouncing needs to happen.
> 
> Oh, I see.  I've assumed you meant the generic Linux swiotlb.
> 
> So you have repurposed swiotlb-xen to be used on PVH for this purpose.
> I think (currently?) swiotlb-xen is unconditionally disabled for
> HVM/PVH guests?

Yes, I repurposed swiotlb-xen for something similar years ago on ARM. I
was planning to do the same for PVH x86. Today, swiotlb-xen is used for
ARM Dom0, which as you know is HVM/PVH from Linux's point of view.
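
To make the xen_arch_need_swiotlb option mentioned earlier concrete, a
minimal sketch of the "force bouncing for the problematic device" variant;
the signature follows the existing ARM helper, while
needs_machine_addresses() is a made-up placeholder for however the device
would be matched (SBDF, ACPI path, command line, ...):

    #include <linux/device.h>
    #include <linux/types.h>

    /* Hypothetical predicate: true for devices not behind the IOMMU. */
    bool needs_machine_addresses(struct device *dev);

    bool xen_arch_need_swiotlb(struct device *dev,
                               phys_addr_t phys,
                               dma_addr_t dev_addr)
    {
        /* Bounce via the machine-contiguous swiotlb-xen pool. */
        if (needs_machine_addresses(dev))
            return true;

        /* The real helper would fall through to its existing checks here. */
        return false;
    }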


> > > > - introduce an ACPI table with the relevant info
> > > 
> > > Hm, best option might be an ACPI table so that Xen can signal to the
> > > hardware domain whether communication with the device must be done
> > > using mfns, or if accesses are mediated and hence can be done using
> > > gfns?
> > 
> > Yeah
> > 
> > 
> > > It's a bit cumbersome however to have to resort to an ACPI table just
> > > for this.  Not sure whether we could expand one of the existing tables
> > > already under Xen control (STAO?) to contain this information.  It all
> > > looks a bit ad-hoc.
> > > 
> > > I think we need some kind of list of devices that are not behind the
> > > IOMMU, but I have no idea what URI to use to identify those.  I assume
> > > the PSP doesn't have a PCI SBDF (as it's not on the PCI bus?).  Maybe
> > > by ACPI path?
> > 
> > Yes, we have a similar issue with the STAO table in terms of which
> > identifiers to use. STAO uses: "A list of ACPI strings, where each
> > string is the full path name in the ACPI namespace of the device"
> > 
> > 
> > > Or maybe it's fine to always communicate with those non-translated
> > > devices using MFNs, and even if we later add some kind of PSP
> > > mediation (so that both Xen and dom0 can use it), accesses by dom0
> > > will still be assumed to be using MFNs, and thus need no translation.
> > 
> > I think it is easy to enable/disable GFN/MFN for a device like PSP. We
> > can communicate to Linux in many ways this change in behavior.
> 
> In fact my biggest concern with the PSP explicitly is not so much the
> usage of MFN/GFNs, I think it would be fine for the hardware domain to
> unconditionally use MFNs for interactions with the device.
> 
> I worry how are we going to reconcile both Xen and the hardware domain
> having to use the device, as I have no idea how complex the interface
> is, so no idea how much code we would need to add to Xen to mediate a
> external domain accesses.
>
> Anyway, that's a question for later really.

Agreed
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Roger Pau Monné 7 months, 1 week ago
On Fri, May 09, 2025 at 02:10:03PM -0700, Stefano Stabellini wrote:
> On Fri, 9 May 2025, Roger Pau Monné wrote:
> > On Thu, May 08, 2025 at 04:25:28PM -0700, Stefano Stabellini wrote:
> > > On Thu, 8 May 2025, Roger Pau Monné wrote:
> > > > On Wed, May 07, 2025 at 04:02:11PM -0700, Stefano Stabellini wrote:
> > > > > On Tue, 6 May 2025, Roger Pau Monné wrote:
> > > > > > On Mon, May 05, 2025 at 11:11:10AM -0700, Stefano Stabellini wrote:
> > > > > > > In my opinion, we definitely need a solution like this patch for Dom0
> > > > > > > PVH to function correctly in all scenarios.
> > > > > > 
> > > > > > I'm not opposed to having such interface available for PVH hardware
> > > > > > domains.  I find it ugly, but I don't see much other way to deal with
> > > > > > those kind of "devices".  Xen mediating accesses for each one of them
> > > > > > is unlikely to be doable.
> > > > > > 
> > > > > > How do you hook this exchange interface into Linux to differentiate
> > > > > > which drivers need to use mfns when interacting with the hardware?
> > > > > 
> > > > > In the specific case we have at hands the driver is in Linux userspace
> > > > > and is specially-written for our use case. It is not generic, so we
> > > > > don't have this problem. But your question is valid.
> > > > 
> > > > Oh, so you then have some kind of ioctl interface that does the memory
> > > > exchange and bouncing inside of the kernel on behalf of the user-space
> > > > side I would think?
> > > 
> > > I am not sure... Xenia might know more than me here.
> > 
> > One further question I have regarding this approach: have you
> > considered just populating an empty p2m space with contiguous physical
> > memory instead of exchanging an existing area?  That would increase
> > dom0 memory usage, but would prevent super page shattering in the p2m.
> > You could use a dom0_mem=X,max:X+Y command line option, where Y
> > would be your extra room for swiotlb-xen bouncing usage.
> > 
> > XENMEM_increase_reservation documentation notes such hypercall already
> > returns the base MFN of the allocated page (see comment in
> > xen_memory_reservation struct declaration).
>  
> XENMEM_exchange is the way it has been implemented traditionally in
> Linux swiotlb-xen (it has been years). But your idea is good.
> 
> Another, more drastic, idea would be to attempt to map Dom0 PVH memory
> 1:1 at domain creation time like we do on ARM. If not all of it, as much
> as possible. That would resolve the problem very efficiently. We could
> communicate to Dom0 PVH the range that is 1:1 in one of the initial data
> structures, and that would be the end of it.

Yes, I wonder, however, whether attempting this would result in a
fair amount of page-shattering if we need to cater for pages in use by
Xen that cannot be identity-mapped.

Maybe a middle ground: on the Xen command line the admin specifies
the amount of contiguous identity-mapped memory required, and Xen
attempts to allocate and identity-map it in the dom0 p2m?
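
A very rough sketch of what that middle ground could look like, just to
illustrate the shape; the option name, the hook point and the lack of any
rollback are made up, while size_param(), alloc_domheap_pages() and
guest_physmap_add_page() are existing Xen interfaces:

    #include <xen/errno.h>
    #include <xen/mm.h>
    #include <xen/param.h>
    #include <xen/sched.h>
    #include <asm/p2m.h>

    /* Hypothetical "dom0-idmap-size=" option: bytes to identity map. */
    static unsigned long __initdata opt_dom0_idmap_size;
    size_param("dom0-idmap-size", opt_dom0_idmap_size);

    static int __init dom0_populate_idmap(struct domain *d)
    {
        unsigned int order = get_order_from_bytes(opt_dom0_idmap_size);
        struct page_info *pg;
        unsigned long i;
        int rc;

        if ( !opt_dom0_idmap_size )
            return 0;

        /* One machine-contiguous allocation assigned to dom0... */
        pg = alloc_domheap_pages(d, order, 0);
        if ( !pg )
            return -ENOMEM;

        /* ...identity-mapped (gfn == mfn) one 4k page at a time. */
        for ( i = 0; i < (1UL << order); i++ )
        {
            mfn_t mfn = mfn_add(page_to_mfn(pg), i);

            rc = guest_physmap_add_page(d, _gfn(mfn_x(mfn)), mfn, 0);
            if ( rc )
                return rc;
        }

        return 0;
    }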

It would be nice to signal such regions on the memory map itself.
Sadly I don't see a way to do it using the UEFI memory map format.
There's a "EFI_MEMORY_ISA_MASK" region in the attribute field, but I
don't think we can hijack this for Xen purposes.  There's also a 32bit
padding field in EFI_MEMORY_DESCRIPTOR after the type field, but using
that is possibly risky going forward?  I don't think UEFI can
repurpose that padding easily, but we might not want to bet on that.
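
For reference, the descriptor layout being discussed, as declared in the
gnu-efi style headers (from memory -- worth double-checking against Xen's
copy), just to point at the two fields mentioned above:

    typedef struct {
        UINT32               Type;          /* EfiConventionalMemory, ... */
        UINT32               Pad;           /* the 32-bit padding after Type */
        EFI_PHYSICAL_ADDRESS PhysicalStart;
        EFI_VIRTUAL_ADDRESS  VirtualStart;
        UINT64               NumberOfPages;
        UINT64               Attribute;     /* EFI_MEMORY_* bits, where
                                               EFI_MEMORY_ISA_MASK lives */
    } EFI_MEMORY_DESCRIPTOR;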

We also have the need to signal which regions are safe to use as
foreign/grant mapping scratch space, so it would be good to use an
interface that can be expanded.  IOW: have a way to add extra
Xen-specific attributes to memory regions.

Anyway, for the patch at hand: I see no reason to prevent
XENMEM_exchange usage. I think it's maybe not the best option because
of the super-page shattering consequences, but I assume you already
have a working solution based on this.

> > > > > In Linux, the issue is hidden behind drivers/xen/swiotlb-xen.c and
> > > > > xen_arch_need_swiotlb. There are a few options:
> > > > > - force swiotlb bounce for everything on the problematic SoC
> > > > > - edit xen_arch_need_swiotlb to return true for the problematic device
> > > > > - introduce a kernel command line option to specify which device
> > > > >   xen_arch_need_swiotlb should return true for
> > > > 
> > > > Isn't it a bit misleading to use the swiotlb for this purpose?  Won't
> > > > this usage of the swiotlb (to bounce from gfns to mfns) create issues
> > > > if there's any devices that have a DMA physical address limitation and
> > > > also needs to use the swiotlb while being behind the IOMMU?
> > > 
> > > When I wrote swiotlb, I meant swiotlb-xen (drivers/xen/swiotlb-xen.c).
> > > We have been using it exactly for this kind of address translations so
> > > far. It can also deal with cases where genuine bouncing needs to happen.
> > 
> > Oh, I see.  I've assumed you meant the generic Linux swiotlb.
> > 
> > So you have repurposed swiotlb-xen to be used on PVH for this purpose.
> > I think (currently?) swiotlb-xen is unconditionally disabled for
> > HVM/PVH guests?
> 
> Yes, I have repurposed swiotlb-xen for something similar this years ago
> on ARM. I was planning to do the same for PVH x86. Today, swiotlb-xen is
> used for ARM Dom0, which as you know is HVM/PVH from Linux point of view.

Sounds good, no objection from my side.

Regards, Roger.

Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Teddy Astie 7 months, 1 week ago
Le 09/05/2025 à 23:13, Stefano Stabellini a écrit :
> On Fri, 9 May 2025, Roger Pau Monné wrote:
>> On Thu, May 08, 2025 at 04:25:28PM -0700, Stefano Stabellini wrote:
>>> On Thu, 8 May 2025, Roger Pau Monné wrote:
>>>> On Wed, May 07, 2025 at 04:02:11PM -0700, Stefano Stabellini wrote:
>>>>> On Tue, 6 May 2025, Roger Pau Monné wrote:
>>>>>> On Mon, May 05, 2025 at 11:11:10AM -0700, Stefano Stabellini wrote:
>>>>>>> On Mon, 5 May 2025, Roger Pau Monné wrote:
>>>>>>>> On Mon, May 05, 2025 at 12:40:18PM +0200, Marek Marczykowski-Górecki wrote:
>>>>>>>>> On Mon, Apr 28, 2025 at 01:00:01PM -0700, Stefano Stabellini wrote:
>>>>>>>>>> On Mon, 28 Apr 2025, Jan Beulich wrote:
>>>>>>>>>>> On 25.04.2025 22:19, Stefano Stabellini wrote:
>>>>>>>>>>>> From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
>>>>>>>>>>>>
>>>>>>>>>>>> Dom0 PVH might need XENMEM_exchange when passing contiguous memory
>>>>>>>>>>>> addresses to firmware or co-processors not behind an IOMMU.
>>>>>>>>>>>
>>>>>>>>>>> I definitely don't understand the firmware part: It's subject to the
>>>>>>>>>>> same transparent P2M translations as the rest of the VM; it's just
>>>>>>>>>>> another piece of software running there.
>>>>>>>>>>>
>>>>>>>>>>> "Co-processors not behind an IOMMU" is also interesting; a more
>>>>>>>>>>> concrete scenario might be nice, yet I realize you may be limited in
>>>>>>>>>>> what you're allowed to say.
>>>>>>>>>>
>>>>>>>>>> Sure. On AMD x86 platforms there is a co-processor called PSP running
>>>>>>>>>> TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
>>>>>>>>>> pass addresses to it.  See drivers/tee/amdtee/ and
>>>>>>>>>> include/linux/psp-tee.h in Linux.
>>>>>>>>>
>>>>>>>>> We had (have?) similar issue with amdgpu (for integrated graphics) - it
>>>>>>>>> uses PSP for loading its firmware. With PV dom0 there is a workaround as
>>>>>>>>> dom0 kinda knows MFN. I haven't tried PVH dom0 on such system yet, but I
>>>>>>>>> expect troubles (BTW, hw1 aka zen2 gitlab runner has amdgpu, and it's
>>>>>>>>> the one I used for debugging this issue).
>>>>>>>>
>>>>>>>> That's ugly, and problematic when used in conjunction with AMD-SEV.
>>>>>>>>
>>>>>>>> I wonder if Xen could emulate/mediate some parts of the PSP for dom0
>>>>>>>> to use, while allowing Xen to be the sole owner of the device.  Having
>>>>>>>> both Xen and dom0 use it (for different purposes) seems like asking
>>>>>>>> for trouble.  But I also have no idea how complex the PSP interface
>>>>>>>> is, neither whether it would be feasible to emulate the
>>>>>>>> interfaces/registers needed for firmware loading.
>>>>>>>
>>>>>>> Let me take a step back from the PSP for a moment. I am not opposed to a
>>>>>>> PSP mediator in Xen, but I want to emphasize that the issue is more
>>>>>>> general and extends well beyond the PSP.
>>>>>>>
>>>>>>> In my years working in embedded systems, I have consistently seen cases
>>>>>>> where Dom0 needs to communicate with something that does not go through
>>>>>>> the IOMMU. This could be due to special firmware on a co-processor, a
>>>>>>> hardware erratum that prevents proper IOMMU usage, or a high-bandwidth
>>>>>>> device that technically supports the IOMMU but performs poorly unless
>>>>>>> the IOMMU is disabled. All of these are real-world examples that I have
>>>>>>> seen personally.
>>>>>>
>>>>>> I wouldn't be surprised, classic PV dom0 avoided those issues because
>>>>>> it was dealing directly with host addresses (mfns), but that's not the
>>>>>> case with PVH dom0.
>>>>>
>>>>> Yeah
>>>>>
>>>>>
>>>>>>> In my opinion, we definitely need a solution like this patch for Dom0
>>>>>>> PVH to function correctly in all scenarios.
>>>>>>
>>>>>> I'm not opposed to having such interface available for PVH hardware
>>>>>> domains.  I find it ugly, but I don't see much other way to deal with
>>>>>> those kind of "devices".  Xen mediating accesses for each one of them
>>>>>> is unlikely to be doable.
>>>>>>
>>>>>> How do you hook this exchange interface into Linux to differentiate
>>>>>> which drivers need to use mfns when interacting with the hardware?
>>>>>
>>>>> In the specific case we have at hands the driver is in Linux userspace
>>>>> and is specially-written for our use case. It is not generic, so we
>>>>> don't have this problem. But your question is valid.
>>>>
>>>> Oh, so you then have some kind of ioctl interface that does the memory
>>>> exchange and bouncing inside of the kernel on behalf of the user-space
>>>> side I would think?
>>>
>>> I am not sure... Xenia might know more than me here.
>>
>> One further question I have regarding this approach: have you
>> considered just populating an empty p2m space with contiguous physical
>> memory instead of exchanging an existing area?  That would increase
>> dom0 memory usage, but would prevent super page shattering in the p2m.
>> You could use a dom0_mem=X,max:X+Y command line option, where Y
>> would be your extra room for swiotlb-xen bouncing usage.
>>
>> XENMEM_increase_reservation documentation notes such hypercall already
>> returns the base MFN of the allocated page (see comment in
>> xen_memory_reservation struct declaration).
>   
> XENMEM_exchange is the way it has been implemented traditionally in
> Linux swiotlb-xen (it has been years). But your idea is good.
> 
> Another, more drastic, idea would be to attempt to map Dom0 PVH memory
> 1:1 at domain creation time like we do on ARM. If not all of it, as much
> as possible. That would resolve the problem very efficiently. We could
> communicate to Dom0 PVH the range that is 1:1 in one of the initial data
> structures, and that would be the end of it.
> 

Could that be done by introducing a "fake" reserved region in advance
(IVMD?)? Such regions are usually mapped to the domain 1:1, in addition
to being coherent on the IOMMU side (so it doesn't break in case the PSP
becomes IOMMU-aware).

Teddy


Vates

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Stefano Stabellini 7 months, 1 week ago
On Fri, 9 May 2025, Teddy Astie wrote:
> Le 09/05/2025 à 23:13, Stefano Stabellini a écrit :
> > On Fri, 9 May 2025, Roger Pau Monné wrote:
> >> On Thu, May 08, 2025 at 04:25:28PM -0700, Stefano Stabellini wrote:
> >>> On Thu, 8 May 2025, Roger Pau Monné wrote:
> >>>> On Wed, May 07, 2025 at 04:02:11PM -0700, Stefano Stabellini wrote:
> >>>>> On Tue, 6 May 2025, Roger Pau Monné wrote:
> >>>>>> On Mon, May 05, 2025 at 11:11:10AM -0700, Stefano Stabellini wrote:
> >>>>>>> On Mon, 5 May 2025, Roger Pau Monné wrote:
> >>>>>>>> On Mon, May 05, 2025 at 12:40:18PM +0200, Marek Marczykowski-Górecki wrote:
> >>>>>>>>> On Mon, Apr 28, 2025 at 01:00:01PM -0700, Stefano Stabellini wrote:
> >>>>>>>>>> On Mon, 28 Apr 2025, Jan Beulich wrote:
> >>>>>>>>>>> On 25.04.2025 22:19, Stefano Stabellini wrote:
> >>>>>>>>>>>> From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> >>>>>>>>>>>> addresses to firmware or co-processors not behind an IOMMU.
> >>>>>>>>>>>
> >>>>>>>>>>> I definitely don't understand the firmware part: It's subject to the
> >>>>>>>>>>> same transparent P2M translations as the rest of the VM; it's just
> >>>>>>>>>>> another piece of software running there.
> >>>>>>>>>>>
> >>>>>>>>>>> "Co-processors not behind an IOMMU" is also interesting; a more
> >>>>>>>>>>> concrete scenario might be nice, yet I realize you may be limited in
> >>>>>>>>>>> what you're allowed to say.
> >>>>>>>>>>
> >>>>>>>>>> Sure. On AMD x86 platforms there is a co-processor called PSP running
> >>>>>>>>>> TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
> >>>>>>>>>> pass addresses to it.  See drivers/tee/amdtee/ and
> >>>>>>>>>> include/linux/psp-tee.h in Linux.
> >>>>>>>>>
> >>>>>>>>> We had (have?) similar issue with amdgpu (for integrated graphics) - it
> >>>>>>>>> uses PSP for loading its firmware. With PV dom0 there is a workaround as
> >>>>>>>>> dom0 kinda knows MFN. I haven't tried PVH dom0 on such system yet, but I
> >>>>>>>>> expect troubles (BTW, hw1 aka zen2 gitlab runner has amdgpu, and it's
> >>>>>>>>> the one I used for debugging this issue).
> >>>>>>>>
> >>>>>>>> That's ugly, and problematic when used in conjunction with AMD-SEV.
> >>>>>>>>
> >>>>>>>> I wonder if Xen could emulate/mediate some parts of the PSP for dom0
> >>>>>>>> to use, while allowing Xen to be the sole owner of the device.  Having
> >>>>>>>> both Xen and dom0 use it (for different purposes) seems like asking
> >>>>>>>> for trouble.  But I also have no idea how complex the PSP interface
> >>>>>>>> is, neither whether it would be feasible to emulate the
> >>>>>>>> interfaces/registers needed for firmware loading.
> >>>>>>>
> >>>>>>> Let me take a step back from the PSP for a moment. I am not opposed to a
> >>>>>>> PSP mediator in Xen, but I want to emphasize that the issue is more
> >>>>>>> general and extends well beyond the PSP.
> >>>>>>>
> >>>>>>> In my years working in embedded systems, I have consistently seen cases
> >>>>>>> where Dom0 needs to communicate with something that does not go through
> >>>>>>> the IOMMU. This could be due to special firmware on a co-processor, a
> >>>>>>> hardware erratum that prevents proper IOMMU usage, or a high-bandwidth
> >>>>>>> device that technically supports the IOMMU but performs poorly unless
> >>>>>>> the IOMMU is disabled. All of these are real-world examples that I have
> >>>>>>> seen personally.
> >>>>>>
> >>>>>> I wouldn't be surprised, classic PV dom0 avoided those issues because
> >>>>>> it was dealing directly with host addresses (mfns), but that's not the
> >>>>>> case with PVH dom0.
> >>>>>
> >>>>> Yeah
> >>>>>
> >>>>>
> >>>>>>> In my opinion, we definitely need a solution like this patch for Dom0
> >>>>>>> PVH to function correctly in all scenarios.
> >>>>>>
> >>>>>> I'm not opposed to having such interface available for PVH hardware
> >>>>>> domains.  I find it ugly, but I don't see much other way to deal with
> >>>>>> those kind of "devices".  Xen mediating accesses for each one of them
> >>>>>> is unlikely to be doable.
> >>>>>>
> >>>>>> How do you hook this exchange interface into Linux to differentiate
> >>>>>> which drivers need to use mfns when interacting with the hardware?
> >>>>>
> >>>>> In the specific case we have at hands the driver is in Linux userspace
> >>>>> and is specially-written for our use case. It is not generic, so we
> >>>>> don't have this problem. But your question is valid.
> >>>>
> >>>> Oh, so you then have some kind of ioctl interface that does the memory
> >>>> exchange and bouncing inside of the kernel on behalf of the user-space
> >>>> side I would think?
> >>>
> >>> I am not sure... Xenia might know more than me here.
> >>
> >> One further question I have regarding this approach: have you
> >> considered just populating an empty p2m space with contiguous physical
> >> memory instead of exchanging an existing area?  That would increase
> >> dom0 memory usage, but would prevent super page shattering in the p2m.
> >> You could use a dom0_mem=X,max:X+Y command line option, where Y
> >> would be your extra room for swiotlb-xen bouncing usage.
> >>
> >> XENMEM_increase_reservation documentation notes such hypercall already
> >> returns the base MFN of the allocated page (see comment in
> >> xen_memory_reservation struct declaration).
> >
> > XENMEM_exchange is the way it has been implemented traditionally in
> > Linux swiotlb-xen (it has been years). But your idea is good.
> >
> > Another, more drastic, idea would be to attempt to map Dom0 PVH memory
> > 1:1 at domain creation time like we do on ARM. If not all of it, as much
> > as possible. That would resolve the problem very efficiently. We could
> > communicate to Dom0 PVH the range that is 1:1 in one of the initial data
> > structures, and that would be the end of it.
> >
> 
> Could that be done by introducing a "fake" reserved region in advance
> (IVMD?) ? Such region are usually mapped to the domain 1:1 in addition
> to be coherent on the IOMMU side (so it doesn't break in case the PSP
> gets IOMMU-aware).

It doesn't have to be an "official" reserved-memory region (in the sense
of Documentation/devicetree/bindings/reserved-memory/) or exposed via
IVMD.

The memory that ends up mapped 1:1 in Dom0 PVH will be good memory which
could be used for all the regular stuff. If it is not a tiny amount, we
could let Linux (or other OS) do as they please with it.

It is only important for swiotlb-xen to know which one is the 1:1 range
so that it can manage bouncing (or not) over it.

If the 1:1 region is tiny, possibly due to memory allocation constraints
or other reasons, then yes, it makes sense to mark it as reserved
memory as you suggested, which we would do with a /reserved-memory node
if this were a device-tree system.
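
Whatever the transport for that information ends up being, the consumer
side in swiotlb-xen can stay simple.  A sketch, assuming the 1:1 range is
handed over as a single start/size pair (the variable names and the idea
of a single range are assumptions):

    #include <linux/types.h>

    /* Filled at boot from whatever Xen uses to advertise the 1:1 range. */
    static phys_addr_t xen_idmap_start;
    static phys_addr_t xen_idmap_size;

    /* No bouncing needed when a buffer sits entirely inside the range. */
    static bool in_identity_range(phys_addr_t phys, size_t size)
    {
        return phys >= xen_idmap_start &&
               phys + size <= xen_idmap_start + xen_idmap_size;
    }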
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Marek Marczykowski-Górecki 7 months, 1 week ago
On Thu, May 08, 2025 at 11:12:34AM +0200, Roger Pau Monné wrote:
> On Wed, May 07, 2025 at 04:02:11PM -0700, Stefano Stabellini wrote:
> > On Tue, 6 May 2025, Roger Pau Monné wrote:
> > > On Mon, May 05, 2025 at 11:11:10AM -0700, Stefano Stabellini wrote:
> > > > On Mon, 5 May 2025, Roger Pau Monné wrote:
> > > > > On Mon, May 05, 2025 at 12:40:18PM +0200, Marek Marczykowski-Górecki wrote:
> > > > > > On Mon, Apr 28, 2025 at 01:00:01PM -0700, Stefano Stabellini wrote:
> > > > > > > On Mon, 28 Apr 2025, Jan Beulich wrote:
> > > > > > > > On 25.04.2025 22:19, Stefano Stabellini wrote:
> > > > > > > > > From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> > > > > > > > > 
> > > > > > > > > Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> > > > > > > > > addresses to firmware or co-processors not behind an IOMMU.
> > > > > > > > 
> > > > > > > > I definitely don't understand the firmware part: It's subject to the
> > > > > > > > same transparent P2M translations as the rest of the VM; it's just
> > > > > > > > another piece of software running there.
> > > > > > > > 
> > > > > > > > "Co-processors not behind an IOMMU" is also interesting; a more
> > > > > > > > concrete scenario might be nice, yet I realize you may be limited in
> > > > > > > > what you're allowed to say.
> > > > > > > 
> > > > > > > Sure. On AMD x86 platforms there is a co-processor called PSP running
> > > > > > > TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
> > > > > > > pass addresses to it.  See drivers/tee/amdtee/ and
> > > > > > > include/linux/psp-tee.h in Linux.
> > > > > > 
> > > > > > We had (have?) similar issue with amdgpu (for integrated graphics) - it
> > > > > > uses PSP for loading its firmware. With PV dom0 there is a workaround as
> > > > > > dom0 kinda knows MFN. I haven't tried PVH dom0 on such system yet, but I
> > > > > > expect troubles (BTW, hw1 aka zen2 gitlab runner has amdgpu, and it's
> > > > > > the one I used for debugging this issue).
> > > > > 
> > > > > That's ugly, and problematic when used in conjunction with AMD-SEV.
> > > > > 
> > > > > I wonder if Xen could emulate/mediate some parts of the PSP for dom0
> > > > > to use, while allowing Xen to be the sole owner of the device.  Having
> > > > > both Xen and dom0 use it (for different purposes) seems like asking
> > > > > for trouble.  But I also have no idea how complex the PSP interface
> > > > > is, neither whether it would be feasible to emulate the
> > > > > interfaces/registers needed for firmware loading.
> > > > 
> > > > Let me take a step back from the PSP for a moment. I am not opposed to a
> > > > PSP mediator in Xen, but I want to emphasize that the issue is more
> > > > general and extends well beyond the PSP.
> > > > 
> > > > In my years working in embedded systems, I have consistently seen cases
> > > > where Dom0 needs to communicate with something that does not go through
> > > > the IOMMU. This could be due to special firmware on a co-processor, a
> > > > hardware erratum that prevents proper IOMMU usage, or a high-bandwidth
> > > > device that technically supports the IOMMU but performs poorly unless
> > > > the IOMMU is disabled. All of these are real-world examples that I have
> > > > seen personally.
> > > 
> > > I wouldn't be surprised, classic PV dom0 avoided those issues because
> > > it was dealing directly with host addresses (mfns), but that's not the
> > > case with PVH dom0.
> > 
> > Yeah
> > 
> > 
> > > > In my opinion, we definitely need a solution like this patch for Dom0
> > > > PVH to function correctly in all scenarios.
> > > 
> > > I'm not opposed to having such interface available for PVH hardware
> > > domains.  I find it ugly, but I don't see much other way to deal with
> > > those kind of "devices".  Xen mediating accesses for each one of them
> > > is unlikely to be doable.
> > > 
> > > How do you hook this exchange interface into Linux to differentiate
> > > which drivers need to use mfns when interacting with the hardware?
> > 
> > In the specific case we have at hands the driver is in Linux userspace
> > and is specially-written for our use case. It is not generic, so we
> > don't have this problem. But your question is valid.
> 
> Oh, so you then have some kind of ioctl interface that does the memory
> exchange and bouncing inside of the kernel on behalf of the user-space
> side I would think?
> 
> > In Linux, the issue is hidden behind drivers/xen/swiotlb-xen.c and
> > xen_arch_need_swiotlb. There are a few options:
> > - force swiotlb bounce for everything on the problematic SoC
> > - edit xen_arch_need_swiotlb to return true for the problematic device
> > - introduce a kernel command line option to specify which device
> >   xen_arch_need_swiotlb should return true for
> 
> Isn't it a bit misleading to use the swiotlb for this purpose?  Won't
> this usage of the swiotlb (to bounce from gfns to mfns) create issues
> if there's any devices that have a DMA physical address limitation and
> also needs to use the swiotlb while being behind the IOMMU?
> 
> > - introduce an ACPI table with the relevant info
> 
> Hm, best option might be an ACPI table so that Xen can signal to the
> hardware domain whether communication with the device must be done
> using mfns, or if accesses are mediated and hence can be done using
> gfns?

How does it work on native when some devices are not behind an IOMMU? Is
it signaled via an ACPI table? It feels like a similar (although not
the same) situation here.

> It's a bit cumbersome however to have to resort to an ACPI table just
> for this.  Not sure whether we could expand one of the existing tables
> already under Xen control (STAO?) to contain this information.  It all
> looks a bit ad-hoc.
> 
> I think we need some kind of list of devices that are not behind the
> IOMMU, but I have no idea what URI to use to identify those.  I assume
> the PSP doesn't have a PCI SBDF (as it's not on the PCI bus?).  Maybe
> by ACPI path?

It is actually on the PCI bus:
05:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor [1022:15df]

> Or maybe it's fine to always communicate with those non-translated
> devices using MFNs, and even if we later add some kind of PSP
> mediation (so that both Xen and dom0 can use it), accesses by dom0
> will still be assumed to be using MFNs, and thus need no translation.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Stefano Stabellini 7 months, 1 week ago
On Thu, 8 May 2025, Marek Marczykowski-Górecki wrote:
> On Thu, May 08, 2025 at 11:12:34AM +0200, Roger Pau Monné wrote:
> > On Wed, May 07, 2025 at 04:02:11PM -0700, Stefano Stabellini wrote:
> > > On Tue, 6 May 2025, Roger Pau Monné wrote:
> > > > On Mon, May 05, 2025 at 11:11:10AM -0700, Stefano Stabellini wrote:
> > > > > On Mon, 5 May 2025, Roger Pau Monné wrote:
> > > > > > On Mon, May 05, 2025 at 12:40:18PM +0200, Marek Marczykowski-Górecki wrote:
> > > > > > > On Mon, Apr 28, 2025 at 01:00:01PM -0700, Stefano Stabellini wrote:
> > > > > > > > On Mon, 28 Apr 2025, Jan Beulich wrote:
> > > > > > > > > On 25.04.2025 22:19, Stefano Stabellini wrote:
> > > > > > > > > > From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> > > > > > > > > > 
> > > > > > > > > > Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> > > > > > > > > > addresses to firmware or co-processors not behind an IOMMU.
> > > > > > > > > 
> > > > > > > > > I definitely don't understand the firmware part: It's subject to the
> > > > > > > > > same transparent P2M translations as the rest of the VM; it's just
> > > > > > > > > another piece of software running there.
> > > > > > > > > 
> > > > > > > > > "Co-processors not behind an IOMMU" is also interesting; a more
> > > > > > > > > concrete scenario might be nice, yet I realize you may be limited in
> > > > > > > > > what you're allowed to say.
> > > > > > > > 
> > > > > > > > Sure. On AMD x86 platforms there is a co-processor called PSP running
> > > > > > > > TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
> > > > > > > > pass addresses to it.  See drivers/tee/amdtee/ and
> > > > > > > > include/linux/psp-tee.h in Linux.
> > > > > > > 
> > > > > > > We had (have?) similar issue with amdgpu (for integrated graphics) - it
> > > > > > > uses PSP for loading its firmware. With PV dom0 there is a workaround as
> > > > > > > dom0 kinda knows MFN. I haven't tried PVH dom0 on such system yet, but I
> > > > > > > expect troubles (BTW, hw1 aka zen2 gitlab runner has amdgpu, and it's
> > > > > > > the one I used for debugging this issue).
> > > > > > 
> > > > > > That's ugly, and problematic when used in conjunction with AMD-SEV.
> > > > > > 
> > > > > > I wonder if Xen could emulate/mediate some parts of the PSP for dom0
> > > > > > to use, while allowing Xen to be the sole owner of the device.  Having
> > > > > > both Xen and dom0 use it (for different purposes) seems like asking
> > > > > > for trouble.  But I also have no idea how complex the PSP interface
> > > > > > is, neither whether it would be feasible to emulate the
> > > > > > interfaces/registers needed for firmware loading.
> > > > > 
> > > > > Let me take a step back from the PSP for a moment. I am not opposed to a
> > > > > PSP mediator in Xen, but I want to emphasize that the issue is more
> > > > > general and extends well beyond the PSP.
> > > > > 
> > > > > In my years working in embedded systems, I have consistently seen cases
> > > > > where Dom0 needs to communicate with something that does not go through
> > > > > the IOMMU. This could be due to special firmware on a co-processor, a
> > > > > hardware erratum that prevents proper IOMMU usage, or a high-bandwidth
> > > > > device that technically supports the IOMMU but performs poorly unless
> > > > > the IOMMU is disabled. All of these are real-world examples that I have
> > > > > seen personally.
> > > > 
> > > > I wouldn't be surprised, classic PV dom0 avoided those issues because
> > > > it was dealing directly with host addresses (mfns), but that's not the
> > > > case with PVH dom0.
> > > 
> > > Yeah
> > > 
> > > 
> > > > > In my opinion, we definitely need a solution like this patch for Dom0
> > > > > PVH to function correctly in all scenarios.
> > > > 
> > > > I'm not opposed to having such interface available for PVH hardware
> > > > domains.  I find it ugly, but I don't see much other way to deal with
> > > > those kind of "devices".  Xen mediating accesses for each one of them
> > > > is unlikely to be doable.
> > > > 
> > > > How do you hook this exchange interface into Linux to differentiate
> > > > which drivers need to use mfns when interacting with the hardware?
> > > 
> > > In the specific case we have at hands the driver is in Linux userspace
> > > and is specially-written for our use case. It is not generic, so we
> > > don't have this problem. But your question is valid.
> > 
> > Oh, so you then have some kind of ioctl interface that does the memory
> > exchange and bouncing inside of the kernel on behalf of the user-space
> > side I would think?
> > 
> > > In Linux, the issue is hidden behind drivers/xen/swiotlb-xen.c and
> > > xen_arch_need_swiotlb. There are a few options:
> > > - force swiotlb bounce for everything on the problematic SoC
> > > - edit xen_arch_need_swiotlb to return true for the problematic device
> > > - introduce a kernel command line option to specify which device
> > >   xen_arch_need_swiotlb should return true for
> > 
> > Isn't it a bit misleading to use the swiotlb for this purpose?  Won't
> > this usage of the swiotlb (to bounce from gfns to mfns) create issues
> > if there's any devices that have a DMA physical address limitation and
> > also needs to use the swiotlb while being behind the IOMMU?
> > 
> > > - introduce an ACPI table with the relevant info
> > 
> > Hm, best option might be an ACPI table so that Xen can signal to the
> > hardware domain whether communication with the device must be done
> > using mfns, or if accesses are mediated and hence can be done using
> > gfns?
> 
> How does it work on native when some devices are not behind IOMMU? Is it
> signaled via an ACPI table? It feels like it's a similar (although not
> the same) situation here.

The ACPI AMD IVRS table should have this information, but we cannot use
the IVRS table for the guest.

Sometimes this kind of information for platform devices (e.g. GPIO, i2c)
is not reported at all.
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Demi Marie Obenour 7 months, 2 weeks ago
On 5/5/25 2:11 PM, Stefano Stabellini wrote:
> On Mon, 5 May 2025, Roger Pau Monné wrote:
>> On Mon, May 05, 2025 at 12:40:18PM +0200, Marek Marczykowski-Górecki wrote:
>>> On Mon, Apr 28, 2025 at 01:00:01PM -0700, Stefano Stabellini wrote:
>>>> On Mon, 28 Apr 2025, Jan Beulich wrote:
>>>>> On 25.04.2025 22:19, Stefano Stabellini wrote:
>>>>>> From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
>>>>>>
>>>>>> Dom0 PVH might need XENMEM_exchange when passing contiguous memory
>>>>>> addresses to firmware or co-processors not behind an IOMMU.
>>>>>
>>>>> I definitely don't understand the firmware part: It's subject to the
>>>>> same transparent P2M translations as the rest of the VM; it's just
>>>>> another piece of software running there.
>>>>>
>>>>> "Co-processors not behind an IOMMU" is also interesting; a more
>>>>> concrete scenario might be nice, yet I realize you may be limited in
>>>>> what you're allowed to say.
>>>>
>>>> Sure. On AMD x86 platforms there is a co-processor called PSP running
>>>> TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
>>>> pass addresses to it.  See drivers/tee/amdtee/ and
>>>> include/linux/psp-tee.h in Linux.
>>>
>>> We had (have?) similar issue with amdgpu (for integrated graphics) - it
>>> uses PSP for loading its firmware. With PV dom0 there is a workaround as
>>> dom0 kinda knows MFN. I haven't tried PVH dom0 on such system yet, but I
>>> expect troubles (BTW, hw1 aka zen2 gitlab runner has amdgpu, and it's
>>> the one I used for debugging this issue).
>>
>> That's ugly, and problematic when used in conjunction with AMD-SEV.
>>
>> I wonder if Xen could emulate/mediate some parts of the PSP for dom0
>> to use, while allowing Xen to be the sole owner of the device.  Having
>> both Xen and dom0 use it (for different purposes) seems like asking
>> for trouble.  But I also have no idea how complex the PSP interface
>> is, neither whether it would be feasible to emulate the
>> interfaces/registers needed for firmware loading.
> 
> Let me take a step back from the PSP for a moment. I am not opposed to a
> PSP mediator in Xen, but I want to emphasize that the issue is more
> general and extends well beyond the PSP.
> 
> In my years working in embedded systems, I have consistently seen cases
> where Dom0 needs to communicate with something that does not go through
> the IOMMU. This could be due to special firmware on a co-processor, a
> hardware erratum that prevents proper IOMMU usage, or a high-bandwidth
> device that technically supports the IOMMU but performs poorly unless
> the IOMMU is disabled. All of these are real-world examples that I have
> seen personally.
> 
> In my opinion, we definitely need a solution like this patch for Dom0
> PVH to function correctly in all scenarios.
> 
> Additionally, we could add a PSP mediator in Xen to provide best PSP
> support, and also for cases where both Xen and Dom0 need access, but I
> cannot fully assess the complexity involved, as I am not very familiar
> with the API. Probably it is not going to be simple.
> 
> Even with the PSP mediator in place, I would still recommend this patch.

How does this patch interact with dom0 deprivilege?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Jan Beulich 7 months, 3 weeks ago
On 28.04.2025 22:00, Stefano Stabellini wrote:
> On Mon, 28 Apr 2025, Jan Beulich wrote:
>> On 25.04.2025 22:19, Stefano Stabellini wrote:
>>> From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
>>>
>>> Dom0 PVH might need XENMEM_exchange when passing contiguous memory
>>> addresses to firmware or co-processors not behind an IOMMU.
>>
>> I definitely don't understand the firmware part: It's subject to the
>> same transparent P2M translations as the rest of the VM; it's just
>> another piece of software running there.
>>
>> "Co-processors not behind an IOMMU" is also interesting; a more
>> concrete scenario might be nice, yet I realize you may be limited in
>> what you're allowed to say.
> 
> Sure. On AMD x86 platforms there is a co-processor called PSP running
> TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
> pass addresses to it.  See drivers/tee/amdtee/ and
> include/linux/psp-tee.h in Linux.
> 
> This is not a new problem. On ARM we have been dealing with this kind of
> issues for more than a decade and it is the reason why originally the
> decision was to run Dom0 1:1 mapped on ARM. I am not suggesting to map
> Dom0 PVH 1:1; I am only providing context and highlighting that we
> have been lucky on x86 :-)
> 
> 
>>> XENMEM_exchange was blocked for HVM/PVH DomUs, and accidentally it
>>> impacted Dom0 PVH as well.
>>
>> This wasn't accidental at all, I don't think.
> 
> You as the original author of the patch (fae7d5be8bb) in question would
> surely know better. But the way the commit message was written was
> pointing toward a Dom0/DeviceModel problem:
> 
> "The operation's success can't be controlled by the guest, as the device
> model may have an active mapping of the page."

It's the problem mentioned at the bottom: MFNs (in principle) are entirely
meaningless to HVM and PVH domains. With, as you point out there, an
apparently important exception.

>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -4401,7 +4401,7 @@ int steal_page(
>>>      const struct domain *owner;
>>>      int rc;
>>>  
>>> -    if ( paging_mode_external(d) )
>>> +    if ( paging_mode_external(d) && !is_hardware_domain(d) )
>>>          return -EOPNOTSUPP;
>>>  
>>>      /* Grab a reference to make sure the page doesn't change under our feet */
>>
>> Is this (in particular the code following below here) a safe thing to do
>> when we don't properly refcount page references from the P2M, yet? It's
>> Dom0, yes, but even there I might see potential security implications (as
>> top violating privacy of a guest).
> 
> I don't think I am following, could you please elaborate more? The
> change I am proposing is to allow Dom0 to share its own pages to the
> co-processor. DomUs are not in the picture. I would be happy to add
> further restriction to that effect. Is there something else you have in
> mind?

Once "shared" with the PSP, how would Xen know that this sharing has stopped?
Without knowing, how could it safely give the same page to a DomU later on?
("Safely" in both directions: Without compromising privacy of the DomU and
without compromising host safety / security.)

>> Furthermore cleanup_page_mappings() (called later in the function) has a
>> PV-only aspect which would apparently need widening to PVH Dom0 then,
>> too.
> 
> You are referring to:
> 
>         if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) )
>             rc = iommu_legacy_unmap(d, _dfn(mfn), 1u << PAGE_ORDER_4K);
> 
> is that correct?

Yes, that's what immediately caught my eye. I didn't look carefully whether
there might be more.

>>> --- a/xen/common/memory.c
>>> +++ b/xen/common/memory.c
>>> @@ -794,7 +794,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>>>              rc = guest_physmap_add_page(d, _gfn(gpfn), mfn,
>>>                                          exch.out.extent_order) ?: rc;
>>>  
>>> -            if ( !paging_mode_translate(d) &&
>>> +            if ( (!paging_mode_translate(d) || is_hardware_domain(d)) &&
>>>                   __copy_mfn_to_guest_offset(exch.out.extent_start,
>>>                                              (i << out_chunk_order) + j,
>>>                                              mfn) )
>>
>> Wait, no: A PVH domain (Dom0 or not) can't very well make use of MFNs, can
>> it?
> 
> One way or another Dom0 PVH needs to know the MFN to pass it to the
> co-processor.

I see. That's pretty odd, though. I'm then further concerned about the order
of the chunks. At present we're rather lax in permitting PVH and PV Dom0 the
same upper bound. With both CPU and I/O side translation there is, in
principle, no reason to permit any kind of contiguity. Of course there's a
performance aspect, but that hardly matters in the specific case here. Yet at
the same time, once we expose MFNs, contiguity will start mattering as soon
as any piece of memory needs to be larger than PAGE_SIZE. Which means it will
make tightening of the presently lax handling prone to regressions in this
new behavior you're introducing. What chunk size does the PSP driver require?

One further thought here: Is it really the hardware domain which is most
logical to drive the PSP? Interaction is (just guessing) perhaps needed
primarily when creating / managing guests? Then having the control domain
talk to a driver in the hardware domain would make the overall picture more
complicated.

Jan
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Stefano Stabellini 7 months, 3 weeks ago
On Tue, 29 Apr 2025, Jan Beulich wrote:
> On 28.04.2025 22:00, Stefano Stabellini wrote:
> > On Mon, 28 Apr 2025, Jan Beulich wrote:
> >> On 25.04.2025 22:19, Stefano Stabellini wrote:
> >>> From: Xenia Ragiadakou <Xenia.Ragiadakou@amd.com>
> >>>
> >>> Dom0 PVH might need XENMEM_exchange when passing contiguous memory
> >>> addresses to firmware or co-processors not behind an IOMMU.
> >>
> >> I definitely don't understand the firmware part: It's subject to the
> >> same transparent P2M translations as the rest of the VM; it's just
> >> another piece of software running there.
> >>
> >> "Co-processors not behind an IOMMU" is also interesting; a more
> >> concrete scenario might be nice, yet I realize you may be limited in
> >> what you're allowed to say.
> > 
> > Sure. On AMD x86 platforms there is a co-processor called PSP running
> > TEE firmware. The PSP is not behind an IOMMU. Dom0 needs occasionally to
> > pass addresses to it.  See drivers/tee/amdtee/ and
> > include/linux/psp-tee.h in Linux.
> > 
> > This is not a new problem. On ARM we have been dealing with this kind of
> > issues for more than a decade and it is the reason why originally the
> > decision was to run Dom0 1:1 mapped on ARM. I am not suggesting to map
> > Dom0 PVH 1:1; I am only providing context and highlighting that we
> > have been lucky on x86 :-)
> > 
> > 
> >>> XENMEM_exchange was blocked for HVM/PVH DomUs, and accidentally it
> >>> impacted Dom0 PVH as well.
> >>
> >> This wasn't accidental at all, I don't think.
> > 
> > You as the original author of the patch (fae7d5be8bb) in question would
> > surely know better. But the way the commit message was written was
> > pointing toward a Dom0/DeviceModel problem:
> > 
> > "The operation's success can't be controlled by the guest, as the device
> > model may have an active mapping of the page."
> 
> It's the problem mentioned at the bottom: MFNs (in principle) are entirely
> meaningless to HVM and PVH domains. With, as you point out there, an
> apparently important exception.
> 
> >>> --- a/xen/arch/x86/mm.c
> >>> +++ b/xen/arch/x86/mm.c
> >>> @@ -4401,7 +4401,7 @@ int steal_page(
> >>>      const struct domain *owner;
> >>>      int rc;
> >>>  
> >>> -    if ( paging_mode_external(d) )
> >>> +    if ( paging_mode_external(d) && !is_hardware_domain(d) )
> >>>          return -EOPNOTSUPP;
> >>>  
> >>>      /* Grab a reference to make sure the page doesn't change under our feet */
> >>
> >> Is this (in particular the code following below here) a safe thing to do
> >> when we don't properly refcount page references from the P2M, yet? It's
> >> Dom0, yes, but even there I might see potential security implications (as
> >> top violating privacy of a guest).
> > 
> > I don't think I am following, could you please elaborate more? The
> > change I am proposing is to allow Dom0 to share its own pages to the
> > co-processor. DomUs are not in the picture. I would be happy to add
> > further restriction to that effect. Is there something else you have in
> > mind?
> 
> Once "shared" with the PSP, how would Xen know that this sharing has stopped?
> Without knowing, how could it safely give the same page to a DomU later on?
> ("Safely" in both directions: Without compromising privacy of the DomU and
> without compromising host safety / security.)

Why would Xen later assign the same page to a DomU? The page comes
from the hardware domain, which, as of today, cannot be destroyed. BTW I
realize it is a bit different, but we have been doing the same thing
with Dom0 1:1 mapped on ARM since the start of the project.

Additionally, the TEE is considered a more trusted component in the
system than the hypervisor.


> >> Furthermore cleanup_page_mappings() (called later in the function) has a
> >> PV-only aspect which would apparently need widening to PVH Dom0 then,
> >> too.
> > 
> > You are referring to:
> > 
> >         if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) )
> >             rc = iommu_legacy_unmap(d, _dfn(mfn), 1u << PAGE_ORDER_4K);
> > 
> > is that correct?
> 
> Yes, that's what immediately caught my eye. I didn't look carefully whether
> there might be more.
> 
> >>> --- a/xen/common/memory.c
> >>> +++ b/xen/common/memory.c
> >>> @@ -794,7 +794,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
> >>>              rc = guest_physmap_add_page(d, _gfn(gpfn), mfn,
> >>>                                          exch.out.extent_order) ?: rc;
> >>>  
> >>> -            if ( !paging_mode_translate(d) &&
> >>> +            if ( (!paging_mode_translate(d) || is_hardware_domain(d)) &&
> >>>                   __copy_mfn_to_guest_offset(exch.out.extent_start,
> >>>                                              (i << out_chunk_order) + j,
> >>>                                              mfn) )
> >>
> >> Wait, no: A PVH domain (Dom0 or not) can't very well make use of MFNs, can
> >> it?
> > 
> > One way or another Dom0 PVH needs to know the MFN to pass it to the
> > co-processor.
> 
> I see. That's pretty odd, though. I'm then further concerned of the order of
> the chunks. At present we're rather lax, in permitting PVH and PV Dom0 the
> same upper bound. With both CPU and I/O side translation there is, in
> principle, no reason to permit any kind of contiguity. Of course there's a
> performance aspect, but that hardly matters in the specific case here. Yet at
> the same time, once we expose MFNs, contiguity will start mattering as soon
> as any piece of memory needs to be larger than PAGE_SIZE. Which means it will
> make tightening of the presently lax handling prone to regressions in this
> new behavior you're introducing. What chunk size does the PSP driver require?

I don't know. The memory returned by XENMEM_exchange is contiguous,
right? Are you worried that Xen cannot allocate the requested amount of
memory contiguously? We have been using this patch for months now and it
has been working correctly to this day.


> One further thought here: Is it really the hardware domain which is most
> logical to drive the PSP? Interaction is (just guessing) perhaps needed
> primarily when creating / managing guests? Then having the control domain
> talk to a driver in the hardware domain would make the overall picture more
> complicated.

On AMD the interactions are more related to accessing hardware or
controlling other peripherals, so it is a good fit for the hardware
domain.
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Jan Beulich 7 months, 3 weeks ago
On 29.04.2025 23:52, Stefano Stabellini wrote:
> On Tue, 29 Apr 2025, Jan Beulich wrote:
>> On 28.04.2025 22:00, Stefano Stabellini wrote:
>>> On Mon, 28 Apr 2025, Jan Beulich wrote:
>>>> On 25.04.2025 22:19, Stefano Stabellini wrote:
>>>>> --- a/xen/arch/x86/mm.c
>>>>> +++ b/xen/arch/x86/mm.c
>>>>> @@ -4401,7 +4401,7 @@ int steal_page(
>>>>>      const struct domain *owner;
>>>>>      int rc;
>>>>>  
>>>>> -    if ( paging_mode_external(d) )
>>>>> +    if ( paging_mode_external(d) && !is_hardware_domain(d) )
>>>>>          return -EOPNOTSUPP;
>>>>>  
>>>>>      /* Grab a reference to make sure the page doesn't change under our feet */
>>>>
>>>> Is this (in particular the code following below here) a safe thing to do
>>>> when we don't properly refcount page references from the P2M, yet? It's
>>>> Dom0, yes, but even there I might see potential security implications (as
>>>> top violating privacy of a guest).
>>>
>>> I don't think I am following, could you please elaborate more? The
>>> change I am proposing is to allow Dom0 to share its own pages to the
>>> co-processor. DomUs are not in the picture. I would be happy to add
>>> further restriction to that effect. Is there something else you have in
>>> mind?
>>
>> Once "shared" with the PSP, how would Xen know that this sharing has stopped?
>> Without knowing, how could it safely give the same page to a DomU later on?
>> ("Safely" in both directions: Without compromising privacy of the DomU and
>> without compromising host safety / security.)
> 
> Why would Xen later assign the same page to a DomU? The page comes
> from the hardware domain, which, as of today, cannot be destroyed. BTW I
> realize it is a bit different, but we have been doing the same thing
> with Dom0 1:1 mapped on ARM since the start of the project.

The life cycle of the page within Dom0 may be such that a need arises to
move it elsewhere (balloon out, grant-transfer, and what not).

>>>>> --- a/xen/common/memory.c
>>>>> +++ b/xen/common/memory.c
>>>>> @@ -794,7 +794,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>>>>>              rc = guest_physmap_add_page(d, _gfn(gpfn), mfn,
>>>>>                                          exch.out.extent_order) ?: rc;
>>>>>  
>>>>> -            if ( !paging_mode_translate(d) &&
>>>>> +            if ( (!paging_mode_translate(d) || is_hardware_domain(d)) &&
>>>>>                   __copy_mfn_to_guest_offset(exch.out.extent_start,
>>>>>                                              (i << out_chunk_order) + j,
>>>>>                                              mfn) )
>>>>
>>>> Wait, no: A PVH domain (Dom0 or not) can't very well make use of MFNs, can
>>>> it?
>>>
>>> One way or another Dom0 PVH needs to know the MFN to pass it to the
>>> co-processor.
>>
>> I see. That's pretty odd, though. I'm then further concerned of the order of
>> the chunks. At present we're rather lax, in permitting PVH and PV Dom0 the
>> same upper bound. With both CPU and I/O side translation there is, in
>> principle, no reason to permit any kind of contiguity. Of course there's a
>> performance aspect, but that hardly matters in the specific case here. Yet at
>> the same time, once we expose MFNs, contiguity will start mattering as soon
>> as any piece of memory needs to be larger than PAGE_SIZE. Which means it will
>> make tightening of the presently lax handling prone to regressions in this
>> new behavior you're introducing. What chunk size does the PSP driver require?
> 
> I don't know. The memory returned by XENMEM_exchange is contiguous,
> right? Are you worried that Xen cannot allocate the requested amount of
> memory contiguously?

That would be Dom0's problem then. But really for a translated guest the
exchanged chunks being contiguous shouldn't matter, correctness-wise. That is,
within Xen, rather than failing a request, we could choose to retry using
discontiguous chunks (contiguous only in GFN space). Such an (afaict)
otherwise correct change would break your use case, as it would invalidate the
MFN information passed back. (This fallback approach would similarly apply to
other related mem-ops. It's just that during domain creation the tool stack
has its own fallback, so it may not be of much use right now.)
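
(To spell out why such a fallback and the new behavior can't coexist, it
would be roughly the following shape -- a sketch with a hypothetical
helper, not a proposed change: once the chunk is assembled from order-0
pages there is no single base MFN left to copy back to the caller.)

    /* Place 2^order order-0 pages at consecutive GFNs starting at gpfn. */
    static int populate_discontiguous(struct domain *d, unsigned long gpfn,
                                      unsigned int order, unsigned int memflags)
    {
        unsigned long k;

        for ( k = 0; k < (1UL << order); k++ )
        {
            struct page_info *pg = alloc_domheap_pages(d, 0, memflags);
            int rc;

            if ( pg == NULL )
                return -ENOMEM;

            rc = guest_physmap_add_page(d, _gfn(gpfn + k), page_to_mfn(pg), 0);
            if ( rc )
                return rc;
        }

        return 0;
    }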

> We have been using this patch for months now and it
> has been working correctly to this day.

Sure, that's a good data point. Yet not a proof of correctness.

Jan
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Roger Pau Monné 7 months, 3 weeks ago
On Wed, Apr 30, 2025 at 08:27:55AM +0200, Jan Beulich wrote:
> On 29.04.2025 23:52, Stefano Stabellini wrote:
> > On Tue, 29 Apr 2025, Jan Beulich wrote:
> >> On 28.04.2025 22:00, Stefano Stabellini wrote:
> >>> On Mon, 28 Apr 2025, Jan Beulich wrote:
> >>>> On 25.04.2025 22:19, Stefano Stabellini wrote:
> >>>>> --- a/xen/arch/x86/mm.c
> >>>>> +++ b/xen/arch/x86/mm.c
> >>>>> @@ -4401,7 +4401,7 @@ int steal_page(
> >>>>>      const struct domain *owner;
> >>>>>      int rc;
> >>>>>  
> >>>>> -    if ( paging_mode_external(d) )
> >>>>> +    if ( paging_mode_external(d) && !is_hardware_domain(d) )
> >>>>>          return -EOPNOTSUPP;
> >>>>>  
> >>>>>      /* Grab a reference to make sure the page doesn't change under our feet */
> >>>>
> >>>> Is this (in particular the code following below here) a safe thing to do
> >>>> when we don't properly refcount page references from the P2M, yet? It's
> >>>> Dom0, yes, but even there I might see potential security implications (as
> >>>> top violating privacy of a guest).
> >>>
> >>> I don't think I am following, could you please elaborate more? The
> >>> change I am proposing is to allow Dom0 to share its own pages to the
> >>> co-processor. DomUs are not in the picture. I would be happy to add
> >>> further restriction to that effect. Is there something else you have in
> >>> mind?
> >>
> >> Once "shared" with the PSP, how would Xen know that this sharing has stopped?
> >> Without knowing, how could it safely give the same page to a DomU later on?
> >> ("Safely" in both directions: Without compromising privacy of the DomU and
> >> without compromising host safety / security.)
> > 
> > Why would Xen later assign the same page to a DomU? The page comes
> > from the hardware domain, which, as of today, cannot be destroyed. BTW I
> > realize it is a bit different, but we have been doing the same thing
> > with Dom0 1:1 mapped on ARM since the start of the project.
> 
> The life cycle of the page within Dom0 may be such that a need arises to
> move it elsewhere (balloon out, grant-transfer, and what not).

I think it's up to dom0 to make sure the page is handled
appropriately, so that it keeps its special contiguity properties.

If the PSP is not using the IOMMU page-tables for DMA accesses, and
the hardware domain can freely interact with it, there's no protection
against such a device accessing any random MFN on the system, and hence
no refcounts or similar will protect against that.

The only protection would be Xen owning the device, and the hardware
domain using an emulated/mediated interface to communicate with it.  I
have no idea how complicated the PSP interface is, and whether it
would be feasible to trap and emulate/mediate accesses in Xen.

> >>>>> --- a/xen/common/memory.c
> >>>>> +++ b/xen/common/memory.c
> >>>>> @@ -794,7 +794,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
> >>>>>              rc = guest_physmap_add_page(d, _gfn(gpfn), mfn,
> >>>>>                                          exch.out.extent_order) ?: rc;
> >>>>>  
> >>>>> -            if ( !paging_mode_translate(d) &&
> >>>>> +            if ( (!paging_mode_translate(d) || is_hardware_domain(d)) &&
> >>>>>                   __copy_mfn_to_guest_offset(exch.out.extent_start,
> >>>>>                                              (i << out_chunk_order) + j,
> >>>>>                                              mfn) )
> >>>>
> >>>> Wait, no: A PVH domain (Dom0 or not) can't very well make use of MFNs, can
> >>>> it?
> >>>
> >>> One way or another Dom0 PVH needs to know the MFN to pass it to the
> >>> co-processor.
> >>
> >> I see. That's pretty odd, though. I'm then further concerned of the order of
> >> the chunks. At present we're rather lax, in permitting PVH and PV Dom0 the
> >> same upper bound. With both CPU and I/O side translation there is, in
> >> principle, no reason to permit any kind of contiguity. Of course there's a
> >> performance aspect, but that hardly matters in the specific case here. Yet at
> >> the same time, once we expose MFNs, contiguity will start mattering as soon
> >> as any piece of memory needs to be larger than PAGE_SIZE. Which means it will
> >> make tightening of the presently lax handling prone to regressions in this
> >> new behavior you're introducing. What chunk size does the PSP driver require?
> > 
> > I don't know. The memory returned by XENMEM_exchange is contiguous,
> > right? Are you worried that Xen cannot allocate the requested amount of
> > memory contiguously?
> 
> That would be Dom0's problem then. But really for a translated guest the
> exchanged chunks being contiguous shouldn't matter, correctness-wise. That is,
> within Xen, rather than failing a request, we could choose to retry using
> discontiguous chunks (contiguous only in GFN space). Such an (afaict)
> otherwise correct change would break your use case, as it would invalidate the
> MFN information passed back. (This fallback approach would similarly apply to
> other related mem-ops. It's just that during domain creation the tool stack
> has its own fallback, so it may not be of much use right now.)

I think the description in the public header needs to be expanded to
specify what XENMEM_exchange does for translated guests, and to
clearly state that the underlying MFNs for the exchanged region will
be contiguous.  Possibly a new XENMEMF_ flag needs to be added to
request contiguous physical memory for the new range.
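
Purely as an illustration of such a flag (the name and bit position here
are made up and would need checking against the existing XENMEMF_*
definitions; this is not an actual proposal for public/memory.h):

/* Hypothetical sketch -- name and bit chosen for illustration only. */
/*
 * Request that the MFNs backing each new extent be physically
 * contiguous, so a translated caller can hand the range to a device
 * not behind the IOMMU.
 */
#define XENMEMF_exchange_contig   (1U << 20)  /* placeholder bit */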

Sadly this also has the side effect of quite likely shattering
superpages for dom0 EPT/NPT, which will result in decreased dom0
performance.

We have so far avoided exposing MFNs to HVM/PVH, but I don't see much
way to avoid this if there's no option to use IOMMU or NPT page-tables
with the PSP and we don't want to intercept PSP accesses in Xen and
translate requests on the fly.

Thanks, Roger.
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Stefano Stabellini 7 months, 3 weeks ago
On Wed, 30 Apr 2025, Roger Pau Monné wrote:
> On Wed, Apr 30, 2025 at 08:27:55AM +0200, Jan Beulich wrote:
> > On 29.04.2025 23:52, Stefano Stabellini wrote:
> > > On Tue, 29 Apr 2025, Jan Beulich wrote:
> > >> On 28.04.2025 22:00, Stefano Stabellini wrote:
> > >>> On Mon, 28 Apr 2025, Jan Beulich wrote:
> > >>>> On 25.04.2025 22:19, Stefano Stabellini wrote:
> > >>>>> --- a/xen/arch/x86/mm.c
> > >>>>> +++ b/xen/arch/x86/mm.c
> > >>>>> @@ -4401,7 +4401,7 @@ int steal_page(
> > >>>>>      const struct domain *owner;
> > >>>>>      int rc;
> > >>>>>  
> > >>>>> -    if ( paging_mode_external(d) )
> > >>>>> +    if ( paging_mode_external(d) && !is_hardware_domain(d) )
> > >>>>>          return -EOPNOTSUPP;
> > >>>>>  
> > >>>>>      /* Grab a reference to make sure the page doesn't change under our feet */
> > >>>>
> > >>>> Is this (in particular the code following below here) a safe thing to do
> > >>>> when we don't properly refcount page references from the P2M, yet? It's
> > >>>> Dom0, yes, but even there I might see potential security implications (as
> > >>>> top violating privacy of a guest).
> > >>>
> > >>> I don't think I am following, could you please elaborate more? The
> > >>> change I am proposing is to allow Dom0 to share its own pages to the
> > >>> co-processor. DomUs are not in the picture. I would be happy to add
> > >>> further restriction to that effect. Is there something else you have in
> > >>> mind?
> > >>
> > >> Once "shared" with the PSP, how would Xen know that this sharing has stopped?
> > >> Without knowing, how could it safely give the same page to a DomU later on?
> > >> ("Safely" in both directions: Without compromising privacy of the DomU and
> > >> without compromising host safety / security.)
> > > 
> > > Why would Xen later assign the same page to a DomU? The page comes
> > > from the hardware domain, which, as of today, cannot be destroyed. BTW I
> > > realize it is a bit different, but we have been doing the same thing
> > > with Dom0 1:1 mapped on ARM since the start of the project.
> > 
> > The life cycle of the page within Dom0 may be such that a need arises to
> > move it elsewhere (balloon out, grant-transfer, and what not).
> 
> I think it's up to dom0 to make sure the page is handled
> appropriately, in order for it to keep it's special contiguity
> properties.
> 
> If the PSP is not using the IOMMU page-tables for DMA accesses, and
> the hardware domain can freely interact with it, there's no protection
> from such device accessing any random MFN on the system, and hence no
> refcounts or similar will protect from that.

Yes, exactly!


> The only protection would be Xen owning the device, and the hardware
> domain using an emulated/mediated interface to communicate with it.  I
> have no idea how complicated the PSP interface is, and whether it
> would be feasible to trap and emulate/mediate accesses in Xen.

There will always be the possibility of devices, co-processors, or
firmware not behind the IOMMU, and we won't be able to handle them all
in Xen.


> > >>>>> --- a/xen/common/memory.c
> > >>>>> +++ b/xen/common/memory.c
> > >>>>> @@ -794,7 +794,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
> > >>>>>              rc = guest_physmap_add_page(d, _gfn(gpfn), mfn,
> > >>>>>                                          exch.out.extent_order) ?: rc;
> > >>>>>  
> > >>>>> -            if ( !paging_mode_translate(d) &&
> > >>>>> +            if ( (!paging_mode_translate(d) || is_hardware_domain(d)) &&
> > >>>>>                   __copy_mfn_to_guest_offset(exch.out.extent_start,
> > >>>>>                                              (i << out_chunk_order) + j,
> > >>>>>                                              mfn) )
> > >>>>
> > >>>> Wait, no: A PVH domain (Dom0 or not) can't very well make use of MFNs, can
> > >>>> it?
> > >>>
> > >>> One way or another Dom0 PVH needs to know the MFN to pass it to the
> > >>> co-processor.
> > >>
> > >> I see. That's pretty odd, though. I'm then further concerned of the order of
> > >> the chunks. At present we're rather lax, in permitting PVH and PV Dom0 the
> > >> same upper bound. With both CPU and I/O side translation there is, in
> > >> principle, no reason to permit any kind of contiguity. Of course there's a
> > >> performance aspect, but that hardly matters in the specific case here. Yet at
> > >> the same time, once we expose MFNs, contiguity will start mattering as soon
> > >> as any piece of memory needs to be larger than PAGE_SIZE. Which means it will
> > >> make tightening of the presently lax handling prone to regressions in this
> > >> new behavior you're introducing. What chunk size does the PSP driver require?
> > > 
> > > I don't know. The memory returned by XENMEM_exchange is contiguous,
> > > right? Are you worried that Xen cannot allocate the requested amount of
> > > memory contiguously?
> > 
> > That would be Dom0's problem then. But really for a translated guest the
> > exchanged chunks being contiguous shouldn't matter, correctness-wise. That is,
> > within Xen, rather than failing a request, we could choose to retry using
> > discontiguous chunks (contiguous only in GFN space). Such an (afaict)
> > otherwise correct change would break your use case, as it would invalidate the
> > MFN information passed back. (This fallback approach would similarly apply to
> > other related mem-ops. It's just that during domain creation the tool stack
> > has its own fallback, so it may not be of much use right now.)
> 
> I think the description in the public header needs to be expanded to
> specify what the XENMEM_exchange does for translated guests, and
> clearly write down that the underlying MFNs for the exchanged region
> will be contiguous.  Possibly a new XENMEMF_ flag needs to be added to
> request contiguous physical memory for the new range.
> 
> Sadly this also has the side effect of quite likely shattering
> superpages for dom0 EPT/NPT, which will result in decreased dom0
> performance.
> 
> We have so far avoided exposing MFNs to HVM/PVH, but I don't see much
> way to avoid this if there's no option to use IOMMU or NPT page-tables
> with the PSP and we don't want to intercept PSP accesses in Xen and
> translate requests on the fly.
 
Yeah, I think the same way too.
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Jason Andryuk 7 months, 2 weeks ago
On 2025-04-30 20:19, Stefano Stabellini wrote:
> On Wed, 30 Apr 2025, Roger Pau Monné wrote:
>> On Wed, Apr 30, 2025 at 08:27:55AM +0200, Jan Beulich wrote:
>>> On 29.04.2025 23:52, Stefano Stabellini wrote:
>>>> On Tue, 29 Apr 2025, Jan Beulich wrote:
>>>>> On 28.04.2025 22:00, Stefano Stabellini wrote:
>>>>>> On Mon, 28 Apr 2025, Jan Beulich wrote:
>>>>>>> On 25.04.2025 22:19, Stefano Stabellini wrote:

>>>>>>>> --- a/xen/common/memory.c
>>>>>>>> +++ b/xen/common/memory.c
>>>>>>>> @@ -794,7 +794,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>>>>>>>>               rc = guest_physmap_add_page(d, _gfn(gpfn), mfn,
>>>>>>>>                                           exch.out.extent_order) ?: rc;
>>>>>>>>   
>>>>>>>> -            if ( !paging_mode_translate(d) &&
>>>>>>>> +            if ( (!paging_mode_translate(d) || is_hardware_domain(d)) &&
>>>>>>>>                    __copy_mfn_to_guest_offset(exch.out.extent_start,
>>>>>>>>                                               (i << out_chunk_order) + j,
>>>>>>>>                                               mfn) )
>>>>>>>
>>>>>>> Wait, no: A PVH domain (Dom0 or not) can't very well make use of MFNs, can
>>>>>>> it?
>>>>>>
>>>>>> One way or another Dom0 PVH needs to know the MFN to pass it to the
>>>>>> co-processor.
>>>>>
>>>>> I see. That's pretty odd, though. I'm then further concerned of the order of
>>>>> the chunks. At present we're rather lax, in permitting PVH and PV Dom0 the
>>>>> same upper bound. With both CPU and I/O side translation there is, in
>>>>> principle, no reason to permit any kind of contiguity. Of course there's a
>>>>> performance aspect, but that hardly matters in the specific case here. Yet at
>>>>> the same time, once we expose MFNs, contiguity will start mattering as soon
>>>>> as any piece of memory needs to be larger than PAGE_SIZE. Which means it will
>>>>> make tightening of the presently lax handling prone to regressions in this
>>>>> new behavior you're introducing. What chunk size does the PSP driver require?
>>>>
>>>> I don't know. The memory returned by XENMEM_exchange is contiguous,
>>>> right? Are you worried that Xen cannot allocate the requested amount of
>>>> memory contiguously?

In the case I looked at, it is 8 pages.  The driver defines a ring of 32 
* 1k entries.  I'm not sure if there are other paths or device versions 
where it might differ.
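
For the arithmetic (assuming 4 KiB pages, and reading "32 * 1k" as 32
entries of 1 KiB each):

/* Sizing sketch -- constants are the numbers above, not from the driver. */
#define RING_ENTRIES     32
#define RING_ENTRY_SIZE  1024                               /* 1 KiB   */
#define RING_BYTES       (RING_ENTRIES * RING_ENTRY_SIZE)   /* 32 KiB  */
#define RING_PAGES       (RING_BYTES / 4096)                /* 8 pages */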

>>> That would be Dom0's problem then. But really for a translated guest the
>>> exchanged chunks being contiguous shouldn't matter, correctness-wise. That is,
>>> within Xen, rather than failing a request, we could choose to retry using
>>> discontiguous chunks (contiguous only in GFN space). Such an (afaict)
>>> otherwise correct change would break your use case, as it would invalidate the
>>> MFN information passed back. (This fallback approach would similarly apply to
>>> other related mem-ops. It's just that during domain creation the tool stack
>>> has its own fallback, so it may not be of much use right now.)
>>
>> I think the description in the public header needs to be expanded to
>> specify what the XENMEM_exchange does for translated guests, and
>> clearly write down that the underlying MFNs for the exchanged region
>> will be contiguous.  Possibly a new XENMEMF_ flag needs to be added to
>> request contiguous physical memory for the new range.
>>
>> Sadly this also has the side effect of quite likely shattering
>> superpages for dom0 EPT/NPT, which will result in decreased dom0
>> performance.

Yes, this appears to happen, as memory_exchange seems to always replace
the pages.  I tested returning the existing MFNs if they are already
contiguous, since that was sufficient for this driver.  It worked, but
it was messy: a big loop to copy in the GFN array and check contiguity,
duplicating much of the real loop.
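
For illustration, such a check boils down to something like the below
(a simplified sketch, not the actual code I tested; the helper name is
invented, and locking, p2m type checks and error handling are trimmed):

static bool gfns_already_contiguous(struct domain *d,
                                    XEN_GUEST_HANDLE_PARAM(xen_pfn_t) gfns,
                                    unsigned int nr)
{
    mfn_t prev = INVALID_MFN;
    unsigned int i;

    for ( i = 0; i < nr; i++ )
    {
        xen_pfn_t gfn;
        p2m_type_t t;
        mfn_t mfn;

        /* Copy in the next GFN from the guest-supplied array. */
        if ( __copy_from_guest_offset(&gfn, gfns, i, 1) )
            return false;

        mfn = get_gfn_query_unlocked(d, gfn, &t);
        /* Bail if unmapped, or not exactly one MFN after the previous one. */
        if ( mfn_eq(mfn, INVALID_MFN) ||
             (i && !mfn_eq(mfn, mfn_add(prev, 1))) )
            return false;

        prev = mfn;
    }

    return true;
}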

>> We have so far avoided exposing MFNs to HVM/PVH, but I don't see much
>> way to avoid this if there's no option to use IOMMU or NPT page-tables
>> with the PSP and we don't want to intercept PSP accesses in Xen and
>> translate requests on the fly.
>   
> Yeah, I think the same way too.

Regards,
Jason

Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Jan Beulich 7 months, 2 weeks ago
On 01.05.2025 15:44, Jason Andryuk wrote:
> On 2025-04-30 20:19, Stefano Stabellini wrote:
>> On Wed, 30 Apr 2025, Roger Pau Monné wrote:
>>> On Wed, Apr 30, 2025 at 08:27:55AM +0200, Jan Beulich wrote:
>>>> On 29.04.2025 23:52, Stefano Stabellini wrote:
>>>>> On Tue, 29 Apr 2025, Jan Beulich wrote:
>>>>>> On 28.04.2025 22:00, Stefano Stabellini wrote:
>>>>>>> On Mon, 28 Apr 2025, Jan Beulich wrote:
>>>>>>>> On 25.04.2025 22:19, Stefano Stabellini wrote:
> 
>>>>>>>>> --- a/xen/common/memory.c
>>>>>>>>> +++ b/xen/common/memory.c
>>>>>>>>> @@ -794,7 +794,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>>>>>>>>>               rc = guest_physmap_add_page(d, _gfn(gpfn), mfn,
>>>>>>>>>                                           exch.out.extent_order) ?: rc;
>>>>>>>>>   
>>>>>>>>> -            if ( !paging_mode_translate(d) &&
>>>>>>>>> +            if ( (!paging_mode_translate(d) || is_hardware_domain(d)) &&
>>>>>>>>>                    __copy_mfn_to_guest_offset(exch.out.extent_start,
>>>>>>>>>                                               (i << out_chunk_order) + j,
>>>>>>>>>                                               mfn) )
>>>>>>>>
>>>>>>>> Wait, no: A PVH domain (Dom0 or not) can't very well make use of MFNs, can
>>>>>>>> it?
>>>>>>>
>>>>>>> One way or another Dom0 PVH needs to know the MFN to pass it to the
>>>>>>> co-processor.
>>>>>>
>>>>>> I see. That's pretty odd, though. I'm then further concerned of the order of
>>>>>> the chunks. At present we're rather lax, in permitting PVH and PV Dom0 the
>>>>>> same upper bound. With both CPU and I/O side translation there is, in
>>>>>> principle, no reason to permit any kind of contiguity. Of course there's a
>>>>>> performance aspect, but that hardly matters in the specific case here. Yet at
>>>>>> the same time, once we expose MFNs, contiguity will start mattering as soon
>>>>>> as any piece of memory needs to be larger than PAGE_SIZE. Which means it will
>>>>>> make tightening of the presently lax handling prone to regressions in this
>>>>>> new behavior you're introducing. What chunk size does the PSP driver require?
>>>>>
>>>>> I don't know. The memory returned by XENMEM_exchange is contiguous,
>>>>> right? Are you worried that Xen cannot allocate the requested amount of
>>>>> memory contiguously?
> 
> In the case I looked at, it is 8 pages.  The driver defines a ring of 32 
> * 1k entries.  I'm not sure if there are other paths or device versions 
> where it might differ.

As per this ...

>>>> That would be Dom0's problem then. But really for a translated guest the
>>>> exchanged chunks being contiguous shouldn't matter, correctness-wise. That is,
>>>> within Xen, rather than failing a request, we could choose to retry using
>>>> discontiguous chunks (contiguous only in GFN space). Such an (afaict)
>>>> otherwise correct change would break your use case, as it would invalidate the
>>>> MFN information passed back. (This fallback approach would similarly apply to
>>>> other related mem-ops. It's just that during domain creation the tool stack
>>>> has its own fallback, so it may not be of much use right now.)
>>>
>>> I think the description in the public header needs to be expanded to
>>> specify what the XENMEM_exchange does for translated guests, and
>>> clearly write down that the underlying MFNs for the exchanged region
>>> will be contiguous.  Possibly a new XENMEMF_ flag needs to be added to
>>> request contiguous physical memory for the new range.
>>>
>>> Sadly this also has the side effect of quite likely shattering
>>> superpages for dom0 EPT/NPT, which will result in decreased dom0
>>> performance.
> 
> Yes, this appears to happen as memory_exchange seems to always replace 
> the pages.  I tested returning the existing MFNs if they are already 
> contiguous since that was sufficient for this driver.  It worked, but it 
> was messy.  A big loop to copy in the GFN array and check contiguity 
> which duplicated much of the real loop.

... there may not be a need for the output range to be contiguous? In which
case - wouldn't a simple "give me the MFN for this GFN" hypercall do? I seem
to vaguely recall that we even had one, long ago; it was purged because it
violated the "no MFNs exposed" principle (and because it no longer had any
use). XENMEM_translate_gpfn_list looks like what I mean; see commit
2d2f7977a052.
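
For reference, the interface was roughly of this shape (a from-memory
approximation, not copied from any header; field names may not be exact --
see the commit above for the authoritative definition):

struct xen_translate_gpfn_list {
    domid_t domid;                          /* domain to translate for    */
    xen_ulong_t nr_gpfns;                   /* number of list entries     */
    XEN_GUEST_HANDLE(xen_pfn_t) gpfn_list;  /* input: GFNs                */
    XEN_GUEST_HANDLE(xen_pfn_t) mfn_list;   /* output: corresponding MFNs */
};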

Jan

Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Stefano Stabellini 7 months, 2 weeks ago
On Fri, 2 May 2025, Jan Beulich wrote:
> On 01.05.2025 15:44, Jason Andryuk wrote:
> > On 2025-04-30 20:19, Stefano Stabellini wrote:
> >> On Wed, 30 Apr 2025, Roger Pau Monné wrote:
> >>> On Wed, Apr 30, 2025 at 08:27:55AM +0200, Jan Beulich wrote:
> >>>> On 29.04.2025 23:52, Stefano Stabellini wrote:
> >>>>> On Tue, 29 Apr 2025, Jan Beulich wrote:
> >>>>>> On 28.04.2025 22:00, Stefano Stabellini wrote:
> >>>>>>> On Mon, 28 Apr 2025, Jan Beulich wrote:
> >>>>>>>> On 25.04.2025 22:19, Stefano Stabellini wrote:
> > 
> >>>>>>>>> --- a/xen/common/memory.c
> >>>>>>>>> +++ b/xen/common/memory.c
> >>>>>>>>> @@ -794,7 +794,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
> >>>>>>>>>               rc = guest_physmap_add_page(d, _gfn(gpfn), mfn,
> >>>>>>>>>                                           exch.out.extent_order) ?: rc;
> >>>>>>>>>   
> >>>>>>>>> -            if ( !paging_mode_translate(d) &&
> >>>>>>>>> +            if ( (!paging_mode_translate(d) || is_hardware_domain(d)) &&
> >>>>>>>>>                    __copy_mfn_to_guest_offset(exch.out.extent_start,
> >>>>>>>>>                                               (i << out_chunk_order) + j,
> >>>>>>>>>                                               mfn) )
> >>>>>>>>
> >>>>>>>> Wait, no: A PVH domain (Dom0 or not) can't very well make use of MFNs, can
> >>>>>>>> it?
> >>>>>>>
> >>>>>>> One way or another Dom0 PVH needs to know the MFN to pass it to the
> >>>>>>> co-processor.
> >>>>>>
> >>>>>> I see. That's pretty odd, though. I'm then further concerned of the order of
> >>>>>> the chunks. At present we're rather lax, in permitting PVH and PV Dom0 the
> >>>>>> same upper bound. With both CPU and I/O side translation there is, in
> >>>>>> principle, no reason to permit any kind of contiguity. Of course there's a
> >>>>>> performance aspect, but that hardly matters in the specific case here. Yet at
> >>>>>> the same time, once we expose MFNs, contiguity will start mattering as soon
> >>>>>> as any piece of memory needs to be larger than PAGE_SIZE. Which means it will
> >>>>>> make tightening of the presently lax handling prone to regressions in this
> >>>>>> new behavior you're introducing. What chunk size does the PSP driver require?
> >>>>>
> >>>>> I don't know. The memory returned by XENMEM_exchange is contiguous,
> >>>>> right? Are you worried that Xen cannot allocate the requested amount of
> >>>>> memory contiguously?
> > 
> > In the case I looked at, it is 8 pages.  The driver defines a ring of 32 
> > * 1k entries.  I'm not sure if there are other paths or device versions 
> > where it might differ.
> 
> As per this ...
> 
> >>>> That would be Dom0's problem then. But really for a translated guest the
> >>>> exchanged chunks being contiguous shouldn't matter, correctness-wise. That is,
> >>>> within Xen, rather than failing a request, we could choose to retry using
> >>>> discontiguous chunks (contiguous only in GFN space). Such an (afaict)
> >>>> otherwise correct change would break your use case, as it would invalidate the
> >>>> MFN information passed back. (This fallback approach would similarly apply to
> >>>> other related mem-ops. It's just that during domain creation the tool stack
> >>>> has its own fallback, so it may not be of much use right now.)
> >>>
> >>> I think the description in the public header needs to be expanded to
> >>> specify what the XENMEM_exchange does for translated guests, and
> >>> clearly write down that the underlying MFNs for the exchanged region
> >>> will be contiguous.  Possibly a new XENMEMF_ flag needs to be added to
> >>> request contiguous physical memory for the new range.
> >>>
> >>> Sadly this also has the side effect of quite likely shattering
> >>> superpages for dom0 EPT/NPT, which will result in decreased dom0
> >>> performance.
> > 
> > Yes, this appears to happen as memory_exchange seems to always replace 
> > the pages.  I tested returning the existing MFNs if they are already 
> > contiguous since that was sufficient for this driver.  It worked, but it 
> > was messy.  A big loop to copy in the GFN array and check contiguity 
> > which duplicated much of the real loop.
> 
> ... there may not be a need for the output range to be contiguous? In which
> case - wouldn't a simple "give me the MFN for this GFN" hypercall do? I seem
> to vaguely recall that we even had one, long ago; it was purged because of
> it violating the "no MFNs exposed" principle (and because it not having had
> any use [anymore]). XENMEM_translate_gpfn_list looks like is what I mean;
> see commit 2d2f7977a052.

Unfortunately I don't think that would work because I am pretty sure
that the PSP needs a consecutive range. In other words, I think the
output needs to be contiguous.
Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Jan Beulich 7 months, 1 week ago
On 02.05.2025 18:49, Stefano Stabellini wrote:
> On Fri, 2 May 2025, Jan Beulich wrote:
>> On 01.05.2025 15:44, Jason Andryuk wrote:
>>> On 2025-04-30 20:19, Stefano Stabellini wrote:
>>>> On Wed, 30 Apr 2025, Roger Pau Monné wrote:
>>>>> On Wed, Apr 30, 2025 at 08:27:55AM +0200, Jan Beulich wrote:
>>>>>> On 29.04.2025 23:52, Stefano Stabellini wrote:
>>>>>>> On Tue, 29 Apr 2025, Jan Beulich wrote:
>>>>>>>> On 28.04.2025 22:00, Stefano Stabellini wrote:
>>>>>>>>> On Mon, 28 Apr 2025, Jan Beulich wrote:
>>>>>>>>>> On 25.04.2025 22:19, Stefano Stabellini wrote:
>>>
>>>>>>>>>>> --- a/xen/common/memory.c
>>>>>>>>>>> +++ b/xen/common/memory.c
>>>>>>>>>>> @@ -794,7 +794,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
>>>>>>>>>>>               rc = guest_physmap_add_page(d, _gfn(gpfn), mfn,
>>>>>>>>>>>                                           exch.out.extent_order) ?: rc;
>>>>>>>>>>>   
>>>>>>>>>>> -            if ( !paging_mode_translate(d) &&
>>>>>>>>>>> +            if ( (!paging_mode_translate(d) || is_hardware_domain(d)) &&
>>>>>>>>>>>                    __copy_mfn_to_guest_offset(exch.out.extent_start,
>>>>>>>>>>>                                               (i << out_chunk_order) + j,
>>>>>>>>>>>                                               mfn) )
>>>>>>>>>>
>>>>>>>>>> Wait, no: A PVH domain (Dom0 or not) can't very well make use of MFNs, can
>>>>>>>>>> it?
>>>>>>>>>
>>>>>>>>> One way or another Dom0 PVH needs to know the MFN to pass it to the
>>>>>>>>> co-processor.
>>>>>>>>
>>>>>>>> I see. That's pretty odd, though. I'm then further concerned of the order of
>>>>>>>> the chunks. At present we're rather lax, in permitting PVH and PV Dom0 the
>>>>>>>> same upper bound. With both CPU and I/O side translation there is, in
>>>>>>>> principle, no reason to permit any kind of contiguity. Of course there's a
>>>>>>>> performance aspect, but that hardly matters in the specific case here. Yet at
>>>>>>>> the same time, once we expose MFNs, contiguity will start mattering as soon
>>>>>>>> as any piece of memory needs to be larger than PAGE_SIZE. Which means it will
>>>>>>>> make tightening of the presently lax handling prone to regressions in this
>>>>>>>> new behavior you're introducing. What chunk size does the PSP driver require?
>>>>>>>
>>>>>>> I don't know. The memory returned by XENMEM_exchange is contiguous,
>>>>>>> right? Are you worried that Xen cannot allocate the requested amount of
>>>>>>> memory contiguously?
>>>
>>> In the case I looked at, it is 8 pages.  The driver defines a ring of 32 
>>> * 1k entries.  I'm not sure if there are other paths or device versions 
>>> where it might differ.
>>
>> As per this ...
>>
>>>>>> That would be Dom0's problem then. But really for a translated guest the
>>>>>> exchanged chunks being contiguous shouldn't matter, correctness-wise. That is,
>>>>>> within Xen, rather than failing a request, we could choose to retry using
>>>>>> discontiguous chunks (contiguous only in GFN space). Such an (afaict)
>>>>>> otherwise correct change would break your use case, as it would invalidate the
>>>>>> MFN information passed back. (This fallback approach would similarly apply to
>>>>>> other related mem-ops. It's just that during domain creation the tool stack
>>>>>> has its own fallback, so it may not be of much use right now.)
>>>>>
>>>>> I think the description in the public header needs to be expanded to
>>>>> specify what the XENMEM_exchange does for translated guests, and
>>>>> clearly write down that the underlying MFNs for the exchanged region
>>>>> will be contiguous.  Possibly a new XENMEMF_ flag needs to be added to
>>>>> request contiguous physical memory for the new range.
>>>>>
>>>>> Sadly this also has the side effect of quite likely shattering
>>>>> superpages for dom0 EPT/NPT, which will result in decreased dom0
>>>>> performance.
>>>
>>> Yes, this appears to happen as memory_exchange seems to always replace 
>>> the pages.  I tested returning the existing MFNs if they are already 
>>> contiguous since that was sufficient for this driver.  It worked, but it 
>>> was messy.  A big loop to copy in the GFN array and check contiguity 
>>> which duplicated much of the real loop.
>>
>> ... there may not be a need for the output range to be contiguous? In which
>> case - wouldn't a simple "give me the MFN for this GFN" hypercall do? I seem
>> to vaguely recall that we even had one, long ago; it was purged because of
>> it violating the "no MFNs exposed" principle (and because it not having had
>> any use [anymore]). XENMEM_translate_gpfn_list looks like is what I mean;
>> see commit 2d2f7977a052.
> 
> Unfortunately I don't think that would work because I am pretty sure
> that the PSP needs a consecutive range. In other words, I think the
> output needs to be contiguous.

Hmm, higher up (context) you said "ring of 32 * 1k entries". That reads like
independent 1k areas, and hence no need for contiguity.

Jan

Re: [PATCH] xen/x86: allow Dom0 PVH to call XENMEM_exchange
Posted by Roger Pau Monné 7 months, 2 weeks ago
On Fri, May 02, 2025 at 09:49:30AM -0700, Stefano Stabellini wrote:
> On Fri, 2 May 2025, Jan Beulich wrote:
> > On 01.05.2025 15:44, Jason Andryuk wrote:
> > > On 2025-04-30 20:19, Stefano Stabellini wrote:
> > >> On Wed, 30 Apr 2025, Roger Pau Monné wrote:
> > >>> On Wed, Apr 30, 2025 at 08:27:55AM +0200, Jan Beulich wrote:
> > >>>> On 29.04.2025 23:52, Stefano Stabellini wrote:
> > >>>>> On Tue, 29 Apr 2025, Jan Beulich wrote:
> > >>>>>> On 28.04.2025 22:00, Stefano Stabellini wrote:
> > >>>>>>> On Mon, 28 Apr 2025, Jan Beulich wrote:
> > >>>>>>>> On 25.04.2025 22:19, Stefano Stabellini wrote:
> > > 
> > >>>>>>>>> --- a/xen/common/memory.c
> > >>>>>>>>> +++ b/xen/common/memory.c
> > >>>>>>>>> @@ -794,7 +794,7 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
> > >>>>>>>>>               rc = guest_physmap_add_page(d, _gfn(gpfn), mfn,
> > >>>>>>>>>                                           exch.out.extent_order) ?: rc;
> > >>>>>>>>>   
> > >>>>>>>>> -            if ( !paging_mode_translate(d) &&
> > >>>>>>>>> +            if ( (!paging_mode_translate(d) || is_hardware_domain(d)) &&
> > >>>>>>>>>                    __copy_mfn_to_guest_offset(exch.out.extent_start,
> > >>>>>>>>>                                               (i << out_chunk_order) + j,
> > >>>>>>>>>                                               mfn) )
> > >>>>>>>>
> > >>>>>>>> Wait, no: A PVH domain (Dom0 or not) can't very well make use of MFNs, can
> > >>>>>>>> it?
> > >>>>>>>
> > >>>>>>> One way or another Dom0 PVH needs to know the MFN to pass it to the
> > >>>>>>> co-processor.
> > >>>>>>
> > >>>>>> I see. That's pretty odd, though. I'm then further concerned of the order of
> > >>>>>> the chunks. At present we're rather lax, in permitting PVH and PV Dom0 the
> > >>>>>> same upper bound. With both CPU and I/O side translation there is, in
> > >>>>>> principle, no reason to permit any kind of contiguity. Of course there's a
> > >>>>>> performance aspect, but that hardly matters in the specific case here. Yet at
> > >>>>>> the same time, once we expose MFNs, contiguity will start mattering as soon
> > >>>>>> as any piece of memory needs to be larger than PAGE_SIZE. Which means it will
> > >>>>>> make tightening of the presently lax handling prone to regressions in this
> > >>>>>> new behavior you're introducing. What chunk size does the PSP driver require?
> > >>>>>
> > >>>>> I don't know. The memory returned by XENMEM_exchange is contiguous,
> > >>>>> right? Are you worried that Xen cannot allocate the requested amount of
> > >>>>> memory contiguously?
> > > 
> > > In the case I looked at, it is 8 pages.  The driver defines a ring of 32 
> > > * 1k entries.  I'm not sure if there are other paths or device versions 
> > > where it might differ.
> > 
> > As per this ...
> > 
> > >>>> That would be Dom0's problem then. But really for a translated guest the
> > >>>> exchanged chunks being contiguous shouldn't matter, correctness-wise. That is,
> > >>>> within Xen, rather than failing a request, we could choose to retry using
> > >>>> discontiguous chunks (contiguous only in GFN space). Such an (afaict)
> > >>>> otherwise correct change would break your use case, as it would invalidate the
> > >>>> MFN information passed back. (This fallback approach would similarly apply to
> > >>>> other related mem-ops. It's just that during domain creation the tool stack
> > >>>> has its own fallback, so it may not be of much use right now.)
> > >>>
> > >>> I think the description in the public header needs to be expanded to
> > >>> specify what the XENMEM_exchange does for translated guests, and
> > >>> clearly write down that the underlying MFNs for the exchanged region
> > >>> will be contiguous.  Possibly a new XENMEMF_ flag needs to be added to
> > >>> request contiguous physical memory for the new range.
> > >>>
> > >>> Sadly this also has the side effect of quite likely shattering
> > >>> superpages for dom0 EPT/NPT, which will result in decreased dom0
> > >>> performance.
> > > 
> > > Yes, this appears to happen as memory_exchange seems to always replace 
> > > the pages.  I tested returning the existing MFNs if they are already 
> > > contiguous since that was sufficient for this driver.  It worked, but it 
> > > was messy.  A big loop to copy in the GFN array and check contiguity 
> > > which duplicated much of the real loop.
> > 
> > ... there may not be a need for the output range to be contiguous? In which
> > case - wouldn't a simple "give me the MFN for this GFN" hypercall do? I seem
> > to vaguely recall that we even had one, long ago; it was purged because of
> > it violating the "no MFNs exposed" principle (and because it not having had
> > any use [anymore]). XENMEM_translate_gpfn_list looks like is what I mean;
> > see commit 2d2f7977a052.
> 
> Unfortunately I don't think that would work because I am pretty sure
> that the PSP needs a consecutive range. In other words, I think the
> output needs to be contiguous.

The plan for AMD-SEV support was to place a PSP driver in Xen, and not
let dom0 interact with the PSP directly (or at least that was my
impression from the talks we had about implementing SEV support).

Is what you use from the PSP fully isolated from the functionality
needed by SEV, so that we could still expose the interface you require
to dom0 while letting Xen drive the SEV parts?

Otherwise I think we need an agreement about how to integrate your
usage of the PSP with the SEV work, as having both Xen and dom0
driving the PSP in parallel might not be a wise idea.

Thanks, Roger.