Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
Verify the validity of the level and ensure that the mapping range is fully
contained within the page folio.
As a conservative solution, perform CLFLUSH on all pages to be mapped into
the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
dirty cache lines do not write back later and clobber TD memory.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index f5e2a937c1e7..a66d501b5677 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
 		.rdx = tdx_tdr_pa(td),
 		.r8 = page_to_phys(page),
 	};
+	unsigned long nr_pages = 1 << (level * 9);
+	struct folio *folio = page_folio(page);
+	unsigned long idx = 0;
 	u64 ret;
 
-	tdx_clflush_page(page);
+	if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
+	    (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
+		return -EINVAL;
+
+	while (nr_pages--)
+		tdx_clflush_page(nth_page(page, idx++));
+
 	ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
 
 	*ext_err1 = args.rcx;
--
2.43.2
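For reference, a quick sketch (not part of the patch) of what the nr_pages
computation above works out to, assuming the usual TDX_PS_4K/TDX_PS_2M/TDX_PS_1G
values of 0/1/2; the helper name is hypothetical:

/*
 * Illustration only, not from the patch: "1 << (level * 9)" maps a TDX
 * mapping level to a count of 4KB pages, assuming TDX_PS_4K = 0,
 * TDX_PS_2M = 1 and TDX_PS_1G = 2.
 *
 *   TDX_PS_4K: 1 << 0  = 1 page       (4KB)
 *   TDX_PS_2M: 1 << 9  = 512 pages    (2MB)
 *   TDX_PS_1G: 1 << 18 = 262144 pages (1GB)
 */
static inline unsigned long tdx_level_to_nr_pages(int level)	/* hypothetical helper */
{
	return 1UL << (level * 9);
}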
On Thu, Apr 24, 2025 at 11:04:28AM +0800, Yan Zhao wrote:
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index f5e2a937c1e7..a66d501b5677 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
According to the discussion in DPAMT [*],
"hpa here points to a 2M region that pamt_pages covers. We don't have
struct page that represents it. Passing 4k struct page would be
misleading IMO."

Should we update tdh_mem_page_aug() accordingly to use hpa?
Or use struct folio instead?

[*] https://lore.kernel.org/all/3coaqkcfp7xtpvh2x4kph55qlopupknm7dmzqox6fakzaedhem@a2oysbvbshpm/

> .rdx = tdx_tdr_pa(td),
> .r8 = page_to_phys(page),
> };
> + unsigned long nr_pages = 1 << (level * 9);
> + struct folio *folio = page_folio(page);
> + unsigned long idx = 0;
> u64 ret;
>
> - tdx_clflush_page(page);
> + if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
> + (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
> + return -EINVAL;
> +
> + while (nr_pages--)
> + tdx_clflush_page(nth_page(page, idx++));
> +
> ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
>
> *ext_err1 = args.rcx;
> --
> 2.43.2
>
On Tue, 2025-07-08 at 16:48 +0800, Yan Zhao wrote:
> On Thu, Apr 24, 2025 at 11:04:28AM +0800, Yan Zhao wrote:
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index f5e2a937c1e7..a66d501b5677 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> According to the discussion in DPAMT [*],
> "hpa here points to a 2M region that pamt_pages covers. We don't have
> struct page that represents it. Passing 4k struct page would be
> misleading IMO."
>
> Should we update tdh_mem_page_aug() accordingly to use hpa?
> Or use struct folio instead?
>
> [*] https://lore.kernel.org/all/3coaqkcfp7xtpvh2x4kph55qlopupknm7dmzqox6fakzaedhem@a2oysbvbshpm/

The original seamcall wrapper patches used "u64 hpa", etc everywhere. The
feedback was that it was too error prone to not have types. We looked at using
kvm types (hpa_t, etc), but the type checking was still just surface level [0].

So the goal is to reduce errors and improve code readability. We can consider
breaking symmetry if it is better that way. In this case though, why not use
struct folio?

[0] https://lore.kernel.org/kvm/30d0cef5-82d5-4325-b149-0e99833b8785@intel.com/
On Tue, Jul 08, 2025 at 09:55:39PM +0800, Edgecombe, Rick P wrote:
> On Tue, 2025-07-08 at 16:48 +0800, Yan Zhao wrote:
> > On Thu, Apr 24, 2025 at 11:04:28AM +0800, Yan Zhao wrote:
> > > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > > index f5e2a937c1e7..a66d501b5677 100644
> > > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > > @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> > According to the discussion in DPAMT [*],
> > "hpa here points to a 2M region that pamt_pages covers. We don't have
> > struct page that represents it. Passing 4k struct page would be
> > misleading IMO."
> >
> > Should we update tdh_mem_page_aug() accordingly to use hpa?
> > Or use struct folio instead?
> >
> > [*] https://lore.kernel.org/all/3coaqkcfp7xtpvh2x4kph55qlopupknm7dmzqox6fakzaedhem@a2oysbvbshpm/
>
> The original seamcall wrapper patches used "u64 hpa", etc everywhere. The
> feedback was that it was too error prone to not have types. We looked at using
> kvm types (hpa_t, etc), but the type checking was still just surface level [0].
>
> So the goal is to reduce errors and improve code readability. We can consider
> breaking symmetry if it is better that way. In this case though, why not use
> struct folio?
I'm Ok with using struct folio.
My previous ask was based on 2 considerations:
1. hpa is simpler and I didn't find Dave's NAK to Kirill's patch (v1 or v2).
2. using struct folio, I need to introduce "start_idx" as well (as below),
because it's likely that guest_memfd provides a huge folio while KVM wants to
map it at 4KB.
u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct folio *folio,
		     unsigned long start_idx, u64 *ext_err1, u64 *ext_err2)
{
	struct page *start = folio_page(folio, start_idx);
	unsigned long npages = 1 << (level * PTE_SHIFT);
	struct tdx_module_args args = {
		.rcx = gpa | level,
		.rdx = tdx_tdr_pa(td),
		.r8 = page_to_phys(start),
	};
	u64 ret;

	if (start_idx + npages > folio_nr_pages(folio))
		return TDX_SW_ERROR;

	for (int i = 0; i < npages; i++)
		tdx_clflush_page(nth_page(start, i));

	ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);

	*ext_err1 = args.rcx;
	*ext_err2 = args.rdx;

	return ret;
}
> [0] https://lore.kernel.org/kvm/30d0cef5-82d5-4325-b149-0e99833b8785@intel.com/
On Wed, 2025-07-09 at 10:23 +0800, Yan Zhao wrote:
> 2. using struct folio, I need to introduce "start_idx" as well (as below),
> because it's likely that guest_memfd provides a huge folio while KVM wants to
> map it at 4KB.
Seems ok to me.
>
> u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct folio *folio,
> unsigned long start_idx, u64 *ext_err1, u64 *ext_err2)
> {
> struct page *start = folio_page(folio, start_idx);
> unsigned long npages = 1 << (level * PTE_SHIFT);
> struct tdx_module_args args = {
> .rcx = gpa | level,
> .rdx = tdx_tdr_pa(td),
> .r8 = page_to_phys(start),
> };
> u64 ret;
>
> if (start_idx + npages > folio_nr_pages(folio))
> return TDX_SW_ERROR;
>
> for (int i = 0; i < npages; i++)
> tdx_clflush_page(nth_page(start, i));
>
> ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
>
> *ext_err1 = args.rcx;
> *ext_err2 = args.rdx;
>
> return ret;
> }
On Tue, Jul 8, 2025 at 6:56 AM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote:
>
> On Tue, 2025-07-08 at 16:48 +0800, Yan Zhao wrote:
> > On Thu, Apr 24, 2025 at 11:04:28AM +0800, Yan Zhao wrote:
> > > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > > index f5e2a937c1e7..a66d501b5677 100644
> > > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > > @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> > According to the discussion in DPAMT [*],
> > "hpa here points to a 2M region that pamt_pages covers. We don't have
> > struct page that represents it. Passing 4k struct page would be
> > misleading IMO."
> >
> > Should we update tdh_mem_page_aug() accordingly to use hpa?
> > Or use struct folio instead?
> >
> > [*] https://lore.kernel.org/all/3coaqkcfp7xtpvh2x4kph55qlopupknm7dmzqox6fakzaedhem@a2oysbvbshpm/
>
> The original seamcall wrapper patches used "u64 hpa", etc everywhere. The
> feedback was that it was too error prone to not have types. We looked at using
> kvm types (hpa_t, etc), but the type checking was still just surface level [0].
>
> So the goal is to reduce errors and improve code readability. We can consider
> breaking symmetry if it is better that way. In this case though, why not use
> struct folio?

My vote would be to prefer using "hpa" and not rely on folio/page
structs for guest_memfd allocated memory wherever possible.

> [0] https://lore.kernel.org/kvm/30d0cef5-82d5-4325-b149-0e99833b8785@intel.com/
On Tue, 2025-07-08 at 08:29 -0700, Vishal Annapurve wrote:
> > The original seamcall wrapper patches used "u64 hpa", etc everywhere. The
> > feedback was that it was too error prone to not have types. We looked at
> > using
> > kvm types (hpa_t, etc), but the type checking was still just surface level
> > [0].
> >
> > So the goal is to reduce errors and improve code readability. We can
> > consider
> > breaking symmetry if it is better that way. In this case though, why not use
> > struct folio?
>
> My vote would be to prefer using "hpa" and not rely on folio/page
> structs for guest_memfd allocated memory wherever possible.

Is this because you want to enable struct page-less gmemfd in the future? Or
other reason?
On Tue, Jul 8, 2025 at 8:32 AM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote:
>
> On Tue, 2025-07-08 at 08:29 -0700, Vishal Annapurve wrote:
> > > The original seamcall wrapper patches used "u64 hpa", etc everywhere. The
> > > feedback was that it was too error prone to not have types. We looked at
> > > using
> > > kvm types (hpa_t, etc), but the type checking was still just surface level
> > > [0].
> > >
> > > So the goal is to reduce errors and improve code readability. We can
> > > consider
> > > breaking symmetry if it is better that way. In this case though, why not use
> > > struct folio?
> >
> > My vote would be to prefer using "hpa" and not rely on folio/page
> > structs for guest_memfd allocated memory wherever possible.
>
> Is this because you want to enable struct page-less gmemfd in the future?

Yes. That's the only reason.
On Tue, 2025-07-08 at 15:06 -0700, Vishal Annapurve wrote:
> > > My vote would be to prefer using "hpa" and not rely on folio/page
> > > structs for guest_memfd allocated memory wherever possible.
> >
> > Is this because you want to enable struct page-less gmemfd in the future?
>
> Yes. That's the only reason.

I don't think we should change just this field of this seamcall wrapper from the
current pattern for that reason. When this stuff comes along it will be just
about as easy to change it with the rest. Then in the meantime it doesn't look
asymmetric.

In general, I (again) think that we should not focus on accommodating future
stuff unless there is an ABI touch point. This is to ultimately speed enabling
of the entire stack.

It is definitely not to make it harder to implement TDX support for pfn based
gmem in the future. Rather to make it possible. As in, if nothing is upstream
because we are endlessly debating how it all fits together at once, then it
won't be possible to enhance it further.
On Tue, Jul 8, 2025 at 4:16 PM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote:
>
> On Tue, 2025-07-08 at 15:06 -0700, Vishal Annapurve wrote:
> > > > My vote would be to prefer using "hpa" and not rely on folio/page
> > > > structs for guest_memfd allocated memory wherever possible.
> > >
> > > Is this because you want to enable struct page-less gmemfd in the future?
> >
> > Yes. That's the only reason.
>
> I don't think we should change just this field of this seamcall wrapper from the
> current pattern for that reason. When this stuff comes along it will be just
> about as easy to change it with the rest. Then in the meantime it doesn't look
> asymmetric.
>
> In general, I (again) think that we should not focus on accommodating future
> stuff unless there is an ABI touch point. This is to ultimately speed enabling
> of the entire stack.
>
> It is definitely not to make it harder to implement TDX support for pfn based
> gmem in the future. Rather to make it possible. As in, if nothing is upstream
> because we are endlessly debating how it all fits together at once, then it
> won't be possible to enhance it further.

I agree and if we can't do without page struct for now that's fine. My
response was just to favor pfn/hpa over "page struct" if possible, given
that we have a choice here. Feel free to ignore if symmetry seems more
important.
On Thu, Apr 24, 2025 at 11:04:28AM +0800, Yan Zhao wrote:
>Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
>
>Verify the validity of the level and ensure that the mapping range is fully
>contained within the page folio.
>
>As a conservative solution, perform CLFLUSH on all pages to be mapped into
>the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
>dirty cache lines do not write back later and clobber TD memory.
>
>Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
>---
> arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
>diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
>index f5e2a937c1e7..a66d501b5677 100644
>--- a/arch/x86/virt/vmx/tdx/tdx.c
>+++ b/arch/x86/virt/vmx/tdx/tdx.c
>@@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> .rdx = tdx_tdr_pa(td),
> .r8 = page_to_phys(page),
> };
>+ unsigned long nr_pages = 1 << (level * 9);
>+ struct folio *folio = page_folio(page);
>+ unsigned long idx = 0;
> u64 ret;
>
>- tdx_clflush_page(page);
>+ if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
>+ (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
>+ return -EINVAL;

Returning -EINVAL looks incorrect as the return type is u64.

>+
>+ while (nr_pages--)
>+ tdx_clflush_page(nth_page(page, idx++));
>+
> ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
>
> *ext_err1 = args.rcx;
>--
>2.43.2
>
On Thu, May 15, 2025 at 10:16:58AM +0800, Chao Gao wrote:
> On Thu, Apr 24, 2025 at 11:04:28AM +0800, Yan Zhao wrote:
> >Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
> >
> >Verify the validity of the level and ensure that the mapping range is fully
> >contained within the page folio.
> >
> >As a conservative solution, perform CLFLUSH on all pages to be mapped into
> >the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
> >dirty cache lines do not write back later and clobber TD memory.
> >
> >Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> >Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> >Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> >---
> > arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> >diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> >index f5e2a937c1e7..a66d501b5677 100644
> >--- a/arch/x86/virt/vmx/tdx/tdx.c
> >+++ b/arch/x86/virt/vmx/tdx/tdx.c
> >@@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> > .rdx = tdx_tdr_pa(td),
> > .r8 = page_to_phys(page),
> > };
> >+ unsigned long nr_pages = 1 << (level * 9);
> >+ struct folio *folio = page_folio(page);
> >+ unsigned long idx = 0;
> > u64 ret;
> >
> >- tdx_clflush_page(page);
> >+ if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
> >+ (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
> >+ return -EINVAL;
>
> Returning -EINVAL looks incorrect as the return type is u64.

Good catch. Thanks!
I'll think about how to handle it. Looks it could be dropped if we trust KVM.

> >+ while (nr_pages--)
> >+ tdx_clflush_page(nth_page(page, idx++));
> >+
> > ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
> >
> > *ext_err1 = args.rcx;
> >--
> >2.43.2
> >
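For illustration, a minimal sketch of how the sanity check could fail with a
u64 TDX status instead of a negative errno; using TDX_SW_ERROR here is an
assumption, borrowed from the variant posted later in this thread:

	/* Sketch only: keep the wrapper's u64 return type consistent. */
	if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
	    (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
		return TDX_SW_ERROR;	/* assumed software-defined error, not -EINVAL */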
On Thu, 2025-04-24 at 11:04 +0800, Yan Zhao wrote:
> Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
>
> Verify the validity of the level and ensure that the mapping range is fully
> contained within the page folio.
>
> As a conservative solution, perform CLFLUSH on all pages to be mapped into
> the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
> dirty cache lines do not write back later and clobber TD memory.
This should have a brief background on why it doesn't use the arg - what is
deficient today. Also, an explanation of how it will be used (i.e. what types of
pages will be passed)
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index f5e2a937c1e7..a66d501b5677 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> .rdx = tdx_tdr_pa(td),
> .r8 = page_to_phys(page),
> };
> + unsigned long nr_pages = 1 << (level * 9);
> + struct folio *folio = page_folio(page);
> + unsigned long idx = 0;
> u64 ret;
>
> - tdx_clflush_page(page);
> + if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
> + (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
> + return -EINVAL;
Shouldn't KVM not try to map a huge page in this situation? Doesn't seem like a
job for the SEAMCALL wrapper.
> +
> + while (nr_pages--)
> + tdx_clflush_page(nth_page(page, idx++));
clflush_cache_range() is:
static void tdx_clflush_page(struct page *page)
{
clflush_cache_range(page_to_virt(page), PAGE_SIZE);
}
So we have loops within loops... Better to add an arg to tdx_clflush_page() or
add a variant that takes one.
> +
> ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
>
> *ext_err1 = args.rcx;
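A minimal sketch (not code from the thread) of the kind of variant being
suggested here; the name tdx_clflush_pages() is hypothetical, and it still
flushes one 4KB page at a time since page virtual addresses need not be
contiguous under SPARSEMEM without SPARSEMEM_VMEMMAP, as noted later in the
thread:

/* Hypothetical variant of tdx_clflush_page() that takes a page count. */
static void tdx_clflush_pages(struct page *page, unsigned long nr_pages)
{
	unsigned long i;

	/* Flush each 4KB page; virtual addresses may not be contiguous. */
	for (i = 0; i < nr_pages; i++)
		clflush_cache_range(page_to_virt(nth_page(page, i)), PAGE_SIZE);
}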
On Wed, May 14, 2025 at 02:52:49AM +0800, Edgecombe, Rick P wrote:
> On Thu, 2025-04-24 at 11:04 +0800, Yan Zhao wrote:
> > Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
> >
> > Verify the validity of the level and ensure that the mapping range is fully
> > contained within the page folio.
> >
> > As a conservative solution, perform CLFLUSH on all pages to be mapped into
> > the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
> > dirty cache lines do not write back later and clobber TD memory.
>
> This should have a brief background on why it doesn't use the arg - what is
> deficient today. Also, an explanation of how it will be used (i.e. what types of
> pages will be passed)
Will do.
> >
> > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> > arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index f5e2a937c1e7..a66d501b5677 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> > .rdx = tdx_tdr_pa(td),
> > .r8 = page_to_phys(page),
> > };
> > + unsigned long nr_pages = 1 << (level * 9);
> > + struct folio *folio = page_folio(page);
> > + unsigned long idx = 0;
> > u64 ret;
> >
> > - tdx_clflush_page(page);
> > + if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
> > + (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
> > + return -EINVAL;
>
> Shouldn't KVM not try to map a huge page in this situation? Doesn't seem like a
> job for the SEAMCALL wrapper.
Ok. If the decision is to trust KVM and all potential callers, it's reasonable
to drop those checks.
> > +
> > + while (nr_pages--)
> > + tdx_clflush_page(nth_page(page, idx++));
>
> clflush_cache_range() is:
> static void tdx_clflush_page(struct page *page)
> {
> clflush_cache_range(page_to_virt(page), PAGE_SIZE);
> }
>
> So we have loops within loops... Better to add an arg to tdx_clflush_page() or
> add a variant that takes one.
Ok.
One thing to note is that even with an extra arg, tdx_clflush_page() has to call
clflush_cache_range() page by page because with
"#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)",
page virtual addresses are not necessarily contiguous.
What about Binbin's proposal [1]? i.e.,
while (nr_pages)
tdx_clflush_page(nth_page(page, --nr_pages));
[1] https://lore.kernel.org/all/a7d0988d-037c-454f-bc6b-57e71b357488@linux.intel.com/
> > +
> > ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
> >
> > *ext_err1 = args.rcx;
>
On 5/16/25 12:05, Yan Zhao wrote:
> On Wed, May 14, 2025 at 02:52:49AM +0800, Edgecombe, Rick P wrote:
>> On Thu, 2025-04-24 at 11:04 +0800, Yan Zhao wrote:
>>> Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
>>>
>>> Verify the validity of the level and ensure that the mapping range is fully
>>> contained within the page folio.
>>>
>>> As a conservative solution, perform CLFLUSH on all pages to be mapped into
>>> the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
>>> dirty cache lines do not write back later and clobber TD memory.
>>
>> This should have a brief background on why it doesn't use the arg - what is
>> deficient today. Also, an explanation of how it will be used (i.e. what types of
>> pages will be passed)
> Will do.
>
>>>
>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>>> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
>>> ---
>>> arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
>>> index f5e2a937c1e7..a66d501b5677 100644
>>> --- a/arch/x86/virt/vmx/tdx/tdx.c
>>> +++ b/arch/x86/virt/vmx/tdx/tdx.c
>>> @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
>>> .rdx = tdx_tdr_pa(td),
>>> .r8 = page_to_phys(page),
>>> };
>>> + unsigned long nr_pages = 1 << (level * 9);
>>> + struct folio *folio = page_folio(page);
>>> + unsigned long idx = 0;
>>> u64 ret;
>>>
>>> - tdx_clflush_page(page);
>>> + if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
>>> + (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
>>> + return -EINVAL;
>>
>> Shouldn't KVM not try to map a huge page in this situation? Doesn't seem like a
>> job for the SEAMCALL wrapper.
> Ok. If the decision is to trust KVM and all potential callers, it's reasonable
> to drop those checks.
>
>>> +
>>> + while (nr_pages--)
>>> + tdx_clflush_page(nth_page(page, idx++));
>>
>> clflush_cache_range() is:
>> static void tdx_clflush_page(struct page *page)
>> {
>> clflush_cache_range(page_to_virt(page), PAGE_SIZE);
>> }
>>
>> So we have loops within loops... Better to add an arg to tdx_clflush_page() or
>> add a variant that takes one.
> Ok.
>
> One thing to note is that even with an extra arg, tdx_clflush_page() has to call
> clflush_cache_range() page by page because with
> "#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)",
> page virtual addresses are not necessarily contiguous.
>
> What about Binbin's proposal [1]? i.e.,
>
> while (nr_pages)
> tdx_clflush_page(nth_page(page, --nr_pages));
What's the problem with using:
+ for (int i = 0; nr_pages; nr_pages--)
+ tdx_clflush_page(nth_page(page, i++))
The kernel now allows C99-style definition of variables inside a loop +
it's clear how many times the loop has to be executed.
>
> [1] https://lore.kernel.org/all/a7d0988d-037c-454f-bc6b-57e71b357488@linux.intel.com/
>
>>> +
>>> ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
>>>
>>> *ext_err1 = args.rcx;
>>
>
On Thu, Jun 19, 2025 at 12:26:07PM +0300, Nikolay Borisov wrote:
> > What about Binbin's proposal [1]? i.e.,
> >
> > while (nr_pages)
> > 	tdx_clflush_page(nth_page(page, --nr_pages));
>
> What's the problem with using:
>
> + for (int i = 0; nr_pages; nr_pages--)
> +	tdx_clflush_page(nth_page(page, i++))

Thanks! It looks good to me.

> The kernel now allows C99-style definition of variables inside a loop + it's
> clear how many times the loop has to be executed.
On Fri, 2025-05-16 at 17:05 +0800, Yan Zhao wrote:
> > So we have loops within loops... Better to add an arg to tdx_clflush_page()
> > or
> > add a variant that takes one.
> Ok.
>
> One thing to note is that even with an extra arg, tdx_clflush_page() has to
> call
> clflush_cache_range() page by page because with
> "#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)",
> page virtual addresses are not necessarily contiguous.
>
> What about Binbin's proposal [1]? i.e.,
>
> while (nr_pages)
> 	tdx_clflush_page(nth_page(page, --nr_pages));
>
> [1]
> https://lore.kernel.org/all/a7d0988d-037c-454f-bc6b-57e71b357488@linux.intel.com/

These SEAMCALLs are handling physically contiguous pages so I don't think we
need to worry about that. But Binbin's suggestion seems fine too.
On 4/24/2025 11:04 AM, Yan Zhao wrote:
> Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
>
> Verify the validity of the level and ensure that the mapping range is fully
> contained within the page folio.
>
> As a conservative solution, perform CLFLUSH on all pages to be mapped into
> the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
> dirty cache lines do not write back later and clobber TD memory.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index f5e2a937c1e7..a66d501b5677 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> .rdx = tdx_tdr_pa(td),
> .r8 = page_to_phys(page),
> };
> + unsigned long nr_pages = 1 << (level * 9);
> + struct folio *folio = page_folio(page);
> + unsigned long idx = 0;
> u64 ret;
>
> - tdx_clflush_page(page);
> + if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
> + (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
> + return -EINVAL;
> +
> + while (nr_pages--)
> + tdx_clflush_page(nth_page(page, idx++));

Is the following better to save a variable?

while (nr_pages)
	tdx_clflush_page(nth_page(page, --nr_pages));

> +
> ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
>
> *ext_err1 = args.rcx;
On Fri, Apr 25, 2025 at 02:51:18PM +0800, Binbin Wu wrote:
>
>
> On 4/24/2025 11:04 AM, Yan Zhao wrote:
> > Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
> >
> > Verify the validity of the level and ensure that the mapping range is fully
> > contained within the page folio.
> >
> > As a conservative solution, perform CLFLUSH on all pages to be mapped into
> > the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
> > dirty cache lines do not write back later and clobber TD memory.
> >
> > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> > arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index f5e2a937c1e7..a66d501b5677 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> > .rdx = tdx_tdr_pa(td),
> > .r8 = page_to_phys(page),
> > };
> > + unsigned long nr_pages = 1 << (level * 9);
> > + struct folio *folio = page_folio(page);
> > + unsigned long idx = 0;
> > u64 ret;
> >
> > - tdx_clflush_page(page);
> > + if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
> > + (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
> > + return -EINVAL;
> > +
> > + while (nr_pages--)
> > + tdx_clflush_page(nth_page(page, idx++));
> Is the following better to save a variable?
>
> while (nr_pages)
> 	tdx_clflush_page(nth_page(page, --nr_pages));

Looks better except performing the clflush in reverse order :)

>
> > +
> > ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
> > *ext_err1 = args.rcx;
>
On Thu, Apr 24, 2025 at 11:04:28AM +0800, Yan Zhao wrote:
> Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
>
> Verify the validity of the level and ensure that the mapping range is fully
> contained within the page folio.
>
> As a conservative solution, perform CLFLUSH on all pages to be mapped into
> the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
> dirty cache lines do not write back later and clobber TD memory.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index f5e2a937c1e7..a66d501b5677 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> .rdx = tdx_tdr_pa(td),
> .r8 = page_to_phys(page),
> };
> + unsigned long nr_pages = 1 << (level * 9);

PTE_SHIFT.

> + struct folio *folio = page_folio(page);
> + unsigned long idx = 0;
> u64 ret;
>
> - tdx_clflush_page(page);
> + if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||

Do we even need this check?

> + (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
> + return -EINVAL;
> +
> + while (nr_pages--)
> + tdx_clflush_page(nth_page(page, idx++));
> +
> ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
>
> *ext_err1 = args.rcx;
> --
> 2.43.2
>

--
Kiryl Shutsemau / Kirill A. Shutemov
On Thu, Apr 24, 2025 at 10:48:53AM +0300, Kirill A. Shutemov wrote:
> On Thu, Apr 24, 2025 at 11:04:28AM +0800, Yan Zhao wrote:
> > Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
> >
> > Verify the validity of the level and ensure that the mapping range is fully
> > contained within the page folio.
> >
> > As a conservative solution, perform CLFLUSH on all pages to be mapped into
> > the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
> > dirty cache lines do not write back later and clobber TD memory.
> >
> > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> > arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index f5e2a937c1e7..a66d501b5677 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> > .rdx = tdx_tdr_pa(td),
> > .r8 = page_to_phys(page),
> > };
> > + unsigned long nr_pages = 1 << (level * 9);
>
> PTE_SHIFT.

Yes. Thanks.

> > + struct folio *folio = page_folio(page);
> > + unsigned long idx = 0;
> > u64 ret;
> >
> > - tdx_clflush_page(page);
> > + if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
>
> Do we even need this check?

Maybe not if tdh_mem_page_aug() trusts KVM :)
The consideration is to avoid nr_pages being too huge to cause too many
tdx_clflush_page()s on any reckless error.

> > + (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
> > + return -EINVAL;
> > +
> > + while (nr_pages--)
> > + tdx_clflush_page(nth_page(page, idx++));
> > +
> > ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
> >
> > *ext_err1 = args.rcx;
> > --
> > 2.43.2
> >
>
> --
> Kiryl Shutsemau / Kirill A. Shutemov