Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
Verify the validity of the level and ensure that the mapping range is fully
contained within the page folio.
As a conservative solution, perform CLFLUSH on all pages to be mapped into
the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
dirty cache lines do not write back later and clobber TD memory.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index f5e2a937c1e7..a66d501b5677 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
 		.rdx = tdx_tdr_pa(td),
 		.r8 = page_to_phys(page),
 	};
+	unsigned long nr_pages = 1 << (level * 9);
+	struct folio *folio = page_folio(page);
+	unsigned long idx = 0;
 	u64 ret;
 
-	tdx_clflush_page(page);
+	if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
+	    (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
+		return -EINVAL;
+
+	while (nr_pages--)
+		tdx_clflush_page(nth_page(page, idx++));
+
 	ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
 
 	*ext_err1 = args.rcx;
--
2.43.2
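For reference, a quick sketch (not part of the patch) of what the nr_pages
computation above works out to, assuming the usual TDX_PS_4K/TDX_PS_2M/TDX_PS_1G
values of 0/1/2; the helper name is hypothetical:

/*
 * Illustration only, not from the patch: "1 << (level * 9)" maps a TDX
 * mapping level to a count of 4KB pages, assuming TDX_PS_4K = 0,
 * TDX_PS_2M = 1 and TDX_PS_1G = 2.
 *
 *   TDX_PS_4K: 1 << 0  = 1 page       (4KB)
 *   TDX_PS_2M: 1 << 9  = 512 pages    (2MB)
 *   TDX_PS_1G: 1 << 18 = 262144 pages (1GB)
 */
static inline unsigned long tdx_level_to_nr_pages(int level)	/* hypothetical helper */
{
	return 1UL << (level * 9);
}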
On Thu, Apr 24, 2025 at 11:04:28AM +0800, Yan Zhao wrote:
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index f5e2a937c1e7..a66d501b5677 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
According to the discussion in DPAMT [*],
"hpa here points to a 2M region that pamt_pages covers. We don't have
struct page that represents it. Passing 4k struct page would be
misleading IMO."

Should we update tdh_mem_page_aug() accordingly to use hpa?
Or use struct folio instead?

[*] https://lore.kernel.org/all/3coaqkcfp7xtpvh2x4kph55qlopupknm7dmzqox6fakzaedhem@a2oysbvbshpm/

> .rdx = tdx_tdr_pa(td),
> .r8 = page_to_phys(page),
> };
> + unsigned long nr_pages = 1 << (level * 9);
> + struct folio *folio = page_folio(page);
> + unsigned long idx = 0;
> u64 ret;
>
> - tdx_clflush_page(page);
> + if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
> + (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
> + return -EINVAL;
> +
> + while (nr_pages--)
> + tdx_clflush_page(nth_page(page, idx++));
> +
> ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
>
> *ext_err1 = args.rcx;
> --
> 2.43.2
>
On Tue, 2025-07-08 at 16:48 +0800, Yan Zhao wrote:
> On Thu, Apr 24, 2025 at 11:04:28AM +0800, Yan Zhao wrote:
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index f5e2a937c1e7..a66d501b5677 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> According to the discussion in DPAMT [*],
> "hpa here points to a 2M region that pamt_pages covers. We don't have
> struct page that represents it. Passing 4k struct page would be
> misleading IMO."
>
> Should we update tdh_mem_page_aug() accordingly to use hpa?
> Or use struct folio instead?
>
> [*] https://lore.kernel.org/all/3coaqkcfp7xtpvh2x4kph55qlopupknm7dmzqox6fakzaedhem@a2oysbvbshpm/

The original seamcall wrapper patches used "u64 hpa", etc everywhere. The
feedback was that it was too error prone to not have types. We looked at using
kvm types (hpa_t, etc), but the type checking was still just surface level [0].

So the goal is to reduce errors and improve code readability. We can consider
breaking symmetry if it is better that way. In this case though, why not use
struct folio?

[0] https://lore.kernel.org/kvm/30d0cef5-82d5-4325-b149-0e99833b8785@intel.com/
On Tue, Jul 08, 2025 at 09:55:39PM +0800, Edgecombe, Rick P wrote:
> On Tue, 2025-07-08 at 16:48 +0800, Yan Zhao wrote:
> > On Thu, Apr 24, 2025 at 11:04:28AM +0800, Yan Zhao wrote:
> > > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > > index f5e2a937c1e7..a66d501b5677 100644
> > > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > > @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> > According to the discussion in DPAMT [*],
> > "hpa here points to a 2M region that pamt_pages covers. We don't have
> > struct page that represents it. Passing 4k struct page would be
> > misleading IMO."
> >
> > Should we update tdh_mem_page_aug() accordingly to use hpa?
> > Or use struct folio instead?
> >
> > [*] https://lore.kernel.org/all/3coaqkcfp7xtpvh2x4kph55qlopupknm7dmzqox6fakzaedhem@a2oysbvbshpm/
>
> The original seamcall wrapper patches used "u64 hpa", etc everywhere. The
> feedback was that it was too error prone to not have types. We looked at using
> kvm types (hpa_t, etc), but the type checking was still just surface level [0].
>
> So the goal is to reduce errors and improve code readability. We can consider
> breaking symmetry if it is better that way. In this case though, why not use
> struct folio?
I'm Ok with using struct folio.
My previous ask was based on 2 considerations:
1. hpa is simpler and I didn't find Dave's NAK to Kirill's patch (v1 or v2).
2. using struct folio, I need to introduce "start_idx" as well (as below),
because it's likely that guest_memfd provides a huge folio while KVM wants to
map it at 4KB.
u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct folio *folio,
		     unsigned long start_idx, u64 *ext_err1, u64 *ext_err2)
{
	struct page *start = folio_page(folio, start_idx);
	unsigned long npages = 1 << (level * PTE_SHIFT);
	struct tdx_module_args args = {
		.rcx = gpa | level,
		.rdx = tdx_tdr_pa(td),
		.r8 = page_to_phys(start),
	};
	u64 ret;

	if (start_idx + npages > folio_nr_pages(folio))
		return TDX_SW_ERROR;

	for (int i = 0; i < npages; i++)
		tdx_clflush_page(nth_page(start, i));

	ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);

	*ext_err1 = args.rcx;
	*ext_err2 = args.rdx;

	return ret;
}
> [0] https://lore.kernel.org/kvm/30d0cef5-82d5-4325-b149-0e99833b8785@intel.com/
On Wed, 2025-07-09 at 10:23 +0800, Yan Zhao wrote:
> 2. using struct folio, I need to introduce "start_idx" as well (as below),
> because it's likely that guest_memfd provides a huge folio while KVM wants to
> map it at 4KB.
Seems ok to me.
>
> u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct folio *folio,
> unsigned long start_idx, u64 *ext_err1, u64 *ext_err2)
> {
> struct page *start = folio_page(folio, start_idx);
> unsigned long npages = 1 << (level * PTE_SHIFT);
> struct tdx_module_args args = {
> .rcx = gpa | level,
> .rdx = tdx_tdr_pa(td),
> .r8 = page_to_phys(start),
> };
> u64 ret;
>
> if (start_idx + npages > folio_nr_pages(folio))
> return TDX_SW_ERROR;
>
> for (int i = 0; i < npages; i++)
> tdx_clflush_page(nth_page(start, i));
>
> ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
>
> *ext_err1 = args.rcx;
> *ext_err2 = args.rdx;
>
> return ret;
> }
On Tue, Jul 8, 2025 at 6:56 AM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote:
>
> On Tue, 2025-07-08 at 16:48 +0800, Yan Zhao wrote:
> > On Thu, Apr 24, 2025 at 11:04:28AM +0800, Yan Zhao wrote:
> > > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > > index f5e2a937c1e7..a66d501b5677 100644
> > > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > > @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> > According to the discussion in DPAMT [*],
> > "hpa here points to a 2M region that pamt_pages covers. We don't have
> > struct page that represents it. Passing 4k struct page would be
> > misleading IMO."
> >
> > Should we update tdh_mem_page_aug() accordingly to use hpa?
> > Or use struct folio instead?
> >
> > [*] https://lore.kernel.org/all/3coaqkcfp7xtpvh2x4kph55qlopupknm7dmzqox6fakzaedhem@a2oysbvbshpm/
>
> The original seamcall wrapper patches used "u64 hpa", etc everywhere. The
> feedback was that it was too error prone to not have types. We looked at using
> kvm types (hpa_t, etc), but the type checking was still just surface level [0].
>
> So the goal is to reduce errors and improve code readability. We can consider
> breaking symmetry if it is better that way. In this case though, why not use
> struct folio?

My vote would be to prefer using "hpa" and not rely on folio/page
structs for guest_memfd allocated memory wherever possible.

> [0] https://lore.kernel.org/kvm/30d0cef5-82d5-4325-b149-0e99833b8785@intel.com/
On Tue, 2025-07-08 at 08:29 -0700, Vishal Annapurve wrote:
> > The original seamcall wrapper patches used "u64 hpa", etc everywhere. The
> > feedback was that it was too error prone to not have types. We looked at
> > using
> > kvm types (hpa_t, etc), but the type checking was still just surface level
> > [0].
> >
> > So the goal is to reduce errors and improve code readability. We can
> > consider
> > breaking symmetry if it is better that way. In this case though, why not use
> > struct folio?
>
> My vote would be to prefer using "hpa" and not rely on folio/page
> structs for guest_memfd allocated memory wherever possible.

Is this because you want to enable struct page-less gmemfd in the future? Or
other reason?
On Tue, Jul 8, 2025 at 8:32 AM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote:
>
> On Tue, 2025-07-08 at 08:29 -0700, Vishal Annapurve wrote:
> > > The original seamcall wrapper patches used "u64 hpa", etc everywhere. The
> > > feedback was that it was too error prone to not have types. We looked at
> > > using
> > > kvm types (hpa_t, etc), but the type checking was still just surface level
> > > [0].
> > >
> > > So the goal is to reduce errors and improve code readability. We can
> > > consider
> > > breaking symmetry if it is better that way. In this case though, why not use
> > > struct folio?
> >
> > My vote would be to prefer using "hpa" and not rely on folio/page
> > structs for guest_memfd allocated memory wherever possible.
>
> Is this because you want to enable struct page-less gmemfd in the future?

Yes. That's the only reason.
On Tue, 2025-07-08 at 15:06 -0700, Vishal Annapurve wrote:
> > > My vote would be to prefer using "hpa" and not rely on folio/page
> > > structs for guest_memfd allocated memory wherever possible.
> >
> > Is this because you want to enable struct page-less gmemfd in the future?
>
> Yes. That's the only reason.

I don't think we should change just this field of this seamcall wrapper from the
current pattern for that reason. When this stuff comes along it will be just
about as easy to change it with the rest. Then in the meantime it doesn't look
asymmetric.

In general, I (again) think that we should not focus on accommodating future
stuff unless there is an ABI touch point. This is to ultimately speed enabling
of the entire stack.

It is definitely not to make it harder to implement TDX support for pfn based
gmem in the future. Rather to make it possible. As in, if nothing is upstream
because we are endlessly debating how it all fits together at once, then it
won't be possible to enhance it further.
On Tue, Jul 8, 2025 at 4:16 PM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote:
>
> On Tue, 2025-07-08 at 15:06 -0700, Vishal Annapurve wrote:
> > > > My vote would be to prefer using "hpa" and not rely on folio/page
> > > > structs for guest_memfd allocated memory wherever possible.
> > >
> > > Is this because you want to enable struct page-less gmemfd in the future?
> >
> > Yes. That's the only reason.
>
> I don't think we should change just this field of this seamcall wrapper from the
> current pattern for that reason. When this stuff comes along it will be just
> about as easy to change it with the rest. Then in the meantime it doesn't look
> asymmetric.
>
> In general, I (again) think that we should not focus on accommodating future
> stuff unless there is an ABI touch point. This is to ultimately speed enabling
> of the entire stack.
>
> It is definitely not to make it harder to implement TDX support for pfn based
> gmem in the future. Rather to make it possible. As in, if nothing is upstream
> because we are endlessly debating how it all fits together at once, then it
> won't be possible to enhance it further.

I agree and if we can't do without page struct for now that's fine. My
response was just to favor pfn/hpa over "page struct" if possible, given
that we have a choice here. Feel free to ignore if symmetry seems more
important.
On Thu, Apr 24, 2025 at 11:04:28AM +0800, Yan Zhao wrote:
>Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
>
>Verify the validity of the level and ensure that the mapping range is fully
>contained within the page folio.
>
>As a conservative solution, perform CLFLUSH on all pages to be mapped into
>the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
>dirty cache lines do not write back later and clobber TD memory.
>
>Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
>---
> arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
>diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
>index f5e2a937c1e7..a66d501b5677 100644
>--- a/arch/x86/virt/vmx/tdx/tdx.c
>+++ b/arch/x86/virt/vmx/tdx/tdx.c
>@@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> .rdx = tdx_tdr_pa(td),
> .r8 = page_to_phys(page),
> };
>+ unsigned long nr_pages = 1 << (level * 9);
>+ struct folio *folio = page_folio(page);
>+ unsigned long idx = 0;
> u64 ret;
>
>- tdx_clflush_page(page);
>+ if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
>+ (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
>+ return -EINVAL;

Returning -EINVAL looks incorrect as the return type is u64.

>+
>+ while (nr_pages--)
>+ tdx_clflush_page(nth_page(page, idx++));
>+
> ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
>
> *ext_err1 = args.rcx;
>--
>2.43.2
>
On Thu, May 15, 2025 at 10:16:58AM +0800, Chao Gao wrote:
> On Thu, Apr 24, 2025 at 11:04:28AM +0800, Yan Zhao wrote:
> >Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
> >
> >Verify the validity of the level and ensure that the mapping range is fully
> >contained within the page folio.
> >
> >As a conservative solution, perform CLFLUSH on all pages to be mapped into
> >the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
> >dirty cache lines do not write back later and clobber TD memory.
> >
> >Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> >Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> >Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> >---
> > arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> >diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> >index f5e2a937c1e7..a66d501b5677 100644
> >--- a/arch/x86/virt/vmx/tdx/tdx.c
> >+++ b/arch/x86/virt/vmx/tdx/tdx.c
> >@@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> > .rdx = tdx_tdr_pa(td),
> > .r8 = page_to_phys(page),
> > };
> >+ unsigned long nr_pages = 1 << (level * 9);
> >+ struct folio *folio = page_folio(page);
> >+ unsigned long idx = 0;
> > u64 ret;
> >
> >- tdx_clflush_page(page);
> >+ if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
> >+ (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
> >+ return -EINVAL;
>
> Returning -EINVAL looks incorrect as the return type is u64.

Good catch. Thanks!
I'll think about how to handle it. Looks it could be dropped if we trust KVM.

> >+ while (nr_pages--)
> >+ tdx_clflush_page(nth_page(page, idx++));
> >+
> > ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
> >
> > *ext_err1 = args.rcx;
> >--
> >2.43.2
> >
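For illustration, a minimal sketch of how the sanity check could fail with a
u64 TDX status instead of a negative errno; using TDX_SW_ERROR here is an
assumption, borrowed from the variant posted later in this thread:

	/* Sketch only: keep the wrapper's u64 return type consistent. */
	if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
	    (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
		return TDX_SW_ERROR;	/* assumed software-defined error, not -EINVAL */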
On Thu, 2025-04-24 at 11:04 +0800, Yan Zhao wrote:
> Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
>
> Verify the validity of the level and ensure that the mapping range is fully
> contained within the page folio.
>
> As a conservative solution, perform CLFLUSH on all pages to be mapped into
> the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
> dirty cache lines do not write back later and clobber TD memory.
This should have a brief background on why it doesn't use the arg - what is
deficient today. Also, an explanation of how it will be used (i.e. what types of
pages will be passed)
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index f5e2a937c1e7..a66d501b5677 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> .rdx = tdx_tdr_pa(td),
> .r8 = page_to_phys(page),
> };
> + unsigned long nr_pages = 1 << (level * 9);
> + struct folio *folio = page_folio(page);
> + unsigned long idx = 0;
> u64 ret;
>
> - tdx_clflush_page(page);
> + if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
> + (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
> + return -EINVAL;
Shouldn't KVM not try to map a huge page in this situation? Doesn't seem like a
job for the SEAMCALL wrapper.
> +
> + while (nr_pages--)
> + tdx_clflush_page(nth_page(page, idx++));
clflush_cache_range() is:
static void tdx_clflush_page(struct page *page)
{
clflush_cache_range(page_to_virt(page), PAGE_SIZE);
}
So we have loops within loops... Better to add an arg to tdx_clflush_page() or
add a variant that takes one.
> +
> ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
>
> *ext_err1 = args.rcx;
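A minimal sketch (not code from the thread) of the kind of variant being
suggested here; the name tdx_clflush_pages() is hypothetical, and it still
flushes one 4KB page at a time since page virtual addresses need not be
contiguous under SPARSEMEM without SPARSEMEM_VMEMMAP, as noted later in the
thread:

/* Hypothetical variant of tdx_clflush_page() that takes a page count. */
static void tdx_clflush_pages(struct page *page, unsigned long nr_pages)
{
	unsigned long i;

	/* Flush each 4KB page; virtual addresses may not be contiguous. */
	for (i = 0; i < nr_pages; i++)
		clflush_cache_range(page_to_virt(nth_page(page, i)), PAGE_SIZE);
}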
On Wed, May 14, 2025 at 02:52:49AM +0800, Edgecombe, Rick P wrote:
> On Thu, 2025-04-24 at 11:04 +0800, Yan Zhao wrote:
> > Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
> >
> > Verify the validity of the level and ensure that the mapping range is fully
> > contained within the page folio.
> >
> > As a conservative solution, perform CLFLUSH on all pages to be mapped into
> > the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
> > dirty cache lines do not write back later and clobber TD memory.
>
> This should have a brief background on why it doesn't use the arg - what is
> deficient today. Also, an explanation of how it will be used (i.e. what types of
> pages will be passed)
Will do.
> >
> > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> > arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index f5e2a937c1e7..a66d501b5677 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> > .rdx = tdx_tdr_pa(td),
> > .r8 = page_to_phys(page),
> > };
> > + unsigned long nr_pages = 1 << (level * 9);
> > + struct folio *folio = page_folio(page);
> > + unsigned long idx = 0;
> > u64 ret;
> >
> > - tdx_clflush_page(page);
> > + if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
> > + (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
> > + return -EINVAL;
>
> Shouldn't KVM not try to map a huge page in this situation? Doesn't seem like a
> job for the SEAMCALL wrapper.
Ok. If the decision is to trust KVM and all potential callers, it's reasonable
to drop those checks.
> > +
> > + while (nr_pages--)
> > + tdx_clflush_page(nth_page(page, idx++));
>
> clflush_cache_range() is:
> static void tdx_clflush_page(struct page *page)
> {
> clflush_cache_range(page_to_virt(page), PAGE_SIZE);
> }
>
> So we have loops within loops... Better to add an arg to tdx_clflush_page() or
> add a variant that takes one.
Ok.
One thing to note is that even with an extra arg, tdx_clflush_page() has to call
clflush_cache_range() page by page because with
"#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)",
page virtual addresses are not necessarily contiguous.
What about Binbin's proposal [1]? i.e.,
while (nr_pages)
tdx_clflush_page(nth_page(page, --nr_pages));
[1] https://lore.kernel.org/all/a7d0988d-037c-454f-bc6b-57e71b357488@linux.intel.com/
> > +
> > ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
> >
> > *ext_err1 = args.rcx;
>
On 5/16/25 12:05, Yan Zhao wrote:
> On Wed, May 14, 2025 at 02:52:49AM +0800, Edgecombe, Rick P wrote:
>> On Thu, 2025-04-24 at 11:04 +0800, Yan Zhao wrote:
>>> Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
>>>
>>> Verify the validity of the level and ensure that the mapping range is fully
>>> contained within the page folio.
>>>
>>> As a conservative solution, perform CLFLUSH on all pages to be mapped into
>>> the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
>>> dirty cache lines do not write back later and clobber TD memory.
>>
>> This should have a brief background on why it doesn't use the arg - what is
>> deficient today. Also, an explanation of how it will be used (i.e. what types of
>> pages will be passed)
> Will do.
>
>>>
>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>>> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
>>> ---
>>> arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
>>> index f5e2a937c1e7..a66d501b5677 100644
>>> --- a/arch/x86/virt/vmx/tdx/tdx.c
>>> +++ b/arch/x86/virt/vmx/tdx/tdx.c
>>> @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
>>> .rdx = tdx_tdr_pa(td),
>>> .r8 = page_to_phys(page),
>>> };
>>> + unsigned long nr_pages = 1 << (level * 9);
>>> + struct folio *folio = page_folio(page);
>>> + unsigned long idx = 0;
>>> u64 ret;
>>>
>>> - tdx_clflush_page(page);
>>> + if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
>>> + (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
>>> + return -EINVAL;
>>
>> Shouldn't KVM not try to map a huge page in this situation? Doesn't seem like a
>> job for the SEAMCALL wrapper.
> Ok. If the decision is to trust KVM and all potential callers, it's reasonable
> to drop those checks.
>
>>> +
>>> + while (nr_pages--)
>>> + tdx_clflush_page(nth_page(page, idx++));
>>
>> clflush_cache_range() is:
>> static void tdx_clflush_page(struct page *page)
>> {
>> clflush_cache_range(page_to_virt(page), PAGE_SIZE);
>> }
>>
>> So we have loops within loops... Better to add an arg to tdx_clflush_page() or
>> add a variant that takes one.
> Ok.
>
> One thing to note is that even with an extra arg, tdx_clflush_page() has to call
> clflush_cache_range() page by page because with
> "#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)",
> page virtual addresses are not necessarily contiguous.
>
> What about Binbin's proposal [1]? i.e.,
>
> while (nr_pages)
> tdx_clflush_page(nth_page(page, --nr_pages));
What's the problem with using:
+ for (int i = 0; nr_pages; nr_pages--)
+ tdx_clflush_page(nth_page(page, i++))
The kernel now allows C99-style definition of variables inside a loop +
it's clear how many times the loop has to be executed.
>
> [1] https://lore.kernel.org/all/a7d0988d-037c-454f-bc6b-57e71b357488@linux.intel.com/
>
>>> +
>>> ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
>>>
>>> *ext_err1 = args.rcx;
>>
>
On Thu, Jun 19, 2025 at 12:26:07PM +0300, Nikolay Borisov wrote:
> > What about Binbin's proposal [1]? i.e.,
> >
> > while (nr_pages)
> > 	tdx_clflush_page(nth_page(page, --nr_pages));
>
> What's the problem with using:
>
> + for (int i = 0; nr_pages; nr_pages--)
> +	tdx_clflush_page(nth_page(page, i++))

Thanks! It looks good to me.

> The kernel now allows C99-style definition of variables inside a loop + it's
> clear how many times the loop has to be executed.
On Fri, 2025-05-16 at 17:05 +0800, Yan Zhao wrote:
> > So we have loops within loops... Better to add an arg to tdx_clflush_page()
> > or
> > add a variant that takes one.
> Ok.
>
> One thing to note is that even with an extra arg, tdx_clflush_page() has to
> call
> clflush_cache_range() page by page because with
> "#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)",
> page virtual addresses are not necessarily contiguous.
>
> What about Binbin's proposal [1]? i.e.,
>
> while (nr_pages)
> 	tdx_clflush_page(nth_page(page, --nr_pages));
>
> [1]
> https://lore.kernel.org/all/a7d0988d-037c-454f-bc6b-57e71b357488@linux.intel.com/

These SEAMCALLs are handling physically contiguous pages so I don't think we
need to worry about that. But Binbin's suggestion seems fine too.
On 4/24/2025 11:04 AM, Yan Zhao wrote:
> Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
>
> Verify the validity of the level and ensure that the mapping range is fully
> contained within the page folio.
>
> As a conservative solution, perform CLFLUSH on all pages to be mapped into
> the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
> dirty cache lines do not write back later and clobber TD memory.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index f5e2a937c1e7..a66d501b5677 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> .rdx = tdx_tdr_pa(td),
> .r8 = page_to_phys(page),
> };
> + unsigned long nr_pages = 1 << (level * 9);
> + struct folio *folio = page_folio(page);
> + unsigned long idx = 0;
> u64 ret;
>
> - tdx_clflush_page(page);
> + if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
> + (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
> + return -EINVAL;
> +
> + while (nr_pages--)
> + tdx_clflush_page(nth_page(page, idx++));

Is the following better to save a variable?

while (nr_pages)
	tdx_clflush_page(nth_page(page, --nr_pages));

> +
> ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
>
> *ext_err1 = args.rcx;
On Fri, Apr 25, 2025 at 02:51:18PM +0800, Binbin Wu wrote:
>
>
> On 4/24/2025 11:04 AM, Yan Zhao wrote:
> > Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
> >
> > Verify the validity of the level and ensure that the mapping range is fully
> > contained within the page folio.
> >
> > As a conservative solution, perform CLFLUSH on all pages to be mapped into
> > the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
> > dirty cache lines do not write back later and clobber TD memory.
> >
> > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> > arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index f5e2a937c1e7..a66d501b5677 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> > .rdx = tdx_tdr_pa(td),
> > .r8 = page_to_phys(page),
> > };
> > + unsigned long nr_pages = 1 << (level * 9);
> > + struct folio *folio = page_folio(page);
> > + unsigned long idx = 0;
> > u64 ret;
> >
> > - tdx_clflush_page(page);
> > + if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
> > + (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
> > + return -EINVAL;
> > +
> > + while (nr_pages--)
> > + tdx_clflush_page(nth_page(page, idx++));
> Is the following better to save a variable?
>
> while (nr_pages)
> 	tdx_clflush_page(nth_page(page, --nr_pages));

Looks better except performing the clflush in reverse order :)

>
> > +
> > ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
> > *ext_err1 = args.rcx;
>
On Thu, Apr 24, 2025 at 11:04:28AM +0800, Yan Zhao wrote:
> Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
>
> Verify the validity of the level and ensure that the mapping range is fully
> contained within the page folio.
>
> As a conservative solution, perform CLFLUSH on all pages to be mapped into
> the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
> dirty cache lines do not write back later and clobber TD memory.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index f5e2a937c1e7..a66d501b5677 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> .rdx = tdx_tdr_pa(td),
> .r8 = page_to_phys(page),
> };
> + unsigned long nr_pages = 1 << (level * 9);

PTE_SHIFT.

> + struct folio *folio = page_folio(page);
> + unsigned long idx = 0;
> u64 ret;
>
> - tdx_clflush_page(page);
> + if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||

Do we even need this check?

> + (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
> + return -EINVAL;
> +
> + while (nr_pages--)
> + tdx_clflush_page(nth_page(page, idx++));
> +
> ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
>
> *ext_err1 = args.rcx;
> --
> 2.43.2
>

--
Kiryl Shutsemau / Kirill A. Shutemov
On Thu, Apr 24, 2025 at 10:48:53AM +0300, Kirill A. Shutemov wrote:
> On Thu, Apr 24, 2025 at 11:04:28AM +0800, Yan Zhao wrote:
> > Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
> >
> > Verify the validity of the level and ensure that the mapping range is fully
> > contained within the page folio.
> >
> > As a conservative solution, perform CLFLUSH on all pages to be mapped into
> > the TD before invoking the SEAMCALL TDH_MEM_PAGE_AUG. This ensures that any
> > dirty cache lines do not write back later and clobber TD memory.
> >
> > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> > arch/x86/virt/vmx/tdx/tdx.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index f5e2a937c1e7..a66d501b5677 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -1595,9 +1595,18 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
> > .rdx = tdx_tdr_pa(td),
> > .r8 = page_to_phys(page),
> > };
> > + unsigned long nr_pages = 1 << (level * 9);
>
> PTE_SHIFT.

Yes. Thanks.

> > + struct folio *folio = page_folio(page);
> > + unsigned long idx = 0;
> > u64 ret;
> >
> > - tdx_clflush_page(page);
> > + if (!(level >= TDX_PS_4K && level < TDX_PS_NR) ||
>
> Do we even need this check?

Maybe not if tdh_mem_page_aug() trusts KVM :)
The consideration is to avoid nr_pages being too huge to cause too many
tdx_clflush_page()s on any reckless error.

> > + (folio_page_idx(folio, page) + nr_pages > folio_nr_pages(folio)))
> > + return -EINVAL;
> > +
> > + while (nr_pages--)
> > + tdx_clflush_page(nth_page(page, idx++));
> > +
> > ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
> >
> > *ext_err1 = args.rcx;
> > --
> > 2.43.2
> >
>
> --
> Kiryl Shutsemau / Kirill A. Shutemov