[v3] RE: [PATCHv3 0/3] x86/tdx: Fix one more load_unaligned_zeropad() issue

RE: [PATCHv3 0/3] x86/tdx: Fix one more load_unaligned_zeropad() issue

Posted by Michael Kelley (LINUX) 2 years, 6 months ago

From: Kirill A. Shutemov <kirill@shutemov.name> Sent: Saturday, July 8, 2023 11:09 PM
> 
> On Sat, Jul 08, 2023 at 11:53:08PM +0000, Michael Kelley (LINUX) wrote:
> > From: Kirill A. Shutemov <kirill@shutemov.name> Sent: Friday, July 7, 2023 7:07 AM
> > >
> > > On Thu, Jul 06, 2023 at 04:48:32PM +0000, Michael Kelley (LINUX) wrote:
> > > > From: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Sent: Tuesday, June 6,
> 2023 2:56 AM
> >
> > [snip]
> >
> > >
> > > It only addresses the problem that happens on transition, but
> > > load_unaligned_zeropad() is still a problem for the shared mappings in
> > > general, after transition is complete. Like if load_unaligned_zeropad()
> > > steps from private to shared mapping and shared mapping triggers #VE,
> > > kernel should be able to handle it.
> >
> > I'm showing my ignorance of TDX architectural details, but what's the
> > situation where shared mappings in general can trigger a #VE?  How
> > do such situations get handled for references that aren't from
> > load_unaligned_zeropad()?
> >
> 
> Shared mappings are under host/VMM control. It can just not map the page
> in shared-ept and trigger ept-violation #VE.

I know you are out on vacation, but let me follow up now for further
discussion when you are back.

Isn't the scenario you are describing a malfunctioning or malicious
host/VMM?  Would what you are describing be done as part of normal
operation?   Kernel code must have switched the page from private to
shared for some purpose.  As soon as that code (which presumably
does not have any entry in the exception table) touches the page, it
would take the #VE and the enter the die path because there's no fixup.
So is there value in having load_unaligned_zeropad() handle the #VE and
succeed where a normal reference would fail?

I'd still like to see the private <-> shared transition code mark the pages
as invalid during the transition, and avoid the possibility of #VE and
similar cases with SEV-SNP.  Such approach reduces (eliminates?)
entanglement between CoCo-specific exceptions and
load_unaligned_zeropad().  It also greatly simplifies TD Partition cases
and SEV-SNP cases where a paravisor is used.

But maybe I'm still missing a case where code must handle the #VE
for load_unaligned_zeropad() outside of private <-> shared transitions.

Michael

> 
> > > Any comments?
> >
> > This looks good to me.  I applied the diff to a TDX VM running on
> > Hyper-V.  When a  load_unaligned_zeropad() occurs on a page that is
> > transitioning between private and shared, the zeropad fixup is now
> > done correctly via the #VE handler.  (This is *without* my RFC patch to
> > mark the pages invalid during a transition.)
> 
> Great.
> 
> I am at vacation for the next two weeks. I will prepare a proper patch
> when I am back. Feel free to make patch yourself if you feel it is urgent.
> 
> --
>   Kiryl Shutsemau / Kirill A. Shutemov

Re: [PATCHv3 0/3] x86/tdx: Fix one more load_unaligned_zeropad() issue

Posted by Kirill A. Shutemov 2 years, 6 months ago

On Thu, Jul 13, 2023 at 02:43:39PM +0000, Michael Kelley (LINUX) wrote:
> From: Kirill A. Shutemov <kirill@shutemov.name> Sent: Saturday, July 8, 2023 11:09 PM
> > 
> > On Sat, Jul 08, 2023 at 11:53:08PM +0000, Michael Kelley (LINUX) wrote:
> > > From: Kirill A. Shutemov <kirill@shutemov.name> Sent: Friday, July 7, 2023 7:07 AM
> > > >
> > > > On Thu, Jul 06, 2023 at 04:48:32PM +0000, Michael Kelley (LINUX) wrote:
> > > > > From: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Sent: Tuesday, June 6,
> > 2023 2:56 AM
> > >
> > > [snip]
> > >
> > > >
> > > > It only addresses the problem that happens on transition, but
> > > > load_unaligned_zeropad() is still a problem for the shared mappings in
> > > > general, after transition is complete. Like if load_unaligned_zeropad()
> > > > steps from private to shared mapping and shared mapping triggers #VE,
> > > > kernel should be able to handle it.
> > >
> > > I'm showing my ignorance of TDX architectural details, but what's the
> > > situation where shared mappings in general can trigger a #VE?  How
> > > do such situations get handled for references that aren't from
> > > load_unaligned_zeropad()?
> > >
> > 
> > Shared mappings are under host/VMM control. It can just not map the page
> > in shared-ept and trigger ept-violation #VE.
> 
> I know you are out on vacation, but let me follow up now for further
> discussion when you are back.
> 
> Isn't the scenario you are describing a malfunctioning or malicious
> host/VMM?  Would what you are describing be done as part of normal
> operation?   Kernel code must have switched the page from private to
> shared for some purpose.  As soon as that code (which presumably
> does not have any entry in the exception table) touches the page, it
> would take the #VE and the enter the die path because there's no fixup.
> So is there value in having load_unaligned_zeropad() handle the #VE and
> succeed where a normal reference would fail?

#VE on shared memory is legitimately used for MMIO. But MMIO region is
usually separate from the real memory in physical address space.

But we also have DMA.

DMA pages allocated from common pool of memory and they can be next to
dentry cache that kernel accesses with load_unaligned_zeropad(). DMA pages
are shared, but they usually backed by memory and not cause #VE. However
shared memory is under full control from VMM and VMM can remove page at
any point which would make platform to deliver #VE to the guest. This is
pathological scenario, but I think it still worth handling gracefully.

> I'd still like to see the private <-> shared transition code mark the pages
> as invalid during the transition, and avoid the possibility of #VE and
> similar cases with SEV-SNP.  Such approach reduces (eliminates?)
> entanglement between CoCo-specific exceptions and
> load_unaligned_zeropad().  It also greatly simplifies TD Partition cases
> and SEV-SNP cases where a paravisor is used.

I doesn't eliminates issue for TDX as the scenario above is not transient.
It can happen after memory is converted to shared.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

RE: [PATCHv3 0/3] x86/tdx: Fix one more load_unaligned_zeropad() issue

Posted by Michael Kelley (LINUX) 2 years, 6 months ago

From: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Sent: Monday, July 24, 2023 4:19 PM
> 
> On Thu, Jul 13, 2023 at 02:43:39PM +0000, Michael Kelley (LINUX) wrote:
> > From: Kirill A. Shutemov <kirill@shutemov.name> Sent: Saturday, July 8, 2023 11:09 PM
> > >
> > > On Sat, Jul 08, 2023 at 11:53:08PM +0000, Michael Kelley (LINUX) wrote:
> > > > From: Kirill A. Shutemov <kirill@shutemov.name> Sent: Friday, July 7, 2023 7:07 AM
> > > > >
> > > > > On Thu, Jul 06, 2023 at 04:48:32PM +0000, Michael Kelley (LINUX) wrote:
> > > > > > From: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Sent: Tuesday, June 6, 2023 2:56 AM
> > > >
> > > > [snip]
> > > >
> > > > >
> > > > > It only addresses the problem that happens on transition, but
> > > > > load_unaligned_zeropad() is still a problem for the shared mappings in
> > > > > general, after transition is complete. Like if load_unaligned_zeropad()
> > > > > steps from private to shared mapping and shared mapping triggers #VE,
> > > > > kernel should be able to handle it.
> > > >
> > > > I'm showing my ignorance of TDX architectural details, but what's the
> > > > situation where shared mappings in general can trigger a #VE?  How
> > > > do such situations get handled for references that aren't from
> > > > load_unaligned_zeropad()?
> > > >
> > >
> > > Shared mappings are under host/VMM control. It can just not map the page
> > > in shared-ept and trigger ept-violation #VE.
> >
> > I know you are out on vacation, but let me follow up now for further
> > discussion when you are back.
> >
> > Isn't the scenario you are describing a malfunctioning or malicious
> > host/VMM?  Would what you are describing be done as part of normal
> > operation?   Kernel code must have switched the page from private to
> > shared for some purpose.  As soon as that code (which presumably
> > does not have any entry in the exception table) touches the page, it
> > would take the #VE and the enter the die path because there's no fixup.
> > So is there value in having load_unaligned_zeropad() handle the #VE and
> > succeed where a normal reference would fail?
> 
> #VE on shared memory is legitimately used for MMIO. But MMIO region is
> usually separate from the real memory in physical address space.
> 
> But we also have DMA.
> 
> DMA pages allocated from common pool of memory and they can be next to
> dentry cache that kernel accesses with load_unaligned_zeropad(). DMA pages
> are shared, but they usually backed by memory and not cause #VE. However
> shared memory is under full control from VMM and VMM can remove page at
> any point which would make platform to deliver #VE to the guest. This is
> pathological scenario, but I think it still worth handling gracefully.

Yep, pages targeted by DMA have been transitioned to shared, and could
be scattered anywhere in the guest physical address space.  Such a page
could be touched by load_unaligned_zeropad().  But could you elaborate
more on the "pathological scenario" where the guest physical page isn't
backed by memory?

Sure, the VMM can remove the page at any point, but why would it do
so?  Is doing so a legitimate part of the host/guest contract that the guest
must handle cleanly?  More importantly, what is the guest expected to
do in such a case?  I would expect that at some point such a DMA page
is accessed by a guest vCPU with an explicit reference that is not
load_unaligned_zeropad().  Then the guest would take a #VE that
goes down the #GP path and invokes die().

I don't object to make the load_unaligned_zeropad() fixup path work
correctly in response to a #VE, as it seems like general goodness.  I'm
just trying to make sure I understand the nuances of "why".  :-)

> 
> > I'd still like to see the private <-> shared transition code mark the pages
> > as invalid during the transition, and avoid the possibility of #VE and
> > similar cases with SEV-SNP.  Such approach reduces (eliminates?)
> > entanglement between CoCo-specific exceptions and
> > load_unaligned_zeropad().  It also greatly simplifies TD Partition cases
> > and SEV-SNP cases where a paravisor is used.
> 
> I doesn't eliminates issue for TDX as the scenario above is not transient.
> It can happen after memory is converted to shared.

Notwithstanding, do you have any objections to the private <-> shared
transition code being changed so it won't be the cause of #VE, and
similar on SEV-SNP?

Michael

Re: [PATCHv3 0/3] x86/tdx: Fix one more load_unaligned_zeropad() issue

Posted by Kirill A. Shutemov 2 years, 6 months ago

On Tue, Jul 25, 2023 at 03:51:24PM +0000, Michael Kelley (LINUX) wrote:
> From: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Sent: Monday, July 24, 2023 4:19 PM
> > 
> > On Thu, Jul 13, 2023 at 02:43:39PM +0000, Michael Kelley (LINUX) wrote:
> > > From: Kirill A. Shutemov <kirill@shutemov.name> Sent: Saturday, July 8, 2023 11:09 PM
> > > >
> > > > On Sat, Jul 08, 2023 at 11:53:08PM +0000, Michael Kelley (LINUX) wrote:
> > > > > From: Kirill A. Shutemov <kirill@shutemov.name> Sent: Friday, July 7, 2023 7:07 AM
> > > > > >
> > > > > > On Thu, Jul 06, 2023 at 04:48:32PM +0000, Michael Kelley (LINUX) wrote:
> > > > > > > From: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Sent: Tuesday, June 6, 2023 2:56 AM
> > > > >
> > > > > [snip]
> > > > >
> > > > > >
> > > > > > It only addresses the problem that happens on transition, but
> > > > > > load_unaligned_zeropad() is still a problem for the shared mappings in
> > > > > > general, after transition is complete. Like if load_unaligned_zeropad()
> > > > > > steps from private to shared mapping and shared mapping triggers #VE,
> > > > > > kernel should be able to handle it.
> > > > >
> > > > > I'm showing my ignorance of TDX architectural details, but what's the
> > > > > situation where shared mappings in general can trigger a #VE?  How
> > > > > do such situations get handled for references that aren't from
> > > > > load_unaligned_zeropad()?
> > > > >
> > > >
> > > > Shared mappings are under host/VMM control. It can just not map the page
> > > > in shared-ept and trigger ept-violation #VE.
> > >
> > > I know you are out on vacation, but let me follow up now for further
> > > discussion when you are back.
> > >
> > > Isn't the scenario you are describing a malfunctioning or malicious
> > > host/VMM?  Would what you are describing be done as part of normal
> > > operation?   Kernel code must have switched the page from private to
> > > shared for some purpose.  As soon as that code (which presumably
> > > does not have any entry in the exception table) touches the page, it
> > > would take the #VE and the enter the die path because there's no fixup.
> > > So is there value in having load_unaligned_zeropad() handle the #VE and
> > > succeed where a normal reference would fail?
> > 
> > #VE on shared memory is legitimately used for MMIO. But MMIO region is
> > usually separate from the real memory in physical address space.
> > 
> > But we also have DMA.
> > 
> > DMA pages allocated from common pool of memory and they can be next to
> > dentry cache that kernel accesses with load_unaligned_zeropad(). DMA pages
> > are shared, but they usually backed by memory and not cause #VE. However
> > shared memory is under full control from VMM and VMM can remove page at
> > any point which would make platform to deliver #VE to the guest. This is
> > pathological scenario, but I think it still worth handling gracefully.
> 
> Yep, pages targeted by DMA have been transitioned to shared, and could
> be scattered anywhere in the guest physical address space.  Such a page
> could be touched by load_unaligned_zeropad().  But could you elaborate
> more on the "pathological scenario" where the guest physical page isn't
> backed by memory?
> 
> Sure, the VMM can remove the page at any point, but why would it do
> so?  Is doing so a legitimate part of the host/guest contract that the guest
> must handle cleanly?  More importantly, what is the guest expected to
> do in such a case?  I would expect that at some point such a DMA page
> is accessed by a guest vCPU with an explicit reference that is not
> load_unaligned_zeropad().  Then the guest would take a #VE that
> goes down the #GP path and invokes die().
> 
> I don't object to make the load_unaligned_zeropad() fixup path work
> correctly in response to a #VE, as it seems like general goodness.  I'm
> just trying to make sure I understand the nuances of "why".  :-)

We actually triggered the issue during internal testing. I wrote initial
patch after that. But looking back on the report I don't have an answer
why the page triggered #VE. Maybe VMM or virtual BIOS screwed up and put
MMIO next to normal memory, I donno.

> > > I'd still like to see the private <-> shared transition code mark the pages
> > > as invalid during the transition, and avoid the possibility of #VE and
> > > similar cases with SEV-SNP.  Such approach reduces (eliminates?)
> > > entanglement between CoCo-specific exceptions and
> > > load_unaligned_zeropad().  It also greatly simplifies TD Partition cases
> > > and SEV-SNP cases where a paravisor is used.
> > 
> > I doesn't eliminates issue for TDX as the scenario above is not transient.
> > It can happen after memory is converted to shared.
> 
> Notwithstanding, do you have any objections to the private <-> shared
> transition code being changed so it won't be the cause of #VE, and
> similar on SEV-SNP?

I am not yet convinced it is needed. But let's see the code.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

RE: [PATCHv3 0/3] x86/tdx: Fix one more load_unaligned_zeropad() issue

Posted by Michael Kelley (LINUX) 2 years, 6 months ago

From: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Sent: Tuesday, July 25, 2023 4:13 PM
> 
> On Tue, Jul 25, 2023 at 03:51:24PM +0000, Michael Kelley (LINUX) wrote:
> > From: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Sent: Monday, July 24,
> 2023 4:19 PM
> > >
> > > On Thu, Jul 13, 2023 at 02:43:39PM +0000, Michael Kelley (LINUX) wrote:
> > > > From: Kirill A. Shutemov <kirill@shutemov.name> Sent: Saturday, July 8, 2023
> 11:09 PM
> > > > >
> > > > > On Sat, Jul 08, 2023 at 11:53:08PM +0000, Michael Kelley (LINUX) wrote:
> > > > > > From: Kirill A. Shutemov <kirill@shutemov.name> Sent: Friday, July 7, 2023
> 7:07 AM
> > > > > > >
> > > > > > > On Thu, Jul 06, 2023 at 04:48:32PM +0000, Michael Kelley (LINUX) wrote:
> > > > > > > > From: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Sent: Tuesday,
> June 6, 2023 2:56 AM
> > > > > >
> > > > > > [snip]
> > > > > >
> > > > > > >
> > > > > > > It only addresses the problem that happens on transition, but
> > > > > > > load_unaligned_zeropad() is still a problem for the shared mappings in
> > > > > > > general, after transition is complete. Like if load_unaligned_zeropad()
> > > > > > > steps from private to shared mapping and shared mapping triggers #VE,
> > > > > > > kernel should be able to handle it.
> > > > > >
> > > > > > I'm showing my ignorance of TDX architectural details, but what's the
> > > > > > situation where shared mappings in general can trigger a #VE?  How
> > > > > > do such situations get handled for references that aren't from
> > > > > > load_unaligned_zeropad()?
> > > > > >
> > > > >
> > > > > Shared mappings are under host/VMM control. It can just not map the page
> > > > > in shared-ept and trigger ept-violation #VE.
> > > >
> > > > I know you are out on vacation, but let me follow up now for further
> > > > discussion when you are back.
> > > >
> > > > Isn't the scenario you are describing a malfunctioning or malicious
> > > > host/VMM?  Would what you are describing be done as part of normal
> > > > operation?   Kernel code must have switched the page from private to
> > > > shared for some purpose.  As soon as that code (which presumably
> > > > does not have any entry in the exception table) touches the page, it
> > > > would take the #VE and the enter the die path because there's no fixup.
> > > > So is there value in having load_unaligned_zeropad() handle the #VE and
> > > > succeed where a normal reference would fail?
> > >
> > > #VE on shared memory is legitimately used for MMIO. But MMIO region is
> > > usually separate from the real memory in physical address space.
> > >
> > > But we also have DMA.
> > >
> > > DMA pages allocated from common pool of memory and they can be next to
> > > dentry cache that kernel accesses with load_unaligned_zeropad(). DMA pages
> > > are shared, but they usually backed by memory and not cause #VE. However
> > > shared memory is under full control from VMM and VMM can remove page at
> > > any point which would make platform to deliver #VE to the guest. This is
> > > pathological scenario, but I think it still worth handling gracefully.
> >
> > Yep, pages targeted by DMA have been transitioned to shared, and could
> > be scattered anywhere in the guest physical address space.  Such a page
> > could be touched by load_unaligned_zeropad().  But could you elaborate
> > more on the "pathological scenario" where the guest physical page isn't
> > backed by memory?
> >
> > Sure, the VMM can remove the page at any point, but why would it do
> > so?  Is doing so a legitimate part of the host/guest contract that the guest
> > must handle cleanly?  More importantly, what is the guest expected to
> > do in such a case?  I would expect that at some point such a DMA page
> > is accessed by a guest vCPU with an explicit reference that is not
> > load_unaligned_zeropad().  Then the guest would take a #VE that
> > goes down the #GP path and invokes die().
> >
> > I don't object to make the load_unaligned_zeropad() fixup path work
> > correctly in response to a #VE, as it seems like general goodness.  I'm
> > just trying to make sure I understand the nuances of "why".  :-)
> 
> We actually triggered the issue during internal testing. I wrote initial
> patch after that. But looking back on the report I don't have an answer
> why the page triggered #VE. Maybe VMM or virtual BIOS screwed up and put
> MMIO next to normal memory, I donno.
> 
> > > > I'd still like to see the private <-> shared transition code mark the pages
> > > > as invalid during the transition, and avoid the possibility of #VE and
> > > > similar cases with SEV-SNP.  Such approach reduces (eliminates?)
> > > > entanglement between CoCo-specific exceptions and
> > > > load_unaligned_zeropad().  It also greatly simplifies TD Partition cases
> > > > and SEV-SNP cases where a paravisor is used.
> > >
> > > I doesn't eliminates issue for TDX as the scenario above is not transient.
> > > It can happen after memory is converted to shared.
> >
> > Notwithstanding, do you have any objections to the private <-> shared
> > transition code being changed so it won't be the cause of #VE, and
> > similar on SEV-SNP?
> 
> I am not yet convinced it is needed. But let's see the code.
> 

Fair enough.  But similarly, I'm not convinced that we have
load_unaligned_zeropad() cases outside of private <-> shared transitions
that *must* be fixed up cleanly. :-)   

Here's the RFC patch, and it includes a couple of open questions:

https://lore.kernel.org/lkml/1688661719-60329-1-git-send-email-mikelley@microsoft.com/

In TD Partitioning configurations, I'm hoping we can eliminate #VE EPT
violations needing to be handled in the L2 guest.  The L1 guest will handle
things like MMIO.  But the L1 guest can't easily detect that a #VE with EPT
violation is due load_unaligned_zeropad() in the L2.  And trying to reflect
the #VE from the L1 guest to the L2 guest wades into a bunch of problems
that I want to avoid.

The same problems arise in SEV-SNP with a paravisor running at VMPL 0.
Changing the private <-> shared transition code prevents the problems
there as well.

Michael