[PATCH v5 0/2] MTE support for KVM guest

Posted by Steven Price 4 days, 15 hours ago
This series adds support for Arm's Memory Tagging Extension (MTE) to
KVM, allowing KVM guests to make use of it. This builds on the existing
user space support already in v5.10-rc1, see [1] for an overview.

[1] https://lwn.net/Articles/834289/

Changes since v4[2]:

 * Rebased on v5.10-rc4.

 * Require the VMM to map all guest memory PROT_MTE if MTE is enabled
   for the guest.

 * Add a kvm_has_mte() accessor.

[2] http://lkml.kernel.org/r/20201026155727.36685-1-steven.price@arm.com

The change to require the VMM to map all guest memory PROT_MTE is
significant as it means that the VMM has to deal with the MTE tags even
if it doesn't care about them (e.g. for virtual devices or if the VMM
doesn't support migration). Also unfortunately because the VMM can
change the memory layout at any time the check for PROT_MTE/VM_MTE has
to be done very late (at the point of faulting pages into stage 2).

The alternative would be to modify the set_pte_at() handler to always
check if there is MTE data relating to a swap page even if the PTE
doesn't have the MTE bit set. I didn't do this initially because of
ordering issues during early boot, but I could investigate further if
the above VMM requirement is too strict.

Steven Price (2):
  arm64: kvm: Save/restore MTE registers
  arm64: kvm: Introduce MTE VCPU feature

 arch/arm64/include/asm/kvm_emulate.h       |  3 +++
 arch/arm64/include/asm/kvm_host.h          |  8 ++++++++
 arch/arm64/include/asm/sysreg.h            |  3 ++-
 arch/arm64/kvm/arm.c                       |  9 +++++++++
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++++++++++++++
 arch/arm64/kvm/mmu.c                       |  6 ++++++
 arch/arm64/kvm/sys_regs.c                  | 20 +++++++++++++++-----
 include/uapi/linux/kvm.h                   |  1 +
 8 files changed, 58 insertions(+), 6 deletions(-)

-- 
2.20.1


Re: [PATCH v5 0/2] MTE support for KVM guest

Posted by Peter Maydell 4 days, 15 hours ago
On Thu, 19 Nov 2020 at 15:39, Steven Price <steven.price@arm.com> wrote:
> This series adds support for Arm's Memory Tagging Extension (MTE) to
> KVM, allowing KVM guests to make use of it. This builds on the existing
> user space support already in v5.10-rc1, see [1] for an overview.

> The change to require the VMM to map all guest memory PROT_MTE is
> significant as it means that the VMM has to deal with the MTE tags even
> if it doesn't care about them (e.g. for virtual devices or if the VMM
> doesn't support migration). Also unfortunately because the VMM can
> change the memory layout at any time the check for PROT_MTE/VM_MTE has
> to be done very late (at the point of faulting pages into stage 2).

I'm a bit dubious about requiring the VMM to map the guest memory
PROT_MTE unless somebody's done at least a sketch of the design
for how this would work on the QEMU side. Currently QEMU just
assumes the guest memory is guest memory and it can access it
without special precautions...

thanks
-- PMM

Re: [PATCH v5 0/2] MTE support for KVM guest

Posted by Dr. David Alan Gilbert 18 hours ago
* Peter Maydell (peter.maydell@linaro.org) wrote:
> On Thu, 19 Nov 2020 at 15:39, Steven Price <steven.price@arm.com> wrote:
> > This series adds support for Arm's Memory Tagging Extension (MTE) to
> > KVM, allowing KVM guests to make use of it. This builds on the existing
> > user space support already in v5.10-rc1, see [1] for an overview.
> 
> > The change to require the VMM to map all guest memory PROT_MTE is
> > significant as it means that the VMM has to deal with the MTE tags even
> > if it doesn't care about them (e.g. for virtual devices or if the VMM
> > doesn't support migration). Also unfortunately because the VMM can
> > change the memory layout at any time the check for PROT_MTE/VM_MTE has
> > to be done very late (at the point of faulting pages into stage 2).
> 
> I'm a bit dubious about requiring the VMM to map the guest memory
> PROT_MTE unless somebody's done at least a sketch of the design
> for how this would work on the QEMU side. Currently QEMU just
> assumes the guest memory is guest memory and it can access it
> without special precautions...

Although that is also changing because of the encrypted/protected memory
in things like SEV.

Dave

> thanks
> -- PMM
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


Re: [PATCH v5 0/2] MTE support for KVM guest

Posted by Andrew Jones 4 days, 12 hours ago
On Thu, Nov 19, 2020 at 03:45:40PM +0000, Peter Maydell wrote:
> On Thu, 19 Nov 2020 at 15:39, Steven Price <steven.price@arm.com> wrote:
> > This series adds support for Arm's Memory Tagging Extension (MTE) to
> > KVM, allowing KVM guests to make use of it. This builds on the existing
> > user space support already in v5.10-rc1, see [1] for an overview.
> 
> > The change to require the VMM to map all guest memory PROT_MTE is
> > significant as it means that the VMM has to deal with the MTE tags even
> > if it doesn't care about them (e.g. for virtual devices or if the VMM
> > doesn't support migration). Also unfortunately because the VMM can
> > change the memory layout at any time the check for PROT_MTE/VM_MTE has
> > to be done very late (at the point of faulting pages into stage 2).
> 
> I'm a bit dubious about requiring the VMM to map the guest memory
> PROT_MTE unless somebody's done at least a sketch of the design
> for how this would work on the QEMU side. Currently QEMU just
> assumes the guest memory is guest memory and it can access it
> without special precautions...
>

There are two statements being made here:

1) Requiring the use of PROT_MTE when mapping guest memory may not fit
   QEMU well.

2) New KVM features should be accompanied with supporting QEMU code in
   order to prove that the APIs make sense.

I strongly agree with (2). While kvmtool supports some quick testing, it
doesn't support migration. We must test all new features with a migration
supporting VMM.

I'm not sure about (1). I don't feel like it should be a major problem,
but see (2).

I'd be happy to help with the QEMU prototype, but preferably when there's
hardware available. Has all the current MTE testing just been done on
simulators? And, if so, are there regression tests regularly running on
the simulators too? And can they test migration? If hardware doesn't
show up quickly and simulators aren't used for regression tests, then
all this code will start rotting from day one.

Thanks,
drew


Re: [PATCH v5 0/2] MTE support for KVM guest

Posted by Marc Zyngier 4 days, 11 hours ago
On 2020-11-19 18:42, Andrew Jones wrote:
> On Thu, Nov 19, 2020 at 03:45:40PM +0000, Peter Maydell wrote:
>> On Thu, 19 Nov 2020 at 15:39, Steven Price <steven.price@arm.com> 
>> wrote:
>> > This series adds support for Arm's Memory Tagging Extension (MTE) to
>> > KVM, allowing KVM guests to make use of it. This builds on the existing
>> > user space support already in v5.10-rc1, see [1] for an overview.
>> 
>> > The change to require the VMM to map all guest memory PROT_MTE is
>> > significant as it means that the VMM has to deal with the MTE tags even
>> > if it doesn't care about them (e.g. for virtual devices or if the VMM
>> > doesn't support migration). Also unfortunately because the VMM can
>> > change the memory layout at any time the check for PROT_MTE/VM_MTE has
>> > to be done very late (at the point of faulting pages into stage 2).
>> 
>> I'm a bit dubious about requiring the VMM to map the guest memory
>> PROT_MTE unless somebody's done at least a sketch of the design
>> for how this would work on the QEMU side. Currently QEMU just
>> assumes the guest memory is guest memory and it can access it
>> without special precautions...
>> 
> 
> There are two statements being made here:
> 
> 1) Requiring the use of PROT_MTE when mapping guest memory may not fit
>    QEMU well.
> 
> 2) New KVM features should be accompanied with supporting QEMU code in
>    order to prove that the APIs make sense.
> 
> I strongly agree with (2). While kvmtool supports some quick testing, it
> doesn't support migration. We must test all new features with a migration
> supporting VMM.
> 
> I'm not sure about (1). I don't feel like it should be a major problem,
> but (2).
> 
> I'd be happy to help with the QEMU prototype, but preferably when there's
> hardware available. Has all the current MTE testing just been done on
> simulators? And, if so, are there regression tests regularly running on
> the simulators too? And can they test migration? If hardware doesn't
> show up quickly and simulators aren't used for regression tests, then
> all this code will start rotting from day one.

While I agree with the sentiment, the reality is pretty bleak.

I'm pretty sure nobody will ever run a migration on emulation. I also doubt
there is much overlap between MTE users and migration users, unfortunately.

No HW is available today, and when it becomes available, it will be in
the form of a closed system on which QEMU doesn't run, either because
we are locked out of EL2 (as usual), or because migration is not part of
the use case (like KVM on Android, for example).

So we can wait another two (five?) years until general purpose HW becomes
available, or we start merging what we can test today. I'm inclined to
do the latter.

And I think it is absolutely fine for QEMU to say "no MTE support with KVM"
(we can remove all userspace visibility, except for the capability).

         M.
-- 
Jazz is not dead. It just smells funny...

Re: [PATCH v5 0/2] MTE support for KVM guest

Posted by Steven Price 3 days, 20 hours ago
On 19/11/2020 19:11, Marc Zyngier wrote:
> On 2020-11-19 18:42, Andrew Jones wrote:
>> On Thu, Nov 19, 2020 at 03:45:40PM +0000, Peter Maydell wrote:
>>> On Thu, 19 Nov 2020 at 15:39, Steven Price <steven.price@arm.com> wrote:
>>> > This series adds support for Arm's Memory Tagging Extension (MTE) to
>>> > KVM, allowing KVM guests to make use of it. This builds on the existing
>>> > user space support already in v5.10-rc1, see [1] for an overview.
>>>
>>> > The change to require the VMM to map all guest memory PROT_MTE is
>>> > significant as it means that the VMM has to deal with the MTE tags even
>>> > if it doesn't care about them (e.g. for virtual devices or if the VMM
>>> > doesn't support migration). Also unfortunately because the VMM can
>>> > change the memory layout at any time the check for PROT_MTE/VM_MTE has
>>> > to be done very late (at the point of faulting pages into stage 2).
>>>
>>> I'm a bit dubious about requiring the VMM to map the guest memory
>>> PROT_MTE unless somebody's done at least a sketch of the design
>>> for how this would work on the QEMU side. Currently QEMU just
>>> assumes the guest memory is guest memory and it can access it
>>> without special precautions...
>>>
>>
>> There are two statements being made here:
>>
>> 1) Requiring the use of PROT_MTE when mapping guest memory may not fit
>>    QEMU well.
>>
>> 2) New KVM features should be accompanied with supporting QEMU code in
>>    order to prove that the APIs make sense.
>>
>> I strongly agree with (2). While kvmtool supports some quick testing, it
>> doesn't support migration. We must test all new features with a migration
>> supporting VMM.
>>
>> I'm not sure about (1). I don't feel like it should be a major problem,
>> but (2).

(1) seems to be contentious whichever way we go. Either PROT_MTE isn't 
required in which case it's easy to accidentally screw up migration, or 
it is required in which case it's difficult to handle normal guest 
memory from the VMM. I get the impression that I should probably go back
to the previous approach - sorry for the distraction with this change.

(2) isn't something I'm trying to skip, but I'm limited in what I can do 
myself so would appreciate help here. Haibo is looking into this.

>>
>> I'd be happy to help with the QEMU prototype, but preferably when there's
>> hardware available. Has all the current MTE testing just been done on
>> simulators? And, if so, are there regression tests regularly running on
>> the simulators too? And can they test migration? If hardware doesn't
>> show up quickly and simulators aren't used for regression tests, then
>> all this code will start rotting from day one.

As Marc says, hardware isn't available. Testing is either via the Arm 
FVP model (that I've been using for most of my testing) or QEMU full 
system emulation.

> 
> While I agree with the sentiment, the reality is pretty bleak.
> 
> I'm pretty sure nobody will ever run a migration on emulation. I also doubt
> there is much overlap between MTE users and migration users, unfortunately.
> 
> No HW is available today, and when it becomes available, it will be in
> the form of a closed system on which QEMU doesn't run, either because
> we are locked out of EL2 (as usual), or because migration is not part of
> the use case (like KVM on Android, for example).
> 
> So we can wait another two (five?) years until general purpose HW becomes
> available, or we start merging what we can test today. I'm inclined to
> do the latter.
> 
> And I think it is absolutely fine for QEMU to say "no MTE support with KVM"
> (we can remove all userspace visibility, except for the capability).

What I'm trying to achieve is a situation where KVM+MTE without 
migration works and we leave ourselves a clear path where migration can 
be added. With hindsight I think this version of the series was a wrong 
turn - if we return to not requiring PROT_MTE then we have the following 
two potential options to explore for migration in the future:

  * The VMM can choose to enable PROT_MTE if it needs to, and if desired
    we can add a flag to enforce this in the kernel.

  * If needed a new kernel interface can be provided to fetch/set tags
    from guest memory which isn't mapped PROT_MTE.

Does this sound reasonable?

I'll clean up the set_pte_at() change and post a v6 later today.

Re: [PATCH v5 0/2] MTE support for KVM guest

Posted by Marc Zyngier 3 days, 20 hours ago
On 2020-11-20 09:50, Steven Price wrote:
> On 19/11/2020 19:11, Marc Zyngier wrote:

> Does this sound reasonable?
> 
> I'll clean up the set_pte_at() change and post a v6 later today.

Please hold on. I still haven't reviewed your v5, nor have I had time
to read your reply to my comments on v4.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

Re: [PATCH v5 0/2] MTE support for KVM guest

Posted by Steven Price 3 days, 20 hours ago
On 20/11/2020 09:56, Marc Zyngier wrote:
> On 2020-11-20 09:50, Steven Price wrote:
>> On 19/11/2020 19:11, Marc Zyngier wrote:
> 
>> Does this sound reasonable?
>>
>> I'll clean up the set_pte_at() change and post a v6 later today.
> 
> Please hold on. I still haven't reviewed your v5, nor have I had time
> to read your reply to my comments on v4.

Sure, no problem ;)

Steve


Re: [PATCH v5 0/2] MTE support for KVM guest

Posted by Steven Price 4 days, 14 hours ago
On 19/11/2020 15:45, Peter Maydell wrote:
> On Thu, 19 Nov 2020 at 15:39, Steven Price <steven.price@arm.com> wrote:
>> This series adds support for Arm's Memory Tagging Extension (MTE) to
>> KVM, allowing KVM guests to make use of it. This builds on the existing
>> user space support already in v5.10-rc1, see [1] for an overview.
> 
>> The change to require the VMM to map all guest memory PROT_MTE is
>> significant as it means that the VMM has to deal with the MTE tags even
>> if it doesn't care about them (e.g. for virtual devices or if the VMM
>> doesn't support migration). Also unfortunately because the VMM can
>> change the memory layout at any time the check for PROT_MTE/VM_MTE has
>> to be done very late (at the point of faulting pages into stage 2).
> 
> I'm a bit dubious about requiring the VMM to map the guest memory
> PROT_MTE unless somebody's done at least a sketch of the design
> for how this would work on the QEMU side. Currently QEMU just
> assumes the guest memory is guest memory and it can access it
> without special precautions...

I agree this needs some investigation - I'm hoping Haibo will be able to 
provide some feedback here as he has been looking at the QEMU support. 
However the VMM is likely going to require some significant changes to 
ensure that migration doesn't break, so either way there's work to be done.

Fundamentally most memory will need a mapping with PROT_MTE just so the 
VMM can get at the tags for migration purposes, so QEMU is going to have 
to learn how to treat guest memory specially if it wants to be able to 
enable MTE for both itself and the guest.

I'll also hunt down what's happening with my attempts to fix the 
set_pte_at() handling for swap and I'll post that as an alternative if 
it turns out to be a reasonable approach. But I don't think that solves
the QEMU issue above.

The other alternative would be to implement a new kernel interface to 
fetch tags from the guest and not require the VMM to maintain a PROT_MTE 
mapping. But we need some real feedback from someone familiar with QEMU 
to know what that interface should look like. So I'm holding off on that 
until there's a 'real' PoC implementation.

Thanks,

Steve

Re: [PATCH v5 0/2] MTE support for KVM guest

Posted by Peter Maydell 4 days, 14 hours ago
On Thu, 19 Nov 2020 at 15:57, Steven Price <steven.price@arm.com> wrote:
> On 19/11/2020 15:45, Peter Maydell wrote:
> > I'm a bit dubious about requiring the VMM to map the guest memory
> > PROT_MTE unless somebody's done at least a sketch of the design
> > for how this would work on the QEMU side. Currently QEMU just
> > assumes the guest memory is guest memory and it can access it
> > without special precautions...
>
> I agree this needs some investigation - I'm hoping Haibo will be able to
> provide some feedback here as he has been looking at the QEMU support.
> However the VMM is likely going to require some significant changes to
> ensure that migration doesn't break, so either way there's work to be done.
>
> Fundamentally most memory will need a mapping with PROT_MTE just so the
> VMM can get at the tags for migration purposes, so QEMU is going to have
> to learn how to treat guest memory specially if it wants to be able to
> enable MTE for both itself and the guest.

If the only reason the VMM needs tag access is for migration it
feels like there must be a nicer way to do it than by
requiring it to map the whole of the guest address space twice
(once for normal use and once to get the tags)...

Anyway, maybe "must map PROT_MTE" is workable, but it seems
a bit premature to fix the kernel ABI as working that way
until we are at least reasonably sure that it is the right
design.

thanks
-- PMM