Adhere to the requirements of KVM Userfault:
1. When KVM_MEM_USERFAULT is toggled on, zap the second stage with
kvm_arch_flush_shadow_memslot(), so that userfault-ness is respected
for memory that was already mapped.
2. When KVM_MEM_USERFAULT is enabled, restrict new second-stage mappings
to PAGE_SIZE, just like when dirty logging is enabled.
Do not zap the second stage when KVM_MEM_USERFAULT is disabled, to remain
consistent with the behavior when dirty logging is disabled.
Signed-off-by: James Houghton <jthoughton@google.com>
---
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/mmu.c | 26 +++++++++++++++++++++++++-
2 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index ead632ad01b4..d89b4088b580 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -38,6 +38,7 @@ menuconfig KVM
select HAVE_KVM_VCPU_RUN_PID_CHANGE
select SCHED_INFO
select GUEST_PERF_EVENTS if PERF_EVENTS
+ select HAVE_KVM_USERFAULT
help
Support hosting virtualized guest machines.
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c9d46ad57e52..e099bdcfac42 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1493,7 +1493,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* logging_active is guaranteed to never be true for VM_PFNMAP
* memslots.
*/
- if (logging_active) {
+ if (logging_active || kvm_memslot_userfault(memslot)) {
force_pte = true;
vma_shift = PAGE_SHIFT;
} else {
@@ -1582,6 +1582,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
mmu_seq = vcpu->kvm->mmu_invalidate_seq;
mmap_read_unlock(current->mm);
+ if (kvm_gfn_userfault(kvm, memslot, gfn)) {
+ kvm_prepare_memory_fault_exit(vcpu, gfn << PAGE_SHIFT,
+ PAGE_SIZE, write_fault,
+ exec_fault, false, true);
+ return -EFAULT;
+ }
+
pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
&writable, &page);
if (pfn == KVM_PFN_ERR_HWPOISON) {
@@ -2073,6 +2080,23 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
enum kvm_mr_change change)
{
bool log_dirty_pages = new && new->flags & KVM_MEM_LOG_DIRTY_PAGES;
+ u32 new_flags = new ? new->flags : 0;
+ u32 changed_flags = (new_flags) ^ (old ? old->flags : 0);
+
+ /*
+ * If KVM_MEM_USERFAULT has been enabled, drop all the stage-2 mappings
+ * so that we can respect userfault-ness.
+ */
+ if ((changed_flags & KVM_MEM_USERFAULT) &&
+ (new_flags & KVM_MEM_USERFAULT) &&
+ change == KVM_MR_FLAGS_ONLY)
+ kvm_arch_flush_shadow_memslot(kvm, old);
+
+ /*
+ * Nothing left to do if not toggling dirty logging.
+ */
+ if (!(changed_flags & KVM_MEM_LOG_DIRTY_PAGES))
+ return;
/*
* At this point memslot has been committed and there is an
--
2.47.1.613.gc27f4b7a9f-goog
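The two rules in the commit message, together with the matching hunks in user_mem_abort() and kvm_arch_commit_memory_region(), can be modeled outside the kernel as two small predicates. This is a simplified sketch, not the kernel code: the flag bit positions, the enum, and the helper names are hypothetical stand-ins, 4 KiB base pages are assumed, and the real fault path tests logging_active (derived from the memslot) rather than the raw flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for KVM_MEM_LOG_DIRTY_PAGES and KVM_MEM_USERFAULT;
 * the bit positions are illustrative, not the uAPI values.
 */
#define LOG_DIRTY_PAGES (1u << 0)
#define USERFAULT       (1u << 3)

/* Hypothetical stand-in for enum kvm_mr_change. */
enum mr_change { MR_CREATE, MR_DELETE, MR_MOVE, MR_FLAGS_ONLY };

/* Assumed 4 KiB base pages (PAGE_SHIFT == 12). */
#define BASE_PAGE_SHIFT 12u

/*
 * Rule 1: zap stage-2 only when an existing slot turns USERFAULT on.
 * Turning it off does not zap, matching how dirty logging is handled.
 */
static bool userfault_needs_zap(uint32_t old_flags, uint32_t new_flags,
				enum mr_change change)
{
	uint32_t changed = old_flags ^ new_flags;

	return change == MR_FLAGS_ONLY && (changed & USERFAULT) &&
	       (new_flags & USERFAULT);
}

/*
 * Rule 2: while dirty logging or userfault is on, new stage-2 mappings are
 * forced down to base-page size; otherwise the VMA-derived shift is kept.
 */
static unsigned int mapping_shift(uint32_t slot_flags, unsigned int vma_shift)
{
	/* A VMA-derived shift is never smaller than the base page shift. */
	assert(vma_shift >= BASE_PAGE_SHIFT);

	if (slot_flags & (LOG_DIRTY_PAGES | USERFAULT))
		return BASE_PAGE_SHIFT;
	return vma_shift;
}
```

With these definitions, userfault_needs_zap(0, USERFAULT, MR_FLAGS_ONLY) is true while the reverse transition is false, matching the "do not zap when disabled" rule.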
On Thu, Jan 09, 2025, James Houghton wrote:
> @@ -2073,6 +2080,23 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
> enum kvm_mr_change change)
> {
> bool log_dirty_pages = new && new->flags & KVM_MEM_LOG_DIRTY_PAGES;
> + u32 new_flags = new ? new->flags : 0;
> + u32 changed_flags = (new_flags) ^ (old ? old->flags : 0);
This is a bit hard to read, and there's only one use of log_dirty_pages. With
zapping handled in common KVM, just do:
@@ -2127,14 +2131,19 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
const struct kvm_memory_slot *new,
enum kvm_mr_change change)
{
- bool log_dirty_pages = new && new->flags & KVM_MEM_LOG_DIRTY_PAGES;
+ u32 old_flags = old ? old->flags : 0;
+ u32 new_flags = new ? new->flags : 0;
+
+ /* Nothing to do if not toggling dirty logging. */
+ if (!((old_flags ^ new_flags) & KVM_MEM_LOG_DIRTY_PAGES))
+ return;
/*
* At this point memslot has been committed and there is an
* allocated dirty_bitmap[], dirty pages will be tracked while the
* memory slot is write protected.
*/
- if (log_dirty_pages) {
+ if (new_flags & KVM_MEM_LOG_DIRTY_PAGES) {
if (change == KVM_MR_DELETE)
return;
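The proposed early return relies on a NULL slot contributing no flags, so the XOR alone tells whether dirty logging changes state. A minimal sketch of just that check, assuming a hypothetical `struct slot` that carries only the flags field and an illustrative bit position for the dirty-logging flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for KVM_MEM_LOG_DIRTY_PAGES; bit is illustrative. */
#define LOG_DIRTY_PAGES (1u << 0)

/* Hypothetical stand-in for struct kvm_memory_slot; only flags matter. */
struct slot {
	uint32_t flags;
};

/*
 * Mirrors the proposed early return. A NULL slot (@old on create, @new on
 * delete) contributes no flags, so the XOR reports whether dirty logging
 * changes state across the update.
 */
static bool toggles_dirty_logging(const struct slot *old,
				  const struct slot *new)
{
	uint32_t old_flags = old ? old->flags : 0;
	uint32_t new_flags = new ? new->flags : 0;

	return (old_flags ^ new_flags) & LOG_DIRTY_PAGES;
}
```

Deleting a slot that still has dirty logging enabled passes this check, because @new contributes no flags; whether any work follows then depends on the rest of kvm_arch_commit_memory_region(), which is only partly shown in this thread.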
On Tue, May 6, 2025 at 8:06 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Jan 09, 2025, James Houghton wrote:
> > @@ -2073,6 +2080,23 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
> > enum kvm_mr_change change)
> > {
> > bool log_dirty_pages = new && new->flags & KVM_MEM_LOG_DIRTY_PAGES;
> > + u32 new_flags = new ? new->flags : 0;
> > + u32 changed_flags = (new_flags) ^ (old ? old->flags : 0);
>
> This is a bit hard to read, and there's only one use of log_dirty_pages. With
> zapping handled in common KVM, just do:
Thanks, Sean. Yeah, what you have below looks a lot better; thanks for
applying it for me. I'll post a new version soon. One note below.
>
> @@ -2127,14 +2131,19 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
> const struct kvm_memory_slot *new,
> enum kvm_mr_change change)
> {
> - bool log_dirty_pages = new && new->flags & KVM_MEM_LOG_DIRTY_PAGES;
> + u32 old_flags = old ? old->flags : 0;
> + u32 new_flags = new ? new->flags : 0;
> +
> + /* Nothing to do if not toggling dirty logging. */
> + if (!((old_flags ^ new_flags) & KVM_MEM_LOG_DIRTY_PAGES))
> + return;
This is my bug, not yours, but I think this condition must also check
that `change == KVM_MR_FLAGS_ONLY` for it to be correct. This, for
example, will break the case where we are deleting a memslot that
still has KVM_MEM_LOG_DIRTY_PAGES enabled. Will fix in the next
version.
On Wed, May 28, 2025 at 11:09 AM James Houghton <jthoughton@google.com> wrote:
> This is my bug, not yours, but I think this condition must also check
> that `change == KVM_MR_FLAGS_ONLY` for it to be correct. This, for
> example, will break the case where we are deleting a memslot that
> still has KVM_MEM_LOG_DIRTY_PAGES enabled. Will fix in the next
> version.
Ah it wouldn't break that example, as `new` would be NULL. But I think
it would break the case where we are moving a memslot that keeps
`KVM_MEM_LOG_DIRTY_PAGES`.
On Wed, May 28, 2025, James Houghton wrote:
> Ah it wouldn't break that example, as `new` would be NULL. But I think
> it would break the case where we are moving a memslot that keeps
> `KVM_MEM_LOG_DIRTY_PAGES`.
Can you elaborate? Maybe with the full snippet of the final code that's broken.
I'm not entirely following which path you're referring to.
On Wed, May 28, 2025 at 1:30 PM Sean Christopherson <seanjc@google.com> wrote:
> Can you elaborate? Maybe with the full snippet of the final code that's broken.
> I'm not entirely following which path you're referring to.
This is even more broken than I realized.
I mean that this diff should be applied on top of your patch:
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 5e2ccde66f43c..f1db3f7742b28 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2134,8 +2134,12 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
u32 old_flags = old ? old->flags : 0;
u32 new_flags = new ? new->flags : 0;
- /* Nothing to do if not toggling dirty logging. */
- if (!((old_flags ^ new_flags) & KVM_MEM_LOG_DIRTY_PAGES))
+ /*
+ * If only changing flags, nothing to do if not toggling
+ * dirty logging.
+ */
+ if (change == KVM_MR_FLAGS_ONLY &&
+ !((old_flags ^ new_flags) & KVM_MEM_LOG_DIRTY_PAGES))
return;
/*
So the final commit looks like:
commit 3c4b57b25b1123629c5f2b64065d51ecdadb6771
Author: James Houghton <jthoughton@google.com>
Date: Tue May 6 15:38:31 2025 -0700
KVM: arm64: Add support for KVM userfault exits
<to be written by James>
Signed-off-by: James Houghton <jthoughton@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c5d21bcfa3ed4..f1db3f7742b28 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2127,15 +2131,23 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
const struct kvm_memory_slot *new,
enum kvm_mr_change change)
{
- bool log_dirty_pages = new && new->flags & KVM_MEM_LOG_DIRTY_PAGES;
+ u32 old_flags = old ? old->flags : 0;
+ u32 new_flags = new ? new->flags : 0;
+
+ /*
+ * If only changing flags, nothing to do if not toggling
+ * dirty logging.
+ */
+ if (change == KVM_MR_FLAGS_ONLY &&
+ !((old_flags ^ new_flags) & KVM_MEM_LOG_DIRTY_PAGES))
+ return;
/*
* At this point memslot has been committed and there is an
* allocated dirty_bitmap[], dirty pages will be tracked while the
* memory slot is write protected.
*/
- if (log_dirty_pages) {
-
+ if (new_flags & KVM_MEM_LOG_DIRTY_PAGES) {
if (change == KVM_MR_DELETE)
return;
So we need to bail out early if we are enabling KVM_MEM_USERFAULT but
KVM_MEM_LOG_DIRTY_PAGES is already enabled; otherwise we'll be
write-protecting a bunch of PTEs that we don't need or want to WP.
When *disabling* KVM_MEM_USERFAULT, we definitely don't want to WP
things, as we aren't going to get the unmap afterwards anyway.
So the check we started with handles this:
> > > > + u32 old_flags = old ? old->flags : 0;
> > > > + u32 new_flags = new ? new->flags : 0;
> > > > +
> > > > + /* Nothing to do if not toggling dirty logging. */
> > > > + if (!((old_flags ^ new_flags) & KVM_MEM_LOG_DIRTY_PAGES))
> > > > + return;
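How the XOR-only check treats the KVM_MEM_USERFAULT toggles described above can be sketched for flags-only updates. This is a hypothetical model: the flag bits are illustrative, and "reaches the write-protect path" is taken to mean entering the `new_flags & KVM_MEM_LOG_DIRTY_PAGES` branch shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for KVM_MEM_LOG_DIRTY_PAGES / KVM_MEM_USERFAULT. */
#define LOG_DIRTY_PAGES (1u << 0)
#define USERFAULT       (1u << 3)

/*
 * For a flags-only update, report whether the XOR-based early return lets
 * execution reach the "new_flags & KVM_MEM_LOG_DIRTY_PAGES" branch, i.e.
 * the write-protect path.
 */
static bool flags_only_reaches_wp(uint32_t old_flags, uint32_t new_flags)
{
	/* Dirty logging not toggled: bail out before touching stage-2. */
	if (!((old_flags ^ new_flags) & LOG_DIRTY_PAGES))
		return false;

	/* Toggled: only the enabling direction takes the write-protect branch. */
	return new_flags & LOG_DIRTY_PAGES;
}
```

Enabling or disabling KVM_MEM_USERFAULT while dirty logging stays on (or stays off) never reaches the write-protect branch under this model; only turning dirty logging on does.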
So why also check `change == KVM_MR_FLAGS_ONLY`? Everything I just
said doesn't really apply when the memslot is being created, moved, or
destroyed. Without that check, consider the cases where the dirty-logging
bit is not toggled:
- Memslot deletion would be totally broken; we'll see that
KVM_MEM_LOG_DIRTY_PAGES is not getting toggled and then bail out, skipping
some freeing.
- Memslot creation would be broken in a similar way; we'll skip a bunch of
setup work.
- For memslot moving (the only case where KVM_MEM_LOG_DIRTY_PAGES could
remain set without the change being KVM_MR_FLAGS_ONLY), I think we still
need to do the split and WP work.
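The disagreement comes down to which changes skip the rest of the function. A hypothetical side-by-side model of the two candidate early-return conditions (the helper names, enum, and flag bits are illustrative stand-ins, not kernel code) makes the divergence explicit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for KVM_MEM_LOG_DIRTY_PAGES / KVM_MEM_USERFAULT. */
#define LOG_DIRTY_PAGES (1u << 0)
#define USERFAULT       (1u << 3)

/* Hypothetical stand-in for enum kvm_mr_change. */
enum mr_change { MR_CREATE, MR_DELETE, MR_MOVE, MR_FLAGS_ONLY };

/*
 * Flags are passed pre-resolved: old_flags is 0 on create and new_flags is
 * 0 on delete, as "old ? old->flags : 0" would produce.
 */

/* Current version: skip the rest whenever dirty logging is not toggled. */
static bool bails_xor_only(uint32_t old_flags, uint32_t new_flags,
			   enum mr_change change)
{
	(void)change; /* the change type is not consulted */
	return !((old_flags ^ new_flags) & LOG_DIRTY_PAGES);
}

/* Proposed version: apply the same test only to flags-only updates. */
static bool bails_flags_only_gated(uint32_t old_flags, uint32_t new_flags,
				   enum mr_change change)
{
	return change == MR_FLAGS_ONLY &&
	       !((old_flags ^ new_flags) & LOG_DIRTY_PAGES);
}
```

The two models disagree exactly when the change is not KVM_MR_FLAGS_ONLY and the dirty-logging bit is unchanged. Whether bailing out in those cases actually skips needed work depends on the memslot lifecycle outside this function.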
On Wed, May 28, 2025, James Houghton wrote:
> So why also check `change == KVM_MR_FLAGS_ONLY`? Everything I just
> said doesn't really apply when the memslot is being created, moved, or
> destroyed. Without that check, consider the cases where the dirty-logging
> bit is not toggled:
>
> - Memslot deletion would be totally broken; we'll see that
> KVM_MEM_LOG_DIRTY_PAGES is not getting toggled and then bail out, skipping
> some freeing.
No, because @new and thus new_flags will be 0. If dirty logging wasn't enabled,
then there's nothing to be done.
> - Memslot creation would be broken in a similar way; we'll skip a bunch of
> setup work.
No, because @old and thus old_flags will be 0. If dirty logging isn't being
enabled, then there's nothing to be done.
> - For memslot moving (the only case where KVM_MEM_LOG_DIRTY_PAGES could
> remain set without the change being KVM_MR_FLAGS_ONLY), I think we still
> need to do the split and WP work.
No, because KVM invokes kvm_arch_flush_shadow_memslot() on the memslot and marks
it invalid prior to installing the new, moved memslot. See kvm_invalidate_memslot().
So I'm still not seeing what's buggy.
On Wed, May 28, 2025 at 4:25 PM Sean Christopherson <seanjc@google.com> wrote:
> So I'm still not seeing what's buggy.
Sorry, I didn't see your reply, Sean. :(
You're right; I was confusing KVM_MEM_USERFAULT with
KVM_MEM_LOG_DIRTY_PAGES. I'll undo the little change I said I was
going to make.
Thank you!