[PATCH v8 02/11] KVM: Add lockless memslot walk to KVM

James Houghton posted 11 patches 1 year, 3 months ago
There is a newer version of this series
[PATCH v8 02/11] KVM: Add lockless memslot walk to KVM
Posted by James Houghton 1 year, 3 months ago
Provide flexibility to the architecture to synchronize as optimally as
they can instead of always taking the MMU lock for writing.

Architectures that do their own locking must select
CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS.

The immediate application is to allow architectures to implement the
test/clear_young MMU notifiers more cheaply.

Suggested-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/Kconfig         |  2 ++
 virt/kvm/kvm_main.c      | 28 +++++++++++++++++++++-------
 3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 18a1672ffcbf..ab0318dbb8bd 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -260,6 +260,7 @@ struct kvm_gfn_range {
 	gfn_t end;
 	union kvm_mmu_notifier_arg arg;
 	bool may_block;
+	bool lockless;
 };
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 54e959e7d68f..b50e4e629ac9 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -102,6 +102,8 @@ config KVM_GENERIC_MMU_NOTIFIER
 
 config KVM_ELIDE_TLB_FLUSH_IF_YOUNG
        depends on KVM_GENERIC_MMU_NOTIFIER
+
+config KVM_MMU_NOTIFIER_YOUNG_LOCKLESS
        bool
 
 config KVM_GENERIC_MEMORY_ATTRIBUTES
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b234a9acdb3..218edf037917 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -517,6 +517,7 @@ struct kvm_mmu_notifier_range {
 	on_lock_fn_t on_lock;
 	bool flush_on_ret;
 	bool may_block;
+	bool lockless;
 };
 
 /*
@@ -571,6 +572,10 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
 			 IS_KVM_NULL_FN(range->handler)))
 		return r;
 
+	/* on_lock will never be called for lockless walks */
+	if (WARN_ON_ONCE(range->lockless && !IS_KVM_NULL_FN(range->on_lock)))
+		return r;
+
 	idx = srcu_read_lock(&kvm->srcu);
 
 	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
@@ -602,15 +607,18 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
 			gfn_range.start = hva_to_gfn_memslot(hva_start, slot);
 			gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot);
 			gfn_range.slot = slot;
+			gfn_range.lockless = range->lockless;
 
 			if (!r.found_memslot) {
 				r.found_memslot = true;
-				KVM_MMU_LOCK(kvm);
-				if (!IS_KVM_NULL_FN(range->on_lock))
-					range->on_lock(kvm);
-
-				if (IS_KVM_NULL_FN(range->handler))
-					goto mmu_unlock;
+				if (!range->lockless) {
+					KVM_MMU_LOCK(kvm);
+					if (!IS_KVM_NULL_FN(range->on_lock))
+						range->on_lock(kvm);
+
+					if (IS_KVM_NULL_FN(range->handler))
+						goto mmu_unlock;
+				}
 			}
 			r.ret |= range->handler(kvm, &gfn_range);
 		}
@@ -620,7 +628,7 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
 		kvm_flush_remote_tlbs(kvm);
 
 mmu_unlock:
-	if (r.found_memslot)
+	if (r.found_memslot && !range->lockless)
 		KVM_MMU_UNLOCK(kvm);
 
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -797,6 +805,8 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
 		.flush_on_ret	=
 			!IS_ENABLED(CONFIG_KVM_ELIDE_TLB_FLUSH_IF_YOUNG),
 		.may_block	= false,
+		.lockless	=
+			IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS),
 	};
 
 	trace_kvm_age_hva(start, end);
@@ -817,6 +827,8 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
 		.on_lock	= (void *)kvm_null_fn,
 		.flush_on_ret	= false,
 		.may_block	= false,
+		.lockless	=
+			IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS),
 	};
 
 	trace_kvm_age_hva(start, end);
@@ -849,6 +861,8 @@ static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
 		.on_lock	= (void *)kvm_null_fn,
 		.flush_on_ret	= false,
 		.may_block	= false,
+		.lockless	=
+			IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS),
 	};
 
 	trace_kvm_test_age_hva(address);
-- 
2.47.0.199.ga7371fff76-goog
Re: [PATCH v8 02/11] KVM: Add lockless memslot walk to KVM
Posted by Sean Christopherson 1 year ago
On Tue, Nov 05, 2024, James Houghton wrote:
> Provide flexibility to the architecture to synchronize as optimally as

Similar to my nit on "locking" below, "synchronize" is somewhat misleading.  There's
no requirement for architectures to synchronize during aging.  There is all but
guaranteed to be some form of "synchronization", e.g. to prevent walking freed
page tables, but the aging itself never synchronizes, and protecting a walk by
disabling IRQs (as x86's shadow MMU does in some flows) only "synchronizes" in a
loose sense of the word.

> they can instead of always taking the MMU lock for writing.
> 
> Architectures that do their own locking must select
> CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS.

This is backwards and could be misleading, and for the TDP MMU outright wrong.
If some hypothetical architecture took _additional_ locks, then it can do so
without needing to select CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS.

What you want to say is that architectures that select
CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS are responsible for ensuring correctness.
And it's specifically all about correctness.  Common KVM doesn't care if the
arch does it's own locking, e.g. taking mmu_lock for read, or has some completely
lock-free scheme for aging.

> The immediate application is to allow architectures to implement the

"immediate application" is pretty redundant.  There's really only one reason to
not take mmu_lock in this path.

> test/clear_young MMU notifiers more cheaply.

IMO, "more cheaply" is vague, and doesn't add much beyond the opening sentence
about "synchronize as optimally as they can".  I would just delete this last
sentence.

> Suggested-by: Yu Zhao <yuzhao@google.com>
> Signed-off-by: James Houghton <jthoughton@google.com>
> Reviewed-by: David Matlack <dmatlack@google.com>
> ---
> @@ -797,6 +805,8 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
>  		.flush_on_ret	=
>  			!IS_ENABLED(CONFIG_KVM_ELIDE_TLB_FLUSH_IF_YOUNG),
>  		.may_block	= false,
> +		.lockless	=
> +			IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS),

		.lockess	= IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS),

Random thought, maybe CONFIG_KVM_MMU_NOTIFIER_AGING_LOCKLESS or
CONFIG_KVM_MMU_NOTIFIER_AGE_LOCKLESS instead of "YOUNG"?

>  	};
>  
>  	trace_kvm_age_hva(start, end);
> @@ -817,6 +827,8 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
>  		.on_lock	= (void *)kvm_null_fn,
>  		.flush_on_ret	= false,
>  		.may_block	= false,
> +		.lockless	=
> +			IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS),
>  	};
>  
>  	trace_kvm_age_hva(start, end);
> @@ -849,6 +861,8 @@ static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
>  		.on_lock	= (void *)kvm_null_fn,
>  		.flush_on_ret	= false,
>  		.may_block	= false,
> +		.lockless	=
> +			IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS),
>  	};
>  
>  	trace_kvm_test_age_hva(address);
> -- 
> 2.47.0.199.ga7371fff76-goog
>
Re: [PATCH v8 02/11] KVM: Add lockless memslot walk to KVM
Posted by James Houghton 1 year ago
On Fri, Jan 10, 2025 at 2:26 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Nov 05, 2024, James Houghton wrote:
> > Provide flexibility to the architecture to synchronize as optimally as
>
> Similar to my nit on "locking" below, "synchronize" is somewhat misleading.  There's
> no requirement for architectures to synchronize during aging.  There is all but
> guaranteed to be some form of "synchronization", e.g. to prevent walking freed
> page tables, but the aging itself never synchronizes, and protecting a walk by
> disabling IRQs (as x86's shadow MMU does in some flows) only "synchronizes" in a
> loose sense of the word.
>
> > they can instead of always taking the MMU lock for writing.
> >
> > Architectures that do their own locking must select
> > CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS.
>
> This is backwards and could be misleading, and for the TDP MMU outright wrong.
> If some hypothetical architecture took _additional_ locks, then it can do so
> without needing to select CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS.
>
> What you want to say is that architectures that select
> CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS are responsible for ensuring correctness.
> And it's specifically all about correctness.  Common KVM doesn't care if the
> arch does it's own locking, e.g. taking mmu_lock for read, or has some completely
> lock-free scheme for aging.
>
> > The immediate application is to allow architectures to implement the
>
> "immediate application" is pretty redundant.  There's really only one reason to
> not take mmu_lock in this path.
>
> > test/clear_young MMU notifiers more cheaply.
>
> IMO, "more cheaply" is vague, and doesn't add much beyond the opening sentence
> about "synchronize as optimally as they can".  I would just delete this last
> sentence.

Thanks Sean. I've reworded the changelog like this:

    It is possible to correctly do aging without taking the KVM MMU lock;
    this option allows such architectures to do so. Architectures that
    select CONFIG_KVM_MMU_NOTIFIER_AGING_LOCKLESS are responsible for
    ensuring correctness.

>
> > Suggested-by: Yu Zhao <yuzhao@google.com>
> > Signed-off-by: James Houghton <jthoughton@google.com>
> > Reviewed-by: David Matlack <dmatlack@google.com>
> > ---
> > @@ -797,6 +805,8 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
> >               .flush_on_ret   =
> >                       !IS_ENABLED(CONFIG_KVM_ELIDE_TLB_FLUSH_IF_YOUNG),
> >               .may_block      = false,
> > +             .lockless       =
> > +                     IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS),
>
>                 .lockess        = IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_YOUNG_LOCKLESS),
>
> Random thought, maybe CONFIG_KVM_MMU_NOTIFIER_AGING_LOCKLESS or
> CONFIG_KVM_MMU_NOTIFIER_AGE_LOCKLESS instead of "YOUNG"?

CONFIG_KVM_MMU_NOTIFIER_AGING_LOCKLESS sounds good. Changed.