[PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering

Posted by Keir Fraser 2 months, 3 weeks ago
In preparation to remove synchronize_srcu() from MMIO registration,
remove the distributor's dependency on this implicit barrier by
direct acquire-release synchronization on the flag write and its
lock-free check.

Signed-off-by: Keir Fraser <keirf@google.com>
---
 arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 502b65049703..bc83672e461b 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
 	gpa_t dist_base;
 	int ret = 0;
 
-	if (likely(dist->ready))
+	if (likely(smp_load_acquire(&dist->ready)))
 		return 0;
 
 	mutex_lock(&kvm->slots_lock);
@@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
 		goto out_slots;
 	}
 
-	/*
-	 * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
-	 * registration before returning through synchronize_srcu(), which also
-	 * implies a full memory barrier. As such, marking the distributor as
-	 * 'ready' here is guaranteed to be ordered after all vCPUs having seen
-	 * a completely configured distributor.
-	 */
-	dist->ready = true;
+	smp_store_release(&dist->ready, true);
 	goto out_slots;
 out:
 	mutex_unlock(&kvm->arch.config_lock);
-- 
2.50.0.727.gbf7dc18ff4-goog
Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
Posted by Yao Yuan 2 months, 3 weeks ago
On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> In preparation to remove synchronize_srcu() from MMIO registration,
> remove the distributor's dependency on this implicit barrier by
> direct acquire-release synchronization on the flag write and its
> lock-free check.
>
> Signed-off-by: Keir Fraser <keirf@google.com>
> ---
>  arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> index 502b65049703..bc83672e461b 100644
> --- a/arch/arm64/kvm/vgic/vgic-init.c
> +++ b/arch/arm64/kvm/vgic/vgic-init.c
> @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
>  	gpa_t dist_base;
>  	int ret = 0;
>
> -	if (likely(dist->ready))
> +	if (likely(smp_load_acquire(&dist->ready)))
>  		return 0;
>
>  	mutex_lock(&kvm->slots_lock);
> @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
>  		goto out_slots;
>  	}
>
> -	/*
> -	 * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> -	 * registration before returning through synchronize_srcu(), which also
> -	 * implies a full memory barrier. As such, marking the distributor as
> -	 * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> -	 * a completely configured distributor.
> -	 */
> -	dist->ready = true;
> +	smp_store_release(&dist->ready, true);

No need for the store-release and load-acquire when replacing
synchronize_srcu_expedited() w/ call_srcu(), IIUC:

Tree SRCU on SMP:
call_srcu()
  __call_srcu()
    srcu_gp_start_if_needed()
      __srcu_read_unlock_nmisafe()
        #ifdef CONFIG_NEED_SRCU_NMI_SAFE
          smp_mb__before_atomic() // __smp_mb() on ARM64, does nothing on x86.
        #else
          __srcu_read_unlock()
            smp_mb()
        #endif

TINY SRCU on UP:
Should have no memory ordering issue on UP.

>  	goto out_slots;
>  out:
>  	mutex_unlock(&kvm->arch.config_lock);
> --
> 2.50.0.727.gbf7dc18ff4-goog
>
Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
Posted by Sean Christopherson 2 months, 2 weeks ago
On Thu, Jul 17, 2025, Yao Yuan wrote:
> On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > In preparation to remove synchronize_srcu() from MMIO registration,
> > remove the distributor's dependency on this implicit barrier by
> > direct acquire-release synchronization on the flag write and its
> > lock-free check.
> >
> > Signed-off-by: Keir Fraser <keirf@google.com>
> > ---
> >  arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> >  1 file changed, 2 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > index 502b65049703..bc83672e461b 100644
> > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> >  	gpa_t dist_base;
> >  	int ret = 0;
> >
> > -	if (likely(dist->ready))
> > +	if (likely(smp_load_acquire(&dist->ready)))
> >  		return 0;
> >
> >  	mutex_lock(&kvm->slots_lock);
> > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> >  		goto out_slots;
> >  	}
> >
> > -	/*
> > -	 * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > -	 * registration before returning through synchronize_srcu(), which also
> > -	 * implies a full memory barrier. As such, marking the distributor as
> > -	 * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > -	 * a completely configured distributor.
> > -	 */
> > -	dist->ready = true;
> > +	smp_store_release(&dist->ready, true);
> 
> No need for the store-release and load-acquire when replacing
> synchronize_srcu_expedited() w/ call_srcu(), IIUC:

This isn't about using call_srcu(), because it's not actually about kvm->buses.
This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered
before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true"
with a half-baked distributor.

In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in
kvm_io_bus_register_dev() to provide the ordering guarantees.  Switching to
smp_store_release() + smp_load_acquire() removes the dependency on the
synchronize_srcu() so that the synchronize_srcu() call can be safely removed.
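
To make the pairing concrete, here is a minimal userspace C11 sketch of
the same message-passing idiom (hypothetical stand-in names; in the
kernel code, smp_store_release()/smp_load_acquire() play these roles):

#include <stdatomic.h>
#include <stdbool.h>

struct dist { int cfg; atomic_bool ready; };   /* stand-in for vgic_dist */

/* Writer (kvm_vgic_map_resources() analogue): every store to the
 * distributor state is ordered before the store that sets ->ready. */
void publish(struct dist *d)
{
        d->cfg = 42;                            /* configure everything */
        atomic_store_explicit(&d->ready, true,  /* smp_store_release()  */
                              memory_order_release);
}

/* Reader (vCPU analogue): the acquire pairs with the release above, so
 * any reader that observes ready == true also observes the full cfg. */
bool consume(struct dist *d, int *out)
{
        if (!atomic_load_explicit(&d->ready,    /* smp_load_acquire()   */
                                  memory_order_acquire))
                return false;
        *out = d->cfg;                          /* never half-baked     */
        return true;
}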
Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
Posted by Yao Yuan 2 months, 2 weeks ago
On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote:
> On Thu, Jul 17, 2025, Yao Yuan wrote:
> > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > remove the distributor's dependency on this implicit barrier by
> > > direct acquire-release synchronization on the flag write and its
> > > lock-free check.
> > >
> > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > ---
> > >  arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > >  1 file changed, 2 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > index 502b65049703..bc83672e461b 100644
> > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > >  	gpa_t dist_base;
> > >  	int ret = 0;
> > >
> > > -	if (likely(dist->ready))
> > > +	if (likely(smp_load_acquire(&dist->ready)))
> > >  		return 0;
> > >
> > >  	mutex_lock(&kvm->slots_lock);
> > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > >  		goto out_slots;
> > >  	}
> > >
> > > -	/*
> > > -	 * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > -	 * registration before returning through synchronize_srcu(), which also
> > > -	 * implies a full memory barrier. As such, marking the distributor as
> > > -	 * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > -	 * a completely configured distributor.
> > > -	 */
> > > -	dist->ready = true;
> > > +	smp_store_release(&dist->ready, true);
> >
> > No need for the store-release and load-acquire when replacing
> > synchronize_srcu_expedited() w/ call_srcu(), IIUC:
>
> This isn't about using call_srcu(), because it's not actually about kvm->buses.
> This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered
> before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true"
> with a half-baked distributor.
>
> In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in
> kvm_io_bus_register_dev() to provide the ordering guarantees.  Switching to
> smp_store_release() + smp_load_acquire() removes the dependency on the
> synchronize_srcu() so that the synchronize_srcu() call can be safely removed.

Yes, I understand this and agree with your point.

Just for discussion: I thought it should also work even w/o
introducing the load-acquire + store-release after switching to
call_srcu(): the smp_mb() in call_srcu() orders all the stores
to kvm->arch.vgic before the store to kvm->arch.vgic.ready in
the current implementation.

>
Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
Posted by Keir Fraser 2 months, 2 weeks ago
On Sat, Jul 19, 2025 at 10:15:56AM +0800, Yao Yuan wrote:
> On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote:
> > On Thu, Jul 17, 2025, Yao Yuan wrote:
> > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > > remove the distributor's dependency on this implicit barrier by
> > > > direct acquire-release synchronization on the flag write and its
> > > > lock-free check.
> > > >
> > > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > > ---
> > > >  arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > > >  1 file changed, 2 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > > index 502b65049703..bc83672e461b 100644
> > > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > >  	gpa_t dist_base;
> > > >  	int ret = 0;
> > > >
> > > > -	if (likely(dist->ready))
> > > > +	if (likely(smp_load_acquire(&dist->ready)))
> > > >  		return 0;
> > > >
> > > >  	mutex_lock(&kvm->slots_lock);
> > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > >  		goto out_slots;
> > > >  	}
> > > >
> > > > -	/*
> > > > -	 * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > > -	 * registration before returning through synchronize_srcu(), which also
> > > > -	 * implies a full memory barrier. As such, marking the distributor as
> > > > -	 * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > > -	 * a completely configured distributor.
> > > > -	 */
> > > > -	dist->ready = true;
> > > > +	smp_store_release(&dist->ready, true);
> > >
> > > No need for the store-release and load-acquire when replacing
> > > synchronize_srcu_expedited() w/ call_srcu(), IIUC:
> >
> > This isn't about using call_srcu(), because it's not actually about kvm->buses.
> > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered
> > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true"
> > with a half-baked distributor.
> >
> > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in
> > kvm_io_bus_register_dev() to provide the ordering guarantees.  Switching to
> > smp_store_release() + smp_load_acquire() removes the dependency on the
> > synchronize_srcu() so that the synchronize_srcu() call can be safely removed.
> 
> Yes, I understand this and agree with your point.
> 
> Just for discussion: I thought it should also work even w/o
> introducing the load-acquire + store-release after switching to
> call_srcu(): the smp_mb() in call_srcu() orders all the stores
> to kvm->arch.vgic before the store to kvm->arch.vgic.ready in
> the current implementation.

The load-acquire would still be required, to ensure that accesses to
kvm->arch.vgic do not get reordered earlier than the lock-free check
of kvm->arch.vgic.ready. Otherwise that CPU could see that the vgic is
initialised, but then use speculated reads of uninitialised vgic state.
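
A rough sketch of the hazard and the fix (hypothetical names, with
userspace C11 atomics standing in for the kernel primitives):

#include <stdatomic.h>

static int cfg;                 /* stands in for vgic_dist state */
static _Atomic _Bool ready;

static int reader_broken(void)
{
        /* Plain/relaxed flag read: the CPU may satisfy the read of
         * cfg below with a value speculated before the flag check,
         * even though it observed ready == true. */
        if (atomic_load_explicit(&ready, memory_order_relaxed))
                return cfg;     /* may be uninitialised junk */
        return -1;
}

static int reader_fixed(void)
{
        /* The acquire forbids hoisting the cfg read above the check. */
        if (atomic_load_explicit(&ready, memory_order_acquire))
                return cfg;     /* ordered after the flag read */
        return -1;
}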

> >
Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
Posted by Yao Yuan 2 months, 2 weeks ago
On Sat, Jul 19, 2025 at 07:58:19AM +0000, Keir Fraser wrote:
> On Sat, Jul 19, 2025 at 10:15:56AM +0800, Yao Yuan wrote:
> > On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote:
> > > On Thu, Jul 17, 2025, Yao Yuan wrote:
> > > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > > > remove the distributor's dependency on this implicit barrier by
> > > > > direct acquire-release synchronization on the flag write and its
> > > > > lock-free check.
> > > > >
> > > > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > > > ---
> > > > >  arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > > > >  1 file changed, 2 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > index 502b65049703..bc83672e461b 100644
> > > > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > >  	gpa_t dist_base;
> > > > >  	int ret = 0;
> > > > >
> > > > > -	if (likely(dist->ready))
> > > > > +	if (likely(smp_load_acquire(&dist->ready)))
> > > > >  		return 0;
> > > > >
> > > > >  	mutex_lock(&kvm->slots_lock);
> > > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > >  		goto out_slots;
> > > > >  	}
> > > > >
> > > > > -	/*
> > > > > -	 * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > > > -	 * registration before returning through synchronize_srcu(), which also
> > > > > -	 * implies a full memory barrier. As such, marking the distributor as
> > > > > -	 * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > > > -	 * a completely configured distributor.
> > > > > -	 */
> > > > > -	dist->ready = true;
> > > > > +	smp_store_release(&dist->ready, true);
> > > >
> > > > No need for the store-release and load-acquire when replacing
> > > > synchronize_srcu_expedited() w/ call_srcu(), IIUC:
> > >
> > > This isn't about using call_srcu(), because it's not actually about kvm->buses.
> > > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered
> > > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true"
> > > with a half-baked distributor.
> > >
> > > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in
> > > kvm_io_bus_register_dev() to provide the ordering guarantees.  Switching to
> > > smp_store_release() + smp_load_acquire() removes the dependency on the
> > > synchronize_srcu() so that the synchronize_srcu() call can be safely removed.
> >
> > Yes, I understand this and agree with your point.
> >
> > Just for discussion: I thought it should also work even w/o
> > introducing the load-acquire + store-release after switching to
> > call_srcu(): the smp_mb() in call_srcu() orders all the stores
> > to kvm->arch.vgic before the store to kvm->arch.vgic.ready in
> > the current implementation.
>
> The load-acquire would still be required, to ensure that accesses to
> kvm->arch.vgic do not get reordered earlier than the lock-free check
> of kvm->arch.vgic.ready. Otherwise that CPU could see that the vgic is
> initialised, but then use speculated reads of uninitialised vgic state.
>

Thanks for your explanation.

I see. But there's "mutex_lock(&kvm->slots_lock);" before the later
accesses to kvm->arch.vgic, so I think the order can be
guaranteed. Of course, as you said, an explicit load-acquire +
store-release is better than the previous implicit implementation.

> > >
Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
Posted by Keir Fraser 2 months, 2 weeks ago
On Sun, Jul 20, 2025 at 08:08:30AM +0800, Yao Yuan wrote:
> On Sat, Jul 19, 2025 at 07:58:19AM +0000, Keir Fraser wrote:
> > On Sat, Jul 19, 2025 at 10:15:56AM +0800, Yao Yuan wrote:
> > > On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote:
> > > > On Thu, Jul 17, 2025, Yao Yuan wrote:
> > > > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > > > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > > > > remove the distributor's dependency on this implicit barrier by
> > > > > > direct acquire-release synchronization on the flag write and its
> > > > > > lock-free check.
> > > > > >
> > > > > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > > > > ---
> > > > > >  arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > > > > >  1 file changed, 2 insertions(+), 9 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > index 502b65049703..bc83672e461b 100644
> > > > > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > >  	gpa_t dist_base;
> > > > > >  	int ret = 0;
> > > > > >
> > > > > > -	if (likely(dist->ready))
> > > > > > +	if (likely(smp_load_acquire(&dist->ready)))
> > > > > >  		return 0;
> > > > > >
> > > > > >  	mutex_lock(&kvm->slots_lock);
> > > > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > >  		goto out_slots;
> > > > > >  	}
> > > > > >
> > > > > > -	/*
> > > > > > -	 * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > > > > -	 * registration before returning through synchronize_srcu(), which also
> > > > > > -	 * implies a full memory barrier. As such, marking the distributor as
> > > > > > -	 * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > > > > -	 * a completely configured distributor.
> > > > > > -	 */
> > > > > > -	dist->ready = true;
> > > > > > +	smp_store_release(&dist->ready, true);
> > > > >
> > > > > No need for the store-release and load-acquire when replacing
> > > > > synchronize_srcu_expedited() w/ call_srcu(), IIUC:
> > > >
> > > > This isn't about using call_srcu(), because it's not actually about kvm->buses.
> > > > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered
> > > > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true"
> > > > with a half-baked distributor.
> > > >
> > > > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in
> > > > kvm_io_bus_register_dev() to provide the ordering guarantees.  Switching to
> > > > smp_store_release() + smp_load_acquire() removes the dependency on the
> > > > synchronize_srcu() so that the synchronize_srcu() call can be safely removed.
> > >
> > > Yes, I understand this and agree with your point.
> > >
> > > Just for discussion: I thought it should also work even w/o
> > > introducing the load-acquire + store-release after switching to
> > > call_srcu(): the smp_mb() in call_srcu() orders all the stores
> > > to kvm->arch.vgic before the store to kvm->arch.vgic.ready in
> > > the current implementation.
> >
> > The load-acquire would still be required, to ensure that accesses to
> > kvm->arch.vgic do not get reordered earlier than the lock-free check
> > of kvm->arch.vgic.ready. Otherwise that CPU could see that the vgic is
> > initialised, but then use speculated reads of uninitialised vgic state.
> >
> 
> Thanks for your explanation.
> 
> I see. But there's "mutex_lock(&kvm->slots_lock);" before the later
> accesses to kvm->arch.vgic, so I think the order can be
> guaranteed. Of course, as you said, an explicit load-acquire +
> store-release is better than the previous implicit implementation.

If vgic_dist::ready is observed true by the lock-free read (the one
which is turned into load-acquire by this patch) then the function
immediately returns with no mutex_lock() executed. It is reads of
vgic_dist *after* return from kvm_vgic_map_resources() that you have
to worry about, and which require load-acquire semantics.
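
In shape, something like this simplified stand-in (not the actual
kernel code):

#include <stdatomic.h>
#include <stdbool.h>

struct dist { int nr_spis; atomic_bool ready; };

static bool map_resources(struct dist *d)
{
        /* Fast path: when ready is observed true we return at once,
         * without ever taking slots_lock, so this acquire is the only
         * thing that orders the caller's subsequent reads. */
        if (atomic_load_explicit(&d->ready, memory_order_acquire))
                return true;
        /* slow path (locking, registration) elided */
        return false;
}

static int vcpu_first_run(struct dist *d)
{
        if (!map_resources(d))
                return -1;
        return d->nr_spis;      /* safe: ordered after the acquire */
}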

> 
> > > >
Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
Posted by Yao Yuan 2 months, 2 weeks ago
On Mon, Jul 21, 2025 at 07:49:10AM +0800, Keir Fraser wrote:
> On Sun, Jul 20, 2025 at 08:08:30AM +0800, Yao Yuan wrote:
> > On Sat, Jul 19, 2025 at 07:58:19AM +0000, Keir Fraser wrote:
> > > On Sat, Jul 19, 2025 at 10:15:56AM +0800, Yao Yuan wrote:
> > > > On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote:
> > > > > On Thu, Jul 17, 2025, Yao Yuan wrote:
> > > > > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > > > > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > > > > > remove the distributor's dependency on this implicit barrier by
> > > > > > > direct acquire-release synchronization on the flag write and its
> > > > > > > lock-free check.
> > > > > > >
> > > > > > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > > > > > ---
> > > > > > >  arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > > > > > >  1 file changed, 2 insertions(+), 9 deletions(-)
> > > > > > >
> > > > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > index 502b65049703..bc83672e461b 100644
> > > > > > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > > >  	gpa_t dist_base;
> > > > > > >  	int ret = 0;
> > > > > > >
> > > > > > > -	if (likely(dist->ready))
> > > > > > > +	if (likely(smp_load_acquire(&dist->ready)))
> > > > > > >  		return 0;
> > > > > > >
> > > > > > >  	mutex_lock(&kvm->slots_lock);
> > > > > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > > >  		goto out_slots;
> > > > > > >  	}
> > > > > > >
> > > > > > > -	/*
> > > > > > > -	 * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > > > > > -	 * registration before returning through synchronize_srcu(), which also
> > > > > > > -	 * implies a full memory barrier. As such, marking the distributor as
> > > > > > > -	 * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > > > > > -	 * a completely configured distributor.
> > > > > > > -	 */
> > > > > > > -	dist->ready = true;
> > > > > > > +	smp_store_release(&dist->ready, true);
> > > > > >
> > > > > > No need for the store-release and load-acquire when replacing
> > > > > > synchronize_srcu_expedited() w/ call_srcu(), IIUC:
> > > > >
> > > > > This isn't about using call_srcu(), because it's not actually about kvm->buses.
> > > > > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered
> > > > > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true"
> > > > > with a half-baked distributor.
> > > > >
> > > > > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in
> > > > > kvm_io_bus_register_dev() to provide the ordering guarantees.  Switching to
> > > > > smp_store_release() + smp_load_acquire() removes the dependency on the
> > > > > synchronize_srcu() so that the synchronize_srcu() call can be safely removed.
> > > >
> > > > Yes, I understand this and agree with your point.
> > > >
> > > > Just for discussion: I thought it should also work even w/o
> > > > introducing the load-acquire + store-release after switching to
> > > > call_srcu(): the smp_mb() in call_srcu() orders all the stores
> > > > to kvm->arch.vgic before the store to kvm->arch.vgic.ready in
> > > > the current implementation.
> > >
> > > The load-acquire would still be required, to ensure that accesses to
> > > kvm->arch.vgic do not get reordered earlier than the lock-free check
> > > of kvm->arch.vgic.ready. Otherwise that CPU could see that the vgic is
> > > initialised, but then use speculated reads of uninitialised vgic state.
> > >
> >
> > Thanks for your explanation.
> >
> > I see. But there's "mutex_lock(&kvm->slots_lock);" before the later
> > accesses to kvm->arch.vgic, so I think the order can be
> > guaranteed. Of course, as you said, an explicit load-acquire +
> > store-release is better than the previous implicit implementation.
>
> If vgic_dist::ready is observed true by the lock-free read (the one
> which is turned into load-acquire by this patch) then the function
> immediately returns with no mutex_lock() executed. It is reads of
> vgic_dist *after* return from kvm_vgic_map_resources() that you have
> to worry about, and which require load-acquire semantics.

I think this is the main purpose of the lock-free read
here: to avoid lock contention on a VM w/ many vCPUs when
the vCPUs all run for the first time together.

The store-release makes sure the change to vgic_dist::ready
becomes visible only after the changes to vgic_dist become
visible, but it doesn't guarantee that vgic_dist::ready
becomes visible to a reader on another CPU **IMMEDIATELY**;
the load-acquire on the reader side is required for that.
Is the above understanding correct?

>
> >
> > > > >
Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
Posted by Yao Yuan 2 months, 2 weeks ago
On Tue, Jul 22, 2025 at 10:01:27AM +0800, Yao Yuan wrote:
> On Mon, Jul 21, 2025 at 07:49:10AM +0800, Keir Fraser wrote:
> > On Sun, Jul 20, 2025 at 08:08:30AM +0800, Yao Yuan wrote:
> > > On Sat, Jul 19, 2025 at 07:58:19AM +0000, Keir Fraser wrote:
> > > > On Sat, Jul 19, 2025 at 10:15:56AM +0800, Yao Yuan wrote:
> > > > > On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote:
> > > > > > On Thu, Jul 17, 2025, Yao Yuan wrote:
> > > > > > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > > > > > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > > > > > > remove the distributor's dependency on this implicit barrier by
> > > > > > > > direct acquire-release synchronization on the flag write and its
> > > > > > > > lock-free check.
> > > > > > > >
> > > > > > > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > > > > > > ---
> > > > > > > >  arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > > > > > > >  1 file changed, 2 insertions(+), 9 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > > index 502b65049703..bc83672e461b 100644
> > > > > > > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > > > >  	gpa_t dist_base;
> > > > > > > >  	int ret = 0;
> > > > > > > >
> > > > > > > > -	if (likely(dist->ready))
> > > > > > > > +	if (likely(smp_load_acquire(&dist->ready)))
> > > > > > > >  		return 0;
> > > > > > > >
> > > > > > > >  	mutex_lock(&kvm->slots_lock);
> > > > > > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > > > >  		goto out_slots;
> > > > > > > >  	}
> > > > > > > >
> > > > > > > > -	/*
> > > > > > > > -	 * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > > > > > > -	 * registration before returning through synchronize_srcu(), which also
> > > > > > > > -	 * implies a full memory barrier. As such, marking the distributor as
> > > > > > > > -	 * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > > > > > > -	 * a completely configured distributor.
> > > > > > > > -	 */
> > > > > > > > -	dist->ready = true;
> > > > > > > > +	smp_store_release(&dist->ready, true);
> > > > > > >
> > > > > > > No need for the store-release and load-acquire when replacing
> > > > > > > synchronize_srcu_expedited() w/ call_srcu(), IIUC:
> > > > > >
> > > > > > This isn't about using call_srcu(), because it's not actually about kvm->buses.
> > > > > > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered
> > > > > > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true"
> > > > > > with a half-baked distributor.
> > > > > >
> > > > > > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in
> > > > > > kvm_io_bus_register_dev() to provide the ordering guarantees.  Switching to
> > > > > > smp_store_release() + smp_load_acquire() removes the dependency on the
> > > > > > synchronize_srcu() so that the synchronize_srcu() call can be safely removed.
> > > > >
> > > > > Yes, I understand this and agree with your point.
> > > > >
> > > > > Just for discussion: I thought it should also work even w/o
> > > > > introducing the load-acquire + store-release after switching to
> > > > > call_srcu(): the smp_mb() in call_srcu() orders all the stores
> > > > > to kvm->arch.vgic before the store to kvm->arch.vgic.ready in
> > > > > the current implementation.
> > > >
> > > > The load-acquire would still be required, to ensure that accesses to
> > > > kvm->arch.vgic do not get reordered earlier than the lock-free check
> > > > of kvm->arch.vgic.ready. Otherwise that CPU could see that the vgic is
> > > > initialised, but then use speculated reads of uninitialised vgic state.
> > > >
> > >
> > > Thanks for your explanation.
> > >
> > > I see. But there's "mutex_lock(&kvm->slots_lock);" before the later
> > > accesses to kvm->arch.vgic, so I think the order can be
> > > guaranteed. Of course, as you said, an explicit load-acquire +
> > > store-release is better than the previous implicit implementation.
> >
> > If vgic_dist::ready is observed true by the lock-free read (the one
> > which is turned into load-acquire by this patch) then the function
> > immediately returns with no mutex_lock() executed. It is reads of
> > vgic_dist *after* return from kvm_vgic_map_resources() that you have
> > to worry about, and which require load-acquire semantics.
>
> I think this is the main purpose of the lock-free read
> here: to avoid lock contention on a VM w/ many vCPUs when
> the vCPUs all run for the first time together.
>
> The store-release makes sure the change to vgic_dist::ready
> becomes visible only after the changes to vgic_dist become
> visible, but it doesn't guarantee that vgic_dist::ready
> becomes visible to a reader on another CPU **IMMEDIATELY**;
> the load-acquire on the reader side is required for that.
> Is the above understanding correct?

No. I get your point: the load-acquire is here because:

There's a code path where the CPU could reorder reads of
vgic_dist before the read of vgic_dist::ready when it
observes vgic_dist::ready as true, w/o the load-acquire here.

>
> >
> > >
> > > > > >
Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
Posted by Keir Fraser 2 months, 2 weeks ago
On Thu, Jul 17, 2025 at 01:44:48PM +0800, Yao Yuan wrote:
> On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > In preparation to remove synchronize_srcu() from MMIO registration,
> > remove the distributor's dependency on this implicit barrier by
> > direct acquire-release synchronization on the flag write and its
> > lock-free check.
> >
> > Signed-off-by: Keir Fraser <keirf@google.com>
> > ---
> >  arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> >  1 file changed, 2 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > index 502b65049703..bc83672e461b 100644
> > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> >  	gpa_t dist_base;
> >  	int ret = 0;
> >
> > -	if (likely(dist->ready))
> > +	if (likely(smp_load_acquire(&dist->ready)))
> >  		return 0;
> >
> >  	mutex_lock(&kvm->slots_lock);
> > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> >  		goto out_slots;
> >  	}
> >
> > -	/*
> > -	 * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > -	 * registration before returning through synchronize_srcu(), which also
> > -	 * implies a full memory barrier. As such, marking the distributor as
> > -	 * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > -	 * a completely configured distributor.
> > -	 */
> > -	dist->ready = true;
> > +	smp_store_release(&dist->ready, true);
> 
> No need for the store-release and load-acquire when replacing
> synchronize_srcu_expedited() w/ call_srcu(), IIUC:
> 
> Tree SRCU on SMP:
> call_srcu()
>   __call_srcu()
>     srcu_gp_start_if_needed()
>       __srcu_read_unlock_nmisafe()
>         #ifdef CONFIG_NEED_SRCU_NMI_SAFE
>           smp_mb__before_atomic() // __smp_mb() on ARM64, does nothing on x86.
>         #else
>           __srcu_read_unlock()
>             smp_mb()
>         #endif

I don't think it's nice to depend on an implementation detail of
kvm_io_bus_register_dev() and, transitively, on implementation details
of call_srcu().

kvm_vgic_map_resources() isn't called that often and can afford its
own synchronization.

 -- Keir

> TINY SRCU on UP:
> Should have no memory ordering issue on UP.
> 
> >  	goto out_slots;
> >  out:
> >  	mutex_unlock(&kvm->arch.config_lock);
> > --
> > 2.50.0.727.gbf7dc18ff4-goog
> >
Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
Posted by Keir Fraser 2 months, 2 weeks ago
On Fri, Jul 18, 2025 at 02:53:42PM +0000, Keir Fraser wrote:
> On Thu, Jul 17, 2025 at 01:44:48PM +0800, Yao Yuan wrote:
> > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > remove the distributor's dependency on this implicit barrier by
> > > direct acquire-release synchronization on the flag write and its
> > > lock-free check.
> > >
> > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > ---
> > >  arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > >  1 file changed, 2 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > index 502b65049703..bc83672e461b 100644
> > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > >  	gpa_t dist_base;
> > >  	int ret = 0;
> > >
> > > -	if (likely(dist->ready))
> > > +	if (likely(smp_load_acquire(&dist->ready)))
> > >  		return 0;
> > >
> > >  	mutex_lock(&kvm->slots_lock);
> > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > >  		goto out_slots;
> > >  	}
> > >
> > > -	/*
> > > -	 * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > -	 * registration before returning through synchronize_srcu(), which also
> > > -	 * implies a full memory barrier. As such, marking the distributor as
> > > -	 * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > -	 * a completely configured distributor.
> > > -	 */
> > > -	dist->ready = true;
> > > +	smp_store_release(&dist->ready, true);
> > 
> > No need for the store-release and load-acquire when replacing
> > synchronize_srcu_expedited() w/ call_srcu(), IIUC:
> > 
> > Tree SRCU on SMP:
> > call_srcu()
> >   __call_srcu()
> >     srcu_gp_start_if_needed()
> >       __srcu_read_unlock_nmisafe()
> >         #ifdef CONFIG_NEED_SRCU_NMI_SAFE
> >           smp_mb__before_atomic() // __smp_mb() on ARM64, does nothing on x86.
> >         #else
> >           __srcu_read_unlock()
> >             smp_mb()
> >         #endif
> 
> I don't think it's nice to depend on an implementation detail of
> kvm_io_bus_register_dev() and, transitively, on implementation details
> of call_srcu().

Also I should note that this is moot because the smp_mb() would *not*
safely replace the load-acquire.
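
A litmus-style sketch of why (hypothetical, in userspace C11 terms): a
full barrier on the writer side orders the stores, but with only a
plain load on the reader side nothing stops the data read from being
hoisted above the flag read:

#include <stdatomic.h>

static int data;
static _Atomic int flag;

static void writer(void)
{
        data = 1;
        atomic_thread_fence(memory_order_seq_cst);  /* the smp_mb() */
        atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

static int reader(void)
{
        if (atomic_load_explicit(&flag, memory_order_relaxed))
                return data;    /* may still observe 0 on a weak CPU */
        return -1;
}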

> kvm_vgic_map_resources() isn't called that often and can afford its
> own synchronization.
> 
>  -- Keir
> 
> > TINY SRCU on UP:
> > Should have no memory ordering issue on UP.
> > 
> > >  	goto out_slots;
> > >  out:
> > >  	mutex_unlock(&kvm->arch.config_lock);
> > > --
> > > 2.50.0.727.gbf7dc18ff4-goog
> > >
Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
Posted by Yao Yuan 2 months, 2 weeks ago
On Fri, Jul 18, 2025 at 03:25:46PM +0000, Keir Fraser wrote:
> On Fri, Jul 18, 2025 at 02:53:42PM +0000, Keir Fraser wrote:
> > On Thu, Jul 17, 2025 at 01:44:48PM +0800, Yao Yuan wrote:
> > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > > remove the distributor's dependency on this implicit barrier by
> > > > direct acquire-release synchronization on the flag write and its
> > > > lock-free check.
> > > >
> > > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > > ---
> > > >  arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > > >  1 file changed, 2 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > > index 502b65049703..bc83672e461b 100644
> > > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > >  	gpa_t dist_base;
> > > >  	int ret = 0;
> > > >
> > > > -	if (likely(dist->ready))
> > > > +	if (likely(smp_load_acquire(&dist->ready)))
> > > >  		return 0;
> > > >
> > > >  	mutex_lock(&kvm->slots_lock);
> > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > >  		goto out_slots;
> > > >  	}
> > > >
> > > > -	/*
> > > > -	 * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > > -	 * registration before returning through synchronize_srcu(), which also
> > > > -	 * implies a full memory barrier. As such, marking the distributor as
> > > > -	 * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > > -	 * a completely configured distributor.
> > > > -	 */
> > > > -	dist->ready = true;
> > > > +	smp_store_release(&dist->ready, true);
> > >
> > > No need for the store-release and load-acquire when replacing
> > > synchronize_srcu_expedited() w/ call_srcu(), IIUC:
> > >
> > > Tree SRCU on SMP:
> > > call_srcu()
> > >   __call_srcu()
> > >     srcu_gp_start_if_needed()
> > >       __srcu_read_unlock_nmisafe()
> > >         #ifdef CONFIG_NEED_SRCU_NMI_SAFE
> > >           smp_mb__before_atomic() // __smp_mb() on ARM64, does nothing on x86.
> > >         #else
> > >           __srcu_read_unlock()
> > >             smp_mb()
> > >         #endif
> >
> > I don't think it's nice to depend on an implementation detail of
> > kvm_io_bus_register_dev() and, transitively, on implementation details
> > of call_srcu().

This is a good point, I agree with you.

>
> Also I should note that this is moot because the smp_mb() would *not*
> safely replace the load-acquire.

Hmm.. do you mean it can't order the write to dist->ready
here against a read of it in another thread at the same time?

>
> > kvm_vgic_map_resources() isn't called that often and can afford its
> > own synchronization.
> >
> >  -- Keir
> >
> > > TINY SRCU on UP:
> > > Should have no memory ordering issue on UP.
> > >
> > > >  	goto out_slots;
> > > >  out:
> > > >  	mutex_unlock(&kvm->arch.config_lock);
> > > > --
> > > > 2.50.0.727.gbf7dc18ff4-goog
> > > >
>
Re: [PATCH v2 2/4] KVM: arm64: vgic: Explicitly implement vgic_dist::ready ordering
Posted by Sean Christopherson 2 months, 2 weeks ago
On Fri, Jul 18, 2025, Keir Fraser wrote:
> On Thu, Jul 17, 2025 at 01:44:48PM +0800, Yao Yuan wrote:
> > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > remove the distributor's dependency on this implicit barrier by
> > > direct acquire-release synchronization on the flag write and its
> > > lock-free check.
> > >
> > > Signed-off-by: Keir Fraser <keirf@google.com>
> > > ---
> > >  arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > >  1 file changed, 2 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > index 502b65049703..bc83672e461b 100644
> > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > >  	gpa_t dist_base;
> > >  	int ret = 0;
> > >
> > > -	if (likely(dist->ready))
> > > +	if (likely(smp_load_acquire(&dist->ready)))
> > >  		return 0;
> > >
> > >  	mutex_lock(&kvm->slots_lock);
> > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > >  		goto out_slots;
> > >  	}
> > >
> > > -	/*
> > > -	 * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > -	 * registration before returning through synchronize_srcu(), which also
> > > -	 * implies a full memory barrier. As such, marking the distributor as
> > > -	 * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > -	 * a completely configured distributor.
> > > -	 */
> > > -	dist->ready = true;
> > > +	smp_store_release(&dist->ready, true);
> > 
> > No need for the store-release and load-acquire when replacing
> > synchronize_srcu_expedited() w/ call_srcu(), IIUC:
> > 
> > Tree SRCU on SMP:
> > call_srcu()
> >   __call_srcu()
> >     srcu_gp_start_if_needed()
> >       __srcu_read_unlock_nmisafe()
> >         #ifdef CONFIG_NEED_SRCU_NMI_SAFE
> >           smp_mb__before_atomic() // __smp_mb() on ARM64, does nothing on x86.
> >         #else
> >           __srcu_read_unlock()
> >             smp_mb()
> >         #endif
> 
> I don't think it's nice to depend on an implementation detail of
> kvm_io_bus_register_dev() and, transitively, on implementation details
> of call_srcu().
> 
> kvm_vgic_map_resources() isn't called that often and can afford its
> own synchronization.

+1