In preparation for removing synchronize_srcu() from MMIO registration,
remove the distributor's dependency on this implicit barrier by using
direct acquire-release synchronization on the flag write and its
lock-free check.
Signed-off-by: Keir Fraser <keirf@google.com>
---
arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 502b65049703..bc83672e461b 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
gpa_t dist_base;
int ret = 0;
- if (likely(dist->ready))
+ if (likely(smp_load_acquire(&dist->ready)))
return 0;
mutex_lock(&kvm->slots_lock);
@@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
goto out_slots;
}
- /*
- * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
- * registration before returning through synchronize_srcu(), which also
- * implies a full memory barrier. As such, marking the distributor as
- * 'ready' here is guaranteed to be ordered after all vCPUs having seen
- * a completely configured distributor.
- */
- dist->ready = true;
+ smp_store_release(&dist->ready, true);
goto out_slots;
out:
mutex_unlock(&kvm->arch.config_lock);
--
2.50.0.727.gbf7dc18ff4-goog
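For readers who want the acquire/release publication idiom the patch adopts
spelled out, here is a minimal sketch of the pattern. The struct and function
names below are illustrative stand-ins, not the actual vgic code; only
smp_store_release() and smp_load_acquire() are the real kernel primitives
used in the diff above.

  #include <asm/barrier.h>	/* smp_store_release(), smp_load_acquire() */
  #include <linux/types.h>

  struct dist_state {
  	int cfg_a;
  	int cfg_b;
  	bool ready;
  };

  /*
   * Writer side (kvm_vgic_map_resources() in the real code): fully
   * configure the state, then publish it with a release store on the flag.
   */
  static void publish_dist(struct dist_state *d)
  {
  	d->cfg_a = 1;
  	d->cfg_b = 2;
  	/* All stores above are ordered before 'ready' becoming true. */
  	smp_store_release(&d->ready, true);
  }

  /*
   * Reader side (a vCPU): the acquire load pairs with the release store,
   * so observing ready == true guarantees the configuration stores are
   * visible as well.
   */
  static bool dist_is_ready(struct dist_state *d)
  {
  	return smp_load_acquire(&d->ready);
  }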
On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> In preparation to remove synchronize_srcu() from MMIO registration,
> remove the distributor's dependency on this implicit barrier by
> direct acquire-release synchronization on the flag write and its
> lock-free check.
>
> Signed-off-by: Keir Fraser <keirf@google.com>
> ---
>  arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> index 502b65049703..bc83672e461b 100644
> --- a/arch/arm64/kvm/vgic/vgic-init.c
> +++ b/arch/arm64/kvm/vgic/vgic-init.c
> @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
>  	gpa_t dist_base;
>  	int ret = 0;
>
> -	if (likely(dist->ready))
> +	if (likely(smp_load_acquire(&dist->ready)))
>  		return 0;
>
>  	mutex_lock(&kvm->slots_lock);
> @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
>  		goto out_slots;
>  	}
>
> -	/*
> -	 * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> -	 * registration before returning through synchronize_srcu(), which also
> -	 * implies a full memory barrier. As such, marking the distributor as
> -	 * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> -	 * a completely configured distributor.
> -	 */
> -	dist->ready = true;
> +	smp_store_release(&dist->ready, true);

No need for the store-release and load-acquire when replacing
synchronize_srcu_expedited() with call_srcu(), IIUC:

Tree SRCU on SMP:
  call_srcu()
    __call_srcu()
      srcu_gp_start_if_needed()
        __srcu_read_unlock_nmisafe()
        #ifdef CONFIG_NEED_SRCU_NMI_SAFE
          smp_mb__before_atomic() // __smp_mb() on ARM64, do nothing on x86.
        #else
          __srcu_read_unlock()
            smp_mb()
        #endif

Tiny SRCU on UP:
  Should have no memory ordering issue on UP.

> 	goto out_slots;
> out:
> 	mutex_unlock(&kvm->arch.config_lock);
> --
> 2.50.0.727.gbf7dc18ff4-goog
>
On Thu, Jul 17, 2025, Yao Yuan wrote: > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote: > > In preparation to remove synchronize_srcu() from MMIO registration, > > remove the distributor's dependency on this implicit barrier by > > direct acquire-release synchronization on the flag write and its > > lock-free check. > > > > Signed-off-by: Keir Fraser <keirf@google.com> > > --- > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++--------- > > 1 file changed, 2 insertions(+), 9 deletions(-) > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c > > index 502b65049703..bc83672e461b 100644 > > --- a/arch/arm64/kvm/vgic/vgic-init.c > > +++ b/arch/arm64/kvm/vgic/vgic-init.c > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > gpa_t dist_base; > > int ret = 0; > > > > - if (likely(dist->ready)) > > + if (likely(smp_load_acquire(&dist->ready))) > > return 0; > > > > mutex_lock(&kvm->slots_lock); > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > goto out_slots; > > } > > > > - /* > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO > > - * registration before returning through synchronize_srcu(), which also > > - * implies a full memory barrier. As such, marking the distributor as > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen > > - * a completely configured distributor. > > - */ > > - dist->ready = true; > > + smp_store_release(&dist->ready, true); > > No need the store-release and load-acquire for replacing > synchronize_srcu_expedited() w/ call_srcu() IIUC: This isn't about using call_srcu(), because it's not actually about kvm->buses. This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true" with a half-baked distributor. In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in kvm_io_bus_register_dev() to provide the ordering guarantees. Switching to smp_store_release() + smp_load_acquire() removes the dependency on the synchronize_srcu() so that the synchronize_srcu() call can be safely removed.
On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote: > On Thu, Jul 17, 2025, Yao Yuan wrote: > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote: > > > In preparation to remove synchronize_srcu() from MMIO registration, > > > remove the distributor's dependency on this implicit barrier by > > > direct acquire-release synchronization on the flag write and its > > > lock-free check. > > > > > > Signed-off-by: Keir Fraser <keirf@google.com> > > > --- > > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++--------- > > > 1 file changed, 2 insertions(+), 9 deletions(-) > > > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c > > > index 502b65049703..bc83672e461b 100644 > > > --- a/arch/arm64/kvm/vgic/vgic-init.c > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > gpa_t dist_base; > > > int ret = 0; > > > > > > - if (likely(dist->ready)) > > > + if (likely(smp_load_acquire(&dist->ready))) > > > return 0; > > > > > > mutex_lock(&kvm->slots_lock); > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > goto out_slots; > > > } > > > > > > - /* > > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO > > > - * registration before returning through synchronize_srcu(), which also > > > - * implies a full memory barrier. As such, marking the distributor as > > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen > > > - * a completely configured distributor. > > > - */ > > > - dist->ready = true; > > > + smp_store_release(&dist->ready, true); > > > > No need the store-release and load-acquire for replacing > > synchronize_srcu_expedited() w/ call_srcu() IIUC: > > This isn't about using call_srcu(), because it's not actually about kvm->buses. > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true" > with a half-baked distributor. > > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in > kvm_io_bus_register_dev() to provide the ordering guarantees. Switching to > smp_store_release() + smp_load_acquire() removes the dependency on the > synchronize_srcu() so that the synchronize_srcu() call can be safely removed. Yes, I understand this and agree with your point. Just for discusstion: I thought it should also work even w/o introduce the load acqure + store release after switch to call_srcu(): The smp_mb() in call_srcu() order the all store to kvm->arch.vgic before store kvm->arch.vgic.ready in current implementation. >
On Sat, Jul 19, 2025 at 10:15:56AM +0800, Yao Yuan wrote: > On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote: > > On Thu, Jul 17, 2025, Yao Yuan wrote: > > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote: > > > > In preparation to remove synchronize_srcu() from MMIO registration, > > > > remove the distributor's dependency on this implicit barrier by > > > > direct acquire-release synchronization on the flag write and its > > > > lock-free check. > > > > > > > > Signed-off-by: Keir Fraser <keirf@google.com> > > > > --- > > > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++--------- > > > > 1 file changed, 2 insertions(+), 9 deletions(-) > > > > > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c > > > > index 502b65049703..bc83672e461b 100644 > > > > --- a/arch/arm64/kvm/vgic/vgic-init.c > > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c > > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > > gpa_t dist_base; > > > > int ret = 0; > > > > > > > > - if (likely(dist->ready)) > > > > + if (likely(smp_load_acquire(&dist->ready))) > > > > return 0; > > > > > > > > mutex_lock(&kvm->slots_lock); > > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > > goto out_slots; > > > > } > > > > > > > > - /* > > > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO > > > > - * registration before returning through synchronize_srcu(), which also > > > > - * implies a full memory barrier. As such, marking the distributor as > > > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen > > > > - * a completely configured distributor. > > > > - */ > > > > - dist->ready = true; > > > > + smp_store_release(&dist->ready, true); > > > > > > No need the store-release and load-acquire for replacing > > > synchronize_srcu_expedited() w/ call_srcu() IIUC: > > > > This isn't about using call_srcu(), because it's not actually about kvm->buses. > > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered > > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true" > > with a half-baked distributor. > > > > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in > > kvm_io_bus_register_dev() to provide the ordering guarantees. Switching to > > smp_store_release() + smp_load_acquire() removes the dependency on the > > synchronize_srcu() so that the synchronize_srcu() call can be safely removed. > > Yes, I understand this and agree with your point. > > Just for discusstion: I thought it should also work even w/o > introduce the load acqure + store release after switch to > call_srcu(): The smp_mb() in call_srcu() order the all store > to kvm->arch.vgic before store kvm->arch.vgic.ready in > current implementation. The load-acquire would still be required, to ensure that accesses to kvm->arch.vgic do not get reordered earlier than the lock-free check of kvm->arch.vgic.ready. Otherwise that CPU could see that the vgic is initialised, but then use speculated reads of uninitialised vgic state. > >
On Sat, Jul 19, 2025 at 07:58:19AM +0000, Keir Fraser wrote: > On Sat, Jul 19, 2025 at 10:15:56AM +0800, Yao Yuan wrote: > > On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote: > > > On Thu, Jul 17, 2025, Yao Yuan wrote: > > > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote: > > > > > In preparation to remove synchronize_srcu() from MMIO registration, > > > > > remove the distributor's dependency on this implicit barrier by > > > > > direct acquire-release synchronization on the flag write and its > > > > > lock-free check. > > > > > > > > > > Signed-off-by: Keir Fraser <keirf@google.com> > > > > > --- > > > > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++--------- > > > > > 1 file changed, 2 insertions(+), 9 deletions(-) > > > > > > > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c > > > > > index 502b65049703..bc83672e461b 100644 > > > > > --- a/arch/arm64/kvm/vgic/vgic-init.c > > > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c > > > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > > > gpa_t dist_base; > > > > > int ret = 0; > > > > > > > > > > - if (likely(dist->ready)) > > > > > + if (likely(smp_load_acquire(&dist->ready))) > > > > > return 0; > > > > > > > > > > mutex_lock(&kvm->slots_lock); > > > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > > > goto out_slots; > > > > > } > > > > > > > > > > - /* > > > > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO > > > > > - * registration before returning through synchronize_srcu(), which also > > > > > - * implies a full memory barrier. As such, marking the distributor as > > > > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen > > > > > - * a completely configured distributor. > > > > > - */ > > > > > - dist->ready = true; > > > > > + smp_store_release(&dist->ready, true); > > > > > > > > No need the store-release and load-acquire for replacing > > > > synchronize_srcu_expedited() w/ call_srcu() IIUC: > > > > > > This isn't about using call_srcu(), because it's not actually about kvm->buses. > > > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered > > > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true" > > > with a half-baked distributor. > > > > > > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in > > > kvm_io_bus_register_dev() to provide the ordering guarantees. Switching to > > > smp_store_release() + smp_load_acquire() removes the dependency on the > > > synchronize_srcu() so that the synchronize_srcu() call can be safely removed. > > > > Yes, I understand this and agree with your point. > > > > Just for discusstion: I thought it should also work even w/o > > introduce the load acqure + store release after switch to > > call_srcu(): The smp_mb() in call_srcu() order the all store > > to kvm->arch.vgic before store kvm->arch.vgic.ready in > > current implementation. > > The load-acquire would still be required, to ensure that accesses to > kvm->arch.vgic do not get reordered earlier than the lock-free check > of kvm->arch.vgic.ready. Otherwise that CPU could see that the vgic is > initialised, but then use speculated reads of uninitialised vgic state. > Thanks for your explanation. I see. But there's "mutex_lock(&kvm->slot_lock);" before later acccessing to the kvm->arch.vgic, so I think the order can be guaranteed. 
Of course, as you said, an explicit acquire-load + store-release is better than the previous implicit implementation. > > >
On Sun, Jul 20, 2025 at 08:08:30AM +0800, Yao Yuan wrote: > On Sat, Jul 19, 2025 at 07:58:19AM +0000, Keir Fraser wrote: > > On Sat, Jul 19, 2025 at 10:15:56AM +0800, Yao Yuan wrote: > > > On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote: > > > > On Thu, Jul 17, 2025, Yao Yuan wrote: > > > > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote: > > > > > > In preparation to remove synchronize_srcu() from MMIO registration, > > > > > > remove the distributor's dependency on this implicit barrier by > > > > > > direct acquire-release synchronization on the flag write and its > > > > > > lock-free check. > > > > > > > > > > > > Signed-off-by: Keir Fraser <keirf@google.com> > > > > > > --- > > > > > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++--------- > > > > > > 1 file changed, 2 insertions(+), 9 deletions(-) > > > > > > > > > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c > > > > > > index 502b65049703..bc83672e461b 100644 > > > > > > --- a/arch/arm64/kvm/vgic/vgic-init.c > > > > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c > > > > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > > > > gpa_t dist_base; > > > > > > int ret = 0; > > > > > > > > > > > > - if (likely(dist->ready)) > > > > > > + if (likely(smp_load_acquire(&dist->ready))) > > > > > > return 0; > > > > > > > > > > > > mutex_lock(&kvm->slots_lock); > > > > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > > > > goto out_slots; > > > > > > } > > > > > > > > > > > > - /* > > > > > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO > > > > > > - * registration before returning through synchronize_srcu(), which also > > > > > > - * implies a full memory barrier. As such, marking the distributor as > > > > > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen > > > > > > - * a completely configured distributor. > > > > > > - */ > > > > > > - dist->ready = true; > > > > > > + smp_store_release(&dist->ready, true); > > > > > > > > > > No need the store-release and load-acquire for replacing > > > > > synchronize_srcu_expedited() w/ call_srcu() IIUC: > > > > > > > > This isn't about using call_srcu(), because it's not actually about kvm->buses. > > > > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered > > > > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true" > > > > with a half-baked distributor. > > > > > > > > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in > > > > kvm_io_bus_register_dev() to provide the ordering guarantees. Switching to > > > > smp_store_release() + smp_load_acquire() removes the dependency on the > > > > synchronize_srcu() so that the synchronize_srcu() call can be safely removed. > > > > > > Yes, I understand this and agree with your point. > > > > > > Just for discusstion: I thought it should also work even w/o > > > introduce the load acqure + store release after switch to > > > call_srcu(): The smp_mb() in call_srcu() order the all store > > > to kvm->arch.vgic before store kvm->arch.vgic.ready in > > > current implementation. > > > > The load-acquire would still be required, to ensure that accesses to > > kvm->arch.vgic do not get reordered earlier than the lock-free check > > of kvm->arch.vgic.ready. Otherwise that CPU could see that the vgic is > > initialised, but then use speculated reads of uninitialised vgic state. > > > > Thanks for your explanation. 
> > I see. But there's "mutex_lock(&kvm->slot_lock);" before later > acccessing to the kvm->arch.vgic, so I think the order can be > guaranteed. Of cause as you said a explicitly acquire-load + > store-release is better than before implicitly implementation. If vgic_dist::ready is observed true by the lock-free read (the one which is turned into load-acquire by this patch) then the function immediately returns with no mutex_lock() executed. It is reads of vgic_dist *after* return from kvm_vgic_map_resources() that you have to worry about, and which require load-acquire semantics. > > > > >
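To make the reordering hazard concrete, here is a vCPU-side consumer sketched
against the same illustrative dist_state names as the earlier sketch (again,
not the real vgic code). Without the acquire, the CPU may speculate the loads
of the configuration fields before the flag check, so even after observing
ready == true it could use half-initialised values.

  #include <linux/errno.h>

  static int vcpu_use_dist(struct dist_state *d)
  {
  	/* Lock-free fast path, mirroring kvm_vgic_map_resources(). */
  	if (!smp_load_acquire(&d->ready))
  		return -EAGAIN;	/* not configured yet */

  	/*
  	 * Safe: these loads cannot be reordered before the acquire load of
  	 * 'ready'. With a plain or READ_ONCE() load of the flag they could
  	 * be, because a control dependency orders later stores, not loads.
  	 */
  	return d->cfg_a + d->cfg_b;
  }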
On Mon, Jul 21, 2025 at 07:49:10AM +0800, Keir Fraser wrote: > On Sun, Jul 20, 2025 at 08:08:30AM +0800, Yao Yuan wrote: > > On Sat, Jul 19, 2025 at 07:58:19AM +0000, Keir Fraser wrote: > > > On Sat, Jul 19, 2025 at 10:15:56AM +0800, Yao Yuan wrote: > > > > On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote: > > > > > On Thu, Jul 17, 2025, Yao Yuan wrote: > > > > > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote: > > > > > > > In preparation to remove synchronize_srcu() from MMIO registration, > > > > > > > remove the distributor's dependency on this implicit barrier by > > > > > > > direct acquire-release synchronization on the flag write and its > > > > > > > lock-free check. > > > > > > > > > > > > > > Signed-off-by: Keir Fraser <keirf@google.com> > > > > > > > --- > > > > > > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++--------- > > > > > > > 1 file changed, 2 insertions(+), 9 deletions(-) > > > > > > > > > > > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c > > > > > > > index 502b65049703..bc83672e461b 100644 > > > > > > > --- a/arch/arm64/kvm/vgic/vgic-init.c > > > > > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c > > > > > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > > > > > gpa_t dist_base; > > > > > > > int ret = 0; > > > > > > > > > > > > > > - if (likely(dist->ready)) > > > > > > > + if (likely(smp_load_acquire(&dist->ready))) > > > > > > > return 0; > > > > > > > > > > > > > > mutex_lock(&kvm->slots_lock); > > > > > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > > > > > goto out_slots; > > > > > > > } > > > > > > > > > > > > > > - /* > > > > > > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO > > > > > > > - * registration before returning through synchronize_srcu(), which also > > > > > > > - * implies a full memory barrier. As such, marking the distributor as > > > > > > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen > > > > > > > - * a completely configured distributor. > > > > > > > - */ > > > > > > > - dist->ready = true; > > > > > > > + smp_store_release(&dist->ready, true); > > > > > > > > > > > > No need the store-release and load-acquire for replacing > > > > > > synchronize_srcu_expedited() w/ call_srcu() IIUC: > > > > > > > > > > This isn't about using call_srcu(), because it's not actually about kvm->buses. > > > > > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered > > > > > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true" > > > > > with a half-baked distributor. > > > > > > > > > > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in > > > > > kvm_io_bus_register_dev() to provide the ordering guarantees. Switching to > > > > > smp_store_release() + smp_load_acquire() removes the dependency on the > > > > > synchronize_srcu() so that the synchronize_srcu() call can be safely removed. > > > > > > > > Yes, I understand this and agree with your point. > > > > > > > > Just for discusstion: I thought it should also work even w/o > > > > introduce the load acqure + store release after switch to > > > > call_srcu(): The smp_mb() in call_srcu() order the all store > > > > to kvm->arch.vgic before store kvm->arch.vgic.ready in > > > > current implementation. 
> > > > > > The load-acquire would still be required, to ensure that accesses to > > > kvm->arch.vgic do not get reordered earlier than the lock-free check > > > of kvm->arch.vgic.ready. Otherwise that CPU could see that the vgic is > > > initialised, but then use speculated reads of uninitialised vgic state. > > > > > > > Thanks for your explanation. > > > > I see. But there's "mutex_lock(&kvm->slot_lock);" before later > > acccessing to the kvm->arch.vgic, so I think the order can be > > guaranteed. Of cause as you said a explicitly acquire-load + > > store-release is better than before implicitly implementation. > > If vgic_dist::ready is observed true by the lock-free read (the one > which is turned into load-acquire by this patch) then the function > immediately returns with no mutex_lock() executed. It is reads of > vgic_dist *after* return from kvm_vgic_map_resources() that you have > to worry about, and which require load-acquire semantics. I think this is the main purpose of such lock-free reading here, to avoid lock contention on VM w/ large vCPUs for vcpus' first time run together. store-release makes sure the changes to vgic_dist::ready become visible after the changes to vgic_dist become visible, but it doesn't guarantee the vgic_dist::ready becomes visible to reader on aother CPU **IMMEDIATELY**, thus load-acquire in reader side is request for this. Is above understanding correct ? > > > > > > > >
On Tue, Jul 22, 2025 at 10:01:27AM +0800, Yao Yuan wrote: > On Mon, Jul 21, 2025 at 07:49:10AM +0800, Keir Fraser wrote: > > On Sun, Jul 20, 2025 at 08:08:30AM +0800, Yao Yuan wrote: > > > On Sat, Jul 19, 2025 at 07:58:19AM +0000, Keir Fraser wrote: > > > > On Sat, Jul 19, 2025 at 10:15:56AM +0800, Yao Yuan wrote: > > > > > On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote: > > > > > > On Thu, Jul 17, 2025, Yao Yuan wrote: > > > > > > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote: > > > > > > > > In preparation to remove synchronize_srcu() from MMIO registration, > > > > > > > > remove the distributor's dependency on this implicit barrier by > > > > > > > > direct acquire-release synchronization on the flag write and its > > > > > > > > lock-free check. > > > > > > > > > > > > > > > > Signed-off-by: Keir Fraser <keirf@google.com> > > > > > > > > --- > > > > > > > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++--------- > > > > > > > > 1 file changed, 2 insertions(+), 9 deletions(-) > > > > > > > > > > > > > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c > > > > > > > > index 502b65049703..bc83672e461b 100644 > > > > > > > > --- a/arch/arm64/kvm/vgic/vgic-init.c > > > > > > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c > > > > > > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > > > > > > gpa_t dist_base; > > > > > > > > int ret = 0; > > > > > > > > > > > > > > > > - if (likely(dist->ready)) > > > > > > > > + if (likely(smp_load_acquire(&dist->ready))) > > > > > > > > return 0; > > > > > > > > > > > > > > > > mutex_lock(&kvm->slots_lock); > > > > > > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > > > > > > goto out_slots; > > > > > > > > } > > > > > > > > > > > > > > > > - /* > > > > > > > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO > > > > > > > > - * registration before returning through synchronize_srcu(), which also > > > > > > > > - * implies a full memory barrier. As such, marking the distributor as > > > > > > > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen > > > > > > > > - * a completely configured distributor. > > > > > > > > - */ > > > > > > > > - dist->ready = true; > > > > > > > > + smp_store_release(&dist->ready, true); > > > > > > > > > > > > > > No need the store-release and load-acquire for replacing > > > > > > > synchronize_srcu_expedited() w/ call_srcu() IIUC: > > > > > > > > > > > > This isn't about using call_srcu(), because it's not actually about kvm->buses. > > > > > > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered > > > > > > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true" > > > > > > with a half-baked distributor. > > > > > > > > > > > > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in > > > > > > kvm_io_bus_register_dev() to provide the ordering guarantees. Switching to > > > > > > smp_store_release() + smp_load_acquire() removes the dependency on the > > > > > > synchronize_srcu() so that the synchronize_srcu() call can be safely removed. > > > > > > > > > > Yes, I understand this and agree with your point. 
> > > > > > > > > > Just for discusstion: I thought it should also work even w/o > > > > > introduce the load acqure + store release after switch to > > > > > call_srcu(): The smp_mb() in call_srcu() order the all store > > > > > to kvm->arch.vgic before store kvm->arch.vgic.ready in > > > > > current implementation. > > > > > > > > The load-acquire would still be required, to ensure that accesses to > > > > kvm->arch.vgic do not get reordered earlier than the lock-free check > > > > of kvm->arch.vgic.ready. Otherwise that CPU could see that the vgic is > > > > initialised, but then use speculated reads of uninitialised vgic state. > > > > > > > > > > Thanks for your explanation. > > > > > > I see. But there's "mutex_lock(&kvm->slot_lock);" before later > > > acccessing to the kvm->arch.vgic, so I think the order can be > > > guaranteed. Of cause as you said a explicitly acquire-load + > > > store-release is better than before implicitly implementation. > > > > If vgic_dist::ready is observed true by the lock-free read (the one > > which is turned into load-acquire by this patch) then the function > > immediately returns with no mutex_lock() executed. It is reads of > > vgic_dist *after* return from kvm_vgic_map_resources() that you have > > to worry about, and which require load-acquire semantics. > > I think this is the main purpose of such lock-free reading > here, to avoid lock contention on VM w/ large vCPUs for > vcpus' first time run together. > > store-release makes sure the changes to vgic_dist::ready > become visible after the changes to vgic_dist become > visible, but it doesn't guarantee the vgic_dist::ready > becomes visible to reader on aother CPU **IMMEDIATELY**, > thus load-acquire in reader side is request for this. > Is above understanding correct ? No. I get your point, the load-acuqire here for: There's code path that the cpu read the vgic_dist reorder before the vgic_dist::ready in case of the cpu get vgic_dist::ready is true w/o the load_acquire here. > > > > > > > > > > > >
On Thu, Jul 17, 2025 at 01:44:48PM +0800, Yao Yuan wrote: > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote: > > In preparation to remove synchronize_srcu() from MMIO registration, > > remove the distributor's dependency on this implicit barrier by > > direct acquire-release synchronization on the flag write and its > > lock-free check. > > > > Signed-off-by: Keir Fraser <keirf@google.com> > > --- > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++--------- > > 1 file changed, 2 insertions(+), 9 deletions(-) > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c > > index 502b65049703..bc83672e461b 100644 > > --- a/arch/arm64/kvm/vgic/vgic-init.c > > +++ b/arch/arm64/kvm/vgic/vgic-init.c > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > gpa_t dist_base; > > int ret = 0; > > > > - if (likely(dist->ready)) > > + if (likely(smp_load_acquire(&dist->ready))) > > return 0; > > > > mutex_lock(&kvm->slots_lock); > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > goto out_slots; > > } > > > > - /* > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO > > - * registration before returning through synchronize_srcu(), which also > > - * implies a full memory barrier. As such, marking the distributor as > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen > > - * a completely configured distributor. > > - */ > > - dist->ready = true; > > + smp_store_release(&dist->ready, true); > > No need the store-release and load-acquire for replacing > synchronize_srcu_expedited() w/ call_srcu() IIUC: > > Tree SRCU on SMP: > call_srcu() > __call_srcu() > srcu_gp_start_if_needed() > __srcu_read_unlock_nmisafe() > #ifdef CONFIG_NEED_SRCU_NMI_SAFE > smp_mb__before_atomic() // __smp_mb() on ARM64, do nothing on x86. > #else > __srcu_read_unlock() > smp_mb() > #endif I don't think it's nice to depend on an implementation detail of kvm_io_bus_register_dev() and, transitively, on implementation details of call_srcu(). kvm_vgic_map_resources() isn't called that often and can afford its own synchronization. -- Keir > TINY SRCY on UP: > Should have no memory ordering issue on UP. > > > goto out_slots; > > out: > > mutex_unlock(&kvm->arch.config_lock); > > -- > > 2.50.0.727.gbf7dc18ff4-goog > >
On Fri, Jul 18, 2025 at 02:53:42PM +0000, Keir Fraser wrote: > On Thu, Jul 17, 2025 at 01:44:48PM +0800, Yao Yuan wrote: > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote: > > > In preparation to remove synchronize_srcu() from MMIO registration, > > > remove the distributor's dependency on this implicit barrier by > > > direct acquire-release synchronization on the flag write and its > > > lock-free check. > > > > > > Signed-off-by: Keir Fraser <keirf@google.com> > > > --- > > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++--------- > > > 1 file changed, 2 insertions(+), 9 deletions(-) > > > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c > > > index 502b65049703..bc83672e461b 100644 > > > --- a/arch/arm64/kvm/vgic/vgic-init.c > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > gpa_t dist_base; > > > int ret = 0; > > > > > > - if (likely(dist->ready)) > > > + if (likely(smp_load_acquire(&dist->ready))) > > > return 0; > > > > > > mutex_lock(&kvm->slots_lock); > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > goto out_slots; > > > } > > > > > > - /* > > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO > > > - * registration before returning through synchronize_srcu(), which also > > > - * implies a full memory barrier. As such, marking the distributor as > > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen > > > - * a completely configured distributor. > > > - */ > > > - dist->ready = true; > > > + smp_store_release(&dist->ready, true); > > > > No need the store-release and load-acquire for replacing > > synchronize_srcu_expedited() w/ call_srcu() IIUC: > > > > Tree SRCU on SMP: > > call_srcu() > > __call_srcu() > > srcu_gp_start_if_needed() > > __srcu_read_unlock_nmisafe() > > #ifdef CONFIG_NEED_SRCU_NMI_SAFE > > smp_mb__before_atomic() // __smp_mb() on ARM64, do nothing on x86. > > #else > > __srcu_read_unlock() > > smp_mb() > > #endif > > I don't think it's nice to depend on an implementation detail of > kvm_io_bus_register_dev() and, transitively, on implementation details > of call_srcu(). Also I should note that this is moot because the smp_mb() would *not* safely replace the load-acquire. > kvm_vgic_map_resources() isn't called that often and can afford its > own synchronization. > > -- Keir > > > TINY SRCY on UP: > > Should have no memory ordering issue on UP. > > > > > goto out_slots; > > > out: > > > mutex_unlock(&kvm->arch.config_lock); > > > -- > > > 2.50.0.727.gbf7dc18ff4-goog > > >
On Fri, Jul 18, 2025 at 03:25:46PM +0000, Keir Fraser wrote: > On Fri, Jul 18, 2025 at 02:53:42PM +0000, Keir Fraser wrote: > > On Thu, Jul 17, 2025 at 01:44:48PM +0800, Yao Yuan wrote: > > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote: > > > > In preparation to remove synchronize_srcu() from MMIO registration, > > > > remove the distributor's dependency on this implicit barrier by > > > > direct acquire-release synchronization on the flag write and its > > > > lock-free check. > > > > > > > > Signed-off-by: Keir Fraser <keirf@google.com> > > > > --- > > > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++--------- > > > > 1 file changed, 2 insertions(+), 9 deletions(-) > > > > > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c > > > > index 502b65049703..bc83672e461b 100644 > > > > --- a/arch/arm64/kvm/vgic/vgic-init.c > > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c > > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > > gpa_t dist_base; > > > > int ret = 0; > > > > > > > > - if (likely(dist->ready)) > > > > + if (likely(smp_load_acquire(&dist->ready))) > > > > return 0; > > > > > > > > mutex_lock(&kvm->slots_lock); > > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > > goto out_slots; > > > > } > > > > > > > > - /* > > > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO > > > > - * registration before returning through synchronize_srcu(), which also > > > > - * implies a full memory barrier. As such, marking the distributor as > > > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen > > > > - * a completely configured distributor. > > > > - */ > > > > - dist->ready = true; > > > > + smp_store_release(&dist->ready, true); > > > > > > No need the store-release and load-acquire for replacing > > > synchronize_srcu_expedited() w/ call_srcu() IIUC: > > > > > > Tree SRCU on SMP: > > > call_srcu() > > > __call_srcu() > > > srcu_gp_start_if_needed() > > > __srcu_read_unlock_nmisafe() > > > #ifdef CONFIG_NEED_SRCU_NMI_SAFE > > > smp_mb__before_atomic() // __smp_mb() on ARM64, do nothing on x86. > > > #else > > > __srcu_read_unlock() > > > smp_mb() > > > #endif > > > > I don't think it's nice to depend on an implementation detail of > > kvm_io_bus_register_dev() and, transitively, on implementation details > > of call_srcu(). This is good point, I agree with you. > > Also I should note that this is moot because the smp_mb() would *not* > safely replace the load-acquire. Hmm.. do you mean it can't order the write to dist->ready here while read it in aonther thread at the same time ? > > > kvm_vgic_map_resources() isn't called that often and can afford its > > own synchronization. > > > > -- Keir > > > > > TINY SRCY on UP: > > > Should have no memory ordering issue on UP. > > > > > > > goto out_slots; > > > > out: > > > > mutex_unlock(&kvm->arch.config_lock); > > > > -- > > > > 2.50.0.727.gbf7dc18ff4-goog > > > > >
On Fri, Jul 18, 2025, Keir Fraser wrote: > On Thu, Jul 17, 2025 at 01:44:48PM +0800, Yao Yuan wrote: > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote: > > > In preparation to remove synchronize_srcu() from MMIO registration, > > > remove the distributor's dependency on this implicit barrier by > > > direct acquire-release synchronization on the flag write and its > > > lock-free check. > > > > > > Signed-off-by: Keir Fraser <keirf@google.com> > > > --- > > > arch/arm64/kvm/vgic/vgic-init.c | 11 ++--------- > > > 1 file changed, 2 insertions(+), 9 deletions(-) > > > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c > > > index 502b65049703..bc83672e461b 100644 > > > --- a/arch/arm64/kvm/vgic/vgic-init.c > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > gpa_t dist_base; > > > int ret = 0; > > > > > > - if (likely(dist->ready)) > > > + if (likely(smp_load_acquire(&dist->ready))) > > > return 0; > > > > > > mutex_lock(&kvm->slots_lock); > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm) > > > goto out_slots; > > > } > > > > > > - /* > > > - * kvm_io_bus_register_dev() guarantees all readers see the new MMIO > > > - * registration before returning through synchronize_srcu(), which also > > > - * implies a full memory barrier. As such, marking the distributor as > > > - * 'ready' here is guaranteed to be ordered after all vCPUs having seen > > > - * a completely configured distributor. > > > - */ > > > - dist->ready = true; > > > + smp_store_release(&dist->ready, true); > > > > No need the store-release and load-acquire for replacing > > synchronize_srcu_expedited() w/ call_srcu() IIUC: > > > > Tree SRCU on SMP: > > call_srcu() > > __call_srcu() > > srcu_gp_start_if_needed() > > __srcu_read_unlock_nmisafe() > > #ifdef CONFIG_NEED_SRCU_NMI_SAFE > > smp_mb__before_atomic() // __smp_mb() on ARM64, do nothing on x86. > > #else > > __srcu_read_unlock() > > smp_mb() > > #endif > > I don't think it's nice to depend on an implementation detail of > kvm_io_bus_register_dev() and, transitively, on implementation details > of call_srcu(). > > kvm_vgic_map_resources() isn't called that often and can afford its > own synchronization. +1