Hi,

This RFC is mostly a follow-up on the discussion at:

  https://lore.kernel.org/lkml/20250321-lockdep-v1-1-78b732d195fb@debian.org/

I found that using a hazard pointer variant can speed up
lockdep_unregister_key(). On my system (a 96-CPU VM), the results of:

	time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1: mq

are (without the patchset):

	real	0m1.039s
	user	0m0.001s
	sys	0m0.069s

(with the patchset):

	real	0m0.053s
	user	0m0.000s
	sys	0m0.051s

i.e. almost a 20x speed-up.

For other comparisons between RCU and shazptr, here are the rcuscale
results (using the default configuration from
tools/testing/selftests/rcutorture/bin/kvm.sh):

RCU:

	Average grace-period duration: 7470.02 microseconds
	Minimum grace-period duration: 3981.6
	50th percentile grace-period duration: 6002.73
	90th percentile grace-period duration: 7008.93
	99th percentile grace-period duration: 10015
	Maximum grace-period duration: 142228

shazptr:

	Average grace-period duration: 0.845825 microseconds
	Minimum grace-period duration: 0.199
	50th percentile grace-period duration: 0.585
	90th percentile grace-period duration: 1.656
	99th percentile grace-period duration: 3.872
	Maximum grace-period duration: 3049.05

shazptr (skip_synchronize_self_scan=1, i.e. always let the scan kthread
do the wakeup):

	Average grace-period duration: 467.861 microseconds
	Minimum grace-period duration: 92.913
	50th percentile grace-period duration: 440.691
	90th percentile grace-period duration: 460.623
	99th percentile grace-period duration: 650.068
	Maximum grace-period duration: 5775.46

shazptr_wildcard (i.e. readers always use SHAZPTR_WILDCARD):

	Average grace-period duration: 599.569 microseconds
	Minimum grace-period duration: 1.432
	50th percentile grace-period duration: 582.631
	90th percentile grace-period duration: 781.704
	99th percentile grace-period duration: 1160.26
	Maximum grace-period duration: 6727.53

shazptr_wildcard (skip_synchronize_self_scan=1):

	Average grace-period duration: 460.466 microseconds
	Minimum grace-period duration: 303.546
	50th percentile grace-period duration: 424.334
	90th percentile grace-period duration: 482.637
	99th percentile grace-period duration: 600.214
	Maximum grace-period duration: 4126.94

Overall it looks promising to me, but I would like to see how it
performs in Breno's environment. Also, as Paul always reminds me: buggy
code usually runs faster, so please take a look in case I'm missing
something ;-) Thanks!

The patchset is based on v6.15-rc1.

Boqun Feng (8):
  Introduce simple hazard pointers
  shazptr: Add refscale test
  shazptr: Add refscale test for wildcard
  shazptr: Avoid synchronize_shazptr() busy waiting
  shazptr: Allow skip self scan in synchronize_shazptr()
  rcuscale: Allow rcu_scale_ops::get_gp_seq to be NULL
  rcuscale: Add tests for simple hazard pointers
  locking/lockdep: Use shazptr to protect the key hashlist

 include/linux/shazptr.h  |  73 +++++++++
 kernel/locking/Makefile  |   2 +-
 kernel/locking/lockdep.c |  11 +-
 kernel/locking/shazptr.c | 318 +++++++++++++++++++++++++++++++++++++++
 kernel/rcu/rcuscale.c    |  60 +++++++-
 kernel/rcu/refscale.c    |  77 ++++++++++
 6 files changed, 534 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/shazptr.h
 create mode 100644 kernel/locking/shazptr.c

--
2.47.1
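For readers new to hazard pointers, here is a minimal userspace sketch
of the general protocol the numbers above are exercising. It is
illustrative only, not the patchset's code: every name in it
(my_hazptr_slot, my_acquire, my_release, my_synchronize, MY_WILDCARD,
NSLOTS) is made up for the example, the per-CPU slots are approximated
with a fixed array, and the busy-wait in my_synchronize() is exactly
what the "Avoid synchronize_shazptr() busy waiting" patch hands off to
the scan kthread mentioned above.

/*
 * Illustrative userspace sketch of a simple hazard pointer scheme.
 * Not the patchset's implementation; all names are hypothetical.
 * Build: cc -std=c11 -o sketch sketch.c
 */
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

#define NSLOTS		4		/* stand-in for one slot per CPU */
#define MY_WILDCARD	((void *)1)	/* reader: "I may hold anything" */

static _Atomic(void *) my_hazptr_slot[NSLOTS];

/* Reader: publish the pointer we are about to dereference. */
static void *my_acquire(int slot, _Atomic(void *) *p)
{
	void *ptr = atomic_load(p);

	atomic_store(&my_hazptr_slot[slot], ptr);
	/*
	 * A real implementation needs a full barrier here and must
	 * re-check that *p still equals ptr before dereferencing.
	 */
	return ptr;
}

/* Reader: done with the object; clear the slot. */
static void my_release(int slot)
{
	atomic_store(&my_hazptr_slot[slot], NULL);
}

/*
 * Updater: after unpublishing an object, wait until no slot still
 * advertises it (or the wildcard).  This version spins; per the
 * patch list above, synchronize_shazptr() avoids busy waiting by
 * deferring the scan to a kthread.
 */
static void my_synchronize(void *ptr)
{
	for (int i = 0; i < NSLOTS; i++) {
		void *val;

		while ((val = atomic_load(&my_hazptr_slot[i])) == ptr ||
		       val == MY_WILDCARD)
			;	/* slot i still blocks us; spin */
	}
}

int main(void)
{
	static int obj = 42;
	_Atomic(void *) shared = &obj;

	int *r = my_acquire(0, &shared);	/* reader on "CPU" 0 */
	printf("reader sees %d\n", *r);
	my_release(0);

	atomic_store(&shared, NULL);		/* updater unpublishes... */
	my_synchronize(&obj);			/* ...then waits for readers */
	return 0;
}

The wildcard lets a reader advertise "whatever I may be holding" at the
cost of blocking every concurrent synchronizer, which is what the
shazptr_wildcard numbers above measure.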
Hi Boqun,

On Sun, Apr 13, 2025 at 11:00:47PM -0700, Boqun Feng wrote:

> Overall it looks promising to me, but I would like to see how it
> performs in Breno's environment. Also, as Paul always reminds me:
> buggy code usually runs faster, so please take a look in case I'm
> missing something ;-) Thanks!

Thanks for the patchset. I've confirmed that the wins are large in my
environment, but at the same magnitude as synchronize_rcu_expedited().

Here are the numbers I got:

6.15-rc1 (upstream):

	# time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
	real	0m3.986s
	user	0m0.001s
	sys	0m0.093s

Your patchset on top of 6.15-rc1:

	# time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
	real	0m0.072s
	user	0m0.001s
	sys	0m0.070s

My original proposal of using synchronize_rcu_expedited() [1]:

	# time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
	real	0m0.074s
	user	0m0.001s
	sys	0m0.061s

Link: https://lore.kernel.org/all/20250321-lockdep-v1-1-78b732d195fb@debian.org/ [1]

Thanks for working on it,
--breno
On Wed, Apr 16, 2025 at 07:14:04AM -0700, Breno Leitao wrote:
> Hi Boqun,
>
> On Sun, Apr 13, 2025 at 11:00:47PM -0700, Boqun Feng wrote:
>
> > Overall it looks promising to me, but I would like to see how it
> > performs in Breno's environment. Also, as Paul always reminds me:
> > buggy code usually runs faster, so please take a look in case I'm
> > missing something ;-) Thanks!
>
> Thanks for the patchset. I've confirmed that the wins are large in my
> environment, but at the same magnitude as synchronize_rcu_expedited().
>
> Here are the numbers I got:
>
> 6.15-rc1 (upstream):
>
> 	# time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
> 	real	0m3.986s
> 	user	0m0.001s
> 	sys	0m0.093s
>
> Your patchset on top of 6.15-rc1:
>
> 	# time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
> 	real	0m0.072s
> 	user	0m0.001s
> 	sys	0m0.070s
>
> My original proposal of using synchronize_rcu_expedited() [1]:
>
> 	# time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
> 	real	0m0.074s
> 	user	0m0.001s
> 	sys	0m0.061s
>
> Link: https://lore.kernel.org/all/20250321-lockdep-v1-1-78b732d195fb@debian.org/ [1]

Could you please also do the test of the first scenario with a regular
synchronize_rcu(), but switched to its faster variant:

	echo 1 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp

and run the test, if you have time.

Thank you!

--
Vlad Rezki
Hello Vlad,

On Wed, Apr 16, 2025 at 05:04:31PM +0200, Uladzislau Rezki wrote:
> On Wed, Apr 16, 2025 at 07:14:04AM -0700, Breno Leitao wrote:
> > Hi Boqun,
> >
> > On Sun, Apr 13, 2025 at 11:00:47PM -0700, Boqun Feng wrote:
> >
> > > Overall it looks promising to me, but I would like to see how it
> > > performs in Breno's environment. Also, as Paul always reminds me:
> > > buggy code usually runs faster, so please take a look in case I'm
> > > missing something ;-) Thanks!
> >
> > Thanks for the patchset. I've confirmed that the wins are large in my
> > environment, but at the same magnitude as synchronize_rcu_expedited().
> >
> > Here are the numbers I got:
> >
> > 6.15-rc1 (upstream):
> >
> > 	# time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
> > 	real	0m3.986s
> > 	user	0m0.001s
> > 	sys	0m0.093s
> >
> > Your patchset on top of 6.15-rc1:
> >
> > 	# time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
> > 	real	0m0.072s
> > 	user	0m0.001s
> > 	sys	0m0.070s
> >
> > My original proposal of using synchronize_rcu_expedited() [1]:
> >
> > 	# time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
> > 	real	0m0.074s
> > 	user	0m0.001s
> > 	sys	0m0.061s
> >
> > Link: https://lore.kernel.org/all/20250321-lockdep-v1-1-78b732d195fb@debian.org/ [1]
>
> Could you please also do the test of the first scenario with a regular
> synchronize_rcu(), but switched to its faster variant:
>
> 	echo 1 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
>
> and run the test, if you have time.

Of course, I am more than interested in this topic. This is what I ran:

# /usr/sbin/tc qdisc replace dev eth0 root handle 0x1: mq; time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
real	0m4.150s
user	0m0.001s
sys	0m0.076s

[root@host2 ~]# echo 1 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
[root@host2 ~]# /usr/sbin/tc qdisc replace dev eth0 root handle 0x1: mq; time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
real	0m4.225s
user	0m0.000s
sys	0m0.106s

[root@host2 ~]# cat /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
1
[root@host2 ~]# echo 0 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
[root@host2 ~]# /usr/sbin/tc qdisc replace dev eth0 root handle 0x1: mq; time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
real	0m4.152s
user	0m0.001s
sys	0m0.099s

It seems it made very little difference?

Thanks
--breno
Hello, Breno!

> Hello Vlad,
>
> On Wed, Apr 16, 2025 at 05:04:31PM +0200, Uladzislau Rezki wrote:
> > On Wed, Apr 16, 2025 at 07:14:04AM -0700, Breno Leitao wrote:
> > > Hi Boqun,
> > >
> > > On Sun, Apr 13, 2025 at 11:00:47PM -0700, Boqun Feng wrote:
> > >
> > > > Overall it looks promising to me, but I would like to see how it
> > > > performs in Breno's environment. Also, as Paul always reminds me:
> > > > buggy code usually runs faster, so please take a look in case I'm
> > > > missing something ;-) Thanks!
> > >
> > > Thanks for the patchset. I've confirmed that the wins are large in
> > > my environment, but at the same magnitude as
> > > synchronize_rcu_expedited().
> > >
> > > Here are the numbers I got:
> > >
> > > 6.15-rc1 (upstream):
> > >
> > > 	# time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
> > > 	real	0m3.986s
> > > 	user	0m0.001s
> > > 	sys	0m0.093s
> > >
> > > Your patchset on top of 6.15-rc1:
> > >
> > > 	# time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
> > > 	real	0m0.072s
> > > 	user	0m0.001s
> > > 	sys	0m0.070s
> > >
> > > My original proposal of using synchronize_rcu_expedited() [1]:
> > >
> > > 	# time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
> > > 	real	0m0.074s
> > > 	user	0m0.001s
> > > 	sys	0m0.061s
> > >
> > > Link: https://lore.kernel.org/all/20250321-lockdep-v1-1-78b732d195fb@debian.org/ [1]
> >
> > Could you please also do the test of the first scenario with a regular
> > synchronize_rcu(), but switched to its faster variant:
> >
> > 	echo 1 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
> >
> > and run the test, if you have time.
>
> Of course, I am more than interested in this topic. This is what I ran:
>
> # /usr/sbin/tc qdisc replace dev eth0 root handle 0x1: mq; time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
> real	0m4.150s
> user	0m0.001s
> sys	0m0.076s
>
> [root@host2 ~]# echo 1 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
> [root@host2 ~]# /usr/sbin/tc qdisc replace dev eth0 root handle 0x1: mq; time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
> real	0m4.225s
> user	0m0.000s
> sys	0m0.106s
>
> [root@host2 ~]# cat /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
> 1
> [root@host2 ~]# echo 0 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
> [root@host2 ~]# /usr/sbin/tc qdisc replace dev eth0 root handle 0x1: mq; time /usr/sbin/tc qdisc replace dev eth0 root handle 0x1234: mq
> real	0m4.152s
> user	0m0.001s
> sys	0m0.099s
>
> It seems it made very little difference?
>
Yep, no difference. In your case you just need to speed up grace-period
completion itself, which is why the expedited version really improves
your case. It also means you do not have a lot of callbacks that could
delay a normal GP, since rcu_normal_wake_from_gp only speeds up the
post-GP wakeup path and cannot help when the grace period itself is the
bottleneck.

Thank you!

--
Uladzislau Rezki