[v1] Make wake_up_{bit,var} less fragile

[PATCH 5/9] Block: switch bd_prepare_to_claim to use ___wait_var_event()

Posted by NeilBrown 1 year, 5 months ago

bd_prepare_to_claim() current uses a bit waitqueue with a matching
wake_up_bit() in bd_clear_claiming().  However it is really waiting on a
"var", not a "bit".

So change to wake_up_var(), and use ___wait_var_event() for the waiting.
Using the triple-underscore version allows us to drop the mutex across
the schedule() call.

Add a missing memory barrier before the wake_up_var() call.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 block/bdev.c | 49 ++++++++++++++++++++-----------------------------
 1 file changed, 20 insertions(+), 29 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index c5507b6f63b8..d804c91c651b 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -487,10 +487,10 @@ long nr_blockdev_pages(void)
  * Test whether @bdev can be claimed by @holder.
  *
  * RETURNS:
- * %true if @bdev can be claimed, %false otherwise.
+ * %0 if @bdev can be claimed, %-EBUSY otherwise.
  */
-static bool bd_may_claim(struct block_device *bdev, void *holder,
-		const struct blk_holder_ops *hops)
+static int bd_may_claim(struct block_device *bdev, void *holder,
+			const struct blk_holder_ops *hops)
 {
 	struct block_device *whole = bdev_whole(bdev);
 
@@ -503,9 +503,9 @@ static bool bd_may_claim(struct block_device *bdev, void *holder,
 		if (bdev->bd_holder == holder) {
 			if (WARN_ON_ONCE(bdev->bd_holder_ops != hops))
 				return false;
-			return true;
+			return 0;
 		}
-		return false;
+		return -EBUSY;
 	}
 
 	/*
@@ -514,8 +514,8 @@ static bool bd_may_claim(struct block_device *bdev, void *holder,
 	 */
 	if (whole != bdev &&
 	    whole->bd_holder && whole->bd_holder != bd_may_claim)
-		return false;
-	return true;
+		return -EBUSY;
+	return 0;
 }
 
 /**
@@ -535,33 +535,23 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder,
 		const struct blk_holder_ops *hops)
 {
 	struct block_device *whole = bdev_whole(bdev);
+	int err = 0;
 
 	if (WARN_ON_ONCE(!holder))
 		return -EINVAL;
-retry:
-	mutex_lock(&bdev_lock);
-	/* if someone else claimed, fail */
-	if (!bd_may_claim(bdev, holder, hops)) {
-		mutex_unlock(&bdev_lock);
-		return -EBUSY;
-	}
-
-	/* if claiming is already in progress, wait for it to finish */
-	if (whole->bd_claiming) {
-		wait_queue_head_t *wq = bit_waitqueue(&whole->bd_claiming, 0);
-		DEFINE_WAIT(wait);
 
-		prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
-		mutex_unlock(&bdev_lock);
-		schedule();
-		finish_wait(wq, &wait);
-		goto retry;
-	}
+	mutex_lock(&bdev_lock);
+	___wait_var_event(&whole->bd_claiming,
+			  (err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
+			  TASK_UNINTERRUPTIBLE, 0, 0,
+			  mutex_unlock(&bdev_lock); schedule(); mutex_lock(&bdev_lock));
 
-	/* yay, all mine */
-	whole->bd_claiming = holder;
+	/* if someone else claimed, fail */
+	if (!err)
+		/* yay, all mine */
+		whole->bd_claiming = holder;
 	mutex_unlock(&bdev_lock);
-	return 0;
+	return err;
 }
 EXPORT_SYMBOL_GPL(bd_prepare_to_claim); /* only for the loop driver */
 
@@ -571,7 +561,8 @@ static void bd_clear_claiming(struct block_device *whole, void *holder)
 	/* tell others that we're done */
 	BUG_ON(whole->bd_claiming != holder);
 	whole->bd_claiming = NULL;
-	wake_up_bit(&whole->bd_claiming, 0);
+	smp_mb();
+	wake_up_var(&whole->bd_claiming);
 }
 
 /**
-- 
2.44.0

Re: [PATCH 5/9] Block: switch bd_prepare_to_claim to use ___wait_var_event()

Posted by kernel test robot 1 year, 5 months ago


Hello,

kernel test robot noticed "kernel_BUG_at_block/bdev.c" on:

commit: b80d7798a6f9db2c094014570a73728ace8d844d ("[PATCH 5/9] Block: switch bd_prepare_to_claim to use ___wait_var_event()")
url: https://github.com/intel-lab-lkp/linux/commits/NeilBrown/i915-remove-wake_up-on-I915_RESET_MODESET/20240819-134414
base: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
patch link: https://lore.kernel.org/all/20240819053605.11706-6-neilb@suse.de/
patch subject: [PATCH 5/9] Block: switch bd_prepare_to_claim to use ___wait_var_event()

in testcase: boot

compiler: clang-18
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+------------------------------------------+------------+------------+
|                                          | 30a670cac3 | b80d7798a6 |
+------------------------------------------+------------+------------+
| boot_successes                           | 9          | 0          |
| boot_failures                            | 0          | 9          |
| kernel_BUG_at_block/bdev.c               | 0          | 9          |
| Oops:invalid_opcode:#[##]PREEMPT_SMP_PTI | 0          | 9          |
| RIP:bd_finish_claiming                   | 0          | 9          |
| Kernel_panic-not_syncing:Fatal_exception | 0          | 9          |
+------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202408211439.954a6d41-lkp@intel.com


[    8.768327][ T2733] ------------[ cut here ]------------
[    8.768333][ T2733] kernel BUG at block/bdev.c:583!
[    8.768342][ T2733] Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
[    8.768348][ T2733] CPU: 1 UID: 0 PID: 2733 Comm: cdrom_id Not tainted 6.11.0-rc3-00005-gb80d7798a6f9 #1
[    8.768352][ T2733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 8.768355][ T2733] RIP: 0010:bd_finish_claiming (block/bdev.c:583) 
[ 8.768388][ T2733] Code: 48 c7 03 00 00 00 00 f0 83 44 24 fc 00 48 89 df e8 0f bc b1 ff 48 c7 c7 00 97 a0 82 5b 41 5c 41 5d 41 5e 41 5f e9 5a aa 95 00 <0f> 0b 0f 0b 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90
All code
========
   0:	48 c7 03 00 00 00 00 	movq   $0x0,(%rbx)
   7:	f0 83 44 24 fc 00    	lock addl $0x0,-0x4(%rsp)
   d:	48 89 df             	mov    %rbx,%rdi
  10:	e8 0f bc b1 ff       	call   0xffffffffffb1bc24
  15:	48 c7 c7 00 97 a0 82 	mov    $0xffffffff82a09700,%rdi
  1c:	5b                   	pop    %rbx
  1d:	41 5c                	pop    %r12
  1f:	41 5d                	pop    %r13
  21:	41 5e                	pop    %r14
  23:	41 5f                	pop    %r15
  25:	e9 5a aa 95 00       	jmp    0x95aa84
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	0f 0b                	ud2
  2e:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  34:	90                   	nop
  35:	90                   	nop
  36:	90                   	nop
  37:	90                   	nop
  38:	90                   	nop
  39:	90                   	nop
  3a:	90                   	nop
  3b:	90                   	nop
  3c:	90                   	nop
  3d:	90                   	nop
  3e:	90                   	nop
  3f:	90                   	nop

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	0f 0b                	ud2
   4:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
   a:	90                   	nop
   b:	90                   	nop
   c:	90                   	nop
   d:	90                   	nop
   e:	90                   	nop
   f:	90                   	nop
  10:	90                   	nop
  11:	90                   	nop
  12:	90                   	nop
  13:	90                   	nop
  14:	90                   	nop
  15:	90                   	nop
[    8.768392][ T2733] RSP: 0000:ffffc90000a5bc00 EFLAGS: 00010246
[    8.768396][ T2733] RAX: 0000000000000000 RBX: ffff888125940000 RCX: 0000000000000000
[    8.768398][ T2733] RDX: 0000000000000000 RSI: ffff88812a326800 RDI: ffff888125940000
[    8.768400][ T2733] RBP: 000000000000000d R08: 0000000000000004 R09: 00000002f1ee446d
[    8.768402][ T2733] R10: 00646b636f6c6200 R11: ffffffffa0015f60 R12: ffff888125940000
[    8.768404][ T2733] R13: 0000000000000000 R14: ffff88812a326800 R15: 0000000000000000
[    8.768407][ T2733] FS:  0000000000000000(0000) GS:ffff88842fd00000(0063) knlGS:00000000f7d406c0
[    8.768410][ T2733] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[    8.768412][ T2733] CR2: 00000000f7d0315c CR3: 000000012a244000 CR4: 00000000000406f0
[    8.768417][ T2733] Call Trace:
[    8.769866][ T2733]  <TASK>
[ 8.769875][ T2733] ? __die_body (arch/x86/kernel/dumpstack.c:421) 
[ 8.769884][ T2733] ? die (arch/x86/kernel/dumpstack.c:? arch/x86/kernel/dumpstack.c:447) 
[ 8.769888][ T2733] ? do_trap (arch/x86/kernel/traps.c:129) 
[ 8.769893][ T2733] ? bd_finish_claiming (block/bdev.c:583) 
[ 8.769898][ T2733] ? do_error_trap (arch/x86/kernel/traps.c:175) 
[ 8.769902][ T2733] ? bd_finish_claiming (block/bdev.c:583) 
[ 8.769905][ T2733] ? handle_invalid_op (arch/x86/kernel/traps.c:212) 
[ 8.769909][ T2733] ? bd_finish_claiming (block/bdev.c:583) 
[ 8.769912][ T2733] ? exc_invalid_op (arch/x86/kernel/traps.c:267) 
[ 8.769917][ T2733] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621) 
[ 8.769926][ T2733] ? __pfx_sr_open (drivers/scsi/sr.c:593) sr_mod
[ 8.769932][ T2733] ? bd_finish_claiming (block/bdev.c:583) 
[ 8.769936][ T2733] bdev_open (block/bdev.c:914) 
[ 8.769940][ T2733] ? iput (fs/inode.c:1821) 
[ 8.769945][ T2733] blkdev_open (block/fops.c:631) 
[ 8.769949][ T2733] ? __pfx_blkdev_open (block/fops.c:610) 
[ 8.769952][ T2733] do_dentry_open (fs/open.c:959) 
[ 8.769958][ T2733] vfs_open (fs/open.c:1089) 
[ 8.769962][ T2733] path_openat (fs/namei.c:3727) 
[ 8.769966][ T2733] ? call_rcu (kernel/rcu/tree.c:2996) 
[ 8.769972][ T2733] do_filp_open (fs/namei.c:3913) 
[ 8.769978][ T2733] do_sys_openat2 (fs/open.c:1416) 
[ 8.769982][ T2733] do_sys_open (fs/open.c:1431) 
[ 8.769986][ T2733] do_int80_emulation (arch/x86/entry/common.c:?) 
[ 8.769990][ T2733] ? irqentry_exit_to_user_mode (arch/x86/include/asm/processor.h:702 arch/x86/include/asm/entry-common.h:91 include/linux/entry-common.h:364 kernel/entry/common.c:233) 
[ 8.769994][ T2733] asm_int80_emulation (arch/x86/include/asm/idtentry.h:626) 
[    8.769999][ T2733] RIP: 0023:0xf7f111b2
[ 8.770004][ T2733] Code: 89 c2 31 c0 89 d7 f3 aa 8b 44 24 1c 89 30 c6 40 04 00 83 c4 2c 89 f0 5b 5e 5f 5d c3 90 90 90 90 90 90 90 90 90 90 90 90 cd 80 <c3> 8d b6 00 00 00 00 8d bc 27 00 00 00 00 8b 1c 24 c3 8d b6 00 00
All code
========
   0:	89 c2                	mov    %eax,%edx
   2:	31 c0                	xor    %eax,%eax
   4:	89 d7                	mov    %edx,%edi
   6:	f3 aa                	rep stos %al,%es:(%rdi)
   8:	8b 44 24 1c          	mov    0x1c(%rsp),%eax
   c:	89 30                	mov    %esi,(%rax)
   e:	c6 40 04 00          	movb   $0x0,0x4(%rax)
  12:	83 c4 2c             	add    $0x2c,%esp
  15:	89 f0                	mov    %esi,%eax
  17:	5b                   	pop    %rbx
  18:	5e                   	pop    %rsi
  19:	5f                   	pop    %rdi
  1a:	5d                   	pop    %rbp
  1b:	c3                   	ret
  1c:	90                   	nop
  1d:	90                   	nop
  1e:	90                   	nop
  1f:	90                   	nop
  20:	90                   	nop
  21:	90                   	nop
  22:	90                   	nop
  23:	90                   	nop
  24:	90                   	nop
  25:	90                   	nop
  26:	90                   	nop
  27:	90                   	nop
  28:	cd 80                	int    $0x80
  2a:*	c3                   	ret		<-- trapping instruction
  2b:	8d b6 00 00 00 00    	lea    0x0(%rsi),%esi
  31:	8d bc 27 00 00 00 00 	lea    0x0(%rdi,%riz,1),%edi
  38:	8b 1c 24             	mov    (%rsp),%ebx
  3b:	c3                   	ret
  3c:	8d                   	.byte 0x8d
  3d:	b6 00                	mov    $0x0,%dh
	...

Code starting with the faulting instruction
===========================================
   0:	c3                   	ret
   1:	8d b6 00 00 00 00    	lea    0x0(%rsi),%esi
   7:	8d bc 27 00 00 00 00 	lea    0x0(%rdi,%riz,1),%edi
   e:	8b 1c 24             	mov    (%rsp),%ebx
  11:	c3                   	ret
  12:	8d                   	.byte 0x8d
  13:	b6 00                	mov    $0x0,%dh


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240821/202408211439.954a6d41-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Re: [PATCH 5/9] Block: switch bd_prepare_to_claim to use ___wait_var_event()

Posted by Hillf Danton 1 year, 5 months ago

On Mon, 19 Aug 2024 15:20:39 +1000 NeilBrown <neilb@suse.de>
> 
> @@ -535,33 +535,23 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder,
>  		const struct blk_holder_ops *hops)
>  {
>  	struct block_device *whole = bdev_whole(bdev);
> +	int err = 0;
>  
>  	if (WARN_ON_ONCE(!holder))
>  		return -EINVAL;
> -retry:
> -	mutex_lock(&bdev_lock);
> -	/* if someone else claimed, fail */
> -	if (!bd_may_claim(bdev, holder, hops)) {
> -		mutex_unlock(&bdev_lock);
> -		return -EBUSY;
> -	}
> -
> -	/* if claiming is already in progress, wait for it to finish */
> -	if (whole->bd_claiming) {
> -		wait_queue_head_t *wq = bit_waitqueue(&whole->bd_claiming, 0);
> -		DEFINE_WAIT(wait);
>  
> -		prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
> -		mutex_unlock(&bdev_lock);
> -		schedule();
> -		finish_wait(wq, &wait);
> -		goto retry;
> -	}
> +	mutex_lock(&bdev_lock);
> +	___wait_var_event(&whole->bd_claiming,
> +			  (err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
> +			  TASK_UNINTERRUPTIBLE, 0, 0,
> +			  mutex_unlock(&bdev_lock); schedule(); mutex_lock(&bdev_lock));
>  
At the first glance you add the coding pattern not recommended for the block
directory. Second, you abuse ___wait_var_event() simply because it is available.

> -	/* yay, all mine */
> -	whole->bd_claiming = holder;
> +	/* if someone else claimed, fail */
> +	if (!err)
> +		/* yay, all mine */
> +		whole->bd_claiming = holder;
>  	mutex_unlock(&bdev_lock);
> -	return 0;
> +	return err;
>  }
>  EXPORT_SYMBOL_GPL(bd_prepare_to_claim); /* only for the loop driver */
>  
> @@ -571,7 +561,8 @@ static void bd_clear_claiming(struct block_device *whole, void *holder)
>  	/* tell others that we're done */
>  	BUG_ON(whole->bd_claiming != holder);
>  	whole->bd_claiming = NULL;
> -	wake_up_bit(&whole->bd_claiming, 0);
> +	smp_mb();
> +	wake_up_var(&whole->bd_claiming);

Third, worse, you have no real idea why mb is needed.
>  }
>  
>  /**
> -- 
> 2.44.0

Re: [PATCH 5/9] Block: switch bd_prepare_to_claim to use ___wait_var_event()

Posted by Dave Chinner 1 year, 5 months ago

On Mon, Aug 19, 2024 at 03:20:39PM +1000, NeilBrown wrote:
> bd_prepare_to_claim() current uses a bit waitqueue with a matching
> wake_up_bit() in bd_clear_claiming().  However it is really waiting on a
> "var", not a "bit".
> 
> So change to wake_up_var(), and use ___wait_var_event() for the waiting.
> Using the triple-underscore version allows us to drop the mutex across
> the schedule() call.
....
> @@ -535,33 +535,23 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder,
>  		const struct blk_holder_ops *hops)
>  {
>  	struct block_device *whole = bdev_whole(bdev);
> +	int err = 0;
>  
>  	if (WARN_ON_ONCE(!holder))
>  		return -EINVAL;
> -retry:
> -	mutex_lock(&bdev_lock);
> -	/* if someone else claimed, fail */
> -	if (!bd_may_claim(bdev, holder, hops)) {
> -		mutex_unlock(&bdev_lock);
> -		return -EBUSY;
> -	}
> -
> -	/* if claiming is already in progress, wait for it to finish */
> -	if (whole->bd_claiming) {
> -		wait_queue_head_t *wq = bit_waitqueue(&whole->bd_claiming, 0);
> -		DEFINE_WAIT(wait);
>  
> -		prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
> -		mutex_unlock(&bdev_lock);
> -		schedule();
> -		finish_wait(wq, &wait);
> -		goto retry;
> -	}
> +	mutex_lock(&bdev_lock);
> +	___wait_var_event(&whole->bd_claiming,
> +			  (err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
> +			  TASK_UNINTERRUPTIBLE, 0, 0,
> +			  mutex_unlock(&bdev_lock); schedule(); mutex_lock(&bdev_lock));

That's not an improvement. Instead of nice, obvious, readable code,
I now have to go look at a macro and manually substitute the
parameters to work out what this abomination actually does.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

Re: [PATCH 5/9] Block: switch bd_prepare_to_claim to use ___wait_var_event()

Posted by NeilBrown 1 year, 5 months ago

On Tue, 20 Aug 2024, Dave Chinner wrote:
> On Mon, Aug 19, 2024 at 03:20:39PM +1000, NeilBrown wrote:
> > bd_prepare_to_claim() current uses a bit waitqueue with a matching
> > wake_up_bit() in bd_clear_claiming().  However it is really waiting on a
> > "var", not a "bit".
> > 
> > So change to wake_up_var(), and use ___wait_var_event() for the waiting.
> > Using the triple-underscore version allows us to drop the mutex across
> > the schedule() call.
> ....
> > @@ -535,33 +535,23 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder,
> >  		const struct blk_holder_ops *hops)
> >  {
> >  	struct block_device *whole = bdev_whole(bdev);
> > +	int err = 0;
> >  
> >  	if (WARN_ON_ONCE(!holder))
> >  		return -EINVAL;
> > -retry:
> > -	mutex_lock(&bdev_lock);
> > -	/* if someone else claimed, fail */
> > -	if (!bd_may_claim(bdev, holder, hops)) {
> > -		mutex_unlock(&bdev_lock);
> > -		return -EBUSY;
> > -	}
> > -
> > -	/* if claiming is already in progress, wait for it to finish */
> > -	if (whole->bd_claiming) {
> > -		wait_queue_head_t *wq = bit_waitqueue(&whole->bd_claiming, 0);
> > -		DEFINE_WAIT(wait);
> >  
> > -		prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
> > -		mutex_unlock(&bdev_lock);
> > -		schedule();
> > -		finish_wait(wq, &wait);
> > -		goto retry;
> > -	}
> > +	mutex_lock(&bdev_lock);
> > +	___wait_var_event(&whole->bd_claiming,
> > +			  (err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
> > +			  TASK_UNINTERRUPTIBLE, 0, 0,
> > +			  mutex_unlock(&bdev_lock); schedule(); mutex_lock(&bdev_lock));
> 
> That's not an improvement. Instead of nice, obvious, readable code,
> I now have to go look at a macro and manually substitute the
> parameters to work out what this abomination actually does.

Interesting - I thought the function as a whole was more readable this
way.
I agree that the ___wait_var_event macro isn't the best part.
Is your dislike simply that it isn't a macro that you are familar with,
or is there something specific that you don't like?

Suppose we could add a new macro so that it read:

     wait_var_event_mutex(&whole->bd_claiming,
			  (err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
			  &bdev_lock);

would that be less abominable?

Thanks,
NeilBrown

Re: [PATCH 5/9] Block: switch bd_prepare_to_claim to use ___wait_var_event()

Posted by Dave Chinner 1 year, 5 months ago

On Wed, Aug 21, 2024 at 07:52:39AM +1000, NeilBrown wrote:
> On Tue, 20 Aug 2024, Dave Chinner wrote:
> > On Mon, Aug 19, 2024 at 03:20:39PM +1000, NeilBrown wrote:
> > > bd_prepare_to_claim() current uses a bit waitqueue with a matching
> > > wake_up_bit() in bd_clear_claiming().  However it is really waiting on a
> > > "var", not a "bit".
> > > 
> > > So change to wake_up_var(), and use ___wait_var_event() for the waiting.
> > > Using the triple-underscore version allows us to drop the mutex across
> > > the schedule() call.
> > ....
> > > @@ -535,33 +535,23 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder,
> > >  		const struct blk_holder_ops *hops)
> > >  {
> > >  	struct block_device *whole = bdev_whole(bdev);
> > > +	int err = 0;
> > >  
> > >  	if (WARN_ON_ONCE(!holder))
> > >  		return -EINVAL;
> > > -retry:
> > > -	mutex_lock(&bdev_lock);
> > > -	/* if someone else claimed, fail */
> > > -	if (!bd_may_claim(bdev, holder, hops)) {
> > > -		mutex_unlock(&bdev_lock);
> > > -		return -EBUSY;
> > > -	}
> > > -
> > > -	/* if claiming is already in progress, wait for it to finish */
> > > -	if (whole->bd_claiming) {
> > > -		wait_queue_head_t *wq = bit_waitqueue(&whole->bd_claiming, 0);
> > > -		DEFINE_WAIT(wait);
> > >  
> > > -		prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
> > > -		mutex_unlock(&bdev_lock);
> > > -		schedule();
> > > -		finish_wait(wq, &wait);
> > > -		goto retry;
> > > -	}
> > > +	mutex_lock(&bdev_lock);
> > > +	___wait_var_event(&whole->bd_claiming,
> > > +			  (err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
> > > +			  TASK_UNINTERRUPTIBLE, 0, 0,
> > > +			  mutex_unlock(&bdev_lock); schedule(); mutex_lock(&bdev_lock));
> > 
> > That's not an improvement. Instead of nice, obvious, readable code,
> > I now have to go look at a macro and manually substitute the
> > parameters to work out what this abomination actually does.
> 
> Interesting - I thought the function as a whole was more readable this
> way.
> I agree that the ___wait_var_event macro isn't the best part.
> Is your dislike simply that it isn't a macro that you are familar with,
> or is there something specific that you don't like?

It's the encoding of non-trivial logic and code into the macro
parameters that is the problem....

> Suppose we could add a new macro so that it read:
> 
>      wait_var_event_mutex(&whole->bd_claiming,
> 			  (err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
> 			  &bdev_lock);

.... and this still does it. 

In fact, it's worse, because now I have -zero idea- of what locking
is being performed in this case, and so now I definitely have to go
pull that macro apart to understand what this is actually doing.

Complex macros don't make understanding the code easier - they may
make writing the code faster, but that comes at the expense of
clarity and obviousness of the logic flow of the code...

-Dave.

-- 
Dave Chinner
david@fromorbit.com

Re: [PATCH 5/9] Block: switch bd_prepare_to_claim to use ___wait_var_event()

Posted by NeilBrown 1 year, 5 months ago

On Wed, 28 Aug 2024, Dave Chinner wrote:
> On Wed, Aug 21, 2024 at 07:52:39AM +1000, NeilBrown wrote:
> > On Tue, 20 Aug 2024, Dave Chinner wrote:
> > > On Mon, Aug 19, 2024 at 03:20:39PM +1000, NeilBrown wrote:
> > > > bd_prepare_to_claim() current uses a bit waitqueue with a matching
> > > > wake_up_bit() in bd_clear_claiming().  However it is really waiting on a
> > > > "var", not a "bit".
> > > > 
> > > > So change to wake_up_var(), and use ___wait_var_event() for the waiting.
> > > > Using the triple-underscore version allows us to drop the mutex across
> > > > the schedule() call.
> > > ....
> > > > @@ -535,33 +535,23 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder,
> > > >  		const struct blk_holder_ops *hops)
> > > >  {
> > > >  	struct block_device *whole = bdev_whole(bdev);
> > > > +	int err = 0;
> > > >  
> > > >  	if (WARN_ON_ONCE(!holder))
> > > >  		return -EINVAL;
> > > > -retry:
> > > > -	mutex_lock(&bdev_lock);
> > > > -	/* if someone else claimed, fail */
> > > > -	if (!bd_may_claim(bdev, holder, hops)) {
> > > > -		mutex_unlock(&bdev_lock);
> > > > -		return -EBUSY;
> > > > -	}
> > > > -
> > > > -	/* if claiming is already in progress, wait for it to finish */
> > > > -	if (whole->bd_claiming) {
> > > > -		wait_queue_head_t *wq = bit_waitqueue(&whole->bd_claiming, 0);
> > > > -		DEFINE_WAIT(wait);
> > > >  
> > > > -		prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
> > > > -		mutex_unlock(&bdev_lock);
> > > > -		schedule();
> > > > -		finish_wait(wq, &wait);
> > > > -		goto retry;
> > > > -	}
> > > > +	mutex_lock(&bdev_lock);
> > > > +	___wait_var_event(&whole->bd_claiming,
> > > > +			  (err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
> > > > +			  TASK_UNINTERRUPTIBLE, 0, 0,
> > > > +			  mutex_unlock(&bdev_lock); schedule(); mutex_lock(&bdev_lock));
> > > 
> > > That's not an improvement. Instead of nice, obvious, readable code,
> > > I now have to go look at a macro and manually substitute the
> > > parameters to work out what this abomination actually does.
> > 
> > Interesting - I thought the function as a whole was more readable this
> > way.
> > I agree that the ___wait_var_event macro isn't the best part.
> > Is your dislike simply that it isn't a macro that you are familar with,
> > or is there something specific that you don't like?
> 
> It's the encoding of non-trivial logic and code into the macro
> parameters that is the problem....

It would probably make sense to move all the logic into bd_may_claim()
so that it returns:
  -EBUSY if claim cannot succeed
  -EAGAIN if claim might succeed soon, or
  0 if it can be claimed now.
Then the wait becomes:

   wait_var_event_mutex(&whole->bd_claiming,
			bd_may_claim(bdev, holder, hops) != -EAGAIN,
			&bdev_lock);

> 
> > Suppose we could add a new macro so that it read:
> > 
> >      wait_var_event_mutex(&whole->bd_claiming,
> > 			  (err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
> > 			  &bdev_lock);
> 
> .... and this still does it. 
> 
> In fact, it's worse, because now I have -zero idea- of what locking
> is being performed in this case, and so now I definitely have to go
> pull that macro apart to understand what this is actually doing.
> 
> Complex macros don't make understanding the code easier - they may
> make writing the code faster, but that comes at the expense of
> clarity and obviousness of the logic flow of the code...

I think that SIMPLE macros rarely make the code easier to understand -
for precisely the reason that you have to go and find out what the macro
actually does.
Complex macros obviously suffer the same problem but I believe they
bring tangible benefits by making review easier for those who understand
the macros, and consequently reducing bugs.

I'm currently particularly sensitive to this since finding that the
open-coded wait loop in pkt_make_request_write() - which I wrote - is
missing a finish_wait() call.  Ouch.  If there had been a
wait_var_event_spinlock() when I wrote that code, the mistake would not
have happened.

The argument about locking being non-obvious is, I think, doubly true
for wait_on_bit_lock().  But that is still a useful interface.

Thanks,
NeilBrown

[PATCH 1/9] i915: remove wake_up on I915_RESET_MODESET.
[PATCH 2/9] Introduce atomic_dec_and_wake_up_var().
[PATCH 3/9] XFS: use wait_var_event() when waiting of i_pincount.
[PATCH 4/9] Use wait_var_event() instead of I_DIO_WAKEUP
[PATCH 5/9] Block: switch bd_prepare_to_claim to use ___wait_var_event()
[PATCH 6/9] block/pktdvd: switch congestion waiting to ___wait_var_event()
[PATCH 7/9] Improve and expand wake_up_bit() interface.
[PATCH 8/9] Improve and extend wake_up_var() interface.
[PATCH 9/9] Use clear_and_wake_up_bit() where appropriate.