bd_prepare_to_claim() current uses a bit waitqueue with a matching
wake_up_bit() in bd_clear_claiming(). However it is really waiting on a
"var", not a "bit".
So change to wake_up_var(), and use ___wait_var_event() for the waiting.
Using the triple-underscore version allows us to drop the mutex across
the schedule() call.
Add a missing memory barrier before the wake_up_var() call.
Signed-off-by: NeilBrown <neilb@suse.de>
---
block/bdev.c | 49 ++++++++++++++++++++-----------------------------
1 file changed, 20 insertions(+), 29 deletions(-)
diff --git a/block/bdev.c b/block/bdev.c
index c5507b6f63b8..d804c91c651b 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -487,10 +487,10 @@ long nr_blockdev_pages(void)
* Test whether @bdev can be claimed by @holder.
*
* RETURNS:
- * %true if @bdev can be claimed, %false otherwise.
+ * %0 if @bdev can be claimed, %-EBUSY otherwise.
*/
-static bool bd_may_claim(struct block_device *bdev, void *holder,
- const struct blk_holder_ops *hops)
+static int bd_may_claim(struct block_device *bdev, void *holder,
+ const struct blk_holder_ops *hops)
{
struct block_device *whole = bdev_whole(bdev);
@@ -503,9 +503,9 @@ static bool bd_may_claim(struct block_device *bdev, void *holder,
if (bdev->bd_holder == holder) {
if (WARN_ON_ONCE(bdev->bd_holder_ops != hops))
return false;
- return true;
+ return 0;
}
- return false;
+ return -EBUSY;
}
/*
@@ -514,8 +514,8 @@ static bool bd_may_claim(struct block_device *bdev, void *holder,
*/
if (whole != bdev &&
whole->bd_holder && whole->bd_holder != bd_may_claim)
- return false;
- return true;
+ return -EBUSY;
+ return 0;
}
/**
@@ -535,33 +535,23 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder,
const struct blk_holder_ops *hops)
{
struct block_device *whole = bdev_whole(bdev);
+ int err = 0;
if (WARN_ON_ONCE(!holder))
return -EINVAL;
-retry:
- mutex_lock(&bdev_lock);
- /* if someone else claimed, fail */
- if (!bd_may_claim(bdev, holder, hops)) {
- mutex_unlock(&bdev_lock);
- return -EBUSY;
- }
-
- /* if claiming is already in progress, wait for it to finish */
- if (whole->bd_claiming) {
- wait_queue_head_t *wq = bit_waitqueue(&whole->bd_claiming, 0);
- DEFINE_WAIT(wait);
- prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
- mutex_unlock(&bdev_lock);
- schedule();
- finish_wait(wq, &wait);
- goto retry;
- }
+ mutex_lock(&bdev_lock);
+ ___wait_var_event(&whole->bd_claiming,
+ (err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
+ TASK_UNINTERRUPTIBLE, 0, 0,
+ mutex_unlock(&bdev_lock); schedule(); mutex_lock(&bdev_lock));
- /* yay, all mine */
- whole->bd_claiming = holder;
+ /* if someone else claimed, fail */
+ if (!err)
+ /* yay, all mine */
+ whole->bd_claiming = holder;
mutex_unlock(&bdev_lock);
- return 0;
+ return err;
}
EXPORT_SYMBOL_GPL(bd_prepare_to_claim); /* only for the loop driver */
@@ -571,7 +561,8 @@ static void bd_clear_claiming(struct block_device *whole, void *holder)
/* tell others that we're done */
BUG_ON(whole->bd_claiming != holder);
whole->bd_claiming = NULL;
- wake_up_bit(&whole->bd_claiming, 0);
+ smp_mb();
+ wake_up_var(&whole->bd_claiming);
}
/**
--
2.44.0
Hello,
kernel test robot noticed "kernel_BUG_at_block/bdev.c" on:
commit: b80d7798a6f9db2c094014570a73728ace8d844d ("[PATCH 5/9] Block: switch bd_prepare_to_claim to use ___wait_var_event()")
url: https://github.com/intel-lab-lkp/linux/commits/NeilBrown/i915-remove-wake_up-on-I915_RESET_MODESET/20240819-134414
base: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
patch link: https://lore.kernel.org/all/20240819053605.11706-6-neilb@suse.de/
patch subject: [PATCH 5/9] Block: switch bd_prepare_to_claim to use ___wait_var_event()
in testcase: boot
compiler: clang-18
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
+------------------------------------------+------------+------------+
| | 30a670cac3 | b80d7798a6 |
+------------------------------------------+------------+------------+
| boot_successes | 9 | 0 |
| boot_failures | 0 | 9 |
| kernel_BUG_at_block/bdev.c | 0 | 9 |
| Oops:invalid_opcode:#[##]PREEMPT_SMP_PTI | 0 | 9 |
| RIP:bd_finish_claiming | 0 | 9 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 9 |
+------------------------------------------+------------+------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202408211439.954a6d41-lkp@intel.com
[ 8.768327][ T2733] ------------[ cut here ]------------
[ 8.768333][ T2733] kernel BUG at block/bdev.c:583!
[ 8.768342][ T2733] Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 8.768348][ T2733] CPU: 1 UID: 0 PID: 2733 Comm: cdrom_id Not tainted 6.11.0-rc3-00005-gb80d7798a6f9 #1
[ 8.768352][ T2733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 8.768355][ T2733] RIP: 0010:bd_finish_claiming (block/bdev.c:583)
[ 8.768388][ T2733] Code: 48 c7 03 00 00 00 00 f0 83 44 24 fc 00 48 89 df e8 0f bc b1 ff 48 c7 c7 00 97 a0 82 5b 41 5c 41 5d 41 5e 41 5f e9 5a aa 95 00 <0f> 0b 0f 0b 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90
All code
========
0: 48 c7 03 00 00 00 00 movq $0x0,(%rbx)
7: f0 83 44 24 fc 00 lock addl $0x0,-0x4(%rsp)
d: 48 89 df mov %rbx,%rdi
10: e8 0f bc b1 ff call 0xffffffffffb1bc24
15: 48 c7 c7 00 97 a0 82 mov $0xffffffff82a09700,%rdi
1c: 5b pop %rbx
1d: 41 5c pop %r12
1f: 41 5d pop %r13
21: 41 5e pop %r14
23: 41 5f pop %r15
25: e9 5a aa 95 00 jmp 0x95aa84
2a:* 0f 0b ud2 <-- trapping instruction
2c: 0f 0b ud2
2e: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
34: 90 nop
35: 90 nop
36: 90 nop
37: 90 nop
38: 90 nop
39: 90 nop
3a: 90 nop
3b: 90 nop
3c: 90 nop
3d: 90 nop
3e: 90 nop
3f: 90 nop
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 0f 0b ud2
4: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
a: 90 nop
b: 90 nop
c: 90 nop
d: 90 nop
e: 90 nop
f: 90 nop
10: 90 nop
11: 90 nop
12: 90 nop
13: 90 nop
14: 90 nop
15: 90 nop
[ 8.768392][ T2733] RSP: 0000:ffffc90000a5bc00 EFLAGS: 00010246
[ 8.768396][ T2733] RAX: 0000000000000000 RBX: ffff888125940000 RCX: 0000000000000000
[ 8.768398][ T2733] RDX: 0000000000000000 RSI: ffff88812a326800 RDI: ffff888125940000
[ 8.768400][ T2733] RBP: 000000000000000d R08: 0000000000000004 R09: 00000002f1ee446d
[ 8.768402][ T2733] R10: 00646b636f6c6200 R11: ffffffffa0015f60 R12: ffff888125940000
[ 8.768404][ T2733] R13: 0000000000000000 R14: ffff88812a326800 R15: 0000000000000000
[ 8.768407][ T2733] FS: 0000000000000000(0000) GS:ffff88842fd00000(0063) knlGS:00000000f7d406c0
[ 8.768410][ T2733] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 8.768412][ T2733] CR2: 00000000f7d0315c CR3: 000000012a244000 CR4: 00000000000406f0
[ 8.768417][ T2733] Call Trace:
[ 8.769866][ T2733] <TASK>
[ 8.769875][ T2733] ? __die_body (arch/x86/kernel/dumpstack.c:421)
[ 8.769884][ T2733] ? die (arch/x86/kernel/dumpstack.c:? arch/x86/kernel/dumpstack.c:447)
[ 8.769888][ T2733] ? do_trap (arch/x86/kernel/traps.c:129)
[ 8.769893][ T2733] ? bd_finish_claiming (block/bdev.c:583)
[ 8.769898][ T2733] ? do_error_trap (arch/x86/kernel/traps.c:175)
[ 8.769902][ T2733] ? bd_finish_claiming (block/bdev.c:583)
[ 8.769905][ T2733] ? handle_invalid_op (arch/x86/kernel/traps.c:212)
[ 8.769909][ T2733] ? bd_finish_claiming (block/bdev.c:583)
[ 8.769912][ T2733] ? exc_invalid_op (arch/x86/kernel/traps.c:267)
[ 8.769917][ T2733] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
[ 8.769926][ T2733] ? __pfx_sr_open (drivers/scsi/sr.c:593) sr_mod
[ 8.769932][ T2733] ? bd_finish_claiming (block/bdev.c:583)
[ 8.769936][ T2733] bdev_open (block/bdev.c:914)
[ 8.769940][ T2733] ? iput (fs/inode.c:1821)
[ 8.769945][ T2733] blkdev_open (block/fops.c:631)
[ 8.769949][ T2733] ? __pfx_blkdev_open (block/fops.c:610)
[ 8.769952][ T2733] do_dentry_open (fs/open.c:959)
[ 8.769958][ T2733] vfs_open (fs/open.c:1089)
[ 8.769962][ T2733] path_openat (fs/namei.c:3727)
[ 8.769966][ T2733] ? call_rcu (kernel/rcu/tree.c:2996)
[ 8.769972][ T2733] do_filp_open (fs/namei.c:3913)
[ 8.769978][ T2733] do_sys_openat2 (fs/open.c:1416)
[ 8.769982][ T2733] do_sys_open (fs/open.c:1431)
[ 8.769986][ T2733] do_int80_emulation (arch/x86/entry/common.c:?)
[ 8.769990][ T2733] ? irqentry_exit_to_user_mode (arch/x86/include/asm/processor.h:702 arch/x86/include/asm/entry-common.h:91 include/linux/entry-common.h:364 kernel/entry/common.c:233)
[ 8.769994][ T2733] asm_int80_emulation (arch/x86/include/asm/idtentry.h:626)
[ 8.769999][ T2733] RIP: 0023:0xf7f111b2
[ 8.770004][ T2733] Code: 89 c2 31 c0 89 d7 f3 aa 8b 44 24 1c 89 30 c6 40 04 00 83 c4 2c 89 f0 5b 5e 5f 5d c3 90 90 90 90 90 90 90 90 90 90 90 90 cd 80 <c3> 8d b6 00 00 00 00 8d bc 27 00 00 00 00 8b 1c 24 c3 8d b6 00 00
All code
========
0: 89 c2 mov %eax,%edx
2: 31 c0 xor %eax,%eax
4: 89 d7 mov %edx,%edi
6: f3 aa rep stos %al,%es:(%rdi)
8: 8b 44 24 1c mov 0x1c(%rsp),%eax
c: 89 30 mov %esi,(%rax)
e: c6 40 04 00 movb $0x0,0x4(%rax)
12: 83 c4 2c add $0x2c,%esp
15: 89 f0 mov %esi,%eax
17: 5b pop %rbx
18: 5e pop %rsi
19: 5f pop %rdi
1a: 5d pop %rbp
1b: c3 ret
1c: 90 nop
1d: 90 nop
1e: 90 nop
1f: 90 nop
20: 90 nop
21: 90 nop
22: 90 nop
23: 90 nop
24: 90 nop
25: 90 nop
26: 90 nop
27: 90 nop
28: cd 80 int $0x80
2a:* c3 ret <-- trapping instruction
2b: 8d b6 00 00 00 00 lea 0x0(%rsi),%esi
31: 8d bc 27 00 00 00 00 lea 0x0(%rdi,%riz,1),%edi
38: 8b 1c 24 mov (%rsp),%ebx
3b: c3 ret
3c: 8d .byte 0x8d
3d: b6 00 mov $0x0,%dh
...
Code starting with the faulting instruction
===========================================
0: c3 ret
1: 8d b6 00 00 00 00 lea 0x0(%rsi),%esi
7: 8d bc 27 00 00 00 00 lea 0x0(%rdi,%riz,1),%edi
e: 8b 1c 24 mov (%rsp),%ebx
11: c3 ret
12: 8d .byte 0x8d
13: b6 00 mov $0x0,%dh
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240821/202408211439.954a6d41-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Mon, 19 Aug 2024 15:20:39 +1000 NeilBrown <neilb@suse.de>
>
> @@ -535,33 +535,23 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder,
> const struct blk_holder_ops *hops)
> {
> struct block_device *whole = bdev_whole(bdev);
> + int err = 0;
>
> if (WARN_ON_ONCE(!holder))
> return -EINVAL;
> -retry:
> - mutex_lock(&bdev_lock);
> - /* if someone else claimed, fail */
> - if (!bd_may_claim(bdev, holder, hops)) {
> - mutex_unlock(&bdev_lock);
> - return -EBUSY;
> - }
> -
> - /* if claiming is already in progress, wait for it to finish */
> - if (whole->bd_claiming) {
> - wait_queue_head_t *wq = bit_waitqueue(&whole->bd_claiming, 0);
> - DEFINE_WAIT(wait);
>
> - prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
> - mutex_unlock(&bdev_lock);
> - schedule();
> - finish_wait(wq, &wait);
> - goto retry;
> - }
> + mutex_lock(&bdev_lock);
> + ___wait_var_event(&whole->bd_claiming,
> + (err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
> + TASK_UNINTERRUPTIBLE, 0, 0,
> + mutex_unlock(&bdev_lock); schedule(); mutex_lock(&bdev_lock));
>
At the first glance you add the coding pattern not recommended for the block
directory. Second, you abuse ___wait_var_event() simply because it is available.
> - /* yay, all mine */
> - whole->bd_claiming = holder;
> + /* if someone else claimed, fail */
> + if (!err)
> + /* yay, all mine */
> + whole->bd_claiming = holder;
> mutex_unlock(&bdev_lock);
> - return 0;
> + return err;
> }
> EXPORT_SYMBOL_GPL(bd_prepare_to_claim); /* only for the loop driver */
>
> @@ -571,7 +561,8 @@ static void bd_clear_claiming(struct block_device *whole, void *holder)
> /* tell others that we're done */
> BUG_ON(whole->bd_claiming != holder);
> whole->bd_claiming = NULL;
> - wake_up_bit(&whole->bd_claiming, 0);
> + smp_mb();
> + wake_up_var(&whole->bd_claiming);
Third, worse, you have no real idea why mb is needed.
> }
>
> /**
> --
> 2.44.0
On Mon, Aug 19, 2024 at 03:20:39PM +1000, NeilBrown wrote:
> bd_prepare_to_claim() current uses a bit waitqueue with a matching
> wake_up_bit() in bd_clear_claiming(). However it is really waiting on a
> "var", not a "bit".
>
> So change to wake_up_var(), and use ___wait_var_event() for the waiting.
> Using the triple-underscore version allows us to drop the mutex across
> the schedule() call.
....
> @@ -535,33 +535,23 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder,
> const struct blk_holder_ops *hops)
> {
> struct block_device *whole = bdev_whole(bdev);
> + int err = 0;
>
> if (WARN_ON_ONCE(!holder))
> return -EINVAL;
> -retry:
> - mutex_lock(&bdev_lock);
> - /* if someone else claimed, fail */
> - if (!bd_may_claim(bdev, holder, hops)) {
> - mutex_unlock(&bdev_lock);
> - return -EBUSY;
> - }
> -
> - /* if claiming is already in progress, wait for it to finish */
> - if (whole->bd_claiming) {
> - wait_queue_head_t *wq = bit_waitqueue(&whole->bd_claiming, 0);
> - DEFINE_WAIT(wait);
>
> - prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
> - mutex_unlock(&bdev_lock);
> - schedule();
> - finish_wait(wq, &wait);
> - goto retry;
> - }
> + mutex_lock(&bdev_lock);
> + ___wait_var_event(&whole->bd_claiming,
> + (err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
> + TASK_UNINTERRUPTIBLE, 0, 0,
> + mutex_unlock(&bdev_lock); schedule(); mutex_lock(&bdev_lock));
That's not an improvement. Instead of nice, obvious, readable code,
I now have to go look at a macro and manually substitute the
parameters to work out what this abomination actually does.
-Dave.
--
Dave Chinner
david@fromorbit.com
On Tue, 20 Aug 2024, Dave Chinner wrote:
> On Mon, Aug 19, 2024 at 03:20:39PM +1000, NeilBrown wrote:
> > bd_prepare_to_claim() current uses a bit waitqueue with a matching
> > wake_up_bit() in bd_clear_claiming(). However it is really waiting on a
> > "var", not a "bit".
> >
> > So change to wake_up_var(), and use ___wait_var_event() for the waiting.
> > Using the triple-underscore version allows us to drop the mutex across
> > the schedule() call.
> ....
> > @@ -535,33 +535,23 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder,
> > const struct blk_holder_ops *hops)
> > {
> > struct block_device *whole = bdev_whole(bdev);
> > + int err = 0;
> >
> > if (WARN_ON_ONCE(!holder))
> > return -EINVAL;
> > -retry:
> > - mutex_lock(&bdev_lock);
> > - /* if someone else claimed, fail */
> > - if (!bd_may_claim(bdev, holder, hops)) {
> > - mutex_unlock(&bdev_lock);
> > - return -EBUSY;
> > - }
> > -
> > - /* if claiming is already in progress, wait for it to finish */
> > - if (whole->bd_claiming) {
> > - wait_queue_head_t *wq = bit_waitqueue(&whole->bd_claiming, 0);
> > - DEFINE_WAIT(wait);
> >
> > - prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
> > - mutex_unlock(&bdev_lock);
> > - schedule();
> > - finish_wait(wq, &wait);
> > - goto retry;
> > - }
> > + mutex_lock(&bdev_lock);
> > + ___wait_var_event(&whole->bd_claiming,
> > + (err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
> > + TASK_UNINTERRUPTIBLE, 0, 0,
> > + mutex_unlock(&bdev_lock); schedule(); mutex_lock(&bdev_lock));
>
> That's not an improvement. Instead of nice, obvious, readable code,
> I now have to go look at a macro and manually substitute the
> parameters to work out what this abomination actually does.
Interesting - I thought the function as a whole was more readable this
way.
I agree that the ___wait_var_event macro isn't the best part.
Is your dislike simply that it isn't a macro that you are familar with,
or is there something specific that you don't like?
Suppose we could add a new macro so that it read:
wait_var_event_mutex(&whole->bd_claiming,
(err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
&bdev_lock);
would that be less abominable?
Thanks,
NeilBrown
On Wed, Aug 21, 2024 at 07:52:39AM +1000, NeilBrown wrote:
> On Tue, 20 Aug 2024, Dave Chinner wrote:
> > On Mon, Aug 19, 2024 at 03:20:39PM +1000, NeilBrown wrote:
> > > bd_prepare_to_claim() current uses a bit waitqueue with a matching
> > > wake_up_bit() in bd_clear_claiming(). However it is really waiting on a
> > > "var", not a "bit".
> > >
> > > So change to wake_up_var(), and use ___wait_var_event() for the waiting.
> > > Using the triple-underscore version allows us to drop the mutex across
> > > the schedule() call.
> > ....
> > > @@ -535,33 +535,23 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder,
> > > const struct blk_holder_ops *hops)
> > > {
> > > struct block_device *whole = bdev_whole(bdev);
> > > + int err = 0;
> > >
> > > if (WARN_ON_ONCE(!holder))
> > > return -EINVAL;
> > > -retry:
> > > - mutex_lock(&bdev_lock);
> > > - /* if someone else claimed, fail */
> > > - if (!bd_may_claim(bdev, holder, hops)) {
> > > - mutex_unlock(&bdev_lock);
> > > - return -EBUSY;
> > > - }
> > > -
> > > - /* if claiming is already in progress, wait for it to finish */
> > > - if (whole->bd_claiming) {
> > > - wait_queue_head_t *wq = bit_waitqueue(&whole->bd_claiming, 0);
> > > - DEFINE_WAIT(wait);
> > >
> > > - prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
> > > - mutex_unlock(&bdev_lock);
> > > - schedule();
> > > - finish_wait(wq, &wait);
> > > - goto retry;
> > > - }
> > > + mutex_lock(&bdev_lock);
> > > + ___wait_var_event(&whole->bd_claiming,
> > > + (err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
> > > + TASK_UNINTERRUPTIBLE, 0, 0,
> > > + mutex_unlock(&bdev_lock); schedule(); mutex_lock(&bdev_lock));
> >
> > That's not an improvement. Instead of nice, obvious, readable code,
> > I now have to go look at a macro and manually substitute the
> > parameters to work out what this abomination actually does.
>
> Interesting - I thought the function as a whole was more readable this
> way.
> I agree that the ___wait_var_event macro isn't the best part.
> Is your dislike simply that it isn't a macro that you are familar with,
> or is there something specific that you don't like?
It's the encoding of non-trivial logic and code into the macro
parameters that is the problem....
> Suppose we could add a new macro so that it read:
>
> wait_var_event_mutex(&whole->bd_claiming,
> (err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
> &bdev_lock);
.... and this still does it.
In fact, it's worse, because now I have -zero idea- of what locking
is being performed in this case, and so now I definitely have to go
pull that macro apart to understand what this is actually doing.
Complex macros don't make understanding the code easier - they may
make writing the code faster, but that comes at the expense of
clarity and obviousness of the logic flow of the code...
-Dave.
--
Dave Chinner
david@fromorbit.com
On Wed, 28 Aug 2024, Dave Chinner wrote:
> On Wed, Aug 21, 2024 at 07:52:39AM +1000, NeilBrown wrote:
> > On Tue, 20 Aug 2024, Dave Chinner wrote:
> > > On Mon, Aug 19, 2024 at 03:20:39PM +1000, NeilBrown wrote:
> > > > bd_prepare_to_claim() current uses a bit waitqueue with a matching
> > > > wake_up_bit() in bd_clear_claiming(). However it is really waiting on a
> > > > "var", not a "bit".
> > > >
> > > > So change to wake_up_var(), and use ___wait_var_event() for the waiting.
> > > > Using the triple-underscore version allows us to drop the mutex across
> > > > the schedule() call.
> > > ....
> > > > @@ -535,33 +535,23 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder,
> > > > const struct blk_holder_ops *hops)
> > > > {
> > > > struct block_device *whole = bdev_whole(bdev);
> > > > + int err = 0;
> > > >
> > > > if (WARN_ON_ONCE(!holder))
> > > > return -EINVAL;
> > > > -retry:
> > > > - mutex_lock(&bdev_lock);
> > > > - /* if someone else claimed, fail */
> > > > - if (!bd_may_claim(bdev, holder, hops)) {
> > > > - mutex_unlock(&bdev_lock);
> > > > - return -EBUSY;
> > > > - }
> > > > -
> > > > - /* if claiming is already in progress, wait for it to finish */
> > > > - if (whole->bd_claiming) {
> > > > - wait_queue_head_t *wq = bit_waitqueue(&whole->bd_claiming, 0);
> > > > - DEFINE_WAIT(wait);
> > > >
> > > > - prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
> > > > - mutex_unlock(&bdev_lock);
> > > > - schedule();
> > > > - finish_wait(wq, &wait);
> > > > - goto retry;
> > > > - }
> > > > + mutex_lock(&bdev_lock);
> > > > + ___wait_var_event(&whole->bd_claiming,
> > > > + (err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
> > > > + TASK_UNINTERRUPTIBLE, 0, 0,
> > > > + mutex_unlock(&bdev_lock); schedule(); mutex_lock(&bdev_lock));
> > >
> > > That's not an improvement. Instead of nice, obvious, readable code,
> > > I now have to go look at a macro and manually substitute the
> > > parameters to work out what this abomination actually does.
> >
> > Interesting - I thought the function as a whole was more readable this
> > way.
> > I agree that the ___wait_var_event macro isn't the best part.
> > Is your dislike simply that it isn't a macro that you are familar with,
> > or is there something specific that you don't like?
>
> It's the encoding of non-trivial logic and code into the macro
> parameters that is the problem....
It would probably make sense to move all the logic into bd_may_claim()
so that it returns:
-EBUSY if claim cannot succeed
-EAGAIN if claim might succeed soon, or
0 if it can be claimed now.
Then the wait becomes:
wait_var_event_mutex(&whole->bd_claiming,
bd_may_claim(bdev, holder, hops) != -EAGAIN,
&bdev_lock);
>
> > Suppose we could add a new macro so that it read:
> >
> > wait_var_event_mutex(&whole->bd_claiming,
> > (err = bd_may_claim(bdev, holder, hops)) != 0 || !whole->bd_claiming,
> > &bdev_lock);
>
> .... and this still does it.
>
> In fact, it's worse, because now I have -zero idea- of what locking
> is being performed in this case, and so now I definitely have to go
> pull that macro apart to understand what this is actually doing.
>
> Complex macros don't make understanding the code easier - they may
> make writing the code faster, but that comes at the expense of
> clarity and obviousness of the logic flow of the code...
I think that SIMPLE macros rarely make the code easier to understand -
for precisely the reason that you have to go and find out what the macro
actually does.
Complex macros obviously suffer the same problem but I believe they
bring tangible benefits by making review easier for those who understand
the macros, and consequently reducing bugs.
I'm currently particularly sensitive to this since finding that the
open-coded wait loop in pkt_make_request_write() - which I wrote - is
missing a finish_wait() call. Ouch. If there had been a
wait_var_event_spinlock() when I wrote that code, the mistake would not
have happened.
The argument about locking being non-obvious is, I think, doubly true
for wait_on_bit_lock(). But that is still a useful interface.
Thanks,
NeilBrown
© 2016 - 2026 Red Hat, Inc.