fs/dlm/lockspace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
do_uevent returns the value written to event_done. In case it is a positive
value, new_lockspace would undo all the work, and lockspace would not be
set. __dlm_new_lockspace, however, would treat that positive value as a
success due to commit 8511a2728ab8 ("dlm: fix use count with multiple
joins").
Down the line, device_create_lockspace would pass that NULL lockspace to
dlm_find_lockspace_local, leading to a NULL pointer dereference:
[ 1130.159339] BUG: kernel NULL pointer dereference, address: 0000000000000020
[ 1130.160015] #PF: supervisor write access in kernel mode
[ 1130.160242] #PF: error_code(0x0002) - not-present page
[ 1130.160242] PGD 0 P4D 0
[ 1130.160242] Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 1130.160242] CPU: 4 UID: 0 PID: 213 Comm: dlm Not tainted 6.13.0-rc3-00077-gcbfcf8f40ff5 #318 0df1a232d58f3206f4f481ddfc0108c8b3da89be
[ 1130.160242] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 1130.160242] RIP: 0010:dlm_find_lockspace_local+0x8/0x20
[ 1130.160242] Code: 75 f3 f0 ff 43 20 eb 02 31 db 48 c7 c7 50 d6 85 b0 e8 4c aa 58 01 48 89 d8 5b 5d c3 cc cc cc cc 66 90 0f 1f 44 00 00 48 89 f8 <f0> ff 47 20 c3 cc cc cc cc 66 66 66 66 66 66 2e 0f 1f 84 00 00 00
[ 1130.160242] RSP: 0018:ffffa4bbc0637b08 EFLAGS: 00010246
[ 1130.160242] RAX: 0000000000000000 RBX: ffff8c8fc3cbab18 RCX: 0000000000000000
[ 1130.160242] RDX: 0000000000000006 RSI: ffffffffaeb012ce RDI: 0000000000000000
[ 1130.160242] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[ 1130.160242] R10: 0000000000000000 R11: ffffffffac61e550 R12: ffff8c8fc34b5000
[ 1130.160242] R13: 000000000000007d R14: 0000000000000000 R15: ffff8c8fc3cbab00
[ 1130.160242] FS: 000078790ad92740(0000) GS:ffff8c903dd00000(0000) knlGS:0000000000000000
[ 1130.160242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1130.160242] CR2: 0000000000000020 CR3: 000000000518a001 CR4: 0000000000770eb0
[ 1130.160242] PKRU: 55555554
[ 1130.160242] Call Trace:
[ 1130.160242] <TASK>
[ 1130.160242] ? __die_body+0x64/0xb0
[ 1130.160242] ? page_fault_oops+0x3eb/0x490
[ 1130.160242] ? exc_page_fault+0x4f2/0x6d0
[ 1130.160242] ? asm_exc_page_fault+0x22/0x30
[ 1130.160242] ? stack_trace_save+0x70/0x70
[ 1130.160242] ? dlm_find_lockspace_local+0x8/0x20
[ 1130.160242] device_create_lockspace+0x7c/0x180
[ 1130.160242] device_write+0x252/0x310
[ 1130.160242] vfs_write+0xe3/0x350
[ 1130.160242] ksys_write+0x74/0xe0
[ 1130.160242] do_syscall_64+0x87/0x100
[ 1130.160242] ? _raw_spin_unlock_irqrestore+0x3d/0x60
[ 1130.160242] ? __slab_free+0x2b6/0x320
[ 1130.160242] ? do_sys_openat2+0xae/0xe0
[ 1130.160242] ? kmem_cache_free+0x146/0x380
[ 1130.160242] ? do_syscall_64+0x93/0x100
[ 1130.160242] ? lockdep_hardirqs_on+0x95/0x140
[ 1130.160242] ? syscall_exit_to_user_mode+0x1f0/0x270
[ 1130.160242] ? do_syscall_64+0x93/0x100
[ 1130.160242] ? do_pte_missing+0xfd/0x1410
[ 1130.160242] ? do_pte_missing+0x1e7/0x1410
[ 1130.160242] ? handle_mm_fault+0x7ad/0xa30
[ 1130.160242] ? reacquire_held_locks+0x124/0x1c0
[ 1130.160242] ? vma_end_read+0x12/0xe0
[ 1130.160242] ? exc_page_fault+0x49e/0x6d0
[ 1130.160242] ? lockdep_hardirqs_on+0x95/0x140
[ 1130.160242] ? irqentry_exit_to_user_mode+0x1a7/0x1f0
[ 1130.160242] ? exc_page_fault+0x49e/0x6d0
[ 1130.160242] entry_SYSCALL_64_after_hwframe+0x55/0x5d
[ 1130.160242] RIP: 0033:0x78790aeb0214
[ 1130.160242] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d 55 b1 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
[ 1130.160242] RSP: 002b:00007ffc0309b5e8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 1130.160242] RAX: ffffffffffffffda RBX: 00007ffc0309b7c8 RCX: 000078790aeb0214
[ 1130.160242] RDX: 000000000000007d RSI: 00007ffc0309b610 RDI: 0000000000000003
[ 1130.160242] RBP: 00007ffc0309b6b0 R08: 0000000000000000 R09: 000078790afd1210
[ 1130.160242] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[ 1130.160242] R13: 00007ffc0309b7d8 R14: 000056989ab4bd98 R15: 000078790b003000
[ 1130.160242] </TASK>
[ 1130.160242] Modules linked in: mousedev
[ 1130.160242] CR2: 0000000000000020
[ 1130.160242] ---[ end trace 0000000000000000 ]---
[ 1130.160242] RIP: 0010:dlm_find_lockspace_local+0x8/0x20
[ 1130.160242] Code: 75 f3 f0 ff 43 20 eb 02 31 db 48 c7 c7 50 d6 85 b0 e8 4c aa 58 01 48 89 d8 5b 5d c3 cc cc cc cc 66 90 0f 1f 44 00 00 48 89 f8 <f0> ff 47 20 c3 cc cc cc cc 66 66 66 66 66 66 2e 0f 1f 84 00 00 00
[ 1130.160242] RSP: 0018:ffffa4bbc0637b08 EFLAGS: 00010246
[ 1130.160242] RAX: 0000000000000000 RBX: ffff8c8fc3cbab18 RCX: 0000000000000000
[ 1130.160242] RDX: 0000000000000006 RSI: ffffffffaeb012ce RDI: 0000000000000000
[ 1130.160242] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[ 1130.160242] R10: 0000000000000000 R11: ffffffffac61e550 R12: ffff8c8fc34b5000
[ 1130.160242] R13: 000000000000007d R14: 0000000000000000 R15: ffff8c8fc3cbab00
[ 1130.160242] FS: 000078790ad92740(0000) GS:ffff8c903dd00000(0000) knlGS:0000000000000000
[ 1130.160242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1130.160242] CR2: 0000000000000020 CR3: 000000000518a001 CR4: 0000000000770eb0
[ 1130.160242] PKRU: 55555554
[ 1130.160242] Kernel panic - not syncing: Fatal exception
[ 1130.160242] Kernel Offset: 0x2b400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Treating such positive values as successes prevents the problem. Given this
has been broken for so long, this is unlikely to break userspace
expectations.
Fixes: 8511a2728ab8 ("dlm: fix use count with multiple joins")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
---
fs/dlm/lockspace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c
index 8afac6e2dff0..1929327ffbe1 100644
--- a/fs/dlm/lockspace.c
+++ b/fs/dlm/lockspace.c
@@ -576,7 +576,7 @@ static int new_lockspace(const char *name, const char *cluster,
lockspace to start running (via sysfs) in dlm_ls_start(). */
error = do_uevent(ls, 1);
- if (error)
+ if (error < 0)
goto out_recoverd;
/* wait until recovery is successful or failed */
--
2.34.1
Hi,
On Fri, Dec 20, 2024 at 9:28 AM Thadeu Lima de Souza Cascardo
<cascardo@igalia.com> wrote:
>
> do_uevent returns the value written to event_done. In case it is a positive
> value, new_lockspace would undo all the work, and lockspace would not be
> set. __dlm_new_lockspace, however, would treat that positive value as a
> success due to commit 8511a2728ab8 ("dlm: fix use count with multiple
> joins").
>
> Down the line, device_create_lockspace would pass that NULL lockspace to
> dlm_find_lockspace_local, leading to a NULL pointer dereference:
>
This issue starts to be an issue since commit 4db41bf4f04f ("dlm:
remove ls_local_handle from struct dlm_ls"). Before
dlm_find_lockspace_local() was then returning NULL and
device_create_lockspace() returned -ENOENT as it wasn't found a NULL
lockspace handle, see [0].
After the patch it dlm_find_lockspace_local() will end in a
null-pointer dereference because we don't do this lookup of pointers
when we already should have this pointer already.
This behaviour is kind of weird (as it makes no sense to lookup NULL)
and yes, I didn't see that when I wrote commit 4db41bf4f04f ("dlm:
remove ls_local_handle from struct dlm_ls"). Just this commit signals
us we do something stupid with the error handling.
May I ask you which user space software do you use? As for
dlm_controld I can't see that there is a positive number case for
telling the event_done attribute. [1]
Your patch is fine anyway as a positive event_done should not be
handled as an error (so far I can say that from the current DLM UAPI
spec). We might need to review the whole error handling here again as
the caller is only interested if it failed or not. Even kernel
lockspaces clear positive error codes to zero whereas user space (our
case) doesn't do that.
- Alex
[0] https://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git/tree/fs/dlm/user.c?h=dlm-6.9#n431
[1] https://pagure.io/dlm/blob/main/f/dlm_controld/cpg.c#_1811
On Mon, Jan 06, 2025 at 11:14:35AM -0500, Alexander Aring wrote:
> Hi,
>
> On Fri, Dec 20, 2024 at 9:28 AM Thadeu Lima de Souza Cascardo
> <cascardo@igalia.com> wrote:
> >
> > do_uevent returns the value written to event_done. In case it is a positive
> > value, new_lockspace would undo all the work, and lockspace would not be
> > set. __dlm_new_lockspace, however, would treat that positive value as a
> > success due to commit 8511a2728ab8 ("dlm: fix use count with multiple
> > joins").
> >
> > Down the line, device_create_lockspace would pass that NULL lockspace to
> > dlm_find_lockspace_local, leading to a NULL pointer dereference:
> >
>
> This issue starts to be an issue since commit 4db41bf4f04f ("dlm:
> remove ls_local_handle from struct dlm_ls"). Before
> dlm_find_lockspace_local() was then returning NULL and
> device_create_lockspace() returned -ENOENT as it wasn't found a NULL
> lockspace handle, see [0].
> After the patch it dlm_find_lockspace_local() will end in a
> null-pointer dereference because we don't do this lookup of pointers
> when we already should have this pointer already.
> This behaviour is kind of weird (as it makes no sense to lookup NULL)
> and yes, I didn't see that when I wrote commit 4db41bf4f04f ("dlm:
> remove ls_local_handle from struct dlm_ls"). Just this commit signals
> us we do something stupid with the error handling.
>
> May I ask you which user space software do you use? As for
> dlm_controld I can't see that there is a positive number case for
> telling the event_done attribute. [1]
I was just playing with it manually, that is, without dlm_controld. I was
writing down into configfs and sysfs manually while issuing ioctls and then
tripped over this NULL pointer dereference.
I see how commit 4db41bf4f04f ("dlm: remove ls_local_handle from struct
dlm_ls") leads to the NPD. In any case, I suppose the Fixes line could
either be the one I pointed to (since that is what leads to such positive
return values to undo lockspace creation and still be interpreted as
success) or 4db41bf4f04f, specially if we want to restrict the backports to
only the kernels with this latter commit.
Regards.
Cascardo.
>
> Your patch is fine anyway as a positive event_done should not be
> handled as an error (so far I can say that from the current DLM UAPI
> spec). We might need to review the whole error handling here again as
> the caller is only interested if it failed or not. Even kernel
> lockspaces clear positive error codes to zero whereas user space (our
> case) doesn't do that.
>
> - Alex
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git/tree/fs/dlm/user.c?h=dlm-6.9#n431
> [1] https://pagure.io/dlm/blob/main/f/dlm_controld/cpg.c#_1811
>
Hi,
On Tue, Jan 7, 2025 at 1:19 PM Thadeu Lima de Souza Cascardo
<cascardo@igalia.com> wrote:
>
> On Mon, Jan 06, 2025 at 11:14:35AM -0500, Alexander Aring wrote:
> > Hi,
> >
> > On Fri, Dec 20, 2024 at 9:28 AM Thadeu Lima de Souza Cascardo
> > <cascardo@igalia.com> wrote:
> > >
> > > do_uevent returns the value written to event_done. In case it is a positive
> > > value, new_lockspace would undo all the work, and lockspace would not be
> > > set. __dlm_new_lockspace, however, would treat that positive value as a
> > > success due to commit 8511a2728ab8 ("dlm: fix use count with multiple
> > > joins").
> > >
> > > Down the line, device_create_lockspace would pass that NULL lockspace to
> > > dlm_find_lockspace_local, leading to a NULL pointer dereference:
> > >
> >
> > This issue starts to be an issue since commit 4db41bf4f04f ("dlm:
> > remove ls_local_handle from struct dlm_ls"). Before
> > dlm_find_lockspace_local() was then returning NULL and
> > device_create_lockspace() returned -ENOENT as it wasn't found a NULL
> > lockspace handle, see [0].
> > After the patch it dlm_find_lockspace_local() will end in a
> > null-pointer dereference because we don't do this lookup of pointers
> > when we already should have this pointer already.
> > This behaviour is kind of weird (as it makes no sense to lookup NULL)
> > and yes, I didn't see that when I wrote commit 4db41bf4f04f ("dlm:
> > remove ls_local_handle from struct dlm_ls"). Just this commit signals
> > us we do something stupid with the error handling.
> >
> > May I ask you which user space software do you use? As for
> > dlm_controld I can't see that there is a positive number case for
> > telling the event_done attribute. [1]
>
> I was just playing with it manually, that is, without dlm_controld. I was
> writing down into configfs and sysfs manually while issuing ioctls and then
> tripped over this NULL pointer dereference.
>
ok, thanks... I am worried now you probably see more issues. :)
> I see how commit 4db41bf4f04f ("dlm: remove ls_local_handle from struct
> dlm_ls") leads to the NPD. In any case, I suppose the Fixes line could
> either be the one I pointed to (since that is what leads to such positive
> return values to undo lockspace creation and still be interpreted as
> success) or 4db41bf4f04f, specially if we want to restrict the backports to
> only the kernels with this latter commit.
>
yes, it was already broken before. I agree.
Acked-by: Alexander Aring <aahringo@redhat.com>
- Alex
On Wed, Jan 08, 2025 at 09:27:31AM -0500, Alexander Aring wrote:
> Hi,
>
> On Tue, Jan 7, 2025 at 1:19 PM Thadeu Lima de Souza Cascardo
> <cascardo@igalia.com> wrote:
> >
> > On Mon, Jan 06, 2025 at 11:14:35AM -0500, Alexander Aring wrote:
> > > Hi,
> > >
> > > On Fri, Dec 20, 2024 at 9:28 AM Thadeu Lima de Souza Cascardo
> > > <cascardo@igalia.com> wrote:
> > > >
> > > > do_uevent returns the value written to event_done. In case it is a positive
> > > > value, new_lockspace would undo all the work, and lockspace would not be
> > > > set. __dlm_new_lockspace, however, would treat that positive value as a
> > > > success due to commit 8511a2728ab8 ("dlm: fix use count with multiple
> > > > joins").
> > > >
> > > > Down the line, device_create_lockspace would pass that NULL lockspace to
> > > > dlm_find_lockspace_local, leading to a NULL pointer dereference:
> > > >
> > >
> > > This issue starts to be an issue since commit 4db41bf4f04f ("dlm:
> > > remove ls_local_handle from struct dlm_ls"). Before
> > > dlm_find_lockspace_local() was then returning NULL and
> > > device_create_lockspace() returned -ENOENT as it wasn't found a NULL
> > > lockspace handle, see [0].
> > > After the patch it dlm_find_lockspace_local() will end in a
> > > null-pointer dereference because we don't do this lookup of pointers
> > > when we already should have this pointer already.
> > > This behaviour is kind of weird (as it makes no sense to lookup NULL)
> > > and yes, I didn't see that when I wrote commit 4db41bf4f04f ("dlm:
> > > remove ls_local_handle from struct dlm_ls"). Just this commit signals
> > > us we do something stupid with the error handling.
> > >
> > > May I ask you which user space software do you use? As for
> > > dlm_controld I can't see that there is a positive number case for
> > > telling the event_done attribute. [1]
> >
> > I was just playing with it manually, that is, without dlm_controld. I was
> > writing down into configfs and sysfs manually while issuing ioctls and then
> > tripped over this NULL pointer dereference.
> >
>
> ok, thanks... I am worried now you probably see more issues. :)
>
I will try to send fixes if I find anything else. ;-)
Thanks for the review.
Cascardo.
> > I see how commit 4db41bf4f04f ("dlm: remove ls_local_handle from struct
> > dlm_ls") leads to the NPD. In any case, I suppose the Fixes line could
> > either be the one I pointed to (since that is what leads to such positive
> > return values to undo lockspace creation and still be interpreted as
> > success) or 4db41bf4f04f, specially if we want to restrict the backports to
> > only the kernels with this latter commit.
> >
>
> yes, it was already broken before. I agree.
>
> Acked-by: Alexander Aring <aahringo@redhat.com>
>
> - Alex
>
>
© 2016 - 2025 Red Hat, Inc.