A few more minor updates to the set to fix some small-ish bugs, and do a
bit of cleanup. This seems to test OK for me so far.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Changes in v3:
- add patch to convert se_flags to single se_dead bool
- fix off-by-one bug in handling of NFSD_BC_SLOT_TABLE_MAX
- don't reject target highest slot value of 0
- Link to v2: https://lore.kernel.org/r/20241029-bcwide-v2-1-e9010b6ef55d@kernel.org

Changes in v2:
- take cl_lock when fetching fields from session to be encoded
- use fls() instead of bespoke highest_unset_index()
- rename variables in several functions with more descriptive names
- clamp limit of for loop in update_cb_slot_table()
- re-add missing rpc_wake_up_queued_task() call
- fix slotid check in decode_cb_sequence4resok()
- add new per-session spinlock

---
Jeff Layton (2):
      nfsd: make nfsd4_session->se_flags a bool
      nfsd: allow for up to 32 callback session slots

 fs/nfsd/nfs4callback.c | 108 ++++++++++++++++++++++++++++++++++---------------
 fs/nfsd/nfs4state.c    |  17 +++++---
 fs/nfsd/state.h        |  19 ++++-----
 fs/nfsd/trace.h        |   2 +-
 4 files changed, 98 insertions(+), 48 deletions(-)
---
base-commit: 06c049d2a81a81f01ff072c6519d0c38b646b550
change-id: 20241025-bcwide-6bd7e4b63db2

Best regards,
-- 
Jeff Layton <jlayton@kernel.org>
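As a rough, self-contained model of the slot accounting the cover letter describes — a 32-entry callback slot table tracked as a bitmap, with fls()-style arithmetic for the highest slot in use — the sketch below illustrates the idea in plain C. The names slot_alloc(), slot_free() and highest_used_slot() are illustrative only, not the symbols used in the nfsd patches, and the bitmap layout is an assumption about how such accounting might look.

/*
 * Userspace model of a 32-entry callback slot table, assuming the
 * server tracks busy slots in a single 32-bit word. Illustrative only.
 */
#include <stdint.h>
#include <stdio.h>

#define BC_SLOT_TABLE_SIZE 32

static uint32_t slot_busy;	/* bit N set => slot N has a callback in flight */

/* Pick the lowest free slot, or -1 if the table is full. */
static int slot_alloc(void)
{
	uint32_t free_mask = ~slot_busy;
	int slot;

	if (!free_mask)
		return -1;
	slot = __builtin_ctz(free_mask);	/* lowest clear bit */
	slot_busy |= 1u << slot;
	return slot;
}

/* Return a slot to the table so the next callback can reuse it. */
static void slot_free(int slot)
{
	slot_busy &= ~(1u << slot);
}

/* Highest slot currently in use, like fls() - 1; -1 if none. */
static int highest_used_slot(void)
{
	if (!slot_busy)
		return -1;
	return 31 - __builtin_clz(slot_busy);
}

int main(void)
{
	int a = slot_alloc(), b = slot_alloc();

	printf("allocated %d and %d, highest used %d\n", a, b, highest_used_slot());
	slot_free(a);
	printf("after freeing %d, next alloc gets %d\n", a, slot_alloc());
	return 0;
}

In this model a free slot is always the lowest clear bit, so a well-behaved server reuses slot 0 as soon as it is released rather than walking upward through the table — a detail that matters in the test report later in this thread.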
On Wed, Oct 30, 2024 at 3:50 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> A few more minor updates to the set to fix some small-ish bugs, and do a
> bit of cleanup. This seems to test OK for me so far.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> Changes in v3:
> - add patch to convert se_flags to single se_dead bool
> - fix off-by-one bug in handling of NFSD_BC_SLOT_TABLE_MAX
> - don't reject target highest slot value of 0
> - Link to v2: https://lore.kernel.org/r/20241029-bcwide-v2-1-e9010b6ef55d@kernel.org
>
> Changes in v2:
> - take cl_lock when fetching fields from session to be encoded
> - use fls() instead of bespoke highest_unset_index()
> - rename variables in several functions with more descriptive names
> - clamp limit of for loop in update_cb_slot_table()
> - re-add missing rpc_wake_up_queued_task() call
> - fix slotid check in decode_cb_sequence4resok()
> - add new per-session spinlock
>

What does an NFSv4.1 client need to do to be compatible with this change?

Sebi
-- 
Sebastian Feld - IT security expert
From: Chuck Lever <chuck.lever@oracle.com>

On Wed, 30 Oct 2024 10:48:45 -0400, Jeff Layton wrote:
> A few more minor updates to the set to fix some small-ish bugs, and do a
> bit of cleanup. This seems to test OK for me so far.
>

Applied to nfsd-next for v6.13, thanks! Still open for comments and
test results.

[1/2] nfsd: make nfsd4_session->se_flags a bool
      commit: d10f8b7deb4e8a3a0c75855fdad7aae9c1943816
[2/2] nfsd: allow for up to 32 callback session slots
      commit: 6c8910ac1cd360ea01136d707158690b5159a1d0

-- 
Chuck Lever
Hi Jeff/Chuck, Hitting the following softlockup when running using nfsd-next code. testing is same open bunch of file get delegations, do local conflicting operation. Network trace shows a few cb_recalls occurring successfully before the soft lockup (I can confirm that more than 1 slot was used. But I also see that the server isn't trying to use the lowest available slot but instead just bumps the number and uses the next one. By that I mean, say slot 0 was used and a reply came back but the next callback would use slot 1 instead of slot 0). [ 344.045843] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/u24:28:205] [ 344.047669] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64 drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg fuse [ 344.050421] CPU: 0 UID: 0 PID: 205 Comm: kworker/u24:28 Kdump: loaded Not tainted 6.12.0-rc4+ #42 [ 344.050821] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.21805430.BA64.2305221830 05/22/2023 [ 344.051248] Workqueue: rpciod rpc_async_schedule [sunrpc] [ 344.051513] pstate: 11400005 (nzcV daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 344.051821] pc : kasan_check_range+0x0/0x188 [ 344.052011] lr : __kasan_check_write+0x1c/0x28 [ 344.052208] sp : ffff800087027920 [ 344.052352] x29: ffff800087027920 x28: 0000000000040000 x27: ffff0000a520f170 [ 344.052710] x26: 0000000000000000 x25: 1fffe00014a41e2e x24: ffff0002841692c0 [ 344.053159] x23: ffff0002841692c8 x22: 0000000000000000 x21: 1ffff00010e04f2a [ 344.053612] x20: ffff0002841692c0 x19: ffff80008318c2c0 x18: 0000000000000000 [ 344.054054] x17: 0000006800000000 x16: 1fffe0000010fd60 x15: 0a0d37303736205d [ 344.054501] x14: 3136335b0a0d3630 x13: 1ffff000104751c9 x12: ffff600014a41e2f [ 344.054952] x11: 1fffe00014a41e2e x10: ffff600014a41e2e x9 : dfff800000000000 [ 344.055402] x8 : 00009fffeb5be1d2 x7 : ffff0000a520f173 x6 : 0000000000000001 [ 344.055735] x5 : ffff0000a520f170 x4 : 0000000000000000 x3 : ffff8000823129fc [ 344.056058] x2 : 0000000000000001 x1 : 0000000000000002 x0 : ffff0000a520f172 [ 344.056479] Call trace: [ 344.056636] kasan_check_range+0x0/0x188 [ 344.056886] queued_spin_lock_slowpath+0x5f4/0xaa0 [ 344.057192] _raw_spin_lock+0x180/0x1a8 [ 344.057436] rpc_sleep_on+0x78/0xe8 [sunrpc] [ 344.057700] nfsd4_cb_prepare+0x15c/0x468 [nfsd] [ 344.057935] rpc_prepare_task+0x70/0xa0 [sunrpc] [ 344.058165] __rpc_execute+0x1e8/0xa48 [sunrpc] [ 344.058388] rpc_async_schedule+0x90/0x100 [sunrpc] [ 344.058623] process_one_work+0x598/0x1100 [ 344.058818] worker_thread+0x6c0/0xa58 [ 344.058992] kthread+0x288/0x310 [ 344.059145] ret_from_fork+0x10/0x20 [ 344.075846] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! 
[kworker/u24:27:204] [ 344.076295] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64 drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg fuse [ 344.079648] CPU: 1 UID: 0 PID: 204 Comm: kworker/u24:27 Kdump: loaded Tainted: G L 6.12.0-rc4+ #42 [ 344.080290] Tainted: [L]=SOFTLOCKUP [ 344.080495] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.21805430.BA64.2305221830 05/22/2023 [ 344.080930] Workqueue: rpciod rpc_async_schedule [sunrpc] [ 344.081212] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 344.081630] pc : _raw_spin_lock+0x108/0x1a8 [ 344.081815] lr : _raw_spin_lock+0xf4/0x1a8 [ 344.081998] sp : ffff800087017a30 [ 344.082146] x29: ffff800087017a90 x28: ffff0000a520f170 x27: ffff6000148a1081 [ 344.082467] x26: 1fffe000148a1081 x25: ffff0000a450840c x24: ffff0000a520ed40 [ 344.082892] x23: ffff0000a4508404 x22: ffff0002e9028000 x21: ffff800087017a50 [ 344.083338] x20: 1ffff00010e02f46 x19: ffff0000a520f170 x18: 0000000000000000 [ 344.083775] x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaab024bdd10 [ 344.084217] x14: 0000000000000000 x13: 0000000000000000 x12: ffff700010e02f4b [ 344.084625] x11: 1ffff00010e02f4a x10: ffff700010e02f4a x9 : dfff800000000000 [ 344.084945] x8 : 0000000000000004 x7 : 0000000000000003 x6 : 0000000000000001 [ 344.085264] x5 : ffff800087017a50 x4 : ffff700010e02f4a x3 : ffff800082311154 [ 344.085587] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000 [ 344.085915] Call trace: [ 344.086028] _raw_spin_lock+0x108/0x1a8 [ 344.086210] rpc_wake_up_queued_task+0x5c/0xf8 [sunrpc] [ 344.086465] nfsd4_cb_prepare+0x168/0x468 [nfsd] [ 344.086694] rpc_prepare_task+0x70/0xa0 [sunrpc] [ 344.086922] __rpc_execute+0x1e8/0xa48 [sunrpc] [ 344.087148] rpc_async_schedule+0x90/0x100 [sunrpc] [ 344.087389] process_one_work+0x598/0x1100 [ 344.087584] worker_thread+0x6c0/0xa58 [ 344.087758] kthread+0x288/0x310 [ 344.087909] ret_from_fork+0x10/0x20 On Wed, Oct 30, 2024 at 4:30 PM <cel@kernel.org> wrote: > > From: Chuck Lever <chuck.lever@oracle.com> > > On Wed, 30 Oct 2024 10:48:45 -0400, Jeff Layton wrote: > > A few more minor updates to the set to fix some small-ish bugs, and do a > > bit of cleanup. This seems to test OK for me so far. > > > > > > Applied to nfsd-next for v6.13, thanks! Still open for comments and > test results. > > [1/2] nfsd: make nfsd4_session->se_flags a bool > commit: d10f8b7deb4e8a3a0c75855fdad7aae9c1943816 > [2/2] nfsd: allow for up to 32 callback session slots > commit: 6c8910ac1cd360ea01136d707158690b5159a1d0 > > -- > Chuck Lever > >
On Tue, 2024-11-05 at 17:08 -0500, Olga Kornievskaia wrote: > Hi Jeff/Chuck, > > Hitting the following softlockup when running using nfsd-next code. > testing is same open bunch of file get delegations, do local > conflicting operation. Network trace shows a few cb_recalls occurring > successfully before the soft lockup (I can confirm that more than 1 > slot was used. But I also see that the server isn't trying to use the > lowest available slot but instead just bumps the number and uses the > next one. By that I mean, say slot 0 was used and a reply came back > but the next callback would use slot 1 instead of slot 0). > If the slots are being consumed and not released then that's what you'd see. The question is why those slots aren't being released. Did the client return a SEQUENCE error on some of those callbacks? It looks like the slot doesn't always get released if that occurs, so that might be one possibility. > [ 344.045843] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! > [kworker/u24:28:205] > [ 344.047669] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core > nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy > snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr > rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo > snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg > videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core > snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc > snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme > drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64 > drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg > fuse > [ 344.050421] CPU: 0 UID: 0 PID: 205 Comm: kworker/u24:28 Kdump: > loaded Not tainted 6.12.0-rc4+ #42 > [ 344.050821] Hardware name: VMware, Inc. 
VMware20,1/VBSA, BIOS > VMW201.00V.21805430.BA64.2305221830 05/22/2023 > [ 344.051248] Workqueue: rpciod rpc_async_schedule [sunrpc] > [ 344.051513] pstate: 11400005 (nzcV daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) > [ 344.051821] pc : kasan_check_range+0x0/0x188 > [ 344.052011] lr : __kasan_check_write+0x1c/0x28 > [ 344.052208] sp : ffff800087027920 > [ 344.052352] x29: ffff800087027920 x28: 0000000000040000 x27: ffff0000a520f170 > [ 344.052710] x26: 0000000000000000 x25: 1fffe00014a41e2e x24: ffff0002841692c0 > [ 344.053159] x23: ffff0002841692c8 x22: 0000000000000000 x21: 1ffff00010e04f2a > [ 344.053612] x20: ffff0002841692c0 x19: ffff80008318c2c0 x18: 0000000000000000 > [ 344.054054] x17: 0000006800000000 x16: 1fffe0000010fd60 x15: 0a0d37303736205d > [ 344.054501] x14: 3136335b0a0d3630 x13: 1ffff000104751c9 x12: ffff600014a41e2f > [ 344.054952] x11: 1fffe00014a41e2e x10: ffff600014a41e2e x9 : dfff800000000000 > [ 344.055402] x8 : 00009fffeb5be1d2 x7 : ffff0000a520f173 x6 : 0000000000000001 > [ 344.055735] x5 : ffff0000a520f170 x4 : 0000000000000000 x3 : ffff8000823129fc > [ 344.056058] x2 : 0000000000000001 x1 : 0000000000000002 x0 : ffff0000a520f172 > [ 344.056479] Call trace: > [ 344.056636] kasan_check_range+0x0/0x188 > [ 344.056886] queued_spin_lock_slowpath+0x5f4/0xaa0 > [ 344.057192] _raw_spin_lock+0x180/0x1a8 > [ 344.057436] rpc_sleep_on+0x78/0xe8 [sunrpc] > [ 344.057700] nfsd4_cb_prepare+0x15c/0x468 [nfsd] > [ 344.057935] rpc_prepare_task+0x70/0xa0 [sunrpc] > [ 344.058165] __rpc_execute+0x1e8/0xa48 [sunrpc] > [ 344.058388] rpc_async_schedule+0x90/0x100 [sunrpc] > [ 344.058623] process_one_work+0x598/0x1100 > [ 344.058818] worker_thread+0x6c0/0xa58 > [ 344.058992] kthread+0x288/0x310 > [ 344.059145] ret_from_fork+0x10/0x20 > [ 344.075846] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! > [kworker/u24:27:204] > [ 344.076295] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core > nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy > snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr > rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo > snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg > videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core > snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc > snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme > drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64 > drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg > fuse > [ 344.079648] CPU: 1 UID: 0 PID: 204 Comm: kworker/u24:27 Kdump: > loaded Tainted: G L 6.12.0-rc4+ #42 > [ 344.080290] Tainted: [L]=SOFTLOCKUP > [ 344.080495] Hardware name: VMware, Inc. 
VMware20,1/VBSA, BIOS > VMW201.00V.21805430.BA64.2305221830 05/22/2023 > [ 344.080930] Workqueue: rpciod rpc_async_schedule [sunrpc] > [ 344.081212] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) > [ 344.081630] pc : _raw_spin_lock+0x108/0x1a8 > [ 344.081815] lr : _raw_spin_lock+0xf4/0x1a8 > [ 344.081998] sp : ffff800087017a30 > [ 344.082146] x29: ffff800087017a90 x28: ffff0000a520f170 x27: ffff6000148a1081 > [ 344.082467] x26: 1fffe000148a1081 x25: ffff0000a450840c x24: ffff0000a520ed40 > [ 344.082892] x23: ffff0000a4508404 x22: ffff0002e9028000 x21: ffff800087017a50 > [ 344.083338] x20: 1ffff00010e02f46 x19: ffff0000a520f170 x18: 0000000000000000 > [ 344.083775] x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaab024bdd10 > [ 344.084217] x14: 0000000000000000 x13: 0000000000000000 x12: ffff700010e02f4b > [ 344.084625] x11: 1ffff00010e02f4a x10: ffff700010e02f4a x9 : dfff800000000000 > [ 344.084945] x8 : 0000000000000004 x7 : 0000000000000003 x6 : 0000000000000001 > [ 344.085264] x5 : ffff800087017a50 x4 : ffff700010e02f4a x3 : ffff800082311154 > [ 344.085587] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000 > [ 344.085915] Call trace: > [ 344.086028] _raw_spin_lock+0x108/0x1a8 > [ 344.086210] rpc_wake_up_queued_task+0x5c/0xf8 [sunrpc] > [ 344.086465] nfsd4_cb_prepare+0x168/0x468 [nfsd] > [ 344.086694] rpc_prepare_task+0x70/0xa0 [sunrpc] > [ 344.086922] __rpc_execute+0x1e8/0xa48 [sunrpc] > [ 344.087148] rpc_async_schedule+0x90/0x100 [sunrpc] > [ 344.087389] process_one_work+0x598/0x1100 > [ 344.087584] worker_thread+0x6c0/0xa58 > [ 344.087758] kthread+0x288/0x310 > [ 344.087909] ret_from_fork+0x10/0x20 > > On Wed, Oct 30, 2024 at 4:30 PM <cel@kernel.org> wrote: > > > > From: Chuck Lever <chuck.lever@oracle.com> > > > > On Wed, 30 Oct 2024 10:48:45 -0400, Jeff Layton wrote: > > > A few more minor updates to the set to fix some small-ish bugs, and do a > > > bit of cleanup. This seems to test OK for me so far. > > > > > > > > > > Applied to nfsd-next for v6.13, thanks! Still open for comments and > > test results. > > > > [1/2] nfsd: make nfsd4_session->se_flags a bool > > commit: d10f8b7deb4e8a3a0c75855fdad7aae9c1943816 > > [2/2] nfsd: allow for up to 32 callback session slots > > commit: 6c8910ac1cd360ea01136d707158690b5159a1d0 > > > > -- > > Chuck Lever > > > > -- Jeff Layton <jlayton@kernel.org>
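To make the failure mode in the exchange above concrete, here is a minimal sketch of the release discipline, reusing the hypothetical slot_busy bitmap from the earlier model; cb_done() and its status argument are illustrative, not nfsd symbols. The point is only that the slot has to be returned on every completion path — including callbacks that come back with a SEQUENCE error — otherwise each such callback permanently consumes a slot, the server creeps upward through the table exactly as Olga observed, and new callbacks eventually have nowhere to go but the wait queue.

/*
 * Illustrative model of unconditional slot release on completion,
 * using the same hypothetical 32-bit slot_busy bitmap as before.
 */
#include <stdint.h>
#include <stdio.h>

static uint32_t slot_busy;

static void slot_free(int slot)
{
	slot_busy &= ~(1u << slot);
}

/* Completion handler: release the slot on every path, error or not. */
static void cb_done(int slot, int status)
{
	if (status != 0) {
		/* record the error, mark the backchannel faulty, etc. */
	}
	slot_free(slot);	/* unconditional release */
}

int main(void)
{
	slot_busy = 0x3;	/* slots 0 and 1 in flight */
	cb_done(1, -1);		/* an errored callback still frees its slot */
	printf("busy mask now %#x\n", (unsigned int)slot_busy);
	return 0;
}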
On Tue, Nov 5, 2024 at 5:27 PM Jeff Layton <jlayton@kernel.org> wrote: > > On Tue, 2024-11-05 at 17:08 -0500, Olga Kornievskaia wrote: > > Hi Jeff/Chuck, > > > > Hitting the following softlockup when running using nfsd-next code. > > testing is same open bunch of file get delegations, do local > > conflicting operation. Network trace shows a few cb_recalls occurring > > successfully before the soft lockup (I can confirm that more than 1 > > slot was used. But I also see that the server isn't trying to use the > > lowest available slot but instead just bumps the number and uses the > > next one. By that I mean, say slot 0 was used and a reply came back > > but the next callback would use slot 1 instead of slot 0). > > > > If the slots are being consumed and not released then that's what you'd > see. The question is why those slots aren't being released. > > Did the client return a SEQUENCE error on some of those callbacks? It > looks like the slot doesn't always get released if that occurs, so that > might be one possibility. No sequence errors. CB_SEQUENCE and CB_RECALL replies are all successful. > > [ 344.045843] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! > > [kworker/u24:28:205] > > [ 344.047669] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core > > nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy > > snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr > > rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo > > snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg > > videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core > > snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc > > snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme > > drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64 > > drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg > > fuse > > [ 344.050421] CPU: 0 UID: 0 PID: 205 Comm: kworker/u24:28 Kdump: > > loaded Not tainted 6.12.0-rc4+ #42 > > [ 344.050821] Hardware name: VMware, Inc. 
VMware20,1/VBSA, BIOS > > VMW201.00V.21805430.BA64.2305221830 05/22/2023 > > [ 344.051248] Workqueue: rpciod rpc_async_schedule [sunrpc] > > [ 344.051513] pstate: 11400005 (nzcV daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) > > [ 344.051821] pc : kasan_check_range+0x0/0x188 > > [ 344.052011] lr : __kasan_check_write+0x1c/0x28 > > [ 344.052208] sp : ffff800087027920 > > [ 344.052352] x29: ffff800087027920 x28: 0000000000040000 x27: ffff0000a520f170 > > [ 344.052710] x26: 0000000000000000 x25: 1fffe00014a41e2e x24: ffff0002841692c0 > > [ 344.053159] x23: ffff0002841692c8 x22: 0000000000000000 x21: 1ffff00010e04f2a > > [ 344.053612] x20: ffff0002841692c0 x19: ffff80008318c2c0 x18: 0000000000000000 > > [ 344.054054] x17: 0000006800000000 x16: 1fffe0000010fd60 x15: 0a0d37303736205d > > [ 344.054501] x14: 3136335b0a0d3630 x13: 1ffff000104751c9 x12: ffff600014a41e2f > > [ 344.054952] x11: 1fffe00014a41e2e x10: ffff600014a41e2e x9 : dfff800000000000 > > [ 344.055402] x8 : 00009fffeb5be1d2 x7 : ffff0000a520f173 x6 : 0000000000000001 > > [ 344.055735] x5 : ffff0000a520f170 x4 : 0000000000000000 x3 : ffff8000823129fc > > [ 344.056058] x2 : 0000000000000001 x1 : 0000000000000002 x0 : ffff0000a520f172 > > [ 344.056479] Call trace: > > [ 344.056636] kasan_check_range+0x0/0x188 > > [ 344.056886] queued_spin_lock_slowpath+0x5f4/0xaa0 > > [ 344.057192] _raw_spin_lock+0x180/0x1a8 > > [ 344.057436] rpc_sleep_on+0x78/0xe8 [sunrpc] > > [ 344.057700] nfsd4_cb_prepare+0x15c/0x468 [nfsd] > > [ 344.057935] rpc_prepare_task+0x70/0xa0 [sunrpc] > > [ 344.058165] __rpc_execute+0x1e8/0xa48 [sunrpc] > > [ 344.058388] rpc_async_schedule+0x90/0x100 [sunrpc] > > [ 344.058623] process_one_work+0x598/0x1100 > > [ 344.058818] worker_thread+0x6c0/0xa58 > > [ 344.058992] kthread+0x288/0x310 > > [ 344.059145] ret_from_fork+0x10/0x20 > > [ 344.075846] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! > > [kworker/u24:27:204] > > [ 344.076295] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core > > nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy > > snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr > > rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo > > snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg > > videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core > > snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc > > snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme > > drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64 > > drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg > > fuse > > [ 344.079648] CPU: 1 UID: 0 PID: 204 Comm: kworker/u24:27 Kdump: > > loaded Tainted: G L 6.12.0-rc4+ #42 > > [ 344.080290] Tainted: [L]=SOFTLOCKUP > > [ 344.080495] Hardware name: VMware, Inc. 
VMware20,1/VBSA, BIOS > > VMW201.00V.21805430.BA64.2305221830 05/22/2023 > > [ 344.080930] Workqueue: rpciod rpc_async_schedule [sunrpc] > > [ 344.081212] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) > > [ 344.081630] pc : _raw_spin_lock+0x108/0x1a8 > > [ 344.081815] lr : _raw_spin_lock+0xf4/0x1a8 > > [ 344.081998] sp : ffff800087017a30 > > [ 344.082146] x29: ffff800087017a90 x28: ffff0000a520f170 x27: ffff6000148a1081 > > [ 344.082467] x26: 1fffe000148a1081 x25: ffff0000a450840c x24: ffff0000a520ed40 > > [ 344.082892] x23: ffff0000a4508404 x22: ffff0002e9028000 x21: ffff800087017a50 > > [ 344.083338] x20: 1ffff00010e02f46 x19: ffff0000a520f170 x18: 0000000000000000 > > [ 344.083775] x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaab024bdd10 > > [ 344.084217] x14: 0000000000000000 x13: 0000000000000000 x12: ffff700010e02f4b > > [ 344.084625] x11: 1ffff00010e02f4a x10: ffff700010e02f4a x9 : dfff800000000000 > > [ 344.084945] x8 : 0000000000000004 x7 : 0000000000000003 x6 : 0000000000000001 > > [ 344.085264] x5 : ffff800087017a50 x4 : ffff700010e02f4a x3 : ffff800082311154 > > [ 344.085587] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000 > > [ 344.085915] Call trace: > > [ 344.086028] _raw_spin_lock+0x108/0x1a8 > > [ 344.086210] rpc_wake_up_queued_task+0x5c/0xf8 [sunrpc] > > [ 344.086465] nfsd4_cb_prepare+0x168/0x468 [nfsd] > > [ 344.086694] rpc_prepare_task+0x70/0xa0 [sunrpc] > > [ 344.086922] __rpc_execute+0x1e8/0xa48 [sunrpc] > > [ 344.087148] rpc_async_schedule+0x90/0x100 [sunrpc] > > [ 344.087389] process_one_work+0x598/0x1100 > > [ 344.087584] worker_thread+0x6c0/0xa58 > > [ 344.087758] kthread+0x288/0x310 > > [ 344.087909] ret_from_fork+0x10/0x20 > > > > On Wed, Oct 30, 2024 at 4:30 PM <cel@kernel.org> wrote: > > > > > > From: Chuck Lever <chuck.lever@oracle.com> > > > > > > On Wed, 30 Oct 2024 10:48:45 -0400, Jeff Layton wrote: > > > > A few more minor updates to the set to fix some small-ish bugs, and do a > > > > bit of cleanup. This seems to test OK for me so far. > > > > > > > > > > > > > > Applied to nfsd-next for v6.13, thanks! Still open for comments and > > > test results. > > > > > > [1/2] nfsd: make nfsd4_session->se_flags a bool > > > commit: d10f8b7deb4e8a3a0c75855fdad7aae9c1943816 > > > [2/2] nfsd: allow for up to 32 callback session slots > > > commit: 6c8910ac1cd360ea01136d707158690b5159a1d0 > > > > > > -- > > > Chuck Lever > > > > > > > > -- > Jeff Layton <jlayton@kernel.org>
On Tue, 2024-11-05 at 17:40 -0500, Olga Kornievskaia wrote: > On Tue, Nov 5, 2024 at 5:27 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > On Tue, 2024-11-05 at 17:08 -0500, Olga Kornievskaia wrote: > > > Hi Jeff/Chuck, > > > > > > Hitting the following softlockup when running using nfsd-next code. > > > testing is same open bunch of file get delegations, do local > > > conflicting operation. Network trace shows a few cb_recalls occurring > > > successfully before the soft lockup (I can confirm that more than 1 > > > slot was used. But I also see that the server isn't trying to use the > > > lowest available slot but instead just bumps the number and uses the > > > next one. By that I mean, say slot 0 was used and a reply came back > > > but the next callback would use slot 1 instead of slot 0). > > > > > > > If the slots are being consumed and not released then that's what you'd > > see. The question is why those slots aren't being released. > > > > Did the client return a SEQUENCE error on some of those callbacks? It > > looks like the slot doesn't always get released if that occurs, so that > > might be one possibility. > > No sequence errors. CB_SEQUENCE and CB_RECALL replies are all successful. > Nevermind. I think I see the problem. I think I've got the rpc_sleep_on() handling all wrong here. I'll have to respin this patch. Chuck, mind dropping this one for now? Thanks, Jeff > > > [ 344.045843] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! > > > [kworker/u24:28:205] > > > [ 344.047669] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core > > > nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy > > > snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr > > > rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo > > > snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg > > > videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core > > > snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc > > > snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme > > > drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64 > > > drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg > > > fuse > > > [ 344.050421] CPU: 0 UID: 0 PID: 205 Comm: kworker/u24:28 Kdump: > > > loaded Not tainted 6.12.0-rc4+ #42 > > > [ 344.050821] Hardware name: VMware, Inc. 
VMware20,1/VBSA, BIOS > > > VMW201.00V.21805430.BA64.2305221830 05/22/2023 > > > [ 344.051248] Workqueue: rpciod rpc_async_schedule [sunrpc] > > > [ 344.051513] pstate: 11400005 (nzcV daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) > > > [ 344.051821] pc : kasan_check_range+0x0/0x188 > > > [ 344.052011] lr : __kasan_check_write+0x1c/0x28 > > > [ 344.052208] sp : ffff800087027920 > > > [ 344.052352] x29: ffff800087027920 x28: 0000000000040000 x27: ffff0000a520f170 > > > [ 344.052710] x26: 0000000000000000 x25: 1fffe00014a41e2e x24: ffff0002841692c0 > > > [ 344.053159] x23: ffff0002841692c8 x22: 0000000000000000 x21: 1ffff00010e04f2a > > > [ 344.053612] x20: ffff0002841692c0 x19: ffff80008318c2c0 x18: 0000000000000000 > > > [ 344.054054] x17: 0000006800000000 x16: 1fffe0000010fd60 x15: 0a0d37303736205d > > > [ 344.054501] x14: 3136335b0a0d3630 x13: 1ffff000104751c9 x12: ffff600014a41e2f > > > [ 344.054952] x11: 1fffe00014a41e2e x10: ffff600014a41e2e x9 : dfff800000000000 > > > [ 344.055402] x8 : 00009fffeb5be1d2 x7 : ffff0000a520f173 x6 : 0000000000000001 > > > [ 344.055735] x5 : ffff0000a520f170 x4 : 0000000000000000 x3 : ffff8000823129fc > > > [ 344.056058] x2 : 0000000000000001 x1 : 0000000000000002 x0 : ffff0000a520f172 > > > [ 344.056479] Call trace: > > > [ 344.056636] kasan_check_range+0x0/0x188 > > > [ 344.056886] queued_spin_lock_slowpath+0x5f4/0xaa0 > > > [ 344.057192] _raw_spin_lock+0x180/0x1a8 > > > [ 344.057436] rpc_sleep_on+0x78/0xe8 [sunrpc] > > > [ 344.057700] nfsd4_cb_prepare+0x15c/0x468 [nfsd] > > > [ 344.057935] rpc_prepare_task+0x70/0xa0 [sunrpc] > > > [ 344.058165] __rpc_execute+0x1e8/0xa48 [sunrpc] > > > [ 344.058388] rpc_async_schedule+0x90/0x100 [sunrpc] > > > [ 344.058623] process_one_work+0x598/0x1100 > > > [ 344.058818] worker_thread+0x6c0/0xa58 > > > [ 344.058992] kthread+0x288/0x310 > > > [ 344.059145] ret_from_fork+0x10/0x20 > > > [ 344.075846] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! > > > [kworker/u24:27:204] > > > [ 344.076295] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core > > > nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy > > > snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr > > > rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo > > > snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg > > > videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core > > > snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc > > > snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme > > > drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64 > > > drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg > > > fuse > > > [ 344.079648] CPU: 1 UID: 0 PID: 204 Comm: kworker/u24:27 Kdump: > > > loaded Tainted: G L 6.12.0-rc4+ #42 > > > [ 344.080290] Tainted: [L]=SOFTLOCKUP > > > [ 344.080495] Hardware name: VMware, Inc. 
VMware20,1/VBSA, BIOS > > > VMW201.00V.21805430.BA64.2305221830 05/22/2023 > > > [ 344.080930] Workqueue: rpciod rpc_async_schedule [sunrpc] > > > [ 344.081212] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) > > > [ 344.081630] pc : _raw_spin_lock+0x108/0x1a8 > > > [ 344.081815] lr : _raw_spin_lock+0xf4/0x1a8 > > > [ 344.081998] sp : ffff800087017a30 > > > [ 344.082146] x29: ffff800087017a90 x28: ffff0000a520f170 x27: ffff6000148a1081 > > > [ 344.082467] x26: 1fffe000148a1081 x25: ffff0000a450840c x24: ffff0000a520ed40 > > > [ 344.082892] x23: ffff0000a4508404 x22: ffff0002e9028000 x21: ffff800087017a50 > > > [ 344.083338] x20: 1ffff00010e02f46 x19: ffff0000a520f170 x18: 0000000000000000 > > > [ 344.083775] x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaab024bdd10 > > > [ 344.084217] x14: 0000000000000000 x13: 0000000000000000 x12: ffff700010e02f4b > > > [ 344.084625] x11: 1ffff00010e02f4a x10: ffff700010e02f4a x9 : dfff800000000000 > > > [ 344.084945] x8 : 0000000000000004 x7 : 0000000000000003 x6 : 0000000000000001 > > > [ 344.085264] x5 : ffff800087017a50 x4 : ffff700010e02f4a x3 : ffff800082311154 > > > [ 344.085587] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000 > > > [ 344.085915] Call trace: > > > [ 344.086028] _raw_spin_lock+0x108/0x1a8 > > > [ 344.086210] rpc_wake_up_queued_task+0x5c/0xf8 [sunrpc] > > > [ 344.086465] nfsd4_cb_prepare+0x168/0x468 [nfsd] > > > [ 344.086694] rpc_prepare_task+0x70/0xa0 [sunrpc] > > > [ 344.086922] __rpc_execute+0x1e8/0xa48 [sunrpc] > > > [ 344.087148] rpc_async_schedule+0x90/0x100 [sunrpc] > > > [ 344.087389] process_one_work+0x598/0x1100 > > > [ 344.087584] worker_thread+0x6c0/0xa58 > > > [ 344.087758] kthread+0x288/0x310 > > > [ 344.087909] ret_from_fork+0x10/0x20 > > > > > > On Wed, Oct 30, 2024 at 4:30 PM <cel@kernel.org> wrote: > > > > > > > > From: Chuck Lever <chuck.lever@oracle.com> > > > > > > > > On Wed, 30 Oct 2024 10:48:45 -0400, Jeff Layton wrote: > > > > > A few more minor updates to the set to fix some small-ish bugs, and do a > > > > > bit of cleanup. This seems to test OK for me so far. > > > > > > > > > > > > > > > > > > Applied to nfsd-next for v6.13, thanks! Still open for comments and > > > > test results. > > > > > > > > [1/2] nfsd: make nfsd4_session->se_flags a bool > > > > commit: d10f8b7deb4e8a3a0c75855fdad7aae9c1943816 > > > > [2/2] nfsd: allow for up to 32 callback session slots > > > > commit: 6c8910ac1cd360ea01136d707158690b5159a1d0 > > > > > > > > -- > > > > Chuck Lever > > > > > > > > > > > > -- > > Jeff Layton <jlayton@kernel.org> -- Jeff Layton <jlayton@kernel.org>
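For context on the rpc_sleep_on() handling Jeff mentions, the sketch below outlines the conventional sunrpc idiom for waiting on a scarce resource from an ->rpc_call_prepare callback; it is purely illustrative and not the eventual fix. All example_* names and the struct layout are hypothetical — only the sunrpc calls (rpc_sleep_on(), rpc_wake_up_next(), rpc_call_start()) are real — and it assumes a per-session spinlock like the one the cover letter adds.

/*
 * Illustrative outline only: wait for a callback slot from
 * ->rpc_call_prepare, and wake the next waiter when one is freed.
 */
#include <linux/types.h>
#include <linux/bits.h>
#include <linux/bitops.h>
#include <linux/spinlock.h>
#include <linux/sunrpc/sched.h>

struct example_session {
	spinlock_t		lock;		/* protects slot_busy */
	struct rpc_wait_queue	waitq;		/* tasks waiting for a slot */
	u32			slot_busy;	/* one bit per callback slot */
};

/* Claim the lowest free slot; caller holds ses->lock. */
static bool example_take_slot(struct example_session *ses)
{
	u32 avail = ~ses->slot_busy;

	if (!avail)
		return false;
	ses->slot_busy |= BIT(__ffs(avail));
	return true;
}

static void example_cb_prepare(struct rpc_task *task, void *calldata)
{
	struct example_session *ses = calldata;

	spin_lock(&ses->lock);
	if (!example_take_slot(ses)) {
		/*
		 * Queue the task before dropping the lock, so a
		 * concurrent slot release cannot slip in between the
		 * check and the sleep and leave this task stranded.
		 */
		rpc_sleep_on(&ses->waitq, task, NULL);
		spin_unlock(&ses->lock);
		return;		/* prepare re-runs when the task is woken */
	}
	spin_unlock(&ses->lock);
	rpc_call_start(task);
}

static void example_release_slot(struct example_session *ses, int slot)
{
	spin_lock(&ses->lock);
	ses->slot_busy &= ~BIT(slot);
	spin_unlock(&ses->lock);
	rpc_wake_up_next(&ses->waitq);	/* let one waiter retry prepare */
}

Queuing the task before releasing the lock is what prevents a lost wakeup in this sketch: a concurrent example_release_slot() must wait for the lock, and by the time it calls rpc_wake_up_next() the sleeping task is already on the queue.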