[PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots
Posted by Jeff Layton 3 weeks, 4 days ago
A few more minor updates to the set to fix some small-ish bugs, and do a
bit of cleanup. This seems to test OK for me so far.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Changes in v3:
- add patch to convert se_flags to single se_dead bool
- fix off-by-one bug in handling of NFSD_BC_SLOT_TABLE_MAX
- don't reject target highest slot value of 0
- Link to v2: https://lore.kernel.org/r/20241029-bcwide-v2-1-e9010b6ef55d@kernel.org

Changes in v2:
- take cl_lock when fetching fields from session to be encoded
- use fls() instead of bespoke highest_unset_index()
- rename variables in several functions with more descriptive names
- clamp limit of for loop in update_cb_slot_table()
- re-add missing rpc_wake_up_queued_task() call
- fix slotid check in decode_cb_sequence4resok()
- add new per-session spinlock
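
(Purely as an illustration of the fls() change above, and not part of
the patches themselves: a minimal userspace sketch of how fls() over a
busy-slot bitmap replaces a hand-rolled highest-slot scan. The helper
names and the fallback fls() below are assumptions for the demo, not
the nfsd code.)

/* build with: cc -o fls_demo fls_demo.c */
#include <stdio.h>

/* Stand-in for the kernel's fls(): 1-based index of the highest set bit, 0 if none. */
static int fls_demo(unsigned int x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}

/* Highest slot currently marked busy, or -1 if the table is idle. */
static int highest_used_slot(unsigned int busy)
{
	return fls_demo(busy) - 1;
}

int main(void)
{
	unsigned int busy = 0x0d;	/* slots 0, 2 and 3 in flight */

	printf("highest used slot: %d\n", highest_used_slot(busy));	/* 3 */
	printf("idle table: %d\n", highest_used_slot(0));		/* -1 */
	return 0;
}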

---
Jeff Layton (2):
      nfsd: make nfsd4_session->se_flags a bool
      nfsd: allow for up to 32 callback session slots

 fs/nfsd/nfs4callback.c | 108 ++++++++++++++++++++++++++++++++++---------------
 fs/nfsd/nfs4state.c    |  17 +++++---
 fs/nfsd/state.h        |  19 ++++-----
 fs/nfsd/trace.h        |   2 +-
 4 files changed, 98 insertions(+), 48 deletions(-)
---
base-commit: 06c049d2a81a81f01ff072c6519d0c38b646b550
change-id: 20241025-bcwide-6bd7e4b63db2

Best regards,
-- 
Jeff Layton <jlayton@kernel.org>
Re: [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots
Posted by Sebastian Feld 2 weeks, 4 days ago
On Wed, Oct 30, 2024 at 3:50 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> A few more minor updates to the set to fix some small-ish bugs, and do a
> bit of cleanup. This seems to test OK for me so far.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> Changes in v3:
> - add patch to convert se_flags to single se_dead bool
> - fix off-by-one bug in handling of NFSD_BC_SLOT_TABLE_MAX
> - don't reject target highest slot value of 0
> - Link to v2: https://lore.kernel.org/r/20241029-bcwide-v2-1-e9010b6ef55d@kernel.org
>
> Changes in v2:
> - take cl_lock when fetching fields from session to be encoded
> - use fls() instead of bespoke highest_unset_index()
> - rename variables in several functions with more descriptive names
> - clamp limit of for loop in update_cb_slot_table()
> - re-add missing rpc_wake_up_queued_task() call
> - fix slotid check in decode_cb_sequence4resok()
> - add new per-session spinlock
>

What does an NFSv4.1 client need to do to be compatible with this change?

Sebi
-- 
Sebastian Feld - IT security expert
Re: [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots
Posted by cel@kernel.org 3 weeks, 4 days ago
From: Chuck Lever <chuck.lever@oracle.com>

On Wed, 30 Oct 2024 10:48:45 -0400, Jeff Layton wrote:
> A few more minor updates to the set to fix some small-ish bugs, and do a
> bit of cleanup. This seems to test OK for me so far.
> 
> 

Applied to nfsd-next for v6.13, thanks! Still open for comments and
test results.

[1/2] nfsd: make nfsd4_session->se_flags a bool
      commit: d10f8b7deb4e8a3a0c75855fdad7aae9c1943816
[2/2] nfsd: allow for up to 32 callback session slots
      commit: 6c8910ac1cd360ea01136d707158690b5159a1d0

--
Chuck Lever
Re: [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots
Posted by Olga Kornievskaia 2 weeks, 5 days ago
Hi Jeff/Chuck,

Hitting the following soft lockup when running the nfsd-next code. The
test is the same as before: open a bunch of files to get delegations,
then do a local conflicting operation. A network trace shows a few
CB_RECALLs completing successfully before the soft lockup, and I can
confirm that more than one slot was used. But I also see that the
server isn't trying to use the lowest available slot; it just bumps
the number and uses the next one. By that I mean, say slot 0 was used
and a reply came back, but the next callback would use slot 1 instead
of slot 0.

[  344.045843] watchdog: BUG: soft lockup - CPU#0 stuck for 26s!
[kworker/u24:28:205]
[  344.047669] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core
nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy
snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr
rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo
snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg
videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core
snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc
snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme
drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64
drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg
fuse
[  344.050421] CPU: 0 UID: 0 PID: 205 Comm: kworker/u24:28 Kdump:
loaded Not tainted 6.12.0-rc4+ #42
[  344.050821] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS
VMW201.00V.21805430.BA64.2305221830 05/22/2023
[  344.051248] Workqueue: rpciod rpc_async_schedule [sunrpc]
[  344.051513] pstate: 11400005 (nzcV daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[  344.051821] pc : kasan_check_range+0x0/0x188
[  344.052011] lr : __kasan_check_write+0x1c/0x28
[  344.052208] sp : ffff800087027920
[  344.052352] x29: ffff800087027920 x28: 0000000000040000 x27: ffff0000a520f170
[  344.052710] x26: 0000000000000000 x25: 1fffe00014a41e2e x24: ffff0002841692c0
[  344.053159] x23: ffff0002841692c8 x22: 0000000000000000 x21: 1ffff00010e04f2a
[  344.053612] x20: ffff0002841692c0 x19: ffff80008318c2c0 x18: 0000000000000000
[  344.054054] x17: 0000006800000000 x16: 1fffe0000010fd60 x15: 0a0d37303736205d
[  344.054501] x14: 3136335b0a0d3630 x13: 1ffff000104751c9 x12: ffff600014a41e2f
[  344.054952] x11: 1fffe00014a41e2e x10: ffff600014a41e2e x9 : dfff800000000000
[  344.055402] x8 : 00009fffeb5be1d2 x7 : ffff0000a520f173 x6 : 0000000000000001
[  344.055735] x5 : ffff0000a520f170 x4 : 0000000000000000 x3 : ffff8000823129fc
[  344.056058] x2 : 0000000000000001 x1 : 0000000000000002 x0 : ffff0000a520f172
[  344.056479] Call trace:
[  344.056636]  kasan_check_range+0x0/0x188
[  344.056886]  queued_spin_lock_slowpath+0x5f4/0xaa0
[  344.057192]  _raw_spin_lock+0x180/0x1a8
[  344.057436]  rpc_sleep_on+0x78/0xe8 [sunrpc]
[  344.057700]  nfsd4_cb_prepare+0x15c/0x468 [nfsd]
[  344.057935]  rpc_prepare_task+0x70/0xa0 [sunrpc]
[  344.058165]  __rpc_execute+0x1e8/0xa48 [sunrpc]
[  344.058388]  rpc_async_schedule+0x90/0x100 [sunrpc]
[  344.058623]  process_one_work+0x598/0x1100
[  344.058818]  worker_thread+0x6c0/0xa58
[  344.058992]  kthread+0x288/0x310
[  344.059145]  ret_from_fork+0x10/0x20
[  344.075846] watchdog: BUG: soft lockup - CPU#1 stuck for 26s!
[kworker/u24:27:204]
[  344.076295] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core
nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy
snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr
rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo
snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg
videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core
snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc
snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme
drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64
drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg
fuse
[  344.079648] CPU: 1 UID: 0 PID: 204 Comm: kworker/u24:27 Kdump:
loaded Tainted: G             L     6.12.0-rc4+ #42
[  344.080290] Tainted: [L]=SOFTLOCKUP
[  344.080495] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS
VMW201.00V.21805430.BA64.2305221830 05/22/2023
[  344.080930] Workqueue: rpciod rpc_async_schedule [sunrpc]
[  344.081212] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[  344.081630] pc : _raw_spin_lock+0x108/0x1a8
[  344.081815] lr : _raw_spin_lock+0xf4/0x1a8
[  344.081998] sp : ffff800087017a30
[  344.082146] x29: ffff800087017a90 x28: ffff0000a520f170 x27: ffff6000148a1081
[  344.082467] x26: 1fffe000148a1081 x25: ffff0000a450840c x24: ffff0000a520ed40
[  344.082892] x23: ffff0000a4508404 x22: ffff0002e9028000 x21: ffff800087017a50
[  344.083338] x20: 1ffff00010e02f46 x19: ffff0000a520f170 x18: 0000000000000000
[  344.083775] x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaab024bdd10
[  344.084217] x14: 0000000000000000 x13: 0000000000000000 x12: ffff700010e02f4b
[  344.084625] x11: 1ffff00010e02f4a x10: ffff700010e02f4a x9 : dfff800000000000
[  344.084945] x8 : 0000000000000004 x7 : 0000000000000003 x6 : 0000000000000001
[  344.085264] x5 : ffff800087017a50 x4 : ffff700010e02f4a x3 : ffff800082311154
[  344.085587] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000
[  344.085915] Call trace:
[  344.086028]  _raw_spin_lock+0x108/0x1a8
[  344.086210]  rpc_wake_up_queued_task+0x5c/0xf8 [sunrpc]
[  344.086465]  nfsd4_cb_prepare+0x168/0x468 [nfsd]
[  344.086694]  rpc_prepare_task+0x70/0xa0 [sunrpc]
[  344.086922]  __rpc_execute+0x1e8/0xa48 [sunrpc]
[  344.087148]  rpc_async_schedule+0x90/0x100 [sunrpc]
[  344.087389]  process_one_work+0x598/0x1100
[  344.087584]  worker_thread+0x6c0/0xa58
[  344.087758]  kthread+0x288/0x310
[  344.087909]  ret_from_fork+0x10/0x20

On Wed, Oct 30, 2024 at 4:30 PM <cel@kernel.org> wrote:
>
> From: Chuck Lever <chuck.lever@oracle.com>
>
> On Wed, 30 Oct 2024 10:48:45 -0400, Jeff Layton wrote:
> > A few more minor updates to the set to fix some small-ish bugs, and do a
> > bit of cleanup. This seems to test OK for me so far.
> >
> >
>
> Applied to nfsd-next for v6.13, thanks! Still open for comments and
> test results.
>
> [1/2] nfsd: make nfsd4_session->se_flags a bool
>       commit: d10f8b7deb4e8a3a0c75855fdad7aae9c1943816
> [2/2] nfsd: allow for up to 32 callback session slots
>       commit: 6c8910ac1cd360ea01136d707158690b5159a1d0
>
> --
> Chuck Lever
>
>
Re: [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots
Posted by Jeff Layton 2 weeks, 5 days ago
On Tue, 2024-11-05 at 17:08 -0500, Olga Kornievskaia wrote:
> Hi Jeff/Chuck,
> 
> Hitting the following softlockup when running using nfsd-next code.
> testing is same open bunch of file get delegations, do local
> conflicting operation. Network trace shows a few cb_recalls occurring
> successfully before the soft lockup (I can confirm that more than 1
> slot was used. But I also see that the server isn't trying to use the
> lowest available slot but instead just bumps the number and uses the
> next one. By that I mean, say slot 0 was used and a reply came back
> but the next callback would use slot 1 instead of slot 0).
> 

If the slots are being consumed and not released then that's what you'd
see. The question is why those slots aren't being released.

Did the client return a SEQUENCE error on some of those callbacks? It
looks like the slot doesn't always get released if that occurs, so that
might be one possibility.
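
To make that concrete, here is a rough sketch (hypothetical names, not
the nfsd-next code) of the release discipline I mean: the slot has to
go back to the table and any waiter has to be woken on every
completion path, including the CB_SEQUENCE error cases.

struct cb_slot_table {
	spinlock_t		lock;
	unsigned long		busy;	/* bitmap of in-flight slots */
	struct rpc_wait_queue	waitq;	/* callbacks waiting for a slot */
};

static void cb_release_slot(struct cb_slot_table *tbl, int slot)
{
	spin_lock(&tbl->lock);
	clear_bit(slot, &tbl->busy);
	spin_unlock(&tbl->lock);
	rpc_wake_up_next(&tbl->waitq);	/* let the next queued callback run */
}

/* rpc_call_done path: success or error, the slot must not be leaked. */
static void cb_done_sketch(struct cb_slot_table *tbl, int slot, int cb_seq_status)
{
	cb_release_slot(tbl, slot);
}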

> [  344.045843] watchdog: BUG: soft lockup - CPU#0 stuck for 26s!
> [kworker/u24:28:205]
> [  344.047669] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core
> nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy
> snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr
> rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo
> snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg
> videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core
> snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc
> snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme
> drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64
> drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg
> fuse
> [  344.050421] CPU: 0 UID: 0 PID: 205 Comm: kworker/u24:28 Kdump:
> loaded Not tainted 6.12.0-rc4+ #42
> [  344.050821] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS
> VMW201.00V.21805430.BA64.2305221830 05/22/2023
> [  344.051248] Workqueue: rpciod rpc_async_schedule [sunrpc]
> [  344.051513] pstate: 11400005 (nzcV daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [  344.051821] pc : kasan_check_range+0x0/0x188
> [  344.052011] lr : __kasan_check_write+0x1c/0x28
> [  344.052208] sp : ffff800087027920
> [  344.052352] x29: ffff800087027920 x28: 0000000000040000 x27: ffff0000a520f170
> [  344.052710] x26: 0000000000000000 x25: 1fffe00014a41e2e x24: ffff0002841692c0
> [  344.053159] x23: ffff0002841692c8 x22: 0000000000000000 x21: 1ffff00010e04f2a
> [  344.053612] x20: ffff0002841692c0 x19: ffff80008318c2c0 x18: 0000000000000000
> [  344.054054] x17: 0000006800000000 x16: 1fffe0000010fd60 x15: 0a0d37303736205d
> [  344.054501] x14: 3136335b0a0d3630 x13: 1ffff000104751c9 x12: ffff600014a41e2f
> [  344.054952] x11: 1fffe00014a41e2e x10: ffff600014a41e2e x9 : dfff800000000000
> [  344.055402] x8 : 00009fffeb5be1d2 x7 : ffff0000a520f173 x6 : 0000000000000001
> [  344.055735] x5 : ffff0000a520f170 x4 : 0000000000000000 x3 : ffff8000823129fc
> [  344.056058] x2 : 0000000000000001 x1 : 0000000000000002 x0 : ffff0000a520f172
> [  344.056479] Call trace:
> [  344.056636]  kasan_check_range+0x0/0x188
> [  344.056886]  queued_spin_lock_slowpath+0x5f4/0xaa0
> [  344.057192]  _raw_spin_lock+0x180/0x1a8
> [  344.057436]  rpc_sleep_on+0x78/0xe8 [sunrpc]
> [  344.057700]  nfsd4_cb_prepare+0x15c/0x468 [nfsd]
> [  344.057935]  rpc_prepare_task+0x70/0xa0 [sunrpc]
> [  344.058165]  __rpc_execute+0x1e8/0xa48 [sunrpc]
> [  344.058388]  rpc_async_schedule+0x90/0x100 [sunrpc]
> [  344.058623]  process_one_work+0x598/0x1100
> [  344.058818]  worker_thread+0x6c0/0xa58
> [  344.058992]  kthread+0x288/0x310
> [  344.059145]  ret_from_fork+0x10/0x20
> [  344.075846] watchdog: BUG: soft lockup - CPU#1 stuck for 26s!
> [kworker/u24:27:204]
> [  344.076295] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core
> nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy
> snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr
> rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo
> snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg
> videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core
> snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc
> snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme
> drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64
> drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg
> fuse
> [  344.079648] CPU: 1 UID: 0 PID: 204 Comm: kworker/u24:27 Kdump:
> loaded Tainted: G             L     6.12.0-rc4+ #42
> [  344.080290] Tainted: [L]=SOFTLOCKUP
> [  344.080495] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS
> VMW201.00V.21805430.BA64.2305221830 05/22/2023
> [  344.080930] Workqueue: rpciod rpc_async_schedule [sunrpc]
> [  344.081212] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [  344.081630] pc : _raw_spin_lock+0x108/0x1a8
> [  344.081815] lr : _raw_spin_lock+0xf4/0x1a8
> [  344.081998] sp : ffff800087017a30
> [  344.082146] x29: ffff800087017a90 x28: ffff0000a520f170 x27: ffff6000148a1081
> [  344.082467] x26: 1fffe000148a1081 x25: ffff0000a450840c x24: ffff0000a520ed40
> [  344.082892] x23: ffff0000a4508404 x22: ffff0002e9028000 x21: ffff800087017a50
> [  344.083338] x20: 1ffff00010e02f46 x19: ffff0000a520f170 x18: 0000000000000000
> [  344.083775] x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaab024bdd10
> [  344.084217] x14: 0000000000000000 x13: 0000000000000000 x12: ffff700010e02f4b
> [  344.084625] x11: 1ffff00010e02f4a x10: ffff700010e02f4a x9 : dfff800000000000
> [  344.084945] x8 : 0000000000000004 x7 : 0000000000000003 x6 : 0000000000000001
> [  344.085264] x5 : ffff800087017a50 x4 : ffff700010e02f4a x3 : ffff800082311154
> [  344.085587] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000
> [  344.085915] Call trace:
> [  344.086028]  _raw_spin_lock+0x108/0x1a8
> [  344.086210]  rpc_wake_up_queued_task+0x5c/0xf8 [sunrpc]
> [  344.086465]  nfsd4_cb_prepare+0x168/0x468 [nfsd]
> [  344.086694]  rpc_prepare_task+0x70/0xa0 [sunrpc]
> [  344.086922]  __rpc_execute+0x1e8/0xa48 [sunrpc]
> [  344.087148]  rpc_async_schedule+0x90/0x100 [sunrpc]
> [  344.087389]  process_one_work+0x598/0x1100
> [  344.087584]  worker_thread+0x6c0/0xa58
> [  344.087758]  kthread+0x288/0x310
> [  344.087909]  ret_from_fork+0x10/0x20
> 
> On Wed, Oct 30, 2024 at 4:30 PM <cel@kernel.org> wrote:
> > 
> > From: Chuck Lever <chuck.lever@oracle.com>
> > 
> > On Wed, 30 Oct 2024 10:48:45 -0400, Jeff Layton wrote:
> > > A few more minor updates to the set to fix some small-ish bugs, and do a
> > > bit of cleanup. This seems to test OK for me so far.
> > > 
> > > 
> > 
> > Applied to nfsd-next for v6.13, thanks! Still open for comments and
> > test results.
> > 
> > [1/2] nfsd: make nfsd4_session->se_flags a bool
> >       commit: d10f8b7deb4e8a3a0c75855fdad7aae9c1943816
> > [2/2] nfsd: allow for up to 32 callback session slots
> >       commit: 6c8910ac1cd360ea01136d707158690b5159a1d0
> > 
> > --
> > Chuck Lever
> > 
> > 

-- 
Jeff Layton <jlayton@kernel.org>
Re: [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots
Posted by Olga Kornievskaia 2 weeks, 5 days ago
On Tue, Nov 5, 2024 at 5:27 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Tue, 2024-11-05 at 17:08 -0500, Olga Kornievskaia wrote:
> > Hi Jeff/Chuck,
> >
> > Hitting the following softlockup when running using nfsd-next code.
> > testing is same open bunch of file get delegations, do local
> > conflicting operation. Network trace shows a few cb_recalls occurring
> > successfully before the soft lockup (I can confirm that more than 1
> > slot was used. But I also see that the server isn't trying to use the
> > lowest available slot but instead just bumps the number and uses the
> > next one. By that I mean, say slot 0 was used and a reply came back
> > but the next callback would use slot 1 instead of slot 0).
> >
>
> If the slots are being consumed and not released then that's what you'd
> see. The question is why those slots aren't being released.
>
> Did the client return a SEQUENCE error on some of those callbacks? It
> looks like the slot doesn't always get released if that occurs, so that
> might be one possibility.

No sequence errors. CB_SEQUENCE and CB_RECALL replies are all successful.

> > [  344.045843] watchdog: BUG: soft lockup - CPU#0 stuck for 26s!
> > [kworker/u24:28:205]
> > [  344.047669] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core
> > nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy
> > snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr
> > rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo
> > snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg
> > videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core
> > snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc
> > snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme
> > drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64
> > drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg
> > fuse
> > [  344.050421] CPU: 0 UID: 0 PID: 205 Comm: kworker/u24:28 Kdump:
> > loaded Not tainted 6.12.0-rc4+ #42
> > [  344.050821] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS
> > VMW201.00V.21805430.BA64.2305221830 05/22/2023
> > [  344.051248] Workqueue: rpciod rpc_async_schedule [sunrpc]
> > [  344.051513] pstate: 11400005 (nzcV daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> > [  344.051821] pc : kasan_check_range+0x0/0x188
> > [  344.052011] lr : __kasan_check_write+0x1c/0x28
> > [  344.052208] sp : ffff800087027920
> > [  344.052352] x29: ffff800087027920 x28: 0000000000040000 x27: ffff0000a520f170
> > [  344.052710] x26: 0000000000000000 x25: 1fffe00014a41e2e x24: ffff0002841692c0
> > [  344.053159] x23: ffff0002841692c8 x22: 0000000000000000 x21: 1ffff00010e04f2a
> > [  344.053612] x20: ffff0002841692c0 x19: ffff80008318c2c0 x18: 0000000000000000
> > [  344.054054] x17: 0000006800000000 x16: 1fffe0000010fd60 x15: 0a0d37303736205d
> > [  344.054501] x14: 3136335b0a0d3630 x13: 1ffff000104751c9 x12: ffff600014a41e2f
> > [  344.054952] x11: 1fffe00014a41e2e x10: ffff600014a41e2e x9 : dfff800000000000
> > [  344.055402] x8 : 00009fffeb5be1d2 x7 : ffff0000a520f173 x6 : 0000000000000001
> > [  344.055735] x5 : ffff0000a520f170 x4 : 0000000000000000 x3 : ffff8000823129fc
> > [  344.056058] x2 : 0000000000000001 x1 : 0000000000000002 x0 : ffff0000a520f172
> > [  344.056479] Call trace:
> > [  344.056636]  kasan_check_range+0x0/0x188
> > [  344.056886]  queued_spin_lock_slowpath+0x5f4/0xaa0
> > [  344.057192]  _raw_spin_lock+0x180/0x1a8
> > [  344.057436]  rpc_sleep_on+0x78/0xe8 [sunrpc]
> > [  344.057700]  nfsd4_cb_prepare+0x15c/0x468 [nfsd]
> > [  344.057935]  rpc_prepare_task+0x70/0xa0 [sunrpc]
> > [  344.058165]  __rpc_execute+0x1e8/0xa48 [sunrpc]
> > [  344.058388]  rpc_async_schedule+0x90/0x100 [sunrpc]
> > [  344.058623]  process_one_work+0x598/0x1100
> > [  344.058818]  worker_thread+0x6c0/0xa58
> > [  344.058992]  kthread+0x288/0x310
> > [  344.059145]  ret_from_fork+0x10/0x20
> > [  344.075846] watchdog: BUG: soft lockup - CPU#1 stuck for 26s!
> > [kworker/u24:27:204]
> > [  344.076295] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core
> > nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy
> > snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr
> > rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo
> > snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg
> > videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core
> > snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc
> > snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme
> > drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64
> > drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg
> > fuse
> > [  344.079648] CPU: 1 UID: 0 PID: 204 Comm: kworker/u24:27 Kdump:
> > loaded Tainted: G             L     6.12.0-rc4+ #42
> > [  344.080290] Tainted: [L]=SOFTLOCKUP
> > [  344.080495] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS
> > VMW201.00V.21805430.BA64.2305221830 05/22/2023
> > [  344.080930] Workqueue: rpciod rpc_async_schedule [sunrpc]
> > [  344.081212] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> > [  344.081630] pc : _raw_spin_lock+0x108/0x1a8
> > [  344.081815] lr : _raw_spin_lock+0xf4/0x1a8
> > [  344.081998] sp : ffff800087017a30
> > [  344.082146] x29: ffff800087017a90 x28: ffff0000a520f170 x27: ffff6000148a1081
> > [  344.082467] x26: 1fffe000148a1081 x25: ffff0000a450840c x24: ffff0000a520ed40
> > [  344.082892] x23: ffff0000a4508404 x22: ffff0002e9028000 x21: ffff800087017a50
> > [  344.083338] x20: 1ffff00010e02f46 x19: ffff0000a520f170 x18: 0000000000000000
> > [  344.083775] x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaab024bdd10
> > [  344.084217] x14: 0000000000000000 x13: 0000000000000000 x12: ffff700010e02f4b
> > [  344.084625] x11: 1ffff00010e02f4a x10: ffff700010e02f4a x9 : dfff800000000000
> > [  344.084945] x8 : 0000000000000004 x7 : 0000000000000003 x6 : 0000000000000001
> > [  344.085264] x5 : ffff800087017a50 x4 : ffff700010e02f4a x3 : ffff800082311154
> > [  344.085587] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000
> > [  344.085915] Call trace:
> > [  344.086028]  _raw_spin_lock+0x108/0x1a8
> > [  344.086210]  rpc_wake_up_queued_task+0x5c/0xf8 [sunrpc]
> > [  344.086465]  nfsd4_cb_prepare+0x168/0x468 [nfsd]
> > [  344.086694]  rpc_prepare_task+0x70/0xa0 [sunrpc]
> > [  344.086922]  __rpc_execute+0x1e8/0xa48 [sunrpc]
> > [  344.087148]  rpc_async_schedule+0x90/0x100 [sunrpc]
> > [  344.087389]  process_one_work+0x598/0x1100
> > [  344.087584]  worker_thread+0x6c0/0xa58
> > [  344.087758]  kthread+0x288/0x310
> > [  344.087909]  ret_from_fork+0x10/0x20
> >
> > On Wed, Oct 30, 2024 at 4:30 PM <cel@kernel.org> wrote:
> > >
> > > From: Chuck Lever <chuck.lever@oracle.com>
> > >
> > > On Wed, 30 Oct 2024 10:48:45 -0400, Jeff Layton wrote:
> > > > A few more minor updates to the set to fix some small-ish bugs, and do a
> > > > bit of cleanup. This seems to test OK for me so far.
> > > >
> > > >
> > >
> > > Applied to nfsd-next for v6.13, thanks! Still open for comments and
> > > test results.
> > >
> > > [1/2] nfsd: make nfsd4_session->se_flags a bool
> > >       commit: d10f8b7deb4e8a3a0c75855fdad7aae9c1943816
> > > [2/2] nfsd: allow for up to 32 callback session slots
> > >       commit: 6c8910ac1cd360ea01136d707158690b5159a1d0
> > >
> > > --
> > > Chuck Lever
> > >
> > >
>
> --
> Jeff Layton <jlayton@kernel.org>
Re: [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots
Posted by Jeff Layton 2 weeks, 5 days ago
On Tue, 2024-11-05 at 17:40 -0500, Olga Kornievskaia wrote:
> On Tue, Nov 5, 2024 at 5:27 PM Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > On Tue, 2024-11-05 at 17:08 -0500, Olga Kornievskaia wrote:
> > > Hi Jeff/Chuck,
> > > 
> > > Hitting the following softlockup when running using nfsd-next code.
> > > testing is same open bunch of file get delegations, do local
> > > conflicting operation. Network trace shows a few cb_recalls occurring
> > > successfully before the soft lockup (I can confirm that more than 1
> > > slot was used. But I also see that the server isn't trying to use the
> > > lowest available slot but instead just bumps the number and uses the
> > > next one. By that I mean, say slot 0 was used and a reply came back
> > > but the next callback would use slot 1 instead of slot 0).
> > > 
> > 
> > If the slots are being consumed and not released then that's what you'd
> > see. The question is why those slots aren't being released.
> > 
> > Did the client return a SEQUENCE error on some of those callbacks? It
> > looks like the slot doesn't always get released if that occurs, so that
> > might be one possibility.
> 
> No sequence errors. CB_SEQUENCE and CB_RECALL replies are all successful.
> 

Nevermind. I think I see the problem. I think I've got the
rpc_sleep_on() handling all wrong here. I'll have to respin this patch.
Chuck, mind dropping this one for now?
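
For context, the prepare-side pattern I was aiming for looks roughly
like this (sketch only, hypothetical names, not the nfsd4_cb_prepare
that was posted and not a fix): take a free slot if one exists,
otherwise rpc_sleep_on() the task and let the slot-release path, as in
the earlier sketch, wake it.

static void cb_prepare_sketch(struct rpc_task *task, void *calldata)
{
	struct cb_slot_table *tbl = calldata;
	unsigned long slot;

	spin_lock(&tbl->lock);
	slot = find_first_zero_bit(&tbl->busy, NFSD_BC_SLOT_TABLE_MAX + 1);
	if (slot > NFSD_BC_SLOT_TABLE_MAX) {
		/*
		 * No slot free: queue the task. Holding tbl->lock here
		 * closes the window where a release could clear a bit
		 * and wake the queue before this task is on it.
		 */
		rpc_sleep_on(&tbl->waitq, task, NULL);
		spin_unlock(&tbl->lock);
		return;
	}
	set_bit(slot, &tbl->busy);
	spin_unlock(&tbl->lock);

	rpc_call_start(task);
}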

Thanks,
Jeff

> > > [  344.045843] watchdog: BUG: soft lockup - CPU#0 stuck for 26s!
> > > [kworker/u24:28:205]
> > > [  344.047669] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core
> > > nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy
> > > snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr
> > > rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo
> > > snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg
> > > videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core
> > > snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc
> > > snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme
> > > drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64
> > > drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg
> > > fuse
> > > [  344.050421] CPU: 0 UID: 0 PID: 205 Comm: kworker/u24:28 Kdump:
> > > loaded Not tainted 6.12.0-rc4+ #42
> > > [  344.050821] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS
> > > VMW201.00V.21805430.BA64.2305221830 05/22/2023
> > > [  344.051248] Workqueue: rpciod rpc_async_schedule [sunrpc]
> > > [  344.051513] pstate: 11400005 (nzcV daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> > > [  344.051821] pc : kasan_check_range+0x0/0x188
> > > [  344.052011] lr : __kasan_check_write+0x1c/0x28
> > > [  344.052208] sp : ffff800087027920
> > > [  344.052352] x29: ffff800087027920 x28: 0000000000040000 x27: ffff0000a520f170
> > > [  344.052710] x26: 0000000000000000 x25: 1fffe00014a41e2e x24: ffff0002841692c0
> > > [  344.053159] x23: ffff0002841692c8 x22: 0000000000000000 x21: 1ffff00010e04f2a
> > > [  344.053612] x20: ffff0002841692c0 x19: ffff80008318c2c0 x18: 0000000000000000
> > > [  344.054054] x17: 0000006800000000 x16: 1fffe0000010fd60 x15: 0a0d37303736205d
> > > [  344.054501] x14: 3136335b0a0d3630 x13: 1ffff000104751c9 x12: ffff600014a41e2f
> > > [  344.054952] x11: 1fffe00014a41e2e x10: ffff600014a41e2e x9 : dfff800000000000
> > > [  344.055402] x8 : 00009fffeb5be1d2 x7 : ffff0000a520f173 x6 : 0000000000000001
> > > [  344.055735] x5 : ffff0000a520f170 x4 : 0000000000000000 x3 : ffff8000823129fc
> > > [  344.056058] x2 : 0000000000000001 x1 : 0000000000000002 x0 : ffff0000a520f172
> > > [  344.056479] Call trace:
> > > [  344.056636]  kasan_check_range+0x0/0x188
> > > [  344.056886]  queued_spin_lock_slowpath+0x5f4/0xaa0
> > > [  344.057192]  _raw_spin_lock+0x180/0x1a8
> > > [  344.057436]  rpc_sleep_on+0x78/0xe8 [sunrpc]
> > > [  344.057700]  nfsd4_cb_prepare+0x15c/0x468 [nfsd]
> > > [  344.057935]  rpc_prepare_task+0x70/0xa0 [sunrpc]
> > > [  344.058165]  __rpc_execute+0x1e8/0xa48 [sunrpc]
> > > [  344.058388]  rpc_async_schedule+0x90/0x100 [sunrpc]
> > > [  344.058623]  process_one_work+0x598/0x1100
> > > [  344.058818]  worker_thread+0x6c0/0xa58
> > > [  344.058992]  kthread+0x288/0x310
> > > [  344.059145]  ret_from_fork+0x10/0x20
> > > [  344.075846] watchdog: BUG: soft lockup - CPU#1 stuck for 26s!
> > > [kworker/u24:27:204]
> > > [  344.076295] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core
> > > nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy
> > > snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr
> > > rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo
> > > snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg
> > > videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core
> > > snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc
> > > snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme
> > > drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64
> > > drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg
> > > fuse
> > > [  344.079648] CPU: 1 UID: 0 PID: 204 Comm: kworker/u24:27 Kdump:
> > > loaded Tainted: G             L     6.12.0-rc4+ #42
> > > [  344.080290] Tainted: [L]=SOFTLOCKUP
> > > [  344.080495] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS
> > > VMW201.00V.21805430.BA64.2305221830 05/22/2023
> > > [  344.080930] Workqueue: rpciod rpc_async_schedule [sunrpc]
> > > [  344.081212] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> > > [  344.081630] pc : _raw_spin_lock+0x108/0x1a8
> > > [  344.081815] lr : _raw_spin_lock+0xf4/0x1a8
> > > [  344.081998] sp : ffff800087017a30
> > > [  344.082146] x29: ffff800087017a90 x28: ffff0000a520f170 x27: ffff6000148a1081
> > > [  344.082467] x26: 1fffe000148a1081 x25: ffff0000a450840c x24: ffff0000a520ed40
> > > [  344.082892] x23: ffff0000a4508404 x22: ffff0002e9028000 x21: ffff800087017a50
> > > [  344.083338] x20: 1ffff00010e02f46 x19: ffff0000a520f170 x18: 0000000000000000
> > > [  344.083775] x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaab024bdd10
> > > [  344.084217] x14: 0000000000000000 x13: 0000000000000000 x12: ffff700010e02f4b
> > > [  344.084625] x11: 1ffff00010e02f4a x10: ffff700010e02f4a x9 : dfff800000000000
> > > [  344.084945] x8 : 0000000000000004 x7 : 0000000000000003 x6 : 0000000000000001
> > > [  344.085264] x5 : ffff800087017a50 x4 : ffff700010e02f4a x3 : ffff800082311154
> > > [  344.085587] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000
> > > [  344.085915] Call trace:
> > > [  344.086028]  _raw_spin_lock+0x108/0x1a8
> > > [  344.086210]  rpc_wake_up_queued_task+0x5c/0xf8 [sunrpc]
> > > [  344.086465]  nfsd4_cb_prepare+0x168/0x468 [nfsd]
> > > [  344.086694]  rpc_prepare_task+0x70/0xa0 [sunrpc]
> > > [  344.086922]  __rpc_execute+0x1e8/0xa48 [sunrpc]
> > > [  344.087148]  rpc_async_schedule+0x90/0x100 [sunrpc]
> > > [  344.087389]  process_one_work+0x598/0x1100
> > > [  344.087584]  worker_thread+0x6c0/0xa58
> > > [  344.087758]  kthread+0x288/0x310
> > > [  344.087909]  ret_from_fork+0x10/0x20
> > > 
> > > On Wed, Oct 30, 2024 at 4:30 PM <cel@kernel.org> wrote:
> > > > 
> > > > From: Chuck Lever <chuck.lever@oracle.com>
> > > > 
> > > > On Wed, 30 Oct 2024 10:48:45 -0400, Jeff Layton wrote:
> > > > > A few more minor updates to the set to fix some small-ish bugs, and do a
> > > > > bit of cleanup. This seems to test OK for me so far.
> > > > > 
> > > > > 
> > > > 
> > > > Applied to nfsd-next for v6.13, thanks! Still open for comments and
> > > > test results.
> > > > 
> > > > [1/2] nfsd: make nfsd4_session->se_flags a bool
> > > >       commit: d10f8b7deb4e8a3a0c75855fdad7aae9c1943816
> > > > [2/2] nfsd: allow for up to 32 callback session slots
> > > >       commit: 6c8910ac1cd360ea01136d707158690b5159a1d0
> > > > 
> > > > --
> > > > Chuck Lever
> > > > 
> > > > 
> > 
> > --
> > Jeff Layton <jlayton@kernel.org>

-- 
Jeff Layton <jlayton@kernel.org>