[PATCH] nd_btt: Make BTT lanes preemptible

Tomáš Glozar posted 1 patch 2 years, 3 months ago
There is a newer version of this series
drivers/nvdimm/region_devs.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
[PATCH] nd_btt: Make BTT lanes preemptible
Posted by Tomáš Glozar 2 years, 3 months ago
From: Tomas Glozar <tglozar@redhat.com>

nd_region_acquire_lane uses get_cpu, which disables preemption. This is
an issue on PREEMPT_RT kernels, since btt_write_pg and also
nd_region_acquire_lane itself take a spin lock, resulting in BUG:
sleeping function called from invalid context.

Fix the issue by replacing get_cpu with smp_process_id and
migrate_disable when needed. This makes BTT operations preemptible, thus
permitting the use of spin_lock.

Fixes: 5212e11fde4d ("nd_btt: atomic sector updates")
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 drivers/nvdimm/region_devs.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 0a81f87f6f6c..e2f1fb99707f 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -939,7 +939,8 @@ unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
 {
 	unsigned int cpu, lane;
 
-	cpu = get_cpu();
+	migrate_disable();
+	cpu = smp_processor_id();
 	if (nd_region->num_lanes < nr_cpu_ids) {
 		struct nd_percpu_lane *ndl_lock, *ndl_count;
 
@@ -958,16 +959,15 @@ EXPORT_SYMBOL(nd_region_acquire_lane);
 void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane)
 {
 	if (nd_region->num_lanes < nr_cpu_ids) {
-		unsigned int cpu = get_cpu();
+		unsigned int cpu = smp_processor_id();
 		struct nd_percpu_lane *ndl_lock, *ndl_count;
 
 		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
 		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
 		if (--ndl_count->count == 0)
 			spin_unlock(&ndl_lock->lock);
-		put_cpu();
 	}
-	put_cpu();
+	migrate_enable();
 }
 EXPORT_SYMBOL(nd_region_release_lane);
 
-- 
2.39.3
Re: [PATCH] nd_btt: Make BTT lanes preemptible
Posted by Ira Weiny 2 years, 3 months ago
Tomáš Glozar wrote:
> From: Tomas Glozar <tglozar@redhat.com>
> 
> nd_region_acquire_lane uses get_cpu, which disables preemption. This is
> an issue on PREEMPT_RT kernels, since btt_write_pg and also
> nd_region_acquire_lane itself take a spin lock, resulting in BUG:
> sleeping function called from invalid context.

Is the bug in 1 of 2 places?

1) When btt_write_pg()->lock_map() (when the number of lanes is < number
   of cpus) and the lane is acquired is called?

*or*

2) When nd_region_acquire_lane() internally trys to take it's lock?

A copy/paste of the BUG observed would have been more clear I think.

Regardless I *think* this is ok but I'm worried I don't fully understand
what the problem is.

Ira

> 
> Fix the issue by replacing get_cpu with smp_process_id and
> migrate_disable when needed. This makes BTT operations preemptible, thus
> permitting the use of spin_lock.
> 
> Fixes: 5212e11fde4d ("nd_btt: atomic sector updates")
> Signed-off-by: Tomas Glozar <tglozar@redhat.com>
> ---
>  drivers/nvdimm/region_devs.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index 0a81f87f6f6c..e2f1fb99707f 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -939,7 +939,8 @@ unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
>  {
>  	unsigned int cpu, lane;
>  
> -	cpu = get_cpu();
> +	migrate_disable();
> +	cpu = smp_processor_id();
>  	if (nd_region->num_lanes < nr_cpu_ids) {
>  		struct nd_percpu_lane *ndl_lock, *ndl_count;
>  
> @@ -958,16 +959,15 @@ EXPORT_SYMBOL(nd_region_acquire_lane);
>  void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane)
>  {
>  	if (nd_region->num_lanes < nr_cpu_ids) {
> -		unsigned int cpu = get_cpu();
> +		unsigned int cpu = smp_processor_id();
>  		struct nd_percpu_lane *ndl_lock, *ndl_count;
>  
>  		ndl_count = per_cpu_ptr(nd_region->lane, cpu);
>  		ndl_lock = per_cpu_ptr(nd_region->lane, lane);
>  		if (--ndl_count->count == 0)
>  			spin_unlock(&ndl_lock->lock);
> -		put_cpu();
>  	}
> -	put_cpu();
> +	migrate_enable();
>  }
>  EXPORT_SYMBOL(nd_region_release_lane);
>  
> -- 
> 2.39.3
> 


Re: [PATCH] nd_btt: Make BTT lanes preemptible
Posted by Tomas Glozar 2 years, 3 months ago
čt 14. 9. 2023 v 22:18 odesílatel Ira Weiny <ira.weiny@intel.com> napsal:
> Is the bug in 1 of 2 places?
>
> 1) When btt_write_pg()->lock_map() (when the number of lanes is < number
>    of cpus) and the lane is acquired is called?
>
> *or*
>
> 2) When nd_region_acquire_lane() internally trys to take it's lock?
>
> A copy/paste of the BUG observed would have been more clear I think.
>

The BUG was observed on btt_write_pg()->lock_map(), but I assume the
BUG will also happen on the lock in nd_region_acquire_lane, since that
is also a spin lock, i.e. a sleeping lock on RT.

BUG observed in dmesg when running ndctl tests on RT kernel without the patch:

[  123.262740] nfit_test_iomap: loading out-of-tree module taints kernel.
[  123.262744] nfit_test_iomap: loading test module taints kernel.
[  123.408628] nfit_test nfit_test.0: changing numa node from -1 to 0
for nfit region [0x0000000100000000-0x0000000101ffffff]
[  123.408633] nfit_test nfit_test.0: changing target node from -1 to
0 for nfit region [0x0000000100000000-0x0000000101ffffff]
[  123.408852] nfit_test nfit_test.0: changing numa node from -1 to 0
for nfit region [0x0000000108000000-0x000000010bffffff]
[  123.408855] nfit_test nfit_test.0: changing target node from -1 to
0 for nfit region [0x0000000108000000-0x000000010bffffff]
[  123.408933] nfit_test nfit_test.0: changing numa node from -1 to 0
for nfit region [0xffffb900c2cbd000-0xffffb900c2cbd00b]
[  123.408935] nfit_test nfit_test.0: changing target node from -1 to
0 for nfit region [0xffffb900c2cbd000-0xffffb900c2cbd00b]
[  123.408961] nfit_test nfit_test.0: changing numa node from -1 to 0
for nfit region [0xffffb900c2cde000-0xffffb900c2cde00b]
[  123.408963] nfit_test nfit_test.0: changing target node from -1 to
0 for nfit region [0xffffb900c2cde000-0xffffb900c2cde00b]
[  123.408988] nfit_test nfit_test.0: changing numa node from -1 to 0
for nfit region [0xffffb900c2cff000-0xffffb900c2cff00b]
[  123.408990] nfit_test nfit_test.0: changing target node from -1 to
0 for nfit region [0xffffb900c2cff000-0xffffb900c2cff00b]
[  123.409015] nfit_test nfit_test.0: changing numa node from -1 to 0
for nfit region [0xffffb900c2d45000-0xffffb900c2d4500b]
[  123.409018] nfit_test nfit_test.0: changing target node from -1 to
0 for nfit region [0xffffb900c2d45000-0xffffb900c2d4500b]
[  123.409586] nfit_test nfit_test.0: failed to evaluate _FIT
[  123.441834] nfit_test nfit_test.1: Error found in NVDIMM nmem4
flags: save_fail restore_fail flush_fail not_armed
[  123.441857] nfit_test nfit_test.1: Error found in NVDIMM nmem5
flags: map_fail
[  123.457346] nfit_test nfit_test.1: changing numa node from -1 to 0
for nfit region [0x0000000140000000-0x0000000141ffffff]
[  123.457351] nfit_test nfit_test.1: changing target node from -1 to
0 for nfit region [0x0000000140000000-0x0000000141ffffff]
[  123.457427] nfit_test nfit_test.1: changing numa node from -1 to 0
for nfit region [0xffffb900c361d000-0xffffb900c3a1cfff]
[  123.457429] nfit_test nfit_test.1: changing target node from -1 to
0 for nfit region [0xffffb900c361d000-0xffffb900c3a1cfff]
[  123.475513] nd_pmem namespace3.0: unable to guarantee persistence of writes
[  123.484778] nd_pmem namespace2.0: region2 read-only, marking pmem2 read-only
[  126.349866] nd_pmem btt0.0: No existing arenas
[  126.407070] BUG: sleeping function called from invalid context at
kernel/locking/spinlock_rt.c:48
[  126.407073] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid:
4903, name: libndctl
[  126.407074] preempt_count: 1, expected: 0
[  126.407075] RCU nest depth: 0, expected: 0
[  126.407075] 1 lock held by libndctl/4903:
[  126.407076]  #0: ffff8c184a270060
(&arena->map_locks[i].lock){+.+.}-{2:2}, at: btt_write_pg+0x2d7/0x500
[nd_btt]
[  126.407085] Preemption disabled at:
[  126.407085] [<ffffffffc1313db5>] nd_region_acquire_lane+0x15/0x90 [libnvdimm]
[  126.407099] CPU: 1 PID: 4903 Comm: libndctl Kdump: loaded Tainted:
G        W  O     N-------  ---
6.5.0-ark.ndctl-tests.el9+rt-debug-rt6+ #6
[  126.407101] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS 1.16.2-1.fc38 04/01/2014
[  126.407102] Call Trace:
[  126.407103]  <TASK>
[  126.407104]  dump_stack_lvl+0x8e/0xb0
[  126.407109]  __might_resched+0x19b/0x250
[  126.407113]  rt_spin_lock+0x4c/0x100
[  126.407116]  ? btt_write_pg+0x2d7/0x500 [nd_btt]
[  126.407120]  btt_write_pg+0x2d7/0x500 [nd_btt]
[  126.407127]  ? local_clock_noinstr+0x9/0xc0
[  126.407131]  btt_submit_bio+0x16d/0x270 [nd_btt]
[  126.407138]  __submit_bio+0x48/0x80
[  126.407141]  __submit_bio_noacct+0x7e/0x1e0
[  126.407146]  submit_bio_wait+0x58/0xb0
[  126.407153]  __blkdev_direct_IO_simple+0x107/0x240
[  126.407156]  ? inode_set_ctime_current+0x51/0x110
[  126.407164]  ? __pfx_submit_bio_wait_endio+0x10/0x10
[  126.407171]  blkdev_write_iter+0x1d8/0x290
[  126.407174]  vfs_write+0x237/0x330
[  126.407183]  ksys_write+0x68/0xf0
[  126.407187]  do_syscall_64+0x59/0x90
[  126.407192]  ? do_syscall_64+0x69/0x90
[  126.407193]  ? lockdep_hardirqs_on+0x79/0x100
[  126.407195]  ? do_syscall_64+0x69/0x90
[  126.407196]  ? lockdep_hardirqs_on+0x79/0x100
[  126.407198]  ? do_syscall_64+0x69/0x90
[  126.407199]  ? do_syscall_64+0x69/0x90
[  126.407200]  ? lockdep_hardirqs_on+0x79/0x100
[  126.407202]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  126.407205] RIP: 0033:0x7f95f2f3eba7
[  126.407218] Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7
0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89
7
4 24
[  126.407219] RSP: 002b:00007ffe7a298678 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[  126.407221] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f95f2f3eba7
[  126.407222] RDX: 0000000000001000 RSI: 0000000001ea9000 RDI: 000000000000000a
[  126.407223] RBP: 00007ffe7a298740 R08: 0000000000000000 R09: 00007f95f2fb14e0
[  126.407223] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe7a298be8
[  126.407224] R13: 000000000040e3a2 R14: 0000000000412db8 R15: 00007f95f31ff000
[  126.407233]  </TASK>
Re: [PATCH] nd_btt: Make BTT lanes preemptible
Posted by Ira Weiny 2 years, 3 months ago
Tomas Glozar wrote:
> čt 14. 9. 2023 v 22:18 odesílatel Ira Weiny <ira.weiny@intel.com> napsal:
> > Is the bug in 1 of 2 places?
> >
> > 1) When btt_write_pg()->lock_map() (when the number of lanes is < number
> >    of cpus) and the lane is acquired is called?
> >
> > *or*
> >
> > 2) When nd_region_acquire_lane() internally trys to take it's lock?
> >
> > A copy/paste of the BUG observed would have been more clear I think.
> >
> 
> The BUG was observed on btt_write_pg()->lock_map(), but I assume the
> BUG will also happen on the lock in nd_region_acquire_lane, since that
> is also a spin lock, i.e. a sleeping lock on RT.
> 
> BUG observed in dmesg when running ndctl tests on RT kernel without the patch:

Thanks for clarifying.  Could you respin the patch with the text below?
That would have saved me a lot of time digging to see what the code path
was.

...
	BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
	in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 4903, name: libndctl
	preempt_count: 1, expected: 0
	RCU nest depth: 0, expected: 0
	1 lock held by libndctl/4903:
	 #0: ffff8c184a270060 (&arena->map_locks[i].lock){+.+.}-{2:2}, at: btt_write_pg+0x2d7/0x500 [nd_btt]
	Preemption disabled at:
	[<ffffffffc1313db5>] nd_region_acquire_lane+0x15/0x90 [libnvdimm]
	Call Trace:
	 <TASK>
	 dump_stack_lvl+0x8e/0xb0
	 __might_resched+0x19b/0x250
	 rt_spin_lock+0x4c/0x100
	 ? btt_write_pg+0x2d7/0x500 [nd_btt]
	 btt_write_pg+0x2d7/0x500 [nd_btt]
	 ? local_clock_noinstr+0x9/0xc0
	 btt_submit_bio+0x16d/0x270 [nd_btt]
	 __submit_bio+0x48/0x80
	 __submit_bio_noacct+0x7e/0x1e0
	 submit_bio_wait+0x58/0xb0
	 __blkdev_direct_IO_simple+0x107/0x240
	 ? inode_set_ctime_current+0x51/0x110
	 ? __pfx_submit_bio_wait_endio+0x10/0x10
	 blkdev_write_iter+0x1d8/0x290
	 vfs_write+0x237/0x330
...

With a respin including this trace:

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Re: [PATCH] nd_btt: Make BTT lanes preemptible
Posted by Tomas Glozar 2 years, 2 months ago
po 18. 9. 2023 v 23:32 odesílatel Ira Weiny <ira.weiny@intel.com> napsal:
> Thanks for clarifying.  Could you respin the patch with the text below?
> That would have saved me a lot of time digging to see what the code path
> was.

Sent v2 with relevant parts of the BUG message.