[PATCH] mm/kmemleak: Fix sleeping function called from invalid context in kmemleak_seq_show

Alessandro Carminati posted 1 patch 1 week, 6 days ago
mm/kmemleak.c | 46 ++++++++++++++++++++++++++++++++++++----------
1 file changed, 36 insertions(+), 10 deletions(-)
[PATCH] mm/kmemleak: Fix sleeping function called from invalid context in kmemleak_seq_show
Posted by Alessandro Carminati 1 week, 6 days ago
This patch addresses a bug in the RT variant of the kernel where a
"sleeping function called from invalid context" warning may occur in
kmemleak_seq_show under specific conditions:
- CONFIG_PREEMPT_RT=y
- SELinux is the LSM for the system
- `kptr_restrict` is set to 1.
- The kmemleak buffer contains at least one item.

Commit 8c96f1bc6fc49c724c4cdd22d3e99260263b7384 ("mm/kmemleak: turn
kmemleak_lock and object->lock to raw_spinlock_t") introduced a change
where kmemleak_seq_show is executed in atomic context within the RT kernel.
However, the SELinux capability check within this function flow still
relies on regular spinlocks, leading to potential race conditions that
trigger the error when printing the kmemleak backtrace.

To resolve this, the backtrace printing has been moved out of the critical
section.

Signed-off-by: Alessandro Carminati <acarmina@redhat.com>
---
Please read previous conversation in the RFC
https://lore.kernel.org/all/20241115145410.114376-1-acarmina@redhat.com/

Splash triggering this patch:

```
[  159.247069] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[  159.247193] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 136, name: cat
[  159.247241] preempt_count: 1, expected: 0
[  159.247277] RCU nest depth: 2, expected: 2
[  159.247388] 6 locks held by cat/136:
[  159.247438]  #0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xb8/0xe30
[  159.248835]  #1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at: kmemleak_seq_start+0x34/0x128
[  159.249053]  #3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at: kmemleak_seq_show+0x3c/0x1e0
[  159.249127]  #4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at: has_ns_capability_noaudit+0x8/0x1b0
[  159.249205]  #5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at: avc_compute_av+0xc4/0x3d0
[  159.249364] irq event stamp: 136660
[  159.249407] hardirqs last  enabled at (136659): [<ffffafe6a80fd7a0>] _raw_spin_unlock_irqrestore+0xa8/0xd8
[  159.249465] hardirqs last disabled at (136660): [<ffffafe6a80fd85c>] _raw_spin_lock_irqsave+0x8c/0xb0
[  159.249518] softirqs last  enabled at (0): [<ffffafe6a5d50b28>] copy_process+0x11d8/0x3df8
[  159.249571] softirqs last disabled at (0): [<0000000000000000>] 0x0
[  159.249970] Preemption disabled at:
[  159.249988] [<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0
[  159.250609] CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G            E      6.11.0-rt7+ #34
[  159.250797] Tainted: [E]=UNSIGNED_MODULE
[  159.250822] Hardware name: linux,dummy-virt (DT)
[  159.251050] Call trace:
[  159.251079]  dump_backtrace+0xa0/0x128
[  159.251132]  show_stack+0x1c/0x30
[  159.251156]  dump_stack_lvl+0xe8/0x198
[  159.251180]  dump_stack+0x18/0x20
[  159.251227]  rt_spin_lock+0x8c/0x1a8
[  159.251273]  avc_perm_nonode+0xa0/0x150
[  159.251316]  cred_has_capability.isra.0+0x118/0x218
[  159.251340]  selinux_capable+0x50/0x80
[  159.251363]  security_capable+0x7c/0xd0
[  159.251388]  has_ns_capability_noaudit+0x94/0x1b0
[  159.251412]  has_capability_noaudit+0x20/0x30
[  159.251437]  restricted_pointer+0x21c/0x4b0
[  159.251461]  pointer+0x298/0x760
[  159.251482]  vsnprintf+0x330/0xf70
[  159.251504]  seq_printf+0x178/0x218
[  159.251526]  print_unreferenced+0x1a4/0x2d0
[  159.251551]  kmemleak_seq_show+0xd0/0x1e0
[  159.251576]  seq_read_iter+0x354/0xe30
[  159.251599]  seq_read+0x250/0x378
[  159.251622]  full_proxy_read+0xd8/0x148
[  159.251649]  vfs_read+0x190/0x918
[  159.251672]  ksys_read+0xf0/0x1e0
[  159.251693]  __arm64_sys_read+0x70/0xa8
[  159.251716]  invoke_syscall.constprop.0+0xd4/0x1d8
[  159.251767]  el0_svc+0x50/0x158
[  159.251813]  el0t_64_sync+0x17c/0x180
```
 mm/kmemleak.c | 46 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 36 insertions(+), 10 deletions(-)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 0400f5e8ac60..c77899af3e9e 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -356,14 +356,9 @@ static bool unreferenced_object(struct kmemleak_object *object)
  * Printing of the unreferenced objects information to the seq file. The
  * print_unreferenced function must be called with the object->lock held.
  */
-static void print_unreferenced(struct seq_file *seq,
+static depot_stack_handle_t print_unreferenced(struct seq_file *seq,
 			       struct kmemleak_object *object)
 {
-	int i;
-	unsigned long *entries;
-	unsigned int nr_entries;
-
-	nr_entries = stack_depot_fetch(object->trace_handle, &entries);
 	warn_or_seq_printf(seq, "unreferenced object 0x%08lx (size %zu):\n",
 			  object->pointer, object->size);
 	warn_or_seq_printf(seq, "  comm \"%s\", pid %d, jiffies %lu\n",
@@ -371,6 +366,23 @@ static void print_unreferenced(struct seq_file *seq,
 	hex_dump_object(seq, object);
 	warn_or_seq_printf(seq, "  backtrace (crc %x):\n", object->checksum);
 
+	return object->trace_handle;
+}
+
+/*
+ * Prints stack traces of unreferenced objects outside of the lock context.
+ * This avoids potential issues with printing pointers that might require
+ * additional locking.
+ */
+static void print_stack_trace(struct seq_file *seq,
+			      depot_stack_handle_t h)
+{
+	int i;
+	unsigned long *entries;
+	unsigned int nr_entries;
+
+	nr_entries = stack_depot_fetch(h, &entries);
+
 	for (i = 0; i < nr_entries; i++) {
 		void *ptr = (void *)entries[i];
 		warn_or_seq_printf(seq, "    [<%pK>] %pS\n", ptr, ptr);
@@ -1621,7 +1633,9 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
  */
 static void kmemleak_scan(void)
 {
+	depot_stack_handle_t stackdepot_handle;
 	struct kmemleak_object *object;
+	bool do_print = false;
 	struct zone *zone;
 	int __maybe_unused i;
 	int new_leaks = 0;
@@ -1783,12 +1797,17 @@ static void kmemleak_scan(void)
 		    !(object->flags & OBJECT_REPORTED)) {
 			object->flags |= OBJECT_REPORTED;
 
-			if (kmemleak_verbose)
-				print_unreferenced(NULL, object);
+			if (kmemleak_verbose) {
+				stackdepot_handle = print_unreferenced(NULL, object);
+				do_print = true;
+			}
 
 			new_leaks++;
 		}
 		raw_spin_unlock_irq(&object->lock);
+		if (kmemleak_verbose && do_print)
+			print_stack_trace(NULL, stackdepot_handle);
+
 	}
 	rcu_read_unlock();
 
@@ -1937,13 +1956,20 @@ static void kmemleak_seq_stop(struct seq_file *seq, void *v)
  */
 static int kmemleak_seq_show(struct seq_file *seq, void *v)
 {
+	depot_stack_handle_t stackdepot_handle;
 	struct kmemleak_object *object = v;
+	bool do_print = false;
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&object->lock, flags);
-	if ((object->flags & OBJECT_REPORTED) && unreferenced_object(object))
-		print_unreferenced(seq, object);
+	if ((object->flags & OBJECT_REPORTED) && unreferenced_object(object)) {
+		stackdepot_handle = print_unreferenced(seq, object);
+		do_print = true;
+	}
 	raw_spin_unlock_irqrestore(&object->lock, flags);
+	if (do_print)
+		print_stack_trace(seq, stackdepot_handle);
+
 	return 0;
 }
 
-- 
2.34.1
Re: [PATCH] mm/kmemleak: Fix sleeping function called from invalid context in kmemleak_seq_show
Posted by Catalin Marinas 1 week, 6 days ago
Hi Alessandro,

On Wed, Nov 20, 2024 at 10:23:25AM +0000, Alessandro Carminati wrote:
> This patch addresses a bug in the RT variant of the kernel where a
> "sleeping function called from invalid context" warning may occur in
> kmemleak_seq_show under specific conditions:
> - CONFIG_PREEMPT_RT=y
> - SELinux is the LSM for the system
> - `kptr_restrict` is set to 1.
> - The kmemleak buffer contains at least one item.
> 
> Commit 8c96f1bc6fc49c724c4cdd22d3e99260263b7384 ("mm/kmemleak: turn
> kmemleak_lock and object->lock to raw_spinlock_t") introduced a change
> where kmemleak_seq_show is executed in atomic context within the RT kernel.
> However, the SELinux capability check within this function flow still
> relies on regular spinlocks, leading to potential race conditions that
> trigger the error when printing the kmemleak backtrace.
> 
> To resolve this, the backtrace printing has been moved out of the critical
> section.
> 
> Signed-off-by: Alessandro Carminati <acarmina@redhat.com>
> ---
> Please read previous conversation in the RFC
> https://lore.kernel.org/all/20241115145410.114376-1-acarmina@redhat.com/
> 
> Splash triggering this patch:
> 
> ```
> [  159.247069] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> [  159.247193] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 136, name: cat
> [  159.247241] preempt_count: 1, expected: 0
> [  159.247277] RCU nest depth: 2, expected: 2
> [  159.247388] 6 locks held by cat/136:
> [  159.247438]  #0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xb8/0xe30
> [  159.248835]  #1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at: kmemleak_seq_start+0x34/0x128
> [  159.249053]  #3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at: kmemleak_seq_show+0x3c/0x1e0
> [  159.249127]  #4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at: has_ns_capability_noaudit+0x8/0x1b0
> [  159.249205]  #5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at: avc_compute_av+0xc4/0x3d0
> [  159.249364] irq event stamp: 136660
> [  159.249407] hardirqs last  enabled at (136659): [<ffffafe6a80fd7a0>] _raw_spin_unlock_irqrestore+0xa8/0xd8
> [  159.249465] hardirqs last disabled at (136660): [<ffffafe6a80fd85c>] _raw_spin_lock_irqsave+0x8c/0xb0
> [  159.249518] softirqs last  enabled at (0): [<ffffafe6a5d50b28>] copy_process+0x11d8/0x3df8
> [  159.249571] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [  159.249970] Preemption disabled at:
> [  159.249988] [<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0
> [  159.250609] CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G            E      6.11.0-rt7+ #34
> [  159.250797] Tainted: [E]=UNSIGNED_MODULE
> [  159.250822] Hardware name: linux,dummy-virt (DT)
> [  159.251050] Call trace:
[...]

It would be worth including the kernel dump in the commit log for future
references but remove the timestamps.

It also needs some explanation that the stack depot entries are never
freed by kmemleak, so no need to refcount.

> @@ -356,14 +356,9 @@ static bool unreferenced_object(struct kmemleak_object *object)
>   * Printing of the unreferenced objects information to the seq file. The
>   * print_unreferenced function must be called with the object->lock held.
>   */
> -static void print_unreferenced(struct seq_file *seq,
> +static depot_stack_handle_t print_unreferenced(struct seq_file *seq,
>  			       struct kmemleak_object *object)
>  {
> -	int i;
> -	unsigned long *entries;
> -	unsigned int nr_entries;
> -
> -	nr_entries = stack_depot_fetch(object->trace_handle, &entries);
>  	warn_or_seq_printf(seq, "unreferenced object 0x%08lx (size %zu):\n",
>  			  object->pointer, object->size);
>  	warn_or_seq_printf(seq, "  comm \"%s\", pid %d, jiffies %lu\n",
> @@ -371,6 +366,23 @@ static void print_unreferenced(struct seq_file *seq,
>  	hex_dump_object(seq, object);
>  	warn_or_seq_printf(seq, "  backtrace (crc %x):\n", object->checksum);
>  
> +	return object->trace_handle;
> +}

What I don't fully understand - is this a problem with any seq_printf()
or just the backtrace pointers from the stack depot that trigger this
issue? I guess it's something to do with restricted pointers but I'm not
familiar with the PREEMPT_RT concepts. It would be good to explain,
ideally both in the commit log and a comment in the code, why we only
need to do this for the stack dump.

> +
> +/*
> + * Prints stack traces of unreferenced objects outside of the lock context.
> + * This avoids potential issues with printing pointers that might require
> + * additional locking.
> + */
> +static void print_stack_trace(struct seq_file *seq,
> +			      depot_stack_handle_t h)
> +{
> +	int i;
> +	unsigned long *entries;
> +	unsigned int nr_entries;
> +
> +	nr_entries = stack_depot_fetch(h, &entries);
> +
>  	for (i = 0; i < nr_entries; i++) {
>  		void *ptr = (void *)entries[i];
>  		warn_or_seq_printf(seq, "    [<%pK>] %pS\n", ptr, ptr);
> @@ -1621,7 +1633,9 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
>   */
>  static void kmemleak_scan(void)
>  {
> +	depot_stack_handle_t stackdepot_handle;
>  	struct kmemleak_object *object;
> +	bool do_print = false;
>  	struct zone *zone;
>  	int __maybe_unused i;
>  	int new_leaks = 0;
> @@ -1783,12 +1797,17 @@ static void kmemleak_scan(void)
>  		    !(object->flags & OBJECT_REPORTED)) {
>  			object->flags |= OBJECT_REPORTED;
>  
> -			if (kmemleak_verbose)
> -				print_unreferenced(NULL, object);
> +			if (kmemleak_verbose) {
> +				stackdepot_handle = print_unreferenced(NULL, object);
> +				do_print = true;
> +			}
>  
>  			new_leaks++;
>  		}
>  		raw_spin_unlock_irq(&object->lock);
> +		if (kmemleak_verbose && do_print)
> +			print_stack_trace(NULL, stackdepot_handle);
> +
>  	}
>  	rcu_read_unlock();

I wonder whether it would be simpler to just have a copy of the object
on the stack. The only downside is hex_dump_object() which can only be
done under the lock, otherwise the object may disappear. But we can copy
part of the object to a buffer on the stack as well. Something like:

static void kmemleak_scan(void)
{
	...
	struct kmemleak_object unref_object;
	u8 unref_buf[HEX_MAX_LINES * HEX_ROW_SIZE];
	...
}

static void save_unref_object(struct kmemleak_object *unref_object,
			      struct kmemleak_object *object,
			      u8 *unref_buf)
{
	unref_object = *object;
	unref_object->pointer = (unsigned long)unref_buf;
	...
	// some memcpy from object->pointer to unref_buf similar to what
	// we do in hex_dump_object().
}

Update hex_dump_object() accordingly (i.e. skip per-cpu checks since we
copied the above).

After this, just call print_unreferenced(&unref_object) outside the
raw_spin_lock.

Thanks.

-- 
Catalin
Re: [PATCH] mm/kmemleak: Fix sleeping function called from invalid context in kmemleak_seq_show
Posted by Steven Rostedt 1 week, 6 days ago
On Wed, 20 Nov 2024 14:53:13 +0000
Catalin Marinas <catalin.marinas@arm.com> wrote:

> > -static void print_unreferenced(struct seq_file *seq,
> > +static depot_stack_handle_t print_unreferenced(struct seq_file *seq,
> >  			       struct kmemleak_object *object)
> >  {
> > -	int i;
> > -	unsigned long *entries;
> > -	unsigned int nr_entries;
> > -
> > -	nr_entries = stack_depot_fetch(object->trace_handle, &entries);
> >  	warn_or_seq_printf(seq, "unreferenced object 0x%08lx (size %zu):\n",
> >  			  object->pointer, object->size);
> >  	warn_or_seq_printf(seq, "  comm \"%s\", pid %d, jiffies %lu\n",
> > @@ -371,6 +366,23 @@ static void print_unreferenced(struct seq_file *seq,
> >  	hex_dump_object(seq, object);
> >  	warn_or_seq_printf(seq, "  backtrace (crc %x):\n", object->checksum);
> >  
> > +	return object->trace_handle;
> > +}  
> 
> What I don't fully understand - is this a problem with any seq_printf()
> or just the backtrace pointers from the stack depot that trigger this
> issue? I guess it's something to do with restricted pointers but I'm not
> familiar with the PREEMPT_RT concepts. It would be good to explain,
> ideally both in the commit log and a comment in the code, why we only
> need to do this for the stack dump.

In PREEMPT_RT, to achieve the ability to preempt in more context,
spin_lock() is converted to a special sleeping mutex. But there's some
places where it can not be converted, and in those cases we use
raw_spin_lock(). kmemleak has been converted to use raw_spin_lock() which
means anything that gets called under that lock can not take a normal
spin_lock().

What happened here is that the kmemleak raw spinlock is held and
seq_printf() is called. Normally, this is not an issue, but the behavior of
seq_printf() is dependent on what values is being printed.

The "%pK" dereferences a pointer and there's some SELinux hooks attached to
that code. The problem is that the SELinux hooks take spinlocks. This would
not have been an issue if it wasn't for that "%pK" in the format.

Maybe SELinux locks should be converted to raw? I don't know how long that
lock is held. There are some loops though :-/

avc_insert():

	spin_lock_irqsave(lock, flag);
	hlist_for_each_entry(pos, head, list) {
		if (pos->ae.ssid == ssid &&
			pos->ae.tsid == tsid &&
			pos->ae.tclass == tclass) {
			avc_node_replace(node, pos);
			goto found;
		}
	}
	hlist_add_head_rcu(&node->list, head);
found:
	spin_unlock_irqrestore(lock, flag);

Perhaps that could be converted to simple RCU?

As I'm sure there's other places that call vsprintf() under a raw_spin_lock
or non-preemptable context, perhaps this should be the fix we do.

-- Steve
Re: [PATCH] mm/kmemleak: Fix sleeping function called from invalid context in kmemleak_seq_show
Posted by Catalin Marinas 1 week, 4 days ago
On Wed, Nov 20, 2024 at 10:26:02AM -0500, Steven Rostedt wrote:
> On Wed, 20 Nov 2024 14:53:13 +0000
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > -static void print_unreferenced(struct seq_file *seq,
> > > +static depot_stack_handle_t print_unreferenced(struct seq_file *seq,
> > >  			       struct kmemleak_object *object)
> > >  {
> > > -	int i;
> > > -	unsigned long *entries;
> > > -	unsigned int nr_entries;
> > > -
> > > -	nr_entries = stack_depot_fetch(object->trace_handle, &entries);
> > >  	warn_or_seq_printf(seq, "unreferenced object 0x%08lx (size %zu):\n",
> > >  			  object->pointer, object->size);
> > >  	warn_or_seq_printf(seq, "  comm \"%s\", pid %d, jiffies %lu\n",
> > > @@ -371,6 +366,23 @@ static void print_unreferenced(struct seq_file *seq,
> > >  	hex_dump_object(seq, object);
> > >  	warn_or_seq_printf(seq, "  backtrace (crc %x):\n", object->checksum);
> > >  
> > > +	return object->trace_handle;
> > > +}  
> > 
> > What I don't fully understand - is this a problem with any seq_printf()
> > or just the backtrace pointers from the stack depot that trigger this
> > issue? I guess it's something to do with restricted pointers but I'm not
> > familiar with the PREEMPT_RT concepts. It would be good to explain,
> > ideally both in the commit log and a comment in the code, why we only
> > need to do this for the stack dump.
> 
> In PREEMPT_RT, to achieve the ability to preempt in more context,
> spin_lock() is converted to a special sleeping mutex. But there's some
> places where it can not be converted, and in those cases we use
> raw_spin_lock(). kmemleak has been converted to use raw_spin_lock() which
> means anything that gets called under that lock can not take a normal
> spin_lock().
> 
> What happened here is that the kmemleak raw spinlock is held and
> seq_printf() is called. Normally, this is not an issue, but the behavior of
> seq_printf() is dependent on what values is being printed.
> 
> The "%pK" dereferences a pointer and there's some SELinux hooks attached to
> that code. The problem is that the SELinux hooks take spinlocks. This would
> not have been an issue if it wasn't for that "%pK" in the format.

Thanks Steven. That's very useful.

> Maybe SELinux locks should be converted to raw? I don't know how long that
> lock is held. There are some loops though :-/
> 
> avc_insert():
> 
> 	spin_lock_irqsave(lock, flag);
> 	hlist_for_each_entry(pos, head, list) {
> 		if (pos->ae.ssid == ssid &&
> 			pos->ae.tsid == tsid &&
> 			pos->ae.tclass == tclass) {
> 			avc_node_replace(node, pos);
> 			goto found;
> 		}
> 	}
> 	hlist_add_head_rcu(&node->list, head);
> found:
> 	spin_unlock_irqrestore(lock, flag);
> 
> Perhaps that could be converted to simple RCU?
> 
> As I'm sure there's other places that call vsprintf() under a raw_spin_lock
> or non-preemptable context, perhaps this should be the fix we do.

My preference would also be to convert SELinux rather than avoiding the
issue in kmemleak (and other similar places).

-- 
Catalin
Re: [PATCH] mm/kmemleak: Fix sleeping function called from invalid context in kmemleak_seq_show
Posted by Sebastian Andrzej Siewior 1 week, 4 days ago
On 2024-11-21 19:19:05 [+0000], Catalin Marinas wrote:
…
> > Maybe SELinux locks should be converted to raw? I don't know how long that
> > lock is held. There are some loops though :-/
> > 
> > avc_insert():
> > 
> > 	spin_lock_irqsave(lock, flag);
> > 	hlist_for_each_entry(pos, head, list) {
> > 		if (pos->ae.ssid == ssid &&
> > 			pos->ae.tsid == tsid &&
> > 			pos->ae.tclass == tclass) {
> > 			avc_node_replace(node, pos);
> > 			goto found;
> > 		}
> > 	}
> > 	hlist_add_head_rcu(&node->list, head);
> > found:
> > 	spin_unlock_irqrestore(lock, flag);
> > 
> > Perhaps that could be converted to simple RCU?
> > 
> > As I'm sure there's other places that call vsprintf() under a raw_spin_lock
> > or non-preemptable context, perhaps this should be the fix we do.
> 
> My preference would also be to convert SELinux rather than avoiding the
> issue in kmemleak (and other similar places).

No. kmemleak has been made use a raw_spinlock_t because most of what it
does is something that is not used in production on a PREEMPT_RT system
and falls in the same category as lockdep for instance. And that code
calls into LSM/ selinux.
Before making the lock in selinux a raw_spinlock_t you have to think
about the consequences in general and audit the code. From a quick
look, there is also avc_insert() invoked in that callchain which
allocates memory and this is a no no.
Also, if you make the solution here in selinux to use a raw_spinlock_t
you would have to do it also in every LSM as they might be used instead
of selinux.

Therefore, I still prefer adding PREEMPT_RT to the restricted_pointer()
category for atomic invocations.

Sebastian
Re: [PATCH] mm/kmemleak: Fix sleeping function called from invalid context in kmemleak_seq_show
Posted by Catalin Marinas 1 week, 4 days ago
On Fri, Nov 22, 2024 at 09:14:37AM +0100, Sebastian Andrzej Siewior wrote:
> On 2024-11-21 19:19:05 [+0000], Catalin Marinas wrote:
> …
> > > Maybe SELinux locks should be converted to raw? I don't know how long that
> > > lock is held. There are some loops though :-/
> > > 
> > > avc_insert():
> > > 
> > > 	spin_lock_irqsave(lock, flag);
> > > 	hlist_for_each_entry(pos, head, list) {
> > > 		if (pos->ae.ssid == ssid &&
> > > 			pos->ae.tsid == tsid &&
> > > 			pos->ae.tclass == tclass) {
> > > 			avc_node_replace(node, pos);
> > > 			goto found;
> > > 		}
> > > 	}
> > > 	hlist_add_head_rcu(&node->list, head);
> > > found:
> > > 	spin_unlock_irqrestore(lock, flag);
> > > 
> > > Perhaps that could be converted to simple RCU?
> > > 
> > > As I'm sure there's other places that call vsprintf() under a raw_spin_lock
> > > or non-preemptable context, perhaps this should be the fix we do.
> > 
> > My preference would also be to convert SELinux rather than avoiding the
> > issue in kmemleak (and other similar places).
> 
> No. kmemleak has been made use a raw_spinlock_t because most of what it
> does is something that is not used in production on a PREEMPT_RT system
> and falls in the same category as lockdep for instance. And that code
> calls into LSM/ selinux.
> Before making the lock in selinux a raw_spinlock_t you have to think
> about the consequences in general and audit the code. From a quick
> look, there is also avc_insert() invoked in that callchain which
> allocates memory and this is a no no.
> Also, if you make the solution here in selinux to use a raw_spinlock_t
> you would have to do it also in every LSM as they might be used instead
> of selinux.

Good point, thanks. Kmemleak is indeed a debug tool not supposed to be
used in production. Modifying SELinux has wider implications for
PREEMPT_RT.

> Therefore, I still prefer adding PREEMPT_RT to the restricted_pointer()
> category for atomic invocations.

This should work. If one wants the actual (hashed) pointers with
kmemleak, I guess they can disable kptr_restrict.

-- 
Catalin
Re: [PATCH] mm/kmemleak: Fix sleeping function called from invalid context in kmemleak_seq_show
Posted by Sebastian Andrzej Siewior 1 week, 6 days ago
On 2024-11-20 10:26:02 [-0500], Steven Rostedt wrote:
> The "%pK" dereferences a pointer and there's some SELinux hooks attached to
> that code. The problem is that the SELinux hooks take spinlocks. This would
> not have been an issue if it wasn't for that "%pK" in the format.

That is missing check and I think Thomas Weissschuh wanted to add it. So
we don't call into selinux.

Sebastian
Re: [PATCH] mm/kmemleak: Fix sleeping function called from invalid context in kmemleak_seq_show
Posted by Alessandro Carminati 1 week, 5 days ago
Hello Sebastian,

On Wed, Nov 20, 2024 at 5:40 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2024-11-20 10:26:02 [-0500], Steven Rostedt wrote:
> > The "%pK" dereferences a pointer and there's some SELinux hooks attached to
> > that code. The problem is that the SELinux hooks take spinlocks. This would
> > not have been an issue if it wasn't for that "%pK" in the format.
>
> That is missing check and I think Thomas Weissschuh wanted to add it. So
> we don't call into selinux.

Your comment confuses me a bit, as I'm unsure what Thomas is actually
working on.
Am I correct in assuming he's addressing a fix in lib/vsprintf.c to ensure
that sleeping functions aren't called, allowing these functions to work in
any context?
However, his mention of "This fix for kmemleak is still needed as the
pointers in the kmemleak report are useful" adds to my confusion.
Meanwhile, Steven suggests reworking SELinux to resolve the issue.
Could you clarify what you mean by "So we don't call into selinux"?

>
> Sebastian
>

Thanks
-- 
---
172
Re: [PATCH] mm/kmemleak: Fix sleeping function called from invalid context in kmemleak_seq_show
Posted by Sebastian Andrzej Siewior 1 week, 5 days ago
On 2024-11-21 17:50:06 [+0100], Alessandro Carminati wrote:
> Hello Sebastian,
Hi Alessandro,

> On Wed, Nov 20, 2024 at 5:40 PM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> >
> > On 2024-11-20 10:26:02 [-0500], Steven Rostedt wrote:
> > > The "%pK" dereferences a pointer and there's some SELinux hooks attached to
> > > that code. The problem is that the SELinux hooks take spinlocks. This would
> > > not have been an issue if it wasn't for that "%pK" in the format.
> >
> > That is missing check and I think Thomas Weissschuh wanted to add it. So
> > we don't call into selinux.
> 
> Your comment confuses me a bit, as I'm unsure what Thomas is actually
> working on.
> Am I correct in assuming he's addressing a fix in lib/vsprintf.c to ensure
> that sleeping functions aren't called, allowing these functions to work in
> any context?

restricted_pointer() has a check for in_hardirq() among others. This
needs an additional PREEMPT_RT check.
I would be actual in favour to get rid of case 1 for kptr_restrict and
have either 0 or 2.

> However, his mention of "This fix for kmemleak is still needed as the
> pointers in the kmemleak report are useful" adds to my confusion.
> Meanwhile, Steven suggests reworking SELinux to resolve the issue.
> Could you clarify what you mean by "So we don't call into selinux"?

This getting out of hand. By adding the PREEMPT_RT check to
restricted_pointer() we don't call in selinux so the problem is gone.
kmemleak is not the only problem. printk(), as another of vspritf pointer
code user, can be called from any place and would also trigger a
warning here.
As far as "kmemleak need to be usefull" goes: With kptr_restrict == 0
then with or without pointer hashing they will be useful. If we need to
go via selinux then it ends as a hint.

Sebastian
Re: [PATCH] mm/kmemleak: Fix sleeping function called from invalid context in kmemleak_seq_show
Posted by Alessandro Carminati 1 week, 4 days ago
Hi Sebastian,

On Thu, Nov 21, 2024 at 6:04 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2024-11-21 17:50:06 [+0100], Alessandro Carminati wrote:
> > Hello Sebastian,
> Hi Alessandro,
>
> > On Wed, Nov 20, 2024 at 5:40 PM Sebastian Andrzej Siewior
> > <bigeasy@linutronix.de> wrote:
> > >
> > > On 2024-11-20 10:26:02 [-0500], Steven Rostedt wrote:
> > > > The "%pK" dereferences a pointer and there's some SELinux hooks attached to
> > > > that code. The problem is that the SELinux hooks take spinlocks. This would
> > > > not have been an issue if it wasn't for that "%pK" in the format.
> > >
> > > That is missing check and I think Thomas Weissschuh wanted to add it. So
> > > we don't call into selinux.
> >
> > Your comment confuses me a bit, as I'm unsure what Thomas is actually
> > working on.
> > Am I correct in assuming he's addressing a fix in lib/vsprintf.c to ensure
> > that sleeping functions aren't called, allowing these functions to work in
> > any context?
>
> restricted_pointer() has a check for in_hardirq() among others. This
> needs an additional PREEMPT_RT check.
> I would be actual in favour to get rid of case 1 for kptr_restrict and
> have either 0 or 2.
>
> > However, his mention of "This fix for kmemleak is still needed as the
> > pointers in the kmemleak report are useful" adds to my confusion.
> > Meanwhile, Steven suggests reworking SELinux to resolve the issue.
> > Could you clarify what you mean by "So we don't call into selinux"?
>
> This getting out of hand. By adding the PREEMPT_RT check to
> restricted_pointer() we don't call in selinux so the problem is gone.

I am really glad that now we have a clear solution, however practically
speaking is Thomas working on such a patch or is he working on something
related that does not fully solve the problem?

Even if he is working on a partial solution, I am happy to coordinate
off-list working on his own private branch
(or else I would just give up and review the Thomas' patchset when it is
out...)

> kmemleak is not the only problem. printk(), as another of vspritf pointer
> code user, can be called from any place and would also trigger a
> warning here.
> As far as "kmemleak need to be usefull" goes: With kptr_restrict == 0
> then with or without pointer hashing they will be useful. If we need to
> go via selinux then it ends as a hint.
>
> Sebastian
>


-- 
---
172
Re: [PATCH] mm/kmemleak: Fix sleeping function called from invalid context in kmemleak_seq_show
Posted by Thomas Weissschuh 1 week ago
Hi Alessandro,

On Fri, Nov 22, 2024 at 11:48:04AM +0100, Alessandro Carminati wrote:
> On Thu, Nov 21, 2024 at 6:04 PM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> > On 2024-11-21 17:50:06 [+0100], Alessandro Carminati wrote:
> > > On Wed, Nov 20, 2024 at 5:40 PM Sebastian Andrzej Siewior
> > > <bigeasy@linutronix.de> wrote:
> > > >
> > > > On 2024-11-20 10:26:02 [-0500], Steven Rostedt wrote:
> > > > > The "%pK" dereferences a pointer and there's some SELinux hooks attached to
> > > > > that code. The problem is that the SELinux hooks take spinlocks. This would
> > > > > not have been an issue if it wasn't for that "%pK" in the format.
> > > >
> > > > That is missing check and I think Thomas Weissschuh wanted to add it. So
> > > > we don't call into selinux.
> > >
> > > Your comment confuses me a bit, as I'm unsure what Thomas is actually
> > > working on.
> > > Am I correct in assuming he's addressing a fix in lib/vsprintf.c to ensure
> > > that sleeping functions aren't called, allowing these functions to work in
> > > any context?
> >
> > restricted_pointer() has a check for in_hardirq() among others. This
> > needs an additional PREEMPT_RT check.
> > I would be actual in favour to get rid of case 1 for kptr_restrict and
> > have either 0 or 2.
> >
> > > However, his mention of "This fix for kmemleak is still needed as the
> > > pointers in the kmemleak report are useful" adds to my confusion.
> > > Meanwhile, Steven suggests reworking SELinux to resolve the issue.
> > > Could you clarify what you mean by "So we don't call into selinux"?
> >
> > This getting out of hand. By adding the PREEMPT_RT check to
> > restricted_pointer() we don't call in selinux so the problem is gone.
> 
> I am really glad that now we have a clear solution, however practically
> speaking is Thomas working on such a patch or is he working on something
> related that does not fully solve the problem?

I am working on a change to lib/vsprintf.c. Something like the
following:

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index c5e2ec9303c5..54ad175a22bd 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -870,6 +870,9 @@ char *restricted_pointer(char *buf, char *end, const void *ptr,
                        return error_string(buf, end, "pK-error", spec);
                }

+               if (IS_ENABLED(CONFIG_PREEMPT_RT) && ...)
+                       return error_string(buf, end, "pK-error", spec);
+
                /*
                 * Only print the real pointer value if the current
                 * process has CAP_SYSLOG and is running with the


Hoping to replace the ... with something that can detect actual
problematic situations at runtime.
I'm fairly new to PREEMPT_RT so need to do some digging.

> Even if he is working on a partial solution, I am happy to coordinate
> off-list working on his own private branch
> (or else I would just give up and review the Thomas' patchset when it is
> out...)

Given that there is no direct interaction between your kmemleak patches
and the vsprintf ones I don't think coordination is necessary.
I'll Cc you for review.

Please do go ahead with your kmemleak patches.

> > kmemleak is not the only problem. printk(), as another of vspritf pointer
> > code user, can be called from any place and would also trigger a
> > warning here.
> > As far as "kmemleak need to be usefull" goes: With kptr_restrict == 0
> > then with or without pointer hashing they will be useful. If we need to
> > go via selinux then it ends as a hint.


Thomas
Re: [PATCH] mm/kmemleak: Fix sleeping function called from invalid context in kmemleak_seq_show
Posted by Catalin Marinas 1 week ago
On Tue, Nov 26, 2024 at 09:11:31AM +0100, Thomas Weissschuh wrote:
> On Fri, Nov 22, 2024 at 11:48:04AM +0100, Alessandro Carminati wrote:
> > On Thu, Nov 21, 2024 at 6:04 PM Sebastian Andrzej Siewior
> > <bigeasy@linutronix.de> wrote:
> > > On 2024-11-21 17:50:06 [+0100], Alessandro Carminati wrote:
> > > > On Wed, Nov 20, 2024 at 5:40 PM Sebastian Andrzej Siewior
> > > > <bigeasy@linutronix.de> wrote:
> > > > >
> > > > > On 2024-11-20 10:26:02 [-0500], Steven Rostedt wrote:
> > > > > > The "%pK" dereferences a pointer and there's some SELinux hooks attached to
> > > > > > that code. The problem is that the SELinux hooks take spinlocks. This would
> > > > > > not have been an issue if it wasn't for that "%pK" in the format.
> > > > >
> > > > > That is missing check and I think Thomas Weissschuh wanted to add it. So
> > > > > we don't call into selinux.
> > > >
> > > > Your comment confuses me a bit, as I'm unsure what Thomas is actually
> > > > working on.
> > > > Am I correct in assuming he's addressing a fix in lib/vsprintf.c to ensure
> > > > that sleeping functions aren't called, allowing these functions to work in
> > > > any context?
> > >
> > > restricted_pointer() has a check for in_hardirq() among others. This
> > > needs an additional PREEMPT_RT check.
> > > I would be actual in favour to get rid of case 1 for kptr_restrict and
> > > have either 0 or 2.
> > >
> > > > However, his mention of "This fix for kmemleak is still needed as the
> > > > pointers in the kmemleak report are useful" adds to my confusion.
> > > > Meanwhile, Steven suggests reworking SELinux to resolve the issue.
> > > > Could you clarify what you mean by "So we don't call into selinux"?
> > >
> > > This getting out of hand. By adding the PREEMPT_RT check to
> > > restricted_pointer() we don't call in selinux so the problem is gone.
> > 
> > I am really glad that now we have a clear solution, however practically
> > speaking is Thomas working on such a patch or is he working on something
> > related that does not fully solve the problem?
> 
> I am working on a change to lib/vsprintf.c. Something like the
> following:
> 
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index c5e2ec9303c5..54ad175a22bd 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -870,6 +870,9 @@ char *restricted_pointer(char *buf, char *end, const void *ptr,
>                         return error_string(buf, end, "pK-error", spec);
>                 }
> 
> +               if (IS_ENABLED(CONFIG_PREEMPT_RT) && ...)
> +                       return error_string(buf, end, "pK-error", spec);
> +
>                 /*
>                  * Only print the real pointer value if the current
>                  * process has CAP_SYSLOG and is running with the
> 
> 
> Hoping to replace the ... with something that can detect actual
> problematic situations at runtime.
> I'm fairly new to PREEMPT_RT so need to do some digging.
> 
> > Even if he is working on a partial solution, I am happy to coordinate
> > off-list working on his own private branch
> > (or else I would just give up and review the Thomas' patchset when it is
> > out...)
> 
> Given that there is no direct interaction between your kmemleak patches
> and the vsprintf ones I don't think coordination is necessary.
> I'll Cc you for review.
> 
> Please do go ahead with your kmemleak patches.

If the above fixes the kmemleak warnings by not triggering a call into
SELinux, I'd rather not change kmemleak. I don't know how important it
is to run kmemleak with kptr_restrict=1 and get meaningful pointers.

-- 
Catalin
Re: [PATCH] mm/kmemleak: Fix sleeping function called from invalid context in kmemleak_seq_show
Posted by Alessandro Carminati 1 week, 6 days ago
Looping selinix Maintainers into the conversation.


On Wed, Nov 20, 2024 at 4:30 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 20 Nov 2024 14:53:13 +0000
> Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> > > -static void print_unreferenced(struct seq_file *seq,
> > > +static depot_stack_handle_t print_unreferenced(struct seq_file *seq,
> > >                            struct kmemleak_object *object)
> > >  {
> > > -   int i;
> > > -   unsigned long *entries;
> > > -   unsigned int nr_entries;
> > > -
> > > -   nr_entries = stack_depot_fetch(object->trace_handle, &entries);
> > >     warn_or_seq_printf(seq, "unreferenced object 0x%08lx (size %zu):\n",
> > >                       object->pointer, object->size);
> > >     warn_or_seq_printf(seq, "  comm \"%s\", pid %d, jiffies %lu\n",
> > > @@ -371,6 +366,23 @@ static void print_unreferenced(struct seq_file *seq,
> > >     hex_dump_object(seq, object);
> > >     warn_or_seq_printf(seq, "  backtrace (crc %x):\n", object->checksum);
> > >
> > > +   return object->trace_handle;
> > > +}
> >
> > What I don't fully understand - is this a problem with any seq_printf()
> > or just the backtrace pointers from the stack depot that trigger this
> > issue? I guess it's something to do with restricted pointers but I'm not
> > familiar with the PREEMPT_RT concepts. It would be good to explain,
> > ideally both in the commit log and a comment in the code, why we only
> > need to do this for the stack dump.
>
> In PREEMPT_RT, to achieve the ability to preempt in more context,
> spin_lock() is converted to a special sleeping mutex. But there's some
> places where it can not be converted, and in those cases we use
> raw_spin_lock(). kmemleak has been converted to use raw_spin_lock() which
> means anything that gets called under that lock can not take a normal
> spin_lock().
>
> What happened here is that the kmemleak raw spinlock is held and
> seq_printf() is called. Normally, this is not an issue, but the behavior of
> seq_printf() is dependent on what values is being printed.
>
> The "%pK" dereferences a pointer and there's some SELinux hooks attached to
> that code. The problem is that the SELinux hooks take spinlocks. This would
> not have been an issue if it wasn't for that "%pK" in the format.
>
> Maybe SELinux locks should be converted to raw? I don't know how long that
> lock is held. There are some loops though :-/
>
> avc_insert():
>
>         spin_lock_irqsave(lock, flag);
>         hlist_for_each_entry(pos, head, list) {
>                 if (pos->ae.ssid == ssid &&
>                         pos->ae.tsid == tsid &&
>                         pos->ae.tclass == tclass) {
>                         avc_node_replace(node, pos);
>                         goto found;
>                 }
>         }
>         hlist_add_head_rcu(&node->list, head);
> found:
>         spin_unlock_irqrestore(lock, flag);
>
> Perhaps that could be converted to simple RCU?
>
> As I'm sure there's other places that call vsprintf() under a raw_spin_lock
> or non-preemptable context, perhaps this should be the fix we do.
@Paul and @Stephen do you have any feedback on this idea?

>
> -- Steve
>


-- 
---
172
Re: [PATCH] mm/kmemleak: Fix sleeping function called from invalid context in kmemleak_seq_show
Posted by Thomas Weissschuh 1 week, 6 days ago
On Wed, Nov 20, 2024 at 02:53:13PM +0000, Catalin Marinas wrote:
> On Wed, Nov 20, 2024 at 10:23:25AM +0000, Alessandro Carminati wrote:
> > This patch addresses a bug in the RT variant of the kernel where a
> > "sleeping function called from invalid context" warning may occur in
> > kmemleak_seq_show under specific conditions:
> > - CONFIG_PREEMPT_RT=y
> > - SELinux is the LSM for the system
> > - `kptr_restrict` is set to 1.
> > - The kmemleak buffer contains at least one item.
> > 
> > Commit 8c96f1bc6fc49c724c4cdd22d3e99260263b7384 ("mm/kmemleak: turn
> > kmemleak_lock and object->lock to raw_spinlock_t") introduced a change
> > where kmemleak_seq_show is executed in atomic context within the RT kernel.
> > However, the SELinux capability check within this function flow still
> > relies on regular spinlocks, leading to potential race conditions that
> > trigger the error when printing the kmemleak backtrace.
> > 
> > To resolve this, the backtrace printing has been moved out of the critical
> > section.

[..]

> What I don't fully understand - is this a problem with any seq_printf()
> or just the backtrace pointers from the stack depot that trigger this
> issue? I guess it's something to do with restricted pointers but I'm not
> familiar with the PREEMPT_RT concepts. It would be good to explain,
> ideally both in the commit log and a comment in the code, why we only
> need to do this for the stack dump.

Yes, this is a problem for all users of lib/vsprintf.c.
I am working on a fix for this, to avoid calling the sleeping functions
from contexts which are not allowed to do so.
In these cases the pointers would not be printed.

This fix for kmemleak is still needed as the pointers in the kmemleak
report are useful.


Thomas