Commit 6edda04ccc7c ("mm/kmemleak: prevent soft lockup in first
object iteration loop of kmemleak_scan()") fixes the soft lockup
problem in kmemleak_scan() by periodically doing a cond_resched(). It
does take a reference to the current object before doing so.
Unfortunately, if the object has been deleted from the object_list, the
next object pointed to by its next pointer may no longer be valid after
coming back from cond_resched(). This can result in use-after-free and
other nasty problems.

Fix this problem by adding a del_state flag into the kmemleak_object
structure to synchronize the object deletion process between
kmemleak_cond_resched() and __remove_object(), making sure that the
object remains in the object_list for the duration of the
cond_resched() call.
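
For reference, the first object iteration loop in kmemleak_scan() has
roughly the following shape (a simplified sketch, not a verbatim quote
of the source):

	rcu_read_lock();
	list_for_each_entry_rcu(object, &object_list, object_list) {
		raw_spin_lock_irq(&object->lock);
		/* examine and count the object */
		raw_spin_unlock_irq(&object->lock);

		if (need_resched())
			kmemleak_cond_resched(object);
	}
	rcu_read_unlock();

If the current object is unlinked and freed while kmemleak_cond_resched()
is outside the RCU read-side critical section, the next iteration of
list_for_each_entry_rcu() chases a stale next pointer.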
Fixes: 6edda04ccc7c ("mm/kmemleak: prevent soft lockup in first object iteration loop of kmemleak_scan()")
Signed-off-by: Waiman Long <longman@redhat.com>
---
mm/kmemleak.c | 35 +++++++++++++++++++++++++++++------
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index e7cb521236bf..0ece170fc9ef 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -13,11 +13,12 @@
  *
  * The following locks and mutexes are used by kmemleak:
  *
- * - kmemleak_lock (raw_spinlock_t): protects the object_list modifications and
- *   accesses to the object_tree_root (or object_phys_tree_root). The
- *   object_list is the main list holding the metadata (struct kmemleak_object)
- *   for the allocated memory blocks. The object_tree_root and object_phys_tree_root
- *   are red black trees used to look-up metadata based on a pointer to the
+ * - kmemleak_lock (raw_spinlock_t): protects the object_list as well as
+ *   del_state modifications and accesses to the object_tree_root (or
+ *   object_phys_tree_root). The object_list is the main list holding the
+ *   metadata (struct kmemleak_object) for the allocated memory blocks.
+ *   The object_tree_root and object_phys_tree_root are red
+ *   black trees used to look-up metadata based on a pointer to the
  *   corresponding memory block. The object_phys_tree_root is for objects
  *   allocated with physical address. The kmemleak_object structures are
  *   added to the object_list and object_tree_root (or object_phys_tree_root)
@@ -147,6 +148,7 @@ struct kmemleak_object {
 	struct rcu_head rcu;		/* object_list lockless traversal */
 	/* object usage count; object freed when use_count == 0 */
 	atomic_t use_count;
+	unsigned int del_state;		/* deletion state */
 	unsigned long pointer;
 	size_t size;
 	/* pass surplus references to this pointer */
@@ -177,6 +179,11 @@ struct kmemleak_object {
 /* flag set for object allocated with physical address */
 #define OBJECT_PHYS		(1 << 4)
 
+/* set when __remove_object() called */
+#define DELSTATE_REMOVED	(1 << 0)
+/* set to temporarily prevent deletion from object_list */
+#define DELSTATE_NO_DELETE	(1 << 1)
+
 #define HEX_PREFIX		"    "
 /* number of bytes to print per line; must be 16 or 32 */
 #define HEX_ROW_SIZE		16
@@ -567,7 +574,9 @@ static void __remove_object(struct kmemleak_object *object)
 	rb_erase(&object->rb_node, object->flags & OBJECT_PHYS ?
 				   &object_phys_tree_root :
 				   &object_tree_root);
-	list_del_rcu(&object->object_list);
+	if (!(object->del_state & DELSTATE_NO_DELETE))
+		list_del_rcu(&object->object_list);
+	object->del_state |= DELSTATE_REMOVED;
 }
 
 /*
@@ -633,6 +642,7 @@ static void __create_object(unsigned long ptr, size_t size,
 	object->count = 0;			/* white color initially */
 	object->jiffies = jiffies;
 	object->checksum = 0;
+	object->del_state = 0;
 
 	/* task information */
 	if (in_hardirq()) {
@@ -1470,9 +1480,22 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
 	if (!get_object(object))
 		return;	/* Try next object */
 
+	raw_spin_lock_irq(&kmemleak_lock);
+	if (object->del_state & DELSTATE_REMOVED)
+		goto unlock_put;	/* Object removed */
+	object->del_state |= DELSTATE_NO_DELETE;
+	raw_spin_unlock_irq(&kmemleak_lock);
+
 	rcu_read_unlock();
 	cond_resched();
 	rcu_read_lock();
+
+	raw_spin_lock_irq(&kmemleak_lock);
+	if (object->del_state & DELSTATE_REMOVED)
+		list_del_rcu(&object->object_list);
+	object->del_state &= ~DELSTATE_NO_DELETE;
+unlock_put:
+	raw_spin_unlock_irq(&kmemleak_lock);
 	put_object(object);
 }
 
--
2.31.1
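
To make the del_state handshake easier to follow outside of the kernel
sources, here is a minimal user-space analogue (my own sketch, not part
of the patch: a pthread mutex stands in for kmemleak_lock, an on_list
flag stands in for object_list membership, and RCU is left out
entirely):

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define DELSTATE_REMOVED	(1 << 0)
#define DELSTATE_NO_DELETE	(1 << 1)

struct object {
	unsigned int del_state;
	int on_list;			/* stands in for object_list membership */
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;  /* "kmemleak_lock" */

/* Deleter side, mirroring __remove_object() after the patch. */
static void remove_object(struct object *object)
{
	pthread_mutex_lock(&lock);
	if (!(object->del_state & DELSTATE_NO_DELETE))
		object->on_list = 0;	/* list_del_rcu() analogue */
	object->del_state |= DELSTATE_REMOVED;
	pthread_mutex_unlock(&lock);
}

/* Scanner side, mirroring kmemleak_cond_resched() after the patch. */
static void cond_resched_object(struct object *object)
{
	pthread_mutex_lock(&lock);
	if (object->del_state & DELSTATE_REMOVED) {
		pthread_mutex_unlock(&lock);
		return;			/* object already gone, nothing to pin */
	}
	object->del_state |= DELSTATE_NO_DELETE;	/* pin it on the list */
	pthread_mutex_unlock(&lock);

	sched_yield();			/* stands in for cond_resched() */

	pthread_mutex_lock(&lock);
	if (object->del_state & DELSTATE_REMOVED)
		object->on_list = 0;	/* finish the deferred unlink */
	object->del_state &= ~DELSTATE_NO_DELETE;
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	struct object obj = { .del_state = 0, .on_list = 1 };

	cond_resched_object(&obj);	/* pin, yield, unpin */
	remove_object(&obj);		/* normal removal path */
	printf("on_list=%d del_state=%#x\n", obj.on_list, obj.del_state);
	return 0;
}

Built with "gcc -pthread", the deleter defers the actual unlink to the
scanner whenever DELSTATE_NO_DELETE is set, which is exactly the
guarantee the patch needs across the cond_resched() window.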
Hi Waiman,

Thanks for your effort on trying to fix this.

On Wed, Jan 18, 2023 at 11:01:11PM -0500, Waiman Long wrote:
> @@ -567,7 +574,9 @@ static void __remove_object(struct kmemleak_object *object)
> 	rb_erase(&object->rb_node, object->flags & OBJECT_PHYS ?
> 				   &object_phys_tree_root :
> 				   &object_tree_root);
> -	list_del_rcu(&object->object_list);
> +	if (!(object->del_state & DELSTATE_NO_DELETE))
> +		list_del_rcu(&object->object_list);
> +	object->del_state |= DELSTATE_REMOVED;
> }

So IIUC, this prevents the current object being scanned from being
removed from the list during the kmemleak_cond_resched() call.

> /*
> @@ -633,6 +642,7 @@ static void __create_object(unsigned long ptr, size_t size,
> 	object->count = 0;			/* white color initially */
> 	object->jiffies = jiffies;
> 	object->checksum = 0;
> +	object->del_state = 0;
>
> 	/* task information */
> 	if (in_hardirq()) {
> @@ -1470,9 +1480,22 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
> 	if (!get_object(object))
> 		return;	/* Try next object */
>
> +	raw_spin_lock_irq(&kmemleak_lock);
> +	if (object->del_state & DELSTATE_REMOVED)
> +		goto unlock_put;	/* Object removed */
> +	object->del_state |= DELSTATE_NO_DELETE;
> +	raw_spin_unlock_irq(&kmemleak_lock);
> +
> 	rcu_read_unlock();
> 	cond_resched();
> 	rcu_read_lock();
> +
> +	raw_spin_lock_irq(&kmemleak_lock);
> +	if (object->del_state & DELSTATE_REMOVED)
> +		list_del_rcu(&object->object_list);
> +	object->del_state &= ~DELSTATE_NO_DELETE;
> +unlock_put:
> +	raw_spin_unlock_irq(&kmemleak_lock);
> 	put_object(object);
> }

I'm not sure this was the only problem. We do have the problem that the
current object may be removed from the list, solved above, but another
scenario I had in mind is the next object being released during this
brief resched period. The RCU relies on object->next->next being valid
but, with a brief rcu_read_unlock(), the object->next could be freed,
reallocated, so object->next->next invalid.

-- 
Catalin
On 1/20/23 14:18, Catalin Marinas wrote:
> Hi Waiman,
>
> Thanks for your effort on trying to fix this.
>
> On Wed, Jan 18, 2023 at 11:01:11PM -0500, Waiman Long wrote:
>> @@ -567,7 +574,9 @@ static void __remove_object(struct kmemleak_object *object)
>> 	rb_erase(&object->rb_node, object->flags & OBJECT_PHYS ?
>> 				   &object_phys_tree_root :
>> 				   &object_tree_root);
>> -	list_del_rcu(&object->object_list);
>> +	if (!(object->del_state & DELSTATE_NO_DELETE))
>> +		list_del_rcu(&object->object_list);
>> +	object->del_state |= DELSTATE_REMOVED;
>> }
> So IIUC, this prevents the current object being scanned from being
> removed from the list during the kmemleak_cond_resched() call.

Yes, that is the point.

>> [...]
> I'm not sure this was the only problem. We do have the problem that the
> current object may be removed from the list, solved above, but another
> scenario I had in mind is the next object being released during this
> brief resched period. The RCU relies on object->next->next being valid
> but, with a brief rcu_read_unlock(), the object->next could be freed,
> reallocated, so object->next->next invalid.

Looking at the following scenario,

  object->next => A (removed)
  A->next      => B (removed)

As object->next is pointing to A, A must still be allocated and not
freed yet. Now if B is also removed, there are two possible cases.

1) B is removed from the list after the removal of A. In that case, it
   is not possible that A is still allocated but B is freed.

2) B is removed before A. A->next can't point to B once B has been
   removed. Due to weak memory ordering, it is possible that another
   cpu can see A->next still pointing to B. In that case, I believe
   that we are still within the grace period where neither A nor B is
   freed.

In fact, it is no different from a regular scan of the object list
without ever calling cond_resched().

Cheers,
Longman
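
The property Waiman relies on here is that list_del_rcu() deliberately
leaves the removed entry's next pointer intact, so a reader already
standing on the removed entry can keep walking the list. Its definition
in include/linux/rculist.h is essentially the following (comments mine;
double-check against your tree):

	static inline void list_del_rcu(struct list_head *entry)
	{
		__list_del_entry(entry);	/* unlink; entry->next is left untouched */
		entry->prev = LIST_POISON2;	/* only ->prev is poisoned */
	}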
On Fri, Jan 20, 2023 at 05:54:28PM -0500, Waiman Long wrote:
> On 1/20/23 14:18, Catalin Marinas wrote:
> > [...]
> > I'm not sure this was the only problem. We do have the problem that
> > the current object may be removed from the list, solved above, but
> > another scenario I had in mind is the next object being released
> > during this brief resched period. The RCU relies on
> > object->next->next being valid but, with a brief rcu_read_unlock(),
> > the object->next could be freed, reallocated, so object->next->next
> > invalid.
>
> Looking at the following scenario,
>
>   object->next => A (removed)
>   A->next      => B (removed)
>
> As object->next is pointing to A, A must still be allocated and not
> freed yet. Now if B is also removed, there are two possible cases.
>
> 1) B is removed from the list after the removal of A. In that case, it
>    is not possible that A is still allocated but B is freed.
>
> 2) B is removed before A. A->next can't point to B once B has been
>    removed. Due to weak memory ordering, it is possible that another
>    cpu can see A->next still pointing to B. In that case, I believe
>    that we are still within the grace period where neither A nor B is
>    freed.
>
> In fact, it is no different from a regular scan of the object list
> without ever calling cond_resched().

More like thinking out loud:

The lockless RCU loop relies on object->next->next being valid within
the grace period (A not freed). Due to weak memory ordering, the looping
CPU may not observe the object->next update (removal of A) by another
CPU, so it continues to loop over it. But since we do an
rcu_read_unlock() in the middle of the loop, I don't think these
assumptions are still valid, so A may be freed.

What we need is that the object->next read for the following iteration
either sees the updated object->next (B) or sees A with the latter still
around. I think this holds with the proposed kmemleak_cond_resched()
since we now start a new grace period with rcu_read_lock() followed by
taking and releasing kmemleak_lock. The latter would give us the memory
ordering required since removing object A from the list does take the
lock.

So yeah, you are probably right, I just find it hard to get my head
around ;). I still think it would be simpler with a single kmemleak_lock
(no object->lock) but that's more involved than a simple fix.

Assuming your (and my) reasoning above is correct:

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
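
The ordering argument can be pictured as an interleaving (an informal
illustration, assuming the deleter takes kmemleak_lock as
__remove_object() does):

	CPU 0 (deleter)                        CPU 1 (scanner)
	---------------                        ---------------
	raw_spin_lock_irq(&kmemleak_lock);
	list_del_rcu(&A->object_list);
	raw_spin_unlock_irq(&kmemleak_lock);
	                                       rcu_read_lock();
	                                       raw_spin_lock_irq(&kmemleak_lock);
	                                       ...
	                                       raw_spin_unlock_irq(&kmemleak_lock);
	                                       next = rcu_dereference(object->next);

The unlock on CPU 0 and the lock acquisition on CPU 1 are ordered by the
lock itself, so the later read of object->next on CPU 1 either observes
the removal of A, or sees A, which cannot be freed before the new grace
period started by the rcu_read_lock() above ends.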
On 1/23/23 14:24, Catalin Marinas wrote:
> On Fri, Jan 20, 2023 at 05:54:28PM -0500, Waiman Long wrote:
>> [...]
>> Looking at the following scenario,
>>
>>   object->next => A (removed)
>>   A->next      => B (removed)
>>
>> [...]
> More like thinking out loud:
>
> The lockless RCU loop relies on object->next->next being valid within
> the grace period (A not freed). Due to weak memory ordering, the
> looping CPU may not observe the object->next update (removal of A) by
> another CPU, so it continues to loop over it. But since we do an
> rcu_read_unlock() in the middle of the loop, I don't think these
> assumptions are still valid, so A may be freed.
>
> What we need is that the object->next read for the following iteration
> either sees the updated object->next (B) or sees A with the latter
> still around. I think this holds with the proposed
> kmemleak_cond_resched() since we now start a new grace period with
> rcu_read_lock() followed by taking and releasing kmemleak_lock. The
> latter would give us the memory ordering required since removing
> object A from the list does take the lock.
>
> So yeah, you are probably right, I just find it hard to get my head
> around ;). I still think it would be simpler with a single
> kmemleak_lock (no object->lock) but that's more involved than a simple
> fix.
>
> Assuming your (and my) reasoning above is correct:
>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

I should have mentioned the fact that taking the kmemleak_lock provides
some ordering guarantee since it is done after a new rcu_read_lock().
So yes, even if both A and B are removed from the object_list, they
should still be around and not freed yet.

Thanks for your review.

Cheers,
Longman