Commit 6edda04ccc7c ("mm/kmemleak: prevent soft lockup in first
object iteration loop of kmemleak_scan()") fixes the soft lockup
problem in kmemleak_scan() by periodically doing a cond_resched(). It
does take a reference to the current object before doing so.
Unfortunately, if the object has been deleted from the object_list, the
next object pointed to by its next pointer may no longer be valid after
coming back from cond_resched(). This can result in use-after-free and
other nasty problems.

Fix this problem by adding a del_state flag into the kmemleak_object
structure to synchronize the object deletion process between
kmemleak_cond_resched() and __remove_object(), making sure that the
object remains in the object_list for the duration of the
cond_resched() call.
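
For reference, the first object iteration loop in kmemleak_scan() has
roughly the following shape (a simplified sketch, not a verbatim quote
of the source):

	rcu_read_lock();
	list_for_each_entry_rcu(object, &object_list, object_list) {
		raw_spin_lock_irq(&object->lock);
		/* examine and count the object */
		raw_spin_unlock_irq(&object->lock);

		if (need_resched())
			kmemleak_cond_resched(object);
	}
	rcu_read_unlock();

If the current object is unlinked and freed while kmemleak_cond_resched()
is outside the RCU read-side critical section, the next iteration of
list_for_each_entry_rcu() chases a stale next pointer.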
Fixes: 6edda04ccc7c ("mm/kmemleak: prevent soft lockup in first object iteration loop of kmemleak_scan()")
Signed-off-by: Waiman Long <longman@redhat.com>
---
mm/kmemleak.c | 35 +++++++++++++++++++++++++++++------
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index e7cb521236bf..0ece170fc9ef 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -13,11 +13,12 @@
  *
  * The following locks and mutexes are used by kmemleak:
  *
- * - kmemleak_lock (raw_spinlock_t): protects the object_list modifications and
- *   accesses to the object_tree_root (or object_phys_tree_root). The
- *   object_list is the main list holding the metadata (struct kmemleak_object)
- *   for the allocated memory blocks. The object_tree_root and object_phys_tree_root
- *   are red black trees used to look-up metadata based on a pointer to the
+ * - kmemleak_lock (raw_spinlock_t): protects the object_list as well as
+ *   del_state modifications and accesses to the object_tree_root (or
+ *   object_phys_tree_root). The object_list is the main list holding the
+ *   metadata (struct kmemleak_object) for the allocated memory blocks.
+ *   The object_tree_root and object_phys_tree_root are red
+ *   black trees used to look-up metadata based on a pointer to the
  *   corresponding memory block. The object_phys_tree_root is for objects
  *   allocated with physical address. The kmemleak_object structures are
  *   added to the object_list and object_tree_root (or object_phys_tree_root)
@@ -147,6 +148,7 @@ struct kmemleak_object {
 	struct rcu_head rcu;		/* object_list lockless traversal */
 	/* object usage count; object freed when use_count == 0 */
 	atomic_t use_count;
+	unsigned int del_state;		/* deletion state */
 	unsigned long pointer;
 	size_t size;
 	/* pass surplus references to this pointer */
@@ -177,6 +179,11 @@ struct kmemleak_object {
 /* flag set for object allocated with physical address */
 #define OBJECT_PHYS		(1 << 4)
 
+/* set when __remove_object() called */
+#define DELSTATE_REMOVED	(1 << 0)
+/* set to temporarily prevent deletion from object_list */
+#define DELSTATE_NO_DELETE	(1 << 1)
+
 #define HEX_PREFIX		"    "
 /* number of bytes to print per line; must be 16 or 32 */
 #define HEX_ROW_SIZE		16
@@ -567,7 +574,9 @@ static void __remove_object(struct kmemleak_object *object)
 	rb_erase(&object->rb_node, object->flags & OBJECT_PHYS ?
 				   &object_phys_tree_root :
 				   &object_tree_root);
-	list_del_rcu(&object->object_list);
+	if (!(object->del_state & DELSTATE_NO_DELETE))
+		list_del_rcu(&object->object_list);
+	object->del_state |= DELSTATE_REMOVED;
 }
 
 /*
@@ -633,6 +642,7 @@ static void __create_object(unsigned long ptr, size_t size,
 	object->count = 0;			/* white color initially */
 	object->jiffies = jiffies;
 	object->checksum = 0;
+	object->del_state = 0;
 
 	/* task information */
 	if (in_hardirq()) {
@@ -1470,9 +1480,22 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
 	if (!get_object(object))
 		return;	/* Try next object */
 
+	raw_spin_lock_irq(&kmemleak_lock);
+	if (object->del_state & DELSTATE_REMOVED)
+		goto unlock_put;	/* Object removed */
+	object->del_state |= DELSTATE_NO_DELETE;
+	raw_spin_unlock_irq(&kmemleak_lock);
+
 	rcu_read_unlock();
 	cond_resched();
 	rcu_read_lock();
+
+	raw_spin_lock_irq(&kmemleak_lock);
+	if (object->del_state & DELSTATE_REMOVED)
+		list_del_rcu(&object->object_list);
+	object->del_state &= ~DELSTATE_NO_DELETE;
+unlock_put:
+	raw_spin_unlock_irq(&kmemleak_lock);
 	put_object(object);
 }
 
--
2.31.1
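
To make the del_state handshake easier to follow outside of the kernel
sources, here is a minimal user-space analogue (my own sketch, not part
of the patch: a pthread mutex stands in for kmemleak_lock, an on_list
flag stands in for object_list membership, and RCU is left out
entirely):

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define DELSTATE_REMOVED	(1 << 0)
#define DELSTATE_NO_DELETE	(1 << 1)

struct object {
	unsigned int del_state;
	int on_list;			/* stands in for object_list membership */
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;  /* "kmemleak_lock" */

/* Deleter side, mirroring __remove_object() after the patch. */
static void remove_object(struct object *object)
{
	pthread_mutex_lock(&lock);
	if (!(object->del_state & DELSTATE_NO_DELETE))
		object->on_list = 0;	/* list_del_rcu() analogue */
	object->del_state |= DELSTATE_REMOVED;
	pthread_mutex_unlock(&lock);
}

/* Scanner side, mirroring kmemleak_cond_resched() after the patch. */
static void cond_resched_object(struct object *object)
{
	pthread_mutex_lock(&lock);
	if (object->del_state & DELSTATE_REMOVED) {
		pthread_mutex_unlock(&lock);
		return;			/* object already gone, nothing to pin */
	}
	object->del_state |= DELSTATE_NO_DELETE;	/* pin it on the list */
	pthread_mutex_unlock(&lock);

	sched_yield();			/* stands in for cond_resched() */

	pthread_mutex_lock(&lock);
	if (object->del_state & DELSTATE_REMOVED)
		object->on_list = 0;	/* finish the deferred unlink */
	object->del_state &= ~DELSTATE_NO_DELETE;
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	struct object obj = { .del_state = 0, .on_list = 1 };

	cond_resched_object(&obj);	/* pin, yield, unpin */
	remove_object(&obj);		/* normal removal path */
	printf("on_list=%d del_state=%#x\n", obj.on_list, obj.del_state);
	return 0;
}

Built with "gcc -pthread", the deleter defers the actual unlink to the
scanner whenever DELSTATE_NO_DELETE is set, which is exactly the
guarantee the patch needs across the cond_resched() window.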
Hi Waiman,

Thanks for your effort on trying to fix this.

On Wed, Jan 18, 2023 at 11:01:11PM -0500, Waiman Long wrote:
> @@ -567,7 +574,9 @@ static void __remove_object(struct kmemleak_object *object)
> 	rb_erase(&object->rb_node, object->flags & OBJECT_PHYS ?
> 				   &object_phys_tree_root :
> 				   &object_tree_root);
> -	list_del_rcu(&object->object_list);
> +	if (!(object->del_state & DELSTATE_NO_DELETE))
> +		list_del_rcu(&object->object_list);
> +	object->del_state |= DELSTATE_REMOVED;
> }

So IIUC, this prevents the current object being scanned from being
removed from the list during the kmemleak_cond_resched() call.

> /*
> @@ -633,6 +642,7 @@ static void __create_object(unsigned long ptr, size_t size,
> 	object->count = 0;			/* white color initially */
> 	object->jiffies = jiffies;
> 	object->checksum = 0;
> +	object->del_state = 0;
>
> 	/* task information */
> 	if (in_hardirq()) {
> @@ -1470,9 +1480,22 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
> 	if (!get_object(object))
> 		return;	/* Try next object */
>
> +	raw_spin_lock_irq(&kmemleak_lock);
> +	if (object->del_state & DELSTATE_REMOVED)
> +		goto unlock_put;	/* Object removed */
> +	object->del_state |= DELSTATE_NO_DELETE;
> +	raw_spin_unlock_irq(&kmemleak_lock);
> +
> 	rcu_read_unlock();
> 	cond_resched();
> 	rcu_read_lock();
> +
> +	raw_spin_lock_irq(&kmemleak_lock);
> +	if (object->del_state & DELSTATE_REMOVED)
> +		list_del_rcu(&object->object_list);
> +	object->del_state &= ~DELSTATE_NO_DELETE;
> +unlock_put:
> +	raw_spin_unlock_irq(&kmemleak_lock);
> 	put_object(object);
> }

I'm not sure this was the only problem. We do have the problem that the
current object may be removed from the list, solved above, but another
scenario I had in mind is the next object being released during this
brief resched period. The RCU relies on object->next->next being valid
but, with a brief rcu_read_unlock(), the object->next could be freed,
reallocated, so object->next->next invalid.

-- 
Catalin
On 1/20/23 14:18, Catalin Marinas wrote:
> Hi Waiman,
>
> Thanks for your effort on trying to fix this.
>
> On Wed, Jan 18, 2023 at 11:01:11PM -0500, Waiman Long wrote:
>> @@ -567,7 +574,9 @@ static void __remove_object(struct kmemleak_object *object)
>> 	rb_erase(&object->rb_node, object->flags & OBJECT_PHYS ?
>> 				   &object_phys_tree_root :
>> 				   &object_tree_root);
>> -	list_del_rcu(&object->object_list);
>> +	if (!(object->del_state & DELSTATE_NO_DELETE))
>> +		list_del_rcu(&object->object_list);
>> +	object->del_state |= DELSTATE_REMOVED;
>> }
> So IIUC, this prevents the current object being scanned from being
> removed from the list during the kmemleak_cond_resched() call.

Yes, that is the point.

>> [...]
> I'm not sure this was the only problem. We do have the problem that the
> current object may be removed from the list, solved above, but another
> scenario I had in mind is the next object being released during this
> brief resched period. The RCU relies on object->next->next being valid
> but, with a brief rcu_read_unlock(), the object->next could be freed,
> reallocated, so object->next->next invalid.

Looking at the following scenario,

  object->next => A (removed)
  A->next      => B (removed)

As object->next is pointing to A, A must still be allocated and not
freed yet. Now if B is also removed, there are two possible cases.

1) B is removed from the list after the removal of A. In that case, it
   is not possible that A is still allocated but B is freed.

2) B is removed before A. A->next can't point to B once B has been
   removed. Due to weak memory ordering, it is possible that another
   cpu can see A->next still pointing to B. In that case, I believe
   that we are still within the grace period where neither A nor B is
   freed.

In fact, it is no different from a regular scan of the object list
without ever calling cond_resched().

Cheers,
Longman
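
The property Waiman relies on here is that list_del_rcu() deliberately
leaves the removed entry's next pointer intact, so a reader already
standing on the removed entry can keep walking the list. Its definition
in include/linux/rculist.h is essentially the following (comments mine;
double-check against your tree):

	static inline void list_del_rcu(struct list_head *entry)
	{
		__list_del_entry(entry);	/* unlink; entry->next is left untouched */
		entry->prev = LIST_POISON2;	/* only ->prev is poisoned */
	}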
On Fri, Jan 20, 2023 at 05:54:28PM -0500, Waiman Long wrote:
> On 1/20/23 14:18, Catalin Marinas wrote:
> > [...]
> > I'm not sure this was the only problem. We do have the problem that
> > the current object may be removed from the list, solved above, but
> > another scenario I had in mind is the next object being released
> > during this brief resched period. The RCU relies on
> > object->next->next being valid but, with a brief rcu_read_unlock(),
> > the object->next could be freed, reallocated, so object->next->next
> > invalid.
>
> Looking at the following scenario,
>
>   object->next => A (removed)
>   A->next      => B (removed)
>
> As object->next is pointing to A, A must still be allocated and not
> freed yet. Now if B is also removed, there are two possible cases.
>
> 1) B is removed from the list after the removal of A. In that case, it
>    is not possible that A is still allocated but B is freed.
>
> 2) B is removed before A. A->next can't point to B once B has been
>    removed. Due to weak memory ordering, it is possible that another
>    cpu can see A->next still pointing to B. In that case, I believe
>    that we are still within the grace period where neither A nor B is
>    freed.
>
> In fact, it is no different from a regular scan of the object list
> without ever calling cond_resched().

More like thinking out loud:

The lockless RCU loop relies on object->next->next being valid within
the grace period (A not freed). Due to weak memory ordering, the looping
CPU may not observe the object->next update (removal of A) by another
CPU, so it continues to loop over it. But since we do an
rcu_read_unlock() in the middle of the loop, I don't think these
assumptions are still valid, so A may be freed.

What we need is that the object->next read for the following iteration
either sees the updated object->next (B) or sees A with the latter still
around. I think this holds with the proposed kmemleak_cond_resched()
since we now start a new grace period with rcu_read_lock() followed by
taking and releasing kmemleak_lock. The latter would give us the memory
ordering required since removing object A from the list does take the
lock.

So yeah, you are probably right, I just find it hard to get my head
around ;). I still think it would be simpler with a single kmemleak_lock
(no object->lock) but that's more involved than a simple fix.

Assuming your (and my) reasoning above is correct:

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
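
The ordering argument can be pictured as an interleaving (an informal
illustration, assuming the deleter takes kmemleak_lock as
__remove_object() does):

	CPU 0 (deleter)                        CPU 1 (scanner)
	---------------                        ---------------
	raw_spin_lock_irq(&kmemleak_lock);
	list_del_rcu(&A->object_list);
	raw_spin_unlock_irq(&kmemleak_lock);
	                                       rcu_read_lock();
	                                       raw_spin_lock_irq(&kmemleak_lock);
	                                       ...
	                                       raw_spin_unlock_irq(&kmemleak_lock);
	                                       next = rcu_dereference(object->next);

The unlock on CPU 0 and the lock acquisition on CPU 1 are ordered by the
lock itself, so the later read of object->next on CPU 1 either observes
the removal of A, or sees A, which cannot be freed before the new grace
period started by the rcu_read_lock() above ends.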
On 1/23/23 14:24, Catalin Marinas wrote:
> On Fri, Jan 20, 2023 at 05:54:28PM -0500, Waiman Long wrote:
>> [...]
>> Looking at the following scenario,
>>
>>   object->next => A (removed)
>>   A->next      => B (removed)
>>
>> [...]
> More like thinking out loud:
>
> The lockless RCU loop relies on object->next->next being valid within
> the grace period (A not freed). Due to weak memory ordering, the
> looping CPU may not observe the object->next update (removal of A) by
> another CPU, so it continues to loop over it. But since we do an
> rcu_read_unlock() in the middle of the loop, I don't think these
> assumptions are still valid, so A may be freed.
>
> What we need is that the object->next read for the following iteration
> either sees the updated object->next (B) or sees A with the latter
> still around. I think this holds with the proposed
> kmemleak_cond_resched() since we now start a new grace period with
> rcu_read_lock() followed by taking and releasing kmemleak_lock. The
> latter would give us the memory ordering required since removing
> object A from the list does take the lock.
>
> So yeah, you are probably right, I just find it hard to get my head
> around ;). I still think it would be simpler with a single
> kmemleak_lock (no object->lock) but that's more involved than a simple
> fix.
>
> Assuming your (and my) reasoning above is correct:
>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

I should have mentioned the fact that taking the kmemleak_lock provides
some ordering guarantee since it is done after a new rcu_read_lock().
So yes, even if both A and B are removed from the object_list, they
should still be around and not freed yet.

Thanks for your review.

Cheers,
Longman