mm/kasan/common.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-)
Currently, enabling KASAN masks bugs where a lockless lookup path gets a
pointer to a SLAB_TYPESAFE_BY_RCU object that might concurrently be
recycled and is insufficiently careful about handling recycled objects:
KASAN puts freed objects in SLAB_TYPESAFE_BY_RCU slabs onto its quarantine
queues, even when it can't actually detect UAF in these objects, and the
quarantine prevents fast recycling.
When I introduced CONFIG_SLUB_RCU_DEBUG, my intention was that enabling
CONFIG_SLUB_RCU_DEBUG should cause KASAN to mark such objects as freed
after an RCU grace period and put them on the quarantine, while disabling
CONFIG_SLUB_RCU_DEBUG should allow such objects to be reused immediately;
but that hasn't actually been working.
I discovered such a UAF bug involving SLAB_TYPESAFE_BY_RCU yesterday; I
could only trigger this bug in a KASAN build by disabling
CONFIG_SLUB_RCU_DEBUG and applying this patch.
Signed-off-by: Jann Horn <jannh@google.com>
---
mm/kasan/common.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index ed4873e18c75..9142964ab9c9 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -230,16 +230,12 @@ static bool check_slab_allocation(struct kmem_cache *cache, void *object,
}
static inline void poison_slab_object(struct kmem_cache *cache, void *object,
- bool init, bool still_accessible)
+ bool init)
{
void *tagged_object = object;
object = kasan_reset_tag(object);
- /* RCU slabs could be legally used after free within the RCU period. */
- if (unlikely(still_accessible))
- return;
-
kasan_poison(object, round_up(cache->object_size, KASAN_GRANULE_SIZE),
KASAN_SLAB_FREE, init);
@@ -261,7 +257,22 @@ bool __kasan_slab_free(struct kmem_cache *cache, void *object, bool init,
if (!kasan_arch_is_ready() || is_kfence_address(object))
return false;
- poison_slab_object(cache, object, init, still_accessible);
+ /*
+ * If this point is reached with an object that must still be
+ * accessible under RCU, we can't poison it; in that case, also skip the
+ * quarantine. This should mostly only happen when CONFIG_SLUB_RCU_DEBUG
+ * has been disabled manually.
+ *
+ * Putting the object on the quarantine wouldn't help catch UAFs (since
+ * we can't poison it here), and it would mask bugs caused by
+ * SLAB_TYPESAFE_BY_RCU users not being careful enough about object
+ * reuse; so overall, putting the object into the quarantine here would
+ * be counterproductive.
+ */
+ if (still_accessible)
+ return false;
+
+ poison_slab_object(cache, object, init);
/*
* If the object is put into quarantine, do not let slab put the object
@@ -519,7 +530,7 @@ bool __kasan_mempool_poison_object(void *ptr, unsigned long ip)
if (check_slab_allocation(slab->slab_cache, ptr, ip))
return false;
- poison_slab_object(slab->slab_cache, ptr, false, false);
+ poison_slab_object(slab->slab_cache, ptr, false);
return true;
}
---
base-commit: 89be9a83ccf1f88522317ce02f854f30d6115c41
change-id: 20250723-kasan-tsbrcu-noquarantine-e207bb990e24
--
Jann Horn <jannh@google.com>
On Wed, Jul 23, 2025 at 4:59 PM Jann Horn <jannh@google.com> wrote: > > Currently, enabling KASAN masks bugs where a lockless lookup path gets a > pointer to a SLAB_TYPESAFE_BY_RCU object that might concurrently be > recycled and is insufficiently careful about handling recycled objects: > KASAN puts freed objects in SLAB_TYPESAFE_BY_RCU slabs onto its quarantine > queues, even when it can't actually detect UAF in these objects, and the > quarantine prevents fast recycling. > > When I introduced CONFIG_SLUB_RCU_DEBUG, my intention was that enabling > CONFIG_SLUB_RCU_DEBUG should cause KASAN to mark such objects as freed > after an RCU grace period and put them on the quarantine, while disabling > CONFIG_SLUB_RCU_DEBUG should allow such objects to be reused immediately; > but that hasn't actually been working. > > I discovered such a UAF bug involving SLAB_TYPESAFE_BY_RCU yesterday; I > could only trigger this bug in a KASAN build by disabling > CONFIG_SLUB_RCU_DEBUG and applying this patch. > > Signed-off-by: Jann Horn <jannh@google.com> > --- > mm/kasan/common.c | 25 ++++++++++++++++++------- > 1 file changed, 18 insertions(+), 7 deletions(-) > > diff --git a/mm/kasan/common.c b/mm/kasan/common.c > index ed4873e18c75..9142964ab9c9 100644 > --- a/mm/kasan/common.c > +++ b/mm/kasan/common.c > @@ -230,16 +230,12 @@ static bool check_slab_allocation(struct kmem_cache *cache, void *object, > } > > static inline void poison_slab_object(struct kmem_cache *cache, void *object, > - bool init, bool still_accessible) > + bool init) > { > void *tagged_object = object; > > object = kasan_reset_tag(object); > > - /* RCU slabs could be legally used after free within the RCU period. */ > - if (unlikely(still_accessible)) > - return; > - > kasan_poison(object, round_up(cache->object_size, KASAN_GRANULE_SIZE), > KASAN_SLAB_FREE, init); > > @@ -261,7 +257,22 @@ bool __kasan_slab_free(struct kmem_cache *cache, void *object, bool init, > if (!kasan_arch_is_ready() || is_kfence_address(object)) > return false; > > - poison_slab_object(cache, object, init, still_accessible); > + /* > + * If this point is reached with an object that must still be > + * accessible under RCU, we can't poison it; in that case, also skip the > + * quarantine. This should mostly only happen when CONFIG_SLUB_RCU_DEBUG > + * has been disabled manually. > + * > + * Putting the object on the quarantine wouldn't help catch UAFs (since > + * we can't poison it here), and it would mask bugs caused by > + * SLAB_TYPESAFE_BY_RCU users not being careful enough about object > + * reuse; so overall, putting the object into the quarantine here would > + * be counterproductive. > + */ > + if (still_accessible) > + return false; > + > + poison_slab_object(cache, object, init); > > /* > * If the object is put into quarantine, do not let slab put the object > @@ -519,7 +530,7 @@ bool __kasan_mempool_poison_object(void *ptr, unsigned long ip) > if (check_slab_allocation(slab->slab_cache, ptr, ip)) > return false; > > - poison_slab_object(slab->slab_cache, ptr, false, false); > + poison_slab_object(slab->slab_cache, ptr, false); > return true; > } > > > --- > base-commit: 89be9a83ccf1f88522317ce02f854f30d6115c41 > change-id: 20250723-kasan-tsbrcu-noquarantine-e207bb990e24 > > -- > Jann Horn <jannh@google.com> > Acked-by: Andrey Konovalov <andreyknvl@gmail.com> Would it be hard to add KUnit test to check that KASAN detects such issues?
On Sun, Jul 27, 2025 at 12:06 AM Andrey Konovalov <andreyknvl@gmail.com> wrote: > On Wed, Jul 23, 2025 at 4:59 PM Jann Horn <jannh@google.com> wrote: > > > > Currently, enabling KASAN masks bugs where a lockless lookup path gets a > > pointer to a SLAB_TYPESAFE_BY_RCU object that might concurrently be > > recycled and is insufficiently careful about handling recycled objects: > > KASAN puts freed objects in SLAB_TYPESAFE_BY_RCU slabs onto its quarantine > > queues, even when it can't actually detect UAF in these objects, and the > > quarantine prevents fast recycling. > > > > When I introduced CONFIG_SLUB_RCU_DEBUG, my intention was that enabling > > CONFIG_SLUB_RCU_DEBUG should cause KASAN to mark such objects as freed > > after an RCU grace period and put them on the quarantine, while disabling > > CONFIG_SLUB_RCU_DEBUG should allow such objects to be reused immediately; > > but that hasn't actually been working. > > > > I discovered such a UAF bug involving SLAB_TYPESAFE_BY_RCU yesterday; I > > could only trigger this bug in a KASAN build by disabling > > CONFIG_SLUB_RCU_DEBUG and applying this patch. > > > > Signed-off-by: Jann Horn <jannh@google.com> > > --- > > mm/kasan/common.c | 25 ++++++++++++++++++------- > > 1 file changed, 18 insertions(+), 7 deletions(-) > > > > diff --git a/mm/kasan/common.c b/mm/kasan/common.c > > index ed4873e18c75..9142964ab9c9 100644 > > --- a/mm/kasan/common.c > > +++ b/mm/kasan/common.c > > @@ -230,16 +230,12 @@ static bool check_slab_allocation(struct kmem_cache *cache, void *object, > > } > > > > static inline void poison_slab_object(struct kmem_cache *cache, void *object, > > - bool init, bool still_accessible) > > + bool init) > > { > > void *tagged_object = object; > > > > object = kasan_reset_tag(object); > > > > - /* RCU slabs could be legally used after free within the RCU period. */ > > - if (unlikely(still_accessible)) > > - return; > > - > > kasan_poison(object, round_up(cache->object_size, KASAN_GRANULE_SIZE), > > KASAN_SLAB_FREE, init); > > > > @@ -261,7 +257,22 @@ bool __kasan_slab_free(struct kmem_cache *cache, void *object, bool init, > > if (!kasan_arch_is_ready() || is_kfence_address(object)) > > return false; > > > > - poison_slab_object(cache, object, init, still_accessible); > > + /* > > + * If this point is reached with an object that must still be > > + * accessible under RCU, we can't poison it; in that case, also skip the > > + * quarantine. This should mostly only happen when CONFIG_SLUB_RCU_DEBUG > > + * has been disabled manually. > > + * > > + * Putting the object on the quarantine wouldn't help catch UAFs (since > > + * we can't poison it here), and it would mask bugs caused by > > + * SLAB_TYPESAFE_BY_RCU users not being careful enough about object > > + * reuse; so overall, putting the object into the quarantine here would > > + * be counterproductive. > > + */ > > + if (still_accessible) > > + return false; > > + > > + poison_slab_object(cache, object, init); > > > > /* > > * If the object is put into quarantine, do not let slab put the object > > @@ -519,7 +530,7 @@ bool __kasan_mempool_poison_object(void *ptr, unsigned long ip) > > if (check_slab_allocation(slab->slab_cache, ptr, ip)) > > return false; > > > > - poison_slab_object(slab->slab_cache, ptr, false, false); > > + poison_slab_object(slab->slab_cache, ptr, false); > > return true; > > } > > > > > > --- > > base-commit: 89be9a83ccf1f88522317ce02f854f30d6115c41 > > change-id: 20250723-kasan-tsbrcu-noquarantine-e207bb990e24 > > > > -- > > Jann Horn <jannh@google.com> > > > > Acked-by: Andrey Konovalov <andreyknvl@gmail.com> Thanks! > Would it be hard to add KUnit test to check that KASAN detects such issues? Sent a separate patch with a kunit test.
On 7/23/25 16:59, Jann Horn wrote: > Currently, enabling KASAN masks bugs where a lockless lookup path gets a > pointer to a SLAB_TYPESAFE_BY_RCU object that might concurrently be > recycled and is insufficiently careful about handling recycled objects: > KASAN puts freed objects in SLAB_TYPESAFE_BY_RCU slabs onto its quarantine > queues, even when it can't actually detect UAF in these objects, and the > quarantine prevents fast recycling. > > When I introduced CONFIG_SLUB_RCU_DEBUG, my intention was that enabling > CONFIG_SLUB_RCU_DEBUG should cause KASAN to mark such objects as freed > after an RCU grace period and put them on the quarantine, while disabling > CONFIG_SLUB_RCU_DEBUG should allow such objects to be reused immediately; > but that hasn't actually been working. Was the "allow reuse immediately" not working also before you introduced CONFIG_SLUB_RCU_DEBUG, or is it a side-effect of that? IOW should we add a Fixes: here? > I discovered such a UAF bug involving SLAB_TYPESAFE_BY_RCU yesterday; I > could only trigger this bug in a KASAN build by disabling > CONFIG_SLUB_RCU_DEBUG and applying this patch. > > Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> > --- > mm/kasan/common.c | 25 ++++++++++++++++++------- > 1 file changed, 18 insertions(+), 7 deletions(-) > > diff --git a/mm/kasan/common.c b/mm/kasan/common.c > index ed4873e18c75..9142964ab9c9 100644 > --- a/mm/kasan/common.c > +++ b/mm/kasan/common.c > @@ -230,16 +230,12 @@ static bool check_slab_allocation(struct kmem_cache *cache, void *object, > } > > static inline void poison_slab_object(struct kmem_cache *cache, void *object, > - bool init, bool still_accessible) > + bool init) > { > void *tagged_object = object; > > object = kasan_reset_tag(object); > > - /* RCU slabs could be legally used after free within the RCU period. */ > - if (unlikely(still_accessible)) > - return; > - > kasan_poison(object, round_up(cache->object_size, KASAN_GRANULE_SIZE), > KASAN_SLAB_FREE, init); > > @@ -261,7 +257,22 @@ bool __kasan_slab_free(struct kmem_cache *cache, void *object, bool init, > if (!kasan_arch_is_ready() || is_kfence_address(object)) > return false; > > - poison_slab_object(cache, object, init, still_accessible); > + /* > + * If this point is reached with an object that must still be > + * accessible under RCU, we can't poison it; in that case, also skip the > + * quarantine. This should mostly only happen when CONFIG_SLUB_RCU_DEBUG > + * has been disabled manually. > + * > + * Putting the object on the quarantine wouldn't help catch UAFs (since > + * we can't poison it here), and it would mask bugs caused by > + * SLAB_TYPESAFE_BY_RCU users not being careful enough about object > + * reuse; so overall, putting the object into the quarantine here would > + * be counterproductive. > + */ > + if (still_accessible) > + return false; > + > + poison_slab_object(cache, object, init); > > /* > * If the object is put into quarantine, do not let slab put the object > @@ -519,7 +530,7 @@ bool __kasan_mempool_poison_object(void *ptr, unsigned long ip) > if (check_slab_allocation(slab->slab_cache, ptr, ip)) > return false; > > - poison_slab_object(slab->slab_cache, ptr, false, false); > + poison_slab_object(slab->slab_cache, ptr, false); > return true; > } > > > --- > base-commit: 89be9a83ccf1f88522317ce02f854f30d6115c41 > change-id: 20250723-kasan-tsbrcu-noquarantine-e207bb990e24 >
On Thu, Jul 24, 2025 at 12:14 PM Vlastimil Babka <vbabka@suse.cz> wrote: > On 7/23/25 16:59, Jann Horn wrote: > > Currently, enabling KASAN masks bugs where a lockless lookup path gets a > > pointer to a SLAB_TYPESAFE_BY_RCU object that might concurrently be > > recycled and is insufficiently careful about handling recycled objects: > > KASAN puts freed objects in SLAB_TYPESAFE_BY_RCU slabs onto its quarantine > > queues, even when it can't actually detect UAF in these objects, and the > > quarantine prevents fast recycling. > > > > When I introduced CONFIG_SLUB_RCU_DEBUG, my intention was that enabling > > CONFIG_SLUB_RCU_DEBUG should cause KASAN to mark such objects as freed > > after an RCU grace period and put them on the quarantine, while disabling > > CONFIG_SLUB_RCU_DEBUG should allow such objects to be reused immediately; > > but that hasn't actually been working. > > Was the "allow reuse immediately" not working also before you introduced > CONFIG_SLUB_RCU_DEBUG, or is it a side-effect of that? IOW should we add a > Fixes: here? This was already an issue before. I think it got broken by refactoring in commit b556a462eb8df6b6836c318d23f43409c40a7c7e ("kasan: save free stack traces for slab mempools"), but I don't think it was necessarily an intentionally supported feature. > > I discovered such a UAF bug involving SLAB_TYPESAFE_BY_RCU yesterday; I > > could only trigger this bug in a KASAN build by disabling > > CONFIG_SLUB_RCU_DEBUG and applying this patch. > > > > Signed-off-by: Jann Horn <jannh@google.com> > > Acked-by: Vlastimil Babka <vbabka@suse.cz> Thanks!
On Thu, Jul 24, 2025 at 12:14 PM Vlastimil Babka <vbabka@suse.cz> wrote: > > On 7/23/25 16:59, Jann Horn wrote: > > Currently, enabling KASAN masks bugs where a lockless lookup path gets a > > pointer to a SLAB_TYPESAFE_BY_RCU object that might concurrently be > > recycled and is insufficiently careful about handling recycled objects: > > KASAN puts freed objects in SLAB_TYPESAFE_BY_RCU slabs onto its quarantine > > queues, even when it can't actually detect UAF in these objects, and the > > quarantine prevents fast recycling. > > > > When I introduced CONFIG_SLUB_RCU_DEBUG, my intention was that enabling > > CONFIG_SLUB_RCU_DEBUG should cause KASAN to mark such objects as freed > > after an RCU grace period and put them on the quarantine, while disabling > > CONFIG_SLUB_RCU_DEBUG should allow such objects to be reused immediately; > > but that hasn't actually been working. > > Was the "allow reuse immediately" not working also before you introduced > CONFIG_SLUB_RCU_DEBUG, or is it a side-effect of that? IOW should we add a > Fixes: here? > > > I discovered such a UAF bug involving SLAB_TYPESAFE_BY_RCU yesterday; I > > could only trigger this bug in a KASAN build by disabling > > CONFIG_SLUB_RCU_DEBUG and applying this patch. > > > > Signed-off-by: Jann Horn <jannh@google.com> > > Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Alexander Potapenko <glider@google.com>
© 2016 - 2025 Red Hat, Inc.