[PATCH] fs: Keep long filenames in isolated slab buckets

Posted by Kees Cook 1 month, 2 weeks ago
A building block of Use-After-Free heap memory corruption attacks is
using userspace controllable kernel allocations to fill specifically sized
kmalloc regions with specific contents. The most powerful of these kinds
of primitives is arbitrarily controllable contents with arbitrary size.
Keeping these kinds of allocations out of the general kmalloc buckets is
needed to harden the kernel against such manipulations, so this is why
these sorts of "copy data from userspace into kernel heap" situations are
expected to use things like memdup_user(), which keeps the allocations
in their own set of slab buckets. However, using memdup_user() is not
always appropriate, so in those cases, kmem_buckets can be used directly.

Filenames used to be isolated in their own (fixed size) slab cache so
they would not end up in the general kmalloc buckets (which made them
unusable for the heap grooming method described above). After commit
8c888b31903c ("struct filename: saner handling of long names"), filenames
were being copied into arbitrarily sized kmalloc regions in the general
kmalloc buckets. Instead, like memdup_user(), use a dedicated set of
kmem buckets for long filenames so we do not introduce a new way for
attackers to place arbitrary contents into the general kmalloc buckets.

Fixes: 8c888b31903c ("struct filename: saner handling of long names")
Signed-off-by: Kees Cook <kees@kernel.org>
---
Also, from the same commit, is the loss of SLAB_HWCACHE_ALIGN|SLAB_PANIC
for filename allocations relevant at all? It could be added back for
these buckets if desired, but I left it default in this patch.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: <linux-fsdevel@vger.kernel.org>
---
 fs/namei.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 8e7792de0000..a901733380cd 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -128,6 +128,8 @@
 /* SLAB cache for struct filename instances */
 static struct kmem_cache *__names_cache __ro_after_init;
 #define names_cache	runtime_const_ptr(__names_cache)
+/* SLAB buckets for long names */
+static kmem_buckets *names_buckets __ro_after_init;
 
 void __init filename_init(void)
 {
@@ -135,6 +137,8 @@ void __init filename_init(void)
 			 SLAB_HWCACHE_ALIGN|SLAB_PANIC, offsetof(struct filename, iname),
 			 EMBEDDED_NAME_MAX, NULL);
 	runtime_const_init(ptr, __names_cache);
+
+	names_buckets = kmem_buckets_create("names_bucket", 0, 0, PATH_MAX, NULL);
 }
 
 static inline struct filename *alloc_filename(void)
@@ -156,7 +160,7 @@ static inline void initname(struct filename *name)
 static int getname_long(struct filename *name, const char __user *filename)
 {
 	int len;
-	char *p __free(kfree) = kmalloc(PATH_MAX, GFP_KERNEL);
+	char *p __free(kfree) = kmem_buckets_alloc(names_buckets, PATH_MAX, GFP_KERNEL);
 	if (unlikely(!p))
 		return -ENOMEM;
 
@@ -264,14 +268,14 @@ static struct filename *do_getname_kernel(const char *filename, bool incomplete)
 
 	if (len <= EMBEDDED_NAME_MAX) {
 		p = (char *)result->iname;
-		memcpy(p, filename, len);
 	} else {
-		p = kmemdup(filename, len, GFP_KERNEL);
+		p = kmem_buckets_alloc(names_buckets, len, GFP_KERNEL);
 		if (unlikely(!p)) {
 			free_filename(result);
 			return ERR_PTR(-ENOMEM);
 		}
 	}
+	memcpy(p, filename, len);
 	result->name = p;
 	initname(result);
 	if (likely(!incomplete))
-- 
2.34.1
Re: [PATCH] fs: Keep long filenames in isolated slab buckets
Posted by Jann Horn 1 month, 2 weeks ago
On Wed, Feb 11, 2026 at 1:48 AM Kees Cook <kees@kernel.org> wrote:
> @@ -156,7 +160,7 @@ static inline void initname(struct filename *name)
>  static int getname_long(struct filename *name, const char __user *filename)
>  {
>         int len;
> -       char *p __free(kfree) = kmalloc(PATH_MAX, GFP_KERNEL);
> +       char *p __free(kfree) = kmem_buckets_alloc(names_buckets, PATH_MAX, GFP_KERNEL);
>         if (unlikely(!p))
>                 return -ENOMEM;

I think this path, where we always do maximally-sized allocations, is
the normal case where we're handling paths coming from userspace...

> @@ -264,14 +268,14 @@ static struct filename *do_getname_kernel(const char *filename, bool incomplete)
>
>         if (len <= EMBEDDED_NAME_MAX) {
>                 p = (char *)result->iname;
> -               memcpy(p, filename, len);
>         } else {
> -               p = kmemdup(filename, len, GFP_KERNEL);
> +               p = kmem_buckets_alloc(names_buckets, len, GFP_KERNEL);

... while this is kind of the exceptional case, where paths are coming
from kernelspace.

So you might want to get rid of the bucketing and instead just create
a single kmem_cache for long paths.
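
The difference between the two approaches can be sketched in userspace.
This is a simplified model, not kernel code: the size classes below are
the commonly seen kmalloc-* sizes (the real set depends on architecture
and configuration), and the helper names and MODEL_PATH_MAX are
hypothetical. With buckets, a name of a given length lands in whichever
size class fits; with a single cache, every long name occupies one
fixed-size object.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed size classes, modeled on common kmalloc-* cache sizes. */
static const size_t size_classes[] = {
	8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096
};

#define MODEL_PATH_MAX 4096	/* hypothetical stand-in for PATH_MAX */

/* Bucketed allocation: served from the smallest class that fits len. */
static size_t bucketed_object_size(size_t len)
{
	assert(len > 0);	/* zero-length requests are not modeled */
	for (size_t i = 0; i < sizeof(size_classes) / sizeof(size_classes[0]); i++)
		if (len <= size_classes[i])
			return size_classes[i];
	return 0;		/* larger than any class in this model */
}

/* Single fixed-size cache: every object has the same size. */
static size_t fixed_cache_object_size(size_t len)
{
	assert(len > 0);
	return len <= MODEL_PATH_MAX ? MODEL_PATH_MAX : 0;
}
```

In this model a 200-byte name shares a 256-byte class with other
200-byte names under bucketing, while the fixed cache gives both the
userspace and the kernelspace path the same 4096-byte object.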


By the way, did you know that "struct filename" is only used for
storing fairly-temporary stuff like paths supplied to open(), but not
for storing the names of directory entries in the dentry cache (which
are more long-lived)? My understanding is that this is also why the
kernel doesn't really try to optimize the size of "struct filename" -
almost all instances of it only exist for the duration of a syscall or
something like that.

The dentry cache allocates long names as "struct external_name" in
reclaimable kmalloc slabs, see __d_alloc().
Re: [PATCH] fs: Keep long filenames in isolated slab buckets
Posted by Kees Cook 1 month, 2 weeks ago
On Wed, Feb 11, 2026 at 02:28:53AM +0100, Jann Horn wrote:
> On Wed, Feb 11, 2026 at 1:48 AM Kees Cook <kees@kernel.org> wrote:
> > @@ -156,7 +160,7 @@ static inline void initname(struct filename *name)
> >  static int getname_long(struct filename *name, const char __user *filename)
> >  {
> >         int len;
> > -       char *p __free(kfree) = kmalloc(PATH_MAX, GFP_KERNEL);
> > +       char *p __free(kfree) = kmem_buckets_alloc(names_buckets, PATH_MAX, GFP_KERNEL);
> >         if (unlikely(!p))
> >                 return -ENOMEM;
> 
> I think this path, where we always do maximally-sized allocations, is
> the normal case where we're handling paths coming from userspace...

Actually, is there any reason we can't use strnlen_user() in
do_getname(), and then just use strndup_user() in the long case?

> > @@ -264,14 +268,14 @@ static struct filename *do_getname_kernel(const char *filename, bool incomplete)
> >
> >         if (len <= EMBEDDED_NAME_MAX) {
> >                 p = (char *)result->iname;
> > -               memcpy(p, filename, len);
> >         } else {
> > -               p = kmemdup(filename, len, GFP_KERNEL);
> > +               p = kmem_buckets_alloc(names_buckets, len, GFP_KERNEL);
> 
> ... while this is kind of the exceptional case, where paths are coming
> from kernelspace.
> 
> So you might want to get rid of the bucketing and instead just create
> a single kmem_cache for long paths.

I wasn't sure if there was a controllable way to reach this case or not.

> By the way, did you know that "struct filename" is only used for
> storing fairly-temporary stuff like paths supplied to open(), but not
> for storing the names of directory entries in the dentry cache (which
> are more long-lived)? My understanding is that this is also why the
> kernel doesn't really try to optimize the size of "struct filename" -
> almost all instances of it only exist for the duration of a syscall or
> something like that.

Right, but it was enough of a location change that I felt like it was
worth fixing it up.

> The dentry cache allocates long names as "struct external_name" in
> reclaimable kmalloc slabs, see __d_alloc().

Oh hey, look at that!

                struct external_name *p = kmalloc(size + name->len,
                                                  GFP_KERNEL_ACCOUNT |
                                                  __GFP_RECLAIMABLE);

Yeah, let's put that into dedicated buckets instead?

-Kees

-- 
Kees Cook
Re: [PATCH] fs: Keep long filenames in isolated slab buckets
Posted by Al Viro 1 month, 2 weeks ago
On Tue, Feb 10, 2026 at 05:41:43PM -0800, Kees Cook wrote:

> > I think this path, where we always do maximally-sized allocations, is
> > the normal case where we're handling paths coming from userspace...
> 
> Actually, is there any reason we can't use strnlen_user() in
> do_getname(), and then just use strndup_user() in the long case?

Yes.  Not having to deal with the "oh, lookie - it became empty this
time around" case.
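
The "became empty" case is a double fetch: userspace can rewrite the
buffer between the length pass and the copy pass. A minimal userspace
sketch of that race, made deterministic by calling a hypothetical
"racer" callback between the two passes; measure(), copy_string() and
MODEL_MAX are illustrative stand-ins, not the kernel helpers:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX 64	/* hypothetical limit, stands in for PATH_MAX */

/* First pass: measure only (stand-in for strnlen_user()). */
static size_t measure(const char *user, size_t max)
{
	size_t n = 0;

	while (n < max && user[n])
		n++;
	return n;
}

/* Second pass: copy at most len bytes, stopping early at a NUL. */
static size_t copy_string(char *dst, const char *user, size_t len)
{
	size_t i;

	for (i = 0; i < len && user[i]; i++)
		dst[i] = user[i];
	dst[i] = '\0';
	return i;
}

/* Two-pass fetch; racer models another thread writing in between. */
static size_t two_pass_fetch(char *dst, char *user, void (*racer)(char *))
{
	size_t len;

	assert(dst && user);
	len = measure(user, MODEL_MAX - 1);
	if (racer)
		racer(user);
	return copy_string(dst, user, len);
}

/* The racing writer: turns the string into an empty one. */
static void truncate_to_empty(char *user)
{
	user[0] = '\0';
}
```

The first pass reports a non-empty name, yet the copy comes back empty,
so the caller needs an extra branch for a string that changed under it.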


> > >         if (len <= EMBEDDED_NAME_MAX) {
> > >                 p = (char *)result->iname;
> > > -               memcpy(p, filename, len);
> > >         } else {
> > > -               p = kmemdup(filename, len, GFP_KERNEL);
> > > +               p = kmem_buckets_alloc(names_buckets, len, GFP_KERNEL);
> > 
> > ... while this is kind of the exceptional case, where paths are coming
> > from kernelspace.

mount -t ext2 fucking_long_pathname_resolving_to_dev_sda1 /mnt

Watch the show.  "Fucking long" here being "longer than 150 bytes or so".
Re: [PATCH] fs: Keep long filenames in isolated slab buckets
Posted by Jann Horn 1 month, 2 weeks ago
On Wed, Feb 11, 2026 at 2:41 AM Kees Cook <kees@kernel.org> wrote:
> On Wed, Feb 11, 2026 at 02:28:53AM +0100, Jann Horn wrote:
> > On Wed, Feb 11, 2026 at 1:48 AM Kees Cook <kees@kernel.org> wrote:
> > > @@ -156,7 +160,7 @@ static inline void initname(struct filename *name)
> > >  static int getname_long(struct filename *name, const char __user *filename)
> > >  {
> > >         int len;
> > > -       char *p __free(kfree) = kmalloc(PATH_MAX, GFP_KERNEL);
> > > +       char *p __free(kfree) = kmem_buckets_alloc(names_buckets, PATH_MAX, GFP_KERNEL);
> > >         if (unlikely(!p))
> > >                 return -ENOMEM;
> >
> > I think this path, where we always do maximally-sized allocations, is
> > the normal case where we're handling paths coming from userspace...
>
> Actually, is there any reason we can't use strnlen_user() in
> do_getname(), and then just use strndup_user() in the long case?

I'm not an expert, but as far as I know, this path is supposed to be
really fast (because pretty much every syscall that operates on a path
will hit it), and doesn't care how much memory it allocates (because
these allocations are normally only alive for the duration of a
syscall). strnlen_user() would add another pass over the userspace
buffer, which I think would probably have negative performance impact?
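
For comparison, a single-pass fetch gets the length as a byproduct of
the copy. The sketch below is a userspace model under stated
assumptions: copy_bounded() follows the return convention of
strncpy_from_user() (string length, or the limit if no NUL was seen),
the two limits are small hypothetical stand-ins for EMBEDDED_NAME_MAX
and PATH_MAX, and the long case simply re-copies from the start, which
may differ from what the kernel does.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_EMBEDDED_MAX 8	/* hypothetical, for EMBEDDED_NAME_MAX */
#define MODEL_PATH_MAX     32	/* hypothetical, for PATH_MAX */

/*
 * Single pass: copy bytes until a NUL is copied or max bytes were
 * written. Returns the string length if the NUL fit, otherwise max.
 */
static long copy_bounded(char *dst, const char *src, long max)
{
	long i;

	assert(dst && src && max > 0);
	for (i = 0; i < max; i++) {
		dst[i] = src[i];
		if (!src[i])
			return i;
	}
	return max;
}

enum name_kind { NAME_EMBEDDED, NAME_LONG, NAME_TOO_LONG };

/* Try the small embedded buffer first, then a full-size buffer. */
static enum name_kind fetch_name(char *embedded, char *longbuf,
				 const char *src)
{
	if (copy_bounded(embedded, src, MODEL_EMBEDDED_MAX) < MODEL_EMBEDDED_MAX)
		return NAME_EMBEDDED;
	if (copy_bounded(longbuf, src, MODEL_PATH_MAX) < MODEL_PATH_MAX)
		return NAME_LONG;
	return NAME_TOO_LONG;
}
```

Short names never touch the large buffer, and no separate measuring
pass over the source is needed in either case.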

> > > @@ -264,14 +268,14 @@ static struct filename *do_getname_kernel(const char *filename, bool incomplete)
> > >
> > >         if (len <= EMBEDDED_NAME_MAX) {
> > >                 p = (char *)result->iname;
> > > -               memcpy(p, filename, len);
> > >         } else {
> > > -               p = kmemdup(filename, len, GFP_KERNEL);
> > > +               p = kmem_buckets_alloc(names_buckets, len, GFP_KERNEL);
> >
> > ... while this is kind of the exceptional case, where paths are coming
> > from kernelspace.
> >
> > So you might want to get rid of the bucketing and instead just create
> > a single kmem_cache for long paths.
>
> I wasn't sure if there was a controllable way to reach this case or not.

I don't understand your point. I'm suggesting that in both of these
two cases, you can just allocate from the same dedicated slab.

But yes, you can get controlled data into the filename passed to
do_getname_kernel() - for example, when you execute a script (a file
that starts with "#!"), the interpreter path after "#!" is passed from
load_script() to open_exec(), which uses CLASS(filename_kernel, ...).

> > By the way, did you know that "struct filename" is only used for
> > storing fairly-temporary stuff like paths supplied to open(), but not
> > for storing the names of directory entries in the dentry cache (which
> > are more long-lived)? My understanding is that this is also why the
> > kernel doesn't really try to optimize the size of "struct filename" -
> > almost all instances of it only exist for the duration of a syscall or
> > something like that.
>
> Right, but it was enough of a location change that I felt like it was
> worth fixing it up.
>
> > The dentry cache allocates long names as "struct external_name" in
> > reclaimable kmalloc slabs, see __d_alloc().
>
> Oh hey, look at that!
>
>                 struct external_name *p = kmalloc(size + name->len,
>                                                   GFP_KERNEL_ACCOUNT |
>                                                   __GFP_RECLAIMABLE);
>
> Yeah, let's put that into dedicated buckets instead?

Actually, looking around a bit, there really aren't that many
allocations with __GFP_RECLAIMABLE, so this probably isn't all that
useful for same-cache attacks. (To be clear: Anything with
__GFP_RECLAIMABLE goes in the special kmalloc-rcl-* slabs.) Looking
around, the only other kmalloc users of __GFP_RECLAIMABLE I see are:

 - alloc_buffer_data() in drivers/md/dm-bufio.c
 - fuse_dentry_init()

So I don't think there's anything to be done here, this is already
using quasi-dedicated buckets. Sorry for the distraction.
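
Why __GFP_RECLAIMABLE already isolates these allocations can be modeled
as follows. This is a userspace sketch, assuming the slab allocator
picks a cache type with the priority DMA, then reclaimable, then
accounted, and that both ZONE_DMA and memcg are enabled. The flag values
and names are hypothetical and do not match the kernel's real gfp_t
bits.

```c
#include <assert.h>

/* Hypothetical flag bits for this model only. */
#define M_GFP_DMA          0x1u
#define M_GFP_RECLAIMABLE  0x2u
#define M_GFP_ACCOUNT      0x4u
#define M_NOT_NORMAL_BITS  (M_GFP_DMA | M_GFP_RECLAIMABLE | M_GFP_ACCOUNT)

enum cache_type {
	CACHE_NORMAL,	/* kmalloc-*      */
	CACHE_DMA,	/* dma-kmalloc-*  */
	CACHE_RECLAIM,	/* kmalloc-rcl-*  */
	CACHE_CGROUP	/* kmalloc-cg-*   */
};

/* Pick the cache family from the flags, highest priority first. */
static enum cache_type pick_cache_type(unsigned int flags)
{
	assert((flags & ~M_NOT_NORMAL_BITS) == 0);	/* modeled bits only */
	if ((flags & M_NOT_NORMAL_BITS) == 0)
		return CACHE_NORMAL;
	if (flags & M_GFP_DMA)
		return CACHE_DMA;
	if (flags & M_GFP_RECLAIMABLE)
		return CACHE_RECLAIM;
	return CACHE_CGROUP;
}
```

Under these assumptions, an accounted and reclaimable request such as
the one in __d_alloc() resolves to the reclaimable family rather than
the general or cgroup caches.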
Re: [PATCH] fs: Keep long filenames in isolated slab buckets
Posted by Kees Cook 1 month, 2 weeks ago
On Wed, Feb 11, 2026 at 03:06:47AM +0100, Jann Horn wrote:
> Actually, looking around a bit, there really aren't that many
> allocations with __GFP_RECLAIMABLE, so this probably isn't all that
> useful for same-cache attacks. (To be clear: Anything with
> __GFP_RECLAIMABLE goes in the special kmalloc-rcl-* slabs.) Looking

Ah! Yeah, I looked right past __GFP_RECLAIMABLE. As you say, this will
keep it isolated already.

-- 
Kees Cook
Re: [PATCH] fs: Keep long filenames in isolated slab buckets
Posted by Al Viro 1 month, 2 weeks ago
On Wed, Feb 11, 2026 at 03:06:47AM +0100, Jann Horn wrote:

> > > I think this path, where we always do maximally-sized allocations, is
> > > the normal case where we're handling paths coming from userspace...
> >
> > Actually, is there any reason we can't use strnlen_user() in
> > do_getname(), and then just use strndup_user() in the long case?
> 
> I'm not an expert, but as far as I know, this path is supposed to be
> really fast (because pretty much every syscall that operates on a path
> will hit it), and doesn't care how much memory it allocates (because
> these allocations are normally only alive for the duration of a
> syscall). strnlen_user() would add another pass over the userspace
> buffer, which I think would probably have negative performance impact?

Sigh...  This is the case of a path longer than 168 bytes (EMBEDDED_NAME_MAX);
that's not hard to trigger, but not exactly common.  What matters more is
that we really do not want to deal with the "now it appears to be empty"
case here - it makes the logic in the caller more convoluted and it's not
pretty as it is.

And no, it is not going to be persistent - the longest you can stick such
beasts in there is probably with io-uring; names are copied in when the
request is submitted and stay around until a worker thread gets around to
finishing the request.