[PATCH bpf v3] bpf: do not use kmalloc_nolock when !HAVE_CMPXCHG_DOUBLE

Levi Zim via B4 Relay posted 1 patch 3 weeks, 2 days ago
Posted by Levi Zim via B4 Relay 3 weeks, 2 days ago
From: Levi Zim <rsworktech@outlook.com>

kmalloc_nolock() always fails on architectures that lack a double-word
cmpxchg (cmpxchg16b on x86-64), unless slab debugging is enabled for the
cache. For example, this causes bpf_task_storage_get with flag
BPF_LOCAL_STORAGE_GET_F_CREATE to fail on riscv64 with a 6.19 kernel.

Fix it by enabling use_kmalloc_nolock only when HAVE_CMPXCHG_DOUBLE is
set. Leave the PREEMPT_RT case as is, because it requires kmalloc_nolock
for correctness, and add a comment noting the resulting limitation: on
architectures without CMPXCHG_DOUBLE, enabling PREEMPT_RT makes
bpf_local_storage_alloc always fail.

Fixes: f484f4a3e058 ("bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage")
Cc: stable@vger.kernel.org
Signed-off-by: Levi Zim <rsworktech@outlook.com>
---
I found that bpf_task_storage_get with flag BPF_LOCAL_STORAGE_GET_F_CREATE
always fails on a 6.19 kernel on riscv64, and bisected it to the commit
below.

In f484f4a3e058 ("bpf: Replace bpf memory allocator with kmalloc_nolock()
in local storage"), the BPF memory allocator was replaced with
kmalloc_nolock(). This is problematic for architectures that lack
CMPXCHG_DOUBLE, because there kmalloc_nolock() returns NULL for every
non-debug cache:

In kmalloc_nolock() (kmalloc_nolock_noprof()):

	if (!(s->flags & __CMPXCHG_DOUBLE) && !kmem_cache_debug(s))
		/*
		 * kmalloc_nolock() is not supported on architectures that
		 * don't implement cmpxchg16b, but debug caches don't use
		 * per-cpu slab and per-cpu partial slabs. They rely on
		 * kmem_cache_node->list_lock, so kmalloc_nolock() can
		 * attempt to allocate from debug caches by
		 * spin_trylock_irqsave(&n->list_lock, ...)
		 */
		return NULL;

Fix it by enabling use_kmalloc_nolock only when HAVE_CMPXCHG_DOUBLE is
set, except in the PREEMPT_RT case, as explained in the comment and the
commit message.

Note for stable: this only needs to be picked into v6.19 if the patch
makes it into 7.0.
---
Changes in v3:
- Use macro instead of const static variable to avoid triggering
  warnings.
- Wrap lines at 80 columns
- Link to v2: https://lore.kernel.org/r/20260314-bpf-kmalloc-nolock-v2-1-576e33e4fa67@outlook.com

Changes in v2:
- Drop the modification to the PREEMPT_RT case, as it requires
  kmalloc_nolock for correctness.
- Add a comment to the PREEMPT_RT case about the limitation when
  HAVE_CMPXCHG_DOUBLE is not set but PREEMPT_RT is enabled.
- Link to v1: https://lore.kernel.org/r/20260314-bpf-kmalloc-nolock-v1-1-24abf3f75a9f@outlook.com
---
 include/linux/bpf_local_storage.h | 1 +
 kernel/bpf/bpf_cgrp_storage.c     | 3 ++-
 kernel/bpf/bpf_local_storage.c    | 4 ++++
 kernel/bpf/bpf_task_storage.c     | 3 ++-
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 8157e8da61d40..d8f2c5d63a80e 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -18,6 +18,7 @@
 #include <asm/rqspinlock.h>
 
 #define BPF_LOCAL_STORAGE_CACHE_SIZE	16
+#define KMALLOC_NOLOCK_SUPPORTED IS_ENABLED(CONFIG_HAVE_CMPXCHG_DOUBLE)
 
 struct bpf_local_storage_map_bucket {
 	struct hlist_head list;
diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c
index c2a2ead1f466d..cd18193c44058 100644
--- a/kernel/bpf/bpf_cgrp_storage.c
+++ b/kernel/bpf/bpf_cgrp_storage.c
@@ -114,7 +114,8 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
 
 static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 {
-	return bpf_local_storage_map_alloc(attr, &cgroup_cache, true);
+	return bpf_local_storage_map_alloc(attr, &cgroup_cache,
+					   KMALLOC_NOLOCK_SUPPORTED);
 }
 
 static void cgroup_storage_map_free(struct bpf_map *map)
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 9c96a4477f81a..a6c240da87668 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -893,6 +893,10 @@ bpf_local_storage_map_alloc(union bpf_attr *attr,
 	/* In PREEMPT_RT, kmalloc(GFP_ATOMIC) is still not safe in non
 	 * preemptible context. Thus, enforce all storages to use
 	 * kmalloc_nolock() when CONFIG_PREEMPT_RT is enabled.
+	 *
+	 * However, kmalloc_nolock would fail on architectures that do not
+	 * have CMPXCHG_DOUBLE. On such architectures with PREEMPT_RT,
+	 * bpf_local_storage_alloc would always fail.
 	 */
 	smap->use_kmalloc_nolock = IS_ENABLED(CONFIG_PREEMPT_RT) ? true : use_kmalloc_nolock;
 
diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
index 605506792b5b4..6e8597edea314 100644
--- a/kernel/bpf/bpf_task_storage.c
+++ b/kernel/bpf/bpf_task_storage.c
@@ -212,7 +212,8 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
 
 static struct bpf_map *task_storage_map_alloc(union bpf_attr *attr)
 {
-	return bpf_local_storage_map_alloc(attr, &task_cache, true);
+	return bpf_local_storage_map_alloc(attr, &task_cache,
+					   KMALLOC_NOLOCK_SUPPORTED);
 }
 
 static void task_storage_map_free(struct bpf_map *map)

---
base-commit: e06e6b8001233241eb5b2e2791162f0585f50f4b
change-id: 20260314-bpf-kmalloc-nolock-60da80e613de

Best regards,
-- 
Levi Zim <rsworktech@outlook.com>
Re: [PATCH bpf v3] bpf: do not use kmalloc_nolock when !HAVE_CMPXCHG_DOUBLE
Posted by Amery Hung 3 weeks ago
On Sat, Mar 14, 2026 at 9:02 AM Levi Zim via B4 Relay
<devnull+rsworktech.outlook.com@kernel.org> wrote:
>
> From: Levi Zim <rsworktech@outlook.com>
>
> kmalloc_nolock always fails for architectures that lack cmpxchg16b.
> For example, this causes bpf_task_storage_get with flag
> BPF_LOCAL_STORAGE_GET_F_CREATE to fail on riscv64 6.19 kernel.
>
> Fix it by enabling use_kmalloc_nolock only when HAVE_CMPXCHG_DOUBLE.
> But leave the PREEMPT_RT case as is because it requires kmalloc_nolock
> for correctness. Add a comment about this limitation that architecture's
> lack of CMPXCHG_DOUBLE combined with PREEMPT_RT could make
> bpf_local_storage_alloc always fail.

Let's not do this.

This re-introduces deadlock to local storage. In addition, local
storage will switch to using kmalloc_nolock() entirely.

For riscv hardware without the Zacas extension, I think a workaround,
with some performance overhead, is to enable CONFIG_SLUB_DEBUG and the
slub_debug boot option.

Re: [PATCH bpf v3] bpf: do not use kmalloc_nolock when !HAVE_CMPXCHG_DOUBLE
Posted by Paul Chaignon 3 weeks ago
On Sun, Mar 15, 2026 at 12:02:48AM +0800, Levi Zim via B4 Relay wrote:
> From: Levi Zim <rsworktech@outlook.com>
> 
> kmalloc_nolock always fails for architectures that lack cmpxchg16b.
> For example, this causes bpf_task_storage_get with flag
> BPF_LOCAL_STORAGE_GET_F_CREATE to fail on riscv64 6.19 kernel.
> 
> Fix it by enabling use_kmalloc_nolock only when HAVE_CMPXCHG_DOUBLE.
> But leave the PREEMPT_RT case as is because it requires kmalloc_nolock
> for correctness. Add a comment about this limitation that architecture's
> lack of CMPXCHG_DOUBLE combined with PREEMPT_RT could make
> bpf_local_storage_alloc always fail.
> 
> Fixes: f484f4a3e058 ("bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage")
> Cc: stable@vger.kernel.org
> Signed-off-by: Levi Zim <rsworktech@outlook.com>
> ---

Note there may be something broken with your setup as lore is reporting
that you sent this v3 email three times. Not sure if it could be an
issue.

[...]

> diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
> index 8157e8da61d40..d8f2c5d63a80e 100644
> --- a/include/linux/bpf_local_storage.h
> +++ b/include/linux/bpf_local_storage.h
> @@ -18,6 +18,7 @@
>  #include <asm/rqspinlock.h>
>  
>  #define BPF_LOCAL_STORAGE_CACHE_SIZE	16
> +#define KMALLOC_NOLOCK_SUPPORTED IS_ENABLED(CONFIG_HAVE_CMPXCHG_DOUBLE)
>  
>  struct bpf_local_storage_map_bucket {
>  	struct hlist_head list;
> diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c
> index c2a2ead1f466d..cd18193c44058 100644
> --- a/kernel/bpf/bpf_cgrp_storage.c
> +++ b/kernel/bpf/bpf_cgrp_storage.c
> @@ -114,7 +114,8 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
>  
>  static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
>  {
> -	return bpf_local_storage_map_alloc(attr, &cgroup_cache, true);
> +	return bpf_local_storage_map_alloc(attr, &cgroup_cache,
> +					   KMALLOC_NOLOCK_SUPPORTED);
>  }
>  
>  static void cgroup_storage_map_free(struct bpf_map *map)
> diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> index 9c96a4477f81a..a6c240da87668 100644
> --- a/kernel/bpf/bpf_local_storage.c
> +++ b/kernel/bpf/bpf_local_storage.c
> @@ -893,6 +893,10 @@ bpf_local_storage_map_alloc(union bpf_attr *attr,
>  	/* In PREEMPT_RT, kmalloc(GFP_ATOMIC) is still not safe in non
>  	 * preemptible context. Thus, enforce all storages to use
>  	 * kmalloc_nolock() when CONFIG_PREEMPT_RT is enabled.
> +	 *
> +	 * However, kmalloc_nolock would fail on architectures that do not
> +	 * have CMPXCHG_DOUBLE. On such architectures with PREEMPT_RT,
> +	 * bpf_local_storage_alloc would always fail.
>  	 */
>  	smap->use_kmalloc_nolock = IS_ENABLED(CONFIG_PREEMPT_RT) ? true : use_kmalloc_nolock;
>  
> diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
> index 605506792b5b4..6e8597edea314 100644
> --- a/kernel/bpf/bpf_task_storage.c
> +++ b/kernel/bpf/bpf_task_storage.c
> @@ -212,7 +212,8 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
>  
>  static struct bpf_map *task_storage_map_alloc(union bpf_attr *attr)
>  {
> -	return bpf_local_storage_map_alloc(attr, &task_cache, true);
> +	return bpf_local_storage_map_alloc(attr, &task_cache,
> +					   KMALLOC_NOLOCK_SUPPORTED);

I can confirm that this fixes one selftest using
BPF_LOCAL_STORAGE_GET_F_CREATE on riscv64: test_ls_map_kptr_ref1 in
map_kptr. Other tests using BPF_LOCAL_STORAGE_GET_F_CREATE are still
failing, so I guess they have other issues.

Tested-by: Paul Chaignon <paul.chaignon@gmail.com>

Re: [PATCH bpf v3] bpf: do not use kmalloc_nolock when !HAVE_CMPXCHG_DOUBLE
Posted by Yao Zi 3 weeks ago
On Mon, Mar 16, 2026 at 04:05:14PM +0100, Paul Chaignon wrote:
> On Sun, Mar 15, 2026 at 12:02:48AM +0800, Levi Zim via B4 Relay wrote:
> > From: Levi Zim <rsworktech@outlook.com>
> > 
> > kmalloc_nolock always fails for architectures that lack cmpxchg16b.
> > For example, this causes bpf_task_storage_get with flag
> > BPF_LOCAL_STORAGE_GET_F_CREATE to fail on riscv64 6.19 kernel.
> > 
> > Fix it by enabling use_kmalloc_nolock only when HAVE_CMPXCHG_DOUBLE.
> > But leave the PREEMPT_RT case as is because it requires kmalloc_nolock
> > for correctness. Add a comment about this limitation that architecture's
> > lack of CMPXCHG_DOUBLE combined with PREEMPT_RT could make
> > bpf_local_storage_alloc always fail.
> > 
> > Fixes: f484f4a3e058 ("bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Levi Zim <rsworktech@outlook.com>
> > ---
> 
> Note there may be something broken with your setup as lore is reporting
> that you sent this v3 email three times. Not sure if it could be an
> issue.

One of the extra copies is because linux-riscv@lists.infradead.org adds
a trailer when forwarding messages but keeps the Message-ID unchanged,
so lore indexed an extra message with the same ID but different content.
That was not Levi doing something wrong.

The other extra copy has the same content but a different From line;
I'm not sure what happened to it. The differences between the messages
can be viewed here[1].

Regards,
Yao Zi

[1]: https://lore.kernel.org/all/20260315-bpf-kmalloc-nolock-v3-1-91c72bf91902@outlook.com/d/