[PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()

Byeonguk Jeong posted 1 patch 1 month ago
Posted by Byeonguk Jeong 1 month ago
trie_get_next_key() allocates a node stack with trie->max_prefixlen
entries, while it writes (trie->max_prefixlen + 1) nodes to the stack
when the trie has a full path from the root to a leaf. For example,
consider a trie with max_prefixlen = 8 and the nodes with keys 0x00/0,
0x00/1, 0x00/2, ..., 0x00/8 inserted. A subsequent call to
trie_get_next_key() with a _key whose .prefixlen = 8 writes 9 nodes to
the 8-entry node stack.

Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map")
Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
---
 kernel/bpf/lpm_trie.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 0218a5132ab5..9b60eda0f727 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -655,7 +655,7 @@ static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)
 	if (!key || key->prefixlen > trie->max_prefixlen)
 		goto find_leftmost;
 
-	node_stack = kmalloc_array(trie->max_prefixlen,
+	node_stack = kmalloc_array(trie->max_prefixlen + 1,
 				   sizeof(struct lpm_trie_node *),
 				   GFP_ATOMIC | __GFP_NOWARN);
 	if (!node_stack)
-- 
2.43.5
Re: [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
Posted by Hou Tao 1 month ago

On 10/22/2024 9:45 AM, Byeonguk Jeong wrote:
> trie_get_next_key() allocates a node stack with trie->max_prefixlen
> entries, while it writes (trie->max_prefixlen + 1) nodes to the stack
> when the trie has a full path from the root to a leaf. For example,
> consider a trie with max_prefixlen = 8 and the nodes with keys 0x00/0,
> 0x00/1, 0x00/2, ..., 0x00/8 inserted. A subsequent call to
> trie_get_next_key() with a _key whose .prefixlen = 8 writes 9 nodes to
> the 8-entry node stack.
>
> Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map")
> Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
> ---

Tested-by: Hou Tao <houtao1@huawei.com>

Without the fix, dumping all keys in the lpm-trie through
bpf_map_get_next_key() triggers a KASAN report like the one below.

However, I have a dumb question: does it make sense to reject an
element with prefixlen = 0? I can't think of a use case where a
zero-length prefix would be useful.


 ==================================================================
 BUG: KASAN: slab-out-of-bounds in trie_get_next_key+0x133/0x530
 Write of size 8 at addr ffff8881076c2fc0 by task test_lpm_trie.b/446

 CPU: 0 UID: 0 PID: 446 Comm: test_lpm_trie.b Not tainted 6.11.0+ #52
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
 Call Trace:
  <TASK>
  dump_stack_lvl+0x6e/0xb0
  print_report+0xce/0x610
  ? trie_get_next_key+0x133/0x530
  ? kasan_complete_mode_report_info+0x3c/0x200
  ? trie_get_next_key+0x133/0x530
  kasan_report+0x9c/0xd0
  ? trie_get_next_key+0x133/0x530
  __asan_store8+0x81/0xb0
  trie_get_next_key+0x133/0x530
  __sys_bpf+0x1b03/0x3140
  ? __pfx___sys_bpf+0x10/0x10
  ? __pfx_vfs_write+0x10/0x10
  ? find_held_lock+0x8e/0xb0
  ? ksys_write+0xee/0x180
  ? syscall_exit_to_user_mode+0xb3/0x220
  ? mark_held_locks+0x28/0x90
  ? mark_held_locks+0x28/0x90
  __x64_sys_bpf+0x45/0x60
  x64_sys_call+0x1b2a/0x20d0
  do_syscall_64+0x5d/0x100
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7f9c5e9c9c5d
  ......
  </TASK>
 Allocated by task 446:
  kasan_save_stack+0x28/0x50
  kasan_save_track+0x14/0x30
  kasan_save_alloc_info+0x36/0x40
  __kasan_kmalloc+0x84/0xa0
  __kmalloc_noprof+0x214/0x540
  trie_get_next_key+0xa7/0x530
  __sys_bpf+0x1b03/0x3140
  __x64_sys_bpf+0x45/0x60
  x64_sys_call+0x1b2a/0x20d0
  do_syscall_64+0x5d/0x100
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

 The buggy address belongs to the object at ffff8881076c2f80
  which belongs to the cache kmalloc-rnd-09-64 of size 64
 The buggy address is located 0 bytes to the right of
  allocated 64-byte region [ffff8881076c2f80, ffff8881076c2fc0)

>  kernel/bpf/lpm_trie.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index 0218a5132ab5..9b60eda0f727 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -655,7 +655,7 @@ static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)
>  	if (!key || key->prefixlen > trie->max_prefixlen)
>  		goto find_leftmost;
>  
> -	node_stack = kmalloc_array(trie->max_prefixlen,
> +	node_stack = kmalloc_array(trie->max_prefixlen + 1,
>  				   sizeof(struct lpm_trie_node *),
>  				   GFP_ATOMIC | __GFP_NOWARN);
>  	if (!node_stack)


Re: [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
Posted by Byeonguk Jeong 1 month ago
On Wed, Oct 23, 2024 at 10:03:44AM +0800, Hou Tao wrote:
>
> Without the fix, dumping all keys in the lpm-trie through
> bpf_map_get_next_key() triggers a KASAN report like the one below.

Thank you for testing.

> 
> However, I have a dumb question: does it make sense to reject an
> element with prefixlen = 0? I can't think of a use case where a
> zero-length prefix would be useful.

With .prefixlen = 0, BPF_MAP_GET_NEXT_KEY would always return -ENOENT,
I think. Maybe it would be good to reject such elements earlier!
Re: [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
Posted by Hou Tao 1 month ago
Hi,

On 10/23/2024 3:30 PM, Byeonguk Jeong wrote:
> On Wed, Oct 23, 2024 at 10:03:44AM +0800, Hou Tao wrote:
>> Without the fix, dumping all keys in the lpm-trie through
>> bpf_map_get_next_key() triggers a KASAN report like the one below.
> Thank you for testing.

Alexei suggested adding a bpf selftest for the patch.  I think you
could use the code in lpm_trie_map_batch_ops.c [1] or similar as a
reference and add a new file that uses bpf_map_get_next_key() to
demonstrate the out-of-bounds problem. The test can be run with
./test_maps. The procedure is documented in [2].

[1]: tools/testing/selftests/bpf/map_tests/lpm_trie_map_batch_ops.c
[2]: https://github.com/torvalds/linux/blob/master/Documentation/bpf/bpf_devel_QA.rst
>
>> However, I have a dumb question: does it make sense to reject an
>> element with prefixlen = 0? I can't think of a use case where a
>> zero-length prefix would be useful.
> With .prefixlen = 0, BPF_MAP_GET_NEXT_KEY would always return -ENOENT,
> I think. Maybe it would be good to reject such elements earlier!
>

Which operation will return -ENOENT? I think the element with
prefixlen = 0 can still be found through a key with prefixlen = 0.

Re: [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
Posted by Byeonguk Jeong 1 month ago
Hi,

On Wed, Oct 23, 2024 at 05:59:53PM +0800, Hou Tao wrote:
> Alexei suggested adding a bpf selftest for the patch.  I think you
> could use the code in lpm_trie_map_batch_ops.c [1] or similar as a
> reference and add a new file that uses bpf_map_get_next_key() to
> demonstrate the out-of-bounds problem. The test can be run with
> ./test_maps. The procedure is documented in [2].
> 
> [1]: tools/testing/selftests/bpf/map_tests/lpm_trie_map_batch_ops.c
> [2]: https://github.com/torvalds/linux/blob/master/Documentation/bpf/bpf_devel_QA.rst

Okay, I will add a new test. Thanks for the detailed guideline.

> Which operation will return -ENOENT? I think the element with
> prefixlen = 0 can still be found through a key with prefixlen = 0.

I mean, BPF_MAP_GET_NEXT_KEY with .prefixlen = 0 would give us -ENOENT,
as it traverses the trie in postorder. BPF_MAP_LOOKUP_ELEM still finds
the element with prefixlen 0 through the key with prefixlen 0, as you said.
Re: [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
Posted by Hou Tao 1 month ago

On 10/24/2024 9:48 AM, Byeonguk Jeong wrote:
> Hi,
>
> On Wed, Oct 23, 2024 at 05:59:53PM +0800, Hou Tao wrote:
>> Alexei suggested adding a bpf selftest for the patch.  I think you
>> could use the code in lpm_trie_map_batch_ops.c [1] or similar as a
>> reference and add a new file that uses bpf_map_get_next_key() to
>> demonstrate the out-of-bounds problem. The test can be run with
>> ./test_maps. The procedure is documented in [2].
>>
>> [1]: tools/testing/selftests/bpf/map_tests/lpm_trie_map_batch_ops.c
>> [2]: https://github.com/torvalds/linux/blob/master/Documentation/bpf/bpf_devel_QA.rst
> Okay, I will add a new test. Thanks for the detailed guideline.
>
>> Which operation will return -ENOENT? I think the element with
>> prefixlen = 0 can still be found through a key with prefixlen = 0.
> I mean, BPF_MAP_GET_NEXT_KEY with .prefixlen = 0 would give us -ENOENT,
> as it traverses the trie in postorder. BPF_MAP_LOOKUP_ELEM still finds
> the element with prefixlen 0 through the key with prefixlen 0, as you said.

I see. But since the element with .prefixlen = 0 is the last one in
the map's iteration order, returning -ENOENT is expected.

Re: [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
Posted by Alexei Starovoitov 1 month ago
On Mon, Oct 21, 2024 at 6:49 PM Byeonguk Jeong <jungbu2855@gmail.com> wrote:
>
> trie_get_next_key() allocates a node stack with trie->max_prefixlen
> entries, while it writes (trie->max_prefixlen + 1) nodes to the stack
> when the trie has a full path from the root to a leaf. For example,
> consider a trie with max_prefixlen = 8 and the nodes with keys 0x00/0,
> 0x00/1, 0x00/2, ..., 0x00/8 inserted. A subsequent call to
> trie_get_next_key() with a _key whose .prefixlen = 8 writes 9 nodes to
> the 8-entry node stack.

Hmm. It sounds possible, but pls demonstrate it with a selftest.
With the amount of fuzzing I'm surprised it was not discovered earlier.

pw-bot: cr
Re: [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
Posted by Byeonguk Jeong 1 month ago
On Tue, Oct 22, 2024 at 12:51:05PM -0700, Alexei Starovoitov wrote:
> On Mon, Oct 21, 2024 at 6:49 PM Byeonguk Jeong <jungbu2855@gmail.com> wrote:
> >
> > trie_get_next_key() allocates a node stack with trie->max_prefixlen
> > entries, while it writes (trie->max_prefixlen + 1) nodes to the stack
> > when the trie has a full path from the root to a leaf. For example,
> > consider a trie with max_prefixlen = 8 and the nodes with keys 0x00/0,
> > 0x00/1, 0x00/2, ..., 0x00/8 inserted. A subsequent call to
> > trie_get_next_key() with a _key whose .prefixlen = 8 writes 9 nodes to
> > the 8-entry node stack.
> 
> Hmm. It sounds possible, but pls demonstrate it with a selftest.
> With the amount of fuzzing I'm surprised it was not discovered earlier.
> 
> pw-bot: cr

I am sending this again because lkml did not process the previous one,
which was 8bit encoded.

With the simple test below, the kernel crashes within a minute, or you
can easily spot the bug on a KFENCE-enabled kernel.

#!/bin/bash
bpftool map create /sys/fs/bpf/lpm type lpm_trie key 5 value 1 \
entries 16 flags 0x1 name lpm

for i in {0..8}; do
	bpftool map update pinned /sys/fs/bpf/lpm \
	key hex 0$i 00 00 00 00 \
	value hex 00 any
done

while true; do
	bpftool map dump pinned /sys/fs/bpf/lpm
done

In my environment (6.12-rc4, with CONFIG_KFENCE), dmesg gave me this
message as expected.

[  463.141394] BUG: KFENCE: out-of-bounds write in trie_get_next_key+0x2f2/0x670

[  463.143422] Out-of-bounds write at 0x0000000095bc45ea (256B right of kfence-#156):
[  463.144438]  trie_get_next_key+0x2f2/0x670
[  463.145439]  map_get_next_key+0x261/0x410
[  463.146444]  __sys_bpf+0xad4/0x1170
[  463.147438]  __x64_sys_bpf+0x74/0xc0
[  463.148431]  do_syscall_64+0x79/0x150
[  463.149425]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

[  463.151436] kfence-#156: 0x00000000279749c1-0x0000000034dc4abb, size=256, cache=kmalloc-256

[  463.153414] allocated by task 2021 on cpu 2 at 463.140440s (0.012974s ago):
[  463.154413]  trie_get_next_key+0x252/0x670
[  463.155411]  map_get_next_key+0x261/0x410
[  463.156402]  __sys_bpf+0xad4/0x1170
[  463.157390]  __x64_sys_bpf+0x74/0xc0
[  463.158386]  do_syscall_64+0x79/0x150
[  463.159372]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Re: [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
Posted by Toke Høiland-Jørgensen 1 month ago
Byeonguk Jeong <jungbu2855@gmail.com> writes:

> trie_get_next_key() allocates a node stack with trie->max_prefixlen
> entries, while it writes (trie->max_prefixlen + 1) nodes to the stack
> when the trie has a full path from the root to a leaf. For example,
> consider a trie with max_prefixlen = 8 and the nodes with keys 0x00/0,
> 0x00/1, 0x00/2, ..., 0x00/8 inserted. A subsequent call to
> trie_get_next_key() with a _key whose .prefixlen = 8 writes 9 nodes to
> the 8-entry node stack.
>
> Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map")
> Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>

Makes sense!

Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
[PATCH] selftests/bpf: Add test for trie_get_next_key()
Posted by Byeonguk Jeong 1 month ago
Add a test for the out-of-bounds write in trie_get_next_key() that
occurs when a full path from the root to a leaf exists and
bpf_map_get_next_key() is called with the leaf's key. The test may
crash the kernel on failure, so please run it in a VM.

Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
---
 .../bpf/map_tests/lpm_trie_map_get_next_key.c | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c

diff --git a/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c b/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c
new file mode 100644
index 000000000000..85b916b69411
--- /dev/null
+++ b/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * WARNING
+ * -------
+ *  This test suite may crash the kernel, thus should be run in a VM.
+ */
+
+#define _GNU_SOURCE
+#include <linux/bpf.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <unistd.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <pthread.h>
+
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include <test_maps.h>
+
+struct test_lpm_key {
+	__u32 prefix;
+	__u32 data;
+};
+
+struct get_next_key_ctx {
+	struct test_lpm_key key;
+	bool start;
+	bool stop;
+	int map_fd;
+	int loop;
+};
+
+static void *get_next_key_fn(void *arg)
+{
+	struct get_next_key_ctx *ctx = arg;
+	struct test_lpm_key next_key;
+	int i;
+
+	while (!ctx->start)
+		usleep(1);
+
+	while (!ctx->stop && i++ < ctx->loop)
+		bpf_map_get_next_key(ctx->map_fd, &ctx->key, &next_key);
+
+	return NULL;
+}
+
+static void abort_get_next_key(struct get_next_key_ctx *ctx, pthread_t *tids,
+			       unsigned int nr)
+{
+	unsigned int i;
+
+	ctx->stop = true;
+	ctx->start = true;
+	for (i = 0; i < nr; i++)
+		pthread_join(tids[i], NULL);
+}
+
+/* This test aims to prevent future regressions. As long as the kernel
+ * does not panic, the test is considered a success.
+ */
+void test_lpm_trie_map_get_next_key(void)
+{
+#define MAX_NR_THREADS 256
+	LIBBPF_OPTS(bpf_map_create_opts, create_opts,
+		    .map_flags = BPF_F_NO_PREALLOC);
+	struct test_lpm_key key = {};
+	__u32 val = 0;
+	int map_fd;
+	const __u32 max_prefixlen = 8 * (sizeof(key) - sizeof(key.prefix));
+	const __u32 max_entries = max_prefixlen + 1;
+	unsigned int i, nr = MAX_NR_THREADS, loop = 4096;
+	pthread_t tids[MAX_NR_THREADS];
+	struct get_next_key_ctx ctx;
+	int err;
+
+	map_fd = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE, "lpm_trie_map",
+				sizeof(struct test_lpm_key), sizeof(__u32),
+				max_entries, &create_opts);
+	CHECK(map_fd == -1, "bpf_map_create(), error:%s\n",
+	      strerror(errno));
+
+	for (i = 0; i <= max_prefixlen; i++) {
+		key.prefix = i;
+		err = bpf_map_update_elem(map_fd, &key, &val, BPF_ANY);
+		CHECK(err, "bpf_map_update_elem()", "error:%s\n",
+		      strerror(errno));
+	}
+
+	ctx.start = false;
+	ctx.stop = false;
+	ctx.map_fd = map_fd;
+	ctx.loop = loop;
+	memcpy(&ctx.key, &key, sizeof(key));
+
+	for (i = 0; i < nr; i++) {
+		err = pthread_create(&tids[i], NULL, get_next_key_fn, &ctx);
+		if (err) {
+			abort_get_next_key(&ctx, tids, i);
+			CHECK(err, "pthread_create", "error %d\n", err);
+		}
+	}
+
+	ctx.start = true;
+	for (i = 0; i < nr; i++)
+		pthread_join(tids[i], NULL);
+
+	printf("%s:PASS\n", __func__);
+
+	close(map_fd);
+}
-- 
2.43.5
Re: [PATCH] selftests/bpf: Add test for trie_get_next_key()
Posted by Hou Tao 1 month ago
Hi,

On 10/24/2024 5:08 PM, Byeonguk Jeong wrote:
> Add a test for the out-of-bounds write in trie_get_next_key() that
> occurs when a full path from the root to a leaf exists and
> bpf_map_get_next_key() is called with the leaf's key. The test may
> crash the kernel on failure, so please run it in a VM.
>
> Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
> ---
>  .../bpf/map_tests/lpm_trie_map_get_next_key.c | 115 ++++++++++++++++++
>  1 file changed, 115 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c
>
> diff --git a/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c b/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c
> new file mode 100644
> index 000000000000..85b916b69411
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c
> @@ -0,0 +1,115 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * WARNING
> + * -------
> + *  This test suite may crash the kernel, thus should be run in a VM.
> + */
> +

The comment above is unnecessary; please remove it.
> +#define _GNU_SOURCE
> +#include <linux/bpf.h>
> +#include <stdio.h>
> +#include <stdbool.h>
> +#include <unistd.h>
> +#include <errno.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <pthread.h>
> +
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +
> +#include <test_maps.h>
> +
> +struct test_lpm_key {
> +	__u32 prefix;
> +	__u32 data;
> +};
> +
> +struct get_next_key_ctx {
> +	struct test_lpm_key key;
> +	bool start;
> +	bool stop;
> +	int map_fd;
> +	int loop;
> +};
> +
> +static void *get_next_key_fn(void *arg)
> +{
> +	struct get_next_key_ctx *ctx = arg;
> +	struct test_lpm_key next_key;
> +	int i;

int i = 0;
> +
> +	while (!ctx->start)
> +		usleep(1);
> +
> +	while (!ctx->stop && i++ < ctx->loop)
> +		bpf_map_get_next_key(ctx->map_fd, &ctx->key, &next_key);
> +
> +	return NULL;
> +}
> +
> +static void abort_get_next_key(struct get_next_key_ctx *ctx, pthread_t *tids,
> +			       unsigned int nr)
> +{
> +	unsigned int i;
> +
> +	ctx->stop = true;
> +	ctx->start = true;
> +	for (i = 0; i < nr; i++)
> +		pthread_join(tids[i], NULL);
> +}
> +
> +/* This test aims to prevent future regressions. As long as the kernel
> + * does not panic, the test is considered a success.
> + */
> +void test_lpm_trie_map_get_next_key(void)
> +{
> +#define MAX_NR_THREADS 256

Are 8 threads sufficient to reproduce the problem?
> +	LIBBPF_OPTS(bpf_map_create_opts, create_opts,
> +		    .map_flags = BPF_F_NO_PREALLOC);
> +	struct test_lpm_key key = {};
> +	__u32 val = 0;
> +	int map_fd;
> +	const __u32 max_prefixlen = 8 * (sizeof(key) - sizeof(key.prefix));
> +	const __u32 max_entries = max_prefixlen + 1;
> +	unsigned int i, nr = MAX_NR_THREADS, loop = 4096;
> +	pthread_t tids[MAX_NR_THREADS];
> +	struct get_next_key_ctx ctx;
> +	int err;
> +
> +	map_fd = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE, "lpm_trie_map",
> +				sizeof(struct test_lpm_key), sizeof(__u32),
> +				max_entries, &create_opts);
> +	CHECK(map_fd == -1, "bpf_map_create(), error:%s\n",
> +	      strerror(errno));

This should be:
CHECK(map_fd == -1, "bpf_map_create()", "error:%s\n", strerror(errno));
It seems you didn't build-test it.
> +
> +	for (i = 0; i <= max_prefixlen; i++) {
> +		key.prefix = i;
> +		err = bpf_map_update_elem(map_fd, &key, &val, BPF_ANY);
> +		CHECK(err, "bpf_map_update_elem()", "error:%s\n",
> +		      strerror(errno));
> +	}
> +
> +	ctx.start = false;
> +	ctx.stop = false;
> +	ctx.map_fd = map_fd;
> +	ctx.loop = loop;
> +	memcpy(&ctx.key, &key, sizeof(key));
> +
> +	for (i = 0; i < nr; i++) {
> +		err = pthread_create(&tids[i], NULL, get_next_key_fn, &ctx);
> +		if (err) {
> +			abort_get_next_key(&ctx, tids, i);
> +			CHECK(err, "pthread_create", "error %d\n", err);
> +		}
> +	}
> +
> +	ctx.start = true;
> +	for (i = 0; i < nr; i++)
> +		pthread_join(tids[i], NULL);
> +
> +	printf("%s:PASS\n", __func__);
> +
> +	close(map_fd);
> +}
Re: [PATCH] selftests/bpf: Add test for trie_get_next_key()
Posted by Daniel Borkmann 1 month ago
Hi Byeonguk,

On 10/24/24 11:08 AM, Byeonguk Jeong wrote:
> Add a test for the out-of-bounds write in trie_get_next_key() that
> occurs when a full path from the root to a leaf exists and
> bpf_map_get_next_key() is called with the leaf's key. The test may
> crash the kernel on failure, so please run it in a VM.
> 
> Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>

Could you submit the fix + this selftest as a 2-patch series, otherwise BPF CI
cannot test both in combination (pls make sure subject has [PATCH bpf] so that
our CI adds this on top of the bpf tree).

Right now the CI selftest build threw an error:

   /tmp/work/bpf/bpf/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c: In function ‘test_lpm_trie_map_get_next_key’:
   /tmp/work/bpf/bpf/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c:84:9: error: format not a string literal and no format arguments [-Werror=format-security]
      84 |         CHECK(map_fd == -1, "bpf_map_create(), error:%s\n",
         |         ^~~~~
     TEST-OBJ [test_maps] task_storage_map.test.o
     TEST-OBJ [test_progs] access_variable_array.test.o
   cc1: all warnings being treated as errors
     TEST-OBJ [test_progs] align.test.o
     TEST-OBJ [test_progs] arena_atomics.test.o
   make: *** [Makefile:765: /tmp/work/bpf/bpf/tools/testing/selftests/bpf/lpm_trie_map_get_next_key.test.o] Error 1
   make: *** Waiting for unfinished jobs....
     GEN-SKEL [test_progs-no_alu32] test_usdt.skel.h
   make: Leaving directory '/tmp/work/bpf/bpf/tools/testing/selftests/bpf'

Also on quick glance, please use ASSERT_*() macros instead of CHECK() as the
latter is deprecated.

Thanks,
Daniel
Re: [PATCH] selftests/bpf: Add test for trie_get_next_key()
Posted by Byeonguk Jeong 1 month ago
Hi Daniel,

Okay, I will submit them in a series of patches. Btw, ASSERT_* macros
are not defined for map_tests. Should I add the definitions for them,
or just go with CHECK?

Thanks,
Byeonguk

On Thu, Oct 24, 2024 at 11:41:19AM +0200, Daniel Borkmann wrote:
> Hi Byeonguk,
> 
> On 10/24/24 11:08 AM, Byeonguk Jeong wrote:
> > Add a test for the out-of-bounds write in trie_get_next_key() that
> > occurs when a full path from the root to a leaf exists and
> > bpf_map_get_next_key() is called with the leaf's key. The test may
> > crash the kernel on failure, so please run it in a VM.
> > 
> > Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
> 
> Could you submit the fix + this selftest as a 2-patch series, otherwise BPF CI
> cannot test both in combination (pls make sure subject has [PATCH bpf] so that
> our CI adds this on top of the bpf tree).
> 
> Right now the CI selftest build threw an error:
> 
>   /tmp/work/bpf/bpf/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c: In function ‘test_lpm_trie_map_get_next_key’:
>   /tmp/work/bpf/bpf/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c:84:9: error: format not a string literal and no format arguments [-Werror=format-security]
>      84 |         CHECK(map_fd == -1, "bpf_map_create(), error:%s\n",
>         |         ^~~~~
>     TEST-OBJ [test_maps] task_storage_map.test.o
>     TEST-OBJ [test_progs] access_variable_array.test.o
>   cc1: all warnings being treated as errors
>     TEST-OBJ [test_progs] align.test.o
>     TEST-OBJ [test_progs] arena_atomics.test.o
>   make: *** [Makefile:765: /tmp/work/bpf/bpf/tools/testing/selftests/bpf/lpm_trie_map_get_next_key.test.o] Error 1
>   make: *** Waiting for unfinished jobs....
>     GEN-SKEL [test_progs-no_alu32] test_usdt.skel.h
>   make: Leaving directory '/tmp/work/bpf/bpf/tools/testing/selftests/bpf'
> 
> Also on quick glance, please use ASSERT_*() macros instead of CHECK() as the
> latter is deprecated.
> 
> Thanks,
> Daniel
Re: [PATCH] selftests/bpf: Add test for trie_get_next_key()
Posted by Hou Tao 1 month ago
Hi,

On 10/25/2024 6:26 AM, Byeonguk Jeong wrote:
> Hi Daniel,
>
> Okay, I will submit them in a series of patches. Btw, ASSERT_* macros
> are not defined for map_tests. Should I add the definitions for them,
> or just go with CHECK?

For tests in map_tests, I think using CHECK() will be fine.
>
> Thanks,
> Byeonguk
>
> On Thu, Oct 24, 2024 at 11:41:19AM +0200, Daniel Borkmann wrote:
>> Hi Byeonguk,
>>
>> On 10/24/24 11:08 AM, Byeonguk Jeong wrote:
>>> Add a test for the out-of-bounds write in trie_get_next_key() that
>>> occurs when a full path from the root to a leaf exists and
>>> bpf_map_get_next_key() is called with the leaf's key. The test may
>>> crash the kernel on failure, so please run it in a VM.
>>>
>>> Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
>> Could you submit the fix + this selftest as a 2-patch series, otherwise BPF CI
>> cannot test both in combination (pls make sure subject has [PATCH bpf] so that
>> our CI adds this on top of the bpf tree).
>>
>> Right now the CI selftest build threw an error:
>>
>>   /tmp/work/bpf/bpf/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c: In function ‘test_lpm_trie_map_get_next_key’:
>>   /tmp/work/bpf/bpf/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c:84:9: error: format not a string literal and no format arguments [-Werror=format-security]
>>      84 |         CHECK(map_fd == -1, "bpf_map_create(), error:%s\n",
>>         |         ^~~~~
>>     TEST-OBJ [test_maps] task_storage_map.test.o
>>     TEST-OBJ [test_progs] access_variable_array.test.o
>>   cc1: all warnings being treated as errors
>>     TEST-OBJ [test_progs] align.test.o
>>     TEST-OBJ [test_progs] arena_atomics.test.o
>>   make: *** [Makefile:765: /tmp/work/bpf/bpf/tools/testing/selftests/bpf/lpm_trie_map_get_next_key.test.o] Error 1
>>   make: *** Waiting for unfinished jobs....
>>     GEN-SKEL [test_progs-no_alu32] test_usdt.skel.h
>>   make: Leaving directory '/tmp/work/bpf/bpf/tools/testing/selftests/bpf'
>>
>> Also on quick glance, please use ASSERT_*() macros instead of CHECK() as the
>> latter is deprecated.
>>
>> Thanks,
>> Daniel

Re: [PATCH] selftests/bpf: Add test for trie_get_next_key()
Posted by Daniel Borkmann 1 month ago
On 10/25/24 1:54 PM, Hou Tao wrote:
> On 10/25/2024 6:26 AM, Byeonguk Jeong wrote:
>>
>> Okay, I will submit them in a series of patches. Btw, ASSERT_* macros
>> are not defined for map_tests. Should I add the definitions for them,
>> or just go with CHECK?
> 
> For tests in map_tests, I think using CHECK() will be fine.

Given there is no alternative infra, agree. Would be nice to convert this
over at some point.

Best,
Daniel