[PATCH bpf-next 1/3] bpf: Avoid unintended eviction when updating lru_hash maps

Leon Hwang posted 3 patches 2 months, 1 week ago
[PATCH bpf-next 1/3] bpf: Avoid unintended eviction when updating lru_hash maps
Posted by Leon Hwang 2 months, 1 week ago
When updating an existing element in lru_hash maps, the current
implementation always calls prealloc_lru_pop() to get a new node before
checking if the key already exists. If the map is full, this triggers
LRU eviction and removes an existing element, even though the update
operation only needs to modify the value of an existing key in-place.

This is problematic because:
1. Users may unexpectedly lose entries when doing simple value updates
2. The eviction overhead is unnecessary for existing key updates

Fix this by first checking if the key exists before allocating a new
node. If the key is found, update the value in-place, refresh the LRU
reference, and return immediately without triggering any eviction.

Fixes: 29ba732acbee ("bpf: Add BPF_MAP_TYPE_LRU_HASH")
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 kernel/bpf/hashtab.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index c8a9b27f8663..fb624aa76573 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1207,6 +1207,27 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
 	b = __select_bucket(htab, hash);
 	head = &b->head;
 
+	ret = htab_lock_bucket(b, &flags);
+	if (ret)
+		goto err_lock_bucket;
+
+	l_old = lookup_elem_raw(head, hash, key, key_size);
+
+	ret = check_flags(htab, l_old, map_flags);
+	if (ret)
+		goto err;
+
+	if (l_old) {
+		bpf_lru_node_set_ref(&l_old->lru_node);
+		copy_map_value(&htab->map, htab_elem_value(l_old, map->key_size), value);
+		check_and_free_fields(htab, l_old);
+	}
+
+	htab_unlock_bucket(b, flags);
+
+	if (l_old)
+		return 0;
+
 	/* For LRU, we need to alloc before taking bucket's
 	 * spinlock because getting free nodes from LRU may need
 	 * to remove older elements from htab and this removal
-- 
2.52.0
Re: [PATCH bpf-next 1/3] bpf: Avoid unintended eviction when updating lru_hash maps
Posted by Alexei Starovoitov 2 months, 1 week ago
On Tue, Dec 2, 2025 at 7:31 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> When updating an existing element in lru_hash maps, the current
> implementation always calls prealloc_lru_pop() to get a new node before
> checking if the key already exists. If the map is full, this triggers
> LRU eviction and removes an existing element, even though the update
> operation only needs to modify the value of an existing key in-place.
>
> This is problematic because:
> 1. Users may unexpectedly lose entries when doing simple value updates
> 2. The eviction overhead is unnecessary for existing key updates
>
> Fix this by first checking if the key exists before allocating a new
> node. If the key is found, update the value in-place, refresh the LRU
> reference, and return immediately without triggering any eviction.
>
> Fixes: 29ba732acbee ("bpf: Add BPF_MAP_TYPE_LRU_HASH")
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  kernel/bpf/hashtab.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index c8a9b27f8663..fb624aa76573 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -1207,6 +1207,27 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
>         b = __select_bucket(htab, hash);
>         head = &b->head;
>
> +       ret = htab_lock_bucket(b, &flags);
> +       if (ret)
> +               goto err_lock_bucket;
> +
> +       l_old = lookup_elem_raw(head, hash, key, key_size);
> +
> +       ret = check_flags(htab, l_old, map_flags);
> +       if (ret)
> +               goto err;
> +
> +       if (l_old) {
> +               bpf_lru_node_set_ref(&l_old->lru_node);
> +               copy_map_value(&htab->map, htab_elem_value(l_old, map->key_size), value);
> +               check_and_free_fields(htab, l_old);
> +       }

We cannot do this. It breaks the atomicity of the update.
We added htab_map_update_elem_in_place() for a very specific case.
See
https://lore.kernel.org/all/20250401062250.543403-1-houtao@huaweicloud.com/
and discussion in v1,v2.

We cannot do in-place updates for other map types.
It will break user expectations.

pw-bot: cr
Re: [PATCH bpf-next 1/3] bpf: Avoid unintended eviction when updating lru_hash maps
Posted by Leon Hwang 2 months ago

On 2025/12/3 02:10, Alexei Starovoitov wrote:
> On Tue, Dec 2, 2025 at 7:31 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> When updating an existing element in lru_hash maps, the current
>> implementation always calls prealloc_lru_pop() to get a new node before
>> checking if the key already exists. If the map is full, this triggers
>> LRU eviction and removes an existing element, even though the update
>> operation only needs to modify the value of an existing key in-place.
>>
>> This is problematic because:
>> 1. Users may unexpectedly lose entries when doing simple value updates
>> 2. The eviction overhead is unnecessary for existing key updates
>>
>> Fix this by first checking if the key exists before allocating a new
>> node. If the key is found, update the value in-place, refresh the LRU
>> reference, and return immediately without triggering any eviction.
>>
>> Fixes: 29ba732acbee ("bpf: Add BPF_MAP_TYPE_LRU_HASH")
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>>  kernel/bpf/hashtab.c | 21 +++++++++++++++++++++
>>  1 file changed, 21 insertions(+)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index c8a9b27f8663..fb624aa76573 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -1207,6 +1207,27 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
>>         b = __select_bucket(htab, hash);
>>         head = &b->head;
>>
>> +       ret = htab_lock_bucket(b, &flags);
>> +       if (ret)
>> +               goto err_lock_bucket;
>> +
>> +       l_old = lookup_elem_raw(head, hash, key, key_size);
>> +
>> +       ret = check_flags(htab, l_old, map_flags);
>> +       if (ret)
>> +               goto err;
>> +
>> +       if (l_old) {
>> +               bpf_lru_node_set_ref(&l_old->lru_node);
>> +               copy_map_value(&htab->map, htab_elem_value(l_old, map->key_size), value);
>> +               check_and_free_fields(htab, l_old);
>> +       }
> 
> We cannot do this. It breaks the atomicity of the update.
> We added htab_map_update_elem_in_place() for a very specific case.
> See
> https://lore.kernel.org/all/20250401062250.543403-1-houtao@huaweicloud.com/
> and discussion in v1,v2.
> 
> We cannot do in-place updates for other map types.
> It will break user expectations.
> 

After going through the patch set and the related discussions, I
understand the concerns around breaking update atomicity.

I'll look into alternative approaches to address this issue without
violating the expected atomic semantics.

Thanks,
Leon