The intermediate product value_size * num_possible_cpus() is evaluated
in 32-bit arithmetic and only then promoted to 64 bits. On systems with
large value_size and many possible CPUs this can overflow and lead to
underestimated memory usage.
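For illustration, a minimal userspace sketch of the same truncation
(standalone C with hypothetical sizes, not the kernel code itself):

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
          uint32_t value_size = 1048576;  /* 1 MiB */
          int ncpus = 4096;               /* stand-in for num_possible_cpus() */

          /* u32 * int is computed in 32 bits: 2^20 * 2^12 wraps to 0 */
          uint64_t wrong = (uint64_t)(value_size * ncpus);
          /* widening one operand first keeps the product in 64 bits */
          uint64_t right = (uint64_t)value_size * ncpus;

          printf("wrong=%llu right=%llu\n",
                 (unsigned long long)wrong, (unsigned long long)right);
          return 0;
  }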
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 304849a27b34 ("bpf: hashtab memory usage")
Cc: stable@vger.kernel.org
Suggested-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Alexei Safin <a.safin@rosa.ru>
---
v2: Promote value_size to u64 at declaration to avoid 32-bit overflow
in all arithmetic using this variable (suggested by Yafang Shao)
kernel/bpf/hashtab.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 570e2f723144..1f0add26ba3f 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -2252,7 +2252,7 @@ static long bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_
static u64 htab_map_mem_usage(const struct bpf_map *map)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
- u32 value_size = round_up(htab->map.value_size, 8);
+ u64 value_size = round_up(htab->map.value_size, 8);
bool prealloc = htab_is_prealloc(htab);
bool percpu = htab_is_percpu(htab);
bool lru = htab_is_lru(htab);
--
2.50.1 (Apple Git-155)
On Fri, 7 Nov 2025 13:03:05 +0300
Alexei Safin <a.safin@rosa.ru> wrote:
> The intermediate product value_size * num_possible_cpus() is evaluated
> in 32-bit arithmetic and only then promoted to 64 bits. On systems with
> large value_size and many possible CPUs this can overflow and lead to
> underestimated memory usage.
>
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
That code is insane.
The size being calculated looks like a kernel memory size.
You really don't want to be allocating single structures that exceed 4GB.
David
>
> Fixes: 304849a27b34 ("bpf: hashtab memory usage")
> Cc: stable@vger.kernel.org
> Suggested-by: Yafang Shao <laoar.shao@gmail.com>
> Signed-off-by: Alexei Safin <a.safin@rosa.ru>
> ---
> v2: Promote value_size to u64 at declaration to avoid 32-bit overflow
> in all arithmetic using this variable (suggested by Yafang Shao)
> kernel/bpf/hashtab.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 570e2f723144..1f0add26ba3f 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -2252,7 +2252,7 @@ static long bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_
> static u64 htab_map_mem_usage(const struct bpf_map *map)
> {
> struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
> - u32 value_size = round_up(htab->map.value_size, 8);
> + u64 value_size = round_up(htab->map.value_size, 8);
> bool prealloc = htab_is_prealloc(htab);
> bool percpu = htab_is_percpu(htab);
> bool lru = htab_is_lru(htab);
On Fri, Nov 7, 2025 at 7:41 PM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Fri, 7 Nov 2025 13:03:05 +0300
> Alexei Safin <a.safin@rosa.ru> wrote:
>
> > The intermediate product value_size * num_possible_cpus() is evaluated
> > in 32-bit arithmetic and only then promoted to 64 bits. On systems with
> > large value_size and many possible CPUs this can overflow and lead to
> > underestimated memory usage.
> >
> > Found by Linux Verification Center (linuxtesting.org) with SVACE.
>
> That code is insane.
> The size being calculated looks like a kernel memory size.
> You really don't want to be allocating single structures that exceed 4GB.
I failed to get your point.
The calculation `value_size * num_possible_cpus() * num_entries` can
overflow. While the creation of a hashmap limits `value_size *
num_entries` to U32_MAX, this new formula can easily exceed that
limit. For example, on my test server with just 64 CPUs, the following
operation will trigger an overflow:
map_fd = bpf_map_create(BPF_MAP_TYPE_PERCPU_HASH, "count_map", 4, 4,
1 << 27, &map_opts)
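(For anyone who wants to try that call, a self-contained sketch - this
assumes a libbpf recent enough to provide bpf_map_create(), and on most
systems it may simply fail with ENOMEM or a memlock error rather than
succeed; build with -lbpf:)

  #include <bpf/bpf.h>

  int main(void)
  {
          /* default opts; 4-byte key, 4-byte value, 1 << 27 entries */
          LIBBPF_OPTS(bpf_map_create_opts, map_opts);
          int map_fd = bpf_map_create(BPF_MAP_TYPE_PERCPU_HASH, "count_map",
                                      4, 4, 1 << 27, &map_opts);

          return map_fd < 0;
  }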
--
Regards
Yafang
On Sun, Nov 9, 2025 at 11:00 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Fri, Nov 7, 2025 at 7:41 PM David Laight
> <david.laight.linux@gmail.com> wrote:
> >
> > On Fri, 7 Nov 2025 13:03:05 +0300
> > Alexei Safin <a.safin@rosa.ru> wrote:
> >
> > > The intermediate product value_size * num_possible_cpus() is evaluated
> > > in 32-bit arithmetic and only then promoted to 64 bits. On systems with
> > > large value_size and many possible CPUs this can overflow and lead to
> > > underestimated memory usage.
> > >
> > > Found by Linux Verification Center (linuxtesting.org) with SVACE.
> >
> > That code is insane.
> > The size being calculated looks like a kernel memory size.
> > You really don't want to be allocating single structures that exceed 4GB.
>
> I failed to get your point.
> The calculation `value_size * num_possible_cpus() * num_entries` can
> overflow. While the creation of a hashmap limits `value_size *
> num_entries` to U32_MAX, this new formula can easily exceed that
> limit. For example, on my test server with just 64 CPUs, the following
> operation will trigger an overflow:
>
> map_fd = bpf_map_create(BPF_MAP_TYPE_PERCPU_HASH, "count_map", 4, 4,
>                         1 << 27, &map_opts)

Upon reviewing the code, I see that `num_entries` is declared as u64,
which prevents overflow in the calculation `value_size *
num_possible_cpus() * num_entries`. Therefore, this change is
unnecessary.

It seems that the Linux Verification Center (linuxtesting.org) needs
to be improved ;-)

--
Regards
Yafang
Thanks for the follow-up.

Just to clarify: the overflow happens before the multiplication by
num_entries. In C, the * operator is left-associative, so the expression is
evaluated as (value_size * num_possible_cpus()) * num_entries. Since
value_size was u32 and num_possible_cpus() returns int, the first product is
performed in 32-bit arithmetic due to the usual arithmetic conversions. If
that intermediate product overflows, the result is already incorrect before
it is widened for the multiplication by the u64 num_entries.

A concrete example within the allowed limits:
value_size = 1,048,576 (1 MiB), num_possible_cpus() = 4096
=> 1,048,576 * 4096 = 2^32 => wraps to 0 in 32 bits, even with
num_entries = 1.

This isn't about a single >4GiB allocation - it's about aggregated memory
usage (percpu), which can legitimately exceed 4GiB in total.

v2 promotes value_size to u64 at declaration, which avoids the 32-bit
intermediate overflow cleanly.

09.11.2025 11:20, Yafang Shao wrote:
> On Sun, Nov 9, 2025 at 11:00 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>> On Fri, Nov 7, 2025 at 7:41 PM David Laight
>> <david.laight.linux@gmail.com> wrote:
>>> On Fri, 7 Nov 2025 13:03:05 +0300
>>> Alexei Safin <a.safin@rosa.ru> wrote:
>>>
>>>> The intermediate product value_size * num_possible_cpus() is evaluated
>>>> in 32-bit arithmetic and only then promoted to 64 bits. On systems with
>>>> large value_size and many possible CPUs this can overflow and lead to
>>>> underestimated memory usage.
>>>>
>>>> Found by Linux Verification Center (linuxtesting.org) with SVACE.
>>> That code is insane.
>>> The size being calculated looks like a kernel memory size.
>>> You really don't want to be allocating single structures that exceed 4GB.
>> I failed to get your point.
>> The calculation `value_size * num_possible_cpus() * num_entries` can
>> overflow. While the creation of a hashmap limits `value_size *
>> num_entries` to U32_MAX, this new formula can easily exceed that
>> limit. For example, on my test server with just 64 CPUs, the following
>> operation will trigger an overflow:
>>
>> map_fd = bpf_map_create(BPF_MAP_TYPE_PERCPU_HASH, "count_map", 4, 4,
>>          1 << 27, &map_opts)
> Upon reviewing the code, I see that `num_entries` is declared as u64,
> which prevents overflow in the calculation `value_size *
> num_possible_cpus() * num_entries`. Therefore, this change is
> unnecessary.
>
> It seems that the Linux Verification Center (linuxtesting.org) needs
> to be improved ;-)
>
On Sun, Nov 9, 2025 at 7:00 PM Alexei Safin <a.safin@rosa.ru> wrote:
>
> Thanks for the follow-up.
>
> Just to clarify: the overflow happens before the multiplication by
> num_entries. In C, the * operator is left-associative, so the expression is
> evaluated as (value_size * num_possible_cpus()) * num_entries. Since
> value_size was u32 and num_possible_cpus() returns int, the first product is
> performed in 32-bit arithmetic due to the usual arithmetic conversions. If
> that intermediate product overflows, the result is already incorrect before
> it is widened for the multiplication by the u64 num_entries.
>
> A concrete example within the allowed limits:
> value_size = 1,048,576 (1 MiB), num_possible_cpus() = 4096
> => 1,048,576 * 4096 = 2^32 => wraps to 0 in 32 bits, even with
> num_entries = 1.

Thank you for the clarification.

Based on my understanding, the maximum value_size for a percpu hashmap
appears to be constrained by PCPU_MIN_UNIT_SIZE (32768), as referenced
in htab_map_alloc_check():
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/bpf/hashtab.c#n457

This would require num_possible_cpus() to reach 131072 to potentially
cause an overflow. However, the maximum number of CPUs supported on
x86_64 is typically 8192 in standard kernel configurations. I'm
uncertain if any architectures actually support systems at this scale.

> This isn't about a single >4GiB allocation - it's about aggregated memory
> usage (percpu), which can legitimately exceed 4GiB in total.
>
> v2 promotes value_size to u64 at declaration, which avoids the 32-bit
> intermediate overflow cleanly.

--
Regards
Yafang
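(A quick back-of-envelope check of that bound, assuming the 32768-byte
cap on value_size is accurate; note round_up(32768, 8) leaves it
unchanged:)

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
          uint32_t max_value_size = 32768;  /* PCPU_MIN_UNIT_SIZE cap */

          /* smallest CPU count whose 32-bit product with the cap wraps */
          uint64_t cpus = (UINT32_MAX + 1ULL) / max_value_size;
          printf("%llu\n", (unsigned long long)cpus);  /* prints 131072 */
          return 0;
  }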
On Fri, Nov 7, 2025 at 6:03 PM Alexei Safin <a.safin@rosa.ru> wrote:
>
> The intermediate product value_size * num_possible_cpus() is evaluated
> in 32-bit arithmetic and only then promoted to 64 bits. On systems with
> large value_size and many possible CPUs this can overflow and lead to
> underestimated memory usage.
>
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
>
> Fixes: 304849a27b34 ("bpf: hashtab memory usage")
> Cc: stable@vger.kernel.org
> Suggested-by: Yafang Shao <laoar.shao@gmail.com>
> Signed-off-by: Alexei Safin <a.safin@rosa.ru>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
> ---
> v2: Promote value_size to u64 at declaration to avoid 32-bit overflow
> in all arithmetic using this variable (suggested by Yafang Shao)
> kernel/bpf/hashtab.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 570e2f723144..1f0add26ba3f 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -2252,7 +2252,7 @@ static long bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_
> static u64 htab_map_mem_usage(const struct bpf_map *map)
> {
> struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
> - u32 value_size = round_up(htab->map.value_size, 8);
> + u64 value_size = round_up(htab->map.value_size, 8);
> bool prealloc = htab_is_prealloc(htab);
> bool percpu = htab_is_percpu(htab);
> bool lru = htab_is_lru(htab);
> --
> 2.50.1 (Apple Git-155)
>
--
Regards
Yafang