This patch improves (or maintains) the precision of register value tracking
in BPF_MUL across all possible inputs. It also simplifies
scalar32_min_max_mul() and scalar_min_max_mul().
As it stands, the verifier's handling of BPF_MUL is composed of three functions:
case BPF_MUL:
tnum_mul();
scalar32_min_max_mul();
scalar_min_max_mul();
The current implementation of scalar_min_max_mul() restricts the u64 input
ranges of dst_reg and src_reg to be within [0, U32_MAX]:
/* Both values are positive, so we can work with unsigned and
* copy the result to signed (unless it exceeds S64_MAX).
*/
if (umax_val > U32_MAX || dst_reg->umax_value > U32_MAX) {
/* Potential overflow, we know nothing */
__mark_reg64_unbounded(dst_reg);
return;
}
This restriction is done to avoid unsigned overflow, which could otherwise
wrap the result past 0 and leave an unsound output where umin > umax.
However, limiting these u64 input ranges to [0, U32_MAX] also leads to a
loss of precision. Consider the case where the u64 bounds of dst_reg are
[0, 2^34] and the u64 bounds of src_reg are [0, 2^2]. The product of these
two ranges, [0, 2^36], does not overflow and is sound, yet the current
scalar_min_max_mul() sets the entire register state to unbounded.
The key idea of our patch is that if there’s no possibility of overflow, we
can multiply the unsigned bounds; otherwise, we set the 64-bit bounds to
[0, U64_MAX], marking them as unbounded.
if (check_mul_overflow(*dst_umax, src_reg->umax_value, dst_umax) ||
    check_mul_overflow(*dst_umin, src_reg->umin_value, dst_umin)) {
	/* Overflow possible, we know nothing */
	dst_reg->umin_value = 0;
	dst_reg->umax_value = U64_MAX;
}
...
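For reference, check_mul_overflow(a, b, d) stores a * b in *d and returns
true if the multiplication overflowed. Below is a minimal userspace sketch
(ours, for illustration only; it uses the underlying __builtin_mul_overflow()
so it builds outside the kernel) applied to the [0, 2^34] * [0, 2^2] example
above:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t dst_umin = 0, dst_umax = 1ULL << 34;	/* dst_reg in [0, 2^34] */
	uint64_t src_umin = 0, src_umax = 1ULL << 2;	/* src_reg in [0, 2^2]  */

	if (__builtin_mul_overflow(dst_umax, src_umax, &dst_umax) ||
	    __builtin_mul_overflow(dst_umin, src_umin, &dst_umin)) {
		/* Overflow possible, we know nothing */
		dst_umin = 0;
		dst_umax = UINT64_MAX;
	}

	/* Prints [0, 68719476736], i.e. [0, 2^36]; the old code would have
	 * marked the register unbounded because 2^34 > U32_MAX.
	 */
	printf("[%llu, %llu]\n",
	       (unsigned long long)dst_umin, (unsigned long long)dst_umax);
	return 0;
}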
Now, to update the signed bounds based on the unsigned bounds, we need to
ensure that the unsigned bounds don't cross the signed boundary (i.e., that
(s64)reg->umin_value <= (s64)reg->umax_value). We observe that this check is
performed anyway by __reg_deduce_bounds() later, so here we can simply set
the signed bounds to unbounded [S64_MIN, S64_MAX]. Deferring the assignment
of the s64 bounds to reg_bounds_sync() also removes a redundancy in
scalar_min_max_mul(), which currently derives the s64 bounds from the u64
bounds only when umin <= umax <= 2^63 - 1.
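For context, the deduction that later recovers the s64 bounds from the u64
bounds looks roughly like this (simplified excerpt of __reg64_deduce_bounds(),
reached via reg_bounds_sync() -> __reg_deduce_bounds(); not a verbatim copy):

	/* If the u64 range forms a valid s64 range (it does not cross the
	 * sign boundary), use it to tighten the signed bounds.
	 */
	if ((s64)reg->umin_value <= (s64)reg->umax_value) {
		reg->smin_value = max_t(s64, reg->smin_value, reg->umin_value);
		reg->smax_value = min_t(s64, reg->smax_value, reg->umax_value);
	}

Since this patch leaves smin/smax at [S64_MIN, S64_MAX], the deduction
effectively copies the u64 bounds into the s64 bounds whenever the unsigned
range stays on one side of the sign boundary.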
Below is an example BPF program that exhibits the imprecision of the current
BPF_MUL: the output register ends up completely unbounded. In contrast, the
updated BPF_MUL produces a bounded register state:
BPF_LD_IMM64(BPF_REG_1, 11),
BPF_LD_IMM64(BPF_REG_2, 4503599627370624),
BPF_ALU64_IMM(BPF_NEG, BPF_REG_2, 0),		/* two negations make the     */
BPF_ALU64_IMM(BPF_NEG, BPF_REG_2, 0),		/* verifier forget r2's bounds */
BPF_ALU64_REG(BPF_AND, BPF_REG_1, BPF_REG_2),	/* r1 &= r2: r1 in [0, 11]    */
BPF_LD_IMM64(BPF_REG_3, 809591906117232263),	/* 0xb3c3f8c99262687          */
BPF_ALU64_REG(BPF_MUL, BPF_REG_3, BPF_REG_1),	/* r3 *= r1                   */
BPF_MOV64_IMM(BPF_REG_0, 1),
BPF_EXIT_INSN(),
Verifier log using the old BPF_MUL:
func#0 @0
0: R1=ctx() R10=fp0
0: (18) r1 = 0xb ; R1_w=11
2: (18) r2 = 0x10000000000080 ; R2_w=0x10000000000080
4: (87) r2 = -r2 ; R2_w=scalar()
5: (87) r2 = -r2 ; R2_w=scalar()
6: (5f) r1 &= r2 ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=11,var_off=(0x0; 0xb)) R2_w=scalar()
7: (18) r3 = 0xb3c3f8c99262687 ; R3_w=0xb3c3f8c99262687
9: (2f) r3 *= r1 ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=11,var_off=(0x0; 0xb)) R3_w=scalar()
...
Verifier log using the new BPF_MUL (note the more precise bounds for R3 at instruction 9):
func#0 @0
0: R1=ctx() R10=fp0
0: (18) r1 = 0xb ; R1_w=11
2: (18) r2 = 0x10000000000080 ; R2_w=0x10000000000080
4: (87) r2 = -r2 ; R2_w=scalar()
5: (87) r2 = -r2 ; R2_w=scalar()
6: (5f) r1 &= r2 ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=11,var_off=(0x0; 0xb)) R2_w=scalar()
7: (18) r3 = 0xb3c3f8c99262687 ; R3_w=0xb3c3f8c99262687
9: (2f) r3 *= r1 ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=11,var_off=(0x0; 0xb)) R3_w=scalar(smin=0,smax=umax=0x7b96bb0a94a3a7cd,var_off=(0x0; 0x7fffffffffffffff))
...
Finally, we proved the soundness of the new scalar_min_max_mul() and
scalar32_min_max_mul() functions. Multiplication operations are typically
expensive to check with bitvector-based solvers, but we were able to prove
the soundness of these functions using Non-Linear Integer Arithmetic (NIA)
theory. Additionally, using Agni [2,3], we obtained encodings of
scalar32_min_max_mul() and scalar_min_max_mul() in bitvector theory and
proved their soundness using 16-bit bitvectors (instead of the 64-bit
bitvectors that the functions actually use).
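Informally, the soundness property we prove is:

  for all x in gamma(dst_in) and y in gamma(src_in):
      (x * y) mod 2^64 is in gamma(dst_out)

where gamma() denotes the set of concrete values admitted by an abstract
register state, dst_in/src_in are the abstract inputs, and dst_out is the
abstract output computed by BPF_MUL (analogously mod 2^32 for
scalar32_min_max_mul()).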
In conclusion, with this patch,
1. We were able to show that we can improve the overall precision of
BPF_MUL. We proved (using an SMT solver) that this new version of
BPF_MUL is at least as precise as the current version for all inputs.
2. We are able to prove the soundness of the new scalar_min_max_mul() and
scalar32_min_max_mul(). By leveraging the existing proof of tnum_mul
[1], we can say that the composition of these three functions within
BPF_MUL is sound.
[1] https://ieeexplore.ieee.org/abstract/document/9741267
[2] https://link.springer.com/chapter/10.1007/978-3-031-37709-9_12
[3] https://people.cs.rutgers.edu/~sn349/papers/sas24-preprint.pdf
Co-developed-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Co-developed-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
Signed-off-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
Co-developed-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
Signed-off-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
Signed-off-by: Matan Shachnai <m.shachnai@gmail.com>
---
kernel/bpf/verifier.c | 72 +++++++++++++++----------------------------
1 file changed, 24 insertions(+), 48 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1c4ebb326785..4785f3fac70a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -13827,65 +13827,41 @@ static void scalar_min_max_sub(struct bpf_reg_state *dst_reg,
static void scalar32_min_max_mul(struct bpf_reg_state *dst_reg,
struct bpf_reg_state *src_reg)
{
- s32 smin_val = src_reg->s32_min_value;
- u32 umin_val = src_reg->u32_min_value;
- u32 umax_val = src_reg->u32_max_value;
+ u32 *dst_umin = &dst_reg->u32_min_value;
+ u32 *dst_umax = &dst_reg->u32_max_value;
- if (smin_val < 0 || dst_reg->s32_min_value < 0) {
- /* Ain't nobody got time to multiply that sign */
- __mark_reg32_unbounded(dst_reg);
- return;
- }
- /* Both values are positive, so we can work with unsigned and
- * copy the result to signed (unless it exceeds S32_MAX).
- */
- if (umax_val > U16_MAX || dst_reg->u32_max_value > U16_MAX) {
- /* Potential overflow, we know nothing */
- __mark_reg32_unbounded(dst_reg);
- return;
- }
- dst_reg->u32_min_value *= umin_val;
- dst_reg->u32_max_value *= umax_val;
- if (dst_reg->u32_max_value > S32_MAX) {
+ if (check_mul_overflow(*dst_umax, src_reg->u32_max_value, dst_umax) ||
+ check_mul_overflow(*dst_umin, src_reg->u32_min_value, dst_umin)) {
/* Overflow possible, we know nothing */
- dst_reg->s32_min_value = S32_MIN;
- dst_reg->s32_max_value = S32_MAX;
- } else {
- dst_reg->s32_min_value = dst_reg->u32_min_value;
- dst_reg->s32_max_value = dst_reg->u32_max_value;
+ dst_reg->u32_min_value = 0;
+ dst_reg->u32_max_value = U32_MAX;
}
+
+ /* Set signed bounds to unbounded and improve precision in
+ * reg_bounds_sync()
+ */
+ dst_reg->s32_min_value = S32_MIN;
+ dst_reg->s32_max_value = S32_MAX;
}
static void scalar_min_max_mul(struct bpf_reg_state *dst_reg,
struct bpf_reg_state *src_reg)
{
- s64 smin_val = src_reg->smin_value;
- u64 umin_val = src_reg->umin_value;
- u64 umax_val = src_reg->umax_value;
+ u64 *dst_umin = &dst_reg->umin_value;
+ u64 *dst_umax = &dst_reg->umax_value;
- if (smin_val < 0 || dst_reg->smin_value < 0) {
- /* Ain't nobody got time to multiply that sign */
- __mark_reg64_unbounded(dst_reg);
- return;
- }
- /* Both values are positive, so we can work with unsigned and
- * copy the result to signed (unless it exceeds S64_MAX).
- */
- if (umax_val > U32_MAX || dst_reg->umax_value > U32_MAX) {
- /* Potential overflow, we know nothing */
- __mark_reg64_unbounded(dst_reg);
- return;
- }
- dst_reg->umin_value *= umin_val;
- dst_reg->umax_value *= umax_val;
- if (dst_reg->umax_value > S64_MAX) {
+ if (check_mul_overflow(*dst_umax, src_reg->umax_value, dst_umax) ||
+ check_mul_overflow(*dst_umin, src_reg->umin_value, dst_umin)) {
/* Overflow possible, we know nothing */
- dst_reg->smin_value = S64_MIN;
- dst_reg->smax_value = S64_MAX;
- } else {
- dst_reg->smin_value = dst_reg->umin_value;
- dst_reg->smax_value = dst_reg->umax_value;
+ dst_reg->umin_value = 0;
+ dst_reg->umax_value = U64_MAX;
}
+
+ /* Set signed bounds to unbounded and improve precision in
+ * reg_bounds_sync()
+ */
+ dst_reg->smin_value = S64_MIN;
+ dst_reg->smax_value = S64_MAX;
}
static void scalar32_min_max_and(struct bpf_reg_state *dst_reg,
--
2.25.1
On Wed, 2024-11-27 at 02:41 -0500, Matan Shachnai wrote:
[...]
> In conclusion, with this patch,
>
> 1. We were able to show that we can improve the overall precision of
> BPF_MUL. We proved (using an SMT solver) that this new version of
> BPF_MUL is at least as precise as the current version for all inputs.
>
> 2. We are able to prove the soundness of the new scalar_min_max_mul() and
> scalar32_min_max_mul(). By leveraging the existing proof of tnum_mul
> [1], we can say that the composition of these three functions within
> BPF_MUL is sound.
Hi Matan,
I think this is a nice simplification of the existing code.
Could you please also add a few canary tests in the
tools/testing/selftests/bpf/progs/verifier_bounds.c ?
(e.g. simple case plus possible edge cases).
Something like:
SEC("tc")
__success __log_level(2)
__msg("r6 *= r7 {{.*}}; R6_w=some-range-here")
__naked void mult_mixed_sign(void)
{
asm volatile (
"call %[bpf_get_prandom_u32];"
"r6 = r0;"
"call %[bpf_get_prandom_u32];"
"r7 = r0;"
"r6 &= 0xf;"
"r6 -= 1000000000;"
"r7 &= 0xf;"
"r7 -= 2000000000;"
"r6 *= r7;"
"exit"
:
: __imm(bpf_get_prandom_u32),
__imm(bpf_skb_store_bytes)
: __clobber_all);
}
We usually do this as a separate patch in a patch-set.
Also, it looks like this has limited applicability in practice,
because small negative values denote huge unsigned values,
hence overflow check kicks in for such values.
E.g. no range inferred for [-10,5] * [-20,-5]:
0: (85) call bpf_get_prandom_u32#7 ; R0_w=scalar()
1: (bf) r6 = r0 ; R0_w=scalar(id=1) R6_w=scalar(id=1)
2: (85) call bpf_get_prandom_u32#7 ; R0_w=scalar()
3: (bf) r7 = r0 ; R0_w=scalar(id=2) R7_w=scalar(id=2)
4: (57) r6 &= 15 ; R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf))
5: (17) r6 -= 10 ; R6_w=scalar(smin=smin32=-10,smax=smax32=5)
6: (57) r7 &= 15 ; R7_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf))
7: (17) r7 -= 20 ; R7_w=scalar(smin=smin32=-20,smax=smax32=-5,umin=0xffffffffffffffec,umax=0xfffffffffffffffb,umin32=0xffffffec,umax32=0xfffffffb,var_off=(0xffffffffffffffe0; 0x1f))
8: (2f) r6 *= r7 ; R6_w=scalar() R7_w=scalar(smin=smin32=-20,smax=smax32=-5,umin=0xffffffffffffffec,umax=0xfffffffffffffffb,umin32=0xffffffec,umax32=0xfffffffb,var_off=(0xffffffffffffffe0; 0x1f))
9: (95) exit
Compared to:
0: R1=ctx() R10=fp0
; asm volatile ( @ verifier_bounds.c:1208
0: (85) call bpf_get_prandom_u32#7 ; R0_w=scalar()
1: (bf) r6 = r0 ; R0_w=scalar(id=1) R6_w=scalar(id=1)
2: (85) call bpf_get_prandom_u32#7 ; R0_w=scalar()
3: (bf) r7 = r0 ; R0_w=scalar(id=2) R7_w=scalar(id=2)
4: (57) r6 &= 15 ; R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf))
5: (17) r6 -= 1000000000 ; R6_w=scalar(smin=0xffffffffc4653600,smax=0xffffffffc465360f,umin=0xffffffffc4653600,umax=0xffffffffc465360f,smin32=umin32=0xc4653600,smax32=umax32=0xc465360f,var_off=(0xffffffffc4653600; 0xf))
6: (57) r7 &= 15 ; R7_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf))
7: (17) r7 -= 2000000000 ; R7_w=scalar(smin=0xffffffff88ca6c00,smax=0xffffffff88ca6c0f,umin=0xffffffff88ca6c00,umax=0xffffffff88ca6c0f,smin32=umin32=0x88ca6c00,smax32=umax32=0x88ca6c0f,var_off=(0xffffffff88ca6c00; 0xf))
8: (2f) r6 *= r7 ; R6_w=scalar(smax=0x7ffffffffffffeff,umax=0xfffffffffffffeff,smax32=0x7ffffeff,umax32=0xfffffeff,var_off=(0x0; 0xfffffffffffffeff)) R7_w=scalar(smin=0xffffffff88ca6c00,smax=0xffffffff88ca6c0f,umin=0xffffffff88ca6c00,umax=0xffffffff88ca6c0f,smin32=umin32=0x88ca6c00,smax32=umax32=0x88ca6c0f,var_off=(0xffffffff88ca6c00; 0xf))
9: (95) exit
Is it possible to do check_mul_overflow() for signed bounds and
rely on reg_bounds_sync() for unsigned?
[...]
On Wed, Nov 27, 2024 at 5:53 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2024-11-27 at 02:41 -0500, Matan Shachnai wrote:
>
> [...]
>
> > In conclusion, with this patch,
> >
> > 1. We were able to show that we can improve the overall precision of
> > BPF_MUL. We proved (using an SMT solver) that this new version of
> > BPF_MUL is at least as precise as the current version for all inputs.
> >
> > 2. We are able to prove the soundness of the new scalar_min_max_mul() and
> > scalar32_min_max_mul(). By leveraging the existing proof of tnum_mul
> > [1], we can say that the composition of these three functions within
> > BPF_MUL is sound.
>
> Hi Matan,
>
> I think this is a nice simplification of the existing code.
> Could you please also add a few canary tests in the
> tools/testing/selftests/bpf/progs/verifier_bounds.c ?
> (e.g. simple case plus possible edge cases).
Thanks for your feedback, Eduard! We'll be happy to add test-cases to
exercise BPF_MUL.
> Something like:
>
> SEC("tc")
> __success __log_level(2)
> __msg("r6 *= r7 {{.*}}; R6_w=some-range-here")
> __naked void mult_mixed_sign(void)
> {
> asm volatile (
> "call %[bpf_get_prandom_u32];"
> "r6 = r0;"
> "call %[bpf_get_prandom_u32];"
> "r7 = r0;"
> "r6 &= 0xf;"
> "r6 -= 1000000000;"
> "r7 &= 0xf;"
> "r7 -= 2000000000;"
> "r6 *= r7;"
> "exit"
> :
> : __imm(bpf_get_prandom_u32),
> __imm(bpf_skb_store_bytes)
> : __clobber_all);
> }
>
> We usually do this as a separate patch in a patch-set.
>
> Also, it looks like this has limited applicability in practice,
> because small negative values denote huge unsigned values,
> hence overflow check kicks in for such values.
> E.g. no range inferred for [-10,5] * [-20,-5]:
>
> 0: (85) call bpf_get_prandom_u32#7 ; R0_w=scalar()
> 1: (bf) r6 = r0 ; R0_w=scalar(id=1) R6_w=scalar(id=1)
> 2: (85) call bpf_get_prandom_u32#7 ; R0_w=scalar()
> 3: (bf) r7 = r0 ; R0_w=scalar(id=2) R7_w=scalar(id=2)
> 4: (57) r6 &= 15 ; R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf))
> 5: (17) r6 -= 10 ; R6_w=scalar(smin=smin32=-10,smax=smax32=5)
> 6: (57) r7 &= 15 ; R7_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf))
> 7: (17) r7 -= 20 ; R7_w=scalar(smin=smin32=-20,smax=smax32=-5,umin=0xffffffffffffffec,umax=0xfffffffffffffffb,umin32=0xffffffec,umax32=0xfffffffb,var_off=(0xffffffffffffffe0; 0x1f))
> 8: (2f) r6 *= r7 ; R6_w=scalar() R7_w=scalar(smin=smin32=-20,smax=smax32=-5,umin=0xffffffffffffffec,umax=0xfffffffffffffffb,umin32=0xffffffec,umax32=0xfffffffb,var_off=(0xffffffffffffffe0; 0x1f))
> 9: (95) exit
>
> Compared to:
>
> 0: R1=ctx() R10=fp0
> ; asm volatile ( @ verifier_bounds.c:1208
> 0: (85) call bpf_get_prandom_u32#7 ; R0_w=scalar()
> 1: (bf) r6 = r0 ; R0_w=scalar(id=1) R6_w=scalar(id=1)
> 2: (85) call bpf_get_prandom_u32#7 ; R0_w=scalar()
> 3: (bf) r7 = r0 ; R0_w=scalar(id=2) R7_w=scalar(id=2)
> 4: (57) r6 &= 15 ; R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf))
> 5: (17) r6 -= 1000000000 ; R6_w=scalar(smin=0xffffffffc4653600,smax=0xffffffffc465360f,umin=0xffffffffc4653600,umax=0xffffffffc465360f,smin32=umin32=0xc4653600,smax32=umax32=0xc465360f,var_off=(0xffffffffc4653600; 0xf))
> 6: (57) r7 &= 15 ; R7_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf))
> 7: (17) r7 -= 2000000000 ; R7_w=scalar(smin=0xffffffff88ca6c00,smax=0xffffffff88ca6c0f,umin=0xffffffff88ca6c00,umax=0xffffffff88ca6c0f,smin32=umin32=0x88ca6c00,smax32=umax32=0x88ca6c0f,var_off=(0xffffffff88ca6c00; 0xf))
> 8: (2f) r6 *= r7 ; R6_w=scalar(smax=0x7ffffffffffffeff,umax=0xfffffffffffffeff,smax32=0x7ffffeff,umax32=0xfffffeff,var_off=(0x0; 0xfffffffffffffeff)) R7_w=scalar(smin=0xffffffff88ca6c00,smax=0xffffffff88ca6c0f,umin=0xffffffff88ca6c00,umax=0xffffffff88ca6c0f,smin32=umin32=0x88ca6c00,smax32=umax32=0x88ca6c0f,var_off=(0xffffffff88ca6c00; 0xf))
> 9: (95) exit
>
> Is it possible to do check_mul_overflow() for signed bounds and
> rely on reg_bounds_sync() for unsigned?
>
The patch in its current form (and the existing BPF_MUL implementation in
the verifier) doesn't handle negative values well, as the example you gave
illustrates. The initial goal of this patch was to improve the precision of
unsigned multiplication. However, there is a canonical way to perform
signed multiplication soundly, which does handle negative values: the
result range is [min(a, b, c, d), max(a, b, c, d)], where a, b, c, d are
the four cross products of the input bounds (smin1*smin2, smin1*smax2,
smax1*smin2, smax1*smax2), each checked for overflow. For better precision,
we propose computing both the unsigned and the signed multiplication; the
resulting bounds can then be refined against each other in
reg_bounds_sync().
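A rough sketch of that idea (the function name and structure are ours, for
illustration only, not the exact code we will post):

	static void sketch_signed_mul(struct bpf_reg_state *dst_reg,
				      struct bpf_reg_state *src_reg)
	{
		s64 p[4];

		/* The four cross products of the signed bounds. If any of
		 * them overflows s64, we cannot bound the result.
		 */
		if (check_mul_overflow(dst_reg->smin_value, src_reg->smin_value, &p[0]) ||
		    check_mul_overflow(dst_reg->smin_value, src_reg->smax_value, &p[1]) ||
		    check_mul_overflow(dst_reg->smax_value, src_reg->smin_value, &p[2]) ||
		    check_mul_overflow(dst_reg->smax_value, src_reg->smax_value, &p[3])) {
			dst_reg->smin_value = S64_MIN;
			dst_reg->smax_value = S64_MAX;
			return;
		}

		dst_reg->smin_value = min(min(p[0], p[1]), min(p[2], p[3]));
		dst_reg->smax_value = max(max(p[0], p[1]), max(p[2], p[3]));
	}

(and analogously with s32 products for the 32-bit sub-register bounds).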
We will update our patch with both signed and unsigned multiplication,
add test-cases, and send it all as a patch-set soon.
Best,
Matan
> [...]
>