[v1] bpf: backport scalar not-equal tracking fixes

[RFC PATCH 6.1.y 0/2] bpf: backport scalar not-equal tracking fixes

Posted by Zhenzhong Wu 6 days, 10 hours ago

Hi BPF maintainers,

This RFC backports two BPF verifier scalar range-tracking fixes to 6.1.y.
The series is intended to fix a verifier state-pruning issue where an
impossible scalar path can be kept while the real success path is pruned.

This is a verifier scalar range-tracking issue, not a helper-specific
issue. The visible failure is that the verifier can prune the real success
continuation, which should not be skipped, and keep only an impossible one.
In the reproducer, the traced function returns 15 at runtime, but the
verifier keeps the path where r7 is treated as 0, hard-wires the opposite
branch, and the program reports the error branch.

The minimized reproducer uses fexit/bpf_get_func_ret only because it
provides a compact way to create the interesting register flow: one scalar
in r0 for the helper status, and another scalar loaded from the stack for
the traced function return value. The issue is not specific to
bpf_get_func_ret() itself. Because bpf_get_func_ret() was added in v5.17,
this particular reproducer applies directly to 6.1.y. I have not built a
5.15.y-compatible reproducer.

The relevant verifier-log bytecode from the reproducer is below. The later
instructions only store r7 into a map so user space can observe which
branch the verifier kept.

  15: (85) call bpf_get_func_ret#184    ; R0_w=scalar() fp-8_w=mmmmmmmm
  16: (79) r7 = *(u64 *)(r10 -8)        ; R7_w=scalar() R10=fp0
  17: (15) if r0 == 0x0 goto pc+1       ; R0_w=scalar()
  18: (bf) r7 = r0                      ; R0=scalar(id=1) R7=scalar(id=1)
  19: (55) if r0 != 0x0 goto pc+6       ; R0=0
  20: (67) r7 <<= 32                    ; R7_w=0
  21: (77) r7 >>= 32                    ; R7_w=0
  22: (b7) r1 = 1                       ; R1_w=1
  23: (55) if r7 != 0xf goto pc+1
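
For readers who prefer C to verifier-log bytecode, the runtime behaviour of
instructions 16 to 23 can be sketched roughly as below. This is a hand
translation for illustration, not part of the reproducer: the struct and
function names are made up, and because the log does not show what sits at
the pc+6 target of instruction 19, that jump is only recorded as a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: what one concrete run of insns 16..23 observes. */
struct insn_trace {
	bool jumped_at_19;  /* insn 19 took its "r0 != 0" jump (target not in the log) */
	bool reached_23;    /* execution arrived at the "r7 != 0xf" comparison */
	uint64_t r7_at_23;  /* r7 at insn 23; meaningful only if reached_23 is set */
};

/* r0: status returned by the helper; stack_slot: the u64 stored at r10-8. */
static struct insn_trace run_insns_16_to_23(uint64_t r0, uint64_t stack_slot)
{
	struct insn_trace t = { false, false, 0 };
	uint64_t r7 = stack_slot;       /* 16: r7 = *(u64 *)(r10 - 8) */

	if (r0 != 0)                    /* 17: if r0 == 0 goto pc+1 (skips 18) */
		r7 = r0;                /* 18: r7 = r0 */

	if (r0 != 0) {                  /* 19: if r0 != 0 goto pc+6 */
		t.jumped_at_19 = true;
		return t;
	}

	r7 <<= 32;                      /* 20 */
	r7 >>= 32;                      /* 21: together, keep only the low 32 bits */
	assert(r7 <= UINT32_MAX);

	t.reached_23 = true;
	t.r7_at_23 = r7;
	return t;
}
```

In this model a concrete run cannot both execute instruction 18 and fall
through instruction 19, since both depend on the same r0. Calling
run_insns_16_to_23(0, 15) reaches instruction 23 with r7 equal to 15, which
matches the runtime behaviour described above.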

The failure mechanism is:

  1. The program checks "if r0 == 0". The jump target is the success path,
     and the fallthrough path is the failure path and should imply r0 != 0.

  2. On v6.1.91, the verifier does not record the r0 != 0 fact for the
     fallthrough path. The following "r7 = r0" then gives r0 and r7 the
     same scalar id while both are still treated as possibly zero.

  3. At the later "if r0 != 0" check, the verifier still thinks r0 may be
     zero, so it explores the fallthrough path of that JNE. That path means
     r0 == 0, and because r7 shares the same scalar id, r7 is narrowed to
     zero as well. This is an impossible path: it came from the earlier
     failure path that should have implied r0 != 0.

  4. That impossible continuation reaches the return-value comparison with
     r7 == 0 and can make the verifier keep only the wrong branch. When the
     real success path is analyzed later, state pruning considers it safe
     against the earlier cached verifier state, so the real continuation is
     not explored.
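
Steps 1 to 3 can be illustrated with a small user-space toy model. It is
not verifier code: the toy_* names are hypothetical, only an unsigned range
and a link id are tracked, and how the real verifier decides that a branch
is dead is not modelled. The track_not_equal flag stands in for the
presence or absence of the not-equal refinement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy scalar state: an unsigned range plus a link id (0 = not linked). */
struct toy_reg {
	uint64_t umin, umax;
	uint32_t id;
};

/* Fact "reg != val": only a value sitting on an endpoint can be trimmed. */
static void toy_refine_ne(struct toy_reg *reg, uint64_t val)
{
	if (reg->umin >= reg->umax)
		return;
	if (reg->umin == val)
		reg->umin++;
	else if (reg->umax == val)
		reg->umax--;
}

/* Fact "reg == val": returns false if val lies outside the known range. */
static bool toy_refine_eq(struct toy_reg *reg, uint64_t val)
{
	if (val < reg->umin || val > reg->umax)
		return false;
	reg->umin = reg->umax = val;
	return true;
}

/*
 * Walk insn 17 (fallthrough), insn 18, insn 19 (fallthrough).
 * Returns true if the model still considers that path reachable;
 * *r7 holds the modelled state of r7 on return.
 */
static bool toy_walk_failure_path(bool track_not_equal, struct toy_reg *r7)
{
	struct toy_reg r0 = { 0, UINT64_MAX, 0 };

	if (track_not_equal)
		toy_refine_ne(&r0, 0);  /* 17 fallthrough implies r0 != 0 */

	r0.id = 1;                      /* 18: r7 = r0 links the two registers */
	*r7 = r0;

	if (!toy_refine_eq(&r0, 0))     /* 19 fallthrough implies r0 == 0 */
		return false;           /* contradiction: nothing to explore */

	assert(r7->id == r0.id);
	r7->umin = r0.umin;             /* same id: r7 inherits r0's new range */
	r7->umax = r0.umax;
	return true;
}
```

With track_not_equal set to false the walk succeeds and leaves r7 pinned to
0, which is the impossible continuation from step 3. With it set to true
the walk stops at instruction 19 because 0 is no longer inside r0's range.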

The relevant pruning point is that regsafe()/states_equal() accepted the
real success-path state against an earlier cached state where r0 was an
imprecise scalar and r7 constraints were loose enough to cover the current
r7.
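
As a rough illustration of why such a comparison can succeed, here is a toy
subsumption check in user-space C. It is a simplified stand-in, not the
body of regsafe(): the toy_* names are hypothetical, only unsigned ranges
are compared, and any concrete ranges fed to it are example values rather
than states taken from the verifier log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_scalar {
	uint64_t umin, umax;
	bool precise;  /* did a later check depend on this register's value? */
};

/* True if every value allowed by cur is also allowed by old. */
static bool toy_range_within(const struct toy_scalar *old,
			     const struct toy_scalar *cur)
{
	return old->umin <= cur->umin && cur->umax <= old->umax;
}

/* Simplified per-register "is cur covered by the cached old state" test. */
static bool toy_scalar_safe(const struct toy_scalar *old,
			    const struct toy_scalar *cur)
{
	assert(old->umin <= old->umax && cur->umin <= cur->umax);

	if (!old->precise)
		return true;  /* imprecise cached scalar: any scalar is accepted */

	return toy_range_within(old, cur);
}
```

In this model an imprecise cached r0 accepts any current r0, and a cached
r7 with a wide range accepts a current r7 of exactly 15, so the current
state is treated as already covered and is not explored further.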

After confirming the mechanism, I ran git bisect with this minimized C
reproducer as the test case. The bisect started from the affected 6.7.y
behavior and the fixed v6.8 behavior, and narrowed the fix to the
v6.7..v6.8 window:

  https://gist.github.com/swananan/165cca6008f6c81870a28aa7a445d5ea

The bisect identified the upstream fix as:

  d028f87517d6775dccff4ddbca2740826f9e53f1
  bpf: make the verifier tracks the "not equal" for regs

For 6.1.y, applying d028f87517d6 alone is not sufficient. The older
verifier code also needs the range-preservation semantics from:

  9e314f5d8682e1fe6ac214fb34580a238b6fd3c4
  bpf: drop knowledge-losing __reg_combine_{32,64}_into_{64,32} logic

Without that semantic prerequisite, the old range-combining logic can still
discard the refined bounds after the verifier learns them.
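
The kind of information loss meant here can be shown with a toy model. The
sketch below is an assumption-laden simplification and does not reproduce
the bodies of the __reg_combine_* helpers: the toy_* names are
hypothetical, only unsigned bounds are tracked, and the "lossy" variant
merely imitates a combine step that resets the 64-bit bounds after a
32-bit comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_bounds {
	uint64_t umin, umax;        /* known range of the full 64-bit value */
	uint32_t u32_min, u32_max;  /* known range of the low 32 bits */
};

/* Lossy: rebuild the 64-bit range from scratch after a 32-bit compare. */
static void toy_combine_lossy(struct toy_bounds *b, bool upper_half_known_zero)
{
	assert(b->u32_min <= b->u32_max);
	if (upper_half_known_zero) {
		b->umin = b->u32_min;
		b->umax = b->u32_max;
	} else {
		b->umin = 0;            /* earlier 64-bit knowledge is dropped */
		b->umax = UINT64_MAX;
	}
}

/* Preserving: keep the 64-bit range and only ever tighten it. */
static void toy_combine_preserving(struct toy_bounds *b,
				   bool upper_half_known_zero)
{
	assert(b->u32_min <= b->u32_max);
	if (!upper_half_known_zero)
		return;                 /* low bits say nothing about the rest */
	if (b->u32_min > b->umin)
		b->umin = b->u32_min;
	if (b->u32_max < b->umax)
		b->umax = b->u32_max;
}
```

Starting from a 64-bit range of [1, UINT64_MAX] (the register is known to
be non-zero) and an unknown upper half, the lossy variant returns to
[0, UINT64_MAX] while the preserving variant keeps the lower bound of 1.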

The 6.1.y adaptation is split as follows:

  - patch 1 carries the 6.1.y-relevant part of 9e314f5d8682 by removing the
    knowledge-losing __reg_combine_{32,64}_into_{64,32} paths and using
    reg_bounds_sync() after conditional refinement;
  - patch 2 carries d028f87517d6 in the older reg_set_min_max() layout. In
    newer kernels, reg_set_min_max() refines the fallthrough branch through
    rev_opcode(opcode), so the fallthrough branch of BPF_JEQ is handled by
    the BPF_JNE refinement. In 6.1.y that split does not exist, so the same
    not-equal fact is expressed directly on BPF_JEQ's false_reg and
    BPF_JNE's true_reg.
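
The difference between the two layouts can be sketched in user-space C.
This is only an illustration of the structure described for patch 2, not
code from either kernel version: the toy_* names are hypothetical, only an
unsigned range is tracked, and the compared value is assumed to lie inside
the range.

```c
#include <assert.h>
#include <stdint.h>

enum toy_op { TOY_JEQ, TOY_JNE };

struct toy_range { uint64_t umin, umax; };

/* Opposite condition, in the spirit of rev_opcode(). */
static enum toy_op toy_rev(enum toy_op op)
{
	return op == TOY_JEQ ? TOY_JNE : TOY_JEQ;
}

/* Narrow r under the assumption that "r <op> val" holds. */
static void toy_refine(struct toy_range *r, enum toy_op op, uint64_t val)
{
	assert(r->umin <= r->umax);
	if (op == TOY_JEQ) {
		r->umin = r->umax = val;
	} else if (r->umin < r->umax) {
		if (r->umin == val)
			r->umin++;
		else if (r->umax == val)
			r->umax--;
	}
}

/* Newer layout: the fallthrough branch reuses the reversed opcode. */
static void toy_split_layout(enum toy_op op, uint64_t val,
			     struct toy_range *true_reg,
			     struct toy_range *false_reg)
{
	toy_refine(true_reg, op, val);
	toy_refine(false_reg, toy_rev(op), val);
}

/* Older layout: each opcode case spells out both branches itself. */
static void toy_combined_layout(enum toy_op op, uint64_t val,
				struct toy_range *true_reg,
				struct toy_range *false_reg)
{
	switch (op) {
	case TOY_JEQ:
		toy_refine(true_reg, TOY_JEQ, val);
		toy_refine(false_reg, TOY_JNE, val);  /* != on JEQ's false_reg */
		break;
	case TOY_JNE:
		toy_refine(true_reg, TOY_JNE, val);   /* != on JNE's true_reg */
		toy_refine(false_reg, TOY_JEQ, val);
		break;
	}
}
```

Both functions produce the same ranges for the same inputs; the only
difference is where the not-equal refinement is written down.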

Observed results with that reproducer:

  v6.1.91:               REPRO: BAD  (ran=1 error=1)
  v6.7.12:               REPRO: BAD  (ran=1 error=1)
  v6.8:                  REPRO: GOOD (ran=1 error=0)
  v6.1.91 + this series: REPRO: GOOD (ran=1 error=0)

Because this touches shared verifier scalar range logic, I am sending it as
RFC and would appreciate BPF maintainer guidance on whether this 6.1.y
semantic backport should be carried and whether the split in this series is
reasonable. The same issue should also be relevant to 6.6.y, which still
has the older verifier logic and predates the v6.8 fix, but this RFC only
includes the 6.1.y backport.

Zhenzhong Wu (2):
  bpf: drop knowledge-losing __reg_combine_{32,64}_into_{64,32} logic
  bpf: make the verifier tracks the "not equal" for regs

 kernel/bpf/verifier.c | 92 +++++++++++++++++++------------------------
 1 file changed, 40 insertions(+), 52 deletions(-)

base-commit: 228da13e907e2b46b7222cfc35290fbfad920bef
-- 
2.43.0

Re: [RFC PATCH 6.1.y 0/2] bpf: backport scalar not-equal tracking fixes

Posted by Shung-Hsi Yu 5 days, 22 hours ago

Hi Zhenzhong,

Thanks for looking at the stable kernel branch!

Since this patchset is intended for stable 6.1, I'd suggest also Cc'ing
stable@vger.kernel.org even if this is an RFC (ideally with
'PATCH stable ...' as the subject prefix, but that's minor), so that the
stable team is aware.

On Tue, Jun 02, 2026 at 02:03:58AM +0800, Zhenzhong Wu wrote:
> Hi BPF maintainers,
> 
> This RFC backports two BPF verifier scalar range-tracking fixes to 6.1.y.
> The series is intended to fix a verifier state-pruning issue where an
> impossible scalar path can be kept while the real success path is pruned.
> 
> This is a verifier scalar range-tracking issue, not a helper-specific
> issue.
> The visible failure is that the verifier can prune the real success
> continuation, which should not be skipped, and keep only an impossible one.
...

This sounds somewhat similar to the issue fixed in "backport of iterator
and callback handling fixes" for stable 6.6[1] by @Eduard. Could you try
testing on the latest stable 6.6.y as well and see if you can reproduce
the issue there?

Also, per stable policy[2], we have to backport the patches in the series
to 6.6 first if we want them in 6.1 anyway.

  When using option 2 or 3 you can ask for your change to be included in specific
  stable series. When doing so, ensure the fix or an equivalent is applicable,
  submitted, or already present in all newer stable trees still supported. This is
  meant to prevent regressions that users might later encounter on updating...

Cheers,
Shung-Hsi Yu

1: https://lore.kernel.org/stable/20240125001554.25287-1-eddyz87@gmail.com/
2: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html

Re: [RFC PATCH 6.1.y 0/2] bpf: backport scalar not-equal tracking fixes

Posted by Shung-Hsi Yu 5 days, 21 hours ago

On Tue, Jun 02, 2026 at 01:47:01PM +0800, Shung-Hsi Yu wrote:
...
> On Tue, Jun 02, 2026 at 02:03:58AM +0800, Zhenzhong Wu wrote:
> > Hi BPF maintainers,
> > 
> > This RFC backports two BPF verifier scalar range-tracking fixes to 6.1.y.
> > The series is intended to fix a verifier state-pruning issue where an
> > impossible scalar path can be kept while the real success path is pruned.
> > 
> > This is a verifier scalar range-tracking issue, not a helper-specific
> > issue.
> > The visible failure is that the verifier can prune the real success
> > continuation, which should not be skipped, and keep only an impossible one.
> ...
> 
> This sounds somewhat similar to the issue fixed in "backport of iterator
> and callback handling fixes" for stable 6.6[1] by @Eduard. Could you try
> to test on the latest stable 6.6.y as well and see if you can reproduce
> the issue there?
...

My mistake, the reproducer you had doesn't use iterator or callback, so
probably not fixed in stable 6.6. I'll take a better look at this later
this week.

> 1: https://lore.kernel.org/stable/20240125001554.25287-1-eddyz87@gmail.com/
> 2: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html

Re: [RFC PATCH 6.1.y 0/2] bpf: backport scalar not-equal tracking fixes

Posted by Shung-Hsi Yu 5 days, 19 hours ago

On Tue, Jun 02, 2026 at 02:42:35PM +0800, Shung-Hsi Yu wrote:
> On Tue, Jun 02, 2026 at 01:47:01PM +0800, Shung-Hsi Yu wrote:
> ...
> > On Tue, Jun 02, 2026 at 02:03:58AM +0800, Zhenzhong Wu wrote:
> > > Hi BPF maintainers,
> > > 
> > > This RFC backports two BPF verifier scalar range-tracking fixes to 6.1.y.
> > > The series is intended to fix a verifier state-pruning issue where an
> > > impossible scalar path can be kept while the real success path is pruned.
> > > 
> > > This is a verifier scalar range-tracking issue, not a helper-specific
> > > issue.
> > > The visible failure is that the verifier can prune the real success
> > > continuation, which should not be skipped, and keep only an impossible one.
> > ...
> > 
> > This sounds somewhat similar to the issue fixed in "backport of iterator
> > and callback handling fixes" for stable 6.6[1] by @Eduard. Could you try
> > to test on the latest stable 6.6.y as well and see if you can reproduce
> > the issue there?
> ...
> 
> My mistake, the reproducer you had doesn't use iterator or callback, so
> probably not fixed in stable 6.6. I'll take a better look at this later
> this week.

Two more ideas besides testing on the latest stable 6.6.

1. Can you try testing on bpf-next, but with commit d028f87517d6 'bpf:
   make the verifier tracks the "not equal" for regs' reverted? My
   concern is that it is possible that commit d028f87517d6 does not
   address the root cause of incorrect state pruning here.

   If the reproducer _fails_ to reproduce the issue even with commit
   d028f87517d6 reverted, then it is possible that the root cause was
   fixed by another commit further down the line.

2. Have you considered adding your reproducer to the BPF selftests? It
   would be very useful to have in stable (though it needs to land in
   bpf-next first).

> > 1: https://lore.kernel.org/stable/20240125001554.25287-1-eddyz87@gmail.com/
> > 2: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html

Re: [RFC PATCH 6.1.y 0/2] bpf: backport scalar not-equal tracking fixes

Posted by Zhenzhong Wu 5 days, 10 hours ago

Hi Shung-Hsi,

Thanks, that makes sense.

I was mixing up two different things here: the BPF docs say not to add
"Cc: stable@vger.kernel.org" to the patch description as a stable tag, and
instead ask BPF maintainers to queue stable fixes. Cc'ing stable@ in the
email headers for awareness is separate. Thanks for pointing this out.

Thanks also for pointing out the 6.6.y requirement. I'll make sure v2 takes
the stable ordering requirement into account before targeting 6.1.y.

I ran the suggested checks with the same reproducer, where BAD means the
program ran and observed the unexpected error, and GOOD means no error was
observed:

- latest 6.6.y, v6.6.142 (924b4a879cbb): BAD
- bpf-next at b93c55b4932d: GOOD
- bpf-next with the d028f87517d6 JNE refinement reverted: still GOOD

So the issue still reproduces on the latest 6.6.y, but d028f87517d6 alone
does not explain why bpf-next passes. I'll do more narrowing and update the
candidate backport set accordingly.

I'm also happy to add a BPF selftest for this. I plan to send a v2 series
later this week.

BR,
Zhenzhong

Re: [RFC PATCH 6.1.y 0/2] bpf: backport scalar not-equal tracking fixes

Posted by Shung-Hsi Yu 1 day, 19 hours ago

Just want to send out a quick reply after looking at this.

On Wed, Jun 03, 2026 at 01:25:15AM +0800, Zhenzhong Wu wrote:
> Hi Shung-Hsi,
...
> I ran the suggested checks with the same reproducer, where BAD means the
> program ran and observed the unexpected error, and GOOD means no error was
> observed:
> 
> - latest 6.6.y, v6.6.142 (924b4a879cbb): BAD
> - bpf-next at b93c55b4932d: GOOD
> - bpf-next with the d028f87517d6 JNE refinement reverted: still GOOD
> 
> So the issue still reproduces on the latest 6.6.y, but d028f87517d6 alone
> does not explain why bpf-next passes. I'll do more narrowing and update the
> candidate backport set accordingly.
...

I think it possibly comes down to commit 4bf79f9be434e ("bpf: Track
equal scalars history on per-instruction level") added in v6.12. Without
that, the precise mark wasn't propagated (for scalars with the same ID),
and that likely made the state comparison (invalidly) go through.

Shung-Hsi
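
To make the hypothesis above concrete, here is a toy user-space model of
precision marking with and without following scalar ids. It illustrates
the idea only and is not how commit 4bf79f9be434e is implemented: the
toy_* names are hypothetical and the register file is reduced to an array
of (id, precise) pairs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_linked_reg {
	uint32_t id;   /* 0 = not linked to any other register */
	bool precise;
};

/*
 * Mark regs[idx] precise. If follow_links is set, also mark every
 * register sharing its non-zero id. Returns how many were newly marked.
 */
static size_t toy_mark_precise(struct toy_linked_reg *regs, size_t n,
			       size_t idx, bool follow_links)
{
	size_t marked = 0;

	assert(idx < n);
	for (size_t i = 0; i < n; i++) {
		bool linked = follow_links && regs[idx].id != 0 &&
			      regs[i].id == regs[idx].id;

		if ((i == idx || linked) && !regs[i].precise) {
			regs[i].precise = true;
			marked++;
		}
	}
	return marked;
}
```

With two registers sharing id 1, marking the second one without following
links leaves the first one imprecise, so a later state comparison that
skips imprecise scalars would not look at it. Following links marks both.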