From nobody Sun Feb  8 17:37:41 2026
Received: from mail-pl1-f179.google.com (mail-pl1-f179.google.com
 [209.85.214.179])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 77F9E2DE6F8
	for <linux-kernel@vger.kernel.org>; Sat, 17 Jan 2026 10:09:52 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.214.179
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1768644594; cv=none;
 b=SHavZIesm4YOgj822L2DBxKm182mBW02WeLrQG/wSuk7nWdG4A3qy4OUwc69nCcQJEMXn7KCpPxLHQhbDfNMQP4Ehpwuk7ymunHOj6jOt4FCDpwn3tqNtoj63bJXJJf82c/gI5SdttXV2+pIy/yHrFJTVicCjenuZFnyh3XbxEE=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1768644594; c=relaxed/simple;
	bh=ILGfYOXnBd2SedZINJ1vZqXThkv4zHeV+jRFjhqO+ew=;
	h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References:
	 MIME-Version;
 b=WzHaOYF8BnEZl/zkudMatdCuITwXgyReu53YtKyXcDMNe5Uea8O3nOQ4Xe3Bhkta+2gB5cG2VtiNKoc6HuAm4UfY/Qh7yPYqjggW5C9FIQABwH/k6Qx2LikYFJnA63MPLFmjZajCUXGLbGDSaiU9WIYVJzUHmZvWmioAf+utYWQ=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=gmail.com;
 spf=pass smtp.mailfrom=gmail.com;
 dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com
 header.b=U9WWDkZ5; arc=none smtp.client-ip=209.85.214.179
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=gmail.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com
 header.b="U9WWDkZ5"
Received: by mail-pl1-f179.google.com with SMTP id
 d9443c01a7336-2a58f2e514eso18652945ad.3
        for <linux-kernel@vger.kernel.org>;
 Sat, 17 Jan 2026 02:09:52 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20230601; t=1768644592; x=1769249392;
 darn=vger.kernel.org;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:from:to:cc:subject:date
         :message-id:reply-to;
        bh=TpLRIydUf332u62HkHeb5lB6bp3zwgBaw2oGbRiHoKI=;
        b=U9WWDkZ5S8w74Qrs1RiJlnI0PpGl0uDxDfma4zAlLNDJk4UtKga9qOVVPLdqUXXJeW
         KnNmr7C8xeSAquejUfXBK+8uvY68xuJNrJoLTNXnqFUWYsqg33kdFkswEh+0Be6ndOT7
         1xvHwIQ67o9fnZBvpaJWRWMAzatPbGwYcoHHY8Yl6NMvzXsE6lfgtWgnv2Xv9XTwjvk/
         a2H2x80wSUZvAlmjBNjrLk77RobvuUmdAVueN0PSE/o8OSH3QVEEq1t5HlU/0tNA8849
         wQmMzNkvpN6bIrNFJ8tkQAG2z/GL7lErRDifEPde237I/r1jFwBz3rVN0CUOrSSQACKo
         1hUQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1768644592; x=1769249392;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from
         :to:cc:subject:date:message-id:reply-to;
        bh=TpLRIydUf332u62HkHeb5lB6bp3zwgBaw2oGbRiHoKI=;
        b=i+1AiRpgzcfoUrpb1JVze5K7BV7B8J4sbRbfINcAqU+wdXBWC6R1xIUfpSFZ9MKu2R
         1hyfKJYuEiYfU+7fYOv4uBStEm/GD1Wiy1EDHDgtjeIwbEORZr5EyQgUztUwj1W6GUvi
         ZnWQ+FOyB43Anc/ihlv2aTlu9oVyX5oqXrV1HzVdsqL2DNUPVMwmhzh1pVgOd7blptp8
         EZxdi/69VJM+y1/5F3YABwHYf0bw0OR1Rpc/fr+33mobPF57PtWNmwUIcXyVWyYXJVBV
         jtUVZzLvhGD1Rquw+o4s4t41VMXkzBeQaz0KqkS2R+Rc3EFHowzN7DvZhstbHhgSpLwH
         9GMA==
X-Forwarded-Encrypted: i=1;
 AJvYcCUTXkygPeP6QQBt/320jVNlDCGwykoljZXVrfg+DFhhNp7WFRrVwI9oOd4pb7Ds8PSU/HHULd1+5vt7KHw=@vger.kernel.org
X-Gm-Message-State: AOJu0YyOw1cFWjoxgzfbkbsXnCoWjBbTCOvqJLYZkmNsWLstQvcbrXRl
	vON/ckeXLVdjd6sBWYwKt4/lRLzF+N0KDpzMQEmfNwxsQTPVAWF3gD1O
X-Gm-Gg: AY/fxX5ENvxtfjF8iuTC9Lb0S3gGKwxwAs+V06HBkQZlU+iVgI3WYBP1Sj2ZexKHqIE
	PMN5bTl6BaMnpvVEvX7oXdiua0Cun++obNZ5sehdBIL9S0fMxsmXw8Z4ffhFMWBgk4eNKoTAng0
	ZM65Nuv8Bd6jnjW7zdPO0fi53mro9XCApWIBEeHnfSMkdtd4dwENokBu29bS7PswQCUBHLXE2oo
	p6UYq2SydFBAmaKJNq70hmqkGdqGt01FuNR6XmSTK4w5Pi+hx01eChH4rs8RV0G3Tc5Fw4JXFWf
	bJ0Cv73swUgIYJGEHQ8mF6Y6ZJRQs20Ssu9W1SVJPu65eXsqYUMaXCcrwsuhgw7KH7+3r/O2/sM
	b4DJcRAMjwXtjcnV9Q0bh7/Sl/rZXHsDkETOmjWFLuhgQUJz7XMKS+bi26Um53nTGj+4qrl0TuX
	nJgP2LROaUTQJ9QwqVx30aTW0=
X-Received: by 2002:a17:902:d4d2:b0:2a0:acca:f3f0 with SMTP id
 d9443c01a7336-2a71893cddcmr50870455ad.49.1768644591710;
        Sat, 17 Jan 2026 02:09:51 -0800 (PST)
Received: from localhost.localdomain ([138.199.21.245])
        by smtp.gmail.com with ESMTPSA id
 d9443c01a7336-2a7193fab79sm43676805ad.67.2026.01.17.02.09.48
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Sat, 17 Jan 2026 02:09:51 -0800 (PST)
From: Qiliang Yuan <realwujing@gmail.com>
To: eddyz87@gmail.com
Cc: andrii@kernel.org,
	ast@kernel.org,
	bpf@vger.kernel.org,
	daniel@iogearbox.net,
	haoluo@google.com,
	jolsa@kernel.org,
	kpsingh@kernel.org,
	linux-kernel@vger.kernel.org,
	martin.lau@linux.dev,
	realwujing@gmail.com,
	sdf@fomichev.me,
	song@kernel.org,
	yonghong.song@linux.dev,
	yuanql9@chinatelecom.cn
Subject: [PATCH v3] bpf/verifier: optimize precision backtracking by skipping
 precise bits
Date: Sat, 17 Jan 2026 18:09:22 +0800
Message-Id: <20260117100922.38459-1-realwujing@gmail.com>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <aa4cc54a3a0796b16d2d5e13142d104fa5a483e1.camel@gmail.com>
References: <aa4cc54a3a0796b16d2d5e13142d104fa5a483e1.camel@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Optimize __mark_chain_precision() by skipping registers or stack slots
that are already marked as precise. This prevents redundant history
walks when multiple verification paths converge on the same state.

Centralizing this check in __mark_chain_precision() improves efficiency
for all entry points (mark_chain_precision, propagate_precision) and
simplifies call-site logic.

Performance Results (Extreme Stress Test on 32-core system):
Under system-wide saturation (32 parallel veristat instances) using a
high-stress backtracking payload (~290k insns, 34k states), this
optimization demonstrated significant micro-architectural gains:

- Total Retired Instructions:  -82.2 Billion (-1.94%)
- Total CPU Cycles:            -161.3 Billion (-3.11%)
- Avg. Insns per Verify:       -17.2 Million (-2.84%)
- Page Faults:                 -39.90% (Significant reduction in memory pre=
ssure)

The massive reduction in page faults suggests that avoiding redundant
backtracking significantly lowers memory subsystem churn during deep
state history walks.

Verified that total instruction and state counts (per veristat) remain
identical across all tests, confirming logic equivalence.

Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
On Fri, Jan 16, 2026 at 11:27 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> As I said before, this is a useful change.
>=20
> > 4. bpf/verifier: optimize precision backtracking by skipping precise bi=
ts
> >    (https://lore.kernel.org/all/20260115152037.449362-1-realwujing@gmai=
l.com/)
> >    Following your suggestion to refactor the logic into the core engine=
 for
> >    better coverage and clarity, I have provided a v2 version of this pa=
tch here:
> >    (https://lore.kernel.org/all/20260116045839.23743-1-realwujing@gmail=
.com/)
> >    This v2 version specifically addresses your feedback by centralizing=
 the
> >    logic and includes a comprehensive performance comparison (veristat =
results)
> >    in the commit log. It reduces the complexity of redundant backtracki=
ng
> >    requests from O(D) (where D is history depth) to O(1) by utilizing t=
he
> >    'precise' flag to skip already-processed states.
>=20
> Same as with #1: using veristat duration metric, especially for such
> small programs, is not a reasonable performance analysis.

Link: https://lore.kernel.org/all/75807149f7de7a106db0ccda88e5d4439b94a1e7.=
camel@gmail.com/

Hi Eduard,

Acknowledged. To provide a more robust performance analysis, I have moved a=
way
from veristat duration and instead used hardware performance counters (perf=
 stat)
under system-wide saturation with a custom backtracking stress test. This=20
demonstrates the optimization's hardware-level efficiency (retired instruct=
ions=20
and page faults) more reliably.

Best regards,
Qiliang

Test case (backtrack_stress.c):
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} dummy_map SEC(".maps");

SEC("tc")
int backtrack_stress(struct __sk_buff *skb)
{
    __u32 key =3D 0;
    __u64 *val =3D bpf_map_lookup_elem(&dummy_map, &key);
    if (!val) return 0;
    __u64 x =3D *val;
   =20
    /* 1. Create a deep dependency chain to fill history for 'x' */
    x +=3D 1; x *=3D 2; x -=3D 1; x ^=3D 0x55;
    x +=3D 1; x *=3D 2; x -=3D 1; x ^=3D 0xAA;
    x +=3D 1; x *=3D 2; x -=3D 1; x ^=3D 0x55;
    x +=3D 1; x *=3D 2; x -=3D 1; x ^=3D 0xAA;

    /* 2. Create many states via conditional branches */
#define CHECK_X(n) if (x =3D=3D n) { x +=3D 1; } if (x =3D=3D n + 1) { x -=
=3D 1; }
#define CHECK_X10(n)  CHECK_X(n) CHECK_X(n+2) CHECK_X(n+4) CHECK_X(n+6) CHE=
CK_X(n+8) \
                      CHECK_X(n+10) CHECK_X(n+12) CHECK_X(n+14) CHECK_X(n+1=
6) CHECK_X(n+18)
#define CHECK_X100(n) CHECK_X10(n) CHECK_X10(n+20) CHECK_X10(n+40) CHECK_X1=
0(n+60) CHECK_X10(n+80) \
                      CHECK_X10(n+100) CHECK_X10(n+120) CHECK_X10(n+140) CH=
ECK_X10(n+160) CHECK_X10(n+180)

    CHECK_X100(0)
    CHECK_X100(200)
    CHECK_X100(400)
    CHECK_X100(600)
    CHECK_X100(800)
    CHECK_X100(1000)

    /* 3. Trigger mark_chain_precision() multiple times on 'x' */
    #pragma clang loop unroll(full)
    for (int i =3D 0; i < 500; i++) {
        if (x =3D=3D (2000 + i)) {=20
            x +=3D 1;
        }
    }

    return x;
}

char _license[] SEC("license") =3D "GPL";

How to Test:
-----------
1. Compile the BPF program (from kernel root):
   clang -O2 -target bpf \
         -I./tools/testing/selftests/bpf/ \
         -I./tools/lib/ \
         -I./tools/include/uapi/ \
         -I./tools/testing/selftests/bpf/include \
         -c backtrack_stress.c -o backtrack_stress.bpf.o

2. System-wide saturation profiling (32 cores):
   # Start perf in background
   sudo perf stat -a -- sleep 60 &
   # Start 32 parallel loops of veristat
   for i in {1..32}; do (while true; do ./veristat backtrack_stress.bpf.o >=
 /dev/null; done &); done

Raw Performance Data:
---------------------
Baseline (6.19.0-rc5-baseline, git commit 944aacb68baf):
File                    Program           Verdict  Duration (us)   Insns  S=
tates  Program size  Jited size
----------------------  ----------------  -------  -------------  ------  -=
-----  ------------  ----------
backtrack_stress.bpf.o  backtrack_stress  success         197924  289939   =
34331          5437       28809
----------------------  ----------------  -------  -------------  ------  -=
-----  ------------  ----------

         1,388,149      context-switches                 #    722.5 cs/sec =
 cs_per_second    =20
      1,921,399.69 msec cpu-clock                        #     32.0 CPUs  C=
PUs_utilized      =20
            25,113      cpu-migrations                   #     13.1 migrati=
ons/sec  migrations_per_second
         8,108,516      page-faults                      #   4220.1 faults/=
sec  page_faults_per_second
    97,445,724,421      branch-misses                    #      8.1 %  bran=
ch_miss_rate         (50.07%)
   903,852,287,721      branches                         #    470.4 M/sec  =
branch_frequency     (66.76%)
 5,190,519,089,751      cpu-cycles                       #      2.7 GHz  cy=
cles_frequency       (66.81%)
 4,230,500,391,043      instructions                     #      0.8 instruc=
tions  insn_per_cycle  (66.76%)
 1,853,856,616,836      stalled-cycles-frontend          #     0.36 fronten=
d_cycles_idle        (66.52%)

      60.031936126 seconds time elapsed

Patched (6.19.0-rc5-optimized):
File                    Program           Verdict  Duration (us)   Insns  S=
tates  Program size  Jited size
----------------------  ----------------  -------  -------------  ------  -=
-----  ------------  ----------
backtrack_stress.bpf.o  backtrack_stress  success         214600  289939   =
34331          5437       28809
----------------------  ----------------  -------  -------------  ------  -=
-----  ------------  ----------

         1,433,270      context-switches                 #    745.9 cs/sec =
 cs_per_second    =20
      1,921,604.54 msec cpu-clock                        #     32.0 CPUs  C=
PUs_utilized      =20
            22,795      cpu-migrations                   #     11.9 migrati=
ons/sec  migrations_per_second
         4,873,895      page-faults                      #   2536.4 faults/=
sec  page_faults_per_second
    97,038,959,375      branch-misses                    #      8.1 %  bran=
ch_miss_rate         (50.07%)
   890,170,312,491      branches                         #    463.2 M/sec  =
branch_frequency     (66.76%)
 5,029,192,994,167      cpu-cycles                       #      2.6 GHz  cy=
cles_frequency       (66.81%)
 4,148,237,426,723      instructions                     #      0.8 instruc=
tions  insn_per_cycle  (66.77%)
 1,818,457,318,301      stalled-cycles-frontend          #     0.36 fronten=
d_cycles_idle        (66.51%)

      60.032523872 seconds time elapsed

 kernel/bpf/verifier.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3135643d5695..250f1dc0298e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4765,14 +4765,37 @@ static int __mark_chain_precision(struct bpf_verifi=
er_env *env,
 	 * slot, but don't set precise flag in current state, as precision
 	 * tracking in the current state is unnecessary.
 	 */
-	func =3D st->frame[bt->frame];
 	if (regno >=3D 0) {
-		reg =3D &func->regs[regno];
+		reg =3D &st->frame[bt->frame]->regs[regno];
 		if (reg->type !=3D SCALAR_VALUE) {
 			verifier_bug(env, "backtracking misuse");
 			return -EFAULT;
 		}
+		if (reg->precise)
+			return 0;
 		bt_set_reg(bt, regno);
+	} else {
+		for (fr =3D bt->frame; fr >=3D 0; fr--) {
+			u32 reg_mask =3D bt_frame_reg_mask(bt, fr);
+			u64 stack_mask =3D bt_frame_stack_mask(bt, fr);
+			DECLARE_BITMAP(mask, 64);
+
+			func =3D st->frame[fr];
+			if (reg_mask) {
+				bitmap_from_u64(mask, reg_mask);
+				for_each_set_bit(i, mask, 32) {
+					if (func->regs[i].precise)
+						bt_clear_frame_reg(bt, fr, i);
+				}
+			}
+			if (stack_mask) {
+				bitmap_from_u64(mask, stack_mask);
+				for_each_set_bit(i, mask, 64) {
+					if (func->stack[i].spilled_ptr.precise)
+						bt_clear_frame_slot(bt, fr, i);
+				}
+			}
+		}
 	}
=20
 	if (bt_empty(bt))
--=20
2.39.5