From nobody Sun Feb 8 17:37:41 2026 Received: from mail-pl1-f179.google.com (mail-pl1-f179.google.com [209.85.214.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 77F9E2DE6F8 for ; Sat, 17 Jan 2026 10:09:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768644594; cv=none; b=SHavZIesm4YOgj822L2DBxKm182mBW02WeLrQG/wSuk7nWdG4A3qy4OUwc69nCcQJEMXn7KCpPxLHQhbDfNMQP4Ehpwuk7ymunHOj6jOt4FCDpwn3tqNtoj63bJXJJf82c/gI5SdttXV2+pIy/yHrFJTVicCjenuZFnyh3XbxEE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768644594; c=relaxed/simple; bh=ILGfYOXnBd2SedZINJ1vZqXThkv4zHeV+jRFjhqO+ew=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=WzHaOYF8BnEZl/zkudMatdCuITwXgyReu53YtKyXcDMNe5Uea8O3nOQ4Xe3Bhkta+2gB5cG2VtiNKoc6HuAm4UfY/Qh7yPYqjggW5C9FIQABwH/k6Qx2LikYFJnA63MPLFmjZajCUXGLbGDSaiU9WIYVJzUHmZvWmioAf+utYWQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=U9WWDkZ5; arc=none smtp.client-ip=209.85.214.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="U9WWDkZ5" Received: by mail-pl1-f179.google.com with SMTP id d9443c01a7336-2a58f2e514eso18652945ad.3 for ; Sat, 17 Jan 2026 02:09:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1768644592; x=1769249392; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=TpLRIydUf332u62HkHeb5lB6bp3zwgBaw2oGbRiHoKI=; b=U9WWDkZ5S8w74Qrs1RiJlnI0PpGl0uDxDfma4zAlLNDJk4UtKga9qOVVPLdqUXXJeW KnNmr7C8xeSAquejUfXBK+8uvY68xuJNrJoLTNXnqFUWYsqg33kdFkswEh+0Be6ndOT7 1xvHwIQ67o9fnZBvpaJWRWMAzatPbGwYcoHHY8Yl6NMvzXsE6lfgtWgnv2Xv9XTwjvk/ a2H2x80wSUZvAlmjBNjrLk77RobvuUmdAVueN0PSE/o8OSH3QVEEq1t5HlU/0tNA8849 wQmMzNkvpN6bIrNFJ8tkQAG2z/GL7lErRDifEPde237I/r1jFwBz3rVN0CUOrSSQACKo 1hUQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1768644592; x=1769249392; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=TpLRIydUf332u62HkHeb5lB6bp3zwgBaw2oGbRiHoKI=; b=i+1AiRpgzcfoUrpb1JVze5K7BV7B8J4sbRbfINcAqU+wdXBWC6R1xIUfpSFZ9MKu2R 1hyfKJYuEiYfU+7fYOv4uBStEm/GD1Wiy1EDHDgtjeIwbEORZr5EyQgUztUwj1W6GUvi ZnWQ+FOyB43Anc/ihlv2aTlu9oVyX5oqXrV1HzVdsqL2DNUPVMwmhzh1pVgOd7blptp8 EZxdi/69VJM+y1/5F3YABwHYf0bw0OR1Rpc/fr+33mobPF57PtWNmwUIcXyVWyYXJVBV jtUVZzLvhGD1Rquw+o4s4t41VMXkzBeQaz0KqkS2R+Rc3EFHowzN7DvZhstbHhgSpLwH 9GMA== X-Forwarded-Encrypted: i=1; AJvYcCUTXkygPeP6QQBt/320jVNlDCGwykoljZXVrfg+DFhhNp7WFRrVwI9oOd4pb7Ds8PSU/HHULd1+5vt7KHw=@vger.kernel.org X-Gm-Message-State: AOJu0YyOw1cFWjoxgzfbkbsXnCoWjBbTCOvqJLYZkmNsWLstQvcbrXRl vON/ckeXLVdjd6sBWYwKt4/lRLzF+N0KDpzMQEmfNwxsQTPVAWF3gD1O X-Gm-Gg: AY/fxX5ENvxtfjF8iuTC9Lb0S3gGKwxwAs+V06HBkQZlU+iVgI3WYBP1Sj2ZexKHqIE PMN5bTl6BaMnpvVEvX7oXdiua0Cun++obNZ5sehdBIL9S0fMxsmXw8Z4ffhFMWBgk4eNKoTAng0 ZM65Nuv8Bd6jnjW7zdPO0fi53mro9XCApWIBEeHnfSMkdtd4dwENokBu29bS7PswQCUBHLXE2oo p6UYq2SydFBAmaKJNq70hmqkGdqGt01FuNR6XmSTK4w5Pi+hx01eChH4rs8RV0G3Tc5Fw4JXFWf bJ0Cv73swUgIYJGEHQ8mF6Y6ZJRQs20Ssu9W1SVJPu65eXsqYUMaXCcrwsuhgw7KH7+3r/O2/sM b4DJcRAMjwXtjcnV9Q0bh7/Sl/rZXHsDkETOmjWFLuhgQUJz7XMKS+bi26Um53nTGj+4qrl0TuX nJgP2LROaUTQJ9QwqVx30aTW0= X-Received: by 2002:a17:902:d4d2:b0:2a0:acca:f3f0 with SMTP id d9443c01a7336-2a71893cddcmr50870455ad.49.1768644591710; Sat, 17 Jan 2026 02:09:51 -0800 (PST) Received: from localhost.localdomain ([138.199.21.245]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-2a7193fab79sm43676805ad.67.2026.01.17.02.09.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 17 Jan 2026 02:09:51 -0800 (PST) From: Qiliang Yuan To: eddyz87@gmail.com Cc: andrii@kernel.org, ast@kernel.org, bpf@vger.kernel.org, daniel@iogearbox.net, haoluo@google.com, jolsa@kernel.org, kpsingh@kernel.org, linux-kernel@vger.kernel.org, martin.lau@linux.dev, realwujing@gmail.com, sdf@fomichev.me, song@kernel.org, yonghong.song@linux.dev, yuanql9@chinatelecom.cn Subject: [PATCH v3] bpf/verifier: optimize precision backtracking by skipping precise bits Date: Sat, 17 Jan 2026 18:09:22 +0800 Message-Id: <20260117100922.38459-1-realwujing@gmail.com> X-Mailer: git-send-email 2.39.5 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Optimize __mark_chain_precision() by skipping registers or stack slots that are already marked as precise. This prevents redundant history walks when multiple verification paths converge on the same state. Centralizing this check in __mark_chain_precision() improves efficiency for all entry points (mark_chain_precision, propagate_precision) and simplifies call-site logic. Performance Results (Extreme Stress Test on 32-core system): Under system-wide saturation (32 parallel veristat instances) using a high-stress backtracking payload (~290k insns, 34k states), this optimization demonstrated significant micro-architectural gains: - Total Retired Instructions: -82.2 Billion (-1.94%) - Total CPU Cycles: -161.3 Billion (-3.11%) - Avg. Insns per Verify: -17.2 Million (-2.84%) - Page Faults: -39.90% (Significant reduction in memory pre= ssure) The massive reduction in page faults suggests that avoiding redundant backtracking significantly lowers memory subsystem churn during deep state history walks. Verified that total instruction and state counts (per veristat) remain identical across all tests, confirming logic equivalence. Suggested-by: Eduard Zingerman Signed-off-by: Qiliang Yuan --- On Fri, Jan 16, 2026 at 11:27 PM Eduard Zingerman wrote: > As I said before, this is a useful change. >=20 > > 4. bpf/verifier: optimize precision backtracking by skipping precise bi= ts > > (https://lore.kernel.org/all/20260115152037.449362-1-realwujing@gmai= l.com/) > > Following your suggestion to refactor the logic into the core engine= for > > better coverage and clarity, I have provided a v2 version of this pa= tch here: > > (https://lore.kernel.org/all/20260116045839.23743-1-realwujing@gmail= .com/) > > This v2 version specifically addresses your feedback by centralizing= the > > logic and includes a comprehensive performance comparison (veristat = results) > > in the commit log. It reduces the complexity of redundant backtracki= ng > > requests from O(D) (where D is history depth) to O(1) by utilizing t= he > > 'precise' flag to skip already-processed states. >=20 > Same as with #1: using veristat duration metric, especially for such > small programs, is not a reasonable performance analysis. Link: https://lore.kernel.org/all/75807149f7de7a106db0ccda88e5d4439b94a1e7.= camel@gmail.com/ Hi Eduard, Acknowledged. To provide a more robust performance analysis, I have moved a= way from veristat duration and instead used hardware performance counters (perf= stat) under system-wide saturation with a custom backtracking stress test. This=20 demonstrates the optimization's hardware-level efficiency (retired instruct= ions=20 and page faults) more reliably. Best regards, Qiliang Test case (backtrack_stress.c): #include "vmlinux.h" #include struct { __uint(type, BPF_MAP_TYPE_ARRAY); __uint(max_entries, 1); __type(key, __u32); __type(value, __u64); } dummy_map SEC(".maps"); SEC("tc") int backtrack_stress(struct __sk_buff *skb) { __u32 key =3D 0; __u64 *val =3D bpf_map_lookup_elem(&dummy_map, &key); if (!val) return 0; __u64 x =3D *val; =20 /* 1. Create a deep dependency chain to fill history for 'x' */ x +=3D 1; x *=3D 2; x -=3D 1; x ^=3D 0x55; x +=3D 1; x *=3D 2; x -=3D 1; x ^=3D 0xAA; x +=3D 1; x *=3D 2; x -=3D 1; x ^=3D 0x55; x +=3D 1; x *=3D 2; x -=3D 1; x ^=3D 0xAA; /* 2. Create many states via conditional branches */ #define CHECK_X(n) if (x =3D=3D n) { x +=3D 1; } if (x =3D=3D n + 1) { x -= =3D 1; } #define CHECK_X10(n) CHECK_X(n) CHECK_X(n+2) CHECK_X(n+4) CHECK_X(n+6) CHE= CK_X(n+8) \ CHECK_X(n+10) CHECK_X(n+12) CHECK_X(n+14) CHECK_X(n+1= 6) CHECK_X(n+18) #define CHECK_X100(n) CHECK_X10(n) CHECK_X10(n+20) CHECK_X10(n+40) CHECK_X1= 0(n+60) CHECK_X10(n+80) \ CHECK_X10(n+100) CHECK_X10(n+120) CHECK_X10(n+140) CH= ECK_X10(n+160) CHECK_X10(n+180) CHECK_X100(0) CHECK_X100(200) CHECK_X100(400) CHECK_X100(600) CHECK_X100(800) CHECK_X100(1000) /* 3. Trigger mark_chain_precision() multiple times on 'x' */ #pragma clang loop unroll(full) for (int i =3D 0; i < 500; i++) { if (x =3D=3D (2000 + i)) {=20 x +=3D 1; } } return x; } char _license[] SEC("license") =3D "GPL"; How to Test: ----------- 1. Compile the BPF program (from kernel root): clang -O2 -target bpf \ -I./tools/testing/selftests/bpf/ \ -I./tools/lib/ \ -I./tools/include/uapi/ \ -I./tools/testing/selftests/bpf/include \ -c backtrack_stress.c -o backtrack_stress.bpf.o 2. System-wide saturation profiling (32 cores): # Start perf in background sudo perf stat -a -- sleep 60 & # Start 32 parallel loops of veristat for i in {1..32}; do (while true; do ./veristat backtrack_stress.bpf.o >= /dev/null; done &); done Raw Performance Data: --------------------- Baseline (6.19.0-rc5-baseline, git commit 944aacb68baf): File Program Verdict Duration (us) Insns S= tates Program size Jited size ---------------------- ---------------- ------- ------------- ------ -= ----- ------------ ---------- backtrack_stress.bpf.o backtrack_stress success 197924 289939 = 34331 5437 28809 ---------------------- ---------------- ------- ------------- ------ -= ----- ------------ ---------- 1,388,149 context-switches # 722.5 cs/sec = cs_per_second =20 1,921,399.69 msec cpu-clock # 32.0 CPUs C= PUs_utilized =20 25,113 cpu-migrations # 13.1 migrati= ons/sec migrations_per_second 8,108,516 page-faults # 4220.1 faults/= sec page_faults_per_second 97,445,724,421 branch-misses # 8.1 % bran= ch_miss_rate (50.07%) 903,852,287,721 branches # 470.4 M/sec = branch_frequency (66.76%) 5,190,519,089,751 cpu-cycles # 2.7 GHz cy= cles_frequency (66.81%) 4,230,500,391,043 instructions # 0.8 instruc= tions insn_per_cycle (66.76%) 1,853,856,616,836 stalled-cycles-frontend # 0.36 fronten= d_cycles_idle (66.52%) 60.031936126 seconds time elapsed Patched (6.19.0-rc5-optimized): File Program Verdict Duration (us) Insns S= tates Program size Jited size ---------------------- ---------------- ------- ------------- ------ -= ----- ------------ ---------- backtrack_stress.bpf.o backtrack_stress success 214600 289939 = 34331 5437 28809 ---------------------- ---------------- ------- ------------- ------ -= ----- ------------ ---------- 1,433,270 context-switches # 745.9 cs/sec = cs_per_second =20 1,921,604.54 msec cpu-clock # 32.0 CPUs C= PUs_utilized =20 22,795 cpu-migrations # 11.9 migrati= ons/sec migrations_per_second 4,873,895 page-faults # 2536.4 faults/= sec page_faults_per_second 97,038,959,375 branch-misses # 8.1 % bran= ch_miss_rate (50.07%) 890,170,312,491 branches # 463.2 M/sec = branch_frequency (66.76%) 5,029,192,994,167 cpu-cycles # 2.6 GHz cy= cles_frequency (66.81%) 4,148,237,426,723 instructions # 0.8 instruc= tions insn_per_cycle (66.77%) 1,818,457,318,301 stalled-cycles-frontend # 0.36 fronten= d_cycles_idle (66.51%) 60.032523872 seconds time elapsed kernel/bpf/verifier.c | 27 +++++++++++++++++++++++++-- 1 file changed, 25 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 3135643d5695..250f1dc0298e 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -4765,14 +4765,37 @@ static int __mark_chain_precision(struct bpf_verifi= er_env *env, * slot, but don't set precise flag in current state, as precision * tracking in the current state is unnecessary. */ - func =3D st->frame[bt->frame]; if (regno >=3D 0) { - reg =3D &func->regs[regno]; + reg =3D &st->frame[bt->frame]->regs[regno]; if (reg->type !=3D SCALAR_VALUE) { verifier_bug(env, "backtracking misuse"); return -EFAULT; } + if (reg->precise) + return 0; bt_set_reg(bt, regno); + } else { + for (fr =3D bt->frame; fr >=3D 0; fr--) { + u32 reg_mask =3D bt_frame_reg_mask(bt, fr); + u64 stack_mask =3D bt_frame_stack_mask(bt, fr); + DECLARE_BITMAP(mask, 64); + + func =3D st->frame[fr]; + if (reg_mask) { + bitmap_from_u64(mask, reg_mask); + for_each_set_bit(i, mask, 32) { + if (func->regs[i].precise) + bt_clear_frame_reg(bt, fr, i); + } + } + if (stack_mask) { + bitmap_from_u64(mask, stack_mask); + for_each_set_bit(i, mask, 64) { + if (func->stack[i].spilled_ptr.precise) + bt_clear_frame_slot(bt, fr, i); + } + } + } } =20 if (bt_empty(bt)) --=20 2.39.5