From: Zhang Cen
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, John Fastabend, Stanislav Fomichev, Jakub Sitnicki, "David S . Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, zerocling0077@gmail.com, 2045gemini@gmail.com, Zhang Cen, stable@vger.kernel.org
Subject: [PATCH v3] bpf, sockmap: keep sk_msg copy state in sync
Date: Wed, 20 May 2026 18:27:15 +0800
Message-Id: <20260520102715.3033936-1-rollkingzzc@gmail.com>

SK_MSG uses msg->sg.copy as per-scatterlist-entry provenance. Entries
with this bit set are copied before data/data_end are exposed to SK_MSG
BPF programs for direct packet access.

bpf_msg_pull_data(), bpf_msg_push_data(), and bpf_msg_pop_data() rewrite
the sk_msg scatterlist ring by collapsing, splitting, and shifting
entries. These operations move msg->sg.data[] entries, but the parallel
copy bitmap can be left behind on the old slot. A copied entry can then
return to msg->sg.start with its copy bit clear and be exposed as
directly writable packet data.

This corruption path requires an attached SK_MSG BPF program that calls
the mutating helpers; ordinary sockmap/TLS traffic that never runs
push/pop/pull helper sequences is not affected.
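A minimal userspace sketch of the stale-bit problem, assuming a toy ring
where an int stands in for each scatterlist entry and a bool array stands
in for the msg->sg.copy bitmap. struct model_msg, MODEL_NR_SLOTS,
model_next() and model_shift_left_buggy() are hypothetical names, not
kernel APIs; the loop only mirrors the data move done by the pre-patch
sk_msg_shift_left(), which never touches the bitmap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the sk_msg ring; NOT kernel code.
 * data[] stands in for msg->sg.data[], copy[] for msg->sg.copy. */
#define MODEL_NR_SLOTS 8

struct model_msg {
	int data[MODEL_NR_SLOTS];  /* entry id per slot, 0 = empty */
	bool copy[MODEL_NR_SLOTS]; /* true: must be copied before direct access */
	unsigned int start, end;   /* live entries are [start, end) */
};

/* Ring successor, like sk_msg_iter_var_next(). */
static unsigned int model_next(unsigned int i)
{
	return (i + 1) % MODEL_NR_SLOTS;
}

/* Remove the entry at slot i by shifting later entries left.
 * Like the pre-patch sk_msg_shift_left(), only data[] moves: the copy
 * bits stay on their old slots. */
static void model_shift_left_buggy(struct model_msg *m, unsigned int i)
{
	unsigned int prev;

	assert(i < MODEL_NR_SLOTS);
	do {
		prev = i;
		i = model_next(i);
		m->data[prev] = m->data[i];
	} while (i != m->end);
	m->end = (m->end + MODEL_NR_SLOTS - 1) % MODEL_NR_SLOTS;
}
```

With entries {10, 20, 30} where only 20 is marked copy, removing slot 0
moves 20 onto slot 0 (msg->sg.start in this model) while its copy bit
stays behind on slot 1, now attached to entry 30.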
Keep msg->sg.copy synchronized with scatterlist entry moves, preserve the
copy bit when an entry is split, clear it when a helper replaces an entry
with a private page, and clear slots vacated by pull-data compaction.

Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
Fixes: 7246d8ed4dcc ("bpf: helper to pop data from messages")
Cc: stable@vger.kernel.org
Co-developed-by: Han Guidong <2045gemini@gmail.com>
Signed-off-by: Han Guidong <2045gemini@gmail.com>
Signed-off-by: Zhang Cen
Reviewed-by: John Fastabend
---
v3:
- Refactor copy-bit helpers per John Fastabend's review: encapsulate
  scatterlist element moves alongside their copy state into a unified
  helper, and streamline bit-clearing operations.
- Clarify in the commit log that only programs using push/pop/pull are
  affected.
- Note: the additional edge cases reported by Sashiko-bot and the
  addition of BPF selftests will be addressed in a separate follow-up
  patch series to expedite this core fix.

v2:
Address Sashiko-bot's feedback by clearing msg->sg.copy for every entry
consumed by bpf_msg_pull_data() before compacting the scatterlist ring,
preventing stale copy bits on collapsed tail entries.

v1:
While researching recent page cache bugs, we discovered this bug. We
confirmed it allows overwriting the page cache of read-only files via
splice(). We haven't attempted to write an exploit, but the corruption
primitive is verified. PoC available upon request. Recommend fixing ASAP.
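The fix summarized in the commit message can be modeled the same way.
This is a hedged userspace approximation, not the patch itself: all names
are hypothetical. model_move() plays the role of the patch's
sk_msg_sg_move(), model_shift_left() clears the bit on the slot that
becomes the new end (as the patched sk_msg_shift_left() does),
model_split() hands the head's bit to the tail half of a split entry, and
model_replace_private() clears the bit when an entry is swapped for a
private page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, NOT kernel code: data[] mirrors
 * msg->sg.data[] and copy[] mirrors the msg->sg.copy bitmap. */
#define MODEL_NR_SLOTS 8

struct model_msg {
	int data[MODEL_NR_SLOTS];
	bool copy[MODEL_NR_SLOTS];
	unsigned int start, end;
};

static unsigned int model_next(unsigned int i)
{
	return (i + 1) % MODEL_NR_SLOTS;
}

/* Counterpart of sk_msg_sg_move(): the copy bit travels with the entry. */
static void model_move(struct model_msg *m, unsigned int dst, unsigned int src)
{
	assert(dst < MODEL_NR_SLOTS && src < MODEL_NR_SLOTS);
	m->data[dst] = m->data[src];
	m->copy[dst] = m->copy[src];
}

/* Shift left with synced bits, then clear the bit on the new end slot. */
static void model_shift_left(struct model_msg *m, unsigned int i)
{
	unsigned int prev;

	do {
		prev = i;
		i = model_next(i);
		model_move(m, prev, i);
	} while (i != m->end);
	m->end = (m->end + MODEL_NR_SLOTS - 1) % MODEL_NR_SLOTS;
	m->copy[m->end] = false;
}

/* Split entry i into i and the free slot after it; the tail half inherits
 * the head's copy bit. Model only: slot i+1 must be the current end. */
static void model_split(struct model_msg *m, unsigned int i, int tail_id)
{
	unsigned int n = model_next(i);

	assert(n == m->end);
	m->data[n] = tail_id;
	m->copy[n] = m->copy[i];
	m->end = model_next(m->end);
}

/* A freshly allocated private page is never a shared page: clear the bit. */
static void model_replace_private(struct model_msg *m, unsigned int i, int new_id)
{
	m->data[i] = new_id;
	m->copy[i] = false;
}
```

Rerunning the earlier scenario through model_shift_left() leaves entry 20
on slot 0 with its copy bit still set, and the vacated end slot clean.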
---
 net/core/filter.c | 88 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 83 insertions(+), 5 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 78b548158fb05..1be8fc750a1a1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2650,6 +2650,38 @@ static void sk_msg_reset_curr(struct sk_msg *msg)
 	}
 }
 
+static bool sk_msg_elem_is_copy(const struct sk_msg *msg, u32 i)
+{
+	return test_bit(i, msg->sg.copy);
+}
+
+static void sk_msg_clear_elem_copy(struct sk_msg *msg, u32 i)
+{
+	__clear_bit(i, msg->sg.copy);
+}
+
+static void sk_msg_set_elem_copy(struct sk_msg *msg, u32 i)
+{
+	__set_bit(i, msg->sg.copy);
+}
+
+static void sk_msg_clear_copy_range(struct sk_msg *msg, u32 start, u32 end)
+{
+	while (start != end) {
+		sk_msg_clear_elem_copy(msg, start);
+		sk_msg_iter_var_next(start);
+	}
+}
+
+static void sk_msg_sg_move(struct sk_msg *msg, u32 dst, u32 src)
+{
+	msg->sg.data[dst] = msg->sg.data[src];
+	if (sk_msg_elem_is_copy(msg, src))
+		sk_msg_set_elem_copy(msg, dst);
+	else
+		sk_msg_clear_elem_copy(msg, dst);
+}
+
 static const struct bpf_func_proto bpf_msg_cork_bytes_proto = {
 	.func           = bpf_msg_cork_bytes,
 	.gpl_only       = false,
@@ -2688,7 +2720,7 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
 	 * account for the headroom.
 	 */
 	bytes_sg_total = start - offset + bytes;
-	if (!test_bit(i, msg->sg.copy) && bytes_sg_total <= len)
+	if (!sk_msg_elem_is_copy(msg, i) && bytes_sg_total <= len)
 		goto out;
 
 	/* At this point we need to linearize multiple scatterlist
@@ -2734,6 +2766,7 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
 	} while (i != last_sge);
 
 	sg_set_page(&msg->sg.data[first_sge], page, copy, 0);
+	sk_msg_clear_elem_copy(msg, first_sge);
 
 	/* To repair sg ring we need to shift entries. If we only
 	 * had a single entry though we can just replace it and
@@ -2743,8 +2776,14 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
 	shift = last_sge > first_sge ?
 		last_sge - first_sge - 1 :
 		NR_MSG_FRAG_IDS - first_sge + last_sge - 1;
-	if (!shift)
+	if (!shift) {
+		sk_msg_clear_elem_copy(msg, msg->sg.end);
 		goto out;
+	}
+
+	i = first_sge;
+	sk_msg_iter_var_next(i);
+	sk_msg_clear_copy_range(msg, i, last_sge);
 
 	i = first_sge;
 	sk_msg_iter_var_next(i);
@@ -2758,16 +2797,18 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
 		if (move_from == msg->sg.end)
 			break;
 
-		msg->sg.data[i] = msg->sg.data[move_from];
+		sk_msg_sg_move(msg, i, move_from);
 		msg->sg.data[move_from].length = 0;
 		msg->sg.data[move_from].page_link = 0;
 		msg->sg.data[move_from].offset = 0;
+		sk_msg_clear_elem_copy(msg, move_from);
 		sk_msg_iter_var_next(i);
 	} while (1);
 
 	msg->sg.end = msg->sg.end - shift > msg->sg.end ?
 		      msg->sg.end - shift + NR_MSG_FRAG_IDS :
 		      msg->sg.end - shift;
+	sk_msg_clear_elem_copy(msg, msg->sg.end);
 out:
 	sk_msg_reset_curr(msg);
 	msg->data = sg_virt(&msg->sg.data[first_sge]) + start - offset;
@@ -2790,6 +2831,8 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 {
 	struct scatterlist sge, nsge, nnsge, rsge = {0}, *psge;
 	u32 new, i = 0, l = 0, space, copy = 0, offset = 0;
+	bool sge_copy = false, nsge_copy = false, nnsge_copy = false;
+	bool rsge_copy = false;
 	u8 *raw, *to, *from;
 	struct page *page;
 
@@ -2862,6 +2905,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 	sk_msg_iter_var_prev(i);
 	psge = sk_msg_elem(msg, i);
 	rsge = sk_msg_elem_cpy(msg, i);
+	rsge_copy = sk_msg_elem_is_copy(msg, i);
 
 	psge->length = start - offset;
 	rsge.length -= psge->length;
@@ -2887,23 +2931,34 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 	/* Shift one or two slots as needed */
 	sge = sk_msg_elem_cpy(msg, new);
 	sg_unmark_end(&sge);
+	sge_copy = sk_msg_elem_is_copy(msg, new);
 
 	nsge = sk_msg_elem_cpy(msg, i);
+	nsge_copy = sk_msg_elem_is_copy(msg, i);
 	if (rsge.length) {
 		sk_msg_iter_var_next(i);
 		nnsge = sk_msg_elem_cpy(msg, i);
+		nnsge_copy = sk_msg_elem_is_copy(msg, i);
 		sk_msg_iter_next(msg, end);
 	}
 
 	while (i != msg->sg.end) {
 		msg->sg.data[i] = sge;
+		if (sge_copy)
+			sk_msg_set_elem_copy(msg, i);
+		else
+			sk_msg_clear_elem_copy(msg, i);
 		sge = nsge;
+		sge_copy = nsge_copy;
 		sk_msg_iter_var_next(i);
 		if (rsge.length) {
 			nsge = nnsge;
+			nsge_copy = nnsge_copy;
 			nnsge = sk_msg_elem_cpy(msg, i);
+			nnsge_copy = sk_msg_elem_is_copy(msg, i);
 		} else {
 			nsge = sk_msg_elem_cpy(msg, i);
+			nsge_copy = sk_msg_elem_is_copy(msg, i);
 		}
 	}
 
@@ -2911,13 +2966,18 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 	/* Place newly allocated data buffer */
 	sk_mem_charge(msg->sk, len);
 	msg->sg.size += len;
-	__clear_bit(new, msg->sg.copy);
+	sk_msg_clear_elem_copy(msg, new);
 	sg_set_page(&msg->sg.data[new], page, len + copy, 0);
 	if (rsge.length) {
 		get_page(sg_page(&rsge));
 		sk_msg_iter_var_next(new);
 		msg->sg.data[new] = rsge;
+		if (rsge_copy)
+			sk_msg_set_elem_copy(msg, new);
+		else
+			sk_msg_clear_elem_copy(msg, new);
 	}
+	sk_msg_clear_elem_copy(msg, msg->sg.end);
 
 	sk_msg_reset_curr(msg);
 	sk_msg_compute_data_pointers(msg);
@@ -2943,27 +3003,38 @@ static void sk_msg_shift_left(struct sk_msg *msg, int i)
 	do {
 		prev = i;
 		sk_msg_iter_var_next(i);
-		msg->sg.data[prev] = msg->sg.data[i];
+		sk_msg_sg_move(msg, prev, i);
 	} while (i != msg->sg.end);
 
 	sk_msg_iter_prev(msg, end);
+	sk_msg_clear_elem_copy(msg, msg->sg.end);
 }
 
 static void sk_msg_shift_right(struct sk_msg *msg, int i)
 {
 	struct scatterlist tmp, sge;
+	bool tmp_copy, sge_copy;
 
 	sk_msg_iter_next(msg, end);
 	sge = sk_msg_elem_cpy(msg, i);
+	sge_copy = sk_msg_elem_is_copy(msg, i);
 	sk_msg_iter_var_next(i);
 	tmp = sk_msg_elem_cpy(msg, i);
+	tmp_copy = sk_msg_elem_is_copy(msg, i);
 
 	while (i != msg->sg.end) {
 		msg->sg.data[i] = sge;
+		if (sge_copy)
+			sk_msg_set_elem_copy(msg, i);
+		else
+			sk_msg_clear_elem_copy(msg, i);
 		sk_msg_iter_var_next(i);
 		sge = tmp;
+		sge_copy = tmp_copy;
 		tmp = sk_msg_elem_cpy(msg, i);
+		tmp_copy = sk_msg_elem_is_copy(msg, i);
 	}
+	sk_msg_clear_elem_copy(msg, msg->sg.end);
 }
 
 BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
@@ -3020,8 +3091,10 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 	 */
 	if (start != offset) {
 		struct scatterlist *nsge, *sge = sk_msg_elem(msg, i);
+		u32 sge_idx = i;
 		int a = start - offset;
 		int b = sge->length - pop - a;
+		bool sge_copy = sk_msg_elem_is_copy(msg, sge_idx);
 
 		sk_msg_iter_var_next(i);
 
@@ -3034,6 +3107,10 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 				sg_set_page(nsge,
 					    sg_page(sge),
 					    b, sge->offset + pop + a);
+				if (sge_copy)
+					sk_msg_set_elem_copy(msg, i);
+				else
+					sk_msg_clear_elem_copy(msg, i);
 			} else {
 				struct page *page, *orig;
 				u8 *to, *from;
@@ -3050,6 +3127,7 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 				memcpy(to, from, a);
 				memcpy(to + a, from + a + pop, b);
 				sg_set_page(sge, page, a + b, 0);
+				sk_msg_clear_elem_copy(msg, sge_idx);
 				put_page(orig);
 			}
 			pop = 0;
-- 
2.43.0
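The patched sk_msg_shift_right() (and the similar loop in
bpf_msg_push_data()) carries a bool next to each saved scatterlist copy
so that provenance shifts together with the data. A hypothetical
userspace model of that two-temporary carry, including a wrap past the
end of the ring and the final clear of the new end slot; struct
model_msg and the model_* helpers are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, NOT kernel code. */
#define MODEL_NR_SLOTS 8

struct model_msg {
	int data[MODEL_NR_SLOTS];  /* stands in for msg->sg.data[] */
	bool copy[MODEL_NR_SLOTS]; /* stands in for msg->sg.copy */
	unsigned int start, end;
};

static unsigned int model_next(unsigned int i)
{
	return (i + 1) % MODEL_NR_SLOTS;
}

/* Open a hole after slot i: every entry from i onward moves one slot
 * right, and its copy flag is carried in sge_copy/tmp_copy alongside it.
 * Slot i keeps a duplicate that the caller is expected to overwrite. */
static void model_shift_right(struct model_msg *m, unsigned int i)
{
	int sge, tmp;
	bool sge_copy, tmp_copy;

	assert(i < MODEL_NR_SLOTS);
	m->end = model_next(m->end);
	sge = m->data[i];
	sge_copy = m->copy[i];
	i = model_next(i);
	tmp = m->data[i];
	tmp_copy = m->copy[i];

	while (i != m->end) {
		m->data[i] = sge;
		m->copy[i] = sge_copy;
		i = model_next(i);
		sge = tmp;
		sge_copy = tmp_copy;
		tmp = m->data[i];
		tmp_copy = m->copy[i];
	}
	/* Like the patch: never leave a stale bit on the end slot. */
	m->copy[m->end] = false;
}
```

Carrying the flag in a local next to each saved entry, rather than
re-reading the bitmap after the slot has been overwritten, is what keeps
each bit attached to the entry it describes.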