From: Zhang Cen
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
    Martin KaFai Lau, John Fastabend, Stanislav Fomichev, Jakub Sitnicki
Cc: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    Simon Horman, bpf@vger.kernel.org, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org, zerocling0077@gmail.com,
    2045gemini@gmail.com, Zhang Cen, stable@vger.kernel.org
Subject: [PATCH] bpf, sockmap: keep sk_msg copy state in sync
Date: Sun, 17 May 2026 00:43:19 +0800
Message-Id: <20260516164319.1519418-1-rollkingzzc@gmail.com>
X-Mailer: git-send-email 2.34.1
X-Mailing-List: linux-kernel@vger.kernel.org

SK_MSG helpers use msg->sg.copy as provenance for scatterlist entries
that still refer to external or shared pages and must not be exposed
through data/data_end.

bpf_msg_pull_data(), bpf_msg_push_data() and bpf_msg_pop_data() rewrite
the scatterlist ring by compacting, splitting and shifting entries.
Those updates move msg->sg.data[] slots around, but leave the parallel
copy bitmap behind. A later helper sequence can then move an external
entry back to msg->sg.start with its copy bit cleared and make
sk_msg_compute_data_pointers() treat it as directly writable packet
data.

Keep msg->sg.copy synchronized with every scatterlist move, preserve
the bit for split tail entries, and clear it whenever a helper replaces
an entry with a freshly allocated private page.

Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
Fixes: 7246d8ed4dcc ("bpf: helper to pop data from messages")
Cc: stable@vger.kernel.org
Co-developed-by: Han Guidong <2045gemini@gmail.com>
Signed-off-by: Han Guidong <2045gemini@gmail.com>
Signed-off-by: Zhang Cen
---
While researching recent page cache bugs, we discovered this bug. We
confirmed it allows overwriting the page cache of read-only files via
splice(). We haven't attempted to write an exploit, but the corruption
primitive is verified. PoC available upon request. Recommend fixing
ASAP.
---
 net/core/filter.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 9590877b0714f..352233da29429 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2654,6 +2654,19 @@ static void sk_msg_reset_curr(struct sk_msg *msg)
 	}
 }
 
+static bool sk_msg_elem_is_copy(const struct sk_msg *msg, u32 i)
+{
+	return test_bit(i, msg->sg.copy);
+}
+
+static void sk_msg_set_elem_copy(struct sk_msg *msg, u32 i, bool copy)
+{
+	if (copy)
+		__set_bit(i, msg->sg.copy);
+	else
+		__clear_bit(i, msg->sg.copy);
+}
+
 static const struct bpf_func_proto bpf_msg_cork_bytes_proto = {
 	.func           = bpf_msg_cork_bytes,
 	.gpl_only       = false,
@@ -2738,6 +2751,8 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
 	} while (i != last_sge);
 
 	sg_set_page(&msg->sg.data[first_sge], page, copy, 0);
+	sk_msg_set_elem_copy(msg, first_sge, false);
+	sk_msg_set_elem_copy(msg, msg->sg.end, false);
 
 	/* To repair sg ring we need to shift entries. If we only
 	 * had a single entry though we can just replace it and
@@ -2754,6 +2769,7 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
 	sk_msg_iter_var_next(i);
 	do {
 		u32 move_from;
+		bool move_copy;
 
 		if (i + shift >= NR_MSG_FRAG_IDS)
 			move_from = i + shift - NR_MSG_FRAG_IDS;
@@ -2762,10 +2778,13 @@ BPF_CALL_4(bpf_msg_pull_data, struct sk_msg *, msg, u32, start,
 		if (move_from == msg->sg.end)
 			break;
 
+		move_copy = sk_msg_elem_is_copy(msg, move_from);
 		msg->sg.data[i] = msg->sg.data[move_from];
+		sk_msg_set_elem_copy(msg, i, move_copy);
 		msg->sg.data[move_from].length = 0;
 		msg->sg.data[move_from].page_link = 0;
 		msg->sg.data[move_from].offset = 0;
+		sk_msg_set_elem_copy(msg, move_from, false);
 		sk_msg_iter_var_next(i);
 	} while (1);
 
@@ -2794,6 +2813,8 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 {
 	struct scatterlist sge, nsge, nnsge, rsge = {0}, *psge;
 	u32 new, i = 0, l = 0, space, copy = 0, offset = 0;
+	bool sge_copy = false, nsge_copy = false, nnsge_copy = false;
+	bool rsge_copy = false;
 	u8 *raw, *to, *from;
 	struct page *page;
 
@@ -2866,6 +2887,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 			sk_msg_iter_var_prev(i);
 		psge = sk_msg_elem(msg, i);
 		rsge = sk_msg_elem_cpy(msg, i);
+		rsge_copy = sk_msg_elem_is_copy(msg, i);
 
 		psge->length = start - offset;
 		rsge.length -= psge->length;
@@ -2891,23 +2913,31 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 	/* Shift one or two slots as needed */
 	sge = sk_msg_elem_cpy(msg, new);
 	sg_unmark_end(&sge);
+	sge_copy = sk_msg_elem_is_copy(msg, new);
 
 	nsge = sk_msg_elem_cpy(msg, i);
+	nsge_copy = sk_msg_elem_is_copy(msg, i);
 	if (rsge.length) {
 		sk_msg_iter_var_next(i);
 		nnsge = sk_msg_elem_cpy(msg, i);
+		nnsge_copy = sk_msg_elem_is_copy(msg, i);
 		sk_msg_iter_next(msg, end);
 	}
 
 	while (i != msg->sg.end) {
 		msg->sg.data[i] = sge;
+		sk_msg_set_elem_copy(msg, i, sge_copy);
 		sge = nsge;
+		sge_copy = nsge_copy;
 		sk_msg_iter_var_next(i);
 		if (rsge.length) {
 			nsge = nnsge;
+			nsge_copy = nnsge_copy;
 			nnsge = sk_msg_elem_cpy(msg, i);
+			nnsge_copy = sk_msg_elem_is_copy(msg, i);
 		} else {
 			nsge = sk_msg_elem_cpy(msg, i);
+			nsge_copy = sk_msg_elem_is_copy(msg, i);
 		}
 	}
 
@@ -2915,13 +2945,15 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 	/* Place newly allocated data buffer */
 	sk_mem_charge(msg->sk, len);
 	msg->sg.size += len;
-	__clear_bit(new, msg->sg.copy);
+	sk_msg_set_elem_copy(msg, new, false);
 	sg_set_page(&msg->sg.data[new], page, len + copy, 0);
 	if (rsge.length) {
 		get_page(sg_page(&rsge));
 		sk_msg_iter_var_next(new);
 		msg->sg.data[new] = rsge;
+		sk_msg_set_elem_copy(msg, new, rsge_copy);
 	}
+	sk_msg_set_elem_copy(msg, msg->sg.end, false);
 
 	sk_msg_reset_curr(msg);
 	sk_msg_compute_data_pointers(msg);
@@ -2945,29 +2977,41 @@ static void sk_msg_shift_left(struct sk_msg *msg, int i)
 
 	put_page(sg_page(sge));
 	do {
+		bool copy;
+
 		prev = i;
 		sk_msg_iter_var_next(i);
+		copy = sk_msg_elem_is_copy(msg, i);
 		msg->sg.data[prev] = msg->sg.data[i];
+		sk_msg_set_elem_copy(msg, prev, copy);
 	} while (i != msg->sg.end);
 
 	sk_msg_iter_prev(msg, end);
+	sk_msg_set_elem_copy(msg, msg->sg.end, false);
 }
 
 static void sk_msg_shift_right(struct sk_msg *msg, int i)
 {
 	struct scatterlist tmp, sge;
+	bool tmp_copy, sge_copy;
 
 	sk_msg_iter_next(msg, end);
 	sge = sk_msg_elem_cpy(msg, i);
+	sge_copy = sk_msg_elem_is_copy(msg, i);
 	sk_msg_iter_var_next(i);
 	tmp = sk_msg_elem_cpy(msg, i);
+	tmp_copy = sk_msg_elem_is_copy(msg, i);
 
 	while (i != msg->sg.end) {
 		msg->sg.data[i] = sge;
+		sk_msg_set_elem_copy(msg, i, sge_copy);
 		sk_msg_iter_var_next(i);
 		sge = tmp;
+		sge_copy = tmp_copy;
 		tmp = sk_msg_elem_cpy(msg, i);
+		tmp_copy = sk_msg_elem_is_copy(msg, i);
 	}
+	sk_msg_set_elem_copy(msg, msg->sg.end, false);
 }
 
 BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
@@ -3024,8 +3068,10 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 	 */
 	if (start != offset) {
 		struct scatterlist *nsge, *sge = sk_msg_elem(msg, i);
+		u32 sge_idx = i;
 		int a = start - offset;
 		int b = sge->length - pop - a;
+		bool sge_copy = sk_msg_elem_is_copy(msg, sge_idx);
 
 		sk_msg_iter_var_next(i);
 
@@ -3038,6 +3084,7 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 				sg_set_page(nsge,
 					    sg_page(sge),
 					    b, sge->offset + pop + a);
+				sk_msg_set_elem_copy(msg, i, sge_copy);
 			} else {
 				struct page *page, *orig;
 				u8 *to, *from;
@@ -3054,6 +3101,7 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 				memcpy(to, from, a);
 				memcpy(to + a, from + a + pop, b);
 				sg_set_page(sge, page, a + b, 0);
+				sk_msg_set_elem_copy(msg, sge_idx, false);
 				put_page(orig);
 			}
 			pop = 0;
-- 
2.43.0
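
For readers who want to see the desynchronization described in the commit message in isolation, here is a minimal user-space sketch. It is not kernel code and is not part of the patch. Every toy_* name, the ring size TOY_SLOTS, and the page_id field are hypothetical stand-ins. The sketch assumes only what the description states: a ring of entries with a parallel copy bitmap, a left shift that moves entries, and a check on the start slot that decides whether data may be exposed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SLOTS 8u	/* hypothetical ring size, stand-in for NR_MSG_FRAG_IDS */

struct toy_elem {
	uint32_t length;	/* 0 means the slot is unused */
	int page_id;		/* stand-in for the page backing the entry */
};

struct toy_ring {
	struct toy_elem data[TOY_SLOTS];
	uint32_t copy;		/* bit i set: data[i] is external or shared */
	uint32_t start, end;	/* end is one past the last used slot */
};

static uint32_t toy_next(uint32_t i)
{
	return (i + 1) % TOY_SLOTS;
}

static bool toy_is_copy(const struct toy_ring *r, uint32_t i)
{
	return (r->copy >> i) & 1u;
}

static void toy_set_copy(struct toy_ring *r, uint32_t i, bool copy)
{
	if (copy)
		r->copy |= 1u << i;
	else
		r->copy &= ~(1u << i);
}

/* Models the gate on the start slot: only a non-copy entry is exposed. */
static bool toy_start_is_writable(const struct toy_ring *r)
{
	return !toy_is_copy(r, r->start);
}

/* Drop slot i by shifting later entries left; the bitmap is left behind. */
static void toy_shift_left_unsynced(struct toy_ring *r, uint32_t i)
{
	uint32_t prev;

	do {
		prev = i;
		i = toy_next(i);
		r->data[prev] = r->data[i];
	} while (i != r->end);
	r->end = (r->end + TOY_SLOTS - 1) % TOY_SLOTS;
}

/* Same shift, but each copy bit travels with its entry. */
static void toy_shift_left_synced(struct toy_ring *r, uint32_t i)
{
	uint32_t prev;

	do {
		prev = i;
		i = toy_next(i);
		r->data[prev] = r->data[i];
		toy_set_copy(r, prev, toy_is_copy(r, i));
	} while (i != r->end);
	r->end = (r->end + TOY_SLOTS - 1) % TOY_SLOTS;
	toy_set_copy(r, r->end, false);	/* vacated tail slot carries no flag */
}

/* Slot 0 holds a private page (id 1), slot 1 an external page (id 2). */
static struct toy_ring toy_make_ring(void)
{
	struct toy_ring r = {0};

	r.data[0] = (struct toy_elem){ .length = 16, .page_id = 1 };
	r.data[1] = (struct toy_elem){ .length = 32, .page_id = 2 };
	toy_set_copy(&r, 1, true);
	r.start = 0;
	r.end = 2;
	return r;
}
```

Calling toy_shift_left_unsynced(&r, 0) on the ring from toy_make_ring() moves the external entry into slot 0 while bit 0 stays clear, so toy_start_is_writable() reports true for a page that should stay hidden. With toy_shift_left_synced() the bit follows the entry and the start slot stays protected.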