From nobody Tue Feb 10 09:58:02 2026 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 12060191F84; Sat, 18 Oct 2025 04:04:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760760286; cv=none; b=hOZfLJPAxcdYH/CTKnpszmmJY6fafZpx6EgOgrkqTe+hqzKNQYkmkGoTTvpE2lB8BYWuIzkAVxEOAdV52E1VJiUNb0djyFoo8BUsUeopadeCSL2TlNC7SCoMgYprzrih7zeKK7vRcKAN9JgxtcR/hV5vFBN0nEdSdxBttoVHxJs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760760286; c=relaxed/simple; bh=CASKKxocOPmLTgVU6t9PMhMoTEnDMtR+9RUSQmyxZA4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=KZ3/zsmawzzP4T0RiUUDV9xihtejRqufEQO3d5kY+lR/ywBIeWrlhLoB0mZKxqWzb8zE2TXX/8NaiocxNBcghGy9y972jJvMwxhXDt/3AL14Cac3VBuXRfK7aH2CrboGV7FuIftDVKbr3uve1LYofDWjD7x+Jh7WCU3zuJpHdM0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4cpSjc6w0GzYQtjF; Sat, 18 Oct 2025 12:03:48 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.75]) by mail.maildlp.com (Postfix) with ESMTP id 0AF191A0FE9; Sat, 18 Oct 2025 12:04:36 +0800 (CST) Received: from k-arm6401.huawei.com (unknown [7.217.19.243]) by APP2 (Coremail) with SMTP id Syh0CgA32UHPEfNoOeb+Ag--.21556S3; Sat, 18 Oct 2025 12:04:35 +0800 (CST) From: Xu 
Kuohai To: bpf@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Eduard Zingerman , Yonghong Song , Song Liu Subject: [PATCH bpf-next v3 1/3] bpf: Add overwrite mode for BPF ring buffer Date: Sat, 18 Oct 2025 11:57:36 +0800 Message-ID: <20251018035738.4039621-2-xukuohai@huaweicloud.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20251018035738.4039621-1-xukuohai@huaweicloud.com> References: <20251018035738.4039621-1-xukuohai@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: Syh0CgA32UHPEfNoOeb+Ag--.21556S3 X-Coremail-Antispam: 1UD129KBjvAXoWfJrWUGF13tr48XF13AFW3trb_yoW8Cw1kGo WxZa1xuF48Cr1DZrWUG3Z7GF15CryDGF9rJr43uw13CFyDJFZFqry3tFs5W3Z8Xrn8GF1D Cw1DJr1Utrs8Jr1Un29KB7ZKAUJUUUU8529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3 AaLaJ3UjIYCTnIWjp_UUUOs7kC6x804xWl14x267AKxVW5JVWrJwAFc2x0x2IEx4CE42xK 8VAvwI8IcIk0rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_Jr 4l82xGYIkIc2x26xkF7I0E14v26r4j6ryUM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48v e4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI 0_Gr1j6F4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AK xVW0oVCq3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ew Av7VC0I7IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY 6r1j6r4UM4x0Y48IcxkI7VAKI48JM4IIrI8v6xkF7I0E8cxan2IY04v7MxkF7I0En4kS14 v26r1q6r43MxkF7I0Ew4C26cxK6c8Ij28IcwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE 7xkEbVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI 8E67AF67kF1VAFwI0_Jw0_GFylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWU CwCI42IY6xIIjxv20xvEc7CjxVAFwI0_Gr0_Cr1lIxAIcVCF04k26cxKx2IYs7xG6r1j6r 1xMIIF0xvEx4A2jsIE14v26r1j6r4UMIIF0xvEx4A2jsIEc7CjxVAFwI0_Gr0_Gr1UYxBI daVFxhVjvjDU0xZFpf9x07jI0PfUUUUU= 
X-CM-SenderInfo: 50xn30hkdlqx5xdzvxpfor3voofrz/

From: Xu Kuohai

When the BPF ring buffer is full, a new event cannot be recorded until one
or more old events are consumed to make enough space for it. In cases such
as fault diagnostics, where recent events are more useful than older ones,
this mechanism may lead to critical events being lost.

So add overwrite mode for BPF ring buffer to address it. In this mode, the
new event overwrites the oldest event when the buffer is full.

The basic idea is as follows:

1. producer_pos tracks the next position to record new event. When there
   is enough free space, producer_pos is simply advanced by producer to
   make space for the new event.

2. To avoid waiting for consumer when the buffer is full, a new variable,
   overwrite_pos, is introduced for producer. It points to the oldest event
   committed in the buffer. It is advanced by producer to discard one or more
   oldest events to make space for the new event when the buffer is full.

3. pending_pos tracks the oldest event to be committed. pending_pos is never
   passed by producer_pos, so multiple producers never write to the same
   position at the same time.

The following example diagrams show how it works in a 4096-byte ring buffer.

1. At first, {producer,overwrite,pending,consumer}_pos are all set to 0.

    0       512     1024     1536     2048     2560     3072     3584    4096
    +-----------------------------------------------------------------------+
    |                                                                       |
    |                                                                       |
    |                                                                       |
    +-----------------------------------------------------------------------+
    ^
    |
    |
producer_pos = 0
overwrite_pos = 0
pending_pos = 0
consumer_pos = 0

2. Now reserve a 512-byte event A.

   There is enough free space, so A is allocated at offset 0. And producer_pos
   is advanced to 512, the end of A. Since A is not submitted, the BUSY bit is
   set.

    0       512     1024     1536     2048     2560     3072     3584    4096
    +-----------------------------------------------------------------------+
    |        |                                                              |
    |   A    |                                                              |
    | [BUSY] |                                                              |
    +-----------------------------------------------------------------------+
    ^        ^
    |        |
    |        |
    |    producer_pos = 512
    |
overwrite_pos = 0
pending_pos = 0
consumer_pos = 0

3. Reserve event B, size 1024.

   B is allocated at offset 512 with BUSY bit set, and producer_pos is advanced
   to the end of B.

    0       512     1024     1536     2048     2560     3072     3584    4096
    +-----------------------------------------------------------------------+
    |        |                 |                                            |
    |   A    |        B        |                                            |
    | [BUSY] |     [BUSY]      |                                            |
    +-----------------------------------------------------------------------+
    ^                          ^
    |                          |
    |                          |
    |                    producer_pos = 1536
    |
overwrite_pos = 0
pending_pos = 0
consumer_pos = 0

4. Reserve event C, size 2048.

   C is allocated at offset 1536, and producer_pos is advanced to 3584.

    0       512     1024     1536     2048     2560     3072     3584    4096
    +-----------------------------------------------------------------------+
    |        |                 |                                   |        |
    |   A    |        B        |                 C                 |        |
    | [BUSY] |     [BUSY]      |              [BUSY]               |        |
    +-----------------------------------------------------------------------+
    ^                                                              ^
    |                                                              |
    |                                                              |
    |                                                      producer_pos = 3584
    |
overwrite_pos = 0
pending_pos = 0
consumer_pos = 0

5. Submit event A.

   The BUSY bit of A is cleared. B becomes the oldest event to be committed, so
   pending_pos is advanced to 512, the start of B.

    0       512     1024     1536     2048     2560     3072     3584    4096
    +-----------------------------------------------------------------------+
    |        |                 |                                   |        |
    |   A    |        B        |                 C                 |        |
    |        |     [BUSY]      |              [BUSY]               |        |
    +-----------------------------------------------------------------------+
    ^        ^                                                     ^
    |        |                                                     |
    |        |                                                     |
    |    pending_pos = 512                                 producer_pos = 3584
    |
overwrite_pos = 0
consumer_pos = 0

6. Submit event B.

   The BUSY bit of B is cleared, and pending_pos is advanced to the start of C,
   which is now the oldest event to be committed.

    0       512     1024     1536     2048     2560     3072     3584    4096
    +-----------------------------------------------------------------------+
    |        |                 |                                   |        |
    |   A    |        B        |                 C                 |        |
    |        |                 |              [BUSY]               |        |
    +-----------------------------------------------------------------------+
    ^                          ^                                   ^
    |                          |                                   |
    |                          |                                   |
    |                    pending_pos = 1536                producer_pos = 3584
    |
overwrite_pos = 0
consumer_pos = 0

7. Reserve event D, size 1536 (3 * 512).

   There are 2048 bytes not being written between producer_pos (currently 3584)
   and pending_pos, so D is allocated at offset 3584, and producer_pos is advanced
   by 1536 (from 3584 to 5120).

   Since event D will overwrite all bytes of event A and the first 512 bytes of
   event B, overwrite_pos is advanced to the start of event C, the oldest event
   that is not overwritten.

    0       512     1024     1536     2048     2560     3072     3584    4096
    +-----------------------------------------------------------------------+
    |                 |        |                                   |        |
    |      D End      |        |                 C                 | D Begin|
    |     [BUSY]      |        |              [BUSY]               | [BUSY] |
    +-----------------------------------------------------------------------+
    ^                 ^        ^
    |                 |        |
    |                 |    pending_pos = 1536
    |                 |    overwrite_pos = 1536
    |                 |
    |           producer_pos=5120
    |
consumer_pos = 0

8. Reserve event E, size 1024.

   Although there are 512 bytes not being written between producer_pos and
   pending_pos, E cannot be reserved, as it would overwrite the first 512
   bytes of event C, which is still being written.

9. Submit event C and D.

   pending_pos is advanced to the end of D.

    0       512     1024     1536     2048     2560     3072     3584    4096
    +-----------------------------------------------------------------------+
    |                 |        |                                   |        |
    |      D End      |        |                 C                 | D Begin|
    |                 |        |                                   |        |
    +-----------------------------------------------------------------------+
    ^                 ^        ^
    |                 |        |
    |                 |    overwrite_pos = 1536
    |                 |
    |           producer_pos=5120
    |           pending_pos=5120
    |
consumer_pos = 0

The performance data for overwrite mode will be provided in a follow-up
patch that adds overwrite-mode benchmarks.

A sample of performance data for non-overwrite mode, collected on an x86_64
CPU and an arm64 CPU, before and after this patch, is shown below. As we
can see, no obvious performance regression occurs.

- x86_64 (AMD EPYC 9654)

  Before:

  Ringbuf, multi-producer contention
  ==================================
  rb-libbpf nr_prod 1   11.623 ± 0.027M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 2   15.812 ± 0.014M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 3    7.871 ± 0.003M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 4    6.703 ± 0.001M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 8    2.896 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 12   2.054 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 16   1.864 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 20   1.580 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 24   1.484 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 28   1.369 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 32   1.316 ± 0.001M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 36   1.272 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 40   1.239 ± 0.001M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 44   1.226 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 48   1.213 ± 0.001M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 52   1.193 ± 0.001M/s (drops 0.000 ± 0.000M/s)

  After:

  Ringbuf, multi-producer contention
  ==================================
  rb-libbpf nr_prod 1   11.845 ± 0.036M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 2   15.889 ± 0.006M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 3    8.155 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 4    6.708 ± 0.001M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 8    2.918 ± 0.001M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 12   2.065 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 16   1.870 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 20   1.582 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 24   1.482 ± 0.001M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 28   1.372 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 32   1.323 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 36   1.264 ± 0.001M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 40   1.236 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 44   1.209 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 48   1.189 ± 0.001M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 52   1.165 ± 0.002M/s (drops 0.000 ± 0.000M/s)

- arm64 (HiSilicon Kunpeng 920)

  Before:

  Ringbuf, multi-producer contention
  ==================================
  rb-libbpf nr_prod 1   11.310 ± 0.623M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 2    9.947 ± 0.004M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 3    6.634 ± 0.011M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 4    4.502 ± 0.003M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 8    3.888 ± 0.003M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 12   3.372 ± 0.005M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 16   3.189 ± 0.010M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 20   2.998 ± 0.006M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 24   3.086 ± 0.018M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 28   2.845 ± 0.004M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 32   2.815 ± 0.008M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 36   2.771 ± 0.009M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 40   2.814 ± 0.011M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 44   2.752 ± 0.006M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 48   2.695 ± 0.006M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 52   2.710 ± 0.006M/s (drops 0.000 ± 0.000M/s)

  After:

  Ringbuf, multi-producer contention
  ==================================
  rb-libbpf nr_prod 1   11.283 ± 0.550M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 2    9.993 ± 0.003M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 3    6.898 ± 0.006M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 4    5.257 ± 0.001M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 8    3.830 ± 0.005M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 12   3.528 ± 0.013M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 16   3.265 ± 0.018M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 20   2.990 ± 0.007M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 24   2.929 ± 0.014M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 28   2.898 ± 0.010M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 32   2.818 ± 0.006M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 36   2.789 ± 0.012M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 40   2.770 ± 0.006M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 44   2.651 ± 0.007M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 48   2.669 ± 0.005M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 52   2.695 ± 0.009M/s (drops 0.000 ± 0.000M/s)

Signed-off-by: Xu Kuohai
---
 include/uapi/linux/bpf.h       |   4 ++
 kernel/bpf/ringbuf.c           | 109 +++++++++++++++++++++++++++------
 tools/include/uapi/linux/bpf.h |   4 ++
 3 files changed, 98 insertions(+), 19 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6829936d33f5..9fbbbc3dc490 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1430,6 +1430,9 @@ enum {
 
 /* Do not translate kernel bpf_arena pointers to user pointers */
 	BPF_F_NO_USER_CONV = (1U << 18),
+
+/* Enable BPF ringbuf overwrite mode */
+	BPF_F_RB_OVERWRITE = (1U << 19),
 };
 
 /* Flags for BPF_PROG_QUERY. */
@@ -6231,6 +6234,7 @@ enum {
 	BPF_RB_RING_SIZE = 1,
 	BPF_RB_CONS_POS = 2,
 	BPF_RB_PROD_POS = 3,
+	BPF_RB_OVERWRITE_POS = 4,
 };
 
 /* BPF ring buffer constants */
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 719d73299397..821929da778e 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -13,7 +13,7 @@
 #include
 #include
 
-#define RINGBUF_CREATE_FLAG_MASK (BPF_F_NUMA_NODE)
+#define RINGBUF_CREATE_FLAG_MASK (BPF_F_NUMA_NODE | BPF_F_RB_OVERWRITE)
 
 /* non-mmap()'able part of bpf_ringbuf (everything up to consumer page) */
 #define RINGBUF_PGOFF \
@@ -30,6 +30,7 @@ struct bpf_ringbuf {
 	u64 mask;
 	struct page **pages;
 	int nr_pages;
+	bool overwrite_mode;
 	rqspinlock_t spinlock ____cacheline_aligned_in_smp;
 	/* For user-space producer ring buffers, an atomic_t busy bit is used
 	 * to synchronize access to the ring buffers in the kernel, rather than
@@ -72,6 +73,8 @@ struct bpf_ringbuf {
 	 */
 	unsigned long consumer_pos __aligned(PAGE_SIZE);
 	unsigned long producer_pos __aligned(PAGE_SIZE);
+	/* points to the record right after the last overwritten one */
+	unsigned long overwrite_pos;
 	unsigned long pending_pos;
 	char data[] __aligned(PAGE_SIZE);
 };
@@ -166,7 +169,7 @@ static void bpf_ringbuf_notify(struct irq_work *work)
  * considering that the maximum value of data_sz is (4GB - 1), there
  * will be no overflow, so just note the size limit in the comments.
  */
-static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node)
+static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node, bool overwrite_mode)
 {
 	struct bpf_ringbuf *rb;
 
@@ -183,17 +186,25 @@ static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node)
 	rb->consumer_pos = 0;
 	rb->producer_pos = 0;
 	rb->pending_pos = 0;
+	rb->overwrite_mode = overwrite_mode;
 
 	return rb;
 }
 
 static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 {
+	bool overwrite_mode = false;
 	struct bpf_ringbuf_map *rb_map;
 
 	if (attr->map_flags & ~RINGBUF_CREATE_FLAG_MASK)
 		return ERR_PTR(-EINVAL);
 
+	if (attr->map_flags & BPF_F_RB_OVERWRITE) {
+		if (attr->map_type == BPF_MAP_TYPE_USER_RINGBUF)
+			return ERR_PTR(-EINVAL);
+		overwrite_mode = true;
+	}
+
 	if (attr->key_size || attr->value_size ||
 	    !is_power_of_2(attr->max_entries) ||
 	    !PAGE_ALIGNED(attr->max_entries))
@@ -205,7 +216,7 @@ static struct bpf_map *ringbuf_map_alloc(union bpf_attr *attr)
 
 	bpf_map_init_from_attr(&rb_map->map, attr);
 
-	rb_map->rb = bpf_ringbuf_alloc(attr->max_entries, rb_map->map.numa_node);
+	rb_map->rb = bpf_ringbuf_alloc(attr->max_entries, rb_map->map.numa_node, overwrite_mode);
 	if (!rb_map->rb) {
 		bpf_map_area_free(rb_map);
 		return ERR_PTR(-ENOMEM);
@@ -293,13 +304,25 @@ static int ringbuf_map_mmap_user(struct bpf_map *map, struct vm_area_struct *vma
 	return remap_vmalloc_range(vma, rb_map->rb, vma->vm_pgoff + RINGBUF_PGOFF);
 }
 
+/* Return an estimate of the available data in the ring buffer.
+ * Note: the returned value can exceed the actual ring buffer size because the
+ * function is not synchronized with the producer. The producer acquires the
+ * ring buffer's spinlock, but this function does not.
+ */
 static unsigned long ringbuf_avail_data_sz(struct bpf_ringbuf *rb)
 {
-	unsigned long cons_pos, prod_pos;
+	unsigned long cons_pos, prod_pos, over_pos;
 
 	cons_pos = smp_load_acquire(&rb->consumer_pos);
-	prod_pos = smp_load_acquire(&rb->producer_pos);
-	return prod_pos - cons_pos;
+
+	if (unlikely(rb->overwrite_mode)) {
+		over_pos = smp_load_acquire(&rb->overwrite_pos);
+		prod_pos = smp_load_acquire(&rb->producer_pos);
+		return prod_pos - max(cons_pos, over_pos);
+	} else {
+		prod_pos = smp_load_acquire(&rb->producer_pos);
+		return prod_pos - cons_pos;
+	}
 }
 
 static u32 ringbuf_total_data_sz(const struct bpf_ringbuf *rb)
@@ -402,11 +425,41 @@ bpf_ringbuf_restore_from_rec(struct bpf_ringbuf_hdr *hdr)
 	return (void*)((addr & PAGE_MASK) - off);
 }
 
+static bool bpf_ringbuf_has_space(const struct bpf_ringbuf *rb,
+				  unsigned long new_prod_pos,
+				  unsigned long cons_pos,
+				  unsigned long pend_pos)
+{
+	/* no space if oldest not yet committed record until the newest
+	 * record span more than (ringbuf_size - 1).
+	 */
+	if (new_prod_pos - pend_pos > rb->mask)
+		return false;
+
+	/* ok, we have space in overwrite mode */
+	if (unlikely(rb->overwrite_mode))
+		return true;
+
+	/* no space if producer position advances more than (ringbuf_size - 1)
+	 * ahead of consumer position when not in overwrite mode.
+	 */
+	if (new_prod_pos - cons_pos > rb->mask)
+		return false;
+
+	return true;
+}
+
+static u32 bpf_ringbuf_round_up_hdr_len(u32 hdr_len)
+{
+	hdr_len &= ~BPF_RINGBUF_DISCARD_BIT;
+	return round_up(hdr_len + BPF_RINGBUF_HDR_SZ, 8);
+}
+
 static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size)
 {
-	unsigned long cons_pos, prod_pos, new_prod_pos, pend_pos, flags;
+	unsigned long cons_pos, prod_pos, new_prod_pos, pend_pos, over_pos, flags;
 	struct bpf_ringbuf_hdr *hdr;
-	u32 len, pg_off, tmp_size, hdr_len;
+	u32 len, pg_off, hdr_len;
 
 	if (unlikely(size > RINGBUF_MAX_RECORD_SZ))
 		return NULL;
@@ -429,24 +482,40 @@ static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size)
 		hdr_len = READ_ONCE(hdr->len);
 		if (hdr_len & BPF_RINGBUF_BUSY_BIT)
 			break;
-		tmp_size = hdr_len & ~BPF_RINGBUF_DISCARD_BIT;
-		tmp_size = round_up(tmp_size + BPF_RINGBUF_HDR_SZ, 8);
-		pend_pos += tmp_size;
+		pend_pos += bpf_ringbuf_round_up_hdr_len(hdr_len);
 	}
 	rb->pending_pos = pend_pos;
 
-	/* check for out of ringbuf space:
-	 * - by ensuring producer position doesn't advance more than
-	 *   (ringbuf_size - 1) ahead
-	 * - by ensuring oldest not yet committed record until newest
-	 *   record does not span more than (ringbuf_size - 1)
-	 */
-	if (new_prod_pos - cons_pos > rb->mask ||
-	    new_prod_pos - pend_pos > rb->mask) {
+	if (!bpf_ringbuf_has_space(rb, new_prod_pos, cons_pos, pend_pos)) {
 		raw_res_spin_unlock_irqrestore(&rb->spinlock, flags);
 		return NULL;
 	}
 
+	/* In overwrite mode, advance overwrite_pos when the ring buffer is full.
+	 * The key points are to stay on record boundaries and consume enough records
+	 * to fit the new one.
+	 */
+	if (unlikely(rb->overwrite_mode)) {
+		over_pos = rb->overwrite_pos;
+		while (new_prod_pos - over_pos > rb->mask) {
+			hdr = (void *)rb->data + (over_pos & rb->mask);
+			hdr_len = READ_ONCE(hdr->len);
+			/* The bpf_ringbuf_has_space() check above ensures we won’t
+			 * step over a record currently being worked on by another
+			 * producer.
+			 */
+			over_pos += bpf_ringbuf_round_up_hdr_len(hdr_len);
+		}
+		/* smp_store_release(&rb->producer_pos, new_prod_pos) at
+		 * the end of the function ensures that when consumer sees
+		 * the updated rb->producer_pos, it always sees the updated
+		 * rb->overwrite_pos, so when consumer reads overwrite_pos
+		 * after smp_load_acquire(r->producer_pos), the overwrite_pos
+		 * will always be valid.
+		 */
+		WRITE_ONCE(rb->overwrite_pos, over_pos);
+	}
+
 	hdr = (void *)rb->data + (prod_pos & rb->mask);
 	pg_off = bpf_ringbuf_rec_pg_off(rb, hdr);
 	hdr->len = size | BPF_RINGBUF_BUSY_BIT;
@@ -576,6 +645,8 @@ BPF_CALL_2(bpf_ringbuf_query, struct bpf_map *, map, u64, flags)
 		return smp_load_acquire(&rb->consumer_pos);
 	case BPF_RB_PROD_POS:
 		return smp_load_acquire(&rb->producer_pos);
+	case BPF_RB_OVERWRITE_POS:
+		return smp_load_acquire(&rb->overwrite_pos);
 	default:
 		return 0;
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 6829936d33f5..9fbbbc3dc490 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1430,6 +1430,9 @@ enum {
 
 /* Do not translate kernel bpf_arena pointers to user pointers */
 	BPF_F_NO_USER_CONV = (1U << 18),
+
+/* Enable BPF ringbuf overwrite mode */
+	BPF_F_RB_OVERWRITE = (1U << 19),
 };
 
 /* Flags for BPF_PROG_QUERY. */
@@ -6231,6 +6234,7 @@ enum {
 	BPF_RB_RING_SIZE = 1,
 	BPF_RB_CONS_POS = 2,
 	BPF_RB_PROD_POS = 3,
+	BPF_RB_OVERWRITE_POS = 4,
 };
 
 /* BPF ring buffer constants */
-- 
2.43.0

From nobody Tue Feb 10 09:58:02 2026
Received: from k-arm6401.huawei.com (unknown [7.217.19.243]) by APP2 (Coremail) with SMTP id Syh0CgA32UHPEfNoOeb+Ag--.21556S4; Sat, 18 Oct 2025 12:04:36 +0800 (CST) From: Xu Kuohai To: bpf@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Eduard Zingerman , Yonghong Song , Song Liu Subject: [PATCH bpf-next v3 2/3] selftests/bpf: Add overwrite mode test for BPF ring buffer Date: Sat, 18 Oct 2025 11:57:37 +0800 Message-ID: <20251018035738.4039621-3-xukuohai@huaweicloud.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20251018035738.4039621-1-xukuohai@huaweicloud.com> References: <20251018035738.4039621-1-xukuohai@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: Syh0CgA32UHPEfNoOeb+Ag--.21556S4 X-Coremail-Antispam: 1UD129KBjvJXoW3Ww18tw4DuF1ruw13uw1ftFb_yoWxuw1kpa yFgryYkryIg3WvgrZ7uFyxZFW8ur4DAw4rKr47Xw18Zr1DCFsxXr1Ikr1UtFn8XrW8Xr1Y k34a9FZ3A3WUKFUanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmab4IE77IF4wAFF20E14v26rWj6s0DM7CY07I20VC2zVCF04k2 6cxKx2IYs7xG6rWj6s0DM7CIcVAFz4kK6r1j6r18M28IrcIa0xkI8VA2jI8067AKxVWUXw A2048vs2IY020Ec7CjxVAFwI0_Xr0E3s1l8cAvFVAK0II2c7xJM28CjxkF64kEwVA0rcxS w2x7M28EF7xvwVC0I7IYx2IY67AKxVWDJVCq3wA2z4x0Y4vE2Ix0cI8IcVCY1x0267AKxV W8Jr0_Cr1UM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv6xkF7I0E14v2 6rxl6s0DM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40Ex7xfMc Ij6xIIjxv20xvE14v26r1j6r18McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_ Jr0_Gr1lF7xvr2IYc2Ij64vIr41lFIxGxcIEc7CjxVA2Y2ka0xkIwI1lc7CjxVAaw2AFwI 0_Jw0_GFylc7CjxVAKzI0EY4vE52x082I5MxAIw28IcxkI7VAKI48JMxC20s026xCaFVCj c4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxVCjr7xvwVAFwI0_JrI_JrWlx4 CE17CEb7AF67AKxVWUtVW8ZwCIc40Y0x0EwIxGrwCI42IY6xIIjxv20xvE14v26r1j6r1x 
MIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8JVWxJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8JVW8JrUvcSsG vfC2KfnxnUUI43ZEXa7IU8go7tUUUUU== X-CM-SenderInfo: 50xn30hkdlqx5xdzvxpfor3voofrz/ Content-Type: text/plain; charset="utf-8" From: Xu Kuohai Add overwrite mode test for BPF ring buffer. The test creates a BPF ring buffer in overwrite mode, then repeatedly reserves and commits records to check if the ring buffer works as expected both before and after overwriting occurs. Signed-off-by: Xu Kuohai --- tools/testing/selftests/bpf/Makefile | 3 +- .../selftests/bpf/prog_tests/ringbuf.c | 64 ++++++++++++ .../bpf/progs/test_ringbuf_overwrite.c | 98 +++++++++++++++++++ 3 files changed, 164 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/bpf/progs/test_ringbuf_overwrit= e.c diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests= /bpf/Makefile index f00587d4ede6..43d133bf514d 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -498,7 +498,8 @@ LINKED_SKELS :=3D test_static_linked.skel.h linked_func= s.skel.h \ =20 LSKELS :=3D fexit_sleep.c trace_printk.c trace_vprintk.c map_ptr_kern.c \ core_kern.c core_kern_overflow.c test_ringbuf.c \ - test_ringbuf_n.c test_ringbuf_map_key.c test_ringbuf_write.c + test_ringbuf_n.c test_ringbuf_map_key.c test_ringbuf_write.c \ + test_ringbuf_overwrite.c =20 LSKELS_SIGNED :=3D fentry_test.c fexit_test.c atomics.c =20 diff --git a/tools/testing/selftests/bpf/prog_tests/ringbuf.c b/tools/testi= ng/selftests/bpf/prog_tests/ringbuf.c index d1e4cb28a72c..5264af1dc768 100644 --- a/tools/testing/selftests/bpf/prog_tests/ringbuf.c +++ b/tools/testing/selftests/bpf/prog_tests/ringbuf.c @@ -17,6 +17,7 @@ #include "test_ringbuf_n.lskel.h" #include "test_ringbuf_map_key.lskel.h" #include "test_ringbuf_write.lskel.h" +#include "test_ringbuf_overwrite.lskel.h" =20 #define EDONE 7777 =20 @@ -497,6 +498,67 @@ static void 
ringbuf_map_key_subtest(void)
 	test_ringbuf_map_key_lskel__destroy(skel_map_key);
 }
 
+static void ringbuf_overwrite_mode_subtest(void)
+{
+	unsigned long size, len1, len2, len3, len4, len5;
+	unsigned long expect_avail_data, expect_prod_pos, expect_over_pos;
+	struct test_ringbuf_overwrite_lskel *skel;
+	int err;
+
+	skel = test_ringbuf_overwrite_lskel__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	size = 0x1000;
+	len1 = 0x800;
+	len2 = 0x400;
+	len3 = size - len1 - len2 - BPF_RINGBUF_HDR_SZ * 3; /* 0x3e8 */
+	len4 = len3 - 8; /* 0x3e0 */
+	len5 = len3; /* retry with len3 */
+
+	skel->maps.ringbuf.max_entries = size;
+	skel->rodata->LEN1 = len1;
+	skel->rodata->LEN2 = len2;
+	skel->rodata->LEN3 = len3;
+	skel->rodata->LEN4 = len4;
+	skel->rodata->LEN5 = len5;
+
+	skel->bss->pid = getpid();
+
+	err = test_ringbuf_overwrite_lskel__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	err = test_ringbuf_overwrite_lskel__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto cleanup;
+
+	syscall(__NR_getpgid);
+
+	ASSERT_EQ(skel->bss->reserve1_fail, 0, "reserve 1");
+	ASSERT_EQ(skel->bss->reserve2_fail, 0, "reserve 2");
+	ASSERT_EQ(skel->bss->reserve3_fail, 1, "reserve 3");
+	ASSERT_EQ(skel->bss->reserve4_fail, 0, "reserve 4");
+	ASSERT_EQ(skel->bss->reserve5_fail, 0, "reserve 5");
+
+	ASSERT_EQ(skel->bss->ring_size, size, "check_ring_size");
+
+	expect_avail_data = len2 + len4 + len5 + 3 * BPF_RINGBUF_HDR_SZ;
+	ASSERT_EQ(skel->bss->avail_data, expect_avail_data, "check_avail_size");
+
+	ASSERT_EQ(skel->bss->cons_pos, 0, "check_cons_pos");
+
+	expect_prod_pos = len1 + len2 + len4 + len5 + 4 * BPF_RINGBUF_HDR_SZ;
+	ASSERT_EQ(skel->bss->prod_pos, expect_prod_pos, "check_prod_pos");
+
+	expect_over_pos = len1 + BPF_RINGBUF_HDR_SZ;
+	ASSERT_EQ(skel->bss->over_pos, expect_over_pos, "check_over_pos");
+
+	test_ringbuf_overwrite_lskel__detach(skel);
+cleanup:
+	test_ringbuf_overwrite_lskel__destroy(skel);
+}
+
 void test_ringbuf(void)
 {
 	if (test__start_subtest("ringbuf"))
@@ -507,4 +569,6 @@ void test_ringbuf(void)
 		ringbuf_map_key_subtest();
 	if (test__start_subtest("ringbuf_write"))
 		ringbuf_write_subtest();
+	if (test__start_subtest("ringbuf_overwrite_mode"))
+		ringbuf_overwrite_mode_subtest();
 }
diff --git a/tools/testing/selftests/bpf/progs/test_ringbuf_overwrite.c b/tools/testing/selftests/bpf/progs/test_ringbuf_overwrite.c
new file mode 100644
index 000000000000..ff4aa67ddacc
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_ringbuf_overwrite.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025. Huawei Technologies Co., Ltd */
+
+#include
+#include
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_RINGBUF);
+	__uint(map_flags, BPF_F_RB_OVERWRITE);
+} ringbuf SEC(".maps");
+
+int pid;
+
+const volatile unsigned long LEN1;
+const volatile unsigned long LEN2;
+const volatile unsigned long LEN3;
+const volatile unsigned long LEN4;
+const volatile unsigned long LEN5;
+
+long reserve1_fail = 0;
+long reserve2_fail = 0;
+long reserve3_fail = 0;
+long reserve4_fail = 0;
+long reserve5_fail = 0;
+
+unsigned long avail_data = 0;
+unsigned long ring_size = 0;
+unsigned long cons_pos = 0;
+unsigned long prod_pos = 0;
+unsigned long over_pos = 0;
+
+SEC("fentry/" SYS_PREFIX "sys_getpgid")
+int test_overwrite_ringbuf(void *ctx)
+{
+	char *rec1, *rec2, *rec3, *rec4, *rec5;
+	int cur_pid = bpf_get_current_pid_tgid() >> 32;
+
+	if (cur_pid != pid)
+		return 0;
+
+	rec1 = bpf_ringbuf_reserve(&ringbuf, LEN1, 0);
+	if (!rec1) {
+		reserve1_fail = 1;
+		return 0;
+	}
+
+	rec2 = bpf_ringbuf_reserve(&ringbuf, LEN2, 0);
+	if (!rec2) {
+		bpf_ringbuf_discard(rec1, 0);
+		reserve2_fail = 1;
+		return 0;
+	}
+
+	rec3 = bpf_ringbuf_reserve(&ringbuf, LEN3, 0);
+	/* expect failure */
+	if (!rec3) {
+		reserve3_fail = 1;
+	} else {
+		bpf_ringbuf_discard(rec1, 0);
+		bpf_ringbuf_discard(rec2, 0);
+		bpf_ringbuf_discard(rec3, 0);
+		return 0;
+	}
+
+	rec4 = bpf_ringbuf_reserve(&ringbuf, LEN4, 0);
+	if (!rec4) {
+		reserve4_fail = 1;
+		bpf_ringbuf_discard(rec1, 0);
+		bpf_ringbuf_discard(rec2, 0);
+		return 0;
+	}
+
+	bpf_ringbuf_submit(rec1, 0);
+	bpf_ringbuf_submit(rec2, 0);
+	bpf_ringbuf_submit(rec4, 0);
+
+	rec5 = bpf_ringbuf_reserve(&ringbuf, LEN5, 0);
+	if (!rec5) {
+		reserve5_fail = 1;
+		return 0;
+	}
+
+	for (int i = 0; i < LEN3; i++)
+		rec5[i] = 0xdd;
+
+	bpf_ringbuf_submit(rec5, 0);
+
+	ring_size = bpf_ringbuf_query(&ringbuf, BPF_RB_RING_SIZE);
+	avail_data = bpf_ringbuf_query(&ringbuf, BPF_RB_AVAIL_DATA);
+	cons_pos = bpf_ringbuf_query(&ringbuf, BPF_RB_CONS_POS);
+	prod_pos = bpf_ringbuf_query(&ringbuf, BPF_RB_PROD_POS);
+	over_pos = bpf_ringbuf_query(&ringbuf, BPF_RB_OVERWRITE_POS);
+
+	return 0;
+}
-- 
2.43.0
From: Xu Kuohai
To: bpf@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Yonghong Song, Song Liu
Subject: [PATCH bpf-next v3 3/3] selftests/bpf/benchs: Add overwrite mode benchmark for BPF ring buffer
Date: Sat, 18 Oct 2025 11:57:38 +0800
Message-ID: <20251018035738.4039621-4-xukuohai@huaweicloud.com>
In-Reply-To: <20251018035738.4039621-1-xukuohai@huaweicloud.com>
References: <20251018035738.4039621-1-xukuohai@huaweicloud.com>

From: Xu Kuohai

Add a --rb-overwrite option to benchmark the BPF ring buffer in overwrite
mode. Since libbpf does not yet support consuming from a ring buffer in
overwrite mode, also add a --rb-bench-producer option to benchmark the
producer directly, without a consumer.

Benchmarks on an x86_64 and an arm64 CPU are shown below for reference.

- AMD EPYC 9654 (x86_64)

Ringbuf, multi-producer contention in overwrite mode, no consumer
=================================================================
rb-prod nr_prod 1    32.180 ± 0.033M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 2    9.617 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 3    8.810 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 4    9.272 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 8    9.173 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 12   3.086 ± 0.032M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 16   2.945 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 20   2.519 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 24   2.545 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 28   2.363 ± 0.024M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 32   2.357 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 36   2.267 ± 0.011M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 40   2.284 ± 0.020M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 44   2.215 ± 0.025M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 48   2.193 ± 0.023M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 52   2.208 ± 0.024M/s (drops 0.000 ± 0.000M/s)

- HiSilicon Kunpeng 920 (arm64)

Ringbuf, multi-producer contention in overwrite mode, no consumer
=================================================================
rb-prod nr_prod 1    14.478 ± 0.006M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 2    21.787 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 3    6.045 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 4    5.352 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 8    4.850 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 12   3.542 ± 0.016M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 16   3.509 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 20   3.171 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 24   3.154 ± 0.014M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 28   2.974 ± 0.015M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 32   3.167 ± 0.014M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 36   2.903 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 40   2.866 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 44   2.914 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 48   2.806 ± 0.012M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 52   2.840 ± 0.012M/s (drops 0.000 ± 0.000M/s)

Signed-off-by: Xu Kuohai
---
 .../selftests/bpf/benchs/bench_ringbufs.c     | 66 +++++++++++++++++--
 .../bpf/benchs/run_bench_ringbufs.sh          |  4 ++
 .../selftests/bpf/progs/ringbuf_bench.c       | 11 ++++
 3 files changed, 75 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/benchs/bench_ringbufs.c b/tools/testing/selftests/bpf/benchs/bench_ringbufs.c
index e1ee979e6acc..212859fb2961 100644
--- a/tools/testing/selftests/bpf/benchs/bench_ringbufs.c
+++ b/tools/testing/selftests/bpf/benchs/bench_ringbufs.c
@@ -19,6 +19,8 @@ static struct {
 	int ringbuf_sz; /* per-ringbuf, in bytes */
 	bool ringbuf_use_output; /* use slower output API */
 	int perfbuf_sz; /* per-CPU size, in pages */
+	bool overwrite;
+	bool bench_producer;
 } args = {
 	.back2back = false,
 	.batch_cnt = 500,
@@ -27,6 +29,8 @@ static struct {
 	.ringbuf_sz = 512 * 1024,
 	.ringbuf_use_output = false,
 	.perfbuf_sz = 128,
+	.overwrite = false,
+	.bench_producer = false,
 };
 
 enum {
@@ -35,6 +39,8 @@ enum {
 	ARG_RB_BATCH_CNT = 2002,
 	ARG_RB_SAMPLED = 2003,
 	ARG_RB_SAMPLE_RATE = 2004,
+	ARG_RB_OVERWRITE = 2005,
+	ARG_RB_BENCH_PRODUCER = 2006,
 };
 
 static const struct argp_option opts[] = {
@@ -43,6 +49,8 @@ static const struct argp_option opts[] = {
 	{ "rb-batch-cnt", ARG_RB_BATCH_CNT, "CNT", 0, "Set BPF-side record batch count"},
 	{ "rb-sampled", ARG_RB_SAMPLED, NULL, 0, "Notification sampling"},
 	{ "rb-sample-rate", ARG_RB_SAMPLE_RATE, "RATE", 0, "Notification sample rate"},
+	{ "rb-overwrite", ARG_RB_OVERWRITE, NULL, 0, "Overwrite mode"},
+	{ "rb-bench-producer", ARG_RB_BENCH_PRODUCER, NULL, 0, "Benchmark producer"},
 	{},
 };
 
@@ -72,6 +80,12 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state)
 			argp_usage(state);
 		}
 		break;
+	case ARG_RB_OVERWRITE:
+		args.overwrite = true;
+		break;
+	case ARG_RB_BENCH_PRODUCER:
+		args.bench_producer = true;
+		break;
 	default:
 		return ARGP_ERR_UNKNOWN;
 	}
@@ -95,8 +109,33 @@ static inline void bufs_trigger_batch(void)
 
 static void bufs_validate(void)
 {
-	if (env.consumer_cnt != 1) {
-		fprintf(stderr, "rb-libbpf benchmark needs one consumer!\n");
+	if (args.bench_producer && strcmp(env.bench_name, "rb-libbpf")) {
+		fprintf(stderr, "--rb-bench-producer only works with rb-libbpf!\n");
+		exit(1);
+	}
+
+	if (args.overwrite && !args.bench_producer) {
+		fprintf(stderr, "overwrite mode only works with --rb-bench-producer for now!\n");
+		exit(1);
+	}
+
+	if (args.bench_producer && env.consumer_cnt != 0) {
+		fprintf(stderr, "no consumer is needed for --rb-bench-producer!\n");
+		exit(1);
+	}
+
+	if (args.bench_producer && args.back2back) {
+		fprintf(stderr, "back-to-back mode makes no sense for --rb-bench-producer!\n");
+		exit(1);
+	}
+
+	if (args.bench_producer && args.sampled) {
+		fprintf(stderr, "sampling mode makes no sense for --rb-bench-producer!\n");
+		exit(1);
+	}
+
+	if (!args.bench_producer && env.consumer_cnt != 1) {
+		fprintf(stderr, "benchmarks without --rb-bench-producer require exactly one consumer!\n");
 		exit(1);
 	}
 
@@ -128,12 +167,17 @@ static void ringbuf_libbpf_measure(struct bench_res *res)
 {
 	struct ringbuf_libbpf_ctx *ctx = &ringbuf_libbpf_ctx;
 
-	res->hits = atomic_swap(&buf_hits.value, 0);
+	if (args.bench_producer)
+		res->hits = atomic_swap(&ctx->skel->bss->hits, 0);
+	else
+		res->hits = atomic_swap(&buf_hits.value, 0);
 	res->drops = atomic_swap(&ctx->skel->bss->dropped, 0);
 }
 
 static struct ringbuf_bench *ringbuf_setup_skeleton(void)
 {
+	__u32 flags;
+	struct bpf_map *ringbuf;
 	struct ringbuf_bench *skel;
 
 	setup_libbpf();
@@ -146,12 +190,19 @@ static struct ringbuf_bench *ringbuf_setup_skeleton(void)
 
 	skel->rodata->batch_cnt = args.batch_cnt;
 	skel->rodata->use_output = args.ringbuf_use_output ? 1 : 0;
+	skel->rodata->bench_producer = args.bench_producer;
 
 	if (args.sampled)
 		/* record data + header take 16 bytes */
 		skel->rodata->wakeup_data_size = args.sample_rate * 16;
 
-	bpf_map__set_max_entries(skel->maps.ringbuf, args.ringbuf_sz);
+	ringbuf = skel->maps.ringbuf;
+	if (args.overwrite) {
+		flags = bpf_map__map_flags(ringbuf) | BPF_F_RB_OVERWRITE;
+		bpf_map__set_map_flags(ringbuf, flags);
+	}
+
+	bpf_map__set_max_entries(ringbuf, args.ringbuf_sz);
 
 	if (ringbuf_bench__load(skel)) {
 		fprintf(stderr, "failed to load skeleton\n");
@@ -171,10 +222,13 @@ static void ringbuf_libbpf_setup(void)
 {
 	struct ringbuf_libbpf_ctx *ctx = &ringbuf_libbpf_ctx;
 	struct bpf_link *link;
+	int map_fd;
 
 	ctx->skel = ringbuf_setup_skeleton();
-	ctx->ringbuf = ring_buffer__new(bpf_map__fd(ctx->skel->maps.ringbuf),
-					buf_process_sample, NULL, NULL);
+
+	map_fd = bpf_map__fd(ctx->skel->maps.ringbuf);
+	ctx->ringbuf = ring_buffer__new(map_fd, buf_process_sample,
+					NULL, NULL);
 	if (!ctx->ringbuf) {
 		fprintf(stderr, "failed to create ringbuf\n");
 		exit(1);
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh b/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
index 91e3567962ff..83e05e837871 100755
--- a/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
+++ b/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
@@ -49,3 +49,7 @@ for b in 1 2 3 4 8 12 16 20 24 28 32 36 40 44 48 52; do
 	summarize "rb-libbpf nr_prod $b" "$($RUN_RB_BENCH -p$b --rb-batch-cnt 50 rb-libbpf)"
 done
 
+header "Ringbuf, multi-producer contention in overwrite mode, no consumer"
+for b in 1 2 3 4 8 12 16 20 24 28 32 36 40 44 48 52; do
+	summarize "rb-prod nr_prod $b" "$($RUN_BENCH -p$b --rb-batch-cnt 50 --rb-overwrite --rb-bench-producer rb-libbpf)"
+done
diff --git a/tools/testing/selftests/bpf/progs/ringbuf_bench.c b/tools/testing/selftests/bpf/progs/ringbuf_bench.c
index 6a468496f539..d96c7d1e8fc2 100644
--- a/tools/testing/selftests/bpf/progs/ringbuf_bench.c
+++ b/tools/testing/selftests/bpf/progs/ringbuf_bench.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2020 Facebook
 
+#include
 #include
 #include
 #include
@@ -14,9 +15,11 @@ struct {
 
 const volatile int batch_cnt = 0;
 const volatile long use_output = 0;
+const volatile bool bench_producer = false;
 
 long sample_val = 42;
 long dropped __attribute__((aligned(128))) = 0;
+long hits __attribute__((aligned(128))) = 0;
 
 const volatile long wakeup_data_size = 0;
 
@@ -24,6 +27,9 @@ static __always_inline long get_flags()
 {
 	long sz;
 
+	if (bench_producer)
+		return BPF_RB_NO_WAKEUP;
+
 	if (!wakeup_data_size)
 		return 0;
 
@@ -47,6 +53,8 @@ int bench_ringbuf(void *ctx)
 				*sample = sample_val;
 				flags = get_flags();
 				bpf_ringbuf_submit(sample, flags);
+				if (bench_producer)
+					__sync_add_and_fetch(&hits, 1);
 			}
 		}
 	} else {
@@ -55,6 +63,9 @@ int bench_ringbuf(void *ctx)
 			if (bpf_ringbuf_output(&ringbuf, &sample_val,
 					       sizeof(sample_val), flags))
 				__sync_add_and_fetch(&dropped, 1);
+			else if (bench_producer)
+				__sync_add_and_fetch(&hits, 1);
+
 		}
 	}
 	return 0;
-- 
2.43.0
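
For anyone checking the expected values in the ringbuf_overwrite_mode selftest from patch 2/3, the arithmetic can be replayed in plain user-space C. The sketch below is only a model, not the kernel code: struct rb_model, total_sz(), model_reserve(), model_submit_all() and replay_selftest() are hypothetical names, and the reserve rule is an assumption chosen to match what the selftest asserts. Under that rule a reservation fails when it would end more than size - 1 bytes past the oldest unsubmitted record; otherwise the oldest records are dropped until the new one fits. HDR_SZ mirrors BPF_RINGBUF_HDR_SZ (8 bytes).

```c
#include <assert.h>
#include <stdbool.h>

#define HDR_SZ  8UL	/* mirrors BPF_RINGBUF_HDR_SZ */
#define MAX_REC 8	/* capacity of this toy model only */

/* Hypothetical user-space model; not the kernel's ring buffer struct. */
struct rb_model {
	unsigned long size;		/* ring size in bytes, power of two */
	unsigned long prod_pos;		/* end of the newest reserved record */
	unsigned long pend_pos;		/* start of the oldest unsubmitted record */
	unsigned long over_pos;		/* start of the oldest surviving record */
	unsigned long rec_sz[MAX_REC];	/* header + payload of each record */
	int nr_rec;			/* records reserved so far */
	int oldest;			/* index of the record at over_pos */
};

/* Payload rounded up to 8 bytes, plus the record header. */
static unsigned long total_sz(unsigned long len)
{
	return ((len + 7UL) & ~7UL) + HDR_SZ;
}

/* Assumed rule: never wrap onto a record that is still unsubmitted. */
static bool model_reserve(struct rb_model *rb, unsigned long len)
{
	unsigned long new_prod = rb->prod_pos + total_sz(len);

	if (new_prod - rb->pend_pos > rb->size - 1)
		return false;

	assert(rb->nr_rec < MAX_REC);
	/* Drop the oldest submitted records until the new one fits. */
	while (new_prod - rb->over_pos > rb->size - 1)
		rb->over_pos += rb->rec_sz[rb->oldest++];

	rb->rec_sz[rb->nr_rec++] = total_sz(len);
	rb->prod_pos = new_prod;
	return true;
}

/* Treat every record reserved so far as submitted. */
static void model_submit_all(struct rb_model *rb)
{
	rb->pend_pos = rb->prod_pos;
}

/* Replay the five reservations of the selftest; ok[i] is true on success. */
static void replay_selftest(struct rb_model *rb, bool ok[5])
{
	unsigned long size = 0x1000, len1 = 0x800, len2 = 0x400;
	unsigned long len3 = size - len1 - len2 - HDR_SZ * 3;	/* 0x3e8 */
	unsigned long len4 = len3 - 8;				/* 0x3e0 */
	unsigned long len5 = len3;

	*rb = (struct rb_model){ .size = size };
	ok[0] = model_reserve(rb, len1);
	ok[1] = model_reserve(rb, len2);
	ok[2] = model_reserve(rb, len3);	/* no room while 1 and 2 are pending */
	ok[3] = model_reserve(rb, len4);
	model_submit_all(rb);			/* records 1, 2 and 4 */
	ok[4] = model_reserve(rb, len5);	/* overwrites record 1 */
	model_submit_all(rb);
}
```

Calling replay_selftest() on this model leaves reserve 3 as the only failure and ends with prod_pos = 0x13e8, over_pos = 0x808 and prod_pos - over_pos = 0xbe0, which agree with expect_prod_pos, expect_over_pos and expect_avail_data in the selftest.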