From nobody Tue Jun 30 00:47:36 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4EDF1C433EF for ; Fri, 28 Jan 2022 23:51:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351839AbiA1XvT convert rfc822-to-8bit (ORCPT ); Fri, 28 Jan 2022 18:51:19 -0500 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:29482 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344725AbiA1XvQ (ORCPT ); Fri, 28 Jan 2022 18:51:16 -0500 Received: from pps.filterd (m0044012.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.1.2/8.16.1.2) with ESMTP id 20SKhMY6026408 for ; Fri, 28 Jan 2022 15:51:16 -0800 Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3dv7skekt1-5 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 28 Jan 2022 15:51:15 -0800 Received: from twshared22811.39.frc1.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:11d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.21; Fri, 28 Jan 2022 15:51:14 -0800 Received: by devbig006.ftw2.facebook.com (Postfix, from userid 4523) id 2A2AD28E02F07; Fri, 28 Jan 2022 15:45:25 -0800 (PST) From: Song Liu To: , , CC: , , , , , , , Song Liu Subject: [PATCH v7 bpf-next 1/9] x86/Kconfig: select HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP Date: Fri, 28 Jan 2022 15:45:09 -0800 Message-ID: <20220128234517.3503701-2-song@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220128234517.3503701-1-song@kernel.org> References: <20220128234517.3503701-1-song@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-FB-Internal: Safe X-Proofpoint-GUID: 8i2bbdXN5xmP5NoX9gRtsebuBsVSKBkm X-Proofpoint-ORIG-GUID: 8i2bbdXN5xmP5NoX9gRtsebuBsVSKBkm X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.816,Hydra:6.0.425,FMLib:17.11.62.513 definitions=2022-01-28_08,2022-01-28_01,2021-12-02_01 X-Proofpoint-Spam-Details: rule=fb_outbound_notspam policy=fb_outbound score=0 mlxlogscore=910 phishscore=0 lowpriorityscore=0 spamscore=0 bulkscore=0 impostorscore=0 malwarescore=0 priorityscore=1501 clxscore=1034 mlxscore=0 adultscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2201110000 definitions=main-2201280129 X-FB-Internal: deliver Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Song Liu This enables module_alloc() to allocate huge page for 2MB+ requests. To check the difference of this change, we need enable config CONFIG_PTDUMP_DEBUGFS, and call module_alloc(2MB). Before the change, /sys/kernel/debug/page_tables/kernel shows pte for this map. With the change, /sys/kernel/debug/page_tables/ show pmd for thie map. Signed-off-by: Song Liu --- arch/x86/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 8910b09b5601..b5d1582ea848 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -158,6 +158,7 @@ config X86 select HAVE_ALIGNED_STRUCT_PAGE if SLUB select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE + select HAVE_ARCH_HUGE_VMALLOC if HAVE_ARCH_HUGE_VMAP select HAVE_ARCH_JUMP_LABEL select HAVE_ARCH_JUMP_LABEL_RELATIVE select HAVE_ARCH_KASAN if X86_64 --=20 2.30.2 From nobody Tue Jun 30 00:47:36 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BDECDC433FE for ; Fri, 28 Jan 2022 23:51:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351874AbiA1Xvc convert rfc822-to-8bit (ORCPT ); Fri, 28 Jan 2022 18:51:32 -0500 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:26208 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351845AbiA1Xv1 (ORCPT ); Fri, 28 Jan 2022 18:51:27 -0500 Received: from pps.filterd (m0109333.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.1.2/8.16.1.2) with ESMTP id 20SKOkQD018916 for ; Fri, 28 Jan 2022 15:51:27 -0800 Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3dv8umee4f-4 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 28 Jan 2022 15:51:26 -0800 Received: from twshared22811.39.frc1.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:21d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.21; Fri, 28 Jan 2022 15:51:26 -0800 Received: by devbig006.ftw2.facebook.com (Postfix, from userid 4523) id 4A35B28E02F2B; Fri, 28 Jan 2022 15:45:28 -0800 (PST) From: Song Liu To: , , CC: , , , , , , , Song Liu Subject: [PATCH v7 bpf-next 2/9] bpf: use bytes instead of pages for bpf_jit_[charge|uncharge]_modmem Date: Fri, 28 Jan 2022 15:45:10 -0800 Message-ID: <20220128234517.3503701-3-song@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220128234517.3503701-1-song@kernel.org> References: <20220128234517.3503701-1-song@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: 6v4CIzSZjmdPtZ1g9PLp168rQYbnUlHd X-Proofpoint-GUID: 6v4CIzSZjmdPtZ1g9PLp168rQYbnUlHd X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.816,Hydra:6.0.425,FMLib:17.11.62.513 definitions=2022-01-28_08,2022-01-28_01,2021-12-02_01 X-Proofpoint-Spam-Details: rule=fb_outbound_notspam policy=fb_outbound score=0 adultscore=0 mlxscore=0 spamscore=0 bulkscore=0 suspectscore=0 malwarescore=0 mlxlogscore=999 priorityscore=1501 impostorscore=0 lowpriorityscore=0 clxscore=1015 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2201110000 definitions=main-2201280129 X-FB-Internal: deliver Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Song Liu This enables sub-page memory charge and allocation. Signed-off-by: Song Liu --- include/linux/bpf.h | 4 ++-- kernel/bpf/core.c | 17 ++++++++--------- kernel/bpf/trampoline.c | 6 +++--- 3 files changed, 13 insertions(+), 14 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index e8ec8d2f2fe3..28ac752f21ba 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -838,8 +838,8 @@ void bpf_image_ksym_add(void *data, struct bpf_ksym *ks= ym); void bpf_image_ksym_del(struct bpf_ksym *ksym); void bpf_ksym_add(struct bpf_ksym *ksym); void bpf_ksym_del(struct bpf_ksym *ksym); -int bpf_jit_charge_modmem(u32 pages); -void bpf_jit_uncharge_modmem(u32 pages); +int bpf_jit_charge_modmem(u32 size); +void bpf_jit_uncharge_modmem(u32 size); bool bpf_prog_has_trampoline(const struct bpf_prog *prog); #else static inline int bpf_trampoline_link_prog(struct bpf_prog *prog, diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 0a1cfd8544b9..d96ad87f0a2c 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -833,12 +833,11 @@ static int __init bpf_jit_charge_init(void) } pure_initcall(bpf_jit_charge_init); =20 -int bpf_jit_charge_modmem(u32 pages) +int bpf_jit_charge_modmem(u32 size) { - if (atomic_long_add_return(pages, &bpf_jit_current) > - (bpf_jit_limit >> PAGE_SHIFT)) { + if (atomic_long_add_return(size, &bpf_jit_current) > bpf_jit_limit) { if (!bpf_capable()) { - atomic_long_sub(pages, &bpf_jit_current); + atomic_long_sub(size, &bpf_jit_current); return -EPERM; } } @@ -846,9 +845,9 @@ int bpf_jit_charge_modmem(u32 pages) return 0; } =20 -void bpf_jit_uncharge_modmem(u32 pages) +void bpf_jit_uncharge_modmem(u32 size) { - atomic_long_sub(pages, &bpf_jit_current); + atomic_long_sub(size, &bpf_jit_current); } =20 void *__weak bpf_jit_alloc_exec(unsigned long size) @@ -879,11 +878,11 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image= _ptr, size =3D round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE); pages =3D size / PAGE_SIZE; =20 - if (bpf_jit_charge_modmem(pages)) + if (bpf_jit_charge_modmem(size)) return NULL; hdr =3D bpf_jit_alloc_exec(size); if (!hdr) { - bpf_jit_uncharge_modmem(pages); + bpf_jit_uncharge_modmem(size); return NULL; } =20 @@ -906,7 +905,7 @@ void bpf_jit_binary_free(struct bpf_binary_header *hdr) u32 pages =3D hdr->pages; =20 bpf_jit_free_exec(hdr); - bpf_jit_uncharge_modmem(pages); + bpf_jit_uncharge_modmem(pages << PAGE_SHIFT); } =20 /* This symbol is only overridden by archs that have different diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index 4b6974a195c1..e76a488c09c3 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -213,7 +213,7 @@ static void __bpf_tramp_image_put_deferred(struct work_= struct *work) im =3D container_of(work, struct bpf_tramp_image, work); bpf_image_ksym_del(&im->ksym); bpf_jit_free_exec(im->image); - bpf_jit_uncharge_modmem(1); + bpf_jit_uncharge_modmem(PAGE_SIZE); percpu_ref_exit(&im->pcref); kfree_rcu(im, rcu); } @@ -310,7 +310,7 @@ static struct bpf_tramp_image *bpf_tramp_image_alloc(u6= 4 key, u32 idx) if (!im) goto out; =20 - err =3D bpf_jit_charge_modmem(1); + err =3D bpf_jit_charge_modmem(PAGE_SIZE); if (err) goto out_free_im; =20 @@ -332,7 +332,7 @@ static struct bpf_tramp_image *bpf_tramp_image_alloc(u6= 4 key, u32 idx) out_free_image: bpf_jit_free_exec(im->image); out_uncharge: - bpf_jit_uncharge_modmem(1); + bpf_jit_uncharge_modmem(PAGE_SIZE); out_free_im: kfree(im); out: --=20 2.30.2 From nobody Tue Jun 30 00:47:36 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 18966C433EF for ; Fri, 28 Jan 2022 23:51:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351844AbiA1XvY convert rfc822-to-8bit (ORCPT ); Fri, 28 Jan 2022 18:51:24 -0500 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:29818 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1344665AbiA1XvR (ORCPT ); Fri, 28 Jan 2022 18:51:17 -0500 Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.16.1.2/8.16.1.2) with ESMTP id 20SKoIXU002143 for ; Fri, 28 Jan 2022 15:51:17 -0800 Received: from mail.thefacebook.com ([163.114.132.120]) by m0001303.ppops.net (PPS) with ESMTPS id 3dv695y14m-4 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 28 Jan 2022 15:51:16 -0800 Received: from twshared11487.23.frc3.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:11d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.21; Fri, 28 Jan 2022 15:51:14 -0800 Received: by devbig006.ftw2.facebook.com (Postfix, from userid 4523) id 2C7CA28E02F80; Fri, 28 Jan 2022 15:45:30 -0800 (PST) From: Song Liu To: , , CC: , , , , , , , Song Liu Subject: [PATCH v7 bpf-next 3/9] bpf: use size instead of pages in bpf_binary_header Date: Fri, 28 Jan 2022 15:45:11 -0800 Message-ID: <20220128234517.3503701-4-song@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220128234517.3503701-1-song@kernel.org> References: <20220128234517.3503701-1-song@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: Sh2rDQ0ic04P_HBsqjpbWx2ouVE77Pin X-Proofpoint-GUID: Sh2rDQ0ic04P_HBsqjpbWx2ouVE77Pin X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.816,Hydra:6.0.425,FMLib:17.11.62.513 definitions=2022-01-28_08,2022-01-28_01,2021-12-02_01 X-Proofpoint-Spam-Details: rule=fb_outbound_notspam policy=fb_outbound score=0 mlxscore=0 suspectscore=0 impostorscore=0 phishscore=0 priorityscore=1501 spamscore=0 lowpriorityscore=0 malwarescore=0 adultscore=0 clxscore=1015 mlxlogscore=862 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2201110000 definitions=main-2201280129 X-FB-Internal: deliver Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Song Liu This is necessary to charge sub page memory for the BPF program. Signed-off-by: Song Liu --- include/linux/filter.h | 6 +++--- kernel/bpf/core.c | 11 +++++------ 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index d23e999dc032..5855eb474c62 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -548,7 +548,7 @@ struct sock_fprog_kern { #define BPF_IMAGE_ALIGNMENT 8 =20 struct bpf_binary_header { - u32 pages; + u32 size; u8 image[] __aligned(BPF_IMAGE_ALIGNMENT); }; =20 @@ -886,8 +886,8 @@ static inline void bpf_prog_lock_ro(struct bpf_prog *fp) static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr) { set_vm_flush_reset_perms(hdr); - set_memory_ro((unsigned long)hdr, hdr->pages); - set_memory_x((unsigned long)hdr, hdr->pages); + set_memory_ro((unsigned long)hdr, hdr->size >> PAGE_SHIFT); + set_memory_x((unsigned long)hdr, hdr->size >> PAGE_SHIFT); } =20 static inline struct bpf_binary_header * diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index d96ad87f0a2c..69f348d9f816 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -543,7 +543,7 @@ bpf_prog_ksym_set_addr(struct bpf_prog *prog) WARN_ON_ONCE(!bpf_prog_ebpf_jited(prog)); =20 prog->aux->ksym.start =3D (unsigned long) prog->bpf_func; - prog->aux->ksym.end =3D addr + hdr->pages * PAGE_SIZE; + prog->aux->ksym.end =3D addr + hdr->size; } =20 static void @@ -866,7 +866,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_p= tr, bpf_jit_fill_hole_t bpf_fill_ill_insns) { struct bpf_binary_header *hdr; - u32 size, hole, start, pages; + u32 size, hole, start; =20 WARN_ON_ONCE(!is_power_of_2(alignment) || alignment > BPF_IMAGE_ALIGNMENT); @@ -876,7 +876,6 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_p= tr, * random section of illegal instructions. */ size =3D round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE); - pages =3D size / PAGE_SIZE; =20 if (bpf_jit_charge_modmem(size)) return NULL; @@ -889,7 +888,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_p= tr, /* Fill space with illegal/arch-dep instructions. */ bpf_fill_ill_insns(hdr, size); =20 - hdr->pages =3D pages; + hdr->size =3D size; hole =3D min_t(unsigned int, size - (proglen + sizeof(*hdr)), PAGE_SIZE - sizeof(*hdr)); start =3D (get_random_int() % hole) & ~(alignment - 1); @@ -902,10 +901,10 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image= _ptr, =20 void bpf_jit_binary_free(struct bpf_binary_header *hdr) { - u32 pages =3D hdr->pages; + u32 size =3D hdr->size; =20 bpf_jit_free_exec(hdr); - bpf_jit_uncharge_modmem(pages << PAGE_SHIFT); + bpf_jit_uncharge_modmem(size); } =20 /* This symbol is only overridden by archs that have different --=20 2.30.2 From nobody Tue Jun 30 00:47:36 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ACB6AC433EF for ; Fri, 28 Jan 2022 23:48:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351824AbiA1XsP convert rfc822-to-8bit (ORCPT ); Fri, 28 Jan 2022 18:48:15 -0500 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:60286 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242914AbiA1XsO (ORCPT ); Fri, 28 Jan 2022 18:48:14 -0500 Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.1.2/8.16.1.2) with ESMTP id 20SFIkdP030176 for ; Fri, 28 Jan 2022 15:48:14 -0800 Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3dvk0n37ue-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 28 Jan 2022 15:48:14 -0800 Received: from twshared12416.02.ash9.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::e) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.21; Fri, 28 Jan 2022 15:48:12 -0800 Received: by devbig006.ftw2.facebook.com (Postfix, from userid 4523) id E82F928E02F85; Fri, 28 Jan 2022 15:45:32 -0800 (PST) From: Song Liu To: , , CC: , , , , , , , Song Liu Subject: [PATCH v7 bpf-next 4/9] bpf: use prog->jited_len in bpf_prog_ksym_set_addr() Date: Fri, 28 Jan 2022 15:45:12 -0800 Message-ID: <20220128234517.3503701-5-song@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220128234517.3503701-1-song@kernel.org> References: <20220128234517.3503701-1-song@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: iZ7EvUD0Y1__y8vHlpLaKYJZeclzmnyZ X-Proofpoint-GUID: iZ7EvUD0Y1__y8vHlpLaKYJZeclzmnyZ X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.816,Hydra:6.0.425,FMLib:17.11.62.513 definitions=2022-01-28_08,2022-01-28_01,2021-12-02_01 X-Proofpoint-Spam-Details: rule=fb_outbound_notspam policy=fb_outbound score=0 bulkscore=0 lowpriorityscore=0 impostorscore=0 adultscore=0 spamscore=0 priorityscore=1501 mlxlogscore=999 mlxscore=0 clxscore=1034 phishscore=0 suspectscore=0 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2201110000 definitions=main-2201280129 X-FB-Internal: deliver Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Using prog->jited_len is simpler and more accurate than current estimation (header + header->size). Signed-off-by: Song Liu --- kernel/bpf/core.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 69f348d9f816..7cbcf6bfbb52 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -537,13 +537,10 @@ long bpf_jit_limit_max __read_mostly; static void bpf_prog_ksym_set_addr(struct bpf_prog *prog) { - const struct bpf_binary_header *hdr =3D bpf_jit_binary_hdr(prog); - unsigned long addr =3D (unsigned long)hdr; - WARN_ON_ONCE(!bpf_prog_ebpf_jited(prog)); =20 prog->aux->ksym.start =3D (unsigned long) prog->bpf_func; - prog->aux->ksym.end =3D addr + hdr->size; + prog->aux->ksym.end =3D prog->aux->ksym.start + prog->jited_len; } =20 static void --=20 2.30.2 From nobody Tue Jun 30 00:47:36 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 45FA3C433EF for ; Fri, 28 Jan 2022 23:51:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351850AbiA1Xvq convert rfc822-to-8bit (ORCPT ); Fri, 28 Jan 2022 18:51:46 -0500 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:53212 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1351878AbiA1Xvg (ORCPT ); Fri, 28 Jan 2022 18:51:36 -0500 Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.16.1.2/8.16.1.2) with ESMTP id 20SKoIXa002143 for ; Fri, 28 Jan 2022 15:51:35 -0800 Received: from mail.thefacebook.com ([163.114.132.120]) by m0001303.ppops.net (PPS) with ESMTPS id 3dv695y15n-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 28 Jan 2022 15:51:35 -0800 Received: from twshared11487.23.frc3.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:11d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.21; Fri, 28 Jan 2022 15:51:27 -0800 Received: by devbig006.ftw2.facebook.com (Postfix, from userid 4523) id 6A33A28E02F8B; Fri, 28 Jan 2022 15:45:34 -0800 (PST) From: Song Liu To: , , CC: , , , , , , , Song Liu Subject: [PATCH v7 bpf-next 5/9] x86/alternative: introduce text_poke_copy Date: Fri, 28 Jan 2022 15:45:13 -0800 Message-ID: <20220128234517.3503701-6-song@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220128234517.3503701-1-song@kernel.org> References: <20220128234517.3503701-1-song@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: uCGZOHZgE6mIPj8uii87BOJi62EMGL7F X-Proofpoint-GUID: uCGZOHZgE6mIPj8uii87BOJi62EMGL7F X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.816,Hydra:6.0.425,FMLib:17.11.62.513 definitions=2022-01-28_08,2022-01-28_01,2021-12-02_01 X-Proofpoint-Spam-Details: rule=fb_outbound_notspam policy=fb_outbound score=0 mlxscore=0 suspectscore=0 impostorscore=0 phishscore=0 priorityscore=1501 spamscore=0 lowpriorityscore=0 malwarescore=0 adultscore=0 clxscore=1015 mlxlogscore=361 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2201110000 definitions=main-2201280130 X-FB-Internal: deliver Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" This will be used by BPF jit compiler to dump JITed binary to a RX huge page, and thus allow multiple BPF programs sharing the a huge (2MB) page. Acked-by: Peter Zijlstra (Intel) Signed-off-by: Song Liu --- arch/x86/include/asm/text-patching.h | 1 + arch/x86/kernel/alternative.c | 32 ++++++++++++++++++++++++++++ 2 files changed, 33 insertions(+) diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/te= xt-patching.h index b7421780e4e9..4cc18ba1b75e 100644 --- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -44,6 +44,7 @@ extern void text_poke_early(void *addr, const void *opcod= e, size_t len); extern void *text_poke(void *addr, const void *opcode, size_t len); extern void text_poke_sync(void); extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len); +extern void *text_poke_copy(void *addr, const void *opcode, size_t len); extern int poke_int3_handler(struct pt_regs *regs); extern void text_poke_bp(void *addr, const void *opcode, size_t len, const= void *emulate); =20 diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 23fb4d51a5da..903a415c19fa 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -1102,6 +1102,38 @@ void *text_poke_kgdb(void *addr, const void *opcode,= size_t len) return __text_poke(addr, opcode, len); } =20 +/** + * text_poke_copy - Copy instructions into (an unused part of) RX memory + * @addr: address to modify + * @opcode: source of the copy + * @len: length to copy, could be more than 2x PAGE_SIZE + * + * Not safe against concurrent execution; useful for JITs to dump + * new code blocks into unused regions of RX memory. Can be used in + * conjunction with synchronize_rcu_tasks() to wait for existing + * execution to quiesce after having made sure no existing functions + * pointers are live. + */ +void *text_poke_copy(void *addr, const void *opcode, size_t len) +{ + unsigned long start =3D (unsigned long)addr; + size_t patched =3D 0; + + if (WARN_ON_ONCE(core_kernel_text(start))) + return NULL; + + while (patched < len) { + unsigned long ptr =3D start + patched; + size_t s; + + s =3D min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched); + + __text_poke((void *)ptr, opcode + patched, s); + patched +=3D s; + } + return addr; +} + static void do_sync_core(void *info) { sync_core(); --=20 2.30.2 From nobody Tue Jun 30 00:47:36 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B65DDC433EF for ; Fri, 28 Jan 2022 23:57:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351565AbiA1X5U convert rfc822-to-8bit (ORCPT ); Fri, 28 Jan 2022 18:57:20 -0500 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:10008 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344727AbiA1X5R (ORCPT ); Fri, 28 Jan 2022 18:57:17 -0500 Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.1.2/8.16.1.2) with ESMTP id 20SKe9fD026412 for ; Fri, 28 Jan 2022 15:57:16 -0800 Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3dv8um6b98-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 28 Jan 2022 15:57:16 -0800 Received: from twshared0654.04.ash8.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:11d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.21; Fri, 28 Jan 2022 15:57:14 -0800 Received: by devbig006.ftw2.facebook.com (Postfix, from userid 4523) id F3B2528E02F93; Fri, 28 Jan 2022 15:45:35 -0800 (PST) From: Song Liu To: , , CC: , , , , , , , Song Liu Subject: [PATCH v7 bpf-next 6/9] bpf: introduce bpf_arch_text_copy Date: Fri, 28 Jan 2022 15:45:14 -0800 Message-ID: <20220128234517.3503701-7-song@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220128234517.3503701-1-song@kernel.org> References: <20220128234517.3503701-1-song@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: bMCnYoILlEqUjdhJic2dWCNsqpqDaA5A X-Proofpoint-GUID: bMCnYoILlEqUjdhJic2dWCNsqpqDaA5A X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.816,Hydra:6.0.425,FMLib:17.11.62.513 definitions=2022-01-28_08,2022-01-28_01,2021-12-02_01 X-Proofpoint-Spam-Details: rule=fb_outbound_notspam policy=fb_outbound score=0 spamscore=0 bulkscore=0 priorityscore=1501 lowpriorityscore=0 impostorscore=0 malwarescore=0 suspectscore=0 clxscore=1015 mlxscore=0 phishscore=0 adultscore=0 mlxlogscore=998 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2201110000 definitions=main-2201280130 X-FB-Internal: deliver Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" This will be used to copy JITed text to RO protected module memory. On x86, bpf_arch_text_copy is implemented with text_poke_copy. bpf_arch_text_copy returns pointer to dst on success, and ERR_PTR(errno) on errors. Signed-off-by: Song Liu --- arch/x86/net/bpf_jit_comp.c | 7 +++++++ include/linux/bpf.h | 2 ++ kernel/bpf/core.c | 5 +++++ 3 files changed, 14 insertions(+) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index ce1f86f245c9..9792bf10d881 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -2413,3 +2413,10 @@ bool bpf_jit_supports_kfunc_call(void) { return true; } + +void *bpf_arch_text_copy(void *dst, void *src, size_t len) +{ + if (text_poke_copy(dst, src, len) =3D=3D NULL) + return ERR_PTR(-EINVAL); + return dst; +} diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 28ac752f21ba..7f58fe256671 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2350,6 +2350,8 @@ enum bpf_text_poke_type { int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, void *addr1, void *addr2); =20 +void *bpf_arch_text_copy(void *dst, void *src, size_t len); + struct btf_id_set; bool btf_id_set_contains(const struct btf_id_set *set, u32 id); =20 diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 7cbcf6bfbb52..dc0142e20c72 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -2448,6 +2448,11 @@ int __weak bpf_arch_text_poke(void *ip, enum bpf_tex= t_poke_type t, return -ENOTSUPP; } =20 +void * __weak bpf_arch_text_copy(void *dst, void *src, size_t len) +{ + return ERR_PTR(-ENOTSUPP); +} + DEFINE_STATIC_KEY_FALSE(bpf_stats_enabled_key); EXPORT_SYMBOL(bpf_stats_enabled_key); =20 --=20 2.30.2 From nobody Tue Jun 30 00:47:36 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1A10C433EF for ; Fri, 28 Jan 2022 23:51:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347541AbiA1XvR convert rfc822-to-8bit (ORCPT ); Fri, 28 Jan 2022 18:51:17 -0500 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:1518 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344665AbiA1XvP (ORCPT ); Fri, 28 Jan 2022 18:51:15 -0500 Received: from pps.filterd (m0044012.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.1.2/8.16.1.2) with ESMTP id 20SKhMY3026408 for ; Fri, 28 Jan 2022 15:51:15 -0800 Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3dv7skekt1-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 28 Jan 2022 15:51:15 -0800 Received: from twshared11487.23.frc3.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:11d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.21; Fri, 28 Jan 2022 15:51:14 -0800 Received: by devbig006.ftw2.facebook.com (Postfix, from userid 4523) id DAF5E28E02F9C; Fri, 28 Jan 2022 15:45:37 -0800 (PST) From: Song Liu To: , , CC: , , , , , , , Song Liu Subject: [PATCH v7 bpf-next 7/9] bpf: introduce bpf_prog_pack allocator Date: Fri, 28 Jan 2022 15:45:15 -0800 Message-ID: <20220128234517.3503701-8-song@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220128234517.3503701-1-song@kernel.org> References: <20220128234517.3503701-1-song@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-FB-Internal: Safe X-Proofpoint-GUID: ORaH1_D7aAYJxJK-mcfuMEE-QAIXAN2r X-Proofpoint-ORIG-GUID: ORaH1_D7aAYJxJK-mcfuMEE-QAIXAN2r X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.816,Hydra:6.0.425,FMLib:17.11.62.513 definitions=2022-01-28_08,2022-01-28_01,2021-12-02_01 X-Proofpoint-Spam-Details: rule=fb_outbound_notspam policy=fb_outbound score=0 mlxlogscore=909 phishscore=0 lowpriorityscore=0 spamscore=0 bulkscore=0 impostorscore=0 malwarescore=0 priorityscore=1501 clxscore=1034 mlxscore=0 adultscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2201110000 definitions=main-2201280129 X-FB-Internal: deliver Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Most BPF programs are small, but they consume a page each. For systems with busy traffic and many BPF programs, this could add significant pressure to instruction TLB. Introduce bpf_prog_pack allocator to pack multiple BPF programs in a huge page. The memory is then allocated in 64 byte chunks. Memory allocated by bpf_prog_pack allocator is RO protected after initial allocation. To write to it, the user (jit engine) need to use text poke API. Signed-off-by: Song Liu --- kernel/bpf/core.c | 127 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 127 insertions(+) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index dc0142e20c72..25e34caa9a95 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -805,6 +805,133 @@ int bpf_jit_add_poke_descriptor(struct bpf_prog *prog, return slot; } =20 +/* + * BPF program pack allocator. + * + * Most BPF programs are pretty small. Allocating a hole page for each + * program is sometime a waste. Many small bpf program also adds pressure + * to instruction TLB. To solve this issue, we introduce a BPF program pack + * allocator. The prog_pack allocator uses HPAGE_PMD_SIZE page (2MB on x86) + * to host BPF programs. + */ +#define BPF_PROG_PACK_SIZE HPAGE_PMD_SIZE +#define BPF_PROG_CHUNK_SHIFT 6 +#define BPF_PROG_CHUNK_SIZE (1 << BPF_PROG_CHUNK_SHIFT) +#define BPF_PROG_CHUNK_MASK (~(BPF_PROG_CHUNK_SIZE - 1)) +#define BPF_PROG_CHUNK_COUNT (BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE) + +struct bpf_prog_pack { + struct list_head list; + void *ptr; + unsigned long bitmap[BITS_TO_LONGS(BPF_PROG_CHUNK_COUNT)]; +}; + +#define BPF_PROG_MAX_PACK_PROG_SIZE HPAGE_PMD_SIZE +#define BPF_PROG_SIZE_TO_NBITS(size) (round_up(size, BPF_PROG_CHUNK_SIZE) = / BPF_PROG_CHUNK_SIZE) + +static DEFINE_MUTEX(pack_mutex); +static LIST_HEAD(pack_list); + +static struct bpf_prog_pack *alloc_new_pack(void) +{ + struct bpf_prog_pack *pack; + + pack =3D kzalloc(sizeof(*pack), GFP_KERNEL); + if (!pack) + return NULL; + pack->ptr =3D module_alloc(BPF_PROG_PACK_SIZE); + if (!pack->ptr) { + kfree(pack); + return NULL; + } + bitmap_zero(pack->bitmap, BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE); + list_add_tail(&pack->list, &pack_list); + + set_vm_flush_reset_perms(pack->ptr); + set_memory_ro((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE); + set_memory_x((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE); + return pack; +} + +static void *bpf_prog_pack_alloc(u32 size) +{ + unsigned int nbits =3D BPF_PROG_SIZE_TO_NBITS(size); + struct bpf_prog_pack *pack; + unsigned long pos; + void *ptr =3D NULL; + + if (size > BPF_PROG_MAX_PACK_PROG_SIZE) { + size =3D round_up(size, PAGE_SIZE); + ptr =3D module_alloc(size); + if (ptr) { + set_vm_flush_reset_perms(ptr); + set_memory_ro((unsigned long)ptr, size / PAGE_SIZE); + set_memory_x((unsigned long)ptr, size / PAGE_SIZE); + } + return ptr; + } + mutex_lock(&pack_mutex); + list_for_each_entry(pack, &pack_list, list) { + pos =3D bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0, + nbits, 0); + if (pos < BPF_PROG_CHUNK_COUNT) + goto found_free_area; + } + + pack =3D alloc_new_pack(); + if (!pack) + goto out; + + pos =3D 0; + +found_free_area: + bitmap_set(pack->bitmap, pos, nbits); + ptr =3D (void *)(pack->ptr) + (pos << BPF_PROG_CHUNK_SHIFT); + +out: + mutex_unlock(&pack_mutex); + return ptr; +} + +static void bpf_prog_pack_free(struct bpf_binary_header *hdr) +{ + struct bpf_prog_pack *pack =3D NULL, *tmp; + unsigned int nbits; + unsigned long pos; + void *pack_ptr; + + if (hdr->size > BPF_PROG_MAX_PACK_PROG_SIZE) { + module_memfree(hdr); + return; + } + + pack_ptr =3D (void *)((unsigned long)hdr & ~(BPF_PROG_PACK_SIZE - 1)); + mutex_lock(&pack_mutex); + + list_for_each_entry(tmp, &pack_list, list) { + if (tmp->ptr =3D=3D pack_ptr) { + pack =3D tmp; + break; + } + } + + if (WARN_ONCE(!pack, "bpf_prog_pack bug\n")) + goto out; + + nbits =3D BPF_PROG_SIZE_TO_NBITS(hdr->size); + pos =3D ((unsigned long)hdr - (unsigned long)pack_ptr) >> BPF_PROG_CHUNK_= SHIFT; + + bitmap_clear(pack->bitmap, pos, nbits); + if (bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0, + BPF_PROG_CHUNK_COUNT, 0) =3D=3D 0) { + list_del(&pack->list); + module_memfree(pack->ptr); + kfree(pack); + } +out: + mutex_unlock(&pack_mutex); +} + static atomic_long_t bpf_jit_current; =20 /* Can be overridden by an arch's JIT compiler if it has a custom, --=20 2.30.2 From nobody Tue Jun 30 00:47:36 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 519ACC433FE for ; Fri, 28 Jan 2022 23:51:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351859AbiA1Xv3 convert rfc822-to-8bit (ORCPT ); Fri, 28 Jan 2022 18:51:29 -0500 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:19184 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351846AbiA1Xv0 (ORCPT ); Fri, 28 Jan 2022 18:51:26 -0500 Received: from pps.filterd (m0148461.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.1.2/8.16.1.2) with ESMTP id 20SLrktf014243 for ; Fri, 28 Jan 2022 15:51:25 -0800 Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3dvghkm900-5 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 28 Jan 2022 15:51:25 -0800 Received: from twshared11487.23.frc3.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:21d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.21; Fri, 28 Jan 2022 15:51:23 -0800 Received: by devbig006.ftw2.facebook.com (Postfix, from userid 4523) id D18DF28E02FAE; Fri, 28 Jan 2022 15:45:38 -0800 (PST) From: Song Liu To: , , CC: , , , , , , , Song Liu Subject: [PATCH v7 bpf-next 8/9] bpf: introduce bpf_jit_binary_pack_[alloc|finalize|free] Date: Fri, 28 Jan 2022 15:45:16 -0800 Message-ID: <20220128234517.3503701-9-song@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220128234517.3503701-1-song@kernel.org> References: <20220128234517.3503701-1-song@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: RguxOwtuOzyxPNwJWq7RRd2mfkS7iUcS X-Proofpoint-GUID: RguxOwtuOzyxPNwJWq7RRd2mfkS7iUcS X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.816,Hydra:6.0.425,FMLib:17.11.62.513 definitions=2022-01-28_08,2022-01-28_01,2021-12-02_01 X-Proofpoint-Spam-Details: rule=fb_outbound_notspam policy=fb_outbound score=0 malwarescore=0 bulkscore=0 adultscore=0 mlxscore=0 mlxlogscore=999 priorityscore=1501 phishscore=0 impostorscore=0 clxscore=1015 spamscore=0 suspectscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2201110000 definitions=main-2201280129 X-FB-Internal: deliver Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Song Liu This is the jit binary allocator built on top of bpf_prog_pack. bpf_prog_pack allocates RO memory, which cannot be used directly by the JIT engine. Therefore, a temporary rw buffer is allocated for the JIT engine. Once JIT is done, bpf_jit_binary_pack_finalize is used to copy the program to the RO memory. bpf_jit_binary_pack_alloc reserves 16 bytes of extra space for illegal instructions, which is small than the 128 bytes space reserved by bpf_jit_binary_alloc. This change is necessary for bpf_jit_binary_hdr to find the correct header. Also, flag use_bpf_prog_pack is added to differentiate a program allocated by bpf_jit_binary_pack_alloc. Signed-off-by: Song Liu --- include/linux/bpf.h | 1 + include/linux/filter.h | 21 ++++---- kernel/bpf/core.c | 108 ++++++++++++++++++++++++++++++++++++++++- 3 files changed, 120 insertions(+), 10 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 7f58fe256671..06d119c472e7 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -945,6 +945,7 @@ struct bpf_prog_aux { bool sleepable; bool tail_call_reachable; bool xdp_has_frags; + bool use_bpf_prog_pack; struct hlist_node tramp_hlist; /* BTF_KIND_FUNC_PROTO for valid attach_btf_id */ const struct btf_type *attach_func_proto; diff --git a/include/linux/filter.h b/include/linux/filter.h index 5855eb474c62..1cb1af917617 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -890,15 +890,6 @@ static inline void bpf_jit_binary_lock_ro(struct bpf_b= inary_header *hdr) set_memory_x((unsigned long)hdr, hdr->size >> PAGE_SHIFT); } =20 -static inline struct bpf_binary_header * -bpf_jit_binary_hdr(const struct bpf_prog *fp) -{ - unsigned long real_start =3D (unsigned long)fp->bpf_func; - unsigned long addr =3D real_start & PAGE_MASK; - - return (void *)addr; -} - int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int = cap); static inline int sk_filter(struct sock *sk, struct sk_buff *skb) { @@ -1068,6 +1059,18 @@ void *bpf_jit_alloc_exec(unsigned long size); void bpf_jit_free_exec(void *addr); void bpf_jit_free(struct bpf_prog *fp); =20 +struct bpf_binary_header * +bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **ro_image, + unsigned int alignment, + struct bpf_binary_header **rw_hdr, + u8 **rw_image, + bpf_jit_fill_hole_t bpf_fill_ill_insns); +int bpf_jit_binary_pack_finalize(struct bpf_prog *prog, + struct bpf_binary_header *ro_header, + struct bpf_binary_header *rw_header); +void bpf_jit_binary_pack_free(struct bpf_binary_header *ro_header, + struct bpf_binary_header *rw_header); + int bpf_jit_add_poke_descriptor(struct bpf_prog *prog, struct bpf_jit_poke_descriptor *poke); =20 diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 25e34caa9a95..ff0c51ef1cb7 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -1031,6 +1031,109 @@ void bpf_jit_binary_free(struct bpf_binary_header *= hdr) bpf_jit_uncharge_modmem(size); } =20 +/* Allocate jit binary from bpf_prog_pack allocator. + * Since the allocated meory is RO+X, the JIT engine cannot write directly + * to the memory. To solve this problem, a RW buffer is also allocated at + * as the same time. The JIT engine should calculate offsets based on the + * RO memory address, but write JITed program to the RW buffer. Once the + * JIT engine finishes, it calls bpf_jit_binary_pack_finalize, which copies + * the JITed program to the RO memory. + */ +struct bpf_binary_header * +bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **image_ptr, + unsigned int alignment, + struct bpf_binary_header **rw_header, + u8 **rw_image, + bpf_jit_fill_hole_t bpf_fill_ill_insns) +{ + struct bpf_binary_header *ro_header; + u32 size, hole, start; + + WARN_ON_ONCE(!is_power_of_2(alignment) || + alignment > BPF_IMAGE_ALIGNMENT); + + /* add 16 bytes for a random section of illegal instructions */ + size =3D round_up(proglen + sizeof(*ro_header) + 16, BPF_PROG_CHUNK_SIZE); + + if (bpf_jit_charge_modmem(size)) + return NULL; + ro_header =3D bpf_prog_pack_alloc(size); + if (!ro_header) { + bpf_jit_uncharge_modmem(size); + return NULL; + } + + *rw_header =3D kvmalloc(size, GFP_KERNEL); + if (!*rw_header) { + bpf_prog_pack_free(ro_header); + bpf_jit_uncharge_modmem(size); + return NULL; + } + + /* Fill space with illegal/arch-dep instructions. */ + bpf_fill_ill_insns(*rw_header, size); + (*rw_header)->size =3D size; + + hole =3D min_t(unsigned int, size - (proglen + sizeof(*ro_header)), + BPF_PROG_CHUNK_SIZE - sizeof(*ro_header)); + start =3D (get_random_int() % hole) & ~(alignment - 1); + + *image_ptr =3D &ro_header->image[start]; + *rw_image =3D &(*rw_header)->image[start]; + + return ro_header; +} + +/* Copy JITed text from rw_header to its final location, the ro_header. */ +int bpf_jit_binary_pack_finalize(struct bpf_prog *prog, + struct bpf_binary_header *ro_header, + struct bpf_binary_header *rw_header) +{ + void *ptr; + + ptr =3D bpf_arch_text_copy(ro_header, rw_header, rw_header->size); + + kvfree(rw_header); + + if (IS_ERR(ptr)) { + bpf_prog_pack_free(ro_header); + return PTR_ERR(ptr); + } + prog->aux->use_bpf_prog_pack =3D true; + return 0; +} + +/* bpf_jit_binary_pack_free is called in two different scenarios: + * 1) when the program is freed after; + * 2) when the JIT engine fails (before bpf_jit_binary_pack_finalize). + * For case 2), we need to free both the RO memory and the RW buffer. + * Also, ro_header->size in 2) is not properly set yet, so rw_header->size + * is used for uncharge. + */ +void bpf_jit_binary_pack_free(struct bpf_binary_header *ro_header, + struct bpf_binary_header *rw_header) +{ + u32 size =3D rw_header ? rw_header->size : ro_header->size; + + bpf_prog_pack_free(ro_header); + kvfree(rw_header); + bpf_jit_uncharge_modmem(size); +} + +static inline struct bpf_binary_header * +bpf_jit_binary_hdr(const struct bpf_prog *fp) +{ + unsigned long real_start =3D (unsigned long)fp->bpf_func; + unsigned long addr; + + if (fp->aux->use_bpf_prog_pack) + addr =3D real_start & BPF_PROG_CHUNK_MASK; + else + addr =3D real_start & PAGE_MASK; + + return (void *)addr; +} + /* This symbol is only overridden by archs that have different * requirements than the usual eBPF JITs, f.e. when they only * implement cBPF JIT, do not set images read-only, etc. @@ -1040,7 +1143,10 @@ void __weak bpf_jit_free(struct bpf_prog *fp) if (fp->jited) { struct bpf_binary_header *hdr =3D bpf_jit_binary_hdr(fp); =20 - bpf_jit_binary_free(hdr); + if (fp->aux->use_bpf_prog_pack) + bpf_jit_binary_pack_free(hdr, NULL /* rw_buffer */); + else + bpf_jit_binary_free(hdr); =20 WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(fp)); } --=20 2.30.2 From nobody Tue Jun 30 00:47:36 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B5F7C433EF for ; Fri, 28 Jan 2022 23:51:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351861AbiA1Xve convert rfc822-to-8bit (ORCPT ); Fri, 28 Jan 2022 18:51:34 -0500 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:38442 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351850AbiA1Xv3 (ORCPT ); Fri, 28 Jan 2022 18:51:29 -0500 Received: from pps.filterd (m0109333.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.1.2/8.16.1.2) with ESMTP id 20SKOkQI018916 for ; Fri, 28 Jan 2022 15:51:29 -0800 Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3dv8umee4f-9 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Fri, 28 Jan 2022 15:51:28 -0800 Received: from twshared11487.23.frc3.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:21d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.21; Fri, 28 Jan 2022 15:51:27 -0800 Received: by devbig006.ftw2.facebook.com (Postfix, from userid 4523) id 559E628E032D4; Fri, 28 Jan 2022 15:45:44 -0800 (PST) From: Song Liu To: , , CC: , , , , , , , Song Liu Subject: [PATCH v7 bpf-next 9/9] bpf, x86_64: use bpf_jit_binary_pack_alloc Date: Fri, 28 Jan 2022 15:45:17 -0800 Message-ID: <20220128234517.3503701-10-song@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220128234517.3503701-1-song@kernel.org> References: <20220128234517.3503701-1-song@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: X0j6oUDuklCVvOgRIdWNn7EDRj6qPXC3 X-Proofpoint-GUID: X0j6oUDuklCVvOgRIdWNn7EDRj6qPXC3 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.816,Hydra:6.0.425,FMLib:17.11.62.513 definitions=2022-01-28_08,2022-01-28_01,2021-12-02_01 X-Proofpoint-Spam-Details: rule=fb_outbound_notspam policy=fb_outbound score=0 adultscore=0 mlxscore=0 spamscore=0 bulkscore=0 suspectscore=0 malwarescore=0 mlxlogscore=960 priorityscore=1501 impostorscore=0 lowpriorityscore=0 clxscore=1034 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2201110000 definitions=main-2201280129 X-FB-Internal: deliver Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Song Liu Use bpf_jit_binary_pack_alloc in x86_64 jit. The jit engine first writes the program to the rw buffer. When the jit is done, the program is copied to the final location with bpf_jit_binary_pack_finalize. Note that we need to do bpf_tail_call_direct_fixup after finalize. Therefore, the text_live =3D false logic in __bpf_arch_text_poke is no longer needed. Signed-off-by: Song Liu --- arch/x86/net/bpf_jit_comp.c | 59 +++++++++++++++++++------------------ 1 file changed, 31 insertions(+), 28 deletions(-) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 9792bf10d881..afb957e63e3d 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -330,8 +330,7 @@ static int emit_jump(u8 **pprog, void *func, void *ip) } =20 static int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, - void *old_addr, void *new_addr, - const bool text_live) + void *old_addr, void *new_addr) { const u8 *nop_insn =3D x86_nops[5]; u8 old_insn[X86_PATCH_SIZE]; @@ -365,10 +364,7 @@ static int __bpf_arch_text_poke(void *ip, enum bpf_tex= t_poke_type t, goto out; ret =3D 1; if (memcmp(ip, new_insn, X86_PATCH_SIZE)) { - if (text_live) - text_poke_bp(ip, new_insn, X86_PATCH_SIZE, NULL); - else - memcpy(ip, new_insn, X86_PATCH_SIZE); + text_poke_bp(ip, new_insn, X86_PATCH_SIZE, NULL); ret =3D 0; } out: @@ -384,7 +380,7 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_typ= e t, /* BPF poking in modules is not supported */ return -EINVAL; =20 - return __bpf_arch_text_poke(ip, t, old_addr, new_addr, true); + return __bpf_arch_text_poke(ip, t, old_addr, new_addr); } =20 #define EMIT_LFENCE() EMIT3(0x0F, 0xAE, 0xE8) @@ -558,24 +554,15 @@ static void bpf_tail_call_direct_fixup(struct bpf_pro= g *prog) mutex_lock(&array->aux->poke_mutex); target =3D array->ptrs[poke->tail_call.key]; if (target) { - /* Plain memcpy is used when image is not live yet - * and still not locked as read-only. Once poke - * location is active (poke->tailcall_target_stable), - * any parallel bpf_arch_text_poke() might occur - * still on the read-write image until we finally - * locked it as read-only. Both modifications on - * the given image are under text_mutex to avoid - * interference. - */ ret =3D __bpf_arch_text_poke(poke->tailcall_target, BPF_MOD_JUMP, NULL, (u8 *)target->bpf_func + - poke->adj_off, false); + poke->adj_off); BUG_ON(ret < 0); ret =3D __bpf_arch_text_poke(poke->tailcall_bypass, BPF_MOD_JUMP, (u8 *)poke->tailcall_target + - X86_PATCH_SIZE, NULL, false); + X86_PATCH_SIZE, NULL); BUG_ON(ret < 0); } WRITE_ONCE(poke->tailcall_target_stable, true); @@ -867,7 +854,7 @@ static void emit_nops(u8 **pprog, int len) =20 #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp))) =20 -static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, +static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw= _image, int oldproglen, struct jit_context *ctx, bool jmp_padding) { bool tail_call_reachable =3D bpf_prog->aux->tail_call_reachable; @@ -894,8 +881,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs= , u8 *image, push_callee_regs(&prog, callee_regs_used); =20 ilen =3D prog - temp; - if (image) - memcpy(image + proglen, temp, ilen); + if (rw_image) + memcpy(rw_image + proglen, temp, ilen); proglen +=3D ilen; addrs[0] =3D proglen; prog =3D temp; @@ -1324,8 +1311,10 @@ st: if (is_imm8(insn->off)) pr_err("extable->insn doesn't fit into 32-bit\n"); return -EFAULT; } - ex->insn =3D delta; + /* switch ex to rw buffer for writes */ + ex =3D (void *)rw_image + ((void *)ex - (void *)image); =20 + ex->insn =3D delta; ex->type =3D EX_TYPE_BPF; =20 if (dst_reg > BPF_REG_9) { @@ -1706,7 +1695,7 @@ st: if (is_imm8(insn->off)) pr_err("bpf_jit: fatal error\n"); return -EFAULT; } - memcpy(image + proglen, temp, ilen); + memcpy(rw_image + proglen, temp, ilen); } proglen +=3D ilen; addrs[i] =3D proglen; @@ -2247,6 +2236,7 @@ int arch_prepare_bpf_dispatcher(void *image, s64 *fun= cs, int num_funcs) } =20 struct x64_jit_data { + struct bpf_binary_header *rw_header; struct bpf_binary_header *header; int *addrs; u8 *image; @@ -2259,6 +2249,7 @@ struct x64_jit_data { =20 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) { + struct bpf_binary_header *rw_header =3D NULL; struct bpf_binary_header *header =3D NULL; struct bpf_prog *tmp, *orig_prog =3D prog; struct x64_jit_data *jit_data; @@ -2267,6 +2258,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog = *prog) bool tmp_blinded =3D false; bool extra_pass =3D false; bool padding =3D false; + u8 *rw_image =3D NULL; u8 *image =3D NULL; int *addrs; int pass; @@ -2302,6 +2294,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog = *prog) oldproglen =3D jit_data->proglen; image =3D jit_data->image; header =3D jit_data->header; + rw_header =3D jit_data->rw_header; + rw_image =3D (void *)rw_header + ((void *)image - (void *)header); extra_pass =3D true; padding =3D true; goto skip_init_addrs; @@ -2332,12 +2326,12 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_pro= g *prog) for (pass =3D 0; pass < MAX_PASSES || image; pass++) { if (!padding && pass >=3D PADDING_PASSES) padding =3D true; - proglen =3D do_jit(prog, addrs, image, oldproglen, &ctx, padding); + proglen =3D do_jit(prog, addrs, image, rw_image, oldproglen, &ctx, paddi= ng); if (proglen <=3D 0) { out_image: image =3D NULL; if (header) - bpf_jit_binary_free(header); + bpf_jit_binary_pack_free(header, rw_header); prog =3D orig_prog; goto out_addrs; } @@ -2361,8 +2355,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog = *prog) sizeof(struct exception_table_entry); =20 /* allocate module memory for x86 insns and extable */ - header =3D bpf_jit_binary_alloc(roundup(proglen, align) + extable_size, - &image, align, jit_fill_hole); + header =3D bpf_jit_binary_pack_alloc(roundup(proglen, align) + extable_= size, + &image, align, &rw_header, &rw_image, + jit_fill_hole); if (!header) { prog =3D orig_prog; goto out_addrs; @@ -2378,14 +2373,22 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_pro= g *prog) =20 if (image) { if (!prog->is_func || extra_pass) { + /* + * bpf_jit_binary_pack_finalize fails in two scenarios: + * 1) header is not pointing to proper module memory; + * 2) the arch doesn't support bpf_arch_text_copy(). + * + * Both cases are serious bugs that we should not continue. + */ + BUG_ON(bpf_jit_binary_pack_finalize(prog, header, rw_header)); bpf_tail_call_direct_fixup(prog); - bpf_jit_binary_lock_ro(header); } else { jit_data->addrs =3D addrs; jit_data->ctx =3D ctx; jit_data->proglen =3D proglen; jit_data->image =3D image; jit_data->header =3D header; + jit_data->rw_header =3D rw_header; } prog->bpf_func =3D (void *)image; prog->jited =3D 1; --=20 2.30.2