From: "Masami Hiramatsu (Google)"
To: Steven Rostedt, Peter Zijlstra
Cc: Masami Hiramatsu, Ingo Molnar, Suleiman Souhlal, bpf, linux-kernel@vger.kernel.org, Borislav Petkov, Josh Poimboeuf, x86@kernel.org
Subject: [PATCH 1/2] x86/kprobes: Fix kprobes instruction boundary check with CONFIG_RETHUNK
Date: Wed, 7 Sep 2022 09:55:21 +0900
Message-Id: <166251212072.632004.16078953024905883328.stgit@devnote2>

From: Masami Hiramatsu (Google)

Since CONFIG_RETHUNK and CONFIG_SLS use INT3 as padding after a RET
instruction, kprobes always fails the instruction boundary check (which
decodes the function body) when the probed address is after such
padding. (Note that the compiler may place some conditional code blocks
after the RET instruction if it decides they are not on the hot path.)

This is because kprobes expects that any INT3 it finds was put there by
someone else (e.g. kgdb) as a software breakpoint replacing an original
instruction. The INT3s used for padding inside a function did not
replace anything, so there is no original instruction to recover.

To avoid this issue, when kprobes finds an INT3, it gets the address of
the next non-INT3 byte and searches for a branch which jumps to that
address. If such a branch exists, the INT3s are just padding and can be
skipped.
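The approach above can be modelled outside the kernel. The C sketch below is a simplified, hypothetical user-space model, not the patch's code: toy_insn_len(), toy_branch_target(), toy_branch_lands_on() and toy_can_probe() are made-up names, the "decoder" knows only nop (0x90), ret (0xc3), int3 (0xcc), jmp rel8 (0xeb) and jcc rel8 (0x70-0x7f), and byte offsets stand in for kernel addresses. It walks from the function start toward the probe offset and accepts an INT3 run as padding only if some branch between the function start and the end of the run lands on the first byte after it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INT3 0xcc

/*
 * Hypothetical toy decoder: knows only nop (0x90), ret (0xc3), int3 (0xcc),
 * jmp rel8 (0xeb) and jcc rel8 (0x70-0x7f). Returns the instruction length,
 * or 0 if the opcode is unknown or the instruction is truncated.
 */
static size_t toy_insn_len(const uint8_t *code, size_t size, size_t off)
{
	uint8_t op = code[off];

	if (op == 0x90 || op == 0xc3 || op == TOY_INT3)
		return 1;
	if (op == 0xeb || (op & 0xf0) == 0x70)
		return off + 2 <= size ? 2 : 0;
	return 0;
}

/* Target offset of the short branch at 'off', or -1 if it is not one. */
static long toy_branch_target(const uint8_t *code, size_t size, size_t off)
{
	uint8_t op;

	assert(off < size);
	op = code[off];
	if ((op == 0xeb || (op & 0xf0) == 0x70) && off + 2 <= size)
		return (long)off + 2 + (int8_t)code[off + 1];
	return -1;
}

/* Does any instruction decoded in [0, end) branch to 'target'? */
static bool toy_branch_lands_on(const uint8_t *code, size_t size,
				size_t end, size_t target)
{
	size_t off = 0;

	while (off < end) {
		size_t len = toy_insn_len(code, size, off);

		if (!len)
			return false;
		if (toy_branch_target(code, size, off) == (long)target)
			return true;
		off += len;
	}
	return false;
}

/*
 * Is 'probe' on an instruction boundary of code[0, size)? An INT3 run is
 * skipped only if a branch lands on the first non-INT3 byte after it.
 */
static bool toy_can_probe(const uint8_t *code, size_t size, size_t probe)
{
	size_t off = 0;

	assert(probe <= size);
	while (off < probe) {
		size_t len = toy_insn_len(code, size, off);
		size_t next;

		if (!len)
			return false;		/* cannot decode: refuse */
		if (code[off] != TOY_INT3) {
			off += len;
			continue;
		}
		next = off;
		while (next < size && code[next] == TOY_INT3)
			next++;
		if (next == size || !toy_branch_lands_on(code, size, next, next))
			return false;		/* possibly a real breakpoint */
		off = next;			/* padding: resume after the run */
	}
	return off == probe;
}
```

With the bytes 74 02 c3 cc 90 c3 (je +2; ret; int3; nop; ret), the je targets offset 4, so toy_can_probe() accepts offsets 4 and 5; with 90 c3 cc 90 c3, nothing branches past the INT3, so offset 3 is rejected. This model is slightly stricter than the patch's loop: it also inspects an INT3 run that ends exactly at the probe offset.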
Signed-off-by: Masami Hiramatsu (Google)
Suggested-by: Peter Zijlstra
Fixes: 15e67227c49a ("x86: Undo return-thunk damage")
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/kprobes/common.h | 67 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/kprobes/core.c   | 57 +++++++++++++++++---------------
 arch/x86/kernel/kprobes/opt.c    | 23 +------------
 3 files changed, 100 insertions(+), 47 deletions(-)

diff --git a/arch/x86/kernel/kprobes/common.h b/arch/x86/kernel/kprobes/common.h
index c993521d4933..2adb36eaf366 100644
--- a/arch/x86/kernel/kprobes/common.h
+++ b/arch/x86/kernel/kprobes/common.h
@@ -92,6 +92,73 @@ extern int __copy_instruction(u8 *dest, u8 *src, u8 *real, struct insn *insn);
 extern void synthesize_reljump(void *dest, void *from, void *to);
 extern void synthesize_relcall(void *dest, void *from, void *to);
 
+/* Return the jump target address or 0 */
+static inline unsigned long insn_get_branch_addr(struct insn *insn)
+{
+	switch (insn->opcode.bytes[0]) {
+	case 0xe0:	/* loopne */
+	case 0xe1:	/* loope */
+	case 0xe2:	/* loop */
+	case 0xe3:	/* jcxz */
+	case 0xe9:	/* near relative jump */
+	case 0xeb:	/* short relative jump */
+		break;
+	case 0x0f:
+		if ((insn->opcode.bytes[1] & 0xf0) == 0x80) /* jcc near */
+			break;
+		return 0;
+	default:
+		if ((insn->opcode.bytes[0] & 0xf0) == 0x70) /* jcc short */
+			break;
+		return 0;
+	}
+	return (unsigned long)insn->next_byte + insn->immediate.value;
+}
+
+static inline void __decode_insn(struct insn *insn, kprobe_opcode_t *buf,
+				 unsigned long addr)
+{
+	unsigned long recovered_insn;
+
+	/*
+	 * Check if the instruction has been modified by another
+	 * kprobe, in which case we replace the breakpoint by the
+	 * original instruction in our buffer.
+	 * Also, jump optimization will change the breakpoint to
+	 * relative-jump. Since the relative-jump itself is
+	 * normally used, we just go through if there is no kprobe.
+	 */
+	recovered_insn = recover_probed_instruction(buf, addr);
+	if (!recovered_insn ||
+	    insn_decode_kernel(insn, (void *)recovered_insn) < 0) {
+		insn->kaddr = NULL;
+	} else {
+		/* Recover address */
+		insn->kaddr = (void *)addr;
+		insn->next_byte = (void *)(addr + insn->length);
+	}
+}
+
+/* Iterate instructions in [saddr, eaddr), insn->next_byte is loop cursor. */
+#define for_each_insn(insn, saddr, eaddr, buf)			\
+	for (__decode_insn(insn, buf, saddr);			\
+	     (insn)->kaddr && (unsigned long)(insn)->next_byte < eaddr; \
+	     __decode_insn(insn, buf, (unsigned long)(insn)->next_byte))
+
+/* Return next non-INT3 address, or 0 if failed to access */
+static inline unsigned long skip_padding_int3(unsigned long addr)
+{
+	unsigned char ops;
+
+	while (get_kernel_nofault(ops, (void *)addr) == 0) {
+		if (ops != INT3_INSN_OPCODE)
+			return addr;
+		addr++;
+	}
+
+	return 0;
+}
+
 #ifdef CONFIG_OPTPROBES
 extern int setup_detour_execution(struct kprobe *p, struct pt_regs *regs, int reenter);
 extern unsigned long __recover_optprobed_insn(kprobe_opcode_t *buf, unsigned long addr);
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 4c3c27b6aea3..b20484cc0025 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -255,44 +255,49 @@ unsigned long recover_probed_instruction(kprobe_opcode_t *buf, unsigned long add
 /* Check if paddr is at an instruction boundary */
 static int can_probe(unsigned long paddr)
 {
-	unsigned long addr, __addr, offset = 0;
-	struct insn insn;
 	kprobe_opcode_t buf[MAX_INSN_SIZE];
+	unsigned long addr, offset = 0;
+	struct insn insn;
 
 	if (!kallsyms_lookup_size_offset(paddr, NULL, &offset))
 		return 0;
 
-	/* Decode instructions */
-	addr = paddr - offset;
-	while (addr < paddr) {
-		int ret;
+	/* The first address must be instruction boundary. */
+	if (!offset)
+		return 1;
 
+	/* Decode instructions */
+	for_each_insn(&insn, paddr - offset, paddr, buf) {
 		/*
-		 * Check if the instruction has been modified by another
-		 * kprobe, in which case we replace the breakpoint by the
-		 * original instruction in our buffer.
-		 * Also, jump optimization will change the breakpoint to
-		 * relative-jump. Since the relative-jump itself is
-		 * normally used, we just go through if there is no kprobe.
+		 * CONFIG_RETHUNK or CONFIG_SLS or another debug feature
+		 * may install INT3.
 		 */
-		__addr = recover_probed_instruction(buf, addr);
-		if (!__addr)
-			return 0;
-
-		ret = insn_decode_kernel(&insn, (void *)__addr);
-		if (ret < 0)
-			return 0;
+		if (insn.opcode.bytes[0] == INT3_INSN_OPCODE) {
+			/* Find the next non-INT3 instruction address */
+			addr = skip_padding_int3((unsigned long)insn.kaddr);
+			if (!addr)
+				return 0;
+			/*
+			 * This can be a padding INT3 for CONFIG_RETHUNK or
+			 * CONFIG_SLS. If a branch jumps to the address next
+			 * to the INT3 sequence, this is just for padding,
+			 * then we can continue decoding.
+			 */
+			for_each_insn(&insn, paddr - offset, addr, buf) {
+				if (insn_get_branch_addr(&insn) == addr)
+					goto found;
+			}
 
-		/*
-		 * Another debugging subsystem might insert this breakpoint.
-		 * In that case, we can't recover it.
-		 */
-		if (insn.opcode.bytes[0] == INT3_INSN_OPCODE)
+			/* This INT3 can not be decoded safely. */
 			return 0;
-		addr += insn.length;
+found:
+			/* Set loop cursor */
+			insn.next_byte = (void *)addr;
+			continue;
+		}
 	}
 
-	return (addr == paddr);
+	return ((unsigned long)insn.next_byte == paddr);
 }
 
 /* If x86 supports IBT (ENDBR) it must be skipped. */
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index e6b8c5362b94..2e41850cab06 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -235,28 +235,9 @@ static int __insn_is_indirect_jump(struct insn *insn)
 /* Check whether insn jumps into specified address range */
 static int insn_jump_into_range(struct insn *insn, unsigned long start, int len)
 {
-	unsigned long target = 0;
-
-	switch (insn->opcode.bytes[0]) {
-	case 0xe0:	/* loopne */
-	case 0xe1:	/* loope */
-	case 0xe2:	/* loop */
-	case 0xe3:	/* jcxz */
-	case 0xe9:	/* near relative jump */
-	case 0xeb:	/* short relative jump */
-		break;
-	case 0x0f:
-		if ((insn->opcode.bytes[1] & 0xf0) == 0x80) /* jcc near */
-			break;
-		return 0;
-	default:
-		if ((insn->opcode.bytes[0] & 0xf0) == 0x70) /* jcc short */
-			break;
-		return 0;
-	}
-	target = (unsigned long)insn->next_byte + insn->immediate.value;
+	unsigned long target = insn_get_branch_addr(insn);
 
-	return (start <= target && target <= start + len);
+	return target ? (start <= target && target <= start + len) : 0;
 }
 
 static int insn_is_indirect_jump(struct insn *insn)

Subject: [PATCH 2/2] x86/kprobes: Fix optprobe optimization check with CONFIG_RETHUNK
Date: Wed, 7 Sep 2022 09:55:31 +0900
Message-Id: <166251213122.632004.14890772161914623561.stgit@devnote2>

From: Masami Hiramatsu (Google)

Since CONFIG_RETHUNK and CONFIG_SLS use INT3 as padding after a RET
instruction, kprobe jump optimization always fails on functions that
have INT3 padding inside the function body. (The existing code already
handles INT3 padding between functions, but not inside a function.)

To avoid this issue, when it finds an INT3, it reads the following bytes
to find the next non-INT3 instruction, then decodes the function again
to search for a branch which jumps to that address. If no such branch
is found, it assumes that the INT3 does not come from RETHUNK or SLS and
does not optimize the probe.
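The decision described above can be sketched as a small classifier. This is a hypothetical user-space model, not the patch's code: classify_int3_run() and enum int3_run_kind are made-up names, offsets stand in for addresses, and the branch targets are assumed to have been collected already from instructions decoded before the run (the patch instead re-decodes from the function start with for_each_insn()). Here, a run is treated as padding between functions when it extends to the end of the symbol, whose size includes the trailing padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INT3 0xcc

enum int3_run_kind {
	INT3_PAD_BETWEEN_FUNCS,	/* run reaches the end of the symbol */
	INT3_PAD_INSIDE_FUNC,	/* a known branch lands right after the run */
	INT3_UNKNOWN,		/* maybe a real breakpoint: do not optimize */
};

/*
 * Hypothetical classifier for the INT3 run starting at 'off' in a function
 * of 'size' bytes. 'targets' lists branch-target offsets collected from the
 * instructions decoded before the run. On return, *next is the offset of
 * the first non-INT3 byte, or 'size' if the run reaches the end.
 */
static enum int3_run_kind classify_int3_run(const uint8_t *code, size_t size,
					    size_t off, const size_t *targets,
					    size_t ntargets, size_t *next)
{
	size_t end = off;
	size_t i;

	assert(off < size && code[off] == TOY_INT3);
	while (end < size && code[end] == TOY_INT3)
		end++;
	*next = end;

	if (end == size)
		return INT3_PAD_BETWEEN_FUNCS;
	for (i = 0; i < ntargets; i++)
		if (targets[i] == end)
			return INT3_PAD_INSIDE_FUNC;
	return INT3_UNKNOWN;
}
```

A caller modelled on can_optimize() would resume decoding at *next for INT3_PAD_INSIDE_FUNC, stop and allow optimization for INT3_PAD_BETWEEN_FUNCS, and refuse for INT3_UNKNOWN.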
Signed-off-by: Masami Hiramatsu (Google)
Suggested-by: Peter Zijlstra
Fixes: 15e67227c49a ("x86: Undo return-thunk damage")
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/kprobes/opt.c | 70 ++++++++++++++++++++-----------------------
 1 file changed, 33 insertions(+), 37 deletions(-)

diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 2e41850cab06..ed77eeeef4ed 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -260,25 +260,12 @@ static int insn_is_indirect_jump(struct insn *insn)
 	return ret;
 }
 
-static bool is_padding_int3(unsigned long addr, unsigned long eaddr)
-{
-	unsigned char ops;
-
-	for (; addr < eaddr; addr++) {
-		if (get_kernel_nofault(ops, (void *)addr) < 0 ||
-		    ops != INT3_INSN_OPCODE)
-			return false;
-	}
-
-	return true;
-}
-
 /* Decode whole function to ensure any instructions don't jump into target */
 static int can_optimize(unsigned long paddr)
 {
-	unsigned long addr, size = 0, offset = 0;
-	struct insn insn;
 	kprobe_opcode_t buf[MAX_INSN_SIZE];
+	unsigned long size = 0, offset = 0;
+	struct insn insn;
 
 	/* Lookup symbol including addr */
 	if (!kallsyms_lookup_size_offset(paddr, &size, &offset))
@@ -296,11 +283,9 @@ static int can_optimize(unsigned long paddr)
 	if (size - offset < JMP32_INSN_SIZE)
 		return 0;
 
-	/* Decode instructions */
-	addr = paddr - offset;
-	while (addr < paddr - offset + size) { /* Decode until function end */
-		unsigned long recovered_insn;
-		int ret;
+	/* Decode all instructions in the function */
+	for_each_insn(&insn, paddr - offset, paddr - offset + size, buf) {
+		unsigned long addr = (unsigned long)insn.kaddr;
 
 		if (search_exception_tables(addr))
 			/*
@@ -308,31 +293,42 @@ static int can_optimize(unsigned long paddr)
 			 * we can't optimize kprobe in this function.
 			 */
 			return 0;
-		recovered_insn = recover_probed_instruction(buf, addr);
-		if (!recovered_insn)
-			return 0;
 
-		ret = insn_decode_kernel(&insn, (void *)recovered_insn);
-		if (ret < 0)
+		if (insn.opcode.bytes[0] == INT3_INSN_OPCODE) {
+			addr = skip_padding_int3(addr);
+			if (!addr)
+				return 0;
+			/*
+			 * If addr becomes the next function entry, this is
+			 * the INT3 padding between functions.
+			 */
+			if (addr - 1 == paddr - offset + size)
+				return 1;
+
+			/*
+			 * This can be padding INT3 for CONFIG_RETHUNK or
+			 * CONFIG_SLS. If a branch jumps to the address next
+			 * to the INT3 sequence, this is just for padding,
+			 * then we can continue decoding.
+			 */
+			for_each_insn(&insn, paddr - offset, addr, buf) {
+				if (insn_get_branch_addr(&insn) == addr)
+					goto found;
+			}
+
+			/* This INT3 can not be decoded safely. */
 			return 0;
+found:
+			/* Set loop cursor */
+			insn.next_byte = (void *)addr;
+			continue;
+		}
 
-		/*
-		 * In the case of detecting unknown breakpoint, this could be
-		 * a padding INT3 between functions. Let's check that all the
-		 * rest of the bytes are also INT3.
-		 */
-		if (insn.opcode.bytes[0] == INT3_INSN_OPCODE)
-			return is_padding_int3(addr, paddr - offset + size) ? 1 : 0;
-
-		/* Recover address */
-		insn.kaddr = (void *)addr;
-		insn.next_byte = (void *)(addr + insn.length);
 		/* Check any instructions don't jump into target */
 		if (insn_is_indirect_jump(&insn) ||
 		    insn_jump_into_range(&insn, paddr + INT3_INSN_SIZE,
 					 DISP32_SIZE))
 			return 0;
-		addr += insn.length;
 	}
 
 	return 1;
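insn_get_branch_addr(), which both patches rely on, computes a relative branch target as the address of the next instruction plus the signed displacement. The C sketch below redoes that arithmetic on raw bytes for the same opcode cases, assuming 64-bit mode encodings with no prefixes. toy_branch_addr() and toy_jump_into_range() are hypothetical helpers, not kernel APIs; the second one copies the inclusive bounds used by insn_jump_into_range().

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend an 8-bit displacement. */
static int64_t sext8(uint8_t b)
{
	return b & 0x80 ? (int64_t)b - 0x100 : (int64_t)b;
}

/* Sign-extend a little-endian 32-bit displacement. */
static int64_t sext32(const uint8_t *b)
{
	uint32_t u = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
		     (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;

	return u & 0x80000000u ? (int64_t)u - 0x100000000LL : (int64_t)u;
}

/*
 * Hypothetical helper: target of the relative branch encoded at 'p',
 * located at address 'addr', or 0 if it is not one of the opcodes
 * handled by insn_get_branch_addr(). Assumes no prefixes.
 */
static uint64_t toy_branch_addr(const uint8_t *p, uint64_t addr)
{
	uint8_t op = p[0];

	/* loopne/loope/loop/jcxz, jmp rel8, jcc rel8: 2-byte instructions */
	if ((op >= 0xe0 && op <= 0xe3) || op == 0xeb || (op & 0xf0) == 0x70)
		return addr + 2 + (uint64_t)sext8(p[1]);
	/* jmp rel32: 5-byte instruction */
	if (op == 0xe9)
		return addr + 5 + (uint64_t)sext32(p + 1);
	/* jcc rel32 (0f 80..8f): 6-byte instruction */
	if (op == 0x0f && (p[1] & 0xf0) == 0x80)
		return addr + 6 + (uint64_t)sext32(p + 2);
	return 0;
}

/* Same inclusive bounds as insn_jump_into_range(): start <= target <= start + len. */
static int toy_jump_into_range(uint64_t target, uint64_t start, int len)
{
	assert(len >= 0);
	return target ? (start <= target && target <= start + (uint64_t)len) : 0;
}
```

For example, 74 05 (je +5) at 0x1000 targets 0x1007, and 0f 84 fa ff ff ff (je -6) at 0x3000 targets 0x3000 itself; a non-branch such as 90 (nop) yields 0, which toy_jump_into_range() treats as "no jump", mirroring the `target ? ... : 0` guard added in patch 1.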