[for-next][PATCH 4/6] scripts/sorttable: Zero out weak functions in mcount_loc table

Posted by Steven Rostedt 10 months ago
From: Steven Rostedt <rostedt@goodmis.org>

When a function is annotated as "weak" and is overridden, its code is not
removed. If it is traced, the fentry/mcount location in the weak function
will still be referenced by the "__mcount_loc" section and will then be
added to the available_filter_functions list. Since only the addresses of
the functions are listed, a kallsyms search is used to find the name to
show.

Since kallsyms resolves an address by simply finding the function whose
start the address falls after but before the start of the next function,
the address of an overridden weak function shows up as the function before
it. This is because kallsyms does not keep the names of overridden weak
functions. This has caused issues in the past, as the traced weak function
ends up listed in available_filter_functions under the name of the function
before it.
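
Conceptually, the kallsyms lookup boils down to the following (a simplified
sketch for illustration only, not the actual kernel code):

	/*
	 * Resolve an address to a name by picking the last symbol whose
	 * start address is at or below the given address. An overridden
	 * weak function has no symbol of its own, so its address resolves
	 * to whatever symbol precedes it.
	 */
	static const char *resolve_name(unsigned long addr,
					const unsigned long *sym_addr,
					const char * const *sym_name,
					int nsyms)
	{
		for (int i = nsyms - 1; i >= 0; i--) {
			if (sym_addr[i] <= addr)
				return sym_name[i];
		}
		return NULL;
	}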

At best, this will cause the previous function's name to be listed twice.
At worst, if the previous function was marked notrace, it will now show up
as a function that can be traced. Note that it only appears to be
traceable; enabling it will not actually trace it, which causes confusion.

 https://lore.kernel.org/all/20220412094923.0abe90955e5db486b7bca279@kernel.org/
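
For example, if a traced weak function immediately follows a traceable
function called foo (the name is only illustrative), the listing ends up
showing the same name twice:

 ~# grep -x foo /sys/kernel/tracing/available_filter_functions
 foo
 foo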

The commit b39181f7c6907 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid
adding weak function") added a workaround for this by checking the function
address before printing its name. If the address was too far from the start
of the function given by the name, then instead of printing the name it
would print: __ftrace_invalid_address___<invalid-offset>
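
In available_filter_functions such an entry looks something like this
(the offset value shown is only illustrative):

 ~# grep -m1 __ftrace_invalid_address___ /sys/kernel/tracing/available_filter_functions
 __ftrace_invalid_address___108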

The real issue is that these invalid addresses are listed in the ftrace
table that available_filter_functions is derived from. A placeholder must
be listed in that file because set_ftrace_filter may take a series of
indexes into that file instead of names, in order to do O(1) lookups when
enabling filtering (many tools use this method).
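
For example, a tool can enable filtering of the function listed on line
1234 of available_filter_functions by writing the index instead of the
name (the index value here is arbitrary):

 ~# echo 1234 > /sys/kernel/tracing/set_ftrace_filter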

Even if kallsyms saved the size of each function, that would not remove the
need for these placeholders. The real solution is to not add an overridden
weak function to the ftrace table in the first place.

To solve this, the sorttable.c code that sorts the mcount regions during
the build is modified to take "nm -S vmlinux" output as input and sort it.
Any location listed in the mcount_loc section that does not fall within the
boundaries of a function in the list given by nm is considered an
overridden weak function and is zeroed out.
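
The expected input is the default "nm -S vmlinux" output of one symbol per
line with its value, size, type and name, for example (addresses and sizes
are illustrative):

 ffffffff810012a0 0000000000000060 T do_one_initcall
 ffffffff81001300 0000000000000040 t run_init_process

Only symbols of type 't', 'T' and 'W' are used to build the function list.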

Note, this does not mean the entries will remain zero at boot, as KASLR
will still shift those addresses. To handle this, entries in the mcount_loc
section are ignored if they are zero or match the kaslr_offset() value.

Before:

 ~# grep __ftrace_invalid_address___ /sys/kernel/tracing/available_filter_functions | wc -l
 551

After:

 ~# grep __ftrace_invalid_address___ /sys/kernel/tracing/available_filter_functions | wc -l
 0

Cc: bpf <bpf@vger.kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Zheng Yejian <zhengyejian1@huawei.com>
Cc: Martin Kelly <martin.kelly@crowdstrike.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/20250218200022.883095980@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/ftrace.c   |   6 +-
 scripts/link-vmlinux.sh |   4 +-
 scripts/sorttable.c     | 128 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 134 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 728ecda6e8d4..e3f89924f603 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -7004,6 +7004,7 @@ static int ftrace_process_locs(struct module *mod,
 	unsigned long count;
 	unsigned long *p;
 	unsigned long addr;
+	unsigned long kaslr;
 	unsigned long flags = 0; /* Shut up gcc */
 	int ret = -ENOMEM;
 
@@ -7052,6 +7053,9 @@ static int ftrace_process_locs(struct module *mod,
 		ftrace_pages->next = start_pg;
 	}
 
+	/* For zeroed locations that were shifted for core kernel */
+	kaslr = !mod ? kaslr_offset() : 0;
+
 	p = start;
 	pg = start_pg;
 	while (p < end) {
@@ -7063,7 +7067,7 @@ static int ftrace_process_locs(struct module *mod,
 		 * object files to satisfy alignments.
 		 * Skip any NULL pointers.
 		 */
-		if (!addr) {
+		if (!addr || addr == kaslr) {
 			skipped++;
 			continue;
 		}
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 56a077d204cf..59b07fe6fd00 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -177,12 +177,14 @@ mksysmap()
 
 sorttable()
 {
-	${objtree}/scripts/sorttable ${1}
+	${NM} -S ${1} > .tmp_vmlinux.nm-sort
+	${objtree}/scripts/sorttable -s .tmp_vmlinux.nm-sort ${1}
 }
 
 cleanup()
 {
 	rm -f .btf.*
+	rm -f .tmp_vmlinux.nm-sort
 	rm -f System.map
 	rm -f vmlinux
 	rm -f vmlinux.map
diff --git a/scripts/sorttable.c b/scripts/sorttable.c
index ec02a2852efb..23c7e0e6c024 100644
--- a/scripts/sorttable.c
+++ b/scripts/sorttable.c
@@ -580,6 +580,98 @@ static void rela_write_addend(Elf_Rela *rela, uint64_t val)
 	e.rela_write_addend(rela, val);
 }
 
+struct func_info {
+	uint64_t	addr;
+	uint64_t	size;
+};
+
+/* List of functions created by: nm -S vmlinux */
+static struct func_info *function_list;
+static int function_list_size;
+
+/* Allocate functions in 1k blocks */
+#define FUNC_BLK_SIZE	1024
+#define FUNC_BLK_MASK	(FUNC_BLK_SIZE - 1)
+
+static int add_field(uint64_t addr, uint64_t size)
+{
+	struct func_info *fi;
+	int fsize = function_list_size;
+
+	if (!(fsize & FUNC_BLK_MASK)) {
+		fsize += FUNC_BLK_SIZE;
+		fi = realloc(function_list, fsize * sizeof(struct func_info));
+		if (!fi)
+			return -1;
+		function_list = fi;
+	}
+	fi = &function_list[function_list_size++];
+	fi->addr = addr;
+	fi->size = size;
+	return 0;
+}
+
+/* Only return match if the address lies inside the function size */
+static int cmp_func_addr(const void *K, const void *A)
+{
+	uint64_t key = *(const uint64_t *)K;
+	const struct func_info *a = A;
+
+	if (key < a->addr)
+		return -1;
+	return key >= a->addr + a->size;
+}
+
+/* Find the function in function list that is bounded by the function size */
+static int find_func(uint64_t key)
+{
+	return bsearch(&key, function_list, function_list_size,
+		       sizeof(struct func_info), cmp_func_addr) != NULL;
+}
+
+static int cmp_funcs(const void *A, const void *B)
+{
+	const struct func_info *a = A;
+	const struct func_info *b = B;
+
+	if (a->addr < b->addr)
+		return -1;
+	return a->addr > b->addr;
+}
+
+static int parse_symbols(const char *fname)
+{
+	FILE *fp;
+	char addr_str[20]; /* Only need 17, but round up to next int size */
+	char size_str[20];
+	char type;
+
+	fp = fopen(fname, "r");
+	if (!fp) {
+		perror(fname);
+		return -1;
+	}
+
+	while (fscanf(fp, "%16s %16s %c %*s\n", addr_str, size_str, &type) == 3) {
+		uint64_t addr;
+		uint64_t size;
+
+		/* Only care about functions */
+		if (type != 't' && type != 'T' && type != 'W')
+			continue;
+
+		addr = strtoull(addr_str, NULL, 16);
+		size = strtoull(size_str, NULL, 16);
+		if (add_field(addr, size) < 0)
+			return -1;
+	}
+	fclose(fp);
+
+	qsort(function_list, function_list_size, sizeof(struct func_info), cmp_funcs);
+
+	return 0;
+}
+
 static pthread_t mcount_sort_thread;
 static bool sort_reloc;
 
@@ -752,6 +844,21 @@ static void *sort_mcount_loc(void *arg)
 		goto out;
 	}
 
+	/* zero out any locations not found by function list */
+	if (function_list_size) {
+		for (void *ptr = vals; ptr < vals + size; ptr += long_size) {
+			uint64_t key;
+
+			key = long_size == 4 ? r((uint32_t *)ptr) : r8((uint64_t *)ptr);
+			if (!find_func(key)) {
+				if (long_size == 4)
+					*(uint32_t *)ptr = 0;
+				else
+					*(uint64_t *)ptr = 0;
+			}
+		}
+	}
+
 	compare_values = long_size == 4 ? compare_values_32 : compare_values_64;
 
 	qsort(vals, count, long_size, compare_values);
@@ -801,6 +908,8 @@ static void get_mcount_loc(struct elf_mcount_loc *emloc, Elf_Shdr *symtab_sec,
 		return;
 	}
 }
+#else /* MCOUNT_SORT_ENABLED */
+static inline int parse_symbols(const char *fname) { return 0; }
 #endif
 
 static int do_sort(Elf_Ehdr *ehdr,
@@ -1256,14 +1365,29 @@ int main(int argc, char *argv[])
 	int i, n_error = 0;  /* gcc-4.3.0 false positive complaint */
 	size_t size = 0;
 	void *addr = NULL;
+	int c;
+
+	while ((c = getopt(argc, argv, "s:")) >= 0) {
+		switch (c) {
+		case 's':
+			if (parse_symbols(optarg) < 0) {
+				fprintf(stderr, "Could not parse %s\n", optarg);
+				return -1;
+			}
+			break;
+		default:
+			fprintf(stderr, "usage: sorttable [-s nm-file] vmlinux...\n");
+			return 0;
+		}
+	}
 
-	if (argc < 2) {
+	if ((argc - optind) < 1) {
 		fprintf(stderr, "usage: sorttable vmlinux...\n");
 		return 0;
 	}
 
 	/* Process each file in turn, allowing deep failure. */
-	for (i = 1; i < argc; i++) {
+	for (i = optind; i < argc; i++) {
 		addr = mmap_file(argv[i], &size);
 		if (!addr) {
 			++n_error;
-- 
2.47.2
Re: [for-next][PATCH 4/6] scripts/sorttable: Zero out weak functions in mcount_loc table
Posted by Nathan Chancellor 9 months, 3 weeks ago
Hi Steve,

On Wed, Feb 19, 2025 at 10:18:19AM -0500, Steven Rostedt wrote:
> When a function is annotated as "weak" and is overridden, its code is not
> removed. If it is traced, the fentry/mcount location in the weak function
> will still be referenced by the "__mcount_loc" section and will then be
> added to the available_filter_functions list. Since only the addresses of
> the functions are listed, a kallsyms search is used to find the name to
> show.
> 
> Since kallsyms resolves an address by simply finding the function whose
> start the address falls after but before the start of the next function,
> the address of an overridden weak function shows up as the function before
> it. This is because kallsyms does not keep the names of overridden weak
> functions. This has caused issues in the past, as the traced weak function
> ends up listed in available_filter_functions under the name of the function
> before it.
> 
> At best, this will cause the previous function's name to be listed twice.
> At worst, if the previous function was marked notrace, it will now show up
> as a function that can be traced. Note that it only appears to be
> traceable; enabling it will not actually trace it, which causes confusion.
> 
>  https://lore.kernel.org/all/20220412094923.0abe90955e5db486b7bca279@kernel.org/
> 
> The commit b39181f7c6907 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid
> adding weak function") added a workaround for this by checking the function
> address before printing its name. If the address was too far from the start
> of the function given by the name, then instead of printing the name it
> would print: __ftrace_invalid_address___<invalid-offset>
> 
> The real issue is that these invalid addresses are listed in the ftrace
> table that available_filter_functions is derived from. A placeholder must
> be listed in that file because set_ftrace_filter may take a series of
> indexes into that file instead of names, in order to do O(1) lookups when
> enabling filtering (many tools use this method).
> 
> Even if kallsyms saved the size of each function, that would not remove the
> need for these placeholders. The real solution is to not add an overridden
> weak function to the ftrace table in the first place.
> 
> To solve this, the sorttable.c code that sorts the mcount regions during
> the build is modified to take "nm -S vmlinux" output as input and sort it.
> Any location listed in the mcount_loc section that does not fall within the
> boundaries of a function in the list given by nm is considered an
> overridden weak function and is zeroed out.
> 
> Note, this does not mean the entries will remain zero at boot, as KASLR
> will still shift those addresses. To handle this, entries in the mcount_loc
> section are ignored if they are zero or match the kaslr_offset() value.
> 
> Before:
> 
>  ~# grep __ftrace_invalid_address___ /sys/kernel/tracing/available_filter_functions | wc -l
>  551
> 
> After:
> 
>  ~# grep __ftrace_invalid_address___ /sys/kernel/tracing/available_filter_functions | wc -l
>  0

I am also seeing a crash when booting arm64 with certain configurations
that I don't see at the parent change.

  $ printf 'CONFIG_%s=y\n' FTRACE FUNCTION_TRACER >kernel/configs/repro.config

  $ make -skj"$(nproc)" ARCH=arm64 CROSS_COMPILE=aarch64-linux- mrproper virtconfig repro.config Image.gz

  $ qemu-system-aarch64 \
      -display none \
      -nodefaults \
      -cpu max,pauth-impdef=true \
      -machine virt,gic-version=max,virtualization=true \
      -append 'console=ttyAMA0 earlycon' \
      -kernel arch/arm64/boot/Image.gz \
      -initrd rootfs.cpio \
      -m 512m \
      -serial mon:stdio
  [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x000f0510]
  [    0.000000] Linux version 6.14.0-rc4-next-20250224-dirty (nathan@ax162) (aarch64-linux-gcc (GCC) 14.2.0, GNU ld (GNU Binutils) 2.42) #1 SMP PREEMPT Mon Feb 24 18:47:59 PST 2025
  ...
  [    0.000000] ------------[ cut here ]------------
  [    0.000000] kernel BUG at arch/arm64/kernel/patching.c:39!
  [    0.000000] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
  [    0.000000] Modules linked in:
  [    0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc4-next-20250224-dirty #1
  [    0.000000] Hardware name: linux,dummy-virt (DT)
  [    0.000000] pstate: 000000c9 (nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  [    0.000000] pc : patch_map.constprop.0+0xfc/0x108
  [    0.000000] lr : patch_map.constprop.0+0x3c/0x108
  [    0.000000] sp : ffff96c0b6fa3ce0
  [    0.000000] x29: ffff96c0b6fa3ce0 x28: ffff96c0b6faafd0 x27: 00000000000000ff
  [    0.000000] x26: fff9f3a0c2408080 x25: 0000000000000001 x24: fff9f3a0c2408000
  [    0.000000] x23: 0000000000000000 x22: ffff96c0b72391d8 x21: 00000000000000c0
  [    0.000000] x20: 000016c035400000 x19: 000016c035400000 x18: 00000000f0000000
  [    0.000000] x17: 0000000000000068 x16: 0000000000000100 x15: ffff96c0b6fa39c4
  [    0.000000] x14: 0000000000000008 x13: 0000000000000000 x12: ffffe9ce43090280
  [    0.000000] x11: fff9f3a0dfef80c8 x10: ffffe9ce43090288 x9 : 0000000000000000
  [    0.000000] x8 : fff9f3a0dfef80b8 x7 : fffa5ce02929a000 x6 : ffff96c0b6fa39d0
  [    0.000000] x5 : 0000000000000030 x4 : 0000000000000000 x3 : ffff96c0b69b4000
  [    0.000000] x2 : ffff96c0b69b4000 x1 : 0000000000000000 x0 : 0000000000000000
  [    0.000000] Call trace:
  [    0.000000]  patch_map.constprop.0+0xfc/0x108 (P)
  [    0.000000]  aarch64_insn_write_literal_u64+0x38/0x80
  [    0.000000]  ftrace_init_nop+0x40/0xe0
  [    0.000000]  ftrace_process_locs+0x2a8/0x530
  [    0.000000]  ftrace_init+0x60/0x130
  [    0.000000]  start_kernel+0x4ac/0x708
  [    0.000000]  __primary_switched+0x88/0x98
  [    0.000000] Code: d1681000 a8c27bfd d50323bf d65f03c0 (d4210000)
  [    0.000000] ---[ end trace 0000000000000000 ]---
  [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
  [    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

I see the same crash with clang (after applying your suggested fix for
the issue that Arnd brought up).

  [    0.000000] Unable to handle kernel paging request at virtual address 00001cb7f7800008
  [    0.000000] Mem abort info:
  [    0.000000]   ESR = 0x000000009600002b
  [    0.000000]   EC = 0x25: DABT (current EL), IL = 32 bits
  [    0.000000]   SET = 0, FnV = 0
  [    0.000000]   EA = 0, S1PTW = 0
  [    0.000000]   FSC = 0x2b: level -1 translation fault
  [    0.000000] Data abort info:
  [    0.000000]   ISV = 0, ISS = 0x0000002b, ISS2 = 0x00000000
  [    0.000000]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  [    0.000000]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
  [    0.000000] [00001cb7f7800008] user address but active_mm is swapper
  [    0.000000] Internal error: Oops: 000000009600002b [#1] PREEMPT SMP
  [    0.000000] Modules linked in:
  [    0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc4-next-20250224-dirty #1
  [    0.000000] Hardware name: linux,dummy-virt (DT)
  [    0.000000] pstate: 400000c9 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  [    0.000000] pc : ftrace_call_adjust+0x44/0xd0
  [    0.000000] lr : ftrace_process_locs+0x1e0/0x560
  [    0.000000] sp : ffff9cb878f93da0
  [    0.000000] x29: ffff9cb878f93da0 x28: ffff9cb879234000 x27: ffff9cb879234000
  [    0.000000] x26: 00001cb7f7800000 x25: ffff9cb878ed8578 x24: fffac24642008000
  [    0.000000] x23: ffff9cb878f3cf90 x22: fffac24642008000 x21: 0000000000000000
  [    0.000000] x20: 0000000000001000 x19: 00001cb7f7800000 x18: 0000000000000068
  [    0.000000] x17: 0000000000000002 x16: 00000000fffffffe x15: ffff9cb878fa58c0
  [    0.000000] x14: 0000000000000000 x13: 0000000000000001 x12: 0000000000000000
  [    0.000000] x11: 0000000000000000 x10: 0000000000000000 x9 : 00007fff80000000
  [    0.000000] x8 : 000000000000201f x7 : 0000000000000000 x6 : 6d6067666871ff73
  [    0.000000] x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000001
  [    0.000000] x2 : 0000000000000004 x1 : 0000000000000040 x0 : 00001cb7f7800000
  [    0.000000] Call trace:
  [    0.000000]  ftrace_call_adjust+0x44/0xd0 (P)
  [    0.000000]  ftrace_process_locs+0x1e0/0x560
  [    0.000000]  ftrace_init+0x7c/0xc8
  [    0.000000]  start_kernel+0x160/0x3b8
  [    0.000000]  __primary_switched+0x88/0x98
  [    0.000000] Code: aa1f03e0 14000014 aa0003f3 528403e8 (b8408e74)

If there is any other information I can provide or patches I can test, I
am more than happy to do so.

Cheers,
Nathan
Re: [for-next][PATCH 4/6] scripts/sorttable: Zero out weak functions in mcount_loc table
Posted by Steven Rostedt 9 months, 3 weeks ago
On Mon, 24 Feb 2025 18:56:31 -0800
Nathan Chancellor <nathan@kernel.org> wrote:

> I am also seeing a crash when booting arm64 with certain configurations
> that I don't see at the parent change.

Thanks, I also just bisected it down to this. But I didn't have early
printk on so I didn't see what was crashing. So this is helpful.

> 
>   $ printf 'CONFIG_%s=y\n' FTRACE FUNCTION_TRACER >kernel/configs/repro.config
> 
>   $ make -skj"$(nproc)" ARCH=arm64 CROSS_COMPILE=aarch64-linux- mrproper virtconfig repro.config Image.gz
> 
>   $ qemu-system-aarch64 \
>       -display none \
>       -nodefaults \
>       -cpu max,pauth-impdef=true \
>       -machine virt,gic-version=max,virtualization=true \
>       -append 'console=ttyAMA0 earlycon' \
>       -kernel arch/arm64/boot/Image.gz \
>       -initrd rootfs.cpio \
>       -m 512m \
>       -serial mon:stdio
>   [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x000f0510]
>   [    0.000000] Linux version 6.14.0-rc4-next-20250224-dirty (nathan@ax162) (aarch64-linux-gcc (GCC) 14.2.0, GNU ld (GNU Binutils) 2.42) #1 SMP PREEMPT Mon Feb 24 18:47:59 PST 2025
>   ...
>   [    0.000000] ------------[ cut here ]------------
>   [    0.000000] kernel BUG at arch/arm64/kernel/patching.c:39!
>   [    0.000000] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>   [    0.000000] Modules linked in:
>   [    0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc4-next-20250224-dirty #1
>   [    0.000000] Hardware name: linux,dummy-virt (DT)
>   [    0.000000] pstate: 000000c9 (nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>   [    0.000000] pc : patch_map.constprop.0+0xfc/0x108
>   [    0.000000] lr : patch_map.constprop.0+0x3c/0x108
>   [    0.000000] sp : ffff96c0b6fa3ce0
>   [    0.000000] x29: ffff96c0b6fa3ce0 x28: ffff96c0b6faafd0 x27: 00000000000000ff
>   [    0.000000] x26: fff9f3a0c2408080 x25: 0000000000000001 x24: fff9f3a0c2408000
>   [    0.000000] x23: 0000000000000000 x22: ffff96c0b72391d8 x21: 00000000000000c0
>   [    0.000000] x20: 000016c035400000 x19: 000016c035400000 x18: 00000000f0000000
>   [    0.000000] x17: 0000000000000068 x16: 0000000000000100 x15: ffff96c0b6fa39c4
>   [    0.000000] x14: 0000000000000008 x13: 0000000000000000 x12: ffffe9ce43090280
>   [    0.000000] x11: fff9f3a0dfef80c8 x10: ffffe9ce43090288 x9 : 0000000000000000
>   [    0.000000] x8 : fff9f3a0dfef80b8 x7 : fffa5ce02929a000 x6 : ffff96c0b6fa39d0
>   [    0.000000] x5 : 0000000000000030 x4 : 0000000000000000 x3 : ffff96c0b69b4000
>   [    0.000000] x2 : ffff96c0b69b4000 x1 : 0000000000000000 x0 : 0000000000000000
>   [    0.000000] Call trace:
>   [    0.000000]  patch_map.constprop.0+0xfc/0x108 (P)
>   [    0.000000]  aarch64_insn_write_literal_u64+0x38/0x80
>   [    0.000000]  ftrace_init_nop+0x40/0xe0
>   [    0.000000]  ftrace_process_locs+0x2a8/0x530
>   [    0.000000]  ftrace_init+0x60/0x130
>   [    0.000000]  start_kernel+0x4ac/0x708
>   [    0.000000]  __primary_switched+0x88/0x98
>   [    0.000000] Code: d1681000 a8c27bfd d50323bf d65f03c0 (d4210000)
>   [    0.000000] ---[ end trace 0000000000000000 ]---
>   [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>   [    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
> 
> I see the same crash with clang (after applying your suggested fix for
> the issue that Arnd brought up).
> 
>   [    0.000000] Unable to handle kernel paging request at virtual address 00001cb7f7800008
>   [    0.000000] Mem abort info:
>   [    0.000000]   ESR = 0x000000009600002b
>   [    0.000000]   EC = 0x25: DABT (current EL), IL = 32 bits
>   [    0.000000]   SET = 0, FnV = 0
>   [    0.000000]   EA = 0, S1PTW = 0
>   [    0.000000]   FSC = 0x2b: level -1 translation fault
>   [    0.000000] Data abort info:
>   [    0.000000]   ISV = 0, ISS = 0x0000002b, ISS2 = 0x00000000
>   [    0.000000]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>   [    0.000000]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>   [    0.000000] [00001cb7f7800008] user address but active_mm is swapper
>   [    0.000000] Internal error: Oops: 000000009600002b [#1] PREEMPT SMP
>   [    0.000000] Modules linked in:
>   [    0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc4-next-20250224-dirty #1
>   [    0.000000] Hardware name: linux,dummy-virt (DT)
>   [    0.000000] pstate: 400000c9 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>   [    0.000000] pc : ftrace_call_adjust+0x44/0xd0
>   [    0.000000] lr : ftrace_process_locs+0x1e0/0x560
>   [    0.000000] sp : ffff9cb878f93da0
>   [    0.000000] x29: ffff9cb878f93da0 x28: ffff9cb879234000 x27: ffff9cb879234000
>   [    0.000000] x26: 00001cb7f7800000 x25: ffff9cb878ed8578 x24: fffac24642008000
>   [    0.000000] x23: ffff9cb878f3cf90 x22: fffac24642008000 x21: 0000000000000000
>   [    0.000000] x20: 0000000000001000 x19: 00001cb7f7800000 x18: 0000000000000068
>   [    0.000000] x17: 0000000000000002 x16: 00000000fffffffe x15: ffff9cb878fa58c0
>   [    0.000000] x14: 0000000000000000 x13: 0000000000000001 x12: 0000000000000000
>   [    0.000000] x11: 0000000000000000 x10: 0000000000000000 x9 : 00007fff80000000
>   [    0.000000] x8 : 000000000000201f x7 : 0000000000000000 x6 : 6d6067666871ff73
>   [    0.000000] x5 : 0000000000000001 x4 : 0000000000000000 x3 : 0000000000000001
>   [    0.000000] x2 : 0000000000000004 x1 : 0000000000000040 x0 : 00001cb7f7800000
>   [    0.000000] Call trace:
>   [    0.000000]  ftrace_call_adjust+0x44/0xd0 (P)
>   [    0.000000]  ftrace_process_locs+0x1e0/0x560
>   [    0.000000]  ftrace_init+0x7c/0xc8
>   [    0.000000]  start_kernel+0x160/0x3b8
>   [    0.000000]  __primary_switched+0x88/0x98
>   [    0.000000] Code: aa1f03e0 14000014 aa0003f3 528403e8 (b8408e74)
> 
> If there is any other information I can provide or patches I can test, I
> am more than happy to do so.

Thanks, I'm about to go to bed soon and I'll take a look more into it tomorrow.

-- Steve
Re: [for-next][PATCH 4/6] scripts/sorttable: Zero out weak functions in mcount_loc table
Posted by Steven Rostedt 9 months, 3 weeks ago
On Mon, 24 Feb 2025 22:28:33 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> Thanks, I'm about to go to bed soon and I'll take a look more into it tomorrow.

Can you try this patch? (It has the clang fix too.)

-- Steve

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 27c8def2139d..bec7b5dbdb3b 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -7004,7 +7004,6 @@ static int ftrace_process_locs(struct module *mod,
 	unsigned long count;
 	unsigned long *p;
 	unsigned long addr;
-	unsigned long kaslr;
 	unsigned long flags = 0; /* Shut up gcc */
 	unsigned long pages;
 	int ret = -ENOMEM;
@@ -7056,25 +7055,37 @@ static int ftrace_process_locs(struct module *mod,
 		ftrace_pages->next = start_pg;
 	}
 
-	/* For zeroed locations that were shifted for core kernel */
-	kaslr = !mod ? kaslr_offset() : 0;
-
 	p = start;
 	pg = start_pg;
 	while (p < end) {
 		unsigned long end_offset;
-		addr = ftrace_call_adjust(*p++);
+
+		addr = *p++;
+
 		/*
 		 * Some architecture linkers will pad between
 		 * the different mcount_loc sections of different
 		 * object files to satisfy alignments.
 		 * Skip any NULL pointers.
 		 */
-		if (!addr || addr == kaslr) {
+		if (!addr) {
+			skipped++;
+			continue;
+		}
+
+		/*
+		 * If this is core kernel, make sure the address is in core
+		 * or inittext, as weak functions get zeroed and KASLR can
+		 * move them to something other than zero. It just will not
+		 * move it to an area where kernel text is.
+		 */
+		if (!mod && !(is_kernel_text(addr) || is_kernel_inittext(addr))) {
 			skipped++;
 			continue;
 		}
 
+		addr = ftrace_call_adjust(addr);
+
 		end_offset = (pg->index+1) * sizeof(pg->records[0]);
 		if (end_offset > PAGE_SIZE << pg->order) {
 			/* We should have allocated enough */
diff --git a/scripts/sorttable.c b/scripts/sorttable.c
index 23c7e0e6c024..7b4b3714b1af 100644
--- a/scripts/sorttable.c
+++ b/scripts/sorttable.c
@@ -611,13 +611,16 @@ static int add_field(uint64_t addr, uint64_t size)
 	return 0;
 }
 
+/* Used for when mcount/fentry is before the function entry */
+static int before_func;
+
 /* Only return match if the address lies inside the function size */
 static int cmp_func_addr(const void *K, const void *A)
 {
 	uint64_t key = *(const uint64_t *)K;
 	const struct func_info *a = A;
 
-	if (key < a->addr)
+	if (key + before_func < a->addr)
 		return -1;
 	return key >= a->addr + a->size;
 }
@@ -827,9 +830,14 @@ static void *sort_mcount_loc(void *arg)
 		pthread_exit(m_err);
 	}
 
-	if (sort_reloc)
+	if (sort_reloc) {
 		count = fill_relocs(vals, size, ehdr, emloc->start_mcount_loc);
-	else
+		/* gcc may use relocs to save the addresses, but clang does not. */
+		if (!count) {
+			count = fill_addrs(vals, size, start_loc);
+			sort_reloc = 0;
+		}
+	} else
 		count = fill_addrs(vals, size, start_loc);
 
 	if (count < 0) {
@@ -1248,6 +1256,8 @@ static int do_file(char const *const fname, void *addr)
 #ifdef MCOUNT_SORT_ENABLED
 		sort_reloc = true;
 		rela_type = 0x403;
+		/* arm64 uses patchable function entry placing before function */
+		before_func = 8;
 #endif
 		/* fallthrough */
 	case EM_386:
Re: [for-next][PATCH 4/6] scripts/sorttable: Zero out weak functions in mcount_loc table
Posted by Nathan Chancellor 9 months, 3 weeks ago
On Tue, Feb 25, 2025 at 10:47:26AM -0500, Steven Rostedt wrote:
> On Mon, 24 Feb 2025 22:28:33 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > Thanks, I'm about to go to bed soon and I'll take a look more into it tomorrow.
> 
> Can you try this patch? (It has the clang fix too.)

Yup, that appears to fix all my issues with my initial tests.

Tested-by: Nathan Chancellor <nathan@kernel.org>

> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 27c8def2139d..bec7b5dbdb3b 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -7004,7 +7004,6 @@ static int ftrace_process_locs(struct module *mod,
>  	unsigned long count;
>  	unsigned long *p;
>  	unsigned long addr;
> -	unsigned long kaslr;
>  	unsigned long flags = 0; /* Shut up gcc */
>  	unsigned long pages;
>  	int ret = -ENOMEM;
> @@ -7056,25 +7055,37 @@ static int ftrace_process_locs(struct module *mod,
>  		ftrace_pages->next = start_pg;
>  	}
>  
> -	/* For zeroed locations that were shifted for core kernel */
> -	kaslr = !mod ? kaslr_offset() : 0;
> -
>  	p = start;
>  	pg = start_pg;
>  	while (p < end) {
>  		unsigned long end_offset;
> -		addr = ftrace_call_adjust(*p++);
> +
> +		addr = *p++;
> +
>  		/*
>  		 * Some architecture linkers will pad between
>  		 * the different mcount_loc sections of different
>  		 * object files to satisfy alignments.
>  		 * Skip any NULL pointers.
>  		 */
> -		if (!addr || addr == kaslr) {
> +		if (!addr) {
> +			skipped++;
> +			continue;
> +		}
> +
> +		/*
> +		 * If this is core kernel, make sure the address is in core
> +		 * or inittext, as weak functions get zeroed and KASLR can
> +		 * move them to something other than zero. It just will not
> +		 * move it to an area where kernel text is.
> +		 */
> +		if (!mod && !(is_kernel_text(addr) || is_kernel_inittext(addr))) {
>  			skipped++;
>  			continue;
>  		}
>  
> +		addr = ftrace_call_adjust(addr);
> +
>  		end_offset = (pg->index+1) * sizeof(pg->records[0]);
>  		if (end_offset > PAGE_SIZE << pg->order) {
>  			/* We should have allocated enough */
> diff --git a/scripts/sorttable.c b/scripts/sorttable.c
> index 23c7e0e6c024..7b4b3714b1af 100644
> --- a/scripts/sorttable.c
> +++ b/scripts/sorttable.c
> @@ -611,13 +611,16 @@ static int add_field(uint64_t addr, uint64_t size)
>  	return 0;
>  }
>  
> +/* Used for when mcount/fentry is before the function entry */
> +static int before_func;
> +
>  /* Only return match if the address lies inside the function size */
>  static int cmp_func_addr(const void *K, const void *A)
>  {
>  	uint64_t key = *(const uint64_t *)K;
>  	const struct func_info *a = A;
>  
> -	if (key < a->addr)
> +	if (key + before_func < a->addr)
>  		return -1;
>  	return key >= a->addr + a->size;
>  }
> @@ -827,9 +830,14 @@ static void *sort_mcount_loc(void *arg)
>  		pthread_exit(m_err);
>  	}
>  
> -	if (sort_reloc)
> +	if (sort_reloc) {
>  		count = fill_relocs(vals, size, ehdr, emloc->start_mcount_loc);
> -	else
> +		/* gcc may use relocs to save the addresses, but clang does not. */
> +		if (!count) {
> +			count = fill_addrs(vals, size, start_loc);
> +			sort_reloc = 0;
> +		}
> +	} else
>  		count = fill_addrs(vals, size, start_loc);
>  
>  	if (count < 0) {
> @@ -1248,6 +1256,8 @@ static int do_file(char const *const fname, void *addr)
>  #ifdef MCOUNT_SORT_ENABLED
>  		sort_reloc = true;
>  		rela_type = 0x403;
> +		/* arm64 uses patchable function entry placing before function */
> +		before_func = 8;
>  #endif
>  		/* fallthrough */
>  	case EM_386:
Re: [for-next][PATCH 4/6] scripts/sorttable: Zero out weak functions in mcount_loc table
Posted by Nathan Chancellor 9 months, 3 weeks ago
Hi Steve,

On Wed, Feb 19, 2025 at 10:18:19AM -0500, Steven Rostedt wrote:
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 728ecda6e8d4..e3f89924f603 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -7004,6 +7004,7 @@ static int ftrace_process_locs(struct module *mod,
>  	unsigned long count;
>  	unsigned long *p;
>  	unsigned long addr;
> +	unsigned long kaslr;
>  	unsigned long flags = 0; /* Shut up gcc */
>  	int ret = -ENOMEM;
>  
> @@ -7052,6 +7053,9 @@ static int ftrace_process_locs(struct module *mod,
>  		ftrace_pages->next = start_pg;
>  	}
>  
> +	/* For zeroed locations that were shifted for core kernel */
> +	kaslr = !mod ? kaslr_offset() : 0;
> +
>  	p = start;
>  	pg = start_pg;
>  	while (p < end) {
> @@ -7063,7 +7067,7 @@ static int ftrace_process_locs(struct module *mod,
>  		 * object files to satisfy alignments.
>  		 * Skip any NULL pointers.
>  		 */
> -		if (!addr) {
> +		if (!addr || addr == kaslr) {
>  			skipped++;
>  			continue;
>  		}

Our CI and KernelCI report that this change, as commit ef378c3b8233
("scripts/sorttable: Zero out weak functions in mcount_loc table") in
next-20250224, breaks the build when an architecture does not have
kaslr_offset() defined:

  $ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- mrproper allmodconfig kernel/trace/ftrace.o
  kernel/trace/ftrace.c: In function 'ftrace_process_locs':
  kernel/trace/ftrace.c:7074:24: error: implicit declaration of function 'kaslr_offset' [-Wimplicit-function-declaration]
   7074 |         kaslr = !mod ? kaslr_offset() : 0;
        |                        ^~~~~~~~~~~~

https://lore.kernel.org/CACo-S-0GeJjWWcrGvos_Avg2FwGU2tj2QZpgoHOvPT+YbyknSg@mail.gmail.com/

Cheers,
Nathan
Re: [for-next][PATCH 4/6] scripts/sorttable: Zero out weak functions in mcount_loc table
Posted by Steven Rostedt 9 months, 3 weeks ago
On Mon, 24 Feb 2025 10:08:05 -0800
Nathan Chancellor <nathan@kernel.org> wrote:

> Our CI and KernelCI report that this change, as commit ef378c3b8233
> ("scripts/sorttable: Zero out weak functions in mcount_loc table") in
> next-20250224, breaks the build when an architecture does not have
> kaslr_offset() defined:
> 
>   $ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- mrproper allmodconfig kernel/trace/ftrace.o
>   kernel/trace/ftrace.c: In function 'ftrace_process_locs':
>   kernel/trace/ftrace.c:7074:24: error: implicit declaration of function 'kaslr_offset' [-Wimplicit-function-declaration]
>    7074 |         kaslr = !mod ? kaslr_offset() : 0;
>         |                        ^~~~~~~~~~~~
> 
> https://lore.kernel.org/CACo-S-0GeJjWWcrGvos_Avg2FwGU2tj2QZpgoHOvPT+YbyknSg@mail.gmail.com/

Thanks, I'll add a patch to put an #ifdef around it.
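
A minimal sketch of such a guard, assuming the availability of
kaslr_offset() can be keyed off CONFIG_RANDOMIZE_BASE (the actual fix may
end up looking different):

	/* For zeroed locations that were shifted for core kernel */
#ifdef CONFIG_RANDOMIZE_BASE
	kaslr = !mod ? kaslr_offset() : 0;
#else
	kaslr = 0;
#endif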

Now the question is: can KASLR still shift the address on an architecture
without kaslr_offset()?

-- Steve