From nobody Fri Dec 19 12:06:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 36310145A07 for ; Tue, 27 Aug 2024 09:27:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724750824; cv=none; b=kpTUJU94gcy+mH48X7m53h0A4zwE+1X7ul1VYFmPX/k17YPqaaXzOMLPzABWnsH+M6otIeK7Ynr1g414kdyXxI5znniIkgNjh19jqRmb7duzNQrGcR5+A/I2Qm9Ljlj57KQJVgbcJwQ5/nQ3UaedjBCSZJvTvVfyiv0aybZBET8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724750824; c=relaxed/simple; bh=lzWD84QPzg//5zKEy3MASL8dK04s/QEOMWhC7WcCjfo=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=RtUoZcVd0nOSTNy0j4W5mZb6xvAStWOSCc0V6t3dYT4beIEpX3OapG6lxE1m1PaKbDZeHK54wGHoTDaxE9hyL30aJC8Wyt4lxQE+tXWmlqDg3LfgbhZS6r9fjXekdff1XuAVhxyPZk4xF3ajWfHAc6816ZhQOMMvwq89Z42XbsI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id D3DFBC8B7B5; Tue, 27 Aug 2024 09:27:03 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98) (envelope-from ) id 1sisUM-00000004SKm-1PmM; Tue, 27 Aug 2024 05:27:46 -0400 Message-ID: <20240827092746.200961303@goodmis.org> User-Agent: quilt/0.68 Date: Tue, 27 Aug 2024 05:27:17 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Vincent Donnefort Subject: [for-next][PATCH 1/8] ring-buffer: Dont reset persistent ring-buffer meta saved addresses References: <20240827092716.515115830@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 
Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt The text and data address is saved in the meta data so that it can be used to know the delta of the text and data addresses of the last boot compared to the text and data addresses of the current boot. The delta is used to convert function pointer entries in the ring buffer to something that can be used by kallsyms (note this only works for built-in functions). But the saved addresses get reset on boot up. If the buffer is not used and there's another reboot, then the saved text and data addresses will be of the last boot and not that of the boot that created the content in the ring buffer. To get an idea of the issue: # trace-cmd start -B boot_mapped -p function # reboot # trace-cmd show -B boot_mapped | tail <...>-1 [000] d..1. 461.983243: native_apic_msr_write <-= native_kick_ap <...>-1 [000] d..1. 461.983244: __pfx_native_apic_msr_eo= i <-native_kick_ap <...>-1 [000] d..1. 461.983244: reserve_irq_vector_locke= d <-native_kick_ap <...>-1 [000] d..1. 461.983262: branch_emulate_op <-nati= ve_kick_ap <...>-1 [000] d..1. 461.983262: __ia32_sys_ia32_pread64 = <-native_kick_ap <...>-1 [000] d..1. 461.983263: native_kick_ap <-__smpbo= ot_create_thread <...>-1 [000] d..1. 461.983263: store_cache_disable <-na= tive_kick_ap <...>-1 [000] d..1. 461.983279: acpi_power_off_prepare <= -native_kick_ap <...>-1 [000] d..1. 461.983280: __pfx_acpi_ns_delete_nod= e <-acpi_suspend_enter <...>-1 [000] d..1. 461.983280: __pfx_acpi_os_release_lo= ck <-acpi_suspend_enter # reboot # trace-cmd show -B boot_mapped |tail <...>-1 [000] d..1. 461.983243: 0xffffffffa9669220 <-0xf= fffffffa965f3db <...>-1 [000] d..1. 461.983244: 0xffffffffa96690f0 <-0xf= fffffffa965f3db <...>-1 [000] d..1. 461.983244: 0xffffffffa9663fa0 <-0xf= fffffffa965f3db <...>-1 [000] d..1. 461.983262: 0xffffffffa9672e80 <-0xf= fffffffa965f3e0 <...>-1 [000] d..1. 
461.983262: 0xffffffffa962b940 <-0xf= fffffffa965f3ec <...>-1 [000] d..1. 461.983263: 0xffffffffa965f540 <-0xf= fffffffa96e1362 <...>-1 [000] d..1. 461.983263: 0xffffffffa963c940 <-0xf= fffffffa965f55b <...>-1 [000] d..1. 461.983279: 0xffffffffa9ee30c0 <-0xf= fffffffa965f59b <...>-1 [000] d..1. 461.983280: 0xffffffffa9f16c10 <-0xf= fffffffa9ee3157 <...>-1 [000] d..1. 461.983280: 0xffffffffa9ee02e0 <-0xf= fffffffa9ee3157 By not updating the saved text and data addresses in the meta data at every boot up and only updating them when the buffer is reset, it allows multiple boots to see the same data. Cc: Masami Hiramatsu Cc: Mathieu Desnoyers Cc: Vincent Donnefort Link: https://lore.kernel.org/20240815113629.0dc90af8@rorschach.local.home Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 32 ++++++++++++++++++++++++-------- 1 file changed, 24 insertions(+), 8 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 8e3a7123937a..b16f301b8a93 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -1817,12 +1817,19 @@ static void rb_meta_validate_events(struct ring_buf= fer_per_cpu *cpu_buffer) /* Used to calculate data delta */ static char rb_data_ptr[] =3D ""; =20 +#define THIS_TEXT_PTR ((unsigned long)rb_meta_init_text_addr) +#define THIS_DATA_PTR ((unsigned long)rb_data_ptr) + +static void rb_meta_init_text_addr(struct ring_buffer_meta *meta) +{ + meta->text_addr =3D THIS_TEXT_PTR; + meta->data_addr =3D THIS_DATA_PTR; +} + static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) { struct ring_buffer_meta *meta; unsigned long delta; - unsigned long this_text =3D (unsigned long)rb_range_meta_init; - unsigned long this_data =3D (unsigned long)rb_data_ptr; void *subbuf; int cpu; int i; @@ -1839,10 +1846,8 @@ static void rb_range_meta_init(struct trace_buffer *= buffer, int nr_pages) meta->first_buffer +=3D delta; meta->head_buffer +=3D delta; meta->commit_buffer +=3D delta; - 
buffer->last_text_delta =3D this_text - meta->text_addr; - buffer->last_data_delta =3D this_data - meta->data_addr; - meta->text_addr =3D this_text; - meta->data_addr =3D this_data; + buffer->last_text_delta =3D THIS_TEXT_PTR - meta->text_addr; + buffer->last_data_delta =3D THIS_DATA_PTR - meta->data_addr; continue; } =20 @@ -1859,8 +1864,7 @@ static void rb_range_meta_init(struct trace_buffer *b= uffer, int nr_pages) subbuf =3D rb_subbufs_from_meta(meta); =20 meta->first_buffer =3D (unsigned long)subbuf; - meta->text_addr =3D this_text; - meta->data_addr =3D this_data; + rb_meta_init_text_addr(meta); =20 /* * The buffers[] array holds the order of the sub-buffers @@ -5990,6 +5994,7 @@ static void reset_disabled_cpu_buffer(struct ring_buf= fer_per_cpu *cpu_buffer) void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu) { struct ring_buffer_per_cpu *cpu_buffer =3D buffer->buffers[cpu]; + struct ring_buffer_meta *meta; =20 if (!cpumask_test_cpu(cpu, buffer->cpumask)) return; @@ -6008,6 +6013,11 @@ void ring_buffer_reset_cpu(struct trace_buffer *buff= er, int cpu) atomic_dec(&cpu_buffer->record_disabled); atomic_dec(&cpu_buffer->resize_disabled); =20 + /* Make sure persistent meta now uses this buffer's addresses */ + meta =3D rb_range_meta(buffer, 0, cpu_buffer->cpu); + if (meta) + rb_meta_init_text_addr(meta); + mutex_unlock(&buffer->mutex); } EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu); @@ -6022,6 +6032,7 @@ EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu); void ring_buffer_reset_online_cpus(struct trace_buffer *buffer) { struct ring_buffer_per_cpu *cpu_buffer; + struct ring_buffer_meta *meta; int cpu; =20 /* prevent another thread from changing buffer sizes */ @@ -6049,6 +6060,11 @@ void ring_buffer_reset_online_cpus(struct trace_buff= er *buffer) =20 reset_disabled_cpu_buffer(cpu_buffer); =20 + /* Make sure persistent meta now uses this buffer's addresses */ + meta =3D rb_range_meta(buffer, 0, cpu_buffer->cpu); + if (meta) + rb_meta_init_text_addr(meta); + 
atomic_dec(&cpu_buffer->record_disabled); atomic_sub(RESET_BIT, &cpu_buffer->resize_disabled); } --=20 2.43.0 From nobody Fri Dec 19 12:06:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4DD6E19D880 for ; Tue, 27 Aug 2024 09:27:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724750824; cv=none; b=d9DUsAwjp6PFzH44SMAV+G92CNFjDm/kb82Lp7tF+/PsE7Z6udBIQZxYJkJS2+i8MJ2Z84ugH5PuuUmP7aqudDfr3HInokV4F/n2DEeQla7mB0RN9oFD2ArMJCxXDV6qAJXJ/uKaWz9doNF3vp1WRnmB0vjV4kdm9aJE7helefI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724750824; c=relaxed/simple; bh=jUK1kJgfhcwf8nhkQ4JSyJF9SwO8DOZ1x0fWmhdFtvw=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=QdzSXD+hOzgsYvjsloeuLjDjdKsyKKLc7bzW5UbHZUwR3xU8W5XQSALE23yvsTkpCcz7vrdida0vvH8TPSIo5IwX9Laz2SiZK/GBYBUwPw4WoV7scDcy/5hLujv9u+s6b34hnnFA2RoaXQuUXZ7yRvV9sZNU5Pa199PWPU2EWGw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id D5BE3C8B7B7; Tue, 27 Aug 2024 09:27:03 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98) (envelope-from ) id 1sisUM-00000004SLF-244v; Tue, 27 Aug 2024 05:27:46 -0400 Message-ID: <20240827092746.359657809@goodmis.org> User-Agent: quilt/0.68 Date: Tue, 27 Aug 2024 05:27:18 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Vincent Donnefort Subject: [for-next][PATCH 2/8] ring-buffer: Add magic and struct size to boot up meta data References: <20240827092716.515115830@goodmis.org> Precedence: bulk X-Mailing-List: 
linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt Add a magic number as well as save the struct size of the ring_buffer_meta structure in the meta data to also use as validation. Updating the magic number could be used to force an invalidation between kernel versions, and saving the structure size is also a good method to make sure the content is what is expected. Cc: Masami Hiramatsu Cc: Mathieu Desnoyers Cc: Vincent Donnefort Link: https://lore.kernel.org/20240815115032.0c197b32@rorschach.local.home Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index b16f301b8a93..c3a5e6cbb940 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -44,7 +44,11 @@ =20 static void update_pages_handler(struct work_struct *work); =20 +#define RING_BUFFER_META_MAGIC 0xBADFEED + struct ring_buffer_meta { + int magic; + int struct_size; unsigned long text_addr; unsigned long data_addr; unsigned long first_buffer; @@ -1627,6 +1631,13 @@ static bool rb_meta_valid(struct ring_buffer_meta *m= eta, int cpu, unsigned long buffers_end; int i; =20 + /* Check the meta magic and meta struct size */ + if (meta->magic !=3D RING_BUFFER_META_MAGIC || + meta->struct_size !=3D sizeof(*meta)) { + pr_info("Ring buffer boot meta[%d] mismatch of magic or struct size\n", = cpu); + return false; + } + /* The subbuffer's size and number of subbuffers must match */ if (meta->subbuf_size !=3D subbuf_size || meta->nr_subbufs !=3D nr_pages + 1) { @@ -1858,6 +1869,9 @@ static void rb_range_meta_init(struct trace_buffer *b= uffer, int nr_pages) =20 memset(meta, 0, next_meta - (void *)meta); =20 + meta->magic =3D RING_BUFFER_META_MAGIC; + meta->struct_size =3D sizeof(*meta); + meta->nr_subbufs =3D nr_pages + 
1; meta->subbuf_size =3D PAGE_SIZE; =20 --=20 2.43.0 From nobody Fri Dec 19 12:06:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 76D8D19D892 for ; Tue, 27 Aug 2024 09:27:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724750824; cv=none; b=pfBvaRoEfDCpPY0JE01CI7G6AHl/ZFsFoUh51PzHONIPPeJN/PX/aq1ZybfC6eIXxHpvnbfnz0UNi27XzwOizy6zmBwi8b5OebCSL3aHxOkKcS49UqlzmOR9bD5yD4jk+SvHYHj7vgkdhjpBNEVn6zPnNa8wPfFfiUMzOrNv9rU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724750824; c=relaxed/simple; bh=TKiJo5+xBSU6Jnu3/wyofZzugr39xRcvGRJWKdE3S1Q=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=PRdyDiwsrXVT3oRf4Njco47tLjmzBpdWyIuhm3JBq2R0UK+268ZnTE+jfaa7waBGZxP51sia8EzFxXUivAnkvIlrbRij6zDzQu7omtIziTO1X054HDKdiNUx37/zPYVbMt+xCHM347CkBZyqOLi8kdbn2LPKGCnISrobZatWjpI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0EA19C8B7BC; Tue, 27 Aug 2024 09:27:04 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98) (envelope-from ) id 1sisUM-00000004SLl-2kAi; Tue, 27 Aug 2024 05:27:46 -0400 Message-ID: <20240827092746.514052482@goodmis.org> User-Agent: quilt/0.68 Date: Tue, 27 Aug 2024 05:27:19 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Vincent Donnefort Subject: [for-next][PATCH 3/8] ring-buffer: Align meta-page to sub-buffers for improved TLB usage References: <20240827092716.515115830@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: 
List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Vincent Donnefort Previously, the mapped ring-buffer layout caused misalignment between the meta-page and sub-buffers when the sub-buffer size was not a multiple of PAGE_SIZE. This prevented hardware with larger TLB entries from utilizing them effectively. Add a padding with the zero-page between the meta-page and sub-buffers. Also update the ring-buffer map_test to verify that padding. Link: https://lore.kernel.org/20240628104611.1443542-1-vdonnefort@google.com Signed-off-by: Vincent Donnefort Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 33 +++++++++++-------- .../testing/selftests/ring-buffer/map_test.c | 14 ++++++++ 2 files changed, 34 insertions(+), 13 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index c3a5e6cbb940..77dc0b25140e 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -6852,10 +6852,10 @@ static void rb_setup_ids_meta_page(struct ring_buff= er_per_cpu *cpu_buffer, /* install subbuf ID to kern VA translation */ cpu_buffer->subbuf_ids =3D subbuf_ids; =20 - meta->meta_page_size =3D PAGE_SIZE; meta->meta_struct_len =3D sizeof(*meta); meta->nr_subbufs =3D nr_subbufs; meta->subbuf_size =3D cpu_buffer->buffer->subbuf_size + BUF_PAGE_HDR_SIZE; + meta->meta_page_size =3D meta->subbuf_size; =20 rb_update_meta_page(cpu_buffer); } @@ -6949,6 +6949,12 @@ static int __rb_map_vma(struct ring_buffer_per_cpu *= cpu_buffer, !(vma->vm_flags & VM_MAYSHARE)) return -EPERM; =20 + subbuf_order =3D cpu_buffer->buffer->subbuf_order; + subbuf_pages =3D 1 << subbuf_order; + + if (subbuf_order && pgoff % subbuf_pages) + return -EINVAL; + /* * Make sure the mapping cannot become writable later. Also tell the VM * to not touch these pages (VM_DONTCOPY | VM_DONTEXPAND). 
@@ -6958,11 +6964,8 @@ static int __rb_map_vma(struct ring_buffer_per_cpu *= cpu_buffer, =20 lockdep_assert_held(&cpu_buffer->mapping_lock); =20 - subbuf_order =3D cpu_buffer->buffer->subbuf_order; - subbuf_pages =3D 1 << subbuf_order; - nr_subbufs =3D cpu_buffer->nr_pages + 1; /* + reader-subbuf */ - nr_pages =3D ((nr_subbufs) << subbuf_order) - pgoff + 1; /* + meta-page */ + nr_pages =3D ((nr_subbufs + 1) << subbuf_order) - pgoff; /* + meta-page */ =20 nr_vma_pages =3D vma_pages(vma); if (!nr_vma_pages || nr_vma_pages > nr_pages) @@ -6975,20 +6978,24 @@ static int __rb_map_vma(struct ring_buffer_per_cpu = *cpu_buffer, return -ENOMEM; =20 if (!pgoff) { + unsigned long meta_page_padding; + pages[p++] =3D virt_to_page(cpu_buffer->meta_page); =20 /* - * TODO: Align sub-buffers on their size, once - * vm_insert_pages() supports the zero-page. + * Pad with the zero-page to align the meta-page with the + * sub-buffers. */ - } else { - /* Skip the meta-page */ - pgoff--; + meta_page_padding =3D subbuf_pages - 1; + while (meta_page_padding-- && p < nr_pages) { + unsigned long __maybe_unused zero_addr =3D + vma->vm_start + (PAGE_SIZE * p); =20 - if (pgoff % subbuf_pages) { - err =3D -EINVAL; - goto out; + pages[p++] =3D ZERO_PAGE(zero_addr); } + } else { + /* Skip the meta-page */ + pgoff -=3D subbuf_pages; =20 s +=3D pgoff / subbuf_pages; } diff --git a/tools/testing/selftests/ring-buffer/map_test.c b/tools/testing= /selftests/ring-buffer/map_test.c index a9006fa7097e..4bb0192e43f3 100644 --- a/tools/testing/selftests/ring-buffer/map_test.c +++ b/tools/testing/selftests/ring-buffer/map_test.c @@ -228,6 +228,20 @@ TEST_F(map, data_mmap) data =3D mmap(NULL, data_len, PROT_READ, MAP_SHARED, desc->cpu_fd, meta_len); ASSERT_EQ(data, MAP_FAILED); + + /* Verify meta-page padding */ + if (desc->meta->meta_page_size > getpagesize()) { + void *addr; + + data_len =3D desc->meta->meta_page_size; + data =3D mmap(NULL, data_len, + PROT_READ, MAP_SHARED, desc->cpu_fd, 0); + 
ASSERT_NE(data, MAP_FAILED); + + addr =3D (void *)((unsigned long)data + getpagesize()); + ASSERT_EQ(*((int *)addr), 0); + munmap(data, data_len); + } } =20 FIXTURE(snapshot) { --=20 2.43.0 From nobody Fri Dec 19 12:06:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A02BA19D8A3 for ; Tue, 27 Aug 2024 09:27:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724750824; cv=none; b=hL1IuHlRYu88o3XW0eNlhJRQcSRVxx3Y7tj+NBZlD4kX+tlsARB/SlZdJApneOc7f/539DYlbRdPcg+GccBezdoAKMQxB3W+97dovxGDInhPCLC0E/OVk6HCQTc9fXuGveWBgpR67c5g5VRg3IdHE5vQRtabdEfp4vFA+1PzFaY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724750824; c=relaxed/simple; bh=P39uXxqL/XuG2FOWZ8aa9n/Ih1wfK1hy5FdyUrs+bRI=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=WHaRvGDMAWgcqA43YldSfNdBOtewJhXu1NosMqlX4JhsPySIAy52KwbeYzd/sebjSqpuwpUzAvu33M6YrQQKNnX131Gm74RebfMQMLn1IBztwT6KyDGDL66QN0G5jHMT5NEFmbowdi9MLOVTSfHxGpwgfoJqcr/hbBxAYob9E/Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 409A4C8B7BB; Tue, 27 Aug 2024 09:27:04 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98) (envelope-from ) id 1sisUM-00000004SMF-3RDw; Tue, 27 Aug 2024 05:27:46 -0400 Message-ID: <20240827092746.676231983@goodmis.org> User-Agent: quilt/0.68 Date: Tue, 27 Aug 2024 05:27:20 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Vincent Donnefort , Joel Fernandes , Ingo Molnar , Peter Zijlstra , Thomas Gleixner , Vineeth Pillai , Beau Belgrave , 
Alexander Graf , Baoquan He , Borislav Petkov , "Paul E. McKenney" , David Howells , Mike Rapoport , Dave Hansen , Tony Luck , Guenter Roeck , Ross Zwisler , Kees Cook , Alexander Aring , "Luis Claudio R. Goncalves" , Tomas Glozar , John Kacur , Clark Williams , Linus Torvalds , "Jonathan Corbet" Subject: [for-next][PATCH 4/8] tracing: Add "traceoff" flag to boot time tracing instances References: <20240827092716.515115830@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt Add a "flags" delimiter (^) to the "trace_instance" kernel command line parameter, and add the "traceoff" flag. The format is: trace_instance=3D[^[^]][@][,] The code allows for more than one flag to be added, but currently only "traceoff" is done so. The motivation for this change came from debugging with the persistent ring buffer and having trace_printk() writing to it. The trace_printk calls are always enabled, and the boot after the crash was having the unwanted trace_printks from the current boot inject into the ring buffer with the trace_printks of the crash kernel, making the output very confusing. Cc: Masami Hiramatsu Cc: Mark Rutland Cc: Mathieu Desnoyers Cc: Andrew Morton Cc: Vincent Donnefort Cc: Joel Fernandes Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Vineeth Pillai Cc: Beau Belgrave Cc: Alexander Graf Cc: Baoquan He Cc: Borislav Petkov Cc: "Paul E. McKenney" Cc: David Howells Cc: Mike Rapoport Cc: Dave Hansen Cc: Tony Luck Cc: Guenter Roeck Cc: Ross Zwisler Cc: Kees Cook Cc: Alexander Aring Cc: "Luis Claudio R. 
Goncalves" Cc: Tomas Glozar Cc: John Kacur Cc: Clark Williams Cc: Linus Torvalds Cc: "Jonathan Corbet" Link: https://lore.kernel.org/20240823014019.053229958@goodmis.org Signed-off-by: Steven Rostedt (Google) --- .../admin-guide/kernel-parameters.txt | 17 ++++++++++ kernel/trace/trace.c | 31 ++++++++++++++++++- 2 files changed, 47 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentatio= n/admin-guide/kernel-parameters.txt index 388653448e72..3803f2b7f065 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -6743,6 +6743,15 @@ the same thing would happen if it was left off). The irq_handler_entry event, and all events under the "initcall" system. =20 + Flags can be added to the instance to modify its behavior when it is + created. The flags are separated by '^'. Currently there's only one flag + defined, and that's "traceoff", to have the tracing instance tracing + disabled after it is created. + + trace_instance=3Dfoo^traceoff,sched,irq + + The flags must come before the defined events. + If memory has been reserved (see memmap for x86), the instance can use that memory: =20 @@ -6765,6 +6774,14 @@ kernel versions where the validator will fail and reset the ring buffer if the layout is not the same as the previous kernel. =20 + If the ring buffer is used for persistent bootups and has events enable= d, + it is recommend to disable tracing so that events from a previous boot = do not + mix with events of the current boot (unless you are debugging a random = crash + at boot up). + + reserve_mem=3D12M:4096:trace trace_instance=3Dboot_map^traceoff@trace,= sched,irq + + trace_options=3D[option-list] [FTRACE] Enable or disable tracer options at boot. 
The option-list is a comma delimited list of options diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 9bcef199ae90..a79eefe84d6b 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -10468,10 +10468,36 @@ __init static void enable_instances(void) phys_addr_t start =3D 0; phys_addr_t size =3D 0; unsigned long addr =3D 0; + bool traceoff =3D false; + char *flag_delim; + char *addr_delim; =20 tok =3D strsep(&curr_str, ","); - name =3D strsep(&tok, "@"); =20 + flag_delim =3D strchr(tok, '^'); + addr_delim =3D strchr(tok, '@'); + + if (addr_delim) + *addr_delim++ =3D '\0'; + + if (flag_delim) + *flag_delim++ =3D '\0'; + + name =3D tok; + + if (flag_delim) { + char *flag; + + while ((flag =3D strsep(&flag_delim, "^"))) { + if (strcmp(flag, "traceoff") =3D=3D 0) + traceoff =3D true; + else + pr_info("Tracing: Invalid instance flag '%s' for %s\n", + flag, name); + } + } + + tok =3D addr_delim; if (tok && isdigit(*tok)) { start =3D memparse(tok, &tok); if (!start) { @@ -10519,6 +10545,9 @@ __init static void enable_instances(void) continue; } =20 + if (traceoff) + tracer_tracing_off(tr); + /* Only allow non mapped buffers to be deleted */ if (!start) trace_array_put(tr); --=20 2.43.0 From nobody Fri Dec 19 12:06:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8B0A81805A for ; Tue, 27 Aug 2024 09:27:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724750824; cv=none; b=SOCGjWuKbDC+reHQi60UW562cgsq1JjjhyZdqgm82QwUu5K1d0bSyxAmfZqG1n3vnr3hbkbULySLvulEqFa/iqCpkgxPjYLwIZHepjX28UeGbR42OUXuyatWRxEG/4ltXZ3/k3LyNYLYv+3i/BX1/OgMmaMFnwqML8RreBhGSac= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
t=1724750824; c=relaxed/simple; bh=Xf3RA1cFF8RX3tnT1aslXT/h/n05S3ceoamhanOGl6o=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=UquwusCDYRoWes4SlTQJsYxiO7SeKv4e1z4TsP4dgz+KhkO22Fi13Oj3cszixmpTKlfuc1bbaIQcq0Uv2qk+NG/bGd69S2FUGkAloOnH1+O6LniRqJXcqfZGTRWevdLmZM2na/xJQcjOQe1PA0kCrtFWE2YXKazewzhw05+4kps= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5815BC8B7BF; Tue, 27 Aug 2024 09:27:04 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98) (envelope-from ) id 1sisUM-00000004SMj-46CK; Tue, 27 Aug 2024 05:27:46 -0400 Message-ID: <20240827092746.841799783@goodmis.org> User-Agent: quilt/0.68 Date: Tue, 27 Aug 2024 05:27:21 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Vincent Donnefort , Joel Fernandes , Ingo Molnar , Peter Zijlstra , Thomas Gleixner , Vineeth Pillai , Beau Belgrave , Alexander Graf , Baoquan He , Borislav Petkov , "Paul E. McKenney" , David Howells , Mike Rapoport , Dave Hansen , Tony Luck , Guenter Roeck , Ross Zwisler , Kees Cook , Alexander Aring , "Luis Claudio R. Goncalves" , Tomas Glozar , John Kacur , Clark Williams , Linus Torvalds , "Jonathan Corbet" Subject: [for-next][PATCH 5/8] tracing: Allow trace_printk() to go to other instance buffers References: <20240827092716.515115830@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt Currently, trace_printk() just goes to the top level ring buffer. But there may be times that it should go to one of the instances created by the kernel command line. Add a new trace_instance flag: traceprintk (also can use "printk" or "trace_printk" as people tend to forget the actual flag name). 
trace_instance=3Dfoo^traceprintk Will assign the trace_printk to this buffer at boot up. Cc: Masami Hiramatsu Cc: Mark Rutland Cc: Mathieu Desnoyers Cc: Andrew Morton Cc: Vincent Donnefort Cc: Joel Fernandes Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Vineeth Pillai Cc: Beau Belgrave Cc: Alexander Graf Cc: Baoquan He Cc: Borislav Petkov Cc: "Paul E. McKenney" Cc: David Howells Cc: Mike Rapoport Cc: Dave Hansen Cc: Tony Luck Cc: Guenter Roeck Cc: Ross Zwisler Cc: Kees Cook Cc: Alexander Aring Cc: "Luis Claudio R. Goncalves" Cc: Tomas Glozar Cc: John Kacur Cc: Clark Williams Cc: Linus Torvalds Cc: "Jonathan Corbet" Link: https://lore.kernel.org/20240823014019.226694946@goodmis.org Signed-off-by: Steven Rostedt (Google) --- .../admin-guide/kernel-parameters.txt | 14 ++++-- kernel/trace/trace.c | 46 ++++++++++++++----- 2 files changed, 45 insertions(+), 15 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentatio= n/admin-guide/kernel-parameters.txt index 3803f2b7f065..a8803c0c0a89 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -6744,11 +6744,17 @@ event, and all events under the "initcall" system. =20 Flags can be added to the instance to modify its behavior when it is - created. The flags are separated by '^'. Currently there's only one flag - defined, and that's "traceoff", to have the tracing instance tracing - disabled after it is created. + created. The flags are separated by '^'. =20 - trace_instance=3Dfoo^traceoff,sched,irq + The available flags are: + + traceoff - Have the tracing instance tracing disabled after it is c= reated. + traceprintk - Have trace_printk() write into this trace instance + (note, "printk" and "trace_printk" can also be used) + Currently, traceprintk flag cannot be used for memory + mapped ring buffers as described below. 
+ + trace_instance=3Dfoo^traceoff^traceprintk,sched,irq =20 The flags must come before the defined events. =20 diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index a79eefe84d6b..8e28f19f5316 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -500,6 +500,8 @@ static struct trace_array global_trace =3D { .trace_flags =3D TRACE_DEFAULT_FLAGS, }; =20 +static struct trace_array *printk_trace =3D &global_trace; + void trace_set_ring_buffer_expanded(struct trace_array *tr) { if (!tr) @@ -1117,7 +1119,7 @@ EXPORT_SYMBOL_GPL(__trace_array_puts); */ int __trace_puts(unsigned long ip, const char *str, int size) { - return __trace_array_puts(&global_trace, ip, str, size); + return __trace_array_puts(printk_trace, ip, str, size); } EXPORT_SYMBOL_GPL(__trace_puts); =20 @@ -1128,6 +1130,7 @@ EXPORT_SYMBOL_GPL(__trace_puts); */ int __trace_bputs(unsigned long ip, const char *str) { + struct trace_array *tr =3D printk_trace; struct ring_buffer_event *event; struct trace_buffer *buffer; struct bputs_entry *entry; @@ -1135,14 +1138,14 @@ int __trace_bputs(unsigned long ip, const char *str) int size =3D sizeof(struct bputs_entry); int ret =3D 0; =20 - if (!(global_trace.trace_flags & TRACE_ITER_PRINTK)) + if (!(tr->trace_flags & TRACE_ITER_PRINTK)) return 0; =20 if (unlikely(tracing_selftest_running || tracing_disabled)) return 0; =20 trace_ctx =3D tracing_gen_ctx(); - buffer =3D global_trace.array_buffer.buffer; + buffer =3D tr->array_buffer.buffer; =20 ring_buffer_nest_start(buffer); event =3D __trace_buffer_lock_reserve(buffer, TRACE_BPUTS, size, @@ -1155,7 +1158,7 @@ int __trace_bputs(unsigned long ip, const char *str) entry->str =3D str; =20 __buffer_unlock_commit(buffer, event); - ftrace_trace_stack(&global_trace, buffer, trace_ctx, 4, NULL); + ftrace_trace_stack(tr, buffer, trace_ctx, 4, NULL); =20 ret =3D 1; out: @@ -3025,7 +3028,7 @@ void trace_dump_stack(int skip) /* Skip 1 to skip this function. 
*/ skip++; #endif - __ftrace_trace_stack(global_trace.array_buffer.buffer, + __ftrace_trace_stack(printk_trace->array_buffer.buffer, tracing_gen_ctx(), skip, NULL); } EXPORT_SYMBOL_GPL(trace_dump_stack); @@ -3244,7 +3247,7 @@ int trace_vbprintk(unsigned long ip, const char *fmt,= va_list args) struct trace_event_call *call =3D &event_bprint; struct ring_buffer_event *event; struct trace_buffer *buffer; - struct trace_array *tr =3D &global_trace; + struct trace_array *tr =3D printk_trace; struct bprint_entry *entry; unsigned int trace_ctx; char *tbuffer; @@ -3342,7 +3345,7 @@ __trace_array_vprintk(struct trace_buffer *buffer, memcpy(&entry->buf, tbuffer, len + 1); if (!call_filter_check_discard(call, entry, buffer, event)) { __buffer_unlock_commit(buffer, event); - ftrace_trace_stack(&global_trace, buffer, trace_ctx, 6, NULL); + ftrace_trace_stack(printk_trace, buffer, trace_ctx, 6, NULL); } =20 out: @@ -3438,7 +3441,7 @@ int trace_array_printk_buf(struct trace_buffer *buffe= r, int ret; va_list ap; =20 - if (!(global_trace.trace_flags & TRACE_ITER_PRINTK)) + if (!(printk_trace->trace_flags & TRACE_ITER_PRINTK)) return 0; =20 va_start(ap, fmt); @@ -3450,7 +3453,7 @@ int trace_array_printk_buf(struct trace_buffer *buffe= r, __printf(2, 0) int trace_vprintk(unsigned long ip, const char *fmt, va_list args) { - return trace_array_vprintk(&global_trace, ip, fmt, args); + return trace_array_vprintk(printk_trace, ip, fmt, args); } EXPORT_SYMBOL_GPL(trace_vprintk); =20 @@ -9666,6 +9669,9 @@ static int __remove_instance(struct trace_array *tr) set_tracer_flag(tr, 1 << i, 0); } =20 + if (printk_trace =3D=3D tr) + printk_trace =3D &global_trace; + tracing_set_nop(tr); clear_ftrace_function_probes(tr); event_trace_del_tracer(tr); @@ -10468,6 +10474,7 @@ __init static void enable_instances(void) phys_addr_t start =3D 0; phys_addr_t size =3D 0; unsigned long addr =3D 0; + bool traceprintk =3D false; bool traceoff =3D false; char *flag_delim; char *addr_delim; @@ -10489,11 
+10496,16 @@ __init static void enable_instances(void) char *flag; =20 while ((flag =3D strsep(&flag_delim, "^"))) { - if (strcmp(flag, "traceoff") =3D=3D 0) + if (strcmp(flag, "traceoff") =3D=3D 0) { traceoff =3D true; - else + } else if ((strcmp(flag, "printk") =3D=3D 0) || + (strcmp(flag, "traceprintk") =3D=3D 0) || + (strcmp(flag, "trace_printk") =3D=3D 0)) { + traceprintk =3D true; + } else { pr_info("Tracing: Invalid instance flag '%s' for %s\n", flag, name); + } } } =20 @@ -10548,6 +10560,18 @@ __init static void enable_instances(void) if (traceoff) tracer_tracing_off(tr); =20 + if (traceprintk) { + /* + * The binary format of traceprintk can cause a crash if used + * by a buffer from another boot. Do not allow it for the + * memory mapped ring buffers. + */ + if (start) + pr_warn("Tracing: WARNING: memory mapped ring buffers cannot be used f= or trace_printk\n"); + else + printk_trace =3D tr; + } + /* Only allow non mapped buffers to be deleted */ if (!start) trace_array_put(tr); --=20 2.43.0 From nobody Fri Dec 19 12:06:24 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BF14919DFB8 for ; Tue, 27 Aug 2024 09:27:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724750824; cv=none; b=MfiJHJR/qwOpcW22Ngl+0G3lFnJT2PqsDtOYQ5AXWhK3gzGBClotVTwVi8T2IcnZoNvSw9Junf/iWUxTZDlhn8DbWS/HB4yTq/bBz2DH8/fl8960GQaQzNJL1ZwiZC7CtD+L449EF50Ib4VMowYV6hLcG+vVaTQYVh7+zu5enQc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724750824; c=relaxed/simple; bh=tynU/d4Be9ZcuLtBouPQyAlIHYILeSRyw/x2aNqlZR4=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; 
b=OJ80ybSC9zYQWWz1J7dHwaC/O7yMD6yTYb1EUBeZK6iztgK8ebnRBN6haEQvYsAUEJIa9UXkxZRR03Lw2ZYix8YffflCOVUL74Cbyxb+gjE5gM1YzdayYGuK+RlkMYlj6i8+lNoNlGjieFoy/L/9XvfC1cYhSxuCrxy67ngMgF4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7F664C8B7C0; Tue, 27 Aug 2024 09:27:04 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98) (envelope-from ) id 1sisUN-00000004SND-0ZtR; Tue, 27 Aug 2024 05:27:47 -0400 Message-ID: <20240827092746.999458647@goodmis.org> User-Agent: quilt/0.68 Date: Tue, 27 Aug 2024 05:27:22 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Vincent Donnefort , Joel Fernandes , Ingo Molnar , Peter Zijlstra , Thomas Gleixner , Vineeth Pillai , Beau Belgrave , Alexander Graf , Baoquan He , Borislav Petkov , "Paul E. McKenney" , David Howells , Mike Rapoport , Dave Hansen , Tony Luck , Guenter Roeck , Ross Zwisler , Kees Cook , Alexander Aring , "Luis Claudio R. Goncalves" , Tomas Glozar , John Kacur , Clark Williams , Linus Torvalds , "Jonathan Corbet" Subject: [for-next][PATCH 6/8] tracing: Have trace_printk not use binary prints if boot buffer References: <20240827092716.515115830@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt If the persistent boot mapped ring buffer is used for trace_printk(), force it to not use the binary versions. trace_printk() by default uses bin_printf() that only saves the pointer to the format and not the format itself inside the ring buffer. But for a persistent buffer that is read after reboot, the pointers to the format strings may not be the same, or worse, not even exist! 
Instead, just force the more robust, but slower, version that does the formatting before saving into the ring buffer. The boot mapped buffer can now be used for trace_printk and friends! trace_printk() together with the persistent buffer was used to debug the issue with the osnoise tracer: Link: https://lore.kernel.org/all/20240822103443.6a6ae051@gandalf.local.hom= e/ Cc: Masami Hiramatsu Cc: Mark Rutland Cc: Mathieu Desnoyers Cc: Andrew Morton Cc: Vincent Donnefort Cc: Joel Fernandes Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Vineeth Pillai Cc: Beau Belgrave Cc: Alexander Graf Cc: Baoquan He Cc: Borislav Petkov Cc: "Paul E. McKenney" Cc: David Howells Cc: Mike Rapoport Cc: Dave Hansen Cc: Tony Luck Cc: Guenter Roeck Cc: Ross Zwisler Cc: Kees Cook Cc: Alexander Aring Cc: "Luis Claudio R. Goncalves" Cc: Tomas Glozar Cc: John Kacur Cc: Clark Williams Cc: Linus Torvalds Cc: "Jonathan Corbet" Link: https://lore.kernel.org/20240823014019.386925800@goodmis.org Signed-off-by: Steven Rostedt (Google) --- .../admin-guide/kernel-parameters.txt | 4 +- kernel/trace/trace.c | 44 ++++++++++++------- kernel/trace/trace.h | 3 +- kernel/trace/trace_output.c | 5 ++- 4 files changed, 36 insertions(+), 20 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentatio= n/admin-guide/kernel-parameters.txt index a8803c0c0a89..9e507e6cb4c8 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -6751,8 +6751,6 @@ traceoff - Have the tracing instance tracing disabled after it is c= reated. traceprintk - Have trace_printk() write into this trace instance (note, "printk" and "trace_printk" can also be used) - Currently, traceprintk flag cannot be used for memory - mapped ring buffers as described below. =20 trace_instance=3Dfoo^traceoff^traceprintk,sched,irq =20 @@ -6785,7 +6783,7 @@ mix with events of the current boot (unless you are debugging a random = crash at boot up).
=20 - reserve_mem=3D12M:4096:trace trace_instance=3Dboot_map^traceoff@trace,= sched,irq + reserve_mem=3D12M:4096:trace trace_instance=3Dboot_map^traceoff^tracep= rintk@trace,sched,irq =20 =20 trace_options=3D[option-list] diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 8e28f19f5316..35b37c9aa26c 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -502,6 +502,17 @@ static struct trace_array global_trace =3D { =20 static struct trace_array *printk_trace =3D &global_trace; =20 +static __always_inline bool printk_binsafe(struct trace_array *tr) +{ + /* + * The binary format of traceprintk can cause a crash if used + * by a buffer from another boot. Force the use of the + * non binary version of trace_printk if the trace_printk + * buffer is a boot mapped ring buffer. + */ + return !(tr->flags & TRACE_ARRAY_FL_BOOT); +} + void trace_set_ring_buffer_expanded(struct trace_array *tr) { if (!tr) @@ -1130,7 +1141,7 @@ EXPORT_SYMBOL_GPL(__trace_puts); */ int __trace_bputs(unsigned long ip, const char *str) { - struct trace_array *tr =3D printk_trace; + struct trace_array *tr =3D READ_ONCE(printk_trace); struct ring_buffer_event *event; struct trace_buffer *buffer; struct bputs_entry *entry; @@ -1138,6 +1149,9 @@ int __trace_bputs(unsigned long ip, const char *str) int size =3D sizeof(struct bputs_entry); int ret =3D 0; =20 + if (!printk_binsafe(tr)) + return __trace_puts(ip, str, strlen(str)); + if (!(tr->trace_flags & TRACE_ITER_PRINTK)) return 0; =20 @@ -3247,12 +3261,15 @@ int trace_vbprintk(unsigned long ip, const char *fm= t, va_list args) struct trace_event_call *call =3D &event_bprint; struct ring_buffer_event *event; struct trace_buffer *buffer; - struct trace_array *tr =3D printk_trace; + struct trace_array *tr =3D READ_ONCE(printk_trace); struct bprint_entry *entry; unsigned int trace_ctx; char *tbuffer; int len =3D 0, size; =20 + if (!printk_binsafe(tr)) + return trace_vprintk(ip, fmt, args); + if (unlikely(tracing_selftest_running || 
tracing_disabled)) return 0; =20 @@ -10560,20 +10577,17 @@ __init static void enable_instances(void) if (traceoff) tracer_tracing_off(tr); =20 - if (traceprintk) { - /* - * The binary format of traceprintk can cause a crash if used - * by a buffer from another boot. Do not allow it for the - * memory mapped ring buffers. - */ - if (start) - pr_warn("Tracing: WARNING: memory mapped ring buffers cannot be used f= or trace_printk\n"); - else - printk_trace =3D tr; - } + if (traceprintk) + printk_trace =3D tr; =20 - /* Only allow non mapped buffers to be deleted */ - if (!start) + /* + * If start is set, then this is a mapped buffer, and + * cannot be deleted by user space, so keep the reference + * to it. + */ + if (start) + tr->flags |=3D TRACE_ARRAY_FL_BOOT; + else trace_array_put(tr); =20 while ((tok =3D strsep(&curr_str, ","))) { diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 4f448ab2d1e7..07b2d2af9b33 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -429,7 +429,8 @@ struct trace_array { }; =20 enum { - TRACE_ARRAY_FL_GLOBAL =3D (1 << 0) + TRACE_ARRAY_FL_GLOBAL =3D BIT(0), + TRACE_ARRAY_FL_BOOT =3D BIT(1), }; =20 extern struct list_head ftrace_trace_arrays; diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index 48de93598897..868f2f912f28 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -1591,10 +1591,13 @@ static enum print_line_t trace_print_print(struct t= race_iterator *iter, { struct print_entry *field; struct trace_seq *s =3D &iter->seq; + unsigned long ip; =20 trace_assign_type(field, iter->ent); =20 - seq_print_ip_sym(s, field->ip, flags); + ip =3D field->ip + iter->tr->text_delta; + + seq_print_ip_sym(s, ip, flags); trace_seq_printf(s, ": %s", field->buf); =20 return trace_handle_return(s); --=20 2.43.0 From nobody Fri Dec 19 12:06:25 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DE87E19D89D for ; Tue, 27 Aug 2024 09:27:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724750824; cv=none; b=oQESOpRDQ8rH0kWCRyeky+2OEKnSw/coN3nH+K8KkIuU71+grXR+NVTdaNAOAmIMoEzwdz0hm5IsmbKA8XC3bVK0y1xsLUNiWzFkwlnDHgwJzge4frMj79yhvReMMUFxNFMrbujO0Q/sHCRWMkvVPt1+w1FdUvxhnTnalRLTR4k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724750824; c=relaxed/simple; bh=Oj50tsTi4io39jeBF54dMLAlg7sXeXQzapMtB4VD+Dg=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=WpleOmqhVvx4VBypKKEoDf5UZ1P9IonnaJRZGAseWb+3eK8/I+R35ud3ukEZR326AGgT8h24A7bGP8xqoSp5eeu3JJ5zgSMfMKj17hZA5cq+t9N+rZeRHrY2yD2Mz9CClCOU9vnOay83eUxkacXYUdhNW/1XQxxc8DrN/EkUm2Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id AAEC4C8B7BE; Tue, 27 Aug 2024 09:27:04 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98) (envelope-from ) id 1sisUN-00000004SNh-1Gkx; Tue, 27 Aug 2024 05:27:47 -0400 Message-ID: <20240827092747.159308148@goodmis.org> User-Agent: quilt/0.68 Date: Tue, 27 Aug 2024 05:27:23 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Vincent Donnefort , Joel Fernandes , Ingo Molnar , Peter Zijlstra , Thomas Gleixner , Vineeth Pillai , Beau Belgrave , Alexander Graf , Baoquan He , Borislav Petkov , "Paul E. McKenney" , David Howells , Mike Rapoport , Dave Hansen , Tony Luck , Guenter Roeck , Ross Zwisler , Kees Cook , Alexander Aring , "Luis Claudio R. 
Goncalves" , Tomas Glozar , John Kacur , Clark Williams , Linus Torvalds , "Jonathan Corbet" Subject: [for-next][PATCH 7/8] tracing: Add option to set an instance to be the trace_printk destination References: <20240827092716.515115830@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt Add an option "trace_printk_dest" that will make the tracing instance the location that trace_printk() will go to. This is useful if the trace_printk or one of the top level tracers is too noisy and there's a need to separate the two. Then an instance can be created, the trace_printk can be set to go there instead, where it will not be lost in the noise of the top level tracer. Note, only one instance can be the destination of trace_printk at a time. If an instance sets this flag, the instance that had it set will have it cleared. There is always one instance that has this set. By default, that is the top instance. This flag cannot be cleared from the top instance. Doing so will result in -EINVAL. The only way this flag can be cleared from the top instance is by another instance setting it. Cc: Masami Hiramatsu Cc: Mark Rutland Cc: Mathieu Desnoyers Cc: Andrew Morton Cc: Vincent Donnefort Cc: Joel Fernandes Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Vineeth Pillai Cc: Beau Belgrave Cc: Alexander Graf Cc: Baoquan He Cc: Borislav Petkov Cc: "Paul E. McKenney" Cc: David Howells Cc: Mike Rapoport Cc: Dave Hansen Cc: Tony Luck Cc: Guenter Roeck Cc: Ross Zwisler Cc: Kees Cook Cc: Alexander Aring Cc: "Luis Claudio R.
Goncalves" Cc: Tomas Glozar Cc: John Kacur Cc: Clark Williams Cc: Linus Torvalds Cc: "Jonathan Corbet" Link: https://lore.kernel.org/20240823014019.545459018@goodmis.org Signed-off-by: Steven Rostedt (Google) --- Documentation/trace/ftrace.rst | 12 ++++++++++ kernel/trace/trace.c | 40 +++++++++++++++++++++++++++++----- kernel/trace/trace.h | 1 + 3 files changed, 48 insertions(+), 5 deletions(-) diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst index 5aba74872ba7..4073ca48af4a 100644 --- a/Documentation/trace/ftrace.rst +++ b/Documentation/trace/ftrace.rst @@ -1186,6 +1186,18 @@ Here are the available options: trace_printk Can disable trace_printk() from writing into the buffer. =20 + trace_printk_dest + Set to have trace_printk() and similar internal tracing functions + write into this instance. Note, only one trace instance can have + this set. By setting this flag, it clears the trace_printk_dest flag + of the instance that had it set previously. By default, the top + level trace has this set, and will get it set again if another + instance has it set then clears it. + + This flag cannot be cleared by the top level instance, as it is the + default instance. The only way the top level instance has this flag + cleared, is by it being set in another instance. 
+ annotate It is sometimes confusing when the CPU buffers are full and one CPU buffer had a lot of events recently, thus diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 35b37c9aa26c..658b40b483a3 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -482,7 +482,7 @@ EXPORT_SYMBOL_GPL(unregister_ftrace_export); TRACE_ITER_ANNOTATE | TRACE_ITER_CONTEXT_INFO | \ TRACE_ITER_RECORD_CMD | TRACE_ITER_OVERWRITE | \ TRACE_ITER_IRQ_INFO | TRACE_ITER_MARKERS | \ - TRACE_ITER_HASH_PTR) + TRACE_ITER_HASH_PTR | TRACE_ITER_TRACE_PRINTK) =20 /* trace_options that are only supported by global_trace */ #define TOP_LEVEL_TRACE_FLAGS (TRACE_ITER_PRINTK | \ @@ -490,7 +490,7 @@ EXPORT_SYMBOL_GPL(unregister_ftrace_export); =20 /* trace_flags that are default zero for instances */ #define ZEROED_TRACE_FLAGS \ - (TRACE_ITER_EVENT_FORK | TRACE_ITER_FUNC_FORK) + (TRACE_ITER_EVENT_FORK | TRACE_ITER_FUNC_FORK | TRACE_ITER_TRACE_PRINTK) =20 /* * The global_trace is the descriptor that holds the top-level tracing @@ -513,6 +513,16 @@ static __always_inline bool printk_binsafe(struct trac= e_array *tr) return !(tr->flags & TRACE_ARRAY_FL_BOOT); } =20 +static void update_printk_trace(struct trace_array *tr) +{ + if (printk_trace =3D=3D tr) + return; + + printk_trace->trace_flags &=3D ~TRACE_ITER_TRACE_PRINTK; + printk_trace =3D tr; + tr->trace_flags |=3D TRACE_ITER_TRACE_PRINTK; +} + void trace_set_ring_buffer_expanded(struct trace_array *tr) { if (!tr) @@ -5300,7 +5310,8 @@ int trace_keep_overwrite(struct tracer *tracer, u32 m= ask, int set) int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled) { if ((mask =3D=3D TRACE_ITER_RECORD_TGID) || - (mask =3D=3D TRACE_ITER_RECORD_CMD)) + (mask =3D=3D TRACE_ITER_RECORD_CMD) || + (mask =3D=3D TRACE_ITER_TRACE_PRINTK)) lockdep_assert_held(&event_mutex); =20 /* do nothing if flag is already set */ @@ -5312,6 +5323,25 @@ int set_tracer_flag(struct trace_array *tr, unsigned= int mask, int enabled) if 
(tr->current_trace->flag_changed(tr, mask, !!enabled)) return -EINVAL; =20 + if (mask =3D=3D TRACE_ITER_TRACE_PRINTK) { + if (enabled) { + update_printk_trace(tr); + } else { + /* + * The global_trace cannot clear this. + * Its flag only gets cleared if another instance sets it. + */ + if (printk_trace =3D=3D &global_trace) + return -EINVAL; + /* + * An instance must always have it set. + * By default, that's the global_trace instance. + */ + if (printk_trace =3D=3D tr) + update_printk_trace(&global_trace); + } + } + if (enabled) tr->trace_flags |=3D mask; else @@ -9687,7 +9717,7 @@ static int __remove_instance(struct trace_array *tr) } =20 if (printk_trace =3D=3D tr) - printk_trace =3D &global_trace; + update_printk_trace(&global_trace); =20 tracing_set_nop(tr); clear_ftrace_function_probes(tr); @@ -10578,7 +10608,7 @@ __init static void enable_instances(void) tracer_tracing_off(tr); =20 if (traceprintk) - printk_trace =3D tr; + update_printk_trace(tr); =20 /* * If start is set, then this is a mapped buffer, and diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 07b2d2af9b33..c866991b9c78 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1321,6 +1321,7 @@ extern int trace_get_user(struct trace_parser *parser= , const char __user *ubuf, C(IRQ_INFO, "irq-info"), \ C(MARKERS, "markers"), \ C(EVENT_FORK, "event-fork"), \ + C(TRACE_PRINTK, "trace_printk_dest"), \ C(PAUSE_ON_TRACE, "pause-on-trace"), \ C(HASH_PTR, "hash-ptr"), /* Print hashed pointer */ \ FUNCTION_FLAGS \ --=20 2.43.0 From nobody Fri Dec 19 12:06:25 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 07A6719E7E7 for ; Tue, 27 Aug 2024 09:27:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256;
d=subspace.kernel.org; s=arc-20240116; t=1724750825; cv=none; b=JseTgkEuyX1eFEbEPv3HLR/iF/AAZfBHREsONpurgiKqMT2X9koYBnUUiuuPnptiHaRYcnBww3DLMfhy6q4j06wNT0+SR4628HhIQ/5eoiwvI6zXvhx5ZMfxRqpI2ppJQ54jhKYBGLoyLWp3I4JerpDWATsBCGSu0mpNnt2Un1s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724750825; c=relaxed/simple; bh=HFeILM/8I55HanXA4IykNgai7hjm86Hy9lecNhsl3HI=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=E/QFOSWcHWoMyF00WceHlvsuS39P693zkuw5WYcMR386k7VLOicoqpRZXBcG7yhUDFcIoGcBpOU0GUCCNFeIPnEx4AU/48BQfHUdQQtTHaUlEDx1/uzpu4X4jjUVCOJ6mkLHJMtSddUoT5bgZtmqqMM0TkLMqn25WVMQqSQRHSE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id D1E99C8B7C9; Tue, 27 Aug 2024 09:27:04 +0000 (UTC) Received: from rostedt by gandalf with local (Exim 4.98) (envelope-from ) id 1sisUN-00000004SOF-1yGF; Tue, 27 Aug 2024 05:27:47 -0400 Message-ID: <20240827092747.323259636@goodmis.org> User-Agent: quilt/0.68 Date: Tue, 27 Aug 2024 05:27:24 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Vincent Donnefort , Joel Fernandes , Ingo Molnar , Peter Zijlstra , Thomas Gleixner , Vineeth Pillai , Beau Belgrave , Alexander Graf , Baoquan He , Borislav Petkov , "Paul E. McKenney" , David Howells , Mike Rapoport , Dave Hansen , Tony Luck , Guenter Roeck , Ross Zwisler , Kees Cook , Alexander Aring , "Luis Claudio R. 
Goncalves" , Tomas Glozar , John Kacur , Clark Williams , Linus Torvalds , "Jonathan Corbet" Subject: [for-next][PATCH 8/8] tracing/Documentation: Start a document on how to debug with tracing References: <20240827092716.515115830@goodmis.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt Add a new document Documentation/trace/debugging.rst that will hold various ways to debug tracing. This initial version mentions trace_printk and how to create persistent buffers that can last across bootups. Cc: Masami Hiramatsu Cc: Mark Rutland Cc: Mathieu Desnoyers Cc: Andrew Morton Cc: Vincent Donnefort Cc: Joel Fernandes Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Vineeth Pillai Cc: Beau Belgrave Cc: Alexander Graf Cc: Baoquan He Cc: Borislav Petkov Cc: "Paul E. McKenney" Cc: David Howells Cc: Mike Rapoport Cc: Dave Hansen Cc: Tony Luck Cc: Guenter Roeck Cc: Ross Zwisler Cc: Kees Cook Cc: Alexander Aring Cc: "Luis Claudio R. 
Goncalves" Cc: Tomas Glozar Cc: John Kacur Cc: Clark Williams Cc: Linus Torvalds Cc: "Jonathan Corbet" Link: https://lore.kernel.org/20240823014019.702433486@goodmis.org Signed-off-by: Steven Rostedt (Google) --- .../admin-guide/kernel-parameters.txt | 2 + Documentation/trace/debugging.rst | 159 ++++++++++++++++++ 2 files changed, 161 insertions(+) create mode 100644 Documentation/trace/debugging.rst diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentatio= n/admin-guide/kernel-parameters.txt index 9e507e6cb4c8..9bb50dc78338 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -6785,6 +6785,8 @@ =20 reserve_mem=3D12M:4096:trace trace_instance=3Dboot_map^traceoff^tracep= rintk@trace,sched,irq =20 + See also Documentation/trace/debugging.rst + =20 trace_options=3D[option-list] [FTRACE] Enable or disable tracer options at boot. diff --git a/Documentation/trace/debugging.rst b/Documentation/trace/debugg= ing.rst new file mode 100644 index 000000000000..54fb16239d70 --- /dev/null +++ b/Documentation/trace/debugging.rst @@ -0,0 +1,159 @@ +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D +Using the tracer for debugging +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D + +Copyright 2024 Google LLC. + +:Author: Steven Rostedt +:License: The GNU Free Documentation License, Version 1.2 + (dual licensed under the GPL v2) + +- Written for: 6.12 + +Introduction +------------ +The tracing infrastructure can be very useful for debugging the Linux +kernel. This document is a place to add various methods of using the tracer +for debugging. 
+ +First, make sure that the tracefs file system is mounted:: + + $ sudo mount -t tracefs tracefs /sys/kernel/tracing + + +Using trace_printk() +-------------------- + +trace_printk() is a very lightweight utility that can be used in any conte= xt +inside the kernel, with the exception of "noinstr" sections. It can be used +in normal, softirq, interrupt and even NMI context. The trace data is +written to the tracing ring buffer in a lockless way. To make it even +lighter weight, when possible, it will only record the pointer to the form= at +string, and save the raw arguments into the buffer. The format and the +arguments will be post processed when the ring buffer is read. This way the +trace_printk() format conversions are not done during the hot path, where +the trace is being recorded. + +trace_printk() is meant only for debugging, and should never be added into +a subsystem of the kernel. If you need debugging traces, add trace events +instead. If a trace_printk() is found in the kernel, the following will +appear in the dmesg:: + + ********************************************************** + ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE ** + ** ** + ** trace_printk() being used. Allocating extra memory. ** + ** ** + ** This means that this is a DEBUG kernel and it is ** + ** unsafe for production use. ** + ** ** + ** If you see this message and you are not debugging ** + ** the kernel, report this immediately to your vendor! ** + ** ** + ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE ** + ********************************************************** + +Debugging kernel crashes +------------------------ +There are various methods of acquiring the state of the system when a kernel +crash occurs. This could be from the oops message in printk, or one could +use kexec/kdump. But these just show what happened at the time of the cras= h. +It can be very useful to know what happened up to the point of the cras= h.
+The tracing ring buffer, by default, is a circular buffer that will +overwrite older events with newer ones. When a crash happens, the content = of +the ring buffer will be all the events that led up to the crash. + +There are several kernel command line parameters that can be used to help = in +this. The first is "ftrace_dump_on_oops". This will dump the tracing ring +buffer to the console when an oops occurs. This can be useful if the console +is being logged somewhere. If a serial console is used, it may be prudent = to +make sure the ring buffer is relatively small, otherwise the dumping of the +ring buffer may take several minutes to hours to finish. Here's an example +of the kernel command line:: + + ftrace_dump_on_oops trace_buf_size=3D50K + +Note, the tracing buffer is made up of per CPU buffers where each of these +buffers is broken up into sub-buffers that are by default PAGE_SIZE. The +trace_buf_size option above sets each of the per CPU buffers to 50K, +so, on a machine with 8 CPUs, that's actually 400K total. + +Persistent buffers across boots +------------------------------- +If the system memory allows it, the tracing ring buffer can be placed at +a specific location in memory. If the location is the same across boots and +the memory is not modified, the tracing buffer can be retrieved from the +following boot. There are two ways to reserve memory for the use of the ring +buffer. + +The more reliable way (on x86) is to reserve memory with the "memmap" kern= el +command line option and then use that memory for the trace_instance. This +requires a bit of knowledge of the physical memory layout of the system. T= he +advantage of this method is that the memory for the ring buffer will +always be the same:: + + memmap=3D12M$0x284500000 trace_instance=3Dboot_map@0x284500000:12M + +The memmap above reserves 12 megabytes of memory at the physical memory +location 0x284500000.
Then the trace_instance option will create a trace +instance "boot_map" at that same location with the same amount of memory +reserved. As the ring buffer is broken up into per CPU buffers, the 12 +megabytes will be broken up evenly between those CPUs. If you have 8 CPUs, +each per CPU ring buffer will be 1.5 megabytes in size. Note, that also +includes meta data, so the amount of memory actually used by the ring buff= er +will be slightly smaller. + +Another more generic but less robust way to allocate a ring buffer mapping +at boot is with the "reserve_mem" option:: + + reserve_mem=3D12M:4096:trace trace_instance=3Dboot_map@trace + +The reserve_mem option above will find 12 megabytes that are available at +boot up, and align it by 4096 bytes. It will label this memory as "trace", +which can be used by later command line options. + +The trace_instance option creates a "boot_map" instance and will use the +memory reserved by reserve_mem that was labeled as "trace". This method is +more generic but may not be as reliable. Due to KASLR, the memory reserved +by reserve_mem may not be located at the same location. If this happens, +then the ring buffer will not be from the previous boot and will be reset. + +Sometimes, using a larger alignment can keep KASLR from moving thin= gs +around in such a way that the location of the reserve_mem region changes. By +using a larger alignment, you may find that the buffer is placed more +consistently:: + + reserve_mem=3D12M:0x2000000:trace trace_instance=3Dboot_map@trace + +On boot up, the memory reserved for the ring buffer is validated. It will = go +through a series of tests to make sure that the ring buffer contains valid +data. If the data is valid, it will then set it up to be available to read from the +instance. If it fails any of the tests, it will clear the entire ring buff= er +and initialize it as new.
+ +The layout of this mapped memory may not be consistent from kernel to +kernel, so only the same kernel is guaranteed to work if the mapping is +preserved. Switching to a different kernel version may find a different +layout and mark the buffer as invalid. + +Using trace_printk() in the boot instance +----------------------------------------- +By default, the content of trace_printk() goes into the top level tracing +instance. But this instance is never preserved across boots. To have the +trace_printk() content, and some other internal tracing go to the preserved +buffer (like dump stacks), either set the instance to be the trace_printk() +destination from the kernel command line, or set it after boot up via the +trace_printk_dest option. + +After boot up:: + + echo 1 > /sys/kernel/tracing/instances/boot_map/options/trace_printk_dest + +From the kernel command line:: + + reserve_mem=3D12M:4096:trace trace_instance=3Dboot_map^traceprintk^trace= off@trace + +If setting it from the kernel command line, it is recommended to also +disable tracing with the "traceoff" flag, and enable tracing after boot up. +Otherwise the trace from the most recent boot will be mixed with the trace +from the previous boot, and may make it confusing to read. --=20 2.43.0