From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Boot parameters prefixed with "sysctl." are processed separately
during the final stage of system initialization via kernel_init()->
do_sysctl_args(). Since mem_profiling support should be parsed
in early boot stage, it is unsuitable for centralized handling
in do_sysctl_args().
Also, when CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled,
the sysctl.vm.mem_profiling entry is not writable and will cause
a warning. To prevent duplicate processing of sysctl.vm.mem_profiling,
rename the boot parameter to "mem_profiling".
Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
---
Documentation/mm/allocation-profiling.rst | 2 +-
lib/alloc_tag.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/mm/allocation-profiling.rst b/Documentation/mm/allocation-profiling.rst
index 316311240e6a..fe341d6da7b9 100644
--- a/Documentation/mm/allocation-profiling.rst
+++ b/Documentation/mm/allocation-profiling.rst
@@ -18,7 +18,7 @@ kconfig options:
missing annotation
Boot parameter:
- sysctl.vm.mem_profiling={0|1|never}[,compressed]
+ mem_profiling={0|1|never}[,compressed]
When set to "never", memory allocation profiling overhead is minimized and it
cannot be enabled at runtime (sysctl becomes read-only).
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 846a5b5b44a4..81b248196629 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -747,7 +747,7 @@ static int __init setup_early_mem_profiling(char *str)
return 0;
}
-early_param("sysctl.vm.mem_profiling", setup_early_mem_profiling);
+early_param("mem_profiling", setup_early_mem_profiling);
static __init bool need_page_alloc_tagging(void)
{
--
2.25.1
On Fri, Jan 09, 2026 at 06:24:19AM +0000, ranxiaokai627@163.com wrote:
> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>
> Boot parameters prefixed with "sysctl." are processed separately
> during the final stage of system initialization via kernel_init()->
> do_sysctl_args(). Since mem_profiling support should be parsed
> in early boot stage, it is unsuitable for centralized handling
> in do_sysctl_args().
> Also, when CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled,
> the sysctl.vm.mem_profiling entry is not writable and will cause
> a warning. To prevent duplicate processing of sysctl.vm.mem_profiling,
> rename the boot parameter to "mem_profiling".
>
> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>

How was this observed/detected?

My reading of early_param() would seem to indicate that
setup_early_mem_profiling() is getting called at the appropriate time -
and then additionally a second time by do_sysctl_args(), which then
becomes a noop.

So the only bug would seem to be that the sysctl is not writeable in
debug mode? There's an easier fix for that one...
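For readers following along, the double handling described above can be modelled in a few lines of user-space C. This is only a sketch of the reading given in this message, not kernel code: early_pass(), late_pass() and boot_with() are hypothetical stand-ins for the early_param() handler and for do_sysctl_args(), and the counters exist purely for illustration.

```c
#include <string.h>

/* Hypothetical stand-in for the state set by setup_early_mem_profiling(). */
static int profiling_enabled;
static int early_calls;   /* how many times the early handler ran */
static int sysctl_writes; /* how many times the late sysctl path ran */

/* Early pass: models an early_param() handler keyed on the full name. */
static void early_pass(const char *param, const char *val)
{
	if (strcmp(param, "sysctl.vm.mem_profiling") == 0) {
		profiling_enabled = (val[0] == '1');
		early_calls++;
	}
}

/* Late pass: models do_sysctl_args(), which picks up any "sysctl." parameter. */
static void late_pass(const char *param, const char *val)
{
	(void)val; /* the value would be written to the sysctl file here */
	if (strncmp(param, "sysctl.", 7) == 0)
		sysctl_writes++;
}

/* Feed one command-line entry through both passes, in boot order. */
static void boot_with(const char *param, const char *val)
{
	early_pass(param, val);
	late_pass(param, val);
}
```

Calling boot_with("sysctl.vm.mem_profiling", "1") in this model bumps both counters once, which is the "called at the appropriate time, and then a second time" behaviour under discussion.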
>On Fri, Jan 09, 2026 at 06:24:19AM +0000, ranxiaokai627@163.com wrote:
>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>>
>> Boot parameters prefixed with "sysctl." are processed separately
>> during the final stage of system initialization via kernel_init()->
>> do_sysctl_args(). Since mem_profiling support should be parsed
>> in early boot stage, it is unsuitable for centralized handling
>> in do_sysctl_args().
>> Also, when CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled,
>> the sysctl.vm.mem_profiling entry is not writable and will cause
>> a warning. To prevent duplicate processing of sysctl.vm.mem_profiling,
>> rename the boot parameter to "mem_profiling".
>>
>> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>
>How was this observed/detected?
Actually, no kernel bug or functional defect was observed through testing.
Via code reading, I found that after commit [1],
boot parameters prefixed with sysctl are processed redundantly.
[1] https://lore.kernel.org/all/20200427180433.7029-3-vbabka@suse.cz/T/#u
>My reading of early_param() would seem to indicate that
>setup_early_mem_profiling() is getting called at the appropriate time -
>and then additionally a second time by do_sysctl_args(), which then
>becomes a noop.
When handling the parameter, process_sysctl_arg() at least needs to call
kern_mount("proc"), file_open_root_mnt("/proc/sys/vm/xxx"), kernel_write(),
and filp_close().
I don't quite understand why it would be optimized into a noop.
>So the only bug would seem to be that the sysctl is not writeable in
>debug mode? There's an easier fix for that one...
- When debug mode is enabled, a warning is triggered because the file is not writable.
- When debug mode is disabled, do_sysctl_args() cannot handle boot parameters
like "1,compressed". It only accepts writes of 0 or 1.
Because mem_profiling should be parsed in early boot, it is unsuitable
for processing in do_sysctl_args(), which is why I renamed the parameter.
But as Andrew mentioned, I did not consider the backward compatibility issues.
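To make the second point concrete, here is a minimal user-space sketch of the mismatch between the two value syntaxes. Both parsers are hypothetical and written only for illustration: parse_sysctl_value() models a sysctl that accepts just 0 or 1, and parse_early_value() models the documented boot syntax {0|1|never}[,compressed]. Neither is the kernel's actual parsing code.

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative result of parsing the boot-time value. */
struct early_cfg {
	bool enable;     /* value was "1" */
	bool never;      /* value was "never" */
	bool compressed; /* optional ",compressed" suffix was present */
};

/* Strict parser modelling a sysctl that only accepts "0" or "1". */
static int parse_sysctl_value(const char *s, bool *out)
{
	if (strcmp(s, "0") == 0) {
		*out = false;
		return 0;
	}
	if (strcmp(s, "1") == 0) {
		*out = true;
		return 0;
	}
	return -1;
}

/* Parser modelling the documented boot syntax {0|1|never}[,compressed]. */
static int parse_early_value(const char *s, struct early_cfg *cfg)
{
	const char *comma = strchr(s, ',');
	size_t len = comma ? (size_t)(comma - s) : strlen(s);

	memset(cfg, 0, sizeof(*cfg));
	if (len == 1 && s[0] == '1')
		cfg->enable = true;
	else if (len == 5 && strncmp(s, "never", 5) == 0)
		cfg->never = true;
	else if (!(len == 1 && s[0] == '0'))
		return -1;

	if (comma) {
		if (strcmp(comma + 1, "compressed") != 0)
			return -1;
		cfg->compressed = true;
	}
	return 0;
}
```

In this model "1,compressed" is accepted by parse_early_value() and rejected by parse_sysctl_value(), which is the case the late sysctl path cannot handle.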
On Tue, Jan 13, 2026 at 03:27:35AM +0000, ranxiaokai627@163.com wrote:
> >On Fri, Jan 09, 2026 at 06:24:19AM +0000, ranxiaokai627@163.com wrote:
> >> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> >>
> >> Boot parameters prefixed with "sysctl." are processed separately
> >> during the final stage of system initialization via kernel_init()->
> >> do_sysctl_args(). Since mem_profiling support should be parsed
> >> in early boot stage, it is unsuitable for centralized handling
> >> in do_sysctl_args().
> >> Also, when CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled,
> >> the sysctl.vm.mem_profiling entry is not writable and will cause
> >> a warning. To prevent duplicate processing of sysctl.vm.mem_profiling,
> >> rename the boot parameter to "mem_profiling".
> >>
> >> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> >
> >How was this observed/detected?
>
> Actually, no kernel bug or functional defect was observed through testing.
> Via code reading, I found that after commit [1],
> boot parameters prefixed with sysctl are processed redundantly.

When bcachefs was in the kernel, I spent an inordinate amount of time in
code reviews trying to convince people that yes, they really do need to
be testing their code.

Strangely enough, I have never had this issue with project contributors
who did not come to the project by way of the kernel community... :)
On Mon, Jan 12, 2026 at 7:50 PM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Tue, Jan 13, 2026 at 03:27:35AM +0000, ranxiaokai627@163.com wrote:
> > >On Fri, Jan 09, 2026 at 06:24:19AM +0000, ranxiaokai627@163.com wrote:
> > >> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> > >>
> > >> Boot parameters prefixed with "sysctl." are processed separately
> > >> during the final stage of system initialization via kernel_init()->
> > >> do_sysctl_args(). Since mem_profiling support should be parsed
> > >> in early boot stage, it is unsuitable for centralized handling
> > >> in do_sysctl_args().
> > >> Also, when CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled,
> > >> the sysctl.vm.mem_profiling entry is not writable and will cause
> > >> a warning. To prevent duplicate processing of sysctl.vm.mem_profiling,
> > >> rename the boot parameter to "mem_profiling".
> > >>
> > >> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> > >
> > >How was this observed/detected?
> >
> > Actually, no kernel bug or functional defect was observed through testing.
> > Via code reading, I found that after commit [1],
> > boot parameters prefixed with sysctl are processed redundantly.

I was able to reproduce the warning by enabling
CONFIG_MEM_ALLOC_PROFILING,
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT,
CONFIG_MEM_ALLOC_PROFILING_DEBUG, CONFIG_SYSCTL and setting
CONFIG_CMDLINE="1".
The fix I posted eliminates that warning. Ran, you can post my
suggestion yourself with me as Suggested-by or I can post it with you
as Reported-by. Let me know your preference.

>
> When bcachefs was in the kernel, I spent an inordinate amount of time in
> code reviews trying to convince people that yes, they really do need to
> be testing their code.
>
> Strangely enough, I have never had this issue with project contributors
> who did not come to the project by way of the kernel community... :)
>On Mon, Jan 12, 2026 at 7:50 PM Kent Overstreet
><kent.overstreet@linux.dev> wrote:
>>
>> On Tue, Jan 13, 2026 at 03:27:35AM +0000, ranxiaokai627@163.com wrote:
>> > >On Fri, Jan 09, 2026 at 06:24:19AM +0000, ranxiaokai627@163.com wrote:
>> > >> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>> > >>
>> > >> Boot parameters prefixed with "sysctl." are processed separately
>> > >> during the final stage of system initialization via kernel_init()->
>> > >> do_sysctl_args(). Since mem_profiling support should be parsed
>> > >> in early boot stage, it is unsuitable for centralized handling
>> > >> in do_sysctl_args().
>> > >> Also, when CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled,
>> > >> the sysctl.vm.mem_profiling entry is not writable and will cause
>> > >> a warning. To prevent duplicate processing of sysctl.vm.mem_profiling,
>> > >> rename the boot parameter to "mem_profiling".
>> > >>
>> > >> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>> > >
>> > >How was this observed/detected?
>> >
>> > Actually, no kernel bug or functional defect was observed through testing.
>> > Via code reading, I found that after commit [1],
>> > boot parameters prefixed with sysctl are processed redundantly.
>
>I was able to reproduce the warning by enabling
>CONFIG_MEM_ALLOC_PROFILING,
>CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT,
>CONFIG_MEM_ALLOC_PROFILING_DEBUG, CONFIG_SYSCTL and setting
>CONFIG_CMDLINE="1".
>The fix I posted eliminates that warning. Ran, you can post my
>suggestion yourself with me as Suggested-by or I can post it with you
>as Reported-by. Let me know your preference.
I think this version is better.
[PATCH] alloc_tag: fix rw permission issue when handling boot parameter
Boot parameters prefixed with "sysctl." are processed
during the final stage of system initialization via kernel_init()->
do_sysctl_args(). When CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled,
the sysctl.vm.mem_profiling entry is not writable and will cause
a warning.
Before run_init_process(), system initialization executes in kernel
thread context. Use current->mm to distinguish sysctl writes during
do_sysctl_args() from user-space triggered ones.
And when the proc_handler is from do_sysctl_args(), always return success
because the same value was already set by setup_early_mem_profiling()
and this eliminates a permission denied warning.
Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Suggested-by: Suren Baghdasaryan <surenb@google.
---
lib/alloc_tag.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 846a5b5b44a4..00ae4673a271 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -776,8 +776,22 @@ EXPORT_SYMBOL(page_alloc_tagging_ops);
static int proc_mem_profiling_handler(const struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
- if (!mem_profiling_support && write)
- return -EINVAL;
+ if (write) {
+ /*
+ * Call from do_sysctl_args() which is a no-op since the same
+ * value was already set by setup_early_mem_profiling.
+ * Return success to avoid warnings from do_sysctl_args().
+ */
+ if (!current->mm)
+ return 0;
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+ /* User can't toggle profiling while debugging */
+ return -EACCES;
+#endif
+ if (!mem_profiling_support)
+ return -EINVAL;
+ }
return proc_do_static_key(table, write, buffer, lenp, ppos);
}
@@ -787,11 +801,7 @@ static const struct ctl_table memory_allocation_profiling_sysctls[] = {
{
.procname = "mem_profiling",
.data = &mem_alloc_profiling_key,
-#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
- .mode = 0444,
-#else
.mode = 0644,
-#endif
.proc_handler = proc_mem_profiling_handler,
},
};
--
2.25.1
>> When bcachefs was in the kernel, I spent an inordinate amount of time in
>> code reviews trying to convince people that yes, they really do need to
>> be testing their code.
>>
>> Strangely enough, I have never had this issue with project contributors
>> who did not come to the project by way of the kernel community... :)
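The control flow of the write path in the patch above can be checked in isolation with a small user-space model. This is a sketch under stated assumptions, not the kernel handler: struct ctx, its fields, handler_v2() and the PROCEED value are hypothetical, with has_mm standing in for current->mm being non-NULL, debug for CONFIG_MEM_ALLOC_PROFILING_DEBUG, and PROCEED for falling through to proc_do_static_key().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical inputs that decide the outcome of a sysctl access. */
struct ctx {
	bool has_mm;            /* models current->mm != NULL (user-space caller) */
	bool debug;             /* models CONFIG_MEM_ALLOC_PROFILING_DEBUG */
	bool profiling_support; /* models mem_profiling_support */
};

/* Sentinel meaning "fall through to proc_do_static_key()" in this model. */
#define PROCEED 1

/* Mirrors the order of checks in the patched proc_mem_profiling_handler(). */
static int handler_v2(const struct ctx *c, bool write)
{
	if (write) {
		/* Kernel-thread context: treated as the do_sysctl_args() call. */
		if (!c->has_mm)
			return 0;
		/* With the debug option, user-space writes are refused. */
		if (c->debug)
			return -EACCES;
		if (!c->profiling_support)
			return -EINVAL;
	}
	return PROCEED;
}
```

One property this model makes visible: a write from kernel-thread context returns success whether or not the debug option is set, because the current->mm check sits outside the #ifdef in this version.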
On Tue, Jan 13, 2026 at 11:25 PM <ranxiaokai627@163.com> wrote:
>
> >On Mon, Jan 12, 2026 at 7:50 PM Kent Overstreet
> ><kent.overstreet@linux.dev> wrote:
> >>
> >> On Tue, Jan 13, 2026 at 03:27:35AM +0000, ranxiaokai627@163.com wrote:
> >> > >On Fri, Jan 09, 2026 at 06:24:19AM +0000, ranxiaokai627@163.com wrote:
> >> > >> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> >> > >>
> >> > >> Boot parameters prefixed with "sysctl." are processed separately
> >> > >> during the final stage of system initialization via kernel_init()->
> >> > >> do_sysctl_args(). Since mem_profiling support should be parsed
> >> > >> in early boot stage, it is unsuitable for centralized handling
> >> > >> in do_sysctl_args().
> >> > >> Also, when CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled,
> >> > >> the sysctl.vm.mem_profiling entry is not writable and will cause
> >> > >> a warning. To prevent duplicate processing of sysctl.vm.mem_profiling,
> >> > >> rename the boot parameter to "mem_profiling".
> >> > >>
> >> > >> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> >> > >
> >> > >How was this observed/detected?
> >> >
> >> > Actually, no kernel bug or functional defect was observed through testing.
> >> > Via code reading, I found that after commit [1],
> >> > boot parameters prefixed with sysctl are processed redundantly.
> >
> >I was able to reproduce the warning by enabling
> >CONFIG_MEM_ALLOC_PROFILING,
> >CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT,
> >CONFIG_MEM_ALLOC_PROFILING_DEBUG, CONFIG_SYSCTL and setting
> >CONFIG_CMDLINE="1".
> >The fix I posted eliminates that warning. Ran, you can post my
> >suggestion yourself with me as Suggested-by or I can post it with you
> >as Reported-by. Let me know your preference.
>
> I think this version is better.
>
> [PATCH] alloc_tag: fix rw permission issue when handling boot parameter
>
> Boot parameters prefixed with "sysctl." are processed
> during the final stage of system initialization via kernel_init()->
> do_sysctl_args(). When CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled,
> the sysctl.vm.mem_profiling entry is not writable and will cause
> a warning.
>
> Before run_init_process(), system initialization executes in kernel
> thread context. Use current->mm to distinguish sysctl writes during
> do_sysctl_args() from user-space triggered ones.
>
> And when the proc_handler is from do_sysctl_args(), always return success
> because the same value was already set by setup_early_mem_profiling()
> and this eliminates a permission denied warning.
>
> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> Suggested-by: Suren Baghdasaryan <surenb@google.
Please fix the above tag, it should end with ">" instead of "." and
send a v2 as a separate email, not a reply to the original thread.
Otherwise LGTM. Feel free to add:
Acked-by: Suren Baghdasaryan <surenb@google.com>
> ---
> lib/alloc_tag.c | 22 ++++++++++++++++------
> 1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index 846a5b5b44a4..00ae4673a271 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -776,8 +776,22 @@ EXPORT_SYMBOL(page_alloc_tagging_ops);
> static int proc_mem_profiling_handler(const struct ctl_table *table, int write,
> void *buffer, size_t *lenp, loff_t *ppos)
> {
> - if (!mem_profiling_support && write)
> - return -EINVAL;
> + if (write) {
> + /*
> + * Call from do_sysctl_args() which is a no-op since the same
> + * value was already set by setup_early_mem_profiling.
> + * Return success to avoid warnings from do_sysctl_args().
> + */
> + if (!current->mm)
> + return 0;
> +
> +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
> + /* User can't toggle profiling while debugging */
> + return -EACCES;
> +#endif
> + if (!mem_profiling_support)
> + return -EINVAL;
> + }
>
> return proc_do_static_key(table, write, buffer, lenp, ppos);
> }
> @@ -787,11 +801,7 @@ static const struct ctl_table memory_allocation_profiling_sysctls[] = {
> {
> .procname = "mem_profiling",
> .data = &mem_alloc_profiling_key,
> -#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
> - .mode = 0444,
> -#else
> .mode = 0644,
> -#endif
> .proc_handler = proc_mem_profiling_handler,
> },
> };
> --
> 2.25.1
>
>
> >> When bcachefs was in the kernel, I spent an inordinate amount of time in
> >> code reviews trying to convince people that yes, they really do need to
> >> be testing their code.
> >>
> >> Strangely enough, I have never had this issue with project contributors
> >> who did not come to the project by way of the kernel community... :)
>
On Sat, Jan 10, 2026 at 6:34 PM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Fri, Jan 09, 2026 at 06:24:19AM +0000, ranxiaokai627@163.com wrote:
> > From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> >
> > Boot parameters prefixed with "sysctl." are processed separately
> > during the final stage of system initialization via kernel_init()->
> > do_sysctl_args(). Since mem_profiling support should be parsed
> > in early boot stage, it is unsuitable for centralized handling
> > in do_sysctl_args().
> > Also, when CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled,
> > the sysctl.vm.mem_profiling entry is not writable and will cause
> > a warning. To prevent duplicate processing of sysctl.vm.mem_profiling,
> > rename the boot parameter to "mem_profiling".
> >
> > Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>
> How was this observed/detected?
>
> My reading of early_param() would seem to indicate that
> setup_early_mem_profiling() is getting called at the appropriate time -
> and then additionally a second time by do_sysctl_args(), which then
> becomes a noop.
>
> So the only bug would seem to be that the sysctl is not writeable in
> debug mode? There's an easier fix for that one...

Sorry for the delay.
That's not a bug. We want this sysctl to be read-only when the debug
option is enabled. Otherwise, if the user toggles the mem_profiling sysctl
off and then on again, all allocations that were made between these events
will be missing their tags and our debug mechanism will generate
warnings for each such occurrence when freeing these allocations.
I'll look closer into this warning. Maybe we can suppress it when the
read-only sysctl is already set to the value being assigned to it?
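The off-then-on scenario described above can be illustrated with a toy user-space model. Everything here is hypothetical and heavily simplified: struct alloc, model_alloc(), model_free() and the warning counter only mimic the idea that an allocation made while profiling is off carries no tag, and that a debug check at free time complains about it once profiling is back on.

```c
#include <stdbool.h>

/* Toy allocation record: remembers whether a tag was set when it was made. */
struct alloc {
	bool tagged;
};

static bool profiling_on = true; /* models the state of the sysctl */
static int missing_tag_warnings; /* counts debug warnings raised at free time */

/* An allocation only gets a tag while profiling is enabled. */
static void model_alloc(struct alloc *a)
{
	a->tagged = profiling_on;
}

/* Debug check: with profiling on, every freed object is expected to be tagged. */
static void model_free(const struct alloc *a)
{
	if (profiling_on && !a->tagged)
		missing_tag_warnings++;
}
```

Allocating once with profiling on, once after switching it off, then switching it back on and freeing both objects yields exactly one warning in this model, for the object allocated during the off window.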
On Sun, Jan 11, 2026 at 11:50 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Sat, Jan 10, 2026 at 6:34 PM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> >
> > On Fri, Jan 09, 2026 at 06:24:19AM +0000, ranxiaokai627@163.com wrote:
> > > From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> > >
> > > Boot parameters prefixed with "sysctl." are processed separately
> > > during the final stage of system initialization via kernel_init()->
> > > do_sysctl_args(). Since mem_profiling support should be parsed
> > > in early boot stage, it is unsuitable for centralized handling
> > > in do_sysctl_args().
> > > Also, when CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled,
> > > the sysctl.vm.mem_profiling entry is not writable and will cause
> > > a warning. To prevent duplicate processing of sysctl.vm.mem_profiling,
> > > rename the boot parameter to "mem_profiling".
> > >
> > > Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> >
> > How was this observed/detected?
> >
> > My reading of early_param() would seem to indicate that
> > setup_early_mem_profiling() is getting called at the appropriate time -
> > and then additionally a second time by do_sysctl_args(), which then
> > becomes a noop.
> >
> > So the only bug would seem to be that the sysctl is not writeable in
> > debug mode? There's an easier fix for that one...
>
> Sorry for the delay.
> That's not a bug. We want this sysctl to be read-only when the debug
> option is enabled. Otherwise, if the user toggles the mem_profiling sysctl
> off and then on again, all allocations that were made between these events
> will be missing their tags and our debug mechanism will generate
> warnings for each such occurrence when freeing these allocations.
> I'll look closer into this warning. Maybe we can suppress it when the
> read-only sysctl is already set to the value being assigned to it?
I think the easiest way to fix this warning is to detect when the
modification is being done by do_sysctl_args() and return success, as
it's a no-op anyway (the same value was already assigned via
early_param). Something like this:
static int proc_mem_profiling_handler(const struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
- if (!mem_profiling_support && write)
- return -EINVAL;
+ if (write) {
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+ /* User can't toggle profiling while debugging */
+ if (current->mm)
+ return -EACCES;
+
+ /*
+ * Call from do_sysctl_args() which is a no-op since the same
+ * value was already set by setup_early_mem_profiling.
+ * Return success to avoid warnings from do_sysctl_args().
+ */
+ return 0;
+#endif
+ if (!mem_profiling_support)
+ return -EINVAL;
+ }
return proc_do_static_key(table, write, buffer, lenp, ppos);
}
@@ -787,11 +801,7 @@ static const struct ctl_table memory_allocation_profiling_sysctls[] = {
{
.procname = "mem_profiling",
.data = &mem_alloc_profiling_key,
-#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
- .mode = 0444,
-#else
.mode = 0644,
-#endif
.proc_handler = proc_mem_profiling_handler,
},
};
WDYT?
On Fri, 9 Jan 2026 06:24:19 +0000 ranxiaokai627@163.com wrote:
> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>
> Boot parameters prefixed with "sysctl." are processed separately
> during the final stage of system initialization via kernel_init()->
> do_sysctl_args(). Since mem_profiling support should be parsed
> in early boot stage, it is unsuitable for centralized handling
> in do_sysctl_args().
> Also, when CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled,
> the sysctl.vm.mem_profiling entry is not writable and will cause
> a warning. To prevent duplicate processing of sysctl.vm.mem_profiling,
> rename the boot parameter to "mem_profiling".

Isn't this a backwardly-incompatible change? What happens to existing
setups which are using sysctl.vm.mem_profiling=?