kernel/trace/ftrace.c | 25 +++++++++++++++++-------- 1 file changed, 17 insertions(+), 8 deletions(-)
The pg_remaining calculation in ftrace_process_locs() assumes that
ENTRIES_PER_PAGE multiplied by 2^order equals the actual capacity of the
allocated page group. However, ENTRIES_PER_PAGE is PAGE_SIZE / ENTRY_SIZE
(integer division). When PAGE_SIZE is not a multiple of ENTRY_SIZE (e.g.
4096 / 24 = 170 with remainder 16), high-order allocations (like 256 pages)
have significantly more capacity than 256 * 170. This leads to pg_remaining
being underestimated, which in turn makes skip (derived from skipped -
pg_remaining) larger than expected, causing the WARN(skip != remaining)
to trigger.
Extra allocated pages for ftrace: 2 with 654 skipped
WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:7295 ftrace_process_locs+0x5bf/0x5e0
A similar problem in ftrace_allocate_records() can result in allocating
too many pages. This can trigger the second warning in
ftrace_process_locs().
Extra allocated pages for ftrace
WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:7276 ftrace_process_locs+0x548/0x580
Use the actual capacity of a page group to determine if too many pages
have been allocated to solve the problem. Also use the actual capacity
of a page group to determine the number of pages needed to avoid over-
allocations in ftrace_allocate_records().
Fixes: 4a3efc6baff93 ("ftrace: Update the mcount_loc check of skipped entries")
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
kernel/trace/ftrace.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index ef2d5dca6f70..211ec7a04f7e 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3844,7 +3844,7 @@ static int ftrace_allocate_records(struct ftrace_page *pg, int count)
return -EINVAL;
/* We want to fill as much as possible, with no empty pages */
- pages = DIV_ROUND_UP(count, ENTRIES_PER_PAGE);
+ pages = DIV_ROUND_UP(count * ENTRY_SIZE, PAGE_SIZE);
order = fls(pages) - 1;
again:
@@ -7308,24 +7308,33 @@ static int ftrace_process_locs(struct module *mod,
unsigned long skip;
/* Count the number of entries unused and compare it to skipped. */
- pg_remaining = (ENTRIES_PER_PAGE << pg->order) - pg->index;
+ pg_remaining = (PAGE_SIZE << pg->order) / ENTRY_SIZE - pg->index;
if (!WARN(skipped < pg_remaining, "Extra allocated pages for ftrace")) {
+ unsigned long space = 0;
skip = skipped - pg_remaining;
- for (pg = pg_unuse; pg; pg = pg->next)
+ for (pg = pg_unuse; pg; pg = pg->next) {
remaining += 1 << pg->order;
+ /*
+ * The capacity of a page group is
+ * (PAGE_SIZE << order) / ENTRY_SIZE
+ * Accumulate the total capacity of unused pages.
+ */
+ space += (PAGE_SIZE << pg->order) / ENTRY_SIZE;
+ }
pages -= remaining;
- skip = DIV_ROUND_UP(skip, ENTRIES_PER_PAGE);
-
/*
- * Check to see if the number of pages remaining would
- * just fit the number of entries skipped.
+ * Check to see if extra pages have been allocated.
+ * Only warn if the number of unused entries is larger
+ * than the number of entries per page to avoid false
+ * positives due to rounding.
*/
- WARN(skip != remaining, "Extra allocated pages for ftrace: %lu with %lu skipped",
+ WARN(space - skip > ENTRIES_PER_PAGE,
+ "Extra allocated pages for ftrace: %lu with %lu skipped",
remaining, skipped);
}
/* Need to synchronize with ftrace_location_range() */
--
2.45.2
On Tue, 6 Jan 2026 12:35:33 -0800
Guenter Roeck <linux@roeck-us.net> wrote:
> The pg_remaining calculation in ftrace_process_locs() assumes that
> ENTRIES_PER_PAGE multiplied by 2^order equals the actual capacity of the
> allocated page group. However, ENTRIES_PER_PAGE is PAGE_SIZE / ENTRY_SIZE
> (integer division). When PAGE_SIZE is not a multiple of ENTRY_SIZE (e.g.
> 4096 / 24 = 170 with remainder 16), high-order allocations (like 256 pages)
> have significantly more capacity than 256 * 170. This leads to pg_remaining
> being underestimated, which in turn makes skip (derived from skipped -
> pg_remaining) larger than expected, causing the WARN(skip != remaining)
> to trigger.
Nice catch! I guess you have a machine that allows much higher order
allocations than I do ;-)
>
> Extra allocated pages for ftrace: 2 with 654 skipped
> WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:7295 ftrace_process_locs+0x5bf/0x5e0
>
> A similar problem in ftrace_allocate_records() can result in allocating
> too many pages. This can trigger the second warning in
> ftrace_process_locs().
>
> Extra allocated pages for ftrace
> WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:7276 ftrace_process_locs+0x548/0x580
>
> Use the actual capacity of a page group to determine if too many pages
> have been allocated to solve the problem. Also use the actual capacity
> of a page group to determine the number of pages needed to avoid over-
> allocations in ftrace_allocate_records().
>
> Fixes: 4a3efc6baff93 ("ftrace: Update the mcount_loc check of skipped entries")
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> ---
> kernel/trace/ftrace.c | 25 +++++++++++++++++--------
> 1 file changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index ef2d5dca6f70..211ec7a04f7e 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -3844,7 +3844,7 @@ static int ftrace_allocate_records(struct ftrace_page *pg, int count)
> return -EINVAL;
>
> /* We want to fill as much as possible, with no empty pages */
> - pages = DIV_ROUND_UP(count, ENTRIES_PER_PAGE);
> + pages = DIV_ROUND_UP(count * ENTRY_SIZE, PAGE_SIZE);
> order = fls(pages) - 1;
>
> again:
> @@ -7308,24 +7308,33 @@ static int ftrace_process_locs(struct module *mod,
> unsigned long skip;
>
> /* Count the number of entries unused and compare it to skipped. */
> - pg_remaining = (ENTRIES_PER_PAGE << pg->order) - pg->index;
> + pg_remaining = (PAGE_SIZE << pg->order) / ENTRY_SIZE - pg->index;
>
> if (!WARN(skipped < pg_remaining, "Extra allocated pages for ftrace")) {
> + unsigned long space = 0;
>
> skip = skipped - pg_remaining;
>
> - for (pg = pg_unuse; pg; pg = pg->next)
> + for (pg = pg_unuse; pg; pg = pg->next) {
> remaining += 1 << pg->order;
> + /*
> + * The capacity of a page group is
> + * (PAGE_SIZE << order) / ENTRY_SIZE
> + * Accumulate the total capacity of unused pages.
> + */
> + space += (PAGE_SIZE << pg->order) / ENTRY_SIZE;
> + }
>
> pages -= remaining;
I think pages is meaningless here, as it was set in the beginning with:
pages = DIV_ROUND_UP(count, ENTRIES_PER_PAGE);
Which is incorrect. I wonder if we should set it via:
/*
* Use ftrace_number_of_pages to determine how many pages were
* allocated
*/
pages = ftrace_number_of_pages;
start_pg = ftrace_allocate_pages(count);
if (!start_pg)
return -ENOMEM;
/* ftrace_allocate_pages() increments ftrace_number_of_pages */
pages = ftrace_number_of_pages - pages;
This will make pages equal the number of pages that were allocated. Then
I'm not sure we need this extra logic.
-- Steve
>
> - skip = DIV_ROUND_UP(skip, ENTRIES_PER_PAGE);
> -
> /*
> - * Check to see if the number of pages remaining would
> - * just fit the number of entries skipped.
> + * Check to see if extra pages have been allocated.
> + * Only warn if the number of unused entries is larger
> + * than the number of entries per page to avoid false
> + * positives due to rounding.
> */
> - WARN(skip != remaining, "Extra allocated pages for ftrace: %lu with %lu skipped",
> + WARN(space - skip > ENTRIES_PER_PAGE,
> + "Extra allocated pages for ftrace: %lu with %lu skipped",
> remaining, skipped);
> }
> /* Need to synchronize with ftrace_location_range() */
Hi Steven,
On Tue, Jan 06, 2026 at 04:18:44PM -0500, Steven Rostedt wrote:
> On Tue, 6 Jan 2026 12:35:33 -0800
> Guenter Roeck <linux@roeck-us.net> wrote:
>
> > The pg_remaining calculation in ftrace_process_locs() assumes that
> > ENTRIES_PER_PAGE multiplied by 2^order equals the actual capacity of the
> > allocated page group. However, ENTRIES_PER_PAGE is PAGE_SIZE / ENTRY_SIZE
> > (integer division). When PAGE_SIZE is not a multiple of ENTRY_SIZE (e.g.
> > 4096 / 24 = 170 with remainder 16), high-order allocations (like 256 pages)
> > have significantly more capacity than 256 * 170. This leads to pg_remaining
> > being underestimated, which in turn makes skip (derived from skipped -
> > pg_remaining) larger than expected, causing the WARN(skip != remaining)
> > to trigger.
>
> Nice catch! I guess you have a machine that allows much higher order
> allocations than I do ;-)
>
I have actually seen the problem when the system tried to allocate 341
entries. 2 x 170 = 340, so the code allocated three pages, and things
went downhill from there.
Here is a specific example of the allocation sequence with debug
information.
ftrace_allocate_records: Requesting 52904 entries: 310 pages [order 8]. Alternate: 312 pages [order 8]
Allocated 43690 records, 9214 remaining
ftrace_allocate_records: Requesting 9214 entries: 54 pages [order 5]. Alternate: 55 pages [order 5]
Allocated 5461 records, 3753 remaining
ftrace_allocate_records: Requesting 3753 entries: 22 pages [order 4]. Alternate: 23 pages [order 4]
Allocated 2730 records, 1023 remaining
ftrace_allocate_records: Requesting 1023 entries: 6 pages [order 2]. Alternate: 7 pages [order 2]
Allocated 682 records, 341 remaining
ftrace_allocate_records: Requesting 341 entries: 2 pages [order 1]. Alternate: 3 pages [order 1]
Allocated 341 records, 0 remaining
The description above is just a more extreme example.
> >
> > Extra allocated pages for ftrace: 2 with 654 skipped
> > WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:7295 ftrace_process_locs+0x5bf/0x5e0
> >
> > A similar problem in ftrace_allocate_records() can result in allocating
> > too many pages. This can trigger the second warning in
> > ftrace_process_locs().
> >
> > Extra allocated pages for ftrace
> > WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:7276 ftrace_process_locs+0x548/0x580
> >
> > Use the actual capacity of a page group to determine if too many pages
> > have been allocated to solve the problem. Also use the actual capacity
> > of a page group to determine the number of pages needed to avoid over-
> > allocations in ftrace_allocate_records().
> >
> > Fixes: 4a3efc6baff93 ("ftrace: Update the mcount_loc check of skipped entries")
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> > ---
> > kernel/trace/ftrace.c | 25 +++++++++++++++++--------
> > 1 file changed, 17 insertions(+), 8 deletions(-)
> >
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index ef2d5dca6f70..211ec7a04f7e 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -3844,7 +3844,7 @@ static int ftrace_allocate_records(struct ftrace_page *pg, int count)
> > return -EINVAL;
> >
> > /* We want to fill as much as possible, with no empty pages */
> > - pages = DIV_ROUND_UP(count, ENTRIES_PER_PAGE);
> > + pages = DIV_ROUND_UP(count * ENTRY_SIZE, PAGE_SIZE);
> > order = fls(pages) - 1;
> >
> > again:
> > @@ -7308,24 +7308,33 @@ static int ftrace_process_locs(struct module *mod,
> > unsigned long skip;
> >
> > /* Count the number of entries unused and compare it to skipped. */
> > - pg_remaining = (ENTRIES_PER_PAGE << pg->order) - pg->index;
> > + pg_remaining = (PAGE_SIZE << pg->order) / ENTRY_SIZE - pg->index;
> >
> > if (!WARN(skipped < pg_remaining, "Extra allocated pages for ftrace")) {
> > + unsigned long space = 0;
> >
> > skip = skipped - pg_remaining;
> >
> > - for (pg = pg_unuse; pg; pg = pg->next)
> > + for (pg = pg_unuse; pg; pg = pg->next) {
> > remaining += 1 << pg->order;
> > + /*
> > + * The capacity of a page group is
> > + * (PAGE_SIZE << order) / ENTRY_SIZE
> > + * Accumulate the total capacity of unused pages.
> > + */
> > + space += (PAGE_SIZE << pg->order) / ENTRY_SIZE;
> > + }
> >
> > pages -= remaining;
>
> I think pages is meaningless here, as it was set in the beginning with:
>
> pages = DIV_ROUND_UP(count, ENTRIES_PER_PAGE);
>
> Which is incorrect. I wonder if we should set it via:
>
I know, I just didn't want to make more changes than absolutely
necessary.
> /*
> * Use ftrace_number_of_pages to determine how many pages were
> * allocated
> */
> pages = ftrace_number_of_pages;
>
> start_pg = ftrace_allocate_pages(count);
> if (!start_pg)
> return -ENOMEM;
>
> /* ftrace_allocate_pages() increments ftrace_number_of_pages */
> pages = ftrace_number_of_pages - pages;
>
That might work, assuming that the code updating ftrace_number_of_pages
is (mutex) protected. I don't immediately see that, and the
"mutex_lock(&ftrace_lock);" right after the above code makes me a bit
concerned.
> This will make pages equal the number of pages that were allocated. Then
> I'm not sure we need this extra logic.
>
Which extra logic do you refer to ? Everything in "if (pg_unused)"
except for the calls to synchronize_rcu() and ftrace_free_pages() ?
If so I'll be more than happy to drop that.
Thanks,
Guenter
On Tue, Jan 06, 2026 at 02:26:23PM -0800, Guenter Roeck wrote: > > > > > > pages -= remaining; > > > > I think pages is meaningless here, as it was set in the beginning with: > > > > pages = DIV_ROUND_UP(count, ENTRIES_PER_PAGE); > > > > Which is incorrect. I wonder if we should set it via: > > > I know, I just didn't want to make more changes than absolutely > necessary. > > > /* > > * Use ftrace_number_of_pages to determine how many pages were > > * allocated > > */ > > pages = ftrace_number_of_pages; > > > > start_pg = ftrace_allocate_pages(count); > > if (!start_pg) > > return -ENOMEM; > > > > /* ftrace_allocate_pages() increments ftrace_number_of_pages */ > > pages = ftrace_number_of_pages - pages; > > > > That might work, assuming that the code updating ftrace_number_of_pages > is (mutex) protected. I don't immediately see that, and the > "mutex_lock(&ftrace_lock);" right after the above code makes me a bit > concerned. > One way to avoid the locking problem without potentially risky code changes would be to pass a pointer to pages to ftrace_allocate_pages() and to ftrace_allocate_records(), and to update it from there. I tested that and confirmed that it works. Guenter
On Tue, 6 Jan 2026 15:05:25 -0800
Guenter Roeck <linux@roeck-us.net> wrote:
> > > /*
> > > * Use ftrace_number_of_pages to determine how many pages were
> > > * allocated
> > > */
> > > pages = ftrace_number_of_pages;
> > >
> > > start_pg = ftrace_allocate_pages(count);
> > > if (!start_pg)
> > > return -ENOMEM;
> > >
> > > /* ftrace_allocate_pages() increments ftrace_number_of_pages */
> > > pages = ftrace_number_of_pages - pages;
> > >
> >
> > That might work, assuming that the code updating ftrace_number_of_pages
> > is (mutex) protected. I don't immediately see that, and the
> > "mutex_lock(&ftrace_lock);" right after the above code makes me a bit
> > concerned.
> >
>
> One way to avoid the locking problem without potentially risky code changes
> would be to pass a pointer to pages to ftrace_allocate_pages() and to
> ftrace_allocate_records(), and to update it from there. I tested that and
> confirmed that it works.
I was originally going to suggest that, but when looking at the code, I
noticed that these variables could be useful. They are only updated on boot
up, module load, module unload and when module memory is freed.
But looking into the module code, these updates are done outside of the
module_mutex. This means these values need to be converted to atomics as
they are updated without any protection.
Yeah, better to just get the value from passing in a parameter to both
ftrace_allocate_pages() and to ftrace_allocate_records().
Something like:
unsigend long pages = 0;
[..]
start_pg = ftrace_allocate_pages(count, &pages);
[..]
ftrace_allocate_pages(unsigned long num_to_init, unsigned long *num_pages) {
[..]
cnt = ftrace_allocate_records(pg, num_to_init, num_pages);
And have ftrace_allocte_records() have:
pages = 1 << order;
*num_pages += pages;
ftrace_number_of_pages += pages;
And I'll add another patch on top of this to make the variables atomic.
Thanks,
-- Steve
Hi Steve,
On Tue, Jan 06, 2026 at 06:38:15PM -0500, Steven Rostedt wrote:
> On Tue, 6 Jan 2026 15:05:25 -0800
> Guenter Roeck <linux@roeck-us.net> wrote:
>
> > > > /*
> > > > * Use ftrace_number_of_pages to determine how many pages were
> > > > * allocated
> > > > */
> > > > pages = ftrace_number_of_pages;
> > > >
> > > > start_pg = ftrace_allocate_pages(count);
> > > > if (!start_pg)
> > > > return -ENOMEM;
> > > >
> > > > /* ftrace_allocate_pages() increments ftrace_number_of_pages */
> > > > pages = ftrace_number_of_pages - pages;
> > > >
> > >
> > > That might work, assuming that the code updating ftrace_number_of_pages
> > > is (mutex) protected. I don't immediately see that, and the
> > > "mutex_lock(&ftrace_lock);" right after the above code makes me a bit
> > > concerned.
> > >
> >
> > One way to avoid the locking problem without potentially risky code changes
> > would be to pass a pointer to pages to ftrace_allocate_pages() and to
> > ftrace_allocate_records(), and to update it from there. I tested that and
> > confirmed that it works.
>
> I was originally going to suggest that, but when looking at the code, I
> noticed that these variables could be useful. They are only updated on boot
> up, module load, module unload and when module memory is freed.
>
> But looking into the module code, these updates are done outside of the
> module_mutex. This means these values need to be converted to atomics as
> they are updated without any protection.
>
> Yeah, better to just get the value from passing in a parameter to both
> ftrace_allocate_pages() and to ftrace_allocate_records().
>
> Something like:
>
> unsigend long pages = 0;
>
> [..]
> start_pg = ftrace_allocate_pages(count, &pages);
>
> [..]
> ftrace_allocate_pages(unsigned long num_to_init, unsigned long *num_pages) {
> [..]
> cnt = ftrace_allocate_records(pg, num_to_init, num_pages);
>
> And have ftrace_allocte_records() have:
>
> pages = 1 << order;
> *num_pages += pages;
> ftrace_number_of_pages += pages;
>
That is exactly what I tested. With that, I assume I can drop the
code to update pages in ftrace_process_locs(), inclusing the warning
backtraces. I'll send v2 with those changes.
Thanks,
Guenter
© 2016 - 2026 Red Hat, Inc.