In production, show_mem() can be called concurrently from two
different entities, for example one from oom_kill_process() and
another from __alloc_pages_slowpath() in another kthread. This
patch adds a mutex and invokes trylock before printing out the
kernel alloc info in show_mem(). This way two alloc info dumps
won't interleave with each other, which makes parsing easier.

Signed-off-by: Yueyang Pan <pyyjason@gmail.com>
---
mm/show_mem.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/mm/show_mem.c b/mm/show_mem.c
index b71e222fde86..8814b5f8a7dc 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -23,6 +23,8 @@ EXPORT_SYMBOL(_totalram_pages);
 unsigned long totalreserve_pages __read_mostly;
 unsigned long totalcma_pages __read_mostly;
 
+static DEFINE_MUTEX(mem_alloc_profiling_mutex);
+
 static inline void show_node(struct zone *zone)
 {
 	if (IS_ENABLED(CONFIG_NUMA))
@@ -419,7 +421,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
 	printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
 #endif
 #ifdef CONFIG_MEM_ALLOC_PROFILING
-	if (mem_alloc_profiling_enabled()) {
+	if (mem_alloc_profiling_enabled() && mutex_trylock(&mem_alloc_profiling_mutex)) {
 		struct codetag_bytes tags[10];
 		size_t i, nr;
 
@@ -445,6 +447,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
 				  ct->lineno, ct->function);
 			}
 		}
+		mutex_unlock(&mem_alloc_profiling_mutex);
 	}
 #endif
 }
--
2.47.3
On Wed, 27 Aug 2025 11:34:23 -0700 Yueyang Pan <pyyjason@gmail.com> wrote:

> In production, show_mem() can be called concurrently from two
> different entities, for example one from oom_kill_process()
> another from __alloc_pages_slowpath from another kthread. This
> patch adds a mutex and invokes trylock before printing out the
> kernel alloc info in show_mem(). This way two alloc info won't
> interleave with each other, which then makes parsing easier.

Fair enough, I guess.

> --- a/mm/show_mem.c
> +++ b/mm/show_mem.c
> @@ -23,6 +23,8 @@ EXPORT_SYMBOL(_totalram_pages);
>  unsigned long totalreserve_pages __read_mostly;
>  unsigned long totalcma_pages __read_mostly;
>
> +static DEFINE_MUTEX(mem_alloc_profiling_mutex);

It would be a bit neater to make this local to __show_mem() - it didn't
need file scope.

Also, mutex_unlock() isn't to be used from interrupt context, so
problem.

Something like atomic cmpxchg or test_and_set_bit could be used and
wouldn't involve mutex_unlock()'s wakeup logic, which isn't needed
here.

>  static inline void show_node(struct zone *zone)
>  {
>  	if (IS_ENABLED(CONFIG_NUMA))
> @@ -419,7 +421,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
>  	printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
>  #endif
>  #ifdef CONFIG_MEM_ALLOC_PROFILING
> -	if (mem_alloc_profiling_enabled()) {
> +	if (mem_alloc_profiling_enabled() && mutex_trylock(&mem_alloc_profiling_mutex)) {
>  		struct codetag_bytes tags[10];
>  		size_t i, nr;
>
> @@ -445,6 +447,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
>  				  ct->lineno, ct->function);
>  		}
>  	}
> +	mutex_unlock(&mem_alloc_profiling_mutex);
>  }

If we're going to suppress the usual output then how about we let
people know this happened, rather than silently dropping it?

	pr_notice("memory allocation output suppressed due to show_mem() contention\n")

or something like that?
On Wed, Aug 27, 2025 at 03:06:19PM -0700, Andrew Morton wrote:
> On Wed, 27 Aug 2025 11:34:23 -0700 Yueyang Pan <pyyjason@gmail.com> wrote:
>
> > +static DEFINE_MUTEX(mem_alloc_profiling_mutex);
>
> It would be a bit neater to make this local to __show_mem() - it didn't
> need file scope.

Thanks for your feedback, Andrew. I will move it in the next version.

> Also, mutex_unlock() isn't to be used from interrupt context, so
> problem.
>
> Something like atomic cmpxchg or test_and_set_bit could be used and
> wouldn't involve mutex_unlock()'s wakeup logic, which isn't needed
> here.

I was not aware of the interrupt context issue. I will change to a
test-and-set lock in the next version.

> If we're going to suppress the usual output then how about we let
> people know this happened, rather than silently dropping it?
>
> 	pr_notice("memory allocation output suppressed due to show_mem() contention\n")
>
> or something like that?

For this point, I am sort of on Shakeel's side. I probably wouldn't call it
suppressed, as two concurrent printers are actually sharing this global
information.

Thanks,
Pan
On 8/28/25 10:34, Yueyang Pan wrote:
> On Wed, Aug 27, 2025 at 03:06:19PM -0700, Andrew Morton wrote:
>> On Wed, 27 Aug 2025 11:34:23 -0700 Yueyang Pan <pyyjason@gmail.com> wrote:
>>
>> > In production, show_mem() can be called concurrently from two
>> > different entities, for example one from oom_kill_process()
>> > another from __alloc_pages_slowpath from another kthread. This
>> > patch adds a mutex and invokes trylock before printing out the
>> > kernel alloc info in show_mem(). This way two alloc info won't
>> > interleave with each other, which then makes parsing easier.

What about the rest of the information printed by show_mem() being
interleaved?

>> Also, mutex_unlock() isn't to be used from interrupt context, so
>> problem.
>>
>> Something like atomic cmpxchg or test_and_set_bit could be used and
>> wouldn't involve mutex_unlock()'s wakeup logic, which isn't needed
>> here.
>
> I was not aware of interrupt context before. I will change to test-and-set
> lock in the next version.

Perhaps simply spinlock_t with spin_trylock()?
On Thu, Aug 28, 2025 at 10:41:23AM +0200, Vlastimil Babka wrote:
> > I was not aware of interrupt context before. I will change to test-and-set
> > lock in the next version.
>
> Perhaps simply spinlock_t with spin_trylock()?

Will lockdep complain that this spinlock is taken in non-irq and irq
context without disabling irqs?
On 8/28/25 18:35, Shakeel Butt wrote:
> On Thu, Aug 28, 2025 at 10:41:23AM +0200, Vlastimil Babka wrote:
>> Perhaps simply spinlock_t with spin_trylock()?
>
> Will lockdep complain that this spinlock is taken in non-irq and irq
> context without disabling irqs?

If we only use spin_trylock then it won't.
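[Editor's note] An untested sketch of the spin_trylock() variant being discussed, keeping the v1 lock name for illustration and making the lock static inside __show_mem() per Andrew's comment. Unlike mutex_unlock(), spin_unlock() is fine from interrupt context, and since the lock is only ever taken with trylock, lockdep sees no irq-inversion deadlock to warn about:

```diff
 #ifdef CONFIG_MEM_ALLOC_PROFILING
-	if (mem_alloc_profiling_enabled()) {
+	static DEFINE_SPINLOCK(mem_alloc_profiling_lock);
+
+	if (mem_alloc_profiling_enabled() &&
+	    spin_trylock(&mem_alloc_profiling_lock)) {
 		struct codetag_bytes tags[10];
 		size_t i, nr;
@@
 			}
 		}
+		spin_unlock(&mem_alloc_profiling_lock);
 	}
 #endif
```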
On Thu, Aug 28, 2025 at 10:41:23AM +0200, Vlastimil Babka wrote:
> On 8/28/25 10:34, Yueyang Pan wrote:
>
> What about the rest of the information printed by show_mem() being
> interleaved?

Thanks for your feedback, Vlastimil. We cannot use trylock for the rest of
the output, as the node filter can differ between callers. Do you think we
need a lock to prevent the whole of show_mem() from being interleaved,
acquired at the very beginning? Would that be too heavy?

> Perhaps simply spinlock_t with spin_trylock()?

Agreed.

Thanks,
Pan
On 8/28/25 10:47, Yueyang Pan wrote:
> On Thu, Aug 28, 2025 at 10:41:23AM +0200, Vlastimil Babka wrote:
>> What about the rest of the information printed by show_mem() being
>> interleaved?
>
> Thanks for your feedback, Vlastimil. We cannot use trylock for the rest of
> the output, as the node filter can differ between callers.

Right.

> Do you think we need a lock to prevent the whole of show_mem() from being
> interleaved, acquired at the very beginning? Would that be too heavy?

It might be risky so perhaps let's not. Guess we can disentangle by dmesg
showing the thread id prefix.
On Thu, Aug 28, 2025 at 10:53:01AM +0200, Vlastimil Babka wrote:
> It might be risky so perhaps let's not. Guess we can disentangle by dmesg
> showing the thread id prefix.

I have thought about this. Since each line can interleave with another, we
would end up adding the tid to each line. Not sure if this is acceptable.

Thanks,
Pan
On 8/28/25 11:51, Yueyang Pan wrote:
> On Thu, Aug 28, 2025 at 10:53:01AM +0200, Vlastimil Babka wrote:
>> It might be risky so perhaps let's not. Guess we can disentangle by dmesg
>> showing the thread id prefix.
>
> I have thought about this. Since each line can interleave with another, we
> would end up adding the tid to each line. Not sure if this is acceptable.

I meant that printk/dmesg already does that so it's fine.
On Thu, Aug 28, 2025 at 11:54:58AM +0200, Vlastimil Babka wrote:
> I meant that printk/dmesg already does that so it's fine.

Cool. Then I will do the same for the part printed before the memory
allocation info.

Thanks,
Pan
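[Editor's note] The thread-id prefix Vlastimil refers to comes from printk's caller-id feature: with CONFIG_PRINTK_CALLER=y, every log record carries the id of the emitting task (or CPU, for interrupt context), so interleaved show_mem() output can be demultiplexed per caller afterwards. An illustrative /dev/kmsg record follows; the numeric fields are made-up examples, not taken from this thread:

```
# kernel config
CONFIG_PRINTK_CALLER=y

# /dev/kmsg records then carry a caller= field, e.g. for task 4242:
6,1234,5678,-,caller=T4242;Memory allocations:
```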
On Wed, Aug 27, 2025 at 03:06:19PM -0700, Andrew Morton wrote:
> It would be a bit neater to make this local to __show_mem() - it didn't
> need file scope.

+1, something static to __show_mem().

> Also, mutex_unlock() isn't to be used from interrupt context, so
> problem.
>
> Something like atomic cmpxchg or test_and_set_bit could be used and
> wouldn't involve mutex_unlock()'s wakeup logic, which isn't needed
> here.

+1

> If we're going to suppress the usual output then how about we let
> people know this happened, rather than silently dropping it?
>
> 	pr_notice("memory allocation output suppressed due to show_mem() contention\n")
>
> or something like that?

Personally I think this is not needed, as this patch suppresses only the
memory allocation profiling output, which is global, will be the same for
all consumers, and does not depend on the calling context. All consumers
will get the memory allocation profiling data eventually.
On Wed, Aug 27, 2025 at 03:28:41PM -0700, Shakeel Butt wrote:
> On Wed, Aug 27, 2025 at 03:06:19PM -0700, Andrew Morton wrote:
>> It would be a bit neater to make this local to __show_mem() - it didn't
>> need file scope.
>
> +1, something static to __show_mem().

Thanks for your feedback, Shakeel. See my reply to Andrew for this.

>> Something like atomic cmpxchg or test_and_set_bit could be used and
>> wouldn't involve mutex_unlock()'s wakeup logic, which isn't needed
>> here.
>
> +1

Again, see my reply to Andrew.

> Personally I think this is not needed as this patch is suppressing only
> the memory allocation profiling output which is global, will be same
> for all the consumers and context does not matter. All consumers will
> get the memory allocation profiling data eventually.

For this point, I sort of agree with you. Wait for others' opinions?

Thanks,
Pan