vm_area_alloc_pages() contains the only voluntary reschedule points
along the vmalloc() allocation path. They are needed to ensure forward
progress on PREEMPT_NONE kernels under contention for vmap metadata
(e.g. alloc_vmap_area()).

However, yielding should only be done if the given GFP flags allow
blocking. This patch avoids calling cond_resched() when the allocation
context is non-blocking (GFP_ATOMIC, GFP_NOWAIT).

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
mm/vmalloc.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 49a0f81930a8..b77e8be75f10 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3633,7 +3633,9 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 							pages + nr_allocated);
 
 			nr_allocated += nr;
-			cond_resched();
+
+			if (gfpflags_allow_blocking(gfp))
+				cond_resched();
 
 			/*
 			 * If zero or pages were obtained partly,
@@ -3675,7 +3677,9 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 		for (i = 0; i < (1U << order); i++)
 			pages[nr_allocated + i] = page + i;
 
-		cond_resched();
+		if (gfpflags_allow_blocking(gfp))
+			cond_resched();
+
 		nr_allocated += 1U << order;
 	}
 
--
2.47.3
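
For readers following the thread, the gating that the patch introduces can be modelled in userspace. This is only a sketch under stated assumptions: the MODEL_* bit values, model_gfp_t, model_allow_blocking(), model_cond_resched() and model_alloc_pages() are all hypothetical stand-ins, not kernel definitions. It assumes that gfpflags_allow_blocking() reduces to a test of the direct-reclaim bit, and that GFP_ATOMIC and GFP_NOWAIT leave that bit clear while GFP_KERNEL sets it; the real flag encodings differ between kernel releases.

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration; NOT the kernel's values. */
typedef unsigned int model_gfp_t;

#define MODEL_GFP_HIGH           0x01u
#define MODEL_GFP_DIRECT_RECLAIM 0x02u
#define MODEL_GFP_KSWAPD_RECLAIM 0x04u
#define MODEL_GFP_IO             0x08u
#define MODEL_GFP_FS             0x10u
#define MODEL_GFP_NOWARN         0x20u

/* Assumed compositions: only the GFP_KERNEL model carries direct reclaim. */
#define MODEL_GFP_ATOMIC (MODEL_GFP_HIGH | MODEL_GFP_KSWAPD_RECLAIM)
#define MODEL_GFP_NOWAIT (MODEL_GFP_KSWAPD_RECLAIM | MODEL_GFP_NOWARN)
#define MODEL_GFP_KERNEL (MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_KSWAPD_RECLAIM | \
			  MODEL_GFP_IO | MODEL_GFP_FS)

/* Counts how often the modelled reschedule point was reached. */
static unsigned int resched_calls;

/* Stand-in for cond_resched(); it only records the call. */
static void model_cond_resched(void)
{
	resched_calls++;
}

/* Stand-in for gfpflags_allow_blocking(): blocking needs direct reclaim. */
static bool model_allow_blocking(model_gfp_t gfp)
{
	return (gfp & MODEL_GFP_DIRECT_RECLAIM) != 0;
}

/*
 * Mirrors the shape of the patched loop: pages are obtained in batches
 * and the reschedule point is reached only when the flags allow blocking.
 * Resets resched_calls, then returns the number of pages "allocated".
 */
static unsigned int model_alloc_pages(model_gfp_t gfp, unsigned int nr_pages,
				      unsigned int batch)
{
	unsigned int nr_allocated = 0;

	resched_calls = 0;
	if (batch == 0)
		return 0;

	while (nr_allocated < nr_pages) {
		unsigned int left = nr_pages - nr_allocated;
		unsigned int nr = left < batch ? left : batch;

		nr_allocated += nr;

		if (model_allow_blocking(gfp))
			model_cond_resched();
	}

	return nr_allocated;
}
```

Calling model_alloc_pages(MODEL_GFP_KERNEL, 250, 100) passes through the reschedule point once per batch, while the same request with MODEL_GFP_ATOMIC or MODEL_GFP_NOWAIT never yields, which is the behaviour the changelog describes for non-blocking contexts.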

On 09/15/25 at 03:40pm, Uladzislau Rezki (Sony) wrote:
> vm_area_alloc_pages() contains the only voluntary reschedule points
> along the vmalloc() allocation path. They are needed to ensure forward
> progress on PREEMPT_NONE kernels under contention for vmap metadata
> (e.g. alloc_vmap_area()).
>
> However, yielding should only be done if the given GFP flags allow
> blocking. This patch avoids calling cond_resched() when the allocation
> context is non-blocking (GFP_ATOMIC, GFP_NOWAIT).
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  mm/vmalloc.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)

Reviewed-by: Baoquan He <bhe@redhat.com>

>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 49a0f81930a8..b77e8be75f10 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3633,7 +3633,9 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  							pages + nr_allocated);
>
>  			nr_allocated += nr;
> -			cond_resched();
> +
> +			if (gfpflags_allow_blocking(gfp))
> +				cond_resched();
>
>  			/*
>  			 * If zero or pages were obtained partly,
> @@ -3675,7 +3677,9 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  		for (i = 0; i < (1U << order); i++)
>  			pages[nr_allocated + i] = page + i;
>
> -		cond_resched();
> +		if (gfpflags_allow_blocking(gfp))
> +			cond_resched();
> +
>  		nr_allocated += 1U << order;
>  	}
>
> --
> 2.47.3
>

On Mon 15-09-25 15:40:34, Uladzislau Rezki wrote:
> vm_area_alloc_pages() contains the only voluntary reschedule points
> along the vmalloc() allocation path. They are needed to ensure forward
> progress on PREEMPT_NONE kernels under contention for vmap metadata
> (e.g. alloc_vmap_area()).
>
> However, yielding should only be done if the given GFP flags allow
> blocking. This patch avoids calling cond_resched() when the allocation
> context is non-blocking (GFP_ATOMIC, GFP_NOWAIT).

We do have cond_resched in the page allocator path, right?
So unless I am missing something we can safely drop these. I thought we
have discussed this already.

> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  mm/vmalloc.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 49a0f81930a8..b77e8be75f10 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3633,7 +3633,9 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  							pages + nr_allocated);
>
>  			nr_allocated += nr;
> -			cond_resched();
> +
> +			if (gfpflags_allow_blocking(gfp))
> +				cond_resched();
>
>  			/*
>  			 * If zero or pages were obtained partly,
> @@ -3675,7 +3677,9 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  		for (i = 0; i < (1U << order); i++)
>  			pages[nr_allocated + i] = page + i;
>
> -		cond_resched();
> +		if (gfpflags_allow_blocking(gfp))
> +			cond_resched();
> +
>  		nr_allocated += 1U << order;
>  	}
>
> --
> 2.47.3

--
Michal Hocko
SUSE Labs

On Mon, Sep 15, 2025 at 07:11:27PM +0200, Michal Hocko wrote:
> On Mon 15-09-25 15:40:34, Uladzislau Rezki wrote:
> > vm_area_alloc_pages() contains the only voluntary reschedule points
> > along the vmalloc() allocation path. They are needed to ensure forward
> > progress on PREEMPT_NONE kernels under contention for vmap metadata
> > (e.g. alloc_vmap_area()).
> >
> > However, yielding should only be done if the given GFP flags allow
> > blocking. This patch avoids calling cond_resched() when the allocation
> > context is non-blocking (GFP_ATOMIC, GFP_NOWAIT).
>
> We do have cond_resched in the page allocator path, right?
> So unless I am missing something we can safely drop these. I thought we
> have discussed this already.
>
Yes, we discussed this. I did some tests with cond_resched() dropped on a
!PREEMPT kernel and I can trigger soft lockups under really heavy stress
load.

I prefer to keep them for now for consistency. I need some time to
investigate it more. As I noted in the commit message, the vmalloc()
path only has those two resched points. I will probably need to move
them to another place later.

As for the page allocator, its cond_resched() is in a slow path which I
do not hit in my stress setup.

--
Uladzislau Rezki

On Tue 16-09-25 17:28:36, Uladzislau Rezki wrote:
> On Mon, Sep 15, 2025 at 07:11:27PM +0200, Michal Hocko wrote:
> > On Mon 15-09-25 15:40:34, Uladzislau Rezki wrote:
> > > vm_area_alloc_pages() contains the only voluntary reschedule points
> > > along the vmalloc() allocation path. They are needed to ensure forward
> > > progress on PREEMPT_NONE kernels under contention for vmap metadata
> > > (e.g. alloc_vmap_area()).
> > >
> > > However, yielding should only be done if the given GFP flags allow
> > > blocking. This patch avoids calling cond_resched() when the allocation
> > > context is non-blocking (GFP_ATOMIC, GFP_NOWAIT).
> >
> > We do have cond_resched in the page allocator path, right?
> > So unless I am missing something we can safely drop these. I thought we
> > have discussed this already.
> >
> Yes, we discussed this. I did some tests with cond_resched() dropped on a
> !PREEMPT kernel and I can trigger soft lockups under really heavy stress
> load.
>
> I prefer to keep them for now for consistency. I need some time to
> investigate it more. As I noted in the commit message, the vmalloc()
> path only has those two resched points. I will probably need to move
> them to another place later.
>
> As for the page allocator, its cond_resched() is in a slow path which I
> do not hit in my stress setup.

OK, so the fast path can trigger the soft lockup? If yes please mention
that in the changelog so that we know why this is needed. With that
included feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>

--
Michal Hocko
SUSE Labs

On Tue, Sep 16, 2025 at 08:08:18PM +0200, Michal Hocko wrote:
> On Tue 16-09-25 17:28:36, Uladzislau Rezki wrote:
> > On Mon, Sep 15, 2025 at 07:11:27PM +0200, Michal Hocko wrote:
> > > On Mon 15-09-25 15:40:34, Uladzislau Rezki wrote:
> > > > vm_area_alloc_pages() contains the only voluntary reschedule points
> > > > along the vmalloc() allocation path. They are needed to ensure forward
> > > > progress on PREEMPT_NONE kernels under contention for vmap metadata
> > > > (e.g. alloc_vmap_area()).
> > > >
> > > > However, yielding should only be done if the given GFP flags allow
> > > > blocking. This patch avoids calling cond_resched() when the allocation
> > > > context is non-blocking (GFP_ATOMIC, GFP_NOWAIT).
> > >
> > > We do have cond_resched in the page allocator path, right?
> > > So unless I am missing something we can safely drop these. I thought we
> > > have discussed this already.
> > >
> > Yes, we discussed this. I did some tests with cond_resched() dropped on a
> > !PREEMPT kernel and I can trigger soft lockups under really heavy stress
> > load.
> >
> > I prefer to keep them for now for consistency. I need some time to
> > investigate it more. As I noted in the commit message, the vmalloc()
> > path only has those two resched points. I will probably need to move
> > them to another place later.
> >
> > As for the page allocator, its cond_resched() is in a slow path which I
> > do not hit in my stress setup.
>
> OK, so the fast path can trigger the soft lockup? If yes please mention
> that in the changelog so that we know why this is needed. With that
> included feel free to add
> Acked-by: Michal Hocko <mhocko@suse.com>
>
We, in vmalloc(), also have a slow path. Those two points seem to help.
I will move them later to alloc_vmap_area(), after a slow path serves a
request.

Thank you!

--
Uladzislau Rezki