[PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible

Posted by Barry Song 1 week, 2 days ago
From: Barry Song <v-songbaohua@oppo.com>

In many cases, the pages passed to vmap() may include
high-order pages. For example, the system heap often allocates
pages in descending order: order 8, then 4, then 0. Currently,
vmap() iterates over every page individually, so even the pages
inside a high-order block are handled one by one. This patch
detects high-order pages and maps them as a single contiguous
block whenever possible.
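
As a rough illustration of the caller pattern this targets (hypothetical
code, not part of this patch; error handling omitted), an order-8 block
ends up expanded into 256 page pointers before it reaches vmap():

static void *example_vmap_order8(void)
{
        struct page **pages;
        struct page *page;
        void *vaddr;
        int i;

        /* A system-heap-like caller allocates one order-8 compound block. */
        page = alloc_pages(GFP_KERNEL | __GFP_COMP | __GFP_ZERO, 8);
        pages = kvmalloc_array(1 << 8, sizeof(*pages), GFP_KERNEL);
        for (i = 0; i < (1 << 8); i++)
                pages[i] = page + i;

        /*
         * Without this patch the 256 pages are mapped one by one; with it
         * they are handed to vmap_range_noflush() as a single physically
         * contiguous range.
         */
        vaddr = vmap(pages, 1 << 8, VM_MAP, PAGE_KERNEL);
        kvfree(pages);
        return vaddr;
}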

Another possibility is to implement a new API, vmap_sg().
However, that change seems to be quite large in scope.

When vmapping a 128MB dma-buf allocated from the system heap,
this RFC appears to make system_heap_do_vmap() about 16× faster:

W/ patch:
[   51.363682] system_heap_do_vmap took 2474000 ns
[   53.307044] system_heap_do_vmap took 2469008 ns
[   55.061985] system_heap_do_vmap took 2519008 ns
[   56.653810] system_heap_do_vmap took 2674000 ns

W/o patch:
[    8.260880] system_heap_do_vmap took 39490000 ns
[   32.513292] system_heap_do_vmap took 38784000 ns
[   82.673374] system_heap_do_vmap took 40711008 ns
[   84.579062] system_heap_do_vmap took 40236000 ns

Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: John Stultz <jstultz@google.com>
Cc: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 mm/vmalloc.c | 49 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0832f944544c..af2e3e8c052a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -642,6 +642,34 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
 	return err;
 }
 
+static inline int get_vmap_batch_order(struct page **pages,
+		unsigned int stride,
+		int max_steps,
+		unsigned int idx)
+{
+	/*
+	 * Currently, batching is only supported in vmap_pages_range
+	 * when page_shift == PAGE_SHIFT.
+	 */
+	if (stride != 1)
+		return 0;
+
+	struct page *base = pages[idx];
+	if (!PageHead(base))
+		return 0;
+
+	int order = compound_order(base);
+	int nr_pages = 1 << order;
+
+	if (max_steps < nr_pages)
+		return 0;
+
+	for (int i = 0; i < nr_pages; i++)
+		if (pages[idx + i] != base + i)
+			return 0;
+	return order;
+}
+
 /*
  * vmap_pages_range_noflush is similar to vmap_pages_range, but does not
  * flush caches.
@@ -655,23 +683,32 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
 		pgprot_t prot, struct page **pages, unsigned int page_shift)
 {
 	unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
+	unsigned int stride;
 
 	WARN_ON(page_shift < PAGE_SHIFT);
 
+	/*
+	 * Some users may allocate pages from high-order down to order 0.
+	 * We roughly check if the first page is a compound page. If so,
+	 * there is a chance to batch multiple pages together.
+	 */
 	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
-			page_shift == PAGE_SHIFT)
+			(page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
 		return vmap_small_pages_range_noflush(addr, end, prot, pages);
 
-	for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
-		int err;
+	stride = 1U << (page_shift - PAGE_SHIFT);
+	for (i = 0; i < nr; ) {
+		int err, order;
 
-		err = vmap_range_noflush(addr, addr + (1UL << page_shift),
+		order = get_vmap_batch_order(pages, stride, nr - i, i);
+		err = vmap_range_noflush(addr, addr + (1UL << (page_shift + order)),
 					page_to_phys(pages[i]), prot,
-					page_shift);
+					page_shift + order);
 		if (err)
 			return err;
 
-		addr += 1UL << page_shift;
+		addr += 1UL  << (page_shift + order);
+		i += 1U << (order + page_shift - PAGE_SHIFT);
 	}
 
 	return 0;
-- 
2.39.3 (Apple Git-146)

Re: [PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible
Posted by David Hildenbrand (Red Hat) 14 hours ago
On 11/22/25 10:03, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> In many cases, the pages passed to vmap() may include
> high-order pages—for example, the systemheap often allocates
> pages in descending order: order 8, then 4, then 0. Currently,
> vmap() iterates over every page individually—even the pages
> inside a high-order block are handled one by one. This patch
> detects high-order pages and maps them as a single contiguous
> block whenever possible.
> 
> Another possibility is to implement a new API, vmap_sg().
> However, that change seems to be quite large in scope.
> 
> When vmapping a 128MB dma-buf using the systemheap,
> this RFC appears to make system_heap_do_vmap() 16× faster:
> 
> W/ patch:
> [   51.363682] system_heap_do_vmap took 2474000 ns
> [   53.307044] system_heap_do_vmap took 2469008 ns
> [   55.061985] system_heap_do_vmap took 2519008 ns
> [   56.653810] system_heap_do_vmap took 2674000 ns
> 
> W/o patch:
> [    8.260880] system_heap_do_vmap took 39490000 ns
> [   32.513292] system_heap_do_vmap took 38784000 ns
> [   82.673374] system_heap_do_vmap took 40711008 ns
> [   84.579062] system_heap_do_vmap took 40236000 ns
> 
> Cc: Uladzislau Rezki <urezki@gmail.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: John Stultz <jstultz@google.com>
> Cc: Maxime Ripard <mripard@kernel.org>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
>   mm/vmalloc.c | 49 +++++++++++++++++++++++++++++++++++++++++++------
>   1 file changed, 43 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 0832f944544c..af2e3e8c052a 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -642,6 +642,34 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
>   	return err;
>   }
>   
> +static inline int get_vmap_batch_order(struct page **pages,
> +		unsigned int stride,
> +		int max_steps,
> +		unsigned int idx)

These fit into fewer lines.

ideally

\t\tunsigned int stride, int max_steps, unsigned int idx)

> +{

int order, nr_pages, i;
struct page *base;

But I think you can just drop "base". And order.

> +	/*
> +	 * Currently, batching is only supported in vmap_pages_range
> +	 * when page_shift == PAGE_SHIFT.
> +	 */
> +	if (stride != 1)
> +		return 0;
> +
> +	struct page *base = pages[idx];
> +	if (!PageHead(base))
> +		return 0;
> +
> +	int order = compound_order(base);
> +	int nr_pages = 1 << order;


You can drop the head check etc and simply do

nr_pages = compound_nr(pages[idx]);
if (nr_pages == 1)
	return 0;

Which raises the question: are these things folios? I assume not.

> +
> +	if (max_steps < nr_pages)
> +		return 0;
> +
> +	for (int i = 0; i < nr_pages; i++)
> +		if (pages[idx + i] != base + i)
> +			return 0;

if (num_pages_contiguous(&pages[idx], nr_pages) == nr_pages)
	return compound_order(pages[idx]);
return 0;
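
Taken together, the whole helper might then shrink to something like this
(just a sketch based on the comments above, untested):

static inline int get_vmap_batch_order(struct page **pages,
		unsigned int stride, int max_steps, unsigned int idx)
{
	int nr_pages;

	/*
	 * Batching is only attempted in vmap_pages_range() when
	 * page_shift == PAGE_SHIFT.
	 */
	if (stride != 1)
		return 0;

	/* compound_nr() returns 1 for order-0 and tail pages. */
	nr_pages = compound_nr(pages[idx]);
	if (nr_pages == 1 || max_steps < nr_pages)
		return 0;

	if (num_pages_contiguous(&pages[idx], nr_pages) == nr_pages)
		return compound_order(pages[idx]);
	return 0;
}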

-- 
Cheers

David
Re: [PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible
Posted by Barry Song 3 hours ago
On Mon, Dec 1, 2025 at 6:36 PM David Hildenbrand (Red Hat)
<david@kernel.org> wrote:
[...]
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 0832f944544c..af2e3e8c052a 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -642,6 +642,34 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
> >       return err;
> >   }
> >
> > +static inline int get_vmap_batch_order(struct page **pages,
> > +             unsigned int stride,
> > +             int max_steps,
> > +             unsigned int idx)
>
> These fit into less lines.
>
> ideally
>
> \t\tunsigned int stride, int max_steps, unsigned int idx)

Right, thanks!

>
> > +{
>
> int order, nr_pages, i;
> struct page *base;
>
> But I think you can just drop "base". And order.

Right, thanks!

>
> > +     /*
> > +      * Currently, batching is only supported in vmap_pages_range
> > +      * when page_shift == PAGE_SHIFT.
> > +      */
> > +     if (stride != 1)
> > +             return 0;
> > +
> > +     struct page *base = pages[idx];
> > +     if (!PageHead(base))
> > +             return 0;
> > +
> > +     int order = compound_order(base);
> > +     int nr_pages = 1 << order;
>
>
> You can drop the head check etc and simply do
>
> nr_pages = compound_nr(pages[idx]);
> if (nr_pages == 1)
>         return 0;
>

Nice, since compound_nr() returns 1 for tail pages as well.

> Which raises the question: are these things folios? I assume not.

In my case, it’s simply alloc_pages() with __GFP_COMP. I assume that folios
allocated via folio_alloc() would also automatically benefit from this patch?

Currently, vmap() takes a pages array as an argument, so even for a folio
we need to expand it into individual pages. Simply passing an array of
folios to vmap() likely won’t work, since a mapping could start and end at
subpages in the middle of a folio.
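
Something along these lines would still be needed on the caller side
(a hypothetical sketch only, named here just for illustration):

/* Hypothetical helper: expand a folio into the pages[] array vmap() expects. */
static void *vmap_folio(struct folio *folio)
{
	long i, nr = folio_nr_pages(folio);
	struct page **pages;
	void *vaddr;

	pages = kvmalloc_array(nr, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return NULL;
	for (i = 0; i < nr; i++)
		pages[i] = folio_page(folio, i);
	vaddr = vmap(pages, nr, VM_MAP, PAGE_KERNEL);
	/* vmap() does not keep the array unless VM_MAP_PUT_PAGES is used. */
	kvfree(pages);
	return vaddr;
}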

Thanks
Barry
Re: [PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible
Posted by Uladzislau Rezki 4 days, 7 hours ago
On Sat, Nov 22, 2025 at 05:03:43PM +0800, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> In many cases, the pages passed to vmap() may include
> high-order pages—for example, the systemheap often allocates
> pages in descending order: order 8, then 4, then 0. Currently,
> vmap() iterates over every page individually—even the pages
> inside a high-order block are handled one by one. This patch
> detects high-order pages and maps them as a single contiguous
> block whenever possible.
> 
> Another possibility is to implement a new API, vmap_sg().
> However, that change seems to be quite large in scope.
> 
> When vmapping a 128MB dma-buf using the systemheap,
> this RFC appears to make system_heap_do_vmap() 16× faster:
> 
> W/ patch:
> [   51.363682] system_heap_do_vmap took 2474000 ns
> [   53.307044] system_heap_do_vmap took 2469008 ns
> [   55.061985] system_heap_do_vmap took 2519008 ns
> [   56.653810] system_heap_do_vmap took 2674000 ns
> 
> W/o patch:
> [    8.260880] system_heap_do_vmap took 39490000 ns
> [   32.513292] system_heap_do_vmap took 38784000 ns
> [   82.673374] system_heap_do_vmap took 40711008 ns
> [   84.579062] system_heap_do_vmap took 40236000 ns
> 
> Cc: Uladzislau Rezki <urezki@gmail.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: John Stultz <jstultz@google.com>
> Cc: Maxime Ripard <mripard@kernel.org>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
>  mm/vmalloc.c | 49 +++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 43 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 0832f944544c..af2e3e8c052a 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -642,6 +642,34 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
>  	return err;
>  }
>  
> +static inline int get_vmap_batch_order(struct page **pages,
> +		unsigned int stride,
> +		int max_steps,
> +		unsigned int idx)
> +{
> +	/*
> +	 * Currently, batching is only supported in vmap_pages_range
> +	 * when page_shift == PAGE_SHIFT.
> +	 */
> +	if (stride != 1)
> +		return 0;
> +
> +	struct page *base = pages[idx];
> +	if (!PageHead(base))
> +		return 0;
> +
> +	int order = compound_order(base);
> +	int nr_pages = 1 << order;
> +
> +	if (max_steps < nr_pages)
> +		return 0;
> +
> +	for (int i = 0; i < nr_pages; i++)
> +		if (pages[idx + i] != base + i)
> +			return 0;
> +	return order;
> +}
> +
>  /*
>   * vmap_pages_range_noflush is similar to vmap_pages_range, but does not
>   * flush caches.
> @@ -655,23 +683,32 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
>  		pgprot_t prot, struct page **pages, unsigned int page_shift)
>  {
>  	unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
> +	unsigned int stride;
>  
>  	WARN_ON(page_shift < PAGE_SHIFT);
>  
> +	/*
> +	 * Some users may allocate pages from high-order down to order 0.
> +	 * We roughly check if the first page is a compound page. If so,
> +	 * there is a chance to batch multiple pages together.
> +	 */
>  	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> -			page_shift == PAGE_SHIFT)
> +			(page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
>
Do we support __GFP_COMP as a vmalloc/vmap flag? As I see from the latest:

/*
 * See __vmalloc_node_range() for a clear list of supported vmalloc flags.
 * This gfp lists all flags currently passed through vmalloc. Currently,
 * __GFP_ZERO is used by BPF and __GFP_NORETRY is used by percpu. Both drm
 * and BPF also use GFP_USER. Additionally, various users pass
 * GFP_KERNEL_ACCOUNT. Xfs uses __GFP_NOLOCKDEP.
 */
#define GFP_VMALLOC_SUPPORTED (GFP_KERNEL | GFP_ATOMIC | GFP_NOWAIT |\
                               __GFP_NOFAIL |  __GFP_ZERO | __GFP_NORETRY |\
                               GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\
                               GFP_USER | __GFP_NOLOCKDEP)

Could you please clarify when PageCompound(pages[0]) returns true?

>  		return vmap_small_pages_range_noflush(addr, end, prot, pages);
>  
> -	for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
> -		int err;
> +	stride = 1U << (page_shift - PAGE_SHIFT);
> +	for (i = 0; i < nr; ) {
> +		int err, order;
>  
> -		err = vmap_range_noflush(addr, addr + (1UL << page_shift),
> +		order = get_vmap_batch_order(pages, stride, nr - i, i);
> +		err = vmap_range_noflush(addr, addr + (1UL << (page_shift + order)),
>  					page_to_phys(pages[i]), prot,
> -					page_shift);
> +					page_shift + order);
>  		if (err)
>  			return err;
>  
> -		addr += 1UL << page_shift;
> +		addr += 1UL  << (page_shift + order);
> +		i += 1U << (order + page_shift - PAGE_SHIFT);
>  	}
>  
>  	return 0;
> -- 
> 2.39.3 (Apple Git-146)
> 
Re: [PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible
Posted by Barry Song 4 days, 4 hours ago
> >
> > +     /*
> > +      * Some users may allocate pages from high-order down to order 0.
> > +      * We roughly check if the first page is a compound page. If so,
> > +      * there is a chance to batch multiple pages together.
> > +      */
> >       if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> > -                     page_shift == PAGE_SHIFT)
> > +                     (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
> >
> Do we support __GFP_COMP as vmalloc/vmap flag? As i see from latest:

This is not the case for vmalloc(), but it applies to dma-bufs that are
allocated using alloc_pages() with __GFP_COMP:

#define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO)
#define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
                                | __GFP_NORETRY) & ~__GFP_RECLAIM) \
                                | __GFP_COMP)

>
> /*
>  * See __vmalloc_node_range() for a clear list of supported vmalloc flags.
>  * This gfp lists all flags currently passed through vmalloc. Currently,
>  * __GFP_ZERO is used by BPF and __GFP_NORETRY is used by percpu. Both drm
>  * and BPF also use GFP_USER. Additionally, various users pass
>  * GFP_KERNEL_ACCOUNT. Xfs uses __GFP_NOLOCKDEP.
>  */
> #define GFP_VMALLOC_SUPPORTED (GFP_KERNEL | GFP_ATOMIC | GFP_NOWAIT |\
>                                __GFP_NOFAIL |  __GFP_ZERO | __GFP_NORETRY |\
>                                GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\
>                                GFP_USER | __GFP_NOLOCKDEP)
>
> Could you please clarify when PageCompound(pages[0]) returns true?
>

In this case, dma-buf attempts to allocate as many compound high-order pages
as possible, falling back to 0-order allocations if necessary.
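
Roughly, the allocation side looks like the following (paraphrased from
drivers/dma-buf/heaps/system_heap.c, details trimmed):

static const unsigned int orders[] = { 8, 4, 0 };

static struct page *alloc_largest_available(unsigned long size,
					    unsigned int max_order)
{
	struct page *page;
	int i;

	for (i = 0; i < ARRAY_SIZE(orders); i++) {
		if (size < (PAGE_SIZE << orders[i]) || max_order < orders[i])
			continue;
		/* order > 0 uses HIGH_ORDER_GFP above, i.e. __GFP_COMP */
		page = alloc_pages(orders[i] ? HIGH_ORDER_GFP : LOW_ORDER_GFP,
				   orders[i]);
		if (page)
			return page;
	}
	return NULL;
}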

Then, dma_buf_vmap() is called by the GPU drivers:

 1    404  drivers/accel/amdxdna/amdxdna_gem.c <<amdxdna_gem_obj_vmap>>
             dma_buf_vmap(abo->dma_buf, map);
 2   1568  drivers/dma-buf/dma-buf.c <<dma_buf_vmap_unlocked>>
             ret = dma_buf_vmap(dmabuf, map);
 3    354  drivers/gpu/drm/drm_gem_shmem_helper.c <<drm_gem_shmem_vmap_locked>>
             ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
 4     85  drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c <<etnaviv_gem_prime_vmap_impl>>
             ret = dma_buf_vmap(etnaviv_obj->base.import_attach->dmabuf, &map);
 5    433  drivers/gpu/drm/vmwgfx/vmwgfx_blit.c <<map_external>>
             ret = dma_buf_vmap(bo->tbo.base.dma_buf, map);
 6     88  drivers/gpu/drm/vmwgfx/vmwgfx_gem.c <<vmw_gem_vmap>>
             ret = dma_buf_vmap(obj->import_attach->dmabuf, map);

Thanks
Barry
Re: [PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible
Posted by Uladzislau Rezki 13 hours ago
On Fri, Nov 28, 2025 at 04:43:54AM +0800, Barry Song wrote:
> > >
> > > +     /*
> > > +      * Some users may allocate pages from high-order down to order 0.
> > > +      * We roughly check if the first page is a compound page. If so,
> > > +      * there is a chance to batch multiple pages together.
> > > +      */
> > >       if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> > > -                     page_shift == PAGE_SHIFT)
> > > +                     (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
> > >
> > Do we support __GFP_COMP as vmalloc/vmap flag? As i see from latest:
> 
> This is not the case for vmalloc, but applies to dma-bufs that are allocated
> using alloc_pages() with GFP_COMP.
> 
> #define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO)
> #define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
>                                 | __GFP_NORETRY) & ~__GFP_RECLAIM) \
>                                 | __GFP_COMP)
> 
> >
> > /*
> >  * See __vmalloc_node_range() for a clear list of supported vmalloc flags.
> >  * This gfp lists all flags currently passed through vmalloc. Currently,
> >  * __GFP_ZERO is used by BPF and __GFP_NORETRY is used by percpu. Both drm
> >  * and BPF also use GFP_USER. Additionally, various users pass
> >  * GFP_KERNEL_ACCOUNT. Xfs uses __GFP_NOLOCKDEP.
> >  */
> > #define GFP_VMALLOC_SUPPORTED (GFP_KERNEL | GFP_ATOMIC | GFP_NOWAIT |\
> >                                __GFP_NOFAIL |  __GFP_ZERO | __GFP_NORETRY |\
> >                                GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\
> >                                GFP_USER | __GFP_NOLOCKDEP)
> >
> > Could you please clarify when PageCompound(pages[0]) returns true?
> >
> 
> In this case, dma-buf attempts to allocate as many compound high-order pages
> as possible, falling back to 0-order allocations if necessary.
> 
OK, so it is the dma-buf heap allocation, not vmalloc, that uses it.

> Then, dma_buf_vmap() is called by the GPU drivers:
> 
>  1    404  drivers/accel/amdxdna/amdxdna_gem.c <<amdxdna_gem_obj_vmap>>
>              dma_buf_vmap(abo->dma_buf, map);
>    2   1568  drivers/dma-buf/dma-buf.c <<dma_buf_vmap_unlocked>>
>              ret = dma_buf_vmap(dmabuf, map);
>    3    354  drivers/gpu/drm/drm_gem_shmem_helper.c
> <<drm_gem_shmem_vmap_locked>>
>              ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
>    4     85  drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> <<etnaviv_gem_prime_vmap_impl>>
>              ret = dma_buf_vmap(etnaviv_obj->base.import_attach->dmabuf, &map);
>    5    433  drivers/gpu/drm/vmwgfx/vmwgfx_blit.c <<map_external>>
>              ret = dma_buf_vmap(bo->tbo.base.dma_buf, map);
>    6     88  drivers/gpu/drm/vmwgfx/vmwgfx_gem.c <<vmw_gem_vmap>>
>              ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
> 
Thank you for the clarification. It would be good to reflect that in the
commit message. Also, please note that:

>       if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> -                     page_shift == PAGE_SHIFT)
> +                     (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
>
we rely on the page_shift == PAGE_SHIFT condition for non-sleeping vmalloc()
allocations (GFP_ATOMIC, GFP_NOWAIT), so that we go via the
vmap_small_pages_range_noflush() path. Your patch also adds
!PageCompound(pages[0]). That is not a problem, since it is the vmap() path,
but we need a comment about it.

--
Uladzislau Rezki
Re: [PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible
Posted by Barry Song 2 hours ago
On Mon, Dec 1, 2025 at 7:08 PM Uladzislau Rezki <urezki@gmail.com> wrote:
>
> On Fri, Nov 28, 2025 at 04:43:54AM +0800, Barry Song wrote:
> > > >
> > > > +     /*
> > > > +      * Some users may allocate pages from high-order down to order 0.
> > > > +      * We roughly check if the first page is a compound page. If so,
> > > > +      * there is a chance to batch multiple pages together.
> > > > +      */
> > > >       if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> > > > -                     page_shift == PAGE_SHIFT)
> > > > +                     (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
> > > >
> > > Do we support __GFP_COMP as vmalloc/vmap flag? As i see from latest:
> >
> > This is not the case for vmalloc, but applies to dma-bufs that are allocated
> > using alloc_pages() with GFP_COMP.
> >
> > #define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO)
> > #define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
> >                                 | __GFP_NORETRY) & ~__GFP_RECLAIM) \
> >                                 | __GFP_COMP)
> >
> > >
> > > /*
> > >  * See __vmalloc_node_range() for a clear list of supported vmalloc flags.
> > >  * This gfp lists all flags currently passed through vmalloc. Currently,
> > >  * __GFP_ZERO is used by BPF and __GFP_NORETRY is used by percpu. Both drm
> > >  * and BPF also use GFP_USER. Additionally, various users pass
> > >  * GFP_KERNEL_ACCOUNT. Xfs uses __GFP_NOLOCKDEP.
> > >  */
> > > #define GFP_VMALLOC_SUPPORTED (GFP_KERNEL | GFP_ATOMIC | GFP_NOWAIT |\
> > >                                __GFP_NOFAIL |  __GFP_ZERO | __GFP_NORETRY |\
> > >                                GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\
> > >                                GFP_USER | __GFP_NOLOCKDEP)
> > >
> > > Could you please clarify when PageCompound(pages[0]) returns true?
> > >
> >
> > In this case, dma-buf attempts to allocate as many compound high-order pages
> > as possible, falling back to 0-order allocations if necessary.
> >
> OK, it is folio who uses it.
>
> > Then, dma_buf_vmap() is called by the GPU drivers:
> >
> >  1    404  drivers/accel/amdxdna/amdxdna_gem.c <<amdxdna_gem_obj_vmap>>
> >              dma_buf_vmap(abo->dma_buf, map);
> >    2   1568  drivers/dma-buf/dma-buf.c <<dma_buf_vmap_unlocked>>
> >              ret = dma_buf_vmap(dmabuf, map);
> >    3    354  drivers/gpu/drm/drm_gem_shmem_helper.c
> > <<drm_gem_shmem_vmap_locked>>
> >              ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
> >    4     85  drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> > <<etnaviv_gem_prime_vmap_impl>>
> >              ret = dma_buf_vmap(etnaviv_obj->base.import_attach->dmabuf, &map);
> >    5    433  drivers/gpu/drm/vmwgfx/vmwgfx_blit.c <<map_external>>
> >              ret = dma_buf_vmap(bo->tbo.base.dma_buf, map);
> >    6     88  drivers/gpu/drm/vmwgfx/vmwgfx_gem.c <<vmw_gem_vmap>>
> >              ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
> >
> Thank you for clarification. That would be good to reflect it in the
> commit message. Also, please note that:

Sure.

>
> >       if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> > -                     page_shift == PAGE_SHIFT)
> > +                     (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
> >
> we rely on page_shift == PAGE_SHIFT condition for the non-sleep vmalloc()
> allocations(GFP_ATOMIC, GFP_NOWAIT), so we go via vmap_small_pages_range_noflush()
> path. Your patch adds !PageCompound(pages[0]) also. It is not a problem
> since it is vmap() path but we need to comment that.

Sure. Would the following work?

        /*
         * For vmap(), users may allocate pages from high orders down to
         * order 0, while always using PAGE_SHIFT as the page_shift.
         * We first check whether the initial page is a compound page. If so,
         * there may be an opportunity to batch multiple pages together.
         */
        if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
                        (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
                return vmap_small_pages_range_noflush(addr, end, prot, pages);
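
Or, to also capture the non-sleeping vmalloc() dependency you mention
(the wording below is just a suggestion):

        /*
         * Non-sleeping vmalloc() allocations (GFP_ATOMIC, GFP_NOWAIT) always
         * use page_shift == PAGE_SHIFT and must keep going through
         * vmap_small_pages_range_noflush() below.
         * For vmap(), users may allocate pages from high orders down to
         * order 0 while still passing PAGE_SHIFT; if the first page is a
         * compound page, there may be an opportunity to batch pages.
         */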

Thanks
Barry