[PATCH v1] mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0

Hailong Liu posted 1 patch 1 year, 6 months ago
There is a newer version of this series
[PATCH v1] mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0
Posted by Hailong Liu 1 year, 6 months ago
The __vmap_pages_range_noflush() assumes its argument pages** contains
pages with the same page shift. However, since commit e9c3cda4d86e
(mm, vmalloc: fix high order __GFP_NOFAIL allocations), if gfp_flags
includes __GFP_NOFAIL with high order in vm_area_alloc_pages()
and page allocation failed for high order, the pages** may contain
two different page shifts (high order and order-0). This could
lead __vmap_pages_range_noflush() to perform incorrect mappings,
potentially resulting in memory corruption.

Users might encounter this as follows (vmap_allow_huge = true, 2M is for PMD_SIZE):
kvmalloc(2M, __GFP_NOFAIL|GFP_X)
    __vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP)
        vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0
            vmap_pages_range()
                vmap_pages_range_noflush()
                    __vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens

We can remove the fallback code because if a high-order
allocation fails, __vmalloc_node_range_noprof() will retry with
order-0. It is therefore unnecessary to fall back to order-0
here, so fix this by removing the fallback code.

Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations")
Signed-off-by: Hailong Liu <hailong.liu@oppo.com>
Reported-by: Tangquan.Zheng <zhengtangquan@oppo.com>
Cc: <stable@vger.kernel.org>
CC: Barry Song <21cnbao@gmail.com>
CC: Baoquan He <bhe@redhat.com>
CC: Matthew Wilcox <willy@infradead.org>
---
 mm/vmalloc.c     | 11 ++---------
 mm/vmalloc.c.rej | 10 ++++++++++
 2 files changed, 12 insertions(+), 9 deletions(-)
 create mode 100644 mm/vmalloc.c.rej

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6b783baf12a1..af2de36549d6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3584,15 +3584,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 			page = alloc_pages_noprof(alloc_gfp, order);
 		else
 			page = alloc_pages_node_noprof(nid, alloc_gfp, order);
-		if (unlikely(!page)) {
-			if (!nofail)
-				break;
-
-			/* fall back to the zero order allocations */
-			alloc_gfp |= __GFP_NOFAIL;
-			order = 0;
-			continue;
-		}
+		if (unlikely(!page))
+			break;

 		/*
 		 * Higher order allocations must be able to be treated as
diff --git a/mm/vmalloc.c.rej b/mm/vmalloc.c.rej
new file mode 100644
index 000000000000..c28017088319
--- /dev/null
+++ b/mm/vmalloc.c.rej
@@ -0,0 +1,10 @@
+--- mm/vmalloc.c
++++ mm/vmalloc.c
+@@ -3000,6 +3005,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
+ 	unsigned int nr_allocated = 0;
+ 	gfp_t alloc_gfp = gfp;
+ 	bool nofail = false;
++	bool fallback = false;
+ 	struct page *page;
+ 	int i;
+
---
Baoquan suggested setting page_shift to 0 on fallback in [2] and raised
a concern about the performance of retrying with order-0. But IMO
retrying has these advantages:
- Saves memory if the high-order allocation fails.
- Keeps align and page_shift consistent.
- Makes use of the bulk allocator for order-0 pages.

[2] https://lore.kernel.org/lkml/20240725035318.471-1-hailong.liu@oppo.com/
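
For illustration only, here is a minimal user-space sketch of why mixing
page shifts in pages** is a problem. It is not kernel code: pages are
modeled as plain frame numbers, and model_count_mismapped() and
MODEL_PAGE_SHIFT are hypothetical names made up for this example. It
assumes 4K base pages and assumes that a mapping with a given page_shift
treats pages[i] as the first page of a physically contiguous block of
1 << (page_shift - PAGE_SHIFT) pages.

```c
#include <stddef.h>

/* Assumption for this model: 4 KiB base pages, so order 9 <-> shift 21. */
#define MODEL_PAGE_SHIFT 12u

/*
 * Hypothetical helper, not a kernel function.
 *
 * pfns[] stands in for the pages** array (one frame number per slot).
 * The walk advances in steps of one page_shift-sized block and assumes
 * each block is physically contiguous starting at pfns[i]. Every slot
 * whose real frame differs from the frame the block mapping would cover
 * is counted as mismapped.
 *
 * The caller must pass page_shift >= MODEL_PAGE_SHIFT.
 */
static size_t model_count_mismapped(const unsigned long *pfns, size_t nr,
				    unsigned int page_shift)
{
	size_t step = (size_t)1 << (page_shift - MODEL_PAGE_SHIFT);
	size_t bad = 0;

	for (size_t i = 0; i < nr; i += step) {
		for (size_t off = 0; off < step && i + off < nr; off++) {
			/* Frame covered by the block vs. frame actually held. */
			if (pfns[i] + off != pfns[i + off])
				bad++;
		}
	}
	return bad;
}
```

With 512 contiguous frames and page_shift = 21 the model reports no
mismapped slots. With 512 scattered order-0 frames and page_shift = 21
every slot except the first is mismapped, while the same array walked
with page_shift = 12 is fine.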
--
2.34.1
Re: [PATCH v1] mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0
Posted by Michal Hocko 1 year, 6 months ago
On Thu 08-08-24 20:00:58, Hailong Liu wrote:
> The __vmap_pages_range_noflush() assumes its argument pages** contains
> pages with the same page shift. However, since commit e9c3cda4d86e
> (mm, vmalloc: fix high order __GFP_NOFAIL allocations), if gfp_flags
> includes __GFP_NOFAIL with high order in vm_area_alloc_pages()
> and page allocation failed for high order, the pages** may contain
> two different page shifts (high order and order-0). This could
> lead __vmap_pages_range_noflush() to perform incorrect mappings,
> potentially resulting in memory corruption.
> 
> Users might encounter this as follows (vmap_allow_huge = true, 2M is for PMD_SIZE):
> kvmalloc(2M, __GFP_NOFAIL|GFP_X)
>     __vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP)
>         vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0
>             vmap_pages_range()
>                 vmap_pages_range_noflush()
>                     __vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens
> 
> We can remove the fallback code because if a high-order
> allocation fails, __vmalloc_node_range_noprof() will retry with
> order-0. It is therefore unnecessary to fall back to order-0
> here, so fix this by removing the fallback code.
> 
> Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations")
> Signed-off-by: Hailong Liu <hailong.liu@oppo.com>
> Reported-by: Tangquan.Zheng <zhengtangquan@oppo.com>
> Cc: <stable@vger.kernel.org>
> CC: Barry Song <21cnbao@gmail.com>
> CC: Baoquan He <bhe@redhat.com>
> CC: Matthew Wilcox <willy@infradead.org>
> ---
>  mm/vmalloc.c     | 11 ++---------
>  mm/vmalloc.c.rej | 10 ++++++++++

What is this?

>  2 files changed, 12 insertions(+), 9 deletions(-)
>  create mode 100644 mm/vmalloc.c.rej
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 6b783baf12a1..af2de36549d6 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3584,15 +3584,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  			page = alloc_pages_noprof(alloc_gfp, order);
>  		else
>  			page = alloc_pages_node_noprof(nid, alloc_gfp, order);
> -		if (unlikely(!page)) {
> -			if (!nofail)
> -				break;
> -
> -			/* fall back to the zero order allocations */
> -			alloc_gfp |= __GFP_NOFAIL;
> -			order = 0;
> -			continue;
> -		}
> +		if (unlikely(!page))
> +			break;

This just makes the NOFAIL allocation fail. So this is not a correct
fix.

> 
>  		/*
>  		 * Higher order allocations must be able to be treated as

-- 
Michal Hocko
SUSE Labs
Re: [PATCH v1] mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0
Posted by Michal Hocko 1 year, 6 months ago
On Fri 09-08-24 11:30:32, Michal Hocko wrote:
> On Thu 08-08-24 20:00:58, Hailong Liu wrote:
> > The __vmap_pages_range_noflush() assumes its argument pages** contains
> > pages with the same page shift. However, since commit e9c3cda4d86e
> > (mm, vmalloc: fix high order __GFP_NOFAIL allocations), if gfp_flags
> > includes __GFP_NOFAIL with high order in vm_area_alloc_pages()
> > and page allocation failed for high order, the pages** may contain
> > two different page shifts (high order and order-0). This could
> > lead __vmap_pages_range_noflush() to perform incorrect mappings,
> > potentially resulting in memory corruption.
> > 
> > Users might encounter this as follows (vmap_allow_huge = true, 2M is for PMD_SIZE):
> > kvmalloc(2M, __GFP_NOFAIL|GFP_X)
> >     __vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP)
> >         vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0
> >             vmap_pages_range()
> >                 vmap_pages_range_noflush()
> >                     __vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens
> > 
> > We can remove the fallback code because if a high-order
> > allocation fails, __vmalloc_node_range_noprof() will retry with
> > order-0. It is therefore unnecessary to fall back to order-0
> > here, so fix this by removing the fallback code.
> > 
> > Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations")
> > Signed-off-by: Hailong Liu <hailong.liu@oppo.com>
> > Reported-by: Tangquan.Zheng <zhengtangquan@oppo.com>
> > Cc: <stable@vger.kernel.org>
> > CC: Barry Song <21cnbao@gmail.com>
> > CC: Baoquan He <bhe@redhat.com>
> > CC: Matthew Wilcox <willy@infradead.org>
> > ---
> >  mm/vmalloc.c     | 11 ++---------
> >  mm/vmalloc.c.rej | 10 ++++++++++
> 
> What is this?
> 
> >  2 files changed, 12 insertions(+), 9 deletions(-)
> >  create mode 100644 mm/vmalloc.c.rej
> > 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 6b783baf12a1..af2de36549d6 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3584,15 +3584,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
> >  			page = alloc_pages_noprof(alloc_gfp, order);
> >  		else
> >  			page = alloc_pages_node_noprof(nid, alloc_gfp, order);
> > -		if (unlikely(!page)) {
> > -			if (!nofail)
> > -				break;
> > -
> > -			/* fall back to the zero order allocations */
> > -			alloc_gfp |= __GFP_NOFAIL;
> > -			order = 0;
> > -			continue;
> > -		}
> > +		if (unlikely(!page))
> > +			break;
> 
> This just makes the NOFAIL allocation fail. So this is not a correct
> fix.

OK, I can see a newer version
-- 
Michal Hocko
SUSE Labs