This function is pretty handy for any type of VMA to provide a size-aligned
VMA address when mmap(). Rename the function and export it.
About the rename:
- Dropping "THP" because it doesn't really have much to do with THP
internally.
- The suffix "_aligned" imply it is a helper to generate aligned virtual
address based on what is specified (which can be not PMD_SIZE).
Cc: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/linux/huge_mm.h | 14 +++++++++++++-
mm/huge_memory.c | 6 ++++--
2 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2f190c90192d..706488d92bb6 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -339,7 +339,10 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags,
vm_flags_t vm_flags);
-
+unsigned long mm_get_unmapped_area_aligned(struct file *filp,
+ unsigned long addr, unsigned long len,
+ loff_t off, unsigned long flags, unsigned long size,
+ vm_flags_t vm_flags);
bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
unsigned int new_order);
@@ -543,6 +546,15 @@ thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
return 0;
}
+static inline unsigned long
+mm_get_unmapped_area_aligned(struct file *filp,
+ unsigned long addr, unsigned long len,
+ loff_t off, unsigned long flags, unsigned long size,
+ vm_flags_t vm_flags)
+{
+ return 0;
+}
+
static inline bool
can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins)
{
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4734de1dc0ae..52f13a70562f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1088,7 +1088,7 @@ static inline bool is_transparent_hugepage(const struct folio *folio)
folio_test_large_rmappable(folio);
}
-static unsigned long __thp_get_unmapped_area(struct file *filp,
+unsigned long mm_get_unmapped_area_aligned(struct file *filp,
unsigned long addr, unsigned long len,
loff_t off, unsigned long flags, unsigned long size,
vm_flags_t vm_flags)
@@ -1132,6 +1132,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
ret += off_sub;
return ret;
}
+EXPORT_SYMBOL_GPL(mm_get_unmapped_area_aligned);
unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags,
@@ -1140,7 +1141,8 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add
unsigned long ret;
loff_t off = (loff_t)pgoff << PAGE_SHIFT;
- ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE, vm_flags);
+ ret = mm_get_unmapped_area_aligned(filp, addr, len, off, flags,
+ PMD_SIZE, vm_flags);
if (ret)
return ret;
--
2.49.0
* Peter Xu <peterx@redhat.com> [250613 09:41]: > This function is pretty handy for any type of VMA to provide a size-aligned > VMA address when mmap(). Rename the function and export it. > > About the rename: > > - Dropping "THP" because it doesn't really have much to do with THP > internally. > > - The suffix "_aligned" imply it is a helper to generate aligned virtual > address based on what is specified (which can be not PMD_SIZE). I am not okay with this. You are renaming a function to drop thp and not moving it into generic code. Either it is a generic function that lives with the generic code, or drop this change all together. If this function is going to be generic, please make the return of 0 valid and not an error. You are masking all errors to 0 currently. vm_unmapped_area_info has an align_mask, and that's only used for hugepages. It is wrong to have a generic function that does not use the generic struct element that exists for this reason. Is there a reason that align_mask doesn't work, or why it's not used? The return of mm_get_unmapped_area_vmflags() is not aligned with the return of this function. That is, the address returned from mm_get_unmapped_area_vmflags() differs from __thp_get_unmapped_area() based on MMF_TOPDOWN, and/or something related to off_sub? Anyways, since it's different from mm_get_unmapped_area() in this regard, we cannot rename it mm_get_unmapped_area_aligned() - it sounds like a helper and is different, by a lot. I also am not okay to export it for no reason. Also, is it okay to export something as gpl or does the copyright holder need to do that (I have no idea about this stuff, or maybe you work for the copyright holder)? The hint (addr) is also never checked for alignment in this function and we are appending _aligned() to the name. With this change we can now get an unaligned _aligned() address. This (probably) can happen with MAP_FIXED today, but I don't think we imply it's going to be aligned elsewhere. Hate for the Pre-existing code in this function: Dear lord: off_sub = (off - ret) & (size - 1); Using ret here is just (to be polite) unclear to save one assignment. I expect the one assignment would be optimised out by the compilers. My hate for the unsub off_sub grows: ret += off_sub; return ret; It is extremely frustrating that the self-documenting parts of this function have documentation while poorly named variables are used in puzzling calculations without any. ... Thanks, Liam
On Sat, Jun 14, 2025 at 01:23:30AM -0400, Liam R. Howlett wrote: > vm_unmapped_area_info has an align_mask, and that's only used for > hugepages. It is wrong to have a generic function that does not use the > generic struct element that exists for this reason. Is there a reason > that align_mask doesn't work, or why it's not used? I had the same question and looked into it for a bit. It does seem desirable, but also not entirely straightforward. I think the arch code for arch_get_unmapped_area() needs some redesign to produce the vm_unmapped_area_info() that the core code can update. Unfortunately there are numerous weird things in the arches :\ Like x86 shouldn't be setting alignment for huge tlbfs files, that should be done in the core code by huge tlbfs caling the new mm_get_unmapped_area_aligned() on its own.. So I think we should leave this hacky implementation for now and start building out the generic side to call it in the right places, then we can consider how to implement a better integration with the arch code. Also, probably 'aligned' is not the right name. This new function should be called by VMA owners that know they have pgoff aligned high order folios/pfns inside their mapping. The 'align' argument is the max order of their pgoff aligned folio/pfns. The purpose of the function is to adjust the resulting area to optimize for the high order folios that are present while following the uAPI rules for mmap. Maybe call it something like _order and document it like the above? > I also am not okay to export it for no reason. The next patches are the reason. > Also, is it okay to export something as gpl or does the copyright holder > need to do that (I have no idea about this stuff, or maybe you work for > the copyright holder)? Yes, you are always safe to use the GPL export. > The hint (addr) is also never checked for alignment in this function and > we are appending _aligned() to the name. With this change we can now > get an unaligned _aligned() address. This (probably) can happen with > MAP_FIXED today, but I don't think we imply it's going to be aligned > elsewhere. Should be documented at least.. Jason
On Mon, Jun 16, 2025 at 09:14:28AM -0300, Jason Gunthorpe wrote: > Also, probably 'aligned' is not the right name. This new function > should be called by VMA owners that know they have pgoff aligned high > order folios/pfns inside their mapping. The 'align' argument is the > max order of their pgoff aligned folio/pfns. > > The purpose of the function is to adjust the resulting area to > optimize for the high order folios that are present while following > the uAPI rules for mmap. > > Maybe call it something like _order and document it like the above? Right, if it were made clear this is explicitly related to higher order folios that would go a long way to making the generalisation more acceptable. But we definitely need to have it not filter errors if it's generic. > > > > I also am not okay to export it for no reason. > > The next patches are the reason. Regardless exporting it like this raises the bar for quality here.
On Mon, Jun 16, 2025 at 01:20:55PM +0100, Lorenzo Stoakes wrote: > On Mon, Jun 16, 2025 at 09:14:28AM -0300, Jason Gunthorpe wrote: > > Also, probably 'aligned' is not the right name. This new function > > should be called by VMA owners that know they have pgoff aligned high > > order folios/pfns inside their mapping. The 'align' argument is the > > max order of their pgoff aligned folio/pfns. > > > > The purpose of the function is to adjust the resulting area to > > optimize for the high order folios that are present while following > > the uAPI rules for mmap. > > > > Maybe call it something like _order and document it like the above? > > Right, if it were made clear this is explicitly related to higher order > folios that would go a long way to making the generalisation more > acceptable. > > But we definitely need to have it not filter errors if it's generic. > > > > > > > > I also am not okay to export it for no reason. > > > > The next patches are the reason. > > Regardless exporting it like this raises the bar for quality here. Yes, it is also possible we have the wrong op, I know get_unmapped_area() pre-exists, but if we are really cleaning this stuff then something like get_max_pte_order() is probably a saner op. It would return the size of the biggest pgoff aligned folio/pfn within the file. Then the core code would do the special logic without exporting this function. Jason
On Fri, Jun 13, 2025 at 09:41:09AM -0400, Peter Xu wrote: > This function is pretty handy for any type of VMA to provide a size-aligned > VMA address when mmap(). Rename the function and export it. This isn't a great commit message, 'to provide a size-aligned VMA address when mmap()' is super unclear - do you mean 'to provide an unmapped address that is also aligned to the specified size'? I think you should also specify your motive, renaming and exporting something because it seems handy isn't sufficient justifiation. Also why would we need to export this? What modules might want to use this? I'm generally not a huge fan of exporting things unless we strictly have to. > > About the rename: > > - Dropping "THP" because it doesn't really have much to do with THP > internally. Well the function seems specifically tailored to the THP use. I think you'll need to further adjust this. > > - The suffix "_aligned" imply it is a helper to generate aligned virtual > address based on what is specified (which can be not PMD_SIZE). Ack this is sensible! > > Cc: Zi Yan <ziy@nvidia.com> > Cc: Baolin Wang <baolin.wang@linux.alibaba.com> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> > Cc: Ryan Roberts <ryan.roberts@arm.com> > Cc: Dev Jain <dev.jain@arm.com> > Cc: Barry Song <baohua@kernel.org> > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > include/linux/huge_mm.h | 14 +++++++++++++- > mm/huge_memory.c | 6 ++++-- > 2 files changed, 17 insertions(+), 3 deletions(-) > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > index 2f190c90192d..706488d92bb6 100644 > --- a/include/linux/huge_mm.h > +++ b/include/linux/huge_mm.h Why are we keeping everything in huge_mm.h, huge_memory.c if this is being made generic? Surely this should be moved out into mm/mmap.c no? > @@ -339,7 +339,10 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, > unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > unsigned long len, unsigned long pgoff, unsigned long flags, > vm_flags_t vm_flags); > - > +unsigned long mm_get_unmapped_area_aligned(struct file *filp, > + unsigned long addr, unsigned long len, > + loff_t off, unsigned long flags, unsigned long size, > + vm_flags_t vm_flags); I echo Jason's comments about a kdoc and explanation of what this function does. > bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins); > int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, > unsigned int new_order); > @@ -543,6 +546,15 @@ thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > return 0; > } > > +static inline unsigned long > +mm_get_unmapped_area_aligned(struct file *filp, > + unsigned long addr, unsigned long len, > + loff_t off, unsigned long flags, unsigned long size, > + vm_flags_t vm_flags) > +{ > + return 0; > +} > + > static inline bool > can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins) > { > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 4734de1dc0ae..52f13a70562f 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -1088,7 +1088,7 @@ static inline bool is_transparent_hugepage(const struct folio *folio) > folio_test_large_rmappable(folio); > } > > -static unsigned long __thp_get_unmapped_area(struct file *filp, > +unsigned long mm_get_unmapped_area_aligned(struct file *filp, > unsigned long addr, unsigned long len, > loff_t off, unsigned long flags, unsigned long size, > vm_flags_t vm_flags) > @@ -1132,6 +1132,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp, > ret += off_sub; > return ret; > } > +EXPORT_SYMBOL_GPL(mm_get_unmapped_area_aligned); I'm not convinced about exporting this... shouldn't be export only if we explicitly have a user? I'd rather we didn't unless we needed to. > > unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > unsigned long len, unsigned long pgoff, unsigned long flags, > @@ -1140,7 +1141,8 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add > unsigned long ret; > loff_t off = (loff_t)pgoff << PAGE_SHIFT; > > - ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE, vm_flags); > + ret = mm_get_unmapped_area_aligned(filp, addr, len, off, flags, > + PMD_SIZE, vm_flags); > if (ret) > return ret; > > -- > 2.49.0 > So, you don't touch the original function but there's stuff there I think we need to think about if this is generalised. E.g.: if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) return 0; This still valid? /* * The failure might be due to length padding. The caller will retry * without the padding. */ if (IS_ERR_VALUE(ret)) return 0; This is assuming things the (currently single) caller will do, that is no longer an assumption you can make, especially if exported. Actually you maybe want to abstract the whole of thp_get_unmapped_area_vmflags() no? As this has a fallback mode? /* * Do not try to align to THP boundary if allocation at the address * hint succeeds. */ if (ret == addr) return addr; What was that about this no longer being relevant to THP? :>) Are all of these 'return 0' cases expected by any sensible caller? It seems like it's a way for thp_get_unmapped_area_vmflags() to recognise when to fall back to non-aligned?
On Fri, Jun 13, 2025 at 04:36:57PM +0100, Lorenzo Stoakes wrote: > On Fri, Jun 13, 2025 at 09:41:09AM -0400, Peter Xu wrote: > > This function is pretty handy for any type of VMA to provide a size-aligned > > VMA address when mmap(). Rename the function and export it. > > This isn't a great commit message, 'to provide a size-aligned VMA address when > mmap()' is super unclear - do you mean 'to provide an unmapped address that is > also aligned to the specified size'? I sincerely don't know the difference, not a native speaker here.. Suggestions welcomed, I can update to whatever both of us agree on. > > I think you should also specify your motive, renaming and exporting something > because it seems handy isn't sufficient justifiation. > > Also why would we need to export this? What modules might want to use this? I'm > generally not a huge fan of exporting things unless we strictly have to. It's one of the major reasons why I sent this together with the VFIO patches. It'll be used in VFIO patches that is in the same series. I will mention it in the commit message when repost. > > > > > About the rename: > > > > - Dropping "THP" because it doesn't really have much to do with THP > > internally. > > Well the function seems specifically tailored to the THP use. I think you'll > need to further adjust this. Actually.. it is almost exactly what I need so far. I can justify it below. > > > > > - The suffix "_aligned" imply it is a helper to generate aligned virtual > > address based on what is specified (which can be not PMD_SIZE). > > Ack this is sensible! > > > > > Cc: Zi Yan <ziy@nvidia.com> > > Cc: Baolin Wang <baolin.wang@linux.alibaba.com> > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> > > Cc: Ryan Roberts <ryan.roberts@arm.com> > > Cc: Dev Jain <dev.jain@arm.com> > > Cc: Barry Song <baohua@kernel.org> > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > include/linux/huge_mm.h | 14 +++++++++++++- > > mm/huge_memory.c | 6 ++++-- > > 2 files changed, 17 insertions(+), 3 deletions(-) > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > > index 2f190c90192d..706488d92bb6 100644 > > --- a/include/linux/huge_mm.h > > +++ b/include/linux/huge_mm.h > > Why are we keeping everything in huge_mm.h, huge_memory.c if this is being made > generic? > > Surely this should be moved out into mm/mmap.c no? No objections, but I suggest a separate discussion and patch submission when the original function resides in huge_memory.c. Hope it's ok for you. > > > @@ -339,7 +339,10 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, > > unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > > unsigned long len, unsigned long pgoff, unsigned long flags, > > vm_flags_t vm_flags); > > - > > +unsigned long mm_get_unmapped_area_aligned(struct file *filp, > > + unsigned long addr, unsigned long len, > > + loff_t off, unsigned long flags, unsigned long size, > > + vm_flags_t vm_flags); > > I echo Jason's comments about a kdoc and explanation of what this function does. > > > bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins); > > int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, > > unsigned int new_order); > > @@ -543,6 +546,15 @@ thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > > return 0; > > } > > > > +static inline unsigned long > > +mm_get_unmapped_area_aligned(struct file *filp, > > + unsigned long addr, unsigned long len, > > + loff_t off, unsigned long flags, unsigned long size, > > + vm_flags_t vm_flags) > > +{ > > + return 0; > > +} > > + > > static inline bool > > can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins) > > { > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > index 4734de1dc0ae..52f13a70562f 100644 > > --- a/mm/huge_memory.c > > +++ b/mm/huge_memory.c > > @@ -1088,7 +1088,7 @@ static inline bool is_transparent_hugepage(const struct folio *folio) > > folio_test_large_rmappable(folio); > > } > > > > -static unsigned long __thp_get_unmapped_area(struct file *filp, > > +unsigned long mm_get_unmapped_area_aligned(struct file *filp, > > unsigned long addr, unsigned long len, > > loff_t off, unsigned long flags, unsigned long size, > > vm_flags_t vm_flags) > > @@ -1132,6 +1132,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp, > > ret += off_sub; > > return ret; > > } > > +EXPORT_SYMBOL_GPL(mm_get_unmapped_area_aligned); > > I'm not convinced about exporting this... shouldn't be export only if we > explicitly have a user? > > I'd rather we didn't unless we needed to. > > > > > unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > > unsigned long len, unsigned long pgoff, unsigned long flags, > > @@ -1140,7 +1141,8 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add > > unsigned long ret; > > loff_t off = (loff_t)pgoff << PAGE_SHIFT; > > > > - ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE, vm_flags); > > + ret = mm_get_unmapped_area_aligned(filp, addr, len, off, flags, > > + PMD_SIZE, vm_flags); > > if (ret) > > return ret; > > > > -- > > 2.49.0 > > > > So, you don't touch the original function but there's stuff there I think we > need to think about if this is generalised. > > E.g.: > > if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) > return 0; > > This still valid? Yes. I want this feature (for VFIO) to not be enabled on 32bits, and not enabled with compat syscals. > > /* > * The failure might be due to length padding. The caller will retry > * without the padding. > */ > if (IS_ERR_VALUE(ret)) > return 0; > > This is assuming things the (currently single) caller will do, that is no longer > an assumption you can make, especially if exported. It's part of core function we want from a generic helper. We want to know when the va allocation, after padded, would fail due to the padding. Then the caller can decide what to do next. It needs to fail here properly. > > Actually you maybe want to abstract the whole of thp_get_unmapped_area_vmflags() > no? As this has a fallback mode? > > /* > * Do not try to align to THP boundary if allocation at the address > * hint succeeds. > */ > if (ret == addr) > return addr; This is not a fallback. This is when user specified a hint address (no matter with / without MAP_FIXED), if that address works then we should reuse that address, ignoring the alignment requirement from the driver. This is exactly the behavior VFIO needs, and this should also be the suggested behavior for whatever new drivers that would like to start using this generic helper. > > What was that about this no longer being relevant to THP? :>) > > Are all of these 'return 0' cases expected by any sensible caller? It seems like > it's a way for thp_get_unmapped_area_vmflags() to recognise when to fall back to > non-aligned? Hope above justfies everything. It's my intention to reuse everything here. If you have any concern on any of the "return 0" cases in the function being exported, please shoot, we can discuss. Thanks, -- Peter Xu
On Fri, Jun 13, 2025 at 02:45:31PM -0400, Peter Xu wrote: > On Fri, Jun 13, 2025 at 04:36:57PM +0100, Lorenzo Stoakes wrote: > > On Fri, Jun 13, 2025 at 09:41:09AM -0400, Peter Xu wrote: > > > This function is pretty handy for any type of VMA to provide a size-aligned > > > VMA address when mmap(). Rename the function and export it. > > > > This isn't a great commit message, 'to provide a size-aligned VMA address when > > mmap()' is super unclear - do you mean 'to provide an unmapped address that is > > also aligned to the specified size'? > > I sincerely don't know the difference, not a native speaker here.. > Suggestions welcomed, I can update to whatever both of us agree on. Sure, sorry I don't mean to be pedantic I just think it would be clearer to sort of expand upon this, as the commit message is rather short. I think saying something like this function allows you to locate an unmapped region which is aligned to the specified size should suffice. > > > > > I think you should also specify your motive, renaming and exporting something > > because it seems handy isn't sufficient justifiation. > > > > Also why would we need to export this? What modules might want to use this? I'm > > generally not a huge fan of exporting things unless we strictly have to. > > It's one of the major reasons why I sent this together with the VFIO > patches. It'll be used in VFIO patches that is in the same series. I will > mention it in the commit message when repost. OK cool, I've not dug through those as not my area, really it's about having the appropriate justification. I'm always inclined to not want us to export things by default, based on experience of finding 'unusual' uses of various mm interfaces in drivers in the past which have caused problems :) But of course there are situations that warrant it, they just need to be spelled out. > > > > > > > > > About the rename: > > > > > > - Dropping "THP" because it doesn't really have much to do with THP > > > internally. > > > > Well the function seems specifically tailored to the THP use. I think you'll > > need to further adjust this. > > Actually.. it is almost exactly what I need so far. I can justify it below. Yeah, but it's not a general function that gives you an unmapped area that is aligned. It's a 'function that gets you an aligned unmapped area but only for 64-bit kernels and when you are not invoking it from a compat syscall and returns 0 instead of errors'. This doesn't sound general to me? > > > > > > > > > - The suffix "_aligned" imply it is a helper to generate aligned virtual > > > address based on what is specified (which can be not PMD_SIZE). > > > > Ack this is sensible! > > > > > > > > Cc: Zi Yan <ziy@nvidia.com> > > > Cc: Baolin Wang <baolin.wang@linux.alibaba.com> > > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > > Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> > > > Cc: Ryan Roberts <ryan.roberts@arm.com> > > > Cc: Dev Jain <dev.jain@arm.com> > > > Cc: Barry Song <baohua@kernel.org> > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > > --- > > > include/linux/huge_mm.h | 14 +++++++++++++- > > > mm/huge_memory.c | 6 ++++-- > > > 2 files changed, 17 insertions(+), 3 deletions(-) > > > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > > > index 2f190c90192d..706488d92bb6 100644 > > > --- a/include/linux/huge_mm.h > > > +++ b/include/linux/huge_mm.h > > > > Why are we keeping everything in huge_mm.h, huge_memory.c if this is being made > > generic? > > > > Surely this should be moved out into mm/mmap.c no? > > No objections, but I suggest a separate discussion and patch submission > when the original function resides in huge_memory.c. Hope it's ok for you. I like to be as flexible as I can be in review, but I'm afraid I'm going to have to be annoying about this one :) It simply makes no sense to have non-THP stuff in 'the THP file'. Also this makes this a general memory mapping function that should live with the other related code. I don't really think much discussion is required here? You could do this as 2 separate commits if that'd make life easier? Sorry to be a pain here, but I'm really allergic to our having random unrelated things in the wrong files, it's something mm has done rather too much... > > > > > > @@ -339,7 +339,10 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, > > > unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > > > unsigned long len, unsigned long pgoff, unsigned long flags, > > > vm_flags_t vm_flags); > > > - > > > +unsigned long mm_get_unmapped_area_aligned(struct file *filp, > > > + unsigned long addr, unsigned long len, > > > + loff_t off, unsigned long flags, unsigned long size, > > > + vm_flags_t vm_flags); > > > > I echo Jason's comments about a kdoc and explanation of what this function does. > > > > > bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins); > > > int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, > > > unsigned int new_order); > > > @@ -543,6 +546,15 @@ thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > > > return 0; > > > } > > > > > > +static inline unsigned long > > > +mm_get_unmapped_area_aligned(struct file *filp, > > > + unsigned long addr, unsigned long len, > > > + loff_t off, unsigned long flags, unsigned long size, > > > + vm_flags_t vm_flags) > > > +{ > > > + return 0; > > > +} > > > + > > > static inline bool > > > can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins) > > > { > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > > index 4734de1dc0ae..52f13a70562f 100644 > > > --- a/mm/huge_memory.c > > > +++ b/mm/huge_memory.c > > > @@ -1088,7 +1088,7 @@ static inline bool is_transparent_hugepage(const struct folio *folio) > > > folio_test_large_rmappable(folio); > > > } > > > > > > -static unsigned long __thp_get_unmapped_area(struct file *filp, > > > +unsigned long mm_get_unmapped_area_aligned(struct file *filp, > > > unsigned long addr, unsigned long len, > > > loff_t off, unsigned long flags, unsigned long size, > > > vm_flags_t vm_flags) > > > @@ -1132,6 +1132,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp, > > > ret += off_sub; > > > return ret; > > > } > > > +EXPORT_SYMBOL_GPL(mm_get_unmapped_area_aligned); > > > > I'm not convinced about exporting this... shouldn't be export only if we > > explicitly have a user? > > > > I'd rather we didn't unless we needed to. > > > > > > > > unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > > > unsigned long len, unsigned long pgoff, unsigned long flags, > > > @@ -1140,7 +1141,8 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add > > > unsigned long ret; > > > loff_t off = (loff_t)pgoff << PAGE_SHIFT; > > > > > > - ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE, vm_flags); > > > + ret = mm_get_unmapped_area_aligned(filp, addr, len, off, flags, > > > + PMD_SIZE, vm_flags); > > > if (ret) > > > return ret; > > > > > > -- > > > 2.49.0 > > > > > > > So, you don't touch the original function but there's stuff there I think we > > need to think about if this is generalised. > > > > E.g.: > > > > if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) > > return 0; > > > > This still valid? > > Yes. I want this feature (for VFIO) to not be enabled on 32bits, and not > enabled with compat syscals. OK, but then is this a 'general' function any more? These checks were introduced by commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment on 32 bit") and so are _absolutely specifically_ intended for a THP use-case. And now they _just happen_ to be useful to you but nothing about the function name suggests that this is the case? I mean it seems like you should be doing this check separately in both VFIO and THP code and having the 'general 'function not do this no? > > > > > /* > > * The failure might be due to length padding. The caller will retry > > * without the padding. > > */ > > if (IS_ERR_VALUE(ret)) > > return 0; > > > > This is assuming things the (currently single) caller will do, that is no longer > > an assumption you can make, especially if exported. > > It's part of core function we want from a generic helper. We want to know > when the va allocation, after padded, would fail due to the padding. Then > the caller can decide what to do next. It needs to fail here properly. I'm no sure I understand what you mean? It's not just this case, it's basically any error condition results in 0. It's actually quite dangerous, as the get_unmapped_area() functions are meant to return either an error value or the located address _and zero is a valid response_. So if somebody used this function naively, they'd potentially have a very nasty bug occur when an error arose. If you want to export this, I just don't think we can have this be a thing here. > > > > > Actually you maybe want to abstract the whole of thp_get_unmapped_area_vmflags() > > no? As this has a fallback mode? > > > > /* > > * Do not try to align to THP boundary if allocation at the address > > * hint succeeds. > > */ > > if (ret == addr) > > return addr; > > This is not a fallback. This is when user specified a hint address (no > matter with / without MAP_FIXED), if that address works then we should > reuse that address, ignoring the alignment requirement from the driver. > This is exactly the behavior VFIO needs, and this should also be the > suggested behavior for whatever new drivers that would like to start using > this generic helper. I didn't say this was the fallback :) this just happened to be the code underneath my comment. Sorry if that wasn't clear. This is another kinda non-general thing but one that makes more sense. This comment needs updating, however, obviously. You could just delete 'THP' in the comment that'd probalby do it. The fallback is in: unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags) { unsigned long ret; loff_t off = (loff_t)pgoff << PAGE_SHIFT; ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE, vm_flags); if (ret) return ret; So here, if ret returns an address, then it's fine we return that. Otherwise, we invoke the below (the fallback): return mm_get_unmapped_area_vmflags(current->mm, filp, addr, len, pgoff, flags, vm_flags); } > > > > > What was that about this no longer being relevant to THP? :>) > > > > Are all of these 'return 0' cases expected by any sensible caller? It seems like > > it's a way for thp_get_unmapped_area_vmflags() to recognise when to fall back to > > non-aligned? > > Hope above justfies everything. It's my intention to reuse everything > here. If you have any concern on any of the "return 0" cases in the > function being exported, please shoot, we can discuss. Of course, I have some doubts here :) > > Thanks, > > -- > Peter Xu > To be clearer perhaps, what I think would work here is: 1. Remove the CONFIG_64BIT, in_compat_syscall() check and place it in THP and VFIO code separately, as this isn't a general thing. 2. Rather than return 0 in this function, return error codes so it matches the other mm_get_unmapped_area_*() functions. 3. Adjust thp_get_unmapped_area_vmflags() to detect the error value from this function and do the fallback logic in this case. There's no need for this 0 stuff (and it's possibly broken actually, since _in theory_ you can get unmapped zero). 4. (sorry :) move the code to mm/mmap.c 5. Obviously address comments from others, most importantly (in my view) ensuring that there is a good kernel doc comment around the function. 6. Put the justifiation for exporting the function + stuff about VFIO in the commit message + expand it a little bit as discussed. 7. Other small stuff raised above (e.g. remove 'THP' comment etc.) Again, sorry to be a pain, but I think we need to be careful to get this right so we don't leave any footguns for ourselves in the future with 'implicit' stuff. Thanks!
On Fri, Jun 13, 2025 at 08:18:42PM +0100, Lorenzo Stoakes wrote: > On Fri, Jun 13, 2025 at 02:45:31PM -0400, Peter Xu wrote: > > On Fri, Jun 13, 2025 at 04:36:57PM +0100, Lorenzo Stoakes wrote: > > > On Fri, Jun 13, 2025 at 09:41:09AM -0400, Peter Xu wrote: > > > > This function is pretty handy for any type of VMA to provide a size-aligned > > > > VMA address when mmap(). Rename the function and export it. > > > > > > This isn't a great commit message, 'to provide a size-aligned VMA address when > > > mmap()' is super unclear - do you mean 'to provide an unmapped address that is > > > also aligned to the specified size'? > > > > I sincerely don't know the difference, not a native speaker here.. > > Suggestions welcomed, I can update to whatever both of us agree on. > > Sure, sorry I don't mean to be pedantic I just think it would be clearer to > sort of expand upon this, as the commit message is rather short. > > I think saying something like this function allows you to locate an > unmapped region which is aligned to the specified size should suffice. I changed the commit message to this: This function is pretty handy to locate an unmapped region which is aligned to the specified alignment, meanwhile taking pgoff into considerations. Rename the function and export it. VFIO will be the first candidate to reuse this function in follow up patches to calculate mmap() virtual addresses for MMIO mappings. > > > > > > > > > I think you should also specify your motive, renaming and exporting something > > > because it seems handy isn't sufficient justifiation. > > > > > > Also why would we need to export this? What modules might want to use this? I'm > > > generally not a huge fan of exporting things unless we strictly have to. > > > > It's one of the major reasons why I sent this together with the VFIO > > patches. It'll be used in VFIO patches that is in the same series. I will > > mention it in the commit message when repost. > > OK cool, I've not dug through those as not my area, really it's about > having the appropriate justification. > > I'm always inclined to not want us to export things by default, based on > experience of finding 'unusual' uses of various mm interfaces in drivers in > the past which have caused problems :) > > But of course there are situations that warrant it, they just need to be > spelled out. > > > > > > > > > > > > > > About the rename: > > > > > > > > - Dropping "THP" because it doesn't really have much to do with THP > > > > internally. > > > > > > Well the function seems specifically tailored to the THP use. I think you'll > > > need to further adjust this. > > > > Actually.. it is almost exactly what I need so far. I can justify it below. > > Yeah, but it's not a general function that gives you an unmapped area that > is aligned. > > It's a 'function that gets you an aligned unmapped area but only for 64-bit > kernels and when you are not invoking it from a compat syscall and returns > 0 instead of errors'. > > This doesn't sound general to me? I still think it's general. I think it's a general request for any huge mappings. For example, I do not want to enable aggressive VA allocations on 32 bits systems because I know it's easier to get overloaded VA address space with 32 bits. It should also apply to all potential users whoever wants to use this function by default. I don't think it always needs to do so, if there's an user that, for example, want to keep the calculation but still work on 32 bits, we can provide yet another helper. But it's not the case as of now, and I can't think of such user. In this case, I think it's OK we keep this in the helper for all existing users, including VFIO. > > > > > > > > > > > > > > - The suffix "_aligned" imply it is a helper to generate aligned virtual > > > > address based on what is specified (which can be not PMD_SIZE). > > > > > > Ack this is sensible! > > > > > > > > > > > Cc: Zi Yan <ziy@nvidia.com> > > > > Cc: Baolin Wang <baolin.wang@linux.alibaba.com> > > > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > > > Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> > > > > Cc: Ryan Roberts <ryan.roberts@arm.com> > > > > Cc: Dev Jain <dev.jain@arm.com> > > > > Cc: Barry Song <baohua@kernel.org> > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > > > --- > > > > include/linux/huge_mm.h | 14 +++++++++++++- > > > > mm/huge_memory.c | 6 ++++-- > > > > 2 files changed, 17 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > > > > index 2f190c90192d..706488d92bb6 100644 > > > > --- a/include/linux/huge_mm.h > > > > +++ b/include/linux/huge_mm.h > > > > > > Why are we keeping everything in huge_mm.h, huge_memory.c if this is being made > > > generic? > > > > > > Surely this should be moved out into mm/mmap.c no? > > > > No objections, but I suggest a separate discussion and patch submission > > when the original function resides in huge_memory.c. Hope it's ok for you. > > I like to be as flexible as I can be in review, but I'm afraid I'm going to > have to be annoying about this one :) > > It simply makes no sense to have non-THP stuff in 'the THP file'. Also this > makes this a general memory mapping function that should live with the > other related code. > > I don't really think much discussion is required here? You could do this as > 2 separate commits if that'd make life easier? > > Sorry to be a pain here, but I'm really allergic to our having random > unrelated things in the wrong files, it's something mm has done rather too > much... I don't understand why the helper is non-THP. The alignment so far is really about huge mappings. Core mm's HUGE_PFNMAP config option also depends on THP at least as of now. # TODO: Allow to be enabled without THP config ARCH_SUPPORTS_HUGE_PFNMAP def_bool n depends on TRANSPARENT_HUGEPAGE > > > > > > > > > > @@ -339,7 +339,10 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, > > > > unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > > > > unsigned long len, unsigned long pgoff, unsigned long flags, > > > > vm_flags_t vm_flags); > > > > - > > > > +unsigned long mm_get_unmapped_area_aligned(struct file *filp, > > > > + unsigned long addr, unsigned long len, > > > > + loff_t off, unsigned long flags, unsigned long size, > > > > + vm_flags_t vm_flags); > > > > > > I echo Jason's comments about a kdoc and explanation of what this function does. > > > > > > > bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins); > > > > int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, > > > > unsigned int new_order); > > > > @@ -543,6 +546,15 @@ thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > > > > return 0; > > > > } > > > > > > > > +static inline unsigned long > > > > +mm_get_unmapped_area_aligned(struct file *filp, > > > > + unsigned long addr, unsigned long len, > > > > + loff_t off, unsigned long flags, unsigned long size, > > > > + vm_flags_t vm_flags) > > > > +{ > > > > + return 0; > > > > +} > > > > + > > > > static inline bool > > > > can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins) > > > > { > > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > > > index 4734de1dc0ae..52f13a70562f 100644 > > > > --- a/mm/huge_memory.c > > > > +++ b/mm/huge_memory.c > > > > @@ -1088,7 +1088,7 @@ static inline bool is_transparent_hugepage(const struct folio *folio) > > > > folio_test_large_rmappable(folio); > > > > } > > > > > > > > -static unsigned long __thp_get_unmapped_area(struct file *filp, > > > > +unsigned long mm_get_unmapped_area_aligned(struct file *filp, > > > > unsigned long addr, unsigned long len, > > > > loff_t off, unsigned long flags, unsigned long size, > > > > vm_flags_t vm_flags) > > > > @@ -1132,6 +1132,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp, > > > > ret += off_sub; > > > > return ret; > > > > } > > > > +EXPORT_SYMBOL_GPL(mm_get_unmapped_area_aligned); > > > > > > I'm not convinced about exporting this... shouldn't be export only if we > > > explicitly have a user? > > > > > > I'd rather we didn't unless we needed to. > > > > > > > > > > > unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > > > > unsigned long len, unsigned long pgoff, unsigned long flags, > > > > @@ -1140,7 +1141,8 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add > > > > unsigned long ret; > > > > loff_t off = (loff_t)pgoff << PAGE_SHIFT; > > > > > > > > - ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE, vm_flags); > > > > + ret = mm_get_unmapped_area_aligned(filp, addr, len, off, flags, > > > > + PMD_SIZE, vm_flags); > > > > if (ret) > > > > return ret; > > > > > > > > -- > > > > 2.49.0 > > > > > > > > > > So, you don't touch the original function but there's stuff there I think we > > > need to think about if this is generalised. > > > > > > E.g.: > > > > > > if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) > > > return 0; > > > > > > This still valid? > > > > Yes. I want this feature (for VFIO) to not be enabled on 32bits, and not > > enabled with compat syscals. > > OK, but then is this a 'general' function any more? > > These checks were introduced by commit 4ef9ad19e176 ("mm: huge_memory: > don't force huge page alignment on 32 bit") and so are _absolutely > specifically_ intended for a THP use-case. > > And now they _just happen_ to be useful to you but nothing about the > function name suggests that this is the case? > > I mean it seems like you should be doing this check separately in both VFIO > and THP code and having the 'general 'function not do this no? I don't understand, sorry. If this helper only has two users, the two users want the same check, shouldn't we keep the check in the helper, rather than duplicating in the two callers? > > > > > > > > > /* > > > * The failure might be due to length padding. The caller will retry > > > * without the padding. > > > */ > > > if (IS_ERR_VALUE(ret)) > > > return 0; > > > > > > This is assuming things the (currently single) caller will do, that is no longer > > > an assumption you can make, especially if exported. > > > > It's part of core function we want from a generic helper. We want to know > > when the va allocation, after padded, would fail due to the padding. Then > > the caller can decide what to do next. It needs to fail here properly. > > I'm no sure I understand what you mean? > > It's not just this case, it's basically any error condition results in 0. > > It's actually quite dangerous, as the get_unmapped_area() functions are > meant to return either an error value or the located address _and zero is a > valid response_. Not by default, when you didn't change vm.mmap_min_addr. I don't think it's a good idea to be able to return NULL as a virtual address, unless extremely necessary. I don't even know whether Linux can do that now. OTOH, it's common too so far to use this retval in get_unmapped_area(). Currently, the mm API is defined as: unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); Its retval is unsigned long, and its error is returned by IS_ERR_VALUE(). That's the current API across the whole mm, and that's why this function does it because when used in THP it's easier for retval processing. Same to VFIO, as long as the API didn't change. I'm OK if any of us wants to refactor this as a whole, but it'll be great if you could agree we can do it separately, and also discussed separately. > > So if somebody used this function naively, they'd potentially have a very > nasty bug occur when an error arose. > > If you want to export this, I just don't think we can have this be a thing > here. > > > > > > > > > Actually you maybe want to abstract the whole of thp_get_unmapped_area_vmflags() > > > no? As this has a fallback mode? > > > > > > /* > > > * Do not try to align to THP boundary if allocation at the address > > > * hint succeeds. > > > */ > > > if (ret == addr) > > > return addr; > > > > This is not a fallback. This is when user specified a hint address (no > > matter with / without MAP_FIXED), if that address works then we should > > reuse that address, ignoring the alignment requirement from the driver. > > This is exactly the behavior VFIO needs, and this should also be the > > suggested behavior for whatever new drivers that would like to start using > > this generic helper. > > I didn't say this was the fallback :) this just happened to be the code > underneath my comment. Sorry if that wasn't clear. > > This is another kinda non-general thing but one that makes more sense. This > comment needs updating, however, obviously. You could just delete 'THP' in > the comment that'd probalby do it. Yes, the THP word does not apply anymore. I'll change it, thanks for pointing this out. > > The fallback is in: > > unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > unsigned long len, unsigned long pgoff, unsigned long flags, > vm_flags_t vm_flags) > { > unsigned long ret; > loff_t off = (loff_t)pgoff << PAGE_SHIFT; > > ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE, vm_flags); > if (ret) > return ret; > > So here, if ret returns an address, then it's fine we return that. > > Otherwise, we invoke the below (the fallback): > > return mm_get_unmapped_area_vmflags(current->mm, filp, addr, len, pgoff, flags, > vm_flags); > } > > > > > > > > > What was that about this no longer being relevant to THP? :>) > > > > > > Are all of these 'return 0' cases expected by any sensible caller? It seems like > > > it's a way for thp_get_unmapped_area_vmflags() to recognise when to fall back to > > > non-aligned? > > > > Hope above justfies everything. It's my intention to reuse everything > > here. If you have any concern on any of the "return 0" cases in the > > function being exported, please shoot, we can discuss. > > Of course, I have some doubts here :) > > > > > Thanks, > > > > -- > > Peter Xu > > > > To be clearer perhaps, what I think would work here is: > > 1. Remove the CONFIG_64BIT, in_compat_syscall() check and place it in THP > and VFIO code separately, as this isn't a general thing. Commented above. I still think it should be kept until we have a valid use case to not enable it. > > 2. Rather than return 0 in this function, return error codes so it matches > the other mm_get_unmapped_area_*() functions. Commented above. > > 3. Adjust thp_get_unmapped_area_vmflags() to detect the error value from > this function and do the fallback logic in this case. There's no need > for this 0 stuff (and it's possibly broken actually, since _in theory_ > you can get unmapped zero). Please see the discussion in the other thread, where I replied to Jason to explain why the fallback might not be what the user always want. For example, the last patch does try 1G first and if it fails somehow it'll try 2M. It doesn't want to fallback to 4K when 1G alloc fails. > > 4. (sorry :) move the code to mm/mmap.c Commented above. Note: I'm not saying it _can't_ be moved out, but it still makes sense to me to be in huge_memory.c. > > 5. Obviously address comments from others, most importantly (in my view) > ensuring that there is a good kernel doc comment around the function. > > 6. Put the justifiation for exporting the function + stuff about VFIO in > the commit message + expand it a little bit as discussed. Please check if above version works for you. > > 7. Other small stuff raised above (e.g. remove 'THP' comment etc.) I'll do this. > > Again, sorry to be a pain, but I think we need to be careful to get this > right so we don't leave any footguns for ourselves in the future with > 'implicit' stuff. > > Thanks! > Thanks, -- Peter Xu
Peter - I think the problem we're having here is that you're making this a _general_ and _exported_ function that must live up to the standards of such a function. But at the same time, for convenience, want it to happen to do what's convenient for VFIO and THP. To stop us going around in circles - I won't accept this patch if it: a. Returns 0 on errors, b. Does stuff that is specific to THP, etc. c. Is located in mm/huge_memory.c when you're saying it's general. This isn't a case of nits to be sorted out later, these are fundamental issues that prevent your series from being mergeable. On Fri, Jun 13, 2025 at 04:34:37PM -0400, Peter Xu wrote: > On Fri, Jun 13, 2025 at 08:18:42PM +0100, Lorenzo Stoakes wrote: > > On Fri, Jun 13, 2025 at 02:45:31PM -0400, Peter Xu wrote: > > > On Fri, Jun 13, 2025 at 04:36:57PM +0100, Lorenzo Stoakes wrote: > > > > On Fri, Jun 13, 2025 at 09:41:09AM -0400, Peter Xu wrote: > > > > > This function is pretty handy for any type of VMA to provide a size-aligned > > > > > VMA address when mmap(). Rename the function and export it. > > > > > > > > This isn't a great commit message, 'to provide a size-aligned VMA address when > > > > mmap()' is super unclear - do you mean 'to provide an unmapped address that is > > > > also aligned to the specified size'? > > > > > > I sincerely don't know the difference, not a native speaker here.. > > > Suggestions welcomed, I can update to whatever both of us agree on. > > > > Sure, sorry I don't mean to be pedantic I just think it would be clearer to > > sort of expand upon this, as the commit message is rather short. > > > > I think saying something like this function allows you to locate an > > unmapped region which is aligned to the specified size should suffice. > > I changed the commit message to this: > > This function is pretty handy to locate an unmapped region which is aligned > to the specified alignment, meanwhile taking pgoff into considerations. > > Rename the function and export it. VFIO will be the first candidate to > reuse this function in follow up patches to calculate mmap() virtual > addresses for MMIO mappings. This is better but doesn't describe what this function does as you're doing unusual things. > > > > > > > > > > > > > > I think you should also specify your motive, renaming and exporting something > > > > because it seems handy isn't sufficient justifiation. > > > > > > > > Also why would we need to export this? What modules might want to use this? I'm > > > > generally not a huge fan of exporting things unless we strictly have to. > > > > > > It's one of the major reasons why I sent this together with the VFIO > > > patches. It'll be used in VFIO patches that is in the same series. I will > > > mention it in the commit message when repost. > > > > OK cool, I've not dug through those as not my area, really it's about > > having the appropriate justification. > > > > I'm always inclined to not want us to export things by default, based on > > experience of finding 'unusual' uses of various mm interfaces in drivers in > > the past which have caused problems :) > > > > But of course there are situations that warrant it, they just need to be > > spelled out. > > > > > > > > > > > > > > > > > > > About the rename: > > > > > > > > > > - Dropping "THP" because it doesn't really have much to do with THP > > > > > internally. > > > > > > > > Well the function seems specifically tailored to the THP use. I think you'll > > > > need to further adjust this. > > > > > > Actually.. it is almost exactly what I need so far. I can justify it below. > > > > Yeah, but it's not a general function that gives you an unmapped area that > > is aligned. > > > > It's a 'function that gets you an aligned unmapped area but only for 64-bit > > kernels and when you are not invoking it from a compat syscall and returns > > 0 instead of errors'. > > > > This doesn't sound general to me? > > I still think it's general. I think it's a general request for any huge > mappings. For example, I do not want to enable aggressive VA allocations > on 32 bits systems because I know it's easier to get overloaded VA address > space with 32 bits. It should also apply to all potential users whoever > wants to use this function by default. This is a stretch, you're now assuming alignment must be large enough to be a problem on 32-bit systems, and you've not mentioned compat syscalls _at all_ here. Commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment on 32 bit") is what introduced this. It literally references issued encountered in THP. > > I don't think it always needs to do so, if there's an user that, for > example, want to keep the calculation but still work on 32 bits, we can > provide yet another helper. But it's not the case as of now, and I can't > think of such user. In this case, I think it's OK we keep this in the > helper for all existing users, including VFIO. It's not OK, sorry. > > > > > > > > > > > > > > > > > > > > - The suffix "_aligned" imply it is a helper to generate aligned virtual > > > > > address based on what is specified (which can be not PMD_SIZE). > > > > > > > > Ack this is sensible! > > > > > > > > > > > > > > Cc: Zi Yan <ziy@nvidia.com> > > > > > Cc: Baolin Wang <baolin.wang@linux.alibaba.com> > > > > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > > > > Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> > > > > > Cc: Ryan Roberts <ryan.roberts@arm.com> > > > > > Cc: Dev Jain <dev.jain@arm.com> > > > > > Cc: Barry Song <baohua@kernel.org> > > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > > > > --- > > > > > include/linux/huge_mm.h | 14 +++++++++++++- > > > > > mm/huge_memory.c | 6 ++++-- > > > > > 2 files changed, 17 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > > > > > index 2f190c90192d..706488d92bb6 100644 > > > > > --- a/include/linux/huge_mm.h > > > > > +++ b/include/linux/huge_mm.h > > > > > > > > Why are we keeping everything in huge_mm.h, huge_memory.c if this is being made > > > > generic? > > > > > > > > Surely this should be moved out into mm/mmap.c no? > > > > > > No objections, but I suggest a separate discussion and patch submission > > > when the original function resides in huge_memory.c. Hope it's ok for you. > > > > I like to be as flexible as I can be in review, but I'm afraid I'm going to > > have to be annoying about this one :) > > > > It simply makes no sense to have non-THP stuff in 'the THP file'. Also this > > makes this a general memory mapping function that should live with the > > other related code. > > > > I don't really think much discussion is required here? You could do this as > > 2 separate commits if that'd make life easier? > > > > Sorry to be a pain here, but I'm really allergic to our having random > > unrelated things in the wrong files, it's something mm has done rather too > > much... > > I don't understand why the helper is non-THP. The alignment so far is > really about huge mappings. Core mm's HUGE_PFNMAP config option also > depends on THP at least as of now. > > # TODO: Allow to be enabled without THP > config ARCH_SUPPORTS_HUGE_PFNMAP > def_bool n > depends on TRANSPARENT_HUGEPAGE I really don't understand what your point is? You're naming this mm_get_unmapped_area_aligned()? No reference to THP or VFIO? > > > > > > > > > > > > > > > @@ -339,7 +339,10 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, > > > > > unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > > > > > unsigned long len, unsigned long pgoff, unsigned long flags, > > > > > vm_flags_t vm_flags); > > > > > - > > > > > +unsigned long mm_get_unmapped_area_aligned(struct file *filp, > > > > > + unsigned long addr, unsigned long len, > > > > > + loff_t off, unsigned long flags, unsigned long size, > > > > > + vm_flags_t vm_flags); > > > > > > > > I echo Jason's comments about a kdoc and explanation of what this function does. > > > > > > > > > bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins); > > > > > int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, > > > > > unsigned int new_order); > > > > > @@ -543,6 +546,15 @@ thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > > > > > return 0; > > > > > } > > > > > > > > > > +static inline unsigned long > > > > > +mm_get_unmapped_area_aligned(struct file *filp, > > > > > + unsigned long addr, unsigned long len, > > > > > + loff_t off, unsigned long flags, unsigned long size, > > > > > + vm_flags_t vm_flags) > > > > > +{ > > > > > + return 0; > > > > > +} > > > > > + > > > > > static inline bool > > > > > can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins) > > > > > { > > > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > > > > index 4734de1dc0ae..52f13a70562f 100644 > > > > > --- a/mm/huge_memory.c > > > > > +++ b/mm/huge_memory.c > > > > > @@ -1088,7 +1088,7 @@ static inline bool is_transparent_hugepage(const struct folio *folio) > > > > > folio_test_large_rmappable(folio); > > > > > } > > > > > > > > > > -static unsigned long __thp_get_unmapped_area(struct file *filp, > > > > > +unsigned long mm_get_unmapped_area_aligned(struct file *filp, > > > > > unsigned long addr, unsigned long len, > > > > > loff_t off, unsigned long flags, unsigned long size, > > > > > vm_flags_t vm_flags) > > > > > @@ -1132,6 +1132,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp, > > > > > ret += off_sub; > > > > > return ret; > > > > > } > > > > > +EXPORT_SYMBOL_GPL(mm_get_unmapped_area_aligned); > > > > > > > > I'm not convinced about exporting this... shouldn't be export only if we > > > > explicitly have a user? > > > > > > > > I'd rather we didn't unless we needed to. > > > > > > > > > > > > > > unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > > > > > unsigned long len, unsigned long pgoff, unsigned long flags, > > > > > @@ -1140,7 +1141,8 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add > > > > > unsigned long ret; > > > > > loff_t off = (loff_t)pgoff << PAGE_SHIFT; > > > > > > > > > > - ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE, vm_flags); > > > > > + ret = mm_get_unmapped_area_aligned(filp, addr, len, off, flags, > > > > > + PMD_SIZE, vm_flags); > > > > > if (ret) > > > > > return ret; > > > > > > > > > > -- > > > > > 2.49.0 > > > > > > > > > > > > > So, you don't touch the original function but there's stuff there I think we > > > > need to think about if this is generalised. > > > > > > > > E.g.: > > > > > > > > if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) > > > > return 0; > > > > > > > > This still valid? > > > > > > Yes. I want this feature (for VFIO) to not be enabled on 32bits, and not > > > enabled with compat syscals. > > > > OK, but then is this a 'general' function any more? > > > > These checks were introduced by commit 4ef9ad19e176 ("mm: huge_memory: > > don't force huge page alignment on 32 bit") and so are _absolutely > > specifically_ intended for a THP use-case. > > > > And now they _just happen_ to be useful to you but nothing about the > > function name suggests that this is the case? > > > > I mean it seems like you should be doing this check separately in both VFIO > > and THP code and having the 'general 'function not do this no? > > I don't understand, sorry. > > If this helper only has two users, the two users want the same check, > shouldn't we keep the check in the helper, rather than duplicating in the > two callers? Because you're making this an exported 'general' function with '_aligned' in the suffix and your whole patch is about how it's general. The problem is somebody will use this function thinking it is general, then find out it's not general it's a 'de-duplicate VFIO and THP' function. > > > > > > > > > > > > > > /* > > > > * The failure might be due to length padding. The caller will retry > > > > * without the padding. > > > > */ > > > > if (IS_ERR_VALUE(ret)) > > > > return 0; > > > > > > > > This is assuming things the (currently single) caller will do, that is no longer > > > > an assumption you can make, especially if exported. > > > > > > It's part of core function we want from a generic helper. We want to know > > > when the va allocation, after padded, would fail due to the padding. Then > > > the caller can decide what to do next. It needs to fail here properly. > > > > I'm no sure I understand what you mean? > > > > It's not just this case, it's basically any error condition results in 0. > > > > It's actually quite dangerous, as the get_unmapped_area() functions are > > meant to return either an error value or the located address _and zero is a > > valid response_. > > Not by default, when you didn't change vm.mmap_min_addr. I don't think it's > a good idea to be able to return NULL as a virtual address, unless > extremely necessary. I don't even know whether Linux can do that now. It can afaik. > > OTOH, it's common too so far to use this retval in get_unmapped_area(). > > Currently, the mm API is defined as: > > unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); > > Its retval is unsigned long, and its error is returned by IS_ERR_VALUE(). > That's the current API across the whole mm, and that's why this function > does it because when used in THP it's easier for retval processing. Same > to VFIO, as long as the API didn't change. This sounds like you agree with me that you're fundamentally breaking this contract for convenience? This isn't acceptable for a general function. > > I'm OK if any of us wants to refactor this as a whole, but it'll be great > if you could agree we can do it separately, and also discussed separately. Sorry these are _fundamental_ issues, not nits or niceties that can be followed up on. > > > > > So if somebody used this function naively, they'd potentially have a very > > nasty bug occur when an error arose. > > > > If you want to export this, I just don't think we can have this be a thing > > here. > > > > > > > > > > > > > Actually you maybe want to abstract the whole of thp_get_unmapped_area_vmflags() > > > > no? As this has a fallback mode? > > > > > > > > /* > > > > * Do not try to align to THP boundary if allocation at the address > > > > * hint succeeds. > > > > */ > > > > if (ret == addr) > > > > return addr; > > > > > > This is not a fallback. This is when user specified a hint address (no > > > matter with / without MAP_FIXED), if that address works then we should > > > reuse that address, ignoring the alignment requirement from the driver. > > > This is exactly the behavior VFIO needs, and this should also be the > > > suggested behavior for whatever new drivers that would like to start using > > > this generic helper. > > > > I didn't say this was the fallback :) this just happened to be the code > > underneath my comment. Sorry if that wasn't clear. > > > > This is another kinda non-general thing but one that makes more sense. This > > comment needs updating, however, obviously. You could just delete 'THP' in > > the comment that'd probalby do it. > > Yes, the THP word does not apply anymore. I'll change it, thanks for > pointing this out. Thanks. > > > > > The fallback is in: > > > > unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > > unsigned long len, unsigned long pgoff, unsigned long flags, > > vm_flags_t vm_flags) > > { > > unsigned long ret; > > loff_t off = (loff_t)pgoff << PAGE_SHIFT; > > > > ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE, vm_flags); > > if (ret) > > return ret; > > > > So here, if ret returns an address, then it's fine we return that. > > > > Otherwise, we invoke the below (the fallback): > > > > return mm_get_unmapped_area_vmflags(current->mm, filp, addr, len, pgoff, flags, > > vm_flags); > > } > > > > > > > > > > > > > What was that about this no longer being relevant to THP? :>) > > > > > > > > Are all of these 'return 0' cases expected by any sensible caller? It seems like > > > > it's a way for thp_get_unmapped_area_vmflags() to recognise when to fall back to > > > > non-aligned? > > > > > > Hope above justfies everything. It's my intention to reuse everything > > > here. If you have any concern on any of the "return 0" cases in the > > > function being exported, please shoot, we can discuss. > > > > Of course, I have some doubts here :) > > > > > > > > Thanks, > > > > > > -- > > > Peter Xu > > > > > > > To be clearer perhaps, what I think would work here is: > > > > 1. Remove the CONFIG_64BIT, in_compat_syscall() check and place it in THP > > and VFIO code separately, as this isn't a general thing. > > Commented above. I still think it should be kept until we have a valid use > case to not enable it. No, this isn't acceptable sorry. I won't accept the patch as-is with this in place. > > > > > 2. Rather than return 0 in this function, return error codes so it matches > > the other mm_get_unmapped_area_*() functions. > > Commented above. No, this isn't acceptable sorry. I won't accept the patch as-is with this in place. > > > > > 3. Adjust thp_get_unmapped_area_vmflags() to detect the error value from > > this function and do the fallback logic in this case. There's no need > > for this 0 stuff (and it's possibly broken actually, since _in theory_ > > you can get unmapped zero). > > Please see the discussion in the other thread, where I replied to Jason to > explain why the fallback might not be what the user always want. > > For example, the last patch does try 1G first and if it fails somehow it'll > try 2M. It doesn't want to fallback to 4K when 1G alloc fails. You're misunderstanding me. I said adjust the THP code to do the fallback. To be super clear I meant: Change: ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE, vm_flags); if (ret) return ret; return mm_get_unmapped_area_vmflags(current->mm, filp, addr, len, pgoff, flags, vm_flags); To: ret = mm_get_unmapped_area_align(filp, addr, len, off, flags, PMD_SIZE, vm_flags); if (!IS_ERR_VAL(ret)) return ret; return mm_get_unmapped_area_vmflags(current->mm, filp, addr, len, pgoff, flags, vm_flags); In thp_get_unmapped_area_vmflags(). > > > > > 4. (sorry :) move the code to mm/mmap.c > > Commented above. Note: I'm not saying it _can't_ be moved out, but it > still makes sense to me to be in huge_memory.c. No, this isn't acceptable sorry. I won't accept the patch as-is with this in place. > > > > > 5. Obviously address comments from others, most importantly (in my view) > > ensuring that there is a good kernel doc comment around the function. > > > > 6. Put the justifiation for exporting the function + stuff about VFIO in > > the commit message + expand it a little bit as discussed. > > Please check if above version works for you. It's not, you're not explaining at all what this function does. But even if you did, the function is doing something that isn't at all general. > > > > > 7. Other small stuff raised above (e.g. remove 'THP' comment etc.) > > I'll do this. Well there's this at least :) > > > > > Again, sorry to be a pain, but I think we need to be careful to get this > > right so we don't leave any footguns for ourselves in the future with > > 'implicit' stuff. > > > > Thanks! > > > > Thanks, > > -- > Peter Xu > Yeah sorry but you really need to rethink this. I appreciate you trying to de-duplicate here, but again we truly must have a high bar for this kind of generalised function, because it's absolutely the kind of foot-gun that'll come back to bite when somebody sees mm_get_unmapped_area_aligned() and doesn't realise it's not in any way generic. And any patch that does that will not show any reference to the zero returns, etc., it'll not cc any of us, and people will just quietly break their code in subtle ways. Thanks!
On 13 Jun 2025, at 9:41, Peter Xu wrote: > This function is pretty handy for any type of VMA to provide a size-aligned > VMA address when mmap(). Rename the function and export it. > > About the rename: > > - Dropping "THP" because it doesn't really have much to do with THP > internally. > > - The suffix "_aligned" imply it is a helper to generate aligned virtual > address based on what is specified (which can be not PMD_SIZE). > > Cc: Zi Yan <ziy@nvidia.com> > Cc: Baolin Wang <baolin.wang@linux.alibaba.com> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> > Cc: Ryan Roberts <ryan.roberts@arm.com> > Cc: Dev Jain <dev.jain@arm.com> > Cc: Barry Song <baohua@kernel.org> > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > include/linux/huge_mm.h | 14 +++++++++++++- > mm/huge_memory.c | 6 ++++-- > 2 files changed, 17 insertions(+), 3 deletions(-) > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > index 2f190c90192d..706488d92bb6 100644 > --- a/include/linux/huge_mm.h > +++ b/include/linux/huge_mm.h > @@ -339,7 +339,10 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, > unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > unsigned long len, unsigned long pgoff, unsigned long flags, > vm_flags_t vm_flags); > - > +unsigned long mm_get_unmapped_area_aligned(struct file *filp, > + unsigned long addr, unsigned long len, > + loff_t off, unsigned long flags, unsigned long size, > + vm_flags_t vm_flags); > bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins); > int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, > unsigned int new_order); > @@ -543,6 +546,15 @@ thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, > return 0; > } > > +static inline unsigned long > +mm_get_unmapped_area_aligned(struct file *filp, > + unsigned long addr, unsigned long len, > + loff_t off, unsigned long flags, unsigned long size, > + vm_flags_t vm_flags) > +{ > + return 0; > +} > + > static inline bool > can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins) > { > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 4734de1dc0ae..52f13a70562f 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -1088,7 +1088,7 @@ static inline bool is_transparent_hugepage(const struct folio *folio) > folio_test_large_rmappable(folio); > } > > -static unsigned long __thp_get_unmapped_area(struct file *filp, > +unsigned long mm_get_unmapped_area_aligned(struct file *filp, > unsigned long addr, unsigned long len, > loff_t off, unsigned long flags, unsigned long size, Since you added aligned suffix, renaming size to alignment might help improve readability. Otherwise, Reviewed-by: Zi Yan <ziy@nvidia.com> Best Regards, Yan, Zi
On Fri, Jun 13, 2025 at 11:19:30AM -0400, Zi Yan wrote: > > -static unsigned long __thp_get_unmapped_area(struct file *filp, > > +unsigned long mm_get_unmapped_area_aligned(struct file *filp, > > unsigned long addr, unsigned long len, > > loff_t off, unsigned long flags, unsigned long size, > > Since you added aligned suffix, renaming size to alignment might > help improve readability. I'll use "align" per Jason's suggestion, assuming it's ok and shorter. > > Otherwise, Reviewed-by: Zi Yan <ziy@nvidia.com> I'll take this though, thanks Zi. -- Peter Xu
On Fri, Jun 13, 2025 at 09:41:09AM -0400, Peter Xu wrote: > @@ -1088,7 +1088,7 @@ static inline bool is_transparent_hugepage(const struct folio *folio) > folio_test_large_rmappable(folio); > } > > -static unsigned long __thp_get_unmapped_area(struct file *filp, > +unsigned long mm_get_unmapped_area_aligned(struct file *filp, > unsigned long addr, unsigned long len, > loff_t off, unsigned long flags, unsigned long size, > vm_flags_t vm_flags) Please add a kdoc for this since it is going to be exported.. I didn't intuitively guess how it works or why there are two length/size arguments. It seems to have an exciting return code as well. I suppose size is the alignment target? Maybe rename the parameter too? For the purposes of VFIO do we need to be careful about math overflow here: loff_t off_end = off + len; loff_t off_align = round_up(off, size); ? Jason
On Fri, Jun 13, 2025 at 11:17:45AM -0300, Jason Gunthorpe wrote: > On Fri, Jun 13, 2025 at 09:41:09AM -0400, Peter Xu wrote: > > @@ -1088,7 +1088,7 @@ static inline bool is_transparent_hugepage(const struct folio *folio) > > folio_test_large_rmappable(folio); > > } > > > > -static unsigned long __thp_get_unmapped_area(struct file *filp, > > +unsigned long mm_get_unmapped_area_aligned(struct file *filp, > > unsigned long addr, unsigned long len, > > loff_t off, unsigned long flags, unsigned long size, > > vm_flags_t vm_flags) > > Please add a kdoc for this since it is going to be exported.. Will do. And thanks for the super fast feedbacks. :) > > I didn't intuitively guess how it works or why there are two > length/size arguments. It seems to have an exciting return code as > well. > > I suppose size is the alignment target? Maybe rename the parameter too? Yes, when the kdoc is there it'll be more obvious. So far "size" is ok to me, but if you have better suggestion please shoot - whatever I came up with so far seems to be too long, and maybe not necessary when kdoc will be available too. > > For the purposes of VFIO do we need to be careful about math overflow here: > > loff_t off_end = off + len; > loff_t off_align = round_up(off, size); > > ? IIUC the 1st one was covered by the latter check here: (off + len_pad) < off Indeed I didn't see what makes sure the 2nd won't overflow. How about I add it within this patch? A whole fixup could look like this: From 4d71d1fc905da23786e1252774e42a1051253176 Mon Sep 17 00:00:00 2001 From: Peter Xu <peterx@redhat.com> Date: Fri, 13 Jun 2025 10:55:35 -0400 Subject: [PATCH] fixup! mm: Rename __thp_get_unmapped_area to mm_get_unmapped_area_aligned Signed-off-by: Peter Xu <peterx@redhat.com> --- mm/huge_memory.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 52f13a70562f..5cbe45405623 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1088,6 +1088,24 @@ static inline bool is_transparent_hugepage(const struct folio *folio) folio_test_large_rmappable(folio); } +/** + * mm_get_unmapped_area_aligned - Allocate an aligned virtual address + * @filp: file target of the mmap() request + * @addr: hint address from mmap() request + * @len: len of the mmap() request + * @off: file offset of the mmap() request + * @flags: flags of the mmap() request + * @size: the size of alignment the caller requests + * @vm_flags: the vm_flags passed from get_unmapped_area() caller + * + * This function should normally be used by a driver's specific + * get_unmapped_area() handler to provide a properly aligned virtual + * address for a specific mmap() request. The caller should pass in most + * of the parameters from the get_unmapped_area() request, but properly + * specify @size as the alignment needed. + * + * Return: non-zero if a valid virtual address is found, zero if fails + */ unsigned long mm_get_unmapped_area_aligned(struct file *filp, unsigned long addr, unsigned long len, loff_t off, unsigned long flags, unsigned long size, @@ -1104,7 +1122,7 @@ unsigned long mm_get_unmapped_area_aligned(struct file *filp, return 0; len_pad = len + size; - if (len_pad < len || (off + len_pad) < off) + if (len_pad < len || (off + len_pad) < off || off_align < off) return 0; ret = mm_get_unmapped_area_vmflags(current->mm, filp, addr, len_pad, -- 2.49.0 -- Peter Xu
On Fri, Jun 13, 2025 at 11:13:58AM -0400, Peter Xu wrote: > > I didn't intuitively guess how it works or why there are two > > length/size arguments. It seems to have an exciting return code as > > well. > > > > I suppose size is the alignment target? Maybe rename the parameter too? > > Yes, when the kdoc is there it'll be more obvious. So far "size" is ok to > me, but if you have better suggestion please shoot - whatever I came up > with so far seems to be too long, and maybe not necessary when kdoc will be > available too. I would call it align not size > > For the purposes of VFIO do we need to be careful about math overflow here: > > > > loff_t off_end = off + len; > > loff_t off_align = round_up(off, size); > > > > ? > > IIUC the 1st one was covered by the latter check here: > > (off + len_pad) < off > > Indeed I didn't see what makes sure the 2nd won't overflow. I'm not sure the < tests are safe in this modern world. I would use the overflow helpers directly and remove the < overflow checks. > +/** > + * mm_get_unmapped_area_aligned - Allocate an aligned virtual address > + * @filp: file target of the mmap() request > + * @addr: hint address from mmap() request > + * @len: len of the mmap() request > + * @off: file offset of the mmap() request > + * @flags: flags of the mmap() request > + * @size: the size of alignment the caller requests Just "the alignment the caller requests" > + * @vm_flags: the vm_flags passed from get_unmapped_area() caller > + * > + * This function should normally be used by a driver's specific > + * get_unmapped_area() handler to provide a properly aligned virtual > + * address for a specific mmap() request. The caller should pass in most > + * of the parameters from the get_unmapped_area() request, but properly > + * specify @size as the alignment needed. .. "The function willl try to return a VMA starting address such that ret % size == 0" Jason
On Fri, Jun 13, 2025 at 01:00:20PM -0300, Jason Gunthorpe wrote: > On Fri, Jun 13, 2025 at 11:13:58AM -0400, Peter Xu wrote: > > > I didn't intuitively guess how it works or why there are two > > > length/size arguments. It seems to have an exciting return code as > > > well. > > > > > > I suppose size is the alignment target? Maybe rename the parameter too? > > > > Yes, when the kdoc is there it'll be more obvious. So far "size" is ok to > > me, but if you have better suggestion please shoot - whatever I came up > > with so far seems to be too long, and maybe not necessary when kdoc will be > > available too. > > I would call it align not size Sure thing. > > > > For the purposes of VFIO do we need to be careful about math overflow here: > > > > > > loff_t off_end = off + len; > > > loff_t off_align = round_up(off, size); > > > > > > ? > > > > IIUC the 1st one was covered by the latter check here: > > > > (off + len_pad) < off > > > > Indeed I didn't see what makes sure the 2nd won't overflow. > > I'm not sure the < tests are safe in this modern world. I would use > the overflow helpers directly and remove the < overflow checks. Good to learn the traps, and I also wasn't aware of the helpers. I'll switch to that, thanks! > > > +/** > > + * mm_get_unmapped_area_aligned - Allocate an aligned virtual address > > + * @filp: file target of the mmap() request > > + * @addr: hint address from mmap() request > > + * @len: len of the mmap() request > > + * @off: file offset of the mmap() request > > + * @flags: flags of the mmap() request > > + * @size: the size of alignment the caller requests > > Just "the alignment the caller requests" Sure. > > > + * @vm_flags: the vm_flags passed from get_unmapped_area() caller > > + * > > + * This function should normally be used by a driver's specific > > + * get_unmapped_area() handler to provide a properly aligned virtual > > + * address for a specific mmap() request. The caller should pass in most > > + * of the parameters from the get_unmapped_area() request, but properly > > + * specify @size as the alignment needed. > > .. "The function willl try to return a VMA starting address such that > ret % size == 0" This is not true though when pgoff isn't aligned.. For example, an allocation with (len=32M, size=2M, pgoff=1M) will return an address that is N*2M+1M, so that starting from pgoff=2M it'll be completely aligned. In this case the returned mmap() address must not be aligned to make it happen, and the range within pgoff=1M-2M will be mapped with 4K. -- Peter Xu
© 2016 - 2025 Red Hat, Inc.