When vmap_pfn() is invoked, it calls vmap_pfn_apply() to set each
pfn into a pte. That function checks whether the pfn is valid and,
if so, warns and returns.

This check is inverted: we actually need to set a valid pfn into the
pte, not an invalid one.

This patch fixes it.
Signed-off-by: Huan Yang <link@vivo.com>
Reported-by: Bingbu Cao <bingbu.cao@linux.intel.com>
Closes: https://lore.kernel.org/dri-devel/eb7e0137-3508-4287-98c4-816c5fd98e10@vivo.com/T/#mbda4f64a3532b32e061f4e8763bc8e307bea3ca8
Fixes: b3f78e749865 ("mm: vmalloc must set pte via arch code")
---
mm/vmalloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 044af7088359..ebadb8ceeba7 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3488,7 +3488,7 @@ static int vmap_pfn_apply(pte_t *pte, unsigned long addr, void *private)
unsigned long pfn = data->pfns[data->idx];
pte_t ptent;
- if (WARN_ON_ONCE(pfn_valid(pfn)))
+ if (WARN_ON_ONCE(!pfn_valid(pfn)))
return -EINVAL;
ptent = pte_mkspecial(pfn_pte(pfn, data->prot));
--
2.39.0
On Wed, Mar 12, 2025 at 02:15:12PM +0800, Huan Yang wrote:
> When vmap_pfn() is invoked, it calls vmap_pfn_apply() to set each
> pfn into a pte. That function checks whether the pfn is valid and,
> if so, warns and returns.
>
> This check is inverted: we actually need to set a valid pfn into the
> pte, not an invalid one.

As just discussed this is wrong.  vmap_pfn is for mapping non-page
PFNs and the check is what enforces that.  What is the point of
having that detailed discussion if you just send the broken patch
anyway with a commit log not even acknowledging the facts?
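For context, vmap_pfn()'s signature is void *vmap_pfn(unsigned long
*pfns, unsigned int count, pgprot_t prot), and a minimal sketch of the
intended use described above, mapping PFNs with no struct page such as
a PCIe BAR, might look like this (bar_phys and the error handling are
illustrative assumptions, not taken from any real driver):

	/*
	 * Sketch: map four pages of a PCIe BAR, i.e. memory with no
	 * struct page backing, into the vmalloc area.  bar_phys is an
	 * assumed, device-specific physical address.
	 */
	unsigned long pfns[4];
	void *vaddr;
	int i;

	for (i = 0; i < 4; i++)
		pfns[i] = PHYS_PFN(bar_phys) + i;	/* !pfn_valid() pfns */

	vaddr = vmap_pfn(pfns, 4, pgprot_noncached(PAGE_KERNEL));
	if (!vaddr)
		return -ENOMEM;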
Hi Christoph,

Thanks for your reply, and I'm sorry for my late reply. Your response
didn't appear in my email client; I only saw it on the website. :(

> On Wed, Mar 12, 2025 at 02:15:12PM +0800, Huan Yang wrote:
>> When vmap_pfn() is invoked, it calls vmap_pfn_apply() to set each
>> pfn into a pte. That function checks whether the pfn is valid and,
>> if so, warns and returns.
>>
>> This check is inverted: we actually need to set a valid pfn into the
>> pte, not an invalid one.
>
> As just discussed this is wrong.  vmap_pfn is for mapping non-page

Thank you for your explanation. I now understand that the behaviour of
vmap_pfn() is indeed intentional; it is designed to do this.

> PFNs and the check is what enforces that.  What is the point of
> having that detailed discussion if you just send the broken patch
> anyway with a commit log not even acknowledging the facts?

Sorry about that.

We now have a new use case: in udmabuf, memory is passed via memfd
and can be either shmem or hugetlb. When the memory is hugetlb and
HVO is enabled, the tail pages' struct pages are no longer reliable
because they have been freed, so we can't use vmap().

Therefore, when making the modifications, I recorded each pfn as the
folio's base pfn plus an offset and called vmap_pfn(). These pfns are
valid, so they are rejected by vmap_pfn().

Can we just remove the pfn_valid() check in vmap_pfn() so that it
suits both cases? If you agree, I want to send a new patch.

Thank you,
Huan Yang
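To make the scheme above concrete, a rough sketch of the described
approach; the ubuf->folios[], ubuf->offsets[] and ubuf->pagecount
names are hypothetical stand-ins for udmabuf's bookkeeping, not its
actual fields:

	/*
	 * Sketch: record each pfn as the folio's base pfn plus the
	 * page offset within the folio, then hand the array to
	 * vmap_pfn().  Allocation failure handling is elided.
	 */
	unsigned long *pfns;
	void *vaddr;
	pgoff_t i;

	pfns = kvmalloc_array(ubuf->pagecount, sizeof(*pfns), GFP_KERNEL);
	for (i = 0; i < ubuf->pagecount; i++)
		pfns[i] = folio_pfn(ubuf->folios[i]) +
			  (ubuf->offsets[i] >> PAGE_SHIFT);

	/* This call trips WARN_ON_ONCE(pfn_valid(pfn)) today. */
	vaddr = vmap_pfn(pfns, ubuf->pagecount, PAGE_KERNEL);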
On 3/17/25 10:12 AM, Huan Yang wrote:
> Hi Christoph,
>
> [...]
>
> We now have a new use case: in udmabuf, memory is passed via memfd
> and can be either shmem or hugetlb. When the memory is hugetlb and
> HVO is enabled, the tail pages' struct pages are no longer reliable
> because they have been freed, so we can't use vmap().
>
> Therefore, when making the modifications, I recorded each pfn as the
> folio's base pfn plus an offset and called vmap_pfn(). These pfns are
> valid, so they are rejected by vmap_pfn().
>
> Can we just remove the pfn_valid() check in vmap_pfn() so that it
> suits both cases? If you agree, I want to send a new patch.

Huan,

Why not update udmabuf to make it work with both vmap_pfn() and
vmap(), as only udmabuf knows what it is actually working on?

I don't think it's a good idea to hack the common API, the WARN_ON()
is really a mandatory check, and the current case is a good example.

> Thank you,
> Huan Yang

--
Best regards,
Bingbu Cao
Hi Bingbu,

On 2025/3/17 13:29, Bingbu Cao wrote:
> On 3/17/25 10:12 AM, Huan Yang wrote:
>> [...]
>>
>> Can we just remove the pfn_valid() check in vmap_pfn() so that it
>> suits both cases? If you agree, I want to send a new patch.
>
> Huan,
>
> Why not update udmabuf to make it work with both vmap_pfn() and
> vmap(), as only udmabuf knows what it is actually working on?

You mean udmabuf should invoke vmap() when it is a normal page-backed
folio range, and vmap_pfn() when it is HVO-backed?

udmabuf can contain a range within a folio plus an offset. What if
that range covers the folio's head (with a page struct) and the
remaining tail (without page structs, freed by HVO)? I don't think
there is a suitable way to map that into the vmalloc area. :)

Or else, just block hugetlb folios from being mapped into the vmalloc
area? I don't think that's a good way either.

> I don't think it's a good idea to hack the common API, the WARN_ON()

You say "common", but I think vmalloc could provide a more general API
that does not care whether the page struct exists: given a pfn, just
map it. :)

Or else, document that vmap_pfn() simply does not welcome page-backed
pfn mappings? (Just IMO.)

Thanks,
Huan Yang

> is really a mandatory check, and the current case is a good example.
>
>> Thank you,
>> Huan Yang
On Mon, Mar 17, 2025 at 01:29:05PM +0800, Bingbu Cao wrote:
> Why not update udmabuf to make it work with both vmap_pfn() and
> vmap(), as only udmabuf knows what it is actually working on?
>
> I don't think it's a good idea to hack the common API, the WARN_ON()
> is really a mandatory check, and the current case is a good example.

What non-page backed memory does udmabuf try to work on, and more
importantly how does it actually work on them given that the normal
DMA APIs require page backed memory?  Or is this just made up and it
doesn't work at all, given that it also tries to dma map to the fake
miscdevice struct device, which can't work for most cases?

Mapping non-page memory is difficult, and without having a coherent
theory of what non-page memory you are mapping and being very careful
you are extremely unlikely to get it right.
Hi Christoph,

> On Mon, Mar 17, 2025 at 01:29:05PM +0800, Bingbu Cao wrote:
>> Why not update udmabuf to make it work with both vmap_pfn() and
>> vmap(), as only udmabuf knows what it is actually working on?
>>
>> I don't think it's a good idea to hack the common API, the WARN_ON()
>> is really a mandatory check, and the current case is a good example.
>
> What non-page backed memory does udmabuf try to work on, and more

It's hugetlb with HVO (HUGETLB_PAGE_OPTIMIZE_VMEMMAP) enabled: every
tail page's struct page refers back to the head page's, and the tail
page structs are then released.

> importantly how does it actually work on them given that the normal
> DMA APIs require page backed memory?  Or is this just made up

udmabuf's sg_table references only folio + offset, without referencing
any individual page structs, so all of the DMA API usage is folio
based. It pins each folio given by memfd + offset (the memfd can be
shmem or hugetlb). A shmem memfd can provide page structs; a hugetlb
one may not.

> and it doesn't work at all, given that it also tries to dma map to
> the fake miscdevice struct device, which can't work for most cases?

It implements map_dma_buf, mmap, vmap and begin/end_cpu_access. It's a
simple implementation.

> Mapping non-page memory is difficult, and without having a coherent
> theory of what non-page memory you are mapping and being very careful
> you are extremely unlikely to get it right.
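A sketch of the folio + offset sg_table filling described above,
assuming one PAGE_SIZE entry per pinned chunk; only the folio head's
struct page is referenced, which is what keeps HVO-freed tail page
structs out of the picture. The sgt and ubuf names are assumptions:

	/*
	 * Sketch: build sg entries from folio + offset only.  Passing
	 * folio_page(folio, 0) means no tail struct page is touched.
	 */
	struct scatterlist *sgl = sgt->sgl;
	pgoff_t i;

	for (i = 0; i < ubuf->pagecount; i++) {
		sg_set_page(sgl, folio_page(ubuf->folios[i], 0),
			    PAGE_SIZE, ubuf->offsets[i]);
		sgl = sg_next(sgl);
	}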
So you want a folio based version of vmap. Please work on that instead
of doing crazy hacks using vmap_pfn, which is just for memory that is
not backed by pages (historically) or folios (now).
Hi Christoph,

On 2025/3/18 14:48, Christoph Hellwig wrote:
> So you want a folio based version of vmap. Please work on that instead

The folio-based vmap implementation you mention may indeed be
necessary, but it has limited direct relevance to the specific issue
I'm currently addressing.

> of doing crazy hacks using vmap_pfn, which is just for memory that is

As I mentioned, if we use HVO-based hugetlb, we cannot access the page
structures corresponding to the physical memory that needs to be
mapped. This prevents us from properly invoking vmap(), which is why
we turned to vmap_pfn() instead.

Even if a folio-based vmap were implemented, it still could not handle
mapping multiple folio ranges of physical memory into vmalloc regions.
A range within a folio is important; it may be an offset in the memfd,
so the entire folio is not needed.

So I still consider vmap_pfn() the optimal solution for this specific
scenario. :)

> not backed by pages (historically) or folios (now).

With HVO it is also not backed by pages: only the folio head remains,
and each tail pfn's page struct goes away. Did I misunderstand this
part? Please correct me.

Thank you,
Huan Yang
On Tue, Mar 18, 2025 at 04:20:17PM +0800, Huan Yang wrote:
> This prevents us from properly invoking vmap(), which is why we
> turned to vmap_pfn() instead.
>
> Even if a folio-based vmap were implemented, it still could not
> handle mapping multiple folio ranges of physical memory into vmalloc
> regions. A range within a folio is important; it may be an offset in
> the memfd, so the entire folio is not needed.
>
> So I still consider vmap_pfn() the optimal solution for this specific
> scenario. :)

No, vmap_pfn is entirely for memory not backed by pages or folios,
i.e. PCIe BARs and similar memory.  This must not be mixed with proper
folio backed memory.

So you'll need a vmap for folios to support this use case.

>> not backed by pages (historically) or folios (now).
>
> With HVO it is also not backed by pages: only the folio head remains,
> and each tail pfn's page struct goes away.

And a fully folio-based vmap solves that problem.
On Tue, Mar 18, 2025 at 09:33:30AM +0100, Christoph Hellwig wrote:
> On Tue, Mar 18, 2025 at 04:20:17PM +0800, Huan Yang wrote:
>> [...]
>>
>> So I still consider vmap_pfn() the optimal solution for this
>> specific scenario. :)
>
> No, vmap_pfn is entirely for memory not backed by pages or folios,
> i.e. PCIe BARs and similar memory.  This must not be mixed with
> proper folio backed memory.
>
> So you'll need a vmap for folios to support this use case.

https://lore.kernel.org/linux-mm/20250131001806.92349-1-vishal.moola@gmail.com/
seems like a good match for "we need to vmap a range from a file".
Vishal has some updates since this version, and I'm sure he can send
them out after LSFMM.
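For concreteness, a purely hypothetical sketch of what such a
folio-range interface could look like; neither the name nor the
signature exists in mainline, and the series linked above may take a
different shape:

	/*
	 * Hypothetical, not an existing kernel API: map nr ranges,
	 * where range i covers lens[i] bytes starting offs[i] bytes
	 * into folios[i].  Only head struct pages would be needed,
	 * so HVO tail pages would never be dereferenced.
	 */
	void *vmap_folio_ranges(struct folio **folios, unsigned long *offs,
				size_t *lens, unsigned int nr,
				pgprot_t prot);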
Hi Christoph,

On 2025/3/18 16:33, Christoph Hellwig wrote:
> On Tue, Mar 18, 2025 at 04:20:17PM +0800, Huan Yang wrote:
>> [...]
>>
>> So I still consider vmap_pfn() the optimal solution for this
>> specific scenario. :)
>
> No, vmap_pfn is entirely for memory not backed by pages or folios,
> i.e. PCIe BARs and similar memory.  This must not be mixed with
> proper folio backed memory.

OK, now I understand it better.

> So you'll need a vmap for folios to support this use case.

Maybe it can't, as explained below.

>>> not backed by pages (historically) or folios (now).
>> With HVO it is also not backed by pages: only the folio head
>> remains, and each tail pfn's page struct goes away.
> And a fully folio-based vmap solves that problem.

A folio may be 2MB, or even as large as 1GB. What if we only need a
little, say 1MB or 512MB? Can a folio-based vmap solve that?

Normally we could offer a 4K-page-based array to map it, but with HVO
we can't. That's why I want to base it on pfns.

Thank you,
Huan Yang
On Tue, Mar 18, 2025 at 04:39:40PM +0800, Huan Yang wrote:
> A folio may be 2MB, or even as large as 1GB. What if we only need a
> little, say 1MB or 512MB? Can a folio-based vmap solve that?

Then you only map part of it by passing a length argument.  Note that
in general when you have these large folios you also don't have
highmem, so if you only map one of them, or part of one of them, you
don't actually need vmap at all and can just use folio_address().

> Normally we could offer a 4K-page-based array to map it, but with HVO
> we can't. That's why I want to base it on pfns.

Well, for any large folio using this 4k based page interface is
actually highly inefficient.  So let's fix that.  And maybe loop in
willy as Mr. Folio while you're at it.
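A minimal sketch of the no-vmap shortcut described above, assuming the
caller already holds one folio plus a byte offset into it:

	/*
	 * Sketch: without highmem, (part of) a single folio is already
	 * reachable through the kernel direct map, so no vmap is
	 * needed.  Only the head struct page is used, which is safe
	 * under HVO.
	 */
	void *vaddr = NULL;

	if (!folio_test_highmem(folio))
		vaddr = folio_address(folio) + offset;	/* offset < folio_size() */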
On 2025/3/18 16:44, Christoph Hellwig wrote:
> On Tue, Mar 18, 2025 at 04:39:40PM +0800, Huan Yang wrote:
>> A folio may be 2MB, or even as large as 1GB. What if we only need a
>> little, say 1MB or 512MB? Can a folio-based vmap solve that?
>
> Then you only map part of it by passing a length argument.  Note that
> in general when you have these large folios you also don't have
> highmem, so if you only map one of them, or part of one of them, you
> don't actually need vmap at all and can just use folio_address().
>
>> Normally we could offer a 4K-page-based array to map it, but with
>> HVO we can't. That's why I want to base it on pfns.
>
> Well, for any large folio using this 4k based page interface is
> actually highly inefficient.  So let's fix that.  And maybe loop in
> willy as Mr. Folio while you're at it.

The minimum map unit is the page size, not the variable-size folio.

For many cases, vmap (to combine many partial folios) is useful,
instead of splitting all folios into order-0 folios in advance, but I
agree a page array may be inefficient.

So I don't think another folio-based vmap() version is better overall
anyway.

Thanks,
Gao Xiang
On Wed, Mar 19, 2025 at 05:09:06PM +0800, Gao Xiang wrote:
> The minimum map unit is the page size, not the variable-size folio.
>
> For many cases, vmap (to combine many partial folios) is useful,
> instead of splitting all folios into order-0 folios in advance, but I
> agree a page array may be inefficient.
>
> So I don't think another folio-based vmap() version is better overall
> anyway.

Then just reject the mappings for now.  vmap/vm_map_ram isn't really
intended for mapping totally arbitrary scattered memory anyway.

As mentioned before udmabuf also has a grave bug in the dma mapping
part, so just marking it broken would be another very sensible option.
Hi Christoph,

Sorry for the late reply.

On 2025/3/20 13:31, Christoph Hellwig wrote:
> As mentioned before udmabuf also has a grave bug in the dma mapping
> part, so just marking it broken would be another very sensible
> option.

Could you please describe this in more detail? I'm not the maintainer,
but I see that udmabuf's initial patch describes it as:

  A driver to let userspace turn memfd regions into dma-bufs.

  Use case: allows qemu to create dmabufs for the vga framebuffer or
  virtio-gpu resources. Then they can be passed around to display
  those guest things on the host: to a spice client for classic full
  framebuffer display, and hopefully some day to a wayland server for
  seamless guest window display.

  https://patchwork.freedesktop.org/patch/246100/

Was it also meant to do this? I'd like to understand your thinking.

Thank you,
Huan Yang
Hi Christoph,

On 2025/3/18 16:44, Christoph Hellwig wrote:
> On Tue, Mar 18, 2025 at 04:39:40PM +0800, Huan Yang wrote:
>> A folio may be 2MB, or even as large as 1GB. What if we only need a
>> little, say 1MB or 512MB? Can a folio-based vmap solve that?
>
> Then you only map part of it by passing a length argument.  Note that
> in general when you have these large folios you also don't have
> highmem, so if you only map one of them, or part of one of them, you
> don't actually need vmap at all and can just use folio_address().

The situation remains complex. We are not referring only to a single
folio; the buffer may be composed of multiple folios and start with a
small piece of the first one. For example:

  Folio 1: 2M, use 1M
  Folio 2: 2M, use 2M
  ...

These folios are combined into the memory used by udmabuf, so a direct
virtual address is useless here. :)

>> Normally we could offer a 4K-page-based array to map it, but with
>> HVO we can't. That's why I want to base it on pfns.
>
> Well, for any large folio using this 4k based page interface is
> actually highly inefficient.  So let's fix that.  And maybe loop in
> willy as Mr. Folio while you're at it.

Yes, indeed. You mention it would be good to map an entire folio array
into vmalloc; I guess many users would like that.

I'd like to research it, but not too fast. :)

Thank you,
Huan Yang
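A sketch of the multi-folio, partial-range case in the example above:
the pfn array is built piecewise from each folio's base pfn. All of
the folios, offs, lens, nr_folios and pfns names are illustrative
assumptions:

	/*
	 * Sketch: folio i contributes lens[i] bytes starting offs[i]
	 * bytes in; both are assumed PAGE_SIZE aligned.
	 */
	unsigned int idx = 0, i, j;
	void *vaddr;

	for (i = 0; i < nr_folios; i++) {
		unsigned long first = folio_pfn(folios[i]) +
				      (offs[i] >> PAGE_SHIFT);
		unsigned long npages = lens[i] >> PAGE_SHIFT;

		for (j = 0; j < npages; j++)
			pfns[idx++] = first + j;
	}

	/* Rejected today: every pfn here is pfn_valid(). */
	vaddr = vmap_pfn(pfns, idx, PAGE_KERNEL);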