mm/vmalloc.c | 2 ++ 1 file changed, 2 insertions(+)
vmap() takes struct page *pages as one of arguments, and user may provide
an invalid pointer which would lead to DABT at address translation later.
Currently, kernel checks the pages against NULL. In my case, however, the
address was not NULL, and was big enough so that the hardware generated
Address Size Abort on arm64.
Interestingly, this abort happens even if copy_from_kernel_nofault() is
used, which is quite inconvenient for debugging purposes.
This patch adds a pfn_valid() check into vmap() path, so that invalid
mapping will not be created.
RFC: https://lkml.org/lkml/2022/1/18/815
v1: use pfn_valid() instead of adding an arch-specific
arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
mm/vmalloc.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d2a00ad4e1dd..a4134ee56b10 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
return -EBUSY;
if (WARN_ON(!page))
return -ENOMEM;
+ if (WARN_ON(!pfn_valid(page_to_pfn(page))))
+ return -EINVAL;
set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
(*nr)++;
} while (pte++, addr += PAGE_SIZE, addr != end);
--
2.30.2
On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote: > vmap() takes struct page *pages as one of arguments, and user may provide > an invalid pointer which would lead to DABT at address translation later. Could we spell out 'DABT'? Presumably that's an ARM-specific thing. Just like we don't say #PF for Intel page faults, I think this is probably a 'data abort'? > Currently, kernel checks the pages against NULL. In my case, however, the > address was not NULL, and was big enough so that the hardware generated > Address Size Abort on arm64. > > Interestingly, this abort happens even if copy_from_kernel_nofault() is > used, which is quite inconvenient for debugging purposes. > > This patch adds a pfn_valid() check into vmap() path, so that invalid > mapping will not be created. > > RFC: https://lkml.org/lkml/2022/1/18/815 > v1: use pfn_valid() instead of adding an arch-specific > arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint. > > Signed-off-by: Yury Norov <yury.norov@gmail.com> Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> > --- > mm/vmalloc.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index d2a00ad4e1dd..a4134ee56b10 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, > return -EBUSY; > if (WARN_ON(!page)) > return -ENOMEM; > + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) > + return -EINVAL; > set_pte_at(&init_mm, addr, pte, mk_pte(page, prot)); > (*nr)++; > } while (pte++, addr += PAGE_SIZE, addr != end); > -- > 2.30.2 >
On 1/19/22 6:21 AM, Matthew Wilcox wrote: > On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote: >> vmap() takes struct page *pages as one of arguments, and user may provide >> an invalid pointer which would lead to DABT at address translation later. > > Could we spell out 'DABT'? Presumably that's an ARM-specific thing. > Just like we don't say #PF for Intel page faults, I think this is > probably a 'data abort'? Right, it is data abort. > >> Currently, kernel checks the pages against NULL. In my case, however, the >> address was not NULL, and was big enough so that the hardware generated >> Address Size Abort on arm64. Could you please provide the actual abort stack here with details. >> >> Interestingly, this abort happens even if copy_from_kernel_nofault() is >> used, which is quite inconvenient for debugging purposes. >> >> This patch adds a pfn_valid() check into vmap() path, so that invalid >> mapping will not be created. >> >> RFC: https://lkml.org/lkml/2022/1/18/815 >> v1: use pfn_valid() instead of adding an arch-specific >> arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint. This should be after the '---' below. >> >> Signed-off-by: Yury Norov <yury.norov@gmail.com> > > Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> > >> --- >> mm/vmalloc.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/mm/vmalloc.c b/mm/vmalloc.c >> index d2a00ad4e1dd..a4134ee56b10 100644 >> --- a/mm/vmalloc.c >> +++ b/mm/vmalloc.c >> @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, >> return -EBUSY; >> if (WARN_ON(!page)) >> return -ENOMEM; >> + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) >> + return -EINVAL; >> set_pte_at(&init_mm, addr, pte, mk_pte(page, prot)); >> (*nr)++; >> } while (pte++, addr += PAGE_SIZE, addr != end); >> -- >> 2.30.2 >> Why should not this just scan over the entire user provided struct page array and make sure that all pages there in are valid via above method, but in vmap() itself before calling vmap_pages_range(). Because seems like a single invalid page detected in vmap_pages_pte_range() will anyways abort the entire vmap(). This will also enable us to drop the existing NULL check above.
On Tue, Jan 18, 2022 at 10:17 PM Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
>
>
> On 1/19/22 6:21 AM, Matthew Wilcox wrote:
> > On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote:
> >> vmap() takes struct page *pages as one of arguments, and user may provide
> >> an invalid pointer which would lead to DABT at address translation later.
> >
> > Could we spell out 'DABT'? Presumably that's an ARM-specific thing.
> > Just like we don't say #PF for Intel page faults, I think this is
> > probably a 'data abort'?
>
> Right, it is data abort.
>
> >
> >> Currently, kernel checks the pages against NULL. In my case, however, the
> >> address was not NULL, and was big enough so that the hardware generated
> >> Address Size Abort on arm64.
>
> Could you please provide the actual abort stack here with details.
[ 665.484101] Unhandled fault at 0xffff8000252cd000
[ 665.488807] Mem abort info:
[ 665.491617] ESR = 0x96000043
[ 665.494675] EC = 0x25: DABT (current EL), IL = 32 bits
[ 665.499985] SET = 0, FnV = 0
[ 665.503039] EA = 0, S1PTW = 0
[ 665.506167] Data abort info:
[ 665.509047] ISV = 0, ISS = 0x00000043
[ 665.512882] CM = 0, WnR = 1
[ 665.515851] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000818cb000
[ 665.522550] [ffff8000252cd000] pgd=000000affcfff003,
pud=000000affcffe003, pmd=0000008fad8c3003, pte=00688000a5217713
[ 665.533160] Internal error: level 3 address size fault: 96000043 [#1] SMP
[ 665.539936] Modules linked in: [...]
[ 665.616212] CPU: 178 PID: 13199 Comm: test Tainted: P OE
5.4.0-84-generic #94~18.04.1-Ubuntu
[ 665.626806] Hardware name: HPE Apollo 70 /C01_APACHE_MB
, BIOS L50_5.13_1.0.6 07/10/2018
[ 665.636618] pstate: 80400009 (Nzcv daif +PAN -UAO)
[ 665.641407] pc : __memset+0x38/0x188
[ 665.645146] lr : test+0xcc/0x3f8
[ 665.650184] sp : ffff8000359bb840
[ 665.653486] x29: ffff8000359bb840 x28: 0000000000000000
[ 665.658785] x27: 0000000000000000 x26: 0000000000231000
[ 665.664083] x25: ffff00ae660f6110 x24: ffff00ae668cb800
[ 665.669382] x23: 0000000000000001 x22: ffff00af533e5000
[ 665.674680] x21: 0000000000001000 x20: 0000000000000000
[ 665.679978] x19: ffff00ae66950000 x18: ffffffffffffffff
[ 665.685276] x17: 00000000588636a5 x16: 0000000000000013
[ 665.690574] x15: ffffffffffffffff x14: 000000000007ffff
[ 665.695872] x13: 0000000080000000 x12: 0140000000000000
[ 665.701170] x11: 0000000000000041 x10: ffff8000652cd000
[ 665.706468] x9 : ffff8000252cf000 x8 : ffff8000252cd000
[ 665.711767] x7 : 0303030303030303 x6 : 0000000000001000
[ 665.717065] x5 : ffff8000252cd000 x4 : 0000000000000000
[ 665.722363] x3 : ffff8000252cdfff x2 : 0000000000000001
[ 665.727661] x1 : 0000000000000003 x0 : ffff8000252cd000
[ 665.732960] Call trace:
[ 665.735395] __memset+0x38/0x188
[...]
TCR_EL1 is 34b5503510, and so TCR_EL1.IPS is 0b100. The pfn that caused
address size abort has bit#47 set, which is far above 16TB that is allowed by
ips == 100.
> >>
> >> Interestingly, this abort happens even if copy_from_kernel_nofault() is
> >> used, which is quite inconvenient for debugging purposes.
> >>
> >> This patch adds a pfn_valid() check into vmap() path, so that invalid
> >> mapping will not be created.
> >>
> >> RFC: https://lkml.org/lkml/2022/1/18/815
> >> v1: use pfn_valid() instead of adding an arch-specific
> >> arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint.
>
> This should be after the '---' below.
>
> >>
> >> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> >
> > Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> >
> >> ---
> >> mm/vmalloc.c | 2 ++
> >> 1 file changed, 2 insertions(+)
> >>
> >> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> >> index d2a00ad4e1dd..a4134ee56b10 100644
> >> --- a/mm/vmalloc.c
> >> +++ b/mm/vmalloc.c
> >> @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
> >> return -EBUSY;
> >> if (WARN_ON(!page))
> >> return -ENOMEM;
> >> + if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> >> + return -EINVAL;
> >> set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
> >> (*nr)++;
> >> } while (pte++, addr += PAGE_SIZE, addr != end);
> >> --
> >> 2.30.2
> >>
>
> Why should not this just scan over the entire user provided struct page
> array and make sure that all pages there in are valid via above method,
> but in vmap() itself before calling vmap_pages_range(). Because seems
> like a single invalid page detected in vmap_pages_pte_range() will
> anyways abort the entire vmap(). This will also enable us to drop the
> existing NULL check above.
I can do this, but why is it any better than the current approach?
Thanks,
Yury
On 1/19/22 10:52 PM, Yury Norov wrote: >> Why should not this just scan over the entire user provided struct page >> array and make sure that all pages there in are valid via above method, >> but in vmap() itself before calling vmap_pages_range(). Because seems >> like a single invalid page detected in vmap_pages_pte_range() will >> anyways abort the entire vmap(). This will also enable us to drop the >> existing NULL check above. > > I can do this, but why is it any better than the current approach? Because it will just return on the first instance where the valid page check fails, saving us some CPU cycles and an incomplete mapping ?
On Wed, Jan 19, 2022 at 7:37 PM Anshuman Khandual <anshuman.khandual@arm.com> wrote: > > > > On 1/19/22 10:52 PM, Yury Norov wrote: > >> Why should not this just scan over the entire user provided struct page > >> array and make sure that all pages there in are valid via above method, > >> but in vmap() itself before calling vmap_pages_range(). Because seems > >> like a single invalid page detected in vmap_pages_pte_range() will > >> anyways abort the entire vmap(). This will also enable us to drop the > >> existing NULL check above. > > > > I can do this, but why is it any better than the current approach? > > Because it will just return on the first instance where the valid page > check fails, saving us some CPU cycles and an incomplete mapping ? This should normally never happen, that's why warn_on() is there. If it happens, there is a serious problem, and the code must be fixed. So, no CPU cycles saving in real life.
Hi, I replied ot the original RFC before spotting this; duplicating those comments here because I think they apply regardless of the mechanism used to work around this. On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote: > vmap() takes struct page *pages as one of arguments, and user may provide > an invalid pointer which would lead to DABT at address translation later. > > Currently, kernel checks the pages against NULL. In my case, however, the > address was not NULL, and was big enough so that the hardware generated > Address Size Abort on arm64. Can you give an example of when this might happen? It sounds like you're actually hitting this, so a backtrace would be nice. I'm a bit confused as to when why we'd try to vmap() pages that we didn't have a legitimate struct page for -- where did these addresses come from? It sounds like this is going wrong at a higher level, and we're passing entirely bogus struct page pointers around. This seems like the sort of thing DEBUG_VIRTUAL or similar should check when we initially generate the struct page pointer. > Interestingly, this abort happens even if copy_from_kernel_nofault() is > used, which is quite inconvenient for debugging purposes. I can go take a look at this, but TBH we never expect to take an address size fault to begin with, so this is arguably correct -- it's an internal consistency problem. > This patch adds a pfn_valid() check into vmap() path, so that invalid > mapping will not be created. > > RFC: https://lkml.org/lkml/2022/1/18/815 > v1: use pfn_valid() instead of adding an arch-specific > arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint. > > Signed-off-by: Yury Norov <yury.norov@gmail.com> > --- > mm/vmalloc.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index d2a00ad4e1dd..a4134ee56b10 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, > return -EBUSY; > if (WARN_ON(!page)) > return -ENOMEM; > + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) > + return -EINVAL; My fear here is that for this to fire, we've already passed a bogus struct page pointer around the intermediate infrastructure, and any of that might try to use it in unsafe ways (in future even if we don't use it today). I think the fundamental issue here is that we generate a bogus struct page pointer at all, and knowing where that came from would help to fix that. Thanks, Mark. > set_pte_at(&init_mm, addr, pte, mk_pte(page, prot)); > (*nr)++; > } while (pte++, addr += PAGE_SIZE, addr != end); > -- > 2.30.2 >
On Wed, Jan 19, 2022 at 3:17 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> Hi,
>
> I replied ot the original RFC before spotting this; duplicating those comments
> here because I think they apply regardless of the mechanism used to work around
> this.
>
> On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote:
> > vmap() takes struct page *pages as one of arguments, and user may provide
> > an invalid pointer which would lead to DABT at address translation later.
> >
> > Currently, kernel checks the pages against NULL. In my case, however, the
> > address was not NULL, and was big enough so that the hardware generated
> > Address Size Abort on arm64.
>
> Can you give an example of when this might happen? It sounds like you're
> actually hitting this, so a backtrace would be nice.
>
> I'm a bit confused as to when why we'd try to vmap() pages that we
> didn't have a legitimate struct page for -- where did these addresses
> come from?
>
> It sounds like this is going wrong at a higher level, and we're passing
> entirely bogus struct page pointers around. This seems like the sort of
> thing DEBUG_VIRTUAL or similar should check when we initially generate
> the struct page pointer.
Hi Mark,
This is an out-of-tree code that does:
vaddr1 = dma_alloc_coherent()
page = virt_to_page() // Wrong here
...
vaddr2 = vmap(page)
memset(vaddr2) // Fault here
virt_to_page() returns a wrong pointer if vaddr1 is not a linear kernel address.
The problem is that vmap() populates pte with bad pfn successfully, and it's
much harder to debug at memory access time.
> > Interestingly, this abort happens even if copy_from_kernel_nofault() is
> > used, which is quite inconvenient for debugging purposes.
>
> I can go take a look at this, but TBH we never expect to take an address size
> fault to begin with, so this is arguably correct -- it's an internal
> consistency problem.
>
> > This patch adds a pfn_valid() check into vmap() path, so that invalid
> > mapping will not be created.
> >
> > RFC: https://lkml.org/lkml/2022/1/18/815
> > v1: use pfn_valid() instead of adding an arch-specific
> > arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint.
> >
> > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > ---
> > mm/vmalloc.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index d2a00ad4e1dd..a4134ee56b10 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
> > return -EBUSY;
> > if (WARN_ON(!page))
> > return -ENOMEM;
> > + if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> > + return -EINVAL;
>
> My fear here is that for this to fire, we've already passed a bogus struct page
> pointer around the intermediate infrastructure, and any of that might try to
> use it in unsafe ways (in future even if we don't use it today).
>
> I think the fundamental issue here is that we generate a bogus struct page
> pointer at all, and knowing where that came from would help to fix that.
You're right. That's why WARN_ON() is used for the page == null in the code
above, I believe, - to let client code know that something goes wrong, and it's
not a regular ENOMEM situation.
Thanks,
Yury
> Thanks,
> Mark.
>
> > set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
> > (*nr)++;
> > } while (pte++, addr += PAGE_SIZE, addr != end);
> > --
> > 2.30.2
> >
On Wed, Jan 19, 2022 at 09:00:23AM -0800, Yury Norov wrote: > On Wed, Jan 19, 2022 at 3:17 AM Mark Rutland <mark.rutland@arm.com> wrote: > > > > Hi, > > > > I replied ot the original RFC before spotting this; duplicating those comments > > here because I think they apply regardless of the mechanism used to work around > > this. > > > > On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote: > > > vmap() takes struct page *pages as one of arguments, and user may provide > > > an invalid pointer which would lead to DABT at address translation later. > > > > > > Currently, kernel checks the pages against NULL. In my case, however, the > > > address was not NULL, and was big enough so that the hardware generated > > > Address Size Abort on arm64. > > > > Can you give an example of when this might happen? It sounds like you're > > actually hitting this, so a backtrace would be nice. > > > > I'm a bit confused as to when why we'd try to vmap() pages that we > > didn't have a legitimate struct page for -- where did these addresses > > come from? > > > > It sounds like this is going wrong at a higher level, and we're passing > > entirely bogus struct page pointers around. This seems like the sort of > > thing DEBUG_VIRTUAL or similar should check when we initially generate > > the struct page pointer. > > Hi Mark, > > This is an out-of-tree code that does: > > vaddr1 = dma_alloc_coherent() > page = virt_to_page() // Wrong here > ... > vaddr2 = vmap(page) > memset(vaddr2) // Fault here > > virt_to_page() returns a wrong pointer if vaddr1 is not a linear kernel address. > The problem is that vmap() populates pte with bad pfn successfully, and it's > much harder to debug at memory access time. Ah, ok. FWIW, that case should be caught by DEBUG_VIRTUAL when that's enabled. It would be nice to put that within the commit message, since that clearly indicates this is about catching a mistake in buggy code, and indicates where to go looking next. > > > Interestingly, this abort happens even if copy_from_kernel_nofault() is > > > used, which is quite inconvenient for debugging purposes. > > > > I can go take a look at this, but TBH we never expect to take an address size > > fault to begin with, so this is arguably correct -- it's an internal > > consistency problem. > > > > > This patch adds a pfn_valid() check into vmap() path, so that invalid > > > mapping will not be created. > > > > > > RFC: https://lkml.org/lkml/2022/1/18/815 > > > v1: use pfn_valid() instead of adding an arch-specific > > > arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint. > > > > > > Signed-off-by: Yury Norov <yury.norov@gmail.com> > > > --- > > > mm/vmalloc.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > > > index d2a00ad4e1dd..a4134ee56b10 100644 > > > --- a/mm/vmalloc.c > > > +++ b/mm/vmalloc.c > > > @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, > > > return -EBUSY; > > > if (WARN_ON(!page)) > > > return -ENOMEM; > > > + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) > > > + return -EINVAL; > > > > My fear here is that for this to fire, we've already passed a bogus struct page > > pointer around the intermediate infrastructure, and any of that might try to > > use it in unsafe ways (in future even if we don't use it today). > > > > I think the fundamental issue here is that we generate a bogus struct page > > pointer at all, and knowing where that came from would help to fix that. > > You're right. That's why WARN_ON() is used for the page == null in the code > above, I believe, - to let client code know that something goes wrong, and it's > not a regular ENOMEM situation. Sure; with all the above in mind I think it's fine for this to be best-effort (e.g. as Robin noted page_to_pfn() might consume parts of the struct page before this), since either way that will indicate roughly where the problem is. As above, it would just be nice for the commit message to be a little more explicit as to that. Thanks, Mark.
On 2022-01-18 23:52, Yury Norov wrote: > vmap() takes struct page *pages as one of arguments, and user may provide > an invalid pointer which would lead to DABT at address translation later. > > Currently, kernel checks the pages against NULL. In my case, however, the > address was not NULL, and was big enough so that the hardware generated > Address Size Abort on arm64. > > Interestingly, this abort happens even if copy_from_kernel_nofault() is > used, which is quite inconvenient for debugging purposes. > > This patch adds a pfn_valid() check into vmap() path, so that invalid > mapping will not be created. > > RFC: https://lkml.org/lkml/2022/1/18/815 > v1: use pfn_valid() instead of adding an arch-specific > arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint. > > Signed-off-by: Yury Norov <yury.norov@gmail.com> > --- > mm/vmalloc.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index d2a00ad4e1dd..a4134ee56b10 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, > return -EBUSY; > if (WARN_ON(!page)) > return -ENOMEM; > + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) Is it page_to_pfn() guaranteed to work without blowing up if page is invalid in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's true... Robin. > + return -EINVAL; > set_pte_at(&init_mm, addr, pte, mk_pte(page, prot)); > (*nr)++; > } while (pte++, addr += PAGE_SIZE, addr != end);
On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote: > > + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) > > Is it page_to_pfn() guaranteed to work without blowing up if page is invalid > in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's > true... Even if it does blow up, at least it's blowing up here where someone can start to debug it, rather than blowing up on first access, where we no longer have the invlid struct page pointer. I don't think we have a 'page_valid' function which will tell us whether a random pointer is actually a struct page or not.
On Wed, Jan 19, 2022 at 04:27:32PM +0000, Matthew Wilcox wrote:
> On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote:
> > > + if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> >
> > Is it page_to_pfn() guaranteed to work without blowing up if page is invalid
> > in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's
> > true...
>
> Even if it does blow up, at least it's blowing up here where someone
> can start to debug it, rather than blowing up on first access, where
> we no longer have the invlid struct page pointer.
>
> I don't think we have a 'page_valid' function which will tell us whether
> a random pointer is actually a struct page or not.
Isn't it supposed to be:
if (!pfn_valid(pfn)) {
handle invalid pfn;
}
page = pfn_to_page(pfn);
Anything else - even trying to convert an invalid page back to a pfn,
could well be unreliable (sparsemem or discontigmem).
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Wed, Jan 19, 2022 at 05:54:15PM +0000, Russell King (Oracle) wrote:
> On Wed, Jan 19, 2022 at 04:27:32PM +0000, Matthew Wilcox wrote:
> > On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote:
> > > > + if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> > >
> > > Is it page_to_pfn() guaranteed to work without blowing up if page is invalid
> > > in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's
> > > true...
> >
> > Even if it does blow up, at least it's blowing up here where someone
> > can start to debug it, rather than blowing up on first access, where
> > we no longer have the invlid struct page pointer.
> >
> > I don't think we have a 'page_valid' function which will tell us whether
> > a random pointer is actually a struct page or not.
>
> Isn't it supposed to be:
>
> if (!pfn_valid(pfn)) {
> handle invalid pfn;
> }
>
> page = pfn_to_page(pfn);
>
> Anything else - even trying to convert an invalid page back to a pfn,
> could well be unreliable (sparsemem or discontigmem).
This function is passed an array of pages. We have no way of doing
what you propose.
On 2022-01-19 16:27, Matthew Wilcox wrote: > On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote: >>> + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) >> >> Is it page_to_pfn() guaranteed to work without blowing up if page is invalid >> in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's >> true... > > Even if it does blow up, at least it's blowing up here where someone > can start to debug it, rather than blowing up on first access, where > we no longer have the invlid struct page pointer. But if that were the case then we'd blow up on the following line when mk_pte(page, prot) ends up calling it same anyway. Indeed it's arguably the best-case scenario since it would also blow up in page_address() if we hit the vmap_range loop rather than going down the vmap_small_pages_range_noflush() path. Furthermore, if you *are* lucky enough to take a fault upon accessing a bogus mapping, then surely a phys_to_page() calculation on whatever ended up in the PTE should get you back the original "pointer" anyway, shouldn't it? Sure it's a bit more work to extract the caller out of the VMA if necessary, but hey, that's debugging! Maybe vmap() failing means you then pass the offending nonsense to __free_pages() and that ruins your day anyway... The implications are infinitely worse if you've mapped something that did happen to be a valid page, but wasn't the *right* page, such that you don't fault but corrupt things or trigger a fatal system error instead. I'd echo Mark's point that if we can't trust a page pointer to be correct then we've already lost. In general the proportion of "wrong" pointers one can viably attempt to detect is so small that it's rarely ever worth trying, and the cases that are so obviously wrong tend to show up well enough in normal operation (although NULL-safety is of course a bit of a special case when it can simplify API usage). > I don't think we have a 'page_valid' function which will tell us whether > a random pointer is actually a struct page or not. Indeed, my impression is that the only legitimate way to get hold of a page pointer without assumed provenance is via pfn_to_page(), which is where pfn_valid() comes in. Thus pfn_valid(page_to_pfn()) really *should* be a tautology. I guess in this instance it would be technically feasible to implement a function which checks "is this a correctly-aligned pointer within memmap bounds", but IMO that would be a massive step in the wrong direction. We're developers; sometimes we introduce bugs when developing code, and it takes effort to debug them, but that still doesn't make it a good idea to optimise normal code paths for the expectation of writing new catastrophically-bad bugs. Plus logically if such a "page_valid()" check could be justified at all then that should rightfully lead to a churn-fest of adding it to pretty much every interface which accepts page pointers - one half of vmap() is hardly special. FWIW, If anything I reckon a DEBUG_VM option that made checks within page_to_x/x_to_page as appropriate would help Yury's issue just as much, while having the potential to be considerably more useful in general. Cheers, Robin.
On Wed, Jan 19, 2022 at 06:43:10PM +0000, Robin Murphy wrote:
> Indeed, my impression is that the only legitimate way to get hold of a page
> pointer without assumed provenance is via pfn_to_page(), which is where
> pfn_valid() comes in. Thus pfn_valid(page_to_pfn()) really *should* be a
> tautology.
That can only be true if pfn == page_to_pfn(pfn_to_page(pfn)) for all
values of pfn.
Given how pfn_to_page() is defined in the sparsemem case:
#define __pfn_to_page(pfn) \
({ unsigned long __pfn = (pfn); \
struct mem_section *__sec = __pfn_to_section(__pfn); \
__section_mem_map_addr(__sec) + __pfn; \
})
#define page_to_pfn __page_to_pfn
that isn't the case, especially when looking at page_to_pfn():
#define __page_to_pfn(pg) \
({ const struct page *__pg = (pg); \
int __sec = page_to_section(__pg); \
(unsigned long)(__pg - __section_mem_map_addr(__nr_to_section(__sec))); \
})
Where:
static inline unsigned long page_to_section(const struct page *page)
{
return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK;
}
So if page_to_section() returns something that is, e.g. zero for an
invalid page in a non-zero section, you're not going to end up with
the right pfn from page_to_pfn().
As I've said now a couple of times, trying to determine of a struct
page pointer is valid is the wrong question to be asking.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On 2022-01-19 19:12, Russell King (Oracle) wrote:
> On Wed, Jan 19, 2022 at 06:43:10PM +0000, Robin Murphy wrote:
>> Indeed, my impression is that the only legitimate way to get hold of a page
>> pointer without assumed provenance is via pfn_to_page(), which is where
>> pfn_valid() comes in. Thus pfn_valid(page_to_pfn()) really *should* be a
>> tautology.
>
> That can only be true if pfn == page_to_pfn(pfn_to_page(pfn)) for all
> values of pfn.
>
> Given how pfn_to_page() is defined in the sparsemem case:
>
> #define __pfn_to_page(pfn) \
> ({ unsigned long __pfn = (pfn); \
> struct mem_section *__sec = __pfn_to_section(__pfn); \
> __section_mem_map_addr(__sec) + __pfn; \
> })
> #define page_to_pfn __page_to_pfn
>
> that isn't the case, especially when looking at page_to_pfn():
>
> #define __page_to_pfn(pg) \
> ({ const struct page *__pg = (pg); \
> int __sec = page_to_section(__pg); \
> (unsigned long)(__pg - __section_mem_map_addr(__nr_to_section(__sec))); \
> })
>
> Where:
>
> static inline unsigned long page_to_section(const struct page *page)
> {
> return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK;
> }
>
> So if page_to_section() returns something that is, e.g. zero for an
> invalid page in a non-zero section, you're not going to end up with
> the right pfn from page_to_pfn().
Right, I emphasised "should" in an attempt to imply "in the absence of
serious bugs that have further-reaching consequences anyway".
> As I've said now a couple of times, trying to determine of a struct
> page pointer is valid is the wrong question to be asking.
And doing so in one single place, on the justification of avoiding an
incredibly niche symptom, is even more so. Not to mention that an
address size fault is one of the best possible outcomes anyway, vs. the
untold damage that may stem from accesses actually going through to
random parts of the physical memory map.
Robin.
On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote: > Is it page_to_pfn() guaranteed to work without blowing up if page is invalid > in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's > true... Something that all the ARM people weighing in on this don't understand is that basically nobody uses SPARSEMEM without SPARSEMEM_VMEMMAP. So all this complicated code to do page_to_pfn() is never tested. Real users all do a simple subtraction and so the simple pfn_valid() works fine.
© 2016 - 2026 Red Hat, Inc.