[PATCH] vmap(): don't allow invalid pages

Yury Norov posted 1 patch 4 years, 5 months ago
There is a newer version of this series
mm/vmalloc.c | 2 ++
1 file changed, 2 insertions(+)
[PATCH] vmap(): don't allow invalid pages
Posted by Yury Norov 4 years, 5 months ago
vmap() takes struct page *pages as one of arguments, and user may provide
an invalid pointer which would lead to DABT at address translation later.

Currently, kernel checks the pages against NULL. In my case, however, the
address was not NULL, and was big enough so that the hardware generated
Address Size Abort on arm64.

Interestingly, this abort happens even if copy_from_kernel_nofault() is
used, which is quite inconvenient for debugging purposes. 

This patch adds a pfn_valid() check into vmap() path, so that invalid
mapping will not be created.

RFC: https://lkml.org/lkml/2022/1/18/815
v1: use pfn_valid() instead of adding an arch-specific
    arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 mm/vmalloc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d2a00ad4e1dd..a4134ee56b10 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
 			return -EBUSY;
 		if (WARN_ON(!page))
 			return -ENOMEM;
+		if (WARN_ON(!pfn_valid(page_to_pfn(page))))
+			return -EINVAL;
 		set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
 		(*nr)++;
 	} while (pte++, addr += PAGE_SIZE, addr != end);
-- 
2.30.2

Re: [PATCH] vmap(): don't allow invalid pages
Posted by Matthew Wilcox 4 years, 5 months ago
On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote:
> vmap() takes struct page *pages as one of arguments, and user may provide
> an invalid pointer which would lead to DABT at address translation later.

Could we spell out 'DABT'?  Presumably that's an ARM-specific thing.
Just like we don't say #PF for Intel page faults, I think this is
probably a 'data abort'?

> Currently, kernel checks the pages against NULL. In my case, however, the
> address was not NULL, and was big enough so that the hardware generated
> Address Size Abort on arm64.
> 
> Interestingly, this abort happens even if copy_from_kernel_nofault() is
> used, which is quite inconvenient for debugging purposes. 
> 
> This patch adds a pfn_valid() check into vmap() path, so that invalid
> mapping will not be created.
> 
> RFC: https://lkml.org/lkml/2022/1/18/815
> v1: use pfn_valid() instead of adding an arch-specific
>     arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>

Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>

> ---
>  mm/vmalloc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index d2a00ad4e1dd..a4134ee56b10 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
>  			return -EBUSY;
>  		if (WARN_ON(!page))
>  			return -ENOMEM;
> +		if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> +			return -EINVAL;
>  		set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
>  		(*nr)++;
>  	} while (pte++, addr += PAGE_SIZE, addr != end);
> -- 
> 2.30.2
> 
Re: [PATCH] vmap(): don't allow invalid pages
Posted by Anshuman Khandual 4 years, 5 months ago

On 1/19/22 6:21 AM, Matthew Wilcox wrote:
> On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote:
>> vmap() takes struct page *pages as one of arguments, and user may provide
>> an invalid pointer which would lead to DABT at address translation later.
> 
> Could we spell out 'DABT'?  Presumably that's an ARM-specific thing.
> Just like we don't say #PF for Intel page faults, I think this is
> probably a 'data abort'?

Right, it is data abort.

> 
>> Currently, kernel checks the pages against NULL. In my case, however, the
>> address was not NULL, and was big enough so that the hardware generated
>> Address Size Abort on arm64.

Could you please provide the actual abort stack here with details.

>>
>> Interestingly, this abort happens even if copy_from_kernel_nofault() is
>> used, which is quite inconvenient for debugging purposes. 
>>
>> This patch adds a pfn_valid() check into vmap() path, so that invalid
>> mapping will not be created.
>>
>> RFC: https://lkml.org/lkml/2022/1/18/815
>> v1: use pfn_valid() instead of adding an arch-specific
>>     arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint.

This should be after the '---' below.

>>
>> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> 
> Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> 
>> ---
>>  mm/vmalloc.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index d2a00ad4e1dd..a4134ee56b10 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
>>  			return -EBUSY;
>>  		if (WARN_ON(!page))
>>  			return -ENOMEM;
>> +		if (WARN_ON(!pfn_valid(page_to_pfn(page))))
>> +			return -EINVAL;
>>  		set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
>>  		(*nr)++;
>>  	} while (pte++, addr += PAGE_SIZE, addr != end);
>> -- 
>> 2.30.2
>>

Why should not this just scan over the entire user provided struct page
array and make sure that all pages there in are valid via above method,
but in vmap() itself before calling vmap_pages_range(). Because seems
like a single invalid page detected in vmap_pages_pte_range() will
anyways abort the entire vmap(). This will also enable us to drop the
existing NULL check above.
Re: [PATCH] vmap(): don't allow invalid pages
Posted by Yury Norov 4 years, 5 months ago
On Tue, Jan 18, 2022 at 10:17 PM Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
>
>
> On 1/19/22 6:21 AM, Matthew Wilcox wrote:
> > On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote:
> >> vmap() takes struct page *pages as one of arguments, and user may provide
> >> an invalid pointer which would lead to DABT at address translation later.
> >
> > Could we spell out 'DABT'?  Presumably that's an ARM-specific thing.
> > Just like we don't say #PF for Intel page faults, I think this is
> > probably a 'data abort'?
>
> Right, it is data abort.
>
> >
> >> Currently, kernel checks the pages against NULL. In my case, however, the
> >> address was not NULL, and was big enough so that the hardware generated
> >> Address Size Abort on arm64.
>
> Could you please provide the actual abort stack here with details.

[  665.484101] Unhandled fault at 0xffff8000252cd000
[  665.488807] Mem abort info:
[  665.491617]   ESR = 0x96000043
[  665.494675]   EC = 0x25: DABT (current EL), IL = 32 bits
[  665.499985]   SET = 0, FnV = 0
[  665.503039]   EA = 0, S1PTW = 0
[  665.506167] Data abort info:
[  665.509047]   ISV = 0, ISS = 0x00000043
[  665.512882]   CM = 0, WnR = 1
[  665.515851] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000818cb000
[  665.522550] [ffff8000252cd000] pgd=000000affcfff003,
pud=000000affcffe003, pmd=0000008fad8c3003, pte=00688000a5217713
[  665.533160] Internal error: level 3 address size fault: 96000043 [#1] SMP
[  665.539936] Modules linked in: [...]
[  665.616212] CPU: 178 PID: 13199 Comm: test Tainted: P           OE
   5.4.0-84-generic #94~18.04.1-Ubuntu
[  665.626806] Hardware name: HPE Apollo 70             /C01_APACHE_MB
        , BIOS L50_5.13_1.0.6 07/10/2018
[  665.636618] pstate: 80400009 (Nzcv daif +PAN -UAO)
[  665.641407] pc : __memset+0x38/0x188
[  665.645146] lr : test+0xcc/0x3f8
[  665.650184] sp : ffff8000359bb840
[  665.653486] x29: ffff8000359bb840 x28: 0000000000000000
[  665.658785] x27: 0000000000000000 x26: 0000000000231000
[  665.664083] x25: ffff00ae660f6110 x24: ffff00ae668cb800
[  665.669382] x23: 0000000000000001 x22: ffff00af533e5000
[  665.674680] x21: 0000000000001000 x20: 0000000000000000
[  665.679978] x19: ffff00ae66950000 x18: ffffffffffffffff
[  665.685276] x17: 00000000588636a5 x16: 0000000000000013
[  665.690574] x15: ffffffffffffffff x14: 000000000007ffff
[  665.695872] x13: 0000000080000000 x12: 0140000000000000
[  665.701170] x11: 0000000000000041 x10: ffff8000652cd000
[  665.706468] x9 : ffff8000252cf000 x8 : ffff8000252cd000
[  665.711767] x7 : 0303030303030303 x6 : 0000000000001000
[  665.717065] x5 : ffff8000252cd000 x4 : 0000000000000000
[  665.722363] x3 : ffff8000252cdfff x2 : 0000000000000001
[  665.727661] x1 : 0000000000000003 x0 : ffff8000252cd000
[  665.732960] Call trace:
[  665.735395]  __memset+0x38/0x188
[...]

TCR_EL1 is 34b5503510, and so TCR_EL1.IPS is 0b100. The pfn that caused
address size abort has bit#47 set, which is far above 16TB that is allowed by
ips == 100.

> >>
> >> Interestingly, this abort happens even if copy_from_kernel_nofault() is
> >> used, which is quite inconvenient for debugging purposes.
> >>
> >> This patch adds a pfn_valid() check into vmap() path, so that invalid
> >> mapping will not be created.
> >>
> >> RFC: https://lkml.org/lkml/2022/1/18/815
> >> v1: use pfn_valid() instead of adding an arch-specific
> >>     arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint.
>
> This should be after the '---' below.
>
> >>
> >> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> >
> > Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> >
> >> ---
> >>  mm/vmalloc.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> >> index d2a00ad4e1dd..a4134ee56b10 100644
> >> --- a/mm/vmalloc.c
> >> +++ b/mm/vmalloc.c
> >> @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
> >>                      return -EBUSY;
> >>              if (WARN_ON(!page))
> >>                      return -ENOMEM;
> >> +            if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> >> +                    return -EINVAL;
> >>              set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
> >>              (*nr)++;
> >>      } while (pte++, addr += PAGE_SIZE, addr != end);
> >> --
> >> 2.30.2
> >>
>
> Why should not this just scan over the entire user provided struct page
> array and make sure that all pages there in are valid via above method,
> but in vmap() itself before calling vmap_pages_range(). Because seems
> like a single invalid page detected in vmap_pages_pte_range() will
> anyways abort the entire vmap(). This will also enable us to drop the
> existing NULL check above.

I can do this, but why is it any better than the current approach?

Thanks,
Yury
Re: [PATCH] vmap(): don't allow invalid pages
Posted by Anshuman Khandual 4 years, 5 months ago

On 1/19/22 10:52 PM, Yury Norov wrote:
>> Why should not this just scan over the entire user provided struct page
>> array and make sure that all pages there in are valid via above method,
>> but in vmap() itself before calling vmap_pages_range(). Because seems
>> like a single invalid page detected in vmap_pages_pte_range() will
>> anyways abort the entire vmap(). This will also enable us to drop the
>> existing NULL check above.
>
> I can do this, but why is it any better than the current approach?

Because it will just return on the first instance where the valid page
check fails, saving us some CPU cycles and an incomplete mapping ?
Re: [PATCH] vmap(): don't allow invalid pages
Posted by Yury Norov 4 years, 5 months ago
On Wed, Jan 19, 2022 at 7:37 PM Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
>
>
> On 1/19/22 10:52 PM, Yury Norov wrote:
> >> Why should not this just scan over the entire user provided struct page
> >> array and make sure that all pages there in are valid via above method,
> >> but in vmap() itself before calling vmap_pages_range(). Because seems
> >> like a single invalid page detected in vmap_pages_pte_range() will
> >> anyways abort the entire vmap(). This will also enable us to drop the
> >> existing NULL check above.
> >
> > I can do this, but why is it any better than the current approach?
>
> Because it will just return on the first instance where the valid page
> check fails, saving us some CPU cycles and an incomplete mapping ?

This should normally never happen, that's why warn_on() is there. If it
happens, there is a serious problem, and the code must be fixed. So,
no CPU cycles saving in real life.
Re: [PATCH] vmap(): don't allow invalid pages
Posted by Mark Rutland 4 years, 5 months ago
Hi,

I replied ot the original RFC before spotting this; duplicating those comments
here because I think they apply regardless of the mechanism used to work around
this.

On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote:
> vmap() takes struct page *pages as one of arguments, and user may provide
> an invalid pointer which would lead to DABT at address translation later. 
>
> Currently, kernel checks the pages against NULL. In my case, however, the
> address was not NULL, and was big enough so that the hardware generated
> Address Size Abort on arm64.

Can you give an example of when this might happen? It sounds like you're
actually hitting this, so a backtrace would be nice.

I'm a bit confused as to when why we'd try to vmap() pages that we
didn't have a legitimate struct page for -- where did these addresses
come from?

It sounds like this is going wrong at a higher level, and we're passing
entirely bogus struct page pointers around. This seems like the sort of
thing DEBUG_VIRTUAL or similar should check when we initially generate
the struct page pointer.

> Interestingly, this abort happens even if copy_from_kernel_nofault() is
> used, which is quite inconvenient for debugging purposes. 

I can go take a look at this, but TBH we never expect to take an address size
fault to begin with, so this is arguably correct -- it's an internal
consistency problem.

> This patch adds a pfn_valid() check into vmap() path, so that invalid
> mapping will not be created.
> 
> RFC: https://lkml.org/lkml/2022/1/18/815
> v1: use pfn_valid() instead of adding an arch-specific
>     arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  mm/vmalloc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index d2a00ad4e1dd..a4134ee56b10 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
>  			return -EBUSY;
>  		if (WARN_ON(!page))
>  			return -ENOMEM;
> +		if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> +			return -EINVAL;

My fear here is that for this to fire, we've already passed a bogus struct page
pointer around the intermediate infrastructure, and any of that might try to
use it in unsafe ways (in future even if we don't use it today).

I think the fundamental issue here is that we generate a bogus struct page
pointer at all, and knowing where that came from would help to fix that.

Thanks,
Mark.

>  		set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
>  		(*nr)++;
>  	} while (pte++, addr += PAGE_SIZE, addr != end);
> -- 
> 2.30.2
> 
Re: [PATCH] vmap(): don't allow invalid pages
Posted by Yury Norov 4 years, 5 months ago
On Wed, Jan 19, 2022 at 3:17 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> Hi,
>
> I replied ot the original RFC before spotting this; duplicating those comments
> here because I think they apply regardless of the mechanism used to work around
> this.
>
> On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote:
> > vmap() takes struct page *pages as one of arguments, and user may provide
> > an invalid pointer which would lead to DABT at address translation later.
> >
> > Currently, kernel checks the pages against NULL. In my case, however, the
> > address was not NULL, and was big enough so that the hardware generated
> > Address Size Abort on arm64.
>
> Can you give an example of when this might happen? It sounds like you're
> actually hitting this, so a backtrace would be nice.
>
> I'm a bit confused as to when why we'd try to vmap() pages that we
> didn't have a legitimate struct page for -- where did these addresses
> come from?
>
> It sounds like this is going wrong at a higher level, and we're passing
> entirely bogus struct page pointers around. This seems like the sort of
> thing DEBUG_VIRTUAL or similar should check when we initially generate
> the struct page pointer.

Hi Mark,

This is an out-of-tree code that does:

    vaddr1 = dma_alloc_coherent()
    page = virt_to_page() // Wrong here
    ...
    vaddr2 = vmap(page)
    memset(vaddr2) // Fault here

virt_to_page() returns a wrong pointer if vaddr1 is not a linear kernel address.
The problem is that vmap() populates pte with bad pfn successfully, and it's
much harder to debug at memory access time.

> > Interestingly, this abort happens even if copy_from_kernel_nofault() is
> > used, which is quite inconvenient for debugging purposes.
>
> I can go take a look at this, but TBH we never expect to take an address size
> fault to begin with, so this is arguably correct -- it's an internal
> consistency problem.
>
> > This patch adds a pfn_valid() check into vmap() path, so that invalid
> > mapping will not be created.
> >
> > RFC: https://lkml.org/lkml/2022/1/18/815
> > v1: use pfn_valid() instead of adding an arch-specific
> >     arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint.
> >
> > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > ---
> >  mm/vmalloc.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index d2a00ad4e1dd..a4134ee56b10 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
> >                       return -EBUSY;
> >               if (WARN_ON(!page))
> >                       return -ENOMEM;
> > +             if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> > +                     return -EINVAL;
>
> My fear here is that for this to fire, we've already passed a bogus struct page
> pointer around the intermediate infrastructure, and any of that might try to
> use it in unsafe ways (in future even if we don't use it today).
>
> I think the fundamental issue here is that we generate a bogus struct page
> pointer at all, and knowing where that came from would help to fix that.

You're right. That's why WARN_ON() is used for the page == null in the code
above, I believe, - to let client code know that something goes wrong, and it's
not a regular ENOMEM situation.

Thanks,
Yury

> Thanks,
> Mark.
>
> >               set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
> >               (*nr)++;
> >       } while (pte++, addr += PAGE_SIZE, addr != end);
> > --
> > 2.30.2
> >
Re: [PATCH] vmap(): don't allow invalid pages
Posted by Mark Rutland 4 years, 5 months ago
On Wed, Jan 19, 2022 at 09:00:23AM -0800, Yury Norov wrote:
> On Wed, Jan 19, 2022 at 3:17 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > Hi,
> >
> > I replied ot the original RFC before spotting this; duplicating those comments
> > here because I think they apply regardless of the mechanism used to work around
> > this.
> >
> > On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote:
> > > vmap() takes struct page *pages as one of arguments, and user may provide
> > > an invalid pointer which would lead to DABT at address translation later.
> > >
> > > Currently, kernel checks the pages against NULL. In my case, however, the
> > > address was not NULL, and was big enough so that the hardware generated
> > > Address Size Abort on arm64.
> >
> > Can you give an example of when this might happen? It sounds like you're
> > actually hitting this, so a backtrace would be nice.
> >
> > I'm a bit confused as to when why we'd try to vmap() pages that we
> > didn't have a legitimate struct page for -- where did these addresses
> > come from?
> >
> > It sounds like this is going wrong at a higher level, and we're passing
> > entirely bogus struct page pointers around. This seems like the sort of
> > thing DEBUG_VIRTUAL or similar should check when we initially generate
> > the struct page pointer.
> 
> Hi Mark,
> 
> This is an out-of-tree code that does:
> 
>     vaddr1 = dma_alloc_coherent()
>     page = virt_to_page() // Wrong here
>     ...
>     vaddr2 = vmap(page)
>     memset(vaddr2) // Fault here
> 
> virt_to_page() returns a wrong pointer if vaddr1 is not a linear kernel address.
> The problem is that vmap() populates pte with bad pfn successfully, and it's
> much harder to debug at memory access time.
 
Ah, ok. FWIW, that case should be caught by DEBUG_VIRTUAL when that's
enabled.

It would be nice to put that within the commit message, since that
clearly indicates this is about catching a mistake in buggy code, and
indicates where to go looking next.

> > > Interestingly, this abort happens even if copy_from_kernel_nofault() is
> > > used, which is quite inconvenient for debugging purposes.
> >
> > I can go take a look at this, but TBH we never expect to take an address size
> > fault to begin with, so this is arguably correct -- it's an internal
> > consistency problem.
> >
> > > This patch adds a pfn_valid() check into vmap() path, so that invalid
> > > mapping will not be created.
> > >
> > > RFC: https://lkml.org/lkml/2022/1/18/815
> > > v1: use pfn_valid() instead of adding an arch-specific
> > >     arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint.
> > >
> > > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > > ---
> > >  mm/vmalloc.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index d2a00ad4e1dd..a4134ee56b10 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
> > >                       return -EBUSY;
> > >               if (WARN_ON(!page))
> > >                       return -ENOMEM;
> > > +             if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> > > +                     return -EINVAL;
> >
> > My fear here is that for this to fire, we've already passed a bogus struct page
> > pointer around the intermediate infrastructure, and any of that might try to
> > use it in unsafe ways (in future even if we don't use it today).
> >
> > I think the fundamental issue here is that we generate a bogus struct page
> > pointer at all, and knowing where that came from would help to fix that.
> 
> You're right. That's why WARN_ON() is used for the page == null in the code
> above, I believe, - to let client code know that something goes wrong, and it's
> not a regular ENOMEM situation.

Sure; with all the above in mind I think it's fine for this to be
best-effort (e.g. as Robin noted page_to_pfn() might consume parts of
the struct page before this), since either way that will indicate
roughly where the problem is.

As above, it would just be nice for the commit message to be a little
more explicit as to that.

Thanks,
Mark.
Re: [PATCH] vmap(): don't allow invalid pages
Posted by Robin Murphy 4 years, 5 months ago
On 2022-01-18 23:52, Yury Norov wrote:
> vmap() takes struct page *pages as one of arguments, and user may provide
> an invalid pointer which would lead to DABT at address translation later.
> 
> Currently, kernel checks the pages against NULL. In my case, however, the
> address was not NULL, and was big enough so that the hardware generated
> Address Size Abort on arm64.
> 
> Interestingly, this abort happens even if copy_from_kernel_nofault() is
> used, which is quite inconvenient for debugging purposes.
> 
> This patch adds a pfn_valid() check into vmap() path, so that invalid
> mapping will not be created.
> 
> RFC: https://lkml.org/lkml/2022/1/18/815
> v1: use pfn_valid() instead of adding an arch-specific
>      arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>   mm/vmalloc.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index d2a00ad4e1dd..a4134ee56b10 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
>   			return -EBUSY;
>   		if (WARN_ON(!page))
>   			return -ENOMEM;
> +		if (WARN_ON(!pfn_valid(page_to_pfn(page))))

Is it page_to_pfn() guaranteed to work without blowing up if page is 
invalid in the first place? Looking at the CONFIG_SPARSEMEM case I'm not 
sure that's true...

Robin.

> +			return -EINVAL;
>   		set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
>   		(*nr)++;
>   	} while (pte++, addr += PAGE_SIZE, addr != end);
Re: [PATCH] vmap(): don't allow invalid pages
Posted by Matthew Wilcox 4 years, 5 months ago
On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote:
> > +		if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> 
> Is it page_to_pfn() guaranteed to work without blowing up if page is invalid
> in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's
> true...

Even if it does blow up, at least it's blowing up here where someone
can start to debug it, rather than blowing up on first access, where
we no longer have the invlid struct page pointer.

I don't think we have a 'page_valid' function which will tell us whether
a random pointer is actually a struct page or not.
Re: [PATCH] vmap(): don't allow invalid pages
Posted by Russell King (Oracle) 4 years, 5 months ago
On Wed, Jan 19, 2022 at 04:27:32PM +0000, Matthew Wilcox wrote:
> On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote:
> > > +		if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> > 
> > Is it page_to_pfn() guaranteed to work without blowing up if page is invalid
> > in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's
> > true...
> 
> Even if it does blow up, at least it's blowing up here where someone
> can start to debug it, rather than blowing up on first access, where
> we no longer have the invlid struct page pointer.
> 
> I don't think we have a 'page_valid' function which will tell us whether
> a random pointer is actually a struct page or not.

Isn't it supposed to be:

	if (!pfn_valid(pfn)) {
		handle invalid pfn;
	}

	page = pfn_to_page(pfn);

Anything else - even trying to convert an invalid page back to a pfn,
could well be unreliable (sparsemem or discontigmem). 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH] vmap(): don't allow invalid pages
Posted by Matthew Wilcox 4 years, 5 months ago
On Wed, Jan 19, 2022 at 05:54:15PM +0000, Russell King (Oracle) wrote:
> On Wed, Jan 19, 2022 at 04:27:32PM +0000, Matthew Wilcox wrote:
> > On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote:
> > > > +		if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> > > 
> > > Is it page_to_pfn() guaranteed to work without blowing up if page is invalid
> > > in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's
> > > true...
> > 
> > Even if it does blow up, at least it's blowing up here where someone
> > can start to debug it, rather than blowing up on first access, where
> > we no longer have the invlid struct page pointer.
> > 
> > I don't think we have a 'page_valid' function which will tell us whether
> > a random pointer is actually a struct page or not.
> 
> Isn't it supposed to be:
> 
> 	if (!pfn_valid(pfn)) {
> 		handle invalid pfn;
> 	}
> 
> 	page = pfn_to_page(pfn);
> 
> Anything else - even trying to convert an invalid page back to a pfn,
> could well be unreliable (sparsemem or discontigmem). 

This function is passed an array of pages.  We have no way of doing
what you propose.
Re: [PATCH] vmap(): don't allow invalid pages
Posted by Robin Murphy 4 years, 5 months ago
On 2022-01-19 16:27, Matthew Wilcox wrote:
> On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote:
>>> +		if (WARN_ON(!pfn_valid(page_to_pfn(page))))
>>
>> Is it page_to_pfn() guaranteed to work without blowing up if page is invalid
>> in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's
>> true...
> 
> Even if it does blow up, at least it's blowing up here where someone
> can start to debug it, rather than blowing up on first access, where
> we no longer have the invlid struct page pointer.

But if that were the case then we'd blow up on the following line when 
mk_pte(page, prot) ends up calling it same anyway. Indeed it's arguably 
the best-case scenario since it would also blow up in page_address() if 
we hit the vmap_range loop rather than going down the 
vmap_small_pages_range_noflush() path.

Furthermore, if you *are* lucky enough to take a fault upon accessing a 
bogus mapping, then surely a phys_to_page() calculation on whatever 
ended up in the PTE should get you back the original "pointer" anyway, 
shouldn't it? Sure it's a bit more work to extract the caller out of the 
VMA if necessary, but hey, that's debugging! Maybe vmap() failing means 
you then pass the offending nonsense to __free_pages() and that ruins 
your day anyway...

The implications are infinitely worse if you've mapped something that 
did happen to be a valid page, but wasn't the *right* page, such that 
you don't fault but corrupt things or trigger a fatal system error 
instead. I'd echo Mark's point that if we can't trust a page pointer to 
be correct then we've already lost. In general the proportion of "wrong" 
pointers one can viably attempt to detect is so small that it's rarely 
ever worth trying, and the cases that are so obviously wrong tend to 
show up well enough in normal operation (although NULL-safety is of 
course a bit of a special case when it can simplify API usage).

> I don't think we have a 'page_valid' function which will tell us whether
> a random pointer is actually a struct page or not.

Indeed, my impression is that the only legitimate way to get hold of a 
page pointer without assumed provenance is via pfn_to_page(), which is 
where pfn_valid() comes in. Thus pfn_valid(page_to_pfn()) really 
*should* be a tautology.

I guess in this instance it would be technically feasible to implement a 
function which checks "is this a correctly-aligned pointer within memmap 
bounds", but IMO that would be a massive step in the wrong direction. 
We're developers; sometimes we introduce bugs when developing code, and 
it takes effort to debug them, but that still doesn't make it a good 
idea to optimise normal code paths for the expectation of writing new 
catastrophically-bad bugs. Plus logically if such a "page_valid()" check 
could be justified at all then that should rightfully lead to a 
churn-fest of adding it to pretty much every interface which accepts 
page pointers - one half of vmap() is hardly special.

FWIW, If anything I reckon a DEBUG_VM option that made checks within 
page_to_x/x_to_page as appropriate would help Yury's issue just as much, 
while having the potential to be considerably more useful in general.

Cheers,
Robin.
Re: [PATCH] vmap(): don't allow invalid pages
Posted by Russell King (Oracle) 4 years, 5 months ago
On Wed, Jan 19, 2022 at 06:43:10PM +0000, Robin Murphy wrote:
> Indeed, my impression is that the only legitimate way to get hold of a page
> pointer without assumed provenance is via pfn_to_page(), which is where
> pfn_valid() comes in. Thus pfn_valid(page_to_pfn()) really *should* be a
> tautology.

That can only be true if pfn == page_to_pfn(pfn_to_page(pfn)) for all
values of pfn.

Given how pfn_to_page() is defined in the sparsemem case:

#define __pfn_to_page(pfn)                              \
({	unsigned long __pfn = (pfn);                    \
	struct mem_section *__sec = __pfn_to_section(__pfn);    \
	__section_mem_map_addr(__sec) + __pfn;          \
})
#define page_to_pfn __page_to_pfn

that isn't the case, especially when looking at page_to_pfn():

#define __page_to_pfn(pg)                                       \
({      const struct page *__pg = (pg);                         \
        int __sec = page_to_section(__pg);                      \
	(unsigned long)(__pg - __section_mem_map_addr(__nr_to_section(__sec))); \
})

Where:

static inline unsigned long page_to_section(const struct page *page)
{
	return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK;
}

So if page_to_section() returns something that is, e.g. zero for an
invalid page in a non-zero section, you're not going to end up with
the right pfn from page_to_pfn().

As I've said now a couple of times, trying to determine of a struct
page pointer is valid is the wrong question to be asking.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH] vmap(): don't allow invalid pages
Posted by Robin Murphy 4 years, 5 months ago
On 2022-01-19 19:12, Russell King (Oracle) wrote:
> On Wed, Jan 19, 2022 at 06:43:10PM +0000, Robin Murphy wrote:
>> Indeed, my impression is that the only legitimate way to get hold of a page
>> pointer without assumed provenance is via pfn_to_page(), which is where
>> pfn_valid() comes in. Thus pfn_valid(page_to_pfn()) really *should* be a
>> tautology.
> 
> That can only be true if pfn == page_to_pfn(pfn_to_page(pfn)) for all
> values of pfn.
> 
> Given how pfn_to_page() is defined in the sparsemem case:
> 
> #define __pfn_to_page(pfn)                              \
> ({	unsigned long __pfn = (pfn);                    \
> 	struct mem_section *__sec = __pfn_to_section(__pfn);    \
> 	__section_mem_map_addr(__sec) + __pfn;          \
> })
> #define page_to_pfn __page_to_pfn
> 
> that isn't the case, especially when looking at page_to_pfn():
> 
> #define __page_to_pfn(pg)                                       \
> ({      const struct page *__pg = (pg);                         \
>          int __sec = page_to_section(__pg);                      \
> 	(unsigned long)(__pg - __section_mem_map_addr(__nr_to_section(__sec))); \
> })
> 
> Where:
> 
> static inline unsigned long page_to_section(const struct page *page)
> {
> 	return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK;
> }
> 
> So if page_to_section() returns something that is, e.g. zero for an
> invalid page in a non-zero section, you're not going to end up with
> the right pfn from page_to_pfn().

Right, I emphasised "should" in an attempt to imply "in the absence of 
serious bugs that have further-reaching consequences anyway".

> As I've said now a couple of times, trying to determine of a struct
> page pointer is valid is the wrong question to be asking.

And doing so in one single place, on the justification of avoiding an 
incredibly niche symptom, is even more so. Not to mention that an 
address size fault is one of the best possible outcomes anyway, vs. the 
untold damage that may stem from accesses actually going through to 
random parts of the physical memory map.

Robin.
Re: [PATCH] vmap(): don't allow invalid pages
Posted by Matthew Wilcox 4 years, 5 months ago
On Wed, Jan 19, 2022 at 01:28:14PM +0000, Robin Murphy wrote:
> Is it page_to_pfn() guaranteed to work without blowing up if page is invalid
> in the first place? Looking at the CONFIG_SPARSEMEM case I'm not sure that's
> true...

Something that all the ARM people weighing in on this don't understand
is that basically nobody uses SPARSEMEM without SPARSEMEM_VMEMMAP.
So all this complicated code to do page_to_pfn() is never tested.
Real users all do a simple subtraction and so the simple pfn_valid()
works fine.