target/riscv/pmp: guard against PMP ranges with a negative size

[PATCH] target/riscv/pmp: guard against PMP ranges with a negative size

Posted by Nicolas Pitre 1 year, 11 months ago

For a TOR entry to match, the stard address must be lower than the end
address. Normally this is always the case, but correct code might still
run into the following scenario:

Initial state:

	pmpaddr3 = 0x2000	pmp3cfg = OFF
	pmpaddr4 = 0x3000	pmp4cfg = TOR

Execution:

	1. write 0x40ff to pmpaddr3
	2. write 0x32ff to pmpaddr4
	3. set pmp3cfg to NAPOT with a read-modify-write on pmpcfg0
	4. set pmp4cfg to NAPOT with a read-modify-write on pmpcfg1

When (2) is emulated, a call to pmp_update_rule() creates a negative
range for pmp4 as pmp4cfg is still set to TOR. And when (3) is emulated,
a call to tlb_flush() is performed, causing pmp_get_tlb_size() to return
a very creatively large TLB size for pmp4. This, in turn, may result in
accesses to non-existent/unitialized memory regions and a fault, so that
(4) ends up never being executed.

This is in m-mode with MPRV unset, meaning that unlocked PMP entries
should have no effect. Therefore such a behavior based on PMP content
is very unexpected.

Make sure no negative PMP range can be created, whether explicitly by
the emulated code or implicitly like the above.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 151da3fa08..ea2b67d947 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -167,6 +167,9 @@ void pmp_update_rule_addr(CPURISCVState *env, uint32_t pmp_index)
     case PMP_AMATCH_TOR:
         sa = prev_addr << 2; /* shift up from [xx:0] to [xx+2:2] */
         ea = (this_addr << 2) - 1u;
+        if (sa > ea) {
+            sa = ea = 0u;
+        }
         break;
 
     case PMP_AMATCH_NA4:

Re: [PATCH] target/riscv/pmp: guard against PMP ranges with a negative size

Posted by Alistair Francis 1 year, 11 months ago

On Thu, Jun 16, 2022 at 7:12 AM Nicolas Pitre <nico@fluxnic.net> wrote:
>
> For a TOR entry to match, the stard address must be lower than the end
> address. Normally this is always the case, but correct code might still
> run into the following scenario:
>
> Initial state:
>
>         pmpaddr3 = 0x2000       pmp3cfg = OFF
>         pmpaddr4 = 0x3000       pmp4cfg = TOR
>
> Execution:
>
>         1. write 0x40ff to pmpaddr3
>         2. write 0x32ff to pmpaddr4
>         3. set pmp3cfg to NAPOT with a read-modify-write on pmpcfg0
>         4. set pmp4cfg to NAPOT with a read-modify-write on pmpcfg1
>
> When (2) is emulated, a call to pmp_update_rule() creates a negative
> range for pmp4 as pmp4cfg is still set to TOR. And when (3) is emulated,
> a call to tlb_flush() is performed, causing pmp_get_tlb_size() to return
> a very creatively large TLB size for pmp4. This, in turn, may result in
> accesses to non-existent/unitialized memory regions and a fault, so that
> (4) ends up never being executed.
>
> This is in m-mode with MPRV unset, meaning that unlocked PMP entries
> should have no effect. Therefore such a behavior based on PMP content
> is very unexpected.
>
> Make sure no negative PMP range can be created, whether explicitly by
> the emulated code or implicitly like the above.
>
> Signed-off-by: Nicolas Pitre <nico@fluxnic.net>

Thanks!

Applied to riscv-to-apply.next

Alistair

>
> diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
> index 151da3fa08..ea2b67d947 100644
> --- a/target/riscv/pmp.c
> +++ b/target/riscv/pmp.c
> @@ -167,6 +167,9 @@ void pmp_update_rule_addr(CPURISCVState *env, uint32_t pmp_index)
>      case PMP_AMATCH_TOR:
>          sa = prev_addr << 2; /* shift up from [xx:0] to [xx+2:2] */
>          ea = (this_addr << 2) - 1u;
> +        if (sa > ea) {
> +            sa = ea = 0u;
> +        }
>          break;
>
>      case PMP_AMATCH_NA4:
>

Re: [PATCH] target/riscv/pmp: guard against PMP ranges with a negative size

Posted by Alistair Francis 1 year, 11 months ago

On Thu, Jun 16, 2022 at 7:12 AM Nicolas Pitre <nico@fluxnic.net> wrote:
>
> For a TOR entry to match, the stard address must be lower than the end
> address. Normally this is always the case, but correct code might still
> run into the following scenario:
>
> Initial state:
>
>         pmpaddr3 = 0x2000       pmp3cfg = OFF
>         pmpaddr4 = 0x3000       pmp4cfg = TOR
>
> Execution:
>
>         1. write 0x40ff to pmpaddr3
>         2. write 0x32ff to pmpaddr4

Hey, thanks for that patch!

So, at this point we have a PMP region enforcing

0x40ff <= addr < 0x32ff

which is going to be wrong as that isn't valid. But this is also
partially a guest bug. If a guest sets invalid PMP regions we should
be throwing exceptions (if the PMP region is enabled and enforced in
the current mode)

>         3. set pmp3cfg to NAPOT with a read-modify-write on pmpcfg0
>         4. set pmp4cfg to NAPOT with a read-modify-write on pmpcfg1
>
> When (2) is emulated, a call to pmp_update_rule() creates a negative
> range for pmp4 as pmp4cfg is still set to TOR. And when (3) is emulated,

I don't see where the negative comes from. From what I can tell we
should just set `sa` and `ea` to the values specified by the guest.

> a call to tlb_flush() is performed, causing pmp_get_tlb_size() to return
> a very creatively large TLB size for pmp4. This, in turn, may result in

Hmm.. pmp_get_tlb_size() assumes pmp_ea > pmp_sa. Maybe we should add
a check in there?

> accesses to non-existent/unitialized memory regions and a fault, so that
> (4) ends up never being executed.
>
> This is in m-mode with MPRV unset, meaning that unlocked PMP entries
> should have no effect. Therefore such a behavior based on PMP content
> is very unexpected.

Ok, this part is a QEMU bug. If we aren't enforcing PMP regions we
should not be throwing PMP errors.

get_physical_address_pmp() should give us full permissions though in
this case, so I don't see where the failure is. Can you include some
more details?

>
> Make sure no negative PMP range can be created, whether explicitly by
> the emulated code or implicitly like the above.
>
> Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
>
> diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
> index 151da3fa08..ea2b67d947 100644
> --- a/target/riscv/pmp.c
> +++ b/target/riscv/pmp.c
> @@ -167,6 +167,9 @@ void pmp_update_rule_addr(CPURISCVState *env, uint32_t pmp_index)
>      case PMP_AMATCH_TOR:
>          sa = prev_addr << 2; /* shift up from [xx:0] to [xx+2:2] */
>          ea = (this_addr << 2) - 1u;
> +        if (sa > ea) {
> +            sa = ea = 0u;
> +        }

This doesn't seem right though.

Image if a guest sets the values you have above, then jumps to user
mode. The spec doesn't seem to say what should happen with invalid PMP
ranges, but I feel like we should throw exceptions instead of just
ignoring the config.

Alistair

>          break;
>
>      case PMP_AMATCH_NA4:
>

Re: [PATCH] target/riscv/pmp: guard against PMP ranges with a negative size

Posted by Nicolas Pitre 1 year, 11 months ago

On Thu, 16 Jun 2022, Alistair Francis wrote:

> On Thu, Jun 16, 2022 at 7:12 AM Nicolas Pitre <nico@fluxnic.net> wrote:
> >
> > For a TOR entry to match, the stard address must be lower than the end
> > address. Normally this is always the case, but correct code might still
> > run into the following scenario:
> >
> > Initial state:
> >
> >         pmpaddr3 = 0x2000       pmp3cfg = OFF
> >         pmpaddr4 = 0x3000       pmp4cfg = TOR
> >
> > Execution:
> >
> >         1. write 0x40ff to pmpaddr3
> >         2. write 0x32ff to pmpaddr4
> 
> Hey, thanks for that patch!
> 
> So, at this point we have a PMP region enforcing
> 
> 0x40ff <= addr < 0x32ff
> 
> which is going to be wrong as that isn't valid. But this is also
> partially a guest bug. If a guest sets invalid PMP regions we should
> be throwing exceptions (if the PMP region is enabled and enforced in
> the current mode)

The guest cannot change all the relevant PMP registers for a single PMP 
entry atomically. This is why you normally update the PMP configuration 
in m-mode with MPRV unset for invalid but transient PMP states to have 
no adverse effects.

> >         3. set pmp3cfg to NAPOT with a read-modify-write on pmpcfg0
> >         4. set pmp4cfg to NAPOT with a read-modify-write on pmpcfg1
> >
> > When (2) is emulated, a call to pmp_update_rule() creates a negative
> > range for pmp4 as pmp4cfg is still set to TOR. And when (3) is emulated,
> 
> I don't see where the negative comes from. From what I can tell we
> should just set `sa` and `ea` to the values specified by the guest.

Eventually that is the case i.e. when the type is changed from TOR to 
NAPOT in (4), at which point everything is well and sane. But that can't 
happen atomically as I said. Yet QEMU does interpret the intermediate 
state although it shouldn't.

> > a call to tlb_flush() is performed, causing pmp_get_tlb_size() to return
> > a very creatively large TLB size for pmp4. This, in turn, may result in
> 
> Hmm.. pmp_get_tlb_size() assumes pmp_ea > pmp_sa. Maybe we should add
> a check in there?

It is more efficient to prevent sa > ea in the first place as 
pmp_get_tlb_size() is called way more often than pmp_update_rule_addr().

Let's not forget that those sa/ea are simplified representations of the 
hardware and not what the guest can see. The original register content 
written by the guest is preserved.

> > accesses to non-existent/unitialized memory regions and a fault, so that
> > (4) ends up never being executed.
> >
> > This is in m-mode with MPRV unset, meaning that unlocked PMP entries
> > should have no effect. Therefore such a behavior based on PMP content
> > is very unexpected.
> 
> Ok, this part is a QEMU bug. If we aren't enforcing PMP regions we
> should not be throwing PMP errors.

Right. But this is not a PMP error. It is QEMU screwing up its internal 
state. It does even segfault when this condition occur if -d cpu is 
provided.

> get_physical_address_pmp() should give us full permissions though in
> this case, so I don't see where the failure is.

And it does. That's not where the problem is.

> Can you include some more details?

As I said, in the middle of updating the pmpcfg registers in (3), 
tlb_flush() is invoked which causes the core to revalidate its memory 
TLB cache down through pmp_get_tlb_size() where bad answers are 
returned. This causes the core to assume a different memory area which 
doesn't exist and the next memory access ends up calling 
unassigned_mem_accepts() where false is unconditionally returned. And I 
suspect this is also the result of a buffer overflow as the address 
logged in memory_region_access_valid() is sometimes complete garbage 
too, and the occasional QEMU segfault.

> > Make sure no negative PMP range can be created, whether explicitly by
> > the emulated code or implicitly like the above.
> >
> > Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
> >
> > diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
> > index 151da3fa08..ea2b67d947 100644
> > --- a/target/riscv/pmp.c
> > +++ b/target/riscv/pmp.c
> > @@ -167,6 +167,9 @@ void pmp_update_rule_addr(CPURISCVState *env, uint32_t pmp_index)
> >      case PMP_AMATCH_TOR:
> >          sa = prev_addr << 2; /* shift up from [xx:0] to [xx+2:2] */
> >          ea = (this_addr << 2) - 1u;
> > +        if (sa > ea) {
> > +            sa = ea = 0u;
> > +        }
> 
> This doesn't seem right though.
> 
> Image if a guest sets the values you have above, then jumps to user
> mode. The spec doesn't seem to say what should happen with invalid PMP
> ranges, but I feel like we should throw exceptions instead of just
> ignoring the config.

The spec says that a given memory access has to be >= the bottom 
boundary and < the top boundary for a PMP entry to match. So an invalid 
PMP entry should simply not match. That's what a hardware comparator 
would do: it would just not match. Furthermore, you cannot tell if the 
guest is in the middle of updating its PMP configuration which are split 
across several registers as illustrated above, therefore transient 
invalid states are unavoidable.

Nicolas

Re: [PATCH] target/riscv/pmp: guard against PMP ranges with a negative size

Posted by Alistair Francis 1 year, 11 months ago

On Thu, Jun 16, 2022 at 10:20 PM Nicolas Pitre <nico@fluxnic.net> wrote:
>
> On Thu, 16 Jun 2022, Alistair Francis wrote:
>
> > On Thu, Jun 16, 2022 at 7:12 AM Nicolas Pitre <nico@fluxnic.net> wrote:
> > >
> > > For a TOR entry to match, the stard address must be lower than the end
> > > address. Normally this is always the case, but correct code might still
> > > run into the following scenario:
> > >
> > > Initial state:
> > >
> > >         pmpaddr3 = 0x2000       pmp3cfg = OFF
> > >         pmpaddr4 = 0x3000       pmp4cfg = TOR
> > >
> > > Execution:
> > >
> > >         1. write 0x40ff to pmpaddr3
> > >         2. write 0x32ff to pmpaddr4
> >
> > Hey, thanks for that patch!
> >
> > So, at this point we have a PMP region enforcing
> >
> > 0x40ff <= addr < 0x32ff
> >
> > which is going to be wrong as that isn't valid. But this is also
> > partially a guest bug. If a guest sets invalid PMP regions we should
> > be throwing exceptions (if the PMP region is enabled and enforced in
> > the current mode)
>
> The guest cannot change all the relevant PMP registers for a single PMP
> entry atomically. This is why you normally update the PMP configuration
> in m-mode with MPRV unset for invalid but transient PMP states to have
> no adverse effects.
>
> > >         3. set pmp3cfg to NAPOT with a read-modify-write on pmpcfg0
> > >         4. set pmp4cfg to NAPOT with a read-modify-write on pmpcfg1
> > >
> > > When (2) is emulated, a call to pmp_update_rule() creates a negative
> > > range for pmp4 as pmp4cfg is still set to TOR. And when (3) is emulated,
> >
> > I don't see where the negative comes from. From what I can tell we
> > should just set `sa` and `ea` to the values specified by the guest.
>
> Eventually that is the case i.e. when the type is changed from TOR to
> NAPOT in (4), at which point everything is well and sane. But that can't
> happen atomically as I said. Yet QEMU does interpret the intermediate
> state although it shouldn't.
>
> > > a call to tlb_flush() is performed, causing pmp_get_tlb_size() to return
> > > a very creatively large TLB size for pmp4. This, in turn, may result in
> >
> > Hmm.. pmp_get_tlb_size() assumes pmp_ea > pmp_sa. Maybe we should add
> > a check in there?
>
> It is more efficient to prevent sa > ea in the first place as
> pmp_get_tlb_size() is called way more often than pmp_update_rule_addr().
>
> Let's not forget that those sa/ea are simplified representations of the
> hardware and not what the guest can see. The original register content
> written by the guest is preserved.
>
> > > accesses to non-existent/unitialized memory regions and a fault, so that
> > > (4) ends up never being executed.
> > >
> > > This is in m-mode with MPRV unset, meaning that unlocked PMP entries
> > > should have no effect. Therefore such a behavior based on PMP content
> > > is very unexpected.
> >
> > Ok, this part is a QEMU bug. If we aren't enforcing PMP regions we
> > should not be throwing PMP errors.
>
> Right. But this is not a PMP error. It is QEMU screwing up its internal
> state. It does even segfault when this condition occur if -d cpu is
> provided.
>
> > get_physical_address_pmp() should give us full permissions though in
> > this case, so I don't see where the failure is.
>
> And it does. That's not where the problem is.
>
> > Can you include some more details?
>
> As I said, in the middle of updating the pmpcfg registers in (3),
> tlb_flush() is invoked which causes the core to revalidate its memory
> TLB cache down through pmp_get_tlb_size() where bad answers are
> returned. This causes the core to assume a different memory area which
> doesn't exist and the next memory access ends up calling
> unassigned_mem_accepts() where false is unconditionally returned. And I
> suspect this is also the result of a buffer overflow as the address
> logged in memory_region_access_valid() is sometimes complete garbage
> too, and the occasional QEMU segfault.
>
> > > Make sure no negative PMP range can be created, whether explicitly by
> > > the emulated code or implicitly like the above.
> > >
> > > Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
> > >
> > > diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
> > > index 151da3fa08..ea2b67d947 100644
> > > --- a/target/riscv/pmp.c
> > > +++ b/target/riscv/pmp.c
> > > @@ -167,6 +167,9 @@ void pmp_update_rule_addr(CPURISCVState *env, uint32_t pmp_index)
> > >      case PMP_AMATCH_TOR:
> > >          sa = prev_addr << 2; /* shift up from [xx:0] to [xx+2:2] */
> > >          ea = (this_addr << 2) - 1u;
> > > +        if (sa > ea) {
> > > +            sa = ea = 0u;
> > > +        }
> >
> > This doesn't seem right though.
> >
> > Image if a guest sets the values you have above, then jumps to user
> > mode. The spec doesn't seem to say what should happen with invalid PMP
> > ranges, but I feel like we should throw exceptions instead of just
> > ignoring the config.
>
> The spec says that a given memory access has to be >= the bottom
> boundary and < the top boundary for a PMP entry to match. So an invalid
> PMP entry should simply not match. That's what a hardware comparator
> would do: it would just not match. Furthermore, you cannot tell if the
> guest is in the middle of updating its PMP configuration which are split
> across several registers as illustrated above, therefore transient
> invalid states are unavoidable.

Good point, also I just found this in the spec:

"If pmpaddri−1 ≥ pmpaddri and pmpcfgi .A=TOR, then PMP entry i matches
no addresses." so it clearly should not match then.

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

>
>
> Nicolas