accel/tcg: Support execution from MMIO and small MMU regions

[Qemu-devel] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions

Posted by Peter Maydell 7 years, 7 months ago

This series adds support to TCG for executing from MMIO regions
and small MMU regions. The basic principle is that if get_page_addr_code()
finds that the region is not backed by a full page of RAM then it
returns -1, and tb_gen_code() then generates a non-cached TB
containing a single instruction. Execution from these regions
thus performs the instruction fetch every time, ensuring that we
get the read-from-MMIO and check-small-MMU-region permissions
checks right.

This means that the code path for "generate bus fault for failing
to load an instruction" no longer goes through get_page_addr_code(),
but instead via each target's translate code and its calls to
the cpu_ld*_code() or similar functions. Patch 1 makes sure we
can distinguish insn fetches from data loads when generating the
bus fault exceptions. (Aside: I have assumed that all cpu_ld*_code()
loads should trigger iside faults rather than dside. Hopefully this
is true...)

Patches 2 and 3 make trivial fixes to various callers of
get_page_addr_code(); patch 4 does the work of generating our
single-insn TBs. Patch 5 can then remove all the code that
(mis)handles MMIO regions from get_page_addr_code(). Finally
patch 6 drops the target/arm workarounds for not having support
for executing from small MPU regions.

Note for the Xilinx folks: this patchset makes the mmio-exec
testcase for running from the SPI flash pass. Cedric: you might
like to test the aspeed image you had that relies on execution
from an MMIO region too.

The diffstat is pretty satisfying for a patchset that adds
a feature, but it actually undersells it: this code renders the
hw/misc/mmio_interface.c and the mmio_ptr related code in memory.c
and the xilinx-spips device all obsolete, so there are another
couple of hundred lines of code to be deleted there. I opted not
to include that in this patchset, for ease of review.

NB: I tested this with icount, but there are potentially
some weird things that could happen with interactions between
icount's io-recompile and execution from an MMIO device
that returns different instructions each time it's read.

thanks
-- PMM


Peter Maydell (6):
  accel/tcg: Pass read access type through to io_readx()
  accel/tcg: Handle get_page_addr_code() returning -1 in hashtable
    lookups
  accel/tcg: Handle get_page_addr_code() returning -1 in
    tb_check_watchpoint()
  accel/tcg: tb_gen_code(): Create single-insn TB for execution from
    non-RAM
  accel/tcg: Return -1 for execution from MMIO regions in
    get_page_addr_code()
  target/arm: Allow execution from small regions

 accel/tcg/softmmu_template.h |  11 ++--
 include/qom/cpu.h            |   6 +++
 accel/tcg/cpu-exec.c         |   3 ++
 accel/tcg/cputlb.c           | 100 +++++------------------------------
 accel/tcg/translate-all.c    |  23 +++++++-
 memory.c                     |   3 +-
 target/arm/helper.c          |  23 --------
 7 files changed, 52 insertions(+), 117 deletions(-)

-- 
2.17.1

Re: [Qemu-devel] [Qemu-arm] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions

Posted by Philippe Mathieu-Daudé 7 years, 7 months ago

Hi Peter,

On 07/10/2018 01:00 PM, Peter Maydell wrote:
> This series adds support to TCG for executing from MMIO regions
> and small MMU regions. The basic principle is that if get_page_addr_code()
> finds that the region is not backed by a full page of RAM then it
> returns -1, and tb_gen_code() then generates a non-cached TB
> containing a single instruction. Execution from these regions
> thus performs the instruction fetch every time, ensuring that we
> get the read-from-MMIO and check-small-MMU-region permissions
> checks right.
> 
> This means that the code path for "generate bus fault for failing
> to load an instruction" no longer goes through get_page_addr_code(),
> but instead via each target's translate code and its calls to
> the cpu_ld*_code() or similar functions. Patch 1 makes sure we
> can distinguish insn fetches from data loads when generating the
> bus fault exceptions. (Aside: I have assumed that all cpu_ld*_code()
> loads should trigger iside faults rather than dside. Hopefully this
> is true...)
> 
> Patches 2 and 3 make trivial fixes to various callers of
> get_page_addr_code(); patch 4 does the work of generating our
> single-insn TBs. Patch 5 can then remove all the code that
> (mis)handles MMIO regions from get_page_addr_code(). Finally
> patch 6 drops the target/arm workarounds for not having support
> for executing from small MPU regions.
> 
> Note for the Xilinx folks: this patchset makes the mmio-exec
> testcase for running from the SPI flash pass. Cedric: you might
> like to test the aspeed image you had that relies on execution
> from an MMIO region too.

I applied and quickly tested your series on a MIPS SoC I'm working on
which has a tiny SRAM:

(qemu) info mtree
address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-00000000000007ff (prio 0, ram): sram
    0000000010000000-00000000107fffff (prio 0, i/o): pflash
    0000000014000000-0000000014ffffff (prio 0, ram): dram
    000000001fc00000-000000001fc0ffff (prio 0, rom): srom

The firmware copies the ISR in this SRAM area, sadly it didn't work as
expected:

qemu-system-mips: Bad ram pointer 0x4a4
Aborted (core dumped)
(gdb) bt
#0  0x00007f5f34d84e7b in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f5f34d86231 in __GI_abort () at abort.c:79
#2  0x000055f4527a2074 in qemu_ram_addr_from_host_nofail (ptr=0x4a4) at
accel/tcg/cputlb.c:751
#3  0x000055f4527a2887 in get_page_addr_code (env=0x55f454d06728,
addr=2415932580) at accel/tcg/cputlb.c:966
#4  0x000055f4527bd206 in tb_htable_lookup (cpu=0x55f454cfe478,
pc=2415932580, cs_base=0, flags=268435472, cf_mask=0) at
accel/tcg/cpu-exec.c:334
#5  0x000055f4527b7b31 in tb_lookup__cpu_state (cpu=0x55f454cfe478,
pc=0x7f5f17cf7f98, cs_base=0x7f5f17cf7f9c, flags=0x7f5f17cf7f94,
cf_mask=0) at include/exec/tb-lookup.h:39
#6  0x000055f4527b7e29 in helper_lookup_tb_ptr (env=0x55f454d06728) at
accel/tcg/tcg-runtime.c:154
#7  0x00007f5f2494da2d in code_gen_buffer ()
#8  0x000055f4527bcd33 in cpu_tb_exec (cpu=0x55f454cfe478,
itb=0x7f5f2494d880 <code_gen_buffer+129107>) at accel/tcg/cpu-exec.c:171
#9  0x000055f4527bda6a in cpu_loop_exec_tb (cpu=0x55f454cfe478,
tb=0x7f5f2494d880 <code_gen_buffer+129107>, last_tb=0x7f5f17cf8538,
tb_exit=0x7f5f17cf8534) at accel/tcg/cpu-exec.c:615
#10 0x000055f4527bdd57 in cpu_exec (cpu=0x55f454cfe478) at
accel/tcg/cpu-exec.c:725
#11 0x000055f452770575 in tcg_cpu_exec (cpu=0x55f454cfe478) at cpus.c:1363
#12 0x000055f4527707cb in qemu_tcg_rr_cpu_thread_fn (arg=0x55f454cfe478)
at cpus.c:1463
#13 0x000055f452c58a4b in qemu_thread_start (args=0x55f454d2afc0) at
util/qemu-thread-posix.c:504
#14 0x00007f5f351115aa in start_thread (arg=0x7f5f17cfb700) at
pthread_create.c:463
#15 0x00007f5f34e46cbf in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

This firmware works using the "avoid subpages" trick, allocating 4K for
the SRAM (TARGET_PAGE_SIZE).

I didn't look further, maybe I need to figure out how to add a
"target/mips: Allow execution from small regions" too.

Regards,

Phil.

> The diffstat is pretty satisfying for a patchset that adds
> a feature, but it actually undersells it: this code renders the
> hw/misc/mmio_interface.c and the mmio_ptr related code in memory.c
> and the xilinx-spips device all obsolete, so there are another
> couple of hundred lines of code to be deleted there. I opted not
> to include that in this patchset, for ease of review.
> 
> NB: I tested this with icount, but there are potentially
> some weird things that could happen with interactions between
> icount's io-recompile and execution from an MMIO device
> that returns different instructions each time it's read.

Re: [Qemu-devel] [Qemu-arm] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions

Posted by Peter Maydell 7 years, 7 months ago

On 11 July 2018 at 05:21, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
> Hi Peter,
>
> On 07/10/2018 01:00 PM, Peter Maydell wrote:
>> This series adds support to TCG for executing from MMIO regions
>> and small MMU regions. The basic principle is that if get_page_addr_code()
>> finds that the region is not backed by a full page of RAM then it
>> returns -1, and tb_gen_code() then generates a non-cached TB
>> containing a single instruction. Execution from these regions
>> thus performs the instruction fetch every time, ensuring that we
>> get the read-from-MMIO and check-small-MMU-region permissions
>> checks right.
>>
>> This means that the code path for "generate bus fault for failing
>> to load an instruction" no longer goes through get_page_addr_code(),
>> but instead via each target's translate code and its calls to
>> the cpu_ld*_code() or similar functions. Patch 1 makes sure we
>> can distinguish insn fetches from data loads when generating the
>> bus fault exceptions. (Aside: I have assumed that all cpu_ld*_code()
>> loads should trigger iside faults rather than dside. Hopefully this
>> is true...)
>>
>> Patches 2 and 3 make trivial fixes to various callers of
>> get_page_addr_code(); patch 4 does the work of generating our
>> single-insn TBs. Patch 5 can then remove all the code that
>> (mis)handles MMIO regions from get_page_addr_code(). Finally
>> patch 6 drops the target/arm workarounds for not having support
>> for executing from small MPU regions.
>>
>> Note for the Xilinx folks: this patchset makes the mmio-exec
>> testcase for running from the SPI flash pass. Cedric: you might
>> like to test the aspeed image you had that relies on execution
>> from an MMIO region too.
>
> I applied and quickly tested your series on a MIPS SoC I'm working on
> which has a tiny SRAM:
>
> (qemu) info mtree
> address-space: memory
>   0000000000000000-ffffffffffffffff (prio 0, i/o): system
>     0000000000000000-00000000000007ff (prio 0, ram): sram
>     0000000010000000-00000000107fffff (prio 0, i/o): pflash
>     0000000014000000-0000000014ffffff (prio 0, ram): dram
>     000000001fc00000-000000001fc0ffff (prio 0, rom): srom
>
> The firmware copies the ISR in this SRAM area, sadly it didn't work as
> expected:
>
> qemu-system-mips: Bad ram pointer 0x4a4
> Aborted (core dumped)
> (gdb) bt
> #0  0x00007f5f34d84e7b in __GI_raise (sig=sig@entry=6) at
> ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x00007f5f34d86231 in __GI_abort () at abort.c:79
> #2  0x000055f4527a2074 in qemu_ram_addr_from_host_nofail (ptr=0x4a4) at
> accel/tcg/cputlb.c:751
> #3  0x000055f4527a2887 in get_page_addr_code (env=0x55f454d06728,
> addr=2415932580) at accel/tcg/cputlb.c:966
> #4  0x000055f4527bd206 in tb_htable_lookup (cpu=0x55f454cfe478,
> pc=2415932580, cs_base=0, flags=268435472, cf_mask=0) at
> accel/tcg/cpu-exec.c:334
> #5  0x000055f4527b7b31 in tb_lookup__cpu_state (cpu=0x55f454cfe478,
> pc=0x7f5f17cf7f98, cs_base=0x7f5f17cf7f9c, flags=0x7f5f17cf7f94,
> cf_mask=0) at include/exec/tb-lookup.h:39
> #6  0x000055f4527b7e29 in helper_lookup_tb_ptr (env=0x55f454d06728) at
> accel/tcg/tcg-runtime.c:154
> #7  0x00007f5f2494da2d in code_gen_buffer ()
> #8  0x000055f4527bcd33 in cpu_tb_exec (cpu=0x55f454cfe478,
> itb=0x7f5f2494d880 <code_gen_buffer+129107>) at accel/tcg/cpu-exec.c:171
> #9  0x000055f4527bda6a in cpu_loop_exec_tb (cpu=0x55f454cfe478,
> tb=0x7f5f2494d880 <code_gen_buffer+129107>, last_tb=0x7f5f17cf8538,
> tb_exit=0x7f5f17cf8534) at accel/tcg/cpu-exec.c:615
> #10 0x000055f4527bdd57 in cpu_exec (cpu=0x55f454cfe478) at
> accel/tcg/cpu-exec.c:725
> #11 0x000055f452770575 in tcg_cpu_exec (cpu=0x55f454cfe478) at cpus.c:1363
> #12 0x000055f4527707cb in qemu_tcg_rr_cpu_thread_fn (arg=0x55f454cfe478)
> at cpus.c:1463
> #13 0x000055f452c58a4b in qemu_thread_start (args=0x55f454d2afc0) at
> util/qemu-thread-posix.c:504
> #14 0x00007f5f351115aa in start_thread (arg=0x7f5f17cfb700) at
> pthread_create.c:463
> #15 0x00007f5f34e46cbf in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> This firmware works using the "avoid subpages" trick, allocating 4K for
> the SRAM (TARGET_PAGE_SIZE).

Are you sure this is executing from the SRAM at this point?
The PC value in the backtrace is 2415932580 == 900034A4. I don't
know how the MMU is configured at this point, but my guess is that
that's actually in the pflash area, and you're running into a
different bug:
https://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg00674.html
where when we attempt to execute from a romd device that's in romd
mode we incorrectly think it's RAM in get_page_addr_code() and crash.

Previously my view on that had been "yes, it's a bug, but if we
fixed it the only change would be that we'd fall over with
'can't execute from a device' rather than crashing". However
now we can execute from devices fixing the bug is potentially
more useful...

thanks
-- PMM

Re: [Qemu-devel] [Qemu-arm] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions

Posted by Peter Maydell 7 years, 7 months ago

On 12 July 2018 at 17:37, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 11 July 2018 at 05:21, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
>> I applied and quickly tested your series on a MIPS SoC I'm working on
>> which has a tiny SRAM:
>>
>> (qemu) info mtree
>> address-space: memory
>>   0000000000000000-ffffffffffffffff (prio 0, i/o): system
>>     0000000000000000-00000000000007ff (prio 0, ram): sram
>>     0000000010000000-00000000107fffff (prio 0, i/o): pflash
>>     0000000014000000-0000000014ffffff (prio 0, ram): dram
>>     000000001fc00000-000000001fc0ffff (prio 0, rom): srom
>>
>> The firmware copies the ISR in this SRAM area, sadly it didn't work as
>> expected:
>>
>> qemu-system-mips: Bad ram pointer 0x4a4

> Are you sure this is executing from the SRAM at this point?
> The PC value in the backtrace is 2415932580 == 900034A4. I don't
> know how the MMU is configured at this point, but my guess is that
> that's actually in the pflash area, and you're running into a
> different bug:
> https://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg00674.html
> where when we attempt to execute from a romd device that's in romd
> mode we incorrectly think it's RAM in get_page_addr_code() and crash.

I just sent a follow-on patch which should fix the 'bad ram pointer'
assert. If your guest really is trying to execute from the pflash
here then it will just misbehave in a different way, though,
since pflash status response bytes are probably not valid MIPS
instructions...

thanks
-- PMM

Re: [Qemu-devel] [Qemu-arm] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions

Posted by Peter Maydell 7 years, 6 months ago

On 10 July 2018 at 17:00, Peter Maydell <peter.maydell@linaro.org> wrote:
> This series adds support to TCG for executing from MMIO regions
> and small MMU regions. The basic principle is that if get_page_addr_code()
> finds that the region is not backed by a full page of RAM then it
> returns -1, and tb_gen_code() then generates a non-cached TB
> containing a single instruction. Execution from these regions
> thus performs the instruction fetch every time, ensuring that we
> get the read-from-MMIO and check-small-MMU-region permissions
> checks right.

Hi; this series has been reviewed, and I propose to put it into
my target-arm.for-3.1 queue, unless anybody has a preference for
it going into master via some other route.

thanks
-- PMM