[Qemu-devel] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions

Peter Maydell posted 6 patches 5 years, 9 months ago
Failed in applying to current master (apply log)
Test checkpatch passed
Test docker-mingw@fedora passed
Test docker-quick@centos7 passed
accel/tcg/softmmu_template.h |  11 ++--
include/qom/cpu.h            |   6 +++
accel/tcg/cpu-exec.c         |   3 ++
accel/tcg/cputlb.c           | 100 +++++------------------------------
accel/tcg/translate-all.c    |  23 +++++++-
memory.c                     |   3 +-
target/arm/helper.c          |  23 --------
7 files changed, 52 insertions(+), 117 deletions(-)
[Qemu-devel] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions
Posted by Peter Maydell 5 years, 9 months ago
This series adds support to TCG for executing from MMIO regions
and small MMU regions. The basic principle is that if get_page_addr_code()
finds that the region is not backed by a full page of RAM then it
returns -1, and tb_gen_code() then generates a non-cached TB
containing a single instruction. Execution from these regions
thus performs the instruction fetch every time, ensuring that we
get the read-from-MMIO and check-small-MMU-region permissions
checks right.

This means that the code path for "generate bus fault for failing
to load an instruction" no longer goes through get_page_addr_code(),
but instead via each target's translate code and its calls to
the cpu_ld*_code() or similar functions. Patch 1 makes sure we
can distinguish insn fetches from data loads when generating the
bus fault exceptions. (Aside: I have assumed that all cpu_ld*_code()
loads should trigger iside faults rather than dside. Hopefully this
is true...)

Patches 2 and 3 make trivial fixes to various callers of
get_page_addr_code(); patch 4 does the work of generating our
single-insn TBs. Patch 5 can then remove all the code that
(mis)handles MMIO regions from get_page_addr_code(). Finally
patch 6 drops the target/arm workarounds for not having support
for executing from small MPU regions.

Note for the Xilinx folks: this patchset makes the mmio-exec
testcase for running from the SPI flash pass. Cedric: you might
like to test the aspeed image you had that relies on execution
from an MMIO region too.

The diffstat is pretty satisfying for a patchset that adds
a feature, but it actually undersells it: this code renders the
hw/misc/mmio_interface.c and the mmio_ptr related code in memory.c
and the xilinx-spips device all obsolete, so there are another
couple of hundred lines of code to be deleted there. I opted not
to include that in this patchset, for ease of review.

NB: I tested this with icount, but there are potentially
some weird things that could happen with interactions between
icount's io-recompile and execution from an MMIO device
that returns different instructions each time it's read.

thanks
-- PMM


Peter Maydell (6):
  accel/tcg: Pass read access type through to io_readx()
  accel/tcg: Handle get_page_addr_code() returning -1 in hashtable
    lookups
  accel/tcg: Handle get_page_addr_code() returning -1 in
    tb_check_watchpoint()
  accel/tcg: tb_gen_code(): Create single-insn TB for execution from
    non-RAM
  accel/tcg: Return -1 for execution from MMIO regions in
    get_page_addr_code()
  target/arm: Allow execution from small regions

 accel/tcg/softmmu_template.h |  11 ++--
 include/qom/cpu.h            |   6 +++
 accel/tcg/cpu-exec.c         |   3 ++
 accel/tcg/cputlb.c           | 100 +++++------------------------------
 accel/tcg/translate-all.c    |  23 +++++++-
 memory.c                     |   3 +-
 target/arm/helper.c          |  23 --------
 7 files changed, 52 insertions(+), 117 deletions(-)

-- 
2.17.1


Re: [Qemu-devel] [Qemu-arm] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions
Posted by Philippe Mathieu-Daudé 5 years, 9 months ago
Hi Peter,

On 07/10/2018 01:00 PM, Peter Maydell wrote:
> This series adds support to TCG for executing from MMIO regions
> and small MMU regions. The basic principle is that if get_page_addr_code()
> finds that the region is not backed by a full page of RAM then it
> returns -1, and tb_gen_code() then generates a non-cached TB
> containing a single instruction. Execution from these regions
> thus performs the instruction fetch every time, ensuring that we
> get the read-from-MMIO and check-small-MMU-region permissions
> checks right.
> 
> This means that the code path for "generate bus fault for failing
> to load an instruction" no longer goes through get_page_addr_code(),
> but instead via each target's translate code and its calls to
> the cpu_ld*_code() or similar functions. Patch 1 makes sure we
> can distinguish insn fetches from data loads when generating the
> bus fault exceptions. (Aside: I have assumed that all cpu_ld*_code()
> loads should trigger iside faults rather than dside. Hopefully this
> is true...)
> 
> Patches 2 and 3 make trivial fixes to various callers of
> get_page_addr_code(); patch 4 does the work of generating our
> single-insn TBs. Patch 5 can then remove all the code that
> (mis)handles MMIO regions from get_page_addr_code(). Finally
> patch 6 drops the target/arm workarounds for not having support
> for executing from small MPU regions.
> 
> Note for the Xilinx folks: this patchset makes the mmio-exec
> testcase for running from the SPI flash pass. Cedric: you might
> like to test the aspeed image you had that relies on execution
> from an MMIO region too.

I applied and quickly tested your series on a MIPS SoC I'm working on
which has a tiny SRAM:

(qemu) info mtree
address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-00000000000007ff (prio 0, ram): sram
    0000000010000000-00000000107fffff (prio 0, i/o): pflash
    0000000014000000-0000000014ffffff (prio 0, ram): dram
    000000001fc00000-000000001fc0ffff (prio 0, rom): srom

The firmware copies the ISR in this SRAM area, sadly it didn't work as
expected:

qemu-system-mips: Bad ram pointer 0x4a4
Aborted (core dumped)
(gdb) bt
#0  0x00007f5f34d84e7b in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f5f34d86231 in __GI_abort () at abort.c:79
#2  0x000055f4527a2074 in qemu_ram_addr_from_host_nofail (ptr=0x4a4) at
accel/tcg/cputlb.c:751
#3  0x000055f4527a2887 in get_page_addr_code (env=0x55f454d06728,
addr=2415932580) at accel/tcg/cputlb.c:966
#4  0x000055f4527bd206 in tb_htable_lookup (cpu=0x55f454cfe478,
pc=2415932580, cs_base=0, flags=268435472, cf_mask=0) at
accel/tcg/cpu-exec.c:334
#5  0x000055f4527b7b31 in tb_lookup__cpu_state (cpu=0x55f454cfe478,
pc=0x7f5f17cf7f98, cs_base=0x7f5f17cf7f9c, flags=0x7f5f17cf7f94,
cf_mask=0) at include/exec/tb-lookup.h:39
#6  0x000055f4527b7e29 in helper_lookup_tb_ptr (env=0x55f454d06728) at
accel/tcg/tcg-runtime.c:154
#7  0x00007f5f2494da2d in code_gen_buffer ()
#8  0x000055f4527bcd33 in cpu_tb_exec (cpu=0x55f454cfe478,
itb=0x7f5f2494d880 <code_gen_buffer+129107>) at accel/tcg/cpu-exec.c:171
#9  0x000055f4527bda6a in cpu_loop_exec_tb (cpu=0x55f454cfe478,
tb=0x7f5f2494d880 <code_gen_buffer+129107>, last_tb=0x7f5f17cf8538,
tb_exit=0x7f5f17cf8534) at accel/tcg/cpu-exec.c:615
#10 0x000055f4527bdd57 in cpu_exec (cpu=0x55f454cfe478) at
accel/tcg/cpu-exec.c:725
#11 0x000055f452770575 in tcg_cpu_exec (cpu=0x55f454cfe478) at cpus.c:1363
#12 0x000055f4527707cb in qemu_tcg_rr_cpu_thread_fn (arg=0x55f454cfe478)
at cpus.c:1463
#13 0x000055f452c58a4b in qemu_thread_start (args=0x55f454d2afc0) at
util/qemu-thread-posix.c:504
#14 0x00007f5f351115aa in start_thread (arg=0x7f5f17cfb700) at
pthread_create.c:463
#15 0x00007f5f34e46cbf in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

This firmware works using the "avoid subpages" trick, allocating 4K for
the SRAM (TARGET_PAGE_SIZE).

I didn't look further, maybe I need to figure out how to add a
"target/mips: Allow execution from small regions" too.

Regards,

Phil.

> The diffstat is pretty satisfying for a patchset that adds
> a feature, but it actually undersells it: this code renders the
> hw/misc/mmio_interface.c and the mmio_ptr related code in memory.c
> and the xilinx-spips device all obsolete, so there are another
> couple of hundred lines of code to be deleted there. I opted not
> to include that in this patchset, for ease of review.
> 
> NB: I tested this with icount, but there are potentially
> some weird things that could happen with interactions between
> icount's io-recompile and execution from an MMIO device
> that returns different instructions each time it's read.

Re: [Qemu-devel] [Qemu-arm] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions
Posted by Peter Maydell 5 years, 9 months ago
On 11 July 2018 at 05:21, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
> Hi Peter,
>
> On 07/10/2018 01:00 PM, Peter Maydell wrote:
>> This series adds support to TCG for executing from MMIO regions
>> and small MMU regions. The basic principle is that if get_page_addr_code()
>> finds that the region is not backed by a full page of RAM then it
>> returns -1, and tb_gen_code() then generates a non-cached TB
>> containing a single instruction. Execution from these regions
>> thus performs the instruction fetch every time, ensuring that we
>> get the read-from-MMIO and check-small-MMU-region permissions
>> checks right.
>>
>> This means that the code path for "generate bus fault for failing
>> to load an instruction" no longer goes through get_page_addr_code(),
>> but instead via each target's translate code and its calls to
>> the cpu_ld*_code() or similar functions. Patch 1 makes sure we
>> can distinguish insn fetches from data loads when generating the
>> bus fault exceptions. (Aside: I have assumed that all cpu_ld*_code()
>> loads should trigger iside faults rather than dside. Hopefully this
>> is true...)
>>
>> Patches 2 and 3 make trivial fixes to various callers of
>> get_page_addr_code(); patch 4 does the work of generating our
>> single-insn TBs. Patch 5 can then remove all the code that
>> (mis)handles MMIO regions from get_page_addr_code(). Finally
>> patch 6 drops the target/arm workarounds for not having support
>> for executing from small MPU regions.
>>
>> Note for the Xilinx folks: this patchset makes the mmio-exec
>> testcase for running from the SPI flash pass. Cedric: you might
>> like to test the aspeed image you had that relies on execution
>> from an MMIO region too.
>
> I applied and quickly tested your series on a MIPS SoC I'm working on
> which has a tiny SRAM:
>
> (qemu) info mtree
> address-space: memory
>   0000000000000000-ffffffffffffffff (prio 0, i/o): system
>     0000000000000000-00000000000007ff (prio 0, ram): sram
>     0000000010000000-00000000107fffff (prio 0, i/o): pflash
>     0000000014000000-0000000014ffffff (prio 0, ram): dram
>     000000001fc00000-000000001fc0ffff (prio 0, rom): srom
>
> The firmware copies the ISR in this SRAM area, sadly it didn't work as
> expected:
>
> qemu-system-mips: Bad ram pointer 0x4a4
> Aborted (core dumped)
> (gdb) bt
> #0  0x00007f5f34d84e7b in __GI_raise (sig=sig@entry=6) at
> ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x00007f5f34d86231 in __GI_abort () at abort.c:79
> #2  0x000055f4527a2074 in qemu_ram_addr_from_host_nofail (ptr=0x4a4) at
> accel/tcg/cputlb.c:751
> #3  0x000055f4527a2887 in get_page_addr_code (env=0x55f454d06728,
> addr=2415932580) at accel/tcg/cputlb.c:966
> #4  0x000055f4527bd206 in tb_htable_lookup (cpu=0x55f454cfe478,
> pc=2415932580, cs_base=0, flags=268435472, cf_mask=0) at
> accel/tcg/cpu-exec.c:334
> #5  0x000055f4527b7b31 in tb_lookup__cpu_state (cpu=0x55f454cfe478,
> pc=0x7f5f17cf7f98, cs_base=0x7f5f17cf7f9c, flags=0x7f5f17cf7f94,
> cf_mask=0) at include/exec/tb-lookup.h:39
> #6  0x000055f4527b7e29 in helper_lookup_tb_ptr (env=0x55f454d06728) at
> accel/tcg/tcg-runtime.c:154
> #7  0x00007f5f2494da2d in code_gen_buffer ()
> #8  0x000055f4527bcd33 in cpu_tb_exec (cpu=0x55f454cfe478,
> itb=0x7f5f2494d880 <code_gen_buffer+129107>) at accel/tcg/cpu-exec.c:171
> #9  0x000055f4527bda6a in cpu_loop_exec_tb (cpu=0x55f454cfe478,
> tb=0x7f5f2494d880 <code_gen_buffer+129107>, last_tb=0x7f5f17cf8538,
> tb_exit=0x7f5f17cf8534) at accel/tcg/cpu-exec.c:615
> #10 0x000055f4527bdd57 in cpu_exec (cpu=0x55f454cfe478) at
> accel/tcg/cpu-exec.c:725
> #11 0x000055f452770575 in tcg_cpu_exec (cpu=0x55f454cfe478) at cpus.c:1363
> #12 0x000055f4527707cb in qemu_tcg_rr_cpu_thread_fn (arg=0x55f454cfe478)
> at cpus.c:1463
> #13 0x000055f452c58a4b in qemu_thread_start (args=0x55f454d2afc0) at
> util/qemu-thread-posix.c:504
> #14 0x00007f5f351115aa in start_thread (arg=0x7f5f17cfb700) at
> pthread_create.c:463
> #15 0x00007f5f34e46cbf in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> This firmware works using the "avoid subpages" trick, allocating 4K for
> the SRAM (TARGET_PAGE_SIZE).

Are you sure this is executing from the SRAM at this point?
The PC value in the backtrace is 2415932580 == 900034A4. I don't
know how the MMU is configured at this point, but my guess is that
that's actually in the pflash area, and you're running into a
different bug:
https://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg00674.html
where when we attempt to execute from a romd device that's in romd
mode we incorrectly think it's RAM in get_page_addr_code() and crash.

Previously my view on that had been "yes, it's a bug, but if we
fixed it the only change would be that we'd fall over with
'can't execute from a device' rather than crashing". However
now we can execute from devices fixing the bug is potentially
more useful...

thanks
-- PMM

Re: [Qemu-devel] [Qemu-arm] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions
Posted by Peter Maydell 5 years, 9 months ago
On 12 July 2018 at 17:37, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 11 July 2018 at 05:21, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
>> I applied and quickly tested your series on a MIPS SoC I'm working on
>> which has a tiny SRAM:
>>
>> (qemu) info mtree
>> address-space: memory
>>   0000000000000000-ffffffffffffffff (prio 0, i/o): system
>>     0000000000000000-00000000000007ff (prio 0, ram): sram
>>     0000000010000000-00000000107fffff (prio 0, i/o): pflash
>>     0000000014000000-0000000014ffffff (prio 0, ram): dram
>>     000000001fc00000-000000001fc0ffff (prio 0, rom): srom
>>
>> The firmware copies the ISR in this SRAM area, sadly it didn't work as
>> expected:
>>
>> qemu-system-mips: Bad ram pointer 0x4a4

> Are you sure this is executing from the SRAM at this point?
> The PC value in the backtrace is 2415932580 == 900034A4. I don't
> know how the MMU is configured at this point, but my guess is that
> that's actually in the pflash area, and you're running into a
> different bug:
> https://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg00674.html
> where when we attempt to execute from a romd device that's in romd
> mode we incorrectly think it's RAM in get_page_addr_code() and crash.

I just sent a follow-on patch which should fix the 'bad ram pointer'
assert. If your guest really is trying to execute from the pflash
here then it will just misbehave in a different way, though,
since pflash status response bytes are probably not valid MIPS
instructions...

thanks
-- PMM

Re: [Qemu-devel] [Qemu-arm] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions
Posted by Peter Maydell 5 years, 8 months ago
On 10 July 2018 at 17:00, Peter Maydell <peter.maydell@linaro.org> wrote:
> This series adds support to TCG for executing from MMIO regions
> and small MMU regions. The basic principle is that if get_page_addr_code()
> finds that the region is not backed by a full page of RAM then it
> returns -1, and tb_gen_code() then generates a non-cached TB
> containing a single instruction. Execution from these regions
> thus performs the instruction fetch every time, ensuring that we
> get the read-from-MMIO and check-small-MMU-region permissions
> checks right.

Hi; this series has been reviewed, and I propose to put it into
my target-arm.for-3.1 queue, unless anybody has a preference for
it going into master via some other route.

thanks
-- PMM

Re: [Qemu-devel] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions
Posted by Cédric Le Goater 5 years, 9 months ago
On 07/10/2018 06:00 PM, Peter Maydell wrote:
> This series adds support to TCG for executing from MMIO regions
> and small MMU regions. The basic principle is that if get_page_addr_code()
> finds that the region is not backed by a full page of RAM then it
> returns -1, and tb_gen_code() then generates a non-cached TB
> containing a single instruction. Execution from these regions
> thus performs the instruction fetch every time, ensuring that we
> get the read-from-MMIO and check-small-MMU-region permissions
> checks right.
> 
> This means that the code path for "generate bus fault for failing
> to load an instruction" no longer goes through get_page_addr_code(),
> but instead via each target's translate code and its calls to
> the cpu_ld*_code() or similar functions. Patch 1 makes sure we
> can distinguish insn fetches from data loads when generating the
> bus fault exceptions. (Aside: I have assumed that all cpu_ld*_code()
> loads should trigger iside faults rather than dside. Hopefully this
> is true...)
> 
> Patches 2 and 3 make trivial fixes to various callers of
> get_page_addr_code(); patch 4 does the work of generating our
> single-insn TBs. Patch 5 can then remove all the code that
> (mis)handles MMIO regions from get_page_addr_code(). Finally
> patch 6 drops the target/arm workarounds for not having support
> for executing from small MPU regions.
> 
> Note for the Xilinx folks: this patchset makes the mmio-exec
> testcase for running from the SPI flash pass. Cedric: you might
> like to test the aspeed image you had that relies on execution
> from an MMIO region too.


For the series,

Tested-by: Cédric Le Goater <clg@kaod.org>

Thanks,

C.

> 
> The diffstat is pretty satisfying for a patchset that adds
> a feature, but it actually undersells it: this code renders the
> hw/misc/mmio_interface.c and the mmio_ptr related code in memory.c
> and the xilinx-spips device all obsolete, so there are another
> couple of hundred lines of code to be deleted there. I opted not
> to include that in this patchset, for ease of review.
> 
> NB: I tested this with icount, but there are potentially
> some weird things that could happen with interactions between
> icount's io-recompile and execution from an MMIO device
> that returns different instructions each time it's read.
> 
> thanks
> -- PMM
> 
> 
> Peter Maydell (6):
>   accel/tcg: Pass read access type through to io_readx()
>   accel/tcg: Handle get_page_addr_code() returning -1 in hashtable
>     lookups
>   accel/tcg: Handle get_page_addr_code() returning -1 in
>     tb_check_watchpoint()
>   accel/tcg: tb_gen_code(): Create single-insn TB for execution from
>     non-RAM
>   accel/tcg: Return -1 for execution from MMIO regions in
>     get_page_addr_code()
>   target/arm: Allow execution from small regions
> 
>  accel/tcg/softmmu_template.h |  11 ++--
>  include/qom/cpu.h            |   6 +++
>  accel/tcg/cpu-exec.c         |   3 ++
>  accel/tcg/cputlb.c           | 100 +++++------------------------------
>  accel/tcg/translate-all.c    |  23 +++++++-
>  memory.c                     |   3 +-
>  target/arm/helper.c          |  23 --------
>  7 files changed, 52 insertions(+), 117 deletions(-)
> 


Re: [Qemu-devel] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions
Posted by Cédric Le Goater 5 years, 9 months ago
On 07/10/2018 06:00 PM, Peter Maydell wrote:
> This series adds support to TCG for executing from MMIO regions
> and small MMU regions. The basic principle is that if get_page_addr_code()
> finds that the region is not backed by a full page of RAM then it
> returns -1, and tb_gen_code() then generates a non-cached TB
> containing a single instruction. Execution from these regions
> thus performs the instruction fetch every time, ensuring that we
> get the read-from-MMIO and check-small-MMU-region permissions
> checks right.
> 
> This means that the code path for "generate bus fault for failing
> to load an instruction" no longer goes through get_page_addr_code(),
> but instead via each target's translate code and its calls to
> the cpu_ld*_code() or similar functions. Patch 1 makes sure we
> can distinguish insn fetches from data loads when generating the
> bus fault exceptions. (Aside: I have assumed that all cpu_ld*_code()
> loads should trigger iside faults rather than dside. Hopefully this
> is true...)
> 
> Patches 2 and 3 make trivial fixes to various callers of
> get_page_addr_code(); patch 4 does the work of generating our
> single-insn TBs. Patch 5 can then remove all the code that
> (mis)handles MMIO regions from get_page_addr_code(). Finally
> patch 6 drops the target/arm workarounds for not having support
> for executing from small MPU regions.
> 
> Note for the Xilinx folks: this patchset makes the mmio-exec
> testcase for running from the SPI flash pass. Cedric: you might
> like to test the aspeed image you had that relies on execution
> from an MMIO region too.

I have added a memory region alias at 0x0 on the memory region where
the first flash device is mapped and all aspeed machines, palmetto, 
romulus, witherspoon booted fine. 

More or less 4MB of data access is generated and the slowdown is hardly 
noticeable, around one second on a laptop.

I wonder if I should add a bool option to the machine to activate
or not the feature ?

Thanks,

C. 

> The diffstat is pretty satisfying for a patchset that adds
> a feature, but it actually undersells it: this code renders the
> hw/misc/mmio_interface.c and the mmio_ptr related code in memory.c
> and the xilinx-spips device all obsolete, so there are another
> couple of hundred lines of code to be deleted there. I opted not
> to include that in this patchset, for ease of review.
> 
> NB: I tested this with icount, but there are potentially
> some weird things that could happen with interactions between
> icount's io-recompile and execution from an MMIO device
> that returns different instructions each time it's read.
> 
> thanks
> -- PMM
> 
> 
> Peter Maydell (6):
>   accel/tcg: Pass read access type through to io_readx()
>   accel/tcg: Handle get_page_addr_code() returning -1 in hashtable
>     lookups
>   accel/tcg: Handle get_page_addr_code() returning -1 in
>     tb_check_watchpoint()
>   accel/tcg: tb_gen_code(): Create single-insn TB for execution from
>     non-RAM
>   accel/tcg: Return -1 for execution from MMIO regions in
>     get_page_addr_code()
>   target/arm: Allow execution from small regions
> 
>  accel/tcg/softmmu_template.h |  11 ++--
>  include/qom/cpu.h            |   6 +++
>  accel/tcg/cpu-exec.c         |   3 ++
>  accel/tcg/cputlb.c           | 100 +++++------------------------------
>  accel/tcg/translate-all.c    |  23 +++++++-
>  memory.c                     |   3 +-
>  target/arm/helper.c          |  23 --------
>  7 files changed, 52 insertions(+), 117 deletions(-)
> 


Re: [Qemu-devel] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions
Posted by Peter Maydell 5 years, 9 months ago
On 23 July 2018 at 15:57, Cédric Le Goater <clg@kaod.org> wrote:
> On 07/10/2018 06:00 PM, Peter Maydell wrote:
>> Note for the Xilinx folks: this patchset makes the mmio-exec
>> testcase for running from the SPI flash pass. Cedric: you might
>> like to test the aspeed image you had that relies on execution
>> from an MMIO region too.
>
> I have added a memory region alias at 0x0 on the memory region where
> the first flash device is mapped and all aspeed machines, palmetto,
> romulus, witherspoon booted fine.
>
> More or less 4MB of data access is generated and the slowdown is hardly
> noticeable, around one second on a laptop.
>
> I wonder if I should add a bool option to the machine to activate
> or not the feature ?

I would tend to go with "not" -- if we need the feature then
it's better to run slowly but correctly, and if we don't need
the feature then it doesn't cost us any speed I think.

thanks
-- PMM

Re: [Qemu-devel] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions
Posted by Cédric Le Goater 5 years, 9 months ago
On 07/23/2018 05:17 PM, Peter Maydell wrote:
> On 23 July 2018 at 15:57, Cédric Le Goater <clg@kaod.org> wrote:
>> On 07/10/2018 06:00 PM, Peter Maydell wrote:
>>> Note for the Xilinx folks: this patchset makes the mmio-exec
>>> testcase for running from the SPI flash pass. Cedric: you might
>>> like to test the aspeed image you had that relies on execution
>>> from an MMIO region too.
>>
>> I have added a memory region alias at 0x0 on the memory region where
>> the first flash device is mapped and all aspeed machines, palmetto,
>> romulus, witherspoon booted fine.
>>
>> More or less 4MB of data access is generated and the slowdown is hardly
>> noticeable, around one second on a laptop.
>>
>> I wonder if I should add a bool option to the machine to activate
>> or not the feature ?
> 
> I would tend to go with "not" -- if we need the feature then
> it's better to run slowly but correctly, and if we don't need
> the feature then it doesn't cost us any speed I think.


I have chosen a brutal activation (for the moment) :

  https://github.com/legoater/qemu/commit/10b08c5e8df385bba342f02fa0440916203b6fa8

I will send when your patchset is committed.

Images used for tests are here : 

  https://openpower.xyz/job/openbmc-build/distro=ubuntu,target=romulus/lastSuccessfulBuild/artifact/deploy/images/romulus/flash-romulus

  https://openpower.xyz/job/openbmc-build/distro=ubuntu,target=palmetto/lastSuccessfulBuild/artifact/deploy/images/palmetto/flash-palmetto

  https://openpower.xyz/job/openbmc-build/distro=ubuntu,target=witherspoon/lastSuccessfulBuild/artifact/deploy/images/witherspoon/obmc-phosphor-image-witherspoon.ubi.mtd

Thanks,

C. 

Re: [Qemu-devel] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions
Posted by KONRAD Frederic 5 years, 9 months ago
Hi Peter,

Nice! Thanks for that.

A little question though.. What will happen in the case where the
CPU start executing code at random place because of eg: a badly
configured kernel?

Seeing the patch 5 I guess it will really execute stuff.. So
maybe this is less user-friendly?

Cheers,
Fred

On 07/10/2018 06:00 PM, Peter Maydell wrote:
> This series adds support to TCG for executing from MMIO regions
> and small MMU regions. The basic principle is that if get_page_addr_code()
> finds that the region is not backed by a full page of RAM then it
> returns -1, and tb_gen_code() then generates a non-cached TB
> containing a single instruction. Execution from these regions
> thus performs the instruction fetch every time, ensuring that we
> get the read-from-MMIO and check-small-MMU-region permissions
> checks right.
> 
> This means that the code path for "generate bus fault for failing
> to load an instruction" no longer goes through get_page_addr_code(),
> but instead via each target's translate code and its calls to
> the cpu_ld*_code() or similar functions. Patch 1 makes sure we
> can distinguish insn fetches from data loads when generating the
> bus fault exceptions. (Aside: I have assumed that all cpu_ld*_code()
> loads should trigger iside faults rather than dside. Hopefully this
> is true...)
> 
> Patches 2 and 3 make trivial fixes to various callers of
> get_page_addr_code(); patch 4 does the work of generating our
> single-insn TBs. Patch 5 can then remove all the code that
> (mis)handles MMIO regions from get_page_addr_code(). Finally
> patch 6 drops the target/arm workarounds for not having support
> for executing from small MPU regions.
> 
> Note for the Xilinx folks: this patchset makes the mmio-exec
> testcase for running from the SPI flash pass. Cedric: you might
> like to test the aspeed image you had that relies on execution
> from an MMIO region too.
> 
> The diffstat is pretty satisfying for a patchset that adds
> a feature, but it actually undersells it: this code renders the
> hw/misc/mmio_interface.c and the mmio_ptr related code in memory.c
> and the xilinx-spips device all obsolete, so there are another
> couple of hundred lines of code to be deleted there. I opted not
> to include that in this patchset, for ease of review.
> 
> NB: I tested this with icount, but there are potentially
> some weird things that could happen with interactions between
> icount's io-recompile and execution from an MMIO device
> that returns different instructions each time it's read.
> 
> thanks
> -- PMM
> 
> 
> Peter Maydell (6):
>    accel/tcg: Pass read access type through to io_readx()
>    accel/tcg: Handle get_page_addr_code() returning -1 in hashtable
>      lookups
>    accel/tcg: Handle get_page_addr_code() returning -1 in
>      tb_check_watchpoint()
>    accel/tcg: tb_gen_code(): Create single-insn TB for execution from
>      non-RAM
>    accel/tcg: Return -1 for execution from MMIO regions in
>      get_page_addr_code()
>    target/arm: Allow execution from small regions
> 
>   accel/tcg/softmmu_template.h |  11 ++--
>   include/qom/cpu.h            |   6 +++
>   accel/tcg/cpu-exec.c         |   3 ++
>   accel/tcg/cputlb.c           | 100 +++++------------------------------
>   accel/tcg/translate-all.c    |  23 +++++++-
>   memory.c                     |   3 +-
>   target/arm/helper.c          |  23 --------
>   7 files changed, 52 insertions(+), 117 deletions(-)
> 

Re: [Qemu-devel] [PATCH 0/6] accel/tcg: Support execution from MMIO and small MMU regions
Posted by Peter Maydell 5 years, 9 months ago
On 16 July 2018 at 13:30, KONRAD Frederic <frederic.konrad@adacore.com> wrote:
> Hi Peter,
>
> Nice! Thanks for that.
>
> A little question though.. What will happen in the case where the
> CPU start executing code at random place because of eg: a badly
> configured kernel?
>
> Seeing the patch 5 I guess it will really execute stuff.. So
> maybe this is less user-friendly?

Yes, it's true that we will now happily execute from anything
(device, unassigned memory, etc), and do what the real hardware
would do in that situation (random unhelpful things, infinite
loop of taking exceptions). That's a bit unavoidable though I
think, and there are already lots of cases where QEMU will just
sit there with a black screen because the user has loaded in
a bad guest image that goes off into the weeds without printing
anything to a UART.

It's possible we could devise a user-friendliness option that
tried to pick up symptoms of guests being stuck (eg tracking
whether we're continuously taking exceptions) but that gets
into heuristics a bit.

thanks
-- PMM