[PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory.

Jonathan Cameron via posted 3 patches 8 months, 2 weeks ago
Maintainers: Richard Henderson <richard.henderson@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, Eduardo Habkost <eduardo@habkost.net>
There is a newer version of this series
[PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory.
Posted by Jonathan Cameron via 8 months, 2 weeks ago
On i386, after fixing the page walking code to work with pages in
MMIO memory (specifically CXL emulated interleaved memory),
a crash was seen in an interrupt handling path.

Useful part of bt

Peter identified this as being due to the BQL already being
held when the page table walker encounters MMIO memory and attempts
to take the lock again.  There are other examples of similar paths
TCG, so this follows the approach taken in those of simply checking
if the lock is already held and if it is, don't take it again.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 accel/tcg/cputlb.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 047cd2cc0a..3b8d178707 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -2019,6 +2019,7 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full,
                                int mmu_idx, MMUAccessType type, uintptr_t ra)
 {
     MemoryRegionSection *section;
+    bool locked = bql_locked();
     MemoryRegion *mr;
     hwaddr mr_offset;
     MemTxAttrs attrs;
@@ -2030,10 +2031,14 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full,
     section = io_prepare(&mr_offset, cpu, full->xlat_section, attrs, addr, ra);
     mr = section->mr;
 
-    bql_lock();
+    if (!locked) {
+        bql_lock();
+    }
     ret = int_ld_mmio_beN(cpu, full, ret_be, addr, size, mmu_idx,
                           type, ra, mr, mr_offset);
-    bql_unlock();
+    if (!locked) {
+        bql_unlock();
+    }
 
     return ret;
 }
-- 
2.39.2
Re: [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory.
Posted by Richard Henderson 8 months, 2 weeks ago
On 2/15/24 05:01, Jonathan Cameron wrote:
> On i386, after fixing the page walking code to work with pages in
> MMIO memory (specifically CXL emulated interleaved memory),
> a crash was seen in an interrupt handling path.
> 
> Useful part of bt
> 
> Peter identified this as being due to the BQL already being
> held when the page table walker encounters MMIO memory and attempts
> to take the lock again.  There are other examples of similar paths
> TCG, so this follows the approach taken in those of simply checking
> if the lock is already held and if it is, don't take it again.
> 
> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>   accel/tcg/cputlb.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 047cd2cc0a..3b8d178707 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -2019,6 +2019,7 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full,
>                                  int mmu_idx, MMUAccessType type, uintptr_t ra)
>   {
>       MemoryRegionSection *section;
> +    bool locked = bql_locked();
>       MemoryRegion *mr;
>       hwaddr mr_offset;
>       MemTxAttrs attrs;
> @@ -2030,10 +2031,14 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full,
>       section = io_prepare(&mr_offset, cpu, full->xlat_section, attrs, addr, ra);
>       mr = section->mr;
>   
> -    bql_lock();
> +    if (!locked) {
> +        bql_lock();
> +    }
>       ret = int_ld_mmio_beN(cpu, full, ret_be, addr, size, mmu_idx,
>                             type, ra, mr, mr_offset);
> -    bql_unlock();
> +    if (!locked) {
> +        bql_unlock();
> +    }

On top of other comments, I'm never keen on this type of test/lock/test/unlock.  When this 
kind of thing is encountered, it means we should have been using a recursive lock in the 
first place.


r~
Re: [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory.
Posted by Jonathan Cameron via 8 months, 2 weeks ago
On Thu, 15 Feb 2024 09:30:27 -1000
Richard Henderson <richard.henderson@linaro.org> wrote:

> On 2/15/24 05:01, Jonathan Cameron wrote:
> > On i386, after fixing the page walking code to work with pages in
> > MMIO memory (specifically CXL emulated interleaved memory),
> > a crash was seen in an interrupt handling path.
> > 
> > Useful part of bt
> > 
> > Peter identified this as being due to the BQL already being
> > held when the page table walker encounters MMIO memory and attempts
> > to take the lock again.  There are other examples of similar paths
> > TCG, so this follows the approach taken in those of simply checking
> > if the lock is already held and if it is, don't take it again.
> > 
> > Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> >   accel/tcg/cputlb.c | 9 +++++++--
> >   1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> > index 047cd2cc0a..3b8d178707 100644
> > --- a/accel/tcg/cputlb.c
> > +++ b/accel/tcg/cputlb.c
> > @@ -2019,6 +2019,7 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full,
> >                                  int mmu_idx, MMUAccessType type, uintptr_t ra)
> >   {
> >       MemoryRegionSection *section;
> > +    bool locked = bql_locked();
> >       MemoryRegion *mr;
> >       hwaddr mr_offset;
> >       MemTxAttrs attrs;
> > @@ -2030,10 +2031,14 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full,
> >       section = io_prepare(&mr_offset, cpu, full->xlat_section, attrs, addr, ra);
> >       mr = section->mr;
> >   
> > -    bql_lock();
> > +    if (!locked) {
> > +        bql_lock();
> > +    }
> >       ret = int_ld_mmio_beN(cpu, full, ret_be, addr, size, mmu_idx,
> >                             type, ra, mr, mr_offset);
> > -    bql_unlock();
> > +    if (!locked) {
> > +        bql_unlock();
> > +    }  
> 
> On top of other comments, I'm never keen on this type of test/lock/test/unlock.  When this 
> kind of thing is encountered, it means we should have been using a recursive lock in the 
> first place.

Hi Richard,

Whilst I agree this stuff is really ugly, is it practical to fix it for this case?
Or was intent here to make a general comment on QEMU locking?

Jonathan


> 
> 
> r~
Re: [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory.
Posted by Alex Bennée 8 months, 2 weeks ago
Jonathan Cameron <Jonathan.Cameron@Huawei.com> writes:

> On Thu, 15 Feb 2024 09:30:27 -1000
> Richard Henderson <richard.henderson@linaro.org> wrote:
>
>> On 2/15/24 05:01, Jonathan Cameron wrote:
>> > On i386, after fixing the page walking code to work with pages in
>> > MMIO memory (specifically CXL emulated interleaved memory),
>> > a crash was seen in an interrupt handling path.
>> > 
>> > Useful part of bt
>> > 
>> > Peter identified this as being due to the BQL already being
>> > held when the page table walker encounters MMIO memory and attempts
>> > to take the lock again.  There are other examples of similar paths
>> > TCG, so this follows the approach taken in those of simply checking
>> > if the lock is already held and if it is, don't take it again.
>> > 
>> > Suggested-by: Peter Maydell <peter.maydell@linaro.org>
>> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> > ---
>> >   accel/tcg/cputlb.c | 9 +++++++--
>> >   1 file changed, 7 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
>> > index 047cd2cc0a..3b8d178707 100644
>> > --- a/accel/tcg/cputlb.c
>> > +++ b/accel/tcg/cputlb.c
>> > @@ -2019,6 +2019,7 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full,
>> >                                  int mmu_idx, MMUAccessType type, uintptr_t ra)
>> >   {
>> >       MemoryRegionSection *section;
>> > +    bool locked = bql_locked();
>> >       MemoryRegion *mr;
>> >       hwaddr mr_offset;
>> >       MemTxAttrs attrs;
>> > @@ -2030,10 +2031,14 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full,
>> >       section = io_prepare(&mr_offset, cpu, full->xlat_section, attrs, addr, ra);
>> >       mr = section->mr;
>> >   
>> > -    bql_lock();
>> > +    if (!locked) {
>> > +        bql_lock();
>> > +    }
>> >       ret = int_ld_mmio_beN(cpu, full, ret_be, addr, size, mmu_idx,
>> >                             type, ra, mr, mr_offset);
>> > -    bql_unlock();
>> > +    if (!locked) {
>> > +        bql_unlock();
>> > +    }  
>> 
>> On top of other comments, I'm never keen on this type of test/lock/test/unlock.  When this 
>> kind of thing is encountered, it means we should have been using a recursive lock in the 
>> first place.
>
> Hi Richard,
>
> Whilst I agree this stuff is really ugly, is it practical to fix it
> for this case?

You can use:

  BQL_LOCK_GUARD();

which does all the recursive checking and clean-up and for free also
ensures you don't miss an unlock leg.

> Or was intent here to make a general comment on QEMU locking?
>
> Jonathan
>
>
>> 
>> 
>> r~

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
Re: [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory.
Posted by Peter Maydell 8 months, 2 weeks ago
On Thu, 15 Feb 2024 at 15:03, Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On i386, after fixing the page walking code to work with pages in
> MMIO memory (specifically CXL emulated interleaved memory),
> a crash was seen in an interrupt handling path.
>
> Useful part of bt

Did you intend to put in a backtrace here?

>
> Peter identified this as being due to the BQL already being
> held when the page table walker encounters MMIO memory and attempts
> to take the lock again.  There are other examples of similar paths
> TCG, so this follows the approach taken in those of simply checking
> if the lock is already held and if it is, don't take it again.
>
> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  accel/tcg/cputlb.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 047cd2cc0a..3b8d178707 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -2019,6 +2019,7 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full,
>                                 int mmu_idx, MMUAccessType type, uintptr_t ra)
>  {
>      MemoryRegionSection *section;
> +    bool locked = bql_locked();
>      MemoryRegion *mr;
>      hwaddr mr_offset;
>      MemTxAttrs attrs;
> @@ -2030,10 +2031,14 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full,
>      section = io_prepare(&mr_offset, cpu, full->xlat_section, attrs, addr, ra);
>      mr = section->mr;
>
> -    bql_lock();
> +    if (!locked) {
> +        bql_lock();
> +    }
>      ret = int_ld_mmio_beN(cpu, full, ret_be, addr, size, mmu_idx,
>                            type, ra, mr, mr_offset);
> -    bql_unlock();
> +    if (!locked) {
> +        bql_unlock();
> +    }
>
>      return ret;
>  }

Can we do this consistently across all four functions
do_ld_mmio_beN, do_ld16_mmio_beN, do_st_mmio_leN,
do_st16_mmio_leN, please ? It happens that your workload
only needs to do an 8-byte load but conceptually the same
thing applies in all these cases.

thanks
-- PMM
Re: [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory.
Posted by Jonathan Cameron via 8 months, 2 weeks ago
On Thu, 15 Feb 2024 16:11:26 +0000
Peter Maydell <peter.maydell@linaro.org> wrote:

> On Thu, 15 Feb 2024 at 15:03, Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On i386, after fixing the page walking code to work with pages in
> > MMIO memory (specifically CXL emulated interleaved memory),
> > a crash was seen in an interrupt handling path.
> >
> > Useful part of bt  
> 
> Did you intend to put in a backtrace here?
ah. Indeed.

Forgot that the # at start of a bt is a comment in a git message
oops.

I'll put those back in (hash removed) for v2.

7  0x0000555555ab1929 in bql_lock_impl (file=0x555556049122 "../../accel/tcg/cputlb.c", line=2033) at ../../system/cpus.c:524
8  bql_lock_impl (file=file@entry=0x555556049122 "../../accel/tcg/cputlb.c", line=line@entry=2033) at ../../system/cpus.c:520
9  0x0000555555c9f7d6 in do_ld_mmio_beN (cpu=0x5555578e0cb0, full=0x7ffe88012950, ret_be=ret_be@entry=0, addr=19595792376, size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2033
10 0x0000555555ca0fbd in do_ld_8 (cpu=cpu@entry=0x5555578e0cb0, p=p@entry=0x7ffff4efd1d0, mmu_idx=<optimized out>, type=type@entry=MMU_DATA_LOAD, memop=<optimized out>, ra=ra@entry=0) at ../../accel/tcg/cputlb.c:2356
11 0x0000555555ca341f in do_ld8_mmu (cpu=cpu@entry=0x5555578e0cb0, addr=addr@entry=19595792376, oi=oi@entry=52, ra=0, ra@entry=52, access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2439
12 0x0000555555ca5f59 in cpu_ldq_mmu (ra=52, oi=52, addr=19595792376, env=0x5555578e3470) at ../../accel/tcg/ldst_common.c.inc:169
13 cpu_ldq_le_mmuidx_ra (env=0x5555578e3470, addr=19595792376, mmu_idx=<optimized out>, ra=ra@entry=0) at ../../accel/tcg/ldst_common.c.inc:301
14 0x0000555555b4b5fc in ptw_ldq (ra=0, in=0x7ffff4efd320) at ../../target/i386/tcg/sysemu/excp_helper.c:98
15 ptw_ldq (ra=0, in=0x7ffff4efd320) at ../../target/i386/tcg/sysemu/excp_helper.c:93
16 mmu_translate (env=env@entry=0x5555578e3470, in=0x7ffff4efd3e0, out=0x7ffff4efd3b0, err=err@entry=0x7ffff4efd3c0, ra=ra@entry=0) at ../../target/i386/tcg/sysemu/excp_helper.c:174
17 0x0000555555b4c4b3 in get_physical_address (ra=0, err=0x7ffff4efd3c0, out=0x7ffff4efd3b0, mmu_idx=0, access_type=MMU_DATA_LOAD, addr=18446741874686299840, env=0x5555578e3470) at ../../target/i386/tcg/sysemu/excp_helper.c:580
18 x86_cpu_tlb_fill (cs=0x5555578e0cb0, addr=18446741874686299840, size=<optimized out>, access_type=MMU_DATA_LOAD, mmu_idx=0, probe=<optimized out>, retaddr=0) at ../../target/i386/tcg/sysemu/excp_helper.c:606
19 0x0000555555ca0ee9 in tlb_fill (retaddr=0, mmu_idx=0, access_type=MMU_DATA_LOAD, size=<optimized out>, addr=18446741874686299840, cpu=0x7ffff4efd540) at ../../accel/tcg/cputlb.c:1315
20 mmu_lookup1 (cpu=cpu@entry=0x5555578e0cb0, data=data@entry=0x7ffff4efd540, mmu_idx=0, access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at ../../accel/tcg/cputlb.c:1713
21 0x0000555555ca2c61 in mmu_lookup (cpu=cpu@entry=0x5555578e0cb0, addr=addr@entry=18446741874686299840, oi=oi@entry=32, ra=ra@entry=0, type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffff4efd540) at ../../accel/tcg/cputlb.c:1803
22 0x0000555555ca3165 in do_ld4_mmu (cpu=cpu@entry=0x5555578e0cb0, addr=addr@entry=18446741874686299840, oi=oi@entry=32, ra=ra@entry=0, access_type=access_type@entry=MMU_DATA_LOAD) at ../../accel/tcg/cputlb.c:2416
23 0x0000555555ca5ef9 in cpu_ldl_mmu (ra=0, oi=32, addr=18446741874686299840, env=0x5555578e3470) at ../../accel/tcg/ldst_common.c.inc:158
24 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555578e3470, addr=addr@entry=18446741874686299840, mmu_idx=<optimized out>, ra=ra@entry=0) at ../../accel/tcg/ldst_common.c.inc:294
25 0x0000555555bb6cdd in do_interrupt64 (is_hw=1, next_eip=18446744072399775809, error_code=0, is_int=0, intno=236, env=0x5555578e3470) at ../../target/i386/tcg/seg_helper.c:889
26 do_interrupt_all (cpu=cpu@entry=0x5555578e0cb0, intno=236, is_int=is_int@entry=0, error_code=error_code@entry=0, next_eip=next_eip@entry=0, is_hw=is_hw@entry=1) at ../../target/i386/tcg/seg_helper.c:1130
27 0x0000555555bb87da in do_interrupt_x86_hardirq (env=env@entry=0x5555578e3470, intno=<optimized out>, is_hw=is_hw@entry=1) at ../../target/i386/tcg/seg_helper.c:1162
28 0x0000555555b5039c in x86_cpu_exec_interrupt (cs=0x5555578e0cb0, interrupt_request=<optimized out>) at ../../target/i386/tcg/sysemu/seg_helper.c:197
29 0x0000555555c94480 in cpu_handle_interrupt (last_tb=<synthetic pointer>, cpu=0x5555578e0cb0) at ../../accel/tcg/cpu-exec.c:844
> 
> >
> > Peter identified this as being due to the BQL already being
> > held when the page table walker encounters MMIO memory and attempts
> > to take the lock again.  There are other examples of similar paths
> > TCG, so this follows the approach taken in those of simply checking
> > if the lock is already held and if it is, don't take it again.
> >
> > Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> >  accel/tcg/cputlb.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> > index 047cd2cc0a..3b8d178707 100644
> > --- a/accel/tcg/cputlb.c
> > +++ b/accel/tcg/cputlb.c
> > @@ -2019,6 +2019,7 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full,
> >                                 int mmu_idx, MMUAccessType type, uintptr_t ra)
> >  {
> >      MemoryRegionSection *section;
> > +    bool locked = bql_locked();
> >      MemoryRegion *mr;
> >      hwaddr mr_offset;
> >      MemTxAttrs attrs;
> > @@ -2030,10 +2031,14 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full,
> >      section = io_prepare(&mr_offset, cpu, full->xlat_section, attrs, addr, ra);
> >      mr = section->mr;
> >
> > -    bql_lock();
> > +    if (!locked) {
> > +        bql_lock();
> > +    }
> >      ret = int_ld_mmio_beN(cpu, full, ret_be, addr, size, mmu_idx,
> >                            type, ra, mr, mr_offset);
> > -    bql_unlock();
> > +    if (!locked) {
> > +        bql_unlock();
> > +    }
> >
> >      return ret;
> >  }  
> 
> Can we do this consistently across all four functions
> do_ld_mmio_beN, do_ld16_mmio_beN, do_st_mmio_leN,
> do_st16_mmio_leN, please ? It happens that your workload
> only needs to do an 8-byte load but conceptually the same
> thing applies in all these cases.

Sure,

Jonathan


> 
> thanks
> -- PMM
Re: [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory.
Posted by Philippe Mathieu-Daudé 8 months, 2 weeks ago
On 15/2/24 16:01, Jonathan Cameron via wrote:
> On i386, after fixing the page walking code to work with pages in
> MMIO memory (specifically CXL emulated interleaved memory),
> a crash was seen in an interrupt handling path.
> 
> Useful part of bt
> 
> Peter identified this as being due to the BQL already being
> held when the page table walker encounters MMIO memory and attempts
> to take the lock again.  There are other examples of similar paths
> TCG, so this follows the approach taken in those of simply checking
> if the lock is already held and if it is, don't take it again.
> 
> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>   accel/tcg/cputlb.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 047cd2cc0a..3b8d178707 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -2019,6 +2019,7 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full,
>                                  int mmu_idx, MMUAccessType type, uintptr_t ra)
>   {
>       MemoryRegionSection *section;
> +    bool locked = bql_locked();

Maybe clearer as:

        bool need_lock = !bql_locked();

>       MemoryRegion *mr;
>       hwaddr mr_offset;
>       MemTxAttrs attrs;
> @@ -2030,10 +2031,14 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full,
>       section = io_prepare(&mr_offset, cpu, full->xlat_section, attrs, addr, ra);
>       mr = section->mr;
>   
> -    bql_lock();
> +    if (!locked) {

if (unlikely(need_lock)) {

> +        bql_lock();
> +    }
>       ret = int_ld_mmio_beN(cpu, full, ret_be, addr, size, mmu_idx,
>                             type, ra, mr, mr_offset);
> -    bql_unlock();
> +    if (!locked) {

Ditto.

> +        bql_unlock();
> +    }
>   
>       return ret;
>   }
Re: [PATCH 3/3] tcg: Avoid double lock if page tables happen to be in mmio memory.
Posted by Jonathan Cameron via 8 months, 2 weeks ago
On Thu, 15 Feb 2024 16:33:45 +0100
Philippe Mathieu-Daudé <philmd@linaro.org> wrote:

> On 15/2/24 16:01, Jonathan Cameron via wrote:
> > On i386, after fixing the page walking code to work with pages in
> > MMIO memory (specifically CXL emulated interleaved memory),
> > a crash was seen in an interrupt handling path.
> > 
> > Useful part of bt
> > 
> > Peter identified this as being due to the BQL already being
> > held when the page table walker encounters MMIO memory and attempts
> > to take the lock again.  There are other examples of similar paths
> > TCG, so this follows the approach taken in those of simply checking
> > if the lock is already held and if it is, don't take it again.
> > 
> > Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> >   accel/tcg/cputlb.c | 9 +++++++--
> >   1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> > index 047cd2cc0a..3b8d178707 100644
> > --- a/accel/tcg/cputlb.c
> > +++ b/accel/tcg/cputlb.c
> > @@ -2019,6 +2019,7 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full,
> >                                  int mmu_idx, MMUAccessType type, uintptr_t ra)
> >   {
> >       MemoryRegionSection *section;
> > +    bool locked = bql_locked();  
> 
> Maybe clearer as:
> 
>         bool need_lock = !bql_locked();
> 
> >       MemoryRegion *mr;
> >       hwaddr mr_offset;
> >       MemTxAttrs attrs;
> > @@ -2030,10 +2031,14 @@ static uint64_t do_ld_mmio_beN(CPUState *cpu, CPUTLBEntryFull *full,
> >       section = io_prepare(&mr_offset, cpu, full->xlat_section, attrs, addr, ra);
> >       mr = section->mr;
> >   
> > -    bql_lock();
> > +    if (!locked) {  
> 
> if (unlikely(need_lock)) {

Isn't this reversed? Until now we've always taken the lock here so I'm guessing
it normally is needed. if (likely(need_lock))?
if we are going to mark it one way or the other.

> 
> > +        bql_lock();
> > +    }
> >       ret = int_ld_mmio_beN(cpu, full, ret_be, addr, size, mmu_idx,
> >                             type, ra, mr, mr_offset);
> > -    bql_unlock();
> > +    if (!locked) {  
> 
> Ditto.
> 
> > +        bql_unlock();
> > +    }
> >   
> >       return ret;
> >   }  
>