target/m68k: implement unaligned accesses for m68k CPUs

[PATCH 2/2] target/m68k: pass alignment into TCG memory load/store routines

Posted by Mark Cave-Ayland 1 year, 7 months ago

Now that do_unaligned_access has been implemented for 68k CPUs, pass the required
alignment into the TCG memory load/store routines. This allows the TCG memory core
to generate an Address Error exception for unaligned memory accesses if required.

Suggested-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2165
---
 target/m68k/translate.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index 445966fb6a..661a7b4def 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -303,13 +303,18 @@ static inline TCGv gen_load(DisasContext *s, int opsize, TCGv addr,
                             int sign, int index)
 {
     TCGv tmp = tcg_temp_new_i32();
+    MemOp memop = opsize | (sign ? MO_SIGN : 0) | MO_TE;
 
     switch (opsize) {
     case OS_BYTE:
+        tcg_gen_qemu_ld_tl(tmp, addr, index, memop);
+        break;
     case OS_WORD:
     case OS_LONG:
-        tcg_gen_qemu_ld_tl(tmp, addr, index,
-                           opsize | (sign ? MO_SIGN : 0) | MO_TE);
+        if (!m68k_feature(s->env, M68K_FEATURE_UNALIGNED_DATA)) {
+            memop |= MO_ALIGN_2;
+        }
+        tcg_gen_qemu_ld_tl(tmp, addr, index, memop);
         break;
     default:
         g_assert_not_reached();
@@ -321,11 +326,18 @@ static inline TCGv gen_load(DisasContext *s, int opsize, TCGv addr,
 static inline void gen_store(DisasContext *s, int opsize, TCGv addr, TCGv val,
                              int index)
 {
+    MemOp memop = opsize | MO_TE;
+
     switch (opsize) {
     case OS_BYTE:
+        tcg_gen_qemu_st_tl(val, addr, index, memop);
+        break;
     case OS_WORD:
     case OS_LONG:
-        tcg_gen_qemu_st_tl(val, addr, index, opsize | MO_TE);
+        if (!m68k_feature(s->env, M68K_FEATURE_UNALIGNED_DATA)) {
+            memop |= MO_ALIGN_2;
+        }
+        tcg_gen_qemu_st_tl(val, addr, index, memop);
         break;
     default:
         g_assert_not_reached();
-- 
2.39.2

Re: [PATCH 2/2] target/m68k: pass alignment into TCG memory load/store routines

Posted by BALATON Zoltan 1 year, 7 months ago

On Sun, 23 Jun 2024, Mark Cave-Ayland wrote:
> Now that do_unaligned_access has been implemented for 68k CPUs, pass the required
> alignment into the TCG memory load/store routines. This allows the TCG memory core
> to generate an Address Error exception for unaligned memory accesses if required.
>
> Suggested-by: Laurent Vivier <laurent@vivier.eu>
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2165
> ---
> target/m68k/translate.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/target/m68k/translate.c b/target/m68k/translate.c
> index 445966fb6a..661a7b4def 100644
> --- a/target/m68k/translate.c
> +++ b/target/m68k/translate.c
> @@ -303,13 +303,18 @@ static inline TCGv gen_load(DisasContext *s, int opsize, TCGv addr,
>                             int sign, int index)
> {
>     TCGv tmp = tcg_temp_new_i32();
> +    MemOp memop = opsize | (sign ? MO_SIGN : 0) | MO_TE;
>
>     switch (opsize) {
>     case OS_BYTE:
> +        tcg_gen_qemu_ld_tl(tmp, addr, index, memop);
> +        break;
>     case OS_WORD:
>     case OS_LONG:
> -        tcg_gen_qemu_ld_tl(tmp, addr, index,
> -                           opsize | (sign ? MO_SIGN : 0) | MO_TE);
> +        if (!m68k_feature(s->env, M68K_FEATURE_UNALIGNED_DATA)) {
> +            memop |= MO_ALIGN_2;
> +        }
> +        tcg_gen_qemu_ld_tl(tmp, addr, index, memop);

You could swap the order of these so byte comes last and fall through to 
it from word/long to avoid duplicated line.

Maybe this answers my question about where it's restriced by CPU type. I 
wonder if this check for M68K_FEATURE_UNALIGNED_DATA could be avoded here 
and done by checking it in init and only set the unaligned method for CPUs 
that need it to not add overhead for most CPUs that don't need it.

Regards,
BALATON Zoltan

>         break;
>     default:
>         g_assert_not_reached();
> @@ -321,11 +326,18 @@ static inline TCGv gen_load(DisasContext *s, int opsize, TCGv addr,
> static inline void gen_store(DisasContext *s, int opsize, TCGv addr, TCGv val,
>                              int index)
> {
> +    MemOp memop = opsize | MO_TE;
> +
>     switch (opsize) {
>     case OS_BYTE:
> +        tcg_gen_qemu_st_tl(val, addr, index, memop);
> +        break;
>     case OS_WORD:
>     case OS_LONG:
> -        tcg_gen_qemu_st_tl(val, addr, index, opsize | MO_TE);
> +        if (!m68k_feature(s->env, M68K_FEATURE_UNALIGNED_DATA)) {
> +            memop |= MO_ALIGN_2;
> +        }
> +        tcg_gen_qemu_st_tl(val, addr, index, memop);
>         break;
>     default:
>         g_assert_not_reached();
>

Re: [PATCH 2/2] target/m68k: pass alignment into TCG memory load/store routines

Posted by Mark Cave-Ayland 1 year, 7 months ago

On 23/06/2024 16:23, BALATON Zoltan wrote:

> On Sun, 23 Jun 2024, Mark Cave-Ayland wrote:
>> Now that do_unaligned_access has been implemented for 68k CPUs, pass the required
>> alignment into the TCG memory load/store routines. This allows the TCG memory core
>> to generate an Address Error exception for unaligned memory accesses if required.
>>
>> Suggested-by: Laurent Vivier <laurent@vivier.eu>
>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2165
>> ---
>> target/m68k/translate.c | 18 +++++++++++++++---
>> 1 file changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/target/m68k/translate.c b/target/m68k/translate.c
>> index 445966fb6a..661a7b4def 100644
>> --- a/target/m68k/translate.c
>> +++ b/target/m68k/translate.c
>> @@ -303,13 +303,18 @@ static inline TCGv gen_load(DisasContext *s, int opsize, TCGv 
>> addr,
>>                             int sign, int index)
>> {
>>     TCGv tmp = tcg_temp_new_i32();
>> +    MemOp memop = opsize | (sign ? MO_SIGN : 0) | MO_TE;
>>
>>     switch (opsize) {
>>     case OS_BYTE:
>> +        tcg_gen_qemu_ld_tl(tmp, addr, index, memop);
>> +        break;
>>     case OS_WORD:
>>     case OS_LONG:
>> -        tcg_gen_qemu_ld_tl(tmp, addr, index,
>> -                           opsize | (sign ? MO_SIGN : 0) | MO_TE);
>> +        if (!m68k_feature(s->env, M68K_FEATURE_UNALIGNED_DATA)) {
>> +            memop |= MO_ALIGN_2;
>> +        }
>> +        tcg_gen_qemu_ld_tl(tmp, addr, index, memop);
> 
> You could swap the order of these so byte comes last and fall through to it from 
> word/long to avoid duplicated line.
> 
> Maybe this answers my question about where it's restriced by CPU type. I wonder if 
> this check for M68K_FEATURE_UNALIGNED_DATA could be avoded here and done by checking 
> it in init and only set the unaligned method for CPUs that need it to not add 
> overhead for most CPUs that don't need it.

I don't think that it matters too much if the method isn't implemented as the logic 
surrounding when to call do_unaligned_access is contained within the TCG core.

I'll have a go at updating the ordering and send a v2 if it looks good.


ATB,

Mark.

Re: [PATCH 2/2] target/m68k: pass alignment into TCG memory load/store routines

Posted by BALATON Zoltan 1 year, 7 months ago

On Mon, 24 Jun 2024, Mark Cave-Ayland wrote:
> On 23/06/2024 16:23, BALATON Zoltan wrote:
>> On Sun, 23 Jun 2024, Mark Cave-Ayland wrote:
>>> Now that do_unaligned_access has been implemented for 68k CPUs, pass the 
>>> required
>>> alignment into the TCG memory load/store routines. This allows the TCG 
>>> memory core
>>> to generate an Address Error exception for unaligned memory accesses if 
>>> required.
>>> 
>>> Suggested-by: Laurent Vivier <laurent@vivier.eu>
>>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2165
>>> ---
>>> target/m68k/translate.c | 18 +++++++++++++++---
>>> 1 file changed, 15 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/target/m68k/translate.c b/target/m68k/translate.c
>>> index 445966fb6a..661a7b4def 100644
>>> --- a/target/m68k/translate.c
>>> +++ b/target/m68k/translate.c
>>> @@ -303,13 +303,18 @@ static inline TCGv gen_load(DisasContext *s, int 
>>> opsize, TCGv addr,
>>>                             int sign, int index)
>>> {
>>>     TCGv tmp = tcg_temp_new_i32();
>>> +    MemOp memop = opsize | (sign ? MO_SIGN : 0) | MO_TE;
>>> 
>>>     switch (opsize) {
>>>     case OS_BYTE:
>>> +        tcg_gen_qemu_ld_tl(tmp, addr, index, memop);
>>> +        break;
>>>     case OS_WORD:
>>>     case OS_LONG:
>>> -        tcg_gen_qemu_ld_tl(tmp, addr, index,
>>> -                           opsize | (sign ? MO_SIGN : 0) | MO_TE);
>>> +        if (!m68k_feature(s->env, M68K_FEATURE_UNALIGNED_DATA)) {
>>> +            memop |= MO_ALIGN_2;
>>> +        }
>>> +        tcg_gen_qemu_ld_tl(tmp, addr, index, memop);
>> 
>> You could swap the order of these so byte comes last and fall through to it 
>> from word/long to avoid duplicated line.
>> 
>> Maybe this answers my question about where it's restriced by CPU type. I 
>> wonder if this check for M68K_FEATURE_UNALIGNED_DATA could be avoded here 
>> and done by checking it in init and only set the unaligned method for CPUs 
>> that need it to not add overhead for most CPUs that don't need it.
>
> I don't think that it matters too much if the method isn't implemented as the 
> logic surrounding when to call do_unaligned_access is contained within the 
> TCG core.

I've seen this after I've sent a patch for PPC where removing a 
conditional from a helper often called for memory access showed it had an 
effect on performance. So I thought adding a conditional here might cause 
some slow down for CPUs that don't need this. But this is compile time so 
maybe it's not that big problem as in a hepler. Yet only adding the 
unaligned method for CPUs then always set MO_ALIGN here avoiding the if 
only calls the method when needed if that works at all. I don't know if 
that's worth it, you could test with some memory intensive benchmark such 
as STREAM benchmark to check if this has any effect.

Regards,
BALATON Zotan

> I'll have a go at updating the ordering and send a v2 if it looks good.
>
>
> ATB,
>
> Mark.
>
>

Re: [PATCH 2/2] target/m68k: pass alignment into TCG memory load/store routines

Posted by Richard Henderson 1 year, 7 months ago

On 6/23/24 08:23, BALATON Zoltan wrote:
> On Sun, 23 Jun 2024, Mark Cave-Ayland wrote:
>> Now that do_unaligned_access has been implemented for 68k CPUs, pass the required
>> alignment into the TCG memory load/store routines. This allows the TCG memory core
>> to generate an Address Error exception for unaligned memory accesses if required.
>>
>> Suggested-by: Laurent Vivier <laurent@vivier.eu>
>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2165
>> ---
>> target/m68k/translate.c | 18 +++++++++++++++---
>> 1 file changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/target/m68k/translate.c b/target/m68k/translate.c
>> index 445966fb6a..661a7b4def 100644
>> --- a/target/m68k/translate.c
>> +++ b/target/m68k/translate.c
>> @@ -303,13 +303,18 @@ static inline TCGv gen_load(DisasContext *s, int opsize, TCGv addr,
>>                             int sign, int index)
>> {
>>     TCGv tmp = tcg_temp_new_i32();
>> +    MemOp memop = opsize | (sign ? MO_SIGN : 0) | MO_TE;
>>
>>     switch (opsize) {
>>     case OS_BYTE:
>> +        tcg_gen_qemu_ld_tl(tmp, addr, index, memop);
>> +        break;
>>     case OS_WORD:
>>     case OS_LONG:
>> -        tcg_gen_qemu_ld_tl(tmp, addr, index,
>> -                           opsize | (sign ? MO_SIGN : 0) | MO_TE);
>> +        if (!m68k_feature(s->env, M68K_FEATURE_UNALIGNED_DATA)) {
>> +            memop |= MO_ALIGN_2;
>> +        }
>> +        tcg_gen_qemu_ld_tl(tmp, addr, index, memop);
> 
> You could swap the order of these so byte comes last and fall through to it from word/long 
> to avoid duplicated line.
> 
> Maybe this answers my question about where it's restriced by CPU type. I wonder if this 
> check for M68K_FEATURE_UNALIGNED_DATA could be avoded here and done by checking it in init 
> and only set the unaligned method for CPUs that need it to not add overhead for most CPUs 
> that don't need it.

No, there's no overhead in having the unaligned method present always.

But swapping the order of case labels, or sinking the tcg_gen_qemu_ld_tl call below the 
switch, is a good idea.


r~