[PULL 08/41] target/arm: Set CTR_EL0.{IDC,DIC} for the 'max' CPU

Maintainers: Alexandre Iooss <erdnaxe@crans.org>, Peter Maydell <peter.maydell@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, Arnaud Minier <arnaud.minier@telecom-paris.fr>, "Inès Varhol" <ines.varhol@telecom-paris.fr>, Jean-Christophe Dubois <jcd@tribudubois.net>, Subbaraya Sundeep <sundeep.lkml@gmail.com>, Alistair Francis <alistair@alistair23.me>, Felipe Balbi <balbi@kernel.org>
There is a newer version of this series
[PULL 08/41] target/arm: Set CTR_EL0.{IDC,DIC} for the 'max' CPU
Posted by Peter Maydell 10 months ago
The CTR_EL0 register has some bits which allow the implementation to
tell the guest that it does not need to do cache maintenance for
data-to-instruction coherence and instruction-to-data coherence.
QEMU doesn't emulate caches and so our cache maintenance insns are
all NOPs.

We already have some models of specific CPUs where we set these bits
(e.g.  the Neoverse V1), but the 'max' CPU still uses the settings it
inherits from Cortex-A57.  Set the bits for 'max' as well, so the
guest doesn't need to do unnecessary work.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
---
 target/arm/tcg/cpu64.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index fcda99e1583..40e7a45166f 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -1105,6 +1105,16 @@ void aarch64_max_tcg_initfn(Object *obj)
     u = FIELD_DP32(u, CLIDR_EL1, LOUU, 0);
     cpu->clidr = u;
 
+    /*
+     * Set CTR_EL0.DIC and IDC to tell the guest it doesnt' need to
+     * do any cache maintenance for data-to-instruction or
+     * instruction-to-guest coherence. (Our cache ops are nops.)
+     */
+    t = cpu->ctr;
+    t = FIELD_DP64(t, CTR_EL0, IDC, 1);
+    t = FIELD_DP64(t, CTR_EL0, DIC, 1);
+    cpu->ctr = t;
+
     t = cpu->isar.id_aa64isar0;
     t = FIELD_DP64(t, ID_AA64ISAR0, AES, 2);      /* FEAT_PMULL */
     t = FIELD_DP64(t, ID_AA64ISAR0, SHA1, 1);     /* FEAT_SHA1 */
-- 
2.34.1
Re: [PULL 08/41] target/arm: Set CTR_EL0.{IDC,DIC} for the 'max' CPU
Posted by Fabiano Rosas 10 months ago
Peter Maydell <peter.maydell@linaro.org> writes:

> The CTR_EL0 register has some bits which allow the implementation to
> tell the guest that it does not need to do cache maintenance for
> data-to-instruction coherence and instruction-to-data coherence.
> QEMU doesn't emulate caches and so our cache maintenance insns are
> all NOPs.
>
> We already have some models of specific CPUs where we set these bits
> (e.g.  the Neoverse V1), but the 'max' CPU still uses the settings it
> inherits from Cortex-A57.  Set the bits for 'max' as well, so the
> guest doesn't need to do unnecessary work.
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Tested-by: Miguel Luis <miguel.luis@oracle.com>
> ---
>  target/arm/tcg/cpu64.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
> index fcda99e1583..40e7a45166f 100644
> --- a/target/arm/tcg/cpu64.c
> +++ b/target/arm/tcg/cpu64.c
> @@ -1105,6 +1105,16 @@ void aarch64_max_tcg_initfn(Object *obj)
>      u = FIELD_DP32(u, CLIDR_EL1, LOUU, 0);
>      cpu->clidr = u;
>  
> +    /*
> +     * Set CTR_EL0.DIC and IDC to tell the guest it doesnt' need to
> +     * do any cache maintenance for data-to-instruction or
> +     * instruction-to-guest coherence. (Our cache ops are nops.)
> +     */
> +    t = cpu->ctr;
> +    t = FIELD_DP64(t, CTR_EL0, IDC, 1);
> +    t = FIELD_DP64(t, CTR_EL0, DIC, 1);
> +    cpu->ctr = t;
> +
>      t = cpu->isar.id_aa64isar0;
>      t = FIELD_DP64(t, ID_AA64ISAR0, AES, 2);      /* FEAT_PMULL */
>      t = FIELD_DP64(t, ID_AA64ISAR0, SHA1, 1);     /* FEAT_SHA1 */

Hi, we're introducing new regression tests to migration and this patch
shows up in the bisect of an issue. I need some help figuring out
whether this is an actual regression or something else.

The migration is TCG QEMU 8.2.0 -> TCG QEMU master.

On the destination side (contains this patch) we're hitting this
condition:

bool write_list_to_cpustate(ARMCPU *cpu)
{
...
        /*
         * Write value and confirm it reads back as written
         * (to catch read-only registers and partially read-only
         * registers where the incoming migration value doesn't match)
         */
        write_raw_cp_reg(&cpu->env, ri, v);
        if (read_raw_cp_reg(&cpu->env, ri) != v) {
--->        ok = false;
        }

Thread 1 "qemu-system-aar" hit Breakpoint 3, write_list_to_cpustate (cpu=0x555557a2f8f0) at ../target/arm/helper.c:190
190                 ok = false;
(gdb) p ri->name
$7 = 0x555557ab9ae0 "CTR"
(gdb) p/x v
$3 = 0x8444c004
(gdb) p/x read_raw_cp_reg(&cpu->env, ri)
$4 = 0xb444c004

Is there any particularity in reading/writing to that register? This is
during post_load and 'v' is what came in the migration stream.
Re: [PULL 08/41] target/arm: Set CTR_EL0.{IDC,DIC} for the 'max' CPU
Posted by Peter Maydell 9 months, 4 weeks ago
On Wed, 17 Jan 2024 at 21:24, Fabiano Rosas <farosas@suse.de> wrote:
> Peter Maydell <peter.maydell@linaro.org> writes:
> > diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
> > index fcda99e1583..40e7a45166f 100644
> > --- a/target/arm/tcg/cpu64.c
> > +++ b/target/arm/tcg/cpu64.c
> > @@ -1105,6 +1105,16 @@ void aarch64_max_tcg_initfn(Object *obj)
> >      u = FIELD_DP32(u, CLIDR_EL1, LOUU, 0);
> >      cpu->clidr = u;
> >
> > +    /*
> > +     * Set CTR_EL0.DIC and IDC to tell the guest it doesnt' need to
> > +     * do any cache maintenance for data-to-instruction or
> > +     * instruction-to-guest coherence. (Our cache ops are nops.)
> > +     */
> > +    t = cpu->ctr;
> > +    t = FIELD_DP64(t, CTR_EL0, IDC, 1);
> > +    t = FIELD_DP64(t, CTR_EL0, DIC, 1);
> > +    cpu->ctr = t;
> > +
> >      t = cpu->isar.id_aa64isar0;
> >      t = FIELD_DP64(t, ID_AA64ISAR0, AES, 2);      /* FEAT_PMULL */
> >      t = FIELD_DP64(t, ID_AA64ISAR0, SHA1, 1);     /* FEAT_SHA1 */
>
> Hi, we're introducing new regression tests to migration and this patch
> shows up in the bisect of an issue. I need some help figuring out
> whether this is an actual regression or something else.
>
> The migration is TCG QEMU 8.2.0 -> TCG QEMU master.
>
> On the destination side (contains this patch) we're hitting this
> condition:
>
> bool write_list_to_cpustate(ARMCPU *cpu)
> {
> ...
>         /*
>          * Write value and confirm it reads back as written
>          * (to catch read-only registers and partially read-only
>          * registers where the incoming migration value doesn't match)
>          */
>         write_raw_cp_reg(&cpu->env, ri, v);
>         if (read_raw_cp_reg(&cpu->env, ri) != v) {
> --->        ok = false;
>         }

This is (among other things) effectively checking that the
source and destination CPU agree about the values of constant
registers like the ID registers, of which this is one.

The "max" CPU is a moving target, so you shouldn't expect
to be able to migrate across QEMU versions using it:
it can have different features and thus different ID
register values between versions (as well as potentially
different actual-register-state if the added feature adds
new state).

thanks
-- PMM
Re: [PULL 08/41] target/arm: Set CTR_EL0.{IDC,DIC} for the 'max' CPU
Posted by Fabiano Rosas 9 months, 4 weeks ago
Peter Maydell <peter.maydell@linaro.org> writes:

> On Wed, 17 Jan 2024 at 21:24, Fabiano Rosas <farosas@suse.de> wrote:
>> Peter Maydell <peter.maydell@linaro.org> writes:
>> > diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
>> > index fcda99e1583..40e7a45166f 100644
>> > --- a/target/arm/tcg/cpu64.c
>> > +++ b/target/arm/tcg/cpu64.c
>> > @@ -1105,6 +1105,16 @@ void aarch64_max_tcg_initfn(Object *obj)
>> >      u = FIELD_DP32(u, CLIDR_EL1, LOUU, 0);
>> >      cpu->clidr = u;
>> >
>> > +    /*
>> > +     * Set CTR_EL0.DIC and IDC to tell the guest it doesnt' need to
>> > +     * do any cache maintenance for data-to-instruction or
>> > +     * instruction-to-guest coherence. (Our cache ops are nops.)
>> > +     */
>> > +    t = cpu->ctr;
>> > +    t = FIELD_DP64(t, CTR_EL0, IDC, 1);
>> > +    t = FIELD_DP64(t, CTR_EL0, DIC, 1);
>> > +    cpu->ctr = t;
>> > +
>> >      t = cpu->isar.id_aa64isar0;
>> >      t = FIELD_DP64(t, ID_AA64ISAR0, AES, 2);      /* FEAT_PMULL */
>> >      t = FIELD_DP64(t, ID_AA64ISAR0, SHA1, 1);     /* FEAT_SHA1 */
>>
>> Hi, we're introducing new regression tests to migration and this patch
>> shows up in the bisect of an issue. I need some help figuring out
>> whether this is an actual regression or something else.
>>
>> The migration is TCG QEMU 8.2.0 -> TCG QEMU master.
>>
>> On the destination side (contains this patch) we're hitting this
>> condition:
>>
>> bool write_list_to_cpustate(ARMCPU *cpu)
>> {
>> ...
>>         /*
>>          * Write value and confirm it reads back as written
>>          * (to catch read-only registers and partially read-only
>>          * registers where the incoming migration value doesn't match)
>>          */
>>         write_raw_cp_reg(&cpu->env, ri, v);
>>         if (read_raw_cp_reg(&cpu->env, ri) != v) {
>> --->        ok = false;
>>         }
>
> This is (among other things) effectively checking that the
> source and destination CPU agree about the values of constant
> registers like the ID registers, of which this is one.
>
> The "max" CPU is a moving target, so you shouldn't expect
> to be able to migrate across QEMU versions using it:
> it can have different features and thus different ID
> register values between versions (as well as potentially
> different actual-register-state if the added feature adds
> new state).

We're adding a test for migrating across QEMU versions to stop these
sort of breakages happening. If 'max' is not suitable for that scenario,
would there be another cpu you recommend to use? Otherwise aarch64 will
have to remain uncovered for migration across QEMU versions.

There's no clear statement on what kinds of combinations we support for
migration, but anything that's actually tested is more likely to be
considered supported.

Note that we're only targeting 'n-1 -> n' and 'n -> n-1' migrations,
where n-1 is the latest release and n is the development branch.
Re: [PULL 08/41] target/arm: Set CTR_EL0.{IDC,DIC} for the 'max' CPU
Posted by Peter Maydell 9 months, 4 weeks ago
On Thu, 18 Jan 2024 at 12:58, Fabiano Rosas <farosas@suse.de> wrote:
>
> Peter Maydell <peter.maydell@linaro.org> writes:
>
> > On Wed, 17 Jan 2024 at 21:24, Fabiano Rosas <farosas@suse.de> wrote:
> >> Peter Maydell <peter.maydell@linaro.org> writes:
> > The "max" CPU is a moving target, so you shouldn't expect
> > to be able to migrate across QEMU versions using it:
> > it can have different features and thus different ID
> > register values between versions (as well as potentially
> > different actual-register-state if the added feature adds
> > new state).
>
> We're adding a test for migrating across QEMU versions to stop these
> sort of breakages happening. If 'max' is not suitable for that scenario,
> would there be another cpu you recommend to use? Otherwise aarch64 will
> have to remain uncovered for migration across QEMU versions.
>
> There's no clear statement on what kinds of combinations we support for
> migration, but anything that's actually tested is more likely to be
> considered supported.

Any fixed defined CPU is a reasonable choice. I suggest neoverse-n1.

I don't think 'max' is expected to migrate cross-version on
any CPU architecture; this isn't Arm-specific.

thanks
-- PMM