[Qemu-devel] [PATCH] accel/tcg: Consider cluster index in tb_lookup__cpu_state()

Peter Maydell posted 1 patch 5 years, 1 month ago
Test docker-mingw@fedora passed
Test asan failed
Test checkpatch passed
Test docker-clang@ubuntu passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20190205151810.571-1-peter.maydell@linaro.org
Maintainers: Richard Henderson <rth@twiddle.net>, Paolo Bonzini <pbonzini@redhat.com>, Peter Crosthwaite <crosthwaite.peter@gmail.com>
include/exec/tb-lookup.h | 4 ++++
accel/tcg/cpu-exec.c     | 3 ---
2 files changed, 4 insertions(+), 3 deletions(-)
[Qemu-devel] [PATCH] accel/tcg: Consider cluster index in tb_lookup__cpu_state()
Posted by Peter Maydell 5 years, 1 month ago
In commit f7b78602fdc6c6e4be we added the CPU cluster number to the
cflags field of the TB hash; this included adding it to the value
kept in tb->cflags, since we pass that field directly into the hash
calculation in some places. Unfortunately we forgot to check whether
other parts of the code were doing comparisons against tb->cflags
that would need to be updated.

It turns out that there is exactly one such place: the
tb_lookup__cpu_state() function checks whether the TB it has
found in the tb_jmp_cache has a tb->cflags matching the cf_mask
that is passed in. The tb->cflags has the cluster_index in it
but the cf_mask does not.

Hoist the "add cluster index to the cf_mask" code up from
tb_htable_lookup() to tb_lookup__cpu_state() so it can be considered
in the "did this TB match in the jmp cache" condition, as well as
when we do the full hash lookup by physical PC, flags, etc.
(tb_htable_lookup() is only called from tb_lookup__cpu_state(),
so this change doesn't require any further knock-on changes.)

Fixes: f7b78602fdc6c6e4be ("accel/tcg: Add cluster number to TCG TB hash")
Reported-by: Howard Spoelstra <hsp.cat7@gmail.com>
Reported-by: Cleber Rosa <crosa@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
Does anybody know why tb_lookup__cpu_state() has that odd
double-underscore in the middle of its name?

Since the jmp_cache is per-vcpu we know that we're always going
to match on the cluster_index, so the other option would be to
leave the cluster_index bits out of the comparison, and leave the
"fold in cluster index to cf_mask" code in tb_htable_lookup().
Or we could require the callers of tb_lookup__cpu_state() to all
provide the cluster index, but that's more places to change,
so I prefer this.
---
 include/exec/tb-lookup.h | 4 ++++
 accel/tcg/cpu-exec.c     | 3 ---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/exec/tb-lookup.h b/include/exec/tb-lookup.h
index 492cb682894..26921b6dafd 100644
--- a/include/exec/tb-lookup.h
+++ b/include/exec/tb-lookup.h
@@ -28,6 +28,10 @@ tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_base,
     cpu_get_tb_cpu_state(env, pc, cs_base, flags);
     hash = tb_jmp_cache_hash_func(*pc);
     tb = atomic_rcu_read(&cpu->tb_jmp_cache[hash]);
+
+    cf_mask &= ~CF_CLUSTER_MASK;
+    cf_mask |= cpu->cluster_index << CF_CLUSTER_SHIFT;
+
     if (likely(tb &&
                tb->pc == *pc &&
                tb->cs_base == *cs_base &&
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 7cf1292546f..60d87d5a19b 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -325,9 +325,6 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
     struct tb_desc desc;
     uint32_t h;
 
-    cf_mask &= ~CF_CLUSTER_MASK;
-    cf_mask |= cpu->cluster_index << CF_CLUSTER_SHIFT;
-
     desc.env = (CPUArchState *)cpu->env_ptr;
     desc.cs_base = cs_base;
     desc.flags = flags;
-- 
2.20.1


Re: [Qemu-devel] [PATCH] accel/tcg: Consider cluster index in tb_lookup__cpu_state()
Posted by Cleber Rosa 5 years, 1 month ago

On 2/5/19 10:18 AM, Peter Maydell wrote:
> In commit f7b78602fdc6c6e4be we added the CPU cluster number to the
> cflags field of the TB hash; this included adding it to the value
> kept in tb->cflags, since we pass that field directly into the hash
> calculation in some places. Unfortunately we forgot to check whether
> other parts of the code were doing comparisons against tb->cflags
> that would need to be updated.
> 
> It turns out that there is exactly one such place: the
> tb_lookup__cpu_state() function checks whether the TB it has
> found in the tb_jmp_cache has a tb->cflags matching the cf_mask
> that is passed in. The tb->cflags has the cluster_index in it
> but the cf_mask does not.
> 
> Hoist the "add cluster index to the cf_mask" code up from
> tb_htable_lookup() to tb_lookup__cpu_state() so it can be considered
> in the "did this TB match in the jmp cache" condition, as well as
> when we do the full hash lookup by physical PC, flags, etc.
> (tb_htable_lookup() is only called from tb_lookup__cpu_state(),
> so this change doesn't require any further knock-on changes.)
> 
> Fixes: f7b78602fdc6c6e4be ("accel/tcg: Add cluster number to TCG TB hash")
> Reported-by: Howard Spoelstra <hsp.cat7@gmail.com>
> Reported-by: Cleber Rosa <crosa@redhat.com>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> Does anybody know why tb_lookup__cpu_state() has that odd
> double-underscore in the middle of its name?
> 
> Since the jmp_cache is per-vcpu we know that we're always going
> to match on the cluster_index, so the other option would be to
> leave the cluster_index bits out of the comparison, and leave the
> "fold in cluster index to cf_mask" code in tb_htable_lookup().
> Or we could require the callers of tb_lookup__cpu_state() to all
> provide the cluster index, but that's more places to change,
> so I prefer this.

I can confirm the performance regression I experienced is fixed by this
patch.

Tested-by: Cleber Rosa <crosa@redhat.com>

Re: [Qemu-devel] [PATCH] accel/tcg: Consider cluster index in tb_lookup__cpu_state()
Posted by Mark Cave-Ayland 5 years, 1 month ago
On 05/02/2019 15:18, Peter Maydell wrote:

> In commit f7b78602fdc6c6e4be we added the CPU cluster number to the
> cflags field of the TB hash; this included adding it to the value
> kept in tb->cflags, since we pass that field directly into the hash
> calculation in some places. Unfortunately we forgot to check whether
> other parts of the code were doing comparisons against tb->cflags
> that would need to be updated.
> 
> It turns out that there is exactly one such place: the
> tb_lookup__cpu_state() function checks whether the TB it has
> found in the tb_jmp_cache has a tb->cflags matching the cf_mask
> that is passed in. The tb->cflags has the cluster_index in it
> but the cf_mask does not.
> 
> Hoist the "add cluster index to the cf_mask" code up from
> tb_htable_lookup() to tb_lookup__cpu_state() so it can be considered
> in the "did this TB match in the jmp cache" condition, as well as
> when we do the full hash lookup by physical PC, flags, etc.
> (tb_htable_lookup() is only called from tb_lookup__cpu_state(),
> so this change doesn't require any further knock-on changes.)
> 
> Fixes: f7b78602fdc6c6e4be ("accel/tcg: Add cluster number to TCG TB hash")
> Reported-by: Howard Spoelstra <hsp.cat7@gmail.com>
> Reported-by: Cleber Rosa <crosa@redhat.com>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> Does anybody know why tb_lookup__cpu_state() has that odd
> double-underscore in the middle of its name?
> 
> Since the jmp_cache is per-vcpu we know that we're always going
> to match on the cluster_index, so the other option would be to
> leave the cluster_index bits out of the comparison, and leave the
> "fold in cluster index to cf_mask" code in tb_htable_lookup().
> Or we could require the callers of tb_lookup__cpu_state() to all
> provide the cluster index, but that's more places to change,
> so I prefer this.
> ---
>  include/exec/tb-lookup.h | 4 ++++
>  accel/tcg/cpu-exec.c     | 3 ---
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/include/exec/tb-lookup.h b/include/exec/tb-lookup.h
> index 492cb682894..26921b6dafd 100644
> --- a/include/exec/tb-lookup.h
> +++ b/include/exec/tb-lookup.h
> @@ -28,6 +28,10 @@ tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_base,
>      cpu_get_tb_cpu_state(env, pc, cs_base, flags);
>      hash = tb_jmp_cache_hash_func(*pc);
>      tb = atomic_rcu_read(&cpu->tb_jmp_cache[hash]);
> +
> +    cf_mask &= ~CF_CLUSTER_MASK;
> +    cf_mask |= cpu->cluster_index << CF_CLUSTER_SHIFT;
> +
>      if (likely(tb &&
>                 tb->pc == *pc &&
>                 tb->cs_base == *cs_base &&
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 7cf1292546f..60d87d5a19b 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -325,9 +325,6 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
>      struct tb_desc desc;
>      uint32_t h;
>  
> -    cf_mask &= ~CF_CLUSTER_MASK;
> -    cf_mask |= cpu->cluster_index << CF_CLUSTER_SHIFT;
> -
>      desc.env = (CPUArchState *)cpu->env_ptr;
>      desc.cs_base = cs_base;
>      desc.flags = flags;
> 

Looks good to me: without performing a detailed benchmark, with this patch applied
the performance seems to be back to where it was before f7b78602fdc6c6e4be was merged.

Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>


ATB,

Mark.

Re: [Qemu-devel] [PATCH] accel/tcg: Consider cluster index in tb_lookup__cpu_state()
Posted by Howard Spoelstra 5 years, 1 month ago
On Tue, Feb 5, 2019 at 6:09 PM Mark Cave-Ayland <
mark.cave-ayland@ilande.co.uk> wrote:

> On 05/02/2019 15:18, Peter Maydell wrote:
>
> > In commit f7b78602fdc6c6e4be we added the CPU cluster number to the
> > cflags field of the TB hash; this included adding it to the value
> > kept in tb->cflags, since we pass that field directly into the hash
> > calculation in some places. Unfortunately we forgot to check whether
> > other parts of the code were doing comparisons against tb->cflags
> > that would need to be updated.
> >
> > It turns out that there is exactly one such place: the
> > tb_lookup__cpu_state() function checks whether the TB it has
> > found in the tb_jmp_cache has a tb->cflags matching the cf_mask
> > that is passed in. The tb->cflags has the cluster_index in it
> > but the cf_mask does not.
> >
> > Hoist the "add cluster index to the cf_mask" code up from
> > tb_htable_lookup() to tb_lookup__cpu_state() so it can be considered
> > in the "did this TB match in the jmp cache" condition, as well as
> > when we do the full hash lookup by physical PC, flags, etc.
> > (tb_htable_lookup() is only called from tb_lookup__cpu_state(),
> > so this change doesn't require any further knock-on changes.)
> >
> > Fixes: f7b78602fdc6c6e4be ("accel/tcg: Add cluster number to TCG TB
> hash")
> > Reported-by: Howard Spoelstra <hsp.cat7@gmail.com>
> > Reported-by: Cleber Rosa <crosa@redhat.com>
> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > ---
> > Does anybody know why tb_lookup__cpu_state() has that odd
> > double-underscore in the middle of its name?
> >
> > Since the jmp_cache is per-vcpu we know that we're always going
> > to match on the cluster_index, so the other option would be to
> > leave the cluster_index bits out of the comparison, and leave the
> > "fold in cluster index to cf_mask" code in tb_htable_lookup().
> > Or we could require the callers of tb_lookup__cpu_state() to all
> > provide the cluster index, but that's more places to change,
> > so I prefer this.
> > ---
> >  include/exec/tb-lookup.h | 4 ++++
> >  accel/tcg/cpu-exec.c     | 3 ---
> >  2 files changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/exec/tb-lookup.h b/include/exec/tb-lookup.h
> > index 492cb682894..26921b6dafd 100644
> > --- a/include/exec/tb-lookup.h
> > +++ b/include/exec/tb-lookup.h
> > @@ -28,6 +28,10 @@ tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc,
> target_ulong *cs_base,
> >      cpu_get_tb_cpu_state(env, pc, cs_base, flags);
> >      hash = tb_jmp_cache_hash_func(*pc);
> >      tb = atomic_rcu_read(&cpu->tb_jmp_cache[hash]);
> > +
> > +    cf_mask &= ~CF_CLUSTER_MASK;
> > +    cf_mask |= cpu->cluster_index << CF_CLUSTER_SHIFT;
> > +
> >      if (likely(tb &&
> >                 tb->pc == *pc &&
> >                 tb->cs_base == *cs_base &&
> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > index 7cf1292546f..60d87d5a19b 100644
> > --- a/accel/tcg/cpu-exec.c
> > +++ b/accel/tcg/cpu-exec.c
> > @@ -325,9 +325,6 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu,
> target_ulong pc,
> >      struct tb_desc desc;
> >      uint32_t h;
> >
> > -    cf_mask &= ~CF_CLUSTER_MASK;
> > -    cf_mask |= cpu->cluster_index << CF_CLUSTER_SHIFT;
> > -
> >      desc.env = (CPUArchState *)cpu->env_ptr;
> >      desc.cs_base = cs_base;
> >      desc.flags = flags;
> >
>
>
Confirmed, both Mac OS 9.2 and OS X 10.4 running with qemu-system-ppc are
back to their old performance levels.

Best, and thanks,
Howard
Re: [Qemu-devel] [PATCH] accel/tcg: Consider cluster index in tb_lookup__cpu_state()
Posted by Richard Henderson 5 years, 1 month ago
On 2/5/19 3:18 PM, Peter Maydell wrote:
> In commit f7b78602fdc6c6e4be we added the CPU cluster number to the
> cflags field of the TB hash; this included adding it to the value
> kept in tb->cflags, since we pass that field directly into the hash
> calculation in some places. Unfortunately we forgot to check whether
> other parts of the code were doing comparisons against tb->cflags
> that would need to be updated.
> 
> It turns out that there is exactly one such place: the
> tb_lookup__cpu_state() function checks whether the TB it has
> found in the tb_jmp_cache has a tb->cflags matching the cf_mask
> that is passed in. The tb->cflags has the cluster_index in it
> but the cf_mask does not.
> 
> Hoist the "add cluster index to the cf_mask" code up from
> tb_htable_lookup() to tb_lookup__cpu_state() so it can be considered
> in the "did this TB match in the jmp cache" condition, as well as
> when we do the full hash lookup by physical PC, flags, etc.
> (tb_htable_lookup() is only called from tb_lookup__cpu_state(),
> so this change doesn't require any further knock-on changes.)
> 
> Fixes: f7b78602fdc6c6e4be ("accel/tcg: Add cluster number to TCG TB hash")
> Reported-by: Howard Spoelstra <hsp.cat7@gmail.com>
> Reported-by: Cleber Rosa <crosa@redhat.com>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

> Does anybody know why tb_lookup__cpu_state() has that odd
> double-underscore in the middle of its name?

I'm inclined to think typo...  Emilio?


r~

Re: [Qemu-devel] [PATCH] accel/tcg: Consider cluster index in tb_lookup__cpu_state()
Posted by Emilio G. Cota 5 years, 1 month ago
On Wed, Feb 06, 2019 at 03:15:26 +0000, Richard Henderson wrote:
> > Does anybody know why tb_lookup__cpu_state() has that odd
> > double-underscore in the middle of its name?
> 
> I'm inclined to think typo...  Emilio?

It's not a typo -- it's there to separate "tb lookup" and
"cpu state" to avoid ambiguity when guessing what
the function does based on its name.

Other projects, such as the kernel, make extensive use of
double underscores for this purpose. In fact, I believe
I picked up the habit from reading kernel code.

		E.

Re: [Qemu-devel] [PATCH] accel/tcg: Consider cluster index in tb_lookup__cpu_state()
Posted by Peter Maydell 5 years, 1 month ago
On Wed, 6 Feb 2019 at 15:55, Emilio G. Cota <cota@braap.org> wrote:
>
> On Wed, Feb 06, 2019 at 03:15:26 +0000, Richard Henderson wrote:
> > > Does anybody know why tb_lookup__cpu_state() has that odd
> > > double-underscore in the middle of its name?
> >
> > I'm inclined to think typo...  Emilio?
>
> It's not a typo -- it's there to separate "tb lookup" and
> "cpu state" to avoid ambiguity when guessing what
> the function does based on its name.

OK, I wouldn't have guessed that -- my assumption was
"accidentally removed a name component with sed" or
"just a typo", and I didn't assign any semantic meaning
to the double underscore...

thanks
-- PMM

Re: [Qemu-devel] [PATCH] accel/tcg: Consider cluster index in tb_lookup__cpu_state()
Posted by Richard Henderson 5 years, 1 month ago
On 2/5/19 3:18 PM, Peter Maydell wrote:
> In commit f7b78602fdc6c6e4be we added the CPU cluster number to the
> cflags field of the TB hash; this included adding it to the value
> kept in tb->cflags, since we pass that field directly into the hash
> calculation in some places. Unfortunately we forgot to check whether
> other parts of the code were doing comparisons against tb->cflags
> that would need to be updated.
> 
> It turns out that there is exactly one such place: the
> tb_lookup__cpu_state() function checks whether the TB it has
> found in the tb_jmp_cache has a tb->cflags matching the cf_mask
> that is passed in. The tb->cflags has the cluster_index in it
> but the cf_mask does not.
> 
> Hoist the "add cluster index to the cf_mask" code up from
> tb_htable_lookup() to tb_lookup__cpu_state() so it can be considered
> in the "did this TB match in the jmp cache" condition, as well as
> when we do the full hash lookup by physical PC, flags, etc.
> (tb_htable_lookup() is only called from tb_lookup__cpu_state(),
> so this change doesn't require any further knock-on changes.)
> 
> Fixes: f7b78602fdc6c6e4be ("accel/tcg: Add cluster number to TCG TB hash")
> Reported-by: Howard Spoelstra <hsp.cat7@gmail.com>
> Reported-by: Cleber Rosa <crosa@redhat.com>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---

Queued, thanks.


r~