[RFC PATCH 00/23] accel/tcg: Convert victim tlb to IntervalTree

Richard Henderson posted 23 patches 1 month, 2 weeks ago
There is a newer version of this series
include/exec/cpu-all.h        |   3 +
include/exec/exec-all.h       |  57 ----
include/exec/tlb-common.h     |  68 +++-
include/hw/core/cpu.h         |  75 +----
include/hw/core/tcg-cpu-ops.h |  10 -
include/qemu/interval-tree.h  |  11 +
accel/tcg/cputlb.c            | 599 +++++++++++++++++++---------------
util/interval-tree.c          |  20 ++
util/selfmap.c                |  13 +-
9 files changed, 431 insertions(+), 425 deletions(-)
[RFC PATCH 00/23] accel/tcg: Convert victim tlb to IntervalTree
Posted by Richard Henderson 1 month, 2 weeks ago
Based-on: 20241009000453.315652-1-richard.henderson@linaro.org
("[PATCH v3 00/20] accel/tcg: Introduce tlb_fill_align hook")

The initial idea was: how much can we do with an intelligent data
structure for the same cost as a linear search through an array?

This is an initial installment along these lines.  This is about
as far as I can go without first converting all targets to the
new tlb_fill_align hook.  Indeed, the final two patches will not
compile with all targets enabled, but hint at the direction of
the next steps.

I do not expect large perf changes with this patch set.  I will
be happy if performance comes out even.


r~

Richard Henderson (23):
  util/interval-tree: Introduce interval_tree_free_nodes
  accel/tcg: Split out tlbfast_flush_locked
  accel/tcg: Split out tlbfast_{index,entry}
  accel/tcg: Split out tlbfast_flush_range_locked
  accel/tcg: Fix flags usage in mmu_lookup1, atomic_mmu_lookup
  accel/tcg: Early exit for zero length in tlb_flush_range_by_mmuidx*
  accel/tcg: Flush entire tlb when a masked range wraps
  accel/tcg: Add IntervalTreeRoot to CPUTLBDesc
  accel/tcg: Populate IntervalTree in tlb_set_page_full
  accel/tcg: Remove IntervalTree entry in tlb_flush_page_locked
  accel/tcg: Remove IntervalTree entries in tlb_flush_range_locked
  accel/tcg: Process IntervalTree entries in tlb_reset_dirty
  accel/tcg: Process IntervalTree entries in tlb_set_dirty
  accel/tcg: Replace victim_tlb_hit with tlbtree_hit
  accel/tcg: Remove the victim tlb
  include/exec/tlb-common: Move CPUTLBEntryFull from hw/core/cpu.h
  accel/tcg: Delay plugin adjustment in probe_access_internal
  accel/tcg: Call cpu_ld*_code_mmu from cpu_ld*_code
  accel/tcg: Always use IntervalTree for code lookups
  accel/tcg: Link CPUTLBEntry to CPUTLBEntryTree
  accel/tcg: Remove CPUTLBDesc.fulltlb
  accel/tcg: Drop TCGCPUOps.tlb_fill -- NOTYET
  accel/tcg: Unexport tlb_set_page*

 include/exec/cpu-all.h        |   3 +
 include/exec/exec-all.h       |  57 ----
 include/exec/tlb-common.h     |  68 +++-
 include/hw/core/cpu.h         |  75 +----
 include/hw/core/tcg-cpu-ops.h |  10 -
 include/qemu/interval-tree.h  |  11 +
 accel/tcg/cputlb.c            | 599 +++++++++++++++++++---------------
 util/interval-tree.c          |  20 ++
 util/selfmap.c                |  13 +-
 9 files changed, 431 insertions(+), 425 deletions(-)

-- 
2.43.0
Re: [RFC PATCH 00/23] accel/tcg: Convert victim tlb to IntervalTree
Posted by BALATON Zoltan 1 month, 2 weeks ago
On Wed, 9 Oct 2024, Richard Henderson wrote:
> Based-on: 20241009000453.315652-1-richard.henderson@linaro.org
> ("[PATCH v3 00/20] accel/tcg: Introduce tlb_fill_align hook")
>
> The initial idea was: how much can we do with an intelligent data
> structure for the same cost as a linear search through an array?
>
> This is an initial installment along these lines.  This is about
> as far as I can go without first converting all targets to the
> new tlb_fill_align hook.  Indeed, the final two patches will not
> compile with all targets enabled, but hint at the direction of
> the next steps.
>
> I do not expect large perf changes with this patch set.  I will
> be happy if performance comes out even.

Then what's the point? Diffstat below does not show it's less code and if 
it's also not more efficient then what's the reason to change it? I'm not 
opposed to any change just don't see an explanation of what motivates this 
series.

Regards,
BALATON Zoltan

>
> r~
>
> Richard Henderson (23):
>  util/interval-tree: Introduce interval_tree_free_nodes
>  accel/tcg: Split out tlbfast_flush_locked
>  accel/tcg: Split out tlbfast_{index,entry}
>  accel/tcg: Split out tlbfast_flush_range_locked
>  accel/tcg: Fix flags usage in mmu_lookup1, atomic_mmu_lookup
>  accel/tcg: Early exit for zero length in tlb_flush_range_by_mmuidx*
>  accel/tcg: Flush entire tlb when a masked range wraps
>  accel/tcg: Add IntervalTreeRoot to CPUTLBDesc
>  accel/tcg: Populate IntervalTree in tlb_set_page_full
>  accel/tcg: Remove IntervalTree entry in tlb_flush_page_locked
>  accel/tcg: Remove IntervalTree entries in tlb_flush_range_locked
>  accel/tcg: Process IntervalTree entries in tlb_reset_dirty
>  accel/tcg: Process IntervalTree entries in tlb_set_dirty
>  accel/tcg: Replace victim_tlb_hit with tlbtree_hit
>  accel/tcg: Remove the victim tlb
>  include/exec/tlb-common: Move CPUTLBEntryFull from hw/core/cpu.h
>  accel/tcg: Delay plugin adjustment in probe_access_internal
>  accel/tcg: Call cpu_ld*_code_mmu from cpu_ld*_code
>  accel/tcg: Always use IntervalTree for code lookups
>  accel/tcg: Link CPUTLBEntry to CPUTLBEntryTree
>  accel/tcg: Remove CPUTLBDesc.fulltlb
>  accel/tcg: Drop TCGCPUOps.tlb_fill -- NOTYET
>  accel/tcg: Unexport tlb_set_page*
>
> include/exec/cpu-all.h        |   3 +
> include/exec/exec-all.h       |  57 ----
> include/exec/tlb-common.h     |  68 +++-
> include/hw/core/cpu.h         |  75 +----
> include/hw/core/tcg-cpu-ops.h |  10 -
> include/qemu/interval-tree.h  |  11 +
> accel/tcg/cputlb.c            | 599 +++++++++++++++++++---------------
> util/interval-tree.c          |  20 ++
> util/selfmap.c                |  13 +-
> 9 files changed, 431 insertions(+), 425 deletions(-)
>
>
Re: [RFC PATCH 00/23] accel/tcg: Convert victim tlb to IntervalTree
Posted by Richard Henderson 1 month, 2 weeks ago
On 10/9/24 09:27, BALATON Zoltan wrote:
> On Wed, 9 Oct 2024, Richard Henderson wrote:
>> Based-on: 20241009000453.315652-1-richard.henderson@linaro.org
>> ("[PATCH v3 00/20] accel/tcg: Introduce tlb_fill_align hook")
>>
>> The initial idea was: how much can we do with an intelligent data
>> structure for the same cost as a linear search through an array?
>>
>> This is an initial installment along these lines.  This is about
>> as far as I can go without first converting all targets to the
>> new tlb_fill_align hook.  Indeed, the final two patches will not
>> compile with all targets enabled, but hint at the direction of
>> the next steps.
>>
>> I do not expect large perf changes with this patch set.  I will
>> be happy if performance comes out even.
> 
> Then what's the point?

Eventually fixing the page size > TARGET_PAGE_SIZE performance issues.

E.g. with a 16k or 64k aarch64 guest kernel, we still have TARGET_PAGE_SIZE at 4k, so all 
guest pages are "large", and so run into our current behaviour of flushing the entire tlb 
too often.

Even without that, I expect further cleanups to improve performance, we're just not there yet.


r~

Re: [RFC PATCH 00/23] accel/tcg: Convert victim tlb to IntervalTree
Posted by Pierrick Bouvier 1 month, 2 weeks ago
On 10/9/24 10:10, Richard Henderson wrote:
> On 10/9/24 09:27, BALATON Zoltan wrote:
>> On Wed, 9 Oct 2024, Richard Henderson wrote:
>>> Based-on: 20241009000453.315652-1-richard.henderson@linaro.org
>>> ("[PATCH v3 00/20] accel/tcg: Introduce tlb_fill_align hook")
>>>
>>> The initial idea was: how much can we do with an intelligent data
>>> structure for the same cost as a linear search through an array?
>>>
>>> This is an initial installment along these lines.  This is about
>>> as far as I can go without first converting all targets to the
>>> new tlb_fill_align hook.  Indeed, the final two patches will not
>>> compile with all targets enabled, but hint at the direction of
>>> the next steps.
>>>
>>> I do not expect large perf changes with this patch set.  I will
>>> be happy if performance comes out even.
>>
>> Then what's the point?
> 
> Eventually fixing the page size > TARGET_PAGE_SIZE performance issues.
> 
> E.g. with a 16k or 64k aarch64 guest kernel, we still have TARGET_PAGE_SIZE at 4k, so all
> guest pages are "large", and so run into our current behaviour of flushing the entire tlb
> too often.
> 
> Even without that, I expect further cleanups to improve performance, we're just not there yet.
> 
> 
> r~
> 

Does merging pages over a given range be something we could benefit from 
too? In this case, entries in our tlbtree would have varying size, 
allowing us to cover more space with a single entry.

It would allow us to have a more shallow tlbtree (given that it's 
rebalanced when modified) and speed up walking operations.

I'm not sure if it can help performance wise though.
Re: [RFC PATCH 00/23] accel/tcg: Convert victim tlb to IntervalTree
Posted by Richard Henderson 1 month, 1 week ago
On 10/9/24 17:50, Pierrick Bouvier wrote:
>> Eventually fixing the page size > TARGET_PAGE_SIZE performance issues.
>>
>> E.g. with a 16k or 64k aarch64 guest kernel, we still have TARGET_PAGE_SIZE at 4k, so all
>> guest pages are "large", and so run into our current behaviour of flushing the entire tlb
>> too often.
>>
>> Even without that, I expect further cleanups to improve performance, we're just not 
>> there yet.
>>
>>
>> r~
>>
> 
> Does merging pages over a given range be something we could benefit from too? In this 
> case, entries in our tlbtree would have varying size, allowing us to cover more space with 
> a single entry.

I don't know.  I kinda doubt it, because of tlb flushing mechanics.
But we shall see once everything up until that point is done.


r~