[PATCH v7 RESEND 0/5] i386: Support SMP Cache Topology

Zhao Liu posted 5 patches 2 months, 3 weeks ago
hw/core/machine-smp.c |  9 ++++++
hw/i386/pc.c          |  4 +++
include/hw/boards.h   |  3 ++
qemu-options.hx       | 30 +++++++++++++++++-
target/i386/cpu.c     | 71 ++++++++++++++++++++++++++++++++++++++++++-
5 files changed, 115 insertions(+), 2 deletions(-)
[PATCH v7 RESEND 0/5] i386: Support SMP Cache Topology
Posted by Zhao Liu 2 months, 3 weeks ago
Hi folks,

This is a resend of v7, with the commit message of the original v7's
Patch 1 updated.

Compared with v6 [1], v7 dropped the "thread" level cache topology
(cache per thread):

 - Patch 1 is the new patch to reject "thread" parameter for smp-cache.
 - Patch 2 dropped cache per thread support.
 (Others remain unchanged.)

There are two reasons:

 * Currently, neither i386 nor ARM has real hardware support for
   per-thread cache.
 * ARM can't support thread-level cache in the device tree [2].

So it is unnecessary to support it at this moment, even though per-
thread cache might have potential scheduling benefits for VMs without
CPU affinity.

In the future, if there is a clear demand for this feature, the correct
approach would be to add a new control field in MachineClass.smp_props
and enable it only for the machines that require it.


This series is based on the master branch at commit aa3a285b5bc5 ("Merge
tag 'mem-2024-12-21' of https://github.com/davidhildenbrand/qemu into
staging").

The ARM-side smp-cache support can be found at [3].


Background
==========

x86 and ARM (RISCV) need to allow users to configure cache properties
(currently only topology):
 * For x86, the default cache topology model (of max/host CPU) does not
   always match the Host's real physical cache topology. Performance can
   increase when the configured virtual topology is closer to the
   physical topology than a default topology would be.
 * For ARM, QEMU can't get the cache topology information from the CPU
   registers, so user configuration is necessary. Additionally, the
   cache information is also needed for MPAM emulation (for TCG) to
   build the right PPTT. (Originally from Jonathan)
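
As a rough illustration of the x86 point above (not part of this
series), the host's physical cache topology can be inspected on Linux
through sysfs before choosing a guest configuration. The sketch assumes
a Linux host that exposes /sys/devices/system/cpu/cpuN/cache/indexM/;
the function name read_host_cache_topology and its sysfs_root parameter
are hypothetical and only exist for this example.

```python
from pathlib import Path


def read_host_cache_topology(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return [(level, type, shared_cpu_list), ...] for one host CPU.

    Hypothetical helper: reads the per-cache sysfs attributes that Linux
    exposes for each CPU. Returns an empty list if nothing is exposed.
    """
    cache_dir = Path(sysfs_root) / f"cpu{cpu}" / "cache"
    entries = []
    for index_dir in sorted(cache_dir.glob("index[0-9]*")):
        level = int((index_dir / "level").read_text().strip())
        cache_type = (index_dir / "type").read_text().strip()
        # shared_cpu_list names the host CPUs sharing this cache instance.
        shared = (index_dir / "shared_cpu_list").read_text().strip()
        entries.append((level, cache_type, shared))
    return entries


if __name__ == "__main__":
    for level, cache_type, shared in read_host_cache_topology():
        print(f"L{level} {cache_type}: shared by CPUs {shared}")
```

Comparing the shared CPU lists with the guest's vCPU pinning gives a
hint about which topology level each cache should be shared at.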


About smp-cache
===============

The API design has been discussed heavily in [4].

Now, smp-cache is implemented as an array integrated in -machine. Though
-machine currently can't support JSON format, this is one of the
future directions.

An example is as follows:

smp_cache=smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=module,smp-cache.3.cache=l3,smp-cache.3.topology=die

"cache" specifies the cache that the properties will be applied on. This
field is the combination of cache level and cache type. It currently supports
"l1d" (L1 data cache), "l1i" (L1 instruction cache), "l2" (L2 unified
cache) and "l3" (L3 unified cache).

"topology" field accepts CPU topology levels including "core", "module",
"cluster", "die", "socket", "book", "drawer" and a special value
"default". (Note, now, in v7, smp-cache doesn't support "thread".)

The "default" is introduced to make it easier for libvirt to set a
default parameter value without having to care about the specific
machine (because currently there is no proper way for a machine to
expose supported topology levels and caches).
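
For management tools, the property list can be generated rather than
written by hand. The following is a minimal sketch, not code from this
series: the helper name build_smp_cache_props is hypothetical, and the
accepted values are simply the ones listed above (so "thread" is
rejected, matching v7).

```python
# Values taken from the description above; "thread" is deliberately absent.
CACHES = ("l1d", "l1i", "l2", "l3")
TOPOLOGY_LEVELS = ("core", "module", "cluster", "die",
                   "socket", "book", "drawer", "default")


def build_smp_cache_props(config):
    """Return the comma-separated smp-cache.N.* property list.

    `config` maps a cache name to the topology level it is shared at.
    Array indices follow the insertion order of the mapping.
    """
    parts = []
    for index, (cache, topology) in enumerate(config.items()):
        if cache not in CACHES:
            raise ValueError(f"unsupported cache: {cache!r}")
        if topology not in TOPOLOGY_LEVELS:
            raise ValueError(f"unsupported topology level: {topology!r}")
        parts.append(f"smp-cache.{index}.cache={cache}")
        parts.append(f"smp-cache.{index}.topology={topology}")
    return ",".join(parts)


if __name__ == "__main__":
    print(build_smp_cache_props(
        {"l1i": "core", "l1d": "core", "l2": "module", "l3": "die"}))
```

With the mapping shown in the main block, the helper yields the same
property list as the example above.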

If "default" is set, then the cache topology will follow the
architecture's default cache topology model. If another CPU topology
level is set, the cache will be shared at that CPU topology level.
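
To make "shared at a CPU topology level" concrete, here is a small
sketch that counts how many logical CPUs share one cache instance. It
is an illustration under assumptions, not the code in target/i386/cpu.c:
the hierarchy is a simplified x86-style one (thread, core, module, die,
socket), and the function name threads_sharing_cache and the counts
dictionary are hypothetical.

```python
from math import prod

# Simplified x86-style hierarchy, innermost level first (an assumption).
LEVELS = ("thread", "core", "module", "die", "socket")


def threads_sharing_cache(topology, counts):
    """Number of logical CPUs sharing one cache instance at `topology`.

    `counts[level]` is how many instances of `level` each parent holds,
    e.g. counts["core"] is the number of cores per module.
    """
    if topology not in LEVELS[1:]:
        # "thread" is rejected, and "default" is left to the architecture's
        # own cache model, so neither is computed here.
        raise ValueError(f"cannot compute sharing for {topology!r}")
    depth = LEVELS.index(topology)
    # Multiply the fan-out of every level below the sharing level.
    return prod(counts[level] for level in LEVELS[:depth])


if __name__ == "__main__":
    # 2 threads per core, 2 cores per module, 2 modules per die, 1 die.
    counts = {"thread": 2, "core": 2, "module": 2, "die": 1}
    print(threads_sharing_cache("core", counts))    # 2
    print(threads_sharing_cache("module", counts))  # 4
    print(threads_sharing_cache("die", counts))     # 8
```

So with l2 at "module" in this layout, four logical CPUs would see the
same L2 instance.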

[1]: Patch v6: https://lore.kernel.org/qemu-devel/20241219083237.265419-1-zhao1.liu@intel.com/
[2]: Gap of cache per thread for ARM: https://lore.kernel.org/qemu-devel/20250110114100.00002296@huawei.com/T/#m50c37fa5d372feac8e607c279cd446da3e22a12c
[3]: ARM smp-cache: https://lore.kernel.org/qemu-devel/20250102152012.1049-1-alireza.sanaee@huawei.com/
[4]: API discussion: https://lore.kernel.org/qemu-devel/8734ndj33j.fsf@pond.sub.org/

Thanks and Best Regards,
Zhao
---
Alireza Sanaee (1):
  i386/cpu: add has_caches flag to check smp_cache configuration

Zhao Liu (4):
  hw/core/machine: Reject thread level cache
  i386/cpu: Support module level cache topology
  i386/cpu: Update cache topology with machine's configuration
  i386/pc: Support cache topology in -machine for PC machine

 hw/core/machine-smp.c |  9 ++++++
 hw/i386/pc.c          |  4 +++
 include/hw/boards.h   |  3 ++
 qemu-options.hx       | 30 +++++++++++++++++-
 target/i386/cpu.c     | 71 ++++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 115 insertions(+), 2 deletions(-)

-- 
2.34.1
Re: [PATCH v7 RESEND 0/5] i386: Support SMP Cache Topology
Posted by Zhao Liu 1 month, 4 weeks ago
Hi Paolo,

A kind ping. (I dropped the cache per thread; do you think this version
is ok?)

Thanks,
Zhao

On Fri, Jan 10, 2025 at 10:51:10PM +0800, Zhao Liu wrote:
> Date: Fri, 10 Jan 2025 22:51:10 +0800
> From: Zhao Liu <zhao1.liu@intel.com>
> Subject: [PATCH v7 RESEND 0/5] i386: Support SMP Cache Topology
> X-Mailer: git-send-email 2.34.1
> 
> [...]
Re: [PATCH v7 RESEND 0/5] i386: Support SMP Cache Topology
Posted by Zhao Liu 1 month, 2 weeks ago
Hi Paolo,

A gentle poke. I plan to add cache models for Intel CPUs and extend
this smp_cache interface after this series. :-)

(The 1st patch, for the general machine code, has been picked up by Philippe.)

Thanks,
Zhao

Re: [PATCH v7 RESEND 0/5] i386: Support SMP Cache Topology
Posted by Markus Armbruster 1 month, 4 weeks ago
Zhao Liu <zhao1.liu@intel.com> writes:

> Hi folks,
>
> This is my v7 resend version (updated the commit message of origin
> v7's Patch 1).

If anything changed, even if it's just a commit message, make it a new
version, not a resend, to avoid confusion.  Next time :)

[...]
Re: [PATCH v7 RESEND 0/5] i386: Support SMP Cache Topology
Posted by Zhao Liu 1 month, 4 weeks ago
On Wed, Feb 05, 2025 at 01:32:19PM +0100, Markus Armbruster wrote:
> Date: Wed, 05 Feb 2025 13:32:19 +0100
> From: Markus Armbruster <armbru@redhat.com>
> Subject: Re: [PATCH v7 RESEND 0/5] i386: Support SMP Cache Topology
> 
> Zhao Liu <zhao1.liu@intel.com> writes:
> 
> > Hi folks,
> >
> > This is my v7 resend version (updated the commit message of origin
> > v7's Patch 1).
> 
> If anything changed, even if it's just a commit message, make it a new
> version, not a resend, to avoid confusion.  Next time :)
> 
> [...]

Thanks Markus! I'll keep this in mind :-).
Re: [PATCH v7 RESEND 0/5] i386: Support SMP Cache Topology
Posted by Michael S. Tsirkin 2 months, 2 weeks ago
On Fri, Jan 10, 2025 at 10:51:10PM +0800, Zhao Liu wrote:
> Hi folks,
> 
> This is my v7 resend version (updated the commit message of origin
> v7's Patch 1).
> 
> Compared with v6 [1], v7 dropped the "thread" level cache topology
> (cache per thread):
> 
>  - Patch 1 is the new patch to reject "thread" parameter for smp-cache.
>  - Patch 2 dropped cache per thread support.
>  (Others remain unchanged.)
> 
> There're several reasons:
> 
>  * Currently, neither i386 nor ARM have real hardware support for per-
>    thread cache.
>  * ARM can't support thread level cache in device tree. [2].
> 
> So it is unnecessary to support it at this moment, even though per-
> thread cache might have potential scheduling benefits for VMs without
> CPU affinity.
> 
> In the future, if there is a clear demand for this feature, the correct
> approach would be to add a new control field in MachineClass.smp_props
> and enable it only for the machines that require it.
> 
> 
> This series is based on the master branch at commit aa3a285b5bc5 ("Merge
> tag 'mem-2024-12-21' of https://github.com/davidhildenbrand/qemu into
> staging").

pc things:

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>



> [...]