[PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support

Yanan Wang posted 10 patches 2 years, 5 months ago
Test checkpatch failed
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20211121122502.9844-1-wangyanan55@huawei.com
Maintainers: Eduardo Habkost <ehabkost@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Ani Sinha <ani@anisinha.ca>, "Michael S. Tsirkin" <mst@redhat.com>, Shannon Zhao <shannon.zhaosl@gmail.com>, "Philippe Mathieu-Daudé" <philmd@redhat.com>, Peter Maydell <peter.maydell@linaro.org>, Igor Mammedov <imammedo@redhat.com>, Eric Blake <eblake@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Markus Armbruster <armbru@redhat.com>
There is a newer version of this series
[PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support
Posted by Yanan Wang 2 years, 5 months ago
Hi,

This series introduces the new CPU clusters topology parameter
and enables support for it on ARM virt machines.

Background and descriptions:
Cluster-aware scheduling support has landed in Linux 5.16 and has
been shown to improve scheduling performance (e.g. load balancing
and the wake_affine strategy) on both x86_64 and AArch64. See the
kernel PR [1] and the latest patch set [2] for reference.

So Linux 5.16 now has a four-level, arch-neutral CPU topology
definition, shown below, and a new scheduler level for clusters.
struct cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    int package_id;
    int llc_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
    cpumask_t cluster_sibling;
    cpumask_t llc_sibling;
};

A cluster generally means a group of CPU cores that share an L2 cache
or other mid-level resources, and it is these shared resources that
are used to improve the scheduler's behavior. In terms of size, a
cluster sits between a CPU die and a CPU core. For example, on some
ARM64 Kunpeng servers, there are 6 clusters in each NUMA node and
4 CPU cores in each cluster. The 4 CPU cores share a separate L2
cache and an L3 cache tag, which gives a cache-affinity advantage.
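
To make the Kunpeng example concrete, here is a minimal sketch of how
a per-node cluster sibling mask could be derived. It assumes cores are
numbered contiguously within each cluster, which is an assumption for
illustration and not something the hardware or the kernel guarantees.
The names cluster_of() and cluster_sibling_mask() are hypothetical and
a plain bitmask stands in for the kernel's cpumask_t.

```c
#include <assert.h>
#include <stdint.h>

/* Figures taken from the Kunpeng example: 6 clusters per NUMA node,
 * 4 cores per cluster, so 24 cores per node. */
enum { CORES_PER_CLUSTER = 4, CLUSTERS_PER_NODE = 6 };

/* Cluster index of a core inside its NUMA node.
 * Assumption: cores of one cluster have consecutive numbers. */
static unsigned cluster_of(unsigned core)
{
    return core / CORES_PER_CLUSTER;
}

/* Bitmask of the cores in the node that share a cluster with 'core'.
 * Illustrative stand-in for the cluster_sibling field. */
static uint32_t cluster_sibling_mask(unsigned core)
{
    uint32_t one_cluster = (1u << CORES_PER_CLUSTER) - 1;

    assert(core < CORES_PER_CLUSTER * CLUSTERS_PER_NODE);
    return one_cluster << (cluster_of(core) * CORES_PER_CLUSTER);
}
```

Under these assumptions, core 9 belongs to cluster 2 and its siblings
are cores 8-11, i.e. the mask 0xF00.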

[1] https://lore.kernel.org/lkml/163572864855.3357115.17938524897008353101.tglx@xen13/
[2] https://lkml.org/lkml/2021/9/24/178

In virtualization, on hosts that have pClusters, if we design a
vCPU topology with a cluster level for the guest kernel and use
dedicated vCPU pinning, a cluster-aware guest kernel can also make
use of the cache affinity of CPU clusters to gain a similar
scheduling benefit.
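
The pinning requirement can be sketched as a simple check: every vCPU
of a guest cluster should land on the same host cluster. The function
below is a hypothetical helper, not part of this series or of QEMU.
It assumes contiguous numbering on both sides and a caller-supplied
pin[] table that maps each vCPU index to a host CPU index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if all vCPUs of each guest cluster are pinned inside a
 * single host cluster.
 * Assumptions: vCPUs and host CPUs are numbered contiguously within
 * their clusters; pin[v] is the host CPU that vCPU v is pinned to. */
static bool pinning_preserves_clusters(const unsigned *pin, size_t nr_vcpus,
                                       unsigned vcpus_per_vcluster,
                                       unsigned pcpus_per_pcluster)
{
    assert(vcpus_per_vcluster > 0 && pcpus_per_pcluster > 0);

    for (size_t v = 0; v < nr_vcpus; v++) {
        /* First vCPU of the guest cluster that v belongs to. */
        size_t first = v - v % vcpus_per_vcluster;

        if (pin[v] / pcpus_per_pcluster != pin[first] / pcpus_per_pcluster) {
            return false;
        }
    }
    return true;
}
```

With 4 CPUs per cluster on both sides, pinning vCPUs 0-3 to host CPUs
4-7 keeps a guest cluster on one host cluster, while pinning them to
host CPUs 2-5 splits it across two host clusters and loses the shared
cache.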

This series consists of two parts:
The first part (patches 1-3):
Implement infrastructure for CPU cluster level topology support,
including the SMP documentation, configuration and parsing.

The second part (patches 4-10):
Enable CPU cluster support on ARM virt machines, so that users
can specify a 4-level CPU hierarchy: sockets/clusters/cores/threads.
The 4-level topology is described to the guest kernel through
ACPI PPTT and DT cpu-map.
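
As an illustration of the 4-level hierarchy, the sketch below computes
the total CPU count and decomposes a linear CPU index into its
socket/cluster/core/thread position. The struct and function names are
hypothetical and the ordering (threads varying fastest) is an
assumption; this is not the code from the patches.

```c
#include <assert.h>

/* Hypothetical description of a 4-level topology; the field names
 * are illustrative and not taken from the QEMU sources. */
struct smp_topo {
    unsigned sockets;   /* sockets in the machine */
    unsigned clusters;  /* clusters per socket */
    unsigned cores;     /* cores per cluster */
    unsigned threads;   /* threads per core */
};

struct cpu_pos {
    unsigned socket, cluster, core, thread;
};

/* Total number of CPUs described by the hierarchy. */
static unsigned topo_total_cpus(const struct smp_topo *t)
{
    return t->sockets * t->clusters * t->cores * t->threads;
}

/* Decompose a linear CPU index, assuming threads vary fastest,
 * then cores, then clusters, then sockets. */
static struct cpu_pos topo_locate(const struct smp_topo *t, unsigned cpu)
{
    struct cpu_pos p;

    assert(cpu < topo_total_cpus(t));
    p.thread = cpu % t->threads;
    cpu /= t->threads;
    p.core = cpu % t->cores;
    cpu /= t->cores;
    p.cluster = cpu % t->clusters;
    p.socket = cpu / t->clusters;
    return p;
}
```

For a topology of 2 sockets, 2 clusters, 2 cores and 2 threads this
gives 16 CPUs, and CPU 13 resolves to socket 1, cluster 1, core 0,
thread 1.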

Changelog:
v3->v4:
- Significant changes from v3 to v4, since the whole series has been
  reworked on top of the latest QEMU SMP framework.
- v3: https://lore.kernel.org/qemu-devel/20210516103228.37792-1-wangyanan55@huawei.com/

Yanan Wang (10):
  qemu-options: Improve readability of SMP related Docs
  hw/core/machine: Introduce CPU cluster topology support
  hw/core/machine: Wrap target specific parameters together
  hw/arm/virt: Support clusters on ARM virt machines
  hw/arm/virt: Support cluster level in DT cpu-map
  hw/acpi/aml-build: Improve scalability of PPTT generation
  hw/arm/virt-acpi-build: Make an ARM specific PPTT generator
  tests/acpi/bios-tables-test: Allow changes to virt/PPTT file
  hw/acpi/virt-acpi-build: Support cluster level in PPTT generation
  tests/acpi/bios-table-test: Update expected virt/PPTT file

 hw/acpi/aml-build.c         |  66 ++------------------------
 hw/arm/virt-acpi-build.c    |  92 +++++++++++++++++++++++++++++++++++-
 hw/arm/virt.c               |  16 ++++---
 hw/core/machine-smp.c       |  29 +++++++++---
 hw/core/machine.c           |   3 ++
 include/hw/acpi/aml-build.h |   5 +-
 include/hw/boards.h         |   6 ++-
 qapi/machine.json           |   5 +-
 qemu-options.hx             |  91 +++++++++++++++++++++++++++--------
 softmmu/vl.c                |   3 ++
 tests/data/acpi/virt/PPTT   | Bin 76 -> 96 bytes
 11 files changed, 214 insertions(+), 102 deletions(-)

--
2.19.1


Re: [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support
Posted by wangyanan (Y) 2 years, 4 months ago
Ping...

On 2021/11/21 20:24, Yanan Wang wrote:
> [...]


Re: [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support
Posted by wangyanan (Y) 2 years, 3 months ago
I have sent a v5 with four new patches added, so this v4 can be ignored.
v5: https://patchew.org/QEMU/20211228092221.21068-1-wangyanan55@huawei.com/

Thanks,
Yanan

On 2021/11/21 20:24, Yanan Wang wrote:
> [...]