[PATCH v18 0/8] Specifying cache topology on ARM

Alireza Sanaee via qemu development posted 8 patches 1 week ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20260220141128.789-1-alireza.sanaee@huawei.com
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Igor Mammedov <imammedo@redhat.com>, Ani Sinha <anisinha@redhat.com>, Shannon Zhao <shannon.zhaosl@gmail.com>, Peter Maydell <peter.maydell@linaro.org>, Eduardo Habkost <eduardo@habkost.net>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Yanan Wang <wangyanan55@huawei.com>, Zhao Liu <zhao1.liu@intel.com>, Song Gao <gaosong@loongson.cn>, Bibo Mao <maobibo@loongson.cn>, Jiaxun Yang <jiaxun.yang@flygoat.com>
hw/acpi/aml-build.c                        | 201 +++++++++++-
hw/arm/virt-acpi-build.c                   |   8 +-
hw/arm/virt.c                              | 364 ++++++++++++++++++++-
hw/core/machine-smp.c                      |  52 +++
hw/loongarch/virt-acpi-build.c             |   4 +-
include/hw/acpi/aml-build.h                |   4 +-
include/hw/acpi/cpu.h                      |  10 +
include/hw/arm/virt.h                      |  17 +-
include/hw/core/boards.h                   |   5 +
include/hw/core/cpu.h                      |  12 +
target/arm/tcg/cpu64.c                     |  10 +
tests/data/acpi/aarch64/virt/PPTT.topology | Bin 356 -> 516 bytes
tests/qtest/bios-tables-test.c             |   4 +
13 files changed, 676 insertions(+), 15 deletions(-)
[PATCH v18 0/8] Specifying cache topology on ARM
Posted by Alireza Sanaee via qemu development 1 week ago
Specifying the cache layout in virtual machines is useful for applications and
operating systems to fetch accurate information about the cache structure and
make appropriate adjustments. Enforcing correct sharing information can lead to
better optimizations. Patches that allow for an interface to express caches was
landed in the prior cycles. This patchset uses the interface as a foundation.
Thus, the device tree and ACPI/PPTT table, and device tree are populated based
on user-provided information and CPU topology.

Example:

+----------------+                         +----------------+
|    Socket 0    |                         |    Socket 1    |
|    (L3 Cache)  |                         |    (L3 Cache)  |
+--------+-------+                         +--------+-------+
         |                                          |
+--------+--------+                        +--------+--------+
|   Cluster 0     |                        |   Cluster 0     |
|   (L2 Cache)    |                        |   (L2 Cache)    |
+--------+--------+                        +--------+--------+
         |                                          |
+--------+--------+  +--------+--------+   +--------+--------+  +--------+----+
|   Core 0         | |   Core 1        |   |   Core 0        |  |   Core 1    |
|   (L1i, L1d)     | |   (L1i, L1d)    |   |   (L1i, L1d)    |  |   (L1i, L1d)|
+--------+--------+  +--------+--------+   +--------+--------+  +--------+----+
         |                    |                     |                    |
+--------+              +--------+             +--------+           +--------+
|Thread 0|              |Thread 1|             |Thread 1|           |Thread 0|
+--------+              +--------+             +--------+           +--------+
|Thread 1|              |Thread 0|             |Thread 0|           |Thread 1|
+--------+              +--------+             +--------+           +--------+


The following command will represent the system relying on **ACPI PPTT tables**.

./qemu-system-aarch64 \
 -machine virt,smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=cluster,smp-cache.3.cache=l3,smp-cache.3.topology=socket \
 -cpu max \
 -m 2048 \
 -smp sockets=2,clusters=1,cores=2,threads=2 \
 -kernel ./Image.gz \
 -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
 -initrd rootfs.cpio.gz \
 -bios ./edk2-aarch64-code.fd \
 -nographic

The following command will represent the system relying on **the device tree**.

./qemu-system-aarch64 \
 -machine virt,acpi=off,smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=cluster,smp-cache.3.cache=l3,smp-cache.3.topology=socket \
 -cpu max \
 -m 2048 \
 -smp sockets=2,clusters=1,cores=2,threads=2 \
 -kernel ./Image.gz \
 -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=off" \
 -initrd rootfs.cpio.gz \
 -nographic

Failure cases:
	1) There are scenarios where caches exist in systems' registers but left
unspecified by users. In this case qemu returns failure.

	2) SMT threads cannot share caches which is not very common. More
discussions here [1].

Currently only three levels of caches are supported to be specified from the
command line. However, increasing the value does not require significant
changes. Further, this patch assumes l2 and l3 unified caches and does not
allow l(2/3)(i/d). The level terminology is thread/core/cluster/socket.
Hierarchy assumed in this patch: Socket level = Cluster level + 1 = Core level
+ 2 = Thread level + 3;

Possible future enhancements:
  1) Separated data and instruction cache at L2 and L3.
  2) Additional cache controls.  e.g. size of L3 may not want to just match the
underlying system, because only some of the associated host CPUs may be bound
to this VM.

[1] https://lore.kernel.org/devicetree-spec/20250203120527.3534-1-alireza.sanaee@huawei.com/

Change Log:
  v17->v18:
    * Rebased to 07f97d5da04a9f97e273de85c76f5017d8135a6e.
    * Addressed Mst's comments regarding documenting build_cache_node function. Thanks for the review.
    * v17: https://lore.kernel.org/all/20260106155828.643-1-alireza.sanaee@huawei.com/
  v16->v17:
    * Rebased to 0fc482b73d8e085d1375b4e17b0647fd2e6fe8f0.
    * Addressed Gustavo's comments. Thanks for the review.
    * v16: https://lore.kernel.org/all/20250827142152.206-1-alireza.sanaee@huawei.com/

  v15->v16:
    * Rebase to e771ba98de25c9f43959f79fc7099cf7fbba44cce.
    * v15: https://lore.kernel.org/qemu-devel/20250812122829.204-1-alireza.sanaee@huawei.com/

  v14->v15:
   * Introduced a separate patch for loongarch64 build_pptt function.
   * Made sure loongarch64 tests pass.
   * Downgraded to V2 for ACPI PPTT. Removed PPTT IDs as was not necessary.
   * Removed dependency as it's been merged in the recent cycle.
     -- 20250604115233.1234-1-alireza.sanaee@huawei.com
   * Fixed styling issues and removed irrelevant changes.
   * Moved cache headers to core/cpu.h to be used in both acpi and virt.
   * v14: https://lore.kernel.org/qemu-devel/20250707121908.155-1-alireza.sanaee@huawei.com/
   # Thanks to Jonathan and Zhao for their comments.

  v13->v14:
   * Rebased on latest staging.
   * Made some naming changes to machine-smp.c, addd docs added to the
        same file.

  v12->v13:
   * Applied comments from Zhao.
   * Introduced a new patch for machine specific cache topology functions.
   * Base: bc98ffdc7577e55ab8373c579c28fe24d600c40f.

  v11->v12:
   * Patch #4 couldn't not merge properly as the main file diverged. Now it is fixed (hopefully).
   * Loonarch build_pptt function updated.
   * Rebased on 09be8a511a2e278b45729d7b065d30c68dd699d0.

  v10->v11:
   * Fix some coding style issues.
   * Rename some variables.

  v9->v10:
   * PPTT rev down to 2.

  v8->v9:
   * rebase to 10
   * Fixed a bug in device-tree generation related to a scenario when
        caches are shared at core in higher levels than 1.
  v7->v8:
   * rebase: Merge tag 'pull-nbd-2024-08-26' of https://repo.or.cz/qemu/ericb into staging
   * I mis-included a file in patch #4 and I removed it in this one.

  v6->v7:
   * Intel stuff got pulled up, so rebase.
   * added some discussions on device tree.

  v5->v6:
   * Minor bug fix.
   * rebase based on new Intel patchset.
     - https://lore.kernel.org/qemu-devel/20250110145115.1574345-1-zhao1.liu@intel.com/

  v4->v5:
    * Added Reviewed-by tags.
    * Applied some comments.

  v3->v4:
    * Device tree added.

Alireza Sanaee (8):
  target/arm/tcg: increase cache level for cpu=max
  hw/core/machine: topology functions capabilities added
  hw/arm/virt: add cache hierarchy to device tree
  bios-tables-test: prepare to change ARM ACPI virt PPTT
  acpi: Add parameters to pass cache descriptions to ACPI build_pptt()
  hw/acpi: add cache hierarchy to pptt table
  tests/qtest/bios-table-test: testing new ARM ACPI PPTT topology
  Update the ACPI tables based on new aml-build.c

 hw/acpi/aml-build.c                        | 201 +++++++++++-
 hw/arm/virt-acpi-build.c                   |   8 +-
 hw/arm/virt.c                              | 364 ++++++++++++++++++++-
 hw/core/machine-smp.c                      |  52 +++
 hw/loongarch/virt-acpi-build.c             |   4 +-
 include/hw/acpi/aml-build.h                |   4 +-
 include/hw/acpi/cpu.h                      |  10 +
 include/hw/arm/virt.h                      |  17 +-
 include/hw/core/boards.h                   |   5 +
 include/hw/core/cpu.h                      |  12 +
 target/arm/tcg/cpu64.c                     |  10 +
 tests/data/acpi/aarch64/virt/PPTT.topology | Bin 356 -> 516 bytes
 tests/qtest/bios-tables-test.c             |   4 +
 13 files changed, 676 insertions(+), 15 deletions(-)

-- 
2.43.0