Specifying the cache layout in virtual machines lets applications and
operating systems fetch accurate information about the cache structure
and make appropriate adjustments. Exposing correct sharing information
enables better optimizations. This patch set allows the cache layout to
be specified through a command-line parameter, using a patch set by
Intel [1] as its foundation. The ACPI PPTT table is then populated from
the user-provided information and the CPU topology.
Example:
+----------------+ +----------------+
| Socket 0 | | Socket 1 |
| (L3 Cache) | | (L3 Cache) |
+--------+-------+ +--------+-------+
| |
+--------+--------+ +--------+--------+
| Cluster 0 | | Cluster 0 |
| (L2 Cache) | | (L2 Cache) |
+--------+--------+ +--------+--------+
| |
+--------+--------+ +--------+--------+ +--------+--------+ +--------+--------+
|     Core 0      | |     Core 1      | |     Core 0      | |     Core 1      |
|   (L1i, L1d)    | |   (L1i, L1d)    | |   (L1i, L1d)    | |   (L1i, L1d)    |
+--------+--------+ +--------+--------+ +--------+--------+ +--------+--------+
| | | |
+--------+ +--------+ +--------+ +--------+
|Thread 0| |Thread 1| |Thread 1| |Thread 0|
+--------+ +--------+ +--------+ +--------+
|Thread 1| |Thread 0| |Thread 0| |Thread 1|
+--------+ +--------+ +--------+ +--------+
The following command line describes the system above:
./qemu-system-aarch64 \
-machine virt,**smp-cache=cache0** \
-cpu max \
-m 2048 \
-smp sockets=2,clusters=1,cores=2,threads=2 \
-kernel ./Image.gz \
-append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
-initrd rootfs.cpio.gz \
-bios ./edk2-aarch64-code.fd \
**-object '{"qom-type":"smp-cache","id":"cache0","caches":[{"name":"l1d","topo":"core"},{"name":"l1i","topo":"core"},{"name":"l2","topo":"cluster"},{"name":"l3","topo":"socket"}]}'** \
-nographic
Failure cases:
1) There are cases where no clusters are specified in the -smp option,
while the user requests caches to be shared at the cluster level. In
this situation, QEMU returns an error (see the example below).
2) There are other scenarios where caches exist in the system's
registers but are left unspecified by the user. In this case QEMU also
returns an error.
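As an illustration of case 1, a command line like the following would
be rejected; the error message shown here is made up for illustration:
./qemu-system-aarch64 \
    -machine virt,smp-cache=cache0 \
    -cpu max \
    -smp sockets=2,cores=2,threads=2 \
    -object '{"qom-type":"smp-cache","id":"cache0","caches":[{"name":"l1d","topo":"core"},{"name":"l1i","topo":"core"},{"name":"l2","topo":"cluster"},{"name":"l3","topo":"socket"}]}' \
    -nographic
qemu-system-aarch64: l2 cache shared at cluster level, but no clusters given in -smp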
Currently only three cache levels can be specified on the command line;
however, raising that limit does not require significant changes.
Further, this patch assumes unified l2 and l3 caches and does not allow
l(2/3)(i/d). The level terminology is thread/core/cluster/socket for
now.
Here is the hierarchy assumed in this patch:
Socket level = Cluster level + 1 = Core level + 2 = Thread level + 3;
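In C terms, the fixed spacing between levels could be sketched as
follows (the enum and its constants are illustrative, not identifiers
from the patches):

    /* Illustrative only: each topology level sits exactly one step
     * above the previous one, matching the hierarchy above. */
    typedef enum CacheTopoLevel {
        CACHE_TOPO_THREAD  = 0,
        CACHE_TOPO_CORE    = 1,   /* thread + 1 */
        CACHE_TOPO_CLUSTER = 2,   /* thread + 2 */
        CACHE_TOPO_SOCKET  = 3,   /* thread + 3 */
    } CacheTopoLevel;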
[1] https://lore.kernel.org/qemu-devel/20240704031603.1744546-1-zhao1.liu@intel.com/#r
TODO:
1) Make the code work with arbitrary cache levels.
2) Separate data and instruction caches at L2 and L3.
3) Allow data-only or instruction-only caches at a particular level.
4) Additional cache controls. E.g., the size of L3 may not want to just
match the underlying system, because only some of the associated host
CPUs may be bound to this VM (a hypothetical syntax is sketched after
this list).
5) Add device tree related code to generate info related to caches.
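For item 4, a hypothetical extension of the smp-cache object (not
implemented in this series; the "size" property is invented purely for
illustration) might look like:
-object '{"qom-type":"smp-cache","id":"cache0","caches":[{"name":"l3","topo":"socket","size":"16M"}]}'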
Alireza Sanaee (2):
target/arm/tcg: increase cache level for cpu=max
hw/acpi: add cache hierarchy node to pptt table
hw/acpi/aml-build.c | 307 +++++++++++++++++++++++++++++++++++-
hw/arm/virt-acpi-build.c | 137 +++++++++++++++-
hw/arm/virt.c | 5 +
hw/core/machine-smp.c | 6 +-
hw/loongarch/acpi-build.c | 3 +-
include/hw/acpi/aml-build.h | 20 ++-
target/arm/tcg/cpu64.c | 35 ++++
7 files changed, 503 insertions(+), 10 deletions(-)
--
2.34.1
On 23.08.2024 14:54, Alireza Sanaee wrote:
> Failure cases:
> 1) There are cases where no clusters are specified in the -smp option,
> while the user requests caches to be shared at the cluster level. In
> this situation, QEMU returns an error.
>
> 2) There are other scenarios where caches exist in the system's
> registers but are left unspecified by the user. In this case QEMU also
> returns an error.
Sockets, clusters, cores, threads. And then caches. Sounds like more fun
than it is already.
IIRC Arm hardware can have up to 16 cores per cluster (virt uses 16,
sbsa-ref uses 8), as this is a GIC limitation.
I have a script to visualize Arm topology:
https://github.com/hrw/sbsa-ref-status/blob/main/parse-pptt-log.py
It uses 'EFIShell> acpiview -s PPTT' output and gives something like this:
-smp 24,sockets=1,clusters=2,cores=3,threads=4
socket: offset: 0x24 parent: 0x0
cluster: offset: 0x38 parent: 0x24
core: offset: 0x4C parent: 0x38 cpuId: 0x0 L1i: 0x68 L1d: 0x84
cache: offset: 0x68 cacheId: 1 size: 0x10000 next: 0xA0
cache: offset: 0x84 cacheId: 2 size: 0x10000 next: 0xA0
cache: offset: 0xA0 cacheId: 3 size: 0x80000
thread: offset: 0xBC parent: 0x4C cpuId: 0x0
thread: offset: 0xD0 parent: 0x4C cpuId: 0x1
thread: offset: 0xE4 parent: 0x4C cpuId: 0x2
thread: offset: 0xF8 parent: 0x4C cpuId: 0x3
core: offset: 0x10C parent: 0x38 cpuId: 0x0 L1i: 0x128 L1d: 0x144
cache: offset: 0x128 cacheId: 4 size: 0x10000 next: 0x160
cache: offset: 0x144 cacheId: 5 size: 0x10000 next: 0x160
cache: offset: 0x160 cacheId: 6 size: 0x80000
thread: offset: 0x17C parent: 0x10C cpuId: 0x4
thread: offset: 0x190 parent: 0x10C cpuId: 0x5
thread: offset: 0x1A4 parent: 0x10C cpuId: 0x6
thread: offset: 0x1B8 parent: 0x10C cpuId: 0x7
core: offset: 0x1CC parent: 0x38 cpuId: 0x0 L1i: 0x1E8 L1d: 0x204
cache: offset: 0x1E8 cacheId: 7 size: 0x10000 next: 0x220
cache: offset: 0x204 cacheId: 8 size: 0x10000 next: 0x220
cache: offset: 0x220 cacheId: 9 size: 0x80000
thread: offset: 0x23C parent: 0x1CC cpuId: 0x8
thread: offset: 0x250 parent: 0x1CC cpuId: 0x9
thread: offset: 0x264 parent: 0x1CC cpuId: 0xA
thread: offset: 0x278 parent: 0x1CC cpuId: 0xB
cluster: offset: 0x28C parent: 0x24
core: offset: 0x2A0 parent: 0x28C cpuId: 0x0 L1i: 0x2BC L1d: 0x2D8
cache: offset: 0x2BC cacheId: 10 size: 0x10000 next: 0x2F4
cache: offset: 0x2D8 cacheId: 11 size: 0x10000 next: 0x2F4
cache: offset: 0x2F4 cacheId: 12 size: 0x80000
thread: offset: 0x310 parent: 0x2A0 cpuId: 0xC
thread: offset: 0x324 parent: 0x2A0 cpuId: 0xD
thread: offset: 0x338 parent: 0x2A0 cpuId: 0xE
thread: offset: 0x34C parent: 0x2A0 cpuId: 0xF
core: offset: 0x360 parent: 0x28C cpuId: 0x0 L1i: 0x37C L1d: 0x398
cache: offset: 0x37C cacheId: 13 size: 0x10000 next: 0x3B4
cache: offset: 0x398 cacheId: 14 size: 0x10000 next: 0x3B4
cache: offset: 0x3B4 cacheId: 15 size: 0x80000
thread: offset: 0x3D0 parent: 0x360 cpuId: 0x10
thread: offset: 0x3E4 parent: 0x360 cpuId: 0x11
thread: offset: 0x3F8 parent: 0x360 cpuId: 0x12
thread: offset: 0x40C parent: 0x360 cpuId: 0x13
core: offset: 0x420 parent: 0x28C cpuId: 0x0 L1i: 0x43C L1d: 0x458
cache: offset: 0x43C cacheId: 16 size: 0x10000 next: 0x474
cache: offset: 0x458 cacheId: 17 size: 0x10000 next: 0x474
cache: offset: 0x474 cacheId: 18 size: 0x80000
thread: offset: 0x490 parent: 0x420 cpuId: 0x14
thread: offset: 0x4A4 parent: 0x420 cpuId: 0x15
thread: offset: 0x4B8 parent: 0x420 cpuId: 0x16
thread: offset: 0x4CC parent: 0x420 cpuId: 0x17
You may find it useful. I tested it only with cache at either core or
cluster level.
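In case it is useful, a rough flow for feeding it (the exact script
invocation is an assumption; check the repository for actual usage):

EFIShell> acpiview -s PPTT > fs0:\pptt.log
$ python3 parse-pptt-log.py pptt.log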
Hi Alireza,
Great to see your Arm side implementation!
On Fri, Aug 23, 2024 at 01:54:44PM +0100, Alireza Sanaee wrote:
> Date: Fri, 23 Aug 2024 13:54:44 +0100
> From: Alireza Sanaee <alireza.sanaee@huawei.com>
> Subject: [RFC PATCH 0/2] Specifying cache topology on ARM
> X-Mailer: git-send-email 2.34.1
>
[snip]
>
> The following command line describes the system above:
>
> ./qemu-system-aarch64 \
> -machine virt,**smp-cache=cache0** \
> -cpu max \
> -m 2048 \
> -smp sockets=2,clusters=1,cores=2,threads=2 \
> -kernel ./Image.gz \
> -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
> -initrd rootfs.cpio.gz \
> -bios ./edk2-aarch64-code.fd \
> **-object '{"qom-type":"smp-cache","id":"cache0","caches":[{"name":"l1d","topo":"core"},{"name":"l1i","topo":"core"},{"name":"l2","topo":"cluster"},{"name":"l3","topo":"socket"}]}'** \
> -nographic
I plan to refresh a new version soon, in which the smp-cache array will
be integrated into -machine entirely. I'll cc you then.
Regards,
Zhao
On Sat, 31 Aug 2024 19:25:47 +0800
Zhao Liu <zhao1.liu@intel.com> wrote:
> Hi Alireza,
>
> Great to see your Arm side implementation!
>
> On Fri, Aug 23, 2024 at 01:54:44PM +0100, Alireza Sanaee wrote:
> > Date: Fri, 23 Aug 2024 13:54:44 +0100
> > From: Alireza Sanaee <alireza.sanaee@huawei.com>
> > Subject: [RFC PATCH 0/2] Specifying cache topology on ARM
> > X-Mailer: git-send-email 2.34.1
> >
>
> [snip]
>
> >
> > The following command line describes the system above:
> >
> > ./qemu-system-aarch64 \
> > -machine virt,**smp-cache=cache0** \
> > -cpu max \
> > -m 2048 \
> > -smp sockets=2,clusters=1,cores=2,threads=2 \
> > -kernel ./Image.gz \
> > -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
> > -initrd rootfs.cpio.gz \
> > -bios ./edk2-aarch64-code.fd \
> > **-object '{"qom-type":"smp-cache","id":"cache0","caches":[{"name":"l1d","topo":"core"},{"name":"l1i","topo":"core"},{"name":"l2","topo":"cluster"},{"name":"l3","topo":"socket"}]}'** \
> > -nographic
>
> I plan to refresh a new version soon, in which the smp-cache array
> will be integrated into -machine entirely. I'll cc you then.
>
> Regards,
> Zhao
>
>
Hi Zhao,
Yes, please keep me CCed.
One thing I noticed: since you were going down the Intel path, some
variables could never be NULL. But when I went down the ARM path, I hit
scenarios where I ended up with some uninitialized variables, which is
still OK but could have been avoided (a sketch of the pattern follows
below).
Looking forward to the next revision.
Alireza
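Purely to illustrate the pattern I mean (standalone sketch, not code
from either series):

    #include <stdio.h>

    /* Illustrative only: a pointer that every x86 path assigns can
     * stay uninitialized on a path that only Arm takes; defaulting
     * it to NULL makes the gap visible and the final use defined. */
    typedef struct { const char *name; } CacheInfo;

    static CacheInfo *lookup_l3(void)
    {
        static CacheInfo l3 = { "l3" };
        return &l3;
    }

    int main(void)
    {
        CacheInfo *l3 = NULL;   /* safe default vs. leaving it uninitialized */
        int has_l3 = 0;         /* pretend this VM config has no L3 */

        if (has_l3) {
            l3 = lookup_l3();
        }

        /* With the NULL default this check is well defined; without
         * it, reading 'l3' here would be undefined behavior. */
        printf("l3 %s\n", l3 ? l3->name : "unspecified");
        return 0;
    }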
On Mon, Sep 02, 2024 at 11:25:19AM +0100, Alireza Sanaee wrote:
> Hi Zhao,
>
> Yes, please keep me CCed.
>
> One thing I noticed: since you were going down the Intel path, some
> variables could never be NULL. But when I went down the ARM path, I
> hit scenarios where I ended up with some uninitialized variables,
> which is still OK but could have been avoided.

Ah, I didn't get your point very clearly. Could you please point out
those places in my patches? Then I can fix them in my next version. :)

Thanks,
Zhao

> Looking forward to the next revision.
>
> Alireza