Specifying the cache layout in virtual machines lets applications and
operating systems fetch accurate information about the cache structure
and adjust accordingly. Enforcing correct sharing information can lead
to better optimizations. This patch enables specifying the cache layout
through a command line parameter, using a patch set by Intel [1] as its
foundation. The ACPI/PPTT table is populated based on user-provided
information and the CPU topology.

Example:

+----------------+                          +----------------+
|    Socket 0    |                          |    Socket 1    |
|   (L3 Cache)   |                          |   (L3 Cache)   |
+--------+-------+                          +--------+-------+
         |                                           |
+--------+--------+                         +--------+--------+
|    Cluster 0    |                         |    Cluster 0    |
|   (L2 Cache)    |                         |   (L2 Cache)    |
+--------+--------+                         +--------+--------+
         |                                           |
+--------+--------+  +--------+--------+    +--------+--------+  +--------+----+
|     Core 0      |  |     Core 1      |    |     Core 0      |  |   Core 1    |
|   (L1i, L1d)    |  |   (L1i, L1d)    |    |   (L1i, L1d)    |  | (L1i, L1d)  |
+--------+--------+  +--------+--------+    +--------+--------+  +--------+----+
         |                    |                      |                    |
     +--------+           +--------+             +--------+           +--------+
     |Thread 0|           |Thread 1|             |Thread 1|           |Thread 0|
     +--------+           +--------+             +--------+           +--------+
     |Thread 1|           |Thread 0|             |Thread 0|           |Thread 1|
     +--------+           +--------+             +--------+           +--------+

The following command will represent the system.

./qemu-system-aarch64 \
 -machine virt,**smp-cache=cache0** \
 -cpu max \
 -m 2048 \
 -smp sockets=2,clusters=1,cores=2,threads=2 \
 -kernel ./Image.gz \
 -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
 -initrd rootfs.cpio.gz \
 -bios ./edk2-aarch64-code.fd \
 **-object '{"qom-type":"smp-cache","id":"cache0","caches":[{"name":"l1d","topo":"core"},{"name":"l1i","topo":"core"},{"name":"l2","topo":"cluster"},{"name":"l3","topo":"socket"}]}'** \
 -nographic

Failure cases:
    1) QEMU might not have any clusters selected in the -smp option
    while the user specifies caches to be shared at cluster level. In
    this situation, QEMU returns an error.

    2) Caches might exist in the system registers but be left
    unspecified by the user. In this case QEMU returns a failure.

Currently only three levels of caches can be specified from the command
line. However, increasing that number does not require significant
changes. Further, this patch assumes unified L2 and L3 caches and does
not allow l(2/3)(i/d). The level terminology is
thread/core/cluster/socket right now.

Here is the hierarchy assumed in this patch:
Socket level = Cluster level + 1 = Core level + 2 = Thread level + 3;

[1] https://lore.kernel.org/qemu-devel/20240704031603.1744546-1-zhao1.liu@intel.com/#r

TODO:
  1) Make the code work with arbitrary levels.
  2) Separate data and instruction caches at L2 and L3.
  3) Allow for different Data or Instruction only at a particular level.
  4) Additional cache controls, e.g. the size of L3 may not want to just
     match the underlying system, because only some of the associated
     host CPUs may be bound to this VM.
  5) Add device tree related code to generate info related to caches.

Alireza Sanaee (2):
  target/arm/tcg: increase cache level for cpu=max
  hw/acpi: add cache hierarchy node to pptt table

 hw/acpi/aml-build.c         | 307 +++++++++++++++++++++++++++++++++++-
 hw/arm/virt-acpi-build.c    | 137 +++++++++++++++-
 hw/arm/virt.c               |   5 +
 hw/core/machine-smp.c       |   6 +-
 hw/loongarch/acpi-build.c   |   3 +-
 include/hw/acpi/aml-build.h |  20 ++-
 target/arm/tcg/cpu64.c      |  35 ++++
 7 files changed, 503 insertions(+), 10 deletions(-)

-- 
2.34.1
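The two failure cases and the level ordering described in the cover letter can be illustrated with a minimal C sketch. This is not the code from the patches: the enum, the struct, and both function names are hypothetical, and the bitmask encoding of "which caches exist" is an assumption made only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical level numbering that follows the cover letter:
 * socket = cluster + 1 = core + 2 = thread + 3. */
enum topo_level {
    TOPO_THREAD  = 0,
    TOPO_CORE    = 1,
    TOPO_CLUSTER = 2,
    TOPO_SOCKET  = 3,
};

/* Hypothetical record for one user-specified cache. */
struct cache_spec {
    const char *name;       /* e.g. "l1d", "l2" */
    enum topo_level level;  /* level at which the cache is shared */
};

/* Failure case 1: a cache is shared at cluster level although
 * no clusters were selected in -smp. Returns false in that case. */
bool cache_levels_available(bool smp_has_clusters,
                            const struct cache_spec *caches, size_t count)
{
    assert(caches != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (caches[i].level == TOPO_CLUSTER && !smp_has_clusters) {
            return false;
        }
    }
    return true;
}

/* Failure case 2: a cache reported by the CPU registers was left
 * unspecified by the user. Each bit stands for one cache (assumed
 * encoding); returns false if any hardware bit lacks a user bit. */
bool all_hw_caches_specified(unsigned hw_caches, unsigned user_caches)
{
    return (hw_caches & ~user_caches) == 0;
}
```

With the four caches from the example command (l1d and l1i at core, l2 at cluster, l3 at socket), `cache_levels_available()` returns false when -smp carries no clusters, which corresponds to the error described in failure case 1.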
On 23.08.2024 14:54, Alireza Sanaee via wrote:
> Failure cases:
> 1) there are cases where QEMU might not have any clusters selected in the
> -smp option, while user specifies caches to be shared at cluster level. In
> this situations, qemu returns error.
>
> 2) There are other scenarios where caches exist in systems' registers but
> not left unspecified by users. In this case qemu returns failure.

Sockets, clusters, cores, threads. And then caches. Sounds like more fun
than it is already.

IIRC Arm hardware can have up to 16 cores per cluster (virt uses 16,
sbsa-ref uses 8) as this is a GIC limitation.

I have a script to visualize Arm topology:

https://github.com/hrw/sbsa-ref-status/blob/main/parse-pptt-log.py

It uses 'EFIShell> acpiview -s PPTT' output and gives something like this:

-smp 24,sockets=1,clusters=2,cores=3,threads=4

socket: offset: 0x24 parent: 0x0
  cluster: offset: 0x38 parent: 0x24
    core: offset: 0x4C parent: 0x38 cpuId: 0x0 L1i: 0x68 L1d: 0x84
      cache: offset: 0x68 cacheId: 1 size: 0x10000 next: 0xA0
      cache: offset: 0x84 cacheId: 2 size: 0x10000 next: 0xA0
      cache: offset: 0xA0 cacheId: 3 size: 0x80000
      thread: offset: 0xBC parent: 0x4C cpuId: 0x0
      thread: offset: 0xD0 parent: 0x4C cpuId: 0x1
      thread: offset: 0xE4 parent: 0x4C cpuId: 0x2
      thread: offset: 0xF8 parent: 0x4C cpuId: 0x3
    core: offset: 0x10C parent: 0x38 cpuId: 0x0 L1i: 0x128 L1d: 0x144
      cache: offset: 0x128 cacheId: 4 size: 0x10000 next: 0x160
      cache: offset: 0x144 cacheId: 5 size: 0x10000 next: 0x160
      cache: offset: 0x160 cacheId: 6 size: 0x80000
      thread: offset: 0x17C parent: 0x10C cpuId: 0x4
      thread: offset: 0x190 parent: 0x10C cpuId: 0x5
      thread: offset: 0x1A4 parent: 0x10C cpuId: 0x6
      thread: offset: 0x1B8 parent: 0x10C cpuId: 0x7
    core: offset: 0x1CC parent: 0x38 cpuId: 0x0 L1i: 0x1E8 L1d: 0x204
      cache: offset: 0x1E8 cacheId: 7 size: 0x10000 next: 0x220
      cache: offset: 0x204 cacheId: 8 size: 0x10000 next: 0x220
      cache: offset: 0x220 cacheId: 9 size: 0x80000
      thread: offset: 0x23C parent: 0x1CC cpuId: 0x8
      thread: offset: 0x250 parent: 0x1CC cpuId: 0x9
      thread: offset: 0x264 parent: 0x1CC cpuId: 0xA
      thread: offset: 0x278 parent: 0x1CC cpuId: 0xB
  cluster: offset: 0x28C parent: 0x24
    core: offset: 0x2A0 parent: 0x28C cpuId: 0x0 L1i: 0x2BC L1d: 0x2D8
      cache: offset: 0x2BC cacheId: 10 size: 0x10000 next: 0x2F4
      cache: offset: 0x2D8 cacheId: 11 size: 0x10000 next: 0x2F4
      cache: offset: 0x2F4 cacheId: 12 size: 0x80000
      thread: offset: 0x310 parent: 0x2A0 cpuId: 0xC
      thread: offset: 0x324 parent: 0x2A0 cpuId: 0xD
      thread: offset: 0x338 parent: 0x2A0 cpuId: 0xE
      thread: offset: 0x34C parent: 0x2A0 cpuId: 0xF
    core: offset: 0x360 parent: 0x28C cpuId: 0x0 L1i: 0x37C L1d: 0x398
      cache: offset: 0x37C cacheId: 13 size: 0x10000 next: 0x3B4
      cache: offset: 0x398 cacheId: 14 size: 0x10000 next: 0x3B4
      cache: offset: 0x3B4 cacheId: 15 size: 0x80000
      thread: offset: 0x3D0 parent: 0x360 cpuId: 0x10
      thread: offset: 0x3E4 parent: 0x360 cpuId: 0x11
      thread: offset: 0x3F8 parent: 0x360 cpuId: 0x12
      thread: offset: 0x40C parent: 0x360 cpuId: 0x13
    core: offset: 0x420 parent: 0x28C cpuId: 0x0 L1i: 0x43C L1d: 0x458
      cache: offset: 0x43C cacheId: 16 size: 0x10000 next: 0x474
      cache: offset: 0x458 cacheId: 17 size: 0x10000 next: 0x474
      cache: offset: 0x474 cacheId: 18 size: 0x80000
      thread: offset: 0x490 parent: 0x420 cpuId: 0x14
      thread: offset: 0x4A4 parent: 0x420 cpuId: 0x15
      thread: offset: 0x4B8 parent: 0x420 cpuId: 0x16
      thread: offset: 0x4CC parent: 0x420 cpuId: 0x17

You may find it useful. I tested it only with cache at either core or
cluster level.
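The nesting in a dump like this follows from the parent offsets alone: each processor node names the table offset of its parent, and a parent of 0x0 marks a top-level node. A minimal C sketch of that walk is given below. It is not taken from the linked script; the struct and function names are hypothetical and only the offset/parent pairs from the output are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one processor node from the dump. */
struct pptt_node {
    uint32_t offset;  /* offset of this node inside the PPTT */
    uint32_t parent;  /* offset of the parent node, 0 if there is none */
};

/* Linear search for the node stored at the given table offset. */
static const struct pptt_node *find_node(const struct pptt_node *nodes,
                                         size_t count, uint32_t offset)
{
    for (size_t i = 0; i < count; i++) {
        if (nodes[i].offset == offset) {
            return &nodes[i];
        }
    }
    return NULL;
}

/* Number of parent links between the node at 'offset' and a top-level
 * node. Returns -1 for an unknown offset, a dangling parent reference,
 * or a parent chain longer than the node count (which implies a cycle). */
int pptt_node_depth(const struct pptt_node *nodes, size_t count,
                    uint32_t offset)
{
    assert(nodes != NULL);
    int depth = 0;
    const struct pptt_node *cur = find_node(nodes, count, offset);

    while (cur != NULL && cur->parent != 0) {
        cur = find_node(nodes, count, cur->parent);
        depth++;
        if ((size_t)depth > count) {
            return -1;
        }
    }
    return cur != NULL ? depth : -1;
}
```

Fed with the first entries of the dump (socket 0x24, cluster 0x38, core 0x4C, thread 0xBC), the thread at 0xBC resolves to depth 3 and the socket at 0x24 to depth 0, which matches the thread/core/cluster/socket ordering.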
Hi Alireza,

Great to see your Arm side implementation!

On Fri, Aug 23, 2024 at 01:54:44PM +0100, Alireza Sanaee wrote:
> Date: Fri, 23 Aug 2024 13:54:44 +0100
> From: Alireza Sanaee <alireza.sanaee@huawei.com>
> Subject: [RFC PATCH 0/2] Specifying cache topology on ARM
> X-Mailer: git-send-email 2.34.1
>

[snip]

>
> The following command will represent the system.
>
> ./qemu-system-aarch64 \
>  -machine virt,**smp-cache=cache0** \
>  -cpu max \
>  -m 2048 \
>  -smp sockets=2,clusters=1,cores=2,threads=2 \
>  -kernel ./Image.gz \
>  -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
>  -initrd rootfs.cpio.gz \
>  -bios ./edk2-aarch64-code.fd \
>  **-object '{"qom-type":"smp-cache","id":"cache0","caches":[{"name":"l1d","topo":"core"},{"name":"l1i","topo":"core"},{"name":"l2","topo":"cluster"},{"name":"l3","topo":"socket"}]}'** \
>  -nographic

I plan to post a refreshed version soon, in which the smp-cache array
will be fully integrated into -machine. I'll cc you then.

Regards,
Zhao
On Sat, 31 Aug 2024 19:25:47 +0800
Zhao Liu <zhao1.liu@intel.com> wrote:

> Hi Alireza,
>
> Great to see your Arm side implementation!
>
> On Fri, Aug 23, 2024 at 01:54:44PM +0100, Alireza Sanaee wrote:
> > Date: Fri, 23 Aug 2024 13:54:44 +0100
> > From: Alireza Sanaee <alireza.sanaee@huawei.com>
> > Subject: [RFC PATCH 0/2] Specifying cache topology on ARM
> > X-Mailer: git-send-email 2.34.1
> >
>
> [snip]
>
> >
> > The following command will represent the system.
> >
> > ./qemu-system-aarch64 \
> > -machine virt,**smp-cache=cache0** \
> > -cpu max \
> > -m 2048 \
> > -smp sockets=2,clusters=1,cores=2,threads=2 \
> > -kernel ./Image.gz \
> > -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
> > -initrd rootfs.cpio.gz \
> > -bios ./edk2-aarch64-code.fd \
> > **-object
> > '{"qom-type":"smp-cache","id":"cache0","caches":[{"name":"l1d","topo":"core"},{"name":"l1i","topo":"core"},{"name":"l2","topo":"cluster"},{"name":"l3","topo":"socket"}]}'**
> > \ -nographic
>
> I plan to refresh a new version soon, in which the smp-cache array
> will be integrated into -machine totally. And I'cc you then.
>
> Regards,
> Zhao
>
>

Hi Zhao,

Yes, please keep me CCed.

One thing I noticed: since you were going down the Intel path, some
variables could never be NULL. But when I went down the ARM path, I hit
some scenarios where I ended up with uninitialized variables, which is
still OK but could have been avoided.

Looking forward to the next revision.

Alireza
On Mon, Sep 02, 2024 at 11:25:19AM +0100, Alireza Sanaee wrote:
>
> Hi Zhao,
>
> Yes, please keep me CCed.
>
> One thing that I noticed, sometimes, since you were going down the
> Intel path, some variables couldn't be NULL. But when I was gonna go
> down to ARM path, I faced some scenarios where I ended up with
> some uninit vars which is still OK but could have been avoided.

Ah, I didn't quite get your point. Could you please point out those
places in my patches? Then I can fix them in my next version. :)

Thanks,
Zhao

> Looking forward to the next revision.
>
> Alireza