[RFC 8/8] qemu-options: Add the cache topology description of -smp

Zhao Liu posted 8 patches 8 months, 3 weeks ago
There is a newer version of this series
Posted by Zhao Liu 8 months, 3 weeks ago
From: Zhao Liu <zhao1.liu@intel.com>

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
 qemu-options.hx | 54 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 47 insertions(+), 7 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 70eaf3256685..85c78c99a3b0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -281,7 +281,9 @@ ERST
 
 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
     "-smp [[cpus=]n][,maxcpus=maxcpus][,drawers=drawers][,books=books][,sockets=sockets]\n"
-    "               [,dies=dies][,clusters=clusters][,cores=cores][,threads=threads]\n"
+    "               [,dies=dies][,clusters=clusters][,modules=modules][,cores=cores]\n"
+    "               [,threads=threads][,l1d-cache=level][,l1i-cache=level][,l2-cache=level]\n"
+    "               [,l3-cache=level]\n"
     "                set the number of initial CPUs to 'n' [default=1]\n"
     "                maxcpus= maximum number of total CPUs, including\n"
     "                offline CPUs for hotplug, etc\n"
@@ -290,9 +292,14 @@ DEF("smp", HAS_ARG, QEMU_OPTION_smp,
     "                sockets= number of sockets in one book\n"
     "                dies= number of dies in one socket\n"
     "                clusters= number of clusters in one die\n"
-    "                cores= number of cores in one cluster\n"
+    "                modules= number of modules in one cluster\n"
+    "                cores= number of cores in one module\n"
     "                threads= number of threads in one core\n"
-    "Note: Different machines may have different subsets of the CPU topology\n"
+    "                l1d-cache= topology level of L1 D-cache\n"
+    "                l1i-cache= topology level of L1 I-cache\n"
+    "                l2-cache= topology level of L2 cache\n"
+    "                l3-cache= topology level of L3 cache\n"
+    "Note: Different machines may have different subsets of the CPU and cache topology\n"
     "      parameters supported, so the actual meaning of the supported parameters\n"
     "      will vary accordingly. For example, for a machine type that supports a\n"
     "      three-level CPU hierarchy of sockets/cores/threads, the parameters will\n"
@@ -306,7 +313,7 @@ DEF("smp", HAS_ARG, QEMU_OPTION_smp,
     "      must be set as 1 in the purpose of correct parsing.\n",
     QEMU_ARCH_ALL)
 SRST
-``-smp [[cpus=]n][,maxcpus=maxcpus][,sockets=sockets][,dies=dies][,clusters=clusters][,cores=cores][,threads=threads]``
+``-smp [[cpus=]n][,maxcpus=maxcpus][,drawers=drawers][,books=books][,sockets=sockets][,dies=dies][,clusters=clusters][,modules=modules][,cores=cores][,threads=threads][,l1d-cache=level][,l1i-cache=level][,l2-cache=level][,l3-cache=level]``
     Simulate a SMP system with '\ ``n``\ ' CPUs initially present on
     the machine type board. On boards supporting CPU hotplug, the optional
     '\ ``maxcpus``\ ' parameter can be set to enable further CPUs to be
@@ -320,15 +327,34 @@ SRST
     Both parameters are subject to an upper limit that is determined by
     the specific machine type chosen.
 
+    CPU topology parameters include '\ ``drawers``\ ', '\ ``books``\ ',
+    '\ ``sockets``\ ', '\ ``dies``\ ', '\ ``clusters``\ ', '\ ``modules``\ ',
+    '\ ``cores``\ ' and '\ ``threads``\ '. These CPU parameters accept only
+    integers and are used to specify the number of specific topology domains
+    under the corresponding topology level.
+
     To control reporting of CPU topology information, values of the topology
     parameters can be specified. Machines may only support a subset of the
-    parameters and different machines may have different subsets supported
-    which vary depending on capacity of the corresponding CPU targets. So
-    for a particular machine type board, an expected topology hierarchy can
+    CPU topology parameters and different machines may have different subsets
+    supported which vary depending on capacity of the corresponding CPU targets.
+    So for a particular machine type board, an expected topology hierarchy can
     be defined through the supported sub-option. Unsupported parameters can
     also be provided in addition to the sub-option, but their values must be
     set as 1 in the purpose of correct parsing.
 
+    Cache topology parameters include '\ ``l1d-cache``\ ', '\ ``l1i-cache``\ ',
+    '\ ``l2-cache``\ ' and '\ ``l3-cache``\ '. These cache topology parameters
+    accept the strings of CPU topology levels (such as '\ ``drawer``\ ', '\ ``book``\ ',
+    '\ ``socket``\ ', '\ ``die``\ ', '\ ``cluster``\ ', '\ ``module``\ ',
+    '\ ``core``\ ' or '\ ``thread``\ '). Exactly which topology level strings
+    are accepted depends on the machine's support for the
+    corresponding CPU topology level.
+
+    Machines may also only support a subset of the cache topology parameters.
+    Unsupported cache topology parameters will be omitted, and correspondingly,
+    the target CPU will use its default cache topology
+    setting.
+
     Either the initial CPU count, or at least one of the topology parameters
     must be specified. The specified parameters must be greater than zero,
     explicit configuration like "cpus=0" is not allowed. Values for any
@@ -354,6 +380,20 @@ SRST
 
         -smp 32,sockets=2,dies=2,modules=2,cores=2,threads=2,maxcpus=32
 
+    The following sub-option defines a CPU topology hierarchy (2 sockets
+    totally on the machine, 2 dies per socket, 2 modules per die, 2 cores per
+    module, 2 threads per core) with a 3-level cache topology hierarchy (L1
+    D-cache per core, L1 I-cache per core, L2 cache per core and L3 cache per
+    die) for PC machines which support sockets/dies/modules/cores/threads.
+    Some members of the CPU topology option can be omitted but their values
+    will be automatically computed. Some members of the cache topology
+    option can also be omitted, and the target CPU will use the default topology:
+
+    ::
+
+        -smp 32,sockets=2,dies=2,modules=2,cores=2,threads=2,maxcpus=32,\
+             l1d-cache=core,l1i-cache=core,l2-cache=core,l3-cache=die
+
     The following sub-option defines a CPU topology hierarchy (2 sockets
     totally on the machine, 2 clusters per socket, 2 cores per cluster,
     2 threads per core) for ARM virt machines which support sockets/clusters
-- 
2.34.1
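
For reference, the arithmetic behind the cache example in the documentation above can be sketched in C. This is only an illustration under stated assumptions: the enum, the `counts` array layout, and both helper functions are hypothetical names invented here, not QEMU's internal definitions.

```c
#include <stdio.h>

/* Illustrative names only; these are not QEMU's internal definitions. */
enum topo_level {
    TOPO_THREAD,
    TOPO_CORE,
    TOPO_MODULE,
    TOPO_DIE,
    TOPO_SOCKET,
    TOPO_MACHINE,   /* pseudo level: the whole machine */
};

/*
 * counts[l] is the number of level-l domains inside one parent domain,
 * e.g. counts[TOPO_CORE] is "cores per module" and counts[TOPO_SOCKET]
 * is the number of sockets on the machine.
 */
static unsigned threads_per_domain(const unsigned counts[TOPO_MACHINE],
                                   enum topo_level level)
{
    unsigned n = 1;

    /* Multiply the counts of every level below the requested one. */
    for (int l = TOPO_THREAD; l < (int)level; l++) {
        n *= counts[l];
    }
    return n;
}

/* Number of cache instances when one cache is shared per 'level' domain. */
static unsigned cache_instances(const unsigned counts[TOPO_MACHINE],
                                enum topo_level level)
{
    return threads_per_domain(counts, TOPO_MACHINE) /
           threads_per_domain(counts, level);
}

/* Print the sharing layout for sockets=2,dies=2,modules=2,cores=2,threads=2. */
static void print_example(void)
{
    const unsigned counts[TOPO_MACHINE] = { 2, 2, 2, 2, 2 };

    printf("L1/L2 per core: %u caches, %u threads each\n",
           cache_instances(counts, TOPO_CORE),
           threads_per_domain(counts, TOPO_CORE));
    printf("L3 per die: %u caches, %u threads each\n",
           cache_instances(counts, TOPO_DIE),
           threads_per_domain(counts, TOPO_DIE));
}
```

With the values from the example, l3-cache=die gives 4 L3 caches shared by 8 threads each, while the per-core L1 and L2 caches come to 16 instances shared by 2 threads each.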
Re: [RFC 8/8] qemu-options: Add the cache topology description of -smp
Posted by Jonathan Cameron 8 months, 3 weeks ago
On Tue, 20 Feb 2024 17:25:04 +0800
Zhao Liu <zhao1.liu@linux.intel.com> wrote:

> From: Zhao Liu <zhao1.liu@intel.com>
> 
> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>

Hi,

A trivial comment, but also a possibly more significant one about
whether the defaults are correctly verified.

Jonathan
> ---
>  qemu-options.hx | 54 ++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 47 insertions(+), 7 deletions(-)
> 
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 70eaf3256685..85c78c99a3b0 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -281,7 +281,9 @@ ERST
>  
>  DEF("smp", HAS_ARG, QEMU_OPTION_smp,
>      "-smp [[cpus=]n][,maxcpus=maxcpus][,drawers=drawers][,books=books][,sockets=sockets]\n"
> -    "               [,dies=dies][,clusters=clusters][,cores=cores][,threads=threads]\n"
> +    "               [,dies=dies][,clusters=clusters][,modules=modules][,cores=cores]\n"
> +    "               [,threads=threads][,l1d-cache=level][,l1i-cache=level][,l2-cache=level]\n"
This burns more characters, but I'd go with
l1d-cache=topo_level

As level for a cache has a totally different meaning!

> +    "               [,l3-cache=level]\n"
>      "                set the number of initial CPUs to 'n' [default=1]\n"
>      "                maxcpus= maximum number of total CPUs, including\n"
>      "                offline CPUs for hotplug, etc\n"
> @@ -290,9 +292,14 @@ DEF("smp", HAS_ARG, QEMU_OPTION_smp,
>      "                sockets= number of sockets in one book\n"
>      "                dies= number of dies in one socket\n"
>      "                clusters= number of clusters in one die\n"
> -    "                cores= number of cores in one cluster\n"
> +    "                modules= number of modules in one cluster\n"
> +    "                cores= number of cores in one module\n"
>      "                threads= number of threads in one core\n"
> -    "Note: Different machines may have different subsets of the CPU topology\n"
> +    "                l1d-cache= topology level of L1 D-cache\n"
> +    "                l1i-cache= topology level of L1 I-cache\n"
> +    "                l2-cache= topology level of L2 cache\n"
> +    "                l3-cache= topology level of L3 cache\n"
> +    "Note: Different machines may have different subsets of the CPU and cache topology\n"

>  
>          -smp 32,sockets=2,dies=2,modules=2,cores=2,threads=2,maxcpus=32
>  
> +    The following sub-option defines a CPU topology hierarchy (2 sockets
> +    totally on the machine, 2 dies per socket, 2 modules per die, 2 cores per
> +    module, 2 threads per core) with 3-level cache topology hierarchy (L1
> +    D-cache per core, L1 I-cache per core, L2 cache per core and L3 cache per
> +    die) for PC machines which support sockets/dies/modules/cores/threads.
> +    Some members of the CPU topology option can be omitted but their values
> +    will be automatically computed. Some members of the cache topology
> +    option can also be omitted and target CPU will use the default topology.:

Given the default could be inconsistent, I wonder if we should 'push' levels
up. So if L2 is not defined, it is set either to the default or to the max of
the l1i and l1d levels. L3 is either the default or the same level as L2.

That won't always correspond to a sensible system, so maybe just rejecting
cases where the default isn't possible is the best plan. However, I don't
see that verification, as the checks on higher levels are gated on them
being specified.
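
To make the missing check concrete, here is a minimal sketch of validating the ordering after the defaults have been filled in, so that omitted levels are checked too. All names are hypothetical and invented for illustration; this is not the code in the series, and it assumes that an unspecified cache is represented by an "invalid" level.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; these are not QEMU's definitions. */
enum topo_level {
    TOPO_INVALID = 0,   /* cache topology level not specified by the user */
    TOPO_THREAD,
    TOPO_CORE,
    TOPO_MODULE,
    TOPO_CLUSTER,
    TOPO_DIE,
    TOPO_SOCKET,
};

struct cache_topo {
    enum topo_level l1d, l1i, l2, l3;
};

/* Use the user's value when given, otherwise the CPU model's default. */
static enum topo_level resolve(enum topo_level user, enum topo_level dflt)
{
    return user != TOPO_INVALID ? user : dflt;
}

/*
 * Return true if, once defaults are filled in, no cache is shared at a
 * narrower topology level than the cache below it. The check is not
 * gated on the higher levels being specified by the user.
 */
static bool cache_topo_is_consistent(const struct cache_topo *user,
                                     const struct cache_topo *dflt)
{
    enum topo_level l1d = resolve(user->l1d, dflt->l1d);
    enum topo_level l1i = resolve(user->l1i, dflt->l1i);
    enum topo_level l2 = resolve(user->l2, dflt->l2);
    enum topo_level l3 = resolve(user->l3, dflt->l3);
    enum topo_level l1 = l1d > l1i ? l1d : l1i;

    return l2 >= l1 && l3 >= l2;
}
```

With defaults of L1/L2 per core and L3 per die, a user setting only l2-cache=socket would be rejected by this check, because the default L3 level (die) is narrower than the requested L2 level.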

> +
> +    ::
> +
> +        -smp 32,sockets=2,dies=2,modules=2,cores=2,threads=2,maxcpus=32,\
> +             l1d-cache=core,l1i-cache=core,l2-cache=core,l3-cache=die
> +
>      The following sub-option defines a CPU topology hierarchy (2 sockets
>      totally on the machine, 2 clusters per socket, 2 cores per cluster,
>      2 threads per core) for ARM virt machines which support sockets/clusters
Re: [RFC 8/8] qemu-options: Add the cache topology description of -smp
Posted by Zhao Liu 8 months, 2 weeks ago
Hi Jonathan,

On Mon, Feb 26, 2024 at 03:47:34PM +0000, Jonathan Cameron wrote:
> Date: Mon, 26 Feb 2024 15:47:34 +0000
> From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
> Subject: Re: [RFC 8/8] qemu-options: Add the cache topology description of
>  -smp
> 
> On Tue, 20 Feb 2024 17:25:04 +0800
> Zhao Liu <zhao1.liu@linux.intel.com> wrote:
> 
> > From: Zhao Liu <zhao1.liu@intel.com>
> > 
> > Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> 
> Hi,
> 
> A trivial comment, but also a possibly more significant one about
> whether the defaults are correctly verified.
> 
> Jonathan
> > ---
> >  qemu-options.hx | 54 ++++++++++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 47 insertions(+), 7 deletions(-)
> > 
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 70eaf3256685..85c78c99a3b0 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -281,7 +281,9 @@ ERST
> >  
> >  DEF("smp", HAS_ARG, QEMU_OPTION_smp,
> >      "-smp [[cpus=]n][,maxcpus=maxcpus][,drawers=drawers][,books=books][,sockets=sockets]\n"
> > -    "               [,dies=dies][,clusters=clusters][,cores=cores][,threads=threads]\n"
> > +    "               [,dies=dies][,clusters=clusters][,modules=modules][,cores=cores]\n"
> > +    "               [,threads=threads][,l1d-cache=level][,l1i-cache=level][,l2-cache=level]\n"
> This burns more characters, but I'd go with
> l1d-cache=topo_level
> 
> As level for a cache has a totally different meaning!

Yes, good catch! Thanks.

> 
> > +    "               [,l3-cache=level]\n"
> >      "                set the number of initial CPUs to 'n' [default=1]\n"
> >      "                maxcpus= maximum number of total CPUs, including\n"
> >      "                offline CPUs for hotplug, etc\n"
> > @@ -290,9 +292,14 @@ DEF("smp", HAS_ARG, QEMU_OPTION_smp,
> >      "                sockets= number of sockets in one book\n"
> >      "                dies= number of dies in one socket\n"
> >      "                clusters= number of clusters in one die\n"
> > -    "                cores= number of cores in one cluster\n"
> > +    "                modules= number of modules in one cluster\n"
> > +    "                cores= number of cores in one module\n"
> >      "                threads= number of threads in one core\n"
> > -    "Note: Different machines may have different subsets of the CPU topology\n"
> > +    "                l1d-cache= topology level of L1 D-cache\n"
> > +    "                l1i-cache= topology level of L1 I-cache\n"
> > +    "                l2-cache= topology level of L2 cache\n"
> > +    "                l3-cache= topology level of L3 cache\n"
> > +    "Note: Different machines may have different subsets of the CPU and cache topology\n"
> 
> >  
> >          -smp 32,sockets=2,dies=2,modules=2,cores=2,threads=2,maxcpus=32
> >  
> > +    The following sub-option defines a CPU topology hierarchy (2 sockets
> > +    totally on the machine, 2 dies per socket, 2 modules per die, 2 cores per
> > +    module, 2 threads per core) with 3-level cache topology hierarchy (L1
> > +    D-cache per core, L1 I-cache per core, L2 cache per core and L3 cache per
> > +    die) for PC machines which support sockets/dies/modules/cores/threads.
> > +    Some members of the CPU topology option can be omitted but their values
> > +    will be automatically computed. Some members of the cache topology
> > +    option can also be omitted and target CPU will use the default topology.:
> 
> Given the default could be inconsistent, I wonder if we should 'push' levels
> up. So if L2 is not defined, it is set either to the default or to the max of
> the l1i and l1d levels. L3 is either the default or the same level as L2.

Hmm, IIUC, there may be a case like this:

The user sets the L2 cache as per core and omits the L3 cache. In this case,
if L3 is per core (same as L2) by default, how could we tell whether that
per-core L3 is the default or came from the user? We need to tell them apart
because x86's L3 is shared at die level by default and L2 is shared at core
level for current CPU models.

To resolve this issue, we could add a status field in SMPCompatProps,
e.g., has_l3_cache, just like the current SMPCompatProps.has_clusters, to
explicitly indicate that the L3 cache topology was set by the user.

Then the other caches would also need similar fields... It doesn't look as
simple as the current approach of defaulting to an invalid topology level.
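
A rough sketch of what that extra state would look like, with made-up names (this is not the actual SMPCompatProps layout, and `effective_l3` is a hypothetical helper): each cache would carry a flag saying whether the user set it, and the 'push up' rule would only apply when the flag is clear.

```c
#include <stdbool.h>

/* Made-up names for illustration; not QEMU's actual definitions. */
enum topo_level {
    TOPO_THREAD,
    TOPO_CORE,
    TOPO_MODULE,
    TOPO_DIE,
    TOPO_SOCKET,
};

struct cache_level_cfg {
    bool has_value;          /* true only when the user passed the option */
    enum topo_level level;   /* meaningful only when has_value is true */
};

/*
 * Pick the L3 topology level: an explicit user value always wins;
 * otherwise use the CPU model default, pushed up to the L2 level if
 * the default would be narrower than L2.
 */
static enum topo_level effective_l3(struct cache_level_cfg l3,
                                    enum topo_level l2,
                                    enum topo_level model_default)
{
    if (l3.has_value) {
        return l3.level;
    }
    return model_default >= l2 ? model_default : l2;
}
```

The flag is what lets a user-requested per-core L3 be told apart from a pushed-up or default one, and the same pair of fields would have to be repeated for l1d, l1i and l2.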
 
> That won't always correspond to a sensible system, so maybe just rejecting
> cases where the default isn't possible is the best plan. However, I don't
> see that verification, as the checks on higher levels are gated on them
> being specified.
> 
> > +
> > +    ::
> > +
> > +        -smp 32,sockets=2,dies=2,modules=2,cores=2,threads=2,maxcpus=32,\
> > +             l1d-cache=core,l1i-cache=core,l2-cache=core,l3-cache=die
> > +
> >      The following sub-option defines a CPU topology hierarchy (2 sockets
> >      totally on the machine, 2 clusters per socket, 2 cores per cluster,
> >      2 threads per core) for ARM virt machines which support sockets/clusters
>