Improve versioned CPU support in libvirt

[PATCH v3 01/12] cpu_map: update script to handle versioned CPUs

Posted by Jonathon Jongsma 1 year, 8 months ago

Previously, the script only generated the parent CPU and any versions
that had a defined alias. The script now generates all CPU versions. Any
version that had a defined alias will continue to use that alias, but
those without aliases will use the generated name $BASECPUNAME-vN.

The reason for this change is two-fold. First, we need to add new models
that support new features (such as SEV-SNP). To deal with this, the
script now generates model definitions for all versions.

But we also need to ensure that our CPU definitions are migration-safe.
To deal with this issue we need to make sure we're always using the
canonical versioned names for CPUs.

Qemu documentation states that unversioned names for CPU models (e.g.
'EPYC') are actually aliases to a specific versioned CPU model (e.g.
'EPYC-v1'). The documentation further states that the specific version
targeted by the alias may change based on the machine type of the
domain. Management software such as libvirt is directed to translate
these aliases to a concrete version in order to make sure that the CPU
definition is safe when migrating between different qemu versions that
may make different choices for which underlying versioned model
represents the alias.

In practice, at the time of writing qemu always maps the unversioned
aliases to the -v1 model. And libvirt's CPU model definitions also
assume that this is the case. For example, the 'x86_EPYC.xml' file
contains the features that are defined for the EPYC-v1 inside of qemu.
But if qemu ever changes their alias mapping, libvirt's idea of what an
'EPYC' CPU means and qemu's idea of what an 'EPYC' CPU means will no
longer match. So when choosing a CPU model for a domain, we should
always pass the canonical versioned name to libvirt rather than the
unversioned alias. To enable this, the script will generate a new
'canonical_name' field to the CPU model xml definition.

Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
---
 src/cpu_map/sync_qemu_models_i386.py | 42 ++++++++++++++++++++++------
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/src/cpu_map/sync_qemu_models_i386.py b/src/cpu_map/sync_qemu_models_i386.py
index 1c6a2d4d27..7fd62eba4a 100755
--- a/src/cpu_map/sync_qemu_models_i386.py
+++ b/src/cpu_map/sync_qemu_models_i386.py
@@ -322,31 +322,55 @@ def expand_model(model):
     different fields and may have differing versions into several libvirt-
     friendly cpu models."""
 
-    result = {
-        "name": model.pop(".name"),
+    basename = model.pop(".name")
+    parent = {
+        "name": basename,
         "vendor": translate_vendor(model.pop(".vendor")),
         "features": set(),
         "extra": dict()}
 
     if ".family" in model and ".model" in model:
-        result["family"] = model.pop(".family")
-        result["model"] = model.pop(".model")
+        parent["family"] = model.pop(".family")
+        parent["model"] = model.pop(".model")
 
     for k in [k for k in model if k.startswith(".features")]:
         v = model.pop(k)
         for feature in v.split():
             translated = translate_feature(feature)
             if translated:
-                result["features"].add(translated)
+                parent["features"].add(translated)
 
     versions = model.pop(".versions", [])
     for k, v in model.items():
-        result["extra"]["model" + k] = v
-    yield result
+        parent["extra"]["model" + k] = v
+
+    if not versions:
+        yield parent
+        return
+
+    result = parent
 
     for version in versions:
+        # each version builds on the previous one
         result = copy.deepcopy(result)
-        result["name"] = version.pop(".alias", result["name"])
+        vnum = int(version.pop(".version"))
+        vname = "{}-v{}".format(basename, vnum)
+        result["canonical_name"] = vname
+        if vnum == 1:
+            # the first version should always be an alias for the parent and
+            # should therefore have no extra properties
+            if version.items():
+                raise RuntimeError("Unexpected properties in version 1")
+            yield result
+            continue
+
+        # prefer the 'alias' over the generated the name if it exists since we
+        # have already been using these aliases
+        alias = version.pop(".alias", None)
+        if alias:
+            result["name"] = alias
+        else:
+            result["name"] = vname
 
         props = version.pop(".props", dict())
         for k, v in props:
@@ -377,6 +401,8 @@ def output_model(f, model):
 
     f.write("<cpus>\n")
     f.write("  <model name='{}'>\n".format(model["name"]))
+    if "canonical_name" in model and model["name"] != model["canonical_name"]:
+        f.write("    <canonical_name>{}</canonical_name>\n".format(model["canonical_name"]))
     f.write("    <decode host='on' guest='on'/>\n")
     f.write("    <signature family='{}' model='{}'/>\n".format(
         model["family"], model["model"]))
-- 
2.41.0
_______________________________________________
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-leave@lists.libvirt.org

Re: [PATCH v3 01/12] cpu_map: update script to handle versioned CPUs

Posted by Daniel P. Berrangé 1 year, 6 months ago

On Fri, Dec 15, 2023 at 04:11:57PM -0600, Jonathon Jongsma wrote:
> Previously, the script only generated the parent CPU and any versions
> that had a defined alias. The script now generates all CPU versions. Any
> version that had a defined alias will continue to use that alias, but
> those without aliases will use the generated name $BASECPUNAME-vN.
> 
> The reason for this change is two-fold. First, we need to add new models
> that support new features (such as SEV-SNP). To deal with this, the
> script now generates model definitions for all versions.
> 
> But we also need to ensure that our CPU definitions are migration-safe.
> To deal with this issue we need to make sure we're always using the
> canonical versioned names for CPUs.
> 
> Qemu documentation states that unversioned names for CPU models (e.g.
> 'EPYC') are actually aliases to a specific versioned CPU model (e.g.
> 'EPYC-v1'). The documentation further states that the specific version
> targeted by the alias may change based on the machine type of the
> domain. Management software such as libvirt is directed to translate
> these aliases to a concrete version in order to make sure that the CPU
> definition is safe when migrating between different qemu versions that
> may make different choices for which underlying versioned model
> represents the alias.

Our CPU models are already migration safe, since we expand the
unversioned machine type to a versioned machine, and thus as
a result our unversioned CPU model is implicitly turned into
the versioned CPU model associated with that versioned machine
type.

It is still compelling for libvirt to deal with versioned CPU
models, but we don't directly need that in order to be migration
safe.

Where versioned models are interesting is if the applicatio
does not care about the specific machine type, so just asks
for 'q35', but /does/ care about a specific CPU model in
order to establish migration compat between a certain set of
hosts. Letting them give a versioned CPU model gives more
direct control which is good.

> 
> In practice, at the time of writing qemu always maps the unversioned
> aliases to the -v1 model. And libvirt's CPU model definitions also
> assume that this is the case. For example, the 'x86_EPYC.xml' file
> contains the features that are defined for the EPYC-v1 inside of qemu.
> But if qemu ever changes their alias mapping, libvirt's idea of what an
> 'EPYC' CPU means and qemu's idea of what an 'EPYC' CPU means will no
> longer match. So when choosing a CPU model for a domain, we should
> always pass the canonical versioned name to libvirt rather than the
> unversioned alias. To enable this, the script will generate a new
> 'canonical_name' field to the CPU model xml definition.
> 
> Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
> ---
>  src/cpu_map/sync_qemu_models_i386.py | 42 ++++++++++++++++++++++------
>  1 file changed, 34 insertions(+), 8 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
_______________________________________________
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-leave@lists.libvirt.org

Re: [PATCH v3 01/12] cpu_map: update script to handle versioned CPUs

Posted by Jim Fehlig 1 year, 6 months ago

On 12/15/23 15:11, Jonathon Jongsma wrote:
> Previously, the script only generated the parent CPU and any versions
> that had a defined alias. The script now generates all CPU versions. Any
> version that had a defined alias will continue to use that alias, but
> those without aliases will use the generated name $BASECPUNAME-vN.
> 
> The reason for this change is two-fold. First, we need to add new models
> that support new features (such as SEV-SNP). To deal with this, the
> script now generates model definitions for all versions.
> 
> But we also need to ensure that our CPU definitions are migration-safe.
> To deal with this issue we need to make sure we're always using the
> canonical versioned names for CPUs.

Related to migration safety, do we need to be concerned with the expansion of 
'host-model' CPU? E.g. is it possible 'host-model' expands to EPYC before 
introducing the new models, and EPYC-v4 afterwards? If so, what are the 
ramifications of that?

> Qemu documentation states that unversioned names for CPU models (e.g.
> 'EPYC') are actually aliases to a specific versioned CPU model (e.g.
> 'EPYC-v1'). The documentation further states that the specific version
> targeted by the alias may change based on the machine type of the
> domain. Management software such as libvirt is directed to translate
> these aliases to a concrete version in order to make sure that the CPU
> definition is safe when migrating between different qemu versions that
> may make different choices for which underlying versioned model
> represents the alias.

Do you have a link to the qemu documentation? Can it be added to the commit message?

> In practice, at the time of writing qemu always maps the unversioned
> aliases to the -v1 model. And libvirt's CPU model definitions also
> assume that this is the case. For example, the 'x86_EPYC.xml' file
> contains the features that are defined for the EPYC-v1 inside of qemu.
> But if qemu ever changes their alias mapping, libvirt's idea of what an
> 'EPYC' CPU means and qemu's idea of what an 'EPYC' CPU means will no
> longer match. So when choosing a CPU model for a domain, we should
> always pass the canonical versioned name to libvirt rather than the
> unversioned alias. To enable this, the script will generate a new
> 'canonical_name' field to the CPU model xml definition.
> 
> Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
> ---
>   src/cpu_map/sync_qemu_models_i386.py | 42 ++++++++++++++++++++++------
>   1 file changed, 34 insertions(+), 8 deletions(-)

The logic changes to the script LGTM.

Reviewed-by: Jim Fehlig <jfehlig@suse.com>

Regards,
Jim

> 
> diff --git a/src/cpu_map/sync_qemu_models_i386.py b/src/cpu_map/sync_qemu_models_i386.py
> index 1c6a2d4d27..7fd62eba4a 100755
> --- a/src/cpu_map/sync_qemu_models_i386.py
> +++ b/src/cpu_map/sync_qemu_models_i386.py
> @@ -322,31 +322,55 @@ def expand_model(model):
>       different fields and may have differing versions into several libvirt-
>       friendly cpu models."""
>   
> -    result = {
> -        "name": model.pop(".name"),
> +    basename = model.pop(".name")
> +    parent = {
> +        "name": basename,
>           "vendor": translate_vendor(model.pop(".vendor")),
>           "features": set(),
>           "extra": dict()}
>   
>       if ".family" in model and ".model" in model:
> -        result["family"] = model.pop(".family")
> -        result["model"] = model.pop(".model")
> +        parent["family"] = model.pop(".family")
> +        parent["model"] = model.pop(".model")
>   
>       for k in [k for k in model if k.startswith(".features")]:
>           v = model.pop(k)
>           for feature in v.split():
>               translated = translate_feature(feature)
>               if translated:
> -                result["features"].add(translated)
> +                parent["features"].add(translated)
>   
>       versions = model.pop(".versions", [])
>       for k, v in model.items():
> -        result["extra"]["model" + k] = v
> -    yield result
> +        parent["extra"]["model" + k] = v
> +
> +    if not versions:
> +        yield parent
> +        return
> +
> +    result = parent
>   
>       for version in versions:
> +        # each version builds on the previous one
>           result = copy.deepcopy(result)
> -        result["name"] = version.pop(".alias", result["name"])
> +        vnum = int(version.pop(".version"))
> +        vname = "{}-v{}".format(basename, vnum)
> +        result["canonical_name"] = vname
> +        if vnum == 1:
> +            # the first version should always be an alias for the parent and
> +            # should therefore have no extra properties
> +            if version.items():
> +                raise RuntimeError("Unexpected properties in version 1")
> +            yield result
> +            continue
> +
> +        # prefer the 'alias' over the generated the name if it exists since we
> +        # have already been using these aliases
> +        alias = version.pop(".alias", None)
> +        if alias:
> +            result["name"] = alias
> +        else:
> +            result["name"] = vname
>   
>           props = version.pop(".props", dict())
>           for k, v in props:
> @@ -377,6 +401,8 @@ def output_model(f, model):
>   
>       f.write("<cpus>\n")
>       f.write("  <model name='{}'>\n".format(model["name"]))
> +    if "canonical_name" in model and model["name"] != model["canonical_name"]:
> +        f.write("    <canonical_name>{}</canonical_name>\n".format(model["canonical_name"]))
>       f.write("    <decode host='on' guest='on'/>\n")
>       f.write("    <signature family='{}' model='{}'/>\n".format(
>           model["family"], model["model"]))
_______________________________________________
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-leave@lists.libvirt.org

Re: [PATCH v3 01/12] cpu_map: update script to handle versioned CPUs

Posted by Daniel P. Berrangé 1 year, 6 months ago

On Tue, Feb 20, 2024 at 05:08:02PM -0700, Jim Fehlig wrote:
> On 12/15/23 15:11, Jonathon Jongsma wrote:
> > Previously, the script only generated the parent CPU and any versions
> > that had a defined alias. The script now generates all CPU versions. Any
> > version that had a defined alias will continue to use that alias, but
> > those without aliases will use the generated name $BASECPUNAME-vN.
> > 
> > The reason for this change is two-fold. First, we need to add new models
> > that support new features (such as SEV-SNP). To deal with this, the
> > script now generates model definitions for all versions.
> > 
> > But we also need to ensure that our CPU definitions are migration-safe.
> > To deal with this issue we need to make sure we're always using the
> > canonical versioned names for CPUs.
> 
> Related to migration safety, do we need to be concerned with the expansion
> of 'host-model' CPU? E.g. is it possible 'host-model' expands to EPYC before
> introducing the new models, and EPYC-v4 afterwards? If so, what are the
> ramifications of that?

Yes, I see that happening on my laptop in domcapabilities:

Currently libvirt reports:

    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Snowridge</model>
      <vendor>Intel</vendor>
      <maxphysaddr mode='passthrough' limit='46'/>
      <feature policy='require' name='ss'/>
      <feature policy='require' name='vmx'/>
     ...snip...


and after this series it reports:

    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Snowridge-v4</model>
      <vendor>Intel</vendor>
      <maxphysaddr mode='passthrough' limit='46'/>
      <feature policy='require' name='ss'/>
      <feature policy='require' name='vmx'/>
     ...snip...


That's not wrong per-se, becasue Snowrigde-v4 has a smaller
delta against my host CPU.

The problem is that libvirt updates the *live* XML for the
guest with this expansion.  IIUC, if we now attempt to
live migrate to a compatible machine running older libvirt
the migrate will fail as old libvirt doesn't know the -v4
CPU.

I'm not sure how to address this ? 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
_______________________________________________
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-leave@lists.libvirt.org

Re: [PATCH v3 01/12] cpu_map: update script to handle versioned CPUs

Posted by Jonathon Jongsma 1 year, 6 months ago

On 3/1/24 10:13 AM, Daniel P. Berrangé wrote:
> On Tue, Feb 20, 2024 at 05:08:02PM -0700, Jim Fehlig wrote:
>> On 12/15/23 15:11, Jonathon Jongsma wrote:
>>> Previously, the script only generated the parent CPU and any versions
>>> that had a defined alias. The script now generates all CPU versions. Any
>>> version that had a defined alias will continue to use that alias, but
>>> those without aliases will use the generated name $BASECPUNAME-vN.
>>>
>>> The reason for this change is two-fold. First, we need to add new models
>>> that support new features (such as SEV-SNP). To deal with this, the
>>> script now generates model definitions for all versions.
>>>
>>> But we also need to ensure that our CPU definitions are migration-safe.
>>> To deal with this issue we need to make sure we're always using the
>>> canonical versioned names for CPUs.
>>
>> Related to migration safety, do we need to be concerned with the expansion
>> of 'host-model' CPU? E.g. is it possible 'host-model' expands to EPYC before
>> introducing the new models, and EPYC-v4 afterwards? If so, what are the
>> ramifications of that?
> 
> Yes, I see that happening on my laptop in domcapabilities:
> 
> Currently libvirt reports:
> 
>      <mode name='host-model' supported='yes'>
>        <model fallback='forbid'>Snowridge</model>
>        <vendor>Intel</vendor>
>        <maxphysaddr mode='passthrough' limit='46'/>
>        <feature policy='require' name='ss'/>
>        <feature policy='require' name='vmx'/>
>       ...snip...
> 
> 
> and after this series it reports:
> 
>      <mode name='host-model' supported='yes'>
>        <model fallback='forbid'>Snowridge-v4</model>
>        <vendor>Intel</vendor>
>        <maxphysaddr mode='passthrough' limit='46'/>
>        <feature policy='require' name='ss'/>
>        <feature policy='require' name='vmx'/>
>       ...snip...
> 
> 
> That's not wrong per-se, becasue Snowrigde-v4 has a smaller
> delta against my host CPU.
> 
> The problem is that libvirt updates the *live* XML for the
> guest with this expansion.  IIUC, if we now attempt to
> live migrate to a compatible machine running older libvirt
> the migrate will fail as old libvirt doesn't know the -v4
> CPU.
> 
> I'm not sure how to address this ?

But don't we have this issue any time we add a new CPU model to libvirt? 
Anytime there's a new model, it has the potential to be a closer match 
to the host CPU than an existing model definition was. As I mentioned in 
my previous reply, when e.g. the -noTSX CPU variants were added, didn't 
the same sort of thing (potentially) happen? Or am I doing something 
meaningfully different in this patch set than what happens in those 
scenarios?

Jonathon
_______________________________________________
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-leave@lists.libvirt.org

Re: [PATCH v3 01/12] cpu_map: update script to handle versioned CPUs

Posted by Daniel P. Berrangé 1 year, 6 months ago

On Fri, Mar 01, 2024 at 10:36:12AM -0600, Jonathon Jongsma wrote:
> On 3/1/24 10:13 AM, Daniel P. Berrangé wrote:
> > On Tue, Feb 20, 2024 at 05:08:02PM -0700, Jim Fehlig wrote:
> > > On 12/15/23 15:11, Jonathon Jongsma wrote:
> > > > Previously, the script only generated the parent CPU and any versions
> > > > that had a defined alias. The script now generates all CPU versions. Any
> > > > version that had a defined alias will continue to use that alias, but
> > > > those without aliases will use the generated name $BASECPUNAME-vN.
> > > > 
> > > > The reason for this change is two-fold. First, we need to add new models
> > > > that support new features (such as SEV-SNP). To deal with this, the
> > > > script now generates model definitions for all versions.
> > > > 
> > > > But we also need to ensure that our CPU definitions are migration-safe.
> > > > To deal with this issue we need to make sure we're always using the
> > > > canonical versioned names for CPUs.
> > > 
> > > Related to migration safety, do we need to be concerned with the expansion
> > > of 'host-model' CPU? E.g. is it possible 'host-model' expands to EPYC before
> > > introducing the new models, and EPYC-v4 afterwards? If so, what are the
> > > ramifications of that?
> > 
> > Yes, I see that happening on my laptop in domcapabilities:
> > 
> > Currently libvirt reports:
> > 
> >      <mode name='host-model' supported='yes'>
> >        <model fallback='forbid'>Snowridge</model>
> >        <vendor>Intel</vendor>
> >        <maxphysaddr mode='passthrough' limit='46'/>
> >        <feature policy='require' name='ss'/>
> >        <feature policy='require' name='vmx'/>
> >       ...snip...
> > 
> > 
> > and after this series it reports:
> > 
> >      <mode name='host-model' supported='yes'>
> >        <model fallback='forbid'>Snowridge-v4</model>
> >        <vendor>Intel</vendor>
> >        <maxphysaddr mode='passthrough' limit='46'/>
> >        <feature policy='require' name='ss'/>
> >        <feature policy='require' name='vmx'/>
> >       ...snip...
> > 
> > 
> > That's not wrong per-se, becasue Snowrigde-v4 has a smaller
> > delta against my host CPU.
> > 
> > The problem is that libvirt updates the *live* XML for the
> > guest with this expansion.  IIUC, if we now attempt to
> > live migrate to a compatible machine running older libvirt
> > the migrate will fail as old libvirt doesn't know the -v4
> > CPU.
> > 
> > I'm not sure how to address this ?
> 
> But don't we have this issue any time we add a new CPU model to libvirt?
> Anytime there's a new model, it has the potential to be a closer match to
> the host CPU than an existing model definition was. As I mentioned in my
> previous reply, when e.g. the -noTSX CPU variants were added, didn't the
> same sort of thing (potentially) happen? Or am I doing something
> meaningfully different in this patch set than what happens in those
> scenarios?

I think it probably /did/ happen, but that doesn't make it acceptable.
The noTSX stuff was the cause of massive amounts of compatibility pain
for mgmt apps, so the incompatibility in libvirt might have been glossed
over. We're adding alot of new versions here, so the possibly increasing
the visibility/impact of this libvirt change.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
_______________________________________________
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-leave@lists.libvirt.org

Re: [PATCH v3 01/12] cpu_map: update script to handle versioned CPUs

Posted by Jim Fehlig 1 year, 6 months ago

On 3/1/24 10:13, Daniel P. Berrangé wrote:
> On Fri, Mar 01, 2024 at 10:36:12AM -0600, Jonathon Jongsma wrote:
>> On 3/1/24 10:13 AM, Daniel P. Berrangé wrote:
>>> On Tue, Feb 20, 2024 at 05:08:02PM -0700, Jim Fehlig wrote:
>>>> On 12/15/23 15:11, Jonathon Jongsma wrote:
>>>>> Previously, the script only generated the parent CPU and any versions
>>>>> that had a defined alias. The script now generates all CPU versions. Any
>>>>> version that had a defined alias will continue to use that alias, but
>>>>> those without aliases will use the generated name $BASECPUNAME-vN.
>>>>>
>>>>> The reason for this change is two-fold. First, we need to add new models
>>>>> that support new features (such as SEV-SNP). To deal with this, the
>>>>> script now generates model definitions for all versions.
>>>>>
>>>>> But we also need to ensure that our CPU definitions are migration-safe.
>>>>> To deal with this issue we need to make sure we're always using the
>>>>> canonical versioned names for CPUs.
>>>>
>>>> Related to migration safety, do we need to be concerned with the expansion
>>>> of 'host-model' CPU? E.g. is it possible 'host-model' expands to EPYC before
>>>> introducing the new models, and EPYC-v4 afterwards? If so, what are the
>>>> ramifications of that?
>>>
>>> Yes, I see that happening on my laptop in domcapabilities:
>>>
>>> Currently libvirt reports:
>>>
>>>       <mode name='host-model' supported='yes'>
>>>         <model fallback='forbid'>Snowridge</model>
>>>         <vendor>Intel</vendor>
>>>         <maxphysaddr mode='passthrough' limit='46'/>
>>>         <feature policy='require' name='ss'/>
>>>         <feature policy='require' name='vmx'/>
>>>        ...snip...
>>>
>>>
>>> and after this series it reports:
>>>
>>>       <mode name='host-model' supported='yes'>
>>>         <model fallback='forbid'>Snowridge-v4</model>
>>>         <vendor>Intel</vendor>
>>>         <maxphysaddr mode='passthrough' limit='46'/>
>>>         <feature policy='require' name='ss'/>
>>>         <feature policy='require' name='vmx'/>
>>>        ...snip...
>>>
>>>
>>> That's not wrong per-se, becasue Snowrigde-v4 has a smaller
>>> delta against my host CPU.
>>>
>>> The problem is that libvirt updates the *live* XML for the
>>> guest with this expansion.  IIUC, if we now attempt to
>>> live migrate to a compatible machine running older libvirt
>>> the migrate will fail as old libvirt doesn't know the -v4
>>> CPU.

Downstream, we (SUSE) don't really support migrating from new -> old. Is this 
something we aim to support upstream?

>>>
>>> I'm not sure how to address this ?
>>
>> But don't we have this issue any time we add a new CPU model to libvirt?
>> Anytime there's a new model, it has the potential to be a closer match to
>> the host CPU than an existing model definition was. As I mentioned in my
>> previous reply, when e.g. the -noTSX CPU variants were added, didn't the
>> same sort of thing (potentially) happen? Or am I doing something
>> meaningfully different in this patch set than what happens in those
>> scenarios?
> 
> I think it probably /did/ happen, but that doesn't make it acceptable.
> The noTSX stuff was the cause of massive amounts of compatibility pain
> for mgmt apps, so the incompatibility in libvirt might have been glossed
> over. We're adding alot of new versions here, so the possibly increasing
> the visibility/impact of this libvirt change.

It can happen when we introduce an entirely new CPU model too. E.g. on a Genoa 
machine, prior to commit bfe53e9145c, host model expanded to

  <cpu mode='custom' match='exact' check='full'>
     <model fallback='forbid'>EPYC-Milan</model>
     <vendor>AMD</vendor>
     <feature policy='require' name='x2apic'/>
     <feature policy='require' name='tsc-deadline'/>
     <feature policy='require' name='hypervisor'/>
     <feature policy='require' name='tsc_adjust'/>
     <feature policy='require' name='avx512f'/>
     <feature policy='require' name='avx512dq'/>
     <feature policy='require' name='avx512ifma'/>
     <feature policy='require' name='avx512cd'/>
     <feature policy='require' name='avx512bw'/>
     <feature policy='require' name='avx512vl'/>
     <feature policy='require' name='avx512vbmi'/>
     <feature policy='require' name='avx512vbmi2'/>
     <feature policy='require' name='gfni'/>
     <feature policy='require' name='vaes'/>
     <feature policy='require' name='vpclmulqdq'/>
     <feature policy='require' name='avx512vnni'/>
     <feature policy='require' name='avx512bitalg'/>
     <feature policy='require' name='avx512-vpopcntdq'/>
     <feature policy='require' name='la57'/>
     <feature policy='require' name='spec-ctrl'/>
     <feature policy='require' name='stibp'/>
     <feature policy='require' name='arch-capabilities'/>
     <feature policy='require' name='ssbd'/>
     <feature policy='require' name='avx512-bf16'/>
     <feature policy='require' name='cmp_legacy'/>
     <feature policy='require' name='virt-ssbd'/>
     <feature policy='require' name='rdctl-no'/>
     <feature policy='require' name='skip-l1dfl-vmentry'/>
     <feature policy='require' name='mds-no'/>
     <feature policy='require' name='pschange-mc-no'/>
     <feature policy='disable' name='svm'/>
     <feature policy='require' name='topoext'/>
     <feature policy='disable' name='npt'/>
     <feature policy='disable' name='nrip-save'/>
     <feature policy='disable' name='svme-addr-chk'/>
   </cpu>

After commit bfe53e9145c

<cpu mode='custom' match='exact' check='full'>
     <model fallback='forbid'>EPYC-Genoa</model>
     <vendor>AMD</vendor>
     <feature policy='require' name='x2apic'/>
     <feature policy='require' name='tsc-deadline'/>
     <feature policy='require' name='hypervisor'/>
     <feature policy='require' name='tsc_adjust'/>
     <feature policy='require' name='spec-ctrl'/>
     <feature policy='require' name='stibp'/>
     <feature policy='require' name='arch-capabilities'/>
     <feature policy='require' name='ssbd'/>
     <feature policy='require' name='cmp_legacy'/>
     <feature policy='require' name='virt-ssbd'/>
     <feature policy='require' name='rdctl-no'/>
     <feature policy='require' name='skip-l1dfl-vmentry'/>
     <feature policy='require' name='mds-no'/>
     <feature policy='require' name='pschange-mc-no'/>
     <feature policy='disable' name='svm'/>
     <feature policy='require' name='topoext'/>
     <feature policy='disable' name='npt'/>
     <feature policy='disable' name='nrip-save'/>
     <feature policy='disable' name='svme-addr-chk'/>
   </cpu>

Regards,
Jim
_______________________________________________
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-leave@lists.libvirt.org

Re: [PATCH v3 01/12] cpu_map: update script to handle versioned CPUs

Posted by Daniel P. Berrangé 1 year, 5 months ago

On Mon, Mar 04, 2024 at 10:35:25AM -0700, Jim Fehlig wrote:
> On 3/1/24 10:13, Daniel P. Berrangé wrote:
> > On Fri, Mar 01, 2024 at 10:36:12AM -0600, Jonathon Jongsma wrote:
> > > On 3/1/24 10:13 AM, Daniel P. Berrangé wrote:
> > > > On Tue, Feb 20, 2024 at 05:08:02PM -0700, Jim Fehlig wrote:
> > > > > On 12/15/23 15:11, Jonathon Jongsma wrote:
> > > > > > Previously, the script only generated the parent CPU and any versions
> > > > > > that had a defined alias. The script now generates all CPU versions. Any
> > > > > > version that had a defined alias will continue to use that alias, but
> > > > > > those without aliases will use the generated name $BASECPUNAME-vN.
> > > > > > 
> > > > > > The reason for this change is two-fold. First, we need to add new models
> > > > > > that support new features (such as SEV-SNP). To deal with this, the
> > > > > > script now generates model definitions for all versions.
> > > > > > 
> > > > > > But we also need to ensure that our CPU definitions are migration-safe.
> > > > > > To deal with this issue we need to make sure we're always using the
> > > > > > canonical versioned names for CPUs.
> > > > > 
> > > > > Related to migration safety, do we need to be concerned with the expansion
> > > > > of 'host-model' CPU? E.g. is it possible 'host-model' expands to EPYC before
> > > > > introducing the new models, and EPYC-v4 afterwards? If so, what are the
> > > > > ramifications of that?
> > > > 
> > > > Yes, I see that happening on my laptop in domcapabilities:
> > > > 
> > > > Currently libvirt reports:
> > > > 
> > > >       <mode name='host-model' supported='yes'>
> > > >         <model fallback='forbid'>Snowridge</model>
> > > >         <vendor>Intel</vendor>
> > > >         <maxphysaddr mode='passthrough' limit='46'/>
> > > >         <feature policy='require' name='ss'/>
> > > >         <feature policy='require' name='vmx'/>
> > > >        ...snip...
> > > > 
> > > > 
> > > > and after this series it reports:
> > > > 
> > > >       <mode name='host-model' supported='yes'>
> > > >         <model fallback='forbid'>Snowridge-v4</model>
> > > >         <vendor>Intel</vendor>
> > > >         <maxphysaddr mode='passthrough' limit='46'/>
> > > >         <feature policy='require' name='ss'/>
> > > >         <feature policy='require' name='vmx'/>
> > > >        ...snip...
> > > > 
> > > > 
> > > > That's not wrong per-se, becasue Snowrigde-v4 has a smaller
> > > > delta against my host CPU.
> > > > 
> > > > The problem is that libvirt updates the *live* XML for the
> > > > guest with this expansion.  IIUC, if we now attempt to
> > > > live migrate to a compatible machine running older libvirt
> > > > the migrate will fail as old libvirt doesn't know the -v4
> > > > CPU.
> 
> Downstream, we (SUSE) don't really support migrating from new -> old. Is
> this something we aim to support upstream?

Kind of, sort of, yes and no :)

The VIR_DOMAIN_XML_MIGRATABLE flag is a bit of an attempt to make
it possible to format XML in a way that's (hopefully) mostly acceptable
to older libvirt.

The devil is in the detail though, and there's never really been
any formal testing to prove correctness, so new -> old is one of
those things that may work, please report bugs if we missed
something.

> > > > I'm not sure how to address this ?
> > > 
> > > But don't we have this issue any time we add a new CPU model to libvirt?
> > > Anytime there's a new model, it has the potential to be a closer match to
> > > the host CPU than an existing model definition was. As I mentioned in my
> > > previous reply, when e.g. the -noTSX CPU variants were added, didn't the
> > > same sort of thing (potentially) happen? Or am I doing something
> > > meaningfully different in this patch set than what happens in those
> > > scenarios?
> > 
> > I think it probably /did/ happen, but that doesn't make it acceptable.
> > The noTSX stuff was the cause of massive amounts of compatibility pain
> > for mgmt apps, so the incompatibility in libvirt might have been glossed
> > over. We're adding alot of new versions here, so the possibly increasing
> > the visibility/impact of this libvirt change.
> 
> It can happen when we introduce an entirely new CPU model too. E.g. on a
> Genoa machine, prior to commit bfe53e9145c, host model expanded to

Yeah, true, so that's a general problem with 'host-model' when
introducing new CPU generations, if that post-dates a user
deploying on said CPU generation..

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
_______________________________________________
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-leave@lists.libvirt.org

Re: [PATCH v3 01/12] cpu_map: update script to handle versioned CPUs

Posted by Jim Fehlig 1 year, 5 months ago

On 3/22/24 04:54, Daniel P. Berrangé wrote:
> On Mon, Mar 04, 2024 at 10:35:25AM -0700, Jim Fehlig wrote:
>> On 3/1/24 10:13, Daniel P. Berrangé wrote:
>>> On Fri, Mar 01, 2024 at 10:36:12AM -0600, Jonathon Jongsma wrote:
>>>> On 3/1/24 10:13 AM, Daniel P. Berrangé wrote:
>>>>> On Tue, Feb 20, 2024 at 05:08:02PM -0700, Jim Fehlig wrote:
>>>>>> On 12/15/23 15:11, Jonathon Jongsma wrote:
>>>>>>> Previously, the script only generated the parent CPU and any versions
>>>>>>> that had a defined alias. The script now generates all CPU versions. Any
>>>>>>> version that had a defined alias will continue to use that alias, but
>>>>>>> those without aliases will use the generated name $BASECPUNAME-vN.
>>>>>>>
>>>>>>> The reason for this change is two-fold. First, we need to add new models
>>>>>>> that support new features (such as SEV-SNP). To deal with this, the
>>>>>>> script now generates model definitions for all versions.
>>>>>>>
>>>>>>> But we also need to ensure that our CPU definitions are migration-safe.
>>>>>>> To deal with this issue we need to make sure we're always using the
>>>>>>> canonical versioned names for CPUs.
>>>>>>
>>>>>> Related to migration safety, do we need to be concerned with the expansion
>>>>>> of 'host-model' CPU? E.g. is it possible 'host-model' expands to EPYC before
>>>>>> introducing the new models, and EPYC-v4 afterwards? If so, what are the
>>>>>> ramifications of that?
>>>>>
>>>>> Yes, I see that happening on my laptop in domcapabilities:
>>>>>
>>>>> Currently libvirt reports:
>>>>>
>>>>>        <mode name='host-model' supported='yes'>
>>>>>          <model fallback='forbid'>Snowridge</model>
>>>>>          <vendor>Intel</vendor>
>>>>>          <maxphysaddr mode='passthrough' limit='46'/>
>>>>>          <feature policy='require' name='ss'/>
>>>>>          <feature policy='require' name='vmx'/>
>>>>>         ...snip...
>>>>>
>>>>>
>>>>> and after this series it reports:
>>>>>
>>>>>        <mode name='host-model' supported='yes'>
>>>>>          <model fallback='forbid'>Snowridge-v4</model>
>>>>>          <vendor>Intel</vendor>
>>>>>          <maxphysaddr mode='passthrough' limit='46'/>
>>>>>          <feature policy='require' name='ss'/>
>>>>>          <feature policy='require' name='vmx'/>
>>>>>         ...snip...
>>>>>
>>>>>
>>>>> That's not wrong per-se, becasue Snowrigde-v4 has a smaller
>>>>> delta against my host CPU.
>>>>>
>>>>> The problem is that libvirt updates the *live* XML for the
>>>>> guest with this expansion.  IIUC, if we now attempt to
>>>>> live migrate to a compatible machine running older libvirt
>>>>> the migrate will fail as old libvirt doesn't know the -v4
>>>>> CPU.
>>
>> Downstream, we (SUSE) don't really support migrating from new -> old. Is
>> this something we aim to support upstream?
> 
> Kind of, sort of, yes and no :)
> 
> The VIR_DOMAIN_XML_MIGRATABLE flag is a bit of an attempt to make
> it possible to format XML in a way that's (hopefully) mostly acceptable
> to older libvirt.
> 
> The devil is in the detail though, and there's never really been
> any formal testing to prove correctness, so new -> old is one of
> those things that may work, please report bugs if we missed
> something.
> 
>>>>> I'm not sure how to address this ?
>>>>
>>>> But don't we have this issue any time we add a new CPU model to libvirt?
>>>> Anytime there's a new model, it has the potential to be a closer match to
>>>> the host CPU than an existing model definition was. As I mentioned in my
>>>> previous reply, when e.g. the -noTSX CPU variants were added, didn't the
>>>> same sort of thing (potentially) happen? Or am I doing something
>>>> meaningfully different in this patch set than what happens in those
>>>> scenarios?
>>>
>>> I think it probably /did/ happen, but that doesn't make it acceptable.
>>> The noTSX stuff was the cause of massive amounts of compatibility pain
>>> for mgmt apps, so the incompatibility in libvirt might have been glossed
>>> over. We're adding alot of new versions here, so the possibly increasing
>>> the visibility/impact of this libvirt change.
>>
>> It can happen when we introduce an entirely new CPU model too. E.g. on a
>> Genoa machine, prior to commit bfe53e9145c, host model expanded to
> 
> Yeah, true, so that's a general problem with 'host-model' when
> introducing new CPU generations, if that post-dates a user
> deploying on said CPU generation..

So what's the consensus on this series? With it, there will certainly be cases 
where host-model expands to one of the new versioned models. Question is, how 
many of those result in an incompatible CPU from the guest perspective? Without 
the series, we'll need to selectively add versioned CPUs when supporting new 
features such as sev-snp.

Regards,
Jim
_______________________________________________
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-leave@lists.libvirt.org

Re: [PATCH v3 01/12] cpu_map: update script to handle versioned CPUs

Posted by Jonathon Jongsma 1 year, 5 months ago

On 3/4/24 11:35 AM, Jim Fehlig wrote:
> On 3/1/24 10:13, Daniel P. Berrangé wrote:
>> On Fri, Mar 01, 2024 at 10:36:12AM -0600, Jonathon Jongsma wrote:
>>> On 3/1/24 10:13 AM, Daniel P. Berrangé wrote:
>>>> On Tue, Feb 20, 2024 at 05:08:02PM -0700, Jim Fehlig wrote:
>>>>> On 12/15/23 15:11, Jonathon Jongsma wrote:
>>>>>> Previously, the script only generated the parent CPU and any versions
>>>>>> that had a defined alias. The script now generates all CPU 
>>>>>> versions. Any
>>>>>> version that had a defined alias will continue to use that alias, but
>>>>>> those without aliases will use the generated name $BASECPUNAME-vN.
>>>>>>
>>>>>> The reason for this change is two-fold. First, we need to add new 
>>>>>> models
>>>>>> that support new features (such as SEV-SNP). To deal with this, the
>>>>>> script now generates model definitions for all versions.
>>>>>>
>>>>>> But we also need to ensure that our CPU definitions are 
>>>>>> migration-safe.
>>>>>> To deal with this issue we need to make sure we're always using the
>>>>>> canonical versioned names for CPUs.
>>>>>
>>>>> Related to migration safety, do we need to be concerned with the 
>>>>> expansion
>>>>> of 'host-model' CPU? E.g. is it possible 'host-model' expands to 
>>>>> EPYC before
>>>>> introducing the new models, and EPYC-v4 afterwards? If so, what are 
>>>>> the
>>>>> ramifications of that?
>>>>
>>>> Yes, I see that happening on my laptop in domcapabilities:
>>>>
>>>> Currently libvirt reports:
>>>>
>>>>       <mode name='host-model' supported='yes'>
>>>>         <model fallback='forbid'>Snowridge</model>
>>>>         <vendor>Intel</vendor>
>>>>         <maxphysaddr mode='passthrough' limit='46'/>
>>>>         <feature policy='require' name='ss'/>
>>>>         <feature policy='require' name='vmx'/>
>>>>        ...snip...
>>>>
>>>>
>>>> and after this series it reports:
>>>>
>>>>       <mode name='host-model' supported='yes'>
>>>>         <model fallback='forbid'>Snowridge-v4</model>
>>>>         <vendor>Intel</vendor>
>>>>         <maxphysaddr mode='passthrough' limit='46'/>
>>>>         <feature policy='require' name='ss'/>
>>>>         <feature policy='require' name='vmx'/>
>>>>        ...snip...
>>>>
>>>>
>>>> That's not wrong per-se, becasue Snowrigde-v4 has a smaller
>>>> delta against my host CPU.
>>>>
>>>> The problem is that libvirt updates the *live* XML for the
>>>> guest with this expansion.  IIUC, if we now attempt to
>>>> live migrate to a compatible machine running older libvirt
>>>> the migrate will fail as old libvirt doesn't know the -v4
>>>> CPU.
> 
> Downstream, we (SUSE) don't really support migrating from new -> old. Is 
> this something we aim to support upstream?

I don't know the answer to this question.

> 
>>>>
>>>> I'm not sure how to address this ?
>>>
>>> But don't we have this issue any time we add a new CPU model to libvirt?
>>> Anytime there's a new model, it has the potential to be a closer 
>>> match to
>>> the host CPU than an existing model definition was. As I mentioned in my
>>> previous reply, when e.g. the -noTSX CPU variants were added, didn't the
>>> same sort of thing (potentially) happen? Or am I doing something
>>> meaningfully different in this patch set than what happens in those
>>> scenarios?
>>
>> I think it probably /did/ happen, but that doesn't make it acceptable.
>> The noTSX stuff was the cause of massive amounts of compatibility pain
>> for mgmt apps, so the incompatibility in libvirt might have been glossed
>> over. We're adding alot of new versions here, so the possibly increasing
>> the visibility/impact of this libvirt change.
> 
> It can happen when we introduce an entirely new CPU model too. E.g. on a 
> Genoa machine, prior to commit bfe53e9145c, host model expanded to
> 
>   <cpu mode='custom' match='exact' check='full'>
>      <model fallback='forbid'>EPYC-Milan</model>
>      <vendor>AMD</vendor>
>      <feature policy='require' name='x2apic'/>
>      <feature policy='require' name='tsc-deadline'/>
>      <feature policy='require' name='hypervisor'/>
>      <feature policy='require' name='tsc_adjust'/>
>      <feature policy='require' name='avx512f'/>
>      <feature policy='require' name='avx512dq'/>
>      <feature policy='require' name='avx512ifma'/>
>      <feature policy='require' name='avx512cd'/>
>      <feature policy='require' name='avx512bw'/>
>      <feature policy='require' name='avx512vl'/>
>      <feature policy='require' name='avx512vbmi'/>
>      <feature policy='require' name='avx512vbmi2'/>
>      <feature policy='require' name='gfni'/>
>      <feature policy='require' name='vaes'/>
>      <feature policy='require' name='vpclmulqdq'/>
>      <feature policy='require' name='avx512vnni'/>
>      <feature policy='require' name='avx512bitalg'/>
>      <feature policy='require' name='avx512-vpopcntdq'/>
>      <feature policy='require' name='la57'/>
>      <feature policy='require' name='spec-ctrl'/>
>      <feature policy='require' name='stibp'/>
>      <feature policy='require' name='arch-capabilities'/>
>      <feature policy='require' name='ssbd'/>
>      <feature policy='require' name='avx512-bf16'/>
>      <feature policy='require' name='cmp_legacy'/>
>      <feature policy='require' name='virt-ssbd'/>
>      <feature policy='require' name='rdctl-no'/>
>      <feature policy='require' name='skip-l1dfl-vmentry'/>
>      <feature policy='require' name='mds-no'/>
>      <feature policy='require' name='pschange-mc-no'/>
>      <feature policy='disable' name='svm'/>
>      <feature policy='require' name='topoext'/>
>      <feature policy='disable' name='npt'/>
>      <feature policy='disable' name='nrip-save'/>
>      <feature policy='disable' name='svme-addr-chk'/>
>    </cpu>
> 
> After commit bfe53e9145c
> 
> <cpu mode='custom' match='exact' check='full'>
>      <model fallback='forbid'>EPYC-Genoa</model>
>      <vendor>AMD</vendor>
>      <feature policy='require' name='x2apic'/>
>      <feature policy='require' name='tsc-deadline'/>
>      <feature policy='require' name='hypervisor'/>
>      <feature policy='require' name='tsc_adjust'/>
>      <feature policy='require' name='spec-ctrl'/>
>      <feature policy='require' name='stibp'/>
>      <feature policy='require' name='arch-capabilities'/>
>      <feature policy='require' name='ssbd'/>
>      <feature policy='require' name='cmp_legacy'/>
>      <feature policy='require' name='virt-ssbd'/>
>      <feature policy='require' name='rdctl-no'/>
>      <feature policy='require' name='skip-l1dfl-vmentry'/>
>      <feature policy='require' name='mds-no'/>
>      <feature policy='require' name='pschange-mc-no'/>
>      <feature policy='disable' name='svm'/>
>      <feature policy='require' name='topoext'/>
>      <feature policy='disable' name='npt'/>
>      <feature policy='disable' name='nrip-save'/>
>      <feature policy='disable' name='svme-addr-chk'/>
>    </cpu>
> 
> Regards,
> Jim

Does anybody have a response to this point from Jim? I can't really 
think of a way forward if it's not acceptable for the host model 
expansion to change between different versions of libvirt.

Jonathon
_______________________________________________
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-leave@lists.libvirt.org

Re: [PATCH v3 01/12] cpu_map: update script to handle versioned CPUs

Posted by Jonathon Jongsma 1 year, 6 months ago

On 2/20/24 6:08 PM, Jim Fehlig wrote:
> On 12/15/23 15:11, Jonathon Jongsma wrote:
>> Previously, the script only generated the parent CPU and any versions
>> that had a defined alias. The script now generates all CPU versions. Any
>> version that had a defined alias will continue to use that alias, but
>> those without aliases will use the generated name $BASECPUNAME-vN.
>>
>> The reason for this change is two-fold. First, we need to add new models
>> that support new features (such as SEV-SNP). To deal with this, the
>> script now generates model definitions for all versions.
>>
>> But we also need to ensure that our CPU definitions are migration-safe.
>> To deal with this issue we need to make sure we're always using the
>> canonical versioned names for CPUs.
> 
> Related to migration safety, do we need to be concerned with the 
> expansion of 'host-model' CPU? E.g. is it possible 'host-model' expands 
> to EPYC before introducing the new models, and EPYC-v4 afterwards? If 
> so, what are the ramifications of that?

Indeed, that behavior is possible. In fact, you can see it happening in 
the test results for e.g. patch #5 ("Add versioned EPYC CPUs"). But I 
think that in general we should be fine, because we don't just expand to 
a plain model name, we expand to a model name plus a list of enabled or 
disabled feature flags.

For example, consider one test output file 
(tests/domaincapsdata/qemu_8.1.0-q35.x86_64.xml) that exhibits this 
behavior change. Before my patches the host CPU expanded to a model of 
'EPYC-ROME', with a number of feature flags which included (among others):

       <feature policy='require' name='amd-ssbd'/>
       <feature policy='disable' name='xsaves'/>

After my patches, the host model was expanded to 'EPYC-Rome-v4', with a 
slightly different set of feature flags. The main differences being that 
the 'amd-ssbd' and 'xsaves' features were no longer mentioned because 
they are now included in the -v4 definition. But it also adds a new 
feature flag to disable the 'ibrs' feature since this feature is 
included in the -v4 definition but not in the -v1 definition and is not 
provided by the host CPU:

       <feature policy='disable' name='ibrs'/>

So although the model name is changed, the definition of the host CPU as 
a whole should expand to the same set of CPU features.

I think that this sort of thing can happen every time a new CPU model is 
added (e.g. the -noTSX-IBRS variants), so I think it should work. But if 
anyone knows of any specific migration issues, please let me know.


>> Qemu documentation states that unversioned names for CPU models (e.g.
>> 'EPYC') are actually aliases to a specific versioned CPU model (e.g.
>> 'EPYC-v1'). The documentation further states that the specific version
>> targeted by the alias may change based on the machine type of the
>> domain. Management software such as libvirt is directed to translate
>> these aliases to a concrete version in order to make sure that the CPU
>> definition is safe when migrating between different qemu versions that
>> may make different choices for which underlying versioned model
>> represents the alias.
> 
> Do you have a link to the qemu documentation? Can it be added to the 
> commit message?

Yes, I could certainly add a link to the commit message. It is mentioned 
here: 
https://www.qemu.org/docs/master/about/deprecated.html#runnability-guarantee-of-cpu-models-since-4-1

> 
>> In practice, at the time of writing qemu always maps the unversioned
>> aliases to the -v1 model. And libvirt's CPU model definitions also
>> assume that this is the case. For example, the 'x86_EPYC.xml' file
>> contains the features that are defined for the EPYC-v1 inside of qemu.
>> But if qemu ever changes their alias mapping, libvirt's idea of what an
>> 'EPYC' CPU means and qemu's idea of what an 'EPYC' CPU means will no
>> longer match. So when choosing a CPU model for a domain, we should
>> always pass the canonical versioned name to libvirt rather than the
>> unversioned alias. To enable this, the script will generate a new
>> 'canonical_name' field to the CPU model xml definition.
>>
>> Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
>> ---
>>   src/cpu_map/sync_qemu_models_i386.py | 42 ++++++++++++++++++++++------
>>   1 file changed, 34 insertions(+), 8 deletions(-)
> 
> The logic changes to the script LGTM.
> 
> Reviewed-by: Jim Fehlig <jfehlig@suse.com>
> 
> Regards,
> Jim
> 
>>
>> diff --git a/src/cpu_map/sync_qemu_models_i386.py 
>> b/src/cpu_map/sync_qemu_models_i386.py
>> index 1c6a2d4d27..7fd62eba4a 100755
>> --- a/src/cpu_map/sync_qemu_models_i386.py
>> +++ b/src/cpu_map/sync_qemu_models_i386.py
>> @@ -322,31 +322,55 @@ def expand_model(model):
>>       different fields and may have differing versions into several 
>> libvirt-
>>       friendly cpu models."""
>> -    result = {
>> -        "name": model.pop(".name"),
>> +    basename = model.pop(".name")
>> +    parent = {
>> +        "name": basename,
>>           "vendor": translate_vendor(model.pop(".vendor")),
>>           "features": set(),
>>           "extra": dict()}
>>       if ".family" in model and ".model" in model:
>> -        result["family"] = model.pop(".family")
>> -        result["model"] = model.pop(".model")
>> +        parent["family"] = model.pop(".family")
>> +        parent["model"] = model.pop(".model")
>>       for k in [k for k in model if k.startswith(".features")]:
>>           v = model.pop(k)
>>           for feature in v.split():
>>               translated = translate_feature(feature)
>>               if translated:
>> -                result["features"].add(translated)
>> +                parent["features"].add(translated)
>>       versions = model.pop(".versions", [])
>>       for k, v in model.items():
>> -        result["extra"]["model" + k] = v
>> -    yield result
>> +        parent["extra"]["model" + k] = v
>> +
>> +    if not versions:
>> +        yield parent
>> +        return
>> +
>> +    result = parent
>>       for version in versions:
>> +        # each version builds on the previous one
>>           result = copy.deepcopy(result)
>> -        result["name"] = version.pop(".alias", result["name"])
>> +        vnum = int(version.pop(".version"))
>> +        vname = "{}-v{}".format(basename, vnum)
>> +        result["canonical_name"] = vname
>> +        if vnum == 1:
>> +            # the first version should always be an alias for the 
>> parent and
>> +            # should therefore have no extra properties
>> +            if version.items():
>> +                raise RuntimeError("Unexpected properties in version 1")
>> +            yield result
>> +            continue
>> +
>> +        # prefer the 'alias' over the generated the name if it exists 
>> since we
>> +        # have already been using these aliases
>> +        alias = version.pop(".alias", None)
>> +        if alias:
>> +            result["name"] = alias
>> +        else:
>> +            result["name"] = vname
>>           props = version.pop(".props", dict())
>>           for k, v in props:
>> @@ -377,6 +401,8 @@ def output_model(f, model):
>>       f.write("<cpus>\n")
>>       f.write("  <model name='{}'>\n".format(model["name"]))
>> +    if "canonical_name" in model and model["name"] != 
>> model["canonical_name"]:
>> +        f.write("    
>> <canonical_name>{}</canonical_name>\n".format(model["canonical_name"]))
>>       f.write("    <decode host='on' guest='on'/>\n")
>>       f.write("    <signature family='{}' model='{}'/>\n".format(
>>           model["family"], model["model"]))
> 
_______________________________________________
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-leave@lists.libvirt.org

Re: [PATCH v3 01/12] cpu_map: update script to handle versioned CPUs

Posted by Jim Fehlig 1 year, 6 months ago

On 2/21/24 13:56, Jonathon Jongsma wrote:
> On 2/20/24 6:08 PM, Jim Fehlig wrote:
>> On 12/15/23 15:11, Jonathon Jongsma wrote:
>>> Previously, the script only generated the parent CPU and any versions
>>> that had a defined alias. The script now generates all CPU versions. Any
>>> version that had a defined alias will continue to use that alias, but
>>> those without aliases will use the generated name $BASECPUNAME-vN.
>>>
>>> The reason for this change is two-fold. First, we need to add new models
>>> that support new features (such as SEV-SNP). To deal with this, the
>>> script now generates model definitions for all versions.
>>>
>>> But we also need to ensure that our CPU definitions are migration-safe.
>>> To deal with this issue we need to make sure we're always using the
>>> canonical versioned names for CPUs.
>>
>> Related to migration safety, do we need to be concerned with the expansion of 
>> 'host-model' CPU? E.g. is it possible 'host-model' expands to EPYC before 
>> introducing the new models, and EPYC-v4 afterwards? If so, what are the 
>> ramifications of that?
> 
> Indeed, that behavior is possible. In fact, you can see it happening in the test 
> results for e.g. patch #5 ("Add versioned EPYC CPUs"). But I think that in 
> general we should be fine, because we don't just expand to a plain model name, 
> we expand to a model name plus a list of enabled or disabled feature flags.
> 
> For example, consider one test output file 
> (tests/domaincapsdata/qemu_8.1.0-q35.x86_64.xml) that exhibits this behavior 
> change. Before my patches the host CPU expanded to a model of 'EPYC-ROME', with 
> a number of feature flags which included (among others):
> 
>        <feature policy='require' name='amd-ssbd'/>
>        <feature policy='disable' name='xsaves'/>
> 
> After my patches, the host model was expanded to 'EPYC-Rome-v4', with a slightly 
> different set of feature flags. The main differences being that the 'amd-ssbd' 
> and 'xsaves' features were no longer mentioned because they are now included in 
> the -v4 definition. But it also adds a new feature flag to disable the 'ibrs' 
> feature since this feature is included in the -v4 definition but not in the -v1 
> definition and is not provided by the host CPU:
> 
>        <feature policy='disable' name='ibrs'/>
> 
> So although the model name is changed, the definition of the host CPU as a whole 
> should expand to the same set of CPU features.

Thanks for the detailed explanation! It confirms my understanding, which is 
helpful for reviewing the series.

> I think that this sort of thing can happen every time a new CPU model is added 
> (e.g. the -noTSX-IBRS variants), so I think it should work. But if anyone knows 
> of any specific migration issues, please let me know.

Yep, it's not the first time we've introduced a new CPU model. And I'm now 
vaguely recalling a bug with Skylake-Client-IBRS vs Skylake-Client-noTSX-IBRS. 
The full details escape me, but IIRC migration of a VM using host-model failed 
between supposedly identical hosts with something like "missing hle,rtm 
features". In the end I think one or more hosts in the cluster had microcode 
update applied, which caused the host CPU to switch from Skylake-Client-IBRS to 
Skylake-Client-noTSX-IBRS. No fault of libvirt in that case, but an example of 
the scenarios to consider.

Regards,
Jim
_______________________________________________
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-leave@lists.libvirt.org

Re: [PATCH v3 01/12] cpu_map: update script to handle versioned CPUs

Posted by Jim Fehlig 1 year, 6 months ago

On 12/15/23 15:11, Jonathon Jongsma wrote:
> Previously, the script only generated the parent CPU and any versions
> that had a defined alias. The script now generates all CPU versions. Any
> version that had a defined alias will continue to use that alias, but
> those without aliases will use the generated name $BASECPUNAME-vN.
> 
> The reason for this change is two-fold. First, we need to add new models
> that support new features (such as SEV-SNP). To deal with this, the
> script now generates model definitions for all versions.
> 
> But we also need to ensure that our CPU definitions are migration-safe.
> To deal with this issue we need to make sure we're always using the
> canonical versioned names for CPUs.
> 
> Qemu documentation states that unversioned names for CPU models (e.g.
> 'EPYC') are actually aliases to a specific versioned CPU model (e.g.
> 'EPYC-v1'). The documentation further states that the specific version
> targeted by the alias may change based on the machine type of the
> domain. Management software such as libvirt is directed to translate
> these aliases to a concrete version in order to make sure that the CPU
> definition is safe when migrating between different qemu versions that
> may make different choices for which underlying versioned model
> represents the alias.
> 
> In practice, at the time of writing qemu always maps the unversioned
> aliases to the -v1 model. And libvirt's CPU model definitions also
> assume that this is the case. For example, the 'x86_EPYC.xml' file
> contains the features that are defined for the EPYC-v1 inside of qemu.
> But if qemu ever changes their alias mapping, libvirt's idea of what an
> 'EPYC' CPU means and qemu's idea of what an 'EPYC' CPU means will no
> longer match. So when choosing a CPU model for a domain, we should
> always pass the canonical versioned name to libvirt rather than the
> unversioned alias. To enable this, the script will generate a new
> 'canonical_name' field to the CPU model xml definition.
> 
> Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
> ---
>   src/cpu_map/sync_qemu_models_i386.py | 42 ++++++++++++++++++++++------


I'm not familiar with this script and how to use it in practice. Same for it's 
companion sync_qemu_features_i386. How often are they run? Is it the developer's 
responsibility to sift through the results for commitable changes? I admit to 
not looking too hard, but couldn't find any documentation on how to use these tools.

Regards,
Jim

>   1 file changed, 34 insertions(+), 8 deletions(-)
> 
> diff --git a/src/cpu_map/sync_qemu_models_i386.py b/src/cpu_map/sync_qemu_models_i386.py
> index 1c6a2d4d27..7fd62eba4a 100755
> --- a/src/cpu_map/sync_qemu_models_i386.py
> +++ b/src/cpu_map/sync_qemu_models_i386.py
> @@ -322,31 +322,55 @@ def expand_model(model):
>       different fields and may have differing versions into several libvirt-
>       friendly cpu models."""
>   
> -    result = {
> -        "name": model.pop(".name"),
> +    basename = model.pop(".name")
> +    parent = {
> +        "name": basename,
>           "vendor": translate_vendor(model.pop(".vendor")),
>           "features": set(),
>           "extra": dict()}
>   
>       if ".family" in model and ".model" in model:
> -        result["family"] = model.pop(".family")
> -        result["model"] = model.pop(".model")
> +        parent["family"] = model.pop(".family")
> +        parent["model"] = model.pop(".model")
>   
>       for k in [k for k in model if k.startswith(".features")]:
>           v = model.pop(k)
>           for feature in v.split():
>               translated = translate_feature(feature)
>               if translated:
> -                result["features"].add(translated)
> +                parent["features"].add(translated)
>   
>       versions = model.pop(".versions", [])
>       for k, v in model.items():
> -        result["extra"]["model" + k] = v
> -    yield result
> +        parent["extra"]["model" + k] = v
> +
> +    if not versions:
> +        yield parent
> +        return
> +
> +    result = parent
>   
>       for version in versions:
> +        # each version builds on the previous one
>           result = copy.deepcopy(result)
> -        result["name"] = version.pop(".alias", result["name"])
> +        vnum = int(version.pop(".version"))
> +        vname = "{}-v{}".format(basename, vnum)
> +        result["canonical_name"] = vname
> +        if vnum == 1:
> +            # the first version should always be an alias for the parent and
> +            # should therefore have no extra properties
> +            if version.items():
> +                raise RuntimeError("Unexpected properties in version 1")
> +            yield result
> +            continue
> +
> +        # prefer the 'alias' over the generated the name if it exists since we
> +        # have already been using these aliases
> +        alias = version.pop(".alias", None)
> +        if alias:
> +            result["name"] = alias
> +        else:
> +            result["name"] = vname
>   
>           props = version.pop(".props", dict())
>           for k, v in props:
> @@ -377,6 +401,8 @@ def output_model(f, model):
>   
>       f.write("<cpus>\n")
>       f.write("  <model name='{}'>\n".format(model["name"]))
> +    if "canonical_name" in model and model["name"] != model["canonical_name"]:
> +        f.write("    <canonical_name>{}</canonical_name>\n".format(model["canonical_name"]))
>       f.write("    <decode host='on' guest='on'/>\n")
>       f.write("    <signature family='{}' model='{}'/>\n".format(
>           model["family"], model["model"]))
_______________________________________________
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-leave@lists.libvirt.org

Re: [PATCH v3 01/12] cpu_map: update script to handle versioned CPUs

Posted by Jonathon Jongsma 1 year, 6 months ago

On 2/15/24 3:26 PM, Jim Fehlig wrote:
> On 12/15/23 15:11, Jonathon Jongsma wrote:
>> Previously, the script only generated the parent CPU and any versions
>> that had a defined alias. The script now generates all CPU versions. Any
>> version that had a defined alias will continue to use that alias, but
>> those without aliases will use the generated name $BASECPUNAME-vN.
>>
>> The reason for this change is two-fold. First, we need to add new models
>> that support new features (such as SEV-SNP). To deal with this, the
>> script now generates model definitions for all versions.
>>
>> But we also need to ensure that our CPU definitions are migration-safe.
>> To deal with this issue we need to make sure we're always using the
>> canonical versioned names for CPUs.
>>
>> Qemu documentation states that unversioned names for CPU models (e.g.
>> 'EPYC') are actually aliases to a specific versioned CPU model (e.g.
>> 'EPYC-v1'). The documentation further states that the specific version
>> targeted by the alias may change based on the machine type of the
>> domain. Management software such as libvirt is directed to translate
>> these aliases to a concrete version in order to make sure that the CPU
>> definition is safe when migrating between different qemu versions that
>> may make different choices for which underlying versioned model
>> represents the alias.
>>
>> In practice, at the time of writing qemu always maps the unversioned
>> aliases to the -v1 model. And libvirt's CPU model definitions also
>> assume that this is the case. For example, the 'x86_EPYC.xml' file
>> contains the features that are defined for the EPYC-v1 inside of qemu.
>> But if qemu ever changes their alias mapping, libvirt's idea of what an
>> 'EPYC' CPU means and qemu's idea of what an 'EPYC' CPU means will no
>> longer match. So when choosing a CPU model for a domain, we should
>> always pass the canonical versioned name to libvirt rather than the
>> unversioned alias. To enable this, the script will generate a new
>> 'canonical_name' field to the CPU model xml definition.
>>
>> Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
>> ---
>>   src/cpu_map/sync_qemu_models_i386.py | 42 ++++++++++++++++++++++------
> 
> 
> I'm not familiar with this script and how to use it in practice. Same 
> for it's companion sync_qemu_features_i386. How often are they run? Is 
> it the developer's responsibility to sift through the results for 
> commitable changes? I admit to not looking too hard, but couldn't find 
> any documentation on how to use these tools.
> 
> Regards,
> Jim


Yeah, there's not much documentation on this script. Here's the commit 
message from the initial commit of the script:

     cpu_map: Add script to sync from QEMU i386 cpu models

     This script is intended to help in synchronizing i386 QEMU cpu model
     definitions with libvirt.

     As the QEMU cpu model definitions are post processed by QEMU and not
     meant to be consumed by third parties directly, parsing this
     information is imperfect. Additionally, the libvirt models contain
     information that cannot be generated from the QEMU data, preventing
     fully automated usage. The output should nevertheless be helpful for
     a human in determining potentially interesting changes.

So basically you just run
$ ./sync_qemu_models_i386.py /path/to/qemu/target/i386/cpu.c outdir

and it will spit out a bunch of generated cpu model definitions in 
outdir and you can compare them to what already exists.

Jonathon


> 
>>   1 file changed, 34 insertions(+), 8 deletions(-)
>>
>> diff --git a/src/cpu_map/sync_qemu_models_i386.py 
>> b/src/cpu_map/sync_qemu_models_i386.py
>> index 1c6a2d4d27..7fd62eba4a 100755
>> --- a/src/cpu_map/sync_qemu_models_i386.py
>> +++ b/src/cpu_map/sync_qemu_models_i386.py
>> @@ -322,31 +322,55 @@ def expand_model(model):
>>       different fields and may have differing versions into several 
>> libvirt-
>>       friendly cpu models."""
>> -    result = {
>> -        "name": model.pop(".name"),
>> +    basename = model.pop(".name")
>> +    parent = {
>> +        "name": basename,
>>           "vendor": translate_vendor(model.pop(".vendor")),
>>           "features": set(),
>>           "extra": dict()}
>>       if ".family" in model and ".model" in model:
>> -        result["family"] = model.pop(".family")
>> -        result["model"] = model.pop(".model")
>> +        parent["family"] = model.pop(".family")
>> +        parent["model"] = model.pop(".model")
>>       for k in [k for k in model if k.startswith(".features")]:
>>           v = model.pop(k)
>>           for feature in v.split():
>>               translated = translate_feature(feature)
>>               if translated:
>> -                result["features"].add(translated)
>> +                parent["features"].add(translated)
>>       versions = model.pop(".versions", [])
>>       for k, v in model.items():
>> -        result["extra"]["model" + k] = v
>> -    yield result
>> +        parent["extra"]["model" + k] = v
>> +
>> +    if not versions:
>> +        yield parent
>> +        return
>> +
>> +    result = parent
>>       for version in versions:
>> +        # each version builds on the previous one
>>           result = copy.deepcopy(result)
>> -        result["name"] = version.pop(".alias", result["name"])
>> +        vnum = int(version.pop(".version"))
>> +        vname = "{}-v{}".format(basename, vnum)
>> +        result["canonical_name"] = vname
>> +        if vnum == 1:
>> +            # the first version should always be an alias for the 
>> parent and
>> +            # should therefore have no extra properties
>> +            if version.items():
>> +                raise RuntimeError("Unexpected properties in version 1")
>> +            yield result
>> +            continue
>> +
>> +        # prefer the 'alias' over the generated the name if it exists 
>> since we
>> +        # have already been using these aliases
>> +        alias = version.pop(".alias", None)
>> +        if alias:
>> +            result["name"] = alias
>> +        else:
>> +            result["name"] = vname
>>           props = version.pop(".props", dict())
>>           for k, v in props:
>> @@ -377,6 +401,8 @@ def output_model(f, model):
>>       f.write("<cpus>\n")
>>       f.write("  <model name='{}'>\n".format(model["name"]))
>> +    if "canonical_name" in model and model["name"] != 
>> model["canonical_name"]:
>> +        f.write("    
>> <canonical_name>{}</canonical_name>\n".format(model["canonical_name"]))
>>       f.write("    <decode host='on' guest='on'/>\n")
>>       f.write("    <signature family='{}' model='{}'/>\n".format(
>>           model["family"], model["model"]))
> 
_______________________________________________
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-leave@lists.libvirt.org