[PATCH RFC 0/1] s390x CPU Model Feature Deprecation

Collin Walling posted 1 patch 2 years, 1 month ago
[PATCH RFC 0/1] s390x CPU Model Feature Deprecation
Posted by Collin Walling 2 years, 1 month ago
The s390x architecture has a growing list of features that will no longer
be supported on future hardware releases. This introduces an issue with
migration such that guests, running on models with these features enabled,
will be rejected outright by machines that do not support these features.

A current example is the CSSKE feature, which has been deprecated for some time.
It was publicly announced that gen15 would be the last release to
support this feature; however, we have postponed this to gen16a. A possible
solution would be to create a new QEMU QMP response that allows
users to query for deprecated/unsupported features.

This presents two parts of the puzzle: how to report deprecated features to
a user (libvirt) and how libvirt should handle this information.

First, let's discuss the latter. The patch presented alongside this cover letter
attempts to solve the migration issue by hard-coding the CSSKE feature to be
disabled for all s390x CPU models. This is done by simply appending the CSSKE
feature with the disabled policy to the host-model.

libvirt pseudo:

if arch is s390x
    set CSSKE to disabled for host-model

This will be recorded under the host-model as (observable via domcapabilities):

    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>z13.2-base</model>
      <feature policy='require' name='aen'/>
      <feature policy='require' name='aefsi'/>
      <feature policy='require' name='diag318'/>
      ...
      <feature policy='disable' name='csske'/>
      ...
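
The hard-coded approach above can be sketched in a few lines. This is only
an illustration in Python, not the libvirt C code from the patch; the
function names and the (name, policy) tuple representation are assumptions
made for the example.

```python
# Hypothetical helpers; libvirt implements the real logic in C.

def disable_feature(features, name):
    """Return a copy of the feature list with `name` forced to policy 'disable'."""
    updated = [(n, p) for (n, p) in features if n != name]
    updated.append((name, "disable"))
    return updated


def adjust_host_model(arch, features):
    """Append csske with the 'disable' policy, but only on s390x."""
    if arch == "s390x":
        return disable_feature(features, "csske")
    return list(features)


host_model = [
    ("aen", "require"),
    ("aefsi", "require"),
    ("diag318", "require"),
    ("csske", "require"),
]
adjusted = adjust_host_model("s390x", host_model)
```

Every new deprecation would need another hard-coded name here, which is the
maintenance burden described next.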

Obviously a hard-coded path is not a desirable approach, as it requires an
update whenever newer features are listed for deprecation.
The patch is instead presented to start a discussion as to where
it is appropriate to record these deprecated features (e.g. should these
be reported under the host-model? or added to the guest CPU definition
prior to start up? etc). There is one issue observed with this change to
the host-model, described directly below.

A change in the host-model XML affects the hypervisor-cpu-comparison
command, which uses the libvirt-recorded host-model XML. Issuing a
comparison on a machine that still supports CSSKE (but with it flagged
as disabled in the host-model XML) against an equal or older CPU model
that does *not* present CSSKE as disabled in the XML will be reported
as incompatible. The response should be "identical" or "superset"
because technically the hardware still supports the feature.

A possible solution is to modify the hypervisor-cpu-comparison command
to query the host-model via expansion to get the proper hypervisor CPU
model as opposed to using libvirt's modified definition.
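
To make the comparison problem concrete, here is a deliberately simplified
sketch. It treats a CPU model as a plain set of enabled features and uses
set inclusion as the comparison rule; the real comparison in libvirt and
QEMU is more involved, and the feature sets below are made up for the
example.

```python
def compare(host_features, other_features):
    """Simplified comparison: how does the host relate to another CPU model?"""
    host, other = set(host_features), set(other_features)
    if host == other:
        return "identical"
    if host > other:
        return "superset"
    return "incompatible"


# What the hardware (and QEMU's own expansion) would report.
hardware = {"aen", "aefsi", "diag318", "csske"}
# What libvirt records once csske is forced to 'disable'.
libvirt_host_model = hardware - {"csske"}
# An older model definition that still lists csske as enabled.
older_model = {"aen", "aefsi", "csske"}

with_modified_xml = compare(libvirt_host_model, older_model)
with_qemu_expansion = compare(hardware, older_model)
```

With the modified XML the result is "incompatible", while the QEMU-expanded
host model yields "superset", which is the answer a user would expect.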

Secondly, let's discuss how to report the deprecated features, namely by
introducing a new QEMU QMP response.

This would be a long-term approach that allows a user to query a list of 
deprecated features for a particular architecture. A list will be kept
within QEMU that contains all deprecated CPU features. This allows the
retention of CPU model definitions within QEMU. Libvirt may query this
list and update the host-model definition to disable the features reported
by QEMU.

QEMU QMP Response example:

{ "execute": "query-cpu-model-deprecated-features" }

{ "return": { "props": { "name": "csske", "name": "feat_a", "name": "feat_b" }}}

libvirt pseudo:

if query_deprecated_features is supported
    list = query_deprecated_features()
    for each feat in list
        set feat to disabled for host-model

Then, any new features that are flagged as deprecated in the future may
simply be added to this "deprecated features" list in QEMU alongside a
new CPU definition.
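
A rough sketch of the libvirt side of this flow follows, in Python for
brevity. The reply shape is an assumption: the example response above
repeats the "name" key inside one JSON object, and most parsers (Python's
json module included) keep only the last of repeated keys, so the sketch
assumes the features would be returned as a list instead. The function
names are hypothetical.

```python
import json


def parse_deprecated_features(reply_text):
    """Extract the feature names from an (assumed) list-valued 'props' field."""
    reply = json.loads(reply_text)
    return list(reply["return"]["props"])


def apply_deprecations(host_model, deprecated):
    """Return a copy of the host-model with each deprecated feature disabled."""
    updated = dict(host_model)
    for feat in deprecated:
        updated[feat] = "disable"
    return updated


# Assumed reply format, not an existing QEMU command.
reply_text = '{"return": {"props": ["csske", "feat_a", "feat_b"]}}'
deprecated = parse_deprecated_features(reply_text)

host_model = {"aen": "require", "csske": "require"}
result = apply_deprecations(host_model, deprecated)

# Repeated keys collapse to the last one when parsed.
collapsed = json.loads('{"props": {"name": "csske", "name": "feat_a"}}')
```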

Please let me know your thoughts on these approaches. All input is welcome.

Thanks.

Collin Walling (1):
  qemu: capabilities: disable csske for host cpu

 src/qemu/qemu_capabilities.c               | 10 ++++++++++
 tests/domaincapsdata/qemu_2.11.0.s390x.xml |  1 +
 tests/domaincapsdata/qemu_2.12.0.s390x.xml |  1 +
 tests/domaincapsdata/qemu_2.8.0.s390x.xml  |  1 +
 tests/domaincapsdata/qemu_2.9.0.s390x.xml  |  1 +
 tests/domaincapsdata/qemu_3.0.0.s390x.xml  |  1 +
 tests/domaincapsdata/qemu_4.2.0.s390x.xml  |  1 +
 tests/domaincapsdata/qemu_6.0.0.s390x.xml  |  1 +
 8 files changed, 17 insertions(+)

-- 
2.31.1
Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation
Posted by David Hildenbrand 2 years, 1 month ago
On 11.03.22 05:17, Collin Walling wrote:
> The s390x architecture has a growing list of features that will no longer
> be supported on future hardware releases. This introduces an issue with
> migration such that guests, running on models with these features enabled,
> will be rejected outright by machines that do not support these features.
> 
> A current example is the CSSKE feature that has been deprecated for some time. 
> It has been publicly announced that gen15 will be the last release to
> support this feature, however we have postponed this to gen16a. A possible
> solution to remedy this would be to create a new QEMU QMP Response that allows
> users to query for deprecated/unsupported features.
> 
> This presents two parts of the puzzle: how to report deprecated features to
> a user (libvirt) and how should libvirt handle this information.
> 
> First, let's discuss the latter. The patch presented alongside this cover letter
> attempts to solve the migration issue by hard-coding the CSSKE feature to be
> disabled for all s390x CPU models. This is done by simply appending the CSSKE
> feature with the disabled policy to the host-model.
> 
> libvirt pseudo:
> 
> if arch is s390x
>     set CSSKE to disabled for host-model

That violates host-model semantics and possibly the user intent. There
would have to be some toggle to manually specify this, for example, a
new model type or some magical flag.

Gluing this to the "host-model" feels wrong.

The other concern I have is that deprecated features are a moving
target, and with a new QEMU version you could suddenly have more
deprecated features. Hm.


Maybe you'd want some kind of a host-based-model from QEMU that does
this automatically? I need more coffee to get creative on a name.

-- 
Thanks,

David / dhildenb
Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation
Posted by Christian Borntraeger 2 years, 1 month ago

On 11.03.22 at 10:30, David Hildenbrand wrote:
> On 11.03.22 05:17, Collin Walling wrote:
>> The s390x architecture has a growing list of features that will no longer
>> be supported on future hardware releases. This introduces an issue with
>> migration such that guests, running on models with these features enabled,
>> will be rejected outright by machines that do not support these features.
>>
>> A current example is the CSSKE feature that has been deprecated for some time.
>> It has been publicly announced that gen15 will be the last release to
>> support this feature, however we have postponed this to gen16a. A possible
>> solution to remedy this would be to create a new QEMU QMP Response that allows
>> users to query for deprecated/unsupported features.
>>
>> This presents two parts of the puzzle: how to report deprecated features to
>> a user (libvirt) and how should libvirt handle this information.
>>
>> First, let's discuss the latter. The patch presented alongside this cover letter
>> attempts to solve the migration issue by hard-coding the CSSKE feature to be
>> disabled for all s390x CPU models. This is done by simply appending the CSSKE
>> feature with the disabled policy to the host-model.
>>
>> libvirt pseudo:
>>
>> if arch is s390x
>>      set CSSKE to disabled for host-model
> 
> That violates host-model semantics and possibly the user intent. There
> would have to be some toggle to manually specify this, for example, a
> new model type or some magical flag.

What we actually want to do is to disable csske completely in QEMU and
thus in the host-model. Then it would not violate the spec.
But this has all kinds of issues (you cannot migrate from older versions
of software and machines), although the hardware can still provide the feature.

The hardware guys promised me to deprecate things two generations earlier
and we usually deprecate things that are not used or where software has a
runtime switch.

What I hear from you is that you do not want to modify the host-model
semantics to something more useful but rather define a new thing (e.g. "host-sane")?

> 
> Gluing this to the "host-model" feels wrong.
> 
> The other concern I have is that deprecated features are a moving
> target, and with a new QEMU version you could suddenly have more
> deprecated features. Hm.
> 
> 
> Maybe you'd want some kind of a host-based-model from QEMU that does
> this automatically? I need more coffee to get creative on a name.
>
Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation
Posted by David Hildenbrand 2 years, 1 month ago
On 11.03.22 13:44, Christian Borntraeger wrote:
> 
> 
> On 11.03.22 at 10:30, David Hildenbrand wrote:
>> On 11.03.22 05:17, Collin Walling wrote:
>>> The s390x architecture has a growing list of features that will no longer
>>> be supported on future hardware releases. This introduces an issue with
>>> migration such that guests, running on models with these features enabled,
>>> will be rejected outright by machines that do not support these features.
>>>
>>> A current example is the CSSKE feature that has been deprecated for some time.
>>> It has been publicly announced that gen15 will be the last release to
>>> support this feature, however we have postponed this to gen16a. A possible
>>> solution to remedy this would be to create a new QEMU QMP Response that allows
>>> users to query for deprecated/unsupported features.
>>>
>>> This presents two parts of the puzzle: how to report deprecated features to
>>> a user (libvirt) and how should libvirt handle this information.
>>>
>>> First, let's discuss the latter. The patch presented alongside this cover letter
>>> attempts to solve the migration issue by hard-coding the CSSKE feature to be
>>> disabled for all s390x CPU models. This is done by simply appending the CSSKE
>>> feature with the disabled policy to the host-model.
>>>
>>> libvirt pseudo:
>>>
>>> if arch is s390x
>>>      set CSSKE to disabled for host-model
>>
>> That violates host-model semantics and possibly the user intent. There
>> would have to be some toggle to manually specify this, for example, a
>> new model type or some magical flag.
> 
> What we actually want to do is to disable csske completely from QEMU and
> thus from the host-model. Then it would not violate the spec.
> But this has all kind of issues (you cannot migrate from older versions
> of software and machines) although the hardware still can provide the feature.
> 
> The hardware guys promised me to deprecate things two generations earlier
> and we usually deprecate things that are not used or where software has a
> runtime switch.
> 
> What I hear from you is that you do not want to modify the host-model
> semantics to something more useful but rather define a new thing (e.g. "host-sane")?

My take would be to keep the host model consistent, meaning that the
semantics in QEMU exactly match the semantics in libvirt. It defines the
maximum CPU model that's runnable under KVM. If a feature is not
included (e.g., csske), that feature cannot be enabled in any way.

The "host model" has the semantics of resembling the actual host CPU.
This is only partially true, because we support some features the host
might not support (e.g., zPCI IIRC) and obviously don't support all host
features in QEMU.

So instead of playing games on the libvirt side with the host model, I
see the following alternatives:

1. Remove the problematic features from the host model in QEMU, like "we
just don't support this feature". Consequently, any migration of a VM
with csske=on to a new QEMU version will fail, similar to having an
older QEMU version without support for a certain feature.

"host-passthrough" would change between QEMU versions ... which I see as
problematic.

2. Introduce a new CPU model that has these new semantics: "host model"
- deprecated features. Migration of older VMs with csske=on to a new
QEMU version will work. Make libvirt use/expand that new CPU model.

It doesn't necessarily have to be an actual new CPU model. We can use a
feature group, like "-cpu host,deprecated-features=false". What's
inside "deprecated-features" will actually change between QEMU versions,
but we don't really care, as the expanded CPU model won't change.

"host-passthrough" won't change between QEMU versions ...

3. As Daniel suggested, don't use the host model, but a CPU model
indicated as "suggested".

The real issue is that in reality, we don't simply always use a model
like "gen15a", but usually want optional features, if they are around.
Prime examples are "sie" and friends.



I tend to prefer 2. With 3, I see issues with optional features like
"sie" and friends. Often, you really want "give me all you got, but
disable deprecated features that might cause problems in the future".
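
As a rough illustration of option 2, the sketch below models the expansion
of "host" with the proposed "deprecated-features" group switched off. It is
Python pseudologic, not QEMU code: the function name, the feature names and
the idea of passing the deprecated set in as an argument are all
assumptions made for the example.

```python
def expand_host(host_features, deprecated, deprecated_features=True):
    """Expand 'host' into an explicit feature set.

    When the (proposed) deprecated-features group is off, every feature
    currently considered deprecated is dropped from the result.
    """
    features = frozenset(host_features)
    if deprecated_features:
        return features
    return features - frozenset(deprecated)


host = {"aen", "sie", "csske"}

# Plain host model: unchanged, csske stays available.
plain = expand_host(host, deprecated={"csske"})

# "host minus deprecated features": the expansion is an explicit list,
# so a guest defined from it keeps the same features even if a later
# QEMU version adds more names to the deprecated group.
recommended = expand_host(host, deprecated={"csske"}, deprecated_features=False)
```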

-- 
Thanks,

David / dhildenb
Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation
Posted by Boris Fiuczynski 2 years, 1 month ago
On 3/15/22 4:58 PM, David Hildenbrand wrote:
> On 11.03.22 13:44, Christian Borntraeger wrote:
>>
>>
>> On 11.03.22 at 10:30, David Hildenbrand wrote:
>>> On 11.03.22 05:17, Collin Walling wrote:
>>>> The s390x architecture has a growing list of features that will no longer
>>>> be supported on future hardware releases. This introduces an issue with
>>>> migration such that guests, running on models with these features enabled,
>>>> will be rejected outright by machines that do not support these features.
>>>>
>>>> A current example is the CSSKE feature that has been deprecated for some time.
>>>> It has been publicly announced that gen15 will be the last release to
>>>> support this feature, however we have postponed this to gen16a. A possible
>>>> solution to remedy this would be to create a new QEMU QMP Response that allows
>>>> users to query for deprecated/unsupported features.
>>>>
>>>> This presents two parts of the puzzle: how to report deprecated features to
>>>> a user (libvirt) and how should libvirt handle this information.
>>>>
>>>> First, let's discuss the latter. The patch presented alongside this cover letter
>>>> attempts to solve the migration issue by hard-coding the CSSKE feature to be
>>>> disabled for all s390x CPU models. This is done by simply appending the CSSKE
>>>> feature with the disabled policy to the host-model.
>>>>
>>>> libvirt pseudo:
>>>>
>>>> if arch is s390x
>>>>       set CSSKE to disabled for host-model
>>>
>>> That violates host-model semantics and possibly the user intent. There
>>> would have to be some toggle to manually specify this, for example, a
>>> new model type or some magical flag.
>>
>> What we actually want to do is to disable csske completely from QEMU and
>> thus from the host-model. Then it would not violate the spec.
>> But this has all kind of issues (you cannot migrate from older versions
>> of software and machines) although the hardware still can provide the feature.
>>
>> The hardware guys promised me to deprecate things two generations earlier
>> and we usually deprecate things that are not used or where software has a
>> runtime switch.
>>
>> What I hear from you is that you do not want to modify the host-model
>> semantics to something more useful but rather define a new thing (e.g. "host-sane")?
> 
> My take would be, to keep the host model consistent, meaning, the
> semantics in QEMU exactly match the semantics in Libvirt. It defines the
> maximum CPU model that's runnable under KVM. If a feature is not
> included (e.g., csske) that feature cannot be enabled in any way.
> 
> The "host model" has the semantics of resembling the actual host CPU.
> This is only partially true, because we support some features the host
> might not support (e.g., zPCI IIRC) and obviously don't support all host
> features in QEMU.
> 
> So instead of playing games on the libvirt side with the host model, I
> see the following alternatives:
> 
> 1. Remove the problematic features from the host model in QEMU, like "we
> just don't support this feature". Consequently, any migration of a VM
> with csske=on to a new QEMU version will fail, similar to having an
> older QEMU version without support for a certain feature.
> 
> "host-passthrough" would change between QEMU versions ... which I see as
> problematic.
> 
> 2. Introduce a new CPU model that has these new semantics: "host model"
> - deprecated features. Migration of older VMs with csske=on to a new
> QEMU version will work. Make libvirt use/expand that new CPU model
> 
> It doesn't necessarily have to be an actual new cpu model. We can use a
> feature group, like "-cpu host,deprecated-features=false". What's
> inside "deprecated-features" will actually change between QEMU versions,
> but we don't really care, as the expanded CPU model won't change.
> 
> "host-passthrough" won't change between QEMU versions ...
> 
> 3. As Daniel suggested, don't use the host model, but a CPU model
> indicated as "suggested".
> 
> The real issue is that in reality, we don't simply always use a model
> like "gen15a", but usually want optional features, if they are around.
> Prime examples are "sie" and friends.
> 
> 
> 
> I tend to prefer 2. With 3. I see issues with optional features like
> "sie" and friends. Often, you really want "give me all you got, but
> disable deprecated features that might cause problems in the future".
> 

David,
if I understand your proposal 2 correctly, it sounds a lot like Christian's
idea of leaving the CPU mode "host-model" as is and introducing a new CPU
mode "host-recommended" for the new semantics, in which
query-cpu-model-expansion would be called with the additional
"deprecated-features" property.
That way libvirt would not have to fiddle around with the deprecation
itself and users would have the option to choose which semantics they want to use.
Is that correct?


-- 
Kind regards
    Boris Fiuczynski

Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation
Posted by David Hildenbrand 2 years, 1 month ago
On 15.03.22 18:40, Boris Fiuczynski wrote:
> On 3/15/22 4:58 PM, David Hildenbrand wrote:
>> On 11.03.22 13:44, Christian Borntraeger wrote:
>>>
>>>
>>> On 11.03.22 at 10:30, David Hildenbrand wrote:
>>>> On 11.03.22 05:17, Collin Walling wrote:
>>>>> The s390x architecture has a growing list of features that will no longer
>>>>> be supported on future hardware releases. This introduces an issue with
>>>>> migration such that guests, running on models with these features enabled,
>>>>> will be rejected outright by machines that do not support these features.
>>>>>
>>>>> A current example is the CSSKE feature that has been deprecated for some time.
>>>>> It has been publicly announced that gen15 will be the last release to
>>>>> support this feature, however we have postponed this to gen16a. A possible
>>>>> solution to remedy this would be to create a new QEMU QMP Response that allows
>>>>> users to query for deprecated/unsupported features.
>>>>>
>>>>> This presents two parts of the puzzle: how to report deprecated features to
>>>>> a user (libvirt) and how should libvirt handle this information.
>>>>>
>>>>> First, let's discuss the latter. The patch presented alongside this cover letter
>>>>> attempts to solve the migration issue by hard-coding the CSSKE feature to be
>>>>> disabled for all s390x CPU models. This is done by simply appending the CSSKE
>>>>> feature with the disabled policy to the host-model.
>>>>>
>>>>> libvirt pseudo:
>>>>>
>>>>> if arch is s390x
>>>>>       set CSSKE to disabled for host-model
>>>>
>>>> That violates host-model semantics and possibly the user intent. There
>>>> would have to be some toggle to manually specify this, for example, a
>>>> new model type or some magical flag.
>>>
>>> What we actually want to do is to disable csske completely from QEMU and
>>> thus from the host-model. Then it would not violate the spec.
>>> But this has all kind of issues (you cannot migrate from older versions
>>> of software and machines) although the hardware still can provide the feature.
>>>
>>> The hardware guys promised me to deprecate things two generations earlier
>>> and we usually deprecate things that are not used or where software has a
>>> runtime switch.
>>>
>>> What I hear from you is that you do not want to modify the host-model
>>> semantics to something more useful but rather define a new thing (e.g. "host-sane")?
>>
>> My take would be, to keep the host model consistent, meaning, the
>> semantics in QEMU exactly match the semantics in Libvirt. It defines the
>> maximum CPU model that's runnable under KVM. If a feature is not
>> included (e.g., csske) that feature cannot be enabled in any way.
>>
>> The "host model" has the semantics of resembling the actual host CPU.
>> This is only partially true, because we support some features the host
>> might not support (e.g., zPCI IIRC) and obviously don't support all host
>> features in QEMU.
>>
>> So instead of playing games on the libvirt side with the host model, I
>> see the following alternatives:
>>
>> 1. Remove the problematic features from the host model in QEMU, like "we
>> just don't support this feature". Consequently, any migration of a VM
>> with csske=on to a new QEMU version will fail, similar to having an
>> older QEMU version without support for a certain feature.
>>
>> "host-passthrough" would change between QEMU versions ... which I see as
>> problematic.
>>
>> 2. Introduce a new CPU model that has these new semantics: "host model"
>> - deprecated features. Migration of older VMs with csske=on to a new
>> QEMU version will work. Make libvirt use/expand that new CPU model
>>
>> It doesn't necessarily have to be an actual new cpu model. We can use a
>> feature group, like "-cpu host,deprecated-features=false". What's
>> inside "deprecated-features" will actually change between QEMU versions,
>> but we don't really care, as the expanded CPU model won't change.
>>
>> "host-passthrough" won't change between QEMU versions ...
>>
>> 3. As Daniel suggested, don't use the host model, but a CPU model
>> indicated as "suggested".
>>
>> The real issue is that in reality, we don't simply always use a model
>> like "gen15a", but usually want optional features, if they are around.
>> Prime examples are "sie" and friends.
>>
>>
>>
>> I tend to prefer 2. With 3. I see issues with optional features like
>> "sie" and friends. Often, you really want "give me all you got, but
>> disable deprecated features that might cause problems in the future".
>>
> 
> David,
> if I understand your proposal 2 correctly, it sounds a lot like Christian's
> idea of leaving the CPU mode "host-model" as is and introducing a new CPU
> mode "host-recommended" for the new semantics, in which
> query-cpu-model-expansion would be called with the additional
> "deprecated-features" property.
> That way libvirt would not have to fiddle around with the deprecation
> itself and users would have the option to choose which semantics they want to use.
> Is that correct?

Yes, exactly.


-- 
Thanks,

David / dhildenb
Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation
Posted by Collin Walling 2 years, 1 month ago
On 3/15/22 15:08, David Hildenbrand wrote:
> On 15.03.22 18:40, Boris Fiuczynski wrote:
>> On 3/15/22 4:58 PM, David Hildenbrand wrote:
>>> On 11.03.22 13:44, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 11.03.22 at 10:30, David Hildenbrand wrote:
>>>>> On 11.03.22 05:17, Collin Walling wrote:
>>>>>> The s390x architecture has a growing list of features that will no longer
>>>>>> be supported on future hardware releases. This introduces an issue with
>>>>>> migration such that guests, running on models with these features enabled,
>>>>>> will be rejected outright by machines that do not support these features.
>>>>>>
>>>>>> A current example is the CSSKE feature that has been deprecated for some time.
>>>>>> It has been publicly announced that gen15 will be the last release to
>>>>>> support this feature, however we have postponed this to gen16a. A possible
>>>>>> solution to remedy this would be to create a new QEMU QMP Response that allows
>>>>>> users to query for deprecated/unsupported features.
>>>>>>
>>>>>> This presents two parts of the puzzle: how to report deprecated features to
>>>>>> a user (libvirt) and how should libvirt handle this information.
>>>>>>
>>>>>> First, let's discuss the latter. The patch presented alongside this cover letter
>>>>>> attempts to solve the migration issue by hard-coding the CSSKE feature to be
>>>>>> disabled for all s390x CPU models. This is done by simply appending the CSSKE
>>>>>> feature with the disabled policy to the host-model.
>>>>>>
>>>>>> libvirt pseudo:
>>>>>>
>>>>>> if arch is s390x
>>>>>>       set CSSKE to disabled for host-model
>>>>>
>>>>> That violates host-model semantics and possibly the user intent. There
>>>>> would have to be some toggle to manually specify this, for example, a
>>>>> new model type or some magical flag.
>>>>
>>>> What we actually want to do is to disable csske completely from QEMU and
>>>> thus from the host-model. Then it would not violate the spec.
>>>> But this has all kind of issues (you cannot migrate from older versions
>>>> of software and machines) although the hardware still can provide the feature.
>>>>
>>>> The hardware guys promised me to deprecate things two generations earlier
>>>> and we usually deprecate things that are not used or where software has a
>>>> runtime switch.
>>>>
>>>> What I hear from you is that you do not want to modify the host-model
>>>> semantics to something more useful but rather define a new thing (e.g. "host-sane")?
>>>
>>> My take would be, to keep the host model consistent, meaning, the
>>> semantics in QEMU exactly match the semantics in Libvirt. It defines the
>>> maximum CPU model that's runnable under KVM. If a feature is not
>>> included (e.g., csske) that feature cannot be enabled in any way.
>>>
>>> The "host model" has the semantics of resembling the actual host CPU.
>>> This is only partially true, because we support some features the host
>>> might not support (e.g., zPCI IIRC) and obviously don't support all host
>>> features in QEMU.
>>>
>>> So instead of playing games on the libvirt side with the host model, I
>>> see the following alternatives:
>>>
>>> 1. Remove the problematic features from the host model in QEMU, like "we
>>> just don't support this feature". Consequently, any migration of a VM
>>> with csske=on to a new QEMU version will fail, similar to having an
>>> older QEMU version without support for a certain feature.
>>>
>>> "host-passthrough" would change between QEMU versions ... which I see as
>>> problematic.
>>>
>>> 2. Introduce a new CPU model that has these new semantics: "host model"
>>> - deprecated features. Migration of older VMs with csske=on to a new
>>> QEMU version will work. Make libvirt use/expand that new CPU model
>>>
>>> It doesn't necessarily have to be an actual new cpu model. We can use a
>>> feature group, like "-cpu host,deprecated-features=false". What's
>>> inside "deprecated-features" will actually change between QEMU versions,
>>> but we don't really care, as the expanded CPU model won't change.
>>>
>>> "host-passthrough" won't change between QEMU versions ...
>>>
>>> 3. As Daniel suggested, don't use the host model, but a CPU model
>>> indicated as "suggested".
>>>
>>> The real issue is that in reality, we don't simply always use a model
>>> like "gen15a", but usually want optional features, if they are around.
>>> Prime examples are "sie" and friends.
>>>
>>>
>>>
>>> I tend to prefer 2. With 3. I see issues with optional features like
>>> "sie" and friends. Often, you really want "give me all you got, but
>>> disable deprecated features that might cause problems in the future".
>>>
>>
>> David,
>> if I understand your proposal 2 correctly, it sounds a lot like Christian's
>> idea of leaving the CPU mode "host-model" as is and introducing a new CPU
>> mode "host-recommended" for the new semantics, in which
>> query-cpu-model-expansion would be called with the additional
>> "deprecated-features" property.
>> That way libvirt would not have to fiddle around with the deprecation
>> itself and users would have the option to choose which semantics they want to use.
>> Is that correct?
> 
> Yes, exactly.
> 
> 

From what I understand:

QEMU
 - add a "deprecated-features" feature group (more-or-less David's code)

libvirt
 - recognize a new model name "host-recommended"
 - query QEMU for host-model + deprecated-features and cache it in the caps
file (something like <hostRecCpu>)
 - when a guest is defined with "host-recommended", pull <hostRecCpu> from
caps when the guest is started (similar to how host-model works today)
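
The libvirt steps listed above might look roughly like the following
sketch. It is written in Python purely for illustration; the cache layout,
the function names and the stand-in for the QEMU expansion query are
assumptions, and only the "host-recommended" name and the
"deprecated-features" property come from this thread.

```python
def build_caps(query_expansion):
    """Probe QEMU once and cache both expansions of the host CPU."""
    return {
        "host-model": query_expansion({}),
        "host-recommended": query_expansion({"deprecated-features": False}),
    }


def resolve_guest_cpu(caps, mode):
    """Pick the cached CPU definition for the guest's CPU mode at start-up."""
    if mode not in caps:
        raise ValueError(f"unsupported CPU mode: {mode}")
    return caps[mode]


def fake_expansion(props):
    """Stand-in for the QEMU query; feature names are made up."""
    host = {"aen", "sie", "csske"}
    deprecated = {"csske"}
    if props.get("deprecated-features") is False:
        return host - deprecated
    return host


caps = build_caps(fake_expansion)
guest_cpu = resolve_guest_cpu(caps, "host-recommended")
```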

If this is sufficient, then I can get to work on this.

My question is what would be the best way to include the deprecated
features when calculating a baseline or comparison. Both work with the
host-model and may no longer present an accurate result. Say, for
example, we baseline a z15 with a gen17 (which will outright not support
CSSKE). With today's implementation, this might result in a ridiculously
old CPU model which also does not support CSSKE. The ideal response
would be a z15 - deprecated features (i.e. host-recommended on a z15),
but we'd need a way to flag to QEMU that we want to exclude the
deprecated features. Or am I totally wrong about this?
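
The baseline concern can be illustrated with a toy model. Everything in
the sketch below is hypothetical: the model table, the feature names other
than csske, and the rule "pick the newest model whose features both hosts
support" are simplifications chosen for the example, not how QEMU computes
a baseline.

```python
# Hypothetical model table, oldest first.
MODELS = [
    ("old-model", {"f1"}),
    ("z15", {"f1", "csske", "f2"}),
]


def baseline(host_a, host_b, ignore=frozenset()):
    """Return the newest model whose features (minus `ignore`) both hosts support."""
    common = set(host_a) & set(host_b)
    best = None
    for name, feats in MODELS:
        wanted = feats - ignore
        if wanted <= common:
            best = (name, wanted)
    return best


z15_host = {"f1", "csske", "f2"}
gen17_host = {"f1", "f2", "f3"}  # no csske

# Today: csske is missing on gen17, so the baseline falls back to an old model.
naive = baseline(z15_host, gen17_host)
# With deprecated features excluded: z15 minus csske is acceptable.
relaxed = baseline(z15_host, gen17_host, ignore=frozenset({"csske"}))
```

The `ignore` argument plays the role of the flag asked about above, i.e.
some way to tell QEMU to leave deprecated features out of the calculation.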

-- 
Regards,
Collin

Stay safe and stay healthy
Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation
Posted by Daniel P. Berrangé 2 years, 1 month ago
On Fri, Mar 18, 2022 at 01:23:03PM -0400, Collin Walling wrote:
> On 3/15/22 15:08, David Hildenbrand wrote:
> > On 15.03.22 18:40, Boris Fiuczynski wrote:
> >> On 3/15/22 4:58 PM, David Hildenbrand wrote:
> >>> On 11.03.22 13:44, Christian Borntraeger wrote:
> >>>>
> >>>>
> >>>> On 11.03.22 at 10:30, David Hildenbrand wrote:
> >>>>> On 11.03.22 05:17, Collin Walling wrote:
> >>>>>> The s390x architecture has a growing list of features that will no longer
> >>>>>> be supported on future hardware releases. This introduces an issue with
> >>>>>> migration such that guests, running on models with these features enabled,
> >>>>>> will be rejected outright by machines that do not support these features.
> >>>>>>
> >>>>>> A current example is the CSSKE feature that has been deprecated for some time.
> >>>>>> It has been publicly announced that gen15 will be the last release to
> >>>>>> support this feature, however we have postponed this to gen16a. A possible
> >>>>>> solution to remedy this would be to create a new QEMU QMP Response that allows
> >>>>>> users to query for deprecated/unsupported features.
> >>>>>>
> >>>>>> This presents two parts of the puzzle: how to report deprecated features to
> >>>>>> a user (libvirt) and how should libvirt handle this information.
> >>>>>>
> >>>>>> First, let's discuss the latter. The patch presented alongside this cover letter
> >>>>>> attempts to solve the migration issue by hard-coding the CSSKE feature to be
> >>>>>> disabled for all s390x CPU models. This is done by simply appending the CSSKE
> >>>>>> feature with the disabled policy to the host-model.
> >>>>>>
> >>>>>> libvirt pseudo:
> >>>>>>
> >>>>>> if arch is s390x
> >>>>>>       set CSSKE to disabled for host-model
> >>>>>
> >>>>> That violates host-model semantics and possibly the user intend. There
> >>>>> would have to be some toggle to manually specify this, for example, a
> >>>>> new model type or a some magical flag.
> >>>>
> >>>> What we actually want to do is to disable csske completely from QEMU and
> >>>> thus from the host-model. Then it would not violate the spec.
> >>>> But this has all kind of issues (you cannot migrate from older versions
> >>>> of software and machines) although the hardware still can provide the feature.
> >>>>
> >>>> The hardware guys promised me to deprecate things two generations earlier
> >>>> and we usually deprecate things that are not used or where software has a
> >>>> runtime switch.
> >>>>
> >>>>   From what I hear from you is that you do not want to modify the host-model
> >>>> semantics to something more useful but rather define a new thing (e.g. "host-sane") ?
> >>>
> >>> My take would be, to keep the host model consistent, meaning, the
> >>> semantics in QEMU exactly match the semantics in Libvirt. It defines the
> >>> maximum CPU model that's runnable under KVM. If a feature is not
> >>> included (e.g., csske) that feature cannot be enabled in any way.
> >>>
> >>> The "host model" has the semantics of resembling the actual host CPU.
> >>> This is only partially true, because we support some features the host
> >>> might not support (e.g., zPCI IIRC) and obviously don't support all host
> >>> features in QEMU.
> >>>
> >>> So instead of playing games on the libvirt side with the host model, I
> >>> see the following alternatives:
> >>>
> >>> 1. Remove the problematic features from the host model in QEMU, like "we
> >>> just don't support this feature". Consequently, any migration of a VM
> >>> with csske=on to a new QEMU version will fail, similar to having an
> >>> older QEMU version without support for a certain feature.
> >>>
> >>> "host-passthrough" would change between QEMU versions ... which I see as
> >>> problematic.
> >>>
> >>> 2. Introduce a new CPU model that has these new semantics: "host model"
> >>> - deprecated features. Migration of older VMs with csske=on to a new
> >>> QEMU version will work. Make libvirt use/expand that new CPU model
> >>>
> >>> It doesn't necessarily have to be an actual new cpu model. We can use a
> >>> feature group, like "-cpu host,deprectated-features=false". What's
> >>> inside "deprecated-features" will actually change between QEMU versions,
> >>> but we don't really care, as the expanded CPU model won't change.
> >>>
> >>> "host-passthrough" won't change between QEMU versions ...
> >>>
> >>> 3. As Daniel suggested, don't use the host model, but a CPU model
> >>> indicated as "suggested".
> >>>
> >>> The real issue is that in reality, we don't simply always use a model
> >>> like "gen15a", but usually want optional features, if they are around.
> >>> Prime examples are "sie" and friends.
> >>>
> >>>
> >>>
> >>> I tend to prefer 2. With 3. I see issues with optional features like
> >>> "sie" and friends. Often, you really want "give me all you got, but
> >>> disable deprecated features that might cause problems in the future".
> >>>
> >>
> >> David,
> >> if I understand you proposal 2 correctly it sounds a lot like Christians 
> >> idea of leaving the CPU mode "host-model" as is and introduce a new CPU 
> >> mode "host-recommended" for the new semantics in which 
> >> query-cpu-model-expansion would be called with the additional 
> >> "deprectated-features" property.
> >> That way libvirt would not have to fiddle around with the deprecation 
> >> itself and users would have the option which semantic they want to use. 
> >> Is that correct?
> > 
> > Yes, exactly.
> > 
> > 
> 
> From what I understand:
> 
> QEMU
>  - add a "deprecated-features" feature group (more-or-less David's code)
> 
> libvirt
>  - recognize a new model name "host-recommended"
>  - query QEMU for host-model + deprecated-features and cache it in caps
> file (something like <hostRecCpu>)
>  - when guest is defined with "host-recommended", pull <hostRecCPU> from
> caps when guest is started (similar to how host-model works today)
> 
> If this is sufficient, then I can then get to work on this.
> 
> My question is what would be the best way to include the deprecated
> features when calculating a baseline or comparison. Both work with the
> host-model and may no longer present an accurate result. Say, for
> example, we baseline a z15 with a gen17 (which will outright not support
> CSSKE). With today's implementation, this might result in a ridiculously
> old CPU model which also does not support CSSKE. The ideal response
> would be a z15 - deprecated features (i.e. host-recommended on a z15),
> but we'd need a way to flag to QEMU that we want to exclude the
> deprecated features. Or am I totally wrong about this?

QEMU has a concept of versioned CPU models, so you could define a
z15-v2 version without CSSKE.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation
Posted by David Hildenbrand 2 years, 1 month ago
On 21.03.22 10:25, Daniel P. Berrangé wrote:
> On Fri, Mar 18, 2022 at 01:23:03PM -0400, Collin Walling wrote:
>> From what I understand:
>>
>> QEMU
>>  - add a "deprecated-features" feature group (more-or-less David's code)
>>
>> libvirt
>>  - recognize a new model name "host-recommended"
>>  - query QEMU for host-model + deprecated-features and cache it in caps
>> file (something like <hostRecCpu>)
>>  - when guest is defined with "host-recommended", pull <hostRecCPU> from
>> caps when guest is started (similar to how host-model works today)
>>
>> If this is sufficient, then I can then get to work on this.
>>
>> My question is what would be the best way to include the deprecated
>> features when calculating a baseline or comparison. Both work with the
>> host-model and may no longer present an accurate result. Say, for
>> example, we baseline a z15 with a gen17 (which will outright not support
>> CSSKE). With today's implementation, this might result in a ridiculously
>> old CPU model which also does not support CSSKE. The ideal response
>> would be a z15 - deprecated features (i.e. host-recommended on a z15),
>> but we'd need a way to flag to QEMU that we want to exclude the
>> deprecated features. Or am I totally wrong about this?
> 
> QEMU has a concept of versioned CPU models, so you could define a
> z15-v2 version without CSSKE.

gen15a already comes with csske=false. s390x does not implement
versioned CPU models and, as I raised in the past, that concept is
rather a bad fit for s390x.

-- 
Thanks,

David / dhildenb

Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation
Posted by David Hildenbrand 2 years, 1 month ago
On 18.03.22 18:23, Collin Walling wrote:
> From what I understand:
> 
> QEMU
>  - add a "deprecated-features" feature group (more-or-less David's code)
> 
> libvirt
>  - recognize a new model name "host-recommended"
>  - query QEMU for host-model + deprecated-features and cache it in caps
> file (something like <hostRecCpu>)
>  - when guest is defined with "host-recommended", pull <hostRecCPU> from
> caps when guest is started (similar to how host-model works today)
> 
> If this is sufficient, then I can then get to work on this.
> 
> My question is what would be the best way to include the deprecated
> features when calculating a baseline or comparison. Both work with the
> host-model and may no longer present an accurate result. Say, for
> example, we baseline a z15 with a gen17 (which will outright not support
> CSSKE). With today's implementation, this might result in a ridiculously
> old CPU model which also does not support CSSKE. The ideal response
> would be a z15 - deprecated features (i.e. host-recommended on a z15),
> but we'd need a way to flag to QEMU that we want to exclude the
> deprecated features. Or am I totally wrong about this?

For baselining, it would be reasonable to always disable deprecated
features and to ignore them during model selection. That should be
fairly easy to implement; let me know if you need any pointers.
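
To make the baselining idea concrete, here is a minimal sketch in Python.
It is not QEMU code: the model table, the feature names "f1".."f3" and
the baseline() helper are all made up for illustration. Only "csske" and
the idea of ignoring deprecated features come from this thread.

```python
# Toy baseline: model names, feature names and sets are illustrative only.
DEPRECATED = frozenset({"csske"})

# Ordered oldest to newest; each entry is the model's default feature set.
MODELS = [
    ("old-model", frozenset({"f1"})),
    ("z15-like", frozenset({"f1", "f2", "csske"})),
]


def baseline(host_a, host_b, ignore_deprecated=True):
    """Return (model name, features) for the newest model both hosts can run."""
    common = host_a & host_b
    if ignore_deprecated:
        # Deprecated features are always disabled in the result.
        common = common - DEPRECATED
    for name, feats in reversed(MODELS):
        # When ignoring deprecated features, they do not count against a model.
        needed = feats - DEPRECATED if ignore_deprecated else feats
        if needed <= common:
            return name, needed
    raise ValueError("no common model")


z15_host = frozenset({"f1", "f2", "csske"})
gen17_host = frozenset({"f1", "f2", "f3"})  # hypothetical host without csske

naive = baseline(z15_host, gen17_host, ignore_deprecated=False)
fixed = baseline(z15_host, gen17_host, ignore_deprecated=True)
print(naive)
print(fixed)
```

In this toy setup the naive variant falls back to the older model because
the newer one lists csske, while the variant that ignores deprecated
features keeps the newer model with csske turned off.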

I *assume* that for comparison there is nothing to do.

-- 
Thanks,

David / dhildenb
Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation
Posted by Collin Walling 2 years, 1 month ago
On 3/18/22 14:33, David Hildenbrand wrote:
> On 18.03.22 18:23, Collin Walling wrote:
>> From what I understand:
>>
>> QEMU
>>  - add a "deprecated-features" feature group (more-or-less David's code)
>>
>> libvirt
>>  - recognize a new model name "host-recommended"
>>  - query QEMU for host-model + deprecated-features and cache it in caps
>> file (something like <hostRecCpu>)
>>  - when guest is defined with "host-recommended", pull <hostRecCPU> from
>> caps when guest is started (similar to how host-model works today)
>>
>> If this is sufficient, then I can then get to work on this.
>>
>> My question is what would be the best way to include the deprecated
>> features when calculating a baseline or comparison. Both work with the
>> host-model and may no longer present an accurate result. Say, for
>> example, we baseline a z15 with a gen17 (which will outright not support
>> CSSKE). With today's implementation, this might result in a ridiculously
>> old CPU model which also does not support CSSKE. The ideal response
>> would be a z15 - deprecated features (i.e. host-recommended on a z15),
>> but we'd need a way to flag to QEMU that we want to exclude the
>> deprecated features. Or am I totally wrong about this?
> 
> For baselining, it would be reasonable to always disable deprecated
> features, and to ignore them during the model selection. Should be
> fairly easy to implement, let me know if you need any pointers.
> 

Thanks David. I'll take a look when I can. I may not be very active this
week due to personal matters, but I intend to knock this out as soon as
things settle down on my end.

> I *assume* that for comparison there is nothing to do.
> 

I think you're right, at least on QEMU's end.

For libvirt, IIRC, comparison will compare the CPU model cached under
the hostCPU tag to whatever is in the XML. If comparing, say, a gen17
host (no csske support) with a gen15 XML, the result should come up as
"incompatible". A user may well think "what the heck, shouldn't an old
gen run on a new gen?"

Doesn't the comparison QAPI report which features cause the result of
"incompatible"? Would it make sense to amend the libvirt API to report
features causing this issue? I believe this is what the --error flag is
meant to do, but as far as I know, nothing useful is currently reported.

Something like this (assume we're on a gen17 host, and cpu.xml contains a
gen15 host-model):

# virsh hypervisor-cpu-compare cpu.xml --error
error: Failed to compare hypervisor CPU with cpu.xml
error: the CPU is incompatible with host CPU
error: host CPU does not support: csske
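
As a rough sketch of how such messages could be derived, the Python
below formats a comparison reply into lines like the ones above. It
assumes the reply is a dict shaped like the return value of QEMU's
query-cpu-model-comparison ("result" plus "responsible-properties");
the explain_comparison() helper and the exact message wording are
hypothetical and not existing libvirt behaviour.

```python
def explain_comparison(reply, xml_name="cpu.xml"):
    """Turn a comparison reply into virsh-style message lines."""
    result = reply["result"]
    if result != "incompatible":
        # Non-error wording here is made up for the sketch.
        return [f"CPU described in {xml_name} is {result} with host CPU"]
    lines = [
        f"error: Failed to compare hypervisor CPU with {xml_name}",
        "error: the CPU is incompatible with host CPU",
    ]
    # Name the features responsible for the result, if any were reported.
    props = reply.get("responsible-properties", [])
    if props:
        lines.append("error: host CPU does not support: " + ", ".join(props))
    return lines


# Example reply for a gen15 host-model compared on a host without csske.
reply = {"result": "incompatible", "responsible-properties": ["csske"]}
messages = explain_comparison(reply)
for line in messages:
    print(line)
```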


-- 
Regards,
Collin

Stay safe and stay healthy
Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation
Posted by David Hildenbrand 2 years, 1 month ago
>>> From what I understand:
>>>
>>> QEMU
>>>  - add a "deprecated-features" feature group (more-or-less David's code)
>>>
>>> libvirt
>>>  - recognize a new model name "host-recommended"
>>>  - query QEMU for host-model + deprecated-features and cache it in caps
>>> file (something like <hostRecCpu>)
>>>  - when guest is defined with "host-recommended", pull <hostRecCPU> from
>>> caps when guest is started (similar to how host-model works today)
>>>
>>> If this is sufficient, then I can then get to work on this.
>>>
>>> My question is what would be the best way to include the deprecated
>>> features when calculating a baseline or comparison. Both work with the
>>> host-model and may no longer present an accurate result. Say, for
>>> example, we baseline a z15 with a gen17 (which will outright not support
>>> CSSKE). With today's implementation, this might result in a ridiculously
>>> old CPU model which also does not support CSSKE. The ideal response
>>> would be a z15 - deprecated features (i.e. host-recommended on a z15),
>>> but we'd need a way to flag to QEMU that we want to exclude the
>>> deprecated features. Or am I totally wrong about this?
>>
>> For baselining, it would be reasonable to always disable deprecated
>> features, and to ignore them during the model selection. Should be
>> fairly easy to implement, let me know if you need any pointers.
>>
> 
> Thanks David. I'll take a look when I can. I may not be very active this
> week due to personal items, but intend to knock this out as soon as
> things settle down on my end.

No need to rush :)

> 
>> I *assume* that for comparison there is nothing to do.
>>
> 
> I think you're right, at least on QEMU's end.
> 
> For libvirt, IIRC, comparison will compare the CPU model cached under
> the hostCPU tag to whatever is in the XML. If comparing, say, a gen17
> host (no csske support) with a gen15 XML, the result should come up as
> "incompatible". To a user, they may think "what the heck, shouldn't old
> gen run on new gen?"

I assume you mean an expanded host model on a z15 that still shows
"csske=true". And it would be correct: the deprecated feature that is
still around on the older machine (indicated in the host model) is not
around on the newer machine (not indicated in the host model). So a VM
started with the "host-model" on the old machine cannot be migrated to
the new machine. You'd need to start the VM with the new host-TOBENAMED
CPU model. Comparing with that would work as expected, as the deprecated
features would not be included.
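
A small Python sketch of that reasoning, with feature sets standing in
for expanded CPU models. The feature names "f1".."f3" and all helper
names are invented for illustration, and host_recommended() stands in
for the yet-to-be-named model; only csske and the host-model semantics
come from this discussion.

```python
DEPRECATED = frozenset({"csske"})


def host_model(host_features):
    """Everything the host offers, deprecated features included."""
    return frozenset(host_features)


def host_recommended(host_features):
    """Host model with the deprecated features switched off."""
    return frozenset(host_features) - DEPRECATED


def can_migrate(guest_features, target_host_features):
    """A guest fits if the target host offers every feature it uses."""
    return guest_features <= frozenset(target_host_features)


z15 = {"f1", "f2", "csske"}
gen17 = {"f1", "f2", "f3"}  # hypothetical newer host without csske

with_host_model = can_migrate(host_model(z15), gen17)
with_recommended = can_migrate(host_recommended(z15), gen17)
print(with_host_model, with_recommended)
```

The guest started with the plain host model carries csske and does not
fit on the newer host, while the one started without the deprecated
features does.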

> 
> Doesn't the comparison QAPI report which features cause the result of
> "incompatible"? Would it make sense to amend the libvirt API to report
> features causing this issue? I believe this is what the --error flag is
> meant to do, but as far as I know, nothing useful is currently reported.

Most probably it was never implemented on s390x. Makes sense to me.

> 
> Something like this (assume we're a gen17 host, and cpu.xml contains a
> gen15 host-model)
> 
> # virsh hypervisor-cpu-compare cpu.xml --error
> error: Failed to compare hypervisor CPU with cpu.xml
> error: the CPU is incompatible with host CPU
> error: host CPU does not support: csske

I guess instead of "host CPU" you'd want to indicate one of the two CPU
models provided. Not sure how to differentiate them from the XML.


-- 
Thanks,

David / dhildenb
Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation
Posted by David Hildenbrand 2 years, 1 month ago
On 15.03.22 16:58, David Hildenbrand wrote:
> On 11.03.22 13:44, Christian Borntraeger wrote:
>>
>>
>> Am 11.03.22 um 10:30 schrieb David Hildenbrand:
>>> On 11.03.22 05:17, Collin Walling wrote:
>>>> The s390x architecture has a growing list of features that will no longer
>>>> be supported on future hardware releases. This introduces an issue with
>>>> migration such that guests, running on models with these features enabled,
>>>> will be rejected outright by machines that do not support these features.
>>>>
>>>> A current example is the CSSKE feature that has been deprecated for some time.
>>>> It has been publicly announced that gen15 will be the last release to
>>>> support this feature, however we have postponed this to gen16a. A possible
>>>> solution to remedy this would be to create a new QEMU QMP Response that allows
>>>> users to query for deprecated/unsupported features.
>>>>
>>>> This presents two parts of the puzzle: how to report deprecated features to
>>>> a user (libvirt) and how should libvirt handle this information.
>>>>
>>>> First, let's discuss the latter. The patch presented alongside this cover letter
>>>> attempts to solve the migration issue by hard-coding the CSSKE feature to be
>>>> disabled for all s390x CPU models. This is done by simply appending the CSSKE
>>>> feature with the disabled policy to the host-model.
>>>>
>>>> libvirt pseudo:
>>>>
>>>> if arch is s390x
>>>>      set CSSKE to disabled for host-model
>>>
>>> That violates host-model semantics and possibly the user's intent. There
>>> would have to be some toggle to manually specify this, for example, a
>>> new model type or some magical flag.
>>
>> What we actually want to do is to disable csske completely from QEMU and
>> thus from the host-model. Then it would not violate the spec.
>> But this has all kinds of issues (you cannot migrate from older versions
>> of software and machines) although the hardware still can provide the feature.
>>
>> The hardware guys promised me to deprecate things two generations earlier
>> and we usually deprecate things that are not used or where software has a
>> runtime switch.
>>
>> What I hear from you is that you do not want to modify the host-model
>> semantics to something more useful but rather define a new thing (e.g. "host-sane")?
> 
> My take would be to keep the host model consistent, meaning the
> semantics in QEMU exactly match the semantics in Libvirt. It defines the
> maximum CPU model that's runnable under KVM. If a feature is not
> included (e.g., csske) that feature cannot be enabled in any way.
> 
> The "host model" has the semantics of resembling the actual host CPU.
> This is only partially true, because we support some features the host
> might not support (e.g., zPCI IIRC) and obviously don't support all host
> features in QEMU.
> 
> So instead of playing games on the libvirt side with the host model, I
> see the following alternatives:
> 
> 1. Remove the problematic features from the host model in QEMU, like "we
> just don't support this feature". Consequently, any migration of a VM
> with csske=on to a new QEMU version will fail, similar to having an
> older QEMU version without support for a certain feature.
> 
> "host-passthrough" would change between QEMU versions ... which I see as
> problematic.
> 
> 2. Introduce a new CPU model that has these new semantics: "host model"
> - deprecated features. Migration of older VMs with csske=on to a new
> QEMU version will work. Make libvirt use/expand that new CPU model.
> 
> It doesn't necessarily have to be an actual new cpu model. We can use a
> feature group, like "-cpu host,deprecated-features=false". What's
> inside "deprecated-features" will actually change between QEMU versions,
> but we don't really care, as the expanded CPU model won't change.
> 
> "host-passthrough" won't change between QEMU versions ...
> 
> 3. As Daniel suggested, don't use the host model, but a CPU model
> indicated as "suggested".
> 
> The real issue is that in reality, we don't simply always use a model
> like "gen15a", but usually want optional features, if they are around.
> Prime examples are "sie" and friends.
> 
> 
> 
> I tend to prefer 2. With 3. I see issues with optional features like
> "sie" and friends. Often, you really want "give me all you got, but
> disable deprecated features that might cause problems in the future".
> 

Something as hacky as this:

diff --git a/slirp b/slirp
--- a/slirp
+++ b/slirp
@@ -1 +1 @@
-Subproject commit a88d9ace234a24ce1c17189642ef9104799425e0
+Subproject commit a88d9ace234a24ce1c17189642ef9104799425e0-dirty
diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 11e06cc51f..37200989c6 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -708,6 +708,34 @@ static void set_feature_group(Object *obj, Visitor *v, const char *name,
     }
 }
 
+static void set_deprecated_features(Object *obj, Visitor *v, const char *name,
+                                    void *opaque, Error **errp)
+{
+    DeviceState *dev = DEVICE(obj);
+    S390CPU *cpu = S390_CPU(obj);
+    bool value;
+
+    if (dev->realized) {
+        error_setg(errp, "Attempt to set property '%s' on '%s' after "
+                   "it was realized", name, object_get_typename(obj));
+        return;
+    } else if (!cpu->model) {
+        error_setg(errp, "Details about the host CPU model are not available, "
+                         "features cannot be changed.");
+        return;
+    }
+
+    if (!visit_type_bool(v, name, &value, errp)) {
+        return;
+    }
+    if (value) {
+        error_setg(errp, "Group '%s' can only be disabled.", name);
+        return;
+    }
+
+    clear_bit(S390_FEAT_CONDITIONAL_SSKE, cpu->model->features);
+}
+
 static void s390_cpu_model_initfn(Object *obj)
 {
     S390CPU *cpu = S390_CPU(obj);
@@ -823,6 +851,8 @@ void s390_cpu_model_class_register_props(ObjectClass *oc)
     object_class_property_add_bool(oc, "static", get_is_static,
                                    NULL);
     object_class_property_add_str(oc, "description", get_description, NULL);
+    object_class_property_add(oc, "deprecated-features", "bool", NULL,
+                              set_deprecated_features, NULL, NULL);
 
     for (feat = 0; feat < S390_FEAT_MAX; feat++) {
         const S390FeatDef *def = s390_feat_def(feat);

While it's primarily useful for the "host" model, it *might* be useful for
other (older) models as well.

Under TCG:

{ "execute": "query-cpu-model-expansion", "arguments": { "type": "static", "model": { "name": "z14" } } }
{"return": {"model": {"name": "z14-base", "props": {"aen": true, "aefsi": true, "mepoch": true, "msa8": true, "msa7": true, "msa6": true, "msa5": true, "msa4": true, "msa3": true, "msa2": true, "msa1": true, "sthyi": true, "edat": true, "ri": true, "edat2": true, "vx": true, "ipter": true, "mepochptff": true, "vxeh": true, "vxpd": true, "esop": true, "iep": true, "cte": true, "bpb": true, "gs": true, "ppa15": true, "zpci": true, "sea_esop2": true, "te": true, "cmm": true}}}}


{ "execute": "query-cpu-model-expansion", "arguments": { "type": "static", "model": { "name": "z14",  "props": {"deprecated-features":false}} } }
{"return": {"model": {"name": "z14-base", "props": {"aen": true, "aefsi": true, "csske": false, "mepoch": true, "msa8": true, "msa7": true, "msa6": true, "msa5": true, "msa4": true, "msa3": true, "msa2": true, "msa1": true, "sthyi": true, "edat": true, "ri": true, "edat2": true, "vx": true, "ipter": true, "mepochptff": true, "vxeh": true, "vxpd": true, "esop": true, "iep": true, "cte": true, "bpb": true, "gs": true, "ppa15": true, "zpci": true, "sea_esop2": true, "te": true, "cmm": true}}}}

Note the "csske=false" change.
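
For readers who want to see the semantics in isolation, a minimal standalone
sketch follows. It only models the behaviour of the property above, under
stated assumptions: the feature enum, the FEAT_BIT macro, the content of
deprecated_group and the changed_features() helper are all hypothetical, and
a 64-bit mask stands in for QEMU's feature bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature indices for illustration; not QEMU's S390Feat enum. */
enum { FEAT_CSSKE, FEAT_VX, FEAT_GS, FEAT_MAX };
#define FEAT_BIT(f) (UINT64_C(1) << (f))

/* Assumed content of the group; it could grow in later versions. */
static const uint64_t deprecated_group = FEAT_BIT(FEAT_CSSKE);

/*
 * Mirrors the setter's rule that the group can only be disabled.
 * Returns false (leaving 'features' untouched) when asked to enable it.
 */
static bool set_deprecated_features(uint64_t *features, bool value)
{
    assert(features != NULL);
    if (value) {
        return false;
    }
    *features &= ~deprecated_group;
    return true;
}

/* Features whose state differs between two models. */
static uint64_t changed_features(uint64_t before, uint64_t after)
{
    return before ^ after;
}
```

Starting from a model with {csske, vx}, disabling the group leaves {vx}, and
changed_features() reports only csske, matching the single difference between
the two expansions. A model that never had csske is left unchanged.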

-- 
Thanks,

David / dhildenb