A respin/update on the aarch64 KVM cpu models. Also available at gitlab.com/cohuck/qemu arm-cpu-model-rfcv2

Find Eric's original cover letter below, so that I do not need to repeat myself on the aspects that have not changed since RFCv1 :)

Changes from RFCv1:

Rebased on more recent QEMU (some adaptations in the register conversions of the first few patches.)

Based on feedback, I have removed the "custom" cpu model; instead, I have added the new SYSREG_<REG>_<FIELD> properties to the "host" model. This works well if you want to tweak anything that does not correspond to the existing properties for the host model; however, if you e.g. wanted to tweak sve, you have two ways to do so -- we'd probably either want to check for conflicts, or just declare precedence. The kvm-specific props remain unchanged, as they are orthogonal to this configuration.

The cpu model expansion for the "host" model now dumps the new SYSREG_ properties in addition to the existing host model properties; this is a bit ugly, but I don't see a good way to split this up.

Some more adaptations due to the removal of the "custom" model.

Things *not* changed from RFCv1:

SYSREG_ property naming (can be tweaked easily, once we are clear on what the interface should look like.)

Sysreg generation scripts, and the generated files (I have not updated anything there.) I think generating the various definitions makes sense, as long as we double-check the generated files on each update (which would be something to trigger manually anyway.)

What I would like us to reach some kind of consensus on:

How to continue with the patches moving the ID registers from the isar struct into the idregs array. These are a bit of churn to drag along; if they make sense, maybe they can be picked independently of this series?

Whether it makes sense to continue with the approach of tweaking values in the ID registers in general. If we want to be able to migrate between cpus that do not differ wildly, we'll encounter differences that cannot be expressed via FEAT_xxx -- e.g. when comparing various AmpereAltra Max systems, they only differ in parts of CTR_EL0 -- which is not a feature register, but a writable register.

Please take a look, and looking forward to your feedback :)

***********************************************************************

Title: Introduce a customizable aarch64 KVM host model

This RFC series introduces a KVM host "custom" model.

Since kernel v6.7, KVM/arm allows userspace to overwrite the values of a subset of ID regs. The list of writable fields continues to grow. The feature ID range is defined as the AArch64 System register space with op0==3, op1=={0, 1, 3}, CRn==0, CRm=={0-7}, op2=={0-7}.

The custom model uses this capability and allows tuning the host passthrough model by overriding some of the host passthrough ID regs.

The end goal is to get more flexibility when migrating guests between different machines. We would like the upper software layer to be able to detect how tunable the vcpu is on both source and destination and accordingly define a customized KVM host model that can fit both ends. With the legacy host passthrough model, this migration use case would fail.

QEMU queries the host kernel to get the list of writable ID reg fields and exposes all the writable fields as uint64 properties. Those are named "SYSREG_<REG>_<FIELD>". REG and FIELD names are those described in the ARM Architecture Reference Manual and linux arch/arm64/tools/sysreg.
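For reference, the feature ID range above can be expressed as a simple predicate over the system register encoding. This is only an illustrative C sketch; the helper name is hypothetical, not code from the series:

    #include <stdbool.h>
    #include <stdint.h>

    /* Check whether an AArch64 sysreg encoding falls into the writable
     * feature ID range: op0==3, op1 in {0, 1, 3}, CRn==0,
     * CRm in {0..7}, op2 in {0..7}.  Helper name is made up. */
    static bool is_feature_id_reg(uint8_t op0, uint8_t op1, uint8_t crn,
                                  uint8_t crm, uint8_t op2)
    {
        return op0 == 3 &&
               (op1 == 0 || op1 == 1 || op1 == 3) &&
               crn == 0 && crm <= 7 && op2 <= 7;
    }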
Some awk scripts introduced in the series help parse the sysreg file and generate some code. Those scripts are used in a similar way as scripts/update-linux-headers.sh. In case the ABI gets broken, it is still possible to manually edit the generated code. However it is generally expected that the REG and FIELD names are stable.

The list of SYSREG_ID properties can be retrieved through the qmp monitor using query-cpu-model-expansion [2].

The first part of the series mostly consists of migrating id reg storage from named fields in ARMISARegisters to anonymous, index-ordered storage in an IdRegMap struct array. The goal is to have a generic way to store all id registers, also compatible with the way we retrieve their writable capability at kernel level through the KVM_ARM_GET_REG_WRITABLE_MASKS ioctl. Having named fields prevented us from getting this scalability/genericity. Although the change is invasive it is quite straightforward and should be easy to review.

Then the bulk of the job is to retrieve the writable ID fields and match them against a "human readable" description of those fields. We use awk scripts, derived from the kernel's arch/arm64/tools/gen-sysreg.awk (so all the credit to Mark Rutland), that populate a data structure describing all the ID regs in sysreg and their fields. We match writable ID reg fields against the latter and dynamically create a uint64 property.

Then we need to extend the list of id regs read from the host so that we get a chance to let their values be overridden and write them back into KVM.

The expectation is that this custom KVM host model can prepare for the advent of named models. Introducing named models with reduced and explicitly defined features is the next step.

Obviously this series is not able to cope with non-writable ID regs. For instance the problem of MIDR/REVIDR setting is not handled at the moment.

TESTS:
- with a few IDREG fields that can be easily examined from guest userspace:
  -cpu custom,SYSREG_ID_AA64ISAR0_EL1_DP=0x0,SYSREG_ID_AA64ISAR1_EL1_DPB=0x0
- migration between custom models
- TCG A57 non-regressions. Light testing for TCG though. Deep review may detect some mistakes when migrating between named fields and IdRegMap storage
- light testing of introspection. Testing a given writable ID field value with query-cpu-model-expansion is not supported yet.

TODO/QUESTIONS:
- Some idreg named fields are not yet migrated to an array storage. Some of them are not in the isar struct either. Maybe we could have handled TCG and KVM separately and it may turn out that this conversion is unneeded. As it is quite cumbersome I preferred to keep it for a later stage.
- the custom model does not come with legacy host properties such as SVE, MTE, especially those that induce some KVM settings. This needs to be fixed.
- The custom model and its exposed properties depend on the host capabilities. More and more IDREGs become writable, meaning that the custom model gains more properties over time and is host linux dependent. At the moment there is no versioning in place. By default the custom model is a host passthrough model (besides the legacy functions). So if the end-user tries to set a field that is not writable from a kernel pov, it will fail. Nevertheless a versioned custom model could constrain the props exposed, independently of the host linux capabilities.
- the QEMU layer does not take care of IDREG field value consistency. Neither does the kernel.
I imagine this could be the role of the upper layer to implement a vcpu profile that makes sure settings are consistent. Here we come to "named" models. What should they look like on ARM?
- Implementation details:
  - it seems there is a lot of duplication in the code. ID regs are described in different manners, with different data structs, for TCG, now for KVM.
  - The IdRegMap->regs array is sparsely populated. Maybe a better data struct could be used, although this is the one chosen for the kernel uapi.

References:

[1] [PATCH v12 00/11] Support writable CPU ID registers from userspace
https://lore.kernel.org/all/20230609190054.1542113-1-oliver.upton@linux.dev/

[2]
qemu-system-aarch64 -qmp unix:/home/augere/TEST/QEMU/qmp-sock,server,nowait -M virt --enable-kvm -cpu custom
scripts/qmp/qmp-shell /home/augere/TEST/QEMU/qmp-sock
Welcome to the QMP low-level shell!
Connected to QEMU 9.0.50
(QEMU) query-cpu-model-expansion type=full model={"name":"custom"}

[3]
KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES
KVM_ARM_GET_REG_WRITABLE_MASKS
Documentation/virt/kvm/api.rst

[4] linux "sysreg" file
linux/arch/arm64/tools/sysreg and gen-sysreg.awk
./tools/include/generated/asm/sysreg-defs.h

Cornelia Huck (3):
  kvm: kvm_get_writable_id_regs
  arm-qmp-cmds: introspection for ID register props
  arm/cpu-features: document ID reg properties

Eric Auger (17):
  arm/cpu: Add sysreg definitions in cpu-sysregs.h
  arm/cpu: Store aa64isar0 into the idregs array
  arm/cpu: Store aa64isar1/2 into the idregs array
  arm/cpu: Store aa64pfr0/1 into the idregs array
  arm/cpu: Store aa64mmfr0-3 into the idregs array
  arm/cpu: Store aa64dfr0/1 into the idregs array
  arm/cpu: Store aa64smfr0 into the idregs array
  arm/cpu: Store id_isar0-7 into the idregs array
  arm/cpu: Store id_mfr0/1 into the idregs array
  arm/cpu: Store id_dfr0/1 into the idregs array
  arm/cpu: Store id_mmfr0-5 into the idregs array
  arm/cpu: Add infra to handle generated ID register definitions
  arm/cpu: Add sysreg generation scripts
  arm/cpu: Add generated files
  arm/kvm: Allow reading all the writable ID registers
  arm/kvm: write back modified ID regs to KVM
  arm/cpu: more customization for the kvm host cpu model

 docs/system/arm/cpu-features.rst      |  47 +-
 hw/intc/armv7m_nvic.c                 |  27 +-
 scripts/gen-cpu-sysreg-properties.awk | 325 ++++++++++++
 scripts/gen-cpu-sysregs-header.awk    |  47 ++
 scripts/update-aarch64-sysreg-code.sh |  27 +
 target/arm/arm-qmp-cmds.c             |  19 +
 target/arm/cpu-custom.h               |  58 +++
 target/arm/cpu-features.h             | 311 ++++++------
 target/arm/cpu-sysreg-properties.c    | 682 ++++++++++++++++++++++++++
 target/arm/cpu-sysregs.h              | 152 ++++++
 target/arm/cpu.c                      | 123 ++---
 target/arm/cpu.h                      | 120 +++--
 target/arm/cpu64.c                    | 260 +++++++---
 target/arm/helper.c                   |  68 +--
 target/arm/internals.h                |   6 +-
 target/arm/kvm.c                      | 253 +++++++---
 target/arm/kvm_arm.h                  |  16 +-
 target/arm/meson.build                |   1 +
 target/arm/ptw.c                      |   6 +-
 target/arm/tcg/cpu-v7m.c              | 174 +++----
 target/arm/tcg/cpu32.c                | 320 ++++++------
 target/arm/tcg/cpu64.c                | 460 ++++++++--------
 target/arm/trace-events               |   8 +
 23 files changed, 2594 insertions(+), 916 deletions(-)
 create mode 100755 scripts/gen-cpu-sysreg-properties.awk
 create mode 100755 scripts/gen-cpu-sysregs-header.awk
 create mode 100755 scripts/update-aarch64-sysreg-code.sh
 create mode 100644 target/arm/cpu-custom.h
 create mode 100644 target/arm/cpu-sysreg-properties.c
 create mode 100644 target/arm/cpu-sysregs.h

--
2.47.0
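As a rough illustration of the index-ordered IdRegMap storage described in the cover letter, here is a minimal C sketch. Every name, index, and size below is an assumption for illustration only, not the series' actual definitions:

    #include <stdint.h>

    /* Assumed upper bound on the number of tracked ID registers. */
    #define NR_ID_REGS 64

    /* Sparsely populated, index-ordered storage replacing the named
     * fields of ARMISARegisters (sketch only). */
    typedef struct IdRegMap {
        uint64_t regs[NR_ID_REGS];
    } IdRegMap;

    /* Hypothetical generated index; code would use such constants
     * instead of a named field like cpu->isar.id_aa64isar0. */
    enum { ID_AA64ISAR0_EL1_IDX = 0 };

    static inline uint64_t get_idreg(const IdRegMap *m, int idx)
    {
        return m->regs[idx];
    }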
On Fri, 06 Dec 2024 11:21:53 +0000, Cornelia Huck <cohuck@redhat.com> wrote:
>
> A respin/update on the aarch64 KVM cpu models. Also available at gitlab.com/cohuck/qemu arm-cpu-model-rfcv2
>
> Find Eric's original cover letter below, so that I do not need to repeat myself on the aspects that have not changed since RFCv1 :)

Does anyone have a branch containing both this series and Eric's KVM NV support series?

Asking for a friend...

M.

--
Without deviation from the norm, progress is not possible.
Hi Marc,

On 12/17/24 16:21, Marc Zyngier wrote:
> On Fri, 06 Dec 2024 11:21:53 +0000, Cornelia Huck <cohuck@redhat.com> wrote:
>> A respin/update on the aarch64 KVM cpu models. Also available at gitlab.com/cohuck/qemu arm-cpu-model-rfcv2
>>
>> Find Eric's original cover letter below, so that I do not need to repeat myself on the aspects that have not changed since RFCv1 :)
> Does anyone have a branch containing both this series and Eric's KVM NV support series?
>
> Asking for a friend...

I have just assembled https://github.com/eauger/qemu.git v9.0-nv-rfcv4-vcpu-model-v2

Totally untested atm

Eric

>
> M.
>
On Fri, 6 Dec 2024, Cornelia Huck wrote:
> A respin/update on the aarch64 KVM cpu models. Also available at gitlab.com/cohuck/qemu arm-cpu-model-rfcv2
>
> Find Eric's original cover letter below, so that I do not need to repeat myself on the aspects that have not changed since RFCv1 :)
>
> Changes from RFCv1:
>
> Rebased on more recent QEMU (some adaptations in the register conversions of the first few patches.)
>
> Based on feedback, I have removed the "custom" cpu model; instead, I have added the new SYSREG_<REG>_<FIELD> properties to the "host" model. This works well if you want to tweak anything that does not correspond to the existing properties for the host model; however, if you e.g. wanted to tweak sve, you have two ways to do so -- we'd probably either want to check for conflicts, or just declare precedence. The kvm-specific props remain unchanged, as they are orthogonal to this configuration.
>
> The cpu model expansion for the "host" model now dumps the new SYSREG_ properties in addition to the existing host model properties; this is a bit ugly, but I don't see a good way to split this up.
>

I gave this a spin today and successfully migrated a VM between 2 similar machines that only differ in the DIC bit of the cache type register using:

-cpu host,SYSREG_CTR_EL0_DIC=0

This allows me to get rid of my horrid qemu hacks to achieve the same.

Thanks,
Sebastian
On Thu, Dec 12 2024, Sebastian Ott <sebott@redhat.com> wrote:
> On Fri, 6 Dec 2024, Cornelia Huck wrote:
>> A respin/update on the aarch64 KVM cpu models. Also available at gitlab.com/cohuck/qemu arm-cpu-model-rfcv2
>>
>> Find Eric's original cover letter below, so that I do not need to repeat myself on the aspects that have not changed since RFCv1 :)
>>
>> Changes from RFCv1:
>>
>> Rebased on more recent QEMU (some adaptations in the register conversions of the first few patches.)
>>
>> Based on feedback, I have removed the "custom" cpu model; instead, I have added the new SYSREG_<REG>_<FIELD> properties to the "host" model. This works well if you want to tweak anything that does not correspond to the existing properties for the host model; however, if you e.g. wanted to tweak sve, you have two ways to do so -- we'd probably either want to check for conflicts, or just declare precedence. The kvm-specific props remain unchanged, as they are orthogonal to this configuration.
>>
>> The cpu model expansion for the "host" model now dumps the new SYSREG_ properties in addition to the existing host model properties; this is a bit ugly, but I don't see a good way to split this up.
>>
>
> I gave this a spin today and successfully migrated a VM between 2 similar machines that only differ in the DIC bit of the cache type register using:
>
> -cpu host,SYSREG_CTR_EL0_DIC=0
>
> This allows me to get rid of my horrid qemu hacks to achieve the same.

Great, thanks for testing!
Connie,

On 12/6/24 12:21, Cornelia Huck wrote:
> A respin/update on the aarch64 KVM cpu models. Also available at gitlab.com/cohuck/qemu arm-cpu-model-rfcv2
>
> Find Eric's original cover letter below, so that I do not need to repeat myself on the aspects that have not changed since RFCv1 :)
>
> Changes from RFCv1:
>
> Rebased on more recent QEMU (some adaptations in the register conversions of the first few patches.)
>
> Based on feedback, I have removed the "custom" cpu model; instead, I have added the new SYSREG_<REG>_<FIELD> properties to the "host" model. This works well if you want to tweak anything that does not correspond to the existing properties for the host model; however, if you e.g. wanted to tweak sve, you have two ways to do so -- we'd probably either want to check for conflicts, or just declare precedence. The kvm-specific props remain unchanged, as they are orthogonal to this configuration.
>
> The cpu model expansion for the "host" model now dumps the new SYSREG_ properties in addition to the existing host model properties; this is a bit ugly, but I don't see a good way to split this up.
>
> Some more adaptations due to the removal of the "custom" model.
>
> Things *not* changed from RFCv1:
>
> SYSREG_ property naming (can be tweaked easily, once we are clear on what the interface should look like.)
>
> Sysreg generation scripts, and the generated files (I have not updated anything there.) I think generating the various definitions makes sense, as long as we double-check the generated files on each update (which would be something to trigger manually anyway.)
>
> What I would like us to reach some kind of consensus on:
>
> How to continue with the patches moving the ID registers from the isar struct into the idregs array. These are a bit of churn to drag along; if they make sense, maybe they can be picked independently of this series?
>
> Whether it makes sense to continue with the approach of tweaking values in the ID registers in general. If we want to be able to migrate between cpus that do not differ wildly, we'll encounter differences that cannot be expressed via FEAT_xxx -- e.g. when comparing various AmpereAltra Max systems, they only differ in parts of CTR_EL0 -- which is not a feature register, but a writable register.

In v1 most of the commenters said they would prefer to see FEAT props instead of IDREG field props. I think we shall try to go in this direction anyway. As you pointed out there will be some cases where FEAT won't be enough (CTR_EL0 is a good example). So I tend to think the end solution will be a mix of FEAT and ID reg field props.

Personally I would smoothly migrate what we can from ID reg field props to FEAT props (maybe using prop aliases?), starting from the easiest 1-1 mappings and then addressing the FEATs that are more complex but are explicitly needed to enable the use cases we are interested in at Red Hat: migration within the Ampere AltraMax family, migration within the NVIDIA Grace family, migration within the AmpereOne family, and migration between Graviton3/4.

We have no info about others' use cases. If some of you want to see some other live migration combinations addressed, please raise your voice. Some CSPs may have their own LM solution/requirements but they don't use QEMU. So I think we shall concentrate on those use cases.

You did the exercise to identify the most prevalent patterns for FEAT to IDREG field mappings.
I think we should now encode this conversion table for those which are needed in the above use cases.

From a named model point of view, since I do not see much traction upstream besides Red Hat use cases, targeting ARM spec revision baselines may be overkill. Personally I would try to focus on the above models: AltraMax, AmpereOne, Grace, ... Or maybe the ARM cores they may be derived from. According to the discussion we had with Marc in [1] it seems it does not make sense to target migration between very heterogeneous machines, and Dan said we would prefer to avoid adding plenty of feat add-ons to a named model. So I would rather be as close as possible to a specific family definition.

Thanks

Eric

[1] https://lore.kernel.org/all/c879fda9-db5a-4743-805d-03c0acba8060@redhat.com/#r

> Please take a look, and looking forward to your feedback :)
>
> ***********************************************************************
>
> Title: Introduce a customizable aarch64 KVM host model
>
> This RFC series introduces a KVM host "custom" model.
>
> Since kernel v6.7, KVM/arm allows userspace to overwrite the values of a subset of ID regs. The list of writable fields continues to grow. The feature ID range is defined as the AArch64 System register space with op0==3, op1=={0, 1, 3}, CRn==0, CRm=={0-7}, op2=={0-7}.
>
> The custom model uses this capability and allows tuning the host passthrough model by overriding some of the host passthrough ID regs.
>
> The end goal is to get more flexibility when migrating guests between different machines. We would like the upper software layer to be able to detect how tunable the vcpu is on both source and destination and accordingly define a customized KVM host model that can fit both ends. With the legacy host passthrough model, this migration use case would fail.
>
> QEMU queries the host kernel to get the list of writable ID reg fields and exposes all the writable fields as uint64 properties. Those are named "SYSREG_<REG>_<FIELD>". REG and FIELD names are those described in the ARM Architecture Reference Manual and linux arch/arm64/tools/sysreg. Some awk scripts introduced in the series help parse the sysreg file and generate some code. Those scripts are used in a similar way as scripts/update-linux-headers.sh. In case the ABI gets broken, it is still possible to manually edit the generated code. However it is generally expected that the REG and FIELD names are stable.
>
> The list of SYSREG_ID properties can be retrieved through the qmp monitor using query-cpu-model-expansion [2].
>
> The first part of the series mostly consists of migrating id reg storage from named fields in ARMISARegisters to anonymous, index-ordered storage in an IdRegMap struct array. The goal is to have a generic way to store all id registers, also compatible with the way we retrieve their writable capability at kernel level through the KVM_ARM_GET_REG_WRITABLE_MASKS ioctl. Having named fields prevented us from getting this scalability/genericity. Although the change is invasive it is quite straightforward and should be easy to review.
>
> Then the bulk of the job is to retrieve the writable ID fields and match them against a "human readable" description of those fields. We use awk scripts, derived from the kernel's arch/arm64/tools/gen-sysreg.awk (so all the credit to Mark Rutland), that populate a data structure describing all the ID regs in sysreg and their fields.
> We match writable ID reg fields against the latter and dynamically create a uint64 property.
>
> Then we need to extend the list of id regs read from the host so that we get a chance to let their values be overridden and write them back into KVM.
>
> The expectation is that this custom KVM host model can prepare for the advent of named models. Introducing named models with reduced and explicitly defined features is the next step.
>
> Obviously this series is not able to cope with non-writable ID regs. For instance the problem of MIDR/REVIDR setting is not handled at the moment.
>
> TESTS:
> - with a few IDREG fields that can be easily examined from guest userspace:
>   -cpu custom,SYSREG_ID_AA64ISAR0_EL1_DP=0x0,SYSREG_ID_AA64ISAR1_EL1_DPB=0x0
> - migration between custom models
> - TCG A57 non-regressions. Light testing for TCG though. Deep review may detect some mistakes when migrating between named fields and IdRegMap storage
> - light testing of introspection. Testing a given writable ID field value with query-cpu-model-expansion is not supported yet.
>
> TODO/QUESTIONS:
> - Some idreg named fields are not yet migrated to an array storage. Some of them are not in the isar struct either. Maybe we could have handled TCG and KVM separately and it may turn out that this conversion is unneeded. As it is quite cumbersome I preferred to keep it for a later stage.
> - the custom model does not come with legacy host properties such as SVE, MTE, especially those that induce some KVM settings. This needs to be fixed.
> - The custom model and its exposed properties depend on the host capabilities. More and more IDREGs become writable, meaning that the custom model gains more properties over time and is host linux dependent. At the moment there is no versioning in place. By default the custom model is a host passthrough model (besides the legacy functions). So if the end-user tries to set a field that is not writable from a kernel pov, it will fail. Nevertheless a versioned custom model could constrain the props exposed, independently of the host linux capabilities.
> - the QEMU layer does not take care of IDREG field value consistency. Neither does the kernel. I imagine this could be the role of the upper layer to implement a vcpu profile that makes sure settings are consistent. Here we come to "named" models. What should they look like on ARM?
> - Implementation details:
>   - it seems there is a lot of duplication in the code. ID regs are described in different manners, with different data structs, for TCG, now for KVM.
>   - The IdRegMap->regs array is sparsely populated. Maybe a better data struct could be used, although this is the one chosen for the kernel uapi.
>
> References:
>
> [1] [PATCH v12 00/11] Support writable CPU ID registers from userspace
> https://lore.kernel.org/all/20230609190054.1542113-1-oliver.upton@linux.dev/
>
> [2]
> qemu-system-aarch64 -qmp unix:/home/augere/TEST/QEMU/qmp-sock,server,nowait -M virt --enable-kvm -cpu custom
> scripts/qmp/qmp-shell /home/augere/TEST/QEMU/qmp-sock
> Welcome to the QMP low-level shell!
> Connected to QEMU 9.0.50
> (QEMU) query-cpu-model-expansion type=full model={"name":"custom"}
>
> [3]
> KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES
> KVM_ARM_GET_REG_WRITABLE_MASKS
> Documentation/virt/kvm/api.rst
>
> [4] linux "sysreg" file
> linux/arch/arm64/tools/sysreg and gen-sysreg.awk
> ./tools/include/generated/asm/sysreg-defs.h
>
> Cornelia Huck (3):
>   kvm: kvm_get_writable_id_regs
>   arm-qmp-cmds: introspection for ID register props
>   arm/cpu-features: document ID reg properties
>
> Eric Auger (17):
>   arm/cpu: Add sysreg definitions in cpu-sysregs.h
>   arm/cpu: Store aa64isar0 into the idregs array
>   arm/cpu: Store aa64isar1/2 into the idregs array
>   arm/cpu: Store aa64pfr0/1 into the idregs array
>   arm/cpu: Store aa64mmfr0-3 into the idregs array
>   arm/cpu: Store aa64dfr0/1 into the idregs array
>   arm/cpu: Store aa64smfr0 into the idregs array
>   arm/cpu: Store id_isar0-7 into the idregs array
>   arm/cpu: Store id_mfr0/1 into the idregs array
>   arm/cpu: Store id_dfr0/1 into the idregs array
>   arm/cpu: Store id_mmfr0-5 into the idregs array
>   arm/cpu: Add infra to handle generated ID register definitions
>   arm/cpu: Add sysreg generation scripts
>   arm/cpu: Add generated files
>   arm/kvm: Allow reading all the writable ID registers
>   arm/kvm: write back modified ID regs to KVM
>   arm/cpu: more customization for the kvm host cpu model
>
>  docs/system/arm/cpu-features.rst      |  47 +-
>  hw/intc/armv7m_nvic.c                 |  27 +-
>  scripts/gen-cpu-sysreg-properties.awk | 325 ++++++++++++
>  scripts/gen-cpu-sysregs-header.awk    |  47 ++
>  scripts/update-aarch64-sysreg-code.sh |  27 +
>  target/arm/arm-qmp-cmds.c             |  19 +
>  target/arm/cpu-custom.h               |  58 +++
>  target/arm/cpu-features.h             | 311 ++++++------
>  target/arm/cpu-sysreg-properties.c    | 682 ++++++++++++++++++++++++++
>  target/arm/cpu-sysregs.h              | 152 ++++++
>  target/arm/cpu.c                      | 123 ++---
>  target/arm/cpu.h                      | 120 +++--
>  target/arm/cpu64.c                    | 260 +++++++---
>  target/arm/helper.c                   |  68 +--
>  target/arm/internals.h                |   6 +-
>  target/arm/kvm.c                      | 253 +++++++---
>  target/arm/kvm_arm.h                  |  16 +-
>  target/arm/meson.build                |   1 +
>  target/arm/ptw.c                      |   6 +-
>  target/arm/tcg/cpu-v7m.c              | 174 +++----
>  target/arm/tcg/cpu32.c                | 320 ++++++------
>  target/arm/tcg/cpu64.c                | 460 ++++++++--------
>  target/arm/trace-events               |   8 +
>  23 files changed, 2594 insertions(+), 916 deletions(-)
>  create mode 100755 scripts/gen-cpu-sysreg-properties.awk
>  create mode 100755 scripts/gen-cpu-sysregs-header.awk
>  create mode 100755 scripts/update-aarch64-sysreg-code.sh
>  create mode 100644 target/arm/cpu-custom.h
>  create mode 100644 target/arm/cpu-sysreg-properties.c
>  create mode 100644 target/arm/cpu-sysregs.h
>
On Thu, Dec 12 2024, Eric Auger <eric.auger@redhat.com> wrote:

> Connie,
>
> On 12/6/24 12:21, Cornelia Huck wrote:
>> Whether it makes sense to continue with the approach of tweaking values in the ID registers in general. If we want to be able to migrate between cpus that do not differ wildly, we'll encounter differences that cannot be expressed via FEAT_xxx -- e.g. when comparing various AmpereAltra Max systems, they only differ in parts of CTR_EL0 -- which is not a feature register, but a writable register.
> In v1 most of the commenters said they would prefer to see FEAT props instead of IDREG field props. I think we shall try to go in this direction anyway. As you pointed out there will be some cases where FEAT won't be enough (CTR_EL0 is a good example). So I tend to think the end solution will be a mix of FEAT and ID reg field props.

Some analysis of FEAT_xxx mappings: https://lore.kernel.org/qemu-devel/87ikstn8sc.fsf@redhat.com/

(actually, ~190 of the FEAT_xxx map to a single value in a single register, so the mappings are easy other than the sheer amount of them)

We probably should simply not support FEAT_xxx that are solely defined via dependencies.

Some more real-world examples from some cpu pairings I had looked at: https://lore.kernel.org/qemu-devel/87ldx2krdp.fsf@redhat.com/ (but also see Peter's follow-up, the endianness field is actually covered by a feature)

The values-in-registers-not-covered-by-features we are currently aware of are:
- number of breakpoints
- PARange values
- GIC
- some fields in CTR_EL0
(see also https://lore.kernel.org/qemu-devel/4fb49b5b02bb417399ee871b2c85bb35@huawei.com/ for the latter two)

Also, MIDR/REVIDR handling.

Given that we'll need a mix if we support FEAT_xxx, should we mandate the FEAT_xxx syntax if there is a mapping and allow direct specification of register fields only if there is none, or allow them as alternatives (with proper priority handling, or alias handling)?

> Personally I would smoothly migrate what we can from ID reg field props to FEAT props (maybe using prop aliases?), starting from the easiest 1-1 mappings and then addressing the FEATs that are more complex but are explicitly needed to enable the use cases we are interested in at Red Hat: migration within the Ampere AltraMax family, migration within the NVIDIA Grace family, migration within the AmpereOne family, and migration between Graviton3/4.

For these, we'll already need the mix (my examples above all came from these use cases.)

(Of course, the existing legacy props need to be expressed as well. I guess they should map to registers directly.)

> We have no info about others' use cases. If some of you want to see some other live migration combinations addressed, please raise your voice. Some CSPs may have their own LM solution/requirements but they don't use QEMU. So I think we shall concentrate on those use cases.
>
> You did the exercise to identify the most prevalent patterns for FEAT to IDREG field mappings. I think we should now encode this conversion table for those which are needed in the above use cases.
I'd focus on the actually needed features first, as otherwise it's really overwhelming.

> From a named model point of view, since I do not see much traction upstream besides Red Hat use cases, targeting ARM spec revision baselines may be overkill. Personally I would try to focus on the above models: AltraMax, AmpereOne, Grace, ... Or maybe the ARM cores they may be derived from. According to the discussion we had with Marc in [1] it seems it does not make sense to target migration between very heterogeneous machines, and Dan said we would prefer to avoid adding plenty of feat add-ons to a named model. So I would rather be as close as possible to a specific family definition.

Using e.g. Neoverse-V2 as a base currently looks most attractive to me -- going with Armv<x>.<y> would probably give a larger diff (although the diff for Graviton3/4 is pretty large anyway.)

> Thanks
>
> Eric
>
> [1] https://lore.kernel.org/all/c879fda9-db5a-4743-805d-03c0acba8060@redhat.com/#r
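To make the idea of encoding such a FEAT-to-ID-field conversion table concrete, here is a hedged C sketch. The struct and table names are invented for illustration and are not code from the series; the ID_AA64ISAR0_EL1.DP field position (bits [47:44], value >= 1 meaning FEAT_DotProd) is as architecturally defined:

    #include <stdint.h>

    /* Sketch of a FEAT_xxx -> ID register field mapping table;
     * names below are illustrative assumptions, 1-1 mappings only. */
    typedef struct FeatMapping {
        const char *feat_name;  /* architectural feature name */
        const char *reg_name;   /* ID register holding the field */
        int shift, width;       /* field position within the register */
        uint8_t min_value;      /* smallest field value implying the feature */
    } FeatMapping;

    static const FeatMapping feat_map[] = {
        /* FEAT_DotProd: ID_AA64ISAR0_EL1.DP (bits [47:44]) >= 1 */
        { "FEAT_DotProd", "ID_AA64ISAR0_EL1", 44, 4, 1 },
    };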
On Mon, Dec 16 2024, Cornelia Huck <cohuck@redhat.com> wrote:

> On Thu, Dec 12 2024, Eric Auger <eric.auger@redhat.com> wrote:
>
>> Connie,
>>
>> On 12/6/24 12:21, Cornelia Huck wrote:
>>> Whether it makes sense to continue with the approach of tweaking values in the ID registers in general. If we want to be able to migrate between cpus that do not differ wildly, we'll encounter differences that cannot be expressed via FEAT_xxx -- e.g. when comparing various AmpereAltra Max systems, they only differ in parts of CTR_EL0 -- which is not a feature register, but a writable register.
>> In v1 most of the commenters said they would prefer to see FEAT props instead of IDREG field props. I think we shall try to go in this direction anyway. As you pointed out there will be some cases where FEAT won't be enough (CTR_EL0 is a good example). So I tend to think the end solution will be a mix of FEAT and ID reg field props.
>
> Some analysis of FEAT_xxx mappings: https://lore.kernel.org/qemu-devel/87ikstn8sc.fsf@redhat.com/
>
> (actually, ~190 of the FEAT_xxx map to a single value in a single register, so the mappings are easy other than the sheer amount of them)
>
> We probably should simply not support FEAT_xxx that are solely defined via dependencies.
>
> Some more real-world examples from some cpu pairings I had looked at: https://lore.kernel.org/qemu-devel/87ldx2krdp.fsf@redhat.com/ (but also see Peter's follow-up, the endianness field is actually covered by a feature)
>
> The values-in-registers-not-covered-by-features we are currently aware of are:
> - number of breakpoints
> - PARange values
> - GIC
> - some fields in CTR_EL0
> (see also https://lore.kernel.org/qemu-devel/4fb49b5b02bb417399ee871b2c85bb35@huawei.com/ for the latter two)

And the differences in GIC might actually be due to a GICv3 not being configured, together with running a recent kernel, which will zero the field. So we might actually already be able to handle it for most cases.

> Also, MIDR/REVIDR handling.
>
> Given that we'll need a mix if we support FEAT_xxx, should we mandate the FEAT_xxx syntax if there is a mapping and allow direct specification of register fields only if there is none, or allow them as alternatives (with proper priority handling, or alias handling)?
>
>> Personally I would smoothly migrate what we can from ID reg field props to FEAT props (maybe using prop aliases?), starting from the easiest 1-1 mappings and then addressing the FEATs that are more complex but are explicitly needed to enable the use cases we are interested in at Red Hat: migration within the Ampere AltraMax family, migration within the NVIDIA Grace family, migration within the AmpereOne family, and migration between Graviton3/4.
>
> For these, we'll already need the mix (my examples above all came from these use cases.)
>
> (Of course, the existing legacy props need to be expressed as well. I guess they should map to registers directly.)
>
>> We have no info about others' use cases. If some of you want to see some other live migration combinations addressed, please raise your voice. Some CSPs may have their own LM solution/requirements but they don't use QEMU. So I think we shall concentrate on those use cases.
>>
>> You did the exercise to identify the most prevalent patterns for FEAT to IDREG field mappings. I think we should now encode this conversion table for those which are needed in the above use cases.
> I'd focus on the actually needed features first, as otherwise it's really overwhelming.
>
>> From a named model point of view, since I do not see much traction upstream besides Red Hat use cases, targeting ARM spec revision baselines may be overkill. Personally I would try to focus on the above models: AltraMax, AmpereOne, Grace, ... Or maybe the ARM cores they may be derived from. According to the discussion we had with Marc in [1] it seems it does not make sense to target migration between very heterogeneous machines, and Dan said we would prefer to avoid adding plenty of feat add-ons to a named model. So I would rather be as close as possible to a specific family definition.
>
> Using e.g. Neoverse-V2 as a base currently looks most attractive to me -- going with Armv<x>.<y> would probably give a larger diff (although the diff for Graviton3/4 is pretty large anyway.)
>
>> Thanks
>>
>> Eric
>>
>> [1] https://lore.kernel.org/all/c879fda9-db5a-4743-805d-03c0acba8060@redhat.com/#r
On Thu, Dec 12, 2024 at 09:12:33AM +0100, Eric Auger wrote:
> Connie,
>
> On 12/6/24 12:21, Cornelia Huck wrote:
>> A respin/update on the aarch64 KVM cpu models. Also available at gitlab.com/cohuck/qemu arm-cpu-model-rfcv2

snip

> From a named model point of view, since I do not see much traction upstream besides Red Hat use cases, targeting ARM spec revision baselines may be overkill. Personally I would try to focus on the above models: AltraMax, AmpereOne, Grace, ... Or maybe the ARM cores they may be derived from.

If we target modelling of vendor-named CPU models, then beware that we're opening the door to a very large set (potentially unbounded) of named CPU models over time. If we target ARM spec baselines then the set of named CPU models is fairly modest and grows slowly.

Including ARM spec baselines will probably reduce the demand for adding vendor-specific named models, though I expect we'll still end up wanting some, or possibly even many.

Having some common baseline models is likely useful for mgmt applications in other ways though.

Consider your mgmt app wants to set a CPU model that's common across heterogeneous hardware. They don't necessarily want/need to be able to live migrate between heterogeneous CPUs, but for simplicity of configuration desire to set a single named CPU across all guests, irrespective of what host they are launched on. The ARM spec baseline named models would give you that config simplicity.

With regards,
Daniel

--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Thu, Dec 12 2024, Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Thu, Dec 12, 2024 at 09:12:33AM +0100, Eric Auger wrote:
>> Connie,
>>
>> On 12/6/24 12:21, Cornelia Huck wrote:
>>> A respin/update on the aarch64 KVM cpu models. Also available at gitlab.com/cohuck/qemu arm-cpu-model-rfcv2
>
> snip
>
>> From a named model point of view, since I do not see much traction upstream besides Red Hat use cases, targeting ARM spec revision baselines may be overkill. Personally I would try to focus on the above models: AltraMax, AmpereOne, Grace, ... Or maybe the ARM cores they may be derived from.
>
> If we target modelling of vendor-named CPU models, then beware that we're opening the door to a very large set (potentially unbounded) of named CPU models over time. If we target ARM spec baselines then the set of named CPU models is fairly modest and grows slowly.
>
> Including ARM spec baselines will probably reduce the demand for adding vendor-specific named models, though I expect we'll still end up wanting some, or possibly even many.
>
> Having some common baseline models is likely useful for mgmt applications in other ways though.
>
> Consider your mgmt app wants to set a CPU model that's common across heterogeneous hardware. They don't necessarily want/need to be able to live migrate between heterogeneous CPUs, but for simplicity of configuration desire to set a single named CPU across all guests, irrespective of what host they are launched on. The ARM spec baseline named models would give you that config simplicity.

If we use architecture extensions (i.e. Armv8.x/9.x) as baseline, I'm seeing some drawbacks:
- a lot of work before we can address some specific use cases
- old models can get new optional features
- a specific cpu might have a huge set of optional features on top of the baseline model

Using a reference core such as Neoverse-V2 probably makes more sense (easier to get started, less feature diff?) It would still make a good starting point for a simple config.
On 12/12/24 10:36, Cornelia Huck wrote:
> On Thu, Dec 12 2024, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
>> On Thu, Dec 12, 2024 at 09:12:33AM +0100, Eric Auger wrote:
>>> Connie,
>>>
>>> On 12/6/24 12:21, Cornelia Huck wrote:
>>>> A respin/update on the aarch64 KVM cpu models. Also available at gitlab.com/cohuck/qemu arm-cpu-model-rfcv2
>> snip
>>
>>> From a named model point of view, since I do not see much traction upstream besides Red Hat use cases, targeting ARM spec revision baselines may be overkill. Personally I would try to focus on the above models: AltraMax, AmpereOne, Grace, ... Or maybe the ARM cores they may be derived from.
>> If we target modelling of vendor-named CPU models, then beware that we're opening the door to a very large set (potentially unbounded) of named CPU models over time. If we target ARM spec baselines then the set of named CPU models is fairly modest and grows slowly.
>>
>> Including ARM spec baselines will probably reduce the demand for adding vendor-specific named models, though I expect we'll still end up wanting some, or possibly even many.
>>
>> Having some common baseline models is likely useful for mgmt applications in other ways though.
>>
>> Consider your mgmt app wants to set a CPU model that's common across heterogeneous hardware. They don't necessarily want/need to be able to live migrate between heterogeneous CPUs, but for simplicity of configuration desire to set a single named CPU across all guests, irrespective of what host they are launched on. The ARM spec baseline named models would give you that config simplicity.
> If we use architecture extensions (i.e. Armv8.x/9.x) as baseline, I'm seeing some drawbacks:
> - a lot of work before we can address some specific use cases
> - old models can get new optional features
> - a specific cpu might have a huge set of optional features on top of the baseline model
>
> Using a reference core such as Neoverse-V2 probably makes more sense (easier to get started, less feature diff?) It would still make a good starting point for a simple config.
>
Actually from a dev point of view I am not sure it changes much to have either an ARM spec rev baseline or a CPU ref core named model.

One remark is that if you look at https://developer.arm.com/documentation/109697/2024_09?lang=en you will see there are quite a lot of spec revisions, and quite a few of them are actually meaningful in the light of currently available and relevant HW we want to address. What I would like to avoid is being obliged to look at all of them in a generic manner while we just want to address a few cpu ref models.

Also, starting from the ARM spec rev baseline the end-user may need to add more feature opt-ins to be close to a specific cpu model. So I foresee extra complexity for the end-user.

But again, from a dev pov it shouldn't change much and we should end up with a proto that illustrates the working model.

Eric
On Thu, Dec 12, 2024 at 11:04:30AM +0100, Eric Auger wrote:

Hi Eric,

> On 12/12/24 10:36, Cornelia Huck wrote:
>> On Thu, Dec 12 2024, Daniel P. Berrangé <berrange@redhat.com> wrote:

[...]

>>> Consider your mgmt app wants to set a CPU model that's common across heterogeneous hardware. They don't necessarily want/need to be able to live migrate between heterogeneous CPUs, but for simplicity of configuration desire to set a single named CPU across all guests, irrespective of what host they are launched on. The ARM spec baseline named models would give you that config simplicity.
>> If we use architecture extensions (i.e. Armv8.x/9.x) as baseline, I'm seeing some drawbacks:
>> - a lot of work before we can address some specific use cases
>> - old models can get new optional features
>> - a specific cpu might have a huge set of optional features on top of the baseline model
>>
>> Using a reference core such as Neoverse-V2 probably makes more sense (easier to get started, less feature diff?) It would still make a good starting point for a simple config.
>
> Actually from a dev point of view I am not sure it changes much to have either an ARM spec rev baseline or a CPU ref core named model.
>
> One remark is that if you look at https://developer.arm.com/documentation/109697/2024_09?lang=en you will see there are quite a lot of spec revisions, and quite a few of them are actually meaningful in the light of currently available and relevant HW we want to address. What I would like to avoid is being obliged to look at all of them in a generic manner while we just want to address a few cpu ref models.
>
> Also, starting from the ARM spec rev baseline the end-user may need to add more feature opt-ins to be close to a specific cpu model. So I foresee extra complexity for the end-user.

(Assuming I'm parsing your last para right; correct me if not.)

Isn't a user wanting to add extra CPU flags (on top of a baseline) a "normal behaviour" and not "extra complexity"? Besides coming close to a specific CPU model, there's the additional important use-case of CPU flags that provide security mitigation.

Consider this:

Say, there's a serious security issue in a released ARM CPU. As part of the fix, two new CPU flags need to be exposed to the guest OS; call them "secflag1" and "secflag2". Here, the user is configuring a baseline model + two extra CPU flags, not to get close to some other CPU model but to mitigate itself against a serious security flaw.

An example that comes to mind is the infamous "speculative store bypass" (SSB) vulnerability and how QEMU addressed it[1][2] on x86. I'm sure, as you know, it also affects ARM[3].

[1] https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg04796.html — i386: define the 'ssbd' CPUID feature bit (CVE-2018-3639)
[2] https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg04797.html — i386: Define the Virt SSBD MSR and handling of it (CVE-2018-3639)
[3] https://developer.arm.com/documentation/ddi0601/2024-06/AArch64-Registers/SSBS--Speculative-Store-Bypass-Safe

--
/kashyap
On Thu, 19 Dec 2024 11:35:16 +0000, Kashyap Chamarthy <kchamart@redhat.com> wrote:
>
> On Thu, Dec 12, 2024 at 11:04:30AM +0100, Eric Auger wrote:
>
> Hi Eric,
>
>> On 12/12/24 10:36, Cornelia Huck wrote:
>>> On Thu, Dec 12 2024, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> [...]
>
>>>> Consider your mgmt app wants to set a CPU model that's common across heterogeneous hardware. They don't necessarily want/need to be able to live migrate between heterogeneous CPUs, but for simplicity of configuration desire to set a single named CPU across all guests, irrespective of what host they are launched on. The ARM spec baseline named models would give you that config simplicity.
>>> If we use architecture extensions (i.e. Armv8.x/9.x) as baseline, I'm seeing some drawbacks:
>>> - a lot of work before we can address some specific use cases
>>> - old models can get new optional features
>>> - a specific cpu might have a huge set of optional features on top of the baseline model
>>>
>>> Using a reference core such as Neoverse-V2 probably makes more sense (easier to get started, less feature diff?) It would still make a good starting point for a simple config.
>>
>> Actually from a dev point of view I am not sure it changes much to have either an ARM spec rev baseline or a CPU ref core named model.
>>
>> One remark is that if you look at https://developer.arm.com/documentation/109697/2024_09?lang=en you will see there are quite a lot of spec revisions, and quite a few of them are actually meaningful in the light of currently available and relevant HW we want to address. What I would like to avoid is being obliged to look at all of them in a generic manner while we just want to address a few cpu ref models.
>>
>> Also, starting from the ARM spec rev baseline the end-user may need to add more feature opt-ins to be close to a specific cpu model. So I foresee extra complexity for the end-user.
>
> (Assuming I'm parsing your last para right; correct me if not.)
>
> Isn't a user wanting to add extra CPU flags (on top of a baseline) a "normal behaviour" and not "extra complexity"? Besides coming close to a specific CPU model, there's the additional important use-case of CPU flags that provide security mitigation.
>
> Consider this:
>
> Say, there's a serious security issue in a released ARM CPU. As part of the fix, two new CPU flags need to be exposed to the guest OS; call them "secflag1" and "secflag2". Here, the user is configuring a baseline model + two extra CPU flags, not to get close to some other CPU model but to mitigate itself against a serious security flaw.

If there's such a security issue, that's the hypervisor's job to handle, not userspace's. See what KVM does for CSV3, for example (and all the rest of the side-channel stuff).

You can't rely on userspace for security, that'd be completely ludicrous.

M.

--
Without deviation from the norm, progress is not possible.
On Thu, Dec 19, 2024 at 12:26:29PM +0000, Marc Zyngier wrote:
> On Thu, 19 Dec 2024 11:35:16 +0000, Kashyap Chamarthy <kchamart@redhat.com> wrote:
>>
>> On Thu, Dec 12, 2024 at 11:04:30AM +0100, Eric Auger wrote:
>>
>> Hi Eric,
>>
>>> On 12/12/24 10:36, Cornelia Huck wrote:
>>>> On Thu, Dec 12 2024, Daniel P. Berrangé <berrange@redhat.com> wrote:
>>
>> [...]
>>
>>>>> Consider your mgmt app wants to set a CPU model that's common across heterogeneous hardware. They don't necessarily want/need to be able to live migrate between heterogeneous CPUs, but for simplicity of configuration desire to set a single named CPU across all guests, irrespective of what host they are launched on. The ARM spec baseline named models would give you that config simplicity.
>>>> If we use architecture extensions (i.e. Armv8.x/9.x) as baseline, I'm seeing some drawbacks:
>>>> - a lot of work before we can address some specific use cases
>>>> - old models can get new optional features
>>>> - a specific cpu might have a huge set of optional features on top of the baseline model
>>>>
>>>> Using a reference core such as Neoverse-V2 probably makes more sense (easier to get started, less feature diff?) It would still make a good starting point for a simple config.
>>>
>>> Actually from a dev point of view I am not sure it changes much to have either an ARM spec rev baseline or a CPU ref core named model.
>>>
>>> One remark is that if you look at https://developer.arm.com/documentation/109697/2024_09?lang=en you will see there are quite a lot of spec revisions, and quite a few of them are actually meaningful in the light of currently available and relevant HW we want to address. What I would like to avoid is being obliged to look at all of them in a generic manner while we just want to address a few cpu ref models.
>>>
>>> Also, starting from the ARM spec rev baseline the end-user may need to add more feature opt-ins to be close to a specific cpu model. So I foresee extra complexity for the end-user.
>>
>> (Assuming I'm parsing your last para right; correct me if not.)
>>
>> Isn't a user wanting to add extra CPU flags (on top of a baseline) a "normal behaviour" and not "extra complexity"? Besides coming close to a specific CPU model, there's the additional important use-case of CPU flags that provide security mitigation.
>>
>> Consider this:
>>
>> Say, there's a serious security issue in a released ARM CPU. As part of the fix, two new CPU flags need to be exposed to the guest OS; call them "secflag1" and "secflag2". Here, the user is configuring a baseline model + two extra CPU flags, not to get close to some other CPU model but to mitigate itself against a serious security flaw.
>
> If there's such a security issue, that's the hypervisor's job to handle, not userspace's. See what KVM does for CSV3, for example (and all the rest of the side-channel stuff).
>
> You can't rely on userspace for security, that'd be completely ludicrous.

Actually that's a normal situation QEMU has to deal with.

QEMU needs to be able to expose a deterministic fixed ABI to the guest VM, and that includes control over what CPU features are exposed to it. In most cases, the hypervisor cannot arbitrarily force-enable new guest features without agreement from QEMU.
If a guest happens to be using '-cpu host', then when a new CPU flag arrives as part of a security fix, there is at least no CPU config change required. QEMU may or may not need changes, in order that the behaviour associated with the new CPU flag is correctly handled.

If the guest is using a named CPU model, as well as modifying QEMU to know about the new flag, the host admin needs to explicitly decide whether & when to expose the new CPU flag for each guest VM on the host.

Until the new CPU flag is exposed to the guest, while the host itself may be able to remain protected against the new security issue, the guest OS is likely to remain vulnerable, or have degraded operation in some way.

With regards,
Daniel

--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Thu, 19 Dec 2024 12:38:50 +0000, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Thu, Dec 19, 2024 at 12:26:29PM +0000, Marc Zyngier wrote:
>> On Thu, 19 Dec 2024 11:35:16 +0000, Kashyap Chamarthy <kchamart@redhat.com> wrote:
>>>
>>> On Thu, Dec 12, 2024 at 11:04:30AM +0100, Eric Auger wrote:
>>>
>>> Hi Eric,
>>>
>>>> On 12/12/24 10:36, Cornelia Huck wrote:
>>>>> On Thu, Dec 12 2024, Daniel P. Berrangé <berrange@redhat.com> wrote:
>>>
>>> [...]
>>>
>>>>>> Consider your mgmt app wants to set a CPU model that's common across heterogeneous hardware. They don't necessarily want/need to be able to live migrate between heterogeneous CPUs, but for simplicity of configuration desire to set a single named CPU across all guests, irrespective of what host they are launched on. The ARM spec baseline named models would give you that config simplicity.
>>>>> If we use architecture extensions (i.e. Armv8.x/9.x) as baseline, I'm seeing some drawbacks:
>>>>> - a lot of work before we can address some specific use cases
>>>>> - old models can get new optional features
>>>>> - a specific cpu might have a huge set of optional features on top of the baseline model
>>>>>
>>>>> Using a reference core such as Neoverse-V2 probably makes more sense (easier to get started, less feature diff?) It would still make a good starting point for a simple config.
>>>>
>>>> Actually from a dev point of view I am not sure it changes much to have either an ARM spec rev baseline or a CPU ref core named model.
>>>>
>>>> One remark is that if you look at https://developer.arm.com/documentation/109697/2024_09?lang=en you will see there are quite a lot of spec revisions, and quite a few of them are actually meaningful in the light of currently available and relevant HW we want to address. What I would like to avoid is being obliged to look at all of them in a generic manner while we just want to address a few cpu ref models.
>>>>
>>>> Also, starting from the ARM spec rev baseline the end-user may need to add more feature opt-ins to be close to a specific cpu model. So I foresee extra complexity for the end-user.
>>>
>>> (Assuming I'm parsing your last para right; correct me if not.)
>>>
>>> Isn't a user wanting to add extra CPU flags (on top of a baseline) a "normal behaviour" and not "extra complexity"? Besides coming close to a specific CPU model, there's the additional important use-case of CPU flags that provide security mitigation.
>>>
>>> Consider this:
>>>
>>> Say, there's a serious security issue in a released ARM CPU. As part of the fix, two new CPU flags need to be exposed to the guest OS; call them "secflag1" and "secflag2". Here, the user is configuring a baseline model + two extra CPU flags, not to get close to some other CPU model but to mitigate itself against a serious security flaw.
>>
>> If there's such a security issue, that's the hypervisor's job to handle, not userspace's. See what KVM does for CSV3, for example (and all the rest of the side-channel stuff).
>>
>> You can't rely on userspace for security, that'd be completely ludicrous.
>
> Actually that's a normal situation QEMU has to deal with.
> > QEMU needs to be able to expose a deterministic fixed ABI to the guest > VM, and that includes control over what CPU features are exposed to > it. In most cases, the hypervisor cannot arbitrarily force enable new > guest features without agreement from QEMU. Which ABI? The only ABI that matters is what is defined by the architecture. When it comes to CPU features, new features are exposed by default. If QEMU wants to turn it off, it can in most (but not all) cases. But that's the extent of the "agreement" we have with userspace, QEMU or otherwise. If a feature is deemed broken or unsafe, KVM will at least hide it from the guest without userspace's intervention, and if possible actively turn it off. > If a guest happens to be using '-cpu host', then when a new CPU flag > arrives as part of a security fix, there is at least no CPU config > change required. QEMU may or may not need changes, in order that > the behaviour associated with the new CPU flag is correctly handled. How is that "flag" visible from the guest? The only way to expose properties is through the ID registers, and you can't invent your own, nor expose something that is not already handled by the host. > If the guest is using a named CPU model, as well as modifying QEMU > to know about the new flag, the host admin needs to explicitly > decide whether & when to expose the new CPU flag for each guest VM > on the host. > > Until the new CPU flag is exposed to the guest, while the host itself > may be able to remain protected against the new security issue, the guest > OS is likely to remain vulnerable, or have degraded operation in some way. I think that's the point where we talk past each other. There is no "flag" that can be exposed to a guest as part of the architecture. We have a set of architectural features, and in 99% of the cases, we can only expose to the guest a feature that both exists on the host and that the hypervisor understands. Thanks, M. -- Without deviation from the norm, progress is not possible.
On Thu, Dec 19, 2024 at 12:26:29PM +0000, Marc Zyngier wrote: > On Thu, 19 Dec 2024 11:35:16 +0000, > Kashyap Chamarthy <kchamart@redhat.com> wrote: [...] > > Consider this: > > > > Say, there's a serious security issue in a released ARM CPU. As part of > > the fix, two new CPU flags need to be exposed to the guest OS, call them > > "secflag1" and "secflag2". Here, the user is configuring a baseline > > model + two extra CPU flags, not to get close to some other CPU model > > but to mitigate itself against a serious security flaw. > > If there's such a security issue, that's the hypervisor's job to do so, > not userspace. I don't disagree. Probably that has always been the case on ARM. I asked the above based on how QEMU on x86 handles it today. > See what KVM does for CSV3, for example (and all the > rest of the side-channel stuff). Noted. From a quick look in the kernel tree, I assume you're referring to these commits[1]. > You can't rely on userspace for security, that'd be completely > ludicrous. As Dan Berrangé points out, it's the bog-standard way QEMU deals with some of the CPU-related issues on x86 today. See this "important CPU flags"[2] section in the QEMU docs. Mind you, I'm _not_ saying this is how ARM should do it. I don't know enough about ARM to make such remarks. * * * To reply to your other question on this thread[3] about "which ABI?" I think Dan is talking about the *guest* ABI: the virtual "chipset" that is exposed to a guest (e.g. PCI(e) topology, ACPI tables, CPU model, etc). As I understand it, this "guest ABI" should remain predictable, regardless of: - whether you're updating KVM, QEMU, or the underlying physical hardware itself; or - if the guest is migrated, live or offline (As you might know, QEMU's "machine types" concept allows creating a stable guest ABI.) [1] "CSV3"-related commits: - 471470bc7052 (arm64: errata: Add Cortex-A520 speculative unprivileged load workaround, 2023-09-21) - 4f1df628d4ec (KVM: arm64: Advertise ID_AA64PFR0_EL1.CSV3=1 if the CPUs are Meltdown-safe, 2020-11-26) [2] https://www.qemu.org/docs/master/system/i386/cpu.html#important-cpu-features-for-intel-x86-hosts - "Important CPU features for Intel x86 hosts" [3] https://lists.nongnu.org/archive/html/qemu-arm/2024-12/msg01224.html -- /kashyap
On Thu, 19 Dec 2024 15:07:25 +0000, Kashyap Chamarthy <kchamart@redhat.com> wrote: > > On Thu, Dec 19, 2024 at 12:26:29PM +0000, Marc Zyngier wrote: > > On Thu, 19 Dec 2024 11:35:16 +0000, > > Kashyap Chamarthy <kchamart@redhat.com> wrote: > > [...] > > > > Consider this: > > > > > > Say, there's a serious security issue in a released ARM CPU. As part of > > > the fix, two new CPU flags need to be exposed to the guest OS, call them > > > "secflag1" and "secflag2". Here, the user is configuring a baseline > > > model + two extra CPU flags, not to get close to some other CPU model > > > but to mitigate itself against a serious security flaw. > > > > If there's such a security issue, that's the hypervisor's job to do so, > > not userspace. > > I don't disagree. Probably that has always been the case on ARM. I > asked the above based on how QEMU on x86 handles it today. > > > See what KVM does for CSV3, for example (and all the > > rest of the side-channel stuff). > > Noted. From a quick look in the kernel tree, I assume you're referring > to these commits[1]. > > > You can't rely on userspace for security, that'd be completely > > ludicrous. > > As Dan Berrangé points out, it's the bog-standard way QEMU deals with > some of the CPU-related issues on x86 today. See this "important CPU > flags"[2] section in the QEMU docs. I had a look, and we do things quite differently. For example, the spec-ctrl equivalent is implemented in FW and in KVM, and is exposed by default if the HW is vulnerable. Userspace could hide that the mitigation is there, but that's the extent of the configurability. > > Mind you, I'm _not_ saying this is how ARM should do it. I don't know > enough about ARM to make such remarks. > > * * * > > To reply to your other question on this thread[3] about "which ABI?" I > think Dan is talking about the *guest* ABI: the virtual "chipset" that > is exposed to a guest (e.g. PCI(e) topology, ACPI tables, CPU model, > etc). As I understand it, this "guest ABI" should remain predictable, > regardless of: > > - whether you're updating KVM, QEMU, or the underlying physical > hardware itself; or > - if the guest is migrated, live or offline > > (As you might know, QEMU's "machine types" concept allows creating a > stable guest ABI.) All of this is under control of QEMU, *except* for the "maximum" of the architectural features exposed to the guest. All you can do is *downgrade* from there, and only to a limited extent. That, in turn, has a direct impact on what you call the "CPU model", which for the ARM architecture really doesn't exist. All we have is a bag of discrete features, with intricate dependencies between them. Even ignoring virtualisation: you can readily find two machines using the same CPUs (let's say Neoverse-N1), integrated by the same vendor (let's say, Ampere), in SoCs that bear the same name (Altra), and realise that they have a different feature set. Fun, isn't it? That's why I don't see CPU models as a viable thing in terms of ABI. They are an approximation of what you could have, but the ABI is elsewhere. Thanks, M. -- Without deviation from the norm, progress is not possible.
On Thu, Dec 19, 2024 at 03:41:56PM +0000, Marc Zyngier wrote: > On Thu, 19 Dec 2024 15:07:25 +0000, > Kashyap Chamarthy <kchamart@redhat.com> wrote: > > > > On Thu, Dec 19, 2024 at 12:26:29PM +0000, Marc Zyngier wrote: > > > On Thu, 19 Dec 2024 11:35:16 +0000, > > > Kashyap Chamarthy <kchamart@redhat.com> wrote: [...] > > > You can't rely on userspace for security, that'd be completely > > > ludicrous. > > > > As Dan Berrangé points out, it's the bog-standard way QEMU deals with > > some of the CPU-related issues on x86 today. See this "important CPU > > flags"[2] section in the QEMU docs. > > I had a look, and we do things quite differently. For example, the > spec-ctrl equivalent in implemented in FW and in KVM, and is exposed > by default if the HW is vulnerable. Userspace could hide that the > mitigation is there, but that's the extent of the configurability. Noted. As Dan says, as long as QEMU can toggle the feature on/off, then that might be sufficient in the context of migratability. [...] > > To reply to your other question on this thread[3] about "which ABI?" I > > think Dan is talking about the *guest* ABI: the virtual "chipset" that > > is exposed to a guest (e.g. PCI(e) topology, ACPI tables, CPU model, > > etc). As I understand it, this "guest ABI" should remain predictable, > > regardless of: > > > > - whether you're updating KVM, QEMU, or the underlying physical > > hardware itself; or > > - if the guest is migrated, live or offline > > > > (As you might know, QEMU's "machine types" concept allows to create a > > stable guest ABI.) > > All of this is under control of QEMU, *except* for the "maximum" of > the architectural features exposed to the guest. All you can do is > *downgrade* from there, and only to a limited extent. > > That, in turn has a direct impact on what you call the "CPU model", > which for the ARM architecture really doesn't exist. All we have is a > bag of discrete features, with intricate dependencies between them. I see; thanks for this explanation. Your last sentence above is the shortest summary of the CPU features situation on ARM I've ever read so far. So, I infer this from what you're saying (do correct if it's wrong): • Currently it is impractical (not feasible?) to pull together a minimal-and-usable set of CPU features + their dependencies on ARM to come up with a "CPU model" that can work across a reasonable set of hardware. • If the above is true, then the ability to toggle CPU features on and off might become even more important for QEMU — if it wants to be able to support live migration across mixed set of hardware on ARM. NB: by "mixed set of hardware", I mean hardware that is *close enough* (e.g. among the "Ampere Altra Family" - BTW, this "family" seems to be only 2 systems far). Not arbitrarily mixed. I did read your response in this thread about "who in their right mind" would want to migrate from Nvidia "Grace" to "AmpereOne". https://lore.kernel.org/linux-arm-kernel/86y10ytpo6.wl-maz@kernel.org/ — KVM: arm64: Make the exposed feature bits in AA64DFR0_EL1 writable from userspace > Even ignoring virtualisation: you can readily find two machines using > the same CPUs (let's say Neoverse-N1), integrated by the same vendor > (let's say, Ampere), in SoCs that bear the same name (Altra), and > realise that they have a different feature set. Fun, isn't it? Yikes! 
I would use a different word, one that starts with "m" and ends with "s" (the resulting word rhymes with the latter) ;-) * * * Related tangent on CPU feature discoverability on ARM: Speaking of "Neoverse-N1", looking at a system that I have access to, the `lscpu` output does not say anything about who the integrator is; it only says: ... Vendor ID: ARM Model name: Neoverse-N1 ... I realize `lscpu` displays only whatever the kernel knows. Nothing in `dmidecode` either. Also, it looks like there's no equivalent of a "CPUID" instruction (I realize it is x86-specific) on ARM. However, I came across a Google Git repo that seems to implement a bespoke "aarch64_cpuid". From what I see, it seems to fetch the "Main ID Register" (MIDR_EL1) - I don't know enough about it to understand its implications: https://github.com/google/cpu_features/blob/main/src/impl_aarch64_cpuid.c > That's why I don't see CPU models as a viable thing in terms of ABI. > They are an approximation of what you could have, but the ABI is > elsewhere. Hmm, this is "significant new information" for me. If CPU models can't be part of the guest ABI on ARM, then the whole "migratability across heterogeneous hardware" on QEMU requires deeper thinking. Thanks for this discussion. -- /kashyap
On Fri, 20 Dec 2024 11:52:51 +0000, Kashyap Chamarthy <kchamart@redhat.com> wrote: > > On Thu, Dec 19, 2024 at 03:41:56PM +0000, Marc Zyngier wrote: > > On Thu, 19 Dec 2024 15:07:25 +0000, > > Kashyap Chamarthy <kchamart@redhat.com> wrote: > > > > > > On Thu, Dec 19, 2024 at 12:26:29PM +0000, Marc Zyngier wrote: > > > > On Thu, 19 Dec 2024 11:35:16 +0000, > > > > Kashyap Chamarthy <kchamart@redhat.com> wrote: > > [...] > > > > > You can't rely on userspace for security, that'd be completely > > > > ludicrous. > > > > > > As Dan Berrangé points out, it's the bog-standard way QEMU deals with > > > some of the CPU-related issues on x86 today. See this "important CPU > > > flags"[2] section in the QEMU docs. > > > > I had a look, and we do things quite differently. For example, the > > spec-ctrl equivalent in implemented in FW and in KVM, and is exposed > > by default if the HW is vulnerable. Userspace could hide that the > > mitigation is there, but that's the extent of the configurability. > > Noted. As Dan says, as long as QEMU can toggle the feature on/off, then > that might be sufficient in the context of migratability. > > [...] > > > > To reply to your other question on this thread[3] about "which ABI?" I > > > think Dan is talking about the *guest* ABI: the virtual "chipset" that > > > is exposed to a guest (e.g. PCI(e) topology, ACPI tables, CPU model, > > > etc). As I understand it, this "guest ABI" should remain predictable, > > > regardless of: > > > > > > - whether you're updating KVM, QEMU, or the underlying physical > > > hardware itself; or > > > - if the guest is migrated, live or offline > > > > > > (As you might know, QEMU's "machine types" concept allows to create a > > > stable guest ABI.) > > > > All of this is under control of QEMU, *except* for the "maximum" of > > the architectural features exposed to the guest. All you can do is > > *downgrade* from there, and only to a limited extent. > > > > That, in turn has a direct impact on what you call the "CPU model", > > which for the ARM architecture really doesn't exist. All we have is a > > bag of discrete features, with intricate dependencies between them. > > I see; thanks for this explanation. Your last sentence above is the > shortest summary of the CPU features situation on ARM I've ever read so > far. > > So, I infer this from what you're saying (do correct if it's wrong): > > • Currently it is impractical (not feasible?) to pull together a > minimal-and-usable set of CPU features + their dependencies on ARM > to come up with a "CPU model" that can work across a reasonable set > of hardware. It isn't quite that. It *is* technically possible, and KVM does give you the tools you need for that. In practice, the diversity of the ecosystem is so huge that you can only rely on some very basic stuff unless the implementations are already very close. And that "small details" such as the timer frequency are strictly identical. > > • If the above is true, then the ability to toggle CPU features on and > off might become even more important for QEMU — if it wants to be > able to support live migration across mixed set of hardware on ARM. Turning CPU features off is not always possible. Hiding them is generally possible, with a number of exceptions. We try our best to provide both, but it's... complicated. [...] 
> Related tangent on CPU feature discoverability on ARM: > > Speaking of "Neoverse-N1", looking at a system that I have access to, > the `lscpu` output does not say anything about who the integrator is; it > only says: > > ... > Vendor ID: ARM > Model name: Neoverse-N1 > ... > > I realize `lscpu` displays only whatever the kernel knows. Nothing in > `dmidecode` either. The kernel does not know anything about the "Neoverse-N1" string. It can match some MIDR_EL1 values for errata workaround purposes, but doesn't give two hoots about a human readable string. Every other year, we get asked to add a full database of strings in the kernel. The answer is a simple, polite, and final "no way". This serves no purpose at all. lscpu does have that database, and that's the right place to do it. When it comes to integration, the firmware can optionally report some information, which is the EL3 version of a commercial break (see the SOC_ID stuff). This isn't widely deployed, thankfully. > Also, it looks like there's no equivalent of a "CPUID" instruction (I > realize it is x86-specific) on ARM. However, I came across a Google > Git repo that seems to implement a bespoke "aarch64_cpuid". From > what I see, it seems to fetch the "Main ID Register" (MIDR_EL1) - I > don't know enough about it to understand its implications: > > https://github.com/google/cpu_features/blob/main/src/impl_aarch64_cpuid.c MIDR_EL1 doesn't give you much, and you cannot assume anything about the feature set from it. Linux already allows you to inspect the ID registers from userspace (by trapping, emulating, and sanitising the result). That's the only reliable source of information. > > > That's why I don't see CPU models as a viable thing in terms of ABI. > > They are an approximation of what you could have, but the ABI is > > elsewhere. > > Hmm, this is "significant new information" for me. If CPU models can't > be part of the guest ABI on ARM, then the whole "migratability across > heterogeneous hardware" on QEMU requires deeper thinking. As I said all along, the only source of truth is the set of ID registers. Nothing else. You can build a "model" on top of that, but not the other way around. Thanks, M. -- Without deviation from the norm, progress is not possible.
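As an illustration of the userspace inspection Marc describes: on arm64 Linux, EL0 MRS accesses to the feature ID registers trap into the kernel, which emulates them and returns the system-wide sanitised values (see the kernel's cpu-feature-registers documentation). A minimal sketch, assuming an aarch64 Linux host with MRS emulation; this is an aside for context, not code from the series:

/*
 * Read sanitised ID registers from userspace on arm64 Linux.
 * The kernel traps and emulates these EL0 MRS accesses, returning
 * the system-wide sanitised values. aarch64-only, illustrative.
 */
#include <stdio.h>
#include <stdint.h>
#include <sys/auxv.h>
#include <asm/hwcap.h>

int main(void)
{
    uint64_t midr, isar0;

    /* The kernel advertises MRS emulation via HWCAP_CPUID. */
    if (!(getauxval(AT_HWCAP) & HWCAP_CPUID)) {
        fprintf(stderr, "MRS emulation not supported\n");
        return 1;
    }

    /* MIDR_EL1 identifies implementer/part, not the feature set. */
    asm("mrs %0, midr_el1" : "=r"(midr));
    /* ID_AA64ISAR0_EL1 is one of the sanitised feature ID registers. */
    asm("mrs %0, id_aa64isar0_el1" : "=r"(isar0));

    printf("MIDR_EL1:         0x%016lx\n", midr);
    printf("ID_AA64ISAR0_EL1: 0x%016lx\n", isar0);
    return 0;
}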
On Fri, Dec 20 2024, Kashyap Chamarthy <kchamart@redhat.com> wrote: > Related tangent on CPU feature discoverability on ARM: > > Speaking of "Neoverse-N1", looking at a system that I have access to, > the `lscpu` output does not say anything about who the integrator is; it > only says: > > ... > Vendor ID: ARM > Model name: Neoverse-N1 > ... > > I realize, `lscpu` displays only whatever the kernel knows. Nothing in > `dmidecode` either. > > Also, it looks like there's no equivalent of a "CPUID" instruction (I > realize it is x86-specific) on ARM. Although, I came across a Google > Git repo that seems to implement a bespoke, "aarch64_cpuid". From a > what I see, it seems to fetch the "Main ID Register" (MIDR_EL1) - I > don't know enough about it to understand its implications: > > https://github.com/google/cpu_features/blob/main/src/impl_aarch64_cpuid.c My guess is that this is mostly for "we have code that looks for a cpuid like on x86, let's provide some code on arm that gives something that is at least somewhat useful." For "CPU feature discoverability", I don't think that there's any way other than looking at the actual id registers. It would be nice if you could at least know that "there are some <unspecified> differences in features" by comparing MIDR/REVIDR/AIDR, but that's not the case IIRC? [Anyway, I'm off for the year :)]
On Thu, Dec 19, 2024 at 03:41:56PM +0000, Marc Zyngier wrote: > On Thu, 19 Dec 2024 15:07:25 +0000, > Kashyap Chamarthy <kchamart@redhat.com> wrote: > > > > On Thu, Dec 19, 2024 at 12:26:29PM +0000, Marc Zyngier wrote: > > > On Thu, 19 Dec 2024 11:35:16 +0000, > > > Kashyap Chamarthy <kchamart@redhat.com> wrote: > > > > [...] > > > > > > Consider this: > > > > > > > > Say, there's a serious security issue in a released ARM CPU. As part of > > > > the fix, two new CPU flags need to be exposed to the guest OS, call them > > > > "secflag1" and "secflag2". Here, the user is configuring a baseline > > > > model + two extra CPU flags, not to get close to some other CPU model > > > > but to mitigate itself against a serious security flaw. > > > > > > If there's such a security issue, that the hypervisor's job to do so, > > > not userspace. > > > > I don't disagree. Probably that has always been the case on ARM. I > > asked the above based on how QEMU on x86 handles it today. > > > > > See what KVM does for CSV3, for example (and all the > > > rest of the side-channel stuff). > > > > Noted. From a quick look in the kernel tree, I assume you're referring > > to these commits[1]. > > > > > You can't rely on userspace for security, that'd be completely > > > ludicrous. > > > > As Dan Berrangé points out, it's the bog-standard way QEMU deals with > > some of the CPU-related issues on x86 today. See this "important CPU > > flags"[2] section in the QEMU docs. > > I had a look, and we do things quite differently. For example, the > spec-ctrl equivalent in implemented in FW and in KVM, and is exposed > by default if the HW is vulnerable. Userspace could hide that the > mitigation is there, but that's the extent of the configurability. Whether it is enabled by default or disabled by default isn't a totally fatal problem. If QEMU can toggle it to the opposite value, we have the same level of configurability in both cases. It does, however, have implications for QEMU as if KVM gained support for exposing the new feature by default and QEMU didn't know about it, then the guest ABI would have changed without QEMU realizing it. IOW, it would imply a requirement for timely QEMU updates to match the kernel, which is something we wouldn't need in x86 world where the feature is disabled by default. Disable by default is a more stable approach from QEMU's POV. > > Mind you, I'm _not_ saying this is how ARM should do it. I don't know > > enough about ARM to make such remarks. > > > > * * * > > > > To reply to your other question on this thread[3] about "which ABI?" I > > think Dan is talking about the *guest* ABI: the virtual "chipset" that > > is exposed to a guest (e.g. PCI(e) topology, ACPI tables, CPU model, > > etc). As I understand it, this "guest ABI" should remain predictable, > > regardless of: > > > > - whether you're updating KVM, QEMU, or the underlying physical > > hardware itself; or > > - if the guest is migrated, live or offline > > > > (As you might know, QEMU's "machine types" concept allows to create a > > stable guest ABI.) > > All of this is under control of QEMU, *except* for the "maximum" of > the architectural features exposed to the guest. All you can do is > *downgrade* from there, and only to a limited extent. > > That, in turn has a direct impact on what you call the "CPU model", > which for the ARM architecture really doesn't exist. All we have is a > bag of discrete features, with intricate dependencies between them. 
> > Even ignoring virtualisation: you can readily find two machines using > the same CPUs (let's say Neoverse-N1), integrated by the same vendor > (let's say, Ampere), in SoCs that bear the same name (Altra), and > realise that they have a different feature set. Fun, isn't it? "Fun" is probably not the word I'd pick :-) > That's why I don't see CPU models as a viable thing in terms of ABI. > They are an approximation of what you could have, but the ABI is > elsewhere. Right, this makes life quite challenging for QEMU. The premise of named CPU models (as opposed to -host) is to facilitate the migration of VMs between heterogeneous hardware platforms. That assumes it is possible to downgrade the CPU on both src + dst, to the common baseline you desire. If we were to define a named CPU model, for that to be usable, QEMU would have to be able to query the "maximum" architectural features, and validate that the delta between the host maximum and the named CPU model can be downgraded. Is ARM providing sufficient info to let QEMU do that? With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Thu, 19 Dec 2024 17:51:44 +0000, Daniel "P. Berrangé" <berrange@redhat.com> wrote: > > On Thu, Dec 19, 2024 at 03:41:56PM +0000, Marc Zyngier wrote: > > On Thu, 19 Dec 2024 15:07:25 +0000, > > Kashyap Chamarthy <kchamart@redhat.com> wrote: > > > > > > On Thu, Dec 19, 2024 at 12:26:29PM +0000, Marc Zyngier wrote: > > > > On Thu, 19 Dec 2024 11:35:16 +0000, > > > > Kashyap Chamarthy <kchamart@redhat.com> wrote: > > > > > > [...] > > > > > > > > Consider this: > > > > > > > > > > Say, there's a serious security issue in a released ARM CPU. As part of > > > > > the fix, two new CPU flags need to be exposed to the guest OS, call them > > > > > "secflag1" and "secflag2". Here, the user is configuring a baseline > > > > > model + two extra CPU flags, not to get close to some other CPU model > > > > > but to mitigate itself against a serious security flaw. > > > > > > > > If there's such a security issue, that the hypervisor's job to do so, > > > > not userspace. > > > > > > I don't disagree. Probably that has always been the case on ARM. I > > > asked the above based on how QEMU on x86 handles it today. > > > > > > > See what KVM does for CSV3, for example (and all the > > > > rest of the side-channel stuff). > > > > > > Noted. From a quick look in the kernel tree, I assume you're referring > > > to these commits[1]. > > > > > > > You can't rely on userspace for security, that'd be completely > > > > ludicrous. > > > > > > As Dan Berrangé points out, it's the bog-standard way QEMU deals with > > > some of the CPU-related issues on x86 today. See this "important CPU > > > flags"[2] section in the QEMU docs. > > > > I had a look, and we do things quite differently. For example, the > > spec-ctrl equivalent in implemented in FW and in KVM, and is exposed > > by default if the HW is vulnerable. Userspace could hide that the > > mitigation is there, but that's the extent of the configurability. > > Whether it is enabled by default or disabled by default isn't a > totally fatal problem. If QEMU can toggle it to the opposite value, > we have the same level of configurability in both cases. > > It does, however, have implications for QEMU as if KVM gained support > for exposing the new feature by default and QEMU didn't know about > it, then the guest ABI would have changed without QEMU realizing it. No. It just imposes that QEMU implements its part of the architecture, which is that any ID reg it doesn't know about and that is advertised as writable gets written back to 0, which is (in general, but with a couple of exceptions) the value indicating that a feature is not implemented. The ID register space is architected, and has been unchanged for the past 13 years. > IOW, it would imply a requirement for timely QEMU updates to match > the kernel, which is something we wouldn't need in x86 world where > the feature is disabled by default. Disable by default is a more > stable approach from QEMU's POV. Given the above, I don't see where the burden is. And that ship has sailed since the beginning of KVM/arm, really. It is also worth realising that for a very long time, it wasn't really possible to "disable" new features. Even today, disabling a feature really means emulating its absence. > > > > Mind you, I'm _not_ saying this is how ARM should do it. I don't know > > > enough about ARM to make such remarks. > > > > > > * * * > > > > > > To reply to your other question on this thread[3] about "which ABI?" 
I > > > think Dan is talking about the *guest* ABI: the virtual "chipset" that > > > is exposed to a guest (e.g. PCI(e) topology, ACPI tables, CPU model, > > > etc). As I understand it, this "guest ABI" should remain predictable, > > > regardless of: > > > > > > - whether you're updating KVM, QEMU, or the underlying physical > > > hardware itself; or > > > - if the guest is migrated, live or offline > > > > > > (As you might know, QEMU's "machine types" concept allows creating a > > > stable guest ABI.) > > > > All of this is under control of QEMU, *except* for the "maximum" of > > the architectural features exposed to the guest. All you can do is > > *downgrade* from there, and only to a limited extent. > > > > That, in turn, has a direct impact on what you call the "CPU model", > > which for the ARM architecture really doesn't exist. All we have is a > > bag of discrete features, with intricate dependencies between them. > > > > Even ignoring virtualisation: you can readily find two machines using > > the same CPUs (let's say Neoverse-N1), integrated by the same vendor > > (let's say, Ampere), in SoCs that bear the same name (Altra), and > > realise that they have a different feature set. Fun, isn't it? > > "Fun" is probably not the word I'd pick :-) Of course not. "Braindead" is the word I wanted to write, but sarcasm took over... ;-) > > > That's why I don't see CPU models as a viable thing in terms of ABI. > > They are an approximation of what you could have, but the ABI is > > elsewhere. > > Right, this makes life quite challenging for QEMU. The premise of named > CPU models (as opposed to -host) is to facilitate the migration of VMs > between heterogeneous hardware platforms. That assumes it is possible to > downgrade the CPU on both src + dst, to the common baseline you desire. > > If we were to define a named CPU model, for that to be usable, QEMU > would have to be able to query the "maximum" architectural features, > and validate that the delta between the host maximum and the named > CPU model can be downgraded. Is ARM providing sufficient info > to let QEMU do that? I think so. On creating a brand new VM, you get the maximum allowed on the HW, and the subset of features you can downgrade. The intersection of these two sets and your model's set will tell you whether you can actually instantiate this model on this host. You can also decide that it is OK to leave extra features advertised, such as extra page sizes or 32bit support, which the hypervisor can hide, but not disable. Thanks, M. -- Without deviation from the norm, progress is not possible.
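A hypothetical sketch of the instantiation check Marc outlines, under the simplifying assumption that any writable field may be freely downgraded (a real implementation would walk the 4-bit ID fields and honour their signed/unsigned semantics per the architecture's ID scheme); all names here are invented for illustration:

#include <stdbool.h>
#include <stdint.h>

/*
 * Can a named model's view of one ID register be realised on this
 * host? host_val is the host's sanitised value, writable_mask the
 * per-bit writability reported by KVM, model_val the model's value.
 */
static bool model_fits_host(uint64_t host_val, uint64_t writable_mask,
                            uint64_t model_val)
{
    /* Fields KVM does not let us write must match the host exactly. */
    if ((host_val & ~writable_mask) != (model_val & ~writable_mask)) {
        return false;
    }
    /*
     * Writable fields can be overridden; a full check would compare
     * each 4-bit field to ensure the model only downgrades.
     */
    return true;
}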
On Thu, Dec 19 2024, Daniel P. Berrangé <berrange@redhat.com> wrote: > On Thu, Dec 19, 2024 at 03:41:56PM +0000, Marc Zyngier wrote: >> On Thu, 19 Dec 2024 15:07:25 +0000, >> Kashyap Chamarthy <kchamart@redhat.com> wrote: >> > >> > On Thu, Dec 19, 2024 at 12:26:29PM +0000, Marc Zyngier wrote: >> > > On Thu, 19 Dec 2024 11:35:16 +0000, >> > > Kashyap Chamarthy <kchamart@redhat.com> wrote: >> > >> > [...] >> > >> > > > Consider this: >> > > > >> > > > Say, there's a serious security issue in a released ARM CPU. As part of >> > > > the fix, two new CPU flags need to be exposed to the guest OS, call them >> > > > "secflag1" and "secflag2". Here, the user is configuring a baseline >> > > > model + two extra CPU flags, not to get close to some other CPU model >> > > > but to mitigate itself against a serious security flaw. >> > > >> > > If there's such a security issue, that the hypervisor's job to do so, >> > > not userspace. >> > >> > I don't disagree. Probably that has always been the case on ARM. I >> > asked the above based on how QEMU on x86 handles it today. >> > >> > > See what KVM does for CSV3, for example (and all the >> > > rest of the side-channel stuff). >> > >> > Noted. From a quick look in the kernel tree, I assume you're referring >> > to these commits[1]. >> > >> > > You can't rely on userspace for security, that'd be completely >> > > ludicrous. >> > >> > As Dan Berrangé points out, it's the bog-standard way QEMU deals with >> > some of the CPU-related issues on x86 today. See this "important CPU >> > flags"[2] section in the QEMU docs. >> >> I had a look, and we do things quite differently. For example, the >> spec-ctrl equivalent in implemented in FW and in KVM, and is exposed >> by default if the HW is vulnerable. Userspace could hide that the >> mitigation is there, but that's the extent of the configurability. > > Whether it is enabled by default or disabled by default isn't a > totally fatal problem. If QEMU can toggle it to the opposite value, > we have the same level of configurability in both cases. I don't think "hiding" is the same thing as "disabling"? The underlying behaviour will still have changed, the main question is whether that is a problem. > > It does, however, have implications for QEMU as if KVM gained support > for exposing the new feature by default and QEMU didn't know about > it, then the guest ABI would have changed without QEMU realizing it. > > IOW, it would imply a requirement for timely QEMU updates to match > the kernel, which is something we wouldn't need in x86 world where > the feature is disabled by default. Disable by default is a more > stable approach from QEMU's POV. It implies that QEMU (or generally the VMM) needs to actively disable everything it does not know about (i.e. setting everything in any writable id reg to zero if it has no idea what it is about) to provide a stable guest interface across different kernels. Just tweaking some known values is only sufficient for a stable interface across two systems with the same kernel. (...) >> That's why I don't see CPU models as a viable thing in terms of ABI. >> They are an approximation of what you could have, but the ABI is >> elsewhere. > > Right, this makes life quite challenging for QEMU. The premise of named > CPU models (as opposed to -host), is to facilitate the migration of VMs > between heterogenous hardware platforms. That assumes it is possible to > downgrade the CPU on both src + dst, to the common baseline you desire. 
> > If we were to define a named CPU model, for that to be usable, QEMU > would have to be able to query the "maxmimum" architectural features, > and validate that the delta between the host maximum, and the named > CPU model is possible to downgrade. Is arm providing sufficient info > to let QEMU do that ? Not sure if I understand what you mean, but "give me the contents of all id registers, and which registers are writable" should probably do the trick?
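For context, the query Cornelia alludes to maps onto the KVM_ARM_GET_REG_WRITABLE_MASKS VM ioctl (Linux >= 6.7, advertised via KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES and documented in Documentation/virt/kvm/api.rst). A rough sketch of how a VMM might call it, with error handling trimmed and the helper name invented:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* 3 op1 values x 8 CRm values x 8 op2 values of the feature ID space. */
static uint64_t masks[KVM_ARM_FEATURE_ID_RANGE_SIZE];

static int get_writable_masks(int vm_fd)
{
    struct reg_mask_range range;

    if (ioctl(vm_fd, KVM_CHECK_EXTENSION,
              KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES) <= 0) {
        return -1; /* kernel too old: no writable ID registers */
    }

    memset(&range, 0, sizeof(range));
    range.addr = (uint64_t)(uintptr_t)masks;
    range.range = KVM_ARM_FEATURE_ID_RANGE;

    /* A set bit in masks[] means that ID register bit is writable. */
    return ioctl(vm_fd, KVM_ARM_GET_REG_WRITABLE_MASKS, &range);
}

A VMM that wants a stable guest ABI across kernel versions could then, as suggested above, write back zero to every writable field it does not recognise.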
On Thu, Dec 12 2024, Eric Auger <eric.auger@redhat.com> wrote: > On 12/12/24 10:36, Cornelia Huck wrote: >> On Thu, Dec 12 2024, Daniel P. Berrangé <berrange@redhat.com> wrote: >> >>> On Thu, Dec 12, 2024 at 09:12:33AM +0100, Eric Auger wrote: >>>> Connie, >>>> >>>> On 12/6/24 12:21, Cornelia Huck wrote: >>>>> A respin/update on the aarch64 KVM cpu models. Also available at >>>>> gitlab.com/cohuck/qemu arm-cpu-model-rfcv2 >>> snip >>> >>>> From a named model point of view, since I do not see much traction >>>> upstream besides Red Hat use cases, targetting ARM spec revision >>>> baselines may be overkill. Personally I would try to focus on above >>>> models: AltraMax, AmpereOne, Grace, ... Or maybe the ARM cores they may >>>> be derived from. >>> If we target modelling of vendor named CPU models, then beware that >>> we're opening the door to an very large set (potentially unbounded) >>> of named CPU models over time. If we target ARM spec baselines then >>> the set of named CPU models is fairly modest and grows slowly. >>> >>> Including ARM spec baselines will probably reduce the demand for >>> adding vendor specific named models, though I expect we'll still >>> end up wanting some, or possibly even many. >>> >>> Having some common baseline models is likely useful for mgmt >>> applications in other ways though. >>> >>> Consider you mgmt app wants to set a CPU model that's common across >>> heterogeneous hardware. They don't neccessarily want/need to be >>> able to live migrate between heterogeneous CPUs, but for simplicity >>> of configuration desire to set a single named CPU across all guests, >>> irrespective of what host hey are launched on. The ARM spec baseline >>> named models would give you that config simplicity. >> If we use architecture extensions (i.e. Armv8.x/9.x) as baseline, I'm >> seeing some drawbacks: >> - a lot of work before we can address some specific use cases >> - old models can get new optional features >> - a specific cpu might have a huge set of optional features on top of >> the baseline model >> >> Using a reference core such as Neoverse-V2 probably makes more sense >> (easier to get started, less feature diff?) It would still make a good >> starting point for a simple config. >> > Actually from a dev point of view I am not sure it changes much to have > either ARM spec rev baseline or CPU ref core named model. > > One remark is that if you look at > https://developer.arm.com/documentation/109697/2024_09?lang=en > you will see there are quite a lot of spec revisions and quite a few of > them are actually meaningful in the light of currently avaiable and > relevant HW we want to address. What I would like to avoid is to be > obliged to look at all of them in a generic manner while we just want to > address few cpu ref models. Yes, exactly. > > Also starting from the ARM spec rev baseline the end-user may need to > add more feature opt-ins to be close to a specific cpu model. So I > foresee extra complexity for the end-user. For ref cores, it's easier to pick the ones that actually matter for a specific use case, for arch exts I don't think we can avoid implementing those we don't really care about. And yes, from the sample of cpus I've looked at they seem to be much closer to a ref core than to an arch ext.
Shameer, On 12/12/24 09:12, Eric Auger wrote: > Connie, > > On 12/6/24 12:21, Cornelia Huck wrote: >> A respin/update on the aarch64 KVM cpu models. Also available at >> gitlab.com/cohuck/qemu arm-cpu-model-rfcv2 >> >> Find Eric's original cover letter below, so that I do not need to >> repeat myself on the aspects that have not changed since RFCv1 :) >> >> Changes from RFCv1: >> >> Rebased on more recent QEMU (some adaptions in the register conversions >> of the first few patches.) >> >> Based on feedback, I have removed the "custom" cpu model; instead, I >> have added the new SYSREG_<REG>_<FIELD> properties to the "host" model. >> This works well if you want to tweak anything that does not correspond >> to the existing properties for the host model; however, if you e.g. >> wanted to tweak sve, you have two ways to do so -- we'd probably either >> want to check for conflicts, or just declare precedence. The kvm-specific >> props remain unchanged, as they are orthogonal to this configuration. >> >> The cpu model expansion for the "host" model now dumps the new SYSREG_ >> properties in addition to the existing host model properties; this is a >> bit ugly, but I don't see a good way on how to split this up. >> >> Some more adaptions due to the removal of the "custom" model. >> >> Things *not* changed from RFCv1: >> >> SYSREG_ property naming (can be tweaked easily, once we are clear on what >> the interface should look like.) >> >> Sysreg generation scripts, and the generated files (I have not updated >> anything there.) I think generating the various definitions makes sense, >> as long as we double-check the generated files on each update (which would >> be something to trigger manually anyway.) >> >> What I would like us to reach some kind of consensus on: >> >> How to continue with the patches moving the ID registers from the isar >> struct into the idregs array. These are a bit of churn to drag along; >> if they make sense, maybe they can be picked independently of this series? >> >> Whether it make sense to continue with the approach of tweaking values in >> the ID registers in general. If we want to be able to migrate between cpus >> that do not differ wildly, we'll encounter differences that cannot be >> expressed via FEAT_xxx -- e.g. when comparing various AmpereAltra Max systems, >> they only differ in parts of CTR_EL0 -- which is not a feature register, but >> a writable register. > In v1 most of the commenters said they would prefer to see FEAT props > instead of IDREG field props. I think we shall try to go in this > direction anyway. As you pointed out there will be some cases where FEAT > won't be enough (CTR_EL0 is a good example). So I tend to think the end > solution will be a mix of FEAT and ID reg field props. > > Personally I would smoothly migrate what we can from ID reg field props > to FEAT props (maybe using prop aliases?), starting from the easiest 1-1 > mappings and then adressing the FEAT that are more complex but are > explictly needed to enable the use cases we are interested in, at RedHat: > migration within Ampere AltraMax family, migration within NVidia Grace > family, migration within AmpereOne family and migration between Graviton3/4. > > We have no info about other's use cases. If some of you want to see some > other live migration combinations addressed, please raise your voice. In relation to [1] you seem to be also interested in the migration between heterogeneous systems with qemu. 
Do you think targeting migration within a cpu family is enough for your use cases? How different are the source and destination hosts in your cases? Do you think FEAT props are relevant in your case or would you need lower granularity at idreg field level to pass the migration? [1] [PATCH v3 0/3] KVM: arm64: Errata management for VM Live migration https://lore.kernel.org/all/20241209115311.40496-1-shameerali.kolothum.thodi@huawei.com/ Thank you in advance Eric > Some CSPs may have their own LM solution/requirements but they don't use > qemu. So I think we shall concentrate on those use cases. > > You did the exercise to identify most prevalent patterns for FEAT to > IDREG fields mappings. I think we should now encode this conversion > table for those which are needed in above use cases. > > From a named model point of view, since I do not see much traction > upstream besides Red Hat use cases, targetting ARM spec revision > baselines may be overkill. Personally I would try to focus on above > models: AltraMax, AmpereOne, Grace, ... Or maybe the ARM cores they may > be derived from. According to the discussion we had with Marc in [1] it > seems it does not make sense to target migration between very > heterogeneous machines and Dan said we would prefer to avoid adding > plenty of feat add-ons to a named models. So I would rather be as close > as possible to a specific family definition. > > Thanks > > Eric > > [1] > https://lore.kernel.org/all/c879fda9-db5a-4743-805d-03c0acba8060@redhat.com/#r > >> >> Please take a look, and looking forward to your feedback :) >> >> *********************************************************************** >> >> Title: Introduce a customizable aarch64 KVM host model >> >> This RFC series introduces a KVM host "custom" model. >> >> Since v6.7 kernel, KVM/arm allows the userspace to overwrite the values >> of a subset of ID regs. The list of writable fields continues to grow. >> The feature ID range is defined as the AArch64 System register space >> with op0==3, op1=={0, 1, 3}, CRn==0, CRm=={0-7}, op2=={0-7}. >> >> The custom model uses this capability and allows to tune the host >> passthrough model by overriding some of the host passthrough ID regs. >> >> The end goal is to get more flexibility when migrating guests >> between different machines. We would like the upper software layer >> to be able detect how tunable the vpcu is on both source and destination >> and accordingly define a customized KVM host model that can fit >> both ends. With the legacy host passthrough model, this migration >> use case would fail. >> >> QEMU queries the host kernel to get the list of writable ID reg >> fields and expose all the writable fields as uint64 properties. Those >> are named "SYSREG_<REG>_<FIELD>". REG and FIELD names are those >> described in ARM ARM Reference manual and linux arch/arm64/tools/sysreg. >> Some awk scripts introduced in the series help parse the sysreg file and >> generate some code. Those scripts are used in a similar way as >> scripts/update-linux-headers.sh. In case the ABI gets broken, it is >> still possible to manually edit the generated code. However it is >> globally expected the REG and FIELD names are stable. >> >> The list of SYSREG_ID properties can be retrieved through the qmp >> monitor using query-cpu-model-expansion [2]. >> >> The first part of the series mostly consists in migrating id reg >> storage from named fields in ARMISARegisters to anonymous index >> ordered storage in an IdRegMap struct array.
The goal is to have >> a generic way to store all id registers, also compatible with the >> way we retrieve their writable capability at kernel level through >> the KVM_ARM_GET_REG_WRITABLE_MASKS ioctl. Having named fields >> prevented us from getting this scalability/genericity. Although the >> change is invasive, it is quite straightforward and should be easy >> to review. >> >> Then the bulk of the job is to retrieve the writable ID fields and >> match them against a "human readable" description of those fields. >> We use awk scripts, derived from kernel arch/arm64/tools/gen-sysreg.awk >> (so all the credit to Mark Rutland), that populate a data structure >> which describes all the ID regs in sysreg and their fields. We match >> writable ID reg fields against the latter and dynamically create a >> uint64 property. >> >> Then we need to extend the list of id regs read from the host >> so that we get a chance to let their values be overridden and write them >> back into KVM. >> >> The expectation is that this custom KVM host model can prepare for >> the advent of named models. Introducing named models with reduced >> and explicitly defined features is the next step. >> >> Obviously this series is not able to cope with non-writable ID regs. >> For instance the problem of MIDR/REVIDR setting is not handled >> at the moment. >> >> >> TESTS: >> - with few IDREG fields that can be easily examined from guest >> userspace: >> -cpu custom,SYSREG_ID_AA64ISAR0_EL1_DP=0x0,SYSREG_ID_AA64ISAR1_EL1_DPB=0x0 >> - migration between custom models >> - TCG A57 non-regressions. Light testing for TCG though. Deep >> review may detect some mistakes when migrating between named fields >> and IDRegMap storage >> - light testing of introspection. Testing a given writable ID field >> value with query-cpu-model-expansion is not supported yet. >> >> TODO/QUESTIONS: >> - Some idreg named fields are not yet migrated to an array storage. >> Some of them are not in the isar struct either. Maybe we could have >> handled TCG and KVM separately and it may turn out that this >> conversion is unneeded. So as it is quite cumbersome I preferred >> to keep it for a later stage. >> - the custom model does not come with legacy host properties >> such as SVE, MTE, especially those that induce some KVM >> settings. This needs to be fixed. >> - The custom model and its exposed properties depend on the host >> capabilities. More and more IDREG become writable meaning that >> the custom model gains more properties over time and it is >> host linux dependent. At the moment there is no versioning in >> place. By default the custom model is a host passthrough model >> (besides the legacy functions). So if the end-user tries to set >> a field that is not writable from a kernel pov, it will fail. >> Nevertheless a versioned custom model could constrain the props >> exposed, independently of the host linux capabilities. >> - the QEMU layer does not take care of IDREG field value consistency. >> Neither does the kernel. I imagine this could be the role of the upper >> layer to implement a vcpu profile that makes sure settings are >> consistent. Here we come to "named" models. What should they look >> like on ARM? >> - Implementation details: >> - it seems there are a lot of duplications in >> the code. ID regs are described in different manners, with different >> data structs, for TCG, now for KVM. >> - The IdRegMap->regs is sparsely populated.
Maybe a better data >> struct could be used, although this is the one chosen for the kernel >> uapi. >> >> References: >> >> [1] [PATCH v12 00/11] Support writable CPU ID registers from userspace >> https://lore.kernel.org/all/20230609190054.1542113-1-oliver.upton@linux.dev/ >> >> [2] >> qemu-system-aarch64 -qmp unix:/home/augere/TEST/QEMU/qmp-sock,server,nowait -M virt --enable-kvm -cpu custom >> scripts/qmp/qmp-shell /home/augere/TEST/QEMU/qmp-sock >> Welcome to the QMP low-level shell! >> Connected to QEMU 9.0.50 >> (QEMU) query-cpu-model-expansion type=full model={"name":"custom"} >> >> [3] >> KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES >> KVM_ARM_GET_REG_WRITABLE_MASKS >> Documentation/virt/kvm/api.rst >> >> [4] linux "sysreg" file >> linux/arch/arm64/tools/sysreg and gen-sysreg.awk >> ./tools/include/generated/asm/sysreg-defs.h >> >> >> Cornelia Huck (3): >> kvm: kvm_get_writable_id_regs >> arm-qmp-cmds: introspection for ID register props >> arm/cpu-features: document ID reg properties >> >> Eric Auger (17): >> arm/cpu: Add sysreg definitions in cpu-sysregs.h >> arm/cpu: Store aa64isar0 into the idregs arrays >> arm/cpu: Store aa64isar1/2 into the idregs array >> arm/cpu: Store aa64drf0/1 into the idregs array >> arm/cpu: Store aa64mmfr0-3 into the idregs array >> arm/cpu: Store aa64drf0/1 into the idregs array >> arm/cpu: Store aa64smfr0 into the idregs array >> arm/cpu: Store id_isar0-7 into the idregs array >> arm/cpu: Store id_mfr0/1 into the idregs array >> arm/cpu: Store id_dfr0/1 into the idregs array >> arm/cpu: Store id_mmfr0-5 into the idregs array >> arm/cpu: Add infra to handle generated ID register definitions >> arm/cpu: Add sysreg generation scripts >> arm/cpu: Add generated files >> arm/kvm: Allow reading all the writable ID registers >> arm/kvm: write back modified ID regs to KVM >> arm/cpu: more customization for the kvm host cpu model >> >> docs/system/arm/cpu-features.rst | 47 +- >> hw/intc/armv7m_nvic.c | 27 +- >> scripts/gen-cpu-sysreg-properties.awk | 325 ++++++++++++ >> scripts/gen-cpu-sysregs-header.awk | 47 ++ >> scripts/update-aarch64-sysreg-code.sh | 27 + >> target/arm/arm-qmp-cmds.c | 19 + >> target/arm/cpu-custom.h | 58 +++ >> target/arm/cpu-features.h | 311 ++++++------ >> target/arm/cpu-sysreg-properties.c | 682 ++++++++++++++++++++++++++ >> target/arm/cpu-sysregs.h | 152 ++++++ >> target/arm/cpu.c | 123 ++--- >> target/arm/cpu.h | 120 +++-- >> target/arm/cpu64.c | 260 +++++++--- >> target/arm/helper.c | 68 +-- >> target/arm/internals.h | 6 +- >> target/arm/kvm.c | 253 +++++++--- >> target/arm/kvm_arm.h | 16 +- >> target/arm/meson.build | 1 + >> target/arm/ptw.c | 6 +- >> target/arm/tcg/cpu-v7m.c | 174 +++---- >> target/arm/tcg/cpu32.c | 320 ++++++------ >> target/arm/tcg/cpu64.c | 460 ++++++++--------- >> target/arm/trace-events | 8 + >> 23 files changed, 2594 insertions(+), 916 deletions(-) >> create mode 100755 scripts/gen-cpu-sysreg-properties.awk >> create mode 100755 scripts/gen-cpu-sysregs-header.awk >> create mode 100755 scripts/update-aarch64-sysreg-code.sh >> create mode 100644 target/arm/cpu-custom.h >> create mode 100644 target/arm/cpu-sysreg-properties.c >> create mode 100644 target/arm/cpu-sysregs.h >> >
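To make the index-ordered storage described in the cover letter more concrete: with op0==3 and CRn==0 fixed, the feature ID space reduces to op1 in {0, 1, 3}, CRm in {0-7} and op2 in {0-7}, i.e. 3 * 8 * 8 = 192 slots, matching KVM_ARM_FEATURE_ID_RANGE_SIZE in the kernel uapi. A hypothetical sketch of such a map; the series' real definitions live in target/arm/cpu-sysregs.h, and the names below are invented:

#include <stdint.h>

#define NR_ID_REGS (3 * 8 * 8)

typedef struct IdRegMap {
    /* Sparsely populated, mirroring the kernel's uapi mask array. */
    uint64_t regs[NR_ID_REGS];
} IdRegMap;

/* Map op1 {0, 1, 3} onto contiguous indices {0, 1, 2}, then CRm/op2. */
static inline int idreg_idx(int op1, int crm, int op2)
{
    int op1_idx = (op1 == 3) ? 2 : op1;

    return (op1_idx * 8 + crm) * 8 + op2;
}

/* Example: ID_AA64ISAR0_EL1 is op0=3, op1=0, CRn=0, CRm=6, op2=0. */
static inline uint64_t get_id_aa64isar0(const IdRegMap *m)
{
    return m->regs[idreg_idx(0, 6, 0)];
}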
Hi Eric, > -----Original Message----- > From: Eric Auger <eauger@redhat.com> > Sent: Thursday, December 12, 2024 8:42 AM > To: eric.auger@redhat.com; Cornelia Huck <cohuck@redhat.com>; > eric.auger.pro@gmail.com; qemu-devel@nongnu.org; qemu- > arm@nongnu.org; kvmarm@lists.linux.dev; peter.maydell@linaro.org; > richard.henderson@linaro.org; alex.bennee@linaro.org; maz@kernel.org; > oliver.upton@linux.dev; sebott@redhat.com; Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com>; armbru@redhat.com; > berrange@redhat.com; abologna@redhat.com; jdenemar@redhat.com > Cc: shahuang@redhat.com; mark.rutland@arm.com; philmd@linaro.org; > pbonzini@redhat.com > Subject: Re: [PATCH RFCv2 00/20] kvm/arm: Introduce a customizable > aarch64 KVM host model > > Shameer, > > On 12/12/24 09:12, Eric Auger wrote: > > Connie, > > > > On 12/6/24 12:21, Cornelia Huck wrote: > >> A respin/update on the aarch64 KVM cpu models. Also available at > >> gitlab.com/cohuck/qemu arm-cpu-model-rfcv2 > >> > >> Find Eric's original cover letter below, so that I do not need to > >> repeat myself on the aspects that have not changed since RFCv1 :) > >> > >> Changes from RFCv1: > >> > >> Rebased on more recent QEMU (some adaptions in the register > conversions > >> of the first few patches.) > >> > >> Based on feedback, I have removed the "custom" cpu model; instead, I > >> have added the new SYSREG_<REG>_<FIELD> properties to the "host" > model. > >> This works well if you want to tweak anything that does not correspond > >> to the existing properties for the host model; however, if you e.g. > >> wanted to tweak sve, you have two ways to do so -- we'd probably either > >> want to check for conflicts, or just declare precedence. The kvm-specific > >> props remain unchanged, as they are orthogonal to this configuration. > >> > >> The cpu model expansion for the "host" model now dumps the new > SYSREG_ > >> properties in addition to the existing host model properties; this is a > >> bit ugly, but I don't see a good way on how to split this up. > >> > >> Some more adaptions due to the removal of the "custom" model. > >> > >> Things *not* changed from RFCv1: > >> > >> SYSREG_ property naming (can be tweaked easily, once we are clear on > what > >> the interface should look like.) > >> > >> Sysreg generation scripts, and the generated files (I have not updated > >> anything there.) I think generating the various definitions makes sense, > >> as long as we double-check the generated files on each update (which > would > >> be something to trigger manually anyway.) > >> > >> What I would like us to reach some kind of consensus on: > >> > >> How to continue with the patches moving the ID registers from the isar > >> struct into the idregs array. These are a bit of churn to drag along; > >> if they make sense, maybe they can be picked independently of this > series? > >> > >> Whether it make sense to continue with the approach of tweaking > values in > >> the ID registers in general. If we want to be able to migrate between > cpus > >> that do not differ wildly, we'll encounter differences that cannot be > >> expressed via FEAT_xxx -- e.g. when comparing various AmpereAltra Max > systems, > >> they only differ in parts of CTR_EL0 -- which is not a feature register, but > >> a writable register. > > In v1 most of the commenters said they would prefer to see FEAT props > > instead of IDREG field props. I think we shall try to go in this > > direction anyway. 
As you pointed out there will be some cases where > FEAT > > won't be enough (CTR_EL0 is a good example). So I tend to think the end > solution will be a mix of FEAT and ID reg field props. > > > > Personally I would smoothly migrate what we can from ID reg field props > > to FEAT props (maybe using prop aliases?), starting from the easiest 1-1 > > mappings and then adressing the FEAT that are more complex but are > > explictly needed to enable the use cases we are interested in, at RedHat: > > migration within Ampere AltraMax family, migration within NVidia Grace > > family, migration within AmpereOne family and migration between > Graviton3/4. > > > > We have no info about other's use cases. If some of you want to see some > > other live migration combinations addressed, please raise your voice. > In relation to [1] you seem to be also interested in the migration > between heterogeneous systems with qemu. Yes. That is correct. > Do you think targeting migration within a cpu family is enough for your > use cases. How different are the source and destination host on your > cases. Do you thing feat props are relevant in your case or would you > need lower granularity at idreg field levelto pass the migration? I think, from the current requirement we have for migration, the source and destination can mostly be handled by FEAT_XXX. But like Ampere, we do need to manage the CTR_EL0 differences[1]. Also we do have differences in GIC support as well (AA64PFR0_EL1.GIC) which I am not sure how to manage with FEAT_XXX. And we are checking with our Product team whether we need to support migration from an old CPU type in which case we have to do a bit more analysis. Thanks, Shameer 1. https://lore.kernel.org/kvmarm/20241022073943.35764-1-shameerali.kolothum.thodi@huawei.com/
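As an aside on the field-level granularity Shameer mentions: ID register fields are 4 bits wide, and ID_AA64PFR0_EL1.GIC sits at bits [27:24], where the value 1 advertises a system register GIC CPU interface. A toy accessor, ignoring that some ID fields are signed; the names are invented for illustration:

#include <stdint.h>

/* Extract an unsigned 4-bit ID register field at the given shift. */
static inline unsigned int id_reg_field(uint64_t reg, unsigned int shift)
{
    return (reg >> shift) & 0xf;
}

#define ID_AA64PFR0_EL1_GIC_SHIFT 24

/* GIC >= 1 means a system register CPU interface (GICv3+) is present. */
static inline int has_sysreg_gic(uint64_t pfr0)
{
    return id_reg_field(pfr0, ID_AA64PFR0_EL1_GIC_SHIFT) >= 1;
}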
Hi Shameer, On 12/12/24 14:09, Shameerali Kolothum Thodi wrote: > Hi Eric, > >> -----Original Message----- >> From: Eric Auger <eauger@redhat.com> >> Sent: Thursday, December 12, 2024 8:42 AM >> To: eric.auger@redhat.com; Cornelia Huck <cohuck@redhat.com>; >> eric.auger.pro@gmail.com; qemu-devel@nongnu.org; qemu- >> arm@nongnu.org; kvmarm@lists.linux.dev; peter.maydell@linaro.org; >> richard.henderson@linaro.org; alex.bennee@linaro.org; maz@kernel.org; >> oliver.upton@linux.dev; sebott@redhat.com; Shameerali Kolothum Thodi >> <shameerali.kolothum.thodi@huawei.com>; armbru@redhat.com; >> berrange@redhat.com; abologna@redhat.com; jdenemar@redhat.com >> Cc: shahuang@redhat.com; mark.rutland@arm.com; philmd@linaro.org; >> pbonzini@redhat.com >> Subject: Re: [PATCH RFCv2 00/20] kvm/arm: Introduce a customizable >> aarch64 KVM host model >> >> Shameer, >> >> On 12/12/24 09:12, Eric Auger wrote: >>> Connie, >>> >>> On 12/6/24 12:21, Cornelia Huck wrote: >>>> A respin/update on the aarch64 KVM cpu models. Also available at >>>> gitlab.com/cohuck/qemu arm-cpu-model-rfcv2 >>>> >>>> Find Eric's original cover letter below, so that I do not need to >>>> repeat myself on the aspects that have not changed since RFCv1 :) >>>> >>>> Changes from RFCv1: >>>> >>>> Rebased on more recent QEMU (some adaptions in the register >> conversions >>>> of the first few patches.) >>>> >>>> Based on feedback, I have removed the "custom" cpu model; instead, I >>>> have added the new SYSREG_<REG>_<FIELD> properties to the "host" >> model. >>>> This works well if you want to tweak anything that does not correspond >>>> to the existing properties for the host model; however, if you e.g. >>>> wanted to tweak sve, you have two ways to do so -- we'd probably either >>>> want to check for conflicts, or just declare precedence. The kvm-specific >>>> props remain unchanged, as they are orthogonal to this configuration. >>>> >>>> The cpu model expansion for the "host" model now dumps the new >> SYSREG_ >>>> properties in addition to the existing host model properties; this is a >>>> bit ugly, but I don't see a good way on how to split this up. >>>> >>>> Some more adaptions due to the removal of the "custom" model. >>>> >>>> Things *not* changed from RFCv1: >>>> >>>> SYSREG_ property naming (can be tweaked easily, once we are clear on >> what >>>> the interface should look like.) >>>> >>>> Sysreg generation scripts, and the generated files (I have not updated >>>> anything there.) I think generating the various definitions makes sense, >>>> as long as we double-check the generated files on each update (which >> would >>>> be something to trigger manually anyway.) >>>> >>>> What I would like us to reach some kind of consensus on: >>>> >>>> How to continue with the patches moving the ID registers from the isar >>>> struct into the idregs array. These are a bit of churn to drag along; >>>> if they make sense, maybe they can be picked independently of this >> series? >>>> >>>> Whether it make sense to continue with the approach of tweaking >> values in >>>> the ID registers in general. If we want to be able to migrate between >> cpus >>>> that do not differ wildly, we'll encounter differences that cannot be >>>> expressed via FEAT_xxx -- e.g. when comparing various AmpereAltra Max >> systems, >>>> they only differ in parts of CTR_EL0 -- which is not a feature register, but >>>> a writable register. >>> In v1 most of the commenters said they would prefer to see FEAT props >>> instead of IDREG field props. 
>>> I think we shall try to go in this
>>> direction anyway. As you pointed out there will be some cases where FEAT
>>> won't be enough (CTR_EL0 is a good example). So I tend to think the end
>>> solution will be a mix of FEAT and ID reg field props.
>>>
>>> Personally I would smoothly migrate what we can from ID reg field props
>>> to FEAT props (maybe using prop aliases?), starting from the easiest 1-1
>>> mappings and then addressing the FEATs that are more complex but are
>>> explicitly needed to enable the use cases we are interested in at Red Hat:
>>> migration within the Ampere AltraMax family, migration within the NVIDIA
>>> Grace family, migration within the AmpereOne family, and migration
>>> between Graviton3/4.
>>>
>>> We have no info about others' use cases. If some of you want to see some
>>> other live migration combinations addressed, please raise your voice.
>>
>> In relation to [1] you seem to be also interested in migration
>> between heterogeneous systems with QEMU.
>
> Yes. That is correct.
>
>> Do you think targeting migration within a CPU family is enough for your
>> use cases? How different are the source and destination hosts in your
>> cases? Do you think FEAT props are relevant in your case, or would you
>> need lower granularity at the ID reg field level to pass the migration?
>
> I think, from the current requirements we have for migration, the source
> and destination can mostly be handled by FEAT_XXX. But like Ampere, we do
> need to manage the CTR_EL0 differences [1].

OK

> Also, we have differences in GIC support as well (AA64PFR0_EL1.GIC), which
> I am not sure how to manage with FEAT_XXX.

Interesting. We need to look further into this one.

> And we are checking with our product team whether we need to support
> migration from an old CPU type, in which case we have to do a bit more
> analysis.

Sure, please come back to us whenever you get more insights. It will help
us define the scope of this upstream work.

Thanks!

Eric

> Thanks,
> Shameer
>
> 1. https://lore.kernel.org/kvmarm/20241022073943.35764-1-shameerali.kolothum.thodi@huawei.com/
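To illustrate the "easiest 1-1 mappings" mentioned above: for many features,
a FEAT prop could be little more than an alias onto a single ID reg field
plus the field value that enables it. A minimal C sketch, with hypothetical
type and table names (the series does not define such a table):

    #include <stdint.h>

    /* Hypothetical 1-1 alias from a FEAT_* property onto one ID reg
     * field. CTR_EL0-style fields have no FEAT_* equivalent, hence
     * the expected mix of FEAT and ID reg field props. */
    typedef struct FeatToIdRegField {
        const char *feat_prop;    /* e.g. "FEAT_DotProd" */
        const char *sysreg_prop;  /* e.g. "SYSREG_ID_AA64ISAR0_EL1_DP" */
        uint64_t enabled_value;   /* field value that enables the feature */
    } FeatToIdRegField;

    static const FeatToIdRegField feat_alias_map[] = {
        { "FEAT_DotProd", "SYSREG_ID_AA64ISAR0_EL1_DP",  1 },
        { "FEAT_DPB",     "SYSREG_ID_AA64ISAR1_EL1_DPB", 1 },
    };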
Hi Peter, Richard,

On 12/6/24 12:21, Cornelia Huck wrote:
> A respin/update on the aarch64 KVM cpu models. Also available at
> gitlab.com/cohuck/qemu arm-cpu-model-rfcv2
>
> [...]
>
> What I would like us to reach some kind of consensus on:
>
> How to continue with the patches moving the ID registers from the isar
> struct into the idregs array. These are a bit of churn to drag along;
> if they make sense, maybe they can be picked independently of this series?

What is your opinion on that? Patches 2-12 are dedicated to moving the
isar struct's individual idreg fields to an array. To me, this anonymous
storage looks more generic, more scalable, and better adapted to the way
we retrieve information from KVM. Do you think this is an acceptable
reshuffle of the code, and can we envision pushing those changes as a
separate prerequisite series, so that we avoid rebasing every time until
we get a solution for named vcpu models?

Thanks

Eric

> [...]
> Some awk scripts introduced in the series help parse the sysreg file and
> generate some code. Those scripts are used in a similar way as
> scripts/update-linux-headers.sh. In case the ABI gets broken, it is
> still possible to manually edit the generated code. However, it is
> generally expected that the REG and FIELD names are stable.
>
> The list of SYSREG_ID properties can be retrieved through the qmp
> monitor using query-cpu-model-expansion [2].
>
> The first part of the series mostly consists of migrating id reg
> storage from named fields in ARMISARegisters to anonymous, index-ordered
> storage in an IdRegMap struct array. The goal is to have a generic way
> to store all id registers, also compatible with the way we retrieve
> their writable capability at kernel level through the
> KVM_ARM_GET_REG_WRITABLE_MASKS ioctl. Having named fields prevented us
> from getting this scalability/genericity. Although the change is
> invasive, it is quite straightforward and should be easy to review.
>
> Then the bulk of the job is to retrieve the writable ID fields and
> match them against a "human readable" description of those fields.
> We use awk scripts, derived from the kernel's
> arch/arm64/tools/gen-sysreg.awk (so all the credit to Mark Rutland),
> that populate a data structure which describes all the ID regs in
> sysreg and their fields. We match writable ID reg fields against the
> latter and dynamically create a uint64 property for each.
>
> Then we need to extend the list of id regs read from the host so that
> we get a chance to let their values be overridden, and write them back
> into KVM.
>
> The expectation is that this custom KVM host model can prepare for the
> advent of named models. Introducing named models with reduced and
> explicitly defined features is the next step.
>
> Obviously this series is not able to cope with non-writable ID regs.
> For instance, the problem of MIDR/REVIDR setting is not handled at the
> moment.
>
> TESTS:
> - with a few IDREG fields that can be easily examined from guest
>   userspace:
>   -cpu custom,SYSREG_ID_AA64ISAR0_EL1_DP=0x0,SYSREG_ID_AA64ISAR1_EL1_DPB=0x0
> - migration between custom models
> - TCG A57 non-regressions. Light testing for TCG though. Deep review
>   may detect some mistakes in the migration between named fields and
>   IdRegMap storage.
> - light testing of introspection. Testing a given writable ID field
>   value with query-cpu-model-expansion is not supported yet.
>
> TODO/QUESTIONS:
> - Some idreg named fields are not yet migrated to array storage.
>   Some of them are not in the isar struct either. Maybe we could have
>   handled TCG and KVM separately, and it may turn out that this
>   conversion is unneeded. So, as it is quite cumbersome, I preferred
>   to keep it for a later stage.
> - The custom model does not come with legacy host properties such as
>   SVE and MTE, especially those that induce some KVM settings.
>   This needs to be fixed.
> - The custom model and its exposed properties depend on the host
>   capabilities. More and more IDREGs become writable, meaning that the
>   custom model gains more properties over time, and the set is host
>   linux dependent. At the moment there is no versioning in place. By
>   default the custom model is a host passthrough model (besides the
>   legacy functions). So if the end-user tries to set a field that is
>   not writable from a kernel pov, it will fail. Nevertheless, a
>   versioned custom model could constrain the props exposed,
>   independently of the host linux capabilities.
> - The QEMU layer does not take care of IDREG field value consistency.
>   Neither does the kernel. I imagine it could be the role of the upper
>   layer to implement a vcpu profile that makes sure settings are
>   consistent. Here we come to "named" models. What should they look
>   like on ARM?
> - Implementation details:
>   - It seems there is a lot of duplication in the code. ID regs are
>     described in different manners, with different data structs, for
>     TCG and now for KVM.
>   - The IdRegMap->regs array is sparsely populated. Maybe a better
>     data struct could be used, although this is the one chosen for the
>     kernel uapi.
>
> References:
>
> [1] [PATCH v12 00/11] Support writable CPU ID registers from userspace
>     https://lore.kernel.org/all/20230609190054.1542113-1-oliver.upton@linux.dev/
>
> [2]
> qemu-system-aarch64 -qmp unix:/home/augere/TEST/QEMU/qmp-sock,server,nowait -M virt --enable-kvm -cpu custom
> scripts/qmp/qmp-shell /home/augere/TEST/QEMU/qmp-sock
> Welcome to the QMP low-level shell!
> Connected to QEMU 9.0.50
> (QEMU) query-cpu-model-expansion type=full model={"name":"custom"}
>
> [3]
> KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES
> KVM_ARM_GET_REG_WRITABLE_MASKS
> Documentation/virt/kvm/api.rst
>
> [4] linux "sysreg" file
> linux/arch/arm64/tools/sysreg and gen-sysreg.awk
> ./tools/include/generated/asm/sysreg-defs.h
>
> Cornelia Huck (3):
>   kvm: kvm_get_writable_id_regs
>   arm-qmp-cmds: introspection for ID register props
>   arm/cpu-features: document ID reg properties
>
> Eric Auger (17):
>   arm/cpu: Add sysreg definitions in cpu-sysregs.h
>   arm/cpu: Store aa64isar0 into the idregs array
>   arm/cpu: Store aa64isar1/2 into the idregs array
>   arm/cpu: Store aa64dfr0/1 into the idregs array
>   arm/cpu: Store aa64mmfr0-3 into the idregs array
>   arm/cpu: Store aa64dfr0/1 into the idregs array
>   arm/cpu: Store aa64smfr0 into the idregs array
>   arm/cpu: Store id_isar0-7 into the idregs array
>   arm/cpu: Store id_mfr0/1 into the idregs array
>   arm/cpu: Store id_dfr0/1 into the idregs array
>   arm/cpu: Store id_mmfr0-5 into the idregs array
>   arm/cpu: Add infra to handle generated ID register definitions
>   arm/cpu: Add sysreg generation scripts
>   arm/cpu: Add generated files
>   arm/kvm: Allow reading all the writable ID registers
>   arm/kvm: write back modified ID regs to KVM
>   arm/cpu: more customization for the kvm host cpu model
>
> [...]
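As background for the storage question to Peter and Richard above: the move
in patches 2-12 boils down to replacing one named struct field per ID
register with index-ordered storage. A rough sketch of the idea (index and
helper names here are illustrative, not the series' exact ones):

    #include <stdint.h>

    /* One index per feature ID register, in a fixed order. */
    typedef enum IdRegIdx {
        ID_AA64ISAR0_EL1_IDX,
        ID_AA64ISAR1_EL1_IDX,
        ID_AA64MMFR0_EL1_IDX,
        /* ... remaining feature ID registers ... */
        NUM_ID_REGS,
    } IdRegIdx;

    /* Replaces named fields such as ARMISARegisters.id_aa64isar0. */
    typedef struct IdRegMap {
        uint64_t regs[NUM_ID_REGS];
    } IdRegMap;

    static inline uint64_t idreg_get(const IdRegMap *m, IdRegIdx idx)
    {
        return m->regs[idx];
    }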
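And for reference [3] above, this is roughly how userspace retrieves the
writable-field masks. A minimal sketch against the Linux >= 6.7 uapi
headers, assuming vm_fd is an already-created KVM VM fd; error handling and
the KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES capability check are omitted:

    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Fetch one writable-bits mask per register in the feature ID space
     * (op0==3, op1 in {0, 1, 3}, CRn==0, CRm and op2 in 0..7); a set bit
     * marks a writable bit of the corresponding ID register. */
    static int get_writable_id_reg_masks(int vm_fd,
                                         uint64_t masks[KVM_ARM_FEATURE_ID_RANGE_SIZE])
    {
        struct reg_mask_range range;

        memset(&range, 0, sizeof(range));
        range.addr = (uint64_t)(uintptr_t)masks;  /* out: mask array */
        range.range = KVM_ARM_FEATURE_ID_RANGE;
        return ioctl(vm_fd, KVM_ARM_GET_REG_WRITABLE_MASKS, &range);
    }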