Both patches also want 'x86/CPU: extend is_forced_cpu_cap()'s "reach"' in
place.

1: disable RDSEED on Fam17 model 47 stepping 0
2: disable RDSEED on most of Zen5

Jan
On 28/10/2025 3:32 pm, Jan Beulich wrote:
> Both patches also want 'x86/CPU: extend is_forced_cpu_cap()'s "reach"' in
> place.
>
> 1: disable RDSEED on Fam17 model 47 stepping 0
> 2: disable RDSEED on most of Zen5

We have two existing cases for RDRAND issues in Xen:

1) IvyBridge SRBDS speculative vulnerability. Here, the RNG is good,
but use of the RDRAND instruction can allow another entity on the system
to observe the random number. RDRAND is off by default, but can be
opted in to.

2) AMD Fam15/16h Laptop. Here, the RNG is fine, except after S3 on one
single OEM. Use of RDRAND can be activated on the command line, but
there's no ability for individual VMs to opt in. Being a laptop,
migration isn't a major concern.

For this series about RDSEED, we've got:

1) Cyan Skillfish, the PlayStation 5 CPU but also in one crypto-mining
rig. Here, RDSEED is deterministically broken and not getting a fix.

The chances of Xen running on these systems is almost 0. We should turn
off RDSEED and be done with it; it's not interesting in the slightest to
be able to turn back on.

2) Zen5. Here, RDSEED gives a higher-than-expected rate of 0's for only
the 32bit and 16bit forms; the 64bit form is unaffected.

There is microcode to fix it, on server at least. Firmware fixes for
client are rather further away. 64bit OSes are likely fine (using the
64bit instruction form). Some Linux devs think that Linux would be safe
even using the 32bit form, if it really only has a 10% zeroes rate.

There is certainly a risk that software uses the 32b/16b forms and does
not mix it properly with other entropy, but the common case these days
(64b) works just fine. This means that blanket-disabling does more harm
than good.

This case does really want to be off by default (given no microcode),
but able to be opted in to. At least one major class of OSes (Linux) are
safe despite the issue.

~Andrew
On 03.11.2025 15:10, Andrew Cooper wrote:
> On 28/10/2025 3:32 pm, Jan Beulich wrote:
>> Both patches also want 'x86/CPU: extend is_forced_cpu_cap()'s "reach"' in
>> place.
>>
>> 1: disable RDSEED on Fam17 model 47 stepping 0
>> 2: disable RDSEED on most of Zen5
>
> We have two existing cases for RDRAND issues in Xen:
>
> 1) IvyBridge SRBDS speculative vulnerability. Here, the RNG is good,
> but use of the RDRAND instruction can allow another entity on the system
> to observe the random number. RDRAND is off by default, but can be
> opted in to.
>
> 2) AMD Fam15/16h Laptop. Here, the RNG is fine, except after S3 on one
> single OEM. Use of RDRAND can be activated on the command line, but
> there's no ability for individual VMs to opt in. Being a laptop,
> migration isn't a major concern.
>
>
> For this series about RDSEED, we've got:
>
> 1) Cyan Skillfish, the PlayStation 5 CPU but also in one crypto-mining
> rig. Here, RDSEED is deterministically broken and not getting a fix.
>
> The chances of Xen running on these systems is almost 0. We should turn
> off RDSEED and be done with it; it's not interesting in the slightest to
> be able to turn back on.

I disagree to some degree, but the code to allow re-enabling can certainly
be moved to the other patch. I don't view it as wrong to have it in the 1st
patch, though.

> 2) Zen5. Here, RDSEED gives a higher-than-expected rate of 0's for only
> the 32bit and 16bit forms; the 64bit form is unaffected.
>
> There is microcode to fix it, on server at least. Firmware fixes for
> client are rather further away. 64bit OSes are likely fine (using the
> 64bit instruction form). Some Linux devs think that Linux would be safe
> even using the 32bit form, if it really only has a 10% zeroes rate.

10% is a lot. IOW I find this dubious.

> There is certainly a risk that software uses the 32b/16b forms, and not
> mix it properly with other entropy, but the common case these days (64b)
> works just fine. This means that blanket-disabling does more harm than
> good.

That's guesswork. I don't see why 64-bit OSes should be expected to prefer
the 64-bit form over the 32-bit one. In fact, if one only needs 32 bits of
entropy, why would one even try to get 64? That's wasting a potentially
precious resource. Furthermore mind me mentioning (again) that 32-bit OSes
(including 32-bit environments that may be active during boot) have no way
of using the 64-bit form?

Jan
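To put the disputed "10% zeroes rate" into numbers, here is a rough sketch in Python. It is only an illustration under stated assumptions: the 10% figure is the hypothetical quoted above rather than a measurement, and the model assumes that every non-zero 32-bit value stays equally likely. The function name is made up for this example.

```python
import math

def entropy_with_zero_bias(bits, p_zero):
    """Return (Shannon entropy, min-entropy), both in bits.

    Assumed model: the value 0 occurs with probability p_zero and the
    remaining 2**bits - 1 values share the rest uniformly.
    """
    n_other = 2**bits - 1
    p_other = (1.0 - p_zero) / n_other
    # Shannon entropy: contribution of the zero value plus all other values.
    shannon = -p_zero * math.log2(p_zero) - n_other * p_other * math.log2(p_other)
    # Min-entropy is set by the single most likely outcome.
    min_entropy = -math.log2(max(p_zero, p_other))
    return shannon, min_entropy

shannon, min_entropy = entropy_with_zero_bias(32, 0.10)
print(f"Shannon entropy: {shannon:.2f} bits")
print(f"min-entropy:     {min_entropy:.2f} bits")
```

Under these assumptions a 32-bit result still carries roughly 29 bits of Shannon entropy, but only about 3.3 bits of min-entropy, since an attacker guessing zero is right one time in ten. Which of the two measures matters depends on how the consumer mixes the value with other entropy, which is the point under dispute.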
On Tue, Oct 28, 2025 at 04:32:17PM +0100, Jan Beulich wrote:
> Both patches also want 'x86/CPU: extend is_forced_cpu_cap()'s "reach"' in
> place.
>
> 1: disable RDSEED on Fam17 model 47 stepping 0
> 2: disable RDSEED on most of Zen5

For both patches: don't we need to set the feature in the max policy
to allow for incoming migrations of guests that have already seen the
feature?

Thanks, Roger.
On 31.10.2025 11:22, Roger Pau Monné wrote:
> On Tue, Oct 28, 2025 at 04:32:17PM +0100, Jan Beulich wrote:
>> Both patches also want 'x86/CPU: extend is_forced_cpu_cap()'s "reach"' in
>> place.
>>
>> 1: disable RDSEED on Fam17 model 47 stepping 0
>> 2: disable RDSEED on most of Zen5
>
> For both patches: don't we need to set the feature in the max policy
> to allow for incoming migrations of guests that have already seen the
> feature?

No, such guests should not run on affected hosts (unless overrides are in place),
or else they'd face sudden malfunction of RDSEED. If an override was in place on
the source host, an override will also need to be put in place on the destination
one.

Jan
On Fri, Oct 31, 2025 at 11:29:44AM +0100, Jan Beulich wrote:
> On 31.10.2025 11:22, Roger Pau Monné wrote:
> > On Tue, Oct 28, 2025 at 04:32:17PM +0100, Jan Beulich wrote:
> >> Both patches also want 'x86/CPU: extend is_forced_cpu_cap()'s "reach"' in
> >> place.
> >>
> >> 1: disable RDSEED on Fam17 model 47 stepping 0
> >> 2: disable RDSEED on most of Zen5
> >
> > For both patches: don't we need to set the feature in the max policy
> > to allow for incoming migrations of guests that have already seen the
> > feature?
>
> No, such guests should not run on affected hosts (unless overrides are in place),
> or else they'd face sudden malfunction of RDSEED. If an override was in place on
> the source host, an override will also need to be put in place on the destination
> one.

But they may be malfunctioning before already, if started on a
vulnerable host without this fix and having seen RDSEED?

IMO after this fix is applied you should do pool leveling, at which
point RDSEED shouldn't be advertised anymore. Having the feature in
the max policy allows evacuating running guests while updating the
pool. Otherwise those existing guests would be stuck running on
non-updated hosts.

Thanks, Roger.
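The max versus default distinction argued over here can be sketched with a toy model in Python. This is not Xen's code: the function name and the feature sets are hypothetical, and the only rule modelled is the simplified assumption that an incoming guest is accepted when every feature it has already seen is present in the destination's max policy, while newly created guests only see the default policy.

```python
def can_migrate_in(guest_features, host_max_policy):
    """Toy rule: accept a guest only if all features it has seen are in max."""
    return set(guest_features) <= set(host_max_policy)

# Hypothetical feature sets, trimmed to the two features relevant here.
running_guest = {"rdrand", "rdseed"}   # booted before the fix, has seen RDSEED

max_hidden = {"rdrand"}                # updated host, RDSEED dropped from max too
max_kept = {"rdrand", "rdseed"}        # updated host, RDSEED kept in max only
default_policy = {"rdrand"}            # what newly created guests would see

print(can_migrate_in(running_guest, max_hidden))  # False: guest cannot be evacuated
print(can_migrate_in(running_guest, max_kept))    # True: evacuation works
print("rdseed" in default_policy)                 # False: new guests never see it
```

In this model, keeping RDSEED in max while removing it from default lets already-running guests move to updated hosts without advertising the feature to new guests. Whether accepting such guests silently is acceptable is the open question in the thread.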
On 31.10.2025 11:54, Roger Pau Monné wrote:
> On Fri, Oct 31, 2025 at 11:29:44AM +0100, Jan Beulich wrote:
>> On 31.10.2025 11:22, Roger Pau Monné wrote:
>>> On Tue, Oct 28, 2025 at 04:32:17PM +0100, Jan Beulich wrote:
>>>> Both patches also want 'x86/CPU: extend is_forced_cpu_cap()'s "reach"' in
>>>> place.
>>>>
>>>> 1: disable RDSEED on Fam17 model 47 stepping 0
>>>> 2: disable RDSEED on most of Zen5
>>>
>>> For both patches: don't we need to set the feature in the max policy
>>> to allow for incoming migrations of guests that have already seen the
>>> feature?
>>
>> No, such guests should not run on affected hosts (unless overrides are in place),
>> or else they'd face sudden malfunction of RDSEED. If an override was in place on
>> the source host, an override will also need to be put in place on the destination
>> one.
>
> But they may be malfunctioning before already, if started on a
> vulnerable host without this fix and having seen RDSEED?

Yes. But there could also be ones coming from good hosts. Imo ...

> IMO after this fix is applied you should do pool leveling, at which
> point RDSEED shouldn't be advertised anymore. Having the feature in
> the max policy allows to evacuate running guests while updating the
> pool. Otherwise those existing guests would be stuck to run on
> non-updated hosts.

... we need to err on the side of caution.

Jan
On Fri, Oct 31, 2025 at 12:47:51PM +0100, Jan Beulich wrote:
> On 31.10.2025 11:54, Roger Pau Monné wrote:
> > On Fri, Oct 31, 2025 at 11:29:44AM +0100, Jan Beulich wrote:
> >> On 31.10.2025 11:22, Roger Pau Monné wrote:
> >>> On Tue, Oct 28, 2025 at 04:32:17PM +0100, Jan Beulich wrote:
> >>>> Both patches also want 'x86/CPU: extend is_forced_cpu_cap()'s "reach"' in
> >>>> place.
> >>>>
> >>>> 1: disable RDSEED on Fam17 model 47 stepping 0
> >>>> 2: disable RDSEED on most of Zen5
> >>>
> >>> For both patches: don't we need to set the feature in the max policy
> >>> to allow for incoming migrations of guests that have already seen the
> >>> feature?
> >>
> >> No, such guests should not run on affected hosts (unless overrides are in place),
> >> or else they'd face sudden malfunction of RDSEED. If an override was in place on
> >> the source host, an override will also need to be put in place on the destination
> >> one.
> >
> > But they may be malfunctioning before already, if started on a
> > vulnerable host without this fix and having seen RDSEED?
>
> Yes. But there could also be ones coming from good hosts. Imo ...
>
> > IMO after this fix is applied you should do pool leveling, at which
> > point RDSEED shouldn't be advertised anymore. Having the feature in
> > the max policy allows to evacuate running guests while updating the
> > pool. Otherwise those existing guests would be stuck to run on
> > non-updated hosts.
>
> ... we need to err on the side of caution.

While I understand your concerns, this would cause failures in the
upgrade and migration model used by both XCP-ng and XenServer at
least, as it could prevent eviction of running VMs to updated hosts.

At a minimum we would need an option to allow the feature to be set on
the max policy. Overall I think safety of migration (in this specific
regard) should be enforced by the toolstack (or orchestration layer),
rather than the hypervisor itself. The hypervisor can reject
incompatible policies, but should leave the rest of the decisions to
higher layers as it doesn't have enough knowledge.

Thanks, Roger.
On 31.10.2025 13:14, Roger Pau Monné wrote:
> On Fri, Oct 31, 2025 at 12:47:51PM +0100, Jan Beulich wrote:
>> On 31.10.2025 11:54, Roger Pau Monné wrote:
>>> On Fri, Oct 31, 2025 at 11:29:44AM +0100, Jan Beulich wrote:
>>>> On 31.10.2025 11:22, Roger Pau Monné wrote:
>>>>> On Tue, Oct 28, 2025 at 04:32:17PM +0100, Jan Beulich wrote:
>>>>>> Both patches also want 'x86/CPU: extend is_forced_cpu_cap()'s "reach"' in
>>>>>> place.
>>>>>>
>>>>>> 1: disable RDSEED on Fam17 model 47 stepping 0
>>>>>> 2: disable RDSEED on most of Zen5
>>>>>
>>>>> For both patches: don't we need to set the feature in the max policy
>>>>> to allow for incoming migrations of guests that have already seen the
>>>>> feature?
>>>>
>>>> No, such guests should not run on affected hosts (unless overrides are in place),
>>>> or else they'd face sudden malfunction of RDSEED. If an override was in place on
>>>> the source host, an override will also need to be put in place on the destination
>>>> one.
>>>
>>> But they may be malfunctioning before already, if started on a
>>> vulnerable host without this fix and having seen RDSEED?
>>
>> Yes. But there could also be ones coming from good hosts. Imo ...
>>
>>> IMO after this fix is applied you should do pool leveling, at which
>>> point RDSEED shouldn't be advertised anymore. Having the feature in
>>> the max policy allows to evacuate running guests while updating the
>>> pool. Otherwise those existing guests would be stuck to run on
>>> non-updated hosts.
>>
>> ... we need to err on the side of caution.
>
> While I understand your concerns, this would cause failures in the
> upgrade and migration model used by both XCP-ng and XenServer at
> least, as it could prevent eviction of running VMs to updated hosts.
>
> At a minimum we would need an option to allow the feature to be set on
> the max policy.

That's where the 3rd patch comes into play. "cpuid=rdseed" is the respective
override. Just that it doesn't work correctly without that further patch.

> Overall I think safety of migration (in this specific
> regard) should be enforced by the toolstack (or orchestration layer),
> rather than the hypervisor itself. The hypervisor can reject
> incompatible policies, but should leave the rest of the decisions to
> higher layers as it doesn't have enough knowledge.

But without rendering guests vulnerable behind the admin's back.

Jan
On Fri, Oct 31, 2025 at 01:34:55PM +0100, Jan Beulich wrote:
> On 31.10.2025 13:14, Roger Pau Monné wrote:
> > On Fri, Oct 31, 2025 at 12:47:51PM +0100, Jan Beulich wrote:
> >> On 31.10.2025 11:54, Roger Pau Monné wrote:
> >>> On Fri, Oct 31, 2025 at 11:29:44AM +0100, Jan Beulich wrote:
> >>>> On 31.10.2025 11:22, Roger Pau Monné wrote:
> >>>>> On Tue, Oct 28, 2025 at 04:32:17PM +0100, Jan Beulich wrote:
> >>>>>> Both patches also want 'x86/CPU: extend is_forced_cpu_cap()'s "reach"' in
> >>>>>> place.
> >>>>>>
> >>>>>> 1: disable RDSEED on Fam17 model 47 stepping 0
> >>>>>> 2: disable RDSEED on most of Zen5
> >>>>>
> >>>>> For both patches: don't we need to set the feature in the max policy
> >>>>> to allow for incoming migrations of guests that have already seen the
> >>>>> feature?
> >>>>
> >>>> No, such guests should not run on affected hosts (unless overrides are in place),
> >>>> or else they'd face sudden malfunction of RDSEED. If an override was in place on
> >>>> the source host, an override will also need to be put in place on the destination
> >>>> one.
> >>>
> >>> But they may be malfunctioning before already, if started on a
> >>> vulnerable host without this fix and having seen RDSEED?
> >>
> >> Yes. But there could also be ones coming from good hosts. Imo ...
> >>
> >>> IMO after this fix is applied you should do pool leveling, at which
> >>> point RDSEED shouldn't be advertised anymore. Having the feature in
> >>> the max policy allows to evacuate running guests while updating the
> >>> pool. Otherwise those existing guests would be stuck to run on
> >>> non-updated hosts.
> >>
> >> ... we need to err on the side of caution.
> >
> > While I understand your concerns, this would cause failures in the
> > upgrade and migration model used by both XCP-ng and XenServer at
> > least, as it could prevent eviction of running VMs to updated hosts.
> >
> > At a minimum we would need an option to allow the feature to be set on
> > the max policy.
>
> That's where the 3rd patch comes into play. "cpuid=rdseed" is the respective
> override. Just that it doesn't work correctly without that further patch.

Won't using "cpuid=rdseed" in the Xen command line result in RDSEED
getting exposed in the default policy also, which we want to avoid?

Or am I getting confused on where "cpuid=rdseed" should be used?

> > Overall I think safety of migration (in this specific
> > regard) should be enforced by the toolstack (or orchestration layer),
> > rather than the hypervisor itself. The hypervisor can reject
> > incompatible policies, but should leave the rest of the decisions to
> > higher layers as it doesn't have enough knowledge.
>
> But without rendering guests vulnerable behind the admin's back.

I think that's part of the logic that should be implemented by the
orchestration layer, simply because it has all the data to make an
informed decision. IMO it won't be behind the admin's back, or else
it's a bug in the higher layer toolstack.

Not putting rdseed in the max policy completely blocks the upgrade
path, even when a toolstack is possibly making the right informed
decisions.

I guess I need to see that 3rd patch.

Thanks, Roger.
On 31.10.2025 14:15, Roger Pau Monné wrote:
> Not putting rdseed in the max policy completely blocks the upgrade
> path, even when a toolstack is possibly making the right informed
> decisions.

Why would that be? To evacuate guests, one would force-enable RDSEED on an
affected host. After updating of the original host (incl fixed ucode),
migrating back will be fine. The admin will thus be fully aware of where
guests run unsafely, while no un-safety is going to be introduced behind
the back of the admin and/or any guest.

Jan
On 31.10.2025 14:15, Roger Pau Monné wrote:
> On Fri, Oct 31, 2025 at 01:34:55PM +0100, Jan Beulich wrote:
>> On 31.10.2025 13:14, Roger Pau Monné wrote:
>>> On Fri, Oct 31, 2025 at 12:47:51PM +0100, Jan Beulich wrote:
>>>> On 31.10.2025 11:54, Roger Pau Monné wrote:
>>>>> On Fri, Oct 31, 2025 at 11:29:44AM +0100, Jan Beulich wrote:
>>>>>> On 31.10.2025 11:22, Roger Pau Monné wrote:
>>>>>>> On Tue, Oct 28, 2025 at 04:32:17PM +0100, Jan Beulich wrote:
>>>>>>>> Both patches also want 'x86/CPU: extend is_forced_cpu_cap()'s "reach"' in
>>>>>>>> place.
>>>>>>>>
>>>>>>>> 1: disable RDSEED on Fam17 model 47 stepping 0
>>>>>>>> 2: disable RDSEED on most of Zen5
>>>>>>>
>>>>>>> For both patches: don't we need to set the feature in the max policy
>>>>>>> to allow for incoming migrations of guests that have already seen the
>>>>>>> feature?
>>>>>>
>>>>>> No, such guests should not run on affected hosts (unless overrides are in place),
>>>>>> or else they'd face sudden malfunction of RDSEED. If an override was in place on
>>>>>> the source host, an override will also need to be put in place on the destination
>>>>>> one.
>>>>>
>>>>> But they may be malfunctioning before already, if started on a
>>>>> vulnerable host without this fix and having seen RDSEED?
>>>>
>>>> Yes. But there could also be ones coming from good hosts. Imo ...
>>>>
>>>>> IMO after this fix is applied you should do pool leveling, at which
>>>>> point RDSEED shouldn't be advertised anymore. Having the feature in
>>>>> the max policy allows to evacuate running guests while updating the
>>>>> pool. Otherwise those existing guests would be stuck to run on
>>>>> non-updated hosts.
>>>>
>>>> ... we need to err on the side of caution.
>>>
>>> While I understand your concerns, this would cause failures in the
>>> upgrade and migration model used by both XCP-ng and XenServer at
>>> least, as it could prevent eviction of running VMs to updated hosts.
>>>
>>> At a minimum we would need an option to allow the feature to be set on
>>> the max policy.
>>
>> That's where the 3rd patch comes into play. "cpuid=rdseed" is the respective
>> override. Just that it doesn't work correctly without that further patch.
>
> Won't using "cpuid=rdseed" in the Xen command line result in RDSEED
> getting exposed in the default policy also, which we want to avoid?
>
> Or am I getting confused on where "cpuid=rdseed" should be used?

No, there's no way here to get max but not default.

>>> Overall I think safety of migration (in this specific
>>> regard) should be enforced by the toolstack (or orchestration layer),
>>> rather than the hypervisor itself. The hypervisor can reject
>>> incompatible policies, but should leave the rest of the decisions to
>>> higher layers as it doesn't have enough knowledge.
>>
>> But without rendering guests vulnerable behind the admin's back.
>
> I think that's part of the logic that should be implemented by the
> orchestration layer, simply because it has all the data to make an
> informed decision. IMO it won't be behind the admin's back, or else
> it's a bug in the higher layer toolstack.

I fear I simply don't see aspects like this to be exposed to a toolstack.
We didn't for RDRAND.

> Not putting rdseed in the max policy completely blocks the upgrade
> path, even when a toolstack is possibly making the right informed
> decisions.
>
> I guess I need to see that 3rd patch.

https://lists.xen.org/archives/html/xen-devel/2025-08/msg00113.html

Jan
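The statement that there is "no way here to get max but not default" can be pictured with a small Python sketch. It models only what is said in this thread, namely that on an affected host RDSEED is hidden unless the "cpuid=rdseed" override is given, and that the override then applies to both policies. The function, its parameters and the placeholder feature set are hypothetical and do not mirror Xen's implementation.

```python
BASE_FEATURES = frozenset({"rdrand"})  # placeholder for all unrelated features

def build_policies(host_affected, rdseed_forced):
    """Return (max_policy, default_policy) as sets for the toy model.

    host_affected: the CPU is one of those the patches disable RDSEED on.
    rdseed_forced: stands in for booting with "cpuid=rdseed".
    """
    rdseed_visible = (not host_affected) or rdseed_forced
    extra = {"rdseed"} if rdseed_visible else set()
    # The same decision feeds both policies; there is no max-only knob.
    max_policy = set(BASE_FEATURES) | extra
    default_policy = set(BASE_FEATURES) | extra
    return max_policy, default_policy

for forced in (False, True):
    max_p, def_p = build_policies(host_affected=True, rdseed_forced=forced)
    print(forced, "rdseed" in max_p, "rdseed" in def_p)
```

In this model the override is all or nothing: forcing RDSEED to allow evacuation also exposes it to newly created guests on that host, which is the concern raised about using "cpuid=rdseed" for this purpose.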
On 10/28/25 4:32 PM, Jan Beulich wrote:
> Both patches also want 'x86/CPU: extend is_forced_cpu_cap()'s "reach"' in
> place.
>
> 1: disable RDSEED on Fam17 model 47 stepping 0
> 2: disable RDSEED on most of Zen5

Both patches LGTM to be in 4.21:
 Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>

Thanks.

~ Oleksii
On 31.10.2025 10:31, Oleksii Kurochko wrote:
>
> On 10/28/25 4:32 PM, Jan Beulich wrote:
>> Both patches also want 'x86/CPU: extend is_forced_cpu_cap()'s "reach"' in
>> place.
>>
>> 1: disable RDSEED on Fam17 model 47 stepping 0
>> 2: disable RDSEED on most of Zen5
>
> Both patches LGTM to be in 4.21:
>  Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>

Thanks, yet: What about the 3rd patch mentioned in the text above?

Jan
On 10/31/25 10:34 AM, Jan Beulich wrote:
> On 31.10.2025 10:31, Oleksii Kurochko wrote:
>> On 10/28/25 4:32 PM, Jan Beulich wrote:
>>> Both patches also want 'x86/CPU: extend is_forced_cpu_cap()'s "reach"' in
>>> place.
>>>
>>> 1: disable RDSEED on Fam17 model 47 stepping 0
>>> 2: disable RDSEED on most of Zen5
>> Both patches LGTM to be in 4.21:
>>  Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> Thanks, yet: What about the 3rd patch mentioned in the text above?

For 3rd patch, also:
 Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>

Thanks.

~ Oleksii