Hi all,

This short patch series improves Xen real-time execution on AMD x86
processors.

The key to real-time performance is deterministic guest execution times
and deterministic guest interrupt latency. In such configurations, the
null scheduler is typically used, and there should be no IPIs or other
sources of vCPU execution interruptions beyond the guest timer interrupt
as configured by the guest, and any passthrough interrupts for
passthrough devices.

This is because, upon receiving a critical interrupt, the guest (such as
FreeRTOS or Zephyr) typically has a very short window of time to
complete the required action. Being interrupted in the middle of this
critical section could prevent the guest from completing the action
within the allotted time, leading to malfunctions.

To address this, the patch series disables IPIs that could potentially
affect the real-time domain.

Cheers,

Stefano
Stefano Stabellini (2):
      xen/x86: don't send IPI to sync TSC when it is reliable
      xen/x86: introduce AMD_MCE_NONFATAL
 xen/arch/x86/Kconfig.cpu               | 15 +++++++++++++++
 xen/arch/x86/cpu/mcheck/amd_nonfatal.c |  3 ++-
 xen/arch/x86/time.c                    |  4 ++++
 3 files changed, 21 insertions(+), 1 deletion(-)
                
On Mon, Jul 07, 2025 at 05:06:53PM -0700, Stefano Stabellini wrote:
> Hi all,
>
> This short patch series improves Xen real-time execution on AMD x86
> processors.
>
> The key to real-time performance is deterministic guest execution times
> and deterministic guest interrupt latency. In such configurations, the
> null scheduler is typically used, and there should be no IPIs or other
> sources of vCPU execution interruptions beyond the guest timer interrupt
> as configured by the guest, and any passthrough interrupts for
> passthrough devices.
>
> This is because, upon receiving a critical interrupt, the guest (such as
> FreeRTOS or Zephyr) typically has a very short window of time to
> complete the required action. Being interrupted in the middle of this
> critical section could prevent the guest from completing the action
> within the allotted time, leading to malfunctions.

There's IMO still one pending issue after this series on x86, maybe
you have addressed this with some local patch. Interrupt forwarding
from Xen into HVM/PVH guests uses a softirq to do the injection, which
means there's a non-deterministic window of latency between when the
interrupt is received by Xen and when it's injected into the guest,
because the softirq might not get processed right after being set as
pending (there might be other softirqs to process, or simply Xen might
be busy doing some other operation).

I think you want to look into adding a new command line option or
similar, that allows selecting whether guest IRQs are deferred to a
softirq for injection, or are injected as part of the processing done
in the IRQ handler itself.

Otherwise there will always be a non-deterministic amount of latency
on x86 w.r.t. HVM/PVH passthrough guest interrupts. Haven't you seen
some weird/unexpected variance when doing these passthrough interrupt
latency measurements on x86?

Regards, Roger.
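To make the concern in the message above concrete, here is a toy model (not Xen code) of the two injection strategies it contrasts: deferring the injection to a softirq versus injecting from the IRQ handler itself. The names `PendingWork` and `injection_latency_us` and all microsecond costs are invented for illustration. The one assumption taken from the message is that a deferred injection may have to wait behind other work that is already pending.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PendingWork:
    """Hypothetical unit of work already pending when the IRQ arrives."""
    name: str
    cost_us: float


def injection_latency_us(handler_cost_us: float,
                         inject_cost_us: float,
                         pending: Sequence[PendingWork],
                         deferred: bool) -> float:
    """Time from Xen receiving the IRQ until the guest injection is done.

    Simplifying assumption: when injection is deferred, every item in
    `pending` is processed before the injection itself.
    """
    latency = handler_cost_us
    if deferred:
        # The deferred injection waits for all work queued ahead of it.
        latency += sum(w.cost_us for w in pending)
    return latency + inject_cost_us


if __name__ == "__main__":
    busy = [PendingWork("other-a", 4.0), PendingWork("other-b", 2.0)]
    for deferred in (False, True):
        print("deferred" if deferred else "direct",
              injection_latency_us(1.0, 0.5, busy, deferred))
```

In this model the direct path has a fixed cost, while the deferred path varies with whatever happens to be pending, which is the source of the non-determinism being described.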
On 08.07.2025 12:11, Roger Pau Monné wrote:
> On Mon, Jul 07, 2025 at 05:06:53PM -0700, Stefano Stabellini wrote:
>> The key to real-time performance is deterministic guest execution times
>> and deterministic guest interrupt latency. In such configurations, the
>> null scheduler is typically used, and there should be no IPIs or other
>> sources of vCPU execution interruptions beyond the guest timer interrupt
>> as configured by the guest, and any passthrough interrupts for
>> passthrough devices.
>
> There's IMO still one pending issue after this series on x86, maybe
> you have addressed this with some local patch.

Not just one, I think. We use IPIs for other purposes as well. The way
I read the text above, all of them are a (potential) problem.

Jan
On Tue, 8 Jul 2025, Jan Beulich wrote:
> On 08.07.2025 12:11, Roger Pau Monné wrote:
> > There's IMO still one pending issue after this series on x86, maybe
> > you have addressed this with some local patch.
>
> Not just one, I think. We use IPIs for other purposes as well. The way
> I read the text above, all of them are a (potential) problem.

Yes, all of them are potentially a problem. If you know of any other
IPI, please let me know and I'll try to remove them. One of my goals in
posting this series was to raise awareness of this issue and to attempt
to fix it with your help. It is not just IPIs, but also Xen timers and
other things that could cause the guest to trap into Xen without the
guest's knowledge. Typically IPIs are the worst offenders in my
experience.

On ARM, I have done several experiments where, after the system is
configured correctly, I can see that if the RTOS does nothing, there are
no traps in Xen on the RTOS vCPU/pCPU for seconds.

As I tried to describe in the email, typically the real-time
application, which tends to be based on an RTOS like FreeRTOS or Zephyr
(think of them like Unikernels), has a very small window of time from
receiving an interrupt to accomplish a critical task. Nothing should be
disturbing the execution of the RTOS during the critical window. The
operation the RTOS needs to perform is typically on a passthrough device
without Xen interactions. In general, from the hypervisor point of view,
the idea is that Xen should inject the interrupt and then leave the RTOS
alone and undisturbed to do its job.

> > Interrupt forwarding
> > from Xen into HVM/PVH guests uses a softirq to do the injection, which
> > means there's a non-deterministic window of latency between when the
> > interrupt is received by Xen and when it's injected into the guest,
> > because the softirq might not get processed right after being set as
> > pending (there might be other softirqs to process, or simply Xen might
> > be busy doing some other operation).
> >
> > I think you want to look into adding a new command line option or
> > similar, that allows selecting whether guest IRQs are deferred to a
> > softirq for injection, or are injected as part of the processing done
> > in the IRQ handler itself.
> >
> > Otherwise there will always be a non-deterministic amount of latency
> > on x86 w.r.t. HVM/PVH passthrough guest interrupts. Haven't you seen
> > some weird/unexpected variance when doing these passthrough interrupt
> > latency measurements on x86?

While this is not great and I agree with Roger that it should be
improved (I'll try to do so), in a well configured system I expect that
there should be no other softirqs on the RTOS vCPU/pCPU so it shouldn't
matter much if it is raised as a softirq or not?
On Tue, Jul 08, 2025 at 10:11:18AM -0700, Stefano Stabellini wrote:
> On Tue, 8 Jul 2025, Jan Beulich wrote:
> > On 08.07.2025 12:11, Roger Pau Monné wrote:
> > > Interrupt forwarding
> > > from Xen into HVM/PVH guests uses a softirq to do the injection, which
> > > means there's a non-deterministic window of latency between when the
> > > interrupt is received by Xen and when it's injected into the guest,
> > > because the softirq might not get processed right after being set as
> > > pending (there might be other softirqs to process, or simply Xen might
> > > be busy doing some other operation).
> > >
> > > I think you want to look into adding a new command line option or
> > > similar, that allows selecting whether guest IRQs are deferred to a
> > > softirq for injection, or are injected as part of the processing done
> > > in the IRQ handler itself.
> > >
> > > Otherwise there will always be a non-deterministic amount of latency
> > > on x86 w.r.t. HVM/PVH passthrough guest interrupts. Haven't you seen
> > > some weird/unexpected variance when doing these passthrough interrupt
> > > latency measurements on x86?
>
> While this is not great and I agree with Roger that it should be
> improved (I'll try to do so), in a well configured system I expect that
> there should be no other softirqs on the RTOS vCPU/pCPU so it shouldn't
> matter much if it is raised as a softirq or not?

Possibly - if the physical CPU where the interrupt is injected is also
the one where the target vCPU is running it won't make much of a
difference whether injection to the guest is deferred to a softirq, as
softirqs must always be processed before returning to guest context.

So I would think that when using the interrupt-follows-vCPU Xen model,
where interrupts are moved around to follow the vCPUs they target, this
extra latency would only be seen when the interrupt is delivered to a
CPU different than the one where the target guest vCPU is running, which
never happens in your scenario because you pin vCPUs.

Roger.
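The reasoning in the message above reduces to a comparison between the pCPU that receives the physical interrupt and the pCPU that runs the target vCPU. The sketch below is a toy model of that comparison, not Xen code. The function names, the `rtos-vcpu0` label, and the pCPU numbers are hypothetical, and the migration case at the end is an illustrative assumption rather than something measured.

```python
from typing import Dict


def irq_target_pcpu(vcpu_to_pcpu: Dict[str, int], target_vcpu: str) -> int:
    """Under an interrupt-follows-vCPU policy, the IRQ targets the pCPU
    on which the vCPU it is destined for is placed."""
    return vcpu_to_pcpu[target_vcpu]


def sees_extra_latency(delivery_pcpu: int, vcpu_pcpu: int) -> bool:
    """Extra latency is expected only when the pCPU receiving the IRQ
    differs from the pCPU running the target vCPU."""
    return delivery_pcpu != vcpu_pcpu


if __name__ == "__main__":
    # Pinned case: the vCPU never moves, so delivery and execution match.
    pinned = {"rtos-vcpu0": 3}
    target = irq_target_pcpu(pinned, "rtos-vcpu0")
    print("pinned:", sees_extra_latency(target, pinned["rtos-vcpu0"]))

    # Hypothetical unpinned case: the IRQ still targets pCPU 3 while the
    # vCPU has been moved to pCPU 5.
    print("migrated:", sees_extra_latency(3, 5))
```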
On 08.07.2025 19:11, Stefano Stabellini wrote:
> On Tue, 8 Jul 2025, Jan Beulich wrote:
>> Not just one, I think. We use IPIs for other purposes as well. The way
>> I read the text above, all of them are a (potential) problem.
>
> Yes, all of them are potentially a problem. If you know of any other
> IPI, please let me know and I'll try to remove them.

INVALIDATE_TLB_VECTOR, EVENT_CHECK_VECTOR, and CALL_FUNCTION_VECTOR, maybe
also others in that group of vectors (see irq-vectors.h).

> One of my goals in
> posting this series was to raise awareness of this issue and to attempt
> to fix it with your help. It is not just IPIs, but also Xen timers and
> other things that could cause the guest to trap into Xen without the
> guest's knowledge. Typically IPIs are the worst offenders in my
> experience.
>
> On ARM, I have done several experiments where, after the system is
> configured correctly, I can see that if the RTOS does nothing, there are
> no traps in Xen on the RTOS vCPU/pCPU for seconds.

Being quiescent when the system is idle is only part of the overall
requirement, though?

Jan
On Wed, 9 Jul 2025, Jan Beulich wrote:
> On 08.07.2025 19:11, Stefano Stabellini wrote:
> > Yes, all of them are potentially a problem. If you know of any other
> > IPI, please let me know and I'll try to remove them.
>
> INVALIDATE_TLB_VECTOR, EVENT_CHECK_VECTOR, and CALL_FUNCTION_VECTOR, maybe
> also others in that group of vectors (see irq-vectors.h).

Thanks Jan, I'll look into those.

> > On ARM, I have done several experiments where, after the system is
> > configured correctly, I can see that if the RTOS does nothing, there are
> > no traps in Xen on the RTOS vCPU/pCPU for seconds.
>
> Being quiescent when the system is idle is only part of the overall
> requirement, though?

Actually being quiescent when the system is idle is not a requirement.

The only requirements are:
1) quick interrupt injection into the RTOS
2) the RTOS must be undisturbed while executing the critical region

1) mostly means that the physical interrupt should be delivered to the
same pCPU running the RTOS vCPU. Otherwise the extra IPI causes unwanted
delays.

2) means that the RTOS must be undisturbed when executing the critical
section, which is typically right after receiving the interrupt and only
lasts for less than 1ms. In practice, it means the RTOS should absolutely
not be descheduled and there should be no (unnecessary) traps into Xen
while the RTOS is executing the critical section. It is expected that
the RTOS will run the critical section with interrupts disabled.

That's pretty much it. If we get this right, we have solved 99% of the
problem.
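Requirement 2) above can be phrased as a check over a trace: no trap into Xen may land inside the critical window that starts when the interrupt is injected. The following is a minimal sketch of such a check, assuming a hypothetical trace of `(timestamp_us, reason)` pairs; the trace format, the function name, and the sample values are invented, and the 1000 us window is only an upper bound taken from the "less than 1ms" figure in the message.

```python
from typing import List, Tuple

Trap = Tuple[float, str]  # (timestamp in microseconds, free-form reason)


def disturbances(traps: List[Trap],
                 irq_time_us: float,
                 critical_len_us: float) -> List[Trap]:
    """Return the traps that fall inside the critical window
    [irq_time_us, irq_time_us + critical_len_us)."""
    end = irq_time_us + critical_len_us
    return [t for t in traps if irq_time_us <= t[0] < end]


if __name__ == "__main__":
    # Hypothetical trace for the pCPU running the RTOS vCPU.
    trace = [(50.0, "ipi"), (1200.0, "timer"), (2500.0, "ipi")]
    # Critical interrupt injected at t=1000 us, window of 1000 us.
    print(disturbances(trace, 1000.0, 1000.0))
```

An empty result means the window was undisturbed; traps outside the window are ignored, matching the statement that quiescence while idle is not itself a requirement.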
On Wed, Jul 09, 2025 at 05:44:33PM -0700, Stefano Stabellini wrote:
> The only requirements are:
> 1) quick interrupt injection into the RTOS
> 2) the RTOS must be undisturbed while executing the critical region
>
> 1) mostly means that the physical interrupt should be delivered to the
> same pCPU running the RTOS vCPU. Otherwise the extra IPI causes unwanted
> delays.

This should already be the case: in the Xen model interrupts follow
vCPUs, so if you use pinning the vCPU should always be running on the
pCPU that's the target of the physical interrupt.

> 2) means that the RTOS must be undisturbed when executing the critical
> section, which is typically right after receiving the interrupt and only
> lasts for less than 1ms. In practice, it means the RTOS should absolutely
> not be descheduled and there should be no (unnecessary) traps into Xen
> while the RTOS is executing the critical section. It is expected that
> the RTOS will run the critical section with interrupts disabled.

What about other external interrupts? While the guest runs the
critical interrupt handling section with interrupts disabled, an
external interrupt from a device targeting the pCPU could cause a
vmexit. I'm not aware of a nice way to solve this however, as for
PVH/HVM Xen doesn't know when the guest has finished interrupt
processing (iret). Maybe this is not an issue in practice if you
isolate interrupts to different vCPUs (you might have to do this
already to ensure deterministic latency).

Roger.
On 10.07.2025 09:02, Roger Pau Monné wrote:
> On Wed, Jul 09, 2025 at 05:44:33PM -0700, Stefano Stabellini wrote:
>> 2) means that the RTOS must be undisturbed when executing the critical
>> section, which is typically right after receiving the interrupt and only
>> lasts for less than 1ms. In practice, it means the RTOS should absolutely
>> not be descheduled and there should be no (unnecessary) traps into Xen
>> while the RTOS is executing the critical section. It is expected that
>> the RTOS will run the critical section with interrupts disabled.
>
> What about other external interrupts? While the guest runs the
> critical interrupt handling section with interrupts disabled, an
> external interrupt from a device targeting the pCPU could cause a
> vmexit.

For interrupts to be handled by the guest, we may need to finally gain AVIC
support (albeit I'm not sure how close that is to VMX-es posted interrupts).
For interrupts handled in Xen the only way would be to allow the guest to
announce such critical sections to Xen. Which, besides being a security
concern, may of course itself represent unacceptable overhead.

Jan

> I'm not aware of a nice way to solve this however, as for
> PVH/HVM Xen doesn't know when the guest has finished interrupt
> processing (iret). Maybe this is not an issue in practice if you
> isolate interrupts to different vCPUs (you might have to do this
> already to ensure deterministic latency).
>
> Roger.
On Thu, 10 Jul 2025, Jan Beulich wrote:
> On 10.07.2025 09:02, Roger Pau Monné wrote:
> > What about other external interrupts? While the guest runs the
> > critical interrupt handling section with interrupts disabled, an
> > external interrupt from a device targeting the pCPU could cause a
> > vmexit.
>
> For interrupts to be handled by the guest, we may need to finally gain AVIC
> support (albeit I'm not sure how close that is to VMX-es posted interrupts).
> For interrupts handled in Xen the only way would be to allow the guest to
> announce such critical sections to Xen. Which, besides being a security
> concern, may of course itself represent unacceptable overhead.

In the past, I wrote a patch for an ARM user basically to do what you
suggested: "announce such critical sections to Xen". It is easy for Xen
to know when the critical section starts: upon receiving the critical
interrupt. I added a hypercall so that the RTOS could tell Xen when it
ends. This is the kind of dirty patch that is very effective but
difficult to generalize. As an example, you can pause all other VMs
during the critical section to make sure the RTOS has full bandwidth on
the bus. The critical section is much shorter than a scheduler slot
anyway. I did not try to upstream the patch.

> > I'm not aware of a nice way to solve this however, as for
> > PVH/HVM Xen doesn't know when the guest has finished interrupt
> > processing (iret). Maybe this is not an issue in practice if you
> > isolate interrupts to different vCPUs (you might have to do this
> > already to ensure deterministic latency).

Yeah, that should be solvable by moving around other interrupts to
other pCPUs.
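The idea of moving other interrupts to other pCPUs can be sketched as a small planning step. This is only an illustration: the IRQ names, pCPU numbers, and the function `plan_irq_moves` are hypothetical, and the sketch computes a plan without touching any real system. How affinities would actually be applied is not covered here.

```python
from typing import Dict, List, Set


def plan_irq_moves(irq_to_pcpu: Dict[str, int],
                   rtos_pcpu: int,
                   critical_irqs: Set[str],
                   spare_pcpus: List[int]) -> Dict[str, int]:
    """Return {irq: new_pcpu} for every non-critical IRQ that currently
    targets the RTOS pCPU, spreading them round-robin over spare pCPUs."""
    if not spare_pcpus:
        raise ValueError("need at least one pCPU to move interrupts to")
    if rtos_pcpu in spare_pcpus:
        raise ValueError("the RTOS pCPU cannot be used as a spare")
    # Sort so that the resulting plan is deterministic.
    to_move = sorted(irq for irq, cpu in irq_to_pcpu.items()
                     if cpu == rtos_pcpu and irq not in critical_irqs)
    return {irq: spare_pcpus[i % len(spare_pcpus)]
            for i, irq in enumerate(to_move)}


if __name__ == "__main__":
    current = {"rtos-dev": 2, "nic": 2, "disk": 2, "usb": 0}
    print(plan_irq_moves(current, rtos_pcpu=2,
                         critical_irqs={"rtos-dev"},
                         spare_pcpus=[0, 1]))
```

Only interrupts that currently target the RTOS pCPU appear in the plan; the critical passthrough interrupt stays where the RTOS vCPU runs.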
On 7/10/25 17:39, Stefano Stabellini wrote:
> On Thu, 10 Jul 2025, Jan Beulich wrote:
>> For interrupts handled in Xen the only way would be to allow the guest to
>> announce such critical sections to Xen. Which, besides being a security
>> concern, may of course itself represent unacceptable overhead.
>
> In the past, I wrote a patch for an ARM user basically to do what you
> suggested: "announce such critical sections to Xen". It is easy for Xen
> to know when the critical section starts: upon receiving the critical
> interrupt. I added a hypercall so that the RTOS could tell Xen when it
> ends. This is the kind of dirty patch that is very effective but
> difficult to generalize. As an example, you can pause all other VMs
> during the critical section to make sure the RTOS has full bandwidth on
> the bus. The critical section is much shorter than a scheduler slot
> anyway. I did not try to upstream the patch.

Curious: why is the RTOS running on an x86 core at all, and not on a
microcontroller dedicated exclusively to real-time tasks? The
performance impact of isolating the RTOS from other tasks seems huge
compared to the cost of a tiny microcontroller that just runs the RTOS.

Have you considered upstreaming the patch?
--
Sincerely,
Demi Marie Obenour (she/her/hers)