All,
This mail is to bring another approach to solve the stack-overflow due to nested interrupts. Michael solved this problem in OVMF through NestedInterruptTplLib.
I made a draft patch as attached "DxeCore.diff". The patch simply to avoid the interrupt in enable state when TPL is dropped to the interrupted TPL. The interrupt will be enabled later through "IRET".
So, a timer driver has two ways to implement its timer interrupt handler:
1. Do raise/restore TPL in the TimerInterruptHandler(). But call the APIs in NestedInterruptTplLib.
2. Do not raise/restore TPL in the TimerInterruptHandler(). So that only DxeCore restores the TPL. And when DxeCore restores the TPL, the interrupt is not enabled when TPL is dropped to the interrupted TPL (as it will be enabled later by "IRET").
Implementing the logic in DxeCore does not prevent the TimerInterruptHandler() from implementing the way #1.
Agree on the draft patch?
My 2nd question is can we set a rule that TimerInterruptHandler() should NOT restore TPL so that way #2 (changing DxeCore) is enough to solve the stack overflow issue due to nested interrupts.
I was aware of the discussion between Laszlo and Michael in end of 2022 but never dig deeply as today into this problem.
I really appreciate the long discussion in the bugzilla (https://bugzilla.tianocore.org/show_bug.cgi?id=4162) and comments in NestedInterruptTplLib. I learned a lot from them and they are quite interesting!
Thanks,
Ray
-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#114369): https://edk2.groups.io/g/devel/message/114369
Mute This Topic: https://groups.io/mt/103950154/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/leave/3901457/1787277/102458076/xyzzy [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-
diff --git a/MdeModulePkg/Core/Dxe/Event/Timer.c b/MdeModulePkg/Core/Dxe/Event/Timer.c
index 29e507c67c..8e8b9d6592 100644
--- a/MdeModulePkg/Core/Dxe/Event/Timer.c
+++ b/MdeModulePkg/Core/Dxe/Event/Timer.c
@@ -176,6 +176,11 @@ CoreInitializeTimer (
ASSERT_EFI_ERROR (Status);
}
+VOID
+CoreRestoreTplDeferEnableInterrupt (
+ IN EFI_TPL NewTpl
+ );
+
/**
Called by the platform code to process a tick.
@@ -190,11 +195,9 @@ CoreTimerTick (
)
{
IEVENT *Event;
+ EFI_TPL OriginalTpl;
- //
- // Check runtiem flag in case there are ticks while exiting boot services
- //
- CoreAcquireLock (&mEfiSystemTimeLock);
+ OriginalTpl = CoreRaiseTpl (TPL_HIGH_LEVEL);
//
// Update the system time
@@ -213,7 +216,7 @@ CoreTimerTick (
}
}
- CoreReleaseLock (&mEfiSystemTimeLock);
+ CoreRestoreTplDeferEnableInterrupt (OriginalTpl); //CoreRestoreTpl (OriginalTpl);
}
/**
diff --git a/MdeModulePkg/Core/Dxe/Event/Tpl.c b/MdeModulePkg/Core/Dxe/Event/Tpl.c
index b33f80573c..8d4b92ef4e 100644
--- a/MdeModulePkg/Core/Dxe/Event/Tpl.c
+++ b/MdeModulePkg/Core/Dxe/Event/Tpl.c
@@ -147,3 +147,55 @@ CoreRestoreTpl (
CoreSetInterruptState (TRUE);
}
}
+
+
+VOID
+CoreRestoreTplDeferEnableInterrupt (
+ IN EFI_TPL NewTpl
+ )
+{
+ EFI_TPL OldTpl;
+ EFI_TPL PendingTpl;
+
+ OldTpl = gEfiCurrentTpl;
+ if (NewTpl > OldTpl) {
+ DEBUG ((DEBUG_ERROR, "FATAL ERROR - RestoreTpl with NewTpl(0x%x) > OldTpl(0x%x)\n", NewTpl, OldTpl));
+ ASSERT (FALSE);
+ }
+
+ ASSERT (VALID_TPL (NewTpl));
+
+ //
+ // If lowering below HIGH_LEVEL, make sure
+ // interrupts are enabled
+ //
+
+ if ((OldTpl >= TPL_HIGH_LEVEL) && (NewTpl < TPL_HIGH_LEVEL)) {
+ gEfiCurrentTpl = TPL_HIGH_LEVEL;
+ }
+
+ //
+ // Dispatch any pending events
+ //
+ while (gEventPending != 0) {
+ PendingTpl = (UINTN)HighBitSet64 (gEventPending);
+ if (PendingTpl <= NewTpl) {
+ break;
+ }
+
+ gEfiCurrentTpl = PendingTpl;
+ if (gEfiCurrentTpl < TPL_HIGH_LEVEL) {
+ CoreSetInterruptState (TRUE);
+ }
+
+ CoreDispatchEventNotifies (gEfiCurrentTpl);
+ }
+
+ //
+ // Disable interrupt before restoring to the original TPL.
+ // This is to avoid the nest interrupt happens in the same TPL
+ // which will be endless.
+ //
+ CoreSetInterruptState (FALSE);
+ gEfiCurrentTpl = NewTpl;
+}
\ No newline at end of file
On 25/01/2024 07:57, Ni, Ray wrote: > This mail is to bring another approach to solve the stack-overflow due > to nested interrupts. Michael solved this problem in OVMF through > NestedInterruptTplLib. > > I made a draft patch as attached “DxeCore.diff”. The patch simply to > avoid the interrupt in enable state when TPL is dropped to the > interrupted TPL. The interrupt will be enabled later through “IRET”. I don't disagree with the approach, but it does break the API as per the UEFI PI specification (version 1.8 section II-12.10), and so this is not something that can just be dropped in as an EDK2 code change. There are no version number fields or spare parameters (that I can see) that could be used within EFI_TIMER_ARCH_PROTOCOL to allow for this change of semantics, so this would have to be a breaking change. I think you'd need to first define an EFI_TIMER_ARCH2_PROTOCOL with different semantics for RegisterHandler(), but I'm not a UEFI Forum member and I have no idea what the bureaucratic process is for pushing through this kind of change. I suspect we'd end up having to support both protocol variants (which means added maintenance burden). It might be plausible instead to extend EFI_CPU_ARCH_PROTOCOL in a backwards-compatible way. The InterruptType parameter to RegisterInterruptHandler() is already handled in a very casual way: the UEFI spec defines a limited range of possible types (0-19 for IA32/X64) for EFI_EXCEPTION_TYPE, but the code in LocalApicTimerDxe.c already treats it as meaning just an IRQ vector number by passing the value LOCAL_APIC_TIMER_VECTOR=32. (Incidentally, the code comments in MdePkg/Include/Protocol/Cpu.h are completely wrong - the description of the InterruptType parameter seems to have been copied from the description of the State parameter in EFI_CPU_GET_INTERRUPT_STATE.) The extension I'm thinking of could be: - Clean up the definition of EFI_EXCEPTION_TYPE to clarify that it represents a CPU interrupt vector number (if this is how all current architectures actually use it). - Extend the definition of EFI_EXCEPTION_TYPE to include a high bit flag that can be ORed on to the exception type, e.g. #define EFI_EXCEPTION_TYPE_TPL_HIGH 0x01000000 - The RegisterInterruptHandler() implementation could then treat this flag as meaning that the handler should be called via a wrapper that does RaiseTPL(TPL_HIGH_LEVEL) before calling the handler, and the equivalent of your CoreRestoreTplDeferEnableInterrupt() after the handler returns, i.e. that the handler is automatically called at TPL_HIGH_LEVEL. This would not make any breaking change to the definitions of EFI_TIMER_ARCH_PROTOCOL or EFI_CPU_ARCH_PROTOCOL, and so would not require an ARCH2_PROTOCOL variant to be created. I haven't tried implementing this to see if it's viable, so I've probably missed something crucial. Personally I would go for moving NestedInterruptTplLib to MdeModulePkg since this can be done today, whereas anything involving a spec change will take a lot more time, but that's not my call. Thanks, Michael -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#114401): https://edk2.groups.io/g/devel/message/114401 Mute This Topic: https://groups.io/mt/103950154/1787277 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org] -=-=-=-=-=-=-=-=-=-=-=-
> On 25/01/2024 07:57, Ni, Ray wrote: > > This mail is to bring another approach to solve the stack-overflow due > > to nested interrupts. Michael solved this problem in OVMF through > > NestedInterruptTplLib. > > > > I made a draft patch as attached “DxeCore.diff”. The patch simply to > > avoid the interrupt in enable state when TPL is dropped to the > > interrupted TPL. The interrupt will be enabled later through “IRET”. > > I don't disagree with the approach, but it does break the API as per the > UEFI PI specification (version 1.8 section II-12.10), and so this is not > something that can just be dropped in as an EDK2 code change. You think that the TimerInterruptHandler() doesn't raise/restore TPL which would violate the PI spec as PI spec says " NotifyFunction ... executes at EFI_TPL_HIGH_LEVEL."? I do not think the PI spec requires TimerInterruptHandler() raises TPL to HIGH before invoking NotifyFunction. It just means the NotifyFunction will execute at TPL_HIGH. If you review HpetTimer driver, it does not raise TPL to HIGH before invoking NotifyFunction. And I think implementing the DxeCore changes as attached does not prevent the TimerInterruptHandler() from calling raise/restore TPL. So, with the changes done in DxeCore, a timer driver could either not raise/restore TPL in TimerInterruptHandler(), or it calls NestedInterruptTplLib if it wants. I like the idea behind NestedInterruptTplLib. The only concern is the maintenance effort. I want to avoid the difficulty of debuggability brought by NestedInterruptTplLib when some timer interrupt related issues happen in future. I am ok with OvmfPkg code using NestedInterruptTplLib. Thanks, Ray -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#114405): https://edk2.groups.io/g/devel/message/114405 Mute This Topic: https://groups.io/mt/103950154/1787277 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/3901457/1787277/102458076/xyzzy [importer@patchew.org] -=-=-=-=-=-=-=-=-=-=-=-
On 25/01/2024 13:54, Ni, Ray wrote: >> I don't disagree with the approach, but it does break the API as per the >> UEFI PI specification (version 1.8 section II-12.10), and so this is not >> something that can just be dropped in as an EDK2 code change. > > You think that the TimerInterruptHandler() doesn't raise/restore TPL > which would violate the PI spec as PI spec says " NotifyFunction ... executes at EFI_TPL_HIGH_LEVEL."? > > I do not think the PI spec requires TimerInterruptHandler() raises TPL > to HIGH before invoking NotifyFunction. It just means the NotifyFunction > will execute at TPL_HIGH. If the caller is not supposed to raise TPL to TPL_HIGH_LEVEL before calling NotifyFunction, then the statement "This function executes at EFI_TPL_HIGH_LEVEL" in the PI specification is meaningless. There is no other possible interpretation besides "the caller must raise TPL to TPL_HIGH_LEVEL before calling this function". > If you review HpetTimer driver, it does not raise TPL to HIGH before > invoking NotifyFunction. That would then be a bug in HpetTimer, which ought to be fixed. If HpetTimer were to be used on a platform where the NotifyFunction correctly assumes that it is called at TPL_HIGH_LEVEL and does something that would break at a lower level, then this could lead to undefined behaviour. > And I think implementing the DxeCore changes as attached does not > prevent the TimerInterruptHandler() from calling raise/restore TPL. No, but a spec-conforming timer interrupt handler could not take advantage of the feature, because it would have to raise to TPL_HIGH_LEVEL before calling the NotifyFunction. (Any raise/restore within the NotifyFunction would then have no effect.) > So, with the changes done in DxeCore, a timer driver could either > not raise/restore TPL in TimerInterruptHandler(), or it calls > NestedInterruptTplLib if it wants. As a pure code change, I do agree that it solves the problem and it's a much simpler approach. However, it is a breaking change to the specification and I think it would need be handled as such. The minimal specification change I can think of that would make this possible would be to relax the wording on NotifyFunction in the next version of the PI specification to say that * the NotifyFunction can be called at any TPL level * the NotifyFunction will raise TPL to TPL_HIGH_LEVEL, restore TPL back to the original TPL before returning * the NotifyFunction may re-enable interrupts during its execution, and that the caller must be prepared to be re-entered before NotifyFunction returns * the timer interrupt must have been rearmed before calling NotifyFunction * the NotifyFunction must guarantee that it never reaches a state in which the TPL has been restored to the original level with CPU interrupts enabled. This would be backwards compatible with the existing behaviour. A caller written to the current specification would call NotifyFunction at TPL_HIGH_LEVEL and so any RaiseTPL/RestoreTPL done within a NotifyFunction complying to the new specification would be a no-op anyway. A caller written to the new specification would have to check the supported version of the PI specification (which I assume is available in some system configuration table somewhere) to know that it was safe to call NotifyFunction without first raising to TPL_HIGH_LEVEL. This approach would at least avoid the need for an ARCH2_PROTOCOL variant, which is potentially lower impact. Thanks, Michael -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#114407): https://edk2.groups.io/g/devel/message/114407 Mute This Topic: https://groups.io/mt/103950154/1787277 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org] -=-=-=-=-=-=-=-=-=-=-=-
> That would then be a bug in HpetTimer, which ought to be fixed. If > HpetTimer were to be used on a platform where the NotifyFunction > correctly assumes that it is called at TPL_HIGH_LEVEL and does something > that would break at a lower level, then this could lead to undefined > behaviour. In theory, it could happen that the NotifyFunction may not raise TPL to HIGH. In real world, NotifyFunction (CoreTimerTick() in DxeCore) does raise TPL to HIGH. I agree binding the NotifyFunction to CoreTimerTick() is not the best. But it could help to reduce the complexity of the problem. > As a pure code change, I do agree that it solves the problem and it's a > much simpler approach. However, it is a breaking change to the > specification and I think it would need be handled as such. > > The minimal specification change I can think of that would make this > possible would be to relax the wording on NotifyFunction in the next > version of the PI specification to say that > > * the NotifyFunction can be called at any TPL level > > * the NotifyFunction will raise TPL to TPL_HIGH_LEVEL, restore TPL back > to the original TPL before returning > > * the NotifyFunction may re-enable interrupts during its execution, and > that the caller must be prepared to be re-entered before NotifyFunction > returns > > * the timer interrupt must have been rearmed before calling NotifyFunction > > * the NotifyFunction must guarantee that it never reaches a state in > which the TPL has been restored to the original level with CPU > interrupts enabled. I agree with you about the above PI spec clarifications. Would you like to write a PI spec ECR? But I do not think the PI spec version stored in the PI system table needs to reflect whether a DxeCore implementation follows the clarification. Since the DxeCore::CoreTimerTick() implementation raises TPL to HIGH in the very first version created in more than 10 years ago, I think it's safe for TimerInterruptHandler() assumes CoreTimerTick() will raise TPL to HIGH so that TimerInterruptHandler() does not need to raise TPL to HIGH. (I agree changing the spec version is the most correct way if we review the problem in a very theorical way.) I really want to keep the UEFI world simple with the bug fixed. (The cost is assumption on existing DxeCore::CoreTimerTick().) -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#114409): https://edk2.groups.io/g/devel/message/114409 Mute This Topic: https://groups.io/mt/103950154/1787277 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/3901457/1787277/102458076/xyzzy [importer@patchew.org] -=-=-=-=-=-=-=-=-=-=-=-
On 25/01/2024 15:06, Ni, Ray wrote: > I agree with you about the above PI spec clarifications. > Would you like to write a PI spec ECR? I am not a UEFI Forum member, so I don't think I have standing to do this. (I also don't have the spare time to do this for free, sorry.) > But I do not think the PI spec version stored in the PI system table needs to > reflect whether a DxeCore implementation follows the clarification. > > Since the DxeCore::CoreTimerTick() implementation raises TPL to HIGH in the very first version created > in more than 10 years ago, I think it's safe for TimerInterruptHandler() assumes CoreTimerTick() will > raise TPL to HIGH so that TimerInterruptHandler() does not need to raise TPL to HIGH. > (I agree changing the spec version is the most correct way if we review the problem in a very theorical way.) I don't see any call to RaiseTPL() in the current version of CoreTimerTick(), unless this is implicit in the use of CoreAcquireLock()? > I really want to keep the UEFI world simple with the bug fixed. I definitely agree with this aim! :) > (The cost is assumption on existing DxeCore::CoreTimerTick().) For me, the problem is that we can't assume that it's the EDK2 implementation of CoreTimerTick() that is being used. It's perfectly legitimate for an implementation to not use EDK2's DxeCore, and to provide a NotifyFunction that *requires* being called with the TPL already raised to TPL_HIGH_LEVEL. If we're not worried about maintaining conformance to the published specifications, then there are several very breaking changes I'd like to make, starting with the USB I/O protocols. :) Thanks, Michael -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#114427): https://edk2.groups.io/g/devel/message/114427 Mute This Topic: https://groups.io/mt/103950154/1787277 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [importer@patchew.org] -=-=-=-=-=-=-=-=-=-=-=-
© 2016 - 2024 Red Hat, Inc.