[PATCH] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable

Laurent Vivier posted 1 patch 4 months, 1 week ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20250806152940.362418-1-lvivier@redhat.com
Maintainers: Dmitry Fleytman <dmitry.fleytman@gmail.com>, Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>, Jason Wang <jasowang@redhat.com>
There is a newer version of this series
hw/net/e1000e_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable
Posted by Laurent Vivier 4 months, 1 week ago
A race condition between guest driver actions and QEMU timers can lead
to an assertion failure when the guest switches the e1000e from legacy
interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
RDTR) is active, but the guest enables MSI-X before the timer fires,
the pending interrupt cause can trigger an assert in
`e1000e_intmgr_collect_delayed_causes()`.

The function's assertion (`assert(core->delayed_causes == 0)`)
incorrectly assumes that it's impossible for a legacy delayed interrupt
to be pending once the device is in MSI-X mode.

This behavior is incorrect. On a physical device, a driver-initiated
mode switch would mask interrupts, reconfigure the hardware, and clear
any stale interrupt states. The legacy delay timers (TIDV/RDTR) are not
used for moderation in MSI-X mode; the Interrupt Throttle Rate (ITR)
mechanism is used instead. Therefore, any pending interrupt from the
old mode should be ignored.

Replace the overly strict assertion with a statement that clears any
stale `delayed_causes`. This correctly models the hardware's behavior
of discarding obsolete interrupt events during a mode change and
prevents the QEMU process from terminating.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1863
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 hw/net/e1000e_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 24138587905b..d0ec892488d4 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -342,7 +342,7 @@ e1000e_intmgr_collect_delayed_causes(E1000ECore *core)
     uint32_t res;
 
     if (msix_enabled(core->owner)) {
-        assert(core->delayed_causes == 0);
+        core->delayed_causes = 0;
         return 0;
     }
 
-- 
2.49.0
Re: [PATCH] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable
Posted by Akihiko Odaki 4 months, 1 week ago
On 2025/08/07 0:29, Laurent Vivier wrote:
> A race condition between guest driver actions and QEMU timers can lead
> to an assertion failure when the guest switches the e1000e from legacy
> interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
> RDTR) is active, but the guest enables MSI-X before the timer fires,
> the pending interrupt cause can trigger an assert in
> `e1000e_intmgr_collect_delayed_causes()`.
> 
> The function's assertion (`assert(core->delayed_causes == 0)`)
> incorrectly assumes that it's impossible for a legacy delayed interrupt
> to be pending once the device is in MSI-X mode.
> 
> This behavior is incorrect. On a physical device, a driver-initiated
> mode switch would mask interrupts, reconfigure the hardware, and clear
> any stale interrupt states. The legacy delay timers (TIDV/RDTR) are not
> used for moderation in MSI-X mode; the Interrupt Throttle Rate (ITR)
> mechanism is used instead. Therefore, any pending interrupt from the
> old mode should be ignored.

It is true that triggering assertion is incorrect as per: 
docs/devel/secure-coding-practices.rst

However, I don't see statements in the datasheet that says mode switch 
will clear stale interrupts.

The expression "TIDV/RDTR are not used for moderation in MSI-X mode" is 
also unclear. Behaving drivers may indeed use ITR for that purpose, but 
the question for us is: what will e1000e do when the guest tries to use 
TIDV/RDTR in MSI-X mode anyway? That defines the behavior we need to 
implement.

If the datasheet describes the expected behavior with delayed interrupts 
in MSI-X, a reference to the datasheet should be made at least in the 
patch message. Otherwise, perhaps this "if (msix_enabled(core->owner))" 
is just extraneous and should be removed.

Regards,
Akihiko Odaki

> 
> Replace the overly strict assertion with a statement that clears any
> stale `delayed_causes`. This correctly models the hardware's behavior
> of discarding obsolete interrupt events during a mode change and
> prevents the QEMU process from terminating.
> 
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1863
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
>   hw/net/e1000e_core.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
> index 24138587905b..d0ec892488d4 100644
> --- a/hw/net/e1000e_core.c
> +++ b/hw/net/e1000e_core.c
> @@ -342,7 +342,7 @@ e1000e_intmgr_collect_delayed_causes(E1000ECore *core)
>       uint32_t res;
>   
>       if (msix_enabled(core->owner)) {
> -        assert(core->delayed_causes == 0);
> +        core->delayed_causes = 0;
>           return 0;
>       }
>
Re: [PATCH] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable
Posted by Laurent Vivier 4 months, 1 week ago
On 06/08/2025 19:44, Akihiko Odaki wrote:
> On 2025/08/07 0:29, Laurent Vivier wrote:
>> A race condition between guest driver actions and QEMU timers can lead
>> to an assertion failure when the guest switches the e1000e from legacy
>> interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
>> RDTR) is active, but the guest enables MSI-X before the timer fires,
>> the pending interrupt cause can trigger an assert in
>> `e1000e_intmgr_collect_delayed_causes()`.
>>
>> The function's assertion (`assert(core->delayed_causes == 0)`)
>> incorrectly assumes that it's impossible for a legacy delayed interrupt
>> to be pending once the device is in MSI-X mode.
>>
>> This behavior is incorrect. On a physical device, a driver-initiated
>> mode switch would mask interrupts, reconfigure the hardware, and clear
>> any stale interrupt states. The legacy delay timers (TIDV/RDTR) are not
>> used for moderation in MSI-X mode; the Interrupt Throttle Rate (ITR)
>> mechanism is used instead. Therefore, any pending interrupt from the
>> old mode should be ignored.
> 
> It is true that triggering assertion is incorrect as per: docs/devel/secure-coding- 
> practices.rst
> 
> However, I don't see statements in the datasheet that says mode switch will clear stale 
> interrupts.

The datasheet doesn't say "stale interrupts are cleared." but it describes two 
fundamentally separate and mutually exclusive hardware paths for generating interrupts:

Intel® 82574 GbE Controller Family Datasheet
https://docs.rs-online.com/96e8/0900766b81384733.pdf

See 7.4.1 Legacy and MSI Interrupt Modes

Legacy/MSI Path: An event (like a packet transmission completing) sets a cause bit in the 
ICR (Interrupt Cause Register). The legacy delay timers (TIDV/RDTR) can delay the 
propagation of this cause. When the timer expires, if the corresponding bit in the IMS 
(Interrupt Mask Set) is enabled, the hardware asserts the INTx pin to signal the CPU.

See 7.4.2 MSI-X Mode

MSI-X Path: An event on a specific queue (e.g., Rx Queue 1) is mapped via the IVAR 
(Interrupt Vector Allocation Register) to a specific MSI-X vector. The ITR (Interrupt 
Throttle Rate) register for that vector then determines if an interrupt should be 
generated. If allowed, the hardware performs a memory write to the address specified in 
the PCI MSI-X table, delivering the message to a specific CPU core.

Only non-queue causes are reflected in ICR (so not TIDV/RDTR).

The key here is that a TIDV timer expiring is only connected to the legacy ICR->IMS->INTx 
path. There is no described hardware path for a TIDV timer expiration to trigger an MSI-X 
memory write.

Therefore, if a guest enables MSI-X, it reconfigures its interrupt controller (APIC) to 
listen for MSI-X messages, not legacy INTx pin assertions. So, even if the stale TIDV 
timer fires on the hardware, the interrupt it generates has no configured path to be 
received by the guest OS. From the guest's perspective, the interrupt is lost. Our 
emulation should model this by ignoring/clearing the stale event, which is precisely what 
the patch does.

> 
> The expression "TIDV/RDTR are not used for moderation in MSI-X mode" is also unclear. 
> Behaving drivers may indeed use ITR for that purpose, but the question for us is: what 
> will e1000e do when the guest tries to use TIDV/RDTR in MSI-X mode anyway? That defines 
> the behavior we need to implement.

The TIDV and RDTR registers are part of the device's memory-mapped I/O space. The hardware 
will almost certainly allow a write to these registers at any time.

However, based on the separate hardware paths described above, the write would be 
ineffective. A driver could set the TIDV timer, and the timer would likely count down, but 
its expiration event is only wired to the legacy interrupt generation logic. Since the 
device is configured for MSI-X interrupt delivery, that path is dormant. The write is 
accepted, but the action is inert.

> 
> If the datasheet describes the expected behavior with delayed interrupts in MSI-X, a 
> reference to the datasheet should be made at least in the patch message. Otherwise, 
> perhaps this "if (msix_enabled(core->owner))" is just extraneous and should be removed.

The "if (msix_enabled(core->owner))" check is not extraneous and must be kept. It 
correctly separates the emulation logic for these two mutually exclusive hardware modes. 
Removing it would incorrectly allow a legacy delayed interrupt to be processed as if it 
were valid in MSI-X mode, which is not how the hardware works.

Moreover it can introduce unexpected behavior in the guest as this case could not be managed.

Thanks,
Laurent


Re: [PATCH] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable
Posted by Akihiko Odaki 4 months, 1 week ago
On 2025/08/07 17:36, Laurent Vivier wrote:
> On 06/08/2025 19:44, Akihiko Odaki wrote:
>> On 2025/08/07 0:29, Laurent Vivier wrote:
>>> A race condition between guest driver actions and QEMU timers can lead
>>> to an assertion failure when the guest switches the e1000e from legacy
>>> interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
>>> RDTR) is active, but the guest enables MSI-X before the timer fires,
>>> the pending interrupt cause can trigger an assert in
>>> `e1000e_intmgr_collect_delayed_causes()`.
>>>
>>> The function's assertion (`assert(core->delayed_causes == 0)`)
>>> incorrectly assumes that it's impossible for a legacy delayed interrupt
>>> to be pending once the device is in MSI-X mode.
>>>
>>> This behavior is incorrect. On a physical device, a driver-initiated
>>> mode switch would mask interrupts, reconfigure the hardware, and clear
>>> any stale interrupt states. The legacy delay timers (TIDV/RDTR) are not
>>> used for moderation in MSI-X mode; the Interrupt Throttle Rate (ITR)
>>> mechanism is used instead. Therefore, any pending interrupt from the
>>> old mode should be ignored.
>>
>> It is true that triggering assertion is incorrect as per: docs/devel/ 
>> secure-coding- practices.rst
>>
>> However, I don't see statements in the datasheet that says mode switch 
>> will clear stale interrupts.
> 
> The datasheet doesn't say "stale interrupts are cleared." but it 
> describes two fundamentally separate and mutually exclusive hardware 
> paths for generating interrupts:
> 
> Intel® 82574 GbE Controller Family Datasheet
> https://docs.rs-online.com/96e8/0900766b81384733.pdf

It is an old revision (2.5). More recent revision (3.4) can be found at:
https://courses.cs.washington.edu/courses/cse451/18au/readings/e1000e.pdf

> 
> See 7.4.1 Legacy and MSI Interrupt Modes
> 
> Legacy/MSI Path: An event (like a packet transmission completing) sets a 
> cause bit in the ICR (Interrupt Cause Register). The legacy delay timers 
> (TIDV/RDTR) can delay the propagation of this cause. When the timer 
> expires, if the corresponding bit in the IMS (Interrupt Mask Set) is 
> enabled, the hardware asserts the INTx pin to signal the CPU.

Both revisions 2.5 and 3.4 have the following statements, different from 
what you quoted:

 > In legacy and MSI modes, an interrupt cause is reflected by setting
 > one of the bits in the ICR register, where each bit reflects one or
 > more causes. This description of ICR register provides the mapping of
 > interrupt causes (for example, a specific Rx queue event or a LSC
 > event) to bits in the ICR.
 >
 > Mapping of causes relating to the Tx and Rx queues as well as
 > non-queue causes in this mode is not configurable. Each possible queue
 > interrupt cause (such as, each Rx queue, Tx queue or any other
 > interrupt source) has an entry in the ICR.
 >
 > The following configuration and parameters are involved:
 > • The ICR[31:0] bits are allocated to specific interrupt causes

Please ensure that you refer a correct datasheet and share it for me.

> 
> See 7.4.2 MSI-X Mode
> 
> MSI-X Path: An event on a specific queue (e.g., Rx Queue 1) is mapped 
> via the IVAR (Interrupt Vector Allocation Register) to a specific MSI-X 
> vector. The ITR (Interrupt Throttle Rate) register for that vector then 
> determines if an interrupt should be generated. If allowed, the hardware 
> performs a memory write to the address specified in the PCI MSI-X table, 
> delivering the message to a specific CPU core.
> 
> Only non-queue causes are reflected in ICR (so not TIDV/RDTR).
> 
> The key here is that a TIDV timer expiring is only connected to the 
> legacy ICR->IMS->INTx path. There is no described hardware path for a 
> TIDV timer expiration to trigger an MSI-X memory write.
> 
> Therefore, if a guest enables MSI-X, it reconfigures its interrupt 
> controller (APIC) to listen for MSI-X messages, not legacy INTx pin 
> assertions. So, even if the stale TIDV timer fires on the hardware, the 
> interrupt it generates has no configured path to be received by the 
> guest OS. From the guest's perspective, the interrupt is lost. Our 
> emulation should model this by ignoring/clearing the stale event, which 
> is precisely what the patch does.
> 
>>
>> The expression "TIDV/RDTR are not used for moderation in MSI-X mode" 
>> is also unclear. Behaving drivers may indeed use ITR for that purpose, 
>> but the question for us is: what will e1000e do when the guest tries 
>> to use TIDV/RDTR in MSI-X mode anyway? That defines the behavior we 
>> need to implement.
> 
> The TIDV and RDTR registers are part of the device's memory-mapped I/O 
> space. The hardware will almost certainly allow a write to these 
> registers at any time.
> 
> However, based on the separate hardware paths described above, the write 
> would be ineffective. A driver could set the TIDV timer, and the timer 
> would likely count down, but its expiration event is only wired to the 
> legacy interrupt generation logic. Since the device is configured for 
> MSI-X interrupt delivery, that path is dormant. The write is accepted, 
> but the action is inert.
> 
>>
>> If the datasheet describes the expected behavior with delayed 
>> interrupts in MSI-X, a reference to the datasheet should be made at 
>> least in the patch message. Otherwise, perhaps this "if 
>> (msix_enabled(core->owner))" is just extraneous and should be removed.
> 
> The "if (msix_enabled(core->owner))" check is not extraneous and must be 
> kept. It correctly separates the emulation logic for these two mutually 
> exclusive hardware modes. Removing it would incorrectly allow a legacy 
> delayed interrupt to be processed as if it were valid in MSI-X mode, 
> which is not how the hardware works.
> 
> Moreover it can introduce unexpected behavior in the guest as this case 
> could not be managed.

Revision 3.4's 4.6.1 Interrupts During Initialization says:
 > Most drivers disable interrupts during initialization to prevent
 > re-entrancy. Interrupts are disabled by writing to the IMC register.
 > Note that the interrupts need to be disabled also after issuing a
 > global reset, so a typical driver initialization flow is:
 >
 > 1. Disable interrupts
 > 2. Issue a global reset
 > 3. Disable interrupts (again)
 > 4. …
 >
 > After the initialization completes, a typical driver enables the
 > desired interrupts by
 > writing to the IMS register.

Drivers can ensure that old interrupts will not fire by following this 
procedure. The behavior when not following this is undefined (in the 
datasheet I'm referring to).

Regards,
Akihiko Odaki

Re: [PATCH] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable
Posted by Laurent Vivier 4 months, 1 week ago
On 07/08/2025 12:13, Akihiko Odaki wrote:
> On 2025/08/07 17:36, Laurent Vivier wrote:
>> On 06/08/2025 19:44, Akihiko Odaki wrote:
>>> On 2025/08/07 0:29, Laurent Vivier wrote:
>>>> A race condition between guest driver actions and QEMU timers can lead
>>>> to an assertion failure when the guest switches the e1000e from legacy
>>>> interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
>>>> RDTR) is active, but the guest enables MSI-X before the timer fires,
>>>> the pending interrupt cause can trigger an assert in
>>>> `e1000e_intmgr_collect_delayed_causes()`.
>>>>
>>>> The function's assertion (`assert(core->delayed_causes == 0)`)
>>>> incorrectly assumes that it's impossible for a legacy delayed interrupt
>>>> to be pending once the device is in MSI-X mode.
>>>>
>>>> This behavior is incorrect. On a physical device, a driver-initiated
>>>> mode switch would mask interrupts, reconfigure the hardware, and clear
>>>> any stale interrupt states. The legacy delay timers (TIDV/RDTR) are not
>>>> used for moderation in MSI-X mode; the Interrupt Throttle Rate (ITR)
>>>> mechanism is used instead. Therefore, any pending interrupt from the
>>>> old mode should be ignored.
>>>
>>> It is true that triggering assertion is incorrect as per: docs/devel/ secure-coding- 
>>> practices.rst
>>>
>>> However, I don't see statements in the datasheet that says mode switch will clear stale 
>>> interrupts.
>>
>> The datasheet doesn't say "stale interrupts are cleared." but it describes two 
>> fundamentally separate and mutually exclusive hardware paths for generating interrupts:
>>
>> Intel® 82574 GbE Controller Family Datasheet
>> https://docs.rs-online.com/96e8/0900766b81384733.pdf
> 
> It is an old revision (2.5). More recent revision (3.4) can be found at:
> https://courses.cs.washington.edu/courses/cse451/18au/readings/e1000e.pdf
> 
>>
>> See 7.4.1 Legacy and MSI Interrupt Modes
>>
>> Legacy/MSI Path: An event (like a packet transmission completing) sets a cause bit in 
>> the ICR (Interrupt Cause Register). The legacy delay timers (TIDV/RDTR) can delay the 
>> propagation of this cause. When the timer expires, if the corresponding bit in the IMS 
>> (Interrupt Mask Set) is enabled, the hardware asserts the INTx pin to signal the CPU.
> 
> Both revisions 2.5 and 3.4 have the following statements, different from what you quoted:
> 
>  > In legacy and MSI modes, an interrupt cause is reflected by setting
>  > one of the bits in the ICR register, where each bit reflects one or
>  > more causes. This description of ICR register provides the mapping of
>  > interrupt causes (for example, a specific Rx queue event or a LSC
>  > event) to bits in the ICR.
>  >
>  > Mapping of causes relating to the Tx and Rx queues as well as
>  > non-queue causes in this mode is not configurable. Each possible queue
>  > interrupt cause (such as, each Rx queue, Tx queue or any other
>  > interrupt source) has an entry in the ICR.
>  >
>  > The following configuration and parameters are involved:
>  > • The ICR[31:0] bits are allocated to specific interrupt causes
> 
> Please ensure that you refer a correct datasheet and share it for me.
> 
>>
>> See 7.4.2 MSI-X Mode
>>
>> MSI-X Path: An event on a specific queue (e.g., Rx Queue 1) is mapped via the IVAR 
>> (Interrupt Vector Allocation Register) to a specific MSI-X vector. The ITR (Interrupt 
>> Throttle Rate) register for that vector then determines if an interrupt should be 
>> generated. If allowed, the hardware performs a memory write to the address specified in 
>> the PCI MSI-X table, delivering the message to a specific CPU core.
>>
>> Only non-queue causes are reflected in ICR (so not TIDV/RDTR).
>>
>> The key here is that a TIDV timer expiring is only connected to the legacy ICR->IMS- 
>> >INTx path. There is no described hardware path for a TIDV timer expiration to trigger 
>> an MSI-X memory write.
>>
>> Therefore, if a guest enables MSI-X, it reconfigures its interrupt controller (APIC) to 
>> listen for MSI-X messages, not legacy INTx pin assertions. So, even if the stale TIDV 
>> timer fires on the hardware, the interrupt it generates has no configured path to be 
>> received by the guest OS. From the guest's perspective, the interrupt is lost. Our 
>> emulation should model this by ignoring/clearing the stale event, which is precisely 
>> what the patch does.
>>
>>>
>>> The expression "TIDV/RDTR are not used for moderation in MSI-X mode" is also unclear. 
>>> Behaving drivers may indeed use ITR for that purpose, but the question for us is: what 
>>> will e1000e do when the guest tries to use TIDV/RDTR in MSI-X mode anyway? That defines 
>>> the behavior we need to implement.
>>
>> The TIDV and RDTR registers are part of the device's memory-mapped I/O space. The 
>> hardware will almost certainly allow a write to these registers at any time.
>>
>> However, based on the separate hardware paths described above, the write would be 
>> ineffective. A driver could set the TIDV timer, and the timer would likely count down, 
>> but its expiration event is only wired to the legacy interrupt generation logic. Since 
>> the device is configured for MSI-X interrupt delivery, that path is dormant. The write 
>> is accepted, but the action is inert.
>>
>>>
>>> If the datasheet describes the expected behavior with delayed interrupts in MSI-X, a 
>>> reference to the datasheet should be made at least in the patch message. Otherwise, 
>>> perhaps this "if (msix_enabled(core->owner))" is just extraneous and should be removed.
>>
>> The "if (msix_enabled(core->owner))" check is not extraneous and must be kept. It 
>> correctly separates the emulation logic for these two mutually exclusive hardware modes. 
>> Removing it would incorrectly allow a legacy delayed interrupt to be processed as if it 
>> were valid in MSI-X mode, which is not how the hardware works.
>>
>> Moreover it can introduce unexpected behavior in the guest as this case could not be 
>> managed.
> 
> Revision 3.4's 4.6.1 Interrupts During Initialization says:
>  > Most drivers disable interrupts during initialization to prevent
>  > re-entrancy. Interrupts are disabled by writing to the IMC register.
>  > Note that the interrupts need to be disabled also after issuing a
>  > global reset, so a typical driver initialization flow is:
>  >
>  > 1. Disable interrupts
>  > 2. Issue a global reset
>  > 3. Disable interrupts (again)
>  > 4. …
>  >
>  > After the initialization completes, a typical driver enables the
>  > desired interrupts by
>  > writing to the IMS register.
> 
> Drivers can ensure that old interrupts will not fire by following this procedure. The 
> behavior when not following this is undefined (in the datasheet I'm referring to).

So, do you agree if I only replace the assert() by a qemu_log_mask(LOG_GUEST_ERROR, ...)?

Removing totally the "if (msix_enabled(core->owner))" could introduce a behavior change in 
the current implementation and I'd like to avoid that.

Thanks,
Laurent

Thanks,
Laurent


Re: [PATCH] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable
Posted by Akihiko Odaki 4 months, 1 week ago
On 2025/08/07 19:31, Laurent Vivier wrote:
> Removing totally the "if (msix_enabled(core->owner))" could introduce a 
> behavior change in the current implementation and I'd like to avoid that.

I failed to respond to this. There will be no behavioral change other 
than getting rid of the assertion failure.

If core->delayed_causes == 0, the latter part of this function is no-op 
and just returns 0. This is no different whether
"if (msix_enabled(core->owner))" is present or not.

If core->delayed_causes != 0, we had the assertion failure, but the 
assertion failure will be gone. The behavior after assert() was never 
present in the past, so we don't have to worry about behavioral change.

Regards,
Akihiko Odaki
Re: [PATCH] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable
Posted by Akihiko Odaki 4 months, 1 week ago
On 2025/08/07 19:31, Laurent Vivier wrote:
> On 07/08/2025 12:13, Akihiko Odaki wrote:
>> On 2025/08/07 17:36, Laurent Vivier wrote:
>>> On 06/08/2025 19:44, Akihiko Odaki wrote:
>>>> On 2025/08/07 0:29, Laurent Vivier wrote:
>>>>> A race condition between guest driver actions and QEMU timers can lead
>>>>> to an assertion failure when the guest switches the e1000e from legacy
>>>>> interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
>>>>> RDTR) is active, but the guest enables MSI-X before the timer fires,
>>>>> the pending interrupt cause can trigger an assert in
>>>>> `e1000e_intmgr_collect_delayed_causes()`.
>>>>>
>>>>> The function's assertion (`assert(core->delayed_causes == 0)`)
>>>>> incorrectly assumes that it's impossible for a legacy delayed 
>>>>> interrupt
>>>>> to be pending once the device is in MSI-X mode.
>>>>>
>>>>> This behavior is incorrect. On a physical device, a driver-initiated
>>>>> mode switch would mask interrupts, reconfigure the hardware, and clear
>>>>> any stale interrupt states. The legacy delay timers (TIDV/RDTR) are 
>>>>> not
>>>>> used for moderation in MSI-X mode; the Interrupt Throttle Rate (ITR)
>>>>> mechanism is used instead. Therefore, any pending interrupt from the
>>>>> old mode should be ignored.
>>>>
>>>> It is true that triggering assertion is incorrect as per: docs/ 
>>>> devel/ secure-coding- practices.rst
>>>>
>>>> However, I don't see statements in the datasheet that says mode 
>>>> switch will clear stale interrupts.
>>>
>>> The datasheet doesn't say "stale interrupts are cleared." but it 
>>> describes two fundamentally separate and mutually exclusive hardware 
>>> paths for generating interrupts:
>>>
>>> Intel® 82574 GbE Controller Family Datasheet
>>> https://docs.rs-online.com/96e8/0900766b81384733.pdf
>>
>> It is an old revision (2.5). More recent revision (3.4) can be found at:
>> https://courses.cs.washington.edu/courses/cse451/18au/readings/e1000e.pdf
>>
>>>
>>> See 7.4.1 Legacy and MSI Interrupt Modes
>>>
>>> Legacy/MSI Path: An event (like a packet transmission completing) 
>>> sets a cause bit in the ICR (Interrupt Cause Register). The legacy 
>>> delay timers (TIDV/RDTR) can delay the propagation of this cause. 
>>> When the timer expires, if the corresponding bit in the IMS 
>>> (Interrupt Mask Set) is enabled, the hardware asserts the INTx pin to 
>>> signal the CPU.
>>
>> Both revisions 2.5 and 3.4 have the following statements, different 
>> from what you quoted:
>>
>>  > In legacy and MSI modes, an interrupt cause is reflected by setting
>>  > one of the bits in the ICR register, where each bit reflects one or
>>  > more causes. This description of ICR register provides the mapping of
>>  > interrupt causes (for example, a specific Rx queue event or a LSC
>>  > event) to bits in the ICR.
>>  >
>>  > Mapping of causes relating to the Tx and Rx queues as well as
>>  > non-queue causes in this mode is not configurable. Each possible queue
>>  > interrupt cause (such as, each Rx queue, Tx queue or any other
>>  > interrupt source) has an entry in the ICR.
>>  >
>>  > The following configuration and parameters are involved:
>>  > • The ICR[31:0] bits are allocated to specific interrupt causes
>>
>> Please ensure that you refer a correct datasheet and share it for me.
>>
>>>
>>> See 7.4.2 MSI-X Mode
>>>
>>> MSI-X Path: An event on a specific queue (e.g., Rx Queue 1) is mapped 
>>> via the IVAR (Interrupt Vector Allocation Register) to a specific 
>>> MSI-X vector. The ITR (Interrupt Throttle Rate) register for that 
>>> vector then determines if an interrupt should be generated. If 
>>> allowed, the hardware performs a memory write to the address 
>>> specified in the PCI MSI-X table, delivering the message to a 
>>> specific CPU core.
>>>
>>> Only non-queue causes are reflected in ICR (so not TIDV/RDTR).
>>>
>>> The key here is that a TIDV timer expiring is only connected to the 
>>> legacy ICR->IMS- >INTx path. There is no described hardware path for 
>>> a TIDV timer expiration to trigger an MSI-X memory write.
>>>
>>> Therefore, if a guest enables MSI-X, it reconfigures its interrupt 
>>> controller (APIC) to listen for MSI-X messages, not legacy INTx pin 
>>> assertions. So, even if the stale TIDV timer fires on the hardware, 
>>> the interrupt it generates has no configured path to be received by 
>>> the guest OS. From the guest's perspective, the interrupt is lost. 
>>> Our emulation should model this by ignoring/clearing the stale event, 
>>> which is precisely what the patch does.
>>>
>>>>
>>>> The expression "TIDV/RDTR are not used for moderation in MSI-X mode" 
>>>> is also unclear. Behaving drivers may indeed use ITR for that 
>>>> purpose, but the question for us is: what will e1000e do when the 
>>>> guest tries to use TIDV/RDTR in MSI-X mode anyway? That defines the 
>>>> behavior we need to implement.
>>>
>>> The TIDV and RDTR registers are part of the device's memory-mapped I/ 
>>> O space. The hardware will almost certainly allow a write to these 
>>> registers at any time.
>>>
>>> However, based on the separate hardware paths described above, the 
>>> write would be ineffective. A driver could set the TIDV timer, and 
>>> the timer would likely count down, but its expiration event is only 
>>> wired to the legacy interrupt generation logic. Since the device is 
>>> configured for MSI-X interrupt delivery, that path is dormant. The 
>>> write is accepted, but the action is inert.
>>>
>>>>
>>>> If the datasheet describes the expected behavior with delayed 
>>>> interrupts in MSI-X, a reference to the datasheet should be made at 
>>>> least in the patch message. Otherwise, perhaps this "if 
>>>> (msix_enabled(core->owner))" is just extraneous and should be removed.
>>>
>>> The "if (msix_enabled(core->owner))" check is not extraneous and must 
>>> be kept. It correctly separates the emulation logic for these two 
>>> mutually exclusive hardware modes. Removing it would incorrectly 
>>> allow a legacy delayed interrupt to be processed as if it were valid 
>>> in MSI-X mode, which is not how the hardware works.
>>>
>>> Moreover it can introduce unexpected behavior in the guest as this 
>>> case could not be managed.
>>
>> Revision 3.4's 4.6.1 Interrupts During Initialization says:
>>  > Most drivers disable interrupts during initialization to prevent
>>  > re-entrancy. Interrupts are disabled by writing to the IMC register.
>>  > Note that the interrupts need to be disabled also after issuing a
>>  > global reset, so a typical driver initialization flow is:
>>  >
>>  > 1. Disable interrupts
>>  > 2. Issue a global reset
>>  > 3. Disable interrupts (again)
>>  > 4. …
>>  >
>>  > After the initialization completes, a typical driver enables the
>>  > desired interrupts by
>>  > writing to the IMS register.
>>
>> Drivers can ensure that old interrupts will not fire by following this 
>> procedure. The behavior when not following this is undefined (in the 
>> datasheet I'm referring to).
> 
> So, do you agree if I only replace the assert() by a 
> qemu_log_mask(LOG_GUEST_ERROR, ...)?

The following "return 0" is extraneous so should be removed along with 
assert().

Regards,
Akihiko Odaki

Re: [PATCH] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable
Posted by Laurent Vivier 4 months, 1 week ago
On 07/08/2025 12:34, Akihiko Odaki wrote:
> On 2025/08/07 19:31, Laurent Vivier wrote:
>> On 07/08/2025 12:13, Akihiko Odaki wrote:
>>> On 2025/08/07 17:36, Laurent Vivier wrote:
>>>> On 06/08/2025 19:44, Akihiko Odaki wrote:
>>>>> On 2025/08/07 0:29, Laurent Vivier wrote:
>>>>>> A race condition between guest driver actions and QEMU timers can lead
>>>>>> to an assertion failure when the guest switches the e1000e from legacy
>>>>>> interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
>>>>>> RDTR) is active, but the guest enables MSI-X before the timer fires,
>>>>>> the pending interrupt cause can trigger an assert in
>>>>>> `e1000e_intmgr_collect_delayed_causes()`.
>>>>>>
>>>>>> The function's assertion (`assert(core->delayed_causes == 0)`)
>>>>>> incorrectly assumes that it's impossible for a legacy delayed interrupt
>>>>>> to be pending once the device is in MSI-X mode.
>>>>>>
>>>>>> This behavior is incorrect. On a physical device, a driver-initiated
>>>>>> mode switch would mask interrupts, reconfigure the hardware, and clear
>>>>>> any stale interrupt states. The legacy delay timers (TIDV/RDTR) are not
>>>>>> used for moderation in MSI-X mode; the Interrupt Throttle Rate (ITR)
>>>>>> mechanism is used instead. Therefore, any pending interrupt from the
>>>>>> old mode should be ignored.
>>>>>
>>>>> It is true that triggering assertion is incorrect as per: docs/ devel/ secure-coding- 
>>>>> practices.rst
>>>>>
>>>>> However, I don't see statements in the datasheet that says mode switch will clear 
>>>>> stale interrupts.
>>>>
>>>> The datasheet doesn't say "stale interrupts are cleared." but it describes two 
>>>> fundamentally separate and mutually exclusive hardware paths for generating interrupts:
>>>>
>>>> Intel® 82574 GbE Controller Family Datasheet
>>>> https://docs.rs-online.com/96e8/0900766b81384733.pdf
>>>
>>> It is an old revision (2.5). More recent revision (3.4) can be found at:
>>> https://courses.cs.washington.edu/courses/cse451/18au/readings/e1000e.pdf
>>>
>>>>
>>>> See 7.4.1 Legacy and MSI Interrupt Modes
>>>>
>>>> Legacy/MSI Path: An event (like a packet transmission completing) sets a cause bit in 
>>>> the ICR (Interrupt Cause Register). The legacy delay timers (TIDV/RDTR) can delay the 
>>>> propagation of this cause. When the timer expires, if the corresponding bit in the IMS 
>>>> (Interrupt Mask Set) is enabled, the hardware asserts the INTx pin to signal the CPU.
>>>
>>> Both revisions 2.5 and 3.4 have the following statements, different from what you quoted:
>>>
>>>  > In legacy and MSI modes, an interrupt cause is reflected by setting
>>>  > one of the bits in the ICR register, where each bit reflects one or
>>>  > more causes. This description of ICR register provides the mapping of
>>>  > interrupt causes (for example, a specific Rx queue event or a LSC
>>>  > event) to bits in the ICR.
>>>  >
>>>  > Mapping of causes relating to the Tx and Rx queues as well as
>>>  > non-queue causes in this mode is not configurable. Each possible queue
>>>  > interrupt cause (such as, each Rx queue, Tx queue or any other
>>>  > interrupt source) has an entry in the ICR.
>>>  >
>>>  > The following configuration and parameters are involved:
>>>  > • The ICR[31:0] bits are allocated to specific interrupt causes
>>>
>>> Please ensure that you refer a correct datasheet and share it for me.
>>>
>>>>
>>>> See 7.4.2 MSI-X Mode
>>>>
>>>> MSI-X Path: An event on a specific queue (e.g., Rx Queue 1) is mapped via the IVAR 
>>>> (Interrupt Vector Allocation Register) to a specific MSI-X vector. The ITR (Interrupt 
>>>> Throttle Rate) register for that vector then determines if an interrupt should be 
>>>> generated. If allowed, the hardware performs a memory write to the address specified 
>>>> in the PCI MSI-X table, delivering the message to a specific CPU core.
>>>>
>>>> Only non-queue causes are reflected in ICR (so not TIDV/RDTR).
>>>>
>>>> The key here is that a TIDV timer expiring is only connected to the legacy ICR->IMS- 
>>>> >INTx path. There is no described hardware path for a TIDV timer expiration to trigger 
>>>> an MSI-X memory write.
>>>>
>>>> Therefore, if a guest enables MSI-X, it reconfigures its interrupt controller (APIC) 
>>>> to listen for MSI-X messages, not legacy INTx pin assertions. So, even if the stale 
>>>> TIDV timer fires on the hardware, the interrupt it generates has no configured path to 
>>>> be received by the guest OS. From the guest's perspective, the interrupt is lost. Our 
>>>> emulation should model this by ignoring/clearing the stale event, which is precisely 
>>>> what the patch does.
>>>>
>>>>>
>>>>> The expression "TIDV/RDTR are not used for moderation in MSI-X mode" is also unclear. 
>>>>> Behaving drivers may indeed use ITR for that purpose, but the question for us is: 
>>>>> what will e1000e do when the guest tries to use TIDV/RDTR in MSI-X mode anyway? That 
>>>>> defines the behavior we need to implement.
>>>>
>>>> The TIDV and RDTR registers are part of the device's memory-mapped I/ O space. The 
>>>> hardware will almost certainly allow a write to these registers at any time.
>>>>
>>>> However, based on the separate hardware paths described above, the write would be 
>>>> ineffective. A driver could set the TIDV timer, and the timer would likely count down, 
>>>> but its expiration event is only wired to the legacy interrupt generation logic. Since 
>>>> the device is configured for MSI-X interrupt delivery, that path is dormant. The write 
>>>> is accepted, but the action is inert.
>>>>
>>>>>
>>>>> If the datasheet describes the expected behavior with delayed interrupts in MSI-X, a 
>>>>> reference to the datasheet should be made at least in the patch message. Otherwise, 
>>>>> perhaps this "if (msix_enabled(core->owner))" is just extraneous and should be removed.
>>>>
>>>> The "if (msix_enabled(core->owner))" check is not extraneous and must be kept. It 
>>>> correctly separates the emulation logic for these two mutually exclusive hardware 
>>>> modes. Removing it would incorrectly allow a legacy delayed interrupt to be processed 
>>>> as if it were valid in MSI-X mode, which is not how the hardware works.
>>>>
>>>> Moreover it can introduce unexpected behavior in the guest as this case could not be 
>>>> managed.
>>>
>>> Revision 3.4's 4.6.1 Interrupts During Initialization says:
>>>  > Most drivers disable interrupts during initialization to prevent
>>>  > re-entrancy. Interrupts are disabled by writing to the IMC register.
>>>  > Note that the interrupts need to be disabled also after issuing a
>>>  > global reset, so a typical driver initialization flow is:
>>>  >
>>>  > 1. Disable interrupts
>>>  > 2. Issue a global reset
>>>  > 3. Disable interrupts (again)
>>>  > 4. …
>>>  >
>>>  > After the initialization completes, a typical driver enables the
>>>  > desired interrupts by
>>>  > writing to the IMS register.
>>>
>>> Drivers can ensure that old interrupts will not fire by following this procedure. The 
>>> behavior when not following this is undefined (in the datasheet I'm referring to).
>>
>> So, do you agree if I only replace the assert() by a qemu_log_mask(LOG_GUEST_ERROR, ...)?
> 
> The following "return 0" is extraneous so should be removed along with assert().

This is the behavior change I'd like to avoid.

But if you are sure there is no risk, I can remove totally the "if 
(msix_enabled(core->owner)) { ... }"

Thanks,
Laurent


Re: [PATCH] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable
Posted by Thomas Huth 4 months, 1 week ago
On 06/08/2025 19.44, Akihiko Odaki wrote:
> On 2025/08/07 0:29, Laurent Vivier wrote:
>> A race condition between guest driver actions and QEMU timers can lead
>> to an assertion failure when the guest switches the e1000e from legacy
>> interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
>> RDTR) is active, but the guest enables MSI-X before the timer fires,
>> the pending interrupt cause can trigger an assert in
>> `e1000e_intmgr_collect_delayed_causes()`.
>>
>> The function's assertion (`assert(core->delayed_causes == 0)`)
>> incorrectly assumes that it's impossible for a legacy delayed interrupt
>> to be pending once the device is in MSI-X mode.
>>
>> This behavior is incorrect. On a physical device, a driver-initiated
>> mode switch would mask interrupts, reconfigure the hardware, and clear
>> any stale interrupt states. The legacy delay timers (TIDV/RDTR) are not
>> used for moderation in MSI-X mode; the Interrupt Throttle Rate (ITR)
>> mechanism is used instead. Therefore, any pending interrupt from the
>> old mode should be ignored.
> 
> It is true that triggering assertion is incorrect as per: docs/devel/secure- 
> coding-practices.rst
> 
> However, I don't see statements in the datasheet that says mode switch will 
> clear stale interrupts.
> 
> The expression "TIDV/RDTR are not used for moderation in MSI-X mode" is also 
> unclear. Behaving drivers may indeed use ITR for that purpose, but the 
> question for us is: what will e1000e do when the guest tries to use TIDV/ 
> RDTR in MSI-X mode anyway? That defines the behavior we need to implement.

If it's not clear what to do here, maybe we could use a 
qemu_log_mask(LOG_UNIMP, ...) for now?

  Thomas
Re: [PATCH] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable
Posted by Akihiko Odaki 4 months, 1 week ago
On 2025/08/07 15:57, Thomas Huth wrote:
> On 06/08/2025 19.44, Akihiko Odaki wrote:
>> On 2025/08/07 0:29, Laurent Vivier wrote:
>>> A race condition between guest driver actions and QEMU timers can lead
>>> to an assertion failure when the guest switches the e1000e from legacy
>>> interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
>>> RDTR) is active, but the guest enables MSI-X before the timer fires,
>>> the pending interrupt cause can trigger an assert in
>>> `e1000e_intmgr_collect_delayed_causes()`.
>>>
>>> The function's assertion (`assert(core->delayed_causes == 0)`)
>>> incorrectly assumes that it's impossible for a legacy delayed interrupt
>>> to be pending once the device is in MSI-X mode.
>>>
>>> This behavior is incorrect. On a physical device, a driver-initiated
>>> mode switch would mask interrupts, reconfigure the hardware, and clear
>>> any stale interrupt states. The legacy delay timers (TIDV/RDTR) are not
>>> used for moderation in MSI-X mode; the Interrupt Throttle Rate (ITR)
>>> mechanism is used instead. Therefore, any pending interrupt from the
>>> old mode should be ignored.
>>
>> It is true that triggering assertion is incorrect as per: docs/devel/ 
>> secure- coding-practices.rst
>>
>> However, I don't see statements in the datasheet that says mode switch 
>> will clear stale interrupts.
>>
>> The expression "TIDV/RDTR are not used for moderation in MSI-X mode" 
>> is also unclear. Behaving drivers may indeed use ITR for that purpose, 
>> but the question for us is: what will e1000e do when the guest tries 
>> to use TIDV/ RDTR in MSI-X mode anyway? That defines the behavior we 
>> need to implement.
> 
> If it's not clear what to do here, maybe we could use a 
> qemu_log_mask(LOG_UNIMP, ...) for now?

The behavior is undefined here as far as I understand. If so, 
LOG_GUEST_ERROR will fit better because it is not appropriate for guests 
to make an assumption on the behavior.

Regards,
Akihiko Odaki