[PATCH v2] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable

Posted by Laurent Vivier 6 months ago

A race condition between guest driver actions and QEMU timers can lead
to an assertion failure when the guest switches the e1000e from legacy
interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
RDTR) is active, but the guest enables MSI-X before the timer fires,
the pending interrupt cause can trigger an assert in
e1000e_intmgr_collect_delayed_causes().

This patch removes the assertion and executes the code that clears the
pending legacy causes. This change is safe and introduces no unintended
behavioral side effects, as it only alters a state that previously led
to termination.

- when core->delayed_causes == 0 the function was already a no-op and
  remains so.

- when core->delayed_causes != 0 the function would previously
  crash due to the assertion failure. The patch now defines a safe
  outcome by clearing the cause and returning. Since behavior after
  the assertion never existed, this simply corrects the crash.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1863
Suggested-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 hw/net/e1000e_core.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 24138587905b..06657bb3ac5c 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -341,11 +341,6 @@ e1000e_intmgr_collect_delayed_causes(E1000ECore *core)
 {
     uint32_t res;
 
-    if (msix_enabled(core->owner)) {
-        assert(core->delayed_causes == 0);
-        return 0;
-    }
-
     res = core->delayed_causes;
     core->delayed_causes = 0;
 
-- 
2.49.0
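
For readers following the two cases in the commit message, here is a minimal standalone model of the function before and after the patch. It is a sketch, not the QEMU code: `struct model_core`, `collect_old()` and `collect_new()` are hypothetical names, the `msix_enabled` field stands in for `msix_enabled(core->owner)`, and returning `res` is an assumption, since the diff context ends before the real function's tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for E1000ECore: only the state the argument needs. */
struct model_core {
    bool msix_enabled;        /* models msix_enabled(core->owner) */
    uint32_t delayed_causes;  /* pending legacy interrupt causes */
};

/* Before the patch: aborts if causes are still pending once MSI-X is on. */
static uint32_t collect_old(struct model_core *core)
{
    uint32_t res;

    if (core->msix_enabled) {
        assert(core->delayed_causes == 0);
        return 0;
    }

    res = core->delayed_causes;
    core->delayed_causes = 0;
    return res; /* assumption: the real function's tail is not in the diff */
}

/* After the patch: the pending causes are always drained, never asserted on. */
static uint32_t collect_new(struct model_core *core)
{
    uint32_t res = core->delayed_causes;

    core->delayed_causes = 0;
    return res; /* assumption, as above */
}
```

With `msix_enabled` set and `delayed_causes == 0`, both versions return 0 and leave the state untouched. With `msix_enabled` set and a non-zero `delayed_causes` (the state left behind when a TIDV or RDTR timer was armed before the guest enabled MSI-X), `collect_old()` aborts, while `collect_new()` clears the field and returns.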

Re: [PATCH v2] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable

Posted by Michael Tokarev 5 months, 1 week ago

On 07.08.2025 14:08, Laurent Vivier wrote:
> A race condition between guest driver actions and QEMU timers can lead
> to an assertion failure when the guest switches the e1000e from legacy
> interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
> RDTR) is active, but the guest enables MSI-X before the timer fires,
> the pending interrupt cause can trigger an assert in
> e1000e_intmgr_collect_delayed_causes().
> 
> This patch removes the assertion and executes the code that clears the
> pending legacy causes. This change is safe and introduces no unintended
> behavioral side effects, as it only alters a state that previously led
> to termination.
> 
> - when core->delayed_causes == 0 the function was already a no-op and
>    remains so.
> 
> - when core->delayed_causes != 0 the function would previously
>    crash due to the assertion failure. The patch now defines a safe
>    outcome by clearing the cause and returning. Since behavior after
>    the assertion never existed, this simply corrects the crash.
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1863
> Suggested-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

This looks like qemu-stable material.  I'm picking it up for
10.0 and 10.1, and for 7.2 too.

Please let me know if I shouldn't.

Thanks,

/mjt

Re: [PATCH v2] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable

Posted by Jason Wang 5 months, 3 weeks ago

On Thu, Aug 7, 2025 at 7:08 PM Laurent Vivier <lvivier@redhat.com> wrote:
>
> A race condition between guest driver actions and QEMU timers can lead
> to an assertion failure when the guest switches the e1000e from legacy
> interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
> RDTR) is active, but the guest enables MSI-X before the timer fires,
> the pending interrupt cause can trigger an assert in
> e1000e_intmgr_collect_delayed_causes().
>
> This patch removes the assertion and executes the code that clears the
> pending legacy causes. This change is safe and introduces no unintended
> behavioral side effects, as it only alters a state that previously led
> to termination.
>
> - when core->delayed_causes == 0 the function was already a no-op and
>   remains so.
>
> - when core->delayed_causes != 0 the function would previously
>   crash due to the assertion failure. The patch now defines a safe
>   outcome by clearing the cause and returning. Since behavior after
>   the assertion never existed, this simply corrects the crash.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1863
> Suggested-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---

Acked-by: Jason Wang <jasowang@redhat.com>

Considering rc3 is out, can this be applied directly by maintainers, or
is a pull request expected?

Thanks

Re: [PATCH v2] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable

Posted by Stefan Hajnoczi 5 months, 3 weeks ago

On Mon, Aug 18, 2025 at 10:08:18AM +0800, Jason Wang wrote:
> On Thu, Aug 7, 2025 at 7:08 PM Laurent Vivier <lvivier@redhat.com> wrote:
> >
> > A race condition between guest driver actions and QEMU timers can lead
> > to an assertion failure when the guest switches the e1000e from legacy
> > interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
> > RDTR) is active, but the guest enables MSI-X before the timer fires,
> > the pending interrupt cause can trigger an assert in
> > e1000e_intmgr_collect_delayed_causes().
> >
> > This patch removes the assertion and executes the code that clears the
> > pending legacy causes. This change is safe and introduces no unintended
> > behavioral side effects, as it only alters a state that previously led
> > to termination.
> >
> > - when core->delayed_causes == 0 the function was already a no-op and
> >   remains so.
> >
> > - when core->delayed_causes != 0 the function would previously
> >   crash due to the assertion failure. The patch now defines a safe
> >   outcome by clearing the cause and returning. Since behavior after
> >   the assertion never existed, this simply corrects the crash.
> >
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1863
> > Suggested-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> > Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> > ---
> 
> Acked-by: Jason Wang <jasowang@redhat.com>
> 
> Consider rc3 is out. Can this be applied directly by maintainers or a
> PULL request is expected?

The commit description doesn't mention whether this fixes a regression
introduced since QEMU 10.0, whether there is a security impact, etc.
In the absence of more information, this looks like a regular bug fix
that does not need to be merged for -rc4.

Only release blockers will be merged for -rc4 (Tue 19 Aug). Please
provide a justification if this commit is a release blocker. Reasoning:
- From -rc3 onwards the goal is to make the final release and adding
  additional patches risks introducing new issues that will delay the
  release further.
- Commits should include enough information to make the decision to
  merge easy and documented in git-log(1). Don't rely on me to judge the
  severity in areas of the codebase I'm not an expert in.

Thanks!

Stefan

Re: [PATCH v2] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable

Posted by Jason Wang 5 months, 3 weeks ago

On Mon, Aug 18, 2025 at 10:03 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Mon, Aug 18, 2025 at 10:08:18AM +0800, Jason Wang wrote:
> > On Thu, Aug 7, 2025 at 7:08 PM Laurent Vivier <lvivier@redhat.com> wrote:
> > >
> > > A race condition between guest driver actions and QEMU timers can lead
> > > to an assertion failure when the guest switches the e1000e from legacy
> > > interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
> > > RDTR) is active, but the guest enables MSI-X before the timer fires,
> > > the pending interrupt cause can trigger an assert in
> > > e1000e_intmgr_collect_delayed_causes().
> > >
> > > This patch removes the assertion and executes the code that clears the
> > > pending legacy causes. This change is safe and introduces no unintended
> > > behavioral side effects, as it only alters a state that previously led
> > > to termination.
> > >
> > > - when core->delayed_causes == 0 the function was already a no-op and
> > >   remains so.
> > >
> > > - when core->delayed_causes != 0 the function would previously
> > >   crash due to the assertion failure. The patch now defines a safe
> > >   outcome by clearing the cause and returning. Since behavior after
> > >   the assertion never existed, this simply corrects the crash.
> > >
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1863
> > > Suggested-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> > > Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> > > ---
> >
> > Acked-by: Jason Wang <jasowang@redhat.com>
> >
> > Consider rc3 is out. Can this be applied directly by maintainers or a
> > PULL request is expected?
>
> The commit description doesn't mention whether this fixes a regression
> introduced since QEMU 10.0, whether there is a security impact, etc.
> In the absence of more information, this looks like a regular bug fix
> that does not need to be merged for -rc4.
>
> Only release blockers will be merged for -rc4 (Tue 19 Aug). Please
> provide a justification if this commit is a release blocker. Reasoning:
> - From -rc3 onwards the goal is to make the final release and adding
>   additional patches risks introducing new issues that will delay the
>   release further.
> - Commits should include enough information to make the decision to
>   merge easy and documented in git-log(1). Don't rely on me to judge the
>   severity in areas of the codebase I'm not an expert in.

I see. I don't think it's a release blocker, so we can defer this to
the next release.

Thanks

>
> Thanks!
>
> Stefan

Re: [PATCH v2] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable

Posted by Laurent Vivier 5 months, 1 week ago

Hi Jason,

On 19/08/2025 04:46, Jason Wang wrote:
> On Mon, Aug 18, 2025 at 10:03 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>
>> On Mon, Aug 18, 2025 at 10:08:18AM +0800, Jason Wang wrote:
>>> On Thu, Aug 7, 2025 at 7:08 PM Laurent Vivier <lvivier@redhat.com> wrote:
>>>>
>>>> A race condition between guest driver actions and QEMU timers can lead
>>>> to an assertion failure when the guest switches the e1000e from legacy
>>>> interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
>>>> RDTR) is active, but the guest enables MSI-X before the timer fires,
>>>> the pending interrupt cause can trigger an assert in
>>>> e1000e_intmgr_collect_delayed_causes().
>>>>
>>>> This patch removes the assertion and executes the code that clears the
>>>> pending legacy causes. This change is safe and introduces no unintended
>>>> behavioral side effects, as it only alters a state that previously led
>>>> to termination.
>>>>
>>>> - when core->delayed_causes == 0 the function was already a no-op and
>>>>    remains so.
>>>>
>>>> - when core->delayed_causes != 0 the function would previously
>>>>    crash due to the assertion failure. The patch now defines a safe
>>>>    outcome by clearing the cause and returning. Since behavior after
>>>>    the assertion never existed, this simply corrects the crash.
>>>>
>>>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1863
>>>> Suggested-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
>>>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>>>> ---
>>>
>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>
>>> Consider rc3 is out. Can this be applied directly by maintainers or a
>>> PULL request is expected?
>>
>> The commit description doesn't mention whether this fixes a regression
>> introduced since QEMU 10.0, whether there is a security impact, etc.
>> In the absence of more information, this looks like a regular bug fix
>> that does not need to be merged for -rc4.
>>
>> Only release blockers will be merged for -rc4 (Tue 19 Aug). Please
>> provide a justification if this commit is a release blocker. Reasoning:
>> - From -rc3 onwards the goal is to make the final release and adding
>>    additional patches risks introducing new issues that will delay the
>>    release further.
>> - Commits should include enough information to make the decision to
>>    merge easy and documented in git-log(1). Don't rely on me to judge the
>>    severity in areas of the codebase I'm not an expert in.
> 
> I see, I think it's not a release blocker so we can defer this to the
> next release.

Just a reminder not to forget to pull it now...

Thanks,
Laurent

Re: [PATCH v2] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable

Posted by Philippe Mathieu-Daudé 5 months, 1 week ago

On 1/9/25 13:57, Laurent Vivier wrote:
> Hi Jason,
> 
> On 19/08/2025 04:46, Jason Wang wrote:
>> On Mon, Aug 18, 2025 at 10:03 PM Stefan Hajnoczi <stefanha@redhat.com> 
>> wrote:
>>>
>>> On Mon, Aug 18, 2025 at 10:08:18AM +0800, Jason Wang wrote:
>>>> On Thu, Aug 7, 2025 at 7:08 PM Laurent Vivier <lvivier@redhat.com> 
>>>> wrote:
>>>>>
>>>>> A race condition between guest driver actions and QEMU timers can lead
>>>>> to an assertion failure when the guest switches the e1000e from legacy
>>>>> interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
>>>>> RDTR) is active, but the guest enables MSI-X before the timer fires,
>>>>> the pending interrupt cause can trigger an assert in
>>>>> e1000e_intmgr_collect_delayed_causes().
>>>>>
>>>>> This patch removes the assertion and executes the code that clears the
>>>>> pending legacy causes. This change is safe and introduces no 
>>>>> unintended
>>>>> behavioral side effects, as it only alters a state that previously led
>>>>> to termination.
>>>>>
>>>>> - when core->delayed_causes == 0 the function was already a no-op and
>>>>>    remains so.
>>>>>
>>>>> - when core->delayed_causes != 0 the function would previously
>>>>>    crash due to the assertion failure. The patch now defines a safe
>>>>>    outcome by clearing the cause and returning. Since behavior after
>>>>>    the assertion never existed, this simply corrects the crash.
>>>>>
>>>>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1863
>>>>> Suggested-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
>>>>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>>>>> ---
>>>>
>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>>
>>>> Consider rc3 is out. Can this be applied directly by maintainers or a
>>>> PULL request is expected?
>>>
>>> The commit description doesn't mention whether this fixes a regression
>>> introduced since QEMU 10.0, whether there is a security impact, etc.
>>> In the absence of more information, this looks like a regular bug fix
>>> that does not need to be merged for -rc4.
>>>
>>> Only release blockers will be merged for -rc4 (Tue 19 Aug). Please
>>> provide a justification if this commit is a release blocker. Reasoning:
>>> - From -rc3 onwards the goal is to make the final release and adding
>>>    additional patches risks introducing new issues that will delay the
>>>    release further.
>>> - Commits should include enough information to make the decision to
>>>    merge easy and documented in git-log(1). Don't rely on me to judge 
>>> the
>>>    severity in areas of the codebase I'm not an expert in.
>>
>> I see, I think it's not a release blocker so we can defer this to the
>> next release.
> 
> just a reminder not to forget to pull it now...

Since Jason Acked the patch, I'll merge it via my hw-misc tree; thanks!

Re: [PATCH v2] e1000e: Prevent crash from legacy interrupt firing after MSI-X enable

Posted by Akihiko Odaki 6 months ago

On 2025/08/07 20:08, Laurent Vivier wrote:
> A race condition between guest driver actions and QEMU timers can lead
> to an assertion failure when the guest switches the e1000e from legacy
> interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
> RDTR) is active, but the guest enables MSI-X before the timer fires,
> the pending interrupt cause can trigger an assert in
> e1000e_intmgr_collect_delayed_causes().
> 
> This patch removes the assertion and executes the code that clears the
> pending legacy causes. This change is safe and introduces no unintended
> behavioral side effects, as it only alters a state that previously led
> to termination.
> 
> - when core->delayed_causes == 0 the function was already a no-op and
>    remains so.
> 
> - when core->delayed_causes != 0 the function would previously
>    crash due to the assertion failure. The patch now defines a safe
>    outcome by clearing the cause and returning. Since behavior after
>    the assertion never existed, this simply corrects the crash.

This description is better than my comment written in haste. Thank you 
for taking care of this.

Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>