A race condition between guest driver actions and QEMU timers can lead
to an assertion failure when the guest switches the e1000e from legacy
interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
RDTR) is active, but the guest enables MSI-X before the timer fires,
the pending interrupt cause can trigger an assert in
e1000e_intmgr_collect_delayed_causes().

This patch removes the assertion and executes the code that clears the
pending legacy causes. This change is safe and introduces no unintended
behavioral side effects, as it only alters a state that previously led
to termination.

- when core->delayed_causes == 0 the function was already a no-op and
  remains so.

- when core->delayed_causes != 0 the function would previously
  crash due to the assertion failure. The patch now defines a safe
  outcome by clearing the cause and returning. Since behavior after
  the assertion never existed, this simply corrects the crash.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1863
Suggested-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 hw/net/e1000e_core.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 24138587905b..06657bb3ac5c 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -341,11 +341,6 @@ e1000e_intmgr_collect_delayed_causes(E1000ECore *core)
 {
     uint32_t res;
 
-    if (msix_enabled(core->owner)) {
-        assert(core->delayed_causes == 0);
-        return 0;
-    }
-
     res = core->delayed_causes;
     core->delayed_causes = 0;
 
--
2.49.0
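
The race and the post-patch behavior described in the commit message can be sketched as a small standalone C model. This is only an illustration under stated assumptions, not QEMU code: `ModelCore`, `old_assert_would_hold()`, `collect_delayed_causes()` and `run_race_scenario()` are hypothetical names, the `msix_enabled` field stands in for `msix_enabled(core->owner)`, the cause bit `0x80` is an arbitrary example value, and only the lines visible in the hunk are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for E1000ECore; only what the sketch needs. */
typedef struct {
    uint32_t delayed_causes; /* causes queued while a delay timer runs */
    bool msix_enabled;       /* stands in for msix_enabled(core->owner) */
} ModelCore;

/* True when the assertion removed by the patch would have held. */
bool old_assert_would_hold(const ModelCore *core)
{
    return !core->msix_enabled || core->delayed_causes == 0;
}

/*
 * Models the lines left in the hunk after the patch: hand back the
 * pending causes and clear them, regardless of the interrupt mode.
 */
uint32_t collect_delayed_causes(ModelCore *core)
{
    uint32_t res = core->delayed_causes;

    core->delayed_causes = 0;
    return res;
}

/*
 * The race from the commit message: a cause is queued in legacy mode,
 * then MSI-X is enabled before the delay timer fires.
 */
uint32_t run_race_scenario(void)
{
    /* 0x80 is an arbitrary example cause bit (assumption). */
    ModelCore core = { .delayed_causes = 0x80u, .msix_enabled = false };

    core.msix_enabled = true;              /* guest switches to MSI-X */
    assert(!old_assert_would_hold(&core)); /* pre-patch code aborted here */

    uint32_t collected = collect_delayed_causes(&core);
    assert(core.delayed_causes == 0);      /* pending cause is now cleared */
    return collected;
}
```

In this model the two cases from the commit message are visible directly: with `delayed_causes == 0` the collect step returns 0 both before and after the patch, and with `delayed_causes != 0` under MSI-X the old precondition fails while the new code simply clears the state.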
On 07.08.2025 14:08, Laurent Vivier wrote:
> A race condition between guest driver actions and QEMU timers can lead
> to an assertion failure when the guest switches the e1000e from legacy
> interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
> RDTR) is active, but the guest enables MSI-X before the timer fires,
> the pending interrupt cause can trigger an assert in
> e1000e_intmgr_collect_delayed_causes().
>
> This patch removes the assertion and executes the code that clears the
> pending legacy causes. This change is safe and introduces no unintended
> behavioral side effects, as it only alters a state that previously led
> to termination.
>
> - when core->delayed_causes == 0 the function was already a no-op and
> remains so.
>
> - when core->delayed_causes != 0 the function would previously
> crash due to the assertion failure. The patch now defines a safe
> outcome by clearing the cause and returning. Since behavior after
> the assertion never existed, this simply corrects the crash.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1863
> Suggested-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

This feels like qemu-stable material. Picking it up for
10.0 & 10.1, and for 7.2 too.

Please let me know if I shouldn't.

Thanks,
/mjt

> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
> index 24138587905b..06657bb3ac5c 100644
> --- a/hw/net/e1000e_core.c
> +++ b/hw/net/e1000e_core.c
> @@ -341,11 +341,6 @@ e1000e_intmgr_collect_delayed_causes(E1000ECore *core)
>  {
>      uint32_t res;
>
> -    if (msix_enabled(core->owner)) {
> -        assert(core->delayed_causes == 0);
> -        return 0;
> -    }
> -
>      res = core->delayed_causes;
>      core->delayed_causes = 0;
>

On Thu, Aug 7, 2025 at 7:08 PM Laurent Vivier <lvivier@redhat.com> wrote:
>
> A race condition between guest driver actions and QEMU timers can lead
> to an assertion failure when the guest switches the e1000e from legacy
> interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
> RDTR) is active, but the guest enables MSI-X before the timer fires,
> the pending interrupt cause can trigger an assert in
> e1000e_intmgr_collect_delayed_causes().
>
> This patch removes the assertion and executes the code that clears the
> pending legacy causes. This change is safe and introduces no unintended
> behavioral side effects, as it only alters a state that previously led
> to termination.
>
> - when core->delayed_causes == 0 the function was already a no-op and
>   remains so.
>
> - when core->delayed_causes != 0 the function would previously
>   crash due to the assertion failure. The patch now defines a safe
>   outcome by clearing the cause and returning. Since behavior after
>   the assertion never existed, this simply corrects the crash.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1863
> Suggested-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---

Acked-by: Jason Wang <jasowang@redhat.com>

Considering that rc3 is out, can this be applied directly by maintainers,
or is a PULL request expected?

Thanks

On Mon, Aug 18, 2025 at 10:08:18AM +0800, Jason Wang wrote:
> On Thu, Aug 7, 2025 at 7:08 PM Laurent Vivier <lvivier@redhat.com> wrote:
> > [...]
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1863
> > Suggested-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> > Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> > ---
>
> Acked-by: Jason Wang <jasowang@redhat.com>
>
> Considering that rc3 is out, can this be applied directly by maintainers,
> or is a PULL request expected?

The commit description doesn't mention whether this fixes a regression
introduced since QEMU 10.0, whether there is a security impact, etc.
In the absence of more information, this looks like a regular bug fix
that does not need to be merged for -rc4.

Only release blockers will be merged for -rc4 (Tue 19 Aug). Please
provide a justification if this commit is a release blocker.

Reasoning:
- From -rc3 onwards the goal is to make the final release, and adding
  additional patches risks introducing new issues that will delay the
  release further.
- Commits should include enough information to make the decision to
  merge easy and documented in git-log(1). Don't rely on me to judge the
  severity in areas of the codebase I'm not an expert in.

Thanks!

Stefan

On Mon, Aug 18, 2025 at 10:03 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Mon, Aug 18, 2025 at 10:08:18AM +0800, Jason Wang wrote:
> > [...]
>
> The commit description doesn't mention whether this fixes a regression
> introduced since QEMU 10.0, whether there is a security impact, etc.
> In the absence of more information, this looks like a regular bug fix
> that does not need to be merged for -rc4.
>
> Only release blockers will be merged for -rc4 (Tue 19 Aug). Please
> provide a justification if this commit is a release blocker.
>
> Reasoning:
> - From -rc3 onwards the goal is to make the final release and adding
>   additional patches risks introducing new issues that will delay the
>   release further.
> - Commits should include enough information to make the decision to
>   merge easy and documented in git-log(1). Don't rely on me to judge the
>   severity in areas of the codebase I'm not an expert in.

I see. I think it's not a release blocker, so we can defer this to the
next release.

Thanks

>
> Thanks!
>
> Stefan

Hi Jason,

On 19/08/2025 04:46, Jason Wang wrote:
> On Mon, Aug 18, 2025 at 10:03 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>> [...]
>> Only release blockers will be merged for -rc4 (Tue 19 Aug). Please
>> provide a justification if this commit is a release blocker.
>> [...]
>
> I see, I think it's not a release blocker so we can defer this to the
> next release.

Just a reminder not to forget to pull it now...

Thanks,
Laurent

On 1/9/25 13:57, Laurent Vivier wrote:
> Hi Jason,
>
> On 19/08/2025 04:46, Jason Wang wrote:
>> [...]
>> I see, I think it's not a release blocker so we can defer this to the
>> next release.
>
> just a reminder not to forget to pull it now...

Since Jason Acked the patch, I'll merge it via my hw-misc tree; thanks!

On 2025/08/07 20:08, Laurent Vivier wrote:
> A race condition between guest driver actions and QEMU timers can lead
> to an assertion failure when the guest switches the e1000e from legacy
> interrupt mode to MSI-X. If a legacy interrupt delay timer (TIDV or
> RDTR) is active, but the guest enables MSI-X before the timer fires,
> the pending interrupt cause can trigger an assert in
> e1000e_intmgr_collect_delayed_causes().
>
> This patch removes the assertion and executes the code that clears the
> pending legacy causes. This change is safe and introduces no unintended
> behavioral side effects, as it only alters a state that previously led
> to termination.
>
> - when core->delayed_causes == 0 the function was already a no-op and
>   remains so.
>
> - when core->delayed_causes != 0 the function would previously
>   crash due to the assertion failure. The patch now defines a safe
>   outcome by clearing the cause and returning. Since behavior after
>   the assertion never existed, this simply corrects the crash.

This description is better than my comment written in haste. Thank you
for taking care of this.

Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>