From nobody Sun Nov 24 12:04:24 2024 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 87B791898E9 for ; Wed, 6 Nov 2024 01:51:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730857900; cv=none; b=ufui5amGf30KZdIrTiFqV8iMaIQTDLJ+XsEItbrLp5WdUe9SR8+GWcqyP5WAQzybENEWQTlvktyvrLdGPbIqbJXN9dtb1tRmoHP7dbMDg2iWr3fTONenll37pezzDFelZ8ctqE+ZbKrhyO4CpSMLKkWMMBL8ctbxueuXmZApcHY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730857900; c=relaxed/simple; bh=0cfREXYBv/O6QXSpQyDtQ6EZkt8ICo8sWJAbzDYUgA4=; h=Date:Mime-Version:Message-ID:Subject:From:To:Cc:Content-Type; b=qqdi5LFSTnQ4hfPlvdRIlPMwVVjmKIZdiJCamKCpC3/f4MF6LKjYRVwzVv2Dkdu6rTkmU8VqE4/Izpu3c1Nl0FQd+5XBtPfuYHchCncBjJWPiVfvJ3QOCeLoEqHowmtJpeuRUMMriy5dNMfvhWYDiN3bLZIyU7Hn1wqcJzAOAAM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Ys5ScLeg; arc=none smtp.client-ip=209.85.128.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Ys5ScLeg" Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-6e59dc7df64so5239417b3.1 for ; Tue, 05 Nov 2024 17:51:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1730857897; x=1731462697; darn=vger.kernel.org; h=cc:to:from:subject:message-id:mime-version:date:reply-to:from:to:cc :subject:date:message-id:reply-to; bh=ZsVWtySqNFgTVWgHXk6f3n0Xscm8CdqH5W1OuzF+xZg=; b=Ys5ScLegXytCfTIyxOHU6c0NqGJlq5OBzFLvKvwYFGEa7kvZe5l3CWVNeKfbtx5+y4 iUTF0xZOFboOHe3Ol+MpfZnt+gPJHVwwmI3kH9CHtpp6SOHlMfQtfac5QpSo8XBS+eoa OVkVIY9s7J+/z3sQ8mg3/WkWcwWp6wdVJu2F2xkKIonvPPUQBmxcpITxu8dPixAUUYUl Ob0sXG3PaGMgx+Q9akH6M/8C5QtmLXkCXJrqauhaUzDm/yBEAthlx9X+ecJPuCHxMn7E /u4tS3DdaG3nfPtH5ECEp/yKdp4/eJgI7lfWG+YS07pIug0CQwByqkU9J0sNHgTiJGBQ i3tw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730857897; x=1731462697; h=cc:to:from:subject:message-id:mime-version:date:reply-to :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ZsVWtySqNFgTVWgHXk6f3n0Xscm8CdqH5W1OuzF+xZg=; b=Xv6H/aaUVdc23/WEDI2WDbvM0dxerR8LzT8sNVswza2a1+qYjXgDpCAv6eEXFMMcWp GtWsjSnNjDhM+t3SN58PYD8cTgyhiPCTR6E3yk3PmQbszPHDs1uzs8meOgmzy9vr4n8L s7Zy0Nyk2ZjqpFuZCJ/lxBaH6g8IZH5gPs6aCd0AAbyIKEP2Ri9jvlK03QPDKIkE5EKj 2T66WkRNSO4YnntxYtJ24LoPqTGcDql0O55dR+2kCvP6oXWUyW2P2U5oqryeG1HJ9XZ0 eAiQGBrw8jEcursCXConJ9fx6EJfbXZ3d76Ghag/bhXthMd53zoZ/A51QrdPQUFpSh7c 8TRg== X-Forwarded-Encrypted: i=1; AJvYcCVomMZ9BzJbOhN3sFNj11nFjpf1Sc2TlziTGIsnuScWYhYcbeLfx0d36KhhoUr2yqkNYZZX8FDpE/KUspY=@vger.kernel.org X-Gm-Message-State: AOJu0Yz3jh2R5s2ICguAGBKLYYB2gb0BCPsEeZGlIVrolY9HIRT+DRwo q58Anayly4SiOB4UP0yl3ywM+azQ0IU5yEh36qg0vdvfNpcxlWN1ygtyEvH5KfYVr7Tg9n12gho 3wA== X-Google-Smtp-Source: AGHT+IGazWVeu0FiUgVNQ3537CJxZulfvjKMPsuJVlG1YN0Q7hnoVT4PImyk2lCO7WXJSsE83WQ9+k4BJSU= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:9d:3983:ac13:c240]) (user=seanjc job=sendgmr) by 2002:a81:8d0c:0:b0:6d9:d865:46c7 with SMTP id 00721157ae682-6eabed866b9mr123877b3.2.1730857897571; Tue, 05 Nov 2024 17:51:37 -0800 (PST) Reply-To: Sean Christopherson Date: Tue, 5 Nov 2024 17:51:35 -0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 X-Mailer: git-send-email 2.47.0.199.ga7371fff76-goog Message-ID: <20241106015135.2462147-1-seanjc@google.com> Subject: [PATCH v2] KVM: x86: Unconditionally set irr_pending when updating APICv state From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Maxim Levitsky , Yong He Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Always set irr_pending (to true) when updating APICv status to fix a bug where KVM fails to set irr_pending when userspace sets APIC state and APICv is disabled, which ultimate results in KVM failing to inject the pending interrupt(s) that userspace stuffed into the vIRR, until another interrupt happens to be emulated by KVM. Only the APICv-disabled case is flawed, as KVM forces apic->irr_pending to be true if APICv is enabled, because not all vIRR updates will be visible to KVM. Hit the bug with a big hammer, even though strictly speaking KVM can scan the vIRR and set/clear irr_pending as appropriate for this specific case. The bug was introduced by commit 755c2bf87860 ("KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it"), which as the shortlog suggests, deleted code that updated irr_pending. Before that commit, kvm_apic_update_apicv() did indeed scan the vIRR, with with the crucial difference that kvm_apic_update_apicv() did the scan even when APICv was being *disabled*, e.g. due to an AVIC inhibition. struct kvm_lapic *apic =3D vcpu->arch.apic; if (vcpu->arch.apicv_active) { /* irr_pending is always true when apicv is activated. */ apic->irr_pending =3D true; apic->isr_count =3D 1; } else { apic->irr_pending =3D (apic_search_irr(apic) !=3D -1); apic->isr_count =3D count_vectors(apic->regs + APIC_ISR); } And _that_ bug (clearing irr_pending) was introduced by commit b26a695a1d78 ("kvm: lapic: Introduce APICv update helper function"), prior to which KVM unconditionally set irr_pending to true in kvm_apic_set_state(), i.e. assumed that the new virtual APIC state could have a pending IRQ. Furthermore, in addition to introducing this issue, commit 755c2bf87860 also papered over the underlying bug: KVM doesn't ensure CPUs and devices see APICv as disabled prior to searching the IRR. Waiting until KVM emulates an EOI to update irr_pending "works", but only because KVM won't emulate EOI until after refresh_apicv_exec_ctrl(), and there are plenty of memory barriers in between. I.e. leaving irr_pending set is basically hacking around bad ordering. So, effectively revert to the pre-b26a695a1d78 behavior for state restore, even though it's sub-optimal if no IRQs are pending, in order to provide a minimal fix, but leave behind a FIXME to document the ugliness. With luck, the ordering issue will be fixed and the mess will be cleaned up in the not-too-distant future. Fixes: 755c2bf87860 ("KVM: x86: lapic: don't touch irr_pending in kvm_apic_= update_apicv when inhibiting it") Cc: stable@vger.kernel.org Cc: Maxim Levitsky Reported-by: Yong He Closes: https://lkml.kernel.org/r/20241023124527.1092810-1-alexyonghe%40ten= cent.com Signed-off-by: Sean Christopherson --- v2: Go with a big hammer fix, and plan on scanning the vIRR for all cases once the ordering bug has been resolved, i.e. once KVM guarantees the scan happens after CPUs and devices see the new APICv state. v1: https://lore.kernel.org/all/20241101193532.1817004-1-seanjc@google.com arch/x86/kvm/lapic.c | 29 ++++++++++++++++++----------- 1 file changed, 18 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 65412640cfc7..e470061b744a 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2629,19 +2629,26 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic =3D vcpu->arch.apic; =20 - if (apic->apicv_active) { - /* irr_pending is always true when apicv is activated. */ - apic->irr_pending =3D true; + /* + * When APICv is enabled, KVM must always search the IRR for a pending + * IRQ, as other vCPUs and devices can set IRR bits even if the vCPU + * isn't running. If APICv is disabled, KVM _should_ search the IRR + * for a pending IRQ. But KVM currently doesn't ensure *all* hardware, + * e.g. CPUs and IOMMUs, has seen the change in state, i.e. searching + * the IRR at this time could race with IRQ delivery from hardware that + * still sees APICv as being enabled. + * + * FIXME: Ensure other vCPUs and devices observe the change in APICv + * state prior to updating KVM's metadata caches, so that KVM + * can safely search the IRR and set irr_pending accordingly. + */ + apic->irr_pending =3D true; + + if (apic->apicv_active) apic->isr_count =3D 1; - } else { - /* - * Don't clear irr_pending, searching the IRR can race with - * updates from the CPU as APICv is still active from hardware's - * perspective. The flag will be cleared as appropriate when - * KVM injects the interrupt. - */ + else apic->isr_count =3D count_vectors(apic->regs + APIC_ISR); - } + apic->highest_isr_cache =3D -1; } =20 base-commit: 5cb1659f412041e4780f2e8ee49b2e03728a2ba6 --=20 2.47.0.199.ga7371fff76-goog