From nobody Sun Feb 8 14:10:42 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C38D1C77B7A for ; Fri, 2 Jun 2023 01:15:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233077AbjFBBPa (ORCPT ); Thu, 1 Jun 2023 21:15:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41366 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233251AbjFBBP0 (ORCPT ); Thu, 1 Jun 2023 21:15:26 -0400 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9AE941A7 for ; Thu, 1 Jun 2023 18:15:23 -0700 (PDT) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-565d1b86a64so20936757b3.3 for ; Thu, 01 Jun 2023 18:15:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1685668523; x=1688260523; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=tmG5Gj4Mwq4ntwod8eDhcTKnVqwdZ3BWjaq2d7Ob80k=; b=uC2XLUjv1JyRrK7ep7CVM4ZlT2APqmo2uQ6DEC+UkTT7G4T5nodeNoub/YBwvFRguI PslfYWKsTdTj4HRUhOMESTHjjITLrF++g2d95YBnetmlqjkh4Lv7sHnfD84BU4IrrLr1 Y5btKdmMLEjN0rli82gVKnmJetTXW7TIlzAhgKrhCFh2EB93DEqW98F0N4CKWyb3NKFe /4auYE/4xbSdPJJGl4joVAFDBd7DD7YCEKNCJGP22ATCcQktaQFTJ4boUisttl+4im18 LgdVZ1pi3R27V3/VTdX4GGBBI6KUSrPwDxMKSZHt1pIbFB+sgE3lARNsbPLX286Nkozl rqlQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685668523; x=1688260523; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=tmG5Gj4Mwq4ntwod8eDhcTKnVqwdZ3BWjaq2d7Ob80k=; b=dXKi5bamr/fkz0UHkJCdERE+Ivsgkni+wQUDfLUoL4QBl9O8Z3jfxotVcHEngWH58B ArSbddOSpoXVrVYC8lhjCPhjSI/L+zOBvuVUcSyFcOQrMxluXb90I1VEaBRVlvnXEMBS CFZCyPvunR2rxSfUQrMPZtbgQ6bIeaED9dWu9YRRvTZfLYsr+EYsQ+XMntBsqF+CqZNA QNNHB7YXe6KTDwitDB4DZfmVtmnwrau7x/P9kxtBAC4bV4pCSNjhqUlWcu13G2iUe4jo XyvKYTNuCvC8LveUVSkyQ1S2NixwDwPNdUKCkud5Lp9dggOZFj/OqoHwT/vE4fXQ9Bkb TD7A== X-Gm-Message-State: AC+VfDwdrKojG7oB+xENuL1LWEy9h59B7MalvYjtKL1yjj//lIOYIQ81 Vd3IqAg/tMrKKXTqhDBkjVifNtPJkeo= X-Google-Smtp-Source: ACHHUZ7j+p4pM6p9SvTqPZPmKRs6yjDhBW+n4Osich7yYjAeQUUKDzqfRrjSWym7Urrf7J8sCrbaGS8Bcjg= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a81:c509:0:b0:552:b607:634b with SMTP id k9-20020a81c509000000b00552b607634bmr6259856ywi.4.1685668522848; Thu, 01 Jun 2023 18:15:22 -0700 (PDT) Reply-To: Sean Christopherson Date: Thu, 1 Jun 2023 18:15:16 -0700 In-Reply-To: <20230602011518.787006-1-seanjc@google.com> Mime-Version: 1.0 References: <20230602011518.787006-1-seanjc@google.com> X-Mailer: git-send-email 2.41.0.rc2.161.g9c6817b8e7-goog Message-ID: <20230602011518.787006-2-seanjc@google.com> Subject: [PATCH 1/3] KVM: VMX: Retry APIC-access page reload if invalidation is in-progress From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jason Gunthorpe , Alistair Popple , Robin Murphy Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Re-request an APIC-access page reload if there is a relevant mmu_notifier invalidation in-progress when KVM retrieves the backing pfn, i.e. stall vCPUs until the backing pfn for the APIC-access page is "officially" stable. Relying on the primary MMU to not make changes after invoking ->invalidate_range() works, e.g. any additional changes to a PRESENT PTE would also trigger an ->invalidate_range(), but using ->invalidate_range() to fudge around KVM not honoring past and in-progress invalidations is a bit hacky. Honoring invalidations will allow using KVM's standard mmu_notifier hooks to detect APIC-access page reloads, which will in turn allow removing KVM's implementation of ->invalidate_range() (the APIC-access page case is a true one-off). Opportunistically add a comment to explain why doing nothing if a memslot isn't found is functionally correct. Suggested-by: Jason Gunthorpe Cc: Alistair Popple Cc: Robin Murphy Signed-off-by: Sean Christopherson Reviewed-by: Alistair Popple --- arch/x86/kvm/vmx/vmx.c | 50 +++++++++++++++++++++++++++++++++++++----- 1 file changed, 45 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 44fb619803b8..59195f0dc7a5 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6708,7 +6708,12 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu) =20 static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu) { - struct page *page; + const gfn_t gfn =3D APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT; + struct kvm *kvm =3D vcpu->kvm; + struct kvm_memslots *slots =3D kvm_memslots(kvm); + struct kvm_memory_slot *slot; + unsigned long mmu_seq; + kvm_pfn_t pfn; =20 /* Defer reload until vmcs01 is the current VMCS. */ if (is_guest_mode(vcpu)) { @@ -6720,18 +6725,53 @@ static void vmx_set_apic_access_page_addr(struct kv= m_vcpu *vcpu) SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) return; =20 - page =3D gfn_to_page(vcpu->kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); - if (is_error_page(page)) + /* + * Grab the memslot so that the hva lookup for the mmu_notifier retry + * is guaranteed to use the same memslot as the pfn lookup, i.e. rely + * on the pfn lookup's validation of the memslot to ensure a valid hva + * is used for the retry check. + */ + slot =3D id_to_memslot(slots, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT); + if (!slot || slot->flags & KVM_MEMSLOT_INVALID) return; =20 - vmcs_write64(APIC_ACCESS_ADDR, page_to_phys(page)); + /* + * Ensure that the mmu_notifier sequence count is read before KVM + * retrieves the pfn from the primary MMU. Note, the memslot is + * protected by SRCU, not the mmu_notifier. Pairs with the smp_wmb() + * in kvm_mmu_invalidate_end(). + */ + mmu_seq =3D kvm->mmu_invalidate_seq; + smp_rmb(); + + /* + * No need to retry if the memslot does not exist or is invalid. KVM + * controls the APIC-access page memslot, and only deletes the memslot + * if APICv is permanently inhibited, i.e. the memslot won't reappear. + */ + pfn =3D gfn_to_pfn_memslot(slot, gfn); + if (is_error_noslot_pfn(pfn)) + return; + + read_lock(&vcpu->kvm->mmu_lock); + if (mmu_invalidate_retry_hva(kvm, mmu_seq, + gfn_to_hva_memslot(slot, gfn))) { + kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu); + read_unlock(&vcpu->kvm->mmu_lock); + goto out; + } + + vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn)); + read_unlock(&vcpu->kvm->mmu_lock); + vmx_flush_tlb_current(vcpu); =20 +out: /* * Do not pin apic access page in memory, the MMU notifier * will call us again if it is migrated or swapped out. */ - put_page(page); + kvm_release_pfn_clean(pfn); } =20 static void vmx_hwapic_isr_update(int max_isr) --=20 2.41.0.rc2.161.g9c6817b8e7-goog