From nobody Sun Nov 24 23:02:16 2024 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 981EA1CB51C for ; Thu, 31 Oct 2024 20:20:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730406018; cv=none; b=LaqfSAwFoUcFwYZ0MVIEcMmQEs8C2dW18I49BRDFJNSaVFLpbEFom16HBkCntfJxT1sKROB1YJP9uP1/VCUeTqObJ1qGQxHL8UvzpsviEw76Zgf6BM82TFsMCojHflD+qr/C0Zdvfo2QHP9rflTysNRKyugOdB+w+RKt4xApa0g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730406018; c=relaxed/simple; bh=ibguVxv5ys3EG+bTYwAwYyEFPl4BfLP061AdL88v55g=; h=Date:Mime-Version:Message-ID:Subject:From:To:Cc:Content-Type; b=G8f/wf7zmWXmMe70fTlBIueOTdM3tgUOYmFy4vWu3NwWwqtN5JZDWTJwroOQa+Rx0AalPX4qyW09tzP8KljgG9a8lFV5v1JU6BQ/jIHT+3O/Tb7LuFpt76ZiC+Z2HxfLJTShPfS1vqpRGkVgk1QxTio16hHzo4UBCovQIklKLjM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=SQ69r03r; arc=none smtp.client-ip=209.85.128.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="SQ69r03r" Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-6e35a643200so29964767b3.0 for ; Thu, 31 Oct 2024 13:20:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1730406015; x=1731010815; darn=vger.kernel.org; h=cc:to:from:subject:message-id:mime-version:date:reply-to:from:to:cc :subject:date:message-id:reply-to; bh=jZhenKn3T1lSjjdyHpwG37r5ky1yYEV/H0p+NeciZMA=; b=SQ69r03rBEjwIP11WMj2DKbafqz46+LhQnb6q/iFVs3s6UJ2QfGS5qPNTyAQNAtMb/ 9o1AI00H0vylpQE2JDf1iTYVy8Vj8wNW7SEXCDBBABfmeOdVWqYNShOxeIloW3BKFW8Z FWEvRN3qwqf8Pw5MIMbNrvCdRAjag64sCDIRwQkO8Qjy9nlggo7YX1638nGvJFx4kIYZ nw99Yk7EuwMztnHEsqJv472BvPrJxEtvgmG25/8A58h3HcFIrYxUUgdid3oxGdsgm8dX gBm9BXUHrGUlABCRB7z9duLxpvrZlzW87uXeyx/4+q5G8gu6XXMYR/6jF9saNBCvW9LQ d5zg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730406015; x=1731010815; h=cc:to:from:subject:message-id:mime-version:date:reply-to :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=jZhenKn3T1lSjjdyHpwG37r5ky1yYEV/H0p+NeciZMA=; b=rcnIkMvy1H1mIoJPmLQMGX2EzSZrMqQPODwY7Ia6yuQG1yHDZkTIbViKlpTvOOXqrn +D5G6x5QvWseyCNMcfc/F2EiuGuqCroIrmwj+rJNuw1bvecCRfK1pq3Ff0fKuciJUJDT 8CnJ7MvMj4XGcZa/mF3GB1Y+TpZHoe6fEuHnHLMSGhRaUUN6s/ZA9nHLBBwrj2rK8EZD hqdi1vmf+9y1dl/mGs23C7b4nxlWYBDNOgdXKMKrnDKzFvNAI629KQ6WamBjySe8MEWM BMnwq20XdKM0Fz6L1Lu3QDw69LqTiJ9RrZlKYvHc4+w9M8SLkiZexF/DasuE9mIm7FBK QXBg== X-Forwarded-Encrypted: i=1; AJvYcCXEpqR0P6MHRL+JSr6/zAM79ElW0ZjQ2w8UEZEi7jZkYopnQIhLJ3/Y7d7JHed5KFxeUEbJQeRN23Bdo/c=@vger.kernel.org X-Gm-Message-State: AOJu0Yx/pc+MubBkTUOpTC3W7KLk1SgcemyVjcElVvvx6twQcx7qQBVC GnbvWS/Kcqos20KiktZi3v8gfeukJbB3nPEkh8QawKMVKtdhAupqbi14tNT/66oA6+KCi/xL7mD Ajw== X-Google-Smtp-Source: AGHT+IFrERFxys/nFxJS2UFqRSq8jptFMXlQLla0whmxS+Xa4ijJlHYJjILBUPjl0bRKEkidO9eZEYX5Z+E= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:9d:3983:ac13:c240]) (user=seanjc job=sendgmr) by 2002:a81:a888:0:b0:6db:c3b8:c4ce with SMTP id 00721157ae682-6ea52557dffmr73077b3.7.1730406015706; Thu, 31 Oct 2024 13:20:15 -0700 (PDT) Reply-To: Sean Christopherson Date: Thu, 31 Oct 2024 13:20:11 -0700 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 X-Mailer: git-send-email 2.47.0.163.g1226f6d8fa-goog Message-ID: <20241031202011.1580522-1-seanjc@google.com> Subject: [PATCH] KVM: nVMX: Treat vpid01 as current if L2 is active, but with VPID disabled From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Like Xu Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When getting the current VPID, e.g. to emulate a guest TLB flush, return vpid01 if L2 is running but with VPID disabled, i.e. if VPID is disabled in vmcs12. Architecturally, if VPID is disabled, then the guest and host effectively share VPID=3D0. KVM emulates this behavior by using vpid01 when running an L2 with VPID disabled (see prepare_vmcs02_early_rare()), and so KVM must also treat vpid01 as the current VPID while L2 is active. Unconditionally treating vpid02 as the current VPID when L2 is active causes KVM to flush TLB entries for vpid02 instead of vpid01, which results in TLB entries from L1 being incorrectly preserved across nested VM-Enter to L2 (L2=3D>L1 isn't problematic, because the TLB flush after nested VM-Exit flushes vpid01). The bug manifests as failures in the vmx_apicv_test KVM-Unit-Test, as KVM incorrectly retains TLB entries for the APIC-access page across a nested VM-Enter. Opportunisticaly add comments at various touchpoints to explain the architectural requirements, and also why KVM uses vpid01 instead of vpid02. All credit goes to Chao, who root caused the issue and identified the fix. Link: https://lore.kernel.org/all/ZwzczkIlYGX+QXJz@intel.com Fixes: 2b4a5a5d5688 ("KVM: nVMX: Flush current VPID (L1 vs. L2) for KVM_REQ= _TLB_FLUSH_GUEST") Cc: stable@vger.kernel.org Cc: Like Xu Debugged-by: Chao Gao Signed-off-by: Sean Christopherson Reviewed-by: Chao Gao Tested-by: Chao Gao --- arch/x86/kvm/vmx/nested.c | 30 +++++++++++++++++++++++++----- arch/x86/kvm/vmx/vmx.c | 2 +- 2 files changed, 26 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 746cb41c5b98..aa78b6f38dfe 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -1197,11 +1197,14 @@ static void nested_vmx_transition_tlb_flush(struct = kvm_vcpu *vcpu, kvm_hv_nested_transtion_tlb_flush(vcpu, enable_ept); =20 /* - * If vmcs12 doesn't use VPID, L1 expects linear and combined mappings - * for *all* contexts to be flushed on VM-Enter/VM-Exit, i.e. it's a - * full TLB flush from the guest's perspective. This is required even - * if VPID is disabled in the host as KVM may need to synchronize the - * MMU in response to the guest TLB flush. + * If VPID is disabled, then guest TLB accesses use VPID=3D0, i.e. the + * same VPID as the host, and so architecturally, linear and combined + * mappings for VPID=3D0 must be flushed at VM-Enter and VM-Exit. KVM + * emulates L2 sharing L1's VPID=3D0 by using vpid01 while running L2, + * and so KVM must also emulate TLB flush of VPID=3D0, i.e. vpid01. This + * is required if VPID is disabled in KVM, as a TLB flush (there are no + * VPIDs) still occurs from L1's perspective, and KVM may need to + * synchronize the MMU in response to the guest TLB flush. * * Note, using TLB_FLUSH_GUEST is correct even if nested EPT is in use. * EPT is a special snowflake, as guest-physical mappings aren't @@ -2315,6 +2318,17 @@ static void prepare_vmcs02_early_rare(struct vcpu_vm= x *vmx, =20 vmcs_write64(VMCS_LINK_POINTER, INVALID_GPA); =20 + /* + * If VPID is disabled, then guest TLB accesses use VPID=3D0, i.e. the + * same VPID as the host. Emulate this behavior by using vpid01 for L2 + * if VPID is disabled in vmcs12. Note, if VPID is disabled, VM-Enter + * and VM-Exit are architecturally required to flush VPID=3D0, but *only* + * VPID=3D0. I.e. using vpid02 would be ok (so long as KVM emulates the + * required flushes), but doing so would cause KVM to over-flush. E.g. + * if L1 runs L2 X with VPID12=3D1, then runs L2 Y with VPID12 disabled, + * and then runs L2 X again, then KVM can and should retain TLB entries + * for VPID12=3D1. + */ if (enable_vpid) { if (nested_cpu_has_vpid(vmcs12) && vmx->nested.vpid02) vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->nested.vpid02); @@ -5957,6 +5971,12 @@ static int handle_invvpid(struct kvm_vcpu *vcpu) return nested_vmx_fail(vcpu, VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID); =20 + /* + * Always flush the effective vpid02, i.e. never flush the current VPID + * and never explicitly flush vpid01. INVVPID targets a VPID, not a + * VMCS, and so whether or not the current vmcs12 has VPID enabled is + * irrelevant (and there may not be a loaded vmcs12). + */ vpid02 =3D nested_get_vpid02(vcpu); switch (type) { case VMX_VPID_EXTENT_INDIVIDUAL_ADDR: diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 6ed801ffe33f..3d4a8d5b0b80 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -3213,7 +3213,7 @@ void vmx_flush_tlb_all(struct kvm_vcpu *vcpu) =20 static inline int vmx_get_current_vpid(struct kvm_vcpu *vcpu) { - if (is_guest_mode(vcpu)) + if (is_guest_mode(vcpu) && nested_cpu_has_vpid(get_vmcs12(vcpu))) return nested_get_vpid02(vcpu); return to_vmx(vcpu)->vpid; } base-commit: e466901b947d529f7b091a3b00b19d2bdee206ee --=20 2.47.0.163.g1226f6d8fa-goog