From nobody Thu Dec 18 20:37:45 2025
Reply-To: Sean Christopherson
Date: Wed, 18 Dec 2024 13:36:11 -0800
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
Mime-Version: 1.0
X-Mailer: git-send-email 2.47.1.613.gc27f4b7a9f-goog
Message-ID: <20241218213611.3181643-1-seanjc@google.com>
Subject: [PATCH] KVM: x86/mmu: Treat TDP MMU faults as spurious if access is already allowed
From: Sean Christopherson
To: Sean Christopherson , Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, leiyang@redhat.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Treat slow-path TDP MMU faults as spurious if the access is allowed given
the existing SPTE, to fix a benign warning (other than the WARN itself)
due to replacing a writable SPTE with a read-only SPTE, and to avoid the
unnecessary LOCK CMPXCHG and subsequent TLB flush.

If a read fault races with a write fault, fast GUP fails for any reason
when trying to "promote" the read fault to a writable mapping, and KVM
resolves the write fault first, then KVM will end up trying to install a
read-only SPTE (for a !map_writable fault) overtop a writable SPTE.

Note, it's not entirely clear why fast GUP fails, or if that's even how
KVM ends up with a !map_writable fault with a writable SPTE.  If something
else is going awry, e.g. due to a bug in mmu_notifiers, then treating read
faults as spurious in this scenario could effectively mask the underlying
problem.

However, retrying the faulting access instead of overwriting an existing
SPTE is functionally correct and desirable irrespective of the WARN, and
fast GUP _can_ legitimately fail with a writable VMA, e.g. if the Accessed
bit in the primary MMU's PTE is toggled and causes a PTE value mismatch.
The WARN was also recently added, specifically to track down scenarios
where KVM unnecessarily overwrites SPTEs, i.e. treating the fault as
spurious doesn't regress KVM's bug-finding capabilities in any way.  In
short, letting the WARN linger because there's a tiny chance it's due to
a bug elsewhere would be excessively paranoid.
Fixes: 1a175082b190 ("KVM: x86/mmu: WARN and flush if resolving a TDP MMU fault clears MMU-writable")
Reported-by: leiyang@redhat.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219588
Signed-off-by: Sean Christopherson
Reviewed-by: Yan Zhao
Tested-by: Lei Yang
---
 arch/x86/kvm/mmu/mmu.c     | 12 ------------
 arch/x86/kvm/mmu/spte.h    | 17 +++++++++++++++++
 arch/x86/kvm/mmu/tdp_mmu.c |  5 +++++
 3 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 22e7ad235123..2401606db260 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3364,18 +3364,6 @@ static bool fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu,
 	return true;
 }
 
-static bool is_access_allowed(struct kvm_page_fault *fault, u64 spte)
-{
-	if (fault->exec)
-		return is_executable_pte(spte);
-
-	if (fault->write)
-		return is_writable_pte(spte);
-
-	/* Fault was on Read access */
-	return spte & PT_PRESENT_MASK;
-}
-
 /*
  * Returns the last level spte pointer of the shadow page walk for the given
  * gpa, and sets *spte to the spte value. This spte may be non-preset. If no
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index f332b33bc817..af10bc0380a3 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -461,6 +461,23 @@ static inline bool is_mmu_writable_spte(u64 spte)
 	return spte & shadow_mmu_writable_mask;
 }
 
+/*
+ * Returns true if the access indicated by @fault is allowed by the existing
+ * SPTE protections.  Note, the caller is responsible for checking that the
+ * SPTE is a shadow-present, leaf SPTE (either before or after).
+ */
+static inline bool is_access_allowed(struct kvm_page_fault *fault, u64 spte)
+{
+	if (fault->exec)
+		return is_executable_pte(spte);
+
+	if (fault->write)
+		return is_writable_pte(spte);
+
+	/* Fault was on Read access */
+	return spte & PT_PRESENT_MASK;
+}
+
 /*
  * If the MMU-writable flag is cleared, i.e. the SPTE is write-protected for
  * write-tracking, remote TLBs must be flushed, even if the SPTE was read-only,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 4508d868f1cd..2f15e0e33903 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -985,6 +985,11 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 	if (fault->prefetch && is_shadow_present_pte(iter->old_spte))
 		return RET_PF_SPURIOUS;
 
+	if (is_shadow_present_pte(iter->old_spte) &&
+	    is_access_allowed(fault, iter->old_spte) &&
+	    is_last_spte(iter->old_spte, iter->level))
+		return RET_PF_SPURIOUS;
+
 	if (unlikely(!fault->slot))
 		new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
 	else

base-commit: 3522c419758ee8dca5a0e8753ee0070a22157bc1
-- 
2.47.1.613.gc27f4b7a9f-goog