From: Sean Christopherson
Reply-To: Sean Christopherson
Date: Thu, 24 Aug 2023 19:07:32 -0700
Subject: [PATCH 1/2] KVM: Allow calling mmu_invalidate_retry_hva() without holding mmu_lock
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Yan Zhao
Message-ID: <20230825020733.2849862-2-seanjc@google.com>
In-Reply-To: <20230825020733.2849862-1-seanjc@google.com>
References: <20230825020733.2849862-1-seanjc@google.com>
X-Mailer: git-send-email 2.42.0.rc2.253.gd59a3bf2b4-goog
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Allow checking mmu_invalidate_retry_hva() without holding mmu_lock, even
though mmu_lock must be held to guarantee correctness, i.e. to avoid false
negatives. Dropping the requirement that mmu_lock be held will allow
pre-checking for retry before acquiring mmu_lock, e.g. to avoid contending
mmu_lock when the guest is accessing a range that is being invalidated by
the host.

Contending mmu_lock can have severe negative side effects for x86's TDP MMU
when running on preemptible kernels, as KVM will yield from the zapping
task (holds mmu_lock for write) when there is lock contention, and yielding
after any SPTEs have been zapped requires a VM-scoped TLB flush.

Wrap mmu_invalidate_in_progress in READ_ONCE() to ensure that calling
mmu_invalidate_retry_hva() in a loop won't put KVM into an infinite loop,
e.g. due to caching the in-progress flag and never seeing it go to '0'.

Force a load of mmu_invalidate_seq as well, even though it isn't strictly
necessary to avoid an infinite loop, as doing so improves the probability
that KVM will detect an invalidation that already completed before
acquiring mmu_lock and bailing anyways.

Note, adding READ_ONCE() isn't entirely free, e.g. on x86, the READ_ONCE()
may generate a load into a register instead of doing a direct comparison
(MOV+TEST+Jcc instead of CMP+Jcc), but practically speaking the added cost
is a few bytes of code and maaaaybe a cycle or three.

Signed-off-by: Sean Christopherson
Acked-by: Kai Huang
---
 include/linux/kvm_host.h | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7418e881c21c..7314138ba5f4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1962,18 +1962,29 @@ static inline int mmu_invalidate_retry_hva(struct kvm *kvm,
 					   unsigned long mmu_seq,
 					   unsigned long hva)
 {
-	lockdep_assert_held(&kvm->mmu_lock);
 	/*
 	 * If mmu_invalidate_in_progress is non-zero, then the range maintained
 	 * by kvm_mmu_notifier_invalidate_range_start contains all addresses
 	 * that might be being invalidated. Note that it may include some false
 	 * positives, due to shortcuts when handing concurrent invalidations.
+	 *
+	 * Note the lack of memory barriers! The caller *must* hold mmu_lock
+	 * to avoid false negatives! Holding mmu_lock is not mandatory though,
+	 * e.g. to allow pre-checking for an in-progress invalidation to
+	 * avoid contending mmu_lock. Ensure that the in-progress flag and
+	 * sequence counter are always read from memory, so that checking for
+	 * retry in a loop won't result in an infinite retry loop. Don't force
+	 * loads for start+end, as the key to avoiding an infinite retry loop
+	 * is observing the 1=>0 transition of in-progress, i.e. getting false
+	 * negatives (if mmu_lock isn't held) due to stale start+end values is
+	 * acceptable.
 	 */
-	if (unlikely(kvm->mmu_invalidate_in_progress) &&
+	if (unlikely(READ_ONCE(kvm->mmu_invalidate_in_progress)) &&
 	    hva >= kvm->mmu_invalidate_range_start &&
 	    hva < kvm->mmu_invalidate_range_end)
 		return 1;
-	if (kvm->mmu_invalidate_seq != mmu_seq)
+
+	if (READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq)
 		return 1;
 	return 0;
 }
-- 
2.42.0.rc2.253.gd59a3bf2b4-goog
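
For illustration, the pre-check pattern the changelog describes can be
modeled in userspace C. This is a minimal sketch, not KVM code: struct
kvm_model, model_invalidate_retry_hva(), model_handle_fault(), the
lock_acquisitions counter (a stand-in for taking mmu_lock) and the
simplified READ_ONCE() macro (which relies on the GNU __typeof__ extension)
are all assumptions made for the example.

```c
/*
 * Simplified userspace stand-in for the kernel's READ_ONCE(): the volatile
 * access forces the compiler to emit a fresh load on every evaluation.
 * Assumption: GCC/Clang (__typeof__ is a GNU extension).
 */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical, trimmed-down model of the struct kvm fields the check uses. */
struct kvm_model {
	long mmu_invalidate_in_progress;
	unsigned long mmu_invalidate_seq;
	unsigned long mmu_invalidate_range_start;
	unsigned long mmu_invalidate_range_end;
	unsigned long lock_acquisitions;	/* stand-in for taking mmu_lock */
};

enum fault_result { FAULT_RETRY, FAULT_MAPPED };

/* Mirrors the shape of the patched helper: returns 1 if the caller must retry. */
static inline int model_invalidate_retry_hva(struct kvm_model *kvm,
					     unsigned long mmu_seq,
					     unsigned long hva)
{
	/* start+end are deliberately read without READ_ONCE(), as in the patch. */
	if (READ_ONCE(kvm->mmu_invalidate_in_progress) &&
	    hva >= kvm->mmu_invalidate_range_start &&
	    hva < kvm->mmu_invalidate_range_end)
		return 1;

	if (READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq)
		return 1;

	return 0;
}

/* Hypothetical fault path: lockless pre-check, then re-check "under the lock". */
static inline enum fault_result model_handle_fault(struct kvm_model *kvm,
						   unsigned long mmu_seq,
						   unsigned long hva)
{
	/* Pre-check: false negatives are possible here, so it is only a hint. */
	if (model_invalidate_retry_hva(kvm, mmu_seq, hva))
		return FAULT_RETRY;

	/* A real implementation would acquire mmu_lock at this point. */
	kvm->lock_acquisitions++;

	/* Authoritative check; in the kernel this one runs with mmu_lock held. */
	if (model_invalidate_retry_hva(kvm, mmu_seq, hva))
		return FAULT_RETRY;

	return FAULT_MAPPED;
}
```

In this model, a fault on an hva inside an in-progress range returns
FAULT_RETRY without touching lock_acquisitions, which is the contention the
pre-check is meant to avoid. The second check is still the one that matters
for correctness.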