From nobody Sun Dec 14 06:21:57 2025
Date: Tue, 4 Feb 2025 00:40:28 +0000
In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com>
Message-ID: <20250204004038.1680123-2-jthoughton@google.com>
Subject: [PATCH v9 01/11] KVM: Rename kvm_handle_hva_range()
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: David Matlack, David Rientjes, James Houghton, Marc Zyngier, Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen, kvm@vger.kernel.org, linux-kernel@vger.kernel.org

Rename kvm_handle_hva_range() to kvm_age_hva_range(),
kvm_handle_hva_range_no_flush() to kvm_age_hva_range_no_flush(), and
__kvm_handle_hva_range() to kvm_handle_hva_range(), as
kvm_age_hva_range() will get more aging-specific functionality.

Suggested-by: Sean Christopherson
Signed-off-by: James Houghton
---
 virt/kvm/kvm_main.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index add042839823..1bd49770506a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -551,8 +551,8 @@ static void kvm_null_fn(void)
 	     node;							\
 	     node = interval_tree_iter_next(node, start, last))	\
 
-static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
-			const struct kvm_mmu_notifier_range *range)
+static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
+			const struct kvm_mmu_notifier_range *range)
 {
 	struct kvm_mmu_notifier_return r = {
 		.ret = false,
@@ -633,7 +633,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
 	return r;
 }
 
-static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
+static __always_inline int kvm_age_hva_range(struct mmu_notifier *mn,
 						unsigned long start,
 						unsigned long end,
 						gfn_handler_t handler,
@@ -649,15 +649,15 @@ static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
 		.may_block	= false,
 	};
 
-	return __kvm_handle_hva_range(kvm, &range).ret;
+	return kvm_handle_hva_range(kvm, &range).ret;
 }
 
-static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn,
-							 unsigned long start,
-							 unsigned long end,
-							 gfn_handler_t handler)
+static __always_inline int kvm_age_hva_range_no_flush(struct mmu_notifier *mn,
+						      unsigned long start,
+						      unsigned long end,
+						      gfn_handler_t handler)
 {
-	return kvm_handle_hva_range(mn, start, end, handler, false);
+	return kvm_age_hva_range(mn, start, end, handler, false);
 }
 
 void kvm_mmu_invalidate_begin(struct kvm *kvm)
@@ -752,7 +752,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	 * that guest memory has been reclaimed.  This needs to be done *after*
 	 * dropping mmu_lock, as x86's reclaim path is slooooow.
 	 */
-	if (__kvm_handle_hva_range(kvm, &hva_range).found_memslot)
+	if (kvm_handle_hva_range(kvm, &hva_range).found_memslot)
 		kvm_arch_guest_memory_reclaimed(kvm);
 
 	return 0;
@@ -798,7 +798,7 @@ static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
 	};
 	bool wake;
 
-	__kvm_handle_hva_range(kvm, &hva_range);
+	kvm_handle_hva_range(kvm, &hva_range);
 
 	/* Pairs with the increment in range_start(). */
 	spin_lock(&kvm->mn_invalidate_lock);
@@ -822,8 +822,8 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
 {
 	trace_kvm_age_hva(start, end);
 
-	return kvm_handle_hva_range(mn, start, end, kvm_age_gfn,
-				    !IS_ENABLED(CONFIG_KVM_ELIDE_TLB_FLUSH_IF_YOUNG));
+	return kvm_age_hva_range(mn, start, end, kvm_age_gfn,
+				 !IS_ENABLED(CONFIG_KVM_ELIDE_TLB_FLUSH_IF_YOUNG));
 }
 
 static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
@@ -846,7 +846,7 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
 	 * cadence.  If we find this inaccurate, we might come up with a
 	 * more sophisticated heuristic later.
 	 */
-	return kvm_handle_hva_range_no_flush(mn, start, end, kvm_age_gfn);
+	return kvm_age_hva_range_no_flush(mn, start, end, kvm_age_gfn);
 }
 
 static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
@@ -855,8 +855,8 @@ static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
 {
 	trace_kvm_test_age_hva(address);
 
-	return kvm_handle_hva_range_no_flush(mn, address, address + 1,
-					     kvm_test_age_gfn);
+	return kvm_age_hva_range_no_flush(mn, address, address + 1,
+					  kvm_test_age_gfn);
 }
 
 static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
-- 
2.48.1.362.g079036d154-goog

From nobody Sun Dec 14 06:21:57 2025
Date: Tue, 4 Feb 2025 00:40:29 +0000
In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com>
Message-ID: <20250204004038.1680123-3-jthoughton@google.com>
Subject: [PATCH v9 02/11] KVM: Add lockless memslot walk to KVM
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: David Matlack, David Rientjes, James Houghton, Marc Zyngier, Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen, kvm@vger.kernel.org, linux-kernel@vger.kernel.org

It is possible to correctly do aging without taking the KVM MMU lock;
this option allows such architectures to do so. Architectures that
select CONFIG_KVM_MMU_NOTIFIER_AGING_LOCKLESS are responsible for
correctness.
Suggested-by: Yu Zhao
Signed-off-by: James Houghton
Reviewed-by: David Matlack
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/Kconfig         |  2 ++
 virt/kvm/kvm_main.c      | 24 +++++++++++++++++-------
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f34f4cfaa513..c28a6aa1f2ed 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -267,6 +267,7 @@ struct kvm_gfn_range {
 	union kvm_mmu_notifier_arg arg;
 	enum kvm_gfn_range_filter attr_filter;
 	bool may_block;
+	bool lockless;
 };
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 54e959e7d68f..9356f4e4e255 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -102,6 +102,8 @@ config KVM_GENERIC_MMU_NOTIFIER
 
 config KVM_ELIDE_TLB_FLUSH_IF_YOUNG
 	depends on KVM_GENERIC_MMU_NOTIFIER
+
+config KVM_MMU_NOTIFIER_AGING_LOCKLESS
 	bool
 
 config KVM_GENERIC_MEMORY_ATTRIBUTES
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1bd49770506a..4734ae9e8a54 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -517,6 +517,7 @@ struct kvm_mmu_notifier_range {
 	on_lock_fn_t on_lock;
 	bool flush_on_ret;
 	bool may_block;
+	bool lockless;
 };
 
 /*
@@ -571,6 +572,10 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
 			 IS_KVM_NULL_FN(range->handler)))
 		return r;
 
+	/* on_lock will never be called for lockless walks */
+	if (WARN_ON_ONCE(range->lockless && !IS_KVM_NULL_FN(range->on_lock)))
+		return r;
+
 	idx = srcu_read_lock(&kvm->srcu);
 
 	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
@@ -607,15 +612,18 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
 			gfn_range.start = hva_to_gfn_memslot(hva_start, slot);
 			gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot);
 			gfn_range.slot = slot;
+			gfn_range.lockless = range->lockless;
 
 			if (!r.found_memslot) {
 				r.found_memslot = true;
-				KVM_MMU_LOCK(kvm);
-				if (!IS_KVM_NULL_FN(range->on_lock))
-					range->on_lock(kvm);
-
-				if (IS_KVM_NULL_FN(range->handler))
-					goto mmu_unlock;
+				if (!range->lockless) {
+					KVM_MMU_LOCK(kvm);
+					if (!IS_KVM_NULL_FN(range->on_lock))
+						range->on_lock(kvm);
+
+					if (IS_KVM_NULL_FN(range->handler))
+						goto mmu_unlock;
+				}
 			}
 			r.ret |= range->handler(kvm, &gfn_range);
 		}
@@ -625,7 +633,7 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
 		kvm_flush_remote_tlbs(kvm);
 
 mmu_unlock:
	if (r.found_memslot && !range->lockless)
 		KVM_MMU_UNLOCK(kvm);
 
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -647,6 +655,8 @@ static __always_inline int kvm_age_hva_range(struct mmu_notifier *mn,
 		.on_lock	= (void *)kvm_null_fn,
 		.flush_on_ret	= flush_on_ret,
 		.may_block	= false,
+		.lockless	=
+			IS_ENABLED(CONFIG_KVM_MMU_NOTIFIER_AGING_LOCKLESS),
 	};
 
 	return kvm_handle_hva_range(kvm, &range).ret;
-- 
2.48.1.362.g079036d154-goog
From nobody Sun Dec 14 06:21:57 2025
Date: Tue, 4 Feb 2025 00:40:30 +0000
In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com>
Message-ID: <20250204004038.1680123-4-jthoughton@google.com>
Subject: [PATCH v9 03/11] KVM: x86/mmu: Factor out spte atomic bit clearing routine
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: David Matlack, David Rientjes, James Houghton, Marc Zyngier, Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen, kvm@vger.kernel.org, linux-kernel@vger.kernel.org

This new function, tdp_mmu_clear_spte_bits_atomic(), will be used in a
follow-up patch to enable lockless Accessed bit clearing.

Signed-off-by: James Houghton
Acked-by: Yu Zhao
---
 arch/x86/kvm/mmu/tdp_iter.h | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index 047b78333653..9135b035fa40 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -25,6 +25,13 @@ static inline u64 kvm_tdp_mmu_write_spte_atomic(tdp_ptep_t sptep, u64 new_spte)
 	return xchg(rcu_dereference(sptep), new_spte);
 }
 
+static inline u64 tdp_mmu_clear_spte_bits_atomic(tdp_ptep_t sptep, u64 mask)
+{
+	atomic64_t *sptep_atomic = (atomic64_t *)rcu_dereference(sptep);
+
+	return (u64)atomic64_fetch_and(~mask, sptep_atomic);
+}
+
 static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte)
 {
 	KVM_MMU_WARN_ON(is_ept_ve_possible(new_spte));
@@ -63,12 +70,8 @@ static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
 static inline u64 tdp_mmu_clear_spte_bits(tdp_ptep_t sptep, u64 old_spte,
 					  u64 mask, int level)
 {
-	atomic64_t *sptep_atomic;
-
-	if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level)) {
-		sptep_atomic = (atomic64_t *)rcu_dereference(sptep);
-		return (u64)atomic64_fetch_and(~mask, sptep_atomic);
-	}
+	if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level))
+		return tdp_mmu_clear_spte_bits_atomic(sptep, mask);
 
 	__kvm_tdp_mmu_write_spte(sptep, old_spte & ~mask);
 	return old_spte;
-- 
2.48.1.362.g079036d154-goog
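The primitive being factored out is easy to model outside the kernel. Below is a minimal userspace sketch using C11 atomics instead of the kernel's atomic64_t API; the function name and usage are illustrative only, not taken from the patch:

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Analogue of tdp_mmu_clear_spte_bits_atomic(): atomically clear the bits
 * in 'mask' and return the value the SPTE held beforehand, so the caller
 * can tell whether e.g. the Accessed bit was set before it was cleared.
 */
static uint64_t clear_bits_atomic(_Atomic uint64_t *spte, uint64_t mask)
{
	/* atomic_fetch_and() returns the previous value, like atomic64_fetch_and(). */
	return atomic_fetch_and(spte, ~mask);
}
```

The fetch-and-clear shape matters because aging needs the pre-clear value: if the Accessed bit was already clear, the page can be reported as old with no further work.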
From nobody Sun Dec 14 06:21:57 2025
Date: Tue, 4 Feb 2025 00:40:31 +0000
In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com>
Message-ID: <20250204004038.1680123-5-jthoughton@google.com>
Subject: [PATCH v9 04/11] KVM: x86/mmu: Relax locking for kvm_test_age_gfn() and kvm_age_gfn()
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: David Matlack, David Rientjes, James Houghton, Marc Zyngier, Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen, kvm@vger.kernel.org, linux-kernel@vger.kernel.org

Walk the TDP MMU in an RCU read-side critical section without holding
mmu_lock when harvesting and potentially updating age information on
sptes.
This requires a way to do RCU-safe walking of the tdp_mmu_roots; do
this with a new macro.

The PTE modifications are now always done atomically.
spte_has_volatile_bits() no longer checks the Accessed bit at all. It
can (now) be set and cleared without taking the mmu_lock, but dropping
Accessed bit updates is already tolerated (the TLB is not invalidated
after clearing the Accessed bit).

If the cmpxchg for marking the spte for access tracking fails, leave
it as is and treat it as if it were young; if the spte is being
actively modified, it is most likely young.

Harvesting age information from the shadow MMU is still done while
holding the MMU write lock.

Suggested-by: Yu Zhao
Signed-off-by: James Houghton
Reviewed-by: David Matlack
Reviewed-by: James Houghton
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/Kconfig            |  1 +
 arch/x86/kvm/mmu/mmu.c          | 10 +++++++--
 arch/x86/kvm/mmu/spte.c         | 10 +++++++--
 arch/x86/kvm/mmu/tdp_iter.h     |  9 +++++----
 arch/x86/kvm/mmu/tdp_mmu.c      | 36 +++++++++++++++++----------
 6 files changed, 48 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f378cd43241c..0e44fc1cec0d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1479,6 +1479,7 @@ struct kvm_arch {
 	 * tdp_mmu_page set.
 	 *
 	 * For reads, this list is protected by:
+	 *	RCU alone or
 	 *	the MMU lock in read mode + RCU or
 	 *	the MMU lock in write mode
 	 *
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index ea2c4f21c1ca..f0a60e59c884 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -22,6 +22,7 @@ config KVM_X86
 	select KVM_COMMON
 	select KVM_GENERIC_MMU_NOTIFIER
 	select KVM_ELIDE_TLB_FLUSH_IF_YOUNG
+	select KVM_MMU_NOTIFIER_AGING_LOCKLESS
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_PFNCACHE
 	select HAVE_KVM_DIRTY_RING_TSO
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a45ae60e84ab..7779b49f386d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1592,8 +1592,11 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	bool young = false;
 
-	if (kvm_memslots_have_rmaps(kvm))
+	if (kvm_memslots_have_rmaps(kvm)) {
+		write_lock(&kvm->mmu_lock);
 		young = kvm_rmap_age_gfn_range(kvm, range, false);
+		write_unlock(&kvm->mmu_lock);
+	}
 
 	if (tdp_mmu_enabled)
 		young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
@@ -1605,8 +1608,11 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	bool young = false;
 
-	if (kvm_memslots_have_rmaps(kvm))
+	if (kvm_memslots_have_rmaps(kvm)) {
+		write_lock(&kvm->mmu_lock);
 		young = kvm_rmap_age_gfn_range(kvm, range, true);
+		write_unlock(&kvm->mmu_lock);
+	}
 
 	if (tdp_mmu_enabled)
 		young |= kvm_tdp_mmu_test_age_gfn(kvm, range);
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 22551e2f1d00..e984b440c0f0 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -142,8 +142,14 @@ bool spte_has_volatile_bits(u64 spte)
 		return true;
 
 	if (spte_ad_enabled(spte)) {
-		if (!(spte & shadow_accessed_mask) ||
-		    (is_writable_pte(spte) && !(spte & shadow_dirty_mask)))
+		/*
+		 * Do not check the Accessed bit. It can be set (by the CPU)
+		 * and cleared (by kvm_tdp_mmu_age_spte()) without holding
+		 * the mmu_lock, but when clearing the Accessed bit, we do
+		 * not invalidate the TLB, so we can already miss Accessed bit
+		 * updates.
+		 */
+		if (is_writable_pte(spte) && !(spte & shadow_dirty_mask))
 			return true;
 	}
 
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index 9135b035fa40..05e9d678aac9 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -39,10 +39,11 @@ static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte)
 }
 
 /*
- * SPTEs must be modified atomically if they are shadow-present, leaf
- * SPTEs, and have volatile bits, i.e. has bits that can be set outside
- * of mmu_lock.  The Writable bit can be set by KVM's fast page fault
- * handler, and Accessed and Dirty bits can be set by the CPU.
+ * SPTEs must be modified atomically if they have bits that can be set outside
+ * of the mmu_lock. This can happen for any shadow-present leaf SPTEs, as the
+ * Writable bit can be set by KVM's fast page fault handler, the Accessed and
+ * Dirty bits can be set by the CPU, and the Accessed and W/R/X bits can be
+ * cleared by age_gfn_range().
 *
 * Note, non-leaf SPTEs do have Accessed bits and those bits are
 * technically volatile, but KVM doesn't consume the Accessed bit of
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 046b6ba31197..c9778c3e6ecd 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -193,6 +193,19 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
 			    !tdp_mmu_root_match((_root), (_types)))) {		\
 		} else
 
+/*
+ * Iterate over all TDP MMU roots in an RCU read-side critical section.
+ * It is safe to iterate over the SPTEs under the root, but their values will
+ * be unstable, so all writes must be atomic. As this routine is meant to be
+ * used without holding the mmu_lock at all, any bits that are flipped must
+ * be reflected in kvm_tdp_mmu_spte_need_atomic_write().
+ */
+#define for_each_tdp_mmu_root_rcu(_kvm, _root, _as_id, _types)			\
+	list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link)	\
+		if ((_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) ||	\
+		    !tdp_mmu_root_match((_root), (_types))) {			\
+		} else
+
 #define for_each_valid_tdp_mmu_root(_kvm, _root, _as_id)		\
 	__for_each_tdp_mmu_root(_kvm, _root, _as_id, KVM_VALID_ROOTS)
 
@@ -1332,21 +1345,22 @@ bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
 * from the clear_young() or clear_flush_young() notifier, which uses the
 * return value to determine if the page has been accessed.
 */
-static void kvm_tdp_mmu_age_spte(struct tdp_iter *iter)
+static void kvm_tdp_mmu_age_spte(struct kvm *kvm, struct tdp_iter *iter)
 {
 	u64 new_spte;
 
 	if (spte_ad_enabled(iter->old_spte)) {
-		iter->old_spte = tdp_mmu_clear_spte_bits(iter->sptep,
-							 iter->old_spte,
-							 shadow_accessed_mask,
-							 iter->level);
+		iter->old_spte = tdp_mmu_clear_spte_bits_atomic(iter->sptep,
+								shadow_accessed_mask);
 		new_spte = iter->old_spte & ~shadow_accessed_mask;
 	} else {
 		new_spte = mark_spte_for_access_track(iter->old_spte);
-		iter->old_spte = kvm_tdp_mmu_write_spte(iter->sptep,
-							iter->old_spte, new_spte,
-							iter->level);
+		/*
+		 * It is safe for the following cmpxchg to fail. Leave the
+		 * Accessed bit set, as the spte is most likely young anyway.
+		 */
+		if (__tdp_mmu_set_spte_atomic(kvm, iter, new_spte))
+			return;
 	}
 
 	trace_kvm_tdp_mmu_spte_changed(iter->as_id, iter->gfn, iter->level,
@@ -1371,9 +1385,9 @@ static bool __kvm_tdp_mmu_age_gfn_range(struct kvm *kvm,
 	 * valid roots!
 	 */
 	WARN_ON(types & ~KVM_VALID_ROOTS);
-	__for_each_tdp_mmu_root(kvm, root, range->slot->as_id, types) {
-		guard(rcu)();
 
+	guard(rcu)();
+	for_each_tdp_mmu_root_rcu(kvm, root, range->slot->as_id, types) {
 		tdp_root_for_each_leaf_pte(iter, kvm, root, range->start, range->end) {
 			if (!is_accessed_spte(iter.old_spte))
 				continue;
@@ -1382,7 +1396,7 @@ static bool __kvm_tdp_mmu_age_gfn_range(struct kvm *kvm,
 				return true;
 
 			ret = true;
-			kvm_tdp_mmu_age_spte(&iter);
+			kvm_tdp_mmu_age_spte(kvm, &iter);
 		}
 	}
 
-- 
2.48.1.362.g079036d154-goog
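The "a failed cmpxchg is fine" rule from the commit message can be sketched in isolation. A hedged userspace model of the two aging paths follows; the bit position and names are made up for illustration, whereas the kernel uses shadow_accessed_mask and __tdp_mmu_set_spte_atomic():

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_ACCESSED_BIT	(1ull << 5)	/* illustrative, not the real mask */

/*
 * Model of lockless aging as described above:
 *  - with A/D bits, clearing the Accessed bit is a single atomic AND;
 *  - without A/D bits, the whole SPTE is rewritten via cmpxchg, and a
 *    failed cmpxchg is simply ignored: a concurrently-changing SPTE is
 *    almost certainly in active use, so leaving it marked young is fine.
 */
static void age_spte(_Atomic uint64_t *spte, bool ad_enabled,
		     uint64_t access_tracked_value)
{
	if (ad_enabled) {
		atomic_fetch_and(spte, ~EXAMPLE_ACCESSED_BIT);
		return;
	}

	uint64_t old = atomic_load(spte);
	/* On failure, leave the SPTE as-is; the page stays "young". */
	atomic_compare_exchange_strong(spte, &old, access_tracked_value);
}
```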
From nobody Sun Dec 14 06:21:57 2025
Date: Tue, 4 Feb 2025 00:40:32 +0000
In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com>
Message-ID: <20250204004038.1680123-6-jthoughton@google.com>
Subject: [PATCH v9 05/11] KVM: x86/mmu: Rename spte_has_volatile_bits() to spte_needs_atomic_write()
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: David Matlack, David Rientjes, James Houghton, Marc Zyngier, Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen, kvm@vger.kernel.org, linux-kernel@vger.kernel.org

spte_has_volatile_bits() is now a misnomer, as an SPTE can have its
Accessed bit set or cleared without the mmu_lock held, but the state
of the Accessed bit is not checked in spte_has_volatile_bits().

Even if a caller uses spte_needs_atomic_write(), Accessed bit
information may still be lost, but that is already tolerated, as the
TLB is not invalidated after the Accessed bit is cleared.

Signed-off-by: James Houghton
---
 Documentation/virt/kvm/locking.rst | 4 ++--
 arch/x86/kvm/mmu/mmu.c             | 4 ++--
 arch/x86/kvm/mmu/spte.c            | 9 +++++----
 arch/x86/kvm/mmu/spte.h            | 2 +-
 arch/x86/kvm/mmu/tdp_iter.h        | 2 +-
 5 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index c56d5f26c750..4720053c70a3 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -196,7 +196,7 @@ writable between reading spte and updating spte. Like below case:
 The Dirty bit is lost in this case.
 
 In order to avoid this kind of issue, we always treat the spte as "volatile"
-if it can be updated out of mmu-lock [see spte_has_volatile_bits()]; it means
+if it can be updated out of mmu-lock [see spte_needs_atomic_write()]; it means
 the spte is always atomically updated in this case.
 
 3) flush tlbs due to spte updated
@@ -212,7 +212,7 @@ function to update spte (present -> present).
 
 Since the spte is "volatile" if it can be updated out of mmu-lock, we always
 atomically update the spte and the race caused by fast page fault can be avoided.
-See the comments in spte_has_volatile_bits() and mmu_spte_update().
+See the comments in spte_needs_atomic_write() and mmu_spte_update().
 
 Lockless Access Tracking:
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7779b49f386d..1fa0f47eb6a5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -501,7 +501,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
 		return false;
 	}
 
-	if (!spte_has_volatile_bits(old_spte))
+	if (!spte_needs_atomic_write(old_spte))
 		__update_clear_spte_fast(sptep, new_spte);
 	else
 		old_spte = __update_clear_spte_slow(sptep, new_spte);
@@ -524,7 +524,7 @@ static u64 mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
 	int level = sptep_to_sp(sptep)->role.level;
 
 	if (!is_shadow_present_pte(old_spte) ||
-	    !spte_has_volatile_bits(old_spte))
+	    !spte_needs_atomic_write(old_spte))
 		__update_clear_spte_fast(sptep, SHADOW_NONPRESENT_VALUE);
 	else
 		old_spte = __update_clear_spte_slow(sptep, SHADOW_NONPRESENT_VALUE);
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index e984b440c0f0..ae2017cc1239 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -129,11 +129,12 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
 }
 
 /*
- * Returns true if the SPTE has bits that may be set without holding mmu_lock.
- * The caller is responsible for checking if the SPTE is shadow-present, and
- * for determining whether or not the caller cares about non-leaf SPTEs.
+ * Returns true if the SPTE has bits other than the Accessed bit that may be
+ * changed without holding mmu_lock. The caller is responsible for checking if
+ * the SPTE is shadow-present, and for determining whether or not the caller
+ * cares about non-leaf SPTEs.
 */
-bool spte_has_volatile_bits(u64 spte)
+bool spte_needs_atomic_write(u64 spte)
 {
 	if (!is_writable_pte(spte) && is_mmu_writable_spte(spte))
 		return true;
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 59746854c0af..4c290ae9a02a 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -519,7 +519,7 @@ static inline u64 get_mmio_spte_generation(u64 spte)
 	return gen;
 }
 
-bool spte_has_volatile_bits(u64 spte);
+bool spte_needs_atomic_write(u64 spte);
 
 bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	       const struct kvm_memory_slot *slot,
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index 05e9d678aac9..b54123163efc 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -55,7 +55,7 @@ static inline bool kvm_tdp_mmu_spte_need_atomic_write(u64 old_spte, int level)
 {
 	return is_shadow_present_pte(old_spte) &&
 	       is_last_spte(old_spte, level) &&
-	       spte_has_volatile_bits(old_spte);
+	       spte_needs_atomic_write(old_spte);
 }
 
 static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
-- 
2.48.1.362.g079036d154-goog
From nobody Sun Dec 14 06:21:57 2025
Date: Tue, 4 Feb 2025 00:40:33 +0000
In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com>
Message-ID: <20250204004038.1680123-7-jthoughton@google.com>
Subject: [PATCH v9 06/11] KVM: x86/mmu: Skip shadow MMU test_young if TDP MMU reports page as young
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: David Matlack, David Rientjes, James Houghton, Marc Zyngier, Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen, kvm@vger.kernel.org, linux-kernel@vger.kernel.org

Reorder the processing of the TDP MMU versus the shadow MMU when aging
SPTEs, and skip the shadow MMU entirely in the test-only case if the
TDP MMU reports that the page is young, i.e. completely avoid taking
mmu_lock if the TDP MMU SPTE is young. Swap the order for the
test-and-age helper as well for consistency.

Signed-off-by: James Houghton
Acked-by: Yu Zhao
---
 arch/x86/kvm/mmu/mmu.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1fa0f47eb6a5..4a9de4b330d7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1592,15 +1592,15 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	bool young = false;
 
+	if (tdp_mmu_enabled)
+		young = kvm_tdp_mmu_age_gfn_range(kvm, range);
+
 	if (kvm_memslots_have_rmaps(kvm)) {
 		write_lock(&kvm->mmu_lock);
-		young = kvm_rmap_age_gfn_range(kvm, range, false);
+		young |= kvm_rmap_age_gfn_range(kvm, range, false);
 		write_unlock(&kvm->mmu_lock);
 	}
 
-	if (tdp_mmu_enabled)
-		young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
-
 	return young;
 }
 
@@ -1608,15 +1608,15 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	bool young = false;
 
-	if (kvm_memslots_have_rmaps(kvm)) {
+	if (tdp_mmu_enabled)
+		young = kvm_tdp_mmu_test_age_gfn(kvm, range);
+
+	if (!young && kvm_memslots_have_rmaps(kvm)) {
 		write_lock(&kvm->mmu_lock);
-		young = kvm_rmap_age_gfn_range(kvm, range, true);
+		young |= kvm_rmap_age_gfn_range(kvm, range, true);
 		write_unlock(&kvm->mmu_lock);
 	}
 
-	if (tdp_mmu_enabled)
-		young |= kvm_tdp_mmu_test_age_gfn(kvm, range);
-
 	return young;
 }
 
-- 
2.48.1.362.g079036d154-goog
From nobody Sun Dec 14 06:21:57 2025
Date: Tue, 4 Feb 2025 00:40:34 +0000
In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com>
Message-ID: <20250204004038.1680123-8-jthoughton@google.com>
Subject: [PATCH v9 07/11] KVM: x86/mmu: Only check gfn age in shadow MMU if indirect_shadow_pages > 0
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: David Matlack, David Rientjes, James Houghton, Marc Zyngier, Oliver Upton, Wei Xu, Yu Zhao, Axel Rasmussen, kvm@vger.kernel.org, linux-kernel@vger.kernel.org

When aging SPTEs and the TDP MMU is enabled, process the shadow MMU
if and only if the VM has at least one shadow page, as opposed to
checking if the VM has rmaps. Checking for rmaps will effectively
yield a false positive if the VM ran nested TDP VMs in the past, but
is not currently doing so.
Signed-off-by: James Houghton
Acked-by: Yu Zhao
---
 arch/x86/kvm/mmu/mmu.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4a9de4b330d7..f75779d8d6fd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1588,6 +1588,11 @@ static bool kvm_rmap_age_gfn_range(struct kvm *kvm,
 	return young;
 }
 
+static bool kvm_may_have_shadow_mmu_sptes(struct kvm *kvm)
+{
+	return !tdp_mmu_enabled || READ_ONCE(kvm->arch.indirect_shadow_pages);
+}
+
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	bool young = false;
@@ -1595,7 +1600,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	if (tdp_mmu_enabled)
 		young = kvm_tdp_mmu_age_gfn_range(kvm, range);
 
-	if (kvm_memslots_have_rmaps(kvm)) {
+	if (kvm_may_have_shadow_mmu_sptes(kvm)) {
 		write_lock(&kvm->mmu_lock);
 		young |= kvm_rmap_age_gfn_range(kvm, range, false);
 		write_unlock(&kvm->mmu_lock);
@@ -1611,7 +1616,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	if (tdp_mmu_enabled)
 		young = kvm_tdp_mmu_test_age_gfn(kvm, range);
 
-	if (!young && kvm_memslots_have_rmaps(kvm)) {
+	if (!young && kvm_may_have_shadow_mmu_sptes(kvm)) {
 		write_lock(&kvm->mmu_lock);
 		young |= kvm_rmap_age_gfn_range(kvm, range, true);
 		write_unlock(&kvm->mmu_lock);
-- 
2.48.1.362.g079036d154-goog
bh=8RxCUl1rGMBFv/rhBwFBhZEDraoafq6dJ5Zr6afNrok=; b=BZa0f4/bhrWVGpv+YEUObPz7sJvXj9npm3BC7KUsWiKt14n/FMg+RQmHYK3ow1BcOB Y0MwaU4ymlFYQrnnjEsavPWCVmexxO0q7rncpa1kbf/rqu77mYAh4VLit4/LlzQ/VA1P FeH/1NleT1LVuvVUl4uKO0oMWmSyBGvpheiAo5l+/sMlg5ob75Ww2zhEbHVlmD6Ht9N2 3hI3AHM2WSqaxctabcERXNHd1oWLfaRejnongDA6eyOL9tk+nm+itcmeBt01B1qDEFlq aLYW7MI2eN1uF96IOwbVpHgxY3wXczPcpVCtMtTRrPMEJKaubxDVtTOkT+1aRQP9B/yA h7ew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738629662; x=1739234462; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=8RxCUl1rGMBFv/rhBwFBhZEDraoafq6dJ5Zr6afNrok=; b=mSF/6q2wKPsnU7gTRSIYHNDHkTz/ENFtWjFeRKTcQbH60FautVS9ekiM6gfj8nulp9 4k7IbTAjogH0NVklFvzdnv3cpyihgPUOt4ZPce5uFUYFzKXyKLpMNt/geihArwldAhra IRzpQCjtulo5pVXBrlVOIyhiHdvk1UtHMWGneZqomBCCjVf+Y4KyEAs5qVG4ocJn52OZ 4tFYGVJ00Vs0+G2xEfa8CeUbU9kgQl02GPY8lyNuuJKbeaGoytGIeH6mX70i7YZDRn2o HCHFqOOOtxkaYAzG+KwI/zZX6bzyL8GSH2ltO7GtMkHsD2FhevY4KwYvZXomMcjd8aZq 2xOg== X-Forwarded-Encrypted: i=1; AJvYcCXbMrmySRM1yNroPs9LC+qcWWwL0OEqCc+P7mY6p1wQMC53D47jVMY8zJCzxyxR7sgKu6BU6yOHOORCn6I=@vger.kernel.org X-Gm-Message-State: AOJu0YyYhoa2QfqZxQT+LiP8puZ7jYRlm+Q8+/LwGQmu+VhqTkcOcoXL BrcwBPZ8N64/PC5TbhmpRChpnpVPA040hiBGJP26+v5dbvnFDduorojqwDk6wRlHPncON9Y99sU hD4kaIvv6cfiSN2MoMQ== X-Google-Smtp-Source: AGHT+IGaTi9ZNq9xOes9fU6YyaVEV8EpFgnNHamMKa9cr3UKjn+KasYzgH/SPpIJ+0LGZyE7LGqO1tR+XsBjYhsh X-Received: from vsvg20.prod.google.com ([2002:a05:6102:1594:b0:4af:b35d:162c]) (user=jthoughton job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6102:5e8b:b0:4af:e5fd:77fc with SMTP id ada2fe7eead31-4b9a4ebe487mr20551082137.3.1738629661703; Mon, 03 Feb 2025 16:41:01 -0800 (PST) Date: Tue, 4 Feb 2025 00:40:35 +0000 In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250204004038.1680123-1-jthoughton@google.com> X-Mailer: git-send-email 2.48.1.362.g079036d154-goog Message-ID: <20250204004038.1680123-9-jthoughton@google.com> Subject: [PATCH v9 08/11] KVM: x86/mmu: Refactor low level rmap helpers to prep for walking w/o mmu_lock From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: David Matlack , David Rientjes , James Houghton , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Refactor the pte_list and rmap code to always read and write rmap_head->val exactly once, e.g. by collecting changes in a local variable and then propagating those changes back to rmap_head->val as appropriate. This will allow implementing a per-rmap rwlock (of sorts) by adding a LOCKED bit into the rmap value alongside the MANY bit. 
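The shape of the refactor can be summarized by the pattern below. It is purely illustrative (example_rmap_op is a made-up name, not a function this patch adds): snapshot rmap_head->val once, operate on the local copy, and write the result back exactly once:

	static void example_rmap_op(struct kvm_rmap_head *rmap_head)
	{
		unsigned long val = rmap_head->val;	/* single read */

		/* ... inspect and modify the local copy ... */

		rmap_head->val = val;			/* single write */
	}

Funneling every helper through one read and one write is what later allows the plain accesses to be swapped for lock/unlock helpers that own a LOCKED bit, without changing the surrounding logic.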
Signed-off-by: Sean Christopherson Signed-off-by: James Houghton --- arch/x86/kvm/mmu/mmu.c | 83 +++++++++++++++++++++++++----------------- 1 file changed, 50 insertions(+), 33 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f75779d8d6fd..a24cf8ddca7f 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -864,21 +864,24 @@ static struct kvm_memory_slot *gfn_to_memslot_dirty_b= itmap(struct kvm_vcpu *vcpu static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte, struct kvm_rmap_head *rmap_head) { + unsigned long old_val, new_val; struct pte_list_desc *desc; int count =3D 0; =20 - if (!rmap_head->val) { - rmap_head->val =3D (unsigned long)spte; - } else if (!(rmap_head->val & KVM_RMAP_MANY)) { + old_val =3D rmap_head->val; + + if (!old_val) { + new_val =3D (unsigned long)spte; + } else if (!(old_val & KVM_RMAP_MANY)) { desc =3D kvm_mmu_memory_cache_alloc(cache); - desc->sptes[0] =3D (u64 *)rmap_head->val; + desc->sptes[0] =3D (u64 *)old_val; desc->sptes[1] =3D spte; desc->spte_count =3D 2; desc->tail_count =3D 0; - rmap_head->val =3D (unsigned long)desc | KVM_RMAP_MANY; + new_val =3D (unsigned long)desc | KVM_RMAP_MANY; ++count; } else { - desc =3D (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc =3D (struct pte_list_desc *)(old_val & ~KVM_RMAP_MANY); count =3D desc->tail_count + desc->spte_count; =20 /* @@ -887,21 +890,25 @@ static int pte_list_add(struct kvm_mmu_memory_cache *= cache, u64 *spte, */ if (desc->spte_count =3D=3D PTE_LIST_EXT) { desc =3D kvm_mmu_memory_cache_alloc(cache); - desc->more =3D (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY= ); + desc->more =3D (struct pte_list_desc *)(old_val & ~KVM_RMAP_MANY); desc->spte_count =3D 0; desc->tail_count =3D count; - rmap_head->val =3D (unsigned long)desc | KVM_RMAP_MANY; + new_val =3D (unsigned long)desc | KVM_RMAP_MANY; + } else { + new_val =3D old_val; } desc->sptes[desc->spte_count++] =3D spte; } + + rmap_head->val =3D new_val; + return count; } =20 -static void pte_list_desc_remove_entry(struct kvm *kvm, - struct kvm_rmap_head *rmap_head, +static void pte_list_desc_remove_entry(struct kvm *kvm, unsigned long *rma= p_val, struct pte_list_desc *desc, int i) { - struct pte_list_desc *head_desc =3D (struct pte_list_desc *)(rmap_head->v= al & ~KVM_RMAP_MANY); + struct pte_list_desc *head_desc =3D (struct pte_list_desc *)(*rmap_val & = ~KVM_RMAP_MANY); int j =3D head_desc->spte_count - 1; =20 /* @@ -928,9 +935,9 @@ static void pte_list_desc_remove_entry(struct kvm *kvm, * head at the next descriptor, i.e. the new head. 
*/ if (!head_desc->more) - rmap_head->val =3D 0; + *rmap_val =3D 0; else - rmap_head->val =3D (unsigned long)head_desc->more | KVM_RMAP_MANY; + *rmap_val =3D (unsigned long)head_desc->more | KVM_RMAP_MANY; mmu_free_pte_list_desc(head_desc); } =20 @@ -938,24 +945,26 @@ static void pte_list_remove(struct kvm *kvm, u64 *spt= e, struct kvm_rmap_head *rmap_head) { struct pte_list_desc *desc; + unsigned long rmap_val; int i; =20 - if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_head->val, kvm)) - return; + rmap_val =3D rmap_head->val; + if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_val, kvm)) + goto out; =20 - if (!(rmap_head->val & KVM_RMAP_MANY)) { - if (KVM_BUG_ON_DATA_CORRUPTION((u64 *)rmap_head->val !=3D spte, kvm)) - return; + if (!(rmap_val & KVM_RMAP_MANY)) { + if (KVM_BUG_ON_DATA_CORRUPTION((u64 *)rmap_val !=3D spte, kvm)) + goto out; =20 - rmap_head->val =3D 0; + rmap_val =3D 0; } else { - desc =3D (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc =3D (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); while (desc) { for (i =3D 0; i < desc->spte_count; ++i) { if (desc->sptes[i] =3D=3D spte) { - pte_list_desc_remove_entry(kvm, rmap_head, + pte_list_desc_remove_entry(kvm, &rmap_val, desc, i); - return; + goto out; } } desc =3D desc->more; @@ -963,6 +972,9 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte, =20 KVM_BUG_ON_DATA_CORRUPTION(true, kvm); } + +out: + rmap_head->val =3D rmap_val; } =20 static void kvm_zap_one_rmap_spte(struct kvm *kvm, @@ -977,17 +989,19 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, struct kvm_rmap_head *rmap_head) { struct pte_list_desc *desc, *next; + unsigned long rmap_val; int i; =20 - if (!rmap_head->val) + rmap_val =3D rmap_head->val; + if (!rmap_val) return false; =20 - if (!(rmap_head->val & KVM_RMAP_MANY)) { - mmu_spte_clear_track_bits(kvm, (u64 *)rmap_head->val); + if (!(rmap_val & KVM_RMAP_MANY)) { + mmu_spte_clear_track_bits(kvm, (u64 *)rmap_val); goto out; } =20 - desc =3D (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc =3D (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); =20 for (; desc; desc =3D next) { for (i =3D 0; i < desc->spte_count; i++) @@ -1003,14 +1017,15 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, =20 unsigned int pte_list_count(struct kvm_rmap_head *rmap_head) { + unsigned long rmap_val =3D rmap_head->val; struct pte_list_desc *desc; =20 - if (!rmap_head->val) + if (!rmap_val) return 0; - else if (!(rmap_head->val & KVM_RMAP_MANY)) + else if (!(rmap_val & KVM_RMAP_MANY)) return 1; =20 - desc =3D (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + desc =3D (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); return desc->tail_count + desc->spte_count; } =20 @@ -1053,6 +1068,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte) */ struct rmap_iterator { /* private fields */ + struct rmap_head *head; struct pte_list_desc *desc; /* holds the sptep if not NULL */ int pos; /* index of the sptep */ }; @@ -1067,18 +1083,19 @@ struct rmap_iterator { static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head, struct rmap_iterator *iter) { + unsigned long rmap_val =3D rmap_head->val; u64 *sptep; =20 - if (!rmap_head->val) + if (!rmap_val) return NULL; =20 - if (!(rmap_head->val & KVM_RMAP_MANY)) { + if (!(rmap_val & KVM_RMAP_MANY)) { iter->desc =3D NULL; - sptep =3D (u64 *)rmap_head->val; + sptep =3D (u64 *)rmap_val; goto out; } =20 - iter->desc =3D (struct pte_list_desc *)(rmap_head->val & ~KVM_RMAP_MANY); + iter->desc =3D (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); iter->pos =3D 
0; sptep =3D iter->desc->sptes[iter->pos]; out: --=20 2.48.1.362.g079036d154-goog From nobody Sun Dec 14 06:21:57 2025 Received: from mail-vk1-f201.google.com (mail-vk1-f201.google.com [209.85.221.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BF61A14A60F for ; Tue, 4 Feb 2025 00:41:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629666; cv=none; b=l5a7RgycscbVIoHnqR6pMmsEF+SzwXPJLj4jm4bHV0gh7T778d1TijbpgHa4np+rGi6rs6P9sIQcJ83EVxnsyfKYFvOwtQ2knPhTuAh6twnx4vOJaCm240vat1pIpDcV+qxfGH7PGyZGz7X44nffRjhWpuEAKevzMiGAThwKYlw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629666; c=relaxed/simple; bh=Zy4oye91lFdBLcv+ukgYWg+1Vp5yzDO7e3LTrEFoLk8=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=XIb2J45yjB4Jna+7jzhGJBLCv8/UJ8bvlo8iEY3TNcfjePl6lAHPYti91PObgStz3U9+a3dz9ms76nlFyTjBJQDWf0Nh0RseFxZbQx5WXJ2T85VSZL3Rvh4G+PFZfHfqKHjBugjzVqPiDxq1mus5DXQsNCT/WxEGzy+RuAt3s3A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=kA5Sn7zU; arc=none smtp.client-ip=209.85.221.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="kA5Sn7zU" Received: by mail-vk1-f201.google.com with SMTP id 71dfb90a1353d-51604d32337so1006767e0c.3 for ; Mon, 03 Feb 2025 16:41:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1738629662; x=1739234462; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=eWoO+lMMXqWIzaVuugfET0/bTOkUIXS847FK2QmI/tI=; b=kA5Sn7zUMab4I3lAMkTkTosZQedg+WL1kLH87/ChT6Og4OaVKpMsEuvBZdCfnxRIOz cc6XrlcgM9HkLG5YN8h5bIJK7pKTuXw5iRKBhfhVGg+VgKhkSpqk5aisG0Ku8XXeRegN 15g2bFqyeMHtI0GzcLNGfGa4Af+qHi+n7WCykOKvv3sjVMY6T3J7rtr2b3yTTu1103QP GAXLgwlispCrjNoWHZDD3vQfh7nwiIIzs1+Chidc6LOPQLDJj44oMp2YfuPyNyO8Hpnv CAGS5cPbv/l5chByXW4dtGLdUHNg8W1Y/g8+TSPI1LTYULSr1NxZdjcmt4NtUIduLhxG xxRQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738629662; x=1739234462; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=eWoO+lMMXqWIzaVuugfET0/bTOkUIXS847FK2QmI/tI=; b=dxGUNEl9a3wPBgAoepJxipLtJNBG63QBCH9lxXFjOhoPpzbQ680xrPk1gIRJ8Fxy8a 2UkXH4Cbk17PzzB1Z1UFafZ2F06XVsykFZ2Hm1C+J8oQQsTsDNhdZ4ab53V6Rjp/xiqz /ukh3m3Qe2aalVcHlYnVyxdEiNuSNNIXMpPaEpOXnqUsIHCttWd+w24p/7WdhbWkUptL pUe+DeQTutDTg099EAAk1tdcxuLQttRSwI6K23DDVIOBD80jqCvPtUq3lOnA8fH+3uuO wNY+Bf/kqYvJQrRKK8Hnp+uLcmXRWz+Q9iLdPf5MTAJoOEjSnxNeouLt2PsmmQ+I/WGP 1NOg== X-Forwarded-Encrypted: i=1; AJvYcCW0VZjZ+9PcNGTdJGA6XeqadKLkyXFFmW+aMka6o+HFioS0ypV/Dnmimq+Ym4Qe3mfOuxvcN544QAd9aSw=@vger.kernel.org X-Gm-Message-State: AOJu0YwzSpAsxCz7L/Q+9HD/fnQhaE0OMsmhwj0fiRpNm+ZfjlcNE0jb 
+xpOHTa1x3b/MfTuEVOVGe8FI+EYOM2GVJ3M5D3mCBR+PKd9dG73pMMrteezguF9oryKBIHMLUW shOCWtnbol2qek8JKSA== X-Google-Smtp-Source: AGHT+IGwZC+pQj5KVs1BoZvl9VTW+YoD2540d5PUuKxlYB7xTw8o5JqIwG6QPjFodj3FS42DjhlUFnaztBK8rSbO X-Received: from vkbej7.prod.google.com ([2002:a05:6122:2707:b0:51c:f313:dfc6]) (user=jthoughton job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6122:240c:b0:518:859e:87c3 with SMTP id 71dfb90a1353d-51e9e3edb13mr19459670e0c.7.1738629662372; Mon, 03 Feb 2025 16:41:02 -0800 (PST) Date: Tue, 4 Feb 2025 00:40:36 +0000 In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250204004038.1680123-1-jthoughton@google.com> X-Mailer: git-send-email 2.48.1.362.g079036d154-goog Message-ID: <20250204004038.1680123-10-jthoughton@google.com> Subject: [PATCH v9 09/11] KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of mmu_lock From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: David Matlack , David Rientjes , James Houghton , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Steal another bit from rmap entries (which are word aligned pointers, i.e. have 2 free bits on 32-bit KVM, and 3 free bits on 64-bit KVM), and use the bit to implement a *very* rudimentary per-rmap spinlock. The only anticipated usage of the lock outside of mmu_lock is for aging gfns, and collisions between aging and other MMU rmap operations are quite rare, e.g. unless userspace is being silly and aging a tiny range over and over in a tight loop, time between contention when aging an actively running VM is O(seconds). In short, a more sophisticated locking scheme shouldn't be necessary. Note, the lock only protects the rmap structure itself, SPTEs that are pointed at by a locked rmap can still be modified and zapped by another task (KVM drops/zaps SPTEs before deleting the rmap entries) Signed-off-by: Sean Christopherson Co-developed-by: James Houghton Signed-off-by: James Houghton --- arch/x86/include/asm/kvm_host.h | 3 +- arch/x86/kvm/mmu/mmu.c | 107 ++++++++++++++++++++++++++++---- 2 files changed, 98 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 0e44fc1cec0d..bd18fde99116 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -28,6 +28,7 @@ #include #include #include +#include =20 #include #include @@ -406,7 +407,7 @@ union kvm_cpu_role { }; =20 struct kvm_rmap_head { - unsigned long val; + atomic_long_t val; }; =20 struct kvm_pio_request { diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index a24cf8ddca7f..267cf2d4c3e3 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -853,11 +853,95 @@ static struct kvm_memory_slot *gfn_to_memslot_dirty_b= itmap(struct kvm_vcpu *vcpu * About rmap_head encoding: * * If the bit zero of rmap_head->val is clear, then it points to the only = spte - * in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct + * in this rmap chain. Otherwise, (rmap_head->val & ~3) points to a struct * pte_list_desc containing more mappings. 
*/ #define KVM_RMAP_MANY BIT(0) =20 +/* + * rmaps and PTE lists are mostly protected by mmu_lock (the shadow MMU al= ways + * operates with mmu_lock held for write), but rmaps can be walked without + * holding mmu_lock so long as the caller can tolerate SPTEs in the rmap c= hain + * being zapped/dropped _while the rmap is locked_. + * + * Other than the KVM_RMAP_LOCKED flag, modifications to rmap entries must= be + * done while holding mmu_lock for write. This allows a task walking rmaps + * without holding mmu_lock to concurrently walk the same entries as a task + * that is holding mmu_lock but _not_ the rmap lock. Neither task will mo= dify + * the rmaps, thus the walks are stable. + * + * As alluded to above, SPTEs in rmaps are _not_ protected by KVM_RMAP_LOC= KED, + * only the rmap chains themselves are protected. E.g. holding an rmap's = lock + * ensures all "struct pte_list_desc" fields are stable. + */ +#define KVM_RMAP_LOCKED BIT(1) + +static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head) +{ + unsigned long old_val, new_val; + + /* + * Elide the lock if the rmap is empty, as lockless walkers (read-only + * mode) don't need to (and can't) walk an empty rmap, nor can they add + * entries to the rmap. I.e. the only paths that process empty rmaps + * do so while holding mmu_lock for write, and are mutually exclusive. + */ + old_val =3D atomic_long_read(&rmap_head->val); + if (!old_val) + return 0; + + do { + /* + * If the rmap is locked, wait for it to be unlocked before + * trying acquire the lock, e.g. to bounce the cache line. + */ + while (old_val & KVM_RMAP_LOCKED) { + cpu_relax(); + old_val =3D atomic_long_read(&rmap_head->val); + } + + /* + * Recheck for an empty rmap, it may have been purged by the + * task that held the lock. + */ + if (!old_val) + return 0; + + new_val =3D old_val | KVM_RMAP_LOCKED; + /* + * Use try_cmpxchg_acquire to prevent reads and writes to the rmap + * from being reordered outside of the critical section created by + * kvm_rmap_lock. + * + * Pairs with smp_store_release in kvm_rmap_unlock. + * + * For the !old_val case, no ordering is needed, as there is no rmap + * to walk. + */ + } while (!atomic_long_try_cmpxchg_acquire(&rmap_head->val, &old_val, new_= val)); + + /* Return the old value, i.e. _without_ the LOCKED bit set. */ + return old_val; +} + +static void kvm_rmap_unlock(struct kvm_rmap_head *rmap_head, + unsigned long new_val) +{ + WARN_ON_ONCE(new_val & KVM_RMAP_LOCKED); + /* + * Ensure that all accesses to the rmap have completed + * before we actually unlock the rmap. + * + * Pairs with the atomic_long_try_cmpxchg_acquire in kvm_rmap_lock. + */ + atomic_long_set_release(&rmap_head->val, new_val); +} + +static unsigned long kvm_rmap_get(struct kvm_rmap_head *rmap_head) +{ + return atomic_long_read(&rmap_head->val) & ~KVM_RMAP_LOCKED; +} + /* * Returns the number of pointers in the rmap chain, not counting the new = one. 
*/ @@ -868,7 +952,7 @@ static int pte_list_add(struct kvm_mmu_memory_cache *ca= che, u64 *spte, struct pte_list_desc *desc; int count =3D 0; =20 - old_val =3D rmap_head->val; + old_val =3D kvm_rmap_lock(rmap_head); =20 if (!old_val) { new_val =3D (unsigned long)spte; @@ -900,7 +984,7 @@ static int pte_list_add(struct kvm_mmu_memory_cache *ca= che, u64 *spte, desc->sptes[desc->spte_count++] =3D spte; } =20 - rmap_head->val =3D new_val; + kvm_rmap_unlock(rmap_head, new_val); =20 return count; } @@ -948,7 +1032,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte, unsigned long rmap_val; int i; =20 - rmap_val =3D rmap_head->val; + rmap_val =3D kvm_rmap_lock(rmap_head); if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_val, kvm)) goto out; =20 @@ -974,7 +1058,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte, } =20 out: - rmap_head->val =3D rmap_val; + kvm_rmap_unlock(rmap_head, rmap_val); } =20 static void kvm_zap_one_rmap_spte(struct kvm *kvm, @@ -992,7 +1076,7 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, unsigned long rmap_val; int i; =20 - rmap_val =3D rmap_head->val; + rmap_val =3D kvm_rmap_lock(rmap_head); if (!rmap_val) return false; =20 @@ -1011,13 +1095,13 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, } out: /* rmap_head is meaningless now, remember to reset it */ - rmap_head->val =3D 0; + kvm_rmap_unlock(rmap_head, 0); return true; } =20 unsigned int pte_list_count(struct kvm_rmap_head *rmap_head) { - unsigned long rmap_val =3D rmap_head->val; + unsigned long rmap_val =3D kvm_rmap_get(rmap_head); struct pte_list_desc *desc; =20 if (!rmap_val) @@ -1083,7 +1167,7 @@ struct rmap_iterator { static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head, struct rmap_iterator *iter) { - unsigned long rmap_val =3D rmap_head->val; + unsigned long rmap_val =3D kvm_rmap_get(rmap_head); u64 *sptep; =20 if (!rmap_val) @@ -1418,7 +1502,7 @@ static void slot_rmap_walk_next(struct slot_rmap_walk= _iterator *iterator) while (++iterator->rmap <=3D iterator->end_rmap) { iterator->gfn +=3D KVM_PAGES_PER_HPAGE(iterator->level); =20 - if (iterator->rmap->val) + if (atomic_long_read(&iterator->rmap->val)) return; } =20 @@ -2444,7 +2528,8 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct k= vm_mmu_page *sp, * avoids retaining a large number of stale nested SPs. 
*/ if (tdp_enabled && invalid_list && - child->role.guest_mode && !child->parent_ptes.val) + child->role.guest_mode && + !atomic_long_read(&child->parent_ptes.val)) return kvm_mmu_prepare_zap_page(kvm, child, invalid_list); } --=20 2.48.1.362.g079036d154-goog From nobody Sun Dec 14 06:21:57 2025 Received: from mail-vs1-f73.google.com (mail-vs1-f73.google.com [209.85.217.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6ABEF1547F0 for ; Tue, 4 Feb 2025 00:41:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.217.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629667; cv=none; b=Yi8zaiVHFHR82P2ewmqdEC/Es/4GeTzAja5ySkpBMUStaOzZ+WO6NJqeTamgvpkK0iIfwIOK5zVzWc/McH363wI/FoAcIP9/bzvOiXAUvsU+W43D9iMgthg+Jy9uf5b+AhCvchI2cQrdd5FW/0jTTczHco6IQUwnjTCPNzTVoDY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629667; c=relaxed/simple; bh=FDFCk2LogfvC6RP8YTOSiGFoYjSxrG9RdMJhIpmoHow=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=JvVLslntRIEkR2K0JQz3CMbEkw152rmVoC7LaXr97QvbzjiRebu23vykTYR4jOiFeIYhWXQYoolG/JVLWmqjvucVT2SujU7QXVQZekg6nmMfJmWf8zPijeGB/3nBnn+8nS5Gfv3rcdhUiImx12FEj5vLIH6/BorYSOaKH+tIZk8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=20rFfx47; arc=none smtp.client-ip=209.85.217.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="20rFfx47" Received: by mail-vs1-f73.google.com with SMTP id ada2fe7eead31-4b3cc537766so515721137.2 for ; Mon, 03 Feb 2025 16:41:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1738629663; x=1739234463; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=NT/1Ea4uOdT49wXiAqqe8cJlC1P1xI9VIain2UmREN0=; b=20rFfx473GOBjoNUYs4JozDIW1yedvGFdyY9+PnUN34sLDpUhoNZNDrk3NsGTX2BJy eZQZgQUPh4x0Ao3HKHj/0T97hDgMuclZVFrtinNJjfwmjTK1GBlFY+qc9ngm4mfSGdRe pWY0YfVRjGzyFL+nGBt8/LPJeCumCEoqwJ4cV0zqHQBtYOSv8aXd3h4Y5fr76IcCKfmG QE/T1jx64XD7P1XOb85oAat7S5a5ZyPA5uUmY3fn2KTFGSw1oi8edZJA05xRbQFPoFie lRQXscmyKUP7LZvynWFvcMGOW1d8uBq8SjAV41qRYE3IgkKevpS2OAb1uf1dpVeE1r7k n9lg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738629663; x=1739234463; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=NT/1Ea4uOdT49wXiAqqe8cJlC1P1xI9VIain2UmREN0=; b=pJMPIdMZFNf6oUtgFhJwHZCxoAVktYBpz/tnqkaPMmXF1o8BO9UgBqkR/66INiEa4Y Yxz1ffU1Tuldaumn5Spzq+oq9OVLeCDT20ag5bYYRT/NBvNx1ou0N3hh7sqA8qOBa73C eaUoYsKMTzlZD8+9zli5K5ylDhkB0MtXPnVFYhiCzGRRxPBSQS03QWQYDRHlV6KafEvA w6e+dEnF7Gg9rxMLCdInUoe0JnvTQQ/j5kb1kVAc0Qkko/6lwPSm0vk+ZWT6ToW6io7X b8kOAlgLVeLevowOSAOZnpbIpEZ88DHUnbvet9SdJ97vOMV8NQZB4PUtT0w2vAVZ/9Pq X1bg== X-Forwarded-Encrypted: i=1; 
AJvYcCWNNB2/amjxmZ+zEmdC/FCCIeaa42YdWyx2M2mBzzBDdKr8m6Y6tH//wact5Ok8lr9oVwlwwrZ2mJu2MkU=@vger.kernel.org X-Gm-Message-State: AOJu0Yy46nD5qhu1c+/2hCp2OjhTSvGcGB+o4ReeKsap2xJZhrz0UiQD 3AFKXP8aUN4pkVrYItzpBEseYtTstAWoEf+bsyvQK8YRaR2h0tBfE0P4y7dhVpAYuUuYWs8Mqhk aui0qOc9TIeqThSbq6Q== X-Google-Smtp-Source: AGHT+IGvEefJIQ+qh5TNzlPbwPP5FFWQIpGkJTgioLbeggYzaRqHQ2iUX4TpLwM9epwUB9f8udHnoz355QhDxYKM X-Received: from vkbci31.prod.google.com ([2002:a05:6122:321f:b0:516:25ed:28e4]) (user=jthoughton job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6102:3913:b0:4b6:20a5:8a11 with SMTP id ada2fe7eead31-4b9a4ec0890mr18276154137.1.1738629663101; Mon, 03 Feb 2025 16:41:03 -0800 (PST) Date: Tue, 4 Feb 2025 00:40:37 +0000 In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250204004038.1680123-1-jthoughton@google.com> X-Mailer: git-send-email 2.48.1.362.g079036d154-goog Message-ID: <20250204004038.1680123-11-jthoughton@google.com> Subject: [PATCH v9 10/11] KVM: x86/mmu: Add support for lockless walks of rmap SPTEs From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: David Matlack , David Rientjes , James Houghton , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Add a lockless version of for_each_rmap_spte(), which is pretty much the same as the normal version, except that it doesn't BUG() the host if a non-present SPTE is encountered. When mmu_lock is held, it should be impossible for a different task to zap a SPTE, _and_ zapped SPTEs must be removed from their rmap chain prior to dropping mmu_lock. Thus, the normal walker BUG()s if a non-present SPTE is encountered as something is wildly broken. When walking rmaps without holding mmu_lock, the SPTEs pointed at by the rmap chain can be zapped/dropped, and so a lockless walk can observe a non-present SPTE if it runs concurrently with a different operation that is zapping SPTEs. Signed-off-by: Sean Christopherson Signed-off-by: James Houghton --- arch/x86/kvm/mmu/mmu.c | 135 ++++++++++++++++++++++++++++------------- 1 file changed, 94 insertions(+), 41 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 267cf2d4c3e3..a0f735eeaaeb 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -876,10 +876,12 @@ static struct kvm_memory_slot *gfn_to_memslot_dirty_b= itmap(struct kvm_vcpu *vcpu */ #define KVM_RMAP_LOCKED BIT(1) =20 -static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head) +static unsigned long __kvm_rmap_lock(struct kvm_rmap_head *rmap_head) { unsigned long old_val, new_val; =20 + lockdep_assert_preemption_disabled(); + /* * Elide the lock if the rmap is empty, as lockless walkers (read-only * mode) don't need to (and can't) walk an empty rmap, nor can they add @@ -911,7 +913,7 @@ static unsigned long kvm_rmap_lock(struct kvm_rmap_head= *rmap_head) /* * Use try_cmpxchg_acquire to prevent reads and writes to the rmap * from being reordered outside of the critical section created by - * kvm_rmap_lock. + * __kvm_rmap_lock. * * Pairs with smp_store_release in kvm_rmap_unlock. 
* @@ -920,21 +922,42 @@ static unsigned long kvm_rmap_lock(struct kvm_rmap_he= ad *rmap_head) */ } while (!atomic_long_try_cmpxchg_acquire(&rmap_head->val, &old_val, new_= val)); =20 - /* Return the old value, i.e. _without_ the LOCKED bit set. */ + /* + * Return the old value, i.e. _without_ the LOCKED bit set. It's + * impossible for the return value to be 0 (see above), i.e. the read- + * only unlock flow can't get a false positive and fail to unlock. + */ return old_val; } =20 -static void kvm_rmap_unlock(struct kvm_rmap_head *rmap_head, - unsigned long new_val) +static unsigned long kvm_rmap_lock(struct kvm *kvm, + struct kvm_rmap_head *rmap_head) +{ + lockdep_assert_held_write(&kvm->mmu_lock); + + return __kvm_rmap_lock(rmap_head); +} + +static void __kvm_rmap_unlock(struct kvm_rmap_head *rmap_head, + unsigned long val) { - WARN_ON_ONCE(new_val & KVM_RMAP_LOCKED); + KVM_MMU_WARN_ON(val & KVM_RMAP_LOCKED); /* * Ensure that all accesses to the rmap have completed * before we actually unlock the rmap. * - * Pairs with the atomic_long_try_cmpxchg_acquire in kvm_rmap_lock. + * Pairs with the atomic_long_try_cmpxchg_acquire in __kvm_rmap_lock. */ - atomic_long_set_release(&rmap_head->val, new_val); + atomic_long_set_release(&rmap_head->val, val); +} + +static void kvm_rmap_unlock(struct kvm *kvm, + struct kvm_rmap_head *rmap_head, + unsigned long new_val) +{ + lockdep_assert_held_write(&kvm->mmu_lock); + + __kvm_rmap_unlock(rmap_head, new_val); } =20 static unsigned long kvm_rmap_get(struct kvm_rmap_head *rmap_head) @@ -942,17 +965,49 @@ static unsigned long kvm_rmap_get(struct kvm_rmap_hea= d *rmap_head) return atomic_long_read(&rmap_head->val) & ~KVM_RMAP_LOCKED; } =20 +/* + * If mmu_lock isn't held, rmaps can only be locked in read-only mode. The + * actual locking is the same, but the caller is disallowed from modifying= the + * rmap, and so the unlock flow is a nop if the rmap is/was empty. + */ +__maybe_unused +static unsigned long kvm_rmap_lock_readonly(struct kvm_rmap_head *rmap_hea= d) +{ + unsigned long rmap_val; + + preempt_disable(); + rmap_val =3D __kvm_rmap_lock(rmap_head); + + if (!rmap_val) + preempt_enable(); + + return rmap_val; +} + +__maybe_unused +static void kvm_rmap_unlock_readonly(struct kvm_rmap_head *rmap_head, + unsigned long old_val) +{ + if (!old_val) + return; + + KVM_MMU_WARN_ON(old_val !=3D kvm_rmap_get(rmap_head)); + + __kvm_rmap_unlock(rmap_head, old_val); + preempt_enable(); +} + /* * Returns the number of pointers in the rmap chain, not counting the new = one. 
*/ -static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte, - struct kvm_rmap_head *rmap_head) +static int pte_list_add(struct kvm *kvm, struct kvm_mmu_memory_cache *cach= e, + u64 *spte, struct kvm_rmap_head *rmap_head) { unsigned long old_val, new_val; struct pte_list_desc *desc; int count =3D 0; =20 - old_val =3D kvm_rmap_lock(rmap_head); + old_val =3D kvm_rmap_lock(kvm, rmap_head); =20 if (!old_val) { new_val =3D (unsigned long)spte; @@ -984,7 +1039,7 @@ static int pte_list_add(struct kvm_mmu_memory_cache *c= ache, u64 *spte, desc->sptes[desc->spte_count++] =3D spte; } =20 - kvm_rmap_unlock(rmap_head, new_val); + kvm_rmap_unlock(kvm, rmap_head, new_val); =20 return count; } @@ -1032,7 +1087,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spt= e, unsigned long rmap_val; int i; =20 - rmap_val =3D kvm_rmap_lock(rmap_head); + rmap_val =3D kvm_rmap_lock(kvm, rmap_head); if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_val, kvm)) goto out; =20 @@ -1058,7 +1113,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spt= e, } =20 out: - kvm_rmap_unlock(rmap_head, rmap_val); + kvm_rmap_unlock(kvm, rmap_head, rmap_val); } =20 static void kvm_zap_one_rmap_spte(struct kvm *kvm, @@ -1076,7 +1131,7 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, unsigned long rmap_val; int i; =20 - rmap_val =3D kvm_rmap_lock(rmap_head); + rmap_val =3D kvm_rmap_lock(kvm, rmap_head); if (!rmap_val) return false; =20 @@ -1095,7 +1150,7 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm, } out: /* rmap_head is meaningless now, remember to reset it */ - kvm_rmap_unlock(rmap_head, 0); + kvm_rmap_unlock(kvm, rmap_head, 0); return true; } =20 @@ -1168,23 +1223,18 @@ static u64 *rmap_get_first(struct kvm_rmap_head *rm= ap_head, struct rmap_iterator *iter) { unsigned long rmap_val =3D kvm_rmap_get(rmap_head); - u64 *sptep; =20 if (!rmap_val) return NULL; =20 if (!(rmap_val & KVM_RMAP_MANY)) { iter->desc =3D NULL; - sptep =3D (u64 *)rmap_val; - goto out; + return (u64 *)rmap_val; } =20 iter->desc =3D (struct pte_list_desc *)(rmap_val & ~KVM_RMAP_MANY); iter->pos =3D 0; - sptep =3D iter->desc->sptes[iter->pos]; -out: - BUG_ON(!is_shadow_present_pte(*sptep)); - return sptep; + return iter->desc->sptes[iter->pos]; } =20 /* @@ -1194,14 +1244,11 @@ static u64 *rmap_get_first(struct kvm_rmap_head *rm= ap_head, */ static u64 *rmap_get_next(struct rmap_iterator *iter) { - u64 *sptep; - if (iter->desc) { if (iter->pos < PTE_LIST_EXT - 1) { ++iter->pos; - sptep =3D iter->desc->sptes[iter->pos]; - if (sptep) - goto out; + if (iter->desc->sptes[iter->pos]) + return iter->desc->sptes[iter->pos]; } =20 iter->desc =3D iter->desc->more; @@ -1209,20 +1256,24 @@ static u64 *rmap_get_next(struct rmap_iterator *ite= r) if (iter->desc) { iter->pos =3D 0; /* desc->sptes[0] cannot be NULL */ - sptep =3D iter->desc->sptes[iter->pos]; - goto out; + return iter->desc->sptes[iter->pos]; } } =20 return NULL; -out: - BUG_ON(!is_shadow_present_pte(*sptep)); - return sptep; } =20 -#define for_each_rmap_spte(_rmap_head_, _iter_, _spte_) \ - for (_spte_ =3D rmap_get_first(_rmap_head_, _iter_); \ - _spte_; _spte_ =3D rmap_get_next(_iter_)) +#define __for_each_rmap_spte(_rmap_head_, _iter_, _sptep_) \ + for (_sptep_ =3D rmap_get_first(_rmap_head_, _iter_); \ + _sptep_; _sptep_ =3D rmap_get_next(_iter_)) + +#define for_each_rmap_spte(_rmap_head_, _iter_, _sptep_) \ + __for_each_rmap_spte(_rmap_head_, _iter_, _sptep_) \ + if (!WARN_ON_ONCE(!is_shadow_present_pte(*(_sptep_)))) \ + +#define for_each_rmap_spte_lockless(_rmap_head_, _iter_, 
_sptep_, _spte_) \ + __for_each_rmap_spte(_rmap_head_, _iter_, _sptep_) \ + if (is_shadow_present_pte(_spte_ =3D mmu_spte_get_lockless(sptep))) =20 static void drop_spte(struct kvm *kvm, u64 *sptep) { @@ -1308,12 +1359,13 @@ static bool __rmap_clear_dirty(struct kvm *kvm, str= uct kvm_rmap_head *rmap_head, struct rmap_iterator iter; bool flush =3D false; =20 - for_each_rmap_spte(rmap_head, &iter, sptep) + for_each_rmap_spte(rmap_head, &iter, sptep) { if (spte_ad_need_write_protect(*sptep)) flush |=3D test_and_clear_bit(PT_WRITABLE_SHIFT, (unsigned long *)sptep); else flush |=3D spte_clear_dirty(sptep); + } =20 return flush; } @@ -1634,7 +1686,7 @@ static void __rmap_add(struct kvm *kvm, kvm_update_page_stats(kvm, sp->role.level, 1); =20 rmap_head =3D gfn_to_rmap(gfn, sp->role.level, slot); - rmap_count =3D pte_list_add(cache, spte, rmap_head); + rmap_count =3D pte_list_add(kvm, cache, spte, rmap_head); =20 if (rmap_count > kvm->stat.max_mmu_rmap_size) kvm->stat.max_mmu_rmap_size =3D rmap_count; @@ -1768,13 +1820,14 @@ static unsigned kvm_page_table_hashfn(gfn_t gfn) return hash_64(gfn, KVM_MMU_HASH_SHIFT); } =20 -static void mmu_page_add_parent_pte(struct kvm_mmu_memory_cache *cache, +static void mmu_page_add_parent_pte(struct kvm *kvm, + struct kvm_mmu_memory_cache *cache, struct kvm_mmu_page *sp, u64 *parent_pte) { if (!parent_pte) return; =20 - pte_list_add(cache, parent_pte, &sp->parent_ptes); + pte_list_add(kvm, cache, parent_pte, &sp->parent_ptes); } =20 static void mmu_page_remove_parent_pte(struct kvm *kvm, struct kvm_mmu_pag= e *sp, @@ -2464,7 +2517,7 @@ static void __link_shadow_page(struct kvm *kvm, =20 mmu_spte_set(sptep, spte); =20 - mmu_page_add_parent_pte(cache, sp, sptep); + mmu_page_add_parent_pte(kvm, cache, sp, sptep); =20 /* * The non-direct sub-pagetable must be updated before linking. 
For --=20 2.48.1.362.g079036d154-goog From nobody Sun Dec 14 06:21:57 2025 Received: from mail-vk1-f202.google.com (mail-vk1-f202.google.com [209.85.221.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0A3E9156677 for ; Tue, 4 Feb 2025 00:41:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629667; cv=none; b=K3HzqoF6Ece/zdQ2h+n67IEgF0vxHR0yFG2YwWw3lVl2eqX5XLI+9y4ftN+UjIVgjhSCQdTWbRJkleb2QqmOuYiivrd7MYwHRm4/v22lZNA5Xh0srnyJ0UeU2EMkWDnlC9/UNpY8Oabo62y8MuA3b1vya1Z2MGdBWRBSTZ1CXL8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738629667; c=relaxed/simple; bh=4Z6U9GTWfR1nveiuKtqo1MRye+TyhaPepZk533SRaN4=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=ZG1sNbMxBX/phUmwWR+NZ/sBMOzacf35rYe56qQPoJ7I66nYONxeoYWoo5CDjDG+mP3bz6QWPQrMyyGE2SjYwH6K9hUqHpAkxjK8KjojYUwTH5A8fqtaYGEUA4FeyV2smbN4B4yhD8qb0vUAujvUnQOAnczuq9neV5LmsZMHWiI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=TYFGe7OZ; arc=none smtp.client-ip=209.85.221.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="TYFGe7OZ" Received: by mail-vk1-f202.google.com with SMTP id 71dfb90a1353d-51885f2c9f8so848421e0c.0 for ; Mon, 03 Feb 2025 16:41:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1738629664; x=1739234464; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=XB20a67KR7DeYrSjHAqgweuJ5njCwGEox7XlvgNvZEY=; b=TYFGe7OZBfZnGZf5CSawINtJLZ4PTjs67v6LBzo+duSCaRk2ol7qcmXEi8RrF4DJ44 T3gFntTiMAfa/1L22ctUBAMy4vazUecogMd9V8mZbBNYdBUzwySTlPzdB6lMRPLAhGFC wxwZILNMw751kiY03IFhwNHP1iYGxJahQYc2PZYuJzTjD2BzA3vrWDcsnu260lBiVd+d Rqg92QpyZOACVWzW0o8qzsUkyHtFbfnbdzbnGLH93zbTl4bScXOeIZ1JWCBVLrrHmMPD CvcgXYsdZ9/DtVt+HeYztQlCxq1im4Ioio/bb8GPLczFlF27okag0ANyhQKcsWvakjzO 60DQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738629664; x=1739234464; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=XB20a67KR7DeYrSjHAqgweuJ5njCwGEox7XlvgNvZEY=; b=AuWVMY8b1T9p9WpGkLCrm0bfOr1+E8YR1OFjqCW/QXiEvyBIQ/zeEYi8vxUO5S+PGc AWwCobusAZhyD6zndI5X3o65f0LPSdAyK0uMaYrtZ/U3uNHsLf8as2Q8FPJoluzPEY1l j2i0LcZFyKcJE24C7wHhiG3A3HfaOlV9L+f1AElMam7owHqoF00+a0G/hT0OHAWgJknN fSQoexXGeB6vjMLoEGbkPu9j1g+KMjZyQcFXef/gG3hQ7YAg5aZfXpDw5USAXUbfgHoH wRByOf1ywyMUkENCTUxa8FZ79rTiO46AzYfxPZAiy/ClMzFDS/I9uAyUIHxaD0IknYGf 3aYA== X-Forwarded-Encrypted: i=1; AJvYcCW2rL+wsL7ZlV/0RKWzf5ckYCWZA5Yrf8TcrSolmV4pikghjJ8GebPnIDUzX+T/968P3I7go9SdPOsAUJM=@vger.kernel.org X-Gm-Message-State: AOJu0Yzy8cjwYhtVv+OTygqoOrRjV5dU4vdBnyI3SkCppUDucXn+q6NB qpDYX4n9rc0T/AVr7Bwz3PKM3TlfZJOtozk7c0ap31JEbpaN1KBgv6FxcAbUrncsJeqiuyZH0xL V/Zx+4d2RX/OZnLQwmg== 
X-Google-Smtp-Source: AGHT+IHkxO6ZPBe74K2xXD+6ZgmR8X9CtK0Avp1jbltMoCBw3n9EM24JEDU/Ye39c2lV2gucwTZ1OATmaKLU8JCe X-Received: from vkbfi24.prod.google.com ([2002:a05:6122:4d18:b0:51a:e48:fdff]) (user=jthoughton job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6122:1999:b0:51b:8949:c996 with SMTP id 71dfb90a1353d-51e9e5195e7mr18189806e0c.9.1738629663888; Mon, 03 Feb 2025 16:41:03 -0800 (PST) Date: Tue, 4 Feb 2025 00:40:38 +0000 In-Reply-To: <20250204004038.1680123-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250204004038.1680123-1-jthoughton@google.com> X-Mailer: git-send-email 2.48.1.362.g079036d154-goog Message-ID: <20250204004038.1680123-12-jthoughton@google.com> Subject: [PATCH v9 11/11] KVM: x86/mmu: Support rmap walks without holding mmu_lock when aging gfns From: James Houghton To: Sean Christopherson , Paolo Bonzini Cc: David Matlack , David Rientjes , James Houghton , Marc Zyngier , Oliver Upton , Wei Xu , Yu Zhao , Axel Rasmussen , kvm@vger.kernel.org, linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson When A/D bits are supported on sptes, it is safe to simply clear the Accessed bits. The less obvious case is marking sptes for access tracking in the non-A/D case (for EPT only). In this case, we have to be sure that it is okay for TLB entries to exist for non-present sptes. For example, when doing dirty tracking, if we come across a non-present SPTE, we need to know that we need to do a TLB invalidation. This case is already supported today (as we already support *not* doing TLBIs for clear_young(); there is a separate notifier for clearing *and* flushing, clear_flush_young()). This works today because GET_DIRTY_LOG flushes the TLB before returning to userspace. Signed-off-by: Sean Christopherson Co-developed-by: James Houghton Signed-off-by: James Houghton --- arch/x86/kvm/mmu/mmu.c | 72 +++++++++++++++++++++++------------------- 1 file changed, 39 insertions(+), 33 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index a0f735eeaaeb..57b99daa8614 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -970,7 +970,6 @@ static unsigned long kvm_rmap_get(struct kvm_rmap_head = *rmap_head) * actual locking is the same, but the caller is disallowed from modifying= the * rmap, and so the unlock flow is a nop if the rmap is/was empty. 
*/ -__maybe_unused static unsigned long kvm_rmap_lock_readonly(struct kvm_rmap_head *rmap_hea= d) { unsigned long rmap_val; @@ -984,7 +983,6 @@ static unsigned long kvm_rmap_lock_readonly(struct kvm_= rmap_head *rmap_head) return rmap_val; } =20 -__maybe_unused static void kvm_rmap_unlock_readonly(struct kvm_rmap_head *rmap_head, unsigned long old_val) { @@ -1705,37 +1703,48 @@ static void rmap_add(struct kvm_vcpu *vcpu, const s= truct kvm_memory_slot *slot, } =20 static bool kvm_rmap_age_gfn_range(struct kvm *kvm, - struct kvm_gfn_range *range, bool test_only) + struct kvm_gfn_range *range, + bool test_only) { - struct slot_rmap_walk_iterator iterator; + struct kvm_rmap_head *rmap_head; struct rmap_iterator iter; + unsigned long rmap_val; bool young =3D false; u64 *sptep; + gfn_t gfn; + int level; + u64 spte; =20 - for_each_slot_rmap_range(range->slot, PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL, - range->start, range->end - 1, &iterator) { - for_each_rmap_spte(iterator.rmap, &iter, sptep) { - u64 spte =3D *sptep; + for (level =3D PG_LEVEL_4K; level <=3D KVM_MAX_HUGEPAGE_LEVEL; level++) { + for (gfn =3D range->start; gfn < range->end; + gfn +=3D KVM_PAGES_PER_HPAGE(level)) { + rmap_head =3D gfn_to_rmap(gfn, level, range->slot); + rmap_val =3D kvm_rmap_lock_readonly(rmap_head); =20 - if (!is_accessed_spte(spte)) - continue; + for_each_rmap_spte_lockless(rmap_head, &iter, sptep, spte) { + if (!is_accessed_spte(spte)) + continue; + + if (test_only) { + kvm_rmap_unlock_readonly(rmap_head, rmap_val); + return true; + } =20 - if (test_only) - return true; - - if (spte_ad_enabled(spte)) { - clear_bit((ffs(shadow_accessed_mask) - 1), - (unsigned long *)sptep); - } else { - /* - * WARN if mmu_spte_update() signals the need - * for a TLB flush, as Access tracking a SPTE - * should never trigger an _immediate_ flush. - */ - spte =3D mark_spte_for_access_track(spte); - WARN_ON_ONCE(mmu_spte_update(sptep, spte)); + if (spte_ad_enabled(spte)) + clear_bit((ffs(shadow_accessed_mask) - 1), + (unsigned long *)sptep); + else + /* + * If the following cmpxchg fails, the + * spte is being concurrently modified + * and should most likely stay young. + */ + cmpxchg64(sptep, spte, + mark_spte_for_access_track(spte)); + young =3D true; } - young =3D true; + + kvm_rmap_unlock_readonly(rmap_head, rmap_val); } } return young; @@ -1753,11 +1762,8 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_ran= ge *range) if (tdp_mmu_enabled) young =3D kvm_tdp_mmu_age_gfn_range(kvm, range); =20 - if (kvm_may_have_shadow_mmu_sptes(kvm)) { - write_lock(&kvm->mmu_lock); + if (kvm_may_have_shadow_mmu_sptes(kvm)) young |=3D kvm_rmap_age_gfn_range(kvm, range, false); - write_unlock(&kvm->mmu_lock); - } =20 return young; } @@ -1769,11 +1775,11 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_g= fn_range *range) if (tdp_mmu_enabled) young =3D kvm_tdp_mmu_test_age_gfn(kvm, range); =20 - if (!young && kvm_may_have_shadow_mmu_sptes(kvm)) { - write_lock(&kvm->mmu_lock); + if (young) + return young; + + if (kvm_may_have_shadow_mmu_sptes(kvm)) young |=3D kvm_rmap_age_gfn_range(kvm, range, true); - write_unlock(&kvm->mmu_lock); - } =20 return young; } --=20 2.48.1.362.g079036d154-goog
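For reference, the per-SPTE action inside kvm_rmap_age_gfn_range() after this patch boils down to the following condensed sketch of the hunk above (same helpers as in the diff, shown here only as a summary):

	/* Under kvm_rmap_lock_readonly(), for each accessed SPTE: */
	if (spte_ad_enabled(spte))
		/* A/D bits present: clearing the Accessed bit is always safe. */
		clear_bit(ffs(shadow_accessed_mask) - 1, (unsigned long *)sptep);
	else
		/*
		 * Access tracking (EPT without A/D bits): if the cmpxchg loses
		 * a race with a concurrent update, the SPTE simply stays young.
		 */
		cmpxchg64(sptep, spte, mark_spte_for_access_track(spte));
	young = true;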