From nobody Sun Feb 8 00:00:14 2026
Reply-To: Sean Christopherson
Date: Wed, 10 Jan 2024 18:00:41 -0800
In-Reply-To: <20240111020048.844847-1-seanjc@google.com>
References: <20240111020048.844847-1-seanjc@google.com>
Message-ID: <20240111020048.844847-2-seanjc@google.com>
Subject: [PATCH 1/8] KVM: x86/mmu: Zap invalidated TDP MMU roots at 4KiB granularity
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack, Pattara Teerapong
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Zap invalidated TDP MMU roots at maximum granularity, i.e. with more
frequent conditional resched checkpoints, in order to avoid running for an
extended duration (milliseconds, or worse) without honoring a reschedule
request.  And for kernels running with full or real-time preempt models,
zapping at 4KiB granularity also provides significantly reduced latency for
other tasks that are contending for mmu_lock (which isn't necessarily an
overall win for KVM, but KVM should do its best to honor the kernel's
preemption model).
To preserve KVM's assertion that zapping at 1GiB granularity is
functionally ok (the main reason 1GiB was selected in the past), skip
straight to zapping at 1GiB if KVM is configured to prove the MMU.
Zapping roots is far more common than a vCPU replacing a 1GiB page table
with a hugepage, e.g. it generally happens multiple times during boot, and
so keeping the test coverage provided by root zaps is desirable, just not
for production.

Cc: David Matlack
Cc: Pattara Teerapong
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 6ae19b4ee5b1..372da098d3ce 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -734,15 +734,26 @@ static void tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
 	rcu_read_lock();
 
 	/*
-	 * To avoid RCU stalls due to recursively removing huge swaths of SPs,
-	 * split the zap into two passes.  On the first pass, zap at the 1gb
-	 * level, and then zap top-level SPs on the second pass.  "1gb" is not
-	 * arbitrary, as KVM must be able to zap a 1gb shadow page without
-	 * inducing a stall to allow in-place replacement with a 1gb hugepage.
+	 * Zap roots in multiple passes of decreasing granularity, i.e. zap at
+	 * 4KiB=>2MiB=>1GiB=>root, in order to better honor need_resched() (all
+	 * preempt models) or mmu_lock contention (full or real-time models).
+	 * Zapping at finer granularity marginally increases the total time of
+	 * the zap, but in most cases the zap itself isn't latency sensitive.
 	 *
-	 * Because zapping a SP recurses on its children, stepping down to
-	 * PG_LEVEL_4K in the iterator itself is unnecessary.
+	 * If KVM is configured to prove the MMU, skip the 4KiB and 2MiB zaps
+	 * in order to mimic the page fault path, which can replace a 1GiB page
+	 * table with an equivalent 1GiB hugepage, i.e. can get saddled with
+	 * zapping a 1GiB region that's fully populated with 4KiB SPTEs.  This
+	 * allows verifying that KVM can safely zap 1GiB regions, e.g. without
+	 * inducing RCU stalls, without relying on a relatively rare event
+	 * (zapping roots is orders of magnitude more common).  Note, because
+	 * zapping a SP recurses on its children, stepping down to PG_LEVEL_4K
+	 * in the iterator itself is unnecessary.
 	 */
+	if (!IS_ENABLED(CONFIG_KVM_PROVE_MMU)) {
+		__tdp_mmu_zap_root(kvm, root, shared, PG_LEVEL_4K);
+		__tdp_mmu_zap_root(kvm, root, shared, PG_LEVEL_2M);
+	}
 	__tdp_mmu_zap_root(kvm, root, shared, PG_LEVEL_1G);
 	__tdp_mmu_zap_root(kvm, root, shared, root->role.level);
-- 
2.43.0.275.g3460e3d667-goog

From nobody Sun Feb 8 00:00:14 2026
Reply-To: Sean Christopherson
Date: Wed, 10 Jan 2024 18:00:42 -0800
In-Reply-To: <20240111020048.844847-1-seanjc@google.com>
References: <20240111020048.844847-1-seanjc@google.com>
Message-ID: <20240111020048.844847-3-seanjc@google.com>
Subject: [PATCH 2/8] KVM: x86/mmu: Don't do TLB flush when zapping SPTEs in invalid roots
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack, Pattara Teerapong
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Don't force a TLB flush when zapping SPTEs in invalid roots, as vCPUs
can't be actively using invalid roots (zapping SPTEs in invalid roots is
necessary only to ensure KVM doesn't mark a page accessed/dirty after it
is freed by the primary MMU).

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 372da098d3ce..68920877370b 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -811,7 +811,13 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
 			continue;
 
 		tdp_mmu_iter_set_spte(kvm, &iter, 0);
-		flush = true;
+
+		/*
+		 * Zapping SPTEs in invalid roots doesn't require a TLB flush,
+		 * see kvm_tdp_mmu_zap_invalidated_roots() for details.
+		 */
+		if (!root->role.invalid)
+			flush = true;
 	}
 
 	rcu_read_unlock();
-- 
2.43.0.275.g3460e3d667-goog

From nobody Sun Feb 8 00:00:14 2026
Reply-To: Sean Christopherson
Date: Wed, 10 Jan 2024 18:00:43 -0800
In-Reply-To: <20240111020048.844847-1-seanjc@google.com>
References: <20240111020048.844847-1-seanjc@google.com>
Message-ID: <20240111020048.844847-4-seanjc@google.com>
Subject: [PATCH 3/8] KVM: x86/mmu: Allow passing '-1' for "all" as_id for TDP MMU iterators
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack, Pattara Teerapong
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Modify for_each_tdp_mmu_root() and __for_each_tdp_mmu_root_yield_safe() to
accept -1 for _as_id to mean "process all memslot address spaces".  That
way code that wants to process both SMM and !SMM doesn't need to iterate
over roots twice (and likely copy+paste code in the process).

Deliberately don't cast _as_id to an "int", just in case not casting helps
the compiler elide the "_as_id >= 0" check when being passed an unsigned
value, e.g. from a memslot.

No functional change intended.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 68920877370b..60fff2aad59e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -149,11 +149,11 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
  * If shared is set, this function is operating under the MMU lock in read
  * mode.
  */
-#define __for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _only_valid)\
-	for (_root = tdp_mmu_next_root(_kvm, NULL, _only_valid);	\
-	     ({ lockdep_assert_held(&(_kvm)->mmu_lock); }), _root;	\
-	     _root = tdp_mmu_next_root(_kvm, _root, _only_valid))	\
-		if (kvm_mmu_page_as_id(_root) != _as_id) {		\
+#define __for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _only_valid) \
+	for (_root = tdp_mmu_next_root(_kvm, NULL, _only_valid);	\
+	     ({ lockdep_assert_held(&(_kvm)->mmu_lock); }), _root;	\
+	     _root = tdp_mmu_next_root(_kvm, _root, _only_valid))	\
+		if (_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) {	\
 		} else
 
 #define for_each_valid_tdp_mmu_root_yield_safe(_kvm, _root, _as_id)	\
@@ -171,10 +171,10 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
  * Holding mmu_lock for write obviates the need for RCU protection as the list
  * is guaranteed to be stable.
  */
-#define for_each_tdp_mmu_root(_kvm, _root, _as_id)			\
-	list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link)	\
-		if (kvm_lockdep_assert_mmu_lock_held(_kvm, false) &&	\
-		    kvm_mmu_page_as_id(_root) != _as_id) {		\
+#define for_each_tdp_mmu_root(_kvm, _root, _as_id)			\
+	list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link)	\
+		if (kvm_lockdep_assert_mmu_lock_held(_kvm, false) &&	\
+		    _as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) {	\
 		} else
 
 static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
-- 
2.43.0.275.g3460e3d667-goog

From nobody Sun Feb 8 00:00:14 2026
Reply-To: Sean Christopherson
Date: Wed, 10 Jan 2024 18:00:44 -0800
In-Reply-To: <20240111020048.844847-1-seanjc@google.com>
References: <20240111020048.844847-1-seanjc@google.com>
Message-ID: <20240111020048.844847-5-seanjc@google.com>
Subject: [PATCH 4/8] KVM: x86/mmu: Skip invalid roots when zapping leaf SPTEs for GFN range
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack, Pattara Teerapong
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

When zapping a GFN in response to an APICv or MTRR change, don't zap SPTEs
for invalid roots, as KVM only needs to ensure the guest can't use stale
mappings for the GFN.  Unlike kvm_tdp_mmu_unmap_gfn_range(), which must
zap "unreachable" SPTEs to ensure KVM doesn't mark a page accessed/dirty,
kvm_tdp_mmu_zap_leafs() isn't used (and isn't intended to be used) to
handle freeing of host memory.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 60fff2aad59e..1a9c16e5c287 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -830,16 +830,16 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
 }
 
 /*
- * Zap leaf SPTEs for the range of gfns, [start, end), for all roots. Returns
- * true if a TLB flush is needed before releasing the MMU lock, i.e. if one or
- * more SPTEs were zapped since the MMU lock was last acquired.
+ * Zap leaf SPTEs for the range of gfns, [start, end), for all *VALID* roots.
+ * Returns true if a TLB flush is needed before releasing the MMU lock, i.e. if
+ * one or more SPTEs were zapped since the MMU lock was last acquired.
  */
 bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, gfn_t start, gfn_t end, bool flush)
 {
 	struct kvm_mmu_page *root;
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
-	for_each_tdp_mmu_root_yield_safe(kvm, root)
+	for_each_valid_tdp_mmu_root_yield_safe(kvm, root, -1)
 		flush = tdp_mmu_zap_leafs(kvm, root, start, end, true, flush);
 
 	return flush;
-- 
2.43.0.275.g3460e3d667-goog

From nobody Sun Feb 8 00:00:14 2026
Reply-To: Sean Christopherson
Date: Wed, 10 Jan 2024 18:00:45 -0800
In-Reply-To: <20240111020048.844847-1-seanjc@google.com>
References: <20240111020048.844847-1-seanjc@google.com>
Message-ID: <20240111020048.844847-6-seanjc@google.com>
Subject: [PATCH 5/8] KVM: x86/mmu: Skip invalid TDP MMU roots when write-protecting SPTEs
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack, Pattara Teerapong
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

When write-protecting SPTEs, don't process invalid roots, as invalid roots
are unreachable, i.e. can't be used to access guest memory and thus don't
need to be write-protected.

Note, this is *almost* a nop for kvm_tdp_mmu_clear_dirty_pt_masked(),
which is called under slots_lock, i.e. is mutually exclusive with
kvm_mmu_zap_all_fast().  But it's possible for something other than the
"fast zap" thread to grab a reference to an invalid root and thus keep a
root alive (but completely empty) after kvm_mmu_zap_all_fast() completes.

The kvm_tdp_mmu_write_protect_gfn() case is more interesting, as KVM
write-protects SPTEs for reasons other than dirty logging, e.g. if KVM
creates a SPTE for a nested VM while a fast zap is in-progress.

Add another TDP MMU iterator to visit only valid roots, and
opportunistically convert kvm_tdp_mmu_get_vcpu_root_hpa() to said
iterator.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1a9c16e5c287..e0a8343f66dc 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -171,12 +171,19 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
  * Holding mmu_lock for write obviates the need for RCU protection as the list
  * is guaranteed to be stable.
  */
-#define for_each_tdp_mmu_root(_kvm, _root, _as_id)			\
+#define __for_each_tdp_mmu_root(_kvm, _root, _as_id, _only_valid)	\
 	list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link)	\
 		if (kvm_lockdep_assert_mmu_lock_held(_kvm, false) &&	\
-		    _as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) {	\
+		    ((_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) ||	\
+		     ((_only_valid) && (_root)->role.invalid))) {	\
 		} else
 
+#define for_each_tdp_mmu_root(_kvm, _root, _as_id)		\
+	__for_each_tdp_mmu_root(_kvm, _root, _as_id, false)
+
+#define for_each_valid_tdp_mmu_root(_kvm, _root, _as_id)	\
+	__for_each_tdp_mmu_root(_kvm, _root, _as_id, true)
+
 static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu_page *sp;
@@ -224,11 +231,8 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
-	/*
-	 * Check for an existing root before allocating a new one.  Note, the
-	 * role check prevents consuming an invalid root.
-	 */
-	for_each_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role)) {
+	/* Check for an existing root before allocating a new one. */
+	for_each_valid_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role)) {
 		if (root->role.word == role.word &&
 		    kvm_tdp_mmu_get_root(root))
 			goto out;
@@ -1639,7 +1643,7 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 {
 	struct kvm_mmu_page *root;
 
-	for_each_tdp_mmu_root(kvm, root, slot->as_id)
+	for_each_valid_tdp_mmu_root(kvm, root, slot->as_id)
 		clear_dirty_pt_masked(kvm, root, gfn, mask, wrprot);
 }
 
@@ -1757,7 +1761,7 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,
 	bool spte_set = false;
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
-	for_each_tdp_mmu_root(kvm, root, slot->as_id)
+	for_each_valid_tdp_mmu_root(kvm, root, slot->as_id)
 		spte_set |= write_protect_gfn(kvm, root, gfn, min_level);
 
 	return spte_set;
-- 
2.43.0.275.g3460e3d667-goog

From nobody Sun Feb 8 00:00:14 2026
Reply-To: Sean Christopherson
Date: Wed, 10 Jan 2024 18:00:46 -0800
In-Reply-To: <20240111020048.844847-1-seanjc@google.com>
References: <20240111020048.844847-1-seanjc@google.com>
Message-ID: <20240111020048.844847-7-seanjc@google.com>
Subject: [PATCH 6/8] KVM: x86/mmu: Check for usable TDP MMU root while holding mmu_lock for read
From: Sean Christopherson
To: Sean Christopherson,
    Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack, Pattara Teerapong
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

When allocating a new TDP MMU root, check for a usable root while holding
mmu_lock for read, and only acquire mmu_lock for write if a new root needs
to be created.  There is no need to serialize other MMU operations if a
vCPU is simply grabbing a reference to an existing root; holding mmu_lock
for write is "necessary" (spoiler alert, it's not strictly necessary) only
to ensure KVM doesn't end up with duplicate roots.

Allowing vCPUs to get "new" roots in parallel is beneficial to VM boot and
to setups that frequently delete memslots, i.e. which force all vCPUs to
reload all roots.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c     |  8 ++---
 arch/x86/kvm/mmu/tdp_mmu.c | 60 +++++++++++++++++++++++++++++++-------
 arch/x86/kvm/mmu/tdp_mmu.h |  2 +-
 3 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3c844e428684..ea18aca23196 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3693,15 +3693,15 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	unsigned i;
 	int r;
 
+	if (tdp_mmu_enabled)
+		return kvm_tdp_mmu_alloc_root(vcpu);
+
 	write_lock(&vcpu->kvm->mmu_lock);
 	r = make_mmu_pages_available(vcpu);
 	if (r < 0)
 		goto out_unlock;
 
-	if (tdp_mmu_enabled) {
-		root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
-		mmu->root.hpa = root;
-	} else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
+	if (shadow_root_level >= PT64_ROOT_4LEVEL) {
 		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level);
 		mmu->root.hpa = root;
 	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index e0a8343f66dc..9a8250a14fc1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -223,21 +223,52 @@ static void tdp_mmu_init_child_sp(struct kvm_mmu_page *child_sp,
 	tdp_mmu_init_sp(child_sp, iter->sptep, iter->gfn, role);
 }
 
-hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
+static struct kvm_mmu_page *kvm_tdp_mmu_try_get_root(struct kvm_vcpu *vcpu)
 {
 	union kvm_mmu_page_role role = vcpu->arch.mmu->root_role;
+	int as_id = kvm_mmu_role_as_id(role);
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_page *root;
 
-	lockdep_assert_held_write(&kvm->mmu_lock);
-
-	/* Check for an existing root before allocating a new one. */
-	for_each_valid_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role)) {
-		if (root->role.word == role.word &&
-		    kvm_tdp_mmu_get_root(root))
-			goto out;
+	for_each_valid_tdp_mmu_root_yield_safe(kvm, root, as_id) {
+		if (root->role.word == role.word)
+			return root;
 	}
 
+	return NULL;
+}
+
+int kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu)
+{
+	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	union kvm_mmu_page_role role = mmu->root_role;
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_mmu_page *root;
+
+	/*
+	 * Check for an existing root while holding mmu_lock for read to avoid
+	 * unnecessary serialization if multiple vCPUs are loading a new root.
+	 * E.g. when bringing up secondary vCPUs, KVM will already have created
+	 * a valid root on behalf of the primary vCPU.
+	 */
+	read_lock(&kvm->mmu_lock);
+	root = kvm_tdp_mmu_try_get_root(vcpu);
+	read_unlock(&kvm->mmu_lock);
+
+	if (root)
+		goto out;
+
+	write_lock(&kvm->mmu_lock);
+
+	/*
+	 * Recheck for an existing root after acquiring mmu_lock for write.  It
+	 * is possible a new usable root was created between dropping mmu_lock
+	 * (for read) and acquiring it for write.
+ */ + root =3D kvm_tdp_mmu_try_get_root(vcpu); + if (root) + goto out_unlock; + root =3D tdp_mmu_alloc_sp(vcpu); tdp_mmu_init_sp(root, NULL, 0, role); =20 @@ -254,8 +285,17 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *v= cpu) list_add_rcu(&root->link, &kvm->arch.tdp_mmu_roots); spin_unlock(&kvm->arch.tdp_mmu_pages_lock); =20 +out_unlock: + write_unlock(&kvm->mmu_lock); out: - return __pa(root->spt); + /* + * Note, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS will prevent entering the guest + * and actually consuming the root if it's invalidated after dropping + * mmu_lock, and the root can't be freed as this vCPU holds a reference. + */ + mmu->root.hpa =3D __pa(root->spt); + mmu->root.pgd =3D 0; + return 0; } =20 static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, @@ -917,7 +957,7 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm) * the VM is being destroyed). * * Note, kvm_tdp_mmu_zap_invalidated_roots() is gifted the TDP MMU's refer= ence. - * See kvm_tdp_mmu_get_vcpu_root_hpa(). + * See kvm_tdp_mmu_alloc_root(). 
  */
 void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 20d97aa46c49..6e1ea04ca885 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -10,7 +10,7 @@
 void kvm_mmu_init_tdp_mmu(struct kvm *kvm);
 void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm);
 
-hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu);
+int kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu);
 
 __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)
 {
-- 
2.43.0.275.g3460e3d667-goog
Reply-To: Sean Christopherson
Date: Wed, 10 Jan 2024 18:00:47 -0800
In-Reply-To: <20240111020048.844847-1-seanjc@google.com>
References: <20240111020048.844847-1-seanjc@google.com>
Message-ID: <20240111020048.844847-8-seanjc@google.com>
Subject: [PATCH 7/8] KVM: x86/mmu: Alloc TDP MMU roots while holding mmu_lock for read
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack, Pattara Teerapong

Allocate TDP MMU roots while
holding mmu_lock for read, and instead use tdp_mmu_pages_lock to guard
against duplicate roots.  This allows KVM to create new roots without
forcing kvm_tdp_mmu_zap_invalidated_roots() to yield, e.g. allows vCPUs
to load new roots after memslot deletion without forcing the zap thread
to detect contention and yield (or complete if the kernel isn't
preemptible).

Note, creating a new TDP MMU root as an mmu_lock reader is safe for two
reasons: (1) paths that must guarantee all roots/SPTEs are *visited* take
mmu_lock for write and so are still mutually exclusive, e.g. mmu_notifier
invalidations, and (2) paths that require all roots/SPTEs to *observe*
some given state without holding mmu_lock for write must ensure freshness
through some other means, e.g. toggling dirty logging must first wait for
SRCU readers to recognize the memslot flags change before processing
existing roots/SPTEs.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 55 +++++++++++++++-----------------------
 1 file changed, 22 insertions(+), 33 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 9a8250a14fc1..d078157e62aa 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -223,51 +223,42 @@ static void tdp_mmu_init_child_sp(struct kvm_mmu_page *child_sp,
 	tdp_mmu_init_sp(child_sp, iter->sptep, iter->gfn, role);
 }
 
-static struct kvm_mmu_page *kvm_tdp_mmu_try_get_root(struct kvm_vcpu *vcpu)
-{
-	union kvm_mmu_page_role role = vcpu->arch.mmu->root_role;
-	int as_id = kvm_mmu_role_as_id(role);
-	struct kvm *kvm = vcpu->kvm;
-	struct kvm_mmu_page *root;
-
-	for_each_valid_tdp_mmu_root_yield_safe(kvm, root, as_id) {
-		if (root->role.word == role.word)
-			return root;
-	}
-
-	return NULL;
-}
-
 int kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	union kvm_mmu_page_role role = mmu->root_role;
+	int as_id = kvm_mmu_role_as_id(role);
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_page *root;
 
 	/*
-	 * Check for an existing root while holding mmu_lock for read to avoid
+	 * Check for an existing root before acquiring the pages lock to avoid
 	 * unnecessary serialization if multiple vCPUs are loading a new root.
 	 * E.g. when bringing up secondary vCPUs, KVM will already have created
 	 * a valid root on behalf of the primary vCPU.
 	 */
 	read_lock(&kvm->mmu_lock);
-	root = kvm_tdp_mmu_try_get_root(vcpu);
-	read_unlock(&kvm->mmu_lock);
 
-	if (root)
-		goto out;
+	for_each_valid_tdp_mmu_root_yield_safe(kvm, root, as_id) {
+		if (root->role.word == role.word)
+			goto out_read_unlock;
+	}
 
-	write_lock(&kvm->mmu_lock);
+	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 
 	/*
-	 * Recheck for an existing root after acquiring mmu_lock for write.  It
-	 * is possible a new usable root was created between dropping mmu_lock
-	 * (for read) and acquiring it for write.
+	 * Recheck for an existing root after acquiring the pages lock, another
+	 * vCPU may have raced ahead and created a new usable root.  Manually
+	 * walk the list of roots as the standard macros assume that the pages
+	 * lock is *not* held.  WARN if grabbing a reference to a usable root
+	 * fails, as the last reference to a root can only be put *after* the
+	 * root has been invalidated, which requires holding mmu_lock for write.
 	 */
-	root = kvm_tdp_mmu_try_get_root(vcpu);
-	if (root)
-		goto out_unlock;
+	list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
+		if (root->role.word == role.word &&
+		    !WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root)))
+			goto out_spin_unlock;
+	}
 
 	root = tdp_mmu_alloc_sp(vcpu);
 	tdp_mmu_init_sp(root, NULL, 0, role);
@@ -280,14 +271,12 @@ int kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu)
 	 * is ultimately put by kvm_tdp_mmu_zap_invalidated_roots().
 	 */
 	refcount_set(&root->tdp_mmu_root_count, 2);
-
-	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 	list_add_rcu(&root->link, &kvm->arch.tdp_mmu_roots);
-	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 
-out_unlock:
-	write_unlock(&kvm->mmu_lock);
-out:
+out_spin_unlock:
+	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+out_read_unlock:
+	read_unlock(&kvm->mmu_lock);
 	/*
 	 * Note, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS will prevent entering the guest
 	 * and actually consuming the root if it's invalidated after dropping
-- 
2.43.0.275.g3460e3d667-goog
Reply-To: Sean Christopherson
Date: Wed, 10 Jan 2024 18:00:48 -0800
In-Reply-To: <20240111020048.844847-1-seanjc@google.com>
References: <20240111020048.844847-1-seanjc@google.com>
Message-ID: <20240111020048.844847-9-seanjc@google.com>
Subject: [PATCH 8/8] KVM: x86/mmu: Free TDP MMU roots while holding mmu_lock for read
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack, Pattara Teerapong

Free TDP MMU roots from vCPU context while holding mmu_lock for read; it
is completely legal to invoke
kvm_tdp_mmu_put_root() as a reader.  This eliminates the last mmu_lock
writer in the TDP MMU's "fast zap" path after requesting vCPUs to reload
roots, i.e. allows KVM to zap invalidated roots, free obsolete roots, and
allocate new roots in parallel.

On large VMs, e.g. 100+ vCPUs, allowing the bulk of the "fast zap"
operation to run in parallel with freeing and allocating roots reduces the
worst case latency for a vCPU to reload a root from 2-3ms to <100us.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ea18aca23196..90773cdb73bb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3575,10 +3575,14 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 	if (WARN_ON_ONCE(!sp))
 		return;
 
-	if (is_tdp_mmu_page(sp))
+	if (is_tdp_mmu_page(sp)) {
+		lockdep_assert_held_read(&kvm->mmu_lock);
 		kvm_tdp_mmu_put_root(kvm, sp);
-	else if (!--sp->root_count && sp->role.invalid)
-		kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
+	} else {
+		lockdep_assert_held_write(&kvm->mmu_lock);
+		if (!--sp->root_count && sp->role.invalid)
+			kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
+	}
 
 	*root_hpa = INVALID_PAGE;
 }
@@ -3587,6 +3591,7 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
 			ulong roots_to_free)
 {
+	bool is_tdp_mmu = tdp_mmu_enabled && mmu->root_role.direct;
 	int i;
 	LIST_HEAD(invalid_list);
 	bool free_active_root;
@@ -3609,7 +3614,10 @@ void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
 		return;
 	}
 
-	write_lock(&kvm->mmu_lock);
+	if (is_tdp_mmu)
+		read_lock(&kvm->mmu_lock);
+	else
+		write_lock(&kvm->mmu_lock);
 
 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
 		if (roots_to_free & KVM_MMU_ROOT_PREVIOUS(i))
@@ -3635,8 +3643,13 @@ void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
 		mmu->root.pgd = 0;
 	}
 
-	kvm_mmu_commit_zap_page(kvm, &invalid_list);
-	write_unlock(&kvm->mmu_lock);
+	if (is_tdp_mmu) {
+		read_unlock(&kvm->mmu_lock);
+		WARN_ON_ONCE(!list_empty(&invalid_list));
+	} else {
+		kvm_mmu_commit_zap_page(kvm, &invalid_list);
+		write_unlock(&kvm->mmu_lock);
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_free_roots);
 
-- 
2.43.0.275.g3460e3d667-goog