From nobody Sat Oct 4 00:26:54 2025
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com
Cc: peterx@redhat.com, rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Yan Zhao
Subject: [PATCH v3 1/3] KVM: Do not reset dirty GFNs in a memslot not enabling dirty tracking
Date: Fri, 22 Aug 2025 16:02:03 +0800
Message-ID: <20250822080203.27247-1-yan.y.zhao@intel.com>
In-Reply-To: <20250822080100.27218-1-yan.y.zhao@intel.com>
References: <20250822080100.27218-1-yan.y.zhao@intel.com>

Do not allow resetting dirty GFNs in memslots that do not enable dirty
tracking.

vCPUs' dirty rings are shared between userspace and KVM.
After KVM sets dirtied entries in the dirty rings, userspace is
responsible for harvesting/resetting these entries and calling the ioctl
KVM_RESET_DIRTY_RINGS to inform KVM to advance the reset_index in the
dirty rings and invoke kvm_arch_mmu_enable_log_dirty_pt_masked() to
clear the SPTEs' dirty bits or perform write protection of the GFNs.

Although KVM does not set dirty entries for GFNs in a memslot that does
not enable dirty tracking, userspace can write arbitrary data into the
dirty ring. This makes it possible for misbehaving userspace to specify
that it has harvested a GFN from such a memslot. When this happens, KVM
will be asked to clear dirty bits or perform write protection for GFNs
in a memslot that does not enable dirty tracking, which is undesirable.

For TDX, this unexpected resetting of dirty GFNs could cause
inconsistency between the mirror SPTE and the external SPTE in hardware
(e.g., the mirror SPTE has no write bit while the external SPTE is
writable). When kvm_dirty_log_manual_protect_and_init_set() is true and
huge pages are enabled in TDX, this could even lead to
kvm_mmu_slot_gfn_write_protect() being called and trigger KVM_BUG_ON()
due to permission reduction changes in the huge mirror SPTEs.

Signed-off-by: Yan Zhao
---
 virt/kvm/dirty_ring.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 02bc6b00d76c..b38b4b7d7667 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -63,7 +63,13 @@ static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
 
 	memslot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
 
-	if (!memslot || (offset + __fls(mask)) >= memslot->npages)
+	/*
+	 * Userspace can write arbitrary data into the dirty ring, making it
+	 * possible for misbehaving userspace to try to reset an out-of-memslot
+	 * GFN or a GFN in a memslot that isn't being dirty-logged.
+	 */
+	if (!memslot || (offset + __fls(mask)) >= memslot->npages ||
+	    !kvm_slot_dirty_track_enabled(memslot))
 		return;
 
 	KVM_MMU_LOCK(kvm);
-- 
2.43.2
From nobody Sat Oct 4 00:26:54 2025
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com
Cc: peterx@redhat.com, rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Yan Zhao
Subject: [PATCH v3 2/3] KVM: Skip invoking shared memory handler for entirely private GFN ranges
Date: Fri, 22 Aug 2025 16:02:34 +0800
Message-ID: <20250822080235.27274-1-yan.y.zhao@intel.com>
In-Reply-To: <20250822080100.27218-1-yan.y.zhao@intel.com>
References: <20250822080100.27218-1-yan.y.zhao@intel.com>

When a GFN range is entirely private, it's unnecessary for
kvm_handle_hva_range() to invoke handlers for the GFN range, because
1) the gfn_range.attr_filter for the handler is KVM_FILTER_SHARED, which
is for shared mappings only; 2) KVM has already zapped all shared
mappings before setting the memory attribute to private.

This avoids unnecessary zaps of private mappings for VMs of type
KVM_X86_SW_PROTECTED_VM, e.g., during auto NUMA balancing scans of VMAs.

Signed-off-by: Yan Zhao
---
 virt/kvm/kvm_main.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f769d1dccc21..e615ad405ce4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -620,6 +620,17 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
 			gfn_range.slot = slot;
 			gfn_range.lockless = range->lockless;
 
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+			/*
+			 * If the GFN range is entirely private, there is no
+			 * need to invoke the handler.
+			 */
+			if (kvm_range_has_memory_attributes(kvm, gfn_range.start,
+							    gfn_range.end, ~0,
+							    KVM_MEMORY_ATTRIBUTE_PRIVATE))
+				continue;
+#endif
+
 			if (!r.found_memslot) {
 				r.found_memslot = true;
 				if (!range->lockless) {
-- 
2.43.2
From nobody Sat Oct 4 00:26:54 2025
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com
Cc: peterx@redhat.com, rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Yan Zhao
Subject: [PATCH v3 3/3] KVM: selftests: Test resetting dirty ring in gmem slots in protected VMs
Date: Fri, 22 Aug 2025 16:03:04 +0800
Message-ID: <20250822080304.27304-1-yan.y.zhao@intel.com>
In-Reply-To: <20250822080100.27218-1-yan.y.zhao@intel.com>
References: <20250822080100.27218-1-yan.y.zhao@intel.com>

Test resetting the dirty ring in slots with the KVM_MEM_GUEST_MEMFD flag
in KVM_X86_SW_PROTECTED_VM VMs. The test purposely resets dirty ring
entries incorrectly so that they point to a gmem slot.

Unlike in TDX VMs, where resetting the dirty ring in a gmem slot could
trigger KVM_BUG_ON(), there are no obvious errors for
KVM_X86_SW_PROTECTED_VM VMs. Therefore, detect SPTE changes by reading
trace messages with the kvm_tdp_mmu_spte_changed event enabled.
Consequently, the test is conducted only when tdp_mmu is enabled and
tracing is available.
Signed-off-by: Yan Zhao
---
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../kvm/x86/reset_dirty_ring_on_gmem_test.c   | 392 ++++++++++++++++++
 2 files changed, 393 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86/reset_dirty_ring_on_gmem_test.c

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index f6fe7a07a0a2..ebd1d829c3f9 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -136,6 +136,7 @@ TEST_GEN_PROGS_x86 += x86/max_vcpuid_cap_test
 TEST_GEN_PROGS_x86 += x86/triple_fault_event_test
 TEST_GEN_PROGS_x86 += x86/recalc_apic_map_test
 TEST_GEN_PROGS_x86 += x86/aperfmperf_test
+TEST_GEN_PROGS_x86 += x86/reset_dirty_ring_on_gmem_test
 TEST_GEN_PROGS_x86 += access_tracking_perf_test
 TEST_GEN_PROGS_x86 += coalesced_io_test
 TEST_GEN_PROGS_x86 += dirty_log_perf_test
diff --git a/tools/testing/selftests/kvm/x86/reset_dirty_ring_on_gmem_test.c b/tools/testing/selftests/kvm/x86/reset_dirty_ring_on_gmem_test.c
new file mode 100644
index 000000000000..cf1746c0149f
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/reset_dirty_ring_on_gmem_test.c
@@ -0,0 +1,392 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Test resetting the dirty ring on a gmem slot on x86.
+ * Copyright (C) 2025, Intel, Inc.
+ *
+ * The slot flag KVM_MEM_GUEST_MEMFD is incompatible with the flag
+ * KVM_MEM_LOG_DIRTY_PAGES, which means KVM does not permit dirty page
+ * tracking on gmem slots.
+ *
+ * When the dirty ring is enabled, although KVM does not mark GFNs in gmem
+ * slots as dirty, userspace can reset and write arbitrary data into the
+ * dirty ring entries shared between KVM and userspace. This can lead KVM to
+ * incorrectly clear write permission or dirty bits on SPTEs of gmem slots.
+ *
+ * While this might be harmless for non-TDX VMs, it could cause
+ * inconsistencies between the mirror SPTEs and the external SPTEs in
+ * hardware, or even trigger KVM_BUG_ON() for TDX.
+ *
+ * Purposely reset the dirty ring incorrectly on gmem slots (gmem slots do
+ * not allow dirty page tracking) to verify that misbehaving userspace cannot
+ * cause any SPTE permission reduction change.
+ *
+ * Steps conducted in this test:
+ * 1. echo nop > ${TRACING_ROOT}/current_tracer
+ *    echo 1 > ${TRACING_ROOT}/events/kvmmmu/kvm_tdp_mmu_spte_changed/enable
+ *    echo > ${TRACING_ROOT}/set_event_pid
+ *    echo > ${TRACING_ROOT}/set_event_notrace_pid
+ *
+ * 2. echo "common_pid == $tid && gfn == 0xc0400" > \
+ *    ${TRACING_ROOT}/events/kvmmmu/kvm_tdp_mmu_spte_changed/filter
+ *
+ * 3. echo 0 > ${TRACING_ROOT}/tracing_on
+ *    echo > ${TRACING_ROOT}/trace
+ *    echo 1 > ${TRACING_ROOT}/tracing_on
+ *
+ * 4. purposely reset the dirty ring incorrectly
+ *
+ * 5. cat ${TRACING_ROOT}/trace
+ */
+#include <fcntl.h>
+#include <pthread.h>
+#include "kvm_util.h"
+#include "processor.h"
+#include "ucall_common.h"
+
+#define DEBUGFS "/sys/kernel/debug/tracing"
+#define TRACEFS "/sys/kernel/tracing"
+
+#define TEST_DIRTY_RING_GPA (0xc0400000)
+#define TEST_DIRTY_RING_GVA (0x90400000)
+#define TEST_DIRTY_RING_REGION_SLOT 11
+#define TEST_DIRTY_RING_REGION_SIZE 0x200000
+#define TEST_DIRTY_RING_COUNT 4096
+#define TEST_DIRTY_RING_GUEST_WRITE_MAX_CNT 3
+
+static const char *PATTERN = "spte_changed";
+static char *tracing_root;
+
+static int open_path(char *subpath, int flags)
+{
+	static char path[100];
+	int count, fd;
+
+	count = snprintf(path, sizeof(path), "%s/%s", tracing_root, subpath);
+	TEST_ASSERT(count > 0, "Incorrect path\n");
+	fd = open(path, flags);
+	TEST_ASSERT(fd >= 0, "Cannot open %s\n", path);
+
+	return fd;
+}
+
+static void setup_tracing(void)
+{
+	int fd;
+
+	/* set current_tracer to nop */
+	fd = open_path("current_tracer", O_WRONLY);
+	test_write(fd, "nop\n", 4);
+	close(fd);
+
+	/* turn on event kvm_tdp_mmu_spte_changed */
+	fd = open_path("events/kvmmmu/kvm_tdp_mmu_spte_changed/enable", O_WRONLY);
+	test_write(fd, "1\n", 2);
+	close(fd);
+
+	/* clear set_event_pid & set_event_notrace_pid */
+	fd = open_path("set_event_pid", O_WRONLY | O_TRUNC);
+	close(fd);
+
+	fd = open_path("set_event_notrace_pid", O_WRONLY | O_TRUNC);
+	close(fd);
+}
+
+static void filter_event(void)
+{
+	int count, fd;
+	char buf[100];
+
+	fd = open_path("events/kvmmmu/kvm_tdp_mmu_spte_changed/filter",
+		       O_WRONLY | O_TRUNC);
+
+	count = snprintf(buf, sizeof(buf), "common_pid == %d && gfn == 0x%x\n",
+			 gettid(), TEST_DIRTY_RING_GPA >> PAGE_SHIFT);
+	TEST_ASSERT(count > 0, "Incorrect number of data written\n");
+	test_write(fd, buf, count);
+	close(fd);
+}
+
+static void enable_tracing(bool enable)
+{
+	char *val = enable ? "1\n" : "0\n";
+	int fd;
+
+	if (enable) {
+		/* clear trace log before enabling */
+		fd = open_path("trace", O_WRONLY | O_TRUNC);
+		close(fd);
+	}
+
+	fd = open_path("tracing_on", O_WRONLY);
+	test_write(fd, val, 2);
+	close(fd);
+}
+
+static void reset_tracing(void)
+{
+	enable_tracing(false);
+	enable_tracing(true);
+}
+
+static void detect_spte_change(void)
+{
+	static char buf[1024];
+	FILE *file;
+	int count;
+
+	count = snprintf(buf, sizeof(buf), "%s/trace", tracing_root);
+	TEST_ASSERT(count > 0, "Incorrect path\n");
+	file = fopen(buf, "r");
+	TEST_ASSERT(file, "Cannot open %s\n", buf);
+
+	while (fgets(buf, sizeof(buf), file))
+		TEST_ASSERT(!strstr(buf, PATTERN), "Unexpected SPTE change %s\n", buf);
+
+	fclose(file);
+}
+
+/*
+ * Write to a gmem slot and exit to host after each write to allow host to
+ * check the dirty ring.
+ */
+void guest_code(void)
+{
+	uint64_t count = 0;
+
+	while (count < TEST_DIRTY_RING_GUEST_WRITE_MAX_CNT) {
+		count++;
+		memset((void *)TEST_DIRTY_RING_GVA, 1, 8);
+		GUEST_SYNC(count);
+	}
+	GUEST_DONE();
+}
+
+/*
+ * Verify that KVM_MEM_LOG_DIRTY_PAGES cannot be set on a memslot with flag
+ * KVM_MEM_GUEST_MEMFD.
+ */
+static void verify_turn_on_log_dirty_pages_flag(struct kvm_vcpu *vcpu)
+{
+	struct userspace_mem_region *region;
+	int ret;
+
+	region = memslot2region(vcpu->vm, TEST_DIRTY_RING_REGION_SLOT);
+	region->region.flags |= KVM_MEM_LOG_DIRTY_PAGES;
+
+	ret = __vm_ioctl(vcpu->vm, KVM_SET_USER_MEMORY_REGION2, &region->region);
+
+	TEST_ASSERT(ret, "KVM_SET_USER_MEMORY_REGION2 incorrectly succeeds\n");
+	region->region.flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
+}
+
+static inline bool dirty_gfn_is_dirtied(struct kvm_dirty_gfn *gfn)
+{
+	return smp_load_acquire(&gfn->flags) == KVM_DIRTY_GFN_F_DIRTY;
+}
+
+static inline void dirty_gfn_set_collected(struct kvm_dirty_gfn *gfn)
+{
+	smp_store_release(&gfn->flags, KVM_DIRTY_GFN_F_RESET);
+}
+
+static bool dirty_ring_empty(struct kvm_vcpu *vcpu)
+{
+	struct kvm_dirty_gfn *dirty_gfns = vcpu_map_dirty_ring(vcpu);
+	struct kvm_dirty_gfn *cur;
+	int i;
+
+	for (i = 0; i < TEST_DIRTY_RING_COUNT; i++) {
+		cur = &dirty_gfns[i];
+
+		if (dirty_gfn_is_dirtied(cur))
+			return false;
+	}
+	return true;
+}
+
+/*
+ * Purposely reset the dirty ring incorrectly by resetting a dirty ring entry
+ * even when KVM does not report the entry as dirty.
+ *
+ * In the kvm_dirty_gfn entry, specify the slot to the gmem slot that does
+ * not allow dirty page tracking and has no flag KVM_MEM_LOG_DIRTY_PAGES.
+ */
+static void reset_dirty_ring(struct kvm_vcpu *vcpu, int *reset_index)
+{
+	struct kvm_dirty_gfn *dirty_gfns = vcpu_map_dirty_ring(vcpu);
+	struct kvm_dirty_gfn *cur = &dirty_gfns[*reset_index];
+	uint32_t cleared;
+
+	reset_tracing();
+
+	cur->slot = TEST_DIRTY_RING_REGION_SLOT;
+	cur->offset = 0;
+	dirty_gfn_set_collected(cur);
+	cleared = kvm_vm_reset_dirty_ring(vcpu->vm);
+	*reset_index += cleared;
+	TEST_ASSERT(cleared == 1, "Unexpected cleared count %d\n", cleared);
+
+	detect_spte_change();
+}
+
+/*
+ * The vCPU worker to loop vcpu_run(). After each vCPU access to a GFN, check
+ * if the dirty ring is empty and reset the dirty ring.
+ */
+static void reset_dirty_ring_worker(struct kvm_vcpu *vcpu)
+{
+	struct kvm_run *run = vcpu->run;
+	struct ucall uc;
+	uint64_t cmd;
+	int index = 0;
+
+	filter_event();
+	while (1) {
+		vcpu_run(vcpu);
+
+		if (run->exit_reason == KVM_EXIT_IO) {
+			cmd = get_ucall(vcpu, &uc);
+			if (cmd != UCALL_SYNC)
+				break;
+
+			TEST_ASSERT(dirty_ring_empty(vcpu),
+				    "Guest write should not cause GFN dirty\n");
+
+			reset_dirty_ring(vcpu, &index);
+		}
+	}
+}
+
+static struct kvm_vm *create_vm(unsigned long vm_type, struct kvm_vcpu **vcpu,
+				bool private)
+{
+	unsigned int npages = TEST_DIRTY_RING_REGION_SIZE / getpagesize();
+	const struct vm_shape shape = {
+		.mode = VM_MODE_DEFAULT,
+		.type = vm_type,
+	};
+	struct kvm_vm *vm;
+
+	vm = __vm_create(shape, 1, 0);
+	vm_enable_dirty_ring(vm, TEST_DIRTY_RING_COUNT * sizeof(struct kvm_dirty_gfn));
+	*vcpu = vm_vcpu_add(vm, 0, guest_code);
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+				    TEST_DIRTY_RING_GPA,
+				    TEST_DIRTY_RING_REGION_SLOT,
+				    npages, KVM_MEM_GUEST_MEMFD);
+	vm->memslots[MEM_REGION_TEST_DATA] = TEST_DIRTY_RING_REGION_SLOT;
+	virt_map(vm, TEST_DIRTY_RING_GVA, TEST_DIRTY_RING_GPA, npages);
+	if (private)
+		vm_mem_set_private(vm, TEST_DIRTY_RING_GPA,
+				   TEST_DIRTY_RING_REGION_SIZE);
+	return vm;
+}
+
+struct test_config {
+	unsigned long vm_type;
+	bool manual_protect_and_init_set;
+	bool private_access;
+	char *test_desc;
+};
+
+void test_dirty_ring_on_gmem_slot(struct test_config *config)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+
+	if (config->vm_type &&
+	    !(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(config->vm_type))) {
+		ksft_test_result_skip("\n");
+		return;
+	}
+
+	vm = create_vm(config->vm_type, &vcpu, config->private_access);
+
+	/*
+	 * Let KVM detect that kvm_dirty_log_manual_protect_and_init_set() is
+	 * true in kvm_arch_mmu_enable_log_dirty_pt_masked() to check if
+	 * kvm_mmu_slot_gfn_write_protect() will be called on a gmem memslot.
+	 */
+	if (config->manual_protect_and_init_set) {
+		u64 manual_caps;
+
+		manual_caps = kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
+
+		manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
+				KVM_DIRTY_LOG_INITIALLY_SET);
+
+		if (!manual_caps)
+			return;
+
+		vm_enable_cap(vm, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2, manual_caps);
+	}
+
+	verify_turn_on_log_dirty_pages_flag(vcpu);
+
+	reset_dirty_ring_worker(vcpu);
+
+	kvm_vm_free(vm);
+	ksft_test_result_pass("\n");
+}
+
+static bool dirty_ring_supported(void)
+{
+	return (kvm_has_cap(KVM_CAP_DIRTY_LOG_RING) ||
+		kvm_has_cap(KVM_CAP_DIRTY_LOG_RING_ACQ_REL));
+}
+
+static bool has_tracing(void)
+{
+	if (faccessat(AT_FDCWD, DEBUGFS, F_OK, AT_EACCESS) == 0) {
+		tracing_root = DEBUGFS;
+		return true;
+	}
+
+	if (faccessat(AT_FDCWD, TRACEFS, F_OK, AT_EACCESS) == 0) {
+		tracing_root = TRACEFS;
+		return true;
+	}
+
+	return false;
+}
+
+static struct test_config tests[] = {
+	{
+		.vm_type = KVM_X86_SW_PROTECTED_VM,
+		.manual_protect_and_init_set = false,
+		.private_access = true,
+		.test_desc = "SW_PROTECTED_VM, manual_protect_and_init_set=false, private access",
+	},
+	{
+		.vm_type = KVM_X86_SW_PROTECTED_VM,
+		.manual_protect_and_init_set = true,
+		.private_access = true,
+		.test_desc = "SW_PROTECTED_VM, manual_protect_and_init_set=true, private access",
+	},
+};
+
+int main(int argc, char **argv)
+{
+	int test_cnt = ARRAY_SIZE(tests);
+
+	ksft_print_header();
+	ksft_set_plan(test_cnt);
+
+	TEST_REQUIRE(get_kvm_param_bool("tdp_mmu"));
+	TEST_REQUIRE(has_tracing());
+	TEST_REQUIRE(dirty_ring_supported());
+
+	setup_tracing();
+
+	for (int i = 0; i < test_cnt; i++) {
+		pthread_t vm_thread;
+
+		pthread_create(&vm_thread, NULL,
+			       (void *(*)(void *))test_dirty_ring_on_gmem_slot,
+			       &tests[i]);
+		pthread_join(vm_thread, NULL);
+	}
+
+	ksft_finished();
+	return 0;
+}
-- 
2.43.2