From nobody Mon Feb 9 06:25:01 2026 Received: from mail-qk1-f201.google.com (mail-qk1-f201.google.com [209.85.222.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 75DD31F03FE for ; Wed, 4 Dec 2024 19:14:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733339663; cv=none; b=najhQ9ww8p02QQ+TNtH8y7g/SPHbf3xCRUBFIOtFPD7LZNkP+D3gFOTbIZaxu0CU6nHJrFrUUGFgZlPbS4OmPUYhH9w3/KqxQgIQYkQR3hx8OQkK3rGeawwIQ/AxPZvfpw0xmqyXzFfnOfH+x1fkqaYIOmqyZB0vaI0h0k7gCtY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733339663; c=relaxed/simple; bh=XRqLRNPGp4B36Newh1hGDaGyYURJiyoWnca3tYrEiRo=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=ebkQS3q3elEy+w8oX/nSxX+6m5DZCd6gyanA1itaPgTm8ZuI+yQIkwgq7qax6/G6LTKojSnxx9LBh8UiW8tpLsTIWuDWDGX4X5X21QGtzR6b4yGRz8fGBsJBcv2FoYttm/EGREKvR03JTMAA9Q0EZE7R3G5bUG/q9RCj5dlvyEc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=jOAxqi7F; arc=none smtp.client-ip=209.85.222.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jthoughton.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="jOAxqi7F" Received: by mail-qk1-f201.google.com with SMTP id af79cd13be357-7b67201e64eso12125685a.0 for ; Wed, 04 Dec 2024 11:14:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1733339660; x=1733944460; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=tFj98HJEqxlM2yS1YScvzRxxJLlbahOUdL6PgedEnEc=; b=jOAxqi7FUI8TWziOOADTcMJ+ff3QMXEPr5+U2QMXBrOHGN8+j/HLOsH2fLdk9ZHLvg A/zhy5KpMHbODaovtM7BBdDtJaD3vl4aMPs9CnaYQO2ud/dAV9/GN8mCA9XeUhMkIdTj oZs/otNILoTrv99SbEkkPlf3dXGRonlZQjWyIYPAn05CsfxOEJhve+1b/oEbBb50tyN4 tc2qadAlZmM+mamEd7uci3YFrd/5XNE5ZxDkuQPZWFk9x/rAAeVsLlyRkgIRkSYIE6R+ zcySB3ktJ9wQX/6Mf/ra/K1uTAvkNfOODxQz/10WL4tYuAfJKFnIW7SlGtyAR8jv2VJC 47Hw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733339660; x=1733944460; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=tFj98HJEqxlM2yS1YScvzRxxJLlbahOUdL6PgedEnEc=; b=slewNZ77nWiheAk2svaCcCM4GMc5/zu8XEGWMlCrTYlrVEbiCMhMZe3PqGGgx7ySQ5 JLnVXuNnhdZhtW9Jl092HHT75b7FMP1PfvlxBFf7/lpLFfY+WHPf8pfIFC9c8h90C/eh 5tqoFEL1W3qrpdZqay36QSwccMFWOCXGsdJIWpAPcfDdz/bWyHGKGIFSZWzlfdQ0mQ6A HUAr7LOxuwvLV+nY2WmYqBy/AVcoIJkD0b7s9BIgEfsVT822zMD7o7PQe9gpsoTlUMZk iYZh0fBxgrggOP8HEYiHLiu/MBMM5ObgkLRI6o0h2q6DlIPxOgsirT7yr+E6+Ig0iapZ FZKg== X-Forwarded-Encrypted: i=1; AJvYcCXfj2w9qOwBPBQ8pWGDp3fYu6M7FOoAahmDqqvsuOvQSnMuz/W91dc/KvNkm0Nw41yq7PXzAx+Vnr/2xiY=@vger.kernel.org X-Gm-Message-State: AOJu0Yw6Udk/l2TaLkeDnQixAoZIoP8mJijuNPZSd6kD8waNxcanzKn6 xuHwWv11YpifzaDd82skuPUQiaIGXx0vqjznKMXR3WwH0fYm4zqPKlQgGjd0IBPhpIJbaq0nazk DPGo/fO37Tp7diBNeaA== X-Google-Smtp-Source: AGHT+IEjAFXSEtfmGOodffFw31TXE80vwkJvDLAFbZ3yUQDUssUZgM+UMQkcUbluyRy21fc1aO4FFwapwVcSSi9j X-Received: from qtbci19.prod.google.com ([2002:a05:622a:2613:b0:462:aa34:aae2]) (user=jthoughton job=prod-delivery.src-stubby-dispatcher) by 2002:a05:620a:3188:b0:7b6:73f5:2867 with SMTP id af79cd13be357-7b6a61c819cmr926705785a.44.1733339660438; Wed, 04 Dec 2024 11:14:20 -0800 (PST) Date: Wed, 4 Dec 2024 19:13:40 +0000 In-Reply-To: <20241204191349.1730936-1-jthoughton@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241204191349.1730936-1-jthoughton@google.com> X-Mailer: git-send-email 2.47.0.338.g60cca15819-goog Message-ID: <20241204191349.1730936-6-jthoughton@google.com> Subject: [PATCH v1 05/13] KVM: x86/mmu: Add support for KVM_MEM_USERFAULT From: James Houghton To: Paolo Bonzini , Sean Christopherson Cc: Jonathan Corbet , Marc Zyngier , Oliver Upton , Yan Zhao , James Houghton , Nikita Kalyazin , Anish Moorthy , Peter Gonda , Peter Xu , David Matlack , Wang@google.com, Wei W , kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Adhering to the requirements of KVM Userfault: 1. Zap all sptes for the memslot when KVM_MEM_USERFAULT is toggled on with kvm_arch_flush_shadow_memslot(). 2. Only all PAGE_SIZE sptes while KVM_MEM_USERFAULT is enabled (for both normal/GUP memory and guest_memfd memory). 3. Reconstruct huge mappings when KVM_MEM_USERFAULT is toggled off with kvm_mmu_recover_huge_pages(). With the new logic in kvm_mmu_slot_apply_flags(), I've simplified the two dirty-logging-toggle checks into one, and I have dropped the WARN_ON() that was there. Signed-off-by: James Houghton --- arch/x86/kvm/Kconfig | 1 + arch/x86/kvm/mmu/mmu.c | 27 +++++++++++++++++++++---- arch/x86/kvm/mmu/mmu_internal.h | 20 +++++++++++++++--- arch/x86/kvm/x86.c | 36 ++++++++++++++++++++++++--------- include/linux/kvm_host.h | 5 ++++- 5 files changed, 71 insertions(+), 18 deletions(-) diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index ea2c4f21c1ca..286c6825cd1c 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -47,6 +47,7 @@ config KVM_X86 select KVM_GENERIC_PRE_FAULT_MEMORY select KVM_GENERIC_PRIVATE_MEM if KVM_SW_PROTECTED_VM select KVM_WERROR if WERROR + select HAVE_KVM_USERFAULT =20 config KVM tristate "Kernel-based Virtual Machine (KVM) support" diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 22e7ad235123..2f7381255d11 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4292,14 +4292,19 @@ static inline u8 kvm_max_level_for_order(int order) return PG_LEVEL_4K; } =20 -static u8 kvm_max_private_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, - u8 max_level, int gmem_order) +static u8 kvm_max_private_mapping_level(struct kvm *kvm, + struct kvm_memory_slot *slot, + kvm_pfn_t pfn, u8 max_level, + int gmem_order) { u8 req_max_level; =20 if (max_level =3D=3D PG_LEVEL_4K) return PG_LEVEL_4K; =20 + if (kvm_memslot_userfault(slot)) + return PG_LEVEL_4K; + max_level =3D min(kvm_max_level_for_order(gmem_order), max_level); if (max_level =3D=3D PG_LEVEL_4K) return PG_LEVEL_4K; @@ -4336,8 +4341,10 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vc= pu *vcpu, } =20 fault->map_writable =3D !(fault->slot->flags & KVM_MEM_READONLY); - fault->max_level =3D kvm_max_private_mapping_level(vcpu->kvm, fault->pfn, - fault->max_level, max_order); + fault->max_level =3D kvm_max_private_mapping_level(vcpu->kvm, fault->slot, + fault->pfn, + fault->max_level, + max_order); =20 return RET_PF_CONTINUE; } @@ -4346,6 +4353,18 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vc= pu, struct kvm_page_fault *fault) { unsigned int foll =3D fault->write ? FOLL_WRITE : 0; + int userfault; + + userfault =3D kvm_gfn_userfault(vcpu->kvm, fault->slot, fault->gfn); + if (userfault < 0) + return userfault; + if (userfault) { + kvm_mmu_prepare_userfault_exit(vcpu, fault); + return -EFAULT; + } + + if (kvm_memslot_userfault(fault->slot)) + fault->max_level =3D PG_LEVEL_4K; =20 if (fault->is_private) return kvm_mmu_faultin_pfn_private(vcpu, fault); diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index b00abbe3f6cf..15705faa3b67 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -282,12 +282,26 @@ enum { RET_PF_SPURIOUS, }; =20 -static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu, - struct kvm_page_fault *fault) +static inline void __kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vc= pu, + struct kvm_page_fault *fault, + bool is_userfault) { kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT, PAGE_SIZE, fault->write, fault->exec, - fault->is_private); + fault->is_private, + is_userfault); +} + +static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu, + struct kvm_page_fault *fault) +{ + __kvm_mmu_prepare_memory_fault_exit(vcpu, fault, false); +} + +static inline void kvm_mmu_prepare_userfault_exit(struct kvm_vcpu *vcpu, + struct kvm_page_fault *fault) +{ + __kvm_mmu_prepare_memory_fault_exit(vcpu, fault, true); } =20 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_o= r_gpa, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2e713480933a..2f7080fd6218 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -13053,12 +13053,36 @@ static void kvm_mmu_slot_apply_flags(struct kvm *= kvm, u32 new_flags =3D new ? new->flags : 0; bool log_dirty_pages =3D new_flags & KVM_MEM_LOG_DIRTY_PAGES; =20 + /* + * When toggling KVM Userfault on, zap all sptes so that userfault-ness + * will be respected at refault time. All new faults will only install + * small sptes. Therefore, when toggling it off, recover hugepages. + * + * For MOVE and DELETE, there will be nothing to do, as the old + * mappings will have already been deleted by + * kvm_arch_flush_shadow_memslot(). + * + * For CREATE, no mappings will have been created yet. + */ + if ((old_flags ^ new_flags) & KVM_MEM_USERFAULT && + (change =3D=3D KVM_MR_FLAGS_ONLY)) { + if (old_flags & KVM_MEM_USERFAULT) + kvm_mmu_recover_huge_pages(kvm, new); + else + kvm_arch_flush_shadow_memslot(kvm, old); + } + + /* + * Nothing more to do if dirty logging isn't being toggled. + */ + if (!((old_flags ^ new_flags) & KVM_MEM_LOG_DIRTY_PAGES)) + return; + /* * Update CPU dirty logging if dirty logging is being toggled. This * applies to all operations. */ - if ((old_flags ^ new_flags) & KVM_MEM_LOG_DIRTY_PAGES) - kvm_mmu_update_cpu_dirty_logging(kvm, log_dirty_pages); + kvm_mmu_update_cpu_dirty_logging(kvm, log_dirty_pages); =20 /* * Nothing more to do for RO slots (which can't be dirtied and can't be @@ -13078,14 +13102,6 @@ static void kvm_mmu_slot_apply_flags(struct kvm *k= vm, if ((change !=3D KVM_MR_FLAGS_ONLY) || (new_flags & KVM_MEM_READONLY)) return; =20 - /* - * READONLY and non-flags changes were filtered out above, and the only - * other flag is LOG_DIRTY_PAGES, i.e. something is wrong if dirty - * logging isn't being toggled on or off. - */ - if (WARN_ON_ONCE(!((old_flags ^ new_flags) & KVM_MEM_LOG_DIRTY_PAGES))) - return; - if (!log_dirty_pages) { /* * Recover huge page mappings in the slot now that dirty logging diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index f7a3dfd5e224..9e8a8dcf2b73 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2465,7 +2465,8 @@ static inline void kvm_account_pgtable_pages(void *vi= rt, int nr) static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu, gpa_t gpa, gpa_t size, bool is_write, bool is_exec, - bool is_private) + bool is_private, + bool is_userfault) { vcpu->run->exit_reason =3D KVM_EXIT_MEMORY_FAULT; vcpu->run->memory_fault.gpa =3D gpa; @@ -2475,6 +2476,8 @@ static inline void kvm_prepare_memory_fault_exit(stru= ct kvm_vcpu *vcpu, vcpu->run->memory_fault.flags =3D 0; if (is_private) vcpu->run->memory_fault.flags |=3D KVM_MEMORY_EXIT_FLAG_PRIVATE; + if (is_userfault) + vcpu->run->memory_fault.flags |=3D KVM_MEMORY_EXIT_FLAG_USERFAULT; } =20 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES --=20 2.47.0.338.g60cca15819-goog